CITADEL
Contents
Overview
CITADEL (Contract-Inference Tool that Applies Daikon to the Eiffel Language) is a tool that provides an Eiffel front-end for the Daikon invariant detector. It is being implemented as a Master's project by Nadia Polikarpova supervised by Ilinca Ciupa (ETH).
Discussions
Unfolded variables
Nadia: Daikon can process values of only few types: five scalar types (boolean,
integer, double, string and hashcode) and five more types, which are
arrays of these scalars.
So, if in Eiffel code we a have a variable of one of basic types
(BOOLEAN, NATURAL, INTEGER, CHARACTER, REAL or STRING) we can print
its value directly as if it has one of Daikon's types.
If, however, we have a variable of any other type, we have to collect
all the information about it that is of interest and is available at
certain program point, and present this information to Daikon as a set
of variables, which have Daikon types. The information of interest
about a variable includes results of all queries that can be called on
this variable at this program point. However, since queries results may
themselves be of non-basic types, we have to perform this operation
recursively (to a certain depth).
For a reference variable information of interest includes also its
address (Daikon type "hashcode" is intended for representing object
addresses).
For example, if we had a variable c: POINT, queries of class POINT
available at current program point were `x: REAL', `y: REAL' and `twin:
POINT' and unfolding depth were set to 2, then `c.unfolded' would be a
set of variables: {$c, c.x, c.y, $c.twin, c.twin.x, c.twin.y}.
Containers need special kind of unfolding. If we want Daikon to handle
them sensibly, they should be converted to arrays of scalars (to be more
precise, it should be done only if all container elements are observable
at current program point...). It isn't difficult for descendants of
CONTAINER class from standard library. Instrumenter may check, if a
variable is an indirect instance of CONTAINER, get its
`linear_representation', then traverse it and output as array. I think
that in the first version we may leave aside those containers, that
aren't inherited from CONTAINER.
Functions in unfolded form
Ilinca: I'm not sure about the idea of including argument-less functions in this unfolded form. What if the precondition of the function doesn't hold or we run into some kind of trouble when executing the function body (trouble such as the contracts of a sub-call made by this function not holding or other bugs)? On the other hand, including functions would bring significantly more information about the state of the object... Does the Daikon frontend for Java, for instance, by default evaluate functions too as part of the unfolded form of the objects?
Nadia: Well, this idea is disputable, of cause. In Java considering argument-less functions as variables is more dangerous, because Java, like all C-family languages, encourages side-effects in functions. However, in Daikon front-end for Java there exists an option that allows using functions as variables. Together with enabling this option user has to provide a special "purity" file that lists those functions from the system, that are side-effect free. This feature is described in Daikon User Manual, page 82. Considering functions as variables is even more relevant in Eiffel context by two main reasons. Firstly, Eiffel encourages absence of abstract side-effects in functions (and, I believe, most Eiffel functions indeed do not have abstract side-effect, am I right?). For those cases, when some functions in system are known to have side-effect, we can allow user to specify an "impurity" file, that would list those functions. I think that these cases would be rare. Ilinca: Indeed, I also think these cases would be rare, but, depending on the size of the system, it might be very cumbersome for users to determine these functions.
Nadia: Well, a parameter may be introduced that will turn this facility on
and off. So, if a user knows that there are many impure functions in system,
he/she will just turn it off. I think, that its enough for the first version.
In future some additional functionality may be introduced. In fact, a user may
not know about all impure functions in system (especially in code that was written
by someone else). The tool can help user to determine, did anything go wrong with
using functions in unfolding. For example, it may do instrumentation twice: first
time using only attributes and second time using functions also. As a result two
trace files are produces. Then the tool can compare attributes values in two files.
If they are the same, then with certain degree of confidence we can say, that called
functions had no abstract side effect.
As I understand, precise analysis of abstract side-effect presence or absence in beyond
the state of the art. However, some methods which perform approximate analysis may be
introduced.
Another interesting solution that came to mind is to store object's state each time before
calling a function and then recover object after the call. In this case we don't care about
function purity, we just get a result that this function might have had, if it was called at
this point :)
Anyway, different partial solutions to this problem may be implemented and user may combine
different strategies to get a desired result with a desired degree of confidence.
Conceptually, my point here is: why an "honest" Eiffel programmer, who never writes impure
functions, has to suffer (get less expressive contracts) because of other programmers, who
sometimes write impure functions?:)
Ilinca: :) Indeed, the honest Eiffel programmer shouldn't suffer. So in conclusion, I
think we should go for the solution with the file containing the names of impure functions,
as Daikon does it. Adding the switch which turns it on and off is also a good idea.
Ilinca: Something else: what Daikon calls "purity" is actually weak purity, isn't it? (Weak purity means that creations of local variables are allowed in the body of a pure function. Strong purity disallows this.)
Nadia: Chicory (Java front end) doesn't check purity of functions that are listed in a purity
file. User is totally responsible for what he/she writes in this file. So, I would say that, what
Chicory calls purity isn't actually any special kind of purity. It only means that user allowed
Chicory to call this functions anywhere in the system, and if something goes wrong, it's user's
fault
Nadia: Secondly, Eiffel advocates uniform access principle. Routine's clients
don't have to know, which queries, used in precondition, are attributes
and which are functions. If inferred preconditions contained only
attributes, it would be inconsistent with uniform access principle, I
think...
As for troubles that may arise when calling functions... As I understand, exception in a function, called when its precondition holds, is a bug. So, if it happens, we can just tell user that we found a bug in certain function and that before inferring contracts the bug should be fixed.
Checking preconditions (static vs. dynamic)
The issue with functions' preconditions that you've mentioned isn't a
big problem. If after unfolding we got a variable of the form, e.g.,
`x.f1.f2', where `f1' and `f2' are functions, printing instructions for
this variable can be like this:
if x.f1_precondition_holds and then x.f1.f2_precondition_holds
then
print (x.f1.f2)
else
print (Nonsensical)
end
"Nonsensical" is a special Daikon value that indicates that variable
cannot be evaluated.
Function preconditions are known after parsing. To be able to evaluate
them separately, we may, for example, wrap them into functions.
`f1_precondition_holds' in the above code is an example of such a
function, that was introduced by instrumenter and added to x's
generating class.
Ilinca: I don't think it's that easy, mainly because of polymorphism. Statically
you can't determine the type that x will be a direct instance of at
runtime. Hence, you don't know which preconditions to wrap. Of course,
since redefined routines can keep or weaken the preconditions of the
original routines, you can take the pessimistic approach of wrapping
only the precondition of the function from the static type of x, but
then you might miss cases in which the weakened version of the
precondition does hold. Do you understand my point?
Nadia: Yeah, I see...
I think that the main question here is: should we check "static" or "dynamic"
preconditions when calling a function?
On the one hand, in Eiffel we can write something like:
class A
...
sqrt (x: REAL): REAL
require
x >= 0.0
do ... end
end
class B
inherit A
redefine sqrt end
...
sqrt (x: REAL): REAL
require else
True
do ... end
end
class TEST
...
an_a: A
make is
do
create {B} an_a
print (an_a.sqrt (-1.0))
end
end
It works, and assertion checking mechanism doesn't indicate any error.
But, on the other hand, is it legal? Am I allowed to call a function if it's
static precondition doesn't hold?
As I understand, if I want to make sure that it's legal to call a routine at some
program point, I have to check it's static precondition. Even if today I'm sure that
dynamic type of `an_a' is always `B', this needn't remain the same tomorrow...
Ilinca: That's correct. If the "static" precondition of a routine holds, this
implies of course that the preconditions of all its redefinitions will also hold.
But I was calling this approach "pessimistic" because the precondition of a redefined
version of a routine may hold even though the "static" precondition doesn't.
So, if a programmer wanted to know function value at a program point, he/she would check
its static precondition. Then why instrumenter should act differently? Well, actually
there is a difference: instrumenter not only avoids function call when precondition
doesn't hold, but also claims that function value is nonsensical. From one point of view,
such behavior is incorrect, because function actually could be called and a sensible value
could be received. From the other point, since it was illegal to call the function this time,
it makes sense to declare function value nonsensical.
To sum up, I think that it is rather a philosophical then a technical problem, but, as for me,
checking static preconditions is more appropriate.
However, if we wanted to check a dynamic precondition, it's also possible. We can wrap function's
precondition into a special function and redefine that special function any time the original
function is redefined. It is possible since instrumenter has access to the whole system.
Ilinca: Well, this would definitely be the more... correct solution, but it's also a lot more
work. Couldn't this also be done by not wrapping the function, but simply calling it and adding
some exception handling code to the unfolder? I mean, the unfolder would try to call all
argument-less functions of a class on an object, and, if it gets a precondition violation,
it catches the exception and it declares that particular function as nonsensical?
Nadia: I would like to return to the subject of checking "static" or "dynamic" preconditions before function calls in unfolding. I think that this problem arises because the semantics of special variable value "nonsensical" in Daikon is ambiguous.
In Java and C there are no preconditions, so this value is originally used in Daikon for the only purpose: if a pointer of reference is null, then the dereferenced value (and its attributes, if the value is not of primitive type) is declared "nonsensical". In Eiffel we will use "nonsensical" for this purpose too: if a reference entity is void, then its queries will be output as "nonsensical".
However, in Eiffel we have one more source of problems with evaluating variables: if a variable is a result of a function call, then function precondition may not hold, so we won't be able to evaluate its result. In this case the variable value should also be declared "nonsensical", because this is the only special value in Daikon, which denotes that variable can't be evaluated. However, in this case there appears a question: what does it mean that the variable can't be evaluated? Does it mean that something terrible will happen (e.g., exception), if we try to evaluate it? Or does it mean that it isn't legal to evaluate this variable in this program state?
If we take the first point, then we should check dynamic precondition, because if static precondition doesn't hold, but dynamic does, nothing terrible will happen during function call.
If we take the second point, we should check static precondition, because if it doesn't hold then it's illegal to call this function, even if dynamic precondition holds. This point seems more correct to me, and here is an example that illustrates the situation.
Consider class `INT_LIST' (with feature `item' that has a precondition `not off') and it's heir - class EXTRAPOLATED_INT_LIST (that weakens precondition of `item'):
class INT_LIST
...
feature
item: INTEGER is
require
not off
...
end
end
class EXTRAPOLATED_INT_LIST
inherit
INT_LIST
redefine
item
end
...
feature
item: INTEGER is
require else True
do
if not off then
Result := Precursor
else
Result := 0
end
end
end
Consider also class CLIENT, which is a client of INT_LIST:
class CLIENT
...
feature
test (a_list: INT_LIST) is
do ... end
end
Imagine that during system execution, when contracts were inferred, actual argument of `test' was always a direct instance of EXTRAPOLATED_INT_LIST, and in addition it was always passed to this procedure in such a state that cursor was before the first element (`off' was true). If we check dynamic preconditions for functions in unfolding, than a precondition, inferred for `test', will be something like this:
require a_list /= Void a_list.off a_list.item = 0
But this contract isn't correct (as for me), although it is, certainly, true for a specific system execution. It becomes a little more correct, if we infer:
require a_list /= Void a_list.off a_list.generator = "EXTRAPOLATED_INT_LIST" a_list.item = 0
However, such contracts (connected with dynamic type of variable) seem useless to me. One of the goals of dynamic contract inference is to eliminate those properties that happened by chance and are not intended properties of the code. Properties that are connected with dynamic types definitely happen by chance: they can't be intended properties, because if programmer wanted his software to have such a property, he would change static entity type to a more specific (based on descendant).
Such properties may be useful only if we treat them not as contracts but rather as some kind of runtime statistics. Programmer may be interested in information that is true for a specific system execution, e.g. for debug purposes. It might be useful to add such a feature (inference of dynamic type information and properties, connected with dynamic types) in future, but it isn't "classic" contract inference, as for me, so I don't think that we need it in the first version. And if we don't want to know anything about dynamic types, then we should check static preconditions of functions during unfolding.
Checking hand-written contracts
Ilinca: There's one further problem with this approach: when a function is called as part of the evaluation of a contract, this function's contracts are not checked. The reason for this is to avoid infinite loops. If you wrap your precondition into a function, the contracts of any routines it calls will be evaluated, and thus the system can actually behave differently.
Should we check at all the hand-written contracts of a system for which CITADEL
is trying to infer contracts? We could also start from the assumption that Daikon
uses: the tool is given a set of passing test cases and from those it has to infer
contracts. If the inferred contracts somehow contradict the hand-written ones, then
there's a problem in the system, and the source of the problem may be that the
hand-written contracts are faulty. What do you think?
Nadia: I think that it makes sense to turn assertion checking off when running
instrumented system. This will solve the problem with functions inside wrapped
preconditions, which you've mentioned.
In current version instrumented system is supposed to behave exactly the same way
as the original system did (except for creating a trace file). So, if a user was
sure about the original, hand-written contracts and wanted to check the system against
them, he/she could run the original system with assertion checking on, make sure that
everything is ok and after that perform contract inference.
On the other hand, returning to using functions in unfolding: for these calls we have to
check hand-written preconditions, because they were introduced by instrumenter, so they are
not parts of "passing test cases". Fortunately, as you said before, erroneous preconditions
are very rare.
Checking preconditions (wrapping vs. qualification)
Nadia: I started implementing function preconditions checks in unfolding recently and realized that my first idea (to wrap these preconditions into functions) has an important disadvantage. Without creating new functions for preconditions if we instrument a class, then only the source code of that very class changes. However, if during class instrumentation we add functions to its clients, then the source code of those clients also changes. It isn't a disaster, of cause, but it's an unpleasant property, as for me. It would be nice, if class instrumentation affected only the class itself.
The other solution is not to wrap preconditions into functions, but to "qualify" them: if we want to check precondition before calling a function `x.f', we should take f's precondition and replaced all unqualified calls in it by qualified calls with target `x' and all occurrences of `Current' by `x'. This replacement is not difficult conceptually, but its implementation is quite voluminous (during this operation we have to replace some identifiers in an expression with function calls, so we need to redefine all `process_<something>' procedures of ET_AST_ITERATOR, where <something> can appear in expression and can contain identifiers).
What do you think about these two solutions ("wrapping into functions" and "qualification")?
Ilinca: I think the best way to go is to wrap the precondition in a function in the class that it belongs to. Wrapping it in a function in clients would break encapsulation: you would only be allowed to access the exported features. I don't see any advantages of qualification over function-wrapping, so I think it makes sense to choose the least-work solution
Nadia: My previous explanation of the problem with "wrapping" was a little bit confused
(I wrote that functions have to be added to clients of instrumented class, but actually
I meant quite the opposite: that they should be added to suppliers:) )
Wrapping a precondition in a function in the class that it belongs to is exactly what I
wanted to do firstly. The reason for it was that, if we instrument a class `C' and it has,
say, an attribute `x: LIST[INTEGER]', then unfolded form of `Current' variable will include
a variable `x.first'. Precondition of function `first' in class LIST is `not empty'. However,
we need to check this precondition not inside LIST, but inside routines of class `C'. In fact,
printing instruction for variable `x.first' should be guarded by condition like this:
if x /= Void and then not x.empty then
print (x.first)
end
We can't just take a precondition from `first' and insert it into the guarding condition as is,
instead we should "qualify" the call to `empty' with target `x'.
Firstly qualification seemed too complex to me, so I decided to wrap `not empty' into function, say,
`first_precondition: BOOLEAN' and add it to class `LIST', so that inside `C' I can check simpler
condition before printing x.first:
if x /= Void and then x.first_precondition then
print (x.first)
end.
However, as you can see, it involves modifying class `LIST' (and possibly any other supplier of
`C') while instrumenting `C'. If only `C' changed, then we could print only its source code after
instrumentation, we could compile only `C' (if incremental compilation is used) and don't recompile
base library, we would have no problems with tracking added functions (the problems are like this:
if we instrument class `LIST' after `C', we probably don't want neither to instrument added functions,
nor to see them in unfolded form of `Current' variable of class `LIST'). As for me, wrapping adds
surplus dependencies into the system, and software never benefits from dependencies :)
That is why later I inclined to "qualification" solution. Actually, I have already mostly implemented
it:) Current implementation might be incomplete, there are several syntactic constructs that it doesn't
handle, because I'm not sure that they can appear in preconditions, but I'll check it and fix the problem.
Ilinca: Ok, so what I was thinking about was that you wanted to wrap the whole precondition of
`first' into a function in class C. So, in class C you would have a routine
precondition_first (l: LIST[G]): BOOLEAN
require
l_not_void: l /= Void
do
Result := not l.empty
end
Then the guard for printing x.first would be
if x /= Void and then precondition_first (x) then
print (x.first)
end
But of course whether or not you wrap the precondition in a function or just quote it in the if
statement does not make a big difference.
One more question: should the tool allow instrumenting library classes? If a library was precompiled, then instrumenting any of its classes will result in recompiling the whole library, am I right? Are precompiled libraries anyway recompiled during system finalization or not? (It matters for choosing between "wrapping into functions" and "qualification" solutions, because "wrapping into functions" can change library classes, even if user wants to implement only classes from his own clusters).
Support for loop invariants
Ilinca: Does the "classic" Daikon support loop invariants?
Nadia: Daikon uses standard program point suffixes to infer more invariants. E.g., it understands that some_routine:::ENTER and some_routine:::EXIT are entry and exit points of the same routine. Thanks to this, it knows `old' values of variables at exit and can infer invariants like "x = old x". As I understood, if Daikon sees an unknown suffix, it treats corresponding program point as, so to say, "standalone" and doesn't try to connect it with any other. We don't need any extra support from Daikon to handle loop invariants.
Ilinca: You'll have to pay attention though to inserting the corresponding program point(s) at the right location(s). A loop invariant has to hold after initialization (the "from" clause) and be preserved by the loop body, i.e. it must hold after every execution of the loop body, including when the exit condition becomes true. So actually, I guess the program point should be in the "until" clause, because it gets evaluated after initialization and then after every execution of the loop body. What do you think?
Nadia: `Until' clause is conceptually the right place, but instructions
can't be inserted in there. So printing instructions can be inserted at
the end of `from' clause and at the end of `loop' clause. As I understand,
all variable values that are observed in `until' condition appear at one of
those points.
Nadia: If we want to infer loop invariants, we should introduce a separate
program point for each loop. So, we need a way to distinguish loops
within a routine. One way is to take its position in source code as
index. However, I thought, that position is too unstable to become loop
identifier, so I chose instruction number instead. These numbers are
unique numbers of instructions within surrounding routine, they may be
used not only for loops, but for arbitrary instructions.
CI_INSTRUCTION_COUNTER is responsible for counting instruction while
iterating through AST.
Inferring contracts for generic classes
Nadia: there are actually two different cases. 1. Contracts inside generic classes. This is not really a problem. Entities, which have parameter type, can be treated as if their generator is constraining class (ANY, if genericity is unconstrained). 2. Contracts in clients of generic classes. Here some problems arise. If we have an entity `x: ARRAYED_LIST[INTEGER]', its unfolding involves processing the `item' query. `item' is declared to be of type `G' in ARRAYED_LIST and, as I understand, parsing is not enough (some type analysis is needed) to infer that `G' means `INTEGER' in current context. Two approaches to this problem, that I see at present, are: perform such a type analysis in compile time (with the help of gobo compilation facilities) or leave some unfolding work until runtime, when reflections may be used to determine variable's generating class.
Ilinca: As far as I know gobo is able to do this kind of analysis.
I think it's preferable that all variables get unfolded statically
-- it would be a cleaner system design. Besides, doesn't Daikon need
the .decls files as input?
Nadia: I also think that static unfolding is better.
As for .decls files: if a batch version of Daikon is used, then Daikon
itself is run only after instrumented system terminates. A .decls file
thus may be generated not only during instrumentation, but also during
instrumented system execution.
If an on-line Daikon version is used, Daikon is executed simultaneously
with instrumented system. However, even in this case it is possible to
generate program point declarations during runtime. Actually, Daikon
allows to place declarations not only in separate .decls file, but also
inside the .dtace file, together with program point samples. The only
requirement is that a declaration of a program point must appear before
any sample of this program point.
On my opinion, a separate .decls file, generated during instrumentation
phase, is a better design decision.
Experiment
See Google doc.

