Revision as of 04:23, 29 April 2008


Overview

CITADEL (Contract-Inference Tool that Applies Daikon to the Eiffel Language) is a tool that provides an Eiffel front-end for the Daikon invariant detector. It is being implemented as a Master's project by Nadia Polikarpova, supervised by Ilinca Ciupa (ETH).

Discussions

Unfolded variables

Nadia: Daikon can process values of only a few types: five scalar types (boolean, 
 integer, double, string and hashcode) and five more types, which are 
 arrays of these scalars.
 So, if in Eiffel code we have a variable of one of the basic types 
 (BOOLEAN, NATURAL, INTEGER, CHARACTER, REAL or STRING), we can print 
 its value directly, as if it had one of Daikon's types.
 If, however, we have a variable of any other type, we have to collect 
 all the information about it that is of interest and available at a 
 given program point, and present this information to Daikon as a set 
 of variables that have Daikon types. The information of interest 
 about a variable includes the results of all queries that can be called on 
 it at this program point. However, since query results may 
 themselves be of non-basic types, we have to perform this operation 
 recursively (to a certain depth).
 For a reference variable, the information of interest also includes its 
 address (Daikon's type "hashcode" is intended for representing object 
 addresses).
 For example, if we had a variable c: POINT, the queries of class POINT 
 available at the current program point were `x: REAL', `y: REAL' and `twin: 
 POINT', and the unfolding depth were set to 2, then `c.unfolded' would be the 
 set of variables {$c, c.x, c.y, $c.twin, c.twin.x, c.twin.y}.
 
 Containers need a special kind of unfolding. If we want Daikon to handle 
 them sensibly, they should be converted to arrays of scalars (more 
 precisely, this should be done only if all container elements are observable 
 at the current program point...). This isn't difficult for descendants of the 
 CONTAINER class from the standard library: the instrumenter can check whether a 
 variable is an indirect instance of CONTAINER, get its 
 `linear_representation', then traverse it and output it as an array. I think 
 that in the first version we may leave aside containers that 
 don't inherit from CONTAINER.
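
As a rough illustration, the recursive unfolding described above can be sketched in Python (the names `unfold`, `queries` and the toy `Point` class are hypothetical; the real instrumenter emits Eiffel printing instructions rather than inspecting live objects):

```python
# Sketch of recursive variable unfolding over a toy object model.
# Illustration only: the real tool works on Eiffel source, not live objects.

SCALARS = (bool, int, float, str)

def unfold(name, obj, depth):
    """Return {daikon_variable_name: value} for `obj`, unfolded to `depth`."""
    result = {}
    if isinstance(obj, SCALARS):
        result[name] = obj                # basic type: printable directly
        return result
    result["$" + name] = id(obj)          # Daikon "hashcode": object address
    if depth == 0:
        return result
    for query, value in queries(obj):     # argument-less queries of the object
        result.update(unfold(name + "." + query, value, depth - 1))
    return result

def queries(obj):
    # For this sketch, every public attribute counts as a query.
    return [(q, getattr(obj, q)) for q in vars(obj)]

class Point:
    def __init__(self, x, y):
        self.x, self.y = x, y

print(sorted(unfold("c", Point(1.0, 2.0), 2)))   # ['$c', 'c.x', 'c.y']
```

With a `twin: POINT` query added to the toy class, the result would also contain `$c.twin`, `c.twin.x` and `c.twin.y`, matching the example above.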

Functions in unfolded form

Ilinca: I'm not sure about the idea of including argument-less functions in 
 this unfolded form. What if the precondition of the function doesn't 
 hold, or we run into some kind of trouble when executing the function 
 body (trouble such as the contracts of a sub-call made by this 
 function not holding, or other bugs)? On the other hand, including 
 functions would bring significantly more information about the state 
 of the object... Does the Daikon front-end for Java, for instance, 
 evaluate functions by default as part of the unfolded form of 
 objects?
Nadia: Well, this idea is disputable, of course.
 In Java, treating argument-less functions as variables is more 
 dangerous, because Java, like all C-family languages, encourages 
 side effects in functions. However, the Daikon front-end for Java 
 has an option that allows using functions as variables. When enabling 
 this option, the user has to provide a special "purity" file that 
 lists those functions in the system that are side-effect free. This 
 feature is described in the Daikon User Manual, page 82.
 Treating functions as variables is even more relevant in the Eiffel 
 context, for two main reasons. Firstly, Eiffel encourages the absence of 
 abstract side effects in functions (and, I believe, most Eiffel 
 functions indeed have no abstract side effects, am I right?). For 
 those cases where some functions in the system are known to have 
 side effects, we can allow the user to specify an "impurity" file that 
 lists those functions. I think such cases would be rare.
 
   Ilinca: Indeed, I also think these cases would be rare, but, depending on the 
   size of the system, it might be very cumbersome for users to determine 
   these functions.
       Nadia: Well, a parameter may be introduced that turns this facility on 
       and off. So, if a user knows that there are many impure functions in the system, 
       he/she can just turn it off. I think that's enough for the first version.
       
        In the future some additional functionality may be introduced. In fact, a user may 
        not know about all the impure functions in a system (especially in code written 
        by someone else). The tool can help the user determine whether anything went wrong with 
        using functions in unfolding. For example, it may instrument twice: the first 
        time using only attributes and the second time using functions as well. As a result, two 
        trace files are produced. The tool can then compare the attribute values in the two files. 
        If they are the same, then with a certain degree of confidence we can say that the called 
        functions had no abstract side effects.
        As I understand it, precise analysis of the presence or absence of abstract side effects is beyond 
        the state of the art. However, some methods that perform approximate analysis may be 
        introduced.
        Another interesting solution that came to mind is to store the object's state each time before 
        calling a function and then restore the object after the call. In that case we don't care about 
        function purity; we just get the result that the function would have had if it had been called at 
        this point :)
        Anyway, different partial solutions to this problem may be implemented, and the user may combine 
        different strategies to get the desired result with the desired degree of confidence.
        Conceptually, my point here is: why should an "honest" Eiffel programmer, who never writes impure 
        functions, have to suffer (get less expressive contracts) because of other programmers who 
        sometimes write impure functions? :)
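        The double-instrumentation check could look roughly like this sketch (the one-`name=value`-per-line trace format and the crude function-name filter are simplifying assumptions; real Daikon trace files are structured differently):

```python
# Sketch: compare attribute values from two instrumentation runs to gain
# confidence that the functions used in unfolding had no abstract side effect.
# The trace line format here is invented for illustration.

def attribute_values(trace_lines):
    """Collect (variable, value) pairs for attribute variables only."""
    pairs = []
    for line in trace_lines:
        name, _, value = line.partition("=")
        if "(" not in name:               # crude filter: skip function results
            pairs.append((name.strip(), value.strip()))
    return pairs

run_without_functions = ["count=3", "area=12"]
run_with_functions    = ["count=3", "area=12", "twin()=0x2f"]

same = attribute_values(run_without_functions) == attribute_values(run_with_functions)
print(same)   # True: attribute values agree, so with some confidence the
              # extra function calls had no abstract side effect
```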
           Ilinca: :) Indeed, the honest Eiffel programmer shouldn't suffer. So in conclusion, I 
           think we should go for the solution with the file containing the names of impure functions, 
           as Daikon does it. Adding the switch which turns it on and off is also a good idea. 
   Ilinca: Something else: what Daikon calls "purity" is actually 
   weak purity, isn't it? (Weak purity means that creations of local 
   variables are allowed in the body of a pure function. Strong purity 
   disallows this.)
        Nadia: Chicory (the Java front end) doesn't check the purity of functions listed in a purity 
        file. The user is entirely responsible for what he/she writes in this file. So I would say that what 
        Chicory calls purity isn't actually any special kind of purity. It only means that the user has allowed 
        Chicory to call these functions anywhere in the system, and if something goes wrong, it's the user's 
        fault.
 
 Nadia: Secondly, Eiffel advocates the uniform access principle. A routine's clients 
 shouldn't have to know which of the queries used in a precondition are attributes 
 and which are functions. If inferred preconditions contained only 
 attributes, that would be inconsistent with the uniform access principle, I 
 think...
 As for the troubles that may arise when calling functions... As I 
 understand it, an exception in a function called while its precondition holds 
 is a bug. So, if that happens, we can just tell the user that we found a bug 
 in a certain function and that the bug should be fixed before inferring 
 contracts.

Checking preconditions (static vs. dynamic)

 Nadia: The issue with functions' preconditions that you've mentioned isn't a 
 big problem. If after unfolding we get a variable of the form, e.g., 
 `x.f1.f2', where `f1' and `f2' are functions, the printing instructions for 
 this variable can look like this:
   if x.f1_precondition_holds and then x.f1.f2_precondition_holds
   then
      print (x.f1.f2)
   else
      print (Nonsensical)
   end
 
 "Nonsensical" is a special Daikon value indicating that a variable 
 cannot be evaluated.
 Function preconditions are known after parsing. To be able to evaluate 
 them separately, we may, for example, wrap them in functions. 
 `f1_precondition_holds' in the code above is an example of such a 
 function, introduced by the instrumenter and added to x's 
 generating class.
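 The guarded evaluation of a chain like `x.f1.f2' generalizes to any length; here is a small Python sketch of the idea (the names are hypothetical, and `NONSENSICAL` stands in for Daikon's special value):

```python
# Sketch: evaluate a chain of function calls, checking each function's
# precondition first; if any precondition fails, the whole chain's value
# is reported as nonsensical.

NONSENSICAL = object()   # stand-in for Daikon's "nonsensical" value

def evaluate_chain(target, links):
    """Evaluate target.f1.f2...; each link is a (function, precondition) pair."""
    value = target
    for func, precondition in links:
        if not precondition(value):
            return NONSENSICAL
        value = func(value)
    return value

import math
# A sqrt "query" with the precondition x >= 0, as in the later examples.
sqrt_link = (lambda v: math.sqrt(v), lambda v: v >= 0.0)

print(evaluate_chain(4.0, [sqrt_link]))                  # 2.0
print(evaluate_chain(-1.0, [sqrt_link]) is NONSENSICAL)  # True
```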
 
   Ilinca: I don't think it's that easy, mainly because of polymorphism. Statically 
   you can't determine the type that x will be a direct instance of at 
   runtime. Hence, you don't know which preconditions to wrap. Of course, 
   since redefined routines can keep or weaken the preconditions of the 
   original routines, you can take the pessimistic approach of wrapping 
   only the precondition of the function from the static type of x, but 
   then you might miss cases in which the weakened version of the 
   precondition does hold. Do you understand my point?
       Nadia: Yeah, I see...
       I think that the main question here is: should we check "static" or "dynamic" 
       preconditions when calling a function?
       On the one hand, in Eiffel we can write something like:
       class A
          ...
          sqrt (x: REAL): REAL
              require
                  x >= 0.0
              do ... end
       end
       class B
          inherit A
             redefine sqrt end
          ...
          sqrt (x: REAL): REAL
              require else
                True
              do ... end
       end
       class TEST
           ...
           an_a: A
           make is
              do
                   create {B} an_a
                   print (an_a.sqrt (-1.0))
              end
       end
        It works, and the assertion checking mechanism doesn't indicate any error.
        But, on the other hand, is it legal? Am I allowed to call a function if its 
        static precondition doesn't hold?
        As I understand it, if I want to make sure that it's legal to call a routine at some 
        program point, I have to check its static precondition. Even if today I'm sure that 
        the dynamic type of `an_a' is always `B', this needn't remain the same tomorrow...
           Ilinca: That's correct. If the "static" precondition of a routine holds, this 
           implies of course that the preconditions of all its redefinitions will also hold. 
           But I was calling this approach "pessimistic" because the precondition of a redefined 
           version of a routine may hold even though the "static" precondition doesn't.
        So, if a programmer wanted to know a function's value at a program point, he/she would check 
        its static precondition. Then why should the instrumenter act differently? Well, actually 
        there is a difference: the instrumenter not only avoids the function call when the precondition 
        doesn't hold, but also claims that the function's value is nonsensical. From one point of view, 
        this behavior is incorrect, because the function could actually be called and a sensible value 
        obtained. From the other, since it was illegal to call the function this time, 
        it makes sense to declare the function's value nonsensical.
        To sum up, I think this is more a philosophical than a technical problem, but, in my view, 
        checking static preconditions is more appropriate.
        However, if we wanted to check the dynamic precondition, that's also possible. We can wrap the function's 
        precondition in a special function and redefine that special function whenever the original 
        function is redefined. This is possible because the instrumenter has access to the whole system.
            Ilinca: Well, this would definitely be the more... correct solution, but it's also a lot more 
            work. Couldn't this also be done by not wrapping the function, but simply calling it and adding 
            some exception handling code to the unfolder? I mean, the unfolder would try to call all the 
            argument-less functions of a class on an object and, if it gets a precondition violation, 
            catch the exception and declare that particular function nonsensical?
Nadia: I would like to return to the subject of checking "static" vs. "dynamic" preconditions 
before function calls in unfolding. I think this problem arises because the semantics 
of the special variable value "nonsensical" in Daikon is ambiguous.
In Java and C there are no preconditions, so this value is originally used in Daikon for one 
purpose only: if a pointer or reference is null, then the dereferenced value (and its attributes, 
if the value is not of a primitive type) is declared "nonsensical". In Eiffel we will use 
"nonsensical" for this purpose too: if a reference entity is void, then its queries will be output 
as "nonsensical".
However, in Eiffel we have one more source of problems with evaluating variables: if a variable is 
the result of a function call, the function's precondition may not hold, so we won't be able to 
evaluate its result. In this case the variable's value should also be declared "nonsensical", because 
this is the only special value in Daikon that denotes that a variable can't be evaluated. However, 
this raises a question: what does it mean that the variable can't be evaluated? Does 
it mean that something terrible will happen (e.g., an exception) if we try to evaluate it? Or does 
it mean that it isn't legal to evaluate this variable in this program state?
If we take the first view, then we should check the dynamic precondition, because if the static precondition 
doesn't hold but the dynamic one does, nothing terrible will happen during the function call.
If we take the second view, we should check the static precondition, because if it doesn't hold then 
it's illegal to call this function, even if the dynamic precondition holds. This view seems more 
correct to me, and here is an example that illustrates the situation.
Consider a class `INT_LIST' (with a feature `item' that has the precondition `not off') and its heir, 
class EXTRAPOLATED_INT_LIST (which weakens the precondition of `item'):
class INT_LIST
...
feature
   item: INTEGER is
      require
         not off
      ...
      end
end
class EXTRAPOLATED_INT_LIST
inherit
   INT_LIST
   redefine
      item
   end
...
feature
   item: INTEGER is
      require else True
      do
         if not off then
            Result := Precursor
         else
            Result := 0
         end
      end
end
Consider also class CLIENT, which is a client of INT_LIST:
class CLIENT
...
feature
   test (a_list: INT_LIST) is
      do ... end
end
Imagine that during the system execution from which contracts were inferred, the actual argument of `test' was 
always a direct instance of EXTRAPOLATED_INT_LIST, and in addition it was always passed to this 
procedure in a state where the cursor was before the first element (`off' was true). If we check 
dynamic preconditions for functions in unfolding, then the precondition inferred for `test' will 
be something like this:
require
  a_list /= Void
  a_list.off
  a_list.item = 0
But this contract isn't correct (in my view), although it is certainly true for that specific system 
execution. It becomes a little more correct if we infer:
require
  a_list /= Void
  a_list.off
  a_list.generator = "EXTRAPOLATED_INT_LIST"
  a_list.item = 0
However, such contracts (tied to the dynamic type of a variable) seem useless to me. One of the 
goals of dynamic contract inference is to eliminate those properties that happened by chance and 
are not intended properties of the code. Properties connected with dynamic types definitely 
happen by chance: they can't be intended properties, because if the programmer wanted his software to 
have such a property, he would change the entity's static type to a more specific one (based on the descendant).
Such properties may be useful only if we treat them not as contracts but rather as some kind of 
runtime statistics. A programmer may be interested in information that is true for a specific system 
execution, e.g. for debugging purposes. It might be useful to add such a feature (inference of dynamic 
type information and properties connected with dynamic types) in the future, but it isn't "classic" 
contract inference in my view, so I don't think we need it in the first version. And if we 
don't want to know anything about dynamic types, then we should check the static preconditions of 
functions during unfolding.

Checking hand-written contracts

   Ilinca: There's one further problem with this approach: when a function 
   is called as part of the evaluation of a contract, the function's own 
   contracts are not checked. The reason for this is to avoid infinite 
   loops. If you wrap your precondition in a function, the contracts of 
   any routines it calls will be evaluated, and thus the system can 
   actually behave differently.
   Should we check the hand-written contracts of a system for which CITADEL 
   is trying to infer contracts at all? We could also start from the assumption that Daikon 
   uses: the tool is given a set of passing test cases, and from those it has to infer 
   contracts. If the inferred contracts somehow contradict the hand-written ones, then 
   there's a problem in the system, and the source of the problem may be that the 
   hand-written contracts are faulty. What do you think?
   
        Nadia: I think it makes sense to turn assertion checking off when running the 
        instrumented system. This solves the problem with functions inside wrapped 
        preconditions that you've mentioned.
        In the current version the instrumented system is supposed to behave exactly the same way 
        as the original system did (except for creating a trace file). So, if a user was 
        confident in the original, hand-written contracts and wanted to check the system against 
        them, he/she could run the original system with assertion checking on, make sure that 
        everything is OK, and after that perform contract inference.
        On the other hand, returning to using functions in unfolding: for these calls we do have to 
        check hand-written preconditions, because they were introduced by the instrumenter, so they are 
        not part of the "passing test cases". Fortunately, as you said before, erroneous preconditions 
        are very rare.

Checking preconditions (wrapping vs. qualification)

Nadia: I recently started implementing the function precondition checks in unfolding and realized 
that my first idea (to wrap these preconditions in functions) has an important disadvantage. 
Without creating new functions for preconditions, instrumenting a class changes only the source 
code of that very class. However, if during class instrumentation we add functions to 
its clients, then the source code of those clients also changes. It isn't a disaster, of course, 
but it's an unpleasant property, in my view. It would be nice if class instrumentation affected 
only the class itself.
The other solution is not to wrap preconditions in functions, but to "qualify" them: if we want 
to check a precondition before calling a function `x.f', we take f's precondition and 
replace all unqualified calls in it with qualified calls with target `x', and all occurrences of 
`Current' with `x'. This replacement is not difficult conceptually, but its implementation is quite 
voluminous (during this operation we have to replace some identifiers in an expression with function 
calls, so we need to redefine all the `process_<something>' procedures of ET_AST_ITERATOR, where 
<something> can appear in an expression and can contain identifiers).
What do you think about these two solutions ("wrapping into functions" and "qualification")?
   Ilinca: I think the best way to go is to wrap the precondition in a function in the class that 
   it belongs to. Wrapping it in a function in clients would break encapsulation: you would only 
   be allowed to access the exported features. I don't see any advantages of qualification over 
   function-wrapping, so I think it makes sense to choose the least-work solution.
        Nadia: My previous explanation of the problem with "wrapping" was a little confused 
        (I wrote that functions have to be added to clients of the instrumented class, but actually 
        I meant quite the opposite: they should be added to its suppliers :) )
        Wrapping a precondition in a function in the class it belongs to is exactly what I 
        wanted to do at first. The reason was that, if we instrument a class `C' and it has, 
        say, an attribute `x: LIST [INTEGER]', then the unfolded form of the `Current' variable will include 
        a variable `x.first'. The precondition of the function `first' in class LIST is `not empty'. However, 
        we need to check this precondition not inside LIST, but inside the routines of class `C'. In fact, 
        the printing instruction for the variable `x.first' should be guarded by a condition like this:
        if x /= Void and then not x.empty then
           print (x.first)
        end
        We can't just take the precondition from `first' and insert it into the guarding condition as is; 
        instead we should "qualify" the call to `empty' with target `x'.
        At first qualification seemed too complex to me, so I decided to wrap `not empty' in a function, say 
        `first_precondition: BOOLEAN', and add it to class `LIST', so that inside `C' I could check a simpler 
        condition before printing x.first:
        if x /= Void and then x.first_precondition then
            print (x.first)
        end
        However, as you can see, this involves modifying class `LIST' (and possibly any other supplier of 
        `C') while instrumenting `C'. If only `C' changed, we could print only its source code after 
        instrumentation, we could compile only `C' (with incremental compilation) without recompiling the 
        base library, and we would have no problems with tracking added functions (problems such as: 
        if we instrument class `LIST' after `C', we probably want neither to instrument the added functions 
        nor to see them in the unfolded form of the `Current' variable of class `LIST'). In my view, wrapping adds 
        surplus dependencies to the system, and software never benefits from dependencies :)
        That is why I later leaned toward the "qualification" solution. Actually, I have already mostly implemented 
        it :) The current implementation might be incomplete: there are several syntactic constructs that it doesn't 
        handle, because I'm not sure they can appear in preconditions, but I'll check and fix that.
           Ilinca: Ok, so what I was thinking about was that you wanted to wrap the whole precondition of 
           `first' into a function in class C. So, in class C you would have a routine
           precondition_first (l: LIST[G]): BOOLEAN
                require
                     l_not_void: l /= Void
                do
                     Result := not l.empty
                end
           Then the guard for printing x.first would be
                if x /= Void and then precondition_first (x) then
                     print (x.first)
                end
            But of course, whether you wrap the precondition in a function or just quote it in the if 
            statement does not make a big difference.
One more question: should the tool allow instrumenting library classes? If a library is precompiled, 
then instrumenting any of its classes will result in recompiling the whole library, am I right? Are 
precompiled libraries recompiled during system finalization anyway, or not? (This matters for choosing 
between the "wrapping into functions" and "qualification" solutions, because "wrapping into functions" can 
change library classes even if the user wants to instrument only classes from his own clusters.)
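
For illustration, the "qualification" rewrite can be sketched as a token-level substitution in Python (a deliberately naive version; the real implementation rewrites a gobo AST via the `process_<something>' procedures of ET_AST_ITERATOR, and `known_queries' is a hypothetical parameter):

```python
# Sketch of "qualifying" a precondition expression: every unqualified query
# call and every `Current' gets the target prepended. Token-based toy version.

import re

def qualify(precondition, target, known_queries):
    """Rewrite unqualified calls in `precondition` as calls on `target`."""
    def replace(match):
        word = match.group(0)
        if word == "Current":
            return target
        if word in known_queries:
            return target + "." + word
        return word
    return re.sub(r"\b\w+\b", replace, precondition)

print(qualify("not empty", "x", {"empty"}))    # not x.empty
print(qualify("Current /= Void", "x", set()))  # x /= Void
```

A token-level rewrite like this would misfire on nested or already-qualified calls, which is exactly why the real implementation has to walk the AST.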

Support for loop invariants

Ilinca: Does the "classic" Daikon support loop invariants?
Nadia: Daikon uses standard program point 
 suffixes to infer more invariants. E.g., it understands that 
 some_routine:::ENTER and some_routine:::EXIT are the entry and exit points 
 of the same routine. Thanks to this, it knows the `old' values of variables 
 at exit and can infer invariants like "x = old x". As I understand it, if 
 Daikon sees an unknown suffix, it treats the corresponding program point as, 
 so to say, "standalone" and doesn't try to connect it with any other.
 We don't need any extra support from Daikon to handle loop invariants.
   Ilinca: You'll have to pay attention though to inserting the corresponding 
   program point(s) at the right location(s). A loop invariant has to hold after 
   initialization (the "from" clause) and be preserved by the loop body, i.e. it 
   must hold after every execution of the loop body, including when the exit 
   condition becomes true. So actually, I guess the program point should be in 
   the "until" clause, because it gets evaluated after initialization and then 
   after every execution of the loop body. What do you think?
        Nadia: The `until' clause is conceptually the right place, but instructions 
        can't be inserted there. So printing instructions can be inserted at 
        the end of the `from' clause and at the end of the `loop' clause. As I 
        understand it, all the variable values observed in the `until' condition 
        appear at one of those points.
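        The placement just described, one sample at the end of the `from' clause and one at the end of each `loop' body execution, can be mimicked in a Python sketch (the `loop_point' helper is hypothetical):

```python
# Sketch: a loop program point sampled after initialization and after every
# body iteration, i.e. at every moment the `until' condition is evaluated.

samples = []

def loop_point(**variables):
    samples.append(dict(variables))    # one trace sample per evaluation point

# Eiffel: from i := 0; total := 0 until i = 3 loop total := total + i; i := i + 1 end
i, total = 0, 0
loop_point(i=i, total=total)           # end of `from' clause
while not (i == 3):
    total += i
    i += 1
    loop_point(i=i, total=total)       # end of `loop' clause

print(len(samples))                    # 4: initialization + 3 iterations
# A candidate loop invariant that holds at every sample:
print(all(s["total"] == sum(range(s["i"])) for s in samples))   # True
```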
 
 Nadia: If we want to infer loop invariants, we should introduce a separate 
 program point for each loop, so we need a way to distinguish loops 
 within a routine. One way is to use a loop's position in the source code as an 
 index. However, I thought position is too unstable to serve as a loop 
 identifier, so I chose the instruction number instead. These numbers are 
 unique numbers of instructions within the surrounding routine; they may be 
 used not only for loops, but for arbitrary instructions. 
 CI_INSTRUCTION_COUNTER is responsible for counting instructions while 
 iterating through the AST.

Inferring contracts for generic classes

Nadia: there are actually two different cases.
 1. Contracts inside generic classes. This is not really a problem. 
 Entities that have a formal generic parameter type can be treated as if their 
 generator were the constraining class (ANY, if the genericity is unconstrained).
 
 2. Contracts in clients of generic classes. Here some problems arise. If 
 we have an entity `x: ARRAYED_LIST [INTEGER]', its unfolding involves 
 processing the `item' query. `item' is declared to be of type `G' in 
 ARRAYED_LIST and, as I understand it, parsing is not enough (some type 
 analysis is needed) to infer that `G' means `INTEGER' in the current 
 context. The two approaches to this problem that I see at present are: 
 perform such type analysis at compile time (with the help of gobo's 
 compilation facilities), or leave some unfolding work until runtime, 
 when reflection may be used to determine the variable's generating class.
     Ilinca: As far as I know gobo is able to do this kind of analysis. 
     I think it's preferable that all variables get unfolded statically 
     -- it would be a cleaner system design. Besides, doesn't Daikon need 
     the .decls files as input?
          Nadia: I also think that static unfolding is better.
          As for .decls files: if the batch version of Daikon is used, then Daikon 
          itself is run only after the instrumented system terminates. A .decls file 
          may thus be generated not only during instrumentation, but also during 
          the instrumented system's execution.
          If an online Daikon version is used, Daikon executes simultaneously 
          with the instrumented system. However, even in this case it is possible to 
          generate program point declarations at runtime. Actually, Daikon 
          allows declarations to be placed not only in a separate .decls file, but also 
          inside the .dtrace file, together with the program point samples. The only 
          requirement is that the declaration of a program point must appear before 
          any sample of that program point.
          In my opinion, a separate .decls file, generated during the instrumentation 
          phase, is the better design decision.
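          A program point declaration might be emitted along these lines (a sketch only: the four lines per variable follow Daikon's old-style declaration format as I understand it, and the exact layout should be checked against the Daikon manual):

```python
# Sketch: emit a Daikon program point declaration. Each variable contributes
# four lines: name, declared type, representation type, comparability.
# The exact format should be verified against the Daikon documentation.

def declare(ppt_name, variables):
    lines = ["DECLARE", ppt_name]
    for name, decl_type, rep_type, comparability in variables:
        lines += [name, decl_type, rep_type, str(comparability)]
    return "\n".join(lines) + "\n"

print(declare("STACK.push:::ENTER",
              [("Current", "STACK", "hashcode", 1),
               ("v", "INTEGER", "int", 2)]))
```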


Experiment

EXPERIMENTS

Experiment 1: Take c classes from Base and Gobo and run CITADEL on them first with a test suite of size n1, then with a test suite of size n2, then with a test suite of size n3 (where n1 < n2 < n3). ni represents the number of calls to routines of the class under test contained in the test suite.

  • c = 10 (or 20?)
  • n1 = #routines of the class under test
  • n2 = 2 * n1
  • n3 = 3 * n1

How to choose the c classes:

  • They should have various sizes (LOC, #routines, depth of inheritance tree).
  • They should have various semantics.
  • They should also contain routines missing preconditions/postconditions.
  • They should also contain routines using loops.
  • They should also contain contracts (either already given or intended) that can only be expressed using functions.


Observations:

  • We will manually create the n1 test cases, then add n2-n1 test cases which test other cases - so hopefully and informally also increase coverage - and then add n3-n2 test cases which also cover other cases.
  • We take test suite size as a measure of its quality. We will also measure coverage (manually) and then see if there's any relation between increased coverage (as the extended test suites will probably have) and better contract inference.
  • The test suite size as proposed initially might be too small, but we can increase it during the experiment if we see that we're not getting results that are relevant enough.
  • It would be interesting to compare the results of the experiment for library classes and classes from an ordinary application (open source, written by someone else), because manually written contracts for library classes should be of higher quality. We will run this experiment too if we have the time.


ASSUMPTIONS

Assumption 1: Daikon finds more assertions than programmers write. How to check: Look for cases in which the programmer did not write any assertion (for a routine pre- or postcondition, class invariant or loop invariant) but Daikon correctly inferred one and see how often this happens. Note: We expect this will be mostly true for loop invariants. Experiment: 1

Assumption 2: Daikon-inferred assertions are generally stronger than the user-supplied assertions expressible in Daikon's grammar, but as test suite size and coverage increases, they "converge" to user-supplied ones. How to check: Try to find inclusion relations between the sets of states allowed by Daikon-inferred and by user-written contracts. Note: It would be interesting to see how many of the user-written assertions are not expressible in Daikon's grammar. Experiment: 1
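
The inclusion check between the sets of allowed states can be approximated by sampling, as in this sketch (the predicates and states are invented for illustration; in the experiment the comparison is done by hand):

```python
# Sketch: approximate the inclusion relation between two contracts by
# sampling states. If every sampled state allowed by contract A is also
# allowed by contract B, then on this evidence A is at least as strong as B.

def at_least_as_strong(stronger, weaker, states):
    return all(weaker(s) for s in states if stronger(s))

daikon_pre = lambda s: s["x"] > 0            # hypothetical inferred assertion
user_pre   = lambda s: s["x"] >= 0           # hypothetical written assertion

states = [{"x": v} for v in range(-5, 6)]
print(at_least_as_strong(daikon_pre, user_pre, states))   # True: x > 0 implies x >= 0
print(at_least_as_strong(user_pre, daikon_pre, states))   # False: x = 0 is a counterexample
```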

Assumption 3: Daikon-inferred assertions are sometimes incompatible with programmer-written contracts. How to check: Try to find exclusion relations between the sets of states allowed by Daikon-inferred and by user-written contracts. Experiment: 1

Assumption 4: Most assertions that were inferred by Daikon follow one of a small number of patterns (e.g., equality, `/= Void', comparisons between printable variables), while most assertions that were not inferred contain one- or two-argument functions from the code. How to check: Take user-supplied assertions that were inferred by Daikon and those that were not, examine the nature of the assertions in both sets and try to find what is specific about each set. Measure the exact percentage of assertions in both sets that have the assumed property. Interpretation of the conclusion (if it proves correct): Daikon does all the routine work for the developer (nobody likes writing these endless `an_arg /= Void' in preconditions or `attr = an_arg' after setter routines), but it still leaves the creative part of the work to the human. It would also give some empirical grounds for the idea to include more functions in unfolding. Experiment: 1

Assumption 5: Comparing inferred assertions to user-supplied ones can be used to improve test suites (and, as a consequence, to find and fix bugs). How to check: Try extending the test suite until the inferred precondition is implied by the user-defined one. Try this on a few existing classes -- if we find more bugs in them with the extended test suite than with the original one, then we're done. If this doesn't work for existing classes, we will create some realistic examples of our own to illustrate the point.

Note: Test cases are created manually and preconditions are compared by a human. Experiment: 1 modified: start with a test suite of size n1. We then infer assertions with this small test suite and inspect them, picking out those that are likely to be test suite properties. Finally, we extend the existing test suite with cases that violate those properties. If we do not find more faults in the classes under test this way, we'll have to create our own examples. (This approach will most likely produce the most relevant inferred assertions; however, it can neither be used with existing test suites, nor can such test suites be easily constructed automatically.)
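The modified experiment is essentially a loop: infer, pick out suite-dependent properties, extend the suite with violating cases, repeat. A sketch of that loop; every callback here (`infer_assertions`, `looks_suite_dependent`, `make_violating_test`) is a hypothetical placeholder for a manual or tool-supported step, not a real Daikon or CITADEL API:

```python
def improve_suite(suite, infer_assertions, looks_suite_dependent,
                  make_violating_test, max_rounds=5):
    """Iteratively extend a test suite until no inferred assertion
    looks like a property of the test suite itself."""
    for _ in range(max_rounds):
        inferred = infer_assertions(suite)
        # Properties of the test suite rather than of the class under test.
        suspect = [a for a in inferred if looks_suite_dependent(a)]
        if not suspect:
            break
        # Extend the suite with cases that violate the suspect properties.
        suite = suite + [make_violating_test(a) for a in suspect]
    return suite

# Toy run: the only "assertion" is the upper bound max(suite); a bound
# below 5 is treated as suite-dependent; a violating test is bound + 1.
suite = improve_suite([1, 2],
                      infer_assertions=lambda s: [max(s)],
                      looks_suite_dependent=lambda bound: bound < 5,
                      make_violating_test=lambda bound: bound + 1)
print(suite)  # -> [1, 2, 3, 4, 5]
```

In the real experiment the inner steps are done by a human, as the note above says; the loop only makes the stopping condition explicit.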


GENERAL CONCLUSIONS:

  • Daikon does a better/worse job than programmers at coming up with contracts.
  • Daikon can/cannot be used to strengthen already existing contracts.
  • Daikon can/cannot be used to correct already existing contracts.
  • Comparing Daikon's output to user-written contracts allows/does not allow improving test suites.


( The following two assumptions are probably not verifiable in a reliable manner, because they rely on us having access to the "real" or "full" specification, which we can only approximate. I record them here for reference anyway. Assumption 6: Daikon does not infer fully specified contracts. How to check: Manually provide "full" specifications for the subject classes and check whether Daikon found all of them. Nadia:: As I said before, I'm afraid it won't be able to infer all assertions even from a regular user-supplied contract, not to mention a full specification...

Ilinca:: Well that's fine, it's also an interesting result. But as I mentioned above, I'm afraid we won't be able to make any reasonable estimation of what the "full" specification would be.

Assumption 7: Daikon more often determines contracts stronger than the real specification than it determines contracts weaker than it. How to check: Try to find relations between the sets of states allowed by the Daikon-inferred and by the "real" contracts. (Note that the "real" contracts are not the ones written by programmers, which might be buggy too, but the contracts implied by the intended specification of the software.) Nadia:: I think that we can rarely find any implication between real and inferred contracts. Informally speaking, inferred contracts are stronger, as they also state properties of the test suite, and at the same time they are weaker, as not all assertions can be expressed by Daikon...

Ilinca:: That's a good point, we probably won't get a black-or-white result anyway.

)


OBSERVATIONS:

  • Assumption 1 is purely quantitative; the rest are qualitative.
  • Each of the above assumptions depends on the following parameters:
      • the number of tests given to Daikon;
      • the quality of tests given to Daikon (traditional coverage criteria don't fit in this case [4, 5]).
  • The set of inferred assertions is itself a useful measure of test suite quality, which allows using it for test suite improvement. Even better, there is no dilemma whether to use assertion inference for test suite improvement OR for contract strengthening: it can be used for both simultaneously.
  • We might want to check each of the above assumptions separately for preconditions, postconditions and class invariants.



OTHER QUESTIONS:

  • How does the set of templates used by Daikon influence its results?
  • How can we improve Daikon's results? The set of expressible assertions depends mainly on two things: Daikon's grammar (i.e., the set of assertion templates) and the set of variables at a program point (those produced by the instrumenter plus those derived by Daikon). The Eiffel assertion grammar is rather simple; all its expressive power is "hidden" in the ability to use function calls. So, to make Daikon's output close to what Eiffel developers write by hand, we shouldn't extend its grammar (I think we could even simplify it a bit); instead we should make it use more functions from the code of the processed system (as you know, currently only zero-argument functions are used). The instrumenter could include in the unfolding not only zero-argument functions, but also functions with one or even two arguments (other variables in scope and some predefined constants could serve as actuals). Of course, this would significantly increase the number of variables at a program point, but then we can:
      • turn off variable derivation inside Daikon (Daikon doesn't know anything about the context; its variable derivation templates are too generic, so they are unlikely to be more relevant than function calls);
      • simplify Daikon's grammar. I guess we can eliminate a number of templates from Daikon's grammar and still get more relevant assertions inferred (that is, "using functions + fewer templates" is better than "not using functions + more templates"). At the same time, eliminating assertion templates saves considerable time: e.g., if we get rid of ternary invariants, the worst-case inference time drops from cubic to quadratic in the number of variables. Including functions with up to one argument already gives us much information, and up to two gives almost the whole picture :)
  • Are unit tests or system tests better for Daikon?
Note: Nimmer & Ernst state in [1] that system tests are much better for Daikon than unit tests, where they define unit tests as "tests that check specific boundary values of procedures in a single module in isolation". This explains why unit tests were not successful: Daikon observed not the typical behavior of a module, but rather a small set of special-case behaviors, while during system tests typical and valid behavior is observed, and there are many samples. If we consider unit tests in a wider sense, i.e., including not only a small number of boundary-value tests but also tests with input values distributed uniformly throughout a routine's domain, we'll get quite good results.
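The trade-off sketched above (unfolding one- and two-argument functions multiplies the variables at a program point, while dropping ternary templates cuts the checking cost from cubic to quadratic in that number) can be made concrete with some back-of-the-envelope arithmetic. All figures below are illustrative assumptions, not measurements:

```python
from math import comb

def unfolded_variables(base_vars, f0, f1, f2, actuals):
    """Rough count of variables after including function results:
    f0/f1/f2 = number of 0-/1-/2-argument functions in scope,
    actuals = candidate actual arguments (variables plus constants)."""
    return base_vars + f0 + f1 * actuals + f2 * actuals ** 2

def candidate_tuples(n_vars, max_arity):
    """Number of variable tuples the templates are instantiated over."""
    return sum(comb(n_vars, k) for k in range(1, max_arity + 1))

n = unfolded_variables(base_vars=10, f0=5, f1=3, f2=2, actuals=12)
print(n)                       # 10 + 5 + 36 + 288 = 339
print(candidate_tuples(n, 3))  # with ternary templates: all triples
print(candidate_tuples(n, 2))  # without them: pairs only, far fewer
```

Even with these modest illustrative numbers, two-argument unfolding dominates the variable count, which is why dropping the cubic ternary templates matters.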



FACTS from already performed evaluations of Daikon:

  • Daikon seems to produce, even from relatively small test suites, assertions that are consistent and sufficient for automatic verification (proving the absence of runtime errors) with very little change. [1]
  • Daikon does not scale, because:
      • it writes traces to disk, hence it can easily run out of storage space [1]. (Note: Even if we have a large system and a number of large system tests that execute all parts of the system, nobody makes us instrument the whole system at once! We can instrument it class by class and each time get a small trace. Inferred assertions for different classes are independent.)
      • invariant detection time is cubic in the number of variables in scope at a program point, linear in the number of times a program point is executed, and linear in the number of instrumented program points [2]. The number of variables over which invariants are checked is the most important factor.
  • Randomly generated tests are not good for inferring assertions [2].
  • Users' experience with Daikon [3, chapter 6]:
      • Using Daikon does not speed up or slow down users trying to annotate programs with assertions, but it improves other factors:
          • recall (how many of the assertions written by users are correct);
          • bonus (ratio of verifiable assertions to the minimal set).
      • Half of the users considered Daikon to be helpful. They used the generated assertions as support for finding others.
      • A third of the users complained about the textual size of the generated assertions.
      • More than half the users found removing incorrect assertions easy.
  • Existing code coverage criteria (branch coverage, definition-use pair coverage) don't provide test suites that are good enough for invariant detection (they produce many false, test-suite-dependent invariants). However, test suites that satisfy these traditional criteria produce more relevant assertions than random ones [4, 5].


Bibliography

  • [1] Nimmer, Ernst, "Automatic Generation of Program Specifications"
  • [2] Nimmer, Ernst, "Invariant Inference for Static Checking: An Empirical Evaluation"
  • [3] Ernst, PhD thesis
  • [4] Gupta, "Generating test data for dynamically discovering likely program invariants"
  • [5] Gupta, Heidepriem, "A new structural coverage criterion for dynamic detection of program invariants"
  • [6] Perkins, Ernst, "Efficient Incremental Algorithms for Dynamic Detection of Likely Invariants"