Interfacing the Garbage Collector
Warning: Warning: Article under development
This article tries to explain, how the EiffelStudio GC works together with the runtime. It should help the programmer to better understand the C and byte code generated by EiffelStudio. The exact mechanisms behind the GC are not of concern here.
Introduction
Most important thing first, EiffelStudio has a mark and sweep moving Garbage Collector (GC) (instead of moving GC one may say compacting GC).
A mark and sweep GC traverses during the mark phase all the objects, that are reachable from a given root set (locals, arguments uws...). After that, the not reached objects are detected as garbage and may be freed. To avoid memory fragmentation the GC may move around the reached objects.
Thats about all there is to know for interfacing the GC, for modifying or even extending him a little more in-deft might be handy but is not provided here.
To recap, the GC needs to know the current root set and enough information to traverse the objects currently on the heap. The runtime and the generated code on the other side need to be able to cope with objects that change their memory location after a GC run.
The root set
The root set of EiffelStudio is surprisingly complex. Only the more interesting part is shown here.
The loc_set
The loc_set is a stack that contains all the arguments and locals of a reference type of the currently active features. Active features, are features that are currently being executed. Only the references of frozen code are on this stack, for melted code the stack is not touched!
The loc_set is a subset of the root set.
The following two snippets shows the feature foo:
foo (s: STRING; i: INTEGER): LIST [ANY] local b: BOOLEAN a: ANY do ... end |
This feature has two arguments, two locals and a return value. The ones that are of a reference type, namely s, a and the return value, need to be protected during the execution of feature foo. They are pushed on the loc_set stack and thus become part of the root set as long as the frame is active.
The following snippet shows the part of the workbench translation of foo that handles the stack management of loc_set::
/* foo */ EIF_REFERENCE Fadkw7f (EIF_REFERENCE Current, EIF_REFERENCE arg1, EIF_INTEGER_32 arg2) { EIF_BOOLEAN loc1 = (EIF_BOOLEAN) 0; EIF_REFERENCE loc2 = (EIF_REFERENCE) 0; EIF_REFERENCE Result = (EIF_REFERENCE) 0; RTLI(4); //Reserve space for 4 references on the loc_stack RTLR(0,arg1); //Push the address of s onto loc_stack RTLR(1,Current); //Push the address of Current onto loc_stack RTLR(2,loc2); //Push the address of a onto loc_stack RTLR(3,Result); //Push the address of the Result onto loc_stack ... RTLE; //Pop the 4 entries from the loc_stack. return Result; } |
Note, that the macro RTLR pushes the address of its parameter on loc_set and not the parameter itself. This makes it possible for the GC to rewrite the protected references when he moves the objects.
The loc_stack
The loc_stack contains pointers to references used by the runtime that need to be protected. The entries of this stack also build a subset of the root set.
This stack is managed with the macros:
- RT_GC_PROTECT
- RT_GC_WEAN
RT_GC_PROTECT pushes the pointer of a reference onto the stack and RT_GC_WEN pops the top entry from the stack.
This stack is managed by the runtime itself and not by generated code (as opposed to the loc_set).
The op_stack
The op_stack is the stack the interpreter (remember that this is a stack machine) works with when it executes byte code (melted code). Objects referenced by entries of the op_stack should thus better not be collected. They may be moved around though as long as the references on the stack are modified accordingly.
The op_stack not only contains references but also primitive types. Since the elements are typed (they have the type EIF_TYPED_ELEMENT) this is not a problem since every entry has knowledge about its type.