Difference between revisions of "Interfacing the Garbage Collector"

m (The loc_stack)
m
 
(18 intermediate revisions by 3 users not shown)
Line 1: Line 1:
[[Category:Compiler]]
+
[[Category:Code Generation]]
{{Warning|'''Warning''': Article under development}}
+
{{UnderConstruction}}
  
 
This article tries to explain, how the EiffelStudio GC works together with the runtime. It should help the programmer to better understand the C and byte code generated by EiffelStudio.
 
This article tries to explain, how the EiffelStudio GC works together with the runtime. It should help the programmer to better understand the C and byte code generated by EiffelStudio.
 
The exact mechanisms behind the GC are not of concern here.
 
The exact mechanisms behind the GC are not of concern here.
  
===Introduction===
+
====Introduction====
 
Most important thing first, EiffelStudio has a mark and sweep moving Garbage Collector (GC) (instead of moving GC one may say compacting GC).  
 
Most important thing first, EiffelStudio has a mark and sweep moving Garbage Collector (GC) (instead of moving GC one may say compacting GC).  
  
A mark and sweep GC traverses during the mark phase all the objects, that are reachable from a given root set (locals, arguments uws...). After that, the not reached objects are detected as garbage and may be freed. To avoid memory fragmentation the GC may move around the reached objects.  
+
A mark and sweep GC traverses during the mark phase all the objects, that are reachable from a given root set (locals, arguments etc...). After that, the not reached objects are detected as garbage and may be freed. To avoid memory fragmentation the GC may move around the reached objects.  
  
Thats about all there is to know for interfacing the GC, for modifying or even extending him a little more in-deft might be handy but is not provided here.
+
The ES GC actually moves objects during the marking phase and thus doesn't need a special sweep traversal.
 +
 
 +
Thats about all there is to know for interfacing the GC, for modifying or even extending it a little more in-deft might be handy but is not provided here.
  
 
To recap, the GC needs to know the current root set and enough information to traverse the objects currently on the heap. The runtime and the generated code on the other side need to be able to cope with objects that change their memory location after a GC run.
 
To recap, the GC needs to know the current root set and enough information to traverse the objects currently on the heap. The runtime and the generated code on the other side need to be able to cope with objects that change their memory location after a GC run.
  
===The root set===
+
====The root set====
The root set of EiffelStudio is surprisingly complex. Only the more interesting part is shown here.  
+
The root set of EiffelStudio is surprisingly complex. Only the more interesting part is shown here. It is consisting of several stacks that will be presented here.
  
 
====The loc_set====
 
====The loc_set====
Line 26: Line 28:
 
|-valign="top" -halign="center"
 
|-valign="top" -halign="center"
 
|
 
|
<code>[eiffel,N]foo (s: STRING; i: INTEGER): LIST [ANY]
+
<eiffel>
 +
foo (s: STRING; i: INTEGER): LIST [ANY]
 
   local
 
   local
 
       b: BOOLEAN
 
       b: BOOLEAN
Line 33: Line 36:
 
       ...
 
       ...
 
   end
 
   end
</code>
+
</eiffel>
 
|
 
|
 
|}
 
|}
Line 44: Line 47:
 
|-valign="top" -halign="center"
 
|-valign="top" -halign="center"
 
|
 
|
<code>[c,N]/* foo */
+
<c>
 +
/* foo */
 
EIF_REFERENCE Fadkw7f (EIF_REFERENCE Current,  
 
EIF_REFERENCE Fadkw7f (EIF_REFERENCE Current,  
 
                       EIF_REFERENCE arg1,  
 
                       EIF_REFERENCE arg1,  
Line 63: Line 67:
 
   return Result;
 
   return Result;
 
}
 
}
</code>
+
</c>
 
|
 
|
 
|}
 
|}
  
Note, that the macro RTLR pushes the address of its parameter on loc_set and not the parameter itself. This makes it possible for the GC to rewrite the protected references when he moves the objects.
+
Note, that the macro RTLR pushes the address of its parameter on loc_set and not the parameter itself. This makes it possible for the GC to rewrite the protected references when it moves the objects.
  
 
====The loc_stack====
 
====The loc_stack====
Line 75: Line 79:
 
*RT_GC_PROTECT
 
*RT_GC_PROTECT
 
*RT_GC_WEAN
 
*RT_GC_WEAN
RT_GC_PROTECT pushes the pointer of a reference onto the stack and RT_GC_WEN pops the top entry of the stack.
+
RT_GC_PROTECT pushes the pointer of a reference onto the stack and RT_GC_WEAN pops the top entry from the stack.
 +
 
 +
This stack is managed by the runtime itself and not by generated code (as opposed to the loc_set).
 +
 
 +
====The op_stack====
 +
The op_stack is the stack the interpreter (remember that this is a stack machine) works with when it executes byte code (melted code). Objects referenced by entries of the op_stack should thus better not be collected. They may be moved around though as long as the references on the stack are modified accordingly.
 +
 
 +
The op_stack not only contains references but also primitive types. Since the elements are typed (they have the type EIF_TYPED_ELEMENT) this is not a problem since every entry has knowledge about its type.
 +
 
 +
The elements that on the op_stack that have a reference type are part of the root set.
 +
 
 +
In finalized mode there is no op_stack.

Latest revision as of 13:25, 28 January 2008

Construction.png Not Ready for Review: This Page is Under Development!

This article tries to explain, how the EiffelStudio GC works together with the runtime. It should help the programmer to better understand the C and byte code generated by EiffelStudio. The exact mechanisms behind the GC are not of concern here.

Introduction

Most important thing first, EiffelStudio has a mark and sweep moving Garbage Collector (GC) (instead of moving GC one may say compacting GC).

A mark and sweep GC traverses during the mark phase all the objects, that are reachable from a given root set (locals, arguments etc...). After that, the not reached objects are detected as garbage and may be freed. To avoid memory fragmentation the GC may move around the reached objects.

The ES GC actually moves objects during the marking phase and thus doesn't need a special sweep traversal.

Thats about all there is to know for interfacing the GC, for modifying or even extending it a little more in-deft might be handy but is not provided here.

To recap, the GC needs to know the current root set and enough information to traverse the objects currently on the heap. The runtime and the generated code on the other side need to be able to cope with objects that change their memory location after a GC run.

The root set

The root set of EiffelStudio is surprisingly complex. Only the more interesting part is shown here. It is consisting of several stacks that will be presented here.

The loc_set

The loc_set is a stack that contains all the arguments and locals of a reference type of the currently active features. Active features, are features that are currently being executed. Only the references of frozen code are on this stack, for melted code the stack is not touched!

The loc_set is a subset of the root set.

The following two snippets shows the feature foo:

foo (s: STRING; i: INTEGER): LIST [ANY]
   local
      b: BOOLEAN
      a: ANY
   do
      ...
   end

This feature has two arguments, two locals and a return value. The ones that are of a reference type, namely s, a and the return value, need to be protected during the execution of feature foo. They are pushed on the loc_set stack and thus become part of the root set as long as the frame is active.

The following snippet shows the part of the workbench translation of foo that handles the stack management of loc_set::

/* foo */
EIF_REFERENCE Fadkw7f (EIF_REFERENCE Current, 
                       EIF_REFERENCE arg1, 
                       EIF_INTEGER_32 arg2) 
{
   EIF_BOOLEAN loc1 = (EIF_BOOLEAN) 0;
   EIF_REFERENCE loc2 = (EIF_REFERENCE) 0;
   EIF_REFERENCE Result = (EIF_REFERENCE) 0;
 
   RTLI(4);                //Reserve space for 4 references on the loc_stack
   RTLR(0,arg1);           //Push the address of s onto loc_stack
   RTLR(1,Current);        //Push the address of Current onto loc_stack
   RTLR(2,loc2);           //Push the address of a onto loc_stack
   RTLR(3,Result);         //Push the address of the Result onto loc_stack
   ...
 
   RTLE;                   //Pop the 4 entries from the loc_stack.
   return Result;
}

Note, that the macro RTLR pushes the address of its parameter on loc_set and not the parameter itself. This makes it possible for the GC to rewrite the protected references when it moves the objects.

The loc_stack

The loc_stack contains pointers to references used by the runtime that need to be protected. The entries of this stack also build a subset of the root set.

This stack is managed with the macros:

  • RT_GC_PROTECT
  • RT_GC_WEAN

RT_GC_PROTECT pushes the pointer of a reference onto the stack and RT_GC_WEAN pops the top entry from the stack.

This stack is managed by the runtime itself and not by generated code (as opposed to the loc_set).

The op_stack

The op_stack is the stack the interpreter (remember that this is a stack machine) works with when it executes byte code (melted code). Objects referenced by entries of the op_stack should thus better not be collected. They may be moved around though as long as the references on the stack are modified accordingly.

The op_stack not only contains references but also primitive types. Since the elements are typed (they have the type EIF_TYPED_ELEMENT) this is not a problem since every entry has knowledge about its type.

The elements that on the op_stack that have a reference type are part of the root set.

In finalized mode there is no op_stack.