Difference between revisions of "Melting Ice Technology"
m (→Compiled versus interpreted) |
Peter gummer (Talk | contribs) m (→Compiled versus interpreted) |
||
(23 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
− | [[Category: | + | [[Category:Compiler]] |
− | {{ | + | {{UnderConstruction}} |
− | + | ====Introduction==== | |
There are two main criteria for any good compiler. Both the compilation and the compiled program need to be fast. There is a trade off, the more time a compiler takes to make optimizations, the faster the compiled program is and the slower the compilation process becomes. | There are two main criteria for any good compiler. Both the compilation and the compiled program need to be fast. There is a trade off, the more time a compiler takes to make optimizations, the faster the compiled program is and the slower the compilation process becomes. | ||
− | Fortunately the need for fast compilation and fast compiled programs occurs at different times during the development cycle. When the developer is incrementally writing and testing a program he needs short compilation time to be productive. At the end, when his work is thoroughly tested and ready to ship there can a slower compilation that generates a very fast delivery. | + | Fortunately the need for fast compilation and fast compiled programs occurs at different times during the development cycle. When the developer is incrementally writing and testing a program he needs short compilation time to be productive. At the end, when his work is thoroughly tested and ready to ship there can be a slower compilation that generates a very fast delivery. |
− | EiffelStudio exploits this by providing two basic compilation modes: workbench and finalized mode. | + | EiffelStudio's compiler, known from here on in as "Eiffel compiler", exploits this by providing two basic compilation modes: workbench and finalized mode. |
The C code generated by the compiler looks different for workbench and finalized mode. The terms workbench code and finalized code are used to refer to the corresponding generated C code. | The C code generated by the compiler looks different for workbench and finalized mode. The terms workbench code and finalized code are used to refer to the corresponding generated C code. | ||
Line 14: | Line 14: | ||
* It supports precompiles. | * It supports precompiles. | ||
− | Finalized code has only two advantages, | + | Finalized code has only two advantages, footprint and speed. A small footprint contributes to speed due to better cache efficiency. Finalized code can only be debugged at the C level. This article focuses on the workbench mode of the Eiffel compiler. |
====Compiled versus interpreted==== | ====Compiled versus interpreted==== | ||
− | + | The Eiffel compiler was designed with the following principle in mind: when a programmer makes a small change he expects a short recompilation time, and when he makes a big change he will accept some waiting time. | |
Traditionally, the fastest programming environments (in terms of recompilation time) were interpreted languages (like VisualBasic). Unfortunately interpreted languages are not competitive performance wise. | Traditionally, the fastest programming environments (in terms of recompilation time) were interpreted languages (like VisualBasic). Unfortunately interpreted languages are not competitive performance wise. | ||
− | The designers of | + | The designers of the Eiffel compiler circumnavigated the trade off by taking the best of both approaches. Thus, in workbench mode the Eiffel compiler is half compiler and half interpreter. The technology behind this is called ''melting ice''. |
+ | |||
+ | An Eiffel system compiled in workbench mode consists of both frozen and melted code. Frozen code is code that is translated to C code. Melted code is not yet translated to C code but to a form of byte code that has to be interpreted. This byte code is called Eiffel byte code (EBC). | ||
+ | |||
+ | This yields two requirements to the Eiffel runtime. It needs both to be able to execute the EBC and to handle calls from frozen code into melted code and vice versa. | ||
+ | |||
+ | ====The interpreter and the byte code==== | ||
+ | The EBC interpreter is a stack machine. The byte code itself is not very different from .NET byte code or Java Byte Code. | ||
+ | |||
+ | This article won't explain the byte codes. This is partially done in [[Byte Code]]. To give a general idea the Fibonacci feature and its EBC are shown: | ||
+ | |||
+ | {|border="0" cellpadding="2" cellspacing="0" align="center" | ||
+ | |-valign="top" -halign="center" | ||
+ | | | ||
+ | <code>[eiffel,N]fibonacci (n: INTEGER): INTEGER | ||
+ | require | ||
+ | n >= 0 | ||
+ | do | ||
+ | if n = 0 then | ||
+ | Result := 0 | ||
+ | elseif n = 1 then | ||
+ | Result := 1 | ||
+ | else | ||
+ | Result := | ||
+ | fibonacci (n - 1) + | ||
+ | fibonacci (n - 2) | ||
+ | end | ||
+ | end</code> | ||
+ | | | ||
+ | | | ||
+ | <code> 1: BC_START | ||
+ | Routine Id : 186 | ||
+ | Body Id : 185 | ||
+ | Result Type : [INTEGER_32] | ||
+ | Nr. args : 1 | ||
+ | Nr. locals : 0 | ||
+ | 18: BC_NO_CLONE_ARG | ||
+ | Routine name : fibonacci | ||
+ | Written : 9 | ||
+ | 32: BC_PRECOND offset 23 | ||
+ | 37: BC_HOOK 1 | ||
+ | 42: BC_ASSERT <8, 16> | ||
+ | 45: BC_ARG 1 | ||
+ | 48: BC_INT32 0 | ||
+ | 53: BC_GE | ||
+ | 54: BC_END_PRE offset 0 | ||
+ | 59: BC_RAISE_PREC | ||
+ | 60: BC_HOOK 2 | ||
+ | 65: BC_ARG 1 | ||
+ | 68: BC_INT32 0 | ||
+ | 73: BC_EQ | ||
+ | 74: BC_JMP_F 16 | ||
+ | 79: BC_HOOK 3 | ||
+ | 84: BC_INT32 0 | ||
+ | 89: BC_RASSIGN | ||
+ | 90: BC_JMP 80</code> | ||
+ | | | ||
+ | | | ||
+ | <code> | ||
+ | 95: BC_HOOK 4 | ||
+ | 100: BC_ARG 1 | ||
+ | 103: BC_INT32 1 | ||
+ | 108: BC_EQ | ||
+ | 109: BC_JMP_F 16 | ||
+ | 114: BC_HOOK 5 | ||
+ | 119: BC_INT32 1 | ||
+ | 124: BC_RASSIGN | ||
+ | 125: BC_JMP 45 | ||
+ | 130: BC_HOOK 6 | ||
+ | 135: BC_ARG 1 | ||
+ | 138: BC_INT32 1 | ||
+ | 143: BC_MINUS | ||
+ | 144: BC_CURRENT | ||
+ | 145: BC_FEATURE fid 36 [9: HELLO_WORLD] | ||
+ | 154: BC_ARG 1 | ||
+ | 157: BC_INT32 2 | ||
+ | 162: BC_MINUS | ||
+ | 163: BC_CURRENT | ||
+ | 164: BC_FEATURE fid 36 [9: HELLO_WORLD] | ||
+ | 173: BC_PLUS | ||
+ | 174: BC_RASSIGN | ||
+ | 175: BC_HOOK 7 | ||
+ | 180: BC_NULL </code> | ||
+ | |} | ||
+ | |||
+ | The byte code is not that difficult to understand. Most instructions have an effect on the interpreter stack. Instruction 124 for example pops one value from the stack and saves it to the result of the current feature. | ||
+ | |||
+ | ====Patterns to call the other side==== | ||
+ | Every feature call in workbench mode is a possible transition out of or into interpreted code. This transitions are handled by so called patterns (term comes from the idea of calling patterns). | ||
+ | |||
+ | Patterns that go out of interpreted code have the prefix ''toc'' (to C code) and the ones that go into interpreted code ''toi'' (to interpreted code). | ||
+ | |||
+ | The following snippet shows the ''toi'' pattern that could be used for the Fibonacci feature (or any other feature that expects an INTEGER_32 as argument and returns an INTEGER_32): | ||
+ | |||
+ | {|border="0" cellpadding="2" cellspacing="0" align="center" | ||
+ | |-valign="top" -halign="center" | ||
+ | | | ||
+ | <code>[c,N]static EIF_INTEGER_32 toi49 (EIF_REFERENCE Current, EIF_INTEGER_32 arg1) { | ||
+ | GTCX | ||
+ | struct item *it; | ||
+ | it = iget(); // Push arg1 on the interpreters stack | ||
+ | it->type = SK_INT32; | ||
+ | it->it_int32 = arg1; | ||
+ | it = iget(); // Push Current on the interpreters stack | ||
+ | it->type = SK_REF; | ||
+ | it->it_ref = Current; | ||
+ | xinterp(IC); // Call the interpreter | ||
+ | it = opop(); // Pop the result from the interpreters stack | ||
+ | return it->it_int32; // Return the result in the C way | ||
+ | }</code> | ||
+ | | | ||
+ | |} | ||
+ | |||
+ | Arguments that were passed to the C function are pushed onto the interpreters stack. After that the interpreter is called and finally the result is popped from the interpreter stack and returned to the caller. | ||
+ | |||
+ | What follows is the pattern that goes out of interpreted code: | ||
+ | |||
+ | {|border="0" cellpadding="2" cellspacing="0" align="center" | ||
+ | |-valign="top" -halign="center" | ||
+ | | | ||
+ | <code>[c,N]static void toc49 (fnptr ptr) { //The patterns expects a pointer to the C function | ||
+ | GTCX | ||
+ | EIF_REFERENCE Current; | ||
+ | EIF_INTEGER_32 result; | ||
+ | struct item *it; | ||
+ | EIF_INTEGER_32 arg1; | ||
+ | Current = opop()->it_ref; //The current is popped from the interpreter stack | ||
+ | arg1 = opop()->it_int32; //arg1 is popped from the interpreter stack | ||
+ | //The function is called with the arguments that are popped from the stack. | ||
+ | result = (FUNCTION_CAST(EIF_INTEGER_32, (EIF_REFERENCE, EIF_INTEGER_32)) ptr)(Current,arg1); | ||
+ | it = iget(); //The result of the function is pushed onto the interpreters stack | ||
+ | it->type = SK_INT32; | ||
+ | it->it_int32 = result; | ||
+ | }</code> | ||
+ | | | ||
+ | |} | ||
+ | |||
+ | The arguments for the feature call are popped from the interpreters stack. Then the C function for the feature is called before the result is pushed back onto the interpreters stack. | ||
− | + | This two patterns need to be available not for every feature in the system but for every signature type. They are generated into the file epattern.c. | |
− | + | ====Locating the right pattern==== | |
+ | In workbench mode every piece of translated code is referenced by its real_body_id. In the runtime the array mpatidtab serves as a function between real_body_id and pattern_id. There is a second array called pattern that makes the mapping between pattern_id and the two function pointers to the ''toi'' and ''toc'' patterns. | ||
− | + | So the right pattern ''toi'' could be resolved like: | |
− | + | ||
− | + | {|border="0" cellpadding="2" cellspacing="0" align="center" | |
+ | |-valign="top" -halign="center" | ||
+ | | | ||
+ | <code>[c,N] | ||
+ | pattern [mpatidtab [real_body_id]].toi | ||
+ | </code> | ||
+ | | | ||
+ | |} |
Latest revision as of 17:25, 14 June 2007
Contents
Introduction
There are two main criteria for any good compiler. Both the compilation and the compiled program need to be fast. There is a trade off, the more time a compiler takes to make optimizations, the faster the compiled program is and the slower the compilation process becomes.
Fortunately the need for fast compilation and fast compiled programs occurs at different times during the development cycle. When the developer is incrementally writing and testing a program he needs short compilation time to be productive. At the end, when his work is thoroughly tested and ready to ship there can be a slower compilation that generates a very fast delivery.
EiffelStudio's compiler, known from here on in as "Eiffel compiler", exploits this by providing two basic compilation modes: workbench and finalized mode. The C code generated by the compiler looks different for workbench and finalized mode. The terms workbench code and finalized code are used to refer to the corresponding generated C code.
Workbench code has the following properties:
- It is easily testable (by debugging).
- It compiles and recompiles very fast (due to melting ice technology).
- It supports precompiles.
Finalized code has only two advantages, footprint and speed. A small footprint contributes to speed due to better cache efficiency. Finalized code can only be debugged at the C level. This article focuses on the workbench mode of the Eiffel compiler.
Compiled versus interpreted
The Eiffel compiler was designed with the following principle in mind: when a programmer makes a small change he expects a short recompilation time, and when he makes a big change he will accept some waiting time.
Traditionally, the fastest programming environments (in terms of recompilation time) were interpreted languages (like VisualBasic). Unfortunately interpreted languages are not competitive performance wise.
The designers of the Eiffel compiler circumnavigated the trade off by taking the best of both approaches. Thus, in workbench mode the Eiffel compiler is half compiler and half interpreter. The technology behind this is called melting ice.
An Eiffel system compiled in workbench mode consists of both frozen and melted code. Frozen code is code that is translated to C code. Melted code is not yet translated to C code but to a form of byte code that has to be interpreted. This byte code is called Eiffel byte code (EBC).
This yields two requirements to the Eiffel runtime. It needs both to be able to execute the EBC and to handle calls from frozen code into melted code and vice versa.
The interpreter and the byte code
The EBC interpreter is a stack machine. The byte code itself is not very different from .NET byte code or Java Byte Code.
This article won't explain the byte codes. This is partially done in Byte Code. To give a general idea the Fibonacci feature and its EBC are shown:
fibonacci (n: INTEGER): INTEGER require n >= 0 do if n = 0 then Result := 0 elseif n = 1 then Result := 1 else Result := fibonacci (n - 1) + fibonacci (n - 2) end end |
1: BC_START Routine Id : 186 Body Id : 185 Result Type : [INTEGER_32] Nr. args : 1 Nr. locals : 0 18: BC_NO_CLONE_ARG Routine name : fibonacci Written : 9 32: BC_PRECOND offset 23 37: BC_HOOK 1 42: BC_ASSERT <8, 16> 45: BC_ARG 1 48: BC_INT32 0 53: BC_GE 54: BC_END_PRE offset 0 59: BC_RAISE_PREC 60: BC_HOOK 2 65: BC_ARG 1 68: BC_INT32 0 73: BC_EQ 74: BC_JMP_F 16 79: BC_HOOK 3 84: BC_INT32 0 89: BC_RASSIGN 90: BC_JMP 80 |
95: BC_HOOK 4 100: BC_ARG 1 103: BC_INT32 1 108: BC_EQ 109: BC_JMP_F 16 114: BC_HOOK 5 119: BC_INT32 1 124: BC_RASSIGN 125: BC_JMP 45 130: BC_HOOK 6 135: BC_ARG 1 138: BC_INT32 1 143: BC_MINUS 144: BC_CURRENT 145: BC_FEATURE fid 36 [9: HELLO_WORLD] 154: BC_ARG 1 157: BC_INT32 2 162: BC_MINUS 163: BC_CURRENT 164: BC_FEATURE fid 36 [9: HELLO_WORLD] 173: BC_PLUS 174: BC_RASSIGN 175: BC_HOOK 7 180: BC_NULL |
The byte code is not that difficult to understand. Most instructions have an effect on the interpreter stack. Instruction 124 for example pops one value from the stack and saves it to the result of the current feature.
Patterns to call the other side
Every feature call in workbench mode is a possible transition out of or into interpreted code. This transitions are handled by so called patterns (term comes from the idea of calling patterns).
Patterns that go out of interpreted code have the prefix toc (to C code) and the ones that go into interpreted code toi (to interpreted code).
The following snippet shows the toi pattern that could be used for the Fibonacci feature (or any other feature that expects an INTEGER_32 as argument and returns an INTEGER_32):
static EIF_INTEGER_32 toi49 (EIF_REFERENCE Current, EIF_INTEGER_32 arg1) { GTCX struct item *it; it = iget(); // Push arg1 on the interpreters stack it->type = SK_INT32; it->it_int32 = arg1; it = iget(); // Push Current on the interpreters stack it->type = SK_REF; it->it_ref = Current; xinterp(IC); // Call the interpreter it = opop(); // Pop the result from the interpreters stack return it->it_int32; // Return the result in the C way } |
Arguments that were passed to the C function are pushed onto the interpreters stack. After that the interpreter is called and finally the result is popped from the interpreter stack and returned to the caller.
What follows is the pattern that goes out of interpreted code:
static void toc49 (fnptr ptr) { //The patterns expects a pointer to the C function GTCX EIF_REFERENCE Current; EIF_INTEGER_32 result; struct item *it; EIF_INTEGER_32 arg1; Current = opop()->it_ref; //The current is popped from the interpreter stack arg1 = opop()->it_int32; //arg1 is popped from the interpreter stack //The function is called with the arguments that are popped from the stack. result = (FUNCTION_CAST(EIF_INTEGER_32, (EIF_REFERENCE, EIF_INTEGER_32)) ptr)(Current,arg1); it = iget(); //The result of the function is pushed onto the interpreters stack it->type = SK_INT32; it->it_int32 = result; } |
The arguments for the feature call are popped from the interpreters stack. Then the C function for the feature is called before the result is pushed back onto the interpreters stack.
This two patterns need to be available not for every feature in the system but for every signature type. They are generated into the file epattern.c.
Locating the right pattern
In workbench mode every piece of translated code is referenced by its real_body_id. In the runtime the array mpatidtab serves as a function between real_body_id and pattern_id. There is a second array called pattern that makes the mapping between pattern_id and the two function pointers to the toi and toc patterns.
So the right pattern toi could be resolved like:
pattern [mpatidtab [real_body_id]].toi |