Difference between revisions of "Code Generation Optimization Ideas"
m |
m (→Loop specialization: Cosmetics) |
||
(8 intermediate revisions by 2 users not shown) | |||
Line 1: | Line 1: | ||
[[Category:Code Generation]] | [[Category:Code Generation]] | ||
+ | [[Category:Optimization]] | ||
Various ideas about potential improvements to code generation. Once an idea is implemented and is showing a real improvement it should be removed from this page and a new page should describe the mechanism in details with the available results. | Various ideas about potential improvements to code generation. Once an idea is implemented and is showing a real improvement it should be removed from this page and a new page should describe the mechanism in details with the available results. | ||
Line 26: | Line 27: | ||
</e> | </e> | ||
If the call to <e>h</e> is guaranteed to not generate any new objects, then it should be safe to not generate the hooks. | If the call to <e>h</e> is guaranteed to not generate any new objects, then it should be safe to not generate the hooks. | ||
+ | |||
+ | ==Speeding up arrays/strings== | ||
+ | One of the issue with ARRAY and STRING is that they are resizable and thus causes an indirection to access the memory area holding the values. In addition, ARRAY have a lower index (an attribute) and STRING a constant lower index of 1. | ||
+ | |||
+ | ===Getting rid of `lower' in ARRAY=== | ||
+ | So when you have <e>array.item (i)</e> it is actually <e>array.area.item (i - array.lower)</e>. As you can see it is two attribute access via ARRAY. So the idea is to get rid of this by having an additional hidden field in ARRAY called `area0'. This field contains the address of `area' minus or plus the offset to `lower'. With this in mind, the above array access is just <e>array.area0.item (i)</e>. | ||
+ | |||
+ | This means that ARRAY and descendants have to be known by the GC so that when the area of the array is changed, the address of `area0' is also changed to its new value. | ||
+ | |||
+ | ===Removing the `area' access altogether=== | ||
+ | When accessing an of a STRING or ARRAY, we still need to access `area0'. If the object is used as a local, we could also record the object along with its `area0'. That is to say the following code: | ||
+ | |||
+ | <e> | ||
+ | local | ||
+ | a: ARRAY [INTEGER] | ||
+ | do | ||
+ | ... | ||
+ | i := a.item (j) | ||
+ | ... | ||
+ | end | ||
+ | </e> | ||
+ | |||
+ | would simply be translated in pseudo C: | ||
+ | <e> | ||
+ | local | ||
+ | a: ARRAY [INTEGER] | ||
+ | area: SPECIAL [INTEGER] | ||
+ | do | ||
+ | area := a.area0 | ||
+ | ... | ||
+ | i := area.item (j) | ||
+ | ... | ||
+ | end | ||
+ | </e> | ||
+ | |||
+ | Thus it is the same speed as today if instead of using an ARRAY you were using a SPECIAL object. | ||
+ | |||
+ | ===Tricky part=== | ||
+ | The tricky part of those optimizations are when resizing the ARRAY or STRING, we need to update the `area0' field of the object as well as the one recorded on the stack. I think it can only be done if the resizing is only done via some built-in routine which it is not currently. And also preventing in descendants of ARRAY or STRING the update of the `area' field. | ||
==Simplifying the stack management== | ==Simplifying the stack management== | ||
Line 51: | Line 91: | ||
locs [1] = s; | locs [1] = s; | ||
locs [2] = NULL; | locs [2] = NULL; | ||
− | protect (locs ); | + | protect (locs, 3); |
i = locs [1].count; | i = locs [1].count; | ||
Line 68: | Line 108: | ||
locs [1] = NULL; | locs [1] = NULL; | ||
locs [2] = NULL; | locs [2] = NULL; | ||
− | protect (locs ); | + | protect (locs , 3); |
i = args [1].count; | i = args [1].count; | ||
Line 78: | Line 118: | ||
} | } | ||
</c> | </c> | ||
− | That is to say | + | That is to say the last part of the allocated array on the stack is used for the argument passing. The above could even be simplified to just: |
<c> | <c> | ||
void f (EIF_REFERENCE args [2]) { | void f (EIF_REFERENCE args [2]) { | ||
Line 84: | Line 124: | ||
EIF_REFERENCE locs [1]; | EIF_REFERENCE locs [1]; | ||
locs [0] = NULL; | locs [0] = NULL; | ||
− | protect (locs ); | + | protect (locs, 1); |
i = args [1].count; | i = args [1].count; | ||
Line 92: | Line 132: | ||
} | } | ||
</c> | </c> | ||
+ | |||
+ | ==Invariant code motion== | ||
+ | This technique is widely used in production compilers to move invariant parts of code outside of the loops. Since important high-level information about Eiffel code is lost during code generation it makes sense to perform this kind of optimization at code generation time. Presence of the moving GC and dynamic nature of objects makes it difficult to track computations that may be performed before or after the loop without affecting the results, but the clear candidates for that are: | ||
+ | * types of read-only entities | ||
+ | * addresses of features of read-only entities | ||
+ | * offsets of attributes of read-only entities | ||
+ | They can be computed before the loop and used afterwards without the need to recompute them at every iteration. | ||
+ | |||
+ | ==Loop specialization== | ||
+ | As soon as a loop is applied to a specific data structure the corresponding dynamic feature calls can be replaced by static ones. For example, if we have a loop | ||
+ | <e> | ||
+ | from | ||
+ | i := 1 | ||
+ | until | ||
+ | i > a.count | ||
+ | loop | ||
+ | do_something_with (a [i]) | ||
+ | i := i + 1 | ||
+ | end | ||
+ | </e> | ||
+ | before the loop we can compute the type of <e>a</e> and then generate 3 different loops: one if the dynamic type is <e>ARRAY</e>, one if the dynamic type is <e>ARRAYED_LIST</e> and one if the dynamic type is anything else. In that case access to items of <e>ARRAY</e> and <e>ARRAYED_LIST</e> is performed at full speed without any dynamic feature calls, yet it also works for descendants, making the code flexible. | ||
+ | |||
+ | This optimization becomes even more important for iterator-based form of a loop as there are usually several feature calls for every iteration. Such feature calls can be inlined and become subject to [[#Invariant code motion|Invariant code motion]]. |
Latest revision as of 05:41, 27 April 2010
Various ideas about potential improvements to code generation. Once an idea is implemented and is showing a real improvement it should be removed from this page and a new page should describe the mechanism in details with the available results.
Contents
Reducing GC hooks
Code generation should only rely on GC hooks when really needed. For example, currently the code:
f (s: STRING) local i: INTEGER do i := s.count g (s, i) end
would generate a hook for the Current
object and s
, but has the flow shows there is no need for it.
A more complicated case is the following one:
f (s: STRING) local i: INTEGER do i := s.count h (s, i) g (s, i) end
If the call to h
is guaranteed to not generate any new objects, then it should be safe to not generate the hooks.
Speeding up arrays/strings
One of the issue with ARRAY and STRING is that they are resizable and thus causes an indirection to access the memory area holding the values. In addition, ARRAY have a lower index (an attribute) and STRING a constant lower index of 1.
Getting rid of `lower' in ARRAY
So when you have array.item (i)
it is actually array.area.item (i - array.lower)
. As you can see it is two attribute access via ARRAY. So the idea is to get rid of this by having an additional hidden field in ARRAY called `area0'. This field contains the address of `area' minus or plus the offset to `lower'. With this in mind, the above array access is just array.area0.item (i)
.
This means that ARRAY and descendants have to be known by the GC so that when the area of the array is changed, the address of `area0' is also changed to its new value.
Removing the `area' access altogether
When accessing an of a STRING or ARRAY, we still need to access `area0'. If the object is used as a local, we could also record the object along with its `area0'. That is to say the following code:
local a: ARRAY [INTEGER] do ... i := a.item (j) ... end
would simply be translated in pseudo C:
local a: ARRAY [INTEGER] area: SPECIAL [INTEGER] do area := a.area0 ... i := area.item (j) ... end
Thus it is the same speed as today if instead of using an ARRAY you were using a SPECIAL object.
Tricky part
The tricky part of those optimizations are when resizing the ARRAY or STRING, we need to update the `area0' field of the object as well as the one recorded on the stack. I think it can only be done if the resizing is only done via some built-in routine which it is not currently. And also preventing in descendants of ARRAY or STRING the update of the `area' field.
Simplifying the stack management
Currently all the hooks are generated one by one for the Current objects, the arguments, and the locals. The idea is to put all of those in one array/structure allocated on the stack. Then only one GC hook will be generated per routine needing the hooks. The other benefit is that because we allocate only one item at the time, we can simplify the code for handling this stack.
Drawback is that each argument and the Current object needs to be copied on routine entry. So for the Eiffel code:
f (s: STRING) local i: INTEGER l_s: STRING do i := s.count l_s := s.twin h (s, i) g (s, i) end
It looks like:
void f (EIF_REFERENCE Current, EIF_REFERENCE s) { EIF_INTEGER i; EIF_REFERENCE locs [3]; locs [0] = Current; locs [1] = s; locs [2] = NULL; protect (locs, 3); i = locs [1].count; locs [2] = ANY_twin (locs [1]); h (locs [0], locs [1], i); g (locs [0], locs [1], i); }
A solution to this is that all reference arguments are done via the locs array. Which would look like:
void f (EIF_REFERENCE args [2]) { EIF_INTEGER i; EIF_REFERENCE locs [3]; locs [0] = NULL; locs [1] = NULL; locs [2] = NULL; protect (locs , 3); i = args [1].count; locs [0] = ANY_twin (args [1]); locs [1] = args [0]; locs [2] = args [1]; h (&locs [1], i); g (&locs [1], i); }
That is to say the last part of the allocated array on the stack is used for the argument passing. The above could even be simplified to just:
void f (EIF_REFERENCE args [2]) { EIF_INTEGER i; EIF_REFERENCE locs [1]; locs [0] = NULL; protect (locs, 1); i = args [1].count; locs [0] = ANY_twin (args [1]); h (args, i); g (args, i); }
Invariant code motion
This technique is widely used in production compilers to move invariant parts of code outside of the loops. Since important high-level information about Eiffel code is lost during code generation it makes sense to perform this kind of optimization at code generation time. Presence of the moving GC and dynamic nature of objects makes it difficult to track computations that may be performed before or after the loop without affecting the results, but the clear candidates for that are:
- types of read-only entities
- addresses of features of read-only entities
- offsets of attributes of read-only entities
They can be computed before the loop and used afterwards without the need to recompute them at every iteration.
Loop specialization
As soon as a loop is applied to a specific data structure the corresponding dynamic feature calls can be replaced by static ones. For example, if we have a loop
from i := 1 until i > a.count loop do_something_with (a [i]) i := i + 1 end
before the loop we can compute the type of a
and then generate 3 different loops: one if the dynamic type is ARRAY
, one if the dynamic type is ARRAYED_LIST
and one if the dynamic type is anything else. In that case access to items of ARRAY
and ARRAYED_LIST
is performed at full speed without any dynamic feature calls, yet it also works for descendants, making the code flexible.
This optimization becomes even more important for iterator-based form of a loop as there are usually several feature calls for every iteration. Such feature calls can be inlined and become subject to Invariant code motion.