Difference between revisions of "Code Generation Optimization Ideas"

m (Added sections describing loop optimizations)
m (Loop specialization: Cosmetics)
 
(One intermediate revision by the same user not shown)
Line 1: Line 1:
[[Category:Code Generation, Optimization]]
+
[[Category:Code Generation]]
 +
[[Category:Optimization]]
 
Various ideas about potential improvements to code generation. Once an idea is implemented and  is showing a real improvement it should be removed from this page and a new page should describe the mechanism in details with the available results.
 
Various ideas about potential improvements to code generation. Once an idea is implemented and  is showing a real improvement it should be removed from this page and a new page should describe the mechanism in details with the available results.
  
Line 153: Line 154:
 
before the loop we can compute the type of <e>a</e> and then generate 3 different loops: one if the dynamic type is <e>ARRAY</e>, one if the dynamic type is <e>ARRAYED_LIST</e> and one if the dynamic type is anything else. In that case access to items of <e>ARRAY</e> and <e>ARRAYED_LIST</e> is performed at full speed without any dynamic feature calls, yet it also works for descendants, making the code flexible.
 
before the loop we can compute the type of <e>a</e> and then generate 3 different loops: one if the dynamic type is <e>ARRAY</e>, one if the dynamic type is <e>ARRAYED_LIST</e> and one if the dynamic type is anything else. In that case access to items of <e>ARRAY</e> and <e>ARRAYED_LIST</e> is performed at full speed without any dynamic feature calls, yet it also works for descendants, making the code flexible.
  
This optimization becomes even more important for iterator-based form of a loop as there are usually several feature calls for every iteration. Such feature calls can be inlined and become subject to [[#Invariant code motion]].
+
This optimization becomes even more important for iterator-based form of a loop as there are usually several feature calls for every iteration. Such feature calls can be inlined and become subject to [[#Invariant code motion|Invariant code motion]].

Latest revision as of 06:41, 27 April 2010

Various ideas about potential improvements to code generation. Once an idea is implemented and is showing a real improvement it should be removed from this page and a new page should describe the mechanism in details with the available results.

Reducing GC hooks

Code generation should only rely on GC hooks when really needed. For example, currently the code:

f (s: STRING)
	local
		i: INTEGER
	do
		i := s.count
		g (s, i)
	end

would generate a hook for the Current object and s, but has the flow shows there is no need for it.

A more complicated case is the following one:

f (s: STRING)
	local
		i: INTEGER
	do
		i := s.count
		h (s, i)
		g (s, i)
	end

If the call to h is guaranteed to not generate any new objects, then it should be safe to not generate the hooks.

Speeding up arrays/strings

One of the issue with ARRAY and STRING is that they are resizable and thus causes an indirection to access the memory area holding the values. In addition, ARRAY have a lower index (an attribute) and STRING a constant lower index of 1.

Getting rid of `lower' in ARRAY

So when you have array.item (i) it is actually array.area.item (i - array.lower). As you can see it is two attribute access via ARRAY. So the idea is to get rid of this by having an additional hidden field in ARRAY called `area0'. This field contains the address of `area' minus or plus the offset to `lower'. With this in mind, the above array access is just array.area0.item (i).

This means that ARRAY and descendants have to be known by the GC so that when the area of the array is changed, the address of `area0' is also changed to its new value.

Removing the `area' access altogether

When accessing an of a STRING or ARRAY, we still need to access `area0'. If the object is used as a local, we could also record the object along with its `area0'. That is to say the following code:

local
	a: ARRAY [INTEGER]
do
	...
	i := a.item (j)
	...
end

would simply be translated in pseudo C:

local
	a: ARRAY [INTEGER]
	area: SPECIAL [INTEGER]
do
	area := a.area0
	...
	i := area.item (j)
	...
end

Thus it is the same speed as today if instead of using an ARRAY you were using a SPECIAL object.

Tricky part

The tricky part of those optimizations are when resizing the ARRAY or STRING, we need to update the `area0' field of the object as well as the one recorded on the stack. I think it can only be done if the resizing is only done via some built-in routine which it is not currently. And also preventing in descendants of ARRAY or STRING the update of the `area' field.

Simplifying the stack management

Currently all the hooks are generated one by one for the Current objects, the arguments, and the locals. The idea is to put all of those in one array/structure allocated on the stack. Then only one GC hook will be generated per routine needing the hooks. The other benefit is that because we allocate only one item at the time, we can simplify the code for handling this stack.

Drawback is that each argument and the Current object needs to be copied on routine entry. So for the Eiffel code:

f (s: STRING)
	local
		i: INTEGER
		l_s: STRING
	do
		i := s.count
		l_s := s.twin
		h (s, i)
		g (s, i)
	end

It looks like:

void f (EIF_REFERENCE Current, EIF_REFERENCE s) {
	EIF_INTEGER i;
	EIF_REFERENCE locs [3];
	locs [0] = Current;
	locs [1] = s;
	locs [2] = NULL;
	protect (locs, 3);
 
	i = locs [1].count;
	locs [2] = ANY_twin (locs [1]);
	h (locs [0], locs [1], i);
	g (locs [0], locs [1], i);
}

A solution to this is that all reference arguments are done via the locs array. Which would look like:

void f (EIF_REFERENCE args [2]) {
	EIF_INTEGER i;
	EIF_REFERENCE locs [3];
	locs  [0] = NULL;
	locs  [1] = NULL;
	locs  [2] = NULL;
	protect (locs , 3);
 
	i = args [1].count;
	locs [0] = ANY_twin (args [1]);
	locs [1] = args [0];
	locs [2] = args [1];
	h (&locs [1], i);
	g (&locs [1], i);
}

That is to say the last part of the allocated array on the stack is used for the argument passing. The above could even be simplified to just:

void f (EIF_REFERENCE args [2]) {
	EIF_INTEGER i;
	EIF_REFERENCE locs [1];
	locs  [0] = NULL;
	protect (locs, 1);
 
	i = args [1].count;
	locs [0] = ANY_twin (args [1]);
	h (args, i);
	g (args, i);
}

Invariant code motion

This technique is widely used in production compilers to move invariant parts of code outside of the loops. Since important high-level information about Eiffel code is lost during code generation it makes sense to perform this kind of optimization at code generation time. Presence of the moving GC and dynamic nature of objects makes it difficult to track computations that may be performed before or after the loop without affecting the results, but the clear candidates for that are:

  • types of read-only entities
  • addresses of features of read-only entities
  • offsets of attributes of read-only entities

They can be computed before the loop and used afterwards without the need to recompute them at every iteration.

Loop specialization

As soon as a loop is applied to a specific data structure the corresponding dynamic feature calls can be replaced by static ones. For example, if we have a loop

from
	i := 1
until
	i > a.count
loop
	do_something_with (a [i])
	i := i + 1
end

before the loop we can compute the type of a and then generate 3 different loops: one if the dynamic type is ARRAY, one if the dynamic type is ARRAYED_LIST and one if the dynamic type is anything else. In that case access to items of ARRAY and ARRAYED_LIST is performed at full speed without any dynamic feature calls, yet it also works for descendants, making the code flexible.

This optimization becomes even more important for iterator-based form of a loop as there are usually several feature calls for every iteration. Such feature calls can be inlined and become subject to Invariant code motion.