Reducing EIFGEN Size


Objective

To reduce the size of EIFGENs to 10% of their size as of version 5.7.62110.

Motivation for reducing the size of EIFGENs

I recently benchmarked a "Hello World" project and found that it took over 40 MB of storage (with both the workbench and finalized versions compiled). Many of my production applications consume hundreds of megabytes, or even more than 1 GB. This level of consumption has several negative effects. It puts off potential customers when they see the storage and time costs involved in using the compiler, and it inhibits automated compilation and testing of a large number of Eiffel applications.

Reducing the size of EIFGENs could have numerous benefits. A significant reduction in size could bring a corresponding improvement in performance. It would also make the compiler more competitive and more attractive to cost-conscious potential customers.

Manus told me that the size of the EIFGEN generated when compiling the compiler has recently gone from 1.8 GB to 1 GB. This is good, but I would like to set the goal of reducing EIFGENs by another 90%.

Technical Overview

In a recent posting on comp.lang.eiffel, Manus explained the bulky content of the EIFGEN directory as follows:

  1. The COMP directory contains
    1. One representation of the Abstract Syntax Tree of the source code, in a format that's larger than the original source
    2. A modified second copy for code generation
    3. A description of the classes in a compact format
    4. Dependencies for incremental recompilation
  2. The F_CODE directory contains
    1. C code for the finalized mode
    2. The C compiler output (object files, static libraries, shared libraries, assemblies and executables)
  3. The W_CODE directory contains
    1. C code for the workbench mode
    2. The C compiler output (object files, static libraries, shared libraries, assemblies and executables)
    3. Debug files from the C compiler (at least for MSC)

I also found a PARTIALS directory, but it was empty.

Right away, several ideas occur to me for reducing the size of the EIFGEN. First, there is almost certainly a HUGE amount of replication from project to project. Unless someone is radically modifying ANY, the majority of the code generated by one project is going to be identical to that of another project that references the same libraries (or universe). With the current implementation, applications pay an enormous penalty for having to recompile all their libraries. A partial solution already exists with precompiled libraries, but historically these have been a pain to set up, and in my experience they can be difficult to port from one machine to another (especially for Eiffel .NET). There's also a major drawback to precompiles: your project can only reference one. So you can either reference a relatively small common subset, or reference a larger set that tries to engulf everything outside of your application. Neither solution is optimal. A better approach would allow precompiles to work more like traditional libraries, and allow applications to reference multiple precompiled libraries.

Second, if I understand what's going on, there can be five separate representations of each class in the EIFGEN (three representations in the COMP directory and a C representation in each of the F_CODE and W_CODE directories). I'm not counting the additional representations that are the C compiler's binary output. This sounds excessive on the face of it, and there may be opportunities for combining some of these representations. For example, what if only one C code representation were generated, and C macros or ifdefs were used to control which type of binary was produced? Or could a single version of the AST serve all purposes?
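
To make the "one C representation, two binaries" idea concrete, here is a minimal sketch of what a single generated routine might look like. Everything in it is hypothetical: the WORKBENCH_MODE switch, the TRACE_ENTRY and CHECK_INDEX macros, and the item_at routine are names I made up for illustration, not anything the compiler actually emits, and I am only assuming that the two modes differ in ways a preprocessor could express.

```c
#include <stdio.h>

/* Hypothetical mode switch: define WORKBENCH_MODE at compile time
   (for example with -DWORKBENCH_MODE) to build the workbench flavour;
   leave it undefined to build the finalized flavour. */
#ifdef WORKBENCH_MODE
  #define MODE_NAME "workbench"
  /* Workbench: trace routine entry and keep the index check. */
  #define TRACE_ENTRY(name) fprintf(stderr, "enter %s\n", (name))
  #define CHECK_INDEX(i, n) ((i) >= 0 && (i) < (n))
#else
  #define MODE_NAME "finalized"
  /* Finalized: both hooks compile away to nothing. */
  #define TRACE_ENTRY(name) ((void)0)
  #define CHECK_INDEX(i, n) ((void)(i), (void)(n), 1)
#endif

/* A single routine body shared by both modes (made-up example). */
static int item_at(const int *items, int count, int index)
{
    TRACE_ENTRY("item_at");
    if (!CHECK_INDEX(index, count)) {
        return -1; /* only reachable when the check is compiled in */
    }
    return items[index];
}

/* Reports which flavour this translation unit was compiled as. */
static const char *build_mode(void)
{
    return MODE_NAME;
}
```

With something along these lines, the same .c file would be compiled twice with different flags instead of being written to disk twice. Whether the real workbench and finalized code are similar enough for this to work is exactly the open question.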