Talk:Design of Roundtrip Parser

Optimization

BREAK_AS can be optimized significantly if the source text is kept outside the node rather than inside it. Two approaches are possible:

  1. Locked source code. The source text is kept, unmodified, in a place that can be accessed whenever needed. A position-count pair is then enough to retrieve the text of a node, so BREAK_AS needs no reference fields.
  2. Dictionary. All character sequences are stored in a central dictionary, and each BREAK_AS keeps the index of its sequence. BREAK_AS nodes with the same character sequence share the same dictionary item, and the text of a node is the text of its item. Again BREAK_AS needs no reference fields. The dictionary itself can also be organized without reference fields.

It's even possible to avoid creating BREAK_AS altogether: a match list is sufficient. If there is a gap in the indexes between two adjacent leaves of the AST, the missing indexes correspond to BREAK_AS elements, which can be found in the match list. Any information that would be stored in a BREAK_AS can be computed on the fly.
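
The gap idea can be sketched as follows, again in Python and under simplifying assumptions: the match list is modelled as a plain list of token texts in source order, and each AST leaf is reduced to its index in that list. The function names missing_indexes and regenerate are made up for the illustration.

```python
def missing_indexes(left: int, right: int) -> range:
    """Match-list indexes lying strictly between two adjacent leaves."""
    return range(left + 1, right)


def regenerate(match_list: list[str], leaf_indexes: list[int]) -> str:
    """Rebuild the source text from leaf indexes and the match list alone."""
    out = []
    previous = -1
    for index in leaf_indexes:
        # A gap before this leaf: the skipped items are the breaks.
        out.extend(match_list[i] for i in missing_indexes(previous, index))
        out.append(match_list[index])
        previous = index
    # Breaks after the last leaf.
    out.extend(match_list[previous + 1:])
    return "".join(out)


# Made-up example: three leaves, with breaks at indexes 1, 3 and 5.
match_list = ["a", " ", ":=", " ", "b", "\n"]
leaf_indexes = [0, 2, 4]
rebuilt = regenerate(match_list, leaf_indexes)
```

With this input the text is rebuilt as "a := b" followed by a newline, although no break node was ever created. The cost is that the breaks around a leaf have to be looked up through its neighbours' indexes each time instead of being read from a node.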