Difference between revisions of "PEG Library"


Revision as of 16:18, 11 August 2009

This page describes the Parsing Expression Grammar (PEG) library implementation for Eiffel. Information about PEGs can be found here [1].

Basic classes

All parsers inherit from PEG_ABSTRACT_PEG, which defines the common functionality. The parsers correspond to those in the Wikipedia definition, with some additional classes, for example for whitespace support.

The parsers are combined into an object hierarchy which defines the grammar. A string can then be parsed via the feature parser.parse_string ("Some source") on the root object.

Internal DSL

Objects can be combined via features, but the easier way is to use the predefined operators. For instance, to define the simple grammar 'a' 'b' 'c'* we simply write a + b + (-c), where a, b and c are already defined as character parsers for the respective character (PEG_CHARACTER). The binary '+' operator concatenates the parsers into a sequence (PEG_SEQUENCE), while the prefix operator '-' wraps c in a zero-or-more parser. The operators are:

  • binary '+': Sequence concatenation
  • binary '|': Choice concatenation
  • prefix '+': wraps one or more
  • prefix '-': wraps zero or more


Additionally there is the operator '|+', which acts like the binary '+' operator but inserts a whitespace* parser between the two operands. Since this is needed often, it is defined as an operator of its own.
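The operator semantics described above can be illustrated with a small toy model. The following Python sketch is not the library's Eiffel code: the class names Parser, Char, Seq, Choice and Repeat and the helper ws_seq are hypothetical and only mirror the meaning of binary '+', binary '|', prefix '-', prefix '+' and '|+' as listed on this page.

```python
class Parser:
    """Toy PEG parser: parse(text, pos) returns the new position or None."""

    def parse(self, text, pos):
        raise NotImplementedError

    def __add__(self, other):   # binary '+': sequence
        return Seq([self, other])

    def __or__(self, other):    # binary '|': ordered choice
        return Choice([self, other])

    def __neg__(self):          # prefix '-': zero or more
        return Repeat(self, minimum=0)

    def __pos__(self):          # prefix '+': one or more
        return Repeat(self, minimum=1)

    def matches(self, text):
        """True if the parser consumes the whole text."""
        return self.parse(text, 0) == len(text)


class Char(Parser):
    def __init__(self, ch):
        self.ch = ch

    def parse(self, text, pos):
        return pos + 1 if text.startswith(self.ch, pos) else None


class Seq(Parser):
    def __init__(self, parts):
        self.parts = parts

    def parse(self, text, pos):
        for part in self.parts:
            pos = part.parse(text, pos)
            if pos is None:
                return None
        return pos


class Choice(Parser):
    def __init__(self, options):
        self.options = options

    def parse(self, text, pos):
        # Ordered choice: the first option that matches wins.
        for option in self.options:
            end = option.parse(text, pos)
            if end is not None:
                return end
        return None


class Repeat(Parser):
    def __init__(self, inner, minimum):
        self.inner, self.minimum = inner, minimum

    def parse(self, text, pos):
        count = 0
        while True:
            end = self.inner.parse(text, pos)
            if end is None or end == pos:
                break
            pos, count = end, count + 1
        return pos if count >= self.minimum else None


def ws_seq(left, right):
    """Hypothetical stand-in for '|+': a sequence with optional spaces between."""
    return left + (-Char(" ")) + right


a, b, c = Char("a"), Char("b"), Char("c")
grammar = a + b + (-c)          # the grammar 'a' 'b' 'c'*
```

With these definitions, grammar.matches("ab") and grammar.matches("abccc") both succeed, while grammar.matches("ac") fails because the 'b' is missing.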

Be aware of a common mistake when using the binary '|' and '+' operators. Suppose, for instance, that you define an identifier as:

identifier := a_to_z + (- (a_to_z + '_'))

If you then go on to define two new parsers based on it:

identifier2 := identifier |+ identifier
identifier3 := identifier |+ identifier |+ identifier

.. then you won't get the expected result. Because the + operator (like |+ and |) reuses the existing Sequence instance instead of creating a new one, identifier2 reuses the sequence object of identifier and appends that very same instance to it. identifier3 then builds on the corrupted identifier object, and the resulting grammar is broken. To prevent this problem, identifier has to be fixated before it is reused:

identifier.fixate
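The pitfall can be reproduced in a few lines. The Python sketch below is a toy model under the assumption that the sequence operator appends to an existing, non-fixated sequence and returns that same object, as the text above describes; the classes Parser and Sequence and the helper make_identifier are hypothetical and are not the library's Eiffel implementation.

```python
class Parser:
    def __init__(self, name):
        self.name = name

    def __add__(self, other):
        # Two plain parsers: start a fresh sequence.
        return Sequence([self, other])

    def describe(self):
        return self.name


class Sequence(Parser):
    def __init__(self, parts):
        self.parts = parts
        self.fixated = False

    def __add__(self, other):
        if self.fixated:
            # A fixated sequence is treated as one unit and wrapped.
            return Sequence([self, other])
        # Assumed reuse behaviour: mutate this instance and return it.
        self.parts.append(other)
        return self

    def fixate(self):
        self.fixated = True
        return self

    def describe(self):
        return "(" + " ".join(p.describe() for p in self.parts) + ")"


def make_identifier():
    return Parser("letter") + Parser("tail")


# Without fixate: 'broken' ends up containing itself.
broken = make_identifier()
broken2 = broken + broken

# With fixate: the identifier stays intact and can be reused safely.
safe = make_identifier().fixate()
safe2 = safe + safe
safe3 = safe + safe + safe
```

In the unfixated case broken2 is the very same object as broken, and its third part is broken itself, so the grammar has become self-referential. In the fixated case safe still has exactly two parts and safe3.describe() yields three separate copies of the identifier.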

Building a domain model

In most cases a domain model can be created directly while parsing and doesn't have to be derived from the AST. With this implementation this is achieved by defining builder agents on the various parser fragments. To show how this works, we will look at an example grammar for the definition of a "list language":

'(' identifier (',' identifier)* ')'

We assume that identifier is already defined. The corresponding parser in the parser syntax would be:

list = open_parenthesis + identifier + (- (comma + identifier)) + close_parenthesis

Now we can define a feature on identifier which builds a list item, and one on list which creates a list from the items:

list.set_behaviour (agent build_list)
identifier.set_behaviour (agent build_list_item)

The implementation of these features is as follows:

build_list (a_result: PEG_PARSER_RESULT): PEG_PARSER_RESULT
		-- Builds a list from the items collected in `a_result'
	local
		l_internal_result: LIST [ANY]
	do
		Result := a_result
		l_internal_result := Result.internal_result
		from
			l_internal_result.start
		until
			l_internal_result.after
		loop
				-- Handle the current element, `l_internal_result.item', here
			l_internal_result.forth
		end
	end
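
The idea of builder agents can also be sketched end to end in a toy model. The following Python code is only an illustration of the concept: Result stands in for PEG_PARSER_RESULT, its values field for internal_result, and all class names as well as the bodies of build_list and build_list_item are assumptions, since this page does not show the complete Eiffel implementations.

```python
class Result:
    """Hypothetical stand-in for PEG_PARSER_RESULT."""
    def __init__(self, pos, values):
        self.pos = pos          # position after the match
        self.values = values    # collected values (cf. internal_result)


class Parser:
    def __init__(self):
        self.behaviour = None

    def set_behaviour(self, fn):
        self.behaviour = fn
        return self

    def parse(self, text, pos=0):
        result = self.match(text, pos)
        if result is not None and self.behaviour is not None:
            result = self.behaviour(result)   # let the builder transform it
        return result


class Literal(Parser):
    def __init__(self, s):
        super().__init__()
        self.s = s

    def match(self, text, pos):
        if text.startswith(self.s, pos):
            return Result(pos + len(self.s), [])
        return None


class Identifier(Parser):
    def match(self, text, pos):
        end = pos
        while end < len(text) and text[end].isalpha():
            end += 1
        return Result(end, [text[pos:end]]) if end > pos else None


class Seq(Parser):
    def __init__(self, *parts):
        super().__init__()
        self.parts = parts

    def match(self, text, pos):
        values = []
        for part in self.parts:
            r = part.parse(text, pos)
            if r is None:
                return None
            pos = r.pos
            values.extend(r.values)
        return Result(pos, values)


class ZeroOrMore(Parser):
    def __init__(self, inner):
        super().__init__()
        self.inner = inner

    def match(self, text, pos):
        values = []
        while True:
            r = self.inner.parse(text, pos)
            if r is None or r.pos == pos:
                return Result(pos, values)
            pos = r.pos
            values.extend(r.values)


def build_list_item(result):
    # Wrap the matched identifier text in a domain object (here a dict).
    return Result(result.pos, [{"item": result.values[0]}])


def build_list(result):
    # Collect the items built by the identifiers into one list object.
    return Result(result.pos, [[v["item"] for v in result.values]])


identifier = Identifier().set_behaviour(build_list_item)
list_parser = Seq(Literal("("), identifier,
                  ZeroOrMore(Seq(Literal(","), identifier)),
                  Literal(")")).set_behaviour(build_list)

parsed = list_parser.parse("(a,b,c)")
```

Here parsed.values holds a single domain object, the list ["a", "b", "c"], built directly during parsing without an intermediate AST.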