Syntax checking/Parser
Revision as of 04:27, 12 June 2006
Important Classes/Files
eiffel.y
- Eiffel grammar description.
- use geyacc to generate eiffel_parser.e from this file
EIFFEL_PARSER
- inherits from EIFFEL_PARSER_SKELETON, where the features parse, parse_from_string and make_with_factory are implemented
- make_with_factory (a_factory: AST_FACTORY): pass an argument of type AST_NULL_FACTORY (a descendant of AST_FACTORY) to check syntax only
- AST_NULL_FACTORY doesn't build an AST; AST_FACTORY does, and after parsing the AST is available in EIFFEL_PARSER.root_node
- parse the input with parse (a_file: KL_BINARY_INPUT_FILE) or parse_from_string (a_string: STRING)
CLASS_AS
- AST of a class
ERROR
- deferred; superclass of all error types like EIFFEL_ERROR or SYNTAX_ERROR
- features line, column: INTEGER give the location of the error
ERROR_HANDLER
- feature error_list: the list of ERRORs found by the parser
SHARED_ERROR_HANDLER
- gives all relevant classes access to one shared ERROR_HANDLER (a singleton)
EIFFEL_CLASS_C
- features build_ast and parse_ast show how the parser can be used.
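The two factories can share one grammar because every action goes through the factory. The Inheritance actions quoted further down guard with `if $$ /= Void then`, which suggests that AST_NULL_FACTORY's creation features return Void. Below is a minimal Python model of that null-object pattern, not the project's Eiffel code. ToyParser, new_parent_list and reduce_empty_inherit are hypothetical names.

```python
from typing import Optional


class AstFactory:
    """Stand-in for AST_FACTORY: creation features return real nodes."""

    def new_parent_list(self, capacity: int) -> Optional[dict]:
        return {"kind": "parent_list", "items": [], "capacity": capacity}


class AstNullFactory(AstFactory):
    """Stand-in for AST_NULL_FACTORY: same features, but no nodes are built."""

    def new_parent_list(self, capacity: int) -> Optional[dict]:
        return None


class ToyParser:
    """Toy parser whose grammar actions always go through the factory."""

    def __init__(self, factory: AstFactory) -> None:
        self.factory = factory
        self.root_node: Optional[dict] = None

    def reduce_empty_inherit(self, inherit_keyword: str) -> Optional[dict]:
        # Mirrors the action for `TE_INHERIT ASemi`: build the node via the
        # factory, then decorate it only if the factory actually built one.
        node = self.factory.new_parent_list(0)
        if node is not None:
            node["inherit_keyword"] = inherit_keyword
        return node

    def parse_from_string(self, text: str) -> None:
        # Only recognizes a bare "inherit" clause, enough to show the pattern.
        if text.strip() == "inherit":
            self.root_node = self.reduce_empty_inherit("inherit")


full = ToyParser(AstFactory())
full.parse_from_string("inherit")
check_only = ToyParser(AstNullFactory())
check_only.parse_from_string("inherit")
print(full.root_node)
print(check_only.root_node)  # None: no AST was built
```

The parser code stays the same for both uses. Only the factory passed to make_with_factory decides whether an AST is built.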
Implementation
- based on Paul's code
ERROR classes
- Create new SYNTAX_ERROR classes that correspond to Paul's classes, but fit into the current hierarchy
- store the start and end positions of the error
extend parser to generate the right ERRORs
- Integrate Paul's changes to eiffel.l and eiffel.y into the current versions.
- add facilities from Paul's EIFFEL_PARSER_ERROR_REPORTER
- either in an existing class such as EIFFEL_PARSER_SKELETON (EP_ERROR_REPORTER only inherits from SHARED_ERROR_HANDLER, and so does EP_SKELETON)
- or in a new class
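The planned error classes can be sketched in Python as follows. This is a model only: ParserSyntaxError, insert_error and the end_line/end_column fields are hypothetical and do not describe the Eiffel classes. It shows an error that keeps the inherited line/column start position and adds an end position, plus a handler that every caller reaches as the same instance. That mirrors what SHARED_ERROR_HANDLER presumably provides through a `once` feature.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Error:
    """Stand-in for the deferred ERROR class: line and column locate the error."""
    line: int
    column: int


@dataclass
class ParserSyntaxError(Error):
    """Hypothetical syntax error that also records where the error ends."""
    end_line: int
    end_column: int
    message: str


class ErrorHandler:
    """Stand-in for ERROR_HANDLER: collects errors in error_list."""

    def __init__(self) -> None:
        self.error_list: List[Error] = []

    def insert_error(self, error: Error) -> None:
        self.error_list.append(error)


_shared_handler = ErrorHandler()


def error_handler() -> ErrorHandler:
    """Like a `once` function: every caller gets the same instance."""
    return _shared_handler


error_handler().insert_error(ParserSyntaxError(
    line=3, column=5, end_line=3, end_column=12,
    message="Expected a manifest string after `obsolete'"))
first = error_handler().error_list[0]
print(first.line, first.column, first.end_line, first.end_column)
```

Because the new class inherits line and column, code that only knows about ERROR keeps working. Tools that can highlight a range also get the end position.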
Work distribution
eiffel.y
- Ueli: 0 - 844 Parent_List
- Marko: 845 - 1494 Formal Generics
- Michi: 1495 - 2130 Instruction Call
- Martin: 2131 - end (done)
eiffel.l
- Martin: added TE_BAD_ID (done)
error classes
- Chrigu
classes with errors to check whether the parser works
- nobody yet
Tools
- xxdiff: http://furius.ca/xxdiff/
- geyacc: http://www.gobosoft.com/eiffel/gobo/geyacc/
Grammar definition file (eiffel.y)
some general information:
- capitalized names like TE_STR_LT are tokens. Search for them in eiffel.l (lower case L):
    \""<"\"     {
                    ast_factory.set_buffer (token_buffer2, Current)
                    last_token := TE_STR_LT
                }
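So TE_STR_LT corresponds to the manifest string "<", that is, the three characters "<" including the double quotes, as in a feature name such as infix "<". In the regular expression \""<"\", each \" matches a literal double quote and "<" matches the character <.
- all other names are non-terminals, defined in eiffel.y. If you don't know what a non-terminal means, look up its rules there or ask the person responsible for that part of the file.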
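Finding a token's lexer rule by hand works, but it can also be scripted. Here is a small Python sketch, assuming rules are laid out as in the TE_STR_LT example: the pattern starts in column 0 and is followed by an action block that sets last_token. The TE_STR_GT rule in the sample is an assumed look-alike added for illustration.

```python
import re
from typing import Optional

# Excerpt in the layout of eiffel.l. The TE_STR_LT rule is the one quoted
# on this page; the TE_STR_GT rule is an assumed look-alike for illustration.
SAMPLE_LEXER = r'''
\""<"\"    {
                ast_factory.set_buffer (token_buffer2, Current)
                last_token := TE_STR_LT
            }
\"">"\"    {
                ast_factory.set_buffer (token_buffer2, Current)
                last_token := TE_STR_GT
            }
'''


def find_token_pattern(lexer_text: str, token: str) -> Optional[str]:
    """Return the regular expression of the rule whose action sets `token`."""
    assignment = re.compile(r"last_token\s*:=\s*" + re.escape(token) + r"\b")
    current_pattern = None
    for line in lexer_text.splitlines():
        stripped = line.rstrip()
        # A rule starts in column 0 with its pattern; its action opens with "{".
        if stripped and not line[0].isspace() and stripped.endswith("{"):
            current_pattern = stripped[:-1].rstrip()
        elif current_pattern is not None and assignment.search(line):
            return current_pattern
    return None


print(find_token_pattern(SAMPLE_LEXER, "TE_STR_LT"))
```

Some rules may put the action on the same line as the pattern or be laid out differently. In those cases this sketch returns None, and `grep -n TE_STR_LT eiffel.l` is the quick fallback.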
changes
There are several types of changes we can make to eiffel.y while merging Paul's eiffel.y with the current one:
renaming
Renaming doesn't change the functionality and shouldn't cause any problems.
Example: Paul renamed infix_operator to infix_string (for whatever reason).
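A rename has to reach every rule that uses the symbol, so it is usually done by search and replace. Below is a Python sketch of a whole-identifier replacement. The grammar fragment and infix_operator_list are made up for the demo. The point is that a plain substring replace would also change longer names that merely start with the old one.

```python
import re


def rename_symbol(grammar_text: str, old: str, new: str) -> str:
    """Rename a grammar symbol, matching whole identifiers only."""
    # \b on both sides keeps `infix_operator` from matching inside a longer
    # identifier such as `infix_operator_list` ('_' counts as a word character).
    return re.sub(r"\b" + re.escape(old) + r"\b", new, grammar_text)


# Hypothetical grammar fragment; infix_operator_list is made up for the demo.
fragment = "Feature_name: infix_operator | infix_operator_list ;"
print(rename_symbol(fragment, "infix_operator", "infix_string"))
# Feature_name: infix_string | infix_operator_list ;
```

The replacement also hits the name inside semantic actions and comments, so reviewing the resulting diff is still worthwhile.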
new non-terminals
New non-terminals are introduced to simplify an existing rule or its error handling.
Example:
current version:
    Default_manifest_string:
            Non_empty_string
                { $$ := $1 }
        | TE_EMPTY_STRING
                {
                    $$ := ast_factory.new_string_as ("", line, column, string_position, position + text_count - string_position, token_buffer2)
                }
        | TE_EMPTY_VERBATIM_STRING
                {
                    $$ := ast_factory.new_verbatim_string_as ("", verbatim_marker.substring (2, verbatim_marker.count), not has_old_verbatim_strings and then verbatim_marker.item (1) = ']', line, column, string_position, position + text_count - string_position, token_buffer2)
                }
        ;
changed to:
    Default_manifest_string:
            Non_empty_string
                { $$ := $1 }
        | Empty_string
                { $$ := $1 }
        ;

    Empty_string:
            TE_EMPTY_STRING
                { $$ := ast_factory.new_string_as ("", line, column, string_position, position + text_count - string_position, token_buffer2) }
        | TE_EMPTY_VERBATIM_STRING
                { $$ := ast_factory.new_verbatim_string_as ("", verbatim_marker.substring (2, verbatim_marker.count), not has_old_verbatim_strings and then verbatim_marker.item (1) = ']', line, column, string_position, position + text_count - string_position, token_buffer2) }
        ;
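The Default_manifest_string change is a standard grammar refactoring. Some alternatives move into a new non-terminal, and the old rule refers to it through a pass-through action `{ $$ := $1 }`. The Python sketch below models only the grammar side, without actions. It checks that the refactored rule still accepts exactly the same token sequences. The stub rule `Non_empty_string: TE_STRING` is an assumption, because the real rule is not shown on this page.

```python
from itertools import product
from typing import Dict, List, Set, Tuple

Grammar = Dict[str, List[Tuple[str, ...]]]


def extract_nonterminal(grammar: Grammar, lhs: str,
                        moved: List[Tuple[str, ...]], new_name: str) -> Grammar:
    """Move some alternatives of `lhs` into a new non-terminal `new_name`."""
    result = {name: list(alts) for name, alts in grammar.items()}
    kept = [alt for alt in result[lhs] if alt not in moved]
    result[lhs] = kept + [(new_name,)]
    result[new_name] = list(moved)
    return result


def language(grammar: Grammar, symbol: str) -> Set[Tuple[str, ...]]:
    """All token sequences derivable from `symbol` (non-recursive grammars only)."""
    if symbol not in grammar:  # a token derives only itself
        return {(symbol,)}
    sentences: Set[Tuple[str, ...]] = set()
    for alternative in grammar[symbol]:
        for parts in product(*(language(grammar, s) for s in alternative)):
            sentences.add(tuple(token for part in parts for token in part))
    return sentences


before: Grammar = {
    "Default_manifest_string": [
        ("Non_empty_string",),
        ("TE_EMPTY_STRING",),
        ("TE_EMPTY_VERBATIM_STRING",),
    ],
    # Assumed stub: the real Non_empty_string rule is not shown on this page.
    "Non_empty_string": [("TE_STRING",)],
}
after = extract_nonterminal(
    before, "Default_manifest_string",
    [("TE_EMPTY_STRING",), ("TE_EMPTY_VERBATIM_STRING",)], "Empty_string")

print(after["Default_manifest_string"])  # [('Non_empty_string',), ('Empty_string',)]
print(language(before, "Default_manifest_string") ==
      language(after, "Default_manifest_string"))  # True
```

Once Empty_string exists, it gives error-handling rules a single place to attach to instead of two separate tokens.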
new rules for error handling
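New alternatives are added to existing non-terminals to handle errors. They use yacc's predefined error token, which stands in for input the parser could not match at that point, so the action can report a specific message instead of a generic syntax error.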
Example:
current Obsolete non-terminal:
    Obsolete: -- Empty
            -- { $$ := Void }
        | TE_OBSOLETE Manifest_string
            {
                $$ := ast_factory.new_keyword_string_pair ($1, $2)
            }
        ;
Paul's Obsolete non-terminal:
    Obsolete: -- Empty
            -- { $$ := Void }
        | TE_OBSOLETE Manifest_string
            {
                $$ := ast_factory.new_keyword_string_pair ($1, $2)
            }
        | TE_OBSOLETE error
            { report_expected_after_error (parser_errors.obsolete_keyword, $1, parser_errors.obsolete_string, False) }
        ;
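The effect of the `TE_OBSOLETE error` alternative can be illustrated with a hand-written Python analogue. This is not yacc's error-recovery machinery, which pops parser states and discards tokens. Once the keyword has been read, the sketch reports what was expected after it instead of failing generically. Assumptions, all hypothetical: tokens are (kind, text) pairs, a manifest string arrives as one TE_STRING token, and report_expected_after_error here takes only two arguments.

```python
from typing import List, Optional, Tuple

Token = Tuple[str, str]  # (token kind, source text)
errors: List[str] = []


def report_expected_after_error(keyword: str, expected: str) -> None:
    """Hypothetical stand-in for the parser feature of the same name."""
    errors.append(f"Expected {expected} after `{keyword}'")


def parse_obsolete(tokens: List[Token], pos: int) -> Tuple[Optional[Tuple[str, str]], int]:
    """Obsolete: -- Empty | TE_OBSOLETE Manifest_string | TE_OBSOLETE error"""
    if pos >= len(tokens) or tokens[pos][0] != "TE_OBSOLETE":
        return None, pos  # empty alternative: no obsolete clause
    keyword = tokens[pos][1]
    # Assumption: a manifest string arrives as a single TE_STRING token.
    if pos + 1 < len(tokens) and tokens[pos + 1][0] == "TE_STRING":
        return (keyword, tokens[pos + 1][1]), pos + 2
    # Error alternative: the keyword was read but no manifest string follows,
    # so report what was expected instead of a generic "syntax error".
    report_expected_after_error(keyword, "a manifest string")
    return None, pos + 1


ok, _ = parse_obsolete([("TE_OBSOLETE", "obsolete"), ("TE_STRING", '"Use g instead"')], 0)
# TE_FEATURE is just an example of a token that is not a string.
bad, resume_at = parse_obsolete([("TE_OBSOLETE", "obsolete"), ("TE_FEATURE", "feature")], 0)
print(ok)      # ('obsolete', '"Use g instead"')
print(errors)  # ["Expected a manifest string after `obsolete'"]
```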
changed error handling
Existing error handling in eiffel.y is usually longer than Paul's, mainly because he moved that code into features such as report_warning.
Example:
current version:
    Inheritance: -- Empty
            -- { $$ := Void }
        | TE_INHERIT ASemi
            {
                if has_syntax_warning then
                    Error_handler.insert_warning (
                        create {SYNTAX_WARNING}.make (line, column, filename,
                        "Use `inherit ANY' or do not specify an empty inherit clause"))
                end
                --- $$ := Void
                $$ := ast_factory.new_eiffel_list_parent_as (0)
                if $$ /= Void then
                    $$.set_inherit_keyword ($1)
                end
            }
        [...]
        ;
Paul's version:
    Inheritance: -- Empty
            -- { $$ := Void }
        | TE_INHERIT ASemi
            {
                report_warning (parser_errors.empty_inherit_clause_warning, Void)
                $$ := ast_factory.new_eiffel_list_parent_as (0)
                if $$ /= Void then
                    $$.set_inherit_keyword ($1)
                end
            }
        [...]
        ;
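Here is a Python sketch of why moving reporting into a feature shortens the grammar actions. The flag check, message text and warning creation live in one place, keyed by a message name. PARSER_MESSAGES, WarningReporter and the (key, line, column) signature are hypothetical. This page does not show whether Paul's report_warning still checks has_syntax_warning, so the sketch assumes it does.

```python
from typing import Dict, List, Tuple

# Hypothetical message table in the spirit of Paul's `parser_errors`: the key
# reuses the name from his rule, the text is the old inline message.
PARSER_MESSAGES: Dict[str, str] = {
    "empty_inherit_clause_warning":
        "Use `inherit ANY' or do not specify an empty inherit clause",
}


class WarningReporter:
    """Hypothetical model of a parser that owns its warning-reporting code."""

    def __init__(self, has_syntax_warning: bool = True) -> None:
        self.has_syntax_warning = has_syntax_warning
        self.warnings: List[Tuple[int, int, str]] = []

    def report_warning(self, key: str, line: int, column: int) -> None:
        # The flag check, message lookup and warning creation live here once,
        # instead of being repeated in every grammar action.
        if self.has_syntax_warning:
            self.warnings.append((line, column, PARSER_MESSAGES[key]))


reporter = WarningReporter()
# The grammar action shrinks to a single call:
reporter.report_warning("empty_inherit_clause_warning", line=4, column=1)
print(reporter.warnings)
```

Keeping every message in one table also makes it easy to check, while merging, that no warning text was lost.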