Compiler and encoding

Revision as of 20:19, 30 May 2012 by Ted

Since version 6.7, the compiler has been equipped with a Unicode parser. For simplicity and generality, the core of the parser accepts only UTF-8 source code. Before source code is passed to the core parsing process, it is preprocessed and converted into UTF-8.

Internals

Data Storage

The abstract syntax tree now stores UTF-8 data as STRING_8 on each node. Separate features export the UTF-8 form, the UTF-32 form, or the bytes as written in the source.

Here is an example of how the character é is represented at various levels.

Source encoding	UTF-8 (BOM)	ISO-8859-1
Bytes in source	0xC3A9	0xE9
1. {STRING_AS}.value	0xC3A9	0xC3A9
2. {STRING_AS}.binary_value	0xC3A9	0xE9
3. {STRING_AS}.value_32	0xE9	0xE9
4. {STRING_AS}.string_value_32	0xE9	0xE9
5. Runtime	0xC3A9 (STRING_8), 0xE9 (STRING_32)	0xE9 (STRING_8), 0xE9 (STRING_32)
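
The first three rows of the table can be reproduced outside the compiler. The following is a minimal Python sketch, not the compiler's implementation: the function name `levels` and the dictionary keys are hypothetical and only mirror the feature names in the table.

```python
def levels(source_bytes: bytes, source_encoding: str) -> dict:
    """Model how one manifest string is exposed at rows 1-3 of the table.

    Hypothetical helper: the keys mirror the {STRING_AS} feature names,
    but this is an illustration, not the compiler's code.
    """
    # Decode the bytes exactly as they were written in the source file.
    text = source_bytes.decode(source_encoding)
    return {
        "value": text.encode("utf-8"),        # always UTF-8
        "binary_value": source_bytes,         # bytes as written
        "value_32": [ord(c) for c in text],   # Unicode code points
    }


utf8_case = levels(b"\xc3\xa9", "utf-8")         # é in a UTF-8 (BOM) file
latin1_case = levels(b"\xe9", "iso-8859-1")      # é in an ISO-8859-1 file

for name, case in (("UTF-8", utf8_case), ("ISO-8859-1", latin1_case)):
    print(name, case["value"].hex(), case["binary_value"].hex(),
          [hex(cp) for cp in case["value_32"]])
```

Both source encodings yield the same UTF-8 value and the same code point; only the written bytes differ.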

Validity

Source code encoding is specified either explicitly or implicitly.

Explicit
- File level: UTF-8 with BOM (implemented)
- Class level: note clause (not implemented)
- Configuration file: .ecf (not implemented)
Implicit
- If no source code encoding is specified, ISO-8859-1 is assumed for compatibility.
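
The rules above, combined with the conversion to UTF-8 that precedes parsing, can be sketched as follows. This is a minimal Python illustration under stated assumptions: the function name `to_utf8` is hypothetical, and stripping the BOM before parsing is an assumption, not something this page confirms about the compiler.

```python
import codecs


def to_utf8(raw: bytes) -> bytes:
    """Return the UTF-8 bytes a UTF-8-only parser core could consume.

    Hypothetical helper. Only the two cases described above are handled:
    an explicit UTF-8 BOM, and the implicit ISO-8859-1 fallback.
    """
    if raw.startswith(codecs.BOM_UTF8):
        # Explicit, file level: the BOM marks the file as UTF-8.
        # Dropping the BOM here is an assumption made for the sketch.
        body = raw[len(codecs.BOM_UTF8):]
        body.decode("utf-8")  # raises UnicodeDecodeError if malformed
        return body
    # Implicit: no encoding specified, so treat the bytes as ISO-8859-1.
    return raw.decode("iso-8859-1").encode("utf-8")


with_bom = to_utf8(codecs.BOM_UTF8 + b"\xc3\xa9")   # é in a UTF-8 (BOM) file
without_bom = to_utf8(b"\xe9")                       # é in an ISO-8859-1 file
print(with_bom.hex(), without_bom.hex())
```

Every byte sequence is valid ISO-8859-1 as far as Python's codec is concerned, so the implicit branch never fails to decode.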

The following table shows how manifest strings are validated by the compiler:

	Explicit Encoding	Implicit Encoding (ISO-8859-1)
STRING_8 manifest	Valid if every Unicode code point is in 0-255	Valid (taken as bytes)
STRING_32 manifest	Valid	Valid
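
The table can be read as a small decision function. The Python sketch below is an interpretation, not the compiler's code: it treats the STRING_8 cell under explicit encoding as a range check on code points, and the name `manifest_is_valid` and its parameters are hypothetical.

```python
def manifest_is_valid(text: str, target: str, explicit_encoding: bool) -> bool:
    """Model the validation table for one manifest string.

    Hypothetical helper. `text` is the decoded manifest string and
    `target` is "STRING_8" or "STRING_32".
    """
    if target == "STRING_32":
        # STRING_32 manifests are valid in both columns of the table.
        return True
    if not explicit_encoding:
        # Implicit ISO-8859-1: the content is taken as bytes.
        return True
    # Explicit encoding with a STRING_8 target: assumed to require
    # that every code point fits in the range 0-255.
    return all(ord(c) <= 0xFF for c in text)


print(manifest_is_valid("é", "STRING_8", True))    # U+00E9 fits in 0-255
print(manifest_is_valid("€", "STRING_8", True))    # U+20AC does not
print(manifest_is_valid("€", "STRING_32", True))
```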
