Difference between revisions of "Compiler and encoding"


Revision as of 04:04, 30 May 2012


From 6.7, the compiler has been equipped with a Unicode parser. For simplicity and generality, the core of the parser accepts only UTF-8 source code. Before source code is passed to the core parsing process, it is preprocessed and converted to UTF-8.
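
The preprocessing step can be pictured with the following minimal sketch. It is written in Python purely for illustration and is not the compiler's code: the helper name `to_utf8`, the BOM-based detection, the ISO-8859-1 fallback and the decision to drop the BOM are all assumptions made for this example.

```python
import codecs


def to_utf8(raw: bytes, fallback_encoding: str = "iso-8859-1") -> bytes:
    """Return `raw` re-encoded as UTF-8 (hypothetical helper, not a compiler API)."""
    if raw.startswith(codecs.BOM_UTF8):
        # Assumption: a leading BOM marks the source as UTF-8 already.
        body = raw[len(codecs.BOM_UTF8):]
        body.decode("utf-8")  # raises UnicodeDecodeError if the bytes are not valid UTF-8
        return body
    # Assumption: anything without a BOM is in the fallback encoding.
    return raw.decode(fallback_encoding).encode("utf-8")


latin1_source = b"caf\xe9"                      # "café" in ISO-8859-1
utf8_source = codecs.BOM_UTF8 + b"caf\xc3\xa9"  # "café" in UTF-8 with BOM

print(to_utf8(latin1_source))  # b'caf\xc3\xa9'
print(to_utf8(utf8_source))    # b'caf\xc3\xa9'
```

Whatever the original encoding, the parser core in this sketch only ever sees the same UTF-8 byte sequence.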

Internals

The abstract syntax tree now stores each node's text as UTF-8 data in a STRING_8. Separate features export the UTF-8 form, the UTF-32 form, or the bytes as written in the source.

Here is an example of how the character é is represented at each level, for two source encodings.

Source encoding                   UTF-8 (BOM)                    ISO-8859-1
Bytes in source                   0xC3A9                         0xE9
1. {STRING_AS}.value              0xC3A9                         0xC3A9
2. {STRING_AS}.binary_value       0xC3A9                         0xE9
3. {STRING_AS}.value_32           0xE9                           0xE9
4. {STRING_AS}.string_value_32    0xE9                           0xE9
5. Runtime                        0xC3A9 (STRING_8 Rejected)     0xE9 (STRING_8)
                                  0xE9 (STRING_32)               0xE9 (STRING_32)
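
The values listed above for levels 1 to 3 can be reproduced with a short sketch. It is written in Python for illustration only and does not call any compiler API: the function `views` is hypothetical, and its dictionary keys merely borrow the feature names from the listing.

```python
def views(source_bytes: bytes, source_encoding: str) -> dict:
    """Model three views of one character (hypothetical helper)."""
    text = source_bytes.decode(source_encoding)
    return {
        # UTF-8 data, identical whatever the source encoding was
        "value": text.encode("utf-8"),
        # the bytes exactly as written in the source file
        "binary_value": source_bytes,
        # the Unicode code points, i.e. the UTF-32 view
        "value_32": [ord(ch) for ch in text],
    }


from_utf8 = views(b"\xc3\xa9", "utf-8")
from_latin1 = views(b"\xe9", "iso-8859-1")

for name in ("value", "binary_value", "value_32"):
    print(name, from_utf8[name], from_latin1[name])
```

Only the written bytes differ between the two sources; the UTF-8 data and the code point 0xE9 are the same in both cases.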