Compiler and encoding
Revision as of 04:03, 30 May 2012
Since version 6.7, the compiler has been equipped with a Unicode parser. For simplicity and generality, the core of the parser accepts only UTF-8 source code. Before source code is passed to the core parsing process, it is preprocessed and converted to UTF-8.
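As a rough illustration of that preprocessing step, the Python sketch below normalizes raw source bytes to UTF-8. It assumes, purely for illustration, that a file starting with the UTF-8 byte-order mark is UTF-8 and that any other file is ISO-8859-1 (the two encodings used in the example table). The function name `to_utf8` is hypothetical, and the sketch does not reflect how the compiler actually detects source encodings.

```python
import codecs

# The three-byte UTF-8 byte-order mark: EF BB BF.
UTF8_BOM = codecs.BOM_UTF8


def to_utf8(source: bytes) -> bytes:
    """Normalize raw source bytes to UTF-8 before parsing.

    Hypothetical rule for illustration only: a leading UTF-8 BOM means the
    file is UTF-8; anything else is treated as ISO-8859-1.
    """
    if source.startswith(UTF8_BOM):
        body = source[len(UTF8_BOM):]
        body.decode("utf-8")  # raises UnicodeDecodeError on malformed UTF-8
        return body  # already UTF-8, BOM stripped
    # ISO-8859-1 maps each byte 0x00-0xFF to the code point of the same value,
    # so decoding cannot fail; the text is then re-encoded as UTF-8.
    return source.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    latin1 = 'x := "\u00e9"'.encode("iso-8859-1")
    utf8_with_bom = UTF8_BOM + 'x := "\u00e9"'.encode("utf-8")
    # Both inputs normalize to the same UTF-8 bytes.
    print(to_utf8(latin1) == to_utf8(utf8_with_bom))  # True
```

Converting everything to one encoding up front is what lets a single UTF-8 parser core serve every supported source encoding.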
Internals
The abstract syntax tree now stores STRING_8 data as UTF-8 on each node. Separate features export the UTF-8 form, the UTF-32 form, or the bytes exactly as written in the source. The following example shows how the character é (U+00E9) is represented at each level.
| Source encoding | UTF-8 (BOM) | ISO-8859-1 |
|---|---|---|
| Bytes in source | 0xC3A9 | 0xE9 |
| 1. {STRING_AS}.value | 0xC3A9 | 0xC3A9 |
| 2. {STRING_AS}.binary_value | 0xC3A9 | 0xE9 |
| 3. {STRING_AS}.value_32 | 0xE9 | 0xE9 |
| 4. {STRING_AS}.string_value_32 | 0xE9 | 0xE9 |
| 5. Runtime | 0xC3A9 (STRING_8, rejected); 0xE9 (STRING_32) | 0xE9 (STRING_8); 0xE9 (STRING_32) |
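To make rows 1 to 3 of the table concrete, here is a small Python model of a string node. `StringLiteral` and its constructor arguments are hypothetical stand-ins for {STRING_AS}; only the roles of `binary_value`, `value` and `value_32` are taken from the table. `string_value_32` and the runtime row are not modeled.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class StringLiteral:
    """Hypothetical model of a string node (not the real STRING_AS class)."""

    binary_value: bytes   # bytes exactly as written in the source file
    source_encoding: str  # assumed to be "utf-8" or "iso-8859-1"

    @property
    def value(self) -> bytes:
        # UTF-8 form stored on the node, whatever the source encoding was.
        return self.binary_value.decode(self.source_encoding).encode("utf-8")

    @property
    def value_32(self) -> list[int]:
        # One Unicode code point per character, i.e. UTF-32 code units.
        return [ord(c) for c in self.binary_value.decode(self.source_encoding)]


if __name__ == "__main__":
    for enc in ("utf-8", "iso-8859-1"):
        node = StringLiteral("\u00e9".encode(enc), enc)  # the character é
        print(enc, node.binary_value.hex(), node.value.hex(),
              [hex(cp) for cp in node.value_32])
    # utf-8 c3a9 c3a9 ['0xe9']
    # iso-8859-1 e9 c3a9 ['0xe9']
```

The printed values match the table: `binary_value` depends on the source encoding, while `value` and `value_32` do not.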


