Compiler and encoding
Since version 6.7, the compiler has been equipped with a Unicode parser. For simplicity and generality, the core of the parser accepts only UTF-8 source code. Before source code reaches the core parsing process, it is preprocessed and converted to UTF-8.
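The preprocessing step can be sketched as follows. This is a minimal illustration, not the compiler's actual code. It assumes the encoding rules listed under Validity: a UTF-8 BOM at the start of the file marks the source as UTF-8, and anything else is read as ISO-8859-1. The function name `to_utf8` is hypothetical. Whether the real preprocessor strips the BOM is also an assumption.

```python
UTF8_BOM = b"\xef\xbb\xbf"

def to_utf8(source: bytes) -> bytes:
    """Convert raw source bytes to UTF-8 before parsing (hypothetical sketch)."""
    if source.startswith(UTF8_BOM):
        # Explicit encoding: UTF-8 with BOM. Drop the BOM (assumption);
        # decoding also rejects byte sequences that are not valid UTF-8.
        text = source[len(UTF8_BOM):].decode("utf-8")
    else:
        # Implicit encoding: ISO-8859-1, where each byte is the code point
        # of the same value, so decoding never fails.
        text = source.decode("iso-8859-1")
    # The parser core only sees UTF-8.
    return text.encode("utf-8")
```

With this sketch, both `b"\xef\xbb\xbf\xc3\xa9"` (UTF-8 with BOM) and `b"\xe9"` (ISO-8859-1) become the UTF-8 bytes `b"\xc3\xa9"` for é.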
Internals
Data Storage
The abstract syntax tree now stores STRING_8 values as UTF-8 data on each node. Separate features export the value as UTF-8, as UTF-32, or as the bytes written in the source file.
The following table shows how the character é (U+00E9) is represented at various levels.
| Source encoding | UTF-8 (BOM) | ISO-8859-1 | 
| Bytes in source | 0xC3A9 | 0xE9 | 
| 1. {STRING_AS}.value | 0xC3A9 | 0xC3A9 | 
| 2. {STRING_AS}.binary_value | 0xC3A9 | 0xE9 | 
| 3. {STRING_AS}.value_32 | 0xE9 | 0xE9 | 
| 4. {STRING_AS}.string_value_32 | 0xE9 | 0xE9 | 
| 5. Runtime | STRING_8: 0xC3A9; STRING_32: 0xE9 | STRING_8: 0xE9; STRING_32: 0xE9 |
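Rows 1 to 3 of the table can be reproduced with a small model. This is a sketch only. `StringViews` and `string_views` are hypothetical Python names that mirror the Eiffel features `value`, `binary_value` and `value_32`; they are not the compiler's API. The model assumes the written bytes exclude any BOM, since the BOM sits at the start of the file rather than inside the string.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class StringViews:
    value: bytes          # row 1: UTF-8 data stored on the AST node
    binary_value: bytes   # row 2: bytes exactly as written in the source file
    value_32: List[int]   # row 3: Unicode code points (UTF-32 units)

def string_views(written: bytes, encoding: str) -> StringViews:
    """Model the three views of a manifest string (hypothetical sketch)."""
    text = written.decode(encoding)  # interpret the source bytes
    return StringViews(
        value=text.encode("utf-8"),
        binary_value=written,
        value_32=[ord(c) for c in text],
    )
```

Calling `string_views(b"\xc3\xa9", "utf-8")` and `string_views(b"\xe9", "iso-8859-1")` gives the same `value` (`b"\xc3\xa9"`) and `value_32` (`[0xE9]`). Only `binary_value` differs, matching the UTF-8 (BOM) and ISO-8859-1 columns.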
Validity
Source code encoding is either explicitly or implicitly specified.
- Explicit
  - File level: UTF-8 with BOM (implemented)
  - Class level: note clause (not implemented)
  - Configuration file: .ecf (not implemented)

- Implicit
  - If no source code encoding is specified, ISO-8859-1 is assumed for backward compatibility.
 
The following table shows how manifest strings are validated by the compiler:
| | Explicit encoding | Implicit encoding (ISO-8859-1) |
| STRING_8 manifest | Valid only if every code point is in the range 0-255 | Valid (taken as bytes) |
| STRING_32 manifest | Valid | Valid |
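The validation rules in this table can be expressed as a small predicate. This is a hypothetical sketch, not the compiler's implementation. `is_valid_manifest` and its parameters are illustrative names, and `text` is assumed to be the manifest's content after decoding with the source encoding.

```python
def is_valid_manifest(text: str, is_string_32: bool, explicit_encoding: bool) -> bool:
    """Check a manifest string against the rules in the table (hypothetical sketch)."""
    if is_string_32:
        # STRING_32 manifests are valid under both explicit and implicit encoding.
        return True
    if not explicit_encoding:
        # Implicit ISO-8859-1: the bytes are taken as-is, so each one maps to a
        # code point in 0-255 and the manifest is always valid.
        return True
    # Explicit encoding: a STRING_8 manifest can only hold code points 0-255.
    return all(ord(c) <= 0xFF for c in text)
```

In an explicitly encoded UTF-8 file, a STRING_8 manifest containing é (U+00E9) passes this check. One containing € (U+20AC) fails, although it remains valid as a STRING_32 manifest.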


