Encoding library
Overview
The encoding library converts strings between encodings. It was developed mainly to internationalize batch EiffelStudio. The idea is that sending strings in the localized encoding to the console on Windows, and in UTF-8 on Unix, lets the supported local languages display correctly.
Layout
encoding
 |-ENCODING
 |-CODE_PAGE_CONSTANTS
 |-implementation
    | ENCODING_I
    |-unix
       |-ENCODING_IMP
       |-CODE_SET
    |-windows
       |-ENCODING_IMP
       |-CODE_PAGE
Usage
- Usage is simple:
- Create a source (from) ENCODING object and a target (to) ENCODING object, each with a `code_page'.
- Call {ENCODING}.convert_to on the from ENCODING object. `convert_to' takes the to ENCODING object and the original string as arguments, and returns the string in the target encoding.
- `code_page' must be valid for the given OS for the conversion to work. On Windows, valid values are mostly the code page identifiers defined at MSDN; a few values outside that table are also valid, as defined in CODE_PAGE_CONSTANTS. On Unix, a valid `code_page' is the name of an encoding supported by libiconv. To guarantee a valid `code_page', take it either from CODE_PAGE_CONSTANTS or from {I18N_LOCALE}.info.code_page of the i18n library.
- "a_from_string" must actually be in the character set and encoding specified by the from ENCODING object. Otherwise an error may occur, or no output or unexpected output may be returned.
- Data converted from Unicode UTF-16 to non-Unicode code pages (code pages other than UTF-7 or UTF-8) is subject to data loss, because a code page might not be able to represent every character used in the specific Unicode data.
- Example:
	foo is
			-- Convert a UTF-32 string to UTF-16.
		local
			l_encoding_from, l_encoding_to: ENCODING
			l_string_from: STRING_32
			l_output: STRING_GENERAL
		do
			create l_string_from.make (2)
			l_string_from.append_code (0x0E0041)
			l_string_from.append_string ("A")
			create l_encoding_from.make ((create {CODE_PAGE_CONSTANTS}).utf32)
			create l_encoding_to.make ((create {CODE_PAGE_CONSTANTS}).utf16)
			l_output := l_encoding_from.convert_to (l_encoding_to, l_string_from)
				-- l_string_from is now 0x000E0041 0x00000041.
				-- l_output is now 0x0000DB40 0x0000DC41 0x00000041.
		end
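The UTF-16 values in the example's comments follow from the standard surrogate pair arithmetic for code points above U+FFFF. The sketch below recomputes them for U+E0041 without using the encoding library. The class name SURROGATE_PAIR_DEMO and all local names are hypothetical and serve only as illustration. It shows the arithmetic, not how the library performs the conversion internally.

```eiffel
class
	SURROGATE_PAIR_DEMO
		-- Hypothetical demo class, not part of the encoding library.

create
	make

feature -- Initialization

	make
			-- Print the UTF-16 surrogate pair for the code point U+E0041.
		local
			l_code, l_offset, l_high, l_low: NATURAL_32
		do
			l_code := 0xE0041
				-- Code points above U+FFFF are first reduced by 0x10000.
			l_offset := l_code - 0x10000
				-- The upper 10 bits of the offset select the high surrogate.
			l_high := (l_offset |>> 10) + 0xD800
				-- The lower 10 bits of the offset select the low surrogate.
			l_low := (l_offset & 0x3FF) + 0xDC00
			print (l_high.to_hex_string + " " + l_low.to_hex_string + "%N")
		end

end
```

By hand: 0xE0041 - 0x10000 = 0xD0041; the upper 10 bits are 0x340 and the lower 10 bits are 0x041, which gives 0xDB40 and 0xDC41, matching the first two units of `l_output' in the example.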
Implementation
Generally, the library wraps the Windows API on Windows and the iconv library on Unix.
Windows
- The main APIs are