Encoding library

Revision as of 01:31, 25 January 2007 by Ted (Talk | contribs) (Usage)

Overview

The encoding library is a library used to convert string stream among various encodings. The main reason it's developed is internationalization of batch EiffelStudio. The idea is directing localized encoding strings to the console on Windows and UTF-8 encoding on Unix makes local languages supported be displayed correctly.

Layout

 encoding
  |-ENCODING
  |-CODE_PAGE_CONSTANTS
  |-implementation
    | ENCODING_I
    |-unix
      |-ENCODING_IMP
      |-CODE_SET
    |-windows
      |-ENCODING_IMP
      |-CODE_PAGE

Usage

  • The usage is simple.
 - Initialize a from ENCODING object and a to object with `code_page's. 
 - Invoke {ENCODING}.convert_to of the from ENCODING object. `convert_to' takes the to ENCODING object and original string as arguments, and returns the target encoded  string.
  • `code_page' should be valid a given OS so that the conversion can be achieved. A valid `code_page' on Windows are mostly the same as defined code page identifier at MSDN, there are also a few out of the table are valid as defined in CODE_PAGE_CONSTANTS. On Unix, a valid `code_page' is actually a name of encodings supported by libiconv. To guarentee a valid `code_page', it should be either from CODE_PAGE_CONSTANTS or from {I18N_LOCALE}.info.code_page of i18n library.
  • "a_from_string" should be guaranteed to be of correct character set and encoding specified as from ENCODING object. Or error could occur, none or unexcepted output might be returned.
  • Example:
foo is
		local
			l_encoding_from, l_encoding_to: ENCODING
			l_string_from: STRING_32
			l_output: STRING_GENERAL
		do
			create l_string_from.make (2)
			l_string_from.append_code (0x0E0041)
			l_string_from.append_string ("A")
 
			create l_encoding_from.make ((create {CODE_PAGE_CONSTANTS}).utf32)
			create l_encoding_to.make ((create {CODE_PAGE_CONSTANTS}).utf16)
 
			l_output := l_encoding_from.convert_to (l_encoding_to, l_string_from)
				-- l_string_from is now 0x000E0041 0x00000041.
				-- l_output is now 0x0000DB40 0x0000DC41 0x00000041.
		end