Migration to Unicode

Revision as of 20:42, 8 November 2012 by Alexander Kogtenkov (Talk | contribs) (Fixed a typo.)


This is a summary of the recommendations for adapting applications to handle Unicode.

General rule

  1. Never use STRING_8 or any variant of it unless you write a program that is going to be deleted in 5 minutes after running it. Do not even consider using STRING_8 or its variant. Always use IMMUTABLE_STRING_32, READABLE_STRING_32 or STRING_32.
  2. Do not use classes from third-party libraries that take STRING_8 rather than STRING_32.
  3. Use PATH to manipulate file or directory names
  4. Whenever an API takes a READABLE_STRING_GENERAL argument, assume that STRING_8 will be treated as Unicode strings in the range 0 .. 255, unless explicitly noted.

Temporary solution

  1. Replace types using the following table:
Old class New class
STRING_8 STRING_32
FILE_NAME PATH
DIRECTORY_NAME PATH
KL_BINARY_INPUT_FILE KL_BINARY_INPUT_FILE_32
KL_TEXT_OUTPUT_FILE KL_TEXT_OUTPUT_FILE_32
EXECUTION_ENVIRONMENT EXECUTION_ENVIRONMENT_32
  1. Consider using READABLE_STRING_32 for argument types. If you cannot immediately change argument types to take READABLE_STRING_32, e.g. because this is a library class, use READABLE_STRING_GENERAL and perform all the necessary conversions inside the routine.
  2. If you need to convert one UTF encoding into another one, use UTF_CONVERTER. This is an expanded class, so it's possible to declare a local variable of this type and call features on it without explicit object creation.
  3. Consider using SHARED_EXECUTION_ENVIRONMENT to access EXECUTION_ENVIRONMENT_32.