Internationalization/User guide

Revision as of 07:29, 14 October 2006 by Leo (added formatting)

Overview

The i18n library is intended to enable localisation of Eiffel programs.

Localisation is the process of adapting a piece of software to a specific place, the locale, which is often expressed as a combination of a language code and a country code.

This normally means not only displaying strings in the appropriate language, but also adapting number formatting, date and time formatting etc. to use local conventions. The i18n library provides formatting facilities for numbers, currency values and dates, and the ability to identify and load translated strings at run-time.

Interface

The library provides most of its services through one class: LOCALE. This class presents all formatting and translation facilities for a given locale. LOCALE objects can't be created directly: one must go through the LOCALE_MANAGER class. A LOCALE_MANAGER finds out which information is available for which locales, and offers a list to choose from. It will then load the information for the chosen locale into a LOCALE object and give it to you. With this LOCALE object you can do the really interesting things, like formatting dates and translating strings.

Choosing a locale

First you must have a LOCALE_MANAGER. For details on creating them, please see the Datasources section. If you just want to use whatever locale is the default on the user's machine, as in most cases, then it's easy to get a LOCALE object: just call get_system_locale.

my_locale := locale_manager.get_system_locale

If you want a specific locale, it's going to be a bit more complicated. A LOCALE_MANAGER knows which locales are available and exactly what information is available for each of them. You can get a list of all locales that are available to some degree by calling the available_locales feature. A locale is identified by a LOCALE_ID object, and normally this has two components: language code and country code. For example, the US English locale has language code "en" and country code "US". If you've found a locale id that you like in the list returned by available_locales, you can check exactly what is available for it by using:

has_translations (a_locale_id: I18N_LOCALE_ID): BOOLEAN
has_formatting_info (a_locale_id: I18N_LOCALE_ID): BOOLEAN

You can then retrieve the corresponding LOCALE object by calling get_locale.
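
As a rough sketch of how these checks might fit together, the feature below falls back to the system locale when the requested one is incomplete. choose_locale is a hypothetical helper, not part of the library, and the sketch assumes that get_locale takes the locale id as its only argument, which is not spelled out above.

```eiffel
choose_locale (a_manager: LOCALE_MANAGER; a_locale_id: I18N_LOCALE_ID): LOCALE
		-- Locale for `a_locale_id' if both translations and formatting
		-- information are available, otherwise the system locale.
		-- Hypothetical helper; assumes `get_locale' takes a locale id.
	do
		if
			a_manager.has_translations (a_locale_id) and
			a_manager.has_formatting_info (a_locale_id)
		then
			Result := a_manager.get_locale (a_locale_id)
		else
			Result := a_manager.get_system_locale
		end
	end
```

Whether a partially available locale is still good enough is an application decision; requiring both checks is only one possible policy.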


Using your locale

Now you've got a LOCALE. After the initial burst of euphoria has faded away, what can you do with it?

String translation

Interface

translate (original: STRING_GENERAL): STRING_32
translate_plural (original_singular, original_plural: STRING_GENERAL; plural_form: INTEGER): STRING_32
format_string (original: STRING_GENERAL; token_values: TUPLE[STRING_GENERAL]): STRING_32

Usage

In naïvely written software, you can often spot things like

io.put_string("My hovercraft is full of eels")

We'll use this example to illustrate the use of the string translation features of the i18n library.

If, as above, there is just one constant string to translate, the solution is very easy: simply use the translate function. The resulting code would look like this

io.put_string(my_locale.translate("My hovercraft is full of eels"))

If the translate function can't find a translation for this string it will simply return the original string - better than nothing!

But life is, of course, not always that simple. What if we have to deal with plurals? The "traditional" way of doing this is something like:

n := get_number_of_hovercraft
if n = 1 then
	io.put_string("My hovercraft is full of eels")
else
	io.put_string("My hovercraft are full of eels")
end

This is not as easy to translate as the previous example. Why can't we just translate both strings?

Depending on the language, there may be up to four different types of plural forms, used in strange and exotic ways. Clearly, it is important to know exactly how many hovercraft there are so that we can choose the right plural form. This can be done with the translate_plural function, which we can use in this way:

n := get_number_of_hovercraft
io.put_string(my_locale.translate_plural("My hovercraft is full of eels","My hovercraft are full of eels",n))

This function will choose and return a translation in the correct plural form. If it can't find one, it will behave like translate and return either the original singular string or the original plural string, following English grammatical rules.

Often even the above is not enough. What if you want to tell the world exactly how many hovercraft you have? You might write something like this:

n := get_number_of_hovercraft
if n = 1 then
	io.put_string("My hovercraft is full of eels")
else
	io.put_string("My "+n.out+" hovercraft are full of eels")
end

How can translate_plural handle this? It needs some reinforcements: the solution is to also use string templates. This means that we can embed codes like "$1" in a string and replace them in the translation by the actual values. Let's see how this works:

n := get_number_of_hovercraft
plural_string := my_locale.translate_plural("My hovercraft is full of eels","My $1 hovercraft are full of eels",n)
io.put_string(my_locale.format_string(plural_string, [n.out]))

To replace the escape codes, such as $1 and $2, we use the function format_string. This replaces all the escape codes it finds with the values in a tuple that you give to it as an argument.
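
A template may carry more than one escape code. The following sketch uses a made-up sentence with two placeholders; print_ownership, owner and the sentence itself are illustrative only, and the sketch assumes that $1 is filled from the first tuple item and $2 from the second.

```eiffel
print_ownership (my_locale: LOCALE; owner: STRING; n: INTEGER)
		-- Print a translated sentence saying that `owner' has `n' hovercraft.
		-- Hypothetical example; assumes $1 maps to the first tuple item
		-- and $2 to the second.
	local
		template: STRING_32
	do
			-- Translate the template first, then fill in the values.
		template := my_locale.translate ("$1 owns $2 hovercraft")
			-- `n.out' turns the integer into a string, matching the
			-- TUPLE[STRING_GENERAL] argument of `format_string'.
		io.put_string (my_locale.format_string (template, [owner, n.out]))
	end
```

Keeping the placeholders inside the translated string lets a translator reorder them when the target language needs a different word order.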

Formatting

Interface

DATE_FORMATTER provides:

format_date (date: DATE): STRING_32
format_time (time: TIME): STRING_32
format_date_time (date_time: DATE_TIME): STRING_32

CURRENCY_FORMATTER provides:

format_currency (a_value: REAL_64): STRING_32

VALUE_FORMATTER provides:

format_integer_8 (a_integer_8: INTEGER_8): STRING_32
format_integer_16 (a_integer_16: INTEGER_16): STRING_32
format_integer_32 (a_integer_32: INTEGER_32): STRING_32
format_integer_64 (a_integer_64: INTEGER_64): STRING_32
format_real_32 (a_real_32: REAL_32): STRING_32
format_real_64 (a_real_64: REAL_64): STRING_32

Usage

The LOCALE class makes 3 formatters accessible to clients: a VALUE_FORMATTER, a DATE_FORMATTER and a CURRENCY_FORMATTER, exposed as features under the names value_formatter, date_formatter and currency_formatter respectively.

Using these formatters is fairly straightforward: you simply call the appropriate function for the type of object that you want to format.

Date formatting

The DATE_FORMATTER class can format EiffelTime DATE, TIME and DATE_TIME objects in a way appropriate to the locale. For example, to get a string representation of today's date in a given locale, you might write:

io.put_string(my_locale.date_formatter.format_date(create {DATE}.make_now))

Currently, eras of non-Gregorian calendars are not well supported.

Value and currency formatting

The VALUE_FORMATTER and CURRENCY_FORMATTER classes can format integers and reals according to the conventions of a given locale - number of digits after the decimal separator, the decimal separator itself, grouping of digits and so on.

Using them is just as easy as using DATE_FORMATTER: just call the function appropriate to the type of object whose value you want to format.
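
For instance, a feature that prints a quantity, a weight and a price using the locale's conventions might look roughly like this. print_order_summary and its arguments are hypothetical; the formatter features are the ones listed in the interface above. The exact output depends on the locale, so none is shown.

```eiffel
print_order_summary (my_locale: LOCALE; quantity: INTEGER_32; weight, price: REAL_64)
		-- Print `quantity', `weight' and `price', one per line,
		-- formatted according to `my_locale'.
		-- Hypothetical example feature.
	do
			-- Plain numbers go through the value formatter.
		io.put_string (my_locale.value_formatter.format_integer_32 (quantity))
		io.put_new_line
		io.put_string (my_locale.value_formatter.format_real_64 (weight))
		io.put_new_line
			-- Monetary amounts go through the currency formatter.
		io.put_string (my_locale.currency_formatter.format_currency (price))
		io.put_new_line
	end
```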

String extraction

Somebody has to translate all these strings. Mostly, this isn't a programmer, so somehow you have to be able to give this translator a list of strings that you want translated.

We can extract these strings from your application fairly easily by simply looking at the arguments for each call to translate or translate_plural. By clicking on a handy button in EiffelStudio, these strings will be extracted and placed in a .po file.

TODO: add more details about clicky thing.

A .po file is a reasonably widespread format for storing strings to be translated. There are several tools to aid translation of these files, such as poEdit for Windows and KBabel or gtranslator for KDE and GNOME. There are also many tools to convert .po files to other formats, such as the XML-based XLIFF format.

Datasources

The library has to load the translated strings from somewhere - sadly we can't do on-the-fly translation, but if you can, please tell us! Instead we load the strings from a datasource. This is appropriately generic: it could be anything, from a database to a system that queries a server via RPC or SOAP, but currently we only have one implementation: files. And in fact, we only support one type of file: the .mo file format. The library can't guess the type of datasource you want to use, so you have to tell it when you create a LOCALE_MANAGER. This is done via a URI. Currently all URIs are interpreted as directories where string catalog files may be found, and any .mo files in this directory will be used by the i18n library. This means that creating a LOCALE_MANAGER looks somewhat like this:

create my_locale_manager.make("/path/to/my/files")
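
Putting the pieces together, a minimal root class might look roughly like the sketch below. HOVERCRAFT_APP is a hypothetical class name, the class names LOCALE_MANAGER and LOCALE are used as they appear in this guide, and the path is the placeholder from the line above; adjust all of them to your project.

```eiffel
class
	HOVERCRAFT_APP

create
	make

feature -- Initialization

	make
			-- Load the system locale and print one translated string.
			-- Sketch only: class names follow this guide, and the
			-- directory is a placeholder for where your .mo files live.
		local
			l_manager: LOCALE_MANAGER
			l_locale: LOCALE
		do
				-- The URI is interpreted as a directory containing .mo files.
			create l_manager.make ("/path/to/my/files")
				-- Use whatever locale is the default on the user's machine.
			l_locale := l_manager.get_system_locale
				-- Falls back to the original string if no translation is found.
			io.put_string (l_locale.translate ("My hovercraft is full of eels"))
			io.put_new_line
		end

end
```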

Mo files

The mo file format is defined by the GNU gettext library, a widely-used C library that allows localisation of text strings. We support UTF-8 encoded mo files.

How do you get these .mo files? Once your translator has finished translating a .po file, you can convert it into a .mo file by using the msgfmt tool (for example, msgfmt -o de_CH.mo de_CH.po), which is obtainable as part of the gettext package under Unix and distributed with poEdit under Windows. The resulting .mo file should be named with the locale identifier of the locale it is intended for (zh_CN.mo or de_CH.mo, for example) and placed in the appropriate directory, that is to say, the one that you give to LOCALE_MANAGER as a URI.

The .mo file should then be seen and used by the i18n library.