Internationalization/Developer guide

Revision as of 19:50, 15 October 2006 by Manus

General architecture of the i18n library

Where does the information come from?

The i18n library must know how to format things and find translations for many different locales. Translations are application-dependent, so we only have to deal with them on an infrastructural basis - the actual information is supplied by the user. Formatting information, however, is not application-dependent; instead, we can fetch it from the operating system.

This leads us to divide the library into three main parts:

  1. A part which organises and provides translations from a user-supplied data source.
  2. A part which retrieves formatting information from the host operating system.
  3. A part which provides an interface to the information.

General structure of the i18n library

An overview of the structure is provided to the right: the two central classes, LOCALE and LOCALE_MANAGER, are the main interface classes. The rightmost classes, SYSTEM_LOCALES and HOST_LOCALE, are responsible for fetching the formatting information, and the leftmost class, DATASOURCE_MANAGER, deals with finding the translations of strings.

In addition, there are several classes that are used to encapsulate information. They are not shown in the diagrams, to keep the diagrams from resembling a web drawn by an overcaffeinated spider.

Note: the 'I18N' prefix of class names is omitted here for clarity.

Interface

The two main classes of the interface are, as previously stated, LOCALE and LOCALE_MANAGER. LOCALE represents all operations associated with a given locale: formatting and translation. This is the class that clients use to actually localise things, but all it does is provide wrapper functions. The translations are retrieved from a DICTIONARY (more on this later) that is provided to it on creation. The formatting is done by specialised formatting classes (DATE_FORMATTER, VALUE_FORMATTER, STRING_FORMATTER and CURRENCY_FORMATTER), which operate on the information in a LOCALE_INFO object that is also passed to the LOCALE on creation.

Obviously it should not be the user's job to do all this initialisation. This is why there must be a class that is in charge of presenting the user with a choice of locales and giving the user a correctly initialised LOCALE for the locale ultimately chosen. This class is LOCALE_MANAGER. A LOCALE_MANAGER uses SYSTEM_LOCALE and DATASOURCE_MANAGER to find out for which locales formatting information and/or translations are available and can provide the client with a list of supported locales. A locale is identified by a LOCALE_ID object; this is not only used internally but also by the client when requesting a LOCALE object.
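
The division of labour described above can be sketched in a few lines. The sketch below is written in Python purely for illustration and is not the library's Eiffel API: the class names mirror those in the text, but every field, method name and signature is an assumption, and the two dictionaries passed to the manager merely stand in for SYSTEM_LOCALE and DATASOURCE_MANAGER.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class LocaleId:
    """Identifies a locale (hypothetical fields)."""
    language: str
    region: str = ""

    def name(self) -> str:
        return f"{self.language}_{self.region}" if self.region else self.language


@dataclass
class LocaleInfo:
    """Formatting information for one locale (only one field shown)."""
    decimal_separator: str = "."


class Dictionary:
    """Holds translations; falls back to the original string."""
    def __init__(self, entries: Dict[str, str]):
        self._entries = entries

    def translation(self, original: str) -> str:
        return self._entries.get(original, original)


class ValueFormatter:
    """Formats numbers using the information in a LocaleInfo."""
    def __init__(self, info: LocaleInfo):
        self._info = info

    def format_real(self, value: float) -> str:
        return f"{value:.2f}".replace(".", self._info.decimal_separator)


class Locale:
    """Thin wrapper: delegates to a dictionary and to a formatter."""
    def __init__(self, dictionary: Dictionary, info: LocaleInfo):
        self._dictionary = dictionary
        self._value_formatter = ValueFormatter(info)

    def translation(self, original: str) -> str:
        return self._dictionary.translation(original)

    def format_real(self, value: float) -> str:
        return self._value_formatter.format_real(value)


class LocaleManager:
    """Knows which locales are available and builds initialised Locales."""
    def __init__(self,
                 formatting: Dict[LocaleId, LocaleInfo],
                 translations: Dict[LocaleId, Dict[str, str]]):
        self._formatting = formatting      # stand-in for SYSTEM_LOCALE
        self._translations = translations  # stand-in for DATASOURCE_MANAGER

    def available_locales(self) -> List[LocaleId]:
        # A locale is listed if formatting information and/or translations exist.
        ids = set(self._formatting) | set(self._translations)
        return sorted(ids, key=LocaleId.name)

    def locale(self, locale_id: LocaleId) -> Locale:
        info = self._formatting.get(locale_id, LocaleInfo())
        entries = self._translations.get(locale_id, {})
        return Locale(Dictionary(entries), info)
```

The client only ever touches LocaleManager, LocaleId and Locale; the dictionary and the formatter are wired up behind the scenes.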

TODO: expand? formatters?

Formatting information

Section of the i18n library that retrieves locale information



Translations

Section of the i18n library that retrieves translations


Possible expansion points

We hope our library is reasonably extensible. In particular, we foresee the following areas of expansion:


New file formats

Currently we only support the .po/.mo file format. This is because we have limited time and resources, and we also feel that the .mo file format is the best current choice, as it provides plural form handling. However, there are other file formats: Trolltech has its own format, there is a Solaris message catalog format, presumably some Windows formats, and OS X also has a native format.

In order to add support for one of these formats - or your own! - it's necessary to write both FILE* and FILE_HANDLER* implementations. Then the new effective descendant of FILE_HANDLER* must be added to the chain-of-responsibility (called chain) in the make feature of FILE_MANAGER.
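
As a rough illustration of how such a chain of responsibility behaves, here is a minimal sketch in Python. It is not the library's Eiffel code: the method names (can_handle, open, handle), the successor attribute and the ".custom" extension are all hypothetical. Only the idea that FILE_MANAGER builds a chain of handlers in its creation procedure comes from the text.

```python
from typing import Optional


class FileHandler:
    """Stand-in for FILE_HANDLER*: one link in the chain (hypothetical API)."""

    def __init__(self, successor: Optional["FileHandler"] = None):
        self.successor = successor

    def can_handle(self, path: str) -> bool:
        raise NotImplementedError

    def open(self, path: str) -> str:
        raise NotImplementedError

    def handle(self, path: str) -> Optional[str]:
        # Handle the file if this link recognises it, otherwise pass it on.
        if self.can_handle(path):
            return self.open(path)
        if self.successor is not None:
            return self.successor.handle(path)
        return None  # nobody in the chain recognised the file


class MoFileHandler(FileHandler):
    def can_handle(self, path: str) -> bool:
        return path.endswith(".mo")

    def open(self, path: str) -> str:
        return f"mo file: {path}"


class CustomFileHandler(FileHandler):
    """A new, made-up format added by a library user."""

    def can_handle(self, path: str) -> bool:
        return path.endswith(".custom")

    def open(self, path: str) -> str:
        return f"custom file: {path}"


class FileManager:
    def __init__(self):
        # Supporting a new format means adding its handler to the chain here.
        self.chain = CustomFileHandler(MoFileHandler())
```

With this arrangement, `FileManager().chain.handle("fr.mo")` is passed along by the custom handler and answered by the .mo handler, while a file that no handler recognises yields `None`.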

Data sources

New data sources

Currently we only have one implementation of DATASOURCE_MANAGER*. But maybe a file is not suited to everything. A possible data source, however far-fetched, might be a database: all strings could be fetched via queries. Or maybe all strings could be fetched via SOAP or RPC from a remote machine, to ensure up-to-date translations. More realistically, one could certainly imagine a data source that checks the locally-stored translations and fetches the latest version remotely if there have been changes. The easiest way to do such things is of course to write a new effective descendant of DATASOURCE_MANAGER*. This may or may not require a new implementation of DICTIONARY* - for a system that fetches strings on demand rather than loading them all at initialisation, a new DICTIONARY* would certainly be advisable!

To make the library aware of a new DATASOURCE_MANAGER*, URI_PARSER must be told how to recognise a URI that requires it. It is advisable to choose a clear, distinctive prefix.
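
One way to picture prefix-based recognition is a small registry that maps URI prefixes to data source constructors. The Python sketch below rests entirely on assumptions: the text does not say how URI_PARSER is actually extended, and the register and data_source_for methods, the "scheme://" convention and the "db" prefix are invented for illustration.

```python
from typing import Callable, Dict


class FileDataSource:
    """Stand-in for the existing file-based data source."""
    def __init__(self, location: str):
        self.location = location


class DatabaseDataSource:
    """Hypothetical new data source that would fetch strings via queries."""
    def __init__(self, location: str):
        self.location = location


class UriParser:
    """Chooses a data source from the prefix of a URI (hypothetical API)."""

    def __init__(self):
        self._factories: Dict[str, Callable[[str], object]] = {}

    def register(self, prefix: str, factory: Callable[[str], object]) -> None:
        self._factories[prefix] = factory

    def data_source_for(self, uri: str) -> object:
        # Split "prefix://rest"; separator is empty if "://" is missing.
        prefix, separator, rest = uri.partition("://")
        if not separator or prefix not in self._factories:
            raise ValueError(f"no data source registered for: {uri}")
        return self._factories[prefix](rest)


parser = UriParser()
parser.register("file", FileDataSource)
parser.register("db", DatabaseDataSource)  # the new prefix
```

A distinctive prefix matters here because the prefix is the only thing that decides which data source gets the URI.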

FILE_MANAGER

Currently FILE_MANAGER has the simplistic policy of only examining files in the current directory and trusting their names. It is quite possible that a project has a policy of placing each locale in its own directory (KDE does this), of having multiple .mo files for one locale, or of having one .mo file for multiple locales (for example: a fr.mo file could cover fr_FR, fr_CH and fr_CA, although this might not be appreciated by some users ;) ).

A good place to implement such a project-dependent policy is a descendant of FILE_MANAGER, or an entirely new data source.
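
To make the idea of a lookup policy concrete, the following Python sketch lists the files such a policy might try for a given locale. It is an illustration under stated assumptions, not what FILE_MANAGER does: the function name, the fallback from fr_CH to fr, and the file name messages.mo inside a per-locale directory are all hypothetical.

```python
from pathlib import PurePosixPath
from typing import List


def candidate_files(directory: str, locale_name: str) -> List[str]:
    """Return the .mo paths to try for a locale, most specific first."""
    base = PurePosixPath(directory)
    language = locale_name.split("_")[0]

    # Try the full locale first, then the bare language (fr_CH, then fr).
    names = [locale_name]
    if language != locale_name:
        names.append(language)

    candidates = []
    for name in names:
        candidates.append(str(base / f"{name}.mo"))          # flat layout
        candidates.append(str(base / name / "messages.mo"))  # one directory per locale
    return candidates
```

A descendant of FILE_MANAGER could walk such a list and use the first file that exists, which would let a single fr.mo serve fr_FR, fr_CH and fr_CA whenever no more specific file is present.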

New dictionaries

New dictionaries might be required by new data sources. Or maybe the translations used by your project can be stored in a more efficient way than in the general case - one could imagine a dictionary that takes advantage of singular/plural distribution, or that is keyed to the way translations are stored in a particular file format. To add a new dictionary, it's sufficient to write an implementation of DICTIONARY* and to make sure it's used.
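
As an example of a dictionary tailored to its data source, here is a minimal Python sketch of one that fetches strings on demand and caches them, as suggested above for remote data sources. The interface is an assumption: the single translation method and the fetch callback are invented for illustration and do not reflect the actual features of DICTIONARY*.

```python
from typing import Callable, Dict, Optional


class LazyDictionary:
    """Fetches each translation the first time it is requested, then caches it."""

    def __init__(self, fetch: Callable[[str], Optional[str]]):
        # fetch stands in for a query to a database or a remote machine;
        # it returns None when no translation exists.
        self._fetch = fetch
        self._cache: Dict[str, str] = {}

    def translation(self, original: str) -> str:
        if original not in self._cache:
            fetched = self._fetch(original)
            # Fall back to the untranslated string if nothing was found.
            self._cache[original] = fetched if fetched is not None else original
        return self._cache[original]
```

Nothing is loaded at initialisation, and each string costs at most one round trip to the data source, at the price of never noticing later changes to a cached entry.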