General architecture of the i18n library
Where does the information come from?
The i18n library must obviously knows how to format things and finds translations for many different locales. Translations are application-dependant and thus we only have to deal with them on an infrastructural basis - the actual information is supplied by the user. However formatting information is not. Instead, we can fetch this from the operating system.
This leads us to divide the library into three main parts:
- A part which organises and provides translations from a user-supplied data source.
- A part which retrieves formatting information from the host operating system
- A part which provides an interface to the information
An overview of the structure is provided to the right: the two central classes, LOCALE and LOCALE_MANAGER are the main interface classes. The rightmost class, HOST_LOCALE, is responsible for fetching the formatting information, and the leftmost class, DATASOURCE_MANAGER, must deal with finding the translation of strings.
In addition there are several classes that are used to encapsulate information, not shown on diagrams to avoid them resembling a web drawn by an overcaffeinated spider.
Note: the 'I18N' prefix of class names is omitted in the text for clarity.
The main two classes of the interface are, as has been previously stated, LOCALE and LOCALE_MANAGER. LOCALE represents all operations associated with a given locale: formatting and translation. This is the class that clients use to actually localise things, but all it does is provide wrapper functions: the translations are retrieved from a DICTIONARY (more on this later) provided to it on creation, and the formatting is done by specialised formatting classes (DATE_FORMATTER, VALUE_FORMATTER, STRING_FORMATTER and CURRENCY_FORMATTER) which are also operated with information passed in a LOCALE_INFO object to the LOCALE on creation.
Obviously it should not be the user's job to do all this initialisation. This is why there must be a class that is in charge of presenting the user with a choice of locales and giving the user a correctly initialised LOCALE for the locale ultimately chosen. This class is LOCALE_MANAGER. A LOCALE_MANAGER uses an implementation of HOST_LOCALE and a DATASOURCE_MANAGER to find out for which locales formatting information and/or translations are available and can provide the client with a list of supported locales. A locale is identified by a LOCALE_ID object; this is not only used internally but also by the client when requesting a LOCALE object.
Most major operating systems have an API that provides localisation information. Often they also allow clients to format dates, times and values directly. We decided that instead of directly using the formatting functions of the operating system, we would write our own formatters in Eiffel. Retrieving the required information, however, still has to be done in C. Depending on the operating system, this is a more-or-less simple process.
The formatting information for a given locale is stored in objects of class LOCALE_INFO. Each LOCALE_INFO is initialized on creation with (what we think are sensible) default values, as not all platforms provide all the same information.
LOCALE_MANAGER is able to retrieve the LOCALE_ID of the default locale, filled LOCALE_INFOS and a list of locales with formatting information from a HOST_LOCALE_IMP.
The deferred class HOST_LOCALE* specifies the interface for operating system-specific implementations, and the right effective class, HOST_LOCALE_IMP, is included in the system through a conditional statement in the .ecf platform. This class normally makes use of C externals to actually access the formatting information, although the .NET implementation is an exception. The main jobs that it has are: assembling a list of supported locales, creating, filling and returning a LOCALE_INFO for a given locale, and identifying the locale set as default in the operating system preferences. Currently there are HOST_LOCALE_IMP classes for .NET, Windows, and what should be more or less POSIX - only, for now, tested under linux.
The part of the library that provides translated strings is slightly more complicated. The strings come from a so-called data source (or datasource, depending on preference). The uri given to a LOCALE_MANAGER on creation is examined by a URI_PARSER, which decides what sort of data source this uri represents. It then creates an appropriate DATASOURCE_MANAGER* with this uri and returns it to the LOCALE_MANAGER.
From this point onwards the DATASOURCE_MANAGER* is responsible for the main operations involving translations: providing a list of locales and languages for which a translation is present, and providing the translation in form of a DICTIONARY* for a given locale. A DICTIONARY* is nothing more then a collection of translated strings, with functions to access them. There are several effective descendants, as the best way to store the strings may depend on several factors (singular/plural ration, data source itself, etc.). The LOCALE_MANAGER gets the relevant DICTIONARY* from the DATASOURCE_MANAGER* and passes it to the LOCALE on creation. The LOCALE can then retrieve translations.
If there are translations for both a locale itself and it's language, the DATASOURCE_MANAGER* will of course return a DICTIONARY* containing the translations specifically for that locale. If there is no translation for the locale but one exists for it's language, that will be used.
File data source
The file datasource works in the following way: The FILE_MANAGER, an effective DATASOURCE_MANAGER*, has a chain-of-responsibility built from FILE_HANDLER*s. There are two operations provided by this chain of responsibility: determining the scope of a file (i.e which language or locale it corresponds to) and providing a DICTIONARY* containing the strings from this file. By using this chain and a list of files in the current directory, the FILE_MANAGER can list available locales, list available languages and provide DICTIONARY* objects. The precise type of DICTIONARY* provided is dependant on the type of file.
Each FILE_HANDLER* works by using a FILE* object, which provides the actual parsing functionality for a given file type. Currently the only supported file type is the .mo file format, so this is the only example I may provide.
Possible expansion points
We hope our library is reasonably extensible. In particular, we foresee the following areas of expansion:
New file formats
Currently we only support the .po/.mo file format. This is because we have limited time and resources and we also feel that the .mo file format is the best current choice, as it it provides plural form handling. However, there are other file formats. Trolltech has their own format, there is a Solaris message catalog format, presumably some Windows formats, and OS X also has a native format.
In order to add support for one of these formats - or your own! - it's necessary to write both FILE* and FILE_HANDLER* implementations. Then the new effective descendant of FILE_HANDLER* must be added to the chain-of-responsibility (called chain) in the make feature of FILE_MANAGER.
New data sources
Currently we only have one implementation of DATASOURCE_MANAGER*. But maybe a file is not suited to everything. A possible data source, however far-fetched, might be a database: all strings could be fetched via queries. Or maybe all strings could be fetched via SOAP or RPC from a remote machine, to ensure up-to-date translations. More realistically one could certainly imagine a data source that checks the locally-stored translations and fetches the latest version remotely if there has been changes. The easiest way to do such things is of course to write a new effective descendant of DATASTRUCTURE*. This may or may not require a new implementation of DICTIONARY* - for a system that fetches strings on.demand rather then loading them all at initialisation, a new DICTIONARY* would certainly be advisable!
To make the library know about a new DATASOURCE_MANAGER*, URI_PARSER must be told how to recognise an uri that requires it. It is advisable to choose a nice prefix.
Currently FILE_MANAGER has the simplistic policy of only examining files in the current directory and trusting their name. It is very well possible that there is a project policy of placing each locale in it's own directory (KDE does this) or of having a translation spanned over multiple .mo files for one locale.
A good place to implement such a project-dependant policy is a descendant of FILE_MANAGER, or an entirely new data source.
New dictionaries might be required by new data sources. Or maybe the translations used by your project can be stored in a more efficient way then the general case - one could imagine a dictionary that takes advantage of singular/plural distribution, or that is keyed to the way translations are stored in a particular file format. To add a new dictionary, it's sufficient to write an implementation of DICTIONARY* and to make sure it's used.