EiffelStudio Internationalization

Revision as of 06:45, 16 July 2007 by Ted

Overview

Now that i18n has been mostly implemented in Eiffel, Eiffel Studio is entering a new era of internationalization. The goal of i18n integration is to provide multiple-language support in Eiffel Studio and to let users easily switch the interface language at runtime.

Steps to integrate i18n

Non-editor part

The first step concentrates on the interface of Eiffel Studio: all buttons, labels, tool tips and grids that are used directly by the Eiffel Studio project.

  1. Collect all static interface strings in the system, including some context-dependent strings.
    1. This is not strictly necessary, but it gives us better management and code quality: only INTERFACE_NAMES knows about i18n.
    2. Change all types in INTERFACE_NAMES, EB_METRIC_NAMES, CONF_INTERFACE_NAMES and WARNING_MESSAGES to STRING_GENERAL. Callers should be adapted correspondingly. For some strings, two versions may be needed: one for internal use and one for the interface, especially for strings saved as preferences and string constants used in configuration XML files.
    3. Rewrite the bodies of those string features using the i18n translation routines; STRING_32 instances are actually produced.
    4. Modify places that use EV_CONSTANTS, and make new classes if needed; e.g. EV_CONFIRMATION_DIALOG and EV_WARNING_DIALOG are not usable.
    5. Write scripts to extract strings from default.xml. "Directory", names and descriptions of preferences also need to be translated.
    6. Write scripts to extract the names of wizards from .dsc files, and translate interface names in code in the wizard projects.
  2. Build language menus to switch language. (The language choice has been implemented as a preference, and it was decided not to use the observer pattern.)
    1. Make interface classes locale observers so that all tools know when interface names should be reread.
  3. Solve the problems in vision2.
    1. In Chinese, menu characters are conventionally parenthesized and underscored after the menu text. This can be done by the translator.
    2. Handle "&" as both char and wchar for menu items.
    3. Fix "tab" issue for menu items.
  4. Add string encoding conversion support to the i18n library, so that a correctly translated stream can be directed to the console. This would also be useful for the editor part that follows.
  5. Integrate the i18n .po generation tool. This has been done in the i18n branch. (The .po generation tool has been made a stand-alone tool.)
  6. Use the .po generation tool to generate the estudio.pot file. The tool generally extracts strings that are passed as arguments to `translated' and `translated_plural' and produces a .pot file.
  7. Strings used in EiffelStudio do not come only from the source code, so write scripts to extract the others and merge them into estudio.pot. Strings should be extracted from: default.xml → descriptions of preferences
  8. Duplicate the estudio.pot file into .po files named after locale ids. Each .po file represents a locale. The i18n library reads .mo files whose names are the correct locale ids. Although .po files do not have to be named after locale ids, .mo files are produced one-to-one from .po files, so using locale ids as .po file names is reasonable.
  9. Translators open the .po files in a .po editor and translate the interface names into the various languages.
  10. Generate .mo files using the .mo generation script.
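The extraction described in step 6 can be illustrated with a small sketch. This is not the actual po generation tool: the regular expressions are simplified assumptions that only handle plain double-quoted strings passed directly to `translated' and `translated_plural', and the function name extract_entries is hypothetical.

```python
import re

# Simplified, hypothetical patterns: plain double-quoted manifest strings only,
# without Eiffel %-escapes inside the string.
_SINGULAR = re.compile(r'\btranslated\s*\(\s*"([^"]*)"\s*\)')
_PLURAL = re.compile(r'\btranslated_plural\s*\(\s*"([^"]*)"\s*,\s*"([^"]*)"\s*,')


def extract_entries(source):
    """Return untranslated .pot entries for the strings found in source."""
    entries = []
    for match in _SINGULAR.finditer(source):
        # A template entry has the source string and a blank target.
        entries.append(f'msgid "{match.group(1)}"\nmsgstr ""\n')
    for match in _PLURAL.finditer(source):
        singular, plural = match.groups()
        entries.append(
            f'msgid "{singular}"\nmsgid_plural "{plural}"\n'
            'msgstr[0] ""\nmsgstr[1] ""\n'
        )
    return entries
```

Each returned entry has a source string and an empty target, which is the shape a .pot template entry takes before translators fill it in.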

Editor part

Earlier glance

  1. This step might be more complicated and will be done after the first step, probably after the 6.0 release. It concentrates on extending the editor library to accept wide characters. Internationalization of any output directed to the editor is done in this step. Many existing tools might be affected: the search tool, formatting tools, etc.
  2. Encoding conversion facility is needed.
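As a rough illustration of what an encoding conversion facility does, the sketch below converts between encodings by going through Unicode. The function names are hypothetical and this is Python rather than Eiffel; it only shows the principle, not the planned library interface.

```python
def convert_encoding(data, source, target):
    """Re-encode bytes from the source encoding to the target encoding."""
    text = data.decode(source)   # bytes in the source encoding -> Unicode
    return text.encode(target)   # Unicode -> bytes in the target encoding


def to_console_bytes(text, console_encoding):
    """Encode text for a console, replacing characters it cannot show."""
    return text.encode(console_encoding, errors="replace")
```

For example, convert_encoding(data, "gbk", "utf-8") turns GBK-encoded bytes into UTF-8, and to_console_bytes degrades gracefully when the console encoding lacks a character.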

More detailed plan

  • Editor library
    • Use UTF-16 internally in the editor
    • Unicode-aware Eiffel scanner
    • Unicode text rendering
    • Ability to read all kinds of text formats. This implies that the editor can convert all kinds of formats to and from Unicode. An `encoding' attribute is needed. This also affects how strings from the IM are handled.
    • Enabling IM of drawing area on GTK.
  • i18n library
    • Add ability to recognize context. `translation_with_context (original: STRING_GENERAL): STRING_32' and `plural_translation_with_context (original_singular, original_plural: STRING_GENERAL; plural_number: INTEGER): STRING_32' would be added.
    • Modify po generation tool to handle new interfaces.
  • GUI
    • A way to let the user decide which encoding the editor should use to render or save.
    • Search facility
    • Text Formatting
    • Code completion
    • Pick-and-droppable grids
  • Compiler (Possible step, may lead to change of ECMA)
    • Choose an encoding that the parser works on. (Local encoding or UTF-8 probably)
    • Hence the compiler has to be able to convert all kinds of texts to the one it accepts.
    • Names heaps, queries and outputs need to be changed.
    • This may affect a lot of things, configuration, metrics etc.
  • Documentation generation
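Context recognition in the i18n library could work along the following lines. This sketch assumes the GNU gettext convention, where a .mo catalog stores a contextual key as the context and the original string joined by an EOT character (\x04). The Python signature below, with an explicit context argument and a dictionary catalog, is an assumption for illustration and differs from the Eiffel signatures quoted above.

```python
# Separator used by GNU gettext between msgctxt and msgid in .mo files.
CONTEXT_SEPARATOR = "\x04"


def context_key(context, original):
    """Build the catalog key for a string in a given context."""
    return context + CONTEXT_SEPARATOR + original


def translation_with_context(catalog, context, original):
    """Look up original in context; fall back to the original string."""
    return catalog.get(context_key(context, original), original)
```

With this scheme the same source string, for instance "Open" as a menu command and "Open" as a state, can have two different translations.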

Text Flow and Encoding (Design)

[Figure: Eiffel Studio Text Flow and Encoding Design.PNG]

File structure

Repository

All files are stored in %EIFFEL_SRC%\Delivery\studio\lang

%EIFFEL_SRC%\Delivery\studio\lang\script

Place where scripts for generating .mo files are put. The scripts are invoked when building a delivery.

%EIFFEL_SRC%\Delivery\studio\lang\mo_files

Place to put .mo files. Those files are actually used at runtime.
Only .mo files need to be included in a delivery.

%EIFFEL_SRC%\Delivery\studio\lang\po_files

Place to put .pot file and .po files. 

Delivery

 Windows:
 %ISE_EIFFEL%\studio\lang\mo_files\*.mo
 Unix:
 /usr/share/locale/(product_version_name)/*.mo
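Because .mo files are named after locale ids, the languages shipped with a delivery can be listed by scanning the .mo directory. The following is a minimal sketch under that assumption; available_locales is a hypothetical helper, not part of the i18n library.

```python
from pathlib import Path


def available_locales(mo_directory):
    """Return the sorted locale ids for which a .mo file exists."""
    directory = Path(mo_directory)
    if not directory.is_dir():
        return []
    # "zh_CN.mo" -> "zh_CN"
    return sorted(path.stem for path in directory.glob("*.mo"))
```

The directory passed in would be the platform-specific .mo location listed above.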

Maintenance

General

  • The .pot file is the PO template file generated by the .po generation tool. It is simply an untranslated file with only source entries and blank target entries.
  • .po files are the files translators actually work on. Whenever translators get a new version of the .pot file, they should update the .po file they are working on from it. The update is normally done with third-party tools. Tools like poEdit list new strings and obsolete strings, and in the full list poEdit marks new strings and fuzzy strings in different colors.
  • Fuzzy marks are applied when updating: msgmerge from Gettext marks slightly changed strings as fuzzy. Once the fuzzy strings have been checked, translators should remove the fuzzy marks.
  • Obsolete strings are commented out at the end of .po files when merging. Those comments can be removed at any time if we wish.
  • When the translation or modification is done, translators only need to commit the .po file(s) they are working on.
  • Whenever it is decided to add a new language, a new .po file can be added directly in %EIFFEL_SRC%\Delivery\studio\lang\po_files. Eiffel Studio should be able to detect at runtime which languages are available.
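To see how much work an updated .po file still needs, one can count fuzzy and obsolete entries. The sketch below relies on the standard PO conventions: a fuzzy entry carries a "#, fuzzy" flag comment, and obsolete entries are commented out with a "#~" prefix. The helper name po_status is hypothetical.

```python
def po_status(po_text):
    """Count fuzzy flags and obsolete (commented-out) entries in PO text."""
    fuzzy = 0
    obsolete = 0
    for line in po_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#,"):
            # Flag comments look like "#, fuzzy" or "#, fuzzy, c-format".
            flags = [flag.strip() for flag in stripped[2:].split(",")]
            if "fuzzy" in flags:
                fuzzy += 1
        elif stripped.startswith("#~ msgid "):
            # Each obsolete entry has exactly one "#~ msgid" line.
            obsolete += 1
    return {"fuzzy": fuzzy, "obsolete": obsolete}
```

A translator could run this before committing to confirm that no fuzzy marks remain.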

Life Cycle

  1. Changes take place in code. (Developer)
  2. Synchronize local code from the repository. (Maintainer)
  3. Run po_generation_tool with appropriate arguments. (Maintainer)
  4. Run build_misc_entries.bat. (Maintainer)
  5. Commit estudio.pot from last step. (Maintainer)
  6. Update .po files from estudio.pot. (Translator)
  7. Translate and commit various .po files. (Translator)
  8. Build and commit .mo files. (Release maker in principle. Maintainer or Translator is also OK.)
  9. .mo files go with releases. (Release maker)

Translator Guide

  • Update $EIFFEL_SRC/Delivery.
  • Download a po editor: poEdit for Windows, and KBabel or gtranslator for KDE and Gnome.
  • In $EIFFEL_SRC/Delivery/studio/lang/po_files, find the .po file(s) you should work on. Take zh_CN.po as an example: open zh_CN.po in the po editor. The po editor should have a command to update from a pot file; use it to update from $EIFFEL_SRC/Delivery/studio/lang/po_files/estudio.pot. If anything has changed in estudio.pot, the po editor should report it. The translator then fills in empty entries, resolves FUZZY strings or modifies translated entries.
  • When the translation is done, just commit the modified po files.
  • Make sure that po files are saved in UTF-8 encoding.
  • Never add or remove terms in .po files by hand, because other translators would not see the changes unless the estudio.pot file were updated as well.
  • A translator should not modify the estudio.pot file.
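The UTF-8 requirement can be checked before committing. The sketch below is one possible check, with hypothetical helper names: it verifies that the bytes decode as UTF-8 and reads the charset declared in the PO header's Content-Type line.

```python
import re


def is_valid_utf8(data):
    """Return True if the bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def declared_charset(po_text):
    """Return the charset named in the PO header, or None if absent."""
    match = re.search(r"charset=([A-Za-z0-9_-]+)", po_text)
    return match.group(1) if match else None
```

A file passes when is_valid_utf8 is true for its raw bytes and declared_charset returns "UTF-8" (compared case-insensitively).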

Developer Guide

  • The major thing a developer should take care of is code quality. All names that need to be translated should in principle be put in framework/interface_names. Whenever a sentence is needed in the interface, leave it as a whole sentence to be translated. Be careful about SEPARATING a sentence into terms or phrases, because the way those terms are sequenced into a sentence again varies between languages. Plural forms should be used whenever needed. There are many examples in INTERFACE_NAMES.
  • If a developer wants changes to take effect immediately, see what a maintainer and a translator should do.
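The advice about whole sentences can be shown with a short sketch. It assumes numbered "$1", "$2" placeholders and a dictionary standing in for the catalog; both are illustrative choices, and the helper names are hypothetical rather than taken from INTERFACE_NAMES.

```python
import re


def translated(catalog, original):
    """Look up a whole sentence; fall back to the original."""
    return catalog.get(original, original)


def formatted(template, *arguments):
    """Replace $1, $2, ... with the arguments in a single pass."""
    def substitute(match):
        index = int(match.group(1)) - 1
        if 0 <= index < len(arguments):
            return str(arguments[index])
        return match.group(0)  # leave unknown placeholders untouched
    return re.sub(r"\$(\d+)", substitute, template)


# The translator is free to reorder the placeholders.
ZH = {"$2 errors were found in class $1.": "在类 $1 中发现了 $2 个错误。"}
```

Because the sentence is translated as a unit, the Chinese entry can place the class name before the count, which would be impossible if the code concatenated separately translated fragments.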

Maintainer Guide

  • Build the po generation tool, which is located in the internal svn repository at $EIFFEL_SRC/tools/po_generation_tool.
  • Make sure gettext and perl are installed. On Windows, cygwin contains perl and gettext modules.
  • When there are new or modified strings in the code that need to be translated, $EIFFEL_SRC/Delivery/studio/lang/po_files/estudio.pot should be regenerated.
 To regenerate estudio.pot, one should do:
 po_generation_tool -D %EIFFEL_SRC%/Eiffel %EIFFEL_SRC%/framework %EIFFEL_SRC%/help/wizards %EIFFEL_SRC%/library/wizard -o %EIFFEL_SRC%/Delivery/studio/lang/po_files/estudio.pot
 On Windows:
 %EIFFEL_SRC%/Delivery/studio/lang/script/build_misc_entries.bat
 On Unix:
 perl $EIFFEL_SRC/Delivery/studio/lang/script/misc_po_extraction.pl
 msguniq -s -o $EIFFEL_SRC/Delivery/studio/lang/po_files/estudio.pot $EIFFEL_SRC/Delivery/studio/lang/po_files/estudio.pot
  • Make sure the committed estudio.pot file is generated from repository code. Do not commit an estudio.pot that was generated from local code, because other maintainers might overwrite changes of yours that are not in the repository. Modifying the estudio.pot file by hand is not recommended.
  • Carefully check that the estudio.pot file is correct before committing, because translators who work from a faulty estudio.pot would probably lose their efforts when the correction is made.
  • Commit the estudio.pot file so that translators can update.
  • To add support for a new language, simply copy estudio.pot to LOCALE_ID.po, where LOCALE_ID should be:
 Case 1: LL_RR
 Case 2: LL_SS_RR
 Case 3: LL_RR
 Case 4: LL_RR.Enc
 Case 5: LL_RR@SS  [sometimes the SS is simply variant information]
 LL is a two-letter language identifier from ISO 639-1 or, if there is none, a three-letter
 identifier from ISO 639-2/T
 RR is a two-letter country code from ISO 3166-1, except when it is not (en-029 ('English (Caribbean)') under Windows)
 SS under Windows is mostly either 'Latn' or 'Cyrl'. @SS on Linux is sometimes useful and sometimes meaningless
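The LOCALE_ID forms above can be recognized with a single pattern. This is a sketch under assumptions: the script part is taken to be four letters (as in 'Latn' and 'Cyrl'), the region is two uppercase letters or three digits (as in 029), and the function name parse_locale_id is hypothetical.

```python
import re

_LOCALE_ID = re.compile(
    r"(?P<language>[a-z]{2,3})"          # LL: ISO 639-1 or 639-2/T
    r"(?:_(?P<script>[A-Za-z]{4}))?"     # SS: e.g. Latn, Cyrl
    r"_(?P<region>[A-Z]{2}|\d{3})"       # RR: ISO 3166-1, or e.g. 029
    r"(?:\.(?P<encoding>[A-Za-z0-9-]+))?"  # .Enc
    r"(?:@(?P<variant>[A-Za-z0-9]+))?"   # @SS
)


def parse_locale_id(locale_id):
    """Split a LOCALE_ID into its parts, or return None if it is malformed."""
    match = _LOCALE_ID.fullmatch(locale_id)
    return match.groupdict() if match else None
```

For zh_CN this yields language "zh" and region "CN" with the other parts unset, so a maintainer could validate a new .po file name before committing it.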

Locale Id for reference

       Afrikaans (South Africa)         af-ZA         
       Amharic (Ethiopia)         am-ET         
       Arabic (U.A.E.)         ar-AE         
       Arabic (Bahrain)         ar-BH         
       Arabic (Algeria)         ar-DZ         
       Arabic (Egypt)         ar-EG         
       Arabic (Iraq)         ar-IQ         
       Arabic (Jordan)         ar-JO         
       Arabic (Kuwait)         ar-KW         
       Arabic (Lebanon)         ar-LB         
       Arabic (Libya)         ar-LY         
       Arabic (Morocco)         ar-MA         
       Arabic (Oman)         ar-OM         
       Arabic (Qatar)         ar-QA         
       Arabic (Saudi Arabia)         ar-SA         
       Arabic (Syria)         ar-SY         
       Arabic (Tunisia)         ar-TN         
       Arabic (Yemen)         ar-YE         
       Mapudungun (Chile)         arn-CL         
       Assamese (India)         as-IN         
       Azeri (Azerbaijan, Cyrillic)         az-Cyrl-AZ         
       Azeri (Azerbaijan, Latin)         az-Latn-AZ         
       Bashkir (Russia)         ba-RU         
       Belarusian (Belarus)         be-BY         
       Bulgarian (Bulgaria)         bg-BG         
       Bengali (India)         bn-IN         
       Tibetan (Bhutan)         bo-BT         
       Tibetan (PRC)         bo-CN         
       Breton (France)         br-FR         
       Bosnian (Bosnia and Herzegovina, Cyrillic)         bs-Cyrl-BA         
       Bosnian (Bosnia and Herzegovina, Latin)         bs-Latn-BA         
       Catalan (Catalan)         ca-ES         
       Corsican (France)         co-FR          -- Note: Corsican is in the MSDN table but has no LCID; maybe it will get one in future releases (Corsican nationalists might threaten to blow up Microsoft HQ)
       Czech (Czech Republic)         cs-CZ         
       Welsh (United Kingdom)         cy-GB         
       Danish (Denmark)         da-DK         
       German (Austria)         de-AT         
       German (Switzerland)         de-CH         
       German (Germany)         de-DE         
       German (Liechtenstein)         de-LI         
       German (Luxembourg)         de-LU         
       Lower Sorbian (Germany)         dsb-DE         
       Divehi (Maldives)         dv-MV         
       Greek (Greece)         el-GR         
       English (Caribbean)         en-029         
       English (Australia)         en-AU         
       English (Belize)         en-BZ         
       English (Canada)         en-CA         
       English (United Kingdom)         en-GB         
       English (Ireland)         en-IE         
       English (India)         en-IN         
       English (Jamaica)         en-JM         
       English (Malaysia)         en-MY         
       English (New Zealand)         en-NZ         
       English (Philippines)         en-PH         
       English (Singapore)         en-SG         
       English (Trinidad and Tobago)         en-TT         
       English (United States)         en-US         
       English (South Africa)         en-ZA         
       English (Zimbabwe)         en-ZW         
       Spanish (Argentina)         es-AR         
       Spanish (Bolivia)         es-BO         
       Spanish (Chile)         es-CL         
       Spanish (Colombia)         es-CO         
       Spanish (Costa Rica)         es-CR         
       Spanish (Dominican Republic)         es-DO         
       Spanish (Ecuador)         es-EC         
       Spanish (Spain)         es-ES         
       Spanish (Guatemala)         es-GT         
       Spanish (Honduras)         es-HN         
       Spanish (Mexico)         es-MX         
       Spanish (Nicaragua)         es-NI         
       Spanish (Panama)         es-PA         
       Spanish (Peru)         es-PE         
       Spanish (Puerto Rico)         es-PR         
       Spanish (Paraguay)         es-PY         
       Spanish (El Salvador)         es-SV         
       Spanish (United States)         es-US         
       Spanish (Uruguay)         es-UY         
       Spanish (Venezuela)         es-VE         
       Estonian (Estonia)         et-EE         
       Basque (Basque)         eu-ES         
       Persian (Iran)         fa-IR         
       Finnish (Finland)         fi-FI         
       Filipino (Philippines)         fil-PH         
       Faroese (Faroe Islands)         fo-FO         
       French (Belgium)         fr-BE         
       French (Canada)         fr-CA         
       French (Switzerland)         fr-CH         
       French (France)         fr-FR         
       French (Luxembourg)         fr-LU         
       French (Monaco)         fr-MC         
       Frisian (Netherlands)         fy-NL         
       Irish (Ireland)         ga-IE         
       Dari (Afghanistan)         gbz-AF         
       Galician (Spain)         gl-ES         
       Alsatian (France)         gsw-FR         
       Gujarati (India)         gu-IN         
       Hausa (Nigeria, Latin)         ha-Latn-NG         
       Hebrew (Israel)         he-IL         
       Hindi (India)         hi-IN         
       Croatian (Bosnia and Herzegovina, Latin)         hr-BA         
       Croatian (Croatia)         hr-HR         
       Hungarian (Hungary)         hu-HU         
       Armenian (Armenia)         hy-AM         
       Indonesian (Indonesia)         id-ID         
       Igbo (Nigeria)         ig-NG         
       Yi (PRC)         ii-CN         
       Icelandic (Iceland)         is-IS         
       Italian (Switzerland)         it-CH         
       Italian (Italy)         it-IT         
       Inuktitut (Canada, Syllabics)         iu-Cans-CA         
       Inuktitut (Canada, Latin)         iu-Latn-CA         
       Japanese (Japan)         ja-JP         
       Georgian (Georgia)         ka-GE         
       Khmer (Cambodia)         kh-KH         
       Kazakh (Kazakhstan)         kk-KZ         
       Greenlandic (Greenland)         kl-GL         
       Kannada (India)         kn-IN         
       Korean (Korea)         ko-KR         
       Konkani (India)         kok-IN         
       Kyrgyz (Kyrgyzstan)         ky-KG         
       Luxembourgish (Luxembourg)         lb-LU         
       Lao (Lao PDR)         lo-LA         
       Lithuanian (Lithuania)         lt-LT         
       Latvian (Latvia)         lv-LV         
       Maori (New Zealand)         mi-NZ         
       Macedonian (Macedonia, FYROM)         mk-MK         
       Malayalam (India)         ml-IN         
       Mongolian (Mongolia)         mn-Cyrl-MN         
       Mongolian (PRC)         mn-Mong-CN         
       Mohawk (Canada)         moh-CA         
       Marathi (India)         mr-IN         
       Malay (Brunei Darussalam)         ms-BN         
       Malay (Malaysia)         ms-MY         
       Maltese (Malta)         mt-MT         
       Norwegian (Bokmål, Norway)         nb-NO         
       Nepali (India)         ne-IN          -- Note: also missing
       Nepali (Nepal)         ne-NP         
       Dutch (Belgium)         nl-BE         
       Dutch (Netherlands)         nl-NL         
       Norwegian (Nynorsk, Norway)         nn-NO         
       Sesotho sa Leboa/Northern Sotho (South Africa)         ns-ZA         
       Occitan (France)         oc-FR         
       Oriya (India)         or-IN         
       Punjabi (India)         pa-IN         
       Polish (Poland)         pl-PL         
       Pashto (Afghanistan)         ps-AF         
       Portuguese (Brazil)         pt-BR         
       Portuguese (Portugal)         pt-PT         
       K'iche (Guatemala)         qut-GT         
       Quechua (Bolivia)         quz-BO         
       Quechua (Ecuador)         quz-EC         
       Quechua (Peru)         quz-PE         
       Romansh (Switzerland)         rm-CH         
       Romanian (Romania)         ro-RO         
       Russian (Russia)         ru-RU         
       Kinyarwanda (Rwanda)         rw-RW         
       Sanskrit (India)         sa-IN         
       Yakut (Russia)         sah-RU         
       Sami (Northern, Finland)         se-FI         
       Sami (Northern, Norway)         se-NO         
       Sami (Northern, Sweden)         se-SE         
       Sinhala (Sri Lanka)         si-LK         
       Slovak (Slovakia)         sk-SK         
       Slovenian (Slovenia)         sl-SI         
       Sami (Southern, Norway)         sma-NO         
       Sami (Southern, Sweden)         sma-SE         
       Sami (Lule, Norway)         smj-NO         
       Sami (Lule, Sweden)         smj-SE         
       Sami (Inari, Finland)         smn-FI         
       Sami (Skolt, Finland)         sms-FI         
       Albanian (Albania)         sq-AL         
       Serbian (Bosnia and Herzegovina, Cyrillic)         sr-Cyrl-BA         
       Serbian (Serbia and Montenegro, Cyrillic)         sr-Cyrl-CS         
       Serbian (Bosnia and Herzegovina, Latin)         sr-Latn-BA         
       Serbian (Serbia and Montenegro, Latin)         sr-Latn-CS         
       Swedish (Finland)         sv-FI         
       Swedish (Sweden)         sv-SE         
       Swahili (Kenya)         sw-KE         
       Syriac (Syria)         syr-SY         
       Tamil (India)         ta-IN         
       Telugu (India)         te-IN         
       Tajik (Tajikistan)         tg-Cyrl-TJ         
       Thai (Thailand)         th-TH         
       Turkmen (Turkmenistan)         tk-TM         
       Tamazight (Algeria, Latin)         tmz-Latn-DZ         
       Setswana/Tswana (South Africa)         tn-ZA         
       Urdu (India)         tr-IN         
       Turkish (Turkey)         tr-TR         
       Tatar (Russia)         tt-RU         
       Uighur (PRC)         ug-CN         
       Ukrainian (Ukraine)         uk-UA         
       Urdu (Pakistan)         ur-PK         
       Uzbek (Uzbekistan, Cyrillic)         uz-Cyrl-UZ         
       Uzbek (Uzbekistan, Latin)         uz-Latn-UZ         
       Vietnamese (Vietnam)         vi-VN         
       Upper Sorbian (Germany)         wen-DE         
       Wolof (Senegal)         wo-SN         
       Xhosa/isiXhosa (South Africa)         xh-ZA         
       Yoruba (Nigeria)         yo-NG         
       Chinese (PRC)         zh-CN         
       Chinese (Hong Kong SAR, PRC)         zh-HK         
       Chinese (Macao SAR)         zh-MO         
       Chinese (Singapore)         zh-SG         
       Chinese (Taiwan)         zh-TW         
       Zulu/isiZulu (South Africa)         zu-ZA