Difference between revisions of "Internationalization/file format"
[[Category:Internationalization]]

==Summary==

A brief description of the most important file formats used for translating programs.
+ | |||
+ | ==PO Files== | ||
+ | |||
+ | ===Format of PO files=== | ||
+ | |||
+ | A PO file has an entry for each string that has to be translated. There are two kind of them, a "normal" one and one that involves plural forms. | ||
+ | |||
+ | ====Normal entry==== | ||
+ | |||
+ | Here is the general structure of a "normal" entry: | ||
+ | |||
 white-space
 # translator-comments
 #. extracted-comments
 #: references...
 #, flag...
 msgid untranslated-string
 msgstr translated-string
+ | |||
+ | Where the ''translator-comments'' are created and maintained exclusively by the translator, this comments have some white space immediately following the #. The other comments are created by the program that created the PO file. | ||
+ | ''References'' are space separated lists of locations (sourcefile:linenumber) specifying where the translation unit is found in a source file. | ||
+ | After the special comment "#," there can be some ''flags'', as ''fuzzy'' shows that the msgstr string might not be a correct translation, i.e. the translator is not sure of his work. | ||
+ | The 'untranslated-string' is the untranslated string as it appears in the original program source. The ''translated-string'' is (as the name suggests) the translated string, if there is no translation it is an empty string. | ||
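
To make the structure above concrete, here is a minimal Python sketch that parses a single normal entry into its parts. It is an illustration under simplifying assumptions, not how GNU gettext itself reads PO files: the names POEntry, parse_entry and unquote are hypothetical, only the \n, \t, \" and \\ escapes are handled, and comment kinds other than the four shown above (for example #| or #~) are ignored.

```python
from dataclasses import dataclass, field

# Only the most common C escapes are handled (assumption of this sketch).
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote(literal: str) -> str:
    """Remove the surrounding quotes of a PO string literal and unescape it."""
    literal = literal.strip()
    if not (len(literal) >= 2 and literal[0] == literal[-1] == '"'):
        raise ValueError(f"not a quoted PO string: {literal!r}")
    body, out, i = literal[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            out.append(_ESCAPES.get(body[i + 1], body[i + 1]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


@dataclass
class POEntry:  # hypothetical container for one normal entry
    translator_comments: list = field(default_factory=list)
    extracted_comments: list = field(default_factory=list)
    references: list = field(default_factory=list)
    flags: list = field(default_factory=list)
    msgid: str = ""
    msgstr: str = ""


def parse_entry(text: str) -> POEntry:
    entry = POEntry()
    current = None  # "msgid" or "msgstr": the field a continuation line extends
    for line in (raw.strip() for raw in text.splitlines()):
        if line.startswith("#."):
            entry.extracted_comments.append(line[2:].strip())
        elif line.startswith("#:"):
            entry.references.extend(line[2:].split())
        elif line.startswith("#,"):
            entry.flags.extend(f.strip() for f in line[2:].split(",") if f.strip())
        elif line == "#" or line.startswith(("# ", "#\t")):
            # Translator comment: "#" followed by white space.
            entry.translator_comments.append(line[1:].strip())
        elif line.startswith(("msgid ", "msgstr ")):
            current, _, literal = line.partition(" ")
            setattr(entry, current, unquote(literal))
        elif line.startswith('"') and current:
            # Adjacent string literals on following lines are concatenated.
            setattr(entry, current, getattr(entry, current) + unquote(line))
        # Blank lines and other comment kinds (#|, #~) are ignored here.
    return entry


sample = """
# Checked against the glossary
#. Shown in the window title
#: src/main.c:42 src/ui.c:7
#, fuzzy, c-format
msgid "Hello, %s!"
msgstr ""
"Ciao, %s!"
"""
entry = parse_entry(sample)
print(entry.references)  # ['src/main.c:42', 'src/ui.c:7']
print(entry.flags)       # ['fuzzy', 'c-format']
print(entry.msgstr)      # Ciao, %s!
```

The sample shows a msgstr split over several lines, which PO files commonly use for long strings: the empty "" literal is followed by continuation literals that are joined together.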
+ | |||
+ | ====Plural form entry==== | ||
+ | |||
+ | white-space | ||
+ | # translator-comments | ||
+ | #. automatic-comments | ||
+ | #: reference... | ||
+ | #, flag... | ||
+ | msgid untranslated-string-singular | ||
+ | msgid_plural untranslated-string-plural | ||
+ | msgstr[0] translated-string-case-0 | ||
+ | ... | ||
+ | msgstr[N] translated-string-case-n | ||
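
The Plural-Forms field contains a C expression of n whose value is the index of the msgstr[] case to use. The Python sketch below hand-translates two such expressions from the gettext manual (the two-form rule shared by English, German, Italian and others, and the three-form rule for Polish) and uses them to pick a case. The function names are hypothetical; a real gettext runtime, including Python's gettext module, evaluates the expression found in the header instead of hard-coding it.

```python
# Hand-written Python versions of two Plural-Forms expressions listed in the
# GNU gettext manual (hypothetical function names):
#   nplurals=2; plural=(n != 1);
#   nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 &&
#                       (n%100<10 || n%100>=20) ? 1 : 2);


def plural_two_forms(n: int) -> int:
    """English, German, Italian, ...: case 0 for exactly one, else case 1."""
    return 0 if n == 1 else 1


def plural_polish(n: int) -> int:
    """Polish: case 0 for one, case 1 for numbers ending in 2-4 except 12-14,
    case 2 otherwise."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 10 or n % 100 >= 20):
        return 1
    return 2


def select(msgstr_cases, plural, n):
    """Pick msgstr[plural(n)] and fill in its %d placeholder with n."""
    return msgstr_cases[plural(n)] % n


# msgid "%d file" / msgid_plural "%d files"
german = ["%d Datei", "%d Dateien"]
polish = ["%d plik", "%d pliki", "%d plików"]
for n in (1, 3, 5, 22):
    print(select(german, plural_two_forms, n), "|", select(polish, plural_polish, n))
```

This is why a Polish PO file needs three msgstr cases for the same msgid/msgid_plural pair, while a German one needs only two.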
+ | |||
+ | ===Supported character encodings=== | ||
+ | |||
+ | character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15, KOI8-R, KOI8-U, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, UTF-8. | ||
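
A program reading a PO file therefore has to look at the header entry before it can decode the rest of the file. The following is a minimal Python sketch; decode_po is a hypothetical helper, and falling back to ASCII when no charset is declared (or when the .pot placeholder "CHARSET" has not been filled in) is an assumption of the sketch. Not every encoding in the list above is necessarily available as a Python codec; for a missing one, bytes.decode raises LookupError.

```python
import re

# The charset parameter of the header entry's Content-Type field, e.g.
#   "Content-Type: text/plain; charset=ISO-8859-2\n"
_CHARSET = re.compile(rb"Content-Type:\s*text/plain;\s*charset=([A-Za-z0-9_.:-]+)")


def decode_po(data: bytes) -> str:
    """Decode raw PO bytes with the charset declared in the header entry.

    Assumption of this sketch: fall back to ASCII when no charset is declared
    or when the unfilled template value "CHARSET" (as in .pot files) is found.
    """
    match = _CHARSET.search(data)
    charset = match.group(1).decode("ascii") if match else "ASCII"
    if charset.upper() == "CHARSET":
        charset = "ASCII"
    return data.decode(charset)  # raises LookupError for unknown codecs


raw = ('msgid ""\n'
       'msgstr ""\n'
       '"Content-Type: text/plain; charset=KOI8-R\\n"\n'
       '\n'
       'msgid "Hello"\n'
       'msgstr "Привет"\n').encode("koi8-r")
print(decode_po(raw).splitlines()[-1])  # msgstr "Привет"
```

Because the header itself is plain ASCII in all of the encodings above except multi-byte-only cases, searching the raw bytes for the charset before decoding works in practice.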
+ | |||
+ | ===Po Editors=== | ||
+ | |||
+ | * poEdit | ||
+ | * KBabel | ||
+ | * Gtranslator | ||
+ | * LocFactoryEditor (XLIFF and PO editor for Mac OSX) | ||
+ | |||
+ | ===Positive aspects=== | ||
+ | |||
+ | * Powerful plural handling | ||
+ | * Format created for translation purpose | ||
+ | * Easy for humans to read | ||
+ | * Used by gettext, kbabel, rosetta and many other programs | ||
+ | * Support and elaboration tools for almost all plattforms | ||
+ | |||
+ | ===Negative aspects=== | ||
+ | |||
+ | * The PO file format does not provide a way of identifying the source and target language within a file. By GNU standards, GNU software is written in American English (en-US), and this is reflected in Gettext by only having support for Germanic plural forms in the source language. It is therefore recommended to set the source-language attribute to en-US by default. | ||
+ | |||
+ | ==ts Files== | ||
+ | |||
+ | ===Format of ts files=== | ||
+ | |||
+ | The .ts file format is used Trolltech for the QT applications. They are XML conforming files. Here an example of a .ts file, generated by lupdate (a tool made by Trolltech that extracts translatable text from the C++ source code of the Qt application, see [[Internationalization/tool evaluation|here]] for further information): | ||
+ | |||
+ | <!DOCTYPE TS><TS> | ||
+ | <context> | ||
+ | <name>MyExample</name> | ||
+ | <message> | ||
+ | <source>i18n=Internationalization</source> | ||
+ | <translation type="unfinished"></translation> | ||
+ | </message> | ||
+ | </context> | ||
+ | </TS> | ||
+ | |||
+ | And after the translation (for example with Qt Linguist) it would look like this: | ||
+ | |||
+ | <!DOCTYPE TS><TS> | ||
+ | <context> | ||
+ | <name>MyExample</name> | ||
+ | <message> | ||
+ | <source>i18n=Internationalization</source> | ||
+ | <translation>i20e=Internazionalizzazione</translation> | ||
+ | </message> | ||
+ | </context> | ||
+ | </TS> | ||
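
Because TS files are plain XML, they can be inspected with any XML library. The following Python sketch uses the standard xml.etree.ElementTree module to list the messages that still need translation, using the first example above as input. unfinished_messages is a hypothetical name, and treating a missing translation element or type="unfinished" as "not yet translated" is an assumption based on these two examples.

```python
import xml.etree.ElementTree as ET


def unfinished_messages(ts_xml: str):
    """Return (context name, source text) pairs that still need translating.

    Assumption: a message counts as untranslated when it has no <translation>
    element or its <translation> carries type="unfinished".
    """
    root = ET.fromstring(ts_xml)
    pending = []
    for context in root.iter("context"):
        name = context.findtext("name", default="")
        for message in context.iter("message"):
            translation = message.find("translation")
            if translation is None or translation.get("type") == "unfinished":
                pending.append((name, message.findtext("source", default="")))
    return pending


before = """<!DOCTYPE TS><TS>
<context>
    <name>MyExample</name>
    <message>
        <source>i18n=Internationalization</source>
        <translation type="unfinished"></translation>
    </message>
</context>
</TS>"""
print(unfinished_messages(before))  # [('MyExample', 'i18n=Internationalization')]
```

Run on the translated version of the file, the same function returns an empty list.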
+ | |||
+ | The .ts file is than converted to the .qm file format, a compact binary format that provides extremely fast lookups for translations, with a tool named lrelease. | ||
+ | |||
+ | The creation of .qm files can also be done with the GNU gettext tools: with "xgettext --qt" as string extractor for producing the .pot file. And then convert the translated file (.po) with the "msgfmt --qt" command for creating the .qm files. | ||
+ | |||
+ | === ts Editors === | ||
+ | |||
+ | *QT Linguistic | ||
+ | |||
+ | ===Positive aspects=== | ||
+ | |||
+ | * "full" support for unicode character encodings | ||
+ | * In trolltech's opinion it's a human readable text | ||
+ | |||
+ | ===Negative aspects=== | ||
+ | |||
+ | * QT's translation framework does not support plurals | ||
+ | * Qt message catalog format supports Unicode only in the translated strings, not in the untranslated strings | ||
+ | |||
+ | == xliff Files == | ||
+ | |||
+ | ===Format of XLIFF files=== | ||
+ | |||
+ | XLIFF is the XML Localization Interchange File Format. It is intended to give any software provider a single interchange file format that can be understood by any localization provider. | ||
+ | |||
+ | You can find a XLIFF Tree Structure [http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#AppTree here] | ||
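
As a rough illustration of that tree (xliff, file, body, trans-unit, source, target), here is a small XLIFF document and a Python sketch that reads it with xml.etree.ElementTree. It assumes XLIFF version 1.2 and its namespace urn:oasis:names:tc:xliff:document:1.2; the file name hello.c, the unit ids and the read_xliff function are hypothetical. Unlike a PO file, each file element states its source-language and target-language.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 (assumption: the 1.2 version of the standard).
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"
NS = {"x": XLIFF_NS}

SAMPLE = f"""<xliff version="1.2" xmlns="{XLIFF_NS}">
  <file original="hello.c" source-language="en-US" target-language="it"
        datatype="plaintext">
    <body>
      <trans-unit id="greeting">
        <source>Hello, world!</source>
        <target>Ciao, mondo!</target>
      </trans-unit>
      <trans-unit id="bye">
        <source>Goodbye</source>
      </trans-unit>
    </body>
  </file>
</xliff>"""


def read_xliff(xliff_xml: str):
    """Return (source-language, target-language, units) for each <file>,
    where units maps every trans-unit id to (source, target or None)."""
    root = ET.fromstring(xliff_xml)
    result = []
    for file_el in root.findall("x:file", NS):
        units = {}
        for unit in file_el.iterfind(".//x:trans-unit", NS):
            units[unit.get("id")] = (
                unit.findtext("x:source", default="", namespaces=NS),
                unit.findtext("x:target", default=None, namespaces=NS),
            )
        result.append((file_el.get("source-language"),
                       file_el.get("target-language"), units))
    return result


for src_lang, tgt_lang, units in read_xliff(SAMPLE):
    print(src_lang, "->", tgt_lang, units)
```

A trans-unit without a target element, like "bye" here, is one that has not been translated yet.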
+ | |||
+ | === ts Editors === | ||
+ | |||
+ | *[http://www.heartsome.net/EN/xlfedit.html heartsome] | ||
+ | *[https://open-language-tools.dev.java.net/editor/about-xliff-editor.html XLIFF Translation Editor] | ||
+ | |||
+ | ===Positive aspects=== | ||
+ | |||
+ | * OASIS standard | ||
+ | |||
+ | ===Negative aspects=== | ||
+ | |||
+ | * complicated plural form handling | ||
+ | |||
+ | ==References== | ||
+ | |||
+ | * [http://www.gnu.org/software/gettext/manual/html_mono/gettext.html Gettext manual (for PO files)] | ||
+ | * [http://librarian.launchpad.net/2395419/it.po Example of a PO file (from rosetta)] | ||
+ | * [http://news.com.com/2100-1013_3-5146581.html Microsoft and XML] | ||
+ | * [http://en.wikipedia.org/wiki/XML#Quick_syntax_tour XML syntax description on Wikipedia] | ||
+ | * [http://doc.trolltech.com/4.1/i18n.html Trolltech i18n] | ||
+ | * [http://translate.sourceforge.net/wiki/ open source i18n and l10n project] |
Latest revision as of 23:15, 3 September 2006