Difference between revisions of "Internationalization/file format"

Latest revision as of 00:15, 4 September 2006

Summary

A brief description of the most important file formats used for translating programs.

PO Files

Format of PO files

A PO file has one entry for each string that has to be translated. There are two kinds of entry: a "normal" one and one that involves plural forms.

Normal entry

Here is the general structure of a "normal" entry:

white-space
#  translator-comments
#. extracted-comments
#: references...
#, flag...
msgid untranslated-string
msgstr translated-string

The translator-comments are created and maintained exclusively by the translator; these comments have some white space immediately following the #. The other comments are generated by the program that created the PO file. References are space-separated lists of locations (sourcefile:linenumber) specifying where the translation unit is found in the source files. The special comment "#," is followed by flags; for example, fuzzy indicates that the msgstr string might not be a correct translation, e.g. because the translator is not sure of the work. The untranslated-string is the string as it appears in the original program source. The translated-string is, as the name suggests, the translation; if there is no translation yet, it is an empty string.
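
To make the entry structure concrete, here is a minimal Python sketch that splits one "normal" entry into its parts. It is not the gettext implementation: the names parse_po_entry, unquote and SAMPLE are hypothetical, the sample entry is invented for illustration, and the sketch deliberately ignores plural entries and other comment types (it would file them under translator comments).

```python
import re

# Escape sequences handled inside quoted PO strings (a simplified subset).
_ESCAPES = {"n": "\n", "t": "\t", '"': '"', "\\": "\\"}


def unquote(token):
    """Strip the surrounding double quotes and resolve simple escapes."""
    body = token.strip()[1:-1]
    return re.sub(r"\\(.)", lambda m: _ESCAPES.get(m.group(1), m.group(1)), body)


def parse_po_entry(text):
    """Parse one "normal" PO entry into a dict (simplified sketch)."""
    entry = {
        "translator_comments": [],
        "extracted_comments": [],
        "references": [],
        "flags": [],
        "msgid": "",
        "msgstr": "",
    }
    current = None  # key that continuation lines ("...") are appended to
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith("#."):
            entry["extracted_comments"].append(line[2:].strip())
        elif line.startswith("#:"):
            # References are separated by white space.
            entry["references"].extend(line[2:].split())
        elif line.startswith("#,"):
            # Flags are separated by commas.
            entry["flags"].extend(f.strip() for f in line[2:].split(",") if f.strip())
        elif line.startswith("#"):
            entry["translator_comments"].append(line[1:].strip())
        elif line.startswith("msgid "):
            current = "msgid"
            entry[current] += unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            current = "msgstr"
            entry[current] += unquote(line[len("msgstr "):])
        elif line.startswith('"') and current:
            # A quoted line on its own continues the previous string.
            entry[current] += unquote(line)
    return entry


# Hypothetical sample entry, invented for this illustration.
SAMPLE = '''
#  Checked by a native speaker
#. Shown in the title bar
#: src/main.c:42 src/dialog.c:7
#, fuzzy, c-format
msgid "Hello, %s!\\n"
msgstr ""
"Ciao, %s!\\n"
'''

entry = parse_po_entry(SAMPLE)
print(entry["references"], entry["flags"])
```

A real application would normally rely on an existing PO library rather than a hand-written parser; the sketch only shows how the comment prefixes map onto the fields described above.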

Plural form entry

white-space
#  translator-comments
#. extracted-comments
#: references...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
...
msgstr[N] translated-string-case-n
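
The following Python sketch illustrates how one of the msgstr[0] ... msgstr[N] strings might be chosen for a given count. It is only an illustration under stated assumptions: the function names and sample strings are hypothetical, and the two rules are hand-written equivalents of the two-form (Germanic) rule and the three-form Polish rule commonly quoted for gettext, not code taken from gettext itself.

```python
def plural_index_germanic(n):
    """Two forms: index 0 for exactly one item, index 1 otherwise."""
    return 0 if n == 1 else 1


def plural_index_polish(n):
    """Three forms, following the commonly quoted Polish plural rule."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def translate_plural(msgstrs, rule, n):
    """Pick msgstr[rule(n)] and substitute the count into it."""
    return msgstrs[rule(n)] % n


# Hypothetical translations, standing in for msgstr[0], msgstr[1], msgstr[2].
ENGLISH = ["%d file", "%d files"]
POLISH = ["%d plik", "%d pliki", "%d plików"]

for count in (1, 2, 5, 12, 22):
    print(translate_plural(ENGLISH, plural_index_germanic, count),
          "|", translate_plural(POLISH, plural_index_polish, count))
```

The point of the design is that the number of msgstr[N] lines and the rule that maps a count to an index belong to the target language, while the source only ever provides a singular and a plural string.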

Supported character encodings

The character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15, KOI8-R, KOI8-U, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, UTF-8.

PO Editors

poEdit
KBabel
Gtranslator
LocFactoryEditor (XLIFF and PO editor for Mac OS X)

Positive aspects

Powerful plural handling
Format created for translation purpose
Easy for humans to read
Used by gettext, KBabel, Rosetta and many other programs
Support and processing tools are available for almost all platforms

Negative aspects

The PO file format does not provide a way of identifying the source and target language within a file. By GNU standards, GNU software is written in American English (en-US), and this is reflected in gettext, which supports only Germanic plural forms in the source language. It is therefore recommended to set the source-language attribute to en-US by default when converting a PO file to a format that records the source language, such as XLIFF.

ts Files

Format of ts files

The .ts file format is used by Trolltech for Qt applications. The files are XML. Here is an example of a .ts file generated by lupdate (a tool made by Trolltech that extracts translatable text from the C++ source code of a Qt application; see Internationalization/tool evaluation for further information):

<!DOCTYPE TS><TS>
    <context>
        <name>MyExample</name>
        <message>
            <source>i18n=Internationalization</source>
            <translation type="unfinished"></translation>
        </message>
    </context>
</TS>

After translation (for example with Qt Linguist), it would look like this:

<!DOCTYPE TS><TS>
    <context>
        <name>MyExample</name>
        <message>
            <source>i18n=Internationalization</source>
            <translation>i20e=Internazionalizzazione</translation>
        </message>
    </context>
</TS>
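
Because .ts files are plain XML, the structure shown in the two examples above can be read with any XML library. The following Python sketch uses the standard xml.etree.ElementTree module to list the messages of a .ts document and whether each translation is still marked type="unfinished". The function name list_messages and the tuple layout are hypothetical; real .ts files may contain further elements that this sketch ignores.

```python
import xml.etree.ElementTree as ET

# The untranslated example from above, embedded as a string.
TS_SAMPLE = """<!DOCTYPE TS><TS>
    <context>
        <name>MyExample</name>
        <message>
            <source>i18n=Internationalization</source>
            <translation type="unfinished"></translation>
        </message>
    </context>
</TS>"""


def list_messages(ts_text):
    """Return (context, source, translation, finished) tuples from .ts XML text."""
    root = ET.fromstring(ts_text)
    rows = []
    for context in root.iter("context"):
        name = context.findtext("name", default="")
        for message in context.iter("message"):
            source = message.findtext("source", default="")
            translation = message.find("translation")
            text = (translation.text or "") if translation is not None else ""
            # A message counts as finished unless it is marked type="unfinished".
            finished = (translation is not None
                        and translation.get("type") != "unfinished")
            rows.append((name, source, text, finished))
    return rows


for row in list_messages(TS_SAMPLE):
    print(row)
```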

The .ts file is then converted with a tool named lrelease to the .qm file format, a compact binary format that provides extremely fast lookups for translations.

The .qm files can also be created with the GNU gettext tools: use "xgettext --qt" as the string extractor to produce the .pot file, then convert the translated .po file with the "msgfmt --qt" command to create the .qm file.

ts Editors

Qt Linguist

Positive aspects

"Full" support for Unicode character encodings
In Trolltech's opinion, it is human-readable text

Negative aspects

Qt's translation framework does not support plurals
The Qt message catalog format supports Unicode only in the translated strings, not in the untranslated strings

XLIFF Files

Format of XLIFF files

XLIFF is the XML Localization Interchange File Format. It is intended to give any software provider a single interchange file format that can be understood by any localization provider.

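You can find an XLIFF tree structure in the XLIFF specification: http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#AppTree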
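
As a rough illustration of what such an interchange file looks like, the following Python sketch builds a minimal XLIFF document from (source, target) string pairs. It assumes the XLIFF 1.2 layout (xliff, file, body, trans-unit, source, target) and the 1.2 namespace; the function name build_xliff, its parameters, and the default file name messages.po are hypothetical, and a complete document would carry more metadata than shown here.

```python
import xml.etree.ElementTree as ET

# Namespace of XLIFF 1.2 documents (assumed version for this sketch).
XLIFF_NS = "urn:oasis:names:tc:xliff:document:1.2"


def _tag(name):
    """Qualify an element name with the XLIFF namespace."""
    return f"{{{XLIFF_NS}}}{name}"


def build_xliff(pairs, target_language, source_language="en-US",
                original="messages.po"):
    """Build a minimal XLIFF document from (source, target) string pairs."""
    ET.register_namespace("", XLIFF_NS)
    root = ET.Element(_tag("xliff"), {"version": "1.2"})
    file_el = ET.SubElement(root, _tag("file"), {
        "original": original,
        "source-language": source_language,  # en-US by default
        "target-language": target_language,
        "datatype": "plaintext",
    })
    body = ET.SubElement(file_el, _tag("body"))
    for number, (source, target) in enumerate(pairs, start=1):
        unit = ET.SubElement(body, _tag("trans-unit"), {"id": str(number)})
        ET.SubElement(unit, _tag("source")).text = source
        ET.SubElement(unit, _tag("target")).text = target
    return ET.tostring(root, encoding="unicode")


print(build_xliff([("i18n=Internationalization",
                    "i20e=Internazionalizzazione")], "it"))
```

Unlike a PO file, the resulting document states both the source and the target language explicitly on the file element.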

XLIFF Editors

Heartsome
XLIFF Translation Editor

Positive aspects

OASIS standard

Negative aspects

Complicated plural form handling

References

Retrieved from "https://dev.eiffel.com/index.php?title=Internationalization/file_format&oldid=4451"

Category:

Internationalization

@@ Line 1: / Line 1: @@
-==Summary==
+[[Category:Internationalization]]
-Here [[Internationalization|we]] evaluate various file formats used for the translation of programs. For the moment we are considering:
+==Summary==
-* XML
+A bref description of the most important file formats used for the translation of programs.
-* po
-* [http://transolution.python-hosting.com xliff] (good description in this homepage)
-* create an own format
 ==PO Files==
@@ Line 20: / Line 17: @@
   white-space
   #  translator-comments
-  #. automatic-comments
+  #. extracted-comments
-  #: reference...
+  #: references...
   #, flag...
   msgid untranslated-string
@@ Line 27: / Line 24: @@
 Where the ''translator-comments'' are created and maintained exclusively by the translator, this comments have some white space immediately following the #. The other comments are created by the program that created the PO file.
+''References'' are space separated lists of locations (sourcefile:linenumber) specifying where the translation unit is found in a source file.
 After the special comment "#," there can be some ''flags'', as ''fuzzy'' shows that the msgstr string might not be a correct translation, i.e. the translator is not sure of his work.
 The 'untranslated-string' is the untranslated string as it appears in the original program source. The ''translated-string'' is (as the name suggests) the translated string, if there is no translation it is an empty string.
@@ Line 46: / Line 44: @@
 character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15, KOI8-R, KOI8-U, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, UTF-8.
-I think they are a lot...
 ===Po Editors===
@@ Line 62: / Line 58: @@
 * Easy for humans to read
 * Used by gettext, kbabel, rosetta and many other programs
+* Support and elaboration tools for almost all plattforms
 ===Negative aspects===
-* We have to write a PO parser
+* The PO file format does not provide a way of identifying the source and target language within a file. By GNU standards, GNU software is written in American English (en-US), and this is reflected in Gettext by only having support for Germanic plural forms in the source language. It is therefore recommended to set the source-language attribute to en-US by default.
-==XML==
+==ts Files==
-===Format of XML===
+===Format of ts files===
-XML is used for example by Trolltech for their .ts files. Here an example of .ts file, generated my lupdate (a tool made by trolltech that extracts translatable text from the C++ source code of the Qt application, see [[Internationalization/tool evaluation|here]] for further information):
+The .ts file format is used Trolltech for the QT applications. They are XML conforming files. Here an example of a .ts file, generated by lupdate (a tool made by Trolltech that extracts translatable text from the C++ source code of the Qt application, see [[Internationalization/tool evaluation|here]] for further information):
   <!DOCTYPE TS><TS>
@@ Line 94: / Line 91: @@
       </context>
   </TS>
+The .ts file is than converted to the .qm file format, a compact binary format that provides extremely fast lookups for translations, with a tool named lrelease.
+The creation of .qm files can also be done with the GNU gettext tools: with "xgettext --qt" as string extractor for producing the .pot file. And then convert the translated file (.po) with the "msgfmt --qt" command for creating the .qm files.
+=== ts Editors ===
+*QT Linguistic
 ===Positive aspects===
-* full support for unicode character encodings
+* "full" support for unicode character encodings
-* There is already a parser in the EiffelBase
 * In trolltech's opinion it's a human readable text
 ===Negative aspects===
-* Not everybody knows it
+* QT's translation framework does not support plurals
-* Microsoft seeks XML-related patents that could restrict the use of XML (there should be a "Very negative aspect" section)
+* Qt message catalog format supports Unicode only in the translated strings, not in the untranslated strings
-* In my opinion it's not a human readable text (Fortunately not all human beings are Computer scientists)
-==New Format==
+== xliff Files ==
-===Format of our Format===
+===Format of XLIFF files===
-* It doesn't exist yet, so we don't know how it looks like.
+XLIFF is the XML Localization Interchange File Format. It is intended to give any software provider a single interchange file format that can be understood by any localization provider.
-* We could give our own extension to the file format for example .et (eiffel translation) or .babe (babylon eiffel) or .eint (eiffel i18n) ...
+You can find a XLIFF Tree Structure [http://www.oasis-open.org/committees/xliff/documents/xliff-specification.htm#AppTree here]
+=== ts Editors ===
+*[http://www.heartsome.net/EN/xlfedit.html heartsome]
+*[https://open-language-tools.dev.java.net/editor/about-xliff-editor.html XLIFF Translation Editor]
 ===Positive aspects===
-* Free to do what we want
+* OASIS standard
 ===Negative aspects===
-* A new format? Why should we be different?
+* complicated plural form handling
-==Conclusions==
 ==References==
