Internationalization/file format
Revision as of 05:37, 28 April 2006
Summary
Here we evaluate various file formats used for translating programs. For the moment we are considering:
- XML
- PO
- creating our own format
PO Files
Format of PO files
A PO file has one entry for each string that has to be translated. There are two kinds of entries: a "normal" one and one that involves plural forms.
Normal entry
Here is the general structure of a "normal" entry:
    white-space
    # translator-comments
    #. automatic-comments
    #: reference...
    #, flag...
    msgid untranslated-string
    msgstr translated-string
Here, the translator-comments are created and maintained exclusively by the translator; these comments have some white space immediately following the #. The other comments are created by the program that generated the PO file. The special comment "#," can be followed by one or more flags; for example, fuzzy indicates that the msgstr string might not be a correct translation, i.e. the translator is not sure of the work. The untranslated-string is the string exactly as it appears in the original program source. The translated-string is, as the name suggests, the translated string; if there is no translation, it is an empty string.
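To make the normal-entry structure concrete, here is a minimal Python sketch that reads one such entry into its parts. It rests on assumptions that go beyond the text above. In real PO files, GNU gettext writes the strings as double-quoted C-style strings (e.g. msgid "Hello"); this sketch only accepts one-line strings with a few escapes. It does not handle strings continued over several lines, plural entries, or other comment kinds. The names PoEntry, unquote and parse_entry are hypothetical.

```python
from dataclasses import dataclass, field

# C-style escapes this sketch understands inside quoted strings.
_ESCAPES = {'"': '"', "\\": "\\", "n": "\n", "t": "\t"}


@dataclass
class PoEntry:
    """One "normal" entry: comments, flags, msgid and msgstr."""
    translator_comments: list[str] = field(default_factory=list)
    automatic_comments: list[str] = field(default_factory=list)
    references: list[str] = field(default_factory=list)
    flags: list[str] = field(default_factory=list)
    msgid: str = ""
    msgstr: str = ""

    @property
    def is_fuzzy(self) -> bool:
        return "fuzzy" in self.flags

    @property
    def is_translated(self) -> bool:
        # An empty msgstr means there is no translation yet.
        return self.msgstr != ""


def unquote(token: str) -> str:
    """Strip the surrounding double quotes and resolve the escapes above."""
    if len(token) < 2 or not (token.startswith('"') and token.endswith('"')):
        raise ValueError(f"expected a quoted string, got {token!r}")
    body, out, i = token[1:-1], [], 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body):
            # Unknown escapes are kept verbatim.
            out.append(_ESCAPES.get(body[i + 1], body[i : i + 2]))
            i += 2
        else:
            out.append(body[i])
            i += 1
    return "".join(out)


def parse_entry(text: str) -> PoEntry:
    """Parse a single normal entry; raises ValueError on anything else."""
    entry = PoEntry()
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # leading white-space lines separate entries
        if line.startswith("#."):
            entry.automatic_comments.append(line[2:].strip())
        elif line.startswith("#:"):
            entry.references.extend(line[2:].split())
        elif line.startswith("#,"):
            entry.flags.extend(f.strip() for f in line[2:].split(",") if f.strip())
        elif line == "#" or (line.startswith("#") and line[1].isspace()):
            # Translator comment: "#" followed by white space.
            entry.translator_comments.append(line[1:].strip())
        elif line.startswith("msgid "):
            entry.msgid = unquote(line[len("msgid "):].strip())
        elif line.startswith("msgstr "):
            entry.msgstr = unquote(line[len("msgstr "):].strip())
        else:
            raise ValueError(f"unsupported line: {raw!r}")
    return entry
```

For example, parse_entry('#, fuzzy\nmsgid "Open file"\nmsgstr "Abrir archivo"') returns an entry whose is_fuzzy is True and whose msgstr is "Abrir archivo". Keeping each comment kind in its own list mirrors the distinction in the text: only translator_comments belong to the translator, and the rest would be regenerated by tools.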
Plural form entry
    white-space
    # translator-comments
    #. automatic-comments
    #: reference...
    #, flag...
    msgid untranslated-string-singular
    msgid_plural untranslated-string-plural
    msgstr[0] translated-string-case-0
    ...
    msgstr[N] translated-string-case-n
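The entry above does not say how a program picks one of msgstr[0] ... msgstr[N]. In GNU gettext the index comes from a plural formula declared in the file's header entry (the Plural-Forms field, e.g. nplurals=2; plural=(n != 1); for English). The following Python sketch assumes such rules are already available as plain functions instead of parsing that header. It shows how a count selects a translation. The names PluralEntry, english_plural, polish_plural and translate_plural are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PluralEntry:
    msgid: str            # untranslated-string-singular
    msgid_plural: str     # untranslated-string-plural
    msgstr: list[str]     # msgstr[0] ... msgstr[N]


def english_plural(n: int) -> int:
    """Two forms: index 0 for exactly one item, index 1 otherwise."""
    return 0 if n == 1 else 1


def polish_plural(n: int) -> int:
    """Three forms: 1; numbers ending in 2-4 except 12-14; everything else."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and not 10 <= n % 100 < 20:
        return 1
    return 2


def translate_plural(
    entry: PluralEntry, n: int, plural_index: Callable[[int], int] = english_plural
) -> str:
    """Pick msgstr[plural_index(n)], falling back to the source strings."""
    index = plural_index(n)
    if 0 <= index < len(entry.msgstr) and entry.msgstr[index]:
        return entry.msgstr[index]
    # Missing or empty translation: use the untranslated singular/plural.
    return entry.msgid if n == 1 else entry.msgid_plural
```

With msgstr = ["%d plik", "%d pliki", "%d plików"] and polish_plural, a count of 3 selects "%d pliki" and a count of 12 selects "%d plików". This is why N depends on the target language rather than on the source strings. The fallback to msgid/msgid_plural is a design choice of this sketch for untranslated entries.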
Supported character encodings
The character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15, KOI8-R, KOI8-U, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, UTF-8.
I think that is quite a lot.
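A tool that reads PO files would have to reject charsets outside this list before decoding. In GNU gettext files the charset name is declared in the header entry's Content-Type field (e.g. charset=UTF-8). The Python sketch below assumes that name has already been extracted and is passed in as a parameter. It also assumes Python's own codec registry is used for decoding, and that registry may not provide a codec for every listed name. The names SUPPORTED_CHARSETS and decode_po_bytes are hypothetical.

```python
import codecs

# The charset names listed above, as GNU gettext spells them.
SUPPORTED_CHARSETS = frozenset(
    """
    ASCII ISO-8859-1 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6
    ISO-8859-7 ISO-8859-8 ISO-8859-9 ISO-8859-13 ISO-8859-15 KOI8-R KOI8-U
    CP850 CP866 CP874 CP932 CP949 CP950 CP1250 CP1251 CP1252 CP1253 CP1254
    CP1255 CP1256 CP1257 GB2312 EUC-JP EUC-KR EUC-TW BIG5 BIG5-HKSCS GBK
    GB18030 SHIFT_JIS JOHAB TIS-620 VISCII UTF-8
    """.split()
)


def decode_po_bytes(data: bytes, charset: str) -> str:
    """Decode raw PO file bytes, accepting only charsets from the list."""
    name = charset.strip().upper()
    if name not in SUPPORTED_CHARSETS:
        raise ValueError(f"{charset!r} is not one of the supported charsets")
    try:
        codecs.lookup(name)  # Python may lack a codec for some listed names
    except LookupError:
        raise ValueError(f"no Python codec available for {name}") from None
    return data.decode(name)
```

Note that a name Python understands but the list does not contain, such as "latin-1", is still rejected. The list check is deliberately stricter than the codec lookup, so files stay readable by the GNU tools.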