Difference between revisions of "Internationalization/file format"

Revision as of 13:37, 28 April 2006

1 Summary
2 PO Files
- 2.1 Format of PO files
  - 2.1.1 Normal entry
  - 2.1.2 Plural form entry
- 2.2 Supported character encodings

Summary

Here we evaluate various file formats used for translating programs. For the moment we are considering:

XML
po
a custom format of our own

PO Files

Format of PO files

A PO file has one entry for each string that has to be translated. There are two kinds of entries: a "normal" one and one that involves plural forms.

Normal entry

Here is the general structure of a "normal" entry:

white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string

The translator-comments are created and maintained exclusively by the translator; these comments have some white space immediately following the #. The other comments are created by the program that generated the PO file. The special comment "#," is followed by one or more flags. For example, the fuzzy flag shows that the msgstr string might not be a correct translation, i.e. the translator is not sure of the work. The untranslated-string is the string as it appears in the original program source. The translated-string is, as the name suggests, the translated string; if there is no translation, it is an empty string.
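The entry structure described above can be sketched as a small Python parser. This is only an illustration under simplifying assumptions: the names parse_entry, unquote and SAMPLE are hypothetical, strings are assumed to be written in double quotes on a single line, and escape sequences and multi-line strings are not handled.

```python
def unquote(s):
    """Strip one pair of surrounding double quotes, if present (no escape handling)."""
    s = s.strip()
    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
        return s[1:-1]
    return s


def parse_entry(text):
    """Parse one "normal" PO entry into a dict (simplified sketch)."""
    entry = {
        "translator_comments": [],
        "automatic_comments": [],
        "references": [],
        "flags": [],
        "msgid": "",
        "msgstr": "",
    }
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # white space separating entries
        if line.startswith("#."):
            entry["automatic_comments"].append(line[2:].strip())
        elif line.startswith("#:"):
            entry["references"].extend(line[2:].split())
        elif line.startswith("#,"):
            entry["flags"].extend(f.strip() for f in line[2:].split(",") if f.strip())
        elif line.startswith("#"):
            # "#" followed by white space: a translator comment
            entry["translator_comments"].append(line[1:].strip())
        elif line.startswith("msgid "):
            entry["msgid"] = unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            entry["msgstr"] = unquote(line[len("msgstr "):])
    return entry


# Hypothetical sample entry, made up for this illustration.
SAMPLE = '''
# Checked by the translator
#. Shown in the File menu
#: src/menu.e:42
#, fuzzy
msgid "Open file"
msgstr "Abrir archivo"
'''

entry = parse_entry(SAMPLE)
is_fuzzy = "fuzzy" in entry["flags"]     # True for SAMPLE
is_translated = entry["msgstr"] != ""    # an empty msgstr means "no translation"
```

Treating an empty msgstr as "untranslated" follows the description above; a real reader would also have to decide what to do with fuzzy entries, for example by skipping them.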

Plural form entry

white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
...
msgstr[N] translated-string-case-n
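To make the plural form entry more concrete, here is a minimal Python sketch that collects the msgstr[N] lines and picks one of them for a given count. The rule in plural_index (index 0 for exactly one item, index 1 otherwise) is an assumption for a language with two forms and is not taken from the text above; the function names and the sample entry are hypothetical as well.

```python
import re

# Matches lines such as: msgstr[1] "%d archivos"
_MSGSTR_N = re.compile(r'^msgstr\[(\d+)\]\s+"(.*)"$')


def parse_plural_forms(text):
    """Return the translated strings of a plural entry, ordered by their index."""
    forms = {}
    for line in text.splitlines():
        match = _MSGSTR_N.match(line.strip())
        if match:
            forms[int(match.group(1))] = match.group(2)
    return [forms[i] for i in sorted(forms)]


def plural_index(n):
    """Assumed two-form rule: case 0 for exactly one item, case 1 otherwise."""
    return 0 if n == 1 else 1


def translate_plural(forms, n):
    """Pick the form for n and substitute the count for %d."""
    return forms[plural_index(n)] % n


# Hypothetical sample entry, made up for this illustration.
PLURAL_SAMPLE = '''
msgid "%d file"
msgid_plural "%d files"
msgstr[0] "%d archivo"
msgstr[1] "%d archivos"
'''

forms = parse_plural_forms(PLURAL_SAMPLE)
print(translate_plural(forms, 1))  # 1 archivo
print(translate_plural(forms, 5))  # 5 archivos
```

Languages with more than two plural cases would need a different plural_index and correspondingly more msgstr[N] lines.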

Supported character encodings

The character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15, KOI8-R, KOI8-U, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, UTF-8.
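A reader of PO files could check a declared encoding against this list before trying to decode the file. The following Python sketch assumes the encoding is declared in a header line of the form "Content-Type: text/plain; charset=...", which is not described above; the names SUPPORTED_ENCODINGS, charset_from_header and is_supported are hypothetical.

```python
import re

# The encoding names listed above, stored in upper case for comparison.
SUPPORTED_ENCODINGS = frozenset("""
ASCII ISO-8859-1 ISO-8859-2 ISO-8859-3 ISO-8859-4 ISO-8859-5 ISO-8859-6
ISO-8859-7 ISO-8859-8 ISO-8859-9 ISO-8859-13 ISO-8859-15 KOI8-R KOI8-U
CP850 CP866 CP874 CP932 CP949 CP950 CP1250 CP1251 CP1252 CP1253 CP1254
CP1255 CP1256 CP1257 GB2312 EUC-JP EUC-KR EUC-TW BIG5 BIG5-HKSCS GBK
GB18030 SHIFT_JIS JOHAB TIS-620 VISCII UTF-8
""".split())


def charset_from_header(header_line):
    """Extract the charset name from a Content-Type header line, or None."""
    match = re.search(r"charset=([A-Za-z0-9_.:-]+)", header_line)
    return match.group(1) if match else None


def is_supported(name):
    """Case-insensitive membership test against the list above."""
    return name.upper() in SUPPORTED_ENCODINGS


# Hypothetical header line as it might appear inside a PO file.
line = '"Content-Type: text/plain; charset=UTF-8\\n"'
charset = charset_from_header(line)
print(charset, is_supported(charset))  # UTF-8 True
```

The check only compares names; whether a given runtime can actually decode each of these encodings is a separate question.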

I think that is quite a lot...

Retrieved from "https://dev.eiffel.com/index.php?title=Internationalization/file_format&oldid=2334"
