Difference between revisions of "Internationalization/file format"

Revision as of 14:42, 28 April 2006

1 Summary
2 PO Files
3 XML
- 3.1 Positive aspects
- 3.2 Negative aspects
4 New Format
- 4.1 Positive aspects
- 4.2 Negative aspects
5 Conclusions
6 References

Summary

Here we evaluate various file formats that could be used for translating programs. For the moment we are considering:

XML
PO
a new format of our own

PO Files

Format of PO files

A PO file has one entry for each string that has to be translated. There are two kinds of entry: a "normal" one and one that involves plural forms.

Normal entry

Here is the general structure of a "normal" entry:

white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string
msgstr translated-string

The translator-comments are created and maintained exclusively by the translator; these comments have some white space immediately following the #. The other comments are created by the program that generated the PO file. The special comment "#," is followed by one or more flags; for example, fuzzy indicates that the msgstr string might not be a correct translation, i.e. the translator is not sure of the work. The untranslated-string is the string as it appears in the original program source. The translated-string is, as the name suggests, the translated string; if there is no translation yet, it is an empty string.
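
The structure described above can be illustrated with a small parser. This is only a minimal sketch in Python, not how gettext itself reads PO files: the names parse_entry, unquote and SAMPLE, the source reference src/menu.e:42 and the sample strings are made up for illustration, and the sketch assumes each string fits on one line and contains no escape sequences.

```python
def unquote(s):
    """Strip one pair of surrounding double quotes (escapes are NOT handled)."""
    s = s.strip()
    if len(s) >= 2 and s[0] == '"' and s[-1] == '"':
        return s[1:-1]
    return s


def parse_entry(text):
    """Parse one "normal" PO entry into a dict (simplified sketch)."""
    entry = {
        "translator_comments": [],
        "automatic_comments": [],
        "references": [],
        "flags": [],
        "msgid": "",
        "msgstr": "",
    }
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # The specific comment markers must be tested before the plain "#".
        if line.startswith("#."):
            entry["automatic_comments"].append(line[2:].strip())
        elif line.startswith("#:"):
            entry["references"].extend(line[2:].split())
        elif line.startswith("#,"):
            flags = [f.strip() for f in line[2:].split(",")]
            entry["flags"].extend(f for f in flags if f)
        elif line.startswith("#"):
            entry["translator_comments"].append(line[1:].strip())
        elif line.startswith("msgid "):
            entry["msgid"] = unquote(line[len("msgid "):])
        elif line.startswith("msgstr "):
            entry["msgstr"] = unquote(line[len("msgstr "):])
    return entry


# Hypothetical entry, written by hand for this example.
SAMPLE = '''
#  Checked with the team
#. Shown in the File menu
#: src/menu.e:42
#, fuzzy
msgid "Open file"
msgstr "Ouvrir le fichier"
'''

entry = parse_entry(SAMPLE)
print(entry["msgid"], "->", entry["msgstr"], entry["flags"])
```

A real reader would also have to handle strings continued over several lines, escape sequences and the plural form entries described in the next section.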

Plural form entry

white-space
#  translator-comments
#. automatic-comments
#: reference...
#, flag...
msgid untranslated-string-singular
msgid_plural untranslated-string-plural
msgstr[0] translated-string-case-0
...
msgstr[N] translated-string-case-n
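
To show how the msgstr[0] ... msgstr[N] cases might be used at run time, here is a minimal sketch in Python. It assumes a language with two plural forms (index 0 when the count is 1, index 1 otherwise); in gettext the actual rule is declared per language in the PO file header, and other languages need more cases. The names two_form_rule, translate_plural and ENTRY, and the sample strings, are hypothetical.

```python
def two_form_rule(n):
    """Assumed rule for a two-form language: case 0 for n == 1, else case 1."""
    return 0 if n == 1 else 1


def translate_plural(entry, n, rule=two_form_rule):
    """Pick the translated string for count n from a plural form entry."""
    forms = entry["msgstr"]
    index = rule(n)
    if index < len(forms) and forms[index]:
        return forms[index]
    # No translation for this case: fall back to the untranslated strings.
    return entry["msgid"] if n == 1 else entry["msgid_plural"]


# Hypothetical plural form entry, already parsed into a dict.
ENTRY = {
    "msgid": "%d file",
    "msgid_plural": "%d files",
    "msgstr": ["%d Datei", "%d Dateien"],
}

for count in (1, 5):
    print(translate_plural(ENTRY, count) % count)
```

Passing the rule as a parameter keeps the selection logic independent of any one language, which is the property that makes the plural handling of PO files flexible.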

Supported character encodings

The character encodings that can be used are limited to those supported by both GNU libc and GNU libiconv. These are: ASCII, ISO-8859-1, ISO-8859-2, ISO-8859-3, ISO-8859-4, ISO-8859-5, ISO-8859-6, ISO-8859-7, ISO-8859-8, ISO-8859-9, ISO-8859-13, ISO-8859-15, KOI8-R, KOI8-U, CP850, CP866, CP874, CP932, CP949, CP950, CP1250, CP1251, CP1252, CP1253, CP1254, CP1255, CP1256, CP1257, GB2312, EUC-JP, EUC-KR, EUC-TW, BIG5, BIG5-HKSCS, GBK, GB18030, SHIFT_JIS, JOHAB, TIS-620, VISCII, UTF-8.

I think that is a lot...

Positive aspects

Powerful plural handling
Format created specifically for translation
Easy for humans to read
Used by gettext, KBabel, Rosetta and many other programs

Negative aspects

XML

Positive aspects

Negative aspects

Not everybody knows it
Microsoft is seeking XML-related patents, which could restrict the use of XML (there should be a "Very negative aspect" section)

New Format

Positive aspects

Free to do what we want

Negative aspects

A new format? Why should we be different?

Conclusions

References

Gettext manual (for PO files): http://www.gnu.org/software/gettext/manual/html_mono/gettext.html
Microsoft and XML: http://news.com.com/2100-1013_3-5146581.html
