Difference between revisions of "Talk:EiffelStudio Internationalization"

Latest revision as of 07:09, 7 July 2008

--Patrickr 17:31, 9 November 2006 (CET)

The location for the mo files should be set up in the environment library; on Unix those files go under

/usr/share/locale

e.g.

/usr/share/locale/en/LC_MESSAGES/eiffelstudio.mo
/usr/share/locale/de/LC_MESSAGES/eiffelstudio.mo
/usr/share/locale/de_CH/LC_MESSAGES/eiffelstudio.mo
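
As an illustration of that layout, here is a minimal Python sketch (not EiffelStudio code) that builds the candidate paths for a locale id. The function name `mo_candidates` and its parameters are hypothetical; the fallback from a regional locale such as `de_CH` to its language `de` follows the usual gettext convention and is an assumption about how the library would search.

```python
from pathlib import PurePosixPath


def mo_candidates(locale_id, domain="eiffelstudio", root="/usr/share/locale"):
    """Return candidate .mo paths for locale_id, most specific first."""
    # Drop any modifier or codeset suffix, e.g. "de_CH.UTF-8@euro" -> "de_CH".
    base = locale_id.split("@", 1)[0].split(".", 1)[0]
    names = [base]
    if "_" in base:
        # Fall back from a regional locale ("de_CH") to its language ("de").
        names.append(base.split("_", 1)[0])
    return [
        PurePosixPath(root) / name / "LC_MESSAGES" / f"{domain}.mo"
        for name in names
    ]


for path in mo_candidates("de_CH"):
    print(path)
```

A caller would try each candidate in order and load the first file that exists.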

--Juliant 19:53, 10 November 2006 (CET)

Is it really necessary that the language can be changed while running EiffelStudio? I would say this is set once (even during installation). It wouldn't be a problem to just restart EiffelStudio.

--Ted 03:20, 13 November 2006 (CET)

What are the advantages of putting mo files under /usr/share/locale? mo files are not shared between applications, and users normally do not need to change them.
All mo files are put in the ES installation directory, so we only need to store the locale id as a preference.
Moreover, the library is implemented to accept only mo files named after a locale id.
We need to decide whether the language can be switched at runtime. Of course, not supporting this, as in most applications, would definitely save a lot of time.

--Patrickr 17:13, 13 November 2006 (CET)

On Unix there are rules for where each part of an application belongs. Standardising these locations makes various things easier. I also think changing the language at runtime is not necessary.

Don't use UTF-16

UTF-16 is an abomination - it should never be used. --Colin-adams 09:03, 1 August 2007 (CEST)

I agree. -- Ted 08:04, 2 July 2008 (PDT)

Auto-encoding detection

Do you have a scheme in mind?

This is, in general, impossible, but special circumstances can make it tractable. The starting possibilities for an Eiffel source text are quite limited, so at first glance it looks possible.

However, I note that if an Eiffel source text uses ASCII characters for everything except the contents of STRING_8 literals, and no STRING_32 or CHARACTER_32 literals are present, then it will be impossible to distinguish between ISO-8859-1 (or most other parts of ISO-8859) and UTF-8, unless the UTF-8 file starts with a BOM. But the latter practice is reprehensible, and many editors do not support it.

--Colin-adams 09:09, 1 August 2007 (CEST)
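
The ambiguity described above can be sketched in Python (not Eiffel). The helper `plausible_encodings` is hypothetical and only considers three candidate encodings. It reports which of them can decode a byte sequence at all, which is weaker than knowing which one the author intended.

```python
import codecs


def plausible_encodings(raw):
    """List the candidate encodings (of three considered) that can decode raw."""
    if raw.startswith(codecs.BOM_UTF8):
        return ["utf-8"]  # a leading BOM settles the question
    found = []
    for name in ("ascii", "utf-8", "iso-8859-1"):
        try:
            raw.decode(name)
        except UnicodeDecodeError:
            continue
        found.append(name)
    return found


# An otherwise ASCII source line with one non-ASCII character (U+00FC) in a literal.
literal = 'name: STRING_8 = "Z\u00fcrich"'
as_utf8 = literal.encode("utf-8")
as_latin1 = literal.encode("iso-8859-1")

print(plausible_encodings(as_utf8))    # decodes as UTF-8 and as ISO-8859-1
print(plausible_encodings(as_latin1))  # this particular byte sequence is not valid UTF-8
print(plausible_encodings(codecs.BOM_UTF8 + as_utf8))
```

Because every byte value is defined in Python's ISO-8859-1 codec, any UTF-8 file also decodes "successfully" as ISO-8859-1, so successful decoding alone cannot tell the two apart.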

Yes, one cannot tell the encoding accurately. I thought this might be possible through a combination of means. I know Firefox and IE have encoding detection libraries, but they seem to give only the most probable result, not an accurate one. This part may be implemented as just "encoding detection" in the end. --Ted 08:13, 2 July 2008 (PDT)

Notes term

The problem becomes much simpler if you add the following restriction:

Source codings other than ISO-646 (US-ASCII) and ISO-8859-1 (Latin-1) are only allowed if the class contains an encoding term in the notes (indexing, in pre-ECMA) clause whose value names the encoding.

So a UTF-8 source file would start something like:

indexing
 
 description: "My latest class written in Unicode"
 encoding: "UTF-8"
 
class MY_LATEST
 
end

--Colin-adams 09:17, 1 August 2007 (CEST)
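
A minimal Python sketch (not Eiffel) of how a tool might read such a term before decoding the rest of the file. The function `declared_encoding`, the regular expressions, and the ISO-8859-1 default are assumptions made for illustration. The sketch also assumes the header is written in an ASCII-compatible encoding, so it would not work for UTF-16.

```python
import re

# Matches a note entry of the form: encoding: "NAME"
ENCODING_TERM = re.compile(rb'^\s*encoding\s*:\s*"([A-Za-z0-9._-]+)"', re.MULTILINE)
# Matches the start of the class header, allowing marks such as "deferred".
CLASS_HEADER = re.compile(rb"^\s*(?:\w+\s+)*class\b", re.MULTILINE)


def declared_encoding(raw, default="iso-8859-1"):
    """Return the encoding named in the notes clause, or default if absent."""
    # Only the text before the class header can contain the class-level notes.
    header = CLASS_HEADER.split(raw, maxsplit=1)[0]
    match = ENCODING_TERM.search(header)
    return match.group(1).decode("ascii") if match else default


source = b"""indexing

 description: "My latest class written in Unicode"
 encoding: "UTF-8"

class MY_LATEST

end
"""

name = declared_encoding(source)
print(name)
text = source.decode(name)  # decode the whole file with the declared encoding
```

This is a byte-level scan rather than a real parse, so it trades accuracy for being usable before the encoding is known.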

This is one way we are thinking of. But ideally this needs parsing. Maybe specifying the encoding as a compiler argument is good enough, or putting it into the .ecf files. --Ted 08:18, 2 July 2008 (PDT)

All XML files support all of Unicode

The line that says ECF files, etc., will need a Unicode encoding is not true. All XML files, no matter what their encoding, support the entire Unicode character set. --Colin-adams 22:41, 1 July 2008 (PDT)

What I mean is specific to EiffelStudio, in which we only use iso-8859-1 as the encoding of ecfs. This character set is not sufficient, since users will be able to put any Unicode characters in project settings, for example in descriptions. Other internal implementations that use XML, like diagram storage and metrics storage, may not even take encoding into account. These parts need to be adapted. --Ted 08:25, 2 July 2008 (PDT)

One thing I am not so sure about is whether the Gobo XML parser supports encodings other than the Unicode ones, for example GB2312. (This implies the ability to convert between GB2312 and UTF-8, etc.) --Ted 08:29, 2 July 2008 (PDT)

It IS sufficient. No matter what the encoding, all Unicode characters (within the limits of the XML version - which means 1.0 for Gobo) can be represented. That is what character references are for. E.g. ŋ (written as the character reference &#331;) is outside the range of Latin-1, but you can still specify it in an XML file with encoding="ISO-8859-1". And no, Gobo does not support GB2312, but it does support UTF-8 (all XML parsers must support UTF-8 and UTF-16). But this is not very relevant. --Colin-adams 07:13, 3 July 2008 (PDT)
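
The character-reference mechanism can be tried out with Python's standard `xml.etree.ElementTree` module. This is only a sketch of the XML behaviour described above, not of Gobo's API; the element name `description` is made up for the example.

```python
import xml.etree.ElementTree as ET

element = ET.Element("description")
element.text = "so\u014bg"  # contains U+014B, which Latin-1 cannot encode

# Serialize with a Latin-1 declaration; the serializer writes the
# unencodable character as the character reference &#331;.
data = ET.tostring(element, encoding="ISO-8859-1")
print(data)

# Parsing the bytes resolves the reference back to U+014B.
parsed = ET.fromstring(data)
print(parsed.text == element.text)
```

The serialized document contains only bytes that are valid Latin-1, yet no information is lost on the round trip.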

I see what you mean. It is good to know Gobo takes them as what they are (no information loss). But I am a little surprised that, knowing the encoding, Gobo doesn't return meaningful strings (in UTF-8), but simply the stream taken directly from the byte sequence. (Correct me if I am wrong.) So EiffelStudio still needs to handle conversions manually, because UTF-32 is used internally and strings can be rendered correctly only when their encoding is known. This means we need to convert strings into the encoding the XML file specifies (none implies UTF-8). Then converting most Unicode characters to iso-8859-1 is not correct anymore; the only safe approach is to convert the byte sequence read from the XML file from the encoding it specifies to UTF-32. In any case, I prefer to use UTF-8 for all XML files by default. I am also afraid that using "iso-8859-1" to carry strings in other encodings would make some XML editors render them incorrectly. --Ted 09:13, 3 July 2008 (PDT)

Why do you think Gobo doesn't return meaningful strings? Of course it does. You don't have to worry about any of this. The XML specification, written over 10 years ago, sorted all this out.

And it is quite safe to serialize as ISO-8859-1 if you wish. No XML editor can render it incorrectly. --Colin-adams 23:20, 6 July 2008 (PDT)

I realize I should not have said "XML files must support Unicode encoding", because that's a feature of XML. What I meant was rather that XML handlers must take Unicode into account. Those handlers in ES definitely need adaptation, no matter how the Gobo part deals with encodings. And thank you, Colin, for correcting me on how Gobo XML handles encodings. I will look more into Gobo XML encoding support before really starting to work on this part.
Otherwise, from my experiments, no editor/renderer (VS, IE, Firefox) correctly displays Chinese with "<?xml version="1.0" encoding="ISO-8859-1"?>". For me the reason is simple: ISO-8859-1 does not define Chinese characters.
--Ted 04:22, 7 July 2008 (PDT)
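
The files used in that experiment are not shown, so the following Python sketch rests on an assumption: that the Chinese text was stored as raw bytes of another encoding (GB2312 here) under the ISO-8859-1 declaration. It contrasts that case with the same text written as character references, using the standard `xml.etree.ElementTree` parser rather than Gobo.

```python
import xml.etree.ElementTree as ET

text = "\u4e2d\u6587"  # two Chinese characters, U+4E2D and U+6587
declaration = b'<?xml version="1.0" encoding="ISO-8859-1"?>'

# Case 1: raw GB2312 bytes under a Latin-1 declaration. The parser trusts
# the declaration and reads each byte as one Latin-1 character.
mislabelled = declaration + b"<d>" + text.encode("gb2312") + b"</d>"
misread = ET.fromstring(mislabelled).text

# Case 2: the same text as character references, e.g. &#20013; for U+4E2D.
references = "".join(f"&#{ord(ch)};" for ch in text).encode("ascii")
escaped = declaration + b"<d>" + references + b"</d>"
recovered = ET.fromstring(escaped).text

print(repr(misread))      # Latin-1 characters, not the original text
print(recovered == text)  # the references survive the Latin-1 declaration
```

Under this assumption both observations in the thread hold: raw Chinese bytes cannot be labelled ISO-8859-1, while a document that is genuinely serialized as ISO-8859-1 carries Chinese text as character references.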