Talk:Internationalization/feasibility

Revision as of 04:33, 7 May 2006 by Leo

Unicode support

A report on the current support for Unicode in Gobo has been requested (see here) but may not arrive in time for us to benefit from it. There seem to be two efforts to support Unicode, one for Gobo and one for EiffelVision, which are still more or less independent of one another. --Carlo 22:36, 29 April 2006 (CEST)

I'm meant to investigate that. I shall eventually (before 5th May), but one thing is clear: it's a bit of a mess.
It is not clear why there are two classes that model Unicode strings. We have STRING32. Fine. (Well, not fine actually, because it takes WIDE_CHARACTERs as elements, and it is not obvious at first glance how you assign a meaning to a WIDE_CHARACTER. No, wait, I've found out: you can convert an integer into a WIDE_CHARACTER. Unless I am missing something, this is slightly impractical. It even has this strange little thing with WIDE_CHARACTER inheriting from a class called something like WIDE_CHARACTER_REF, which has a WIDE_CHARACTER as an attribute. This, at 2 a.m., does not seem right at all.) Then we have some string classes for modelling UTF encodings. OK. But they descend from the normal STRING. These will have to be investigated at a later date because EiffelStudio (an official build, mind you, not self-compiled) is segfaulting.

Current Unicode-output support

...The sequence %/code/, where code is an unsigned integer in any of the available forms--decimal, binary, 
octal, hexadecimal--corresponding to a valid character code in the chosen character set. 
It allows you to denote any Unicode or Extended ASCII character by its integer code; for
example %/59/ represents a semicolon (the character of code 59). Since listings for character
codes--for example in Unicode documentation--often give them in base 16, you may use the 0xNNN convention
for hexadecimal integers: the semicolon example can also be expressed as %/0x3B/, where 3B is the hexadecimal
code for 59.

Since the three cases define all the possibilities, a percent sign is illegal in a context expecting a
Character unless immediately followed by /code/ where code is a legal character code. For example %? is
illegal (no such special character); so is %/0xFFFFFF/ (not in the Unicode range)...
                                                                                      ECMA-STANDARD 367
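
The rule quoted above can be sketched in a few lines. This is only an illustration in Python, not Eiffel and not anything taken from a compiler: the function name `decode_numeric_escapes` and its `limit` parameter are made up for this sketch, it handles only the decimal and 0x forms mentioned in the quote, and it leaves every other percent sequence untouched.

```python
import re

MAX_UNICODE = 0x10FFFF  # upper end of the Unicode code space

# Matches %/code/ where code is decimal or 0x-prefixed hexadecimal.
_ESCAPE = re.compile(r"%/(0[xX][0-9A-Fa-f]+|[0-9]+)/")


def decode_numeric_escapes(text, limit=MAX_UNICODE):
    """Replace each %/code/ in `text` by the character with that code.

    Hypothetical helper for illustration only. Raises ValueError when a
    code is larger than `limit`.
    """
    def replace(match):
        literal = match.group(1)
        if literal[:2].lower() == "0x":
            code = int(literal[2:], 16)
        else:
            code = int(literal, 10)
        if code > limit:
            raise ValueError("character code out of range: " + literal)
        return chr(code)

    return _ESCAPE.sub(replace, text)
```

With the default limit, `decode_numeric_escapes("%/59/")` and `decode_numeric_escapes("%/0x3B/")` both give a semicolon, and `%/0xFFFFFF/` is rejected as in the standard's example. Passing `limit=255` mimics a restriction to codes 0-255.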

At the moment, only the syntax "%/code/" (STRING_32) or '%/code/' (WIDE_CHARACTER) is supported, where code is a decimal number in the range 0-255.

More details on the syntax of manifest characters can be found in section 8.32.22 of the ECMA standard.

So basically it is of very little use to us at the moment. (The people who say Unicode support works have not tried anything beyond "Hé, toi!"?) The question is: if you create a wide character from an integer that is outside that range, what happens? We will not really want to embed strange and fantastic characters in strings ourselves, but we do want to read them from a file. Leo 14:33, 7 May 2006 (CEST)
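
To make the 0-255 point concrete, here is a small Python sketch (again just an illustration, not Eiffel, and it says nothing about what EiffelStudio actually does with such codes). The helper name `beyond_latin1` and the sample bytes are invented for the example. The bytes stand in for the contents of a UTF-8 file.

```python
def beyond_latin1(text):
    """Return the characters of `text` whose code point is above 255."""
    return [ch for ch in text if ord(ch) > 255]


# Stand-in for bytes read from a UTF-8 encoded file (made-up sample data):
# the greeting from the discussion followed by a Greek small letter pi.
raw = "H\u00e9, toi! \u03c0".encode("utf-8")
decoded = raw.decode("utf-8")

# Largest code point in the greeting alone.
greeting_max = max(ord(ch) for ch in "H\u00e9, toi!")

# Characters in the decoded text that need a code above 255.
outside = beyond_latin1(decoded)
```

Every character of "Hé, toi!" has a code of at most 233 (the é), so that test string never leaves the 0-255 range. The pi has code 960, which is the kind of value the question above is about.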