Internationalization/posix locale

 
Latest revision as of 03:14, 7 September 2006


Summary

This summary is based on an article by Michael Becker (see http://www.ijon.de/comp/tutorials/locale.html). It mostly concerns SuSE Linux.

The user locale and the C locale are not the same thing. The C locale is the locale that is in effect inside a C program. If no locale is set in the user environment, the C locale is in effect there as well; it is also called the "POSIX" locale.

Calling setlocale(LC_ALL, "") in C sets the program's locale to the user locale, so that details about the user locale can then be retrieved with C functions.

Listing available locales

Two directories hold information about locales: /usr/share/locale/ and /usr/share/i18n/.

In the first directory the locales are binary files, so they are of no use for our purpose, since I do not know how to read them at the moment.

The second directory contains at least two subdirectories, named charmaps and locales.

The shell command ls /usr/share/i18n/locales/ lists the source files of all available locales.

Checking for default locale

The shell command "locale" prints information about the default locale. In C, the details of the locale can be obtained with

setlocale (LC_ALL,"")
localeconv()

See the manpage locale(7) for more information.

Another way to obtain the current locale is to read the environment variable LANGUAGE. In Eiffel this is done with the following code:

environement : EXECUTION_ENVIRONMENT
...
environement.get ("LANGUAGE")
    -- returns the ranked list (as a string) of languages set by the user

Retrieving locale information

With the C function nl_langinfo() (declared in langinfo.h) one can get information about all categories.


1. LC_NUMERIC

Comparing LC_NUMERIC formats in posix, en_US, de_DE and zh_CN.

C code and its output:

posix

    #include <stdio.h>

    int main(void)
    {
        /* No setlocale() call: the program runs in the "C" (POSIX) locale. */
        printf("%d\n", 12345678);
        printf("%'d\n", 12345678);              /* ' flag: group digits per locale */
        printf("%f\n", 123456.78);
        printf("%'f\n", 123456.78);
        printf("%1$d:%2$.*3$d\n", 12, 34, 2);   /* positional arguments */
        return 0;
    }

Outputs are:

    12345678
    12345678
    123456.780000
    123456.780000
    12:34

en_US

    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        /* Some systems need a codeset suffix, e.g. "en_US.UTF-8". */
        setlocale(LC_NUMERIC, "en_US");
        printf("%d\n", 12345678);
        printf("%'d\n", 12345678);
        printf("%f\n", 123456.78);
        printf("%'f\n", 123456.78);
        printf("%1$d:%2$.*3$d\n", 12, 4, 2);    /* precision 2 pads 4 to 04 */
        return 0;
    }

Outputs are:

    12345678
    12,345,678
    123456.780000
    123,456.780000
    12:04

de_DE

    #include <stdio.h>
    #include <locale.h>

    int main(void)
    {
        setlocale(LC_NUMERIC, "de_DE");
        printf("%d\n", 12345678);
        printf("%'d\n", 12345678);
        printf("%f\n", 123456.78);
        printf("%'f\n", 123456.78);
        printf("%1$d:%2$.*3$d\n", 12, 34, 2);
        return 0;
    }

Outputs are:

    12345678
    12.345.678
    123456,780000
    123.456,780000
    12:34

zh_CN

The only change to the code above is the setlocale call:

    setlocale(LC_NUMERIC, "zh_CN");

Outputs are:

    12345678
    12,345,678
    123456.780000
    123,456.780000
    12:34

Analysing results

Looking at the locale source files, these results are reasonable:

    Field          posix       en_US       de_DE       zh_CN
    decimal_point  "<U002E>"   "<U002E>"   "<U002C>"   "<U002E>"
    thousands_sep  ""          "<U002C>"   "<U002E>"   "<U002C>"
    grouping       -1          3;3         3;3         3

Here "<U002E>" means "dot" and "<U002C>" means "comma".

2. LC_TIME

For this category there is again the C function nl_langinfo() (declared in langinfo.h). Here is a list of its arguments that are useful for our project:

  • ABDAY_x: abbreviated name of weekday x, %a descriptor
  • DAY_x: name of weekday x, %A descriptor
  • ABMON_x: abbreviated name of month x, %b descriptor
  • MON_x: name of month x, %B descriptor
  • [AM|PM]_STR: strings that can be used when representing the time as an hour from 1 to 12 plus an am/pm specifier.
  • D_T_FMT: locale-specific date and time pattern.
  • D_FMT: locale-specific date pattern.
  • T_FMT: locale-specific time pattern.

The result of nl_langinfo(), called with one of the last three arguments, can then be used as the format argument of the C function strftime().

For example, nl_langinfo(D_T_FMT) returns a pointer to a string of the form %a %d %b %Y %T %Z, where:

  • %a: the abbreviated weekday name according to the current locale
  • %d: the day of the month as a decimal number (range 01 to 31)
  • %b: the abbreviated month name according to the current locale
  • %Y: the year as a decimal number including the century
  • %T: the time in 24-hour notation (%H:%M:%S)
  • %Z: the time zone name or abbreviation

Other information

One could also retrieve locale information with a parser, but I do not know much about that approach.

There are also C functions that make use of the categories. For example,

strftime()
strptime()
wcsftime()

for LC_TIME. As another example,

isalnum
isalpha
isblank

and so on for LC_CTYPE.

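For more information please read the locale specification of The Open Group (http://www.opengroup.org/onlinepubs/007908799/xbd/locale.html). A paper by Jochen Hein, presented at the Linux Kongress 1995 in Berlin, is also useful; see National Language Support (http://www.jochen.org/vortraege/nls-1995.pdf).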