Re: specific UTF-8 locales

From: Gianni Mariani (gi2nospam_at_mariani.ws)
Date: 03/23/05

    Date: Tue, 22 Mar 2005 22:09:54 -0800
    
    

    saroj.yadav@gmail.com wrote:
    > As I understand it (correct me, if I am wrong) Unicode came into
    > picture so that a document containing multiple language characters can
    > be supported like somebody can write a document comparing Korean and
    > Chinese in French language.
    >
    > Now, I am looking at all UNIX platforms and seems like all Unix (AIX,
    > HP, Solaris) platforms support Unicode by supporting language/region
    > specific UTF-8 locales like fr_FR.UTF-8, ja_JP.UTF-8, ko_KR.UTF-8 etc.
    >
    > Now in order to use UTF-8 for Japanese, I have to set locale to
    > ja_JP.UTF-8. To use UTF-8 for Korean, I have to set locale to
    > ko_KR.UTF-8.
    >
    > With this approach it's not possible to mix multiple language
    > characters. Doesn't this defeat the whole purpose of Unicode ?
    > Am I missing something ?
    >
    > Thanks in advance for any insight you can provide.
    >

    The "language" in the locale is used to find the message catalog as well
    as the attributes listed below. In theory, you can have Japanese and
    Korean characters in the same string. It's just that if you format time,
    collate, classify a character, format money, etc., you'll get the
    locale-specific behaviour.
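
    To make that concrete, here is a minimal Python sketch (Python's locale
    module wraps the C setlocale call). The names MIXED, utf8_roundtrip and
    month_name_in are made up for this illustration, and whether a locale
    such as ja_JP.UTF-8 is installed depends on your system, so the helper
    returns None when it is missing.

```python
import locale
import time

# One string mixing French, Japanese and Korean text.
MIXED = "café 日本語 한국어"


def utf8_roundtrip(text):
    """Encode to UTF-8 and decode again; no locale setting is consulted."""
    return text.encode("utf-8").decode("utf-8")


def month_name_in(locale_name):
    """Return the locale's name for January, or None if it is not installed."""
    saved = locale.setlocale(locale.LC_TIME)  # query the current LC_TIME
    try:
        locale.setlocale(locale.LC_TIME, locale_name)
    except locale.Error:
        return None  # this locale is not available on this system
    try:
        # 2005-01-01 as a 9-field time tuple (it was a Saturday, wday=5).
        return time.strftime("%B", (2005, 1, 1, 0, 0, 0, 5, 1, 0))
    finally:
        locale.setlocale(locale.LC_TIME, saved)  # put the old setting back


if __name__ == "__main__":
    print(utf8_roundtrip(MIXED) == MIXED)
    for name in ("C", "fr_FR.UTF-8", "ja_JP.UTF-8", "ko_KR.UTF-8"):
        print(name, month_name_in(name))
```

    The string survives unchanged whichever locale is active; only the
    formatted month name should differ, and only for locales that are
    actually installed.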

    As for rendering multiple languages in one display, that's tricky,
    especially if you're displaying Chinese, Thai and Arabic all in the same
    window and then trying to select a bit of Thai and Arabic with a mouse.

    From the "setlocale" man page:

            LC_COLLATE
                   for regular expression matching (it determines the
                   meaning of range expressions and equivalence classes)
                   and string collation.

            LC_CTYPE
                   for regular expression matching, character
                   classification, conversion, case-sensitive comparison,
                   and wide character functions.

            LC_MESSAGES
                   for localizable natural-language messages.

            LC_MONETARY
                   for monetary formatting.

            LC_NUMERIC
                   for number formatting (such as the decimal point and the
                   thousands separator).

            LC_TIME
                   for time and date formatting.
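
    Each of these categories can be queried and set on its own, which is
    why the "language" part of a locale name does not restrict which
    characters a string may hold. A rough Python sketch, assuming a POSIX
    system (locale.LC_MESSAGES is not available everywhere); CATEGORIES,
    current_settings and try_set are names invented for this example:

```python
import locale

# The six categories from the man page excerpt, by name.
CATEGORIES = {
    "LC_COLLATE": locale.LC_COLLATE,
    "LC_CTYPE": locale.LC_CTYPE,
    "LC_MESSAGES": locale.LC_MESSAGES,
    "LC_MONETARY": locale.LC_MONETARY,
    "LC_NUMERIC": locale.LC_NUMERIC,
    "LC_TIME": locale.LC_TIME,
}


def current_settings():
    """Return the locale currently in effect for each category."""
    return {name: locale.setlocale(cat) for name, cat in CATEGORIES.items()}


def try_set(name, locale_name):
    """Set one category only; return False if the locale is unavailable."""
    try:
        locale.setlocale(CATEGORIES[name], locale_name)
        return True
    except locale.Error:
        return False


if __name__ == "__main__":
    # Change only number formatting; the other categories are left alone.
    try_set("LC_NUMERIC", "C")
    for name, value in sorted(current_settings().items()):
        print(name, "=", value)
```

    So a program could, for example, keep LC_CTYPE on one UTF-8 locale while
    switching LC_TIME or LC_MONETARY to another, provided both are installed.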

