Re: Find out which character set is used



thomas.mertes@xxxxxx writes:

> When a program is running inside a terminal emulator like
> xterm or Konsole, the emulator uses some character set like
> ISO Latin-1, ISO Latin-9 or UTF-8.
>
> As far as I know the use of UTF-8 can be recognized by
> examining the environment variables LC_ALL, LC_CTYPE
> and LANG for the string UTF-8.
>
> When UTF-8 is not used, something like Latin-1, Latin-9 or
> some other character set is used.
>
> My question is now:
> How to find out which character set is used in a terminal?

man locale. In particular, "locale charmap" prints the character set
of the current locale.
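
For doing the same from inside a program, here is a minimal sketch in
Python (Unix only). It assumes the terminal really uses the encoding the
locale claims, which nothing enforces. The helper name terminal_charset
is made up for illustration. Asking the C library is more robust than
searching the variables for "UTF-8" by hand, because LC_ALL overrides
LC_CTYPE, which overrides LANG.

```python
import locale


def terminal_charset() -> str:
    """Return the codeset name of the user's LC_CTYPE locale."""
    try:
        # "" means: take the locale from LC_ALL, LC_CTYPE, LANG (in that order).
        locale.setlocale(locale.LC_CTYPE, "")
    except locale.Error:
        # The environment names a locale that is not installed;
        # the process stays in whatever locale it already had.
        pass
    # nl_langinfo(CODESET) reports the character set of the current locale.
    return locale.nl_langinfo(locale.CODESET)


if __name__ == "__main__":
    print(terminal_charset())
```

The name printed depends on the system and the environment, so treat it
as a hint about the terminal, not a guarantee.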

> For X11 programs there is a similar problem:
> Which character set is used to encode file names in a
> directory?

A file name is stored as a plain byte string, in whatever encoding the
program that created the file happened to use. Nothing records which
one that was, so there is no way to tell. The best you can do is hope
that it matches LC_CTYPE. If you are ambitious, check for byte patterns
typical of the most likely encodings. Of course, on short strings like
filenames this is likely to give many false matches.
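
A rough sketch of such a check in Python, under the assumption that the
only candidates are ASCII, UTF-8 and the locale's own encoding. The
function name guess_filename_encoding is hypothetical, and the result is
only a guess: single-byte encodings like Latin-1 accept any byte string,
so a "match" there proves nothing.

```python
def guess_filename_encoding(name: bytes, locale_encoding: str) -> str:
    """Guess how the raw bytes of a file name are encoded."""
    if name.isascii():
        # Pure ASCII reads the same in UTF-8 and in the Latin-N sets.
        return "ascii"
    try:
        # Non-ASCII text in a legacy encoding is rarely valid UTF-8,
        # so a clean strict decode is fairly strong evidence.
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        # Fall back to the locale's encoding. For single-byte sets
        # this almost never fails, so it is a hope, not a proof.
        name.decode(locale_encoding)
        return locale_encoding
    except (UnicodeDecodeError, LookupError):
        return "unknown"
```

To try it on a directory, pass a bytes path to os.listdir (for example
os.listdir(b".")) so that the names come back as undecoded bytes.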

The easy way out is to require UTF-8, and let the user blame himself if
he chooses to use something else.

--
Måns Rullgård
mru@xxxxxxxxxxxxx