Re: How to get the encoding table ?
From: Enrique Perez-Terron (enrio_at_online.no)
Date: 09/27/05
Date: Tue, 27 Sep 2005 22:00:24 +0200
On Tue, 27 Sep 2005 06:39:08 +0200, nix <dengcfei@gmail.com> wrote:
>
> Enrique Perez-Terron wrote:
>
>> On Mon, 26 Sep 2005 04:58:57 +0200, nix <dengcfei@gmail.com> wrote:
>>
>> > I want to input multibyte data into a file for testing.
>> > for example:
>> > bash:> printf "\200\254" > file1
>> > so I want to get the hexadecimal value of the corresponding multibyte
>> > character.
[...]
> JIS X 0208 and JIS X 0212 are the most popular Japanese Industrial
> Standard character sets. They use the form <j-row-column> to identify
> a specific Japanese character such as 平仮名 and 漢字. But they are
> not encodings.
> The common encodings are Shift-JIS and EUC-JP.
>
> My question is: when I have a multibyte character such as 間 (a kanji
> character in Japanese), how can I get the corresponding encoding value?
> It depends on the charmap used by the current locale.
$ echo 間 | od -t x1
0000000 e9 96 93 0a
0000004
Since I have a locale based on utf-8, the response I get is the utf-8
encoding. In this way you get the character encoding for your current
locale.
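Note that the trailing 0a in the dump is the newline added by echo, not part of the character. A minimal sketch of the same check without it, assuming a POSIX printf and od; the character is written as octal escapes for its UTF-8 bytes so the example does not depend on how the terminal passes CJK input:

```shell
# UTF-8 bytes of 間 (U+9593), e9 96 93, written as octal escapes.
char=$(printf '\351\226\223')

# printf '%s' emits the bytes without the newline that echo would append.
# od -An drops the offset column; tr joins the hex pairs into one string.
hex=$(printf '%s' "$char" | od -An -t x1 | tr -d ' \n')
echo "$hex"
```

This should print e99693, matching the first three bytes of the od output above.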
In addition there is the "iconv" program, if you need to see the bytes
in some other encoding. For instance, I can do
$ echo 平仮名 and 漢字 | iconv --from utf-8 --to sjis | od -t x1
0000000 95 bd 89 bc 96 bc 20 61 6e 64 20 8a bf 8e 9a 0a
0000020
If you want octal, rather than hex output, use "-t o1" instead of "-t x1".
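The octal form is what printf needs if you want to write such bytes into a test file, as in your original example. A rough sketch, assuming your iconv accepts the name SHIFT_JIS (iconv -l lists the names it knows) and that mktemp is available; it writes the Shift-JIS bytes 8a bf 8e 9a (漢字, from the dump above) and decodes them back to UTF-8 as a check:

```shell
# Octal escapes for the Shift-JIS bytes 8a bf 8e 9a (漢字).
tmp=$(mktemp)
printf '\212\277\216\232' > "$tmp"

# Decode the file back to UTF-8 and dump the bytes to verify the round trip.
utf8_hex=$(iconv -f SHIFT_JIS -t UTF-8 "$tmp" | od -An -t x1 | tr -d ' \n')
rm -f "$tmp"
echo "$utf8_hex"
```

If the file holds valid Shift-JIS, this should print e6bca2e5ad97, the UTF-8 bytes of 漢字.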
I should perhaps mention that I am using gnome-terminal and bash.
Other environments may not be able to handle the direct input of CJK
characters, or even the pasting technique I used.
> Is there any utility like "locale -xxx" or "charmap -l" to display the
> characters and encoding values of the charmap.
I am not aware of any such utility.
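Failing that, one way to approximate such a table for the characters you care about is to loop over them with iconv and od. This is only a sketch under assumptions: enc_hex is a made-up helper name, the input is assumed to be UTF-8, and the encoding names SHIFT_JIS and EUC-JP are assumed to be accepted by your iconv.

```shell
# Hypothetical helper: read UTF-8 on stdin, print its bytes in encoding $1 as hex.
enc_hex() {
    iconv -f UTF-8 -t "$1" | od -An -v -t x1 | tr -d ' \n'
}

# Sample characters as UTF-8 octal escapes: 漢 (U+6F22) and 字 (U+5B57).
table=$(
    for esc in '\346\274\242' '\345\255\227'; do
        ch=$(printf "$esc")
        # One row per character: UTF-8, Shift-JIS, EUC-JP.
        printf '%s %s %s\n' \
            "$(printf '%s' "$ch" | enc_hex UTF-8)" \
            "$(printf '%s' "$ch" | enc_hex SHIFT_JIS)" \
            "$(printf '%s' "$ch" | enc_hex EUC-JP)"
    done
)
echo "$table"
```

Each output row holds the UTF-8, Shift-JIS and EUC-JP bytes of one character; for 漢 the Shift-JIS column should read 8abf, as in the iconv example above.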
It reminds me that at some half-forgotten point in the past I poked
around in the files underlying various parts of the X Window System,
the keyboard extension, the locale system, etc., and I was surprised to
find ASCII files listing the long names of each character in various
character sets, together with some other information, perhaps X keysym
numbers and encoding values.
However, when I tried to find them again just now, I had no success.
> References:
> http://www.rikai.com/library/kanjitables/kanji_codes.sjis.shtml
> http://www.rikai.com/library/kanjitables/kanji_codes.euc.shtml
These seem to provide pretty much what you said you could not find in
your original post:
>> > Now, my locale is Ja_JP, maybe the character set is the JIS X 0208 and
>> > JIS X 0212. I googled it and get the character table but it just lists
>> > the character and <row-column> form, for example:
>> > <j0761><j1604>
>> > I can't get the hexadecimal value either.
Am I understanding you correctly?
Cheers,
Enrique