What is the exact meaning of Octal Sequences \ooo-\ooo in tr utility of UNIX03 Specification

From: nix (dengcfei_at_gmail.com)
Date: 10/11/05


Date: 11 Oct 2005 00:11:52 -0700

The description of the tr utility in the Single UNIX Specification,
Version 3 (UNIX03 spec) confuses me a lot on the "\ooo-\ooo" issue.
My system uses the EBCDIC encoding.

The UNIX03 specification of the tr utility states:
---------------------------------------------------------
"\octal
Octal sequences can be used to represent characters with specific coded
values. An octal sequence consists of a backslash followed by the
longest sequence of one-, two- or three-octal-digit characters
(01234567). "
---------------------------------------------------------

So "\ooo" represents the character whose encoded value is the octal
number "ooo". For example, '\201' represents the character 'a' in
EBCDIC (whereas in ASCII, 'a' is '\141').
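
The octal values above can be checked with a small sketch. It assumes
that Python's "cp037" codec is a fair stand-in for the EBCDIC code page
in use; other EBCDIC code pages may differ, and the helper name
octal_escape is only illustrative.

```python
# Assumption: "cp037" approximates the poster's EBCDIC code page.
def octal_escape(ch, encoding):
    """Return the tr-style \\ooo escape for a single-byte character."""
    (byte,) = ch.encode(encoding)  # fails if ch is not exactly one byte
    return "\\%03o" % byte

print(octal_escape("a", "cp037"))  # \201
print(octal_escape("a", "ascii"))  # \141
print(octal_escape("z", "cp037"))  # \251
```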

UNIX03 also states that "c-c" represents a range; for example, "a-z"
represents "abcd...z":
---------------------------------------------------------
"c-c
In the POSIX locale, this construct shall represent the range of
collating elements between the range endpoints (as long as neither
endpoint is an octal sequence of the form \octal), inclusive, as
defined by the collation sequence. The characters or collating elements
in the range shall be placed in the array in ascending collation
sequence. If the second endpoint precedes the starting endpoint in the
collation sequence, it is unspecified whether the range of collating
elements is empty, or this construct is treated as invalid. In locales
other than the POSIX locale, this construct has unspecified behavior.

If either or both of the range endpoints are octal sequences of the
form \octal, this shall represent the range of specific coded values
between the two range endpoints, inclusive."
---------------------------------------------------------

But when "c" has the "\ooo" form, this becomes confusing.
For example, does "\201-\251" ('\251' is 'z') represent
the character range 'a-z' or the value range from octal 201 to
octal 251?
They are quite different: the range 'a-z' contains 26 characters,
but octal 201-251 contains 51 (octal) = 41 (decimal) coded values,
counting both endpoints.
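
The difference between the two readings can be sketched as follows,
again assuming Python's "cp037" codec as the EBCDIC code page. The
letters are not contiguous in EBCDIC, so the coded-value range also
covers values that are not lowercase letters; counting both endpoints,
it holds 41 values.

```python
import string

EBCDIC = "cp037"  # assumption: stand-in for the poster's code page

# Reading 1: the 26 letters a-z, as coded values in EBCDIC.
letters = sorted(string.ascii_lowercase.encode(EBCDIC))

# Reading 2: every coded value from octal 201 to octal 251, inclusive.
coded_values = list(range(0o201, 0o251 + 1))

# Values covered by reading 2 that are not lowercase letters.
extra = [v for v in coded_values if v not in letters]

print(len(letters))                    # 26
print(len(coded_values))               # 41
print(["\\%03o" % v for v in extra])   # the non-letter values in between
```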

UNIX03 says something about this issue in the Rationale section
of the tr utility:
---------------------------------------------------------
"The ISO POSIX-2:1993 standard had a -c option that behaved similarly
to the -C option, but did not supply functionality equivalent to the -c
option specified in IEEE Std 1003.1-2001. This meant that historical
practice of being able to specify tr -d\200-\377 (which would delete
all bytes with the top bit set) would have no effect because, in the C
locale, bytes with the values octal 200 to octal 377 are not
characters.

The earlier version also said that octal sequences referred to
collating elements and could be placed adjacent to each other to
specify multi-byte characters. However, it was noted that this caused
ambiguities because tr would not be able to tell whether adjacent octal
sequences were intending to specify multi-byte characters or multiple
single byte characters. IEEE Std 1003.1-2001 specifies that octal
sequences always refer to single byte binary values when used to
specify an endpoint of a range of collating elements."
---------------------------------------------------------
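
The historical usage mentioned in the rationale (deleting all bytes
with the top bit set) can be modelled with a short sketch. This is only
a byte-oriented model of the "coded values" reading, not how any real
tr is implemented, and the name delete_coded_range is hypothetical.

```python
# Assumption: models tr -d '\200-\377' under the coded-value reading.
def delete_coded_range(data: bytes, first: int, last: int) -> bytes:
    """Delete every byte whose value lies in first..last, inclusive."""
    doomed = bytes(range(first, last + 1))
    return data.translate(None, doomed)

sample = b"caf\xc3\xa9 \x7f\x80\xffok"
# Bytes 0x80-0xFF are removed; 0x7F (top bit clear) is kept.
print(delete_coded_range(sample, 0o200, 0o377))
```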

The first paragraph says "tr -d \200-\377" would have no effect because
octal 200 to octal 377 are not characters.
But the second paragraph says "octal sequences always refer to single
byte binary values when used to specify an endpoint of a range of
collating elements". These two statements seem to conflict.

Which one is correct?

Thanks a lot for any help.