Re: Open Source on OpenVMS - A Progress Report



In article
<7efe77bd-4241-4b67-8dbc-85bd5e0a554b@xxxxxxxxxxxxxxxxxxxxxxxxxxx>,
MetaEd <metaed@xxxxxxxxx> writes:

There is absolutely nothing
saying that this newsgroup or any other group should be
only supporting characters that happen to be in the
*English* alphabet or that it must be 7-bit plain ASCII.

Actually, there is: RFC 1036. This controls the message format for all
messages posted to newsgroups. The message format must follow RFC 822
with some minor modifications. RFC 822 is limited to ASCII (7-bit
codes).

Right.

Any message composed with characters that cannot be represented with
ASCII codes must be stripped of those characters or encoded somehow to
ASCII before transmission. The de facto standard for encoding text is
MIME (RFCs 2045-2049).

As of this writing, Google Groups does not use MIME when all the
characters of the message can be represented with ASCII codes.

Otherwise, if the message can be represented with Latin-1 (ISO-8859-1)
codes, Google Groups does so, and encodes with MIME using
Quoted-Printable. Because Latin-1 is an ASCII superset, and because
Quoted-Printable preserves most ASCII codes, this causes ASCII to be
used to encode the message for transmission wherever possible. Other
characters are encoded with a hex notation (an equals sign followed by
two hex digits, such as =FC for ü). Long lines are also preserved
using a line continuation code (a trailing equals sign). So, despite
being encoded, these messages are pretty easy to comprehend using a
newsreader that lacks MIME support.
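
For readers without an EDT setup, the Quoted-Printable scheme described
above can be sketched in Python with the standard `quopri` module. This
is only an illustration: the function name `decode_quoted_printable` and
the sample text are made up for the example, and the charset is assumed
to be ISO-8859-1 as in the Google Groups case.

```python
import quopri


def decode_quoted_printable(body: bytes, charset: str = "iso-8859-1") -> str:
    """Decode a Quoted-Printable body, then interpret the bytes in `charset`.

    Hypothetical helper for illustration; the default charset is an
    assumption matching the Latin-1 case discussed in the post.
    """
    raw = quopri.decodestring(body)  # handles =XX escapes and "=" soft line breaks
    return raw.decode(charset)


# Hypothetical sample: =FC is u-umlaut, =DF is sharp s, and the trailing
# "=" at the end of the first line is a soft line break (continuation).
sample = b"Gr=FC=DFe aus M=FCnchen, this line is soft-wrapped =\nhere.\n"
decoded = decode_quoted_printable(sample)
print(decoded)
```

The ASCII parts of the sample pass through untouched, which is why such
messages stay mostly readable even when nothing decodes them.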

I have an EDT macro which does the decoding (see below).

But if the message cannot be represented with Latin-1 codes, Google
Groups uses UTF-8 codes, and encodes with MIME using Base64. UTF-8 and
Base64 are too different from ASCII for such messages to be
comprehended easily using a newsreader that lacks MIME support.

One can extract them and run B64DECODE.EXE on them. However, such
messages USUALLY have no place in a newsgroup in the first place.
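
Where B64DECODE.EXE is not at hand, the same extraction can be sketched
with Python's standard `base64` module. The helper name
`decode_base64_utf8` and the sample word are assumptions chosen for the
example; the sample deliberately uses letters that Latin-1 cannot
represent, which is the case where Base64 would be used.

```python
import base64


def decode_base64_utf8(body: str) -> str:
    """Decode a Base64 body and interpret the resulting bytes as UTF-8.

    Hypothetical helper for illustration only.
    """
    return base64.b64decode(body).decode("utf-8")


# Hypothetical sample containing characters outside Latin-1.
original = "\u0141\u00f3d\u017a"  # the Polish city name Lodz with its diacritics
encoded = base64.b64encode(original.encode("utf-8")).decode("ascii")

print(encoded)                      # pure ASCII, but unreadable to the eye
print(decode_base64_utf8(encoded))  # round-trips to the original text
```

Unlike Quoted-Printable, none of the original letters survive in the
encoded form, which is why a reader without MIME support sees only an
opaque block.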

The attribution line which Google Groups creates in the body (for
example: "On Oct 20, 2:06 pm, MetaEd <met...@xxxxxxxxx> wrote")
contains a Latin-1 non-breaking space (code A0) between the minutes
and the "am" or "pm". This is a character which does not exist in
ASCII.

As a courtesy to readers having no MIME support, posters can replace
the non-breaking space with a plain space. This will avoid MIME
encoding, as long as the message has no other non-ASCII characters.
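
The replacement suggested above could be sketched like this in Python.
The helper names `strip_nbsp` and `is_plain_ascii` are hypothetical, and
the sample is the attribution line quoted above with the non-breaking
space written as an explicit escape.

```python
NBSP = "\u00a0"  # Latin-1 non-breaking space, code A0


def strip_nbsp(text: str) -> str:
    """Replace every non-breaking space with a plain ASCII space."""
    return text.replace(NBSP, " ")


def is_plain_ascii(text: str) -> bool:
    """Return True if every character fits in 7-bit ASCII."""
    return all(ord(ch) < 128 for ch in text)


# Attribution line as Google Groups would generate it (address elided as
# in the original post).
attribution = "On Oct 20, 2:06\u00a0pm, MetaEd <met...@xxxxxxxxx> wrote"
fixed = strip_nbsp(attribution)

print(is_plain_ascii(attribution))  # the A0 character makes this False
print(is_plain_ascii(fixed))        # True once the NBSP is replaced
```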

Good suggestion.

And, as a courtesy to posters who are spelling names and places
properly using non-ASCII codes, readers can learn to read MIME encoded
messages or use a newsreader that has MIME support.

Something which breaks the RFC but causes few if any problems for most
people, whatever newsreader they are using, is to use 8-bit characters
WITHOUT encoding. This is analogous to doing so in VMS MAIL (but don't
forget to set the transport to 8-bit in the SMTP configuration). Any
newsreader which has fancy features will probably assume ISO-LATIN-1 and
get it right, as will many WITHOUT fancy features. Such codes can be
entered from a VMS keyboard with the compose key. If you have FORTRAN
installed, do HELP FORT CHAR DEC to get the DEC multinational set (which
is almost ISO-LATIN-1):

+---+-----------------------------------------+
|   | 8    9    A    B    C    D    E    F    |
+---+-----------------------------------------+
| 0 |      DCS       °    À         à         |
| 1 |      PU1  ¡    ±    Á    Ñ    á    ñ    |
| 2 |      PU2  ¢    ²    Â    Ò    â    ò    |
| 3 |      STS  £    ³    Ã    Ó    ã    ó    |
| 4 | IND  CCH            Ä    Ô    ä    ô    |
| 5 | NEL  MW   ¥    µ    Å    Õ    å    õ    |
| 6 | SSA  SPA       ¶    Æ    Ö    æ    ö    |
| 7 | ESA  EPA  §    ·    Ç    ×    ç    ÷    |
| 8 | HTS       ¨         È    Ø    è    ø    |
| 9 | HTJ       ©    ¹    É    Ù    é    ù    |
| A | VTS       ª    º    Ê    Ú    ê    ú    |
| B | PLD  CSI  «    »    Ë    Û    ë    û    |
| C | PLU  ST        ¼    Ì    Ü    ì    ü    |
| D | RI   OSC       ½    Í    Ý    í    ý    |
| E | SS2  PM             Î         î         |
| F | SS3  APC       ¿    Ï    ß    ï         |
+---+-----------------------------------------+

! quoted printable
!
! Create a buffer with two blank lines (for some reason one
! blank line is not enough???)
!
find buffer cr_buffer
insert;
insert;
find last
!
DEFINE MACRO KQP
FIND BUFFER KQP
INSERT;s|=2c|,|w
INSERT;s|=FC|ü|w
INSERT;s|=DF|ß|w
INSERT;s|=F6|ö|w
INSERT;s|=E4|ä|w
INSERT;s|=3D|=|w
INSERT;s|=A0| |w
INSERT;s|=91|`|w
INSERT;s|=92|'|w
INSERT;s|=5F|_|w
INSERT;s|=20||w
INSERT;s|=C4|Ä|w
INSERT;s|=D6|Ö|w
INSERT;s|=DC|Ü|w
INSERT;s|=BA|º|w
INSERT;s|=95|·|w
INSERT;s|=2E|.|w
INSERT;s|=2D|-|w
INSERT;s|=E9|é|w
INSERT;s|=E1|á|w
INSERT;s|=C1|Á|w
INSERT;s|=E8|è|w
INSERT;s|=93|<I>|w
INSERT;s|=94|</I>|w
INSERT;s|=E5|å|w
INSERT;s|=96|---|w
INSERT;s|=20| |w
! Linefeed/CR Combination
INSERT;%B
INSERT;change; 9999('=0A=0D' cutsr paste=cr_buffer) ex
! CR/Linefeed Combination
INSERT;%B
INSERT;change; 9999('=0D=0A' cutsr paste=cr_buffer) ex
! Linefeed
INSERT;%B
INSERT;change; 9999('=0A' cutsr paste=cr_buffer) ex
! CR
INSERT;%B
INSERT;change; 9999('=0D' cutsr paste=cr_buffer) ex
INSERT;%B
find last
