Re: CR and LF

From: Floyd Davidson (floyd_at_barrow.com)
Date: 06/12/03


Date: 12 Jun 2003 01:24:38 -0800

triggerfish999@yahoo.com (Roger) wrote:
>Floyd Davidson <floyd@barrow.com> wrote in message >
>> Ahem, which _unix_ system is experiencing this behaviour?
>mmm.. well.. er.. the app is a cross platform thing and this behaviour
>happens under Win2000 - both when run from DOS box and Cygwin. The
>app is reading a plain text file (xml) so I'm not sure the issue of
>*which* unix applies. The app also runs under Solaris and AIX. I
>guess I'm asking about the layout of CR characters and why my getc()
>doesn't see 'em when there is only one preceding the LF.
>I guess this doesn't really help much *sigh*

It helped tremendously! The problem was that you posted this to
a unix newsgroup, but you are talking about a non-unix system,
and there was only a hint that that was true. Other than that,
of course, the problem sounded as if the only way you could get
what you were getting was if it wasn't on a unix system. You
confirmed all of that, and it does make sense.

The problem is that DOS and then Windows followed CP/M and a few
other platforms in using CRLF to indicate a newline in a text
file, while unix systems use LF alone, and other systems might
use LFCR or just CR. So depending on where the file originated,
you might see different combinations of actual characters in the
file for a newline, and depending on the platform reading the
file, different combinations will satisfy code that tests
"getc() == '\n'".
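To make those byte-level combinations concrete, here is a minimal sketch of a heuristic that guesses a file's convention from its first line terminator. The names classify_newline and enum newline_kind are hypothetical. It assumes an ASCII-based character set, assumes the bytes were read in binary mode so that nothing has already been translated, and assumes the whole file uses a single convention.

```c
#include <stddef.h>

/* Hypothetical labels for the newline conventions described above. */
enum newline_kind { NL_NONE, NL_LF, NL_CRLF, NL_LFCR, NL_CR };

/* Guess the convention from the FIRST line terminator in buf[0..len).
   Heuristic only: it assumes the whole file uses one convention, and
   that buf holds raw bytes (read in binary mode, untranslated). */
enum newline_kind classify_newline(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        /* Peek at the following byte, or -1 at the end of the buffer. */
        int next = (i + 1 < len) ? buf[i + 1] : -1;

        if (buf[i] == '\r')
            return next == '\n' ? NL_CRLF : NL_CR;   /* DOS/CP/M vs CR-only */
        if (buf[i] == '\n')
            return next == '\r' ? NL_LFCR : NL_LF;   /* LFCR vs unix */
    }
    return NL_NONE;   /* no terminator seen at all */
}
```

A file whose first line ends in LF but whose second line happens to start with a CR would be misreported as LFCR, which is exactly the kind of ambiguity any heuristic of this sort has to live with.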

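Note that the C standard allows files to be opened in "binary"
mode or in "text" mode, and that on MS platforms the two differ:
in binary mode, CRLF is read as a '\r' followed by a '\n', while
in text mode, CRLF is read as a single '\n'. That is why your
getc() never sees a lone CR that immediately precedes an LF: in
text mode the runtime folds the pair into one '\n', whereas in a
CR CR LF sequence only the second CR is absorbed. On a unix
platform there is no difference between the modes, and CRLF is
always read as a '\r' and a '\n'.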
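One way to see the text/binary difference directly is to write some raw bytes and then count how many CRs getc() hands back under each mode. This is a minimal sketch; the helper names write_raw and count_cr, and the file name used in the usage note, are hypothetical.

```c
#include <stdio.h>

/* Write raw bytes in binary mode, so nothing is translated on the way
   out. Returns 0 on success, -1 on failure. */
int write_raw(const char *path, const char *bytes, size_t len)
{
    FILE *fp = fopen(path, "wb");
    if (fp == NULL)
        return -1;
    size_t written = fwrite(bytes, 1, len, fp);
    int rc = fclose(fp);
    return (written == len && rc == 0) ? 0 : -1;
}

/* Count the '\r' characters getc() returns when `path` is opened with
   the given fopen mode ("r" for text, "rb" for binary).
   Returns -1 if the file cannot be opened. */
long count_cr(const char *path, const char *mode)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    long n = 0;
    int c;
    while ((c = getc(fp)) != EOF)
        if (c == '\r')
            n++;
    fclose(fp);
    return n;
}
```

After write_raw("crlf_demo.txt", "x\r\ny\r\r\n", 7), count_cr(..., "rb") should report all 3 CRs on any platform. count_cr(..., "r") should also report 3 on unix, but on an MS C runtime in text mode one would expect only 1: the CR before the first LF and the second CR of the CR CR LF pair are folded away. Cygwin's behaviour can depend on how the file or mount is configured.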

Complying with the C Standard, the various functions that read
from or write to files adapt as needed for binary or text mode on
systems that use multi-character newline delimiters.
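The same adaptation happens on output. In this minimal sketch, the hypothetical helper bytes_on_disk writes a string through a stream opened in a given mode, then reopens the file in binary mode to count what was actually stored.

```c
#include <stdio.h>

/* Write `text` through a stream opened with `mode` ("w" or "wb"), then
   reopen the file in binary mode and return how many bytes actually
   landed on disk. Returns -1 on any I/O failure. */
long bytes_on_disk(const char *path, const char *mode, const char *text)
{
    FILE *fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    if (fputs(text, fp) == EOF) {
        fclose(fp);
        return -1;
    }
    if (fclose(fp) != 0)
        return -1;

    fp = fopen(path, "rb");   /* binary: count raw bytes, no translation */
    if (fp == NULL)
        return -1;
    long n = 0;
    while (getc(fp) != EOF)
        n++;
    fclose(fp);
    return n;
}
```

Writing "a\nb\n" with "wb" should leave 4 bytes everywhere. With "w" it should also leave 4 bytes on unix, while on an MS C runtime one would expect 6, because each '\n' is written out as CRLF.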

There is more than one way to approach your problem, which is
going to get even more complex when you do port this to other
platforms. One is to simply convert all files to one format or
the other, then write code for that format in a way that works
on every system (e.g., open all files in binary mode). Another
is to use heuristics to first determine which format a file is
in, and write code that works on all systems but preserves the
existing format. A third possibility is that there will be no
cross-platform file exchange, in which case you can just write a
different set of routines for each platform, taking into account
that platform's file format.
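For the "open everything in binary mode" approach, here is a minimal sketch of a hypothetical wrapper around getc() that folds CRLF and a lone CR into a single '\n'. With it, the rest of the program can keep its "== '\n'" tests unchanged on every platform. It assumes the stream was opened with "rb" (or is a binary stream such as one from tmpfile()), so the C library has not already translated anything.

```c
#include <stdio.h>

/* Read one character from a stream opened in binary mode ("rb"),
   folding CRLF and a lone CR into a single '\n'. A lone LF is
   returned unchanged. LFCR files are NOT handled: their CR would be
   folded into a second '\n', producing an extra blank line. */
int getc_any_newline(FILE *fp)
{
    int c = getc(fp);
    if (c == '\r') {
        int next = getc(fp);
        if (next != '\n' && next != EOF)
            ungetc(next, fp);   /* lone CR: push back the byte after it */
        return '\n';
    }
    return c;
}
```

The sketch relies only on the one character of pushback that the C standard guarantees for ungetc(). Converting files to a single format is then just a loop that copies getc_any_newline() output to a file opened with "wb".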

-- 
Floyd L. Davidson           <http://web.newsguy.com/floyd_davidson>
Ukpeagvik (Barrow, Alaska)                         floyd@barrow.com
