Re: Sed one liner (!?) for joining lines
From: Icarus Sparry (usenet_at_icarus.freeuk.com)
Date: 05/25/05
Date: Wed, 25 May 2005 14:16:46 GMT
On Wed, 25 May 2005 06:25:59 -0700, artem0nnospam wrote:
> I use ssed and on windows I need to join all text lines (terminated
> CR/LF) that begin with a lower case letter [a-z] and only those. anyone
> can help?
>
> these examples seem very close to what I need, but am confused about
> the ":a ta" elements of the script, not found in help file... @gnu
>
> # 14. If a line begins with an equal sign, append it to the
> # previous line (and replace the "=" with a single space).
>> sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D' file
The ':a' creates a label (a) that your script can branch to.
The 'ta' is a conditional branch to a: it branches only if a substitution
has succeeded since the current input line was read or since the last 't'
command was executed. See the manual for the 't' command.
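A tiny sketch of the label/branch pair on its own, using a made-up input line rather than anything from the original question: the substitution is repeated until it stops matching, which squeezes a run of spaces down to one. The variable name is only for illustration.

```shell
# Made-up input with three spaces between the two words.
# :a marks the start of the loop; 's/  / /' turns two spaces into one;
# 'ta' jumps back to :a only if that substitution succeeded.
squeezed=$(printf 'a   b\n' | sed -e :a -e 's/  / /' -e ta)
printf '%s\n' "$squeezed"
```

This should print 'a b': the loop runs twice with a successful substitution, then falls through once the 's' no longer matches.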
So your script, written out with one command per line and comments, is:
# set up a loop
:a
# If it is not the last line then append the next line
# with an embedded newline
$!N
# Change the newline, followed by an equals sign, to a space
s/\n=/ /
# If that change worked (i.e. the next line did start with an =)
# then go back to the loop, otherwise go onto the next instruction
ta
# Print out the current buffer, up to the newline
P
# Delete up to the newline, and go to the start of the script
D
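To see the commands above in action, here is a small run of the original one-liner against made-up sample input (the four lines below are invented for illustration):

```shell
# Invented sample: lines starting with "=" continue the previous line.
input='first
=second
=third
fourth'

# Same script as the one-liner from the question.
joined=$(printf '%s\n' "$input" | sed -e :a -e '$!N;s/\n=/ /;ta' -e 'P;D')
printf '%s\n' "$joined"
```

The two '=' lines should be folded into the first, giving 'first second third' followed by 'fourth' on its own line.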
If you change the 's/\n=/ /' to
s/\n\([a-z]\)/ \1/
which is 'change a newline followed by a (remembered) single lowercase
letter to a space followed by the remembered text', it should do what you
want.
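A sketch of that modified script on invented input. The LC_ALL=C setting is my own precaution, not part of the suggestion above: in some locales '[a-z]' can also match upper case letters, and the C locale avoids that.

```shell
# Invented sample: lines 2 and 4 begin with a lower case letter.
input='The quick
brown fox
Jumps over
the lazy dog'

# Join each line that starts with [a-z] onto the line before it.
joined=$(printf '%s\n' "$input" |
  LC_ALL=C sed -e :a -e '$!N;s/\n\([a-z]\)/ \1/;ta' -e 'P;D')
printf '%s\n' "$joined"
```

This should give two lines, 'The quick brown fox' and 'Jumps over the lazy dog'.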
The only problem you might have is the CR half of the CRLF pair. Your sed
may remove it for you as part of its end-of-line processing. Otherwise you
can put a 's/CR//g', where CR is an actual CR character, just before the P.
Doing this on a command line might be tricky, so I would strongly suggest
that you put the sed commands into a file and then use
'sed -f command_filename' to do your work. This applies even more on
Windows, where cmd/command does not have the quoting facilities that the
unix shells offer.
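A rough sketch of the command-file approach, assuming a POSIX shell with mktemp available (on Windows you would more likely create the file in an editor). The temporary file, the variable names, and the sample input are all made up for illustration, and LC_ALL=C is again my own precaution so that '[a-z]' matches lower case only.

```shell
# Hold a literal CR character in a variable.
cr=$(printf '\r')

# Build the command file: one sed command per line.
script=$(mktemp)
{
  printf '%s\n' ':a' '$!N' 's/\n\([a-z]\)/ \1/' 'ta'
  printf 's/%s//g\n' "$cr"        # strip CRs just before the P
  printf '%s\n' 'P' 'D'
} > "$script"

# Invented CRLF-terminated input; "world" starts with a lower case letter.
result=$(printf 'Hello\r\nworld\r\nNext\r\n' | LC_ALL=C sed -f "$script")
printf '%s\n' "$result"
rm -f "$script"
```

The output should be 'Hello world' and 'Next', with the CR characters gone, so the result has plain LF line endings.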
> One consideration is memory as I am afraid as joining all
> the lines that begin with [a-z] (i.e. forming a paragraph) would pose
> problems for large text files?
Well, if the size of your paragraphs is bigger than about half your
available virtual memory, you have a problem. Sed only holds one
paragraph in memory at a time, so it can process files of almost any
size as long as it can do it in small pieces. So unless you are writing
a massive novel as a single paragraph, you should be OK.