Re: Removing lines from a plain text file
From: Brian K. White (brian_at_aljex.com)
Date: 06/17/04
- Previous message: Thom Price: "problem upgrading to Java 2 v1.3.1 on 5.0.6"
- In reply to: Brian K. White: "Re: Removing lines from a plain text file"
- Next in thread: John DuBois: "Re: Removing lines from a plain text file"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: Wed, 16 Jun 2004 18:49:24 -0400
Brian K. White wrote:
> Ronald J Marchand wrote:
>> "John DuBois" <spcecdt@deeptht.armory.com> wrote in message
>> news:40cf4970$0$434$8eec23a@newsreader.tycho.net...
>>> In article
>>> <96606$40cf188b$42a6716f$19245@msgid.meganewsservers.com>, Ronald J
>>> Marchand <ron@rojomar.com> wrote:
>>>> I know how to append lines, sort lines but I do not know if there
>>>> is a simple command to remove lines.
>>>>
>>>> Given a file, sorted that the first field is numeric as:
>>>> 1001 xxxxx
>>>> 1002 yyyyy
>>>> 1003 zzzzz
>>>> 1004 aaaaa
>>>> 1005 bbbbb
>>>>
>>>> Is there a simple command that will purge the first 3 of these
>>>> lines from the file???
>>>
>>> Based on what criteria?
>>>
>>> If you just want to get rid of the first 3 lines:
>>>
>>> tail +4 file > newfile
>>>
>>> If you want to get rid of the lines based on their content, some
>>> examples:
>>>
>>> awk '$1 !~ /^100[1-3]$/' file > newfile
>>> awk '$2 !~ /^(xxxxx|yyyyy|zzzzz)$/' file > newfile
>>> awk '$2 != "xxxxx" && $2 != "yyyyy" && $2 != "zzzzz"' file > newfile
>>>
>> Thanks to all. Apparently there is no simple command. The file
>> contains information about orders and at some point I need to remove
>> the older records. You can >> a file easily. I didn't know if you
>> could cut it down simply.
>>
>> The lines contain:
>> order_num $logname `who -mx` `date`
>>
>> Any ideas on how to use awk on the output from `date` to remove lines
>> older than a control date?
>>
>> Thanks,
>> Ron
>
> Do you really care exactly what lines get trimmed and when, then, or
> do you simply need to always keep the file from growing past a
> certain size by removing old lines from the beginning?
>
> See JPR's prune.c
> I or he can give you a binary no problem.
>
> You make a simple config file that looks like
> filename kbytes
> filename kbytes
> ...
>
> and put prune in a cron job.
>
> Every time it runs, the named files are clipped down to the specified
> size by removing the excess from the beginning of the file.
> It's meant to be used on such things as syslog, so I assume it somehow
> manages not to interfere with other processes that might be actively
> writing to the end of the same file even as it's removing stuff from
> the beginning. (I've never had syslogd crash for example, and I've
> tried experiments involving writing to a file busily right exactly
> while prune comes along and prunes it, and the file was not missing
> any new lines afterwards.)
>
> The config file specifies a size in "kbytes", but really it only
> comes as close as possible without breaking a line so it wouldn't
> create any invalid lines in your file.
> If your file contains multiple lines that should be considered as one
> unit, then it would invalidate the very oldest unit.
And I just looked at the .c file again and see that it's not actually "JPR's
prune.c".
And I realized that pruning the file by its size may not be practical if
the amount of data the file receives in a day varies hugely.
I.e., if you might get 300 megs per day for several days, and then 10k per day
for several days, then what size do you set for the file and how often do
you trim it? 1.5 gigs, trimmed once a day, would allow you to receive a
maximum of 500 megs in one day and retain a guaranteed minimum of only 3
days.
If your numbers are more consistent, and significantly smaller than that,
then it's a viable option.
If you _usually_ only get say 1 meg per day, and you want to keep a month on
hand, and you set a file size of 45 megs just to be generous, no problem,
except if it turns out to be even remotely possible that you might get 50
megs in one day, then the next day after that you wouldn't even have one
full day of history.
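A rough worked version of that arithmetic, as a sketch only: the history you
can count on is the size cap divided by the most you could receive in one day.
The function name min_days_kept is made up for illustration, and sizes are in
whole megabytes.

```shell
#!/bin/sh
# Hypothetical helper: worst-case number of full days of history kept,
# given a size cap and the largest daily volume, both in megabytes.
min_days_kept() {
    cap_mb=$1
    peak_mb_per_day=$2
    # Integer division: a partial day does not count as guaranteed history.
    echo $(( cap_mb / peak_mb_per_day ))
}

min_days_kept 1500 500   # 1.5 gig cap, 500 megs/day peak   -> 3
min_days_kept 45 1       # 45 meg cap, the usual 1 meg/day  -> 45
min_days_kept 45 50      # same cap, one 50 meg day         -> 0
```

The last case is the one to worry about: a single unusually busy day is enough
to push everything older out of the file.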
Can you inject data into the file and have it be ignored by the process that
normally reads it?
Can you once a day do something like this?
echo "STAMP`date +%Y%m%d`" >> file
If you can do that, then it's easy: you can use GNU date to get a
date from the past, and use awk to search through the file for it and discard
lines until you find it.
awk -v D=STAMP`gdate -d "100 days ago" +%Y%m%d` -v P=0 '($0 ~ D){P="1"};(P == "1"){print $0}' <file >file.new
translation:
once a day you add a line to the end of the file like:
    STAMP20040613
so the stamps are scattered through the file, so don't sort the whole file
anymore! Sort the additions before appending them.
use gdate to set a variable D to be the same format as the stamp line from
100 days ago
    STAMP20040308   (if run on 16 Jun 2004)
set another variable P to 0 initially, which will stand for "don't print"
the awk command then does two things on each line it reads from input:
1 - look to see if the line is the stamp line we are looking for (does $0
    contain D ?)
    if so, then turn printing on. (set P = 1)
2 - if printing is on (does P = 1 ?)
    then output the current input line (print $0)
End result is awk reads through the file from the beginning and outputs
nothing until it reaches the target stamp line, then outputs every line from
that point on.
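Here is the same idea as a small self-contained sketch you can try on a
scratch file. The function name prune_from_stamp and the sample stamps are made
up for illustration; the order lines are the ones from the original question.
One deliberate difference from the one-liner: it compares the whole line to
the stamp ($0 == D) rather than doing a regex match, so a stamp string that
happens to appear inside an order line won't switch printing on.

```shell
#!/bin/sh
# Sketch: print everything from a given stamp line onward.
# prune_from_stamp is a hypothetical name; $1 is the stamp text, $2 the file.
prune_from_stamp() {
    # P is unset (false) until a line equal to the stamp is seen; from then
    # on every line, including the stamp itself, is printed.
    awk -v D="$1" '$0 == D { P = 1 } P == 1 { print }' "$2"
}

# Made-up sample: order lines with a stamp appended at the end of each day.
sample=${TMPDIR:-/tmp}/orders.$$
cat > "$sample" <<'EOF'
1001 xxxxx
1002 yyyyy
STAMP20040612
1003 zzzzz
1004 aaaaa
STAMP20040613
1005 bbbbb
EOF

# Prints the STAMP20040613 line and everything after it.
prune_from_stamp STAMP20040613 "$sample"
rm -f "$sample"
```

In real use the first argument would come from something like
"STAMP`gdate -d '100 days ago' +%Y%m%d`". Note that if the stamp is not in the
file at all (no stamp was written that day, say), the output is empty, so
check that file.new is not empty before moving it over the original.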
-- Brian K. White -- brian@aljex.com -- http://www.aljex.com/bkw/ +++++[>+++[>+++++>+++++++<<-]<-]>>+.>.+++++.+++++++.-.[>+<---]>++. filePro BBx Linux SCO Prosper/FACTS AutoCAD #callahans Satriani