Re: delete using sed and line number file ....
- From: Ed Morton <morton@xxxxxxxxxxxxxx>
- Date: Fri, 21 Mar 2008 06:24:09 -0500
On 3/21/2008 3:40 AM, pk wrote:
Ed Morton wrote:
awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC
This is something I've been wondering about for a long time. If FileA and
FileB are very large, isn't the (FNR in skip) check inefficient? I mean, that
seems to imply a walk over the entire array to see whether the element exists
each time the condition is checked. (I may be wrong, of course, probably due
to my ignorance of the inner workings of awk.) Wouldn't something like this
awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC
be more efficient?
Could be, though I expect the "in" operator uses hashing, so it'd be close:
you're trading a hash lookup for an arithmetic increment plus an array index
plus a comparison.
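One difference between the two tests that timing alone won't show: in awk, referencing skip[FNR] creates an empty element when none exists, while "in" only tests membership. The sketch below illustrates that on tiny made-up files (the file contents, the temporary directory, and the variable names are assumptions for illustration, not from the thread). Both variants print the same lines, but the subscript variant leaves one array element per input line behind.

```shell
# Hypothetical demo files (not from the thread).
tmp=$(mktemp -d)
printf '%s\n' a b c d e > "$tmp/FileA"   # five lines of data
printf '%s\n' 2 4       > "$tmp/FileB"   # line numbers to delete

# Variant 1: "in" tests membership without touching the array.
out_in=$(awk 'NR==FNR{skip[$0];next} !(FNR in skip)' "$tmp/FileB" "$tmp/FileA")

# Variant 2: referencing skip[FNR] creates an empty element when none exists.
out_idx=$(awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' "$tmp/FileB" "$tmp/FileA")

# Count the elements left in skip once FileA has been read.
count='END{n=0; for (k in skip) n++; print n}'
n_in=$(awk 'NR==FNR{skip[$0];next} {x = (FNR in skip)} '"$count" "$tmp/FileB" "$tmp/FileA")
n_idx=$(awk 'NR==FNR{skip[$0]++;next} {x = (skip[FNR]==0)} '"$count" "$tmp/FileB" "$tmp/FileA")

printf 'in: %s elements, subscript: %s elements\n' "$n_in" "$n_idx"
rm -rf "$tmp"
```

With the five-line FileA above this reports 2 elements for the "in" variant and 5 for the subscript variant, so on a large FileA the second form may use noticeably more memory even if the run time is similar.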
Here's the result of running both scripts twice, deleting every odd-numbered
line in a 1-million-line file, using GNU awk 3.1.6:
$ wc -l FileB FileA
500000 FileB
1000000 FileA
1500000 total
$ time awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC
real 0m29.016s
user 0m28.546s
sys 0m0.328s
$ time awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC
real 0m29.558s
user 0m29.015s
sys 0m0.436s
$ time awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC
real 0m28.915s
user 0m28.484s
sys 0m0.483s
$ time awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC
real 0m29.502s
user 0m29.186s
sys 0m0.327s
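In the runs above the two variants come out within roughly 2% of each other. For anyone who wants to repeat the comparison, here is a minimal sketch of a test setup. The post doesn't say what FileA contained, so the sketch assumes each line simply holds its own line number; the N override, the temporary directory, and the FileC.in / FileC.idx names are likewise assumptions for illustration.

```shell
n=${N:-1000000}   # lines in FileA; N is a hypothetical override, not from the thread
tmp=$(mktemp -d)

# Assumed file contents: FileA holds its own line numbers, FileB the odd ones.
awk -v n="$n" 'BEGIN{for (i = 1; i <= n; i++) print i}'    > "$tmp/FileA"
awk -v n="$n" 'BEGIN{for (i = 1; i <= n; i += 2) print i}' > "$tmp/FileB"

# The two variants from the thread, each writing its own output file.
awk 'NR==FNR{skip[$0];next}!(FNR in skip)'  "$tmp/FileB" "$tmp/FileA" > "$tmp/FileC.in"
awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' "$tmp/FileB" "$tmp/FileA" > "$tmp/FileC.idx"

# Both variants should keep exactly the even-numbered lines.
cmp -s "$tmp/FileC.in" "$tmp/FileC.idx" && same=yes || same=no
kept=$(wc -l < "$tmp/FileC.in" | tr -d ' ')
first=$(head -n 1 "$tmp/FileC.in")
printf 'identical=%s kept=%s first=%s\n' "$same" "$kept" "$first"
rm -rf "$tmp"
```

Prefixing each of the two filtering awk commands with the shell's time facility, where available, gives figures comparable to the ones above, though absolute times will depend on the machine and the awk implementation.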
Regards,
Ed.