Re: delete using sed and line number file ....



On 3/21/2008 3:40 AM, pk wrote:
Ed Morton wrote:


awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC


This is something I've been wondering about for a long time. If FileA and FileB
are very large, isn't the (FNR in skip) check inefficient? I mean, it seems to
imply a walk over the entire array to see whether the element exists each
time the condition is checked. (I may be wrong of course, probably due to
my ignorance about the inner workings of awk.) Wouldn't something like this

awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC

be more efficient?

Could be, though I expect the "in" operator uses hashing, so the two should be
close: you're trading a hash lookup for an arithmetic increment plus an index
plus a comparison.
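
One practical difference between the two forms, separate from speed: in awk, merely referencing skip[FNR] creates that element if it doesn't exist yet, whereas "in" only tests for it. A minimal sketch on a six-line sample (the file contents and the FileC_in/FileC_ref names are made up for illustration, and the END blocks are added only to count elements):

```shell
# Small sample: FileA has 6 lines, FileB lists the line numbers to delete.
printf '%s\n' a b c d e f > FileA
printf '%s\n' 1 3 5 > FileB

# Membership test with "in": does not create array elements.
awk 'NR==FNR{skip[$0];next} !(FNR in skip)' FileB FileA > FileC_in

# Reference-and-compare: skip[FNR] springs into existence for every line tested.
awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC_ref

# Both forms should select the same lines.
cmp FileC_in FileC_ref && echo "outputs identical"

# Count the elements left in skip at the end of each run.
n_in=$(awk 'NR==FNR{skip[$0];next} !(FNR in skip){} END{n=0; for(k in skip)n++; print n}' FileB FileA)
n_ref=$(awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0{} END{n=0; for(k in skip)n++; print n}' FileB FileA)
echo "in: $n_in elements, reference: $n_ref elements"
```

On this sample the "in" version ends with 3 elements and the reference version with 6, so on large inputs the second form also grows the array by one entry per kept line.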

Here's the result of running both scripts twice deleting every odd-numbered line
in a 1-million line file using GNU awk 3.1.6:

$ wc -l FileB FileA
500000 FileB
1000000 FileA
1500000 total
$ time awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC

real 0m29.016s
user 0m28.546s
sys 0m0.328s
$ time awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC

real 0m29.558s
user 0m29.015s
sys 0m0.436s
$ time awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC

real 0m28.915s
user 0m28.484s
sys 0m0.483s
$ time awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC

real 0m29.502s
user 0m29.186s
sys 0m0.327s
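
In case anyone wants to repeat this, here is one way the test files could be generated. This is a sketch under assumptions, not necessarily how the files above were made: FileA is assumed to hold just its own line numbers as content, and FileB the odd numbers.

```shell
# Assumed layout: FileA = 1,000,000 lines, FileB = the 500,000 odd line numbers.
awk 'BEGIN{for(i=1;i<=1000000;i++) print i}' > FileA
awk 'BEGIN{for(i=1;i<=1000000;i+=2) print i}' > FileB
wc -l FileB FileA

# Run both variants and check that they agree.
awk 'NR==FNR{skip[$0];next}!(FNR in skip)' FileB FileA > FileC
awk 'NR==FNR{skip[$0]++;next} skip[FNR]==0' FileB FileA > FileC2
cmp FileC FileC2 && echo "same result"
```

Prefix either awk command with "time", as in the runs above, to compare them on your own awk.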

Regards,

Ed.
