Re: Comparing files
From: xyz (persson_at_katamail.com)
Date: 03/23/04
- Next message: John L: "Re: Comparing files"
- Previous message: Chris F.A. Johnson: "Re: how to check existence of multiple files in if-clause ?"
- In reply to: John L: "Re: Comparing files"
- Next in thread: John L: "Re: Comparing files"
- Reply: John L: "Re: Comparing files"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 22 Mar 2004 23:35:15 -0800
"John L" <jl@lammtarra.fslife.co.uk> wrote in message news:<c3nbmv$405$1@newsg4.svr.pol.co.uk>...
> You can use the join command for this.
> (It performs a relational join, as understood by
> the database people.)
>
> join -1 1 -2 1 -a 1 -a 2 file1 file2
>
> This will output all the things you want, as well as
> rows which are the same in both files -- but it is
> simple to use awk or perl to remove these, since
> the last and last-but-one fields will be equal:
>
> join .... | awk '$NF != $(NF-1)'
Actually, this is nearly exactly what I need. The only downside is
that the output from the above won't let me tell if a given unpairable
line comes from file1 or file2, so I think I will use a trick like
this:
awk '{ print "* " $0 }' file2 | join -1 1 -2 2 -a 1 -a 2 file1 -
This way I know that, in the output, unpairable lines with 3 fields
(carrying the "*" as their second field, since join prints the join
field first) come from file2, and lines with 2 fields come from
file1. The awk for the subsequent removal will thus be
awk '$NF != $(NF - 2)'
since all the remaining (paired) output lines will have 4 fields.
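A small worked run of the marker trick, as a sketch only: the sample data, the two-column "key value" layout and the temporary directory are assumptions made for illustration, not the real file1 and file2.

```shell
# Hypothetical sample data: two files of "key value" lines, sorted on the key.
tmp=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$tmp/file1"
printf 'b 2\nc 9\nd 4\n' > "$tmp/file2"

# Prefix every file2 line with "* ", join file1 field 1 against file2 field 2,
# keep unpairable lines from both sides, then drop paired lines whose two
# values are equal.
marked=$(awk '{ print "* " $0 }' "$tmp/file2" |
    join -1 1 -2 2 -a 1 -a 2 "$tmp/file1" - |
    awk '$NF != $(NF - 2)')

printf '%s\n' "$marked"
rm -rf "$tmp"
```

With this sample data the surviving lines are "a 1" (2 fields, only in file1), "c 3 * 9" (4 fields, values differ) and "d * 4" (3 fields, only in file2).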
Even better, since I create file1 and file2 myself with a script, I
can modify the script to create file2 with "* " at the beginning from
the start, so it's already in the right format for the join, and of
course the "* " can always be stripped out later. I think the -o
option could be useful too (I need to read the man page carefully).
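One way the -o option might help, sketched under assumptions: the sample data and the MISSING placeholder are made up for illustration, and the "0" entry in the -o list (meaning the join field) is in POSIX join but may be absent from older implementations. With a fixed output layout every line has 3 fields, so no "* " marker is needed.

```shell
# Hypothetical sample data: two files of "key value" lines, sorted on the key.
tmp=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$tmp/file1"
printf 'b 2\nc 9\nd 4\n' > "$tmp/file2"

# -o 0,1.2,2.2 prints the join field, then field 2 of file1 and of file2.
# -e MISSING fills the column of whichever file had no matching line.
# The awk filter then drops lines whose two values are equal.
fixed=$(join -a 1 -a 2 -e MISSING -o 0,1.2,2.2 "$tmp/file1" "$tmp/file2" |
    awk '$2 != $3')

printf '%s\n' "$fixed"
rm -rf "$tmp"
```

Here MISSING in the third column marks a key found only in file1, and MISSING in the second column a key found only in file2. The placeholder has to be a string that can never occur as a real value.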
> One slight complication is that join needs its input
> to have been sorted, but if this is a problem for you
> then there are ways round it.
As I said before, this is not a problem since I create file1 and file2
myself.
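For input that does not arrive sorted, one possible way round it is to sort on the join field first. This is a sketch with made-up file names and data; running sort and join under the same locale (LC_ALL=C here) is an assumption chosen so that both tools agree on the ordering.

```shell
# Hypothetical unsorted inputs of "key value" lines.
tmp=$(mktemp -d)
printf 'c 3\na 1\nb 2\n' > "$tmp/unsorted1"
printf 'd 4\nc 9\nb 2\n' > "$tmp/unsorted2"

# Sort each file on field 1 only, then join under the same locale.
LC_ALL=C sort -k 1,1 "$tmp/unsorted1" > "$tmp/file1"
LC_ALL=C sort -k 1,1 "$tmp/unsorted2" > "$tmp/file2"
joined=$(LC_ALL=C join -a 1 -a 2 "$tmp/file1" "$tmp/file2")

printf '%s\n' "$joined"
rm -rf "$tmp"
```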
> Have a quick look at the diff command. It might be useful
> *if* your input files have constrained formats.
> Likewise the comm command.
These were the alternatives I was considering before posting, but none
of them really does what I need (mainly because the output they
produce is difficult to parse easily - to me at least).
> The other way to approach it is to use a scripting
> language like awk, perl or python to process the two
> files in turn, building an associative array (or hash)
> of the first file and then using this to compare with
> the second. This is close to what you are trying to
> avoid (above) but is probably quick enough for most
> purposes.
Maybe I can try this alternative if the other way turns out to be
*very* inefficient (something I don't think will happen); otherwise I
think I'll stay with the first option you proposed.
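For comparison, the associative-array approach could look roughly like this in awk. It is a sketch only: the "key value" layout, the sample data and the three labels are assumptions, and the NR == FNR test for "still reading the first file" misbehaves if file1 is empty.

```shell
# Hypothetical sample data; this approach does not need sorted input.
tmp=$(mktemp -d)
printf 'a 1\nb 2\nc 3\n' > "$tmp/file1"
printf 'b 2\nc 9\nd 4\n' > "$tmp/file2"

diffs=$(awk '
    # First file: remember the value stored under each key.
    NR == FNR     { old[$1] = $2; next }
    # Second file: key unknown to file1.
    !($1 in old)  { print "only-in-file2", $1, $2; next }
    # Key present in both files but with a different value.
    old[$1] != $2 { print "changed", $1, old[$1], $2 }
    # Key seen in both files, so forget it.
                  { delete old[$1] }
    # Keys never seen in file2 exist only in file1.
    END { for (k in old) print "only-in-file1", k, old[k] }
' "$tmp/file1" "$tmp/file2" | sort)

printf '%s\n' "$diffs"
rm -rf "$tmp"
```

The final sort is only there to make the output order predictable, since awk's "for (k in old)" visits keys in an unspecified order.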
Many thanks for now.