Re: is there a utility to compare two non-text files?




Loki Harfagr wrote:
On Thu, 30 Mar 2006 19:39:42 +0200, Janis Papanagnou wrote:

Loki Harfagr wrote:
On Wed, 29 Mar 2006 11:43:07 -0800, RolandRB wrote:

I know I can use diff to compare two text files but is there a utility
to compare two large non-text files?

If you don't care about "seeing the differences" and just want
a diagnostic of similarity, you could also use checksums such as
md5sum and sha1sum.
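For illustration only, here is a minimal Python sketch of the checksum idea, assuming Python's standard hashlib module stands in for the md5sum/sha1sum command-line tools. The helper names file_digest and same_by_checksum and the 64 KiB chunk size are hypothetical choices, not anything the posters used. Note that the whole of each file is read before any verdict is possible.

```python
import hashlib


def file_digest(path: str, algorithm: str = "sha1", chunk_size: int = 1 << 16) -> str:
    """Return the lowercase hex digest of a file (same form sha1sum/md5sum print)."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        # A checksum covers every byte, so the loop always runs to end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def same_by_checksum(path_a: str, path_b: str, algorithm: str = "sha1") -> bool:
    """Equal digests strongly suggest, but do not strictly prove, equal contents."""
    return file_digest(path_a, algorithm) == file_digest(path_b, algorithm)
```

The digests are also useful on their own when one of the two files is not at hand, for example when only a previously recorded checksum of the reference version survives.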

By similarity you mean equality?

But wouldn't you then just use diff -s (available with
GNU diff; does that option exist on other Unix systems?)
and evaluate the return status?
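As a rough Python analogue of "compare, then evaluate the return status", the sketch below uses the standard library's filecmp.cmp with shallow=False (a real content comparison) and maps the result onto the usual diff exit-status convention of 0 for identical, 1 for different and 2 for trouble. The function names compare_status and main are hypothetical; this is a stand-in for the idea, not a reimplementation of diff -s.

```python
import filecmp
import sys
from typing import Sequence


def compare_status(path_a: str, path_b: str) -> int:
    """Return 0 if the files have identical contents, 1 if they differ, 2 on error."""
    try:
        # shallow=False compares contents rather than trusting os.stat()
        # signatures; files of different size are still rejected quickly.
        return 0 if filecmp.cmp(path_a, path_b, shallow=False) else 1
    except OSError as err:  # e.g. a missing or unreadable file
        print(f"compare: {err}", file=sys.stderr)
        return 2


def main(argv: Sequence[str]) -> int:
    """Hypothetical command-line entry point: expects exactly two file names."""
    if len(argv) != 2:
        print("usage: compare.py FILE1 FILE2", file=sys.stderr)
        return 2
    return compare_status(argv[0], argv[1])
```

Wired up as `sys.exit(main(sys.argv[1:]))`, the script can be tested from a shell or cron job through its exit status, just like diff.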

The problem with the checksum approach is that it performs
calculations that are unnecessary for ad-hoc comparisons,
and the checksum calculation would usually read the whole
file even if the difference is in the first octet.
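To make the early-exit point concrete, here is a hedged Python sketch of a direct comparison that stops reading as soon as a mismatching chunk is found. The name first_difference and the default chunk size are illustrative assumptions; the returned offset is 0-based.

```python
from typing import Optional


def first_difference(path_a: str, path_b: str, chunk_size: int = 1 << 16) -> Optional[int]:
    """Return the 0-based offset of the first differing byte, or None if identical.

    Unlike a checksum, this stops at the first mismatching chunk, so a
    difference in the first octet costs only one chunk read per file.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            ca = fa.read(chunk_size)
            cb = fb.read(chunk_size)
            if ca != cb:
                # Locate the exact byte inside the mismatching chunks.
                for i, (x, y) in enumerate(zip(ca, cb)):
                    if x != y:
                        return offset + i
                # One chunk is a strict prefix of the other: lengths differ.
                return offset + min(len(ca), len(cb))
            if not ca:  # both files ended at the same point with no mismatch
                return None
            offset += len(ca)
```

A None result means the files are byte-for-byte equal; any integer tells you where to start looking, which a checksum alone cannot do.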

Agreed. It was just a suggestion, on the idea that the OP might be
asking as part of a wider question about the probability that a file
had or had not been tainted compared with another version of the
same file, but I guess my phrasing was not clear enough :-)
My problem (well, one of them ;-) is my tendency to try to
guess what the *real* question behind the scenes is; I reckon
that sometimes makes me give strange answers to sharp questions :-)

Worse still, I got stuck in my 'forensic' habits and proposed
checksum tools, completely forgetting 'diff -s'. Thank you
for reminding me of this plain solution; oh boy, now I feel stoopid :D

Anyway, as I use GNU tools on most of my servers, I guess I should
also thank you again, because some heavy check scripts in 'cron'
are now going to be improved!

Actually, I was trying to compare two SAS data sets of identical
size. In the end, only a native SAS procedure could do it.
