Correlating Files, Compute similarity threshold?

From: Marc David Ronell (marc_ronell_at_highstream.net)
Date: 03/25/04


Date: 25 Mar 2004 08:38:13 -0500


Is there a UNIX utility, like diff or cmp, that compares or correlates
two files but reports its result as a figure of merit describing how
similar the files are?

The idea is to cross-correlate a group of files and locate the files
that are most similar. I would like to use a utility like the one
described above to get a rough cut, and then manually review the file
pairs that score above a certain cross-correlation threshold.
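
To make the intended workflow concrete, here is a rough sketch in Python
rather than a ready-made UNIX utility. It assumes the files are text and
uses the line-based matching ratio from the standard difflib module as
the figure of merit, which is only one possible measure and not a true
cross-correlation. The function names and the 0.8 threshold are
hypothetical, chosen purely for illustration.

```python
import difflib
import itertools
from pathlib import Path


def similarity(path_a, path_b):
    """Return a figure of merit in [0.0, 1.0]; 1.0 means the lines match exactly."""
    # Assumption: files are text; undecodable bytes are replaced, not fatal.
    lines_a = Path(path_a).read_text(errors="replace").splitlines()
    lines_b = Path(path_b).read_text(errors="replace").splitlines()
    return difflib.SequenceMatcher(None, lines_a, lines_b).ratio()


def similar_pairs(paths, threshold=0.8):
    """Score every pair of files and keep those at or above the threshold."""
    scored = []
    for a, b in itertools.combinations(paths, 2):
        score = similarity(a, b)
        if score >= threshold:
            scored.append((score, str(a), str(b)))
    # Most similar pairs first, ready for manual review.
    return sorted(scored, reverse=True)
```

Calling similar_pairs() on a list of file paths would return the pairs
worth reviewing by hand, most similar first. Every pair is compared, so
the cost grows with the square of the number of files.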

Thanks,

marc