Re: parse unix-style difference reporting

From: Jonathan Leffler (jleffler_at_earthlink.net)
Date: 12/30/03


Date: Tue, 30 Dec 2003 07:53:19 GMT

Liang wrote:
> I want to diff two files, or two versions of one file, and parse the output
> to get a summary of how many lines were replaced, added, and deleted between
> the two files.
>
> The output of diff/cleardiff looks like:
> 15a16, 15,17d3, 18c19,21 etc.
>
> Does anyone know how to parse this output to generate a summary?

It isn't very hard to work it out, is it?

Each item conceptually has four numbers and an operation code:

N1,N2 op N3,N4

When there is just one number on one side of the operation, the values
N1 and N2, or N3 and N4, are the same.
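A minimal Python sketch of that normalization, assuming the default ("normal") output format of diff. Unified (-u) or context (-c) output, or any cleardiff option that prints something different, would need another pattern. The names `COMMAND_RE` and `parse_command` are illustrative, not part of any tool:

```python
import re

# A normal-format diff command line: "15a16", "15,17d3", "18c19,21", ...
# One or two numbers on each side of a single-letter operation code.
COMMAND_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def parse_command(line):
    """Return (n1, n2, op, n3, n4) for a diff command line, or None.

    When a side has only one number, it is used for both ends of that
    range, so N1 == N2 (or N3 == N4).
    """
    m = COMMAND_RE.match(line.strip())
    if m is None:
        return None  # a "<", ">" or "---" line, not a command
    n1 = int(m.group(1))
    n2 = int(m.group(2)) if m.group(2) is not None else n1
    n3 = int(m.group(4))
    n4 = int(m.group(5)) if m.group(5) is not None else n3
    return n1, n2, m.group(3), n3, n4


for cmd in ("15a16", "15,17d3", "18c19,21"):
    print(cmd, "->", parse_command(cmd))
```

For the three sample commands this gives (15, 15, 'a', 16, 16), (15, 17, 'd', 3, 3) and (18, 18, 'c', 19, 21).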

Inserts are easy: there's always a single number on the LHS, and the
number of lines inserted is N4-N3+1.

Similarly, deletes are easy: there's always a single number on the RHS
of the operator, and the number of lines deleted is N2-N1+1.
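Applied to the sample commands from the question, the arithmetic looks like this (a minimal Python sketch; the helper names are illustrative only):

```python
def lines_inserted(n3, n4):
    # "a" command, e.g. "15a16": lines N3..N4 of the second file are added.
    return n4 - n3 + 1


def lines_deleted(n1, n2):
    # "d" command, e.g. "15,17d3": lines N1..N2 of the first file are removed.
    return n2 - n1 + 1


print(lines_inserted(16, 16))  # "15a16": N3 = N4 = 16, prints 1
print(lines_deleted(15, 17))   # "15,17d3": N1 = 15, N2 = 17, prints 3
```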

The number of lines replaced has two parts: the number of lines
removed and the number of lines replacing them. Depending on your
viewpoint, you can either count the two values separately (number
removed NR = N2-N1+1, number inserted NI = N4-N3+1), or you can be
cleverer about the calculation and decide that when NR > NI, you have
NI changed lines and NR-NI deleted lines, and that when NR < NI, you
have NR changed lines and NI-NR inserted lines. When NR = NI, you have
NR (or NI) changed lines, of course. In every case the number of
changed lines is min(NR, NI).
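Putting the pieces together, here is a minimal Python sketch that tallies a whole diff listing using the "cleverer" counting for changes. It assumes diff's default (normal) output format; the names `COMMAND_RE` and `summarize` are illustrative only, and the text lines beginning with `<`, `>` or `---` are simply skipped:

```python
import re

# A normal-format diff command line: "15a16", "15,17d3", "18c19,21", ...
COMMAND_RE = re.compile(r"^(\d+)(?:,(\d+))?([acd])(\d+)(?:,(\d+))?$")


def summarize(diff_output):
    """Return counts of changed, deleted and inserted lines.

    For a "c" command, min(NR, NI) lines count as changed and any
    surplus counts as deleted (NR > NI) or inserted (NI > NR).
    """
    totals = {"changed": 0, "deleted": 0, "inserted": 0}
    for line in diff_output.splitlines():
        m = COMMAND_RE.match(line)
        if m is None:
            continue  # "<", ">" and "---" lines hold file text, not counts
        n1 = int(m.group(1))
        n2 = int(m.group(2)) if m.group(2) is not None else n1
        op = m.group(3)
        n3 = int(m.group(4))
        n4 = int(m.group(5)) if m.group(5) is not None else n3
        nr = n2 - n1 + 1  # lines removed from the first file
        ni = n4 - n3 + 1  # lines supplied by the second file
        if op == "a":
            totals["inserted"] += ni
        elif op == "d":
            totals["deleted"] += nr
        else:  # "c": split into changed lines plus any surplus
            changed = min(nr, ni)
            totals["changed"] += changed
            totals["deleted"] += nr - changed
            totals["inserted"] += ni - changed
    return totals


sample = """\
15a16
> new line
18c19,21
< old line
---
> first
> second
> third
25,27d27
< gone 1
< gone 2
< gone 3
"""
print(summarize(sample))  # {'changed': 1, 'deleted': 3, 'inserted': 3}
```

If you prefer the first viewpoint, add `nr` and `ni` to separate "removed" and "added" totals for `c` commands instead of splitting them.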

That took me five minutes to think and type - how long would it have
taken you to do it? (And cross-posted too?)

-- 
Jonathan Leffler                   #include <disclaimer.h>
Email: jleffler@earthlink.net, jleffler@us.ibm.com
Guardian of DBD::Informix v2003.04 -- http://dbi.perl.org/