Re: SORT by text fields

From: Kevin Collins (spamtotrash_at_toomuchfiction.com)
Date: 12/23/03


Date: 22 Dec 2003 17:14:46 -0800

Tapani Tarvainen <gn20031221T182819@tt.oma.it.jyu.fi> wrote in message news:<n6ad5myvy2.fsf@tt.oma.it.jyu.fi>...
> Gnarlodious <gnarlodiousNULL@VOID.invalid.yahoo.com> writes:
>
> > I want to sort lines according to fields containing variable-length text.
> >
> > <td class=Name>variableLengthText<td class=Artist>variableLengthText<td
> > class=Album>variableLengthText<td class=Genre>variableLengthText<td
> > class=Size>variableLengthText<td class=Year>variableLengthText
> >
> > Let's say I want to sort using the "Genre" cell, what "sort" options will do
> > that?
>
> Try
>
> sort -t'>' -k5

Even assuming that works as requested (in my opinion it won't work
very well unless the rest of the HTML can be removed first), -k5 sorts
on everything _starting_ at field 5 through the end of the line, not
on that field alone, so the cells after Genre also affect the order.
A key limited to a single field is written -k5,5.
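
To make the -k5 versus -k5,5 point concrete, here is a minimal sketch.
The function name sort_by_field5 is made up for this example, and
LC_ALL=C is my own assumption to get plain byte-order comparison. Note
that with '>' as the separator, field 5 still carries the trailing
"<td class=Size" text, so this only approximates sorting on the Genre
text itself.

```shell
# sort_by_field5 is a hypothetical helper name. It reads the files given
# as arguments, or standard input when there are none.
sort_by_field5() {
    # -t'>'   : split each line into fields on '>'
    # -k5,5   : the key starts AND ends at field 5
    #           (plain -k5 would run from field 5 to the end of the line)
    # LC_ALL=C: compare bytes, independent of the user's locale (assumption)
    LC_ALL=C sort -t'>' -k5,5 "$@"
}
```

Usage would be something like "sort_by_field5 tracks.txt", where
tracks.txt is a hypothetical file holding only the table rows.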

Doing this properly takes more than a single sort command. It could
be done with various tools, although I would use Perl.

The following does what you need, assuming the input contains only
lines in the format described above and no other HTML:

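#!/usr/bin/perl
use strict;
use warnings;

my %lines;

while (<>)
{
    chomp;
    # Each line starts with a <td ...> tag, so $F[0] is empty and
    # $F[4] is the Genre cell.
    my @F = split(/<td class=[^>]*>/);
    my $key = defined($F[4]) ? $F[4] : '';

    # Keep a list per key so lines sharing a genre are not overwritten.
    push(@{$lines{$key}}, $_);
}

foreach my $key (sort(keys(%lines)))
{
    print "$_\n" foreach (@{$lines{$key}});
}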

In this example $F[4] is the 5th field produced by splitting on the
<td ...> tags. Because every line starts with a tag, $F[0] is empty
and $F[4] is the Genre cell. A one-liner might look like this:

perl -an -F'<td class=[^>]*>' -e 'push @{$lines{$F[4]}}, $_;
END { print @{$lines{$_}} foreach sort keys %lines }'

Since this is part of an HTML table definition you are asking for
trouble unless you have some way to pull these lines out of the rest
of the HTML and then put them back in... it can be done :)
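
As a rough sketch of the "pull them out and put them back" idea, using
only grep, awk, sort and cut: the function name sort_rows_by_genre is
hypothetical, and I am assuming that each table row sits on one line
starting with '<td class=Name>', that the Genre text contains no tab
characters, and that mktemp is available. Treat it as an illustration,
not a tested tool for real HTML.

```shell
# sort_rows_by_genre FILE   (hypothetical helper)
# Sorts only the lines that start with '<td class=Name>' by their Genre
# cell and writes FILE to standard output with every other line left
# where it was.
sort_rows_by_genre() {
    file=$1
    tmp=$(mktemp) || return 1

    # Pull the rows out, prefix each with its Genre text and a tab,
    # sort on that prefix alone, then strip the prefix again.
    grep '^<td class=Name>' "$file" |
        awk '{
            key = $0
            sub(/.*<td class=Genre>/, "", key)   # drop everything before the cell
            sub(/<td class=.*/, "", key)         # drop everything after it
            print key "\t" $0
        }' |
        LC_ALL=C sort -t "$(printf '\t')" -k1,1 |
        cut -f2- > "$tmp"

    # Put them back: each row position in FILE receives the next sorted row.
    awk -v rows="$tmp" '
        /^<td class=Name>/ {
            if ((getline line < rows) > 0) print line
            next
        }
        { print }
    ' "$file"

    rm -f "$tmp"
}
```

You would run it as "sort_rows_by_genre page.html > sorted.html", with
both file names being placeholders.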

Good luck,

Kevin