Re: sometimes awk works and sometimes /usr/xpg4/bin/awk works ..

From: Dennis Clarke (dclarke_at_blastwave.org)
Date: 08/12/03


Date: Mon, 11 Aug 2003 18:24:35 -0700

On Mon, 11 Aug 2003, Richard L. Hamilton wrote:
>In article <Pine.GSO.4.53.0308111201590.3426@blastwave>,
> Dennis Clarke <dclarke@blastwave.org> writes:
>>
>>
>> just a rant ..
>>
>> Sometimes it seems like I should just link /usr/bin/awk to /usr/xpg4/bin/awk
>>
>> $ grep "1000A-10-ENC" foo.dat | awk 'BEGIN{FS=";"}{print $1 "\t" $4 }'
>> awk: record `1000A-10-ENC; ;2990....' has too many fields
>> record number 1
>>
>>
>> $ grep "1000A-10-ENC" foo.dat | /usr/xpg4/bin/awk 'BEGIN{FS=";"}{print $1 "\t" $4 }'
>> 1000A-10-ENC 4
>
>Here's the results of some brute force testing of a line like
>
> perl -e 'print "x " x '"${f}"';' | ${awk} '{x=$'"${f}"'}'
>
>for values of ${f} starting with 1 and different flavors of awk for ${awk}:
>
>
>awk: trying to access field 100
>oawk failed with 100 fields

Yep, I get the same problem. I extracted data from a database and there
are 768 fields per record. Kaboom.
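
For anyone who wants to repeat the incrementing test, here is a minimal
sketch. It assumes a POSIX shell; AWK, MAX and can_access_field are
illustrative names that do not appear in the thread, and the record is
built with yes/head/tr as a stand-in for the perl one-liner quoted above.

```shell
#!/bin/sh
# Sketch: find the first field count at which a given awk gives up.
# AWK and MAX are hypothetical knobs; MAX only keeps the run short.
AWK=${AWK:-awk}
MAX=${MAX:-128}

# Build one record of $1 fields ("x x x ...") and ask awk for the last field.
# The exit status of the pipeline is awk's exit status.
can_access_field() {
    { yes x | head -n "$1" | tr '\n' ' '; echo; } |
        "$AWK" '{ x = $'"$1"' }' 2>/dev/null
}

f=1
while [ "$f" -le "$MAX" ]; do
    if ! can_access_field "$f"; then
        echo "$AWK failed with $f fields"
        break
    fi
    f=$((f + 1))
done

if [ "$f" -gt "$MAX" ]; then
    echo "$AWK handled every count up to $MAX"
fi
```

Run it as AWK=/usr/xpg4/bin/awk MAX=5000 sh probe.sh (probe.sh is a
made-up file name) to cover the larger limits mentioned in this thread.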

>
>nawk: trying to access field 500
> source line number 1
> context is
> >>> {x=$500 <<< }
>nawk failed with 500 fields

I did not try anything else other than /usr/xpg4/bin/awk so I wouldn't know.
You seem to have nailed down the situation quite neatly though.
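
Rather than relinking /usr/bin/awk, a script can pick its own awk at
startup. This is only a sketch: the preference order is an assumption
drawn from the limits reported in this thread, the sample record is made
up, and it needs a shell that supports "command -v".

```shell
#!/bin/sh
# Sketch: choose the most capable awk that is installed.
# The candidate order is an assumption, not a recommendation from Sun.
AWK=
for candidate in gawk /usr/xpg4/bin/awk nawk awk; do
    if command -v "$candidate" >/dev/null 2>&1; then
        AWK=$candidate
        break
    fi
done
[ -n "$AWK" ] || { echo "no awk found" >&2; exit 1; }

# Same extraction as the one-liner at the top of the thread,
# applied to a made-up semicolon-separated record.
printf '%s\n' 'PART-1;;2990;4' |
    "$AWK" 'BEGIN { FS = ";" } { print $1 "\t" $4 }'
```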

>
>/usr/xpg4/bin/awk: line 0 (NR=1): Too many fields (LIMIT: 4000)
>/usr/xpg4/bin/awk failed with 4001 fields

That is enough for all but the most demanding situations.

>
>/usr/xpg4/bin/awk does fairly well compared to the first two. But with
>gawk, I knew it would be hopeless to simply increment, so I went to doubling.
>It handled 2097152 fields just fine, but took long enough on the next doubling
>(4194304) that I didn't feel like waiting and killed it.

Okay, so it is reasonable to say that gawk would work even in those
situations where the number of fields per record is completely ridiculous.
I would guess that it keeps working given enough RAM. If you like, I can
test it on a V880 with 8 GB of RAM, just to see if it breaks at some
reasonable (or unreasonable) point.
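
The doubling approach could be scripted roughly like this. It is a
sketch under the same assumptions as any quick probe: AWK, CAP and
can_access_field are made-up names, and CAP stops the run long before
memory becomes the limit.

```shell
#!/bin/sh
# Sketch: probe an awk's field limit by doubling the field count.
# AWK and CAP are hypothetical knobs; raise CAP to push further.
AWK=${AWK:-awk}
CAP=${CAP:-4096}

# Build one record of $1 fields and ask awk for the last field.
can_access_field() {
    { yes x | head -n "$1" | tr '\n' ' '; echo; } |
        "$AWK" '{ x = $'"$1"' }' 2>/dev/null
}

f=1
last_ok=0
while [ "$f" -le "$CAP" ] && can_access_field "$f"; do
    last_ok=$f
    f=$((f * 2))
done

if [ "$f" -le "$CAP" ]; then
    echo "$AWK failed at $f fields (last success: $last_ok)"
else
    echo "$AWK handled $last_ok fields; stopped at cap $CAP"
fi
```

Doubling only brackets the limit between the last success and the first
failure; stepping or bisecting inside that range would pin it down.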

> Probably VM was
>the limitation; on a larger system with more than 1 1/8 GB RAM, it would've
>probably kept going somewhat longer (with enough RAM, nominally until it hit the
>limits of 32 bit (signed or even unsigned) numbers, but more realistically
>until it maxed out a 32-bit address space; I haven't needed to explore the
>possibility of building a 64-bit gawk executable).

I'll check with blastwave.org to see whether there is a 64-bit build of
gawk. I don't know how relevant it will be, though. It seems like everyone
is going Intel these days, and the 64-bit architecture is a great idea but
not needed.

>
>So if you want a version of awk that for most practical purposes doesn't have
>a maximum number of fields limitation, it's rather clear to me which one that
>would be.

I agree.

>
>Recently I ran into a case where nawk was bombing during patchadd (too
>darn many patches installed, I guess). So I just moved it over and replaced
>it with a symlink to gawk. Thus far, nothing has broken, and I can once
>again install patches without that particular problem.

More than 500 patches?

>
>Of course you do have to scrounge and build gawk yourself (unless it's on
>the freeware CD, or you're willing to get it from one of the sites with
>prebuilt binaries for Solaris), but if you need it, it's well worth having.

Probably available from the blastwave.org site. Note that gawk is its own
GNU package and is not part of GNU textutils.

>
>But I think the real answer for patching problems would be if Sun rewrote
>the patching (and any other package related) scripts in perl, which could
>also cut the number of child processes (since perl can do pretty much
>everything that sh, awk, sed, etc. can do and then some) and considerably
>speed up patch installation. Since a stable (for a given version of the
>OS) version of perl is pretty much a core part of Solaris >= 8, I can't see
>any good reason (aside from the manpower needed) _not_ to.

The source to awk probably has not been touched since Solaris 2.5.1 days.

Dennis


