Re: for loop (?)

From: Icarus Sparry (usenet_at_icarus.freeuk.com)
Date: 01/30/04


Date: Fri, 30 Jan 2004 03:03:36 GMT

On Thu, 29 Jan 2004 14:49:37 -0600, Ed Morton wrote:

>
>
> Icarus Sparry wrote:
> <snip>
>> awk 'NR>'$line' {exit}
>> { f="client" NR "_out" ; print $1 $2 > f ; close(f)}' out
>
> Whenever possible, you should use awk variables in awk scripts instead
> of shell variables to avoid problems when those shell variables include
> special characters (\, #, newline, etc.), so do this instead:
>
> awk -v line="$line" 'NR>line {exit}
> { f="client" NR "_out" ; print $1 $2 > f ; close(f)}' out
>
> or (if -v not supported by your version of awk):
>
> awk 'NR>line {exit}
> { f="client" NR "_out" ; print $1 $2 > f ; close(f)}' line="$line" out
>
> <snip>
>> awk 'BEGIN { for(i=1;i<='$1';++i) { print i ; } exit }'
>
> awk -v max="$1" 'BEGIN { for(i=1;i<=max;++i) print i }'
>
> or
>
> awk '{ for(i=1;i<=max;++i) print i }' max="$1" <<!
>
> !
>
> The latter needs that syntax because assigning variables without -v
> means those variables don't apply to the BEGIN section so we have to
> move the for loop to the main body and provide some kind of input (I
> chose a here document with a blank line).
>
> The specific cases above would probably work fine as originally written,
> but it's good to get into the habit of using awk variables so there are no
> surprises down the road.
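
The timing difference described in the quoted text (a -v assignment is visible in BEGIN, a trailing "name=value" operand is not) can be checked directly. This is a minimal sketch assuming a POSIX sh and a POSIX awk; show_timing is a hypothetical helper name used only here.

```shell
#!/bin/sh
# Sketch: when does a command-line variable become visible to awk?
# show_timing is a hypothetical helper name for this example only.
show_timing() {
    # -v assigns before BEGIN runs, so BEGIN already sees max.
    awk -v max=3 'BEGIN { print "-v, BEGIN: [" max "]" }'

    # A "max=3" operand is processed only after BEGIN has finished, so
    # BEGIN sees an empty max; the main body, run for the one input line
    # supplied on stdin, sees 3.
    printf '\n' | awk 'BEGIN { print "operand, BEGIN: [" max "]" }
        { print "operand, main: [" max "]" }' max=3
}

show_timing
```

The operand form only reaches the main body, which is why the here-document variant above has to supply at least one input line.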

You have called me on this before, but I didn't bother to answer.

I assert that you have to be aware of the limitations of your tools if you
are going to explore the boundaries! If you need to write programs which
are robust against all possible inputs then you need something stronger
than traditional shell implementations. I am not aware of any shell which
can handle NUL characters correctly in its input. For the particular
characters you list ('\', '#', newline) you need to take special action to
pass these characters to the script in the first place. (See below for
a problem with one of your solutions where you don't need to take any
special actions to get invalid output).
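
One concrete limitation along these lines: -v does not pass a value through untouched, because awk processes escape sequences in it. The sketch below compares -v with reading the same value from the environment through ENVIRON, which POSIX awk specifies; whether a particular old awk (such as a pre-POSIX /usr/bin/awk) provides ENVIRON is an assumption to check. pass_value is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: the same value passed with -v and via the environment.
# pass_value is a hypothetical helper name for this example.
pass_value() {
    # -v processes escape sequences, so a literal "\t" arrives as a TAB.
    awk -v v="$1" 'BEGIN { print "-v: [" v "]" }'
    # ENVIRON (specified by POSIX awk) gives the characters unchanged.
    v=$1 awk 'BEGIN { print "ENVIRON: [" ENVIRON["v"] "]" }'
}

pass_value 'a\tb'
```

So even with -v, a backslash in the shell variable needs special handling if it is meant literally.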

I *know* that adding the variable=value syntax to awk was a mistake. It means
that you are unable to have filenames with equal signs in them without
introducing ambiguity, or extra processing. For a concrete example, at work I
arrange for object files to be written to a shadow of the source tree, with
a prefix of the machine type. I can then write

find $( ls | grep -v SOLARIS8 ) -type f -print | xargs awk '...'

to process the source tree.

It is silly to have to put in a sed script to transform any filenames with
an '=' sign in them. The sed script required is not totally trivial; the
simplest one looks to see if there is a '/' character in the name and, if
not, prepends "./", but this fails if there are newlines in the filenames.
Using '-print0' instead of '-print' means one can no longer use sed, as it
cannot handle NUL characters.
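
The ambiguity, and why prepending "./" removes it, can be shown with a file literally named "a=b". POSIX awk treats an operand as an assignment when it starts with an identifier followed by '='; "./a=b" does not, so it is opened as a file. This sketch assumes mktemp -d is available; demo_equals_name is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: an operand that looks like name=value is an assignment, not a file.
# Assumes mktemp -d exists; demo_equals_name is a hypothetical name.
demo_equals_name() {
    dir=$(mktemp -d) || return 1
    printf 'hello\n' > "$dir/a=b"
    (
        cd "$dir" || exit 1
        # Read as the assignment a="b". With no file operands left, awk
        # reads stdin, which is empty here, so this prints nothing.
        awk '{ print "bare: " $0 }' 'a=b' < /dev/null
        # "./a=b" cannot start with an identifier, so awk opens the file.
        awk '{ print "prefixed: " $0 }' './a=b'
    )
    rm -rf "$dir"
}

demo_equals_name
```

Only the "prefixed:" line appears; the bare name is silently consumed as a variable assignment.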

(The traditional use of the variable assignment in the middle of the
filenames was to enable processing of input files more than once in
different ways, e.g. to write a two-pass assembler.)

I have fewer problems with the '-v' approach, but it is not portable.
In particular, /usr/bin/awk on SOLARIS 8 does not support it. Yes, it does
have nawk, which does support it, but that is a different command to type.

Trust me when I say I can write scripts which test for the existence
of nawk and, if it is there, use it.
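
A minimal sketch of such a test, assuming a POSIX sh. As a variant of checking only that nawk exists, it probes each candidate for working -v support and uses the first that passes; pick_awk and the candidate list are hypothetical choices for illustration.

```shell
#!/bin/sh
# Sketch, assuming a POSIX sh: use the first awk on PATH that honours -v.
# pick_awk and the candidate list are hypothetical choices.
pick_awk() {
    for cand in nawk gawk awk; do
        # Only an awk that supports -v prints "ok"; a missing command or an
        # awk that rejects -v prints nothing (its errors are discarded).
        if [ "$("$cand" -v probe=ok 'BEGIN { print probe }' </dev/null 2>/dev/null)" = ok ]; then
            echo "$cand"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo 'no awk with -v support found' >&2; exit 1; }
"$AWK" -v max=3 'BEGIN { for (i = 1; i <= max; ++i) print i }'
```

Probing the feature rather than the name also covers systems where plain awk already supports -v.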

All this comes down to how robust you want to make things vs how portable.

For now I will continue to look at the problem, and see if it is worth the
extra effort to make the script robust against weird inputs. My feeling
is that for the vast majority of the programs posed in comp.unix.shell
the extra effort is not worthwhile, and for those where it is worthwhile
one should either use perl (python, ruby etc.) or write a C program.
Certainly for the trivial 'seq' I think my approach is correct. The program
will not do anything sensible if given an argument like "fred", so why
should I care if it will do something sensible with an argument like "'#'"?
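
If the argument is checked before use, splicing it into the program text as in the original one-liner becomes harmless: a value made only of digits cannot inject awk code. This is a sketch under that assumption; my_seq is a hypothetical name chosen to avoid clashing with seq(1).

```shell
#!/bin/sh
# Sketch: the trivial "seq" with its argument validated first.
# Once the value is known to be all digits, splicing it into the awk
# program text cannot change the program's structure.
# my_seq is a hypothetical name chosen to avoid clashing with seq(1).
my_seq() {
    case $1 in
        ''|*[!0-9]*)
            printf 'my_seq: not a non-negative integer: %s\n' "$1" >&2
            return 1 ;;
    esac
    awk 'BEGIN { for (i = 1; i <= '"$1"'; ++i) print i }'
}

my_seq 3
```

Arguments like "fred" or "'#'" are rejected up front instead of producing odd awk behaviour.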

In <bv8ob4$2i3@netnews.proxy.lucent.com> you suggest the same thing. I
look forward to you writing the fully correct solution, as there you are
changing the thing being passed in from a literal string to a regular
expression, but without bothering to quote any characters which are special.
Given an input like

PRINTER=P*
awk -v printer=$PRINTER '$0 ~ printer'

you will get out a lot more than you might expect.
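
The reason is that "P*" as a regular expression matches zero or more P's, so it matches every line. One possible fix, if a literal match is what was wanted, is awk's index() function, which does plain substring search. The printer names below are made up, and match_regex / match_literal are hypothetical helper names. Note that -v still processes backslash escapes, so index() avoids regex metacharacters but not that issue.

```shell
#!/bin/sh
# Sketch: regex match versus literal substring match of a -v value.
# match_regex / match_literal are hypothetical helper names.
match_regex() {
    # The value is used as an ERE: "P*" matches the empty string, so every line.
    awk -v pat="$1" '$0 ~ pat'
}
match_literal() {
    # index() searches for the plain string; '*' has no special meaning.
    awk -v s="$1" 'index($0, s) > 0'
}

printf '%s\n' 'lp0' 'P*-laser' 'Pentax' | match_regex 'P*'
printf '%s\n' 'lp0' 'P*-laser' 'Pentax' | match_literal 'P*'
```

The first pipeline prints all three names; the second prints only "P*-laser".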

Feel free to continue 'correcting' my posts, but please make sure that
your improvements actually are worthwhile, that they are not solving
problems outside the original domain, and that they don't add as many
new problems as they solve.

Of course the big problem with answering questions on Usenet is trying to
figure out what the problem actually is :-).

Icarus
