Re: for loop (?)

From: Ed Morton (morton_at_lsupcaemnt.com)
Date: 01/30/04

  • Next message: Ed Morton: "Re: Simple join of 2 lines!!! Arrgh!"
    Date: Fri, 30 Jan 2004 10:50:52 -0600
    
    

    Icarus Sparry wrote:
    > On Thu, 29 Jan 2004 14:49:37 -0600, Ed Morton wrote:
    >
    >
    >>
    >>Icarus Sparry wrote:
    >><snip>
    >>
    >>>awk 'NR>'$line' {exit}
    >>>{ f="client" NR "_out" ; print $1 $2 > f ; close(f)}' out
    >>
    >>Whenever possible, you should use awk variables in awk scripts instead
    >>of shell variables to avoid problems when those shell variables include
    >>special characters (\, #, newline, etc.), so do this instead:
    >>
    >>awk -v line="$line" 'NR>line {exit}
    >>{ f="client" NR "_out" ; print $1 $2 > f ; close(f)}' out
    >>
    >>or (if -v not supported by your version of awk):
    >>
    >>awk 'NR>line {exit}
    >>{ f="client" NR "_out" ; print $1 $2 > f ; close(f)}' line="$line" out
    >>
    >><snip>
    >>
    >>>awk 'BEGIN { for(i=1;i<='$1';++i) { print i ; } exit }'
    >>
    >>awk -v max="$1" 'BEGIN { for(i=1;i<=max;++i) print i }'
    >>
    >>or
    >>
    >>awk '{ for(i=1;i<=max;++i) print i }' max="$1" <<!
    >>
    >>!
    >>
    >>The latter needs that syntax because assigning variables without -v
    >>means those variables don't apply to the BEGIN section so we have to
    >>move the for loop to the main body and provide some kind of input (I
    >>chose a here document with a blank line).
    >>
    >>The specific above cases would probably work fine as originally written,
    >>but it's good to get into the habit of using awk variables so there are no
    >>surprises down the road.
    >
    >
    > You have called me on this before, but I didn't bother to answer.
    >
    > I assert that you have to be aware of the limitations of your tools if you
    > are going to explore the boundaries!

    I agree.

    > If you need to write programs which
    > are robust against all possible inputs then you need something stronger
    > than traditional shell implementations.

    Probably true. I doubt if I've ever written a shell script that's robust
    against all possible inputs.

    > I am not aware of any shell which
    > can handle NUL characters correctly in its input. For the particular
    > characters you list ('\', '#', newline) you need to take special action to
    > pass these characters to the script in the first place.

    The actions aren't all that special or even atypical, e.g. putting the
    argument in quotes.

    > (See below for
    > a problem with one of your solutions where you don't need to take any
    > special actions to get invalid output).
    >
    > I *know* that adding the variable=value syntax to awk was a mistake. It means
    > that you are unable to have filenames with equal signs in them without
    > introducing ambiguity, or extra processing.

    Can't argue with that. I've never come across a file name with an = sign
    in it, but you're right that is an issue if you have that situation.
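
    For what it's worth, a file name containing "=" can usually still be
    passed if it is written so that it doesn't look like an assignment,
    e.g. with a leading "./". A minimal sketch, assuming a POSIX-style awk;
    the file name a=b.txt, the helper count_lines and the temporary
    directory are made up for illustration:

```shell
# Hypothetical demo: read a file whose name contains "=".
# awk treats an operand as an assignment only when it begins with a
# valid variable name followed by "="; "./a=b.txt" begins with ".",
# so it is opened as a file instead.
count_lines() {
    awk 'END { print NR }' "$1"
}

tmpdir=$(mktemp -d)
cd "$tmpdir" || exit 1
printf 'one\ntwo\n' > 'a=b.txt'

count_lines ./a=b.txt     # the file is read and its lines are counted
# By contrast, a bare "a=b.txt" operand would be taken as the
# assignment a="b.txt" and the file would never be opened.
```

    That is the "extra processing" mentioned above: the caller has to
    remember to add the prefix.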

    <snip>
    > I have fewer problems with the '-v' approach, but this is not portable.

    True.

    > In particular /usr/bin/awk on SOLARIS 8 does not support it. Yes it does have
    > nawk, which does support it, but this is a different command to type.

    True again. You can also download gawk if it's not already present.
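
    A rough sketch of that kind of test, assuming a shell with
    "command -v"; the function name pick_awk and the preference order
    (gawk, then nawk, then plain awk) are assumptions for illustration:

```shell
# Print the name of the first awk-like command found on PATH.
# gawk and nawk are tried first because they accept -v; plain "awk" is
# the fallback and, on some older systems, may not accept -v.
pick_awk() {
    for candidate in gawk nawk awk; do
        if command -v "$candidate" >/dev/null 2>&1; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

AWK=$(pick_awk) || { echo "no awk found" >&2; exit 1; }
"$AWK" -v max=3 'BEGIN { for (i = 1; i <= max; ++i) print i }'
```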

    > Trust me when I say I can write scripts which can test for the existence
    > of nawk, and if so use it.
    >
    > All this comes down to how robust you want to make things vs how portable.

    Absolutely.

    > For now I will continue to look at the problem, and see if it is worth the
    > extra effort to make the script robust against weird inputs. My feeling
    > is that for the vast majority of the programs posed in comp.unix.shell
    > the extra effort is not worthwhile,

    I agree - you don't want to put any extra effort in, but I contend that
    using var="$var" really isn't any extra effort compared to using
    '"$var"' and it avoids many common problems so we should just always use
    that unless a situation comes up that makes it untenable.

    > and for those where it is worthwhile
    > then one should either use perl (python, ruby etc) or write a C program.
    > Certainly for the trivial 'seq' I think my approach is correct. The program
    > will not do anything sensible if given an argument like "fred", so why
    > should I care if it will do something sensible with an argument like "'#'"?

    If it did do something sensible with "#" then I wouldn't care either,
    but if I pass "fred" to your script I get no output, which is perfectly
    sensible, whereas if I pass "#" to your script I get this output:

            awk: syntax error near line 1
            awk: illegal statement near line 1

    which, to someone unfamiliar with this problem is a completely baffling
    error message that sends them off trying to debug a syntax error that,
    arguably, doesn't exist.

    If you use my solution, then you get the same sensible result whether
    the input is "fred" or "#".
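
    A minimal sketch of the -v version wrapped in a function. One caveat:
    POSIX awk compares a number with a non-numeric string as strings, in
    which case a value like "fred" may never compare as smaller than i.
    The sketch therefore adds 0 to max to force a numeric comparison.
    That coercion and the name seq_awk are additions for illustration,
    not part of the one-liner quoted earlier:

```shell
# seq_awk: print 1..N, passing N as an awk variable rather than
# splicing it into the program text.
# "max + 0" converts a non-numeric value such as "fred" or "#" to 0,
# so the loop body never runs for those inputs.
seq_awk() {
    awk -v max="$1" 'BEGIN { for (i = 1; i <= max + 0; ++i) print i }'
}

seq_awk 3        # prints 1, 2 and 3 on separate lines
seq_awk fred     # prints nothing
seq_awk '#'      # prints nothing, and no syntax error
```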

    > In <bv8ob4$2i3@netnews.proxy.lucent.com> you suggest the same thing. I
    > look forward to you writing the fully correct solution, as there you are
    > changing the thing being passed in from a literal string to a regular
    > expression, but without bothering to quote any characters which are special.
    > Given an input like
    >
    > PRINTER=P*
    > awk -v printer=$PRINTER '$0 ~ printer'
    >
    > you will get out a lot more than you might expect.

    As you said, there's no point going to extra effort to solve problems
    that don't exist in the real environment. If this solution really
    produced a problem I doubt if it'd be difficult to solve that specific
    problem without wasting a lot of time designing a totally foolproof
    solution. In particular, changing "$0 ~ printer" to "$1 == printer" (or
    whichever field held the printer name) would be trivial - I just kept
    the "~" because it most closely matched the original and your posted
    solution and I was just trying to correct the variable usage so the OP
    wouldn't think that that was the best way to pass shell variables to awk
    scripts, not come up with a better solution to the original problem as I
    thought your solution was probably perfectly adequate.
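
    To make the difference concrete, here is a small sketch. The sample
    data, the assumption that the printer name is the first field, and the
    function names are all made up for illustration:

```shell
# Sample data: printer name in the first field (hypothetical).
printers='P1 idle
LP2 busy
P* offline'

PRINTER='P*'

# Regex match: as a regular expression "P*" means "zero or more P",
# which matches every line, including ones with no P at all.
match_regex() {
    printf '%s\n' "$printers" | awk -v printer="$PRINTER" '$0 ~ printer'
}

# Literal comparison against the first field: only the exact name matches.
match_exact() {
    printf '%s\n' "$printers" | awk -v printer="$PRINTER" '$1 == printer'
}

match_regex     # prints all three lines
match_exact     # prints only the "P* offline" line
```

    Note that "$PRINTER" is quoted here so the shell doesn't expand the
    "*" against file names before awk ever sees it.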

    > Feel free to continue 'correcting' my posts, but please make sure that
    > your improvements actually are worthwhile, that they are not solving
    > problems outside the original domain, and that they don't add as many
    > new problems as they solve.

    They are very worthwhile and they don't lead to more problems but rather
    reduce the chances of having problems in most applications. They aren't
    perfect either, of course, because there is no perfect way to pass shell
    variables to awk. Either they aren't portable (-v) or they don't work in
    BEGIN and preclude file names that contain "=" signs (var=value) or they
    create problems when the shell variables expand (awk '..'"$var"'...').
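
    A small sketch of the BEGIN difference between the first two
    approaches, assuming an awk that accepts -v; the variable and function
    names are invented for the example:

```shell
var='a b'

# -v assignment: done before BEGIN runs, so BEGIN sees the value.
with_dash_v() {
    awk -v v="$var" 'BEGIN { print "BEGIN: v=" v }'
}

# var=value operand: assigned only when awk reaches that operand, after
# BEGIN has already run, so BEGIN sees an empty v while the main body
# sees "a b". "echo" supplies one line of input for the main body.
with_operand() {
    echo | awk 'BEGIN { print "BEGIN: v=" v } { print "main: v=" v }' v="$var"
}

with_dash_v      # BEGIN: v=a b
with_operand     # BEGIN: v=   then   main: v=a b
```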

    I had hoped to find a statement of all the pros and cons in the
    comp.lang.awk FAQ, but it's not there - it actually recommends using the
    awk '..'"$var"'...' solution, which is the one that's most likely to get
    you into trouble. You mention protecting against "weird inputs", but the
    average user is much more likely to have, say, a newline in a variable
    than they are to have file names with "=" signs in them.

    As an example of another issue with the '"$var"' solution, put these
    lines into an executable file called, say, "tst.awk":

            arg="$1"
            gawk 'BEGIN{print "arg="'"$arg"'}'
            gawk -v arg="$arg" 'BEGIN{print "arg=" arg}'

    and then execute the file as

            tst.awk "hello world"

    and you'll get this output:

            arg=
            arg=hello world

    Once again, the output from using -v arg="$arg" is perfectly reasonable,
    while that from '"$arg"' is, to me at least, inexplicable!
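
    One likely explanation, sketched below: the shell pastes the value
    into the program text, so awk never sees a string, only the bare words
    hello and world, which it treats as unset variables. The function
    names are made up, and plain awk is used here rather than gawk:

```shell
arg='hello world'

# Show the program text awk actually receives once the shell has
# spliced $arg into it.
show_program() {
    printf '%s\n' 'BEGIN{print "arg="'"$arg"'}'
}

# Run that program: hello and world are unset awk variables, each
# evaluating to "", so only the literal "arg=" is printed.
run_spliced() {
    awk 'BEGIN{print "arg="'"$arg"'}'
}

show_program    # BEGIN{print "arg="hello world}
run_spliced     # arg=
```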

    > Of course the big problem with answering questions on Usenet is trying to
    > figure out what the problem actually is :-).

    Amen. Maybe we need a template with "abstract, description, expected
    input, expected output"...

            Ed.

