Re: for loop (?)
From: Ed Morton (morton_at_lsupcaemnt.com)
Date: 01/30/04
- Previous message: Carlos J. G. Duarte: "Re: Simple join of 2 lines!!! Arrgh!"
- In reply to: Icarus Sparry: "Re: for loop (?)"
Date: Fri, 30 Jan 2004 10:50:52 -0600
Icarus Sparry wrote:
> On Thu, 29 Jan 2004 14:49:37 -0600, Ed Morton wrote:
>
>
>>
>>Icarus Sparry wrote:
>><snip>
>>
>>>awk 'NR>'$line' {exit}
>>>{ f="client" NR "_out" ; print $1 $2 > f ; close(f)}' out
>>
>>Whenever possible, you should use awk variables in awk scripts instead
>>of shell variables to avoid problems when those shell variables include
>>special characters (\, #, newline, etc.), so do this instead:
>>
>>awk -v line="$line" 'NR>line {exit}
>>{ f="client" NR "_out" ; print $1 $2 > f ; close(f)}' out
>>
>>or (if -v not supported by your version of awk):
>>
>>awk 'NR>line {exit}
>>{ f="client" NR "_out" ; print $1 $2 > f ; close(f)}' line="$line" out
>>
>><snip>
>>
>>>awk 'BEGIN { for(i=1;i<='$1';++i) { print i ; } exit }'
>>
>>awk -v max="$1" 'BEGIN { for(i=1;i<=max;++i) print i }'
>>
>>or
>>
>>awk '{ for(i=1;i<=max;++i) print i }' max="$1" <<!
>>
>>!
>>
>>The latter needs that syntax because assigning variables without -v
>>means those variables don't apply to the BEGIN section so we have to
>>move the for loop to the main body and provide some kind of input (I
>>chose a here document with a blank line).
>>
>>The specific above cases would probably work fine as originally written,
>>but it's good to get into the habit of using awk variables so there's no
>>surprises down the road.
>
>
> You have called me on this before, but I didn't bother to answer.
>
> I assert that you have to be aware of the limitations of your tools if you
> are going to explore the boundaries!
I agree.
> If you need to write programs which
> are robust against all possible inputs then you need something stronger
> than traditional shell implementations.
Probably true. I doubt if I've ever written a shell script that's robust
against all possible inputs.
> I am not aware of any shell which
> can handle NUL characters correctly in its input. For the particular
> characters you list ('\', '#', newline) you need to take special action to
> pass these characters to the script in the first place.
The actions aren't all that special or even atypical, e.g. putting the
argument in quotes.
> (See below for
> a problem with one of your solutions where you don't need to take any
> special actions to get invalid output).
>
> I *know* that adding the variable=value syntax to awk was a mistake. It means
> that you are unable to have filenames with equal signs in them without
> introducing ambiguity, or extra processing.
Can't argue with that. I've never come across a file name with an = sign
in it, but you're right that is an issue if you have that situation.
<snip>
> I have fewer problems with the '-v' approach, but this is not portable.
True.
> In particular /usr/bin/awk on SOLARIS 8 does not support it. Yes it does have
> nawk, which does support it, but this is a different command to type.
True again. You can also download gawk if it's not already present.
> Trust me when I say I can write scripts which can test for the existence
> of nawk, and if so use it.
>
> All this comes down to how robust you want to make things vs how portable.
Absolutely.
> For now I will continue to look at the problem, and see if it is worth the
> extra effort to make the script robust against weird inputs. My feeling
> is that for the vast majority of the programs posed in comp.unix.shell
> the extra effort is not worthwhile,
I agree - you don't want to put any extra effort in, but I contend that
using var="$var" really isn't any extra effort compared to using
'"$var"' and it avoids many common problems so we should just always use
that unless a situation comes up that makes it untenable.
> and for those where it is worthwhile
> then one should either use perl (python, ruby etc) or write a C program.
> Certainly for the trivial 'seq' I think my approach is correct. The program
> will not do anything sensible if given an argument like "fred", so why
> should I care if it will do something sensible with an argument like "'#'"?
If it did do something sensible with "#" then I wouldn't care either,
but if I pass "fred" to your script I get no output, which is perfectly
sensible, whereas if I pass "#" to your script I get this output:
awk: syntax error near line 1
awk: illegal statement near line 1
which, to someone unfamiliar with this problem is a completely baffling
error message that sends them off trying to debug a syntax error that,
arguably, doesn't exist.
If you use my solution, then you get the same sensible result whether
the input is "fred" or "#".
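A minimal sketch of that behaviour, assuming an awk that supports -v. The function name count_to and the "max += 0" line are illustrative additions, not part of the thread: the latter forces a numeric comparison, since a non-numeric string assigned with -v may otherwise be compared as a string rather than as a number.

```shell
# count_to MAX: print 1..MAX, one number per line (hypothetical helper).
count_to() {
    awk -v max="$1" 'BEGIN {
        max += 0    # assumption: coerce to a number so "fred" and "#" both become 0
        for (i = 1; i <= max; ++i) print i
    }'
}

count_to 3        # prints 1, 2 and 3 on separate lines
count_to fred     # prints nothing
count_to '#'      # prints nothing, and no awk syntax error
```

Because the value never becomes part of the program text, awk parses the same program whatever the argument is.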
> In <bv8ob4$2i3@netnews.proxy.lucent.com> you suggest the same thing. I
> look forward to you writing the fully correct solution, as there you are
> changing the thing being passed in from a literal string to a regular
> expression, but without bothering to quote any characters which are special.
> Given an input like
>
> PRINTER=P*
> awk -v printer=$PRINTER '$0 ~ printer'
>
> you will get out a lot more than you might expect.
As you said, there's no point going to extra effort to solve problems
that don't exist in the real environment. If this solution really
produced a problem I doubt if it'd be difficult to solve that specific
problem without wasting a lot of time designing a totally foolproof
solution. In particular, changing "$0 ~ printer" to "$1 == printer" (or
whichever field held the printer name) would be trivial - I just kept
the "~" because it most closely matched the original and your posted
solution and I was just trying to correct the variable usage so the OP
wouldn't think that that was the best way to pass shell variables to awk
scripts, not come up with a better solution to the original problem as I
thought your solution was probably perfectly adequate.
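The difference between the two tests can be sketched as follows. The function names and the two-line sample input are made up for illustration, and the printer name is assumed to be in the first field.

```shell
# by_regex NAME: treat NAME as a regular expression matched against the whole line.
by_regex() { awk -v printer="$1" '$0 ~ printer'; }

# by_exact NAME: compare NAME literally against the first field.
by_exact() { awk -v printer="$1" '$1 == printer'; }

# Hypothetical sample data: printer name in field 1, status in field 2.
sample() { printf 'P1 idle\nQ2 busy\n'; }

sample | by_regex 'P*'    # both lines: "P*" matches zero or more P characters
sample | by_exact 'P*'    # nothing: no printer is literally named "P*"
sample | by_exact 'P1'    # only the P1 line
```

Note that the value is quoted when it is passed in, so the shell does not expand the "*" against file names before awk sees it.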
> Feel free to continue 'correcting' my posts, but please make sure that
> your improvements actually are worthwhile, that they are not solving
> problems outside the original domain, and that they don't add as many
> new problems as they solve.
They are very worthwhile and they don't lead to more problems but rather
reduce the chances of having problems in most applications. They aren't
perfect either, of course, because there is no perfect way to pass shell
variables to awk. Either they aren't portable (-v) or they don't work in
BEGIN and preclude file names that contain "=" signs (var=value) or they
create problems when the shell variables expand (awk '..'"$var"'...').
I had hoped to find a statement of all the pros and cons in the
comp.lang.awk FAQ, but it's not there - it actually recommends using the
awk '..'"$var"'...' solution, which is the one that's most likely to get
you into trouble. You mention protecting against "weird inputs", but the
average user is much more likely to have, say, a newline in a variable
than they are to have file names with "=" signs in them.
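The BEGIN limitation of the var=value form can be seen with a small sketch. The name show_scope is hypothetical, and /dev/null is used only so that awk has an (empty) input file to process after the assignment.

```shell
# show_scope VALUE: report what v holds in BEGIN and in END when it is
# assigned as a var=value operand rather than with -v (hypothetical helper).
show_scope() {
    awk 'BEGIN { printf "begin=[%s]\n", v }
         END   { printf "end=[%s]\n", v }' v="$1" /dev/null
}

show_scope hello
# begin=[]
# end=[hello]
```

The assignment is only carried out when awk reaches that operand while walking its argument list, which is after BEGIN has already run.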
As an example of another issue with '"$var"' solution; put these lines
into an executable file called, say, "tst.awk":
arg="$1"
gawk 'BEGIN{print "arg="'"$arg"'}'
gawk -v arg="$arg" 'BEGIN{print "arg=" arg}'
and then execute the file as
tst.awk "hello world"
and you'll get this output:
arg=
arg=hello world
Once again, the output from using var="$var" is perfectly reasonable,
while that from '"$var"' is, to me at least, inexplicable!
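One likely explanation, offered as a sketch rather than a definitive diagnosis: printing the program text that the shell builds in the spliced case shows what awk is actually asked to run. The function name spliced_program is made up for illustration.

```shell
# spliced_program VALUE: print the awk program text produced by splicing
# VALUE into the script between the two single-quoted pieces.
spliced_program() {
    arg=$1
    printf '%s\n' 'BEGIN{print "arg="'"$arg"'}'
}

spliced_program "hello world"
# BEGIN{print "arg="hello world}
```

In that program text, hello and world are no longer string data but two awk variable names. Both are unset, so concatenating them onto "arg=" adds nothing and only "arg=" is printed.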
> Of course the big problem with answering questions on Usenet is trying to
> figure out what the problem actually is :-).
Amen. Maybe we need a template with "abstract, description, expected
input, expected output"...
Ed.