Re: Read strings from one file and search for them in a directory containing htm files

From: Ed Morton (morton_at_lsupcaemnt.com)
Date: 11/28/05


Date: Mon, 28 Nov 2005 08:53:39 -0600

Meghavvarnam wrote:

> Ed Morton wrote:
<snip>
>>gawk 'NR==FNR{strings[$0]++;next}
>> { for (string in strings}
>> if (index($0,">"string"<") {
>> usedStrings[string]++
>> delete strings[string] # for efficiency
>> }
>> }
>> END { for (string in usedStrings)
>> print string
>> }' allStrings.txt directory/*.htm > usedStrings.txt
<snip>
> This is the script that I tried -
>
> # listused
> # lists strings that are used in all .htm files
>
> gawk 'NR==FNR{strings[$0]++;next} {
> for (string in strings) #}
> print string
> if (index($0,">"string"<") || index($0,"\""string"\"")
> || index($0,">"string"\n")) {
> usedStrings[string]++
> delete strings[string] # for efficiency
> }

Note that the above is now:

        for (string in strings)
                print string
        if (index...) {
        }

By adding "print string" between the "for.." and the "if..", you've
taken the "if..." outside of the loop. Without braces, a "for" controls
only the single statement that follows it. Add braces, {...}, around the
loop body to make what you want explicit.
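
To make the brace point concrete, here is a minimal sketch of the first script with the loop body wrapped in braces. The sample files and their contents are invented purely for the demonstration, and plain "awk" is used where the thread uses gawk. The debugging "print string" is dropped, and so is the third test from the quoted script, index($0,">"string"\n"): with the default record separator, $0 never contains the trailing newline, so that test cannot match.

```shell
# Hypothetical sample data, created only for this demonstration.
tmp=$(mktemp -d)
mkdir "$tmp/htm"
printf '%s\n' 'Save' 'Cancel' 'Help' > "$tmp/allStrings.txt"
printf '%s\n' '<td>Save</td>' '<input value="Cancel">' > "$tmp/htm/page.htm"

used=$(awk 'NR==FNR { strings[$0]++; next }
{
    for (string in strings) {        # braces: the whole if-block runs once per string
        if (index($0, ">" string "<") || index($0, "\"" string "\"")) {
            usedStrings[string]++
            delete strings[string]   # for efficiency
        }
    }
}
END {
    for (string in usedStrings)
        print string
}' "$tmp/allStrings.txt" "$tmp/htm"/*.htm | sort)   # sort: for-in order is unspecified

printf '%s\n' "$used"
rm -rf "$tmp"
```

With this sample data the sketch prints "Cancel" and "Save"; "Help" appears in neither form in page.htm, so it is not reported.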

> }
> END {
> for (string in usedStrings)
> print string
> }' allStrings.txt htm/*.htm > usedStringsfile
>
> Please let me know, if there is any mistake in this.

Yes, there is. You now only have "print string" in the "for" loop. The
"if ..." is outside of it.

> I gave execute
> permission for the file that contained this script and ran it from the
> shell.
>
> usedStringsfile was empty at the end of it.
>
> Any pointers will be of great help.
>
>
>>If you'd like the awk script to tell you which strings are/aren't used,
>>that's trivial, e.g.:
>>
>>gawk 'NR==FNR{strings[$0]++;next}
>> { for (string in strings}
>> if (index($0,">"string"<") {
>> usedStrings[string]++
>> delete strings[string] # for efficiency
>> }
>> }
>> END {
>> print "Used Strings:"
>> for (string in usedStrings)
>> printf "\t%s\n",string
>> print "Unused Strings:"
>> for (string in strings)
>> printf "\t%s\n",string
>> }' allStrings.txt directory/*.htm
>>
>
> I modified the script above to remove all parse errors.

What parse errors? There may be some since it's untested, but I don't
see any.

> Here is the
> script that I used to try out -
>
> gawk ' NR==FNR{strings[$0]++;next}
> { for (string1 in strings)
> string = sprintf("<%s>", string1)

Here again you've added a line and so taken the subsequent block (the
"if...") out of the loop.

> if (index($0,">"string"<")) {
> usedStrings[string]++
> delete strings[string] # for efficiency
> }
> }
> END {
> print "Used Strings:"
> for (string in usedStrings)
> printf "\t%s\n", string
> print "Unused Strings:"
> for (string in strings)
> printf "\t%s\n", string
> }' allStrings.txt htm/*.htm
>
> I see the same behaviour with this as with the earlier script.

By that do you mean that "usedStringsfile" is empty? Well, yes, it would
be, since nowhere above do you direct any output to it, but additionally
you've broken the loop again.
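
For reference, a hedged sketch of the used/unused report with the loop intact. The draft quoted earlier in this thread was posted untested: it has "}" where "{" was intended after "for (string in strings" and is missing a closing ")" on the "if", and both are fixed here. The sketch also keeps the loop variable as the array key. The modified version above wraps string1 as "<string1>" and then searches for ">" string "<", which would look for "><string1><", and it deletes strings[string] instead of strings[string1]. The file names and contents below are invented for the demonstration.

```shell
# Hypothetical sample data, created only for this demonstration.
tmp=$(mktemp -d)
printf '%s\n' 'Save' 'Help' > "$tmp/allStrings.txt"
printf '%s\n' '<td>Save</td>' > "$tmp/page.htm"

report=$(awk 'NR==FNR { strings[$0]++; next }
{
    for (string in strings) {
        # search for the string itself between ">" and "<"
        if (index($0, ">" string "<")) {
            usedStrings[string]++
            delete strings[string]   # same key as the loop variable
        }
    }
}
END {
    print "Used Strings:"
    for (string in usedStrings)
        printf "\t%s\n", string
    print "Unused Strings:"
    for (string in strings)
        printf "\t%s\n", string
}' "$tmp/allStrings.txt" "$tmp/page.htm")

printf '%s\n' "$report"
rm -rf "$tmp"
```

To get the report into a file, redirect it explicitly, e.g. by appending "> usedStringsfile" to the awk command line.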

> Would we
> need a different approach for this thing at all ??

No.

> What does the line - NR==FNR{strings[$0]++;next} do.

See Janis' response.
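
Janis' response is not reproduced here, so as a general sketch of the idiom only: NR counts records across all input files, while FNR restarts at 1 for each file, so NR==FNR is true only while awk is reading the first file named on the command line. In the scripts above that means each line of allStrings.txt is stored as a key of "strings", and "next" skips the second block for those lines. The two small files below are invented for the demonstration.

```shell
# Hypothetical sample files, created only for this demonstration.
tmp=$(mktemp -d)
printf '%s\n' 'a' 'b' > "$tmp/first.txt"
printf '%s\n' 'c' > "$tmp/second.txt"

# Print NR, FNR, and whether NR==FNR holds, for every input line.
trace=$(awk '{ print NR, FNR, (NR == FNR ? "first" : "later"), $0 }' \
    "$tmp/first.txt" "$tmp/second.txt")

printf '%s\n' "$trace"
rm -rf "$tmp"
```

This prints "1 1 first a", "2 2 first b" and "3 1 later c": the counters only diverge once the second file starts.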

> Thank you in advance so much for your help.

You're welcome,

        Ed.


