How to improve performance of regular expression pattern matching

From: David Marshall (vers_at_nwlink.com)
Date: 11/26/03


Date: 25 Nov 2003 17:49:02 -0800

I have a korn shell script which is parsing through a file looking for
a specific pattern on each line. The list of patterns is stored in an
array, so for each line of the input file the script needs to loop
through the array until it has either found a pattern stored in the
array or has compared with all elements of the array and found
nothing.

I first used the following code to compare the line with the array
elements:

# This loop occurs while reading every line of the input file
# loop thru array and compare current line of file to each item in
array.
while (( j < $count )) # count = count of array
do
  if [[ $inline = *${Names[j]}* ]]; then
    echo "$inline" >> $1.${Names[j]}.tmp
  fi
let j=j+1
done
let j=0 # reset to zero before reading next line of input file.

The issue I ran into with this is that if I have items in the array
that include other items (e.g. john & johnny), I see duplicate
processing going on. So I adjusted the comparison to the following:

  if [[ $inline = *@(${Names[j]})+([[:space:]])* ]]; then

This way it is able to distinguish between these cases by looking for
the pattern followed by a space. The problem is that with this
comparison, performance is drastically reduced. I tested these on a
15MB file and the first example finishes in 15 minutes where the
second takes 450 minutes (7.5 hours.)

I am looking for any suggestions to improve the performance of this
script. Is there a better way to compare a variable with a pattern
(including the space) that will improve performance? Any suggestions
would be greatly appreciated.

Thanks,
David



Relevant Pages

  • Re: shel script problem regarding arrays
    ... Ive got to compare file sizes from two remote linux machines. ... of which I made the following script. ... Feeding the output of `ls -l` into an array ...
    (comp.unix.shell)
  • Re: shell script to compare and delete files from a server???
    ... There is a perl script on B which runs every hour. ... Make a shell script in B, to compare, whether the downloaded files ... already uploaded to A. Use that list to generate "checksums" ... Feeding the output of `ls -l` into an array ...
    (comp.unix.shell)
  • Re: elseif statements
    ... of script and then if the script is equal to the script in the elseif ... loop, perform an action. ... but I am having trouble getting Matlab to compare ... returning an array of that size. ...
    (comp.soft-sys.matlab)
  • Re: shell script to compare and delete files from a server???
    ... There is a perl script on B which runs every hour. ... Make a shell script in B, to compare, whether the downloaded files ... already uploaded to A. Use that list to generate "checksums" ... Feeding the output of `ls -l` into an array ...
    (comp.unix.shell)
  • Re: How to improve performance of regular expression pattern matching
    ... >a specific pattern on each line. ... so for each line of the input file the script needs to loop ... >through the array until it has either found a pattern stored in the ... >I first used the following code to compare the line with the array ...
    (comp.unix.shell)