How to improve performance of regular expression pattern matching
From: David Marshall (vers_at_nwlink.com)
Date: 11/26/03
- Next message: Ed Morton: "Re: How to improve performance of regular expression pattern matching"
- Previous message: rakesh sharma: "Re: copy command"
- Next in thread: Ed Morton: "Re: How to improve performance of regular expression pattern matching"
- Reply: Ed Morton: "Re: How to improve performance of regular expression pattern matching"
- Reply: Charles Demas: "Re: How to improve performance of regular expression pattern matching"
- Reply: Tapani Tarvainen: "Re: How to improve performance of regular expression pattern matching"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Date: 25 Nov 2003 17:49:02 -0800
I have a korn shell script which is parsing through a file looking for
a specific pattern on each line. The list of patterns is stored in an
array, so for each line of the input file the script needs to loop
through the array until it has either found a pattern stored in the
array or has compared with all elements of the array and found
nothing.
I first used the following code to compare the line with the array
elements:
# This loop occurs while reading every line of the input file
# loop thru array and compare current line of file to each item in
array.
while (( j < $count )) # count = count of array
do
if [[ $inline = *${Names[j]}* ]]; then
echo "$inline" >> $1.${Names[j]}.tmp
fi
let j=j+1
done
let j=0 # reset to zero before reading next line of input file.
The issue I ran into with this is that if I have items in the array
that include other items (e.g. john & johnny), I see duplicate
processing going on. So I adjusted the comparison to the following:
if [[ $inline = *@(${Names[j]})+([[:space:]])* ]]; then
This way it is able to distinguish between these cases by looking for
the pattern followed by a space. The problem is that with this
comparison, performance is drastically reduced. I tested these on a
15MB file and the first example finishes in 15 minutes where the
second takes 450 minutes (7.5 hours.)
I am looking for any suggestions to improve the performance of this
script. Is there a better way to compare a variable with a pattern
(including the space) that will improve performance? Any suggestions
would be greatly appreciated.
Thanks,
David
- Next message: Ed Morton: "Re: How to improve performance of regular expression pattern matching"
- Previous message: rakesh sharma: "Re: copy command"
- Next in thread: Ed Morton: "Re: How to improve performance of regular expression pattern matching"
- Reply: Ed Morton: "Re: How to improve performance of regular expression pattern matching"
- Reply: Charles Demas: "Re: How to improve performance of regular expression pattern matching"
- Reply: Tapani Tarvainen: "Re: How to improve performance of regular expression pattern matching"
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Relevant Pages
|