Re: How to improve performance of regular expression pattern matching

From: David Marshall (vers_at_nwlink.com)
Date: 12/01/03


Date: 1 Dec 2003 14:10:01 -0800

Heiner Steven <heiner.steven@nexgo.de> wrote in message news:<3fc53193$0$20242$9b4e6d93@newsread2.arcor-online.net>...
> David, did you try Ed's suggestion to create one regular
> expression for all of the array elements? I'd be surprised
> if that solution wouldn't be considerably faster.

I used an array of patterns rather than a single pattern containing
all of them because the processing depends on which pattern matched.
For example, if "John" is matched, that line is sent to a file named
output.John. Given this requirement, I am unable to use a single
pattern.
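
For readers weighing the two approaches: a combined pattern can still
report which name matched if the alternation is wrapped in a capture
group. The following is a minimal Python sketch of that idea, not the
script from this thread. The function name route_lines, the literal
names, and the in-memory dict standing in for the output.<name> files
are all assumptions made for illustration. It also assumes the names
are plain strings rather than regular expressions, and it routes each
line only to the leftmost name it contains, which differs from a loop
that tests every pattern against every line.

```python
import re
from collections import defaultdict


def route_lines(lines, names):
    """Group lines by the first name they contain (hypothetical helper).

    Returns a dict mapping "output.<name>" to the list of lines that
    would be written to that file.
    """
    # Longest names first, so "Johnson" is not shadowed by "John"
    # when both start at the same position.
    ordered = sorted(names, key=len, reverse=True)
    # One capture group around the whole alternation tells us which
    # name matched. re.escape treats each name as a literal string.
    pattern = re.compile("(" + "|".join(re.escape(n) for n in ordered) + ")")

    routed = defaultdict(list)
    for line in lines:
        match = pattern.search(line)
        if match:
            routed["output." + match.group(1)].append(line)
    return routed


if __name__ == "__main__":
    sample = ["John went home", "nothing here", "Mary and John"]
    for target, matched in sorted(route_lines(sample, ["John", "Mary"]).items()):
        print(target, matched)
```

Whether this is faster than the array loop depends on the regex engine
and on the data, so it would need to be timed against the real input.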

> We can probably make the script still faster. Currently the
> script checks every element from $Names[] against each single
> line of the input file. The speed could still improve
> if we found a fast way to find the interesting lines
> (containing one of the search terms), and to ignore the
> others.
>
> This can be done easily by running an "egrep" before
> the line processing:
>
> How fast is the script with this addition?

I did not do this initially because of the nature of the file I am
parsing. Very few lines fail to match any pattern, so the initial
egrep only reduces the file size by about 5%.

To answer your question, though: it seemed to take about 30 seconds
off the 10-minute processing time, or roughly 5%, which is in line
with the reduction in file size. Not a big improvement, but I'll take
what I can get.
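
One way to judge in advance whether such a prefilter is worth adding
is to measure what fraction of the input matches none of the patterns.
The Python sketch below is an illustration only. The helper name
unmatched_fraction is hypothetical, and the names are again assumed to
be literal strings.

```python
import re


def unmatched_fraction(lines, names):
    """Return the fraction of lines containing none of the names."""
    # A single alternation is enough here, because we only need to
    # know whether any name matches, not which one.
    combined = re.compile("|".join(re.escape(n) for n in names))

    total = 0
    unmatched = 0
    for line in lines:
        total += 1
        if not combined.search(line):
            unmatched += 1
    # Avoid dividing by zero on empty input.
    return unmatched / total if total else 0.0


if __name__ == "__main__":
    sample = ["John 1", "Mary 2", "no match", "John 3"]
    print(unmatched_fraction(sample, ["John", "Mary"]))
```

A result near 0.05, as described above, suggests the prefilter can
remove only about 5% of the per-line work.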

Thanks for your suggestion,
David


