Re: Help with pattern search, in various file formats.

From: bsh (brian_hiles_at_rocketmail.com)
Date: 05/24/05


Date: 23 May 2005 19:05:54 -0700

reineman1@llnl.gov wrote:
> What I need to do is search the contents of some files for a specific
> character pattern, then list that filename. Something such as find
> /export/home -exec grep -il "abc" {} \; would work perfectly, and
> does for text files such as inboxes.
> The problem is looking inside things such as Word, Excel, Powerpoint,
> etc. I can find what I am looking for by with "strings
> <filename> | grep -i "abc" This doesn't scale well for thousands of
> files, at least I don't know how.
> Using find I could build a filelist and do a foreach loop but the csh
> and sh on Solaris9 choke on an input list that large.
> I have a feeling that the find command is still the choice but can't
> figure out how to pipe the output of the strings command to grep for a
> pattern, then only list the filename.
> find /export/home -exec strings {} \; | grep -i "abc" fails as the
> output of find are the contents of the filename.
> Appreciate suggestions/opinions for solving this one. Rick

Three things. First and foremost, _especially_ for scripts involving
non-trivial environment manipulation and quoting, avoid csh(1) --
use k/sh(1) instead.

Second, the usual idiom for this kind of thing would be:

find /export/home -print | xargs -n50 grep -il "abc" /dev/null

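xargs(1) splits the file list into batches (here, at most 50 names per
grep invocation), so no single command line exceeds the system's limit
on argument length. /dev/null is a file that can _never_ match the
pattern; listing it guarantees that grep(1) always has more than one
file argument, which forces it to prefix each matched line with the
file name. That only matters without -l: drop the -l if you do indeed
want to see the matched lines along with their file names.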

So, third, as to indexing within proprietary formats:
If you were _not_ using *nix, I would suggest using a specialized
textual indexing tool, like InMagic:

"INMAGIC Plus.EXE", "1.0 release 5.0"
  http://support.inmagic.com/download/dos/inmagic.zip
  http://www.inmagic.com/support/upgrades/upgrades.html
  http://www.inmagic.com/support/script_lib/script_lib.html
  $0, "ASCII ?DB (TEXIS, tw12-23.zip?)"
  <inmagic@inmagic.com, support@inmagic.com, sales@inmagic.com>

but I'm sure that there is freeware/opensource somewhere online
for your system that is appropriate to your needs... perhaps
DocIndexer:

http://www.methods.co.nz/docindexer/

You will see from its home page that it is not a trivial task.
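
Short of a real indexer, the strings(1) approach from the question can
be made to print file names by running strings and grep once per file
inside a loop fed by find, so the file list never has to fit on one
command line. The following is only a minimal sketch, assuming a
POSIX-style shell such as ksh (the old Bourne /bin/sh lacks "read -r");
the function name search_strings is made up for illustration, and file
names containing newlines are not handled.

```shell
#!/bin/ksh
# Hypothetical helper: print the name of every regular file under a
# directory whose printable strings contain the pattern (ignoring case).
# usage: search_strings PATTERN DIRECTORY
search_strings() {
    pattern=$1
    dir=$2
    # find streams one name per line; the loop reads them one at a
    # time, so no argument-length limit is ever reached.
    find "$dir" -type f -print | while IFS= read -r file
    do
        # grep's output is discarded; only its exit status is used.
        # (Redirecting to /dev/null avoids relying on grep -q.)
        if strings "$file" 2>/dev/null | grep -i "$pattern" >/dev/null
        then
            printf '%s\n' "$file"
        fi
    done
}
```

It would be called as, for example: search_strings abc /export/home
This starts two processes per file, so it will be slow over thousands
of files, and strings only recovers runs of plain printable characters,
so text stored in another encoding inside a document may be missed.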

=Brian