Re: Finding all e-mail addresses in files on server.
From: Kevin Collins (spamtotrash_at_toomuchfiction.com)
Date: 04/04/05
- Previous message: Kevin Collins: "Re: Finding all e-mail addresses in files on server."
- In reply to: ifoutch_at_gmail.com: "Finding all e-mail addresses in files on server."
Date: Mon, 04 Apr 2005 21:24:13 GMT
In article <1112576768.533990.100930@z14g2000cwz.googlegroups.com>, ifoutch@gmail.com wrote:
> Looking for advice and/or examples on how to audit a server to find all
> possible e-mail addresses listed in files. Need to verify and possibly
> consolidate addresses for server monitoring and administration. What I
> have been doing is grepping through the filesystem and checking for
> matches to entries in /etc/mail/aliases, /etc/passwd and any lines
> containing "mail" (case insensitive) or "@" in crontabs and scripts.
>
> What I would like to do is create an exclude list of certain
> directories, then go through the remaining directories, extract mail
> addresses, and print them to a file along with the filename. This would
> be run on production systems during off hours but still needs to be
> "nice" to system resources. I want to automate this as much as possible
> so that we can periodically audit all our systems.
>
> Has anybody done anything like this previously? Any pointers or advice
> greatly appreciated.
>
Since this topic (recursive searching) comes up a lot, I'm posting a shar
file (shell archive) of one of my favorite tools ever. The script, called
'rgrep', is written in Perl and is quick and powerful. It uses Perl's regexp
engine, which is more expressive than grep's.
Save the text below as 'rgrep.shar' and then run "sh rgrep.shar". It will
create rgrep. Run 'rgrep -h' for help.
Before I post it, I should mention that searching for email addresses sounds
like a trivial task, but see the O'Reilly book 'Mastering Regular Expressions'
for the regexp that actually matches all valid email addresses per the RFCs -
it's just under 6600 characters! It might be simpler to search for uses of mail
programs such as sendmail, mail, mailx, pine, elm, etc.
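
For a rough first pass without the full RFC regexp, both ideas above can be sketched in plain shell. This is only an illustration under some assumptions: the function names (audit_addresses, list_mailer_users) and the EMAIL_RE pattern are made up for this example, the pattern is a loose approximation that will miss some valid addresses (e.g. root@localhost) and accept some invalid ones, and the -I, -o and -H grep options assume GNU or BSD grep.

```shell
#!/bin/sh
# Loose approximation of an address: local part, "@", dotted domain that
# ends in two or more letters. NOT the RFC grammar (assumption: good
# enough for an audit where a human reviews the results).
EMAIL_RE='[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)*\.[A-Za-z]{2,}'

# audit_addresses ROOT [EXCLUDED_DIR ...]
# Prints "filename:address" for each match under ROOT. Excluded
# directories must be written the way find prints them (ROOT/sub/dir).
audit_addresses() {
    root=$1
    shift
    n=$#
    # Append "-path DIR -prune -o" for each excluded directory, then
    # drop the original directory arguments, leaving only find syntax.
    for dir in "$@"; do
        set -- "$@" -path "$dir" -prune -o
    done
    shift "$n"
    # nice -n 19 keeps the scan at the lowest scheduling priority.
    nice -n 19 find "$root" "$@" -type f \
        -exec grep -EIoH -e "$EMAIL_RE" {} +
}

# list_mailer_users ROOT
# Lists files under ROOT that mention a common mail program as a whole word.
list_mailer_users() {
    nice -n 19 find "$1" -type f \
        -exec grep -EIlw -e 'sendmail|mailx?|pine|elm' {} +
}
```

Something like `audit_addresses /etc /etc/ssl > addresses.txt` would then write the file-and-address pairs for everything under /etc except /etc/ssl; the paths here are only examples.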
Kevin
--- Cut Here ---
#!/bin/sh
# This is a shell archive (produced by GNU sharutils 4.2.1).
# To extract the files from this archive, save it to some FILE, remove
# everything before the `!/bin/sh' line above, then type `sh FILE'.
#
# Made on 2005-04-04 14:08 PDT by <cokm@cpafisxw>.
# Source directory was `/util/bin'.
#
# Existing files will *not* be overwritten unless `-c' is specified.
#
# This shar contains:
# length mode name
# ------ ---------- ------------------------------------------
# 6119 -rwxrwxr-x rgrep
#
save_IFS="${IFS}"
IFS="${IFS}:"
gettext_dir=FAILED
locale_dir=FAILED
first_param="$1"
for dir in $PATH
do
if test "$gettext_dir" = FAILED && test -f $dir/gettext \
&& ($dir/gettext --version >/dev/null 2>&1)
then
set `$dir/gettext --version 2>&1`
if test "$3" = GNU
then
gettext_dir=$dir
fi
fi
if test "$locale_dir" = FAILED && test -f $dir/shar \
&& ($dir/shar --print-text-domain-dir >/dev/null 2>&1)
then
locale_dir=`$dir/shar --print-text-domain-dir`
fi
done
IFS="$save_IFS"
if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED
then
echo=echo
else
TEXTDOMAINDIR=$locale_dir
export TEXTDOMAINDIR
TEXTDOMAIN=sharutils
export TEXTDOMAIN
echo="$gettext_dir/gettext -s"
fi
if touch -am -t 200112312359.59 $$.touch >/dev/null 2>&1 && test ! -f 200112312359.59 -a -f $$.touch; then
shar_touch='touch -am -t $1$2$3$4$5$6.$7 "$8"'
elif touch -am 123123592001.59 $$.touch >/dev/null 2>&1 && test ! -f 123123592001.59 -a ! -f 123123592001.5 -a -f $$.touch; then
shar_touch='touch -am $3$4$5$6$1$2.$7 "$8"'
elif touch -am 1231235901 $$.touch >/dev/null 2>&1 && test ! -f 1231235901 -a -f $$.touch; then
shar_touch='touch -am $3$4$5$6$2 "$8"'
else
shar_touch=:
echo
$echo 'WARNING: not restoring timestamps. Consider getting and'
$echo "installing GNU \`touch', distributed in GNU File Utilities..."
echo
fi
rm -f 200112312359.59 123123592001.59 123123592001.5 1231235901 $$.touch
#
if mkdir _sh15995; then
$echo 'x -' 'creating lock directory'
else
$echo 'failed to create lock directory'
exit 1
fi
# ============= rgrep ==============
if test -f 'rgrep' && test "$first_param" != -c; then
$echo 'x -' SKIPPING 'rgrep' '(file already exists)'
else
$echo 'x -' extracting 'rgrep' '(text)'
sed 's/^X//' << 'SHAR_EOF' > 'rgrep' &&
#!/usr/bin/perl
X
die "Usage: rgrep [-iredblL] regexp filepat ...\n rgrep -h for help\n"
X if $#ARGV < $[;
X
# Written by Piet van Oostrum <piet@cs.ruu.nl>
# This is really free software
X
# added "-n" option for name-only
X
$nextopt = 1;
$igncase = '';
$regpat = 0;
$links = 0;
$error = 0;
$skipbin = 1;
$debug = 0;
$nameonly = 0;
X
do { $regexp = shift (@ARGV); } while &checkopt ($regexp);
$icreg = $igncase;
$igncase = '';
X
eval 'sub grep_file {
X while (<F>) {
X $ln++;
X if (/$regexp/o' . $icreg .') {
X if (! $nameonly)
X {
X print "$file:$ln:$_";
X print "\n" if substr($_, -1, 1) ne "\n";
X }
X else
X {
X print "$file\n";
X last;
X }
X }
X }
}';
X
for (@ARGV) {
X if (! &checkopt ($_)) {
X if ($igncase || $regpat || /[?*[]/ || ! -e) {
X if ($regpat) {
X s/#/\\#/g;
X $_ = "#$_#";
X } else { # translate File pattern into regexp
X $re = '#($|/)'; $save = $_;
X while (/[[*?+()|.^$#]/) {
X $re .= $`;
X $c = $&;
X $_ = $';
X if ($c eq '*') { $c = '[^/]*'; }
X elsif ($c eq '?') { $c = '[^/]'; }
X elsif ($c eq '[') {
X if (/.\]/) { $c = "[$`$&"; $_ = $'; }
X else {
X $error++;
X printf stderr "Illegal filepattern %s\n", $save;
X }
X } else { $c = "\\$c"; }
X $re .= $c;
X }
X $_ = "$re$_\$#$igncase";
X }
X print "filepat: $_\n" if $debug;
X push (@filepat, $_);
X }
X else { push (@files, $_); print "file: $_\n" if $debug; }
X }
}
X
exit 1 if $error;
X
if ($#filepat < $[) {
X eval "sub in_pat {1;}" ;
}
else {
X $subtxt = 'sub in_pat { local ($f) = @_;';
X $or = "";
X for (@filepat) {
X $subtxt .= $or . '$f =~ m' . $_;
X $or = " || ";
X }
X $subtxt .= ';};1';
X
X if (! eval $subtxt) {
X print $@;
X exit 1;
X }
}
X
@files = (".") if $#files < $[;
X
for $file (@files) {
X &do_grep ($file);
}
X
sub do_grep {
X local ($file) = @_;
X local (*F, $ln, $f, $g, @dirfiles);
X if (-f $file) {
X if (open (F, $file)) {
X if (-B $file) { # binary file -- may be compressed/compacted
X if (($cx1 = getc(F)) eq "\377" && (getc(F) eq "\037")) {
X open (F, "uncompact < $file|");
X if ($skipbin && -B $file) { close (F); return; }
X }
X elsif ($cx1 eq "\037" && (getc(F) eq "\235")) {
X open (F, "uncompress < $file|");
X if ($skipbin && -B $file) { close (F); return; }
X }
X elsif ($skipbin) {
X close (F); return;
X }
X }
X print "Reading $file\n" if $debug;
X &grep_file;
X } else {
X print stderr "Cannot open $file\n";
X }
X }
X elsif (-d $file) {
X print "Entering $file\n" if $debug;
X if (opendir (F, $file)) {
X @dirfiles = readdir (F);
X closedir (F);
X for $f (@dirfiles) {
X next if ($f eq '.' || $f eq '..');
X $g = "$file/$f";
X next if (-l $g && ($links < 1 || $links == 1 && -d $g));
X if (-f $g && &in_pat ($g) || -d _) {
X &do_grep ($g);
X }
X }
X } else {
X print stderr "Can't open $file\n";
X }
X }
}
X
sub checkopt {
X local ($_) = $_[0];
X if (/^-/ && $nextopt) {
X $nextopt = 1;
X @opt = split (/-*/,$_); shift (@opt);
X for $opt (@opt) {
X if ($opt eq 'i') { $igncase = 'i'; }
X elsif ($opt eq 'd') { $debug = 1; }
X elsif ($opt eq 'l') { $links = 1; }
X elsif ($opt eq 'L') { $links = 2; }
X elsif ($opt eq 'b') { $skipbin = 0; }
X elsif ($opt eq 'r') { $regpat = 1; }
X elsif ($opt eq 'e') { $nextopt = 0; }
X elsif ($opt eq 'n') { $nameonly = 1; }
X elsif ($opt eq 'h' || $opt eq 'H') { & help; }
X else { $error++; printf stderr "Unknown option -%s\n", $opt; }
X }
X return 1;
X }
X $nextopt = 1;
X return 0;
}
X
sub help {
X print <<'HELP'; exit 0;
Usage: rgrep [-iredblLn] regexp filepat ...
X regexp = perl regular expression to search
X filepat ... = a list of files and directories to be searched or
X file patterns to match filenames.
X filepat will be interpreted as file or directory name if it exists
X as such, and does not contain the metacharacters [ ] ? or *. After
X the options -i and -r all filepats will be considered patterns.
X rgrep will search all files in any of the directories given (and its
X subdirectories) that match any of the filepats, except binary files.
X Compressed files will be searched in uncompressed form.
X Note: filepats may contain / contrary to find usage.
X -b Don't skip binary files.
X -i Ignore case, either in the regexp or in filename matching (depending
X on the location). Before the regexp only applies to the regexp,
X otherwise to the filepats following it.
X -r The following filepats are treated as real perl regexps rather than
X shell style filename patterns. In this case / is not a special
X character, i.e. it is matched by . and matching is not anchored (you
X must supply ^ and $ yourself). E.g. a.b matches the file /xa/by/zz.
X -l Do follow symbolic links only for files (default is do not follow).
X -L Do follow symbolic links for files and directories.
X -e Do not interpret following argument as option. Useful if regexp or
X filepat starts with a -.
X -d Debugging: Give a lot of output on what happens.
X -n print only the filename of matching files.
X -h print this message and exit.
Piet van Oostrum <piet@cs.ruu.nl>
HELP
}
SHAR_EOF
(set 20 04 01 21 13 58 13 'rgrep'; eval "$shar_touch") &&
chmod 0775 'rgrep' ||
$echo 'restore of' 'rgrep' 'failed'
if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
&& ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
md5sum -c << SHAR_EOF >/dev/null 2>&1 \
|| $echo 'rgrep:' 'MD5 check failed'
1eb760efd19050c0bc7911504e83dc45 rgrep
SHAR_EOF
else
shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'rgrep'`"
test 6119 -eq "$shar_count" ||
$echo 'rgrep:' 'original size' '6119,' 'current size' "$shar_count!"
fi
fi
rm -fr _sh15995
exit 0