Re: Finding all e-mail addresses in files on server.

From: Kevin Collins (spamtotrash_at_toomuchfiction.com)
Date: 04/04/05

  • Next message: _not_valid_email_at_notvaliddomain.org: "ATI PCI Video Card on Solaris"
    Date: Mon, 04 Apr 2005 21:24:13 GMT
    
    

    In article <1112576768.533990.100930@z14g2000cwz.googlegroups.com>, ifoutch@gmail.com wrote:
    > Looking for advice and/or examples on how to audit a server to find all
    > possible e-mail addresses listed in files. Need to verify and possibly
    > consolidate addresses for server monitoring and administration. What I
    > have been doing is grepping through the filesystem and checking for
    > matches to entries in /etc/mail/aliases, /etc/passwd and any lines
    > containing "mail" (case insensitive) or "@" in crontabs, and scripts.
    >
    > What I would like to do is create an exclude list of certain
    > directories. Then go through the remaining dirs. and extract mail
    > addresses and print them to a file with the filename. This would be run
    > on production systems during off hours but still needs to be "nice"
    > to system resources. I want to automate this as much as possible, so
    > that we can periodically do an audit of all our systems.
    >
    > Has anybody done anything like this previously? Any pointers or advice
    > greatly appreciated.
    >

    Since this topic (recursive searching) comes up a lot, I'm posting a shar
    file (shell archive) of one of my favorite tools ever. The script, called
    'rgrep', is written in Perl and is quick and powerful. It uses Perl's regexp
    engine, which is more powerful than grep's.

    Save the text below as 'rgrep.shar' and then run "sh rgrep.shar". It will
    create rgrep. Run 'rgrep -h' for help.
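
    If the message was saved whole, the headers and quoted text have to go
    before sh will accept it. A minimal sketch of that step, assuming the saved
    copy still has "#!/bin/sh" at the very start of a line (strip_to_shar and
    the file names are made up for illustration):

```shell
# Hypothetical helper: print a saved message to stdout, dropping every line
# before the first "#!/bin/sh" line, as the shar header asks.
strip_to_shar() {
    sed -n '/^#!\/bin\/sh/,$p' "$1"
}

# Usage (file names are placeholders):
#   strip_to_shar saved_post.txt > rgrep.shar
#   sh rgrep.shar       # creates ./rgrep; "sh rgrep.shar -c" overwrites an old copy
#   ./rgrep -h
```

    Anything after the archive's final "exit 0" is never read by sh, so
    trailing signature or footer lines can stay in the file.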

    Before I post it, I should mention that searching for email addresses sounds
    like a trivial task, but see the O'Reilly book 'Mastering Regular Expressions'
    for the regexp that actually matches all valid email addresses per the RFCs:
    it's just under 6600 characters! It might be simpler to search for calls to
    the mail programs such as sendmail, mail, mailx, pine, elm, etc.
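
    For an audit, a much shorter pattern is probably good enough. What follows
    is only a sketch of the "exclude some directories, print file and address"
    idea from the question, not the RFC-complete regexp. The pattern, the
    function name scan_addresses and the paths in the usage line are all
    assumptions, and it relies on a find with -path and a grep with -E, -I, -o
    and -H (GNU or BSD tools; a stock Solaris grep may lack some of these).

```shell
#!/bin/sh
# Pragmatic pattern (an assumption, not the RFC-complete regexp): it catches
# common user@host.tld forms and will miss or over-match unusual ones.
EMAIL_RE='[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}'

# Hypothetical helper: scan_addresses ROOT EXCLUDE_DIR [EXCLUDE_DIR ...]
# Prints unique "file:address" pairs for regular files under ROOT without
# descending into the excluded directories. Excluded directories must be
# spelled the way find prints them, i.e. starting with ROOT.
scan_addresses() {
    root=$1
    first=$2
    shift 2

    # Turn the remaining directory names into "-o -path DIR" terms.
    n=$#
    for d in "$@"; do
        set -- "$@" -o -path "$d"
    done
    shift "$n"    # drop the bare names, keep only the find expression

    # nice keeps the scan at low priority; grep -I skips binary files.
    nice -n 19 find "$root" \( -path "$first" "$@" \) -prune \
        -o -type f -exec grep -EIoH "$EMAIL_RE" {} + | sort -u
}

# Usage (paths are placeholders):
#   scan_addresses / /proc /dev /tmp > /var/tmp/mail-audit.txt
```

    Pruning inside find means the excluded trees are never read at all, which
    matters more for system load than filtering the output afterwards.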

    Kevin

     --- Cut Here ---
    #!/bin/sh
    # This is a shell archive (produced by GNU sharutils 4.2.1).
    # To extract the files from this archive, save it to some FILE, remove
    # everything before the `!/bin/sh' line above, then type `sh FILE'.
    #
    # Made on 2005-04-04 14:08 PDT by <cokm@cpafisxw>.
    # Source directory was `/util/bin'.
    #
    # Existing files will *not* be overwritten unless `-c' is specified.
    #
    # This shar contains:
    # length mode name
    # ------ ---------- ------------------------------------------
    # 6119 -rwxrwxr-x rgrep
    #
    save_IFS="${IFS}"
    IFS="${IFS}:"
    gettext_dir=FAILED
    locale_dir=FAILED
    first_param="$1"
    for dir in $PATH
    do
      if test "$gettext_dir" = FAILED && test -f $dir/gettext \
         && ($dir/gettext --version >/dev/null 2>&1)
      then
        set `$dir/gettext --version 2>&1`
        if test "$3" = GNU
        then
          gettext_dir=$dir
        fi
      fi
      if test "$locale_dir" = FAILED && test -f $dir/shar \
         && ($dir/shar --print-text-domain-dir >/dev/null 2>&1)
      then
        locale_dir=`$dir/shar --print-text-domain-dir`
      fi
    done
    IFS="$save_IFS"
    if test "$locale_dir" = FAILED || test "$gettext_dir" = FAILED
    then
      echo=echo
    else
      TEXTDOMAINDIR=$locale_dir
      export TEXTDOMAINDIR
      TEXTDOMAIN=sharutils
      export TEXTDOMAIN
      echo="$gettext_dir/gettext -s"
    fi
    if touch -am -t 200112312359.59 $$.touch >/dev/null 2>&1 && test ! -f 200112312359.59 -a -f $$.touch; then
      shar_touch='touch -am -t $1$2$3$4$5$6.$7 "$8"'
    elif touch -am 123123592001.59 $$.touch >/dev/null 2>&1 && test ! -f 123123592001.59 -a ! -f 123123592001.5 -a -f $$.touch; then
      shar_touch='touch -am $3$4$5$6$1$2.$7 "$8"'
    elif touch -am 1231235901 $$.touch >/dev/null 2>&1 && test ! -f 1231235901 -a -f $$.touch; then
      shar_touch='touch -am $3$4$5$6$2 "$8"'
    else
      shar_touch=:
      echo
      $echo 'WARNING: not restoring timestamps. Consider getting and'
      $echo "installing GNU \`touch', distributed in GNU File Utilities..."
      echo
    fi
    rm -f 200112312359.59 123123592001.59 123123592001.5 1231235901 $$.touch
    #
    if mkdir _sh15995; then
      $echo 'x -' 'creating lock directory'
    else
      $echo 'failed to create lock directory'
      exit 1
    fi
    # ============= rgrep ==============
    if test -f 'rgrep' && test "$first_param" != -c; then
      $echo 'x -' SKIPPING 'rgrep' '(file already exists)'
    else
      $echo 'x -' extracting 'rgrep' '(text)'
      sed 's/^X//' << 'SHAR_EOF' > 'rgrep' &&
    #!/usr/bin/perl
    X
    die "Usage: rgrep [-iredblLn] regexp filepat ...\n rgrep -h for help\n"
    X if $#ARGV < $[;
    X
    # Written by Piet van Oostrum <piet@cs.ruu.nl>
    # This is really free software
    X
    # added "-n" option for name-only
    X
    $nextopt = 1;
    $igncase = '';
    $regpat = 0;
    $links = 0;
    $error = 0;
    $skipbin = 1;
    $debug = 0;
    $nameonly = 0;
    X
    do { $regexp = shift (@ARGV); } while &checkopt ($regexp);
    $icreg = $igncase;
    $igncase = '';
    X
    eval 'sub grep_file {
    X while (<F>) {
    X $ln++;
    X if (/$regexp/o' . $icreg .') {
    X if (! $nameonly)
    X {
    X print "$file:$ln:$_";
    X print "\n" if substr($_, -1, 1) ne "\n";
    X }
    X else
    X {
    X print "$file\n";
    X last;
    X }
    X }
    X }
    }';
    X
    for (@ARGV) {
    X if (! &checkopt ($_)) {
    X if ($igncase || $regpat || /[?*[]/ || ! -e) {
    X if ($regpat) {
    X s/#/\\#/g;
    X $_ = "#$_#";
    X } else { # translate File pattern into regexp
    X $re = '#($|/)'; $save = $_;
    X while (/[[*?+()|.^$#]/) {
    X $re .= $`;
    X $c = $&;
    X $_ = $';
    X if ($c eq '*') { $c = '[^/]*'; }
    X elsif ($c eq '?') { $c = '[^/]'; }
    X elsif ($c eq '[') {
    X if (/.\]/) { $c = "[$`$&"; $_ = $'; }
    X else {
    X $error++;
    X printf stderr "Illegal filepattern %s\n", $save;
    X }
    X } else { $c = "\\$c"; }
    X $re .= $c;
    X }
    X $_ = "$re$_\$#$igncase";
    X }
    X print "filepat: $_\n" if $debug;
    X push (@filepat, $_);
    X }
    X else { push (@files, $_); print "file: $_\n" if $debug; }
    X }
    }
    X
    exit 1 if $error ;
    X
    if ($#filepat < $[) {
    X eval "sub in_pat {1;}" ;
    }
    else {
    X $subtxt = 'sub in_pat { local ($f) = @_;';
    X $or = "";
    X for (@filepat) {
    X $subtxt .= $or . '$f =~ m' . $_;
    X $or = " || ";
    X }
    X $subtxt .= ';};1';
    X
    X if (! eval $subtxt) {
    X print $@;
    X exit 1;
    X }
    }
    X
    @files = (".") if $#files < $[;
    X
    for $file (@files) {
    X &do_grep ($file);
    }
    X
    sub do_grep {
    X local ($file) = @_;
    X local (*F, $ln, $f, $g, @dirfiles);
    X if (-f $file) {
    X if (open (F, $file)) {
    X if (-B $file) { # binary file -- may be compressed/compacted
    X if (($cx1 = getc(F)) eq "\377" && (getc(F) eq "\037")) {
    X open (F, "uncompact < $file|");
    X if ($skipbin && -B $file) { close (F); return; }
    X }
    X elsif ($cx1 eq "\037" && (getc(F) eq "\235")) {
    X open (F, "uncompress < $file|");
    X if ($skipbin && -B $file) { close (F); return; }
    X }
    X elsif ($skipbin) {
    X close (F); return;
    X }
    X }
    X print "Reading $file\n" if $debug;
    X &grep_file;
    X } else {
    X print stderr "Cannot open $file\n";
    X }
    X }
    X elsif (-d $file) {
    X print "Entering $file\n" if $debug;
    X if (opendir (F, $file)) {
    X @dirfiles = readdir (F);
    X closedir (F);
    X for $f (@dirfiles) {
    X next if ($f eq '.' || $f eq '..');
    X $g = "$file/$f";
    X next if (-l $g && ($links < 1 || $links == 1 && -d $g));
    X if (-f $g && &in_pat ($g) || -d _) {
    X &do_grep ($g);
    X }
    X }
    X } else {
    X print stderr "Can't open $file\n";
    X }
    X }
    }
    X
    sub checkopt {
    X local ($_) = $_[0];
    X if (/^-/ && $nextopt) {
    X $nextopt = 1;
    X @opt = split (/-*/,$_); shift (@opt);
    X for $opt (@opt) {
    X if ($opt eq 'i') { $igncase = 'i'; }
    X elsif ($opt eq 'd') { $debug = 1; }
    X elsif ($opt eq 'l') { $links = 1; }
    X elsif ($opt eq 'L') { $links = 2; }
    X elsif ($opt eq 'b') { $skipbin = 0; }
    X elsif ($opt eq 'r') { $regpat = 1; }
    X elsif ($opt eq 'e') { $nextopt = 0; }
    X elsif ($opt eq 'n') { $nameonly = 1; }
    X elsif ($opt eq 'h' || $opt eq 'H') { & help; }
    X else { $error++; printf stderr "Unknown option -%s\n", $opt; }
    X }
    X return 1;
    X }
    X $nextopt = 1;
    X return 0;
    }
    X
    sub help {
    X print <<'HELP'; exit 0;
    Usage: rgrep [-iredblLn] regexp filepat ...
    X regexp = perl regular expression to search
    X filepat ... = a list of files and directories to be searched or
    X file patterns to match filenames.
    X filepat will be interpreted as file or directory name if it exists
    X as such, and does not contain the metacharacters [ ] ? or *. After
    X the options -i and -r all filepats will be considered patterns.
    X rgrep will search all files in any of the directories given (and its
    X subdirectories) that match any of the filepats, except binary files.
    X Compressed files will be searched in uncompressed form.
    X Note: filepats may contain / contrary to find usage.
    X -b Don't skip binary files.
    X -i Ignore case, either in the regexp or in filename matching (depending
    X on the location). Before the regexp only applies to the regexp,
    X otherwise to the filepats following it.
    X -r The following filepats are treated as real perl regexps rather than
    X shell style filename patterns. In this case / is not a special
    X character, i.e. it is matched by . and matching is not anchored (you
    X must supply ^ and $ yourself). E.g. a.b matches the file /xa/by/zz.
    X -l Do follow symbolic links only for files (default is do not follow).
    X -L Do follow symbolic links for files and directories.
    X -e Do not interpret following argument as option. Useful if regexp or
    X filepat starts with a -.
    X -d Debugging: Give a lot of output on what happens.
    X -n print only the filename of matching files.
    X -h print this message and exit.
    Piet van Oostrum <piet@cs.ruu.nl>
    HELP
    }
    SHAR_EOF
      (set 20 04 01 21 13 58 13 'rgrep'; eval "$shar_touch") &&
      chmod 0775 'rgrep' ||
      $echo 'restore of' 'rgrep' 'failed'
      if ( md5sum --help 2>&1 | grep 'sage: md5sum \[' ) >/dev/null 2>&1 \
      && ( md5sum --version 2>&1 | grep -v 'textutils 1.12' ) >/dev/null; then
        md5sum -c << SHAR_EOF >/dev/null 2>&1 \
        || $echo 'rgrep:' 'MD5 check failed'
    1eb760efd19050c0bc7911504e83dc45 rgrep
    SHAR_EOF
      else
        shar_count="`LC_ALL= LC_CTYPE= LANG= wc -c < 'rgrep'`"
        test 6119 -eq "$shar_count" ||
        $echo 'rgrep:' 'original size' '6119,' 'current size' "$shar_count!"
      fi
    fi
    rm -fr _sh15995
    exit 0

