UPDATED SUMMARY: Simple anti-spam system using open-source software and freely-available data

From: Rich Kulawiec (rsk_at_gsp.org)
Date: 07/23/04

    Date: Fri, 23 Jul 2004 16:40:46 -0400
    To: sunmanagers@sunmanagers.org
    
    

    This is an update of:

            SUMMARY: Simple anti-spam system using open-source software and freely-available data
            http://www.sunmanagers.org/pipermail/sunmanagers/2003-August/024169.html

    which you might want to browse through before reading this -- though
    it's not really necessary, as this is a complete rewrite.

    This is the approach that I use. Let me emphasize "approach": I don't
    do all these things on all mail servers, and I don't do them in the
    exact same way, because every server/domain gets a different mix of
    incoming spam. It's always important to try to figure out what that
    mix looks like and tailor the blocking to match it. But most of this
    will work most of the time for most people -- and in a lot of cases
    it's turned out to be "good enough" that more work isn't necessary.
    In others, it's been "good enough" that the additional work required
    is made quite a bit easier by it.

    So here goes.

    I run sendmail and have had excellent results using a layered approach
    to blocking spam. The general idea is to use those measures which
    are computationally cheapest first, in order to reduce the burden on
    subsequent layers. The approach I'm taking (outlined below) would also
    work for other MTAs (e.g. postfix, exim) on other 'nix systems.
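
    The cheapest-first ordering can be sketched as below. This is only an
    illustration, not sendmail's internals: the function name, the layer
    names, the message dictionary, and the addresses are all hypothetical,
    and the "dnsbl" layer merely records that it was called instead of
    making a network query.

```python
from typing import Callable, List, Optional, Tuple

# A check returns True when the message should be rejected.
Check = Callable[[dict], bool]


def first_rejecting_layer(message: dict,
                          layers: List[Tuple[str, Check]]) -> Optional[str]:
    """Run the checks in the order given; stop at the first that rejects."""
    for name, check in layers:
        if check(message):
            return name
    return None


# Hypothetical data and layers, ordered cheapest first.
BLOCKED_DOMAINS = {"spam.example"}
dnsbl_calls = []  # records how often the expensive layer actually ran


def in_local_list(message: dict) -> bool:
    # Cheap: a single in-memory hash lookup.
    return message["sender_domain"] in BLOCKED_DOMAINS


def listed_in_dnsbl(message: dict) -> bool:
    # Expensive in real life: stands in for a DNS query over the network.
    dnsbl_calls.append(message["client_ip"])
    return message["client_ip"] == "192.0.2.1"


LAYERS = [("local list", in_local_list), ("dnsbl", listed_in_dnsbl)]
```

    A message caught by the local list never reaches the DNSBL layer, which
    is the point of the ordering: later, costlier layers only see what the
    earlier ones let through.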

    I don't do any kind of content analysis. I'm in agreement with Paul Vixie
    on this one: either people share our values or they don't. If they do,
    then they don't allow spam to flow out of their networks (at any rate
    beyond a trickle, which is probably inevitable). If they don't, then
    they're either actively supporting spammers or incidentally supporting
    them through neglect and incompetence -- and the reason doesn't really
    matter to me, my users, my systems or my networks.

    More succinctly: systems and networks which emit spam are broken and
    should either be repaired immediately or physically disconnected from
    the Internet until they are.

    More bluntly: I'm not going to waste my resources trying to sort out clean
    water from sewage. That responsibility rests with the people whose servers
    and networks are spewing effluent through the pipes designated for water.

    1. I use this:

            The Spamhaus Project: DROP (Don't Route Or Peer) List
            http://www.spamhaus.org/DROP/

    at the firewall and router level, or in the sendmail 'access' file
    when that's not possible. These are networks which are 100%
    controlled by spammers, so no good can come of accepting their traffic.
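
    For the 'access' file case, a list of CIDR blocks has to be turned into
    entries the access map can match. The sketch below assumes two things
    that should be checked against your own setup: that the list arrives as
    lines of the form "network/prefix ; comment", and that the access map
    matches IPv4 addresses on octet boundaries, so that a block which does
    not end on an octet boundary must be expanded into several entries.
    The function names and the sample addresses are hypothetical.

```python
import ipaddress


def cidr_to_access_entries(cidr, action="REJECT"):
    """Expand one IPv4 CIDR block into octet-aligned access-file lines."""
    net = ipaddress.ip_network(cidr, strict=False)
    if net.version != 4:
        raise ValueError("this sketch only handles IPv4")
    # Round the prefix length up to the next octet boundary (minimum /8).
    new_prefix = max(8, -(-net.prefixlen // 8) * 8)
    octets = new_prefix // 8
    entries = []
    for sub in net.subnets(new_prefix=new_prefix):
        prefix = ".".join(str(sub.network_address).split(".")[:octets])
        entries.append(f"Connect:{prefix}\t{action}")
    return entries


def drop_list_to_access(lines, action="REJECT"):
    """Convert 'network/prefix ; comment' lines into access-file lines."""
    entries = []
    for line in lines:
        entry = line.split(";", 1)[0].strip()  # drop comments after ';'
        if entry:
            entries.extend(cidr_to_access_entries(entry, action))
    return entries
```

    For example, drop_list_to_access(["198.51.100.0/23 ; SBL0000"]) yields
    one line for 198.51.100 and one for 198.51.101. Very short prefixes
    expand into many lines, so blocking at the firewall or router remains
    the better choice where it is available.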

    I've augmented this locally by a few particularly problematic networks;
    for example, after reading these:

            Call for Internet Death Penalty: Burstnet/Hostnoc
            http://groups.google.com/groups?selm=20030708121252.GA14167%40example.com

            Call for Internet Death Penalty #2: Optigate/Optinrealbig
            http://groups.google.com/groups?selm=20040604204406.GA2771@example.com

            Call for Internet Death Penalty #3: Hopone/Superb
            http://groups.google.com/groups?selm=20040604204549.GA637@example.com

    their network allocations are now a fixture in my deny lists. It's up to
    you, of course, but I see no reason to ever accept another packet from them.

    2. I have configured sendmail to reject all mail from domains which
    don't resolve. This also blocks mail from broken mail servers, but
    since there's no way to tell them to fix their DNS...

    Sendmail comes set up this way by default on most systems.

    3. I have set up sendmail to issue a multi-line SMTP greeting banner.
    This causes a surprising amount of the malware installed on hijacked Windows
    systems to fail, as it's not set up to deal with that. No doubt future
    malware will cope with this, but for the last year it's been very useful.
    Simple, easy, fast, and satisfying. ;-)
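
    For reference, a multi-line SMTP reply repeats the reply code on every
    line, with a hyphen after the code on all lines but the last and a space
    on the last. The sketch below only formats such a greeting; the function
    name, host name, and banner text are hypothetical, and it does not show
    how to configure sendmail to send one.

```python
def smtp_multiline_reply(code, lines):
    """Format an SMTP reply: 'code-text' continuation lines, 'code text' last."""
    if not lines:
        raise ValueError("a reply needs at least one line")
    out = []
    for i, text in enumerate(lines):
        sep = " " if i == len(lines) - 1 else "-"
        out.append(f"{code}{sep}{text}\r\n")
    return "".join(out)


greeting = smtp_multiline_reply(
    220, ["mail.example.org ESMTP", "No unsolicited bulk mail accepted"]
)
```

    A client that reads only the first line and starts talking before the
    final "220 " line has not followed the protocol, which is presumably
    where the simpler malware falls over.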

    4. I then use a very large list of domains, via the sendmail 'access'
    file. This is handy because the access file is hashed, thus lookups
    are roughly O(1) no matter how large it becomes. But it's also error-prone:
    in fact, during the past two years, every time I've had a false positive
    reported to me, this is where I've traced it to on all but two occasions.

    But considering that I'm using a list of about 128,000 domains and
    have had fewer than a dozen false positives in two years, it seems like
    a reasonable approach. Doubly so because this step alone blocks
    30% to 40% of incoming spam with very little overhead. Even more so
    because it reduces the number of DNSBL queries (see step 8), which
    reduces not only my outbound traffic but also the load I impose on
    the DNSBLs that I'm using.

    Many domain lists are also available; here are a few of them:

            http://www.rhyolite.com/anti-spam/unwelcome.html
            http://www.river.com/ops/spam/bad-domains.txt
            http://www.spamblocked.com/killfile
            http://www.znet.com/blocked-domains.html
            http://www.cluelessmailers.org/listings/blacklistbydomain.html
            http://obob.manilasites.com/
            http://www.carl.net/spam/access.txt
            http://www.unixgirl.com/blockeddomains.html
            http://www.cart00ney.org/blocklist.txt
            http://abuse.easynet.nl/spamlist-usage.html

    Note: if you use a large list of domains in the sendmail 'access' file,
    you will want to RTFM on "makemap" and note the "-c" flag. The speedup
    in rebuilding the hash is quite significant.
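
    The reason a hashed domain list stays fast can be sketched as follows.
    This is a simplification, not sendmail's actual lookup code: it assumes
    a host name is checked against the list by trying progressively shorter
    suffixes, so the cost depends on the number of labels in the name and
    not on the size of the list. The function name and the sample entries
    are hypothetical.

```python
def matching_entry(domain, blocked):
    """Return the longest listed suffix of `domain`, or None if none matches."""
    labels = domain.lower().rstrip(".").split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in blocked:  # one hash lookup per suffix
            return candidate
    return None


# Hypothetical entries: a domain, a dynamic-pool subdomain, and a whole TLD.
blocked = {"spam.example", "dhcp.example.com", "cn"}
```

    With these entries, "mail.spam.example" matches "spam.example" and
    "www.example.com" matches nothing. It also shows where false positives
    come from: one wrong entry silently covers every host beneath it.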

    5. I block all mail from certain TLDs on some mail servers because
    the people using those servers don't expect to ever receive mail
    from those places. I don't like doing this, because it's such a drastic
    measure, but it's too effective a technique not to use. In particular,
    I routinely block:

            .cn (China)
            .kr (Korea)
            .tw (Taiwan)

    I'm about >this close< to adding .biz to that list.

    Of course, if you actually expect to get non-spam mail from those TLDs,
    you probably can't do this. This is why I don't block .br, for example:
    I have users who actually get non-spam mail from there. But if you don't,
    you might want to consider blocking it.

    6. I use a few special-purpose rules in the sendmail access file to
    take care of spam from hijacked CacheFlow servers, hijacked AOL
    proxy servers, often-forged addresses, and so on. Let me know if
    you want them: they're pretty simple/short/easy.

    7. I use ~150 subdomains (also in the sendmail access file) which
    correspond to dynamically-allocated IP space, e.g. "dhcp.example.com".
    I don't like doing this either, but it's also too effective not to use:
    spam from hijacked PCs on cable/DSL connections is epidemic. I have
    been slowly expanding this because it seems to be filling in gaps that
    the other measures are missing.

    Note: in most cases, the users on such networks are contractually obligated
    to use their ISP's designated outbound mail server(s). So the only SMTP
    traffic that this measure blocks is (a) spam from zombies, (b) spam from
    the spammers' own systems, and (c) mail from people who are deliberately
    violating their own ISP's TOS. It's correct to say that (c) isn't
    necessarily spam: but I'm not going to lose any sleep over blocking
    it anyway.

    8. I use multiple DNSBLs, each of which targets a slightly different
    mix of spam.

    For starters, I use

            cn-kr.blackholes.us
            tw.blackholes.us

    for the same reason I block .cn, .kr and .tw -- see step 5 above. Again,
    this may not be a reasonable step for everyone, but check www.blackholes.us
    for other available DNSBLs that might be. They have quite a wide selection,
    both by country and by ISP/host. But locally, use of those two DNSBLs alone
    nails about 30% of incoming spam.

    I then use these DNSBLs (each listed with DNSBL name and web site):

            sbl-xbl.spamhaus.org        http://www.spamhaus.org/sbl/
                                        http://www.spamhaus.org/xbl/
            dnsbl.ahbl.org              http://www.ahbl.org/
            list.dsbl.org               http://dsbl.org/
            dnsbl.njabl.org             http://njabl.org/
            relays.ordb.org             http://ordb.org/
            l1.spews.dnsbl.sorbs.net    http://www.spews.org/
    The Spamhaus SBL+XBL combined DNSBL is a must-have. I have never had
    a false positive with it. And the relatively recent addition of the
    XBL picks up millions of zombie Windows machines that are spewing spam.

    The AHBL augments this nicely, and includes a RHSBL (right-hand-side BL)
    which handles blocking by domain name. If you don't want to do step 4,
    this is a good substitute.

    The DSBL, NJABL, and ORDB all pick up different combinations of open relays,
    open proxies, hijacked systems, etc.

    The SPEWS list -- despite what some of its less-informed critics have
    said -- is very accurate and correctly targets the spam-supporting ISPs
    and hosts who are directly responsible for much of the spam we all endure.

    Other DNSBLs that I have either used or am considering using:

            Blitzed OPM    http://opm.blitzed.org/
            PDL            http://www.pan-am.ca/pdl
            Leadmon        http://www.leadmon.net/spamguard/
            SORBS          http://dnsbl.sorbs.net/
            FiveTen        http://www.five-ten-sg.com/blackhole.php

    NOTE: You should probably not use any DNSBL until you've read its policies.

    NOTE: If you intend to make heavy use of these DNSBLs, you should probably read
    their web sites and see about doing zone transfers.

    NOTE: I find it very useful to run a local copy of BIND in caching mode on
    every mail server, since those servers often get repeatedly pummeled from the
    same sets of addresses. This not only enhances performance locally, but cuts
    down on the load my servers impose on the DNSBLs.

    NOTE: DNSBLs are invoked sequentially by sendmail, so it's a good idea to
    put the one that blocks the most spam as seen by your servers first. But
    figuring out which that is can be quite an effort. For most people,
    the Spamhaus SBL+XBL DNSBL is a pretty good first guess, though.
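
    A DNSBL lookup is an ordinary DNS query: the octets of the connecting
    IPv4 address are reversed and prepended to the DNSBL's zone name, and
    any address answer means "listed". The sketch below walks a list of
    zones in order and stops at the first hit. It is a minimal sketch and
    not what sendmail runs: the function names are hypothetical, the
    resolver is passed in so it can be replaced, and any lookup failure is
    treated as "not listed", which is a simplifying assumption.

```python
import socket


def dnsbl_query_name(ipv4, zone):
    """Build the DNS name to query: reversed octets followed by the zone."""
    octets = ipv4.split(".")
    if len(octets) != 4:
        raise ValueError("expected a dotted-quad IPv4 address")
    return ".".join(reversed(octets)) + "." + zone


def first_listing(ipv4, zones, resolve=socket.gethostbyname):
    """Query zones in order; return the first zone that lists the address."""
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ipv4, zone))
        except OSError:
            continue  # no such name, or no answer: treat as "not listed"
        return zone
    return None
```

    For 192.0.2.1 and the zone sbl-xbl.spamhaus.org the query name is
    1.2.0.192.sbl-xbl.spamhaus.org. Because the loop stops at the first
    hit, putting the most productive zone first saves queries to all the
    others.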

    9. I'm experimenting with using rbldnsd to run my own internal DNSBL --
    replacing, in part, the sendmail 'access' file.

    The upside of doing this is that rbldnsd stores information in a very
    compact format with a low memory footprint; it's designed to serve DNSBLs,
    not as a general purpose DNS server. Another advantage is that keeping
    the information in rbldnsd would allow it to be used by sendmail, postfix,
    exim, whatever. Yet another is that it can be queried easily (contrast
    with the sendmail 'access' file).

    The downside is that it's another process to run; it requires a different
    format than sendmail (which means reworking scripts, etc.); and it's one
    more step that could conceivably fail. (Mitigating this is that sendmail
    presumes a non-responding DNSBL means "not listed" and thus fails soft.)

    It's not clear to me yet how this experiment will turn out, but the early
    results are promising enough for me to suggest it to others as a possible
    course of action.

    10. My best estimate of the performance of all this is that the local
    measures (1-7) block about half the spam that is blocked, and the
    DNSBLs (8) block the other half of the spam that is blocked. The blocking
    rate itself appears to be somewhere around 93% to 97%: it varies as spammers
    switch networks or domains, or activate new groups of zombies.

    The false positive rate is about 1 per month; but I need to caveat that by
    stating that unreported false positives may still be lurking. (On the
    other hand: my users squawk pretty loud and fast when something goes wrong,
    so I don't think there are many.)

    NOTE: Assessing performance of anti-spam techniques requires both the FN
    (false negative: unblocked spam) and FP (false positive: blocked non-spam)
    rates. It's easy to drive either to 0; it's hard to do both at once.
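
    The two rates can be computed from four counts, as in the sketch below.
    The function name and the counts in the example are hypothetical and
    are not measurements from the servers described here.

```python
def filter_rates(spam_total, spam_blocked, ham_total, ham_blocked):
    """Return the block rate, FN rate and FP rate as fractions."""
    return {
        # Share of all spam that was blocked.
        "block_rate": spam_blocked / spam_total,
        # Share of all spam that got through (false negatives).
        "fn_rate": (spam_total - spam_blocked) / spam_total,
        # Share of all non-spam that was blocked (false positives).
        "fp_rate": ham_blocked / ham_total,
    }


# Hypothetical month: 1000 spam messages, 950 blocked; 2000 non-spam, 1 blocked.
rates = filter_rates(1000, 950, 2000, 1)
```

    Blocking everything drives the FN rate to 0 and the FP rate to 1;
    blocking nothing does the reverse. That is why both numbers have to be
    reported together.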

    NOTE: Everybody's incoming spam and non-spam mix is different. The only way
    to really figure out which of these steps will best minimize (FP, FN) is to
    analyze the statistics. But 1, 2, 3, and some of 8 are nearly always a
    good first guess, and in some cases, they solve enough of the problem that
    further analysis/measures aren't necessary.

    ---Rsk
    _______________________________________________
    sunmanagers mailing list
    sunmanagers@sunmanagers.org
    http://www.sunmanagers.org/mailman/listinfo/sunmanagers

