Re: huge email system

From: Chris Shenton (chris_at_shenton.org)
Date: 11/22/03

    To: David <david@madcoders.com>
    Date: Sat, 22 Nov 2003 09:50:28 -0500
    
    

    David <david@madcoders.com> writes:

    > We need to build a stable, redundant, and speedy email system that
    > will last for a few years. We need to handle about 500,000 emails
    > per day. We have about 30,000 users, so we need a lot of storage.
    >
    > Our current plan was to implement the following.
    > 2 SMTP only servers.
    > 3 NFS servers with RAID and SCSI
    > 2 POP3 servers.
    >
    > But that leads us to questions such as -
    > - what would be the best way to authenticate?
    > - would the NFS servers need gig nic's? or dual bonded 100Mbit cards?
    > - what smtp server and what pop3 server to use (we want to use Maildir)
    > - what raid level?

    I'm finishing something like that now. My design goals were: no single
    points of failure, 1GB of server-stored email, SMTP with STARTTLS and
    SMTPS, and IMAPS and IMAP with STARTTLS. It's over-designed for our
    population, but the servers aren't the expensive part; I believe it
    could scale to handle 100K users. I'm replacing a sendmail-based system
    that's exceptionally hard to fix because it has multiple single points
    of failure and no one wants downtime.

    I did the prototype on FreeBSD but the client preferred Solaris for
    their production systems. I'm using qmail with the excellent
    qmail-ldap patch suite from www.nrg4u.com, plus courier-imap.
    OpenLDAP is used for authentication and other user information
    (quotas, account status, etc.).
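    Since every login ends up as an LDAP search, one detail worth getting
    right is escaping the username before it goes into the search filter,
    as RFC 4515 requires. The sketch below is an illustration, not
    qmail-ldap's or courier-imap's actual code; the "qmailUser"
    objectClass and the "uid" attribute are assumptions about the schema,
    so substitute whatever your directory uses.

```python
# RFC 4515: these characters must be escaped as \XX inside a filter value.
_ESCAPES = {
    "\\": r"\5c",
    "*": r"\2a",
    "(": r"\28",
    ")": r"\29",
    "\x00": r"\00",
}


def escape_filter_value(value: str) -> str:
    """Escape a user-supplied string for safe use inside an LDAP filter."""
    return "".join(_ESCAPES.get(ch, ch) for ch in value)


def login_filter(uid: str, object_class: str = "qmailUser") -> str:
    """Build a filter matching one mail account by uid.

    The default objectClass is an assumption; use whatever your schema defines.
    """
    return f"(&(objectClass={object_class})(uid={escape_filter_value(uid)}))"


if __name__ == "__main__":
    print(login_filter("alice"))
    print(login_filter("*)(uid=*"))  # wildcard/injection attempt is neutralized
```

    Without the escaping, a username such as "*)(uid=*" would turn the
    search into a wildcard match rather than a lookup of one account.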

    I'm using a pair of F5 load balancers in front to detect which services
    are up or down. They also let us add servers as demand requires; I like
    being able to add small, cheap boxes incrementally rather than doing
    forklift upgrades of big iron.
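    Roughly, what the balancer does for each service can be modeled as a
    round-robin over pool members that skips any member its health monitor
    has marked down. This is a toy Python model under that assumption, not
    F5's actual algorithm; the host names are hypothetical.

```python
from itertools import count


class Pool:
    """Round-robin over pool members, skipping any marked down (toy model)."""

    def __init__(self, members):
        self.members = list(members)
        self.up = {m: True for m in self.members}
        self._next = count()

    def mark(self, member, is_up):
        # In a real balancer, a periodic health monitor sets this flag.
        self.up[member] = is_up

    def add(self, member):
        # Growing capacity means adding another small box to the pool.
        self.members.append(member)
        self.up[member] = True

    def pick(self):
        """Return the next healthy member, or raise if none are up."""
        for _ in range(len(self.members)):
            member = self.members[next(self._next) % len(self.members)]
            if self.up[member]:
                return member
        raise RuntimeError("no healthy members in pool")


if __name__ == "__main__":
    pool = Pool(["mail1", "mail2", "mail3"])  # hypothetical host names
    pool.mark("mail2", False)
    print([pool.pick() for _ in range(4)])
```

    Adding capacity is then a single `add()` call, which is the incremental
    growth described above; clients never need to know a box was added or
    has died.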

    Behind them are four Netra V210s for SMTP[S], IMAP[S], POPS and, soon,
    webmail (SqWebMail). Each box has a read-only LDAP replica. Another
    V210 runs the LDAP master, which replicates to the four mail servers.
    Each V210 comes with quad gigabit ethernet: one interface to the load
    balancer, two (for redundancy) to the backend switches for the NFS
    server, and one for an administrative/monitoring network.
     
    We bought a NetApp for the mail store. It is currently our one single
    point of failure, but NetApp has a great reputation for reliability. We
    bought a used unit and saved about 70%. (NetApp uses RAID4 internally,
    so disks can be added to a volume on the fly.) NetApp's "snapshot"
    facility lets us restore from stupid user errors; tape backup/restore
    for this much data would be a nightmare. (Qmail's Maildir format is
    NFS-safe, but it sounds like you already know that :-)

    If my client didn't demand Solaris, I would have preferred FreeBSD. I
    would like to try the Apple Xserve RAID box, since it's 2.5TB for
    $11K. FC-attach it to a pair of FreeBSD boxes that serve it out over
    NFS, and use the FreeBSD 5.x "snapshot" feature for NetApp-style
    backup/restore. Put service boxes like the ones above in front,
    cheaply scalable by adding more.
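    For comparison with other storage options, the Xserve RAID figure works
    out as below. This uses the raw 2.5TB capacity in decimal units; usable
    space after RAID overhead and spares will be lower, so the real cost per
    usable GB is higher.

```python
def cost_per_gb(price_usd: float, capacity_tb: float) -> float:
    """Dollars per raw gigabyte, treating 1TB as 1000GB (decimal units)."""
    return price_usd / (capacity_tb * 1000)


if __name__ == "__main__":
    print(f"${cost_per_gb(11_000, 2.5):.2f} per raw GB")  # prints "$4.40 per raw GB"
```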

    I like F5 balancers because you can heavily customize the
    application-layer health monitoring, e.g., run a query against the LDAP
    master and check for a sane response. But they're not cheap.
    Round-robin DNS isn't gonna steer clients away from dead services, and
    Windows clients aren't any good at retrying failed connections. So I
    don't have a suggestion for an inexpensive balancer; I'd be interested
    in hearing ideas.

    As I mentioned above, our NetApp is the only single point of failure.
    When we need more space, we can buy a second unit and then add the
    (pricey) clustering software to remove that SPoF.

    Some other folks have talked about anti-virus/spam issues; very good
    discussion. I am using qmail-ldap's recent integration of
    qmail-smtp-viruscan, which very quickly blocks MS executable
    attachments; it's not foolproof, but it's highly effective with little
    load. We're considering a commercial spam/virus blocking appliance but
    haven't decided yet; I'm trying to keep the qmail-ldap system from
    getting any more complicated. If we do integrate something into our
    mail servers instead, we might have to add another box or two to
    handle the increased load, but that's not very expensive with small
    boxes.
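    The kind of executable-attachment check described above can be
    sketched as: decode every MIME part and flag the message if any part
    starts with the "MZ" bytes that begin DOS/Windows executables. This
    uses Python's standard email package and is only an illustration; it
    is not how the qmail-ldap viruscan code is implemented, and a text part
    that happens to begin with "MZ" would be a false positive.

```python
from email import message_from_bytes, policy

DOS_MAGIC = b"MZ"  # first two bytes of DOS/Windows (PE) executables


def has_ms_executable(raw: bytes) -> bool:
    """Return True if any decoded MIME leaf part starts with the 'MZ' magic."""
    msg = message_from_bytes(raw, policy=policy.default)
    for part in msg.walk():
        if part.is_multipart():
            continue  # containers have no payload of their own
        payload = part.get_payload(decode=True)  # undoes base64 / quoted-printable
        if payload and payload.startswith(DOS_MAGIC):
            return True
    return False
```

    A mail server using a check like this would typically run it on the
    message received during the SMTP DATA phase and reject with a 5xx reply
    when it returns True, so the sender's server, not yours, handles the
    bounce.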

    As I mentioned, I'm running all services on all boxes rather than
    separating SMTP from POP as you suggest; if this turns out to be a bad
    idea, I can move the services around simply by redefining the service
    pools on the load balancer.

    _______________________________________________
    freebsd-isp@freebsd.org mailing list
    http://lists.freebsd.org/mailman/listinfo/freebsd-isp
    To unsubscribe, send any mail to "freebsd-isp-unsubscribe@freebsd.org"

