Like many anti-spam systems these days, SpamAssassin operates a network of
spamtraps. One set of these runs off traps.SpamAssassin.org, a server
kindly donated by ISP Sonic.net.
Large-scale spam-trapping systems like this are generally run in quite a
secretive manner, but we’re an open source project — so it may be interesting
if I give some details of our setup. Here’s a potted history of how this
spamtrap server has run over the years…
The beginning
The architecture was initially very simple. The MX was Postfix, delivering to
the "trapper" user, which in turn ran procmail, which directly ran a perl
script. This perl script then performed the trap actions, namely: DoS
prevention, discarding viruses and malware, discarding backscatter bounces,
extraction and cleanup of the incoming mails, then onward reporting, archival,
and further distribution.
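To make the shape of that script concrete, here is a minimal Python sketch of a chain of trap stages, where any stage can discard a message. It is not the original perl script. The stage names are hypothetical, and the backscatter check assumes the envelope sender has been recorded in a Return-Path header, with bounces carrying the null sender `<>`.

```python
from email import message_from_bytes
from email.message import Message
from typing import Callable, List, Optional

# A stage returns the (possibly modified) message, or None to discard it.
Stage = Callable[[Message], Optional[Message]]


def drop_backscatter(msg: Message) -> Optional[Message]:
    # Assumption: the MTA recorded the envelope sender in Return-Path,
    # and bounces use the null sender "<>".
    return_path = (msg.get("Return-Path") or "").strip()
    return None if return_path == "<>" else msg


def strip_local_headers(msg: Message) -> Optional[Message]:
    # Hypothetical cleanup step: remove a header added by the trap host.
    del msg["Delivered-To"]  # no error if the header is absent
    return msg


def run_pipeline(raw: bytes, stages: List[Stage]) -> Optional[Message]:
    """Parse a raw message and pass it through each stage in order."""
    msg: Optional[Message] = message_from_bytes(raw)
    for stage in stages:
        msg = stage(msg)
        if msg is None:
            return None  # discarded by this stage
    return msg
```

Reporting, archival and onward distribution would be further stages appended to the same list.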
Given that this was a target for spam (and we want as much spam as possible
here!), it predictably ran into load issues. Right at the beginning, back
around 2001/2002, I ran this on our shared server, where it pretty quickly
caused trouble for delivery of other, more useful mail. It was around this
time that Sonic kindly donated the server.
With dedicated hardware, we weren’t seeing much trouble; it was enough to
just wait a few hours for a traffic spike to pass, and the Postfix queue
would then clear.
Clearing the queues
After a few months, though, this wasn’t enough: the queue would get
consistently clogged, and the backlog grew large enough that incoming
spam was delayed for days before it made it from the MX to the trap archives.
For a spamtrap, you want fresh spam, but not necessarily all spam, so I
installed a cron job to simply clear the queue on a nightly basis. (I also had
to restart the Postfix server, since it’d occasionally get hung and stop
accepting connections on port 25, presumably due to load issues.)
IPC::DirQueue
The next problem was that the procmail/perl script end could not process
mail fast enough for the MTA to keep up with the incoming connections, with
follow-on problems caused by load from the perl script impacting the
MX’s activity. To work around these, I designed a new queueing backend, based
around IPC::DirQueue. This
allowed a new split architecture; the procmail-run perl script was
extremely lightweight, delivering all inbound mail to a dirqueue and exiting
quickly, allowing the MX to get back to the next inbound spam message, and the
trap processing script was then split into a web of dirqueues, allowing each
individual part of the trap backend pipeline to operate independently.
There were several benefits to this:
- Since dirqueues operate as a batch-processing model, load spikes become irrelevant; the load incurred is limited by how many dequeuer processes are run.
- The time taken in backend tasks becomes irrelevant to the MX throughput, since that is bottlenecked only by the lightweight perl script and its write speed to the "incoming" dirqueue.
- By splitting the backend work into multiple queues, outages in the spam-reporting systems or onward forwardings become much less of a problem, since they won’t affect inbound spam, archival, outbound delivery to other reporting systems, forwards, etc.
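The enqueue/dequeue pattern behind those benefits can be sketched roughly as follows. This is not IPC::DirQueue's actual API or on-disk layout, just a minimal Python illustration of a directory-backed queue. The `DirQueue` class, its `tmp`/`new`/`active` subdirectories and the file naming scheme are all assumptions made for the example, and it relies on renames being atomic within a single filesystem.

```python
import os
import time
import uuid


class DirQueue:
    """Minimal directory-backed queue (hypothetical layout, for illustration)."""

    def __init__(self, root: str) -> None:
        self.root = root
        for sub in ("tmp", "new", "active"):
            os.makedirs(os.path.join(root, sub), exist_ok=True)

    def enqueue(self, data: bytes) -> str:
        # Names sort roughly by arrival time; the uuid avoids collisions
        # between concurrent writers.
        name = "%020d.%s" % (time.time_ns(), uuid.uuid4().hex)
        tmp_path = os.path.join(self.root, "tmp", name)
        with open(tmp_path, "wb") as fh:
            fh.write(data)
        # Publish with a rename so dequeuers never see a half-written file.
        os.rename(tmp_path, os.path.join(self.root, "new", name))
        return name

    def dequeue(self):
        """Claim the oldest pending job; return (name, data) or None if empty."""
        for name in sorted(os.listdir(os.path.join(self.root, "new"))):
            src = os.path.join(self.root, "new", name)
            dst = os.path.join(self.root, "active", name)
            try:
                os.rename(src, dst)  # only one dequeuer can win this rename
            except FileNotFoundError:
                continue  # another dequeuer claimed it first
            with open(dst, "rb") as fh:
                return name, fh.read()
        return None

    def finish(self, name: str) -> None:
        # Mark a claimed job as done by deleting it.
        os.remove(os.path.join(self.root, "active", name))
```

The enqueuer only does one write and one rename, which is why the delivery side can stay lightweight. Throughput on the backend side is then set by how many processes call `dequeue()` in a loop.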
Again, the dirqueues were cleared on a frequent basis, to discard the "spiky"
traffic and ensure we were just seeing samples of the freshest spam. The
dirqueues use a tmpfs as the backing storage directory, so queued mail never
hits the disk at all.
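Clearing a queue directory amounts to very little code. The following is a minimal Python sketch, assuming a hypothetical flat directory of pending message files. It is not how IPC::DirQueue itself is cleared, and `clear_queue` is a name invented for the example.

```python
import os


def clear_queue(queue_dir: str) -> int:
    """Delete every pending file in queue_dir; return how many were removed."""
    removed = 0
    with os.scandir(queue_dir) as entries:
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue  # leave subdirectories alone
            try:
                os.remove(entry.path)
                removed += 1
            except FileNotFoundError:
                pass  # a dequeuer picked it up in the meantime
    return removed
```

Run from cron or a timer, something like this throws away whatever backlog a spike left behind, so the dequeuers go back to working on fresh mail.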
This worked pretty well for several years, as volume grew from 80MB of spam
per day to the current level of around 130MB per day. However, we still
occasionally saw problems from load spikes, where high load caused the traps to
refuse incoming SMTP connections, purely because the rate of inbound
connections was too high for the Postfix MX to accept them all in a timely
fashion.
qpsmtpd
Last weekend, I had a go at a project I’d been thinking of trying out for a
long time: switching from Postfix to qpsmtpd. A while back, Matt Sergeant
rewrote qpsmtpd to use Danga::Socket, Danga Interactive / Six Apart’s insanely
scalable event-driven asynchronous socket class, as used in mogilefsd, perlbal
and djabberd. This article notes that ‘two large antispam companies’
high-traffic spam traps have used this effectively since the second quarter of
2005, delivering concurrency as high as 10,000 on some occasions’, so it seemed
likely to work ;)
Sure enough, results have been great… we now have a pure-perl system handling
heavy volumes without breaking a sweat, certainly compared to the previous
system. qpsmtpd’s plugin system is elegant, allowing me to annotate inbound
spam with more details of the SMTP transaction, write plugins to deliver mail
to a dirqueue directly instead of to an MTA, and add some conditional logic (i.e.
basic "deliver this RCPT TO to this queue") where needed.
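As an illustration of the kind of logic those plugins carry, here is a small Python sketch of recipient-based queue routing and SMTP-transaction annotation. It does not use qpsmtpd's plugin API (which is Perl). The function names, the `X-Trap-*` header names and the route table are all hypothetical.

```python
from typing import Dict


def choose_queue(rcpt_to: str, routes: Dict[str, str],
                 default: str = "incoming") -> str:
    """Pick a queue name for a recipient: exact address, then domain, then default."""
    addr = rcpt_to.strip().strip("<>").lower()
    domain = addr.rpartition("@")[2]
    return routes.get(addr) or routes.get(domain) or default


def annotate(raw: bytes, remote_ip: str, helo: str,
             mail_from: str, rcpt_to: str) -> bytes:
    """Prepend headers recording the SMTP transaction (header names are made up)."""
    headers = (
        f"X-Trap-Remote-IP: {remote_ip}\r\n"
        f"X-Trap-HELO: {helo}\r\n"
        f"X-Trap-Mail-From: {mail_from}\r\n"
        f"X-Trap-Rcpt-To: {rcpt_to}\r\n"
    )
    return headers.encode("ascii", "replace") + raw
```

For example, with `routes = {"example.com": "reports"}`, mail for any address at example.com would go to a queue named `reports` and everything else to `incoming`.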
Full details are over on the QpsmtpdSpamtrap page on the taint.org wiki, for
the curious.