The volume of spam continues to rise inexorably. Brightmail
are now estimating that 54% of all mail messages are spam.
Nowadays, my personal mail account is getting about 70 spams a day, rising to
over 200 a day at the weekends. It’s getting tiresome; pretty much
all of it gets marked as spam and diverted, but I still have to wade
through it ‘just in case’, and to build the corpus. I guess I need
to extend my .procmailrc to divert high-scoring spams somewhere I can check
even less frequently ;)
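
The routing idea above might look something like this. It is a rough sketch in Python rather than an actual procmail recipe, and it assumes the mail has already been through SpamAssassin, which adds the X-Spam-Flag and X-Spam-Level headers. The folder names and the 15-star cut-off are made up for illustration.

```python
import email

# Hypothetical cut-off: flagged mail with this many stars or more goes to
# the folder that gets checked least often.
HIGH_SCORE_STARS = 15


def choose_folder(raw_message: str) -> str:
    """Pick a destination folder from SpamAssassin's headers.

    Folder names ("inbox", "spam-review", "spam-high") are illustrative.
    """
    msg = email.message_from_string(raw_message)
    flagged = msg.get("X-Spam-Flag", "").strip().upper() == "YES"
    if not flagged:
        return "inbox"
    # X-Spam-Level carries one '*' per whole point of score.
    stars = msg.get("X-Spam-Level", "").count("*")
    return "spam-high" if stars >= HIGH_SCORE_STARS else "spam-review"
```

Borderline spam still lands somewhere that gets a regular look 'just in case', while the obvious stuff piles up out of sight.
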
That’s not the really annoying thing, though. Most of the time, I use
tagged addressing when I publish my email address. It works very well
for identifying spam sources overall, and for diverting ‘dead’ addresses
that are getting spam into the spamtraps. That’s the plus.
But the curse of writing spam filters is that you need a good archive of
spam; and one of our SpamAssassin corpus guidelines is to attempt to trim
out duplicate spams where possible. Many spammers will wind up sending
more-or-less identical spam messages, modulo random subject lines,
hash-busters, etc., and with (let’s say) 8 tagged addresses in their
lists, I’ll get 8 copies of that spam, and have to pay a little bit of
attention to trim it down to 1 copy for the corpus.
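
For what it's worth, the trimming step could be roughed out along these lines. This is only a sketch, not how the SpamAssassin corpus is actually maintained: it ignores the headers entirely (so the tagged address and the random subject line don't matter) and treats two messages as duplicates when their bodies are nearly the same according to Python's difflib. The 0.9 threshold is an arbitrary assumption.

```python
import email
from difflib import SequenceMatcher


def body_text(raw_message: str) -> str:
    """Return the text parts of a message, lowercased, whitespace collapsed."""
    msg = email.message_from_string(raw_message)
    chunks = []
    for part in msg.walk():
        if part.get_content_maintype() == "text":
            payload = part.get_payload()
            if isinstance(payload, str):
                chunks.append(payload)
    return " ".join(" ".join(chunks).lower().split())


def dedupe(raw_messages, threshold=0.9):
    """Keep the first copy of each group of near-identical messages.

    `threshold` is an assumed similarity cut-off (0.0 to 1.0); small
    differences such as a trailing hash-buster stay under it.
    """
    kept = []  # (raw message, normalised body) pairs
    for raw in raw_messages:
        body = body_text(raw)
        is_dup = any(
            SequenceMatcher(None, body, seen, autojunk=False).ratio() >= threshold
            for _, seen in kept
        )
        if not is_dup:
            kept.append((raw, body))
    return [raw for raw, _ in kept]
```

Comparing every message against every kept message is quadratic, so this only suits a day's worth of mail at a time, and a human eye is still needed for the borderline cases.
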
Damn spam-filter development! All this corpus building is hard work ;)
BTW, note how spam load rises at the weekends. (Tim Hunter, Paul Terry and
Alan Judge of eircom.net also noted this in their paper presented at
LISA ’03 yesterday ;) There’s a good reason: spammers attempt to deliver
their spam while abuse staff are not at their desks. The same thing applies
in the network security world; many attacks have taken place over a US
holiday weekend.
Hallowe’en: the best too-late idea for a Hallowe’en costume was ‘Top Gun
GWB’ in his flight suit. In the end, I played half of the ‘Dr. Frankenstein
and Monster’ pair (I was the monster, as C really is a scientist, and
computer ‘science’ doesn’t count). Best costume seen: a very impressive
onnagata kabuki player.