Month: November 2003

Shock Horror — Do-Not-Call’s Gaping Loophole Exploited

Spam: In the past 2 weeks, I’ve been called 3 times to ‘take part in a survey’. Before the do-not-call law took effect, by contrast, this number got absolutely no survey calls, but plenty of telemarketing calls.

Of course, I’m sure these surveys are all companies keen to get my considered opinion, rather than phone-spam scum exploiting one of the blindingly obvious loopholes in the federal do-not-call list legislation. Sure.

BTW, that loophole seems to come down to a gap in oversight: apparently the FTC doesn’t have jurisdiction over telephone surveyors. However, this page notes that FTC staff are prepared to prosecute callers who attempt to subvert the act:

For example, if a survey call asks a consumer if he or she would be interested in purchasing a type of service or merchandise, and that information then is used to contact the consumer to encourage such purchases, the survey call is considered telemarketing and subject to the Do Not Call restrictions.

Which is all well and good, but I’m not going to hang around for 10 minutes of ‘what long-distance company do you use?’ in order to differentiate ‘good’ surveys from ‘bad’ ones; I’ll just hang up straight away.

Sport: Ben forwards this story — the US baseball team has failed to qualify for the next Olympics. Yes, baseball. And no, I didn’t know that other countries had genuine baseball teams.

Clay Shirky on Complex Software Systems

Software: Shirky on the Semantic Web. Great snippet:

it turns out that people can share data without having to share a worldview, so we got the meta-data without needing the ontology. Exhibit A in this regard is the weblog world. In a recent paper discussing the Semantic Web and weblogs, Matt Rothenberg details the invention and rapid spread of ‘RSS autodiscovery’, where an existing HTML tag was pressed into service as a way of automatically pointing to a weblog’s syndication feed.

About this process, which went from suggestion to implementation in mere days, Rothenberg says:

Granted, RSS autodiscovery was a relatively simplistic technical standard compared to the types of standards required for the environment of pervasive meta-data stipulated by the semantic web, but its adoption demonstrates an environment in which new technical standards for publishing can go from prototype to widespread utility extremely quickly. …

This, of course, is the standard Hail Mary play for anyone whose technology is caught on the wrong side of complexity. People pushing such technologies often make the ‘gateway drug’ claim that rapid adoption of simple technologies is a precursor to later adoption of much more complex ones. Lotus claimed that simple internet email would eventually leave people clamoring for the more sophisticated features of CC:Mail (RIP), PointCast (also RIP) tried to label email a ‘push’ technology so they would look like a next-generation tool rather than a dead-end, and so on.

Here Rothenberg follows the script to a tee, labeling RSS autodiscovery ‘simplistic’ without entertaining the idea that simplicity may be a requirement of rapid and broad diffusion. The real lesson of RSS autodiscovery is that developers can create valuable meta-data without needing any of the trappings of the Semantic Web. Were the whole effort to be shelved tomorrow, successes like RSS autodiscovery would not be affected in the slightest.

Another good line: ‘There is a list of technologies that are actually political philosophy masquerading as code, a list that includes Xanadu, Freenet, and now the Semantic Web.’
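For reference, the ‘existing HTML tag’ behind RSS autodiscovery is a `<link rel="alternate">` element in the page head, with a feed MIME type such as `application/rss+xml`. Here’s a minimal sketch, using only Python’s standard `html.parser`, of how a client might pick feed URLs out of a page. The function name `find_feeds` and the exact set of accepted MIME types are my own assumptions for illustration, not anything Shirky or Rothenberg specify.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Feed MIME types this sketch accepts (an assumption; real clients vary).
FEED_TYPES = {"application/rss+xml", "application/atom+xml", "application/rdf+xml"}


class FeedLinkParser(HTMLParser):
    """Collects href values from <link rel="alternate" type="<feed type>"> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    # Also called for self-closing <link ... /> tags, via the default
    # handle_startendtag implementation.
    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attrs = {name: (value or "") for name, value in attrs}
        rels = attrs.get("rel", "").lower().split()
        if "alternate" in rels and attrs.get("type", "").lower() in FEED_TYPES:
            if attrs.get("href"):
                self.hrefs.append(attrs["href"])


def find_feeds(html, base_url):
    """Return absolute feed URLs advertised in an HTML page."""
    parser = FeedLinkParser()
    parser.feed(html)
    return [urljoin(base_url, href) for href in parser.hrefs]


page = """<html><head>
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rss" />
<link rel="stylesheet" href="/style.css">
</head><body>...</body></html>"""
print(find_feeds(page, "http://blog.example.com/"))
```

Which rather proves Shirky’s point: the whole ‘standard’ fits in a couple of dozen lines.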

Belkin’s Brain-damage, and Bye-bye Public Domain

Spam: The Reg reports that a Belkin Router software upgrade hijacks HTTP connections to spam the browser with ads. Here’s a screenshot of the ad page. Here’s a USENET post bemoaning the situation, and the followup from a Belkin PM.

This is amazing; a working piece of network infrastructure has been effectively modified to:

  • replace the expected HTTP responses with spam ‘for your convenience’
  • do this once every 8 hours until told to stop
  • report serial numbers, IP addresses and software revisions back ‘home’ as part of this

And, of course, web browsing is not the only thing that runs over port 80.

So, after a software upgrade, it’s a router that inserts spam into your web traffic; and if you want the bugfixes in that upgrade, you get the spam too, whether you want it or not. That spam could also break quite a bit of legitimate port 80 traffic, such as automated download tools that aren’t full web browsers. And the spam isn’t announced on the download page or in the change log. I’d hope that’s pretty serious under consumer-protection law… it certainly should be.
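To see why an injected ad page breaks non-browser tools: an automated downloader just gets whatever bytes come back on port 80. A minimal defensive sketch, assuming the tool knows the expected SHA-256 of the file in advance (the function name `check_download` and the sample payloads are hypothetical), refuses an HTML response when it asked for a binary, and refuses anything whose digest doesn’t match:

```python
import hashlib


class SuspectDownload(Exception):
    """Raised when a fetched payload doesn't look like the file we asked for."""


def check_download(body: bytes, content_type: str, expected_sha256: str) -> bytes:
    # An injected ad page typically arrives as text/html, not the binary we wanted.
    if content_type.split(";")[0].strip().lower() == "text/html":
        raise SuspectDownload("got an HTML page instead of the expected file")
    # Even if the Content-Type looks plausible, the digest must match.
    digest = hashlib.sha256(body).hexdigest()
    if digest != expected_sha256:
        raise SuspectDownload(f"checksum mismatch: {digest}")
    return body


real_file = b"\x7fELF...pretend binary..."
expected = hashlib.sha256(real_file).hexdigest()

# The genuine file passes.
check_download(real_file, "application/octet-stream", expected)

# A hijacked response carrying an ad page is rejected.
try:
    check_download(b"<html>Special offer!</html>", "text/html", expected)
except SuspectDownload as err:
    print("rejected:", err)
```

Of course, most quick-and-dirty download scripts do no such checking, which is exactly why they’d silently save the ad page instead.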

Copyright: In case there was any doubt that Sonny Bono and Jack Valenti wanted to remove the legal concept of the public domain, check this quote from the Congressional Record:

(Mary Bono): Actually, Sonny wanted the term of copyright protection to last forever. I am informed by staff that such a change would violate the Constitution. I invite all of you to work with me to strengthen our copyright laws in all of the ways available to us. As you know, there is also Jack Valenti’s proposal for term to last forever less one day. Perhaps the Committee may look at that next Congress.

Wow. More via an Eldred-related site.

Real-time DNS blocklist accuracy figures

Spam: DNS blocklists are the oldest means of spam-blocking, and are still exceedingly useful; nowadays, many of these are fully automated systems, using proxy-detection algorithms and sensing patterns in mailer behaviour indicative of spam.
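For anyone unfamiliar with the mechanics, a DNSBL lookup is just a DNS query: reverse the IP’s octets, append the list’s zone, and look the name up. If it resolves (conventionally to an address in 127.0.0.0/8), the IP is listed. A minimal Python sketch; the resolver is injectable so it can be exercised offline, and the function names are illustrative rather than any real library’s API:

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to look up, e.g. 1.2.0.192.<zone> for 192.0.2.1."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def dnsbl_listed(ip: str, zone: str, resolve=socket.gethostbyname) -> bool:
    """True if the IP is listed in the DNSBL zone.

    A listed IP resolves (conventionally to 127.0.0.x); an unlisted one fails
    to resolve, which socket.gethostbyname reports as socket.gaierror.
    """
    try:
        answer = resolve(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        return False
    return ipaddress.IPv4Address(answer) in ipaddress.IPv4Network("127.0.0.0/8")
```

Against a live list you’d call something like `dnsbl_listed("192.0.2.1", "bl.spamcop.net")`. The automated lists differ only in how entries get into the zone, not in how clients query it.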

A few months back on the ASRG list, there was a discussion of DNSBL accuracy; I posted some SpamAssassin figures, based on our ‘mass-check’ tests, but noted that they were computed by checking current DNSBL contents against a corpus of saved mail, so, because of the time lag between receiving the mail and testing it, they were not 100% representative.

These figures are a lot better. Since August, I’ve been collecting real-time DNSBL hit data on my mail, as it is delivered at my SpamAssassin installation. In other words, it’s live accuracy data — it’s using just what the DNSBLs had listed at scan time.

Note, however, that it’s still incomplete:

  • some DNSBLs were not measured; these are just the DNSBLs in SpamAssassin 2.60’s default list, excluding RCVD_IN_NJABL_DIALUP (which I had to remove because I couldn’t parse out accurate data for it).
  • it’s only 1 person’s hand-classified mail.
  • SpamAssassin tests more than just the ‘delivering’ SMTP relay; it’ll also look backwards through the headers, at earlier relays, to catch spam sent via mailing lists. This is different from what’s used with most traditional DNSBL-supporting systems.

But the results should still be quite useful.
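To make the point about earlier relays concrete: each mail server that handles a message prepends its own Received: header, so reading them top to bottom walks from the relay that delivered to you back towards the sender. A traditional DNSBL setup checks only the first hop; SpamAssassin also checks the later ones. The sketch below just pulls bracketed IPv4 addresses out of each Received: header with Python’s standard `email` package. It’s a deliberate simplification (Received: formats vary a lot between MTAs), not SpamAssassin’s own parser, and the hostnames and addresses in the sample are made up.

```python
import email
import re

# Matches a bracketed dotted-quad such as [192.0.2.1]; a simplifying assumption,
# since MTAs format Received: headers in many different ways.
BRACKETED_IP = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def relay_ips(raw_message: str) -> list:
    """Return relay IPs from Received: headers, most recent hop first."""
    msg = email.message_from_string(raw_message)
    ips = []
    for header in msg.get_all("Received", []):
        ips.extend(BRACKETED_IP.findall(header))
    return ips


raw = (
    "Received: from list.example.org (list.example.org [198.51.100.7])\n"
    "\tby mail.example.com; Sat, 25 Oct 2003 23:11:52 -0700\n"
    "Received: from sender.example.net (unknown [203.0.113.45])\n"
    "\tby list.example.org; Sat, 25 Oct 2003 23:10:01 -0700\n"
    "Subject: test\n"
    "\n"
    "body\n"
)
print(relay_ips(raw))
```

Here a first-hop-only check would look up just the mailing list’s server (198.51.100.7); walking further back also reaches the original sender (203.0.113.45).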

The time period covered:

  • Thu, 21 Aug 2003 17:11:30 -0700 (PDT)
  • Sat, 25 Oct 2003 23:11:52 -0700 (PDT)

Recap of the fields:

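  • OVERALL% = percentage of all messages that the rule hit
  • SPAM% = percentage of spam messages that the rule hit
  • HAM% = percentage of ham (non-spam) messages that the rule hit
  • S/O = Spam/Overall, normalised for corpus size: SPAM% / (SPAM% + HAM%), i.e. the Bayesian probability that a message hitting the rule is spam, if spam and ham were equally common
  • RANK = artificial ranking figure; ignore this!
  • SCORE = default SpamAssassin 2.60 score
  • NAME = name of the test. Figuring out the exact DNSBL should be pretty obvious ;)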

OVERALL%    SPAM%     HAM%    S/O  RANK  SCORE  NAME
   21839     1993    19846  0.091  0.00   0.00  (all messages)
 100.000   9.1259  90.8741  0.091  0.00   0.00  (all messages as %)
   5.989  59.0567   0.6601  0.989  1.00   2.25  RCVD_IN_BL_SPAMCOP_NET
   3.869  37.7822   0.4636  0.988  0.96   1.10  RCVD_IN_DSBL
   0.751   8.2288   0.0000  1.000  0.95   4.30  RCVD_IN_OPM_HTTP
   1.964  20.2709   0.1260  0.994  0.95   1.10  RCVD_IN_NJABL_PROXY
   0.659   7.1751   0.0050  0.999  0.95   0.64  RCVD_IN_NJABL_SPAM
   0.614   0.0000   0.6752  0.000  0.94  -0.10  RCVD_IN_BSP_OTHER
   0.050   0.5519   0.0000  1.000  0.94   4.30  RCVD_IN_OPM_SOCKS
   0.027   0.3011   0.0000  1.000  0.94   4.30  RCVD_IN_OPM_WINGATE
   0.119   0.0000   0.1310  0.000  0.94  -4.30  RCVD_IN_BSP_TRUSTED
   0.939   9.7341   0.0554  0.994  0.94   4.30  RCVD_IN_OPM
   1.081  10.9383   0.0907  0.992  0.93   1.52  RCVD_IN_SORBS_SOCKS
   1.062  10.7376   0.0907  0.992  0.93   1.27  RCVD_IN_SBL
   0.229   2.4084   0.0101  0.996  0.93   1.10  RCVD_IN_SORBS_MISC
   0.618   6.3221   0.0453  0.993  0.93   1.10  RCVD_IN_SORBS_HTTP
   0.595   5.9709   0.0554  0.991  0.92   4.30  RCVD_IN_OPM_HTTP_POST
   0.078   0.7526   0.0101  0.987  0.90   2.60  RCVD_IN_SORBS_ZOMBIE
   0.815   7.5263   0.1411  0.982  0.89   1.39  DNS_FROM_RFCI_DSN
   3.594  24.8369   1.4613  0.944  0.81   2.55  RCVD_IN_DYNABLOCK
   1.685  11.4400   0.7054  0.942  0.78   0.10  RCVD_IN_RFCI
   0.380   2.4586   0.1713  0.935  0.75   1.31  RCVD_IN_NJABL_RELAY
   6.182  33.9689   3.3911  0.909  0.73   0.10  RCVD_IN_NJABL
  10.422  44.4054   7.0090  0.864  0.63   0.10  RCVD_IN_SORBS
   0.037   0.1505   0.0252  0.857  0.54   2.80  RCVD_IN_SORBS_WEB
   2.344   4.1144   2.1667  0.655  0.17   0.00  RCVD_IN_SORBS_SPAM
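The percentage columns can be reproduced from raw hit counts. This sketch shows the arithmetic, assuming S/O is computed as SPAM% / (SPAM% + HAM%), which is consistent with the per-rule rows above. The hit counts used (1177 spam and 131 ham hits for RCVD_IN_BL_SPAMCOP_NET) are back-calculated from the table’s percentages and totals, not taken from the original logs.

```python
def rule_stats(spam_hits, ham_hits, total_spam, total_ham):
    """Compute OVERALL%, SPAM%, HAM% and S/O for one rule from raw hit counts."""
    overall_pct = 100.0 * (spam_hits + ham_hits) / (total_spam + total_ham)
    spam_pct = 100.0 * spam_hits / total_spam  # share of all spam the rule hit
    ham_pct = 100.0 * ham_hits / total_ham     # share of all ham the rule hit
    # S/O normalises for corpus size: the chance that a hit is spam if spam and
    # ham were equally common. Undefined (NaN here) when the rule never fires.
    hits_pct = spam_pct + ham_pct
    s_o = spam_pct / hits_pct if hits_pct else float("nan")
    return overall_pct, spam_pct, ham_pct, s_o


# Totals from the "(all messages)" row; hit counts back-calculated for SpamCop.
overall, spam, ham, s_o = rule_stats(1177, 131, total_spam=1993, total_ham=19846)
print(f"{overall:.3f} {spam:.4f} {ham:.4f} {s_o:.3f}")
# prints: 5.989 59.0567 0.6601 0.989
```

The normalisation matters here because ham outnumbers spam about 10 to 1 in this corpus: the raw fraction of SpamCop hits that were spam is only 1177/1308, or about 0.90.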

Super-absorbent diaper danger

Blogs: Mimi Smartypants, very funny woman that she is, has become a very funny doting mother:

Similarly, recently she finally fell asleep in my arms and I soon caught the faint scent of urine, but I stuffed that thought way down deep into my brain, where the Syrup Of Denial runs thick and sticky, because I just could not deal with the possibility of waking her up during the diaper change. So my daughter slept in her own urine all night, or at least she slept through the bit that was not sucked up by the scarily absorbent diapers they make nowdays. (Seriously, what the hell is in there? I keep having nightmares that the superabsorbent gel will actually start to absorb moisture out of the baby itself and there will just be a dried-up husk in the crib in the morning.)

My new .sig awaits

Open Source: ‘The FREE, 0% APR, Better Sex, No Effort Diet’: Howard Strauss, Princeton’s manager of technology strategy and outreach (no less!), takes aim at free software in ‘Syllabus’ magazine. He launches a few ad hominems while he’s at it:

These folks are some of the same great people who are supposed to be working for you anyway, plus a smattering of teenagers too young to work at Redmond, hackers, virus creators, and a menagerie of others with whom you will feel great pride in entrusting your IT infrastructure.

Given that Princeton’s OIT uses SpamAssassin, I guess that means he reckons I, and the other developers, are ‘teenagers too young to work at Redmond’, or ‘virus creators’. Thanks muchly, Princeton!

It sounds like a joke, but I actually think he’s serious. My recommendation: he needs to take a job in the software-development side of such companies as ‘Microsoft, IBM, Sun, or even Blackboard’ to see how well the commercial software development methodology really works. Hint: from the outside, you don’t hear the half of it ;)

Oh — regarding ‘teenagers too young to work at Redmond’: this /. comment is worth noting.

Room for an Irish Netflix

Net: So it seems Kerry Packer has announced Homescreen, a Netflix-like service in Australia.

In essence, you pay a flat fee per month, log on to a website, select a whole batch of DVDs, and they post the first 3 out to you. You can keep them as long as you like, then post them back in pre-paid envelopes; once they arrive at the nearest depot, they post out the next 3 on your list.

This works very well — in the form of Netflix at least. I can vouch for the coolness of this; pretty much everyone I know who has a DVD player has joined Netflix. It’s just great having 3 DVDs on-hand for whenever you feel like watching one.

Of course, it requires that the service have a decent selection of goods, including some good ‘classics’. From the sounds of things, Homescreen may be failing on this point.

Also, it requires a reliable postal service. But if they can do it in the US, they can certainly do it in Australia or any European country ;)

And I’d bet Ireland has a huge installed base of DVD players, given the oft-quoted factoid that there are more PlayStations per capita in Ireland than in any other country outside Japan.

Irish entrepreneurs — get cracking! ;)

The self-aggrandization prize goes to Craig Venter

Science: I’m the human genome, says ‘Darth Venter’ of genetics (Observer).

Craig Venter, the controversial geneticist who led private industry’s decoding of the human genome, has revealed a startling secret. The genome – unravelled two years ago – is his.

To the surprise of scientists, Venter has admitted that much of the DNA used by his company, Celera Genomics, as part of this decoding effort came from his cells. The news has annoyed his colleagues, who claim that Venter subverted the careful, anonymous selection process they had established for their DNA donors.

I missed this story when it came out, but it’s a biggie. Instead of mapping the genome of a scientifically-chosen representative, we have the genome of an egomaniac CEO, who spent the entire project self-aggrandizing and attention-seeking.

Just as well the publicly-funded, international Human Genome Project was around to keep them honest for the most part…

Some more choice quotes:

‘It doesn’t surprise me. It sounds like Craig,’ said Nobel laureate James Watson, co-discoverer of the structure of DNA.

As to his reasons for his actions, Venter was unequivocal. ‘How could one not want to know about one’s own genome?’ he said. Neither was he fazed about accusations of egocentricity. ‘I’ve been accused of that so many times, I’ve got over it,’ he said.

Celera’s science board was not so understanding. ‘Any genome intended to be a landmark should be kept anonymous. It should be a map of all of us, not of one, and I am disappointed if it is linked to a person,’ said board member Arthur Caplan.

He added that the drive to sequence the human genome was an opportunity for personal glory as well as scientific discovery. Venter’s action emphasised the first motive.

Herring Fart Chat

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Science: Fish farting may not just be hot air (New Scientist):

Biologists have linked a mysterious, underwater farting sound to bubbles coming out of a herring’s anus. No fish had been known to emit sound from its anus nor to be capable of producing such a high-pitched noise.

… Three observations persuaded the researchers that the FRT is most likely produced for communication: Firstly, when more herring are in a tank, the researchers record more FRTs per fish. Secondly, the herring are only noisy after dark, indicating that the sounds might allow the fish to locate one another when they cannot be seen. Thirdly, the biologists know that herrings can hear sounds of this frequency, while most fish cannot. This would allow them to communicate by FRT without alerting predators to their presence.

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.2 (GNU/Linux)
Comment: Exmh CVS

iD8DBQE/qThjQTcbUG5Y7woRAgEOAKDBmfaPgFrrGwTIndzQXJpQvoJGQwCcDyMa
qkAWXoutn5Ki64fTK05emHA=
=E1La
-----END PGP SIGNATURE-----

Jody — still going strong

Spam: I just got another Jody spam; 40 points this time, and featuring the very latest in spam fashion, a .biz URL.

It’s amazing! The ‘Jody’ fake testimonial crops up in 9060 results on the web and 78600 results on USENET. The oldest spam Google Groups has with this text was posted back on 26th May 1998, which makes it 5 and a half years old by now. (Check it out for some classic period ASCII art, misspellings, and LOTS OF SHOUTING!!!!)

Last time I posted about it, Ben actually tracked down a ‘Mitchell Wolf M.D., Chicago, Illinois’, Jody’s supposed spouse. Presumably he’s retired on the ‘USD 147,200.00 every 45 days’ that Jody was amassing from her ‘hobby’, though. ;)

Sampler Victorious

Ireland: The best programme on Irish TV, by far, is Sampler. It’s a great magazine series covering Ireland’s underground scenes, with several nice scoops, including being the only set of film cameras around for the police brutality that made the May 6th 2001 Dublin ‘Reclaim the Streets’ protest infamous. Great soundtrack, too.

Naturally, it’s also had a long and illustrious history of no support from RTE, who just seem to hate the whole idea and would prefer they just had a nice, non-controversial chat show instead.

Well, Sampler just won ‘Best Special Interest Programme’ at the Irish Film and Television Awards. Nice one! (Not that you’d know it from the IFTA website, which hasn’t updated the awards pages in 2 years. Update: Simon points out I’m looking at the wrong site; the real one is here.)

Disclaimer: Luke, the producer, is a good mate of mine. But it’s still a great programme. ;)

Go take a look! Episodes 2 to 5 are online in full, in RealVideo format — and encoded at a pretty decent bitrate.

Justin the Scoopist

Timeliness: w00t! I blog about Jason Salavon, and 4 days later Boing Boing and plasticbag.org both pick up on it. (and rightly so.)

It gets better — then there’s this posting about the EVACS e-voting system, and a week later, Wired News cover it!

… OK, I’m totally exaggerating the latter one. Obviously Wired News go into a lot more detail and do a bit more research. ;) In fact, it’s a very good article; here’s a killer quote from Software Improvement’s Matt Quinn, the lead engineer on EVACS:

Quinn … says he is ‘gob smacked’ by what he sees happening among U.S. electronic voting machine makers, whom he says have too much control over the democratic process.

It has been widely reported that Ohio-based Diebold Election Systems, one of the biggest U.S. voting-machine makers, purposely disabled some of the security features in its software. According to reports the move left a backdoor in the system through which someone could enter and manipulate data. In addition, Walden O’Dell, Diebold Election System’s chief executive, is a leading fundraiser for the Republican Party. He stated recently that he was ‘committed to helping Ohio deliver its electoral votes to the president next year.’

‘The only possible motive I can see for disabling some of the security mechanisms and features in their system is to be able to rig elections,’ Quinn said. ‘It is, at best, bad programming; at worst, the system has been designed to rig an election.’

‘I can’t imagine what it must be like to be an American in the midst of this and watching what’s going on,’ Quinn added. ‘Democracy is for the voters, not for the companies making the machines…. I would really like to think that when it finally seeps in to the collective American psyche that their sacred Democracy has been so blatantly abused, they will get mad.’

But he says that the security of voting systems in the U.S. shouldn’t concern Americans alone.

‘After all, we’ve all got a stake in who’s in the White House these days. I’m actually prone to think that the rest of the world should get a vote in your elections since, quite frankly, the U.S. policy affects the rest of the world so heavily.’

At Home with the Fuhrer

Bizarre: Given some historical context, it’s funny how absolutely insane this sounds: Guardian: At Home with the Fuhrer.

My discovery was an article headlined ‘Hitler’s Mountain Home’ – a breathless, three-page Hello!-style tour around Haus Wachenfeld, Hitler’s chalet in the Bavarian Alps. In it, the author, the improbably named Ignatius Phayre, tells us that ‘it is over 12 years since Herr Hitler fixed on the site of his one and only home. It had to be close to the Austrian border’. It was originally little more than a shed, but he was able to develop it ‘as his famous book Mein Kampf became a bestseller of astonishing power’.

The great dictator, it seems, was quite the interiors wizard: ‘The colour scheme throughout this bright, airy chalet is light jade green. The Führer is his own decorator, designer and furnisher, as well as architect… has a passion about cut flowers in his home.’

And he is seldom alone in his mountain hideaway, as he ‘delights in the society of brilliant foreigners, especially painters, musicians and singers. As host, he is a droll raconteur… ‘

Oh, and look who’s practising his archery in the garden: ‘It is strange to watch the burly Field-Marshal Göering, as chief of the most formidable airforce in Europe, taking a turn with the bow-and-arrow at straw targets of 25 yards range.’

And on it gushes, all accompanied by various photos of Hitler and friends admiring the view, examining plans for the house, and one delightful shot of Adolf relaxing on a deckchair with ‘one of his pedigree alsatians beside him’.

Next time you read an over-excited ‘inside the home of’ article, bear in mind that the subject might be a psychopathic dictator bent on world domination and mass murder.

(The article then descends into a convoluted mess of copyright claims and counterclaims, BTW, in case you’re interested. But the bizarre stuff is what got me ;)

Needs more thought

Politics: Nelson Mandela banned from visiting the US. Oops! But they’ve fixed it:

The good news is that the United States government has removed Nelson Mandela, Tokyo Sexwale and Sidney Mufamadi from its list of global terrorists. The bad news is that the removal is only for the next 10 years. ….

‘To make an exception for those who struggled against apartheid would require congress to change the law, and that would be a very lengthy process,’ (Virginia Farris, the public affairs spokesperson for the US embassy in Pretoria) said.

Via Wendy M. Grossman, who reckons the other SpamAssassin guys and I are Mrs. Beeton. ;)

Ho hum

Spam: I just received a spam containing this (HTML tags made readable by translating angle brackets to round brackets):

Subject: Re: ZR, the master walked

(BODY bgColor=#ffffff) (font color=white) hellgrammite vocabularian distaff cardamom curvilinear pyhrric whizzing fruition canvasback maritime calcareous byline peddle cautionary smooch detain deadwood thrash centaur hurd coruscate confession bloom damsel gallon downtown morphine respirator psycho consolidate nee boycott (/font) Ban(/neve)ned C(/elmsford)D Gov(/validate)ernment d(/staccato)on’t wan(/goat)t m(/embank)e t(/trident)o s(/logjam)ell i(/constantine)t. Se(/falloff)e N(/judson)ow – (then a link, finally!!) (font color=white)neuroses aghast mazurka ribose architectural tranquillity heterosexual custom coquette mauritius downgrade croydon mechanist devious nh lange circumscribe infancy drool between foppish momentous doug induce (/font)

What a mess. Regardless, SpamAssassin gave it a 17.4 and autolearned it as spam ;)
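The trick in that one is splitting trigger words with bogus closing tags (`Ban</neve>ned`), which a browser silently ignores, padded out with dictionary words in white-on-white text. Here’s a deliberately naive sketch of undoing the first trick, assuming the spam’s real angle-bracket HTML rather than my round-bracket transcription: strip every tag, collapse whitespace, and the hidden message reappears. It doesn’t deal with the invisible-font padding at all, and it’s not how SpamAssassin itself handles HTML.

```python
import re

# Any tag, opening or closing, real or bogus.
TAG = re.compile(r"<[^>]*>")


def visible_text(html: str) -> str:
    """Drop all tags and collapse whitespace, rejoining words split by fake tags."""
    return " ".join(TAG.sub("", html).split())


snippet = ("Ban</neve>ned C</elmsford>D Gov</validate>ernment "
           "d</staccato>on't wan</goat>t m</embank>e t</trident>o "
           "s</logjam>ell i</constantine>t.")
print(visible_text(snippet))
# prints: Banned CD Government don't want me to sell it.
```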

Spam load and Hallowe’en

Spam: The volume of spam continues to rise inexorably. Brightmail are now estimating that 54% of all mail messages are spam.

Nowadays, my personal mail account is getting about 70 a day, rising to over 200 a day at the weekends. It’s getting tiresome; pretty much all of it gets marked as spam and diverted, but I still have to wade through it ‘just in case’, and to build the corpus. I guess I need to extend my .procmailrc to divert high-scoring spams somewhere I can check even less frequently ;)
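That .procmailrc change would amount to a threshold on the score SpamAssassin writes into the headers. Here it is sketched in Python rather than procmail syntax, assuming a 2.6x-style `X-Spam-Status: Yes, hits=17.4 required=5.0 ...` header (the pattern also accepts `score=`, in case the header format differs). The folder names and the cutoff of 15 are hypothetical.

```python
import re
from email import message_from_string

# Accepts both "hits=" and "score=" forms of the score field.
SCORE = re.compile(r"\b(?:hits|score)=(-?\d+(?:\.\d+)?)")

HIGH_SCORE_CUTOFF = 15.0  # hypothetical threshold for "obviously spam"


def choose_folder(raw_message: str) -> str:
    """Pick a destination folder from SpamAssassin's X-Spam-Status header."""
    status = message_from_string(raw_message).get("X-Spam-Status", "")
    if not status.startswith("Yes"):
        return "inbox"
    match = SCORE.search(status)
    if match and float(match.group(1)) >= HIGH_SCORE_CUTOFF:
        return "spam-high"  # checked rarely
    return "spam-low"       # still skimmed for false positives
```

The split keeps the borderline scores, where false positives actually live, in the folder that gets read more often.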

That’s not the really annoying thing, though. I use tagged addressing when I publish my email address, most of the time. It works very well for identifying spam sources overall, and for diverting ‘dead’ addresses that are getting spam into the spamtraps. That’s the plus.
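As an illustration of how the tags pay off: if each published address carries a distinct tag, the tag on a spam’s recipient address tells you where it was harvested, and retired tags can be routed straight to a spamtrap. A sketch assuming the common `user+tag@domain` convention; the separator, the user name, and the set of retired tags are all hypothetical, not my actual scheme.

```python
DEAD_TAGS = {"usenet1998", "oldsignup"}  # hypothetical retired tags


def split_tag(address: str, separator: str = "+"):
    """Split 'user+tag@domain' into ('user', 'tag', 'domain'); tag is None if absent."""
    local, _, domain = address.rpartition("@")
    user, sep, tag = local.partition(separator)
    return user, (tag if sep else None), domain


def route(address: str) -> str:
    """Send mail for retired tags straight to the spamtrap."""
    _, tag, _ = split_tag(address)
    return "spamtrap" if tag in DEAD_TAGS else "deliver"
```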

But the curse of writing spam filters is that you need a good archive of spam; and one of our SpamAssassin corpus guidelines is to attempt to trim out duplicate spams where possible. Many spammers will wind up sending more-or-less identical spam messages, modulo random subject lines, hash-busters, etc., and with (let’s say) 8 tagged addresses in their lists, I’ll get 8 copies of that spam, and have to pay a little bit of attention to trim it down to 1 copy for the corpus.
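Some of that trimming could be automated. One approach, sketched here as an assumption rather than the SpamAssassin project’s actual corpus tooling: fingerprint each message on its body only, with the tagged recipient addresses masked out and whitespace normalised, so copies that differ only in the To: address and a random subject line hash the same. The address pattern is hypothetical, and hash-busters inside the body itself would defeat this naive version.

```python
import hashlib
import re

# Hypothetical pattern for tagged addresses, e.g. user+anything@example.com.
TAGGED_ADDR = re.compile(r"user\+[\w.-]+@example\.com", re.IGNORECASE)


def fingerprint(raw_message: str) -> str:
    """Hash the body with tagged addresses masked and whitespace collapsed."""
    # Ignore headers entirely: To:, Subject: and Message-ID: all vary per copy.
    _, _, body = raw_message.replace("\r\n", "\n").partition("\n\n")
    body = TAGGED_ADDR.sub("<me>", body)
    body = " ".join(body.split())
    return hashlib.sha1(body.encode("utf-8")).hexdigest()


def dedupe(raw_messages):
    """Keep the first copy of each distinct fingerprint, preserving order."""
    seen = set()
    kept = []
    for raw in raw_messages:
        fp = fingerprint(raw)
        if fp not in seen:
            seen.add(fp)
            kept.append(raw)
    return kept
```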

Damn spam-filter development! All this corpus building is hard work ;)

BTW, note how spam load rises at the weekends (Tim Hunter, Paul Terry and Alan Judge of eircom.net also noted this in their paper presented at LISA ’03 yesterday ;). There’s a good reason: spammers try to deliver their spam while abuse staff are away from their desks. The same applies in the network security world; many attacks have taken place over US holiday weekends.

Hallowe’en: best too-late idea for a hallowe’en costume: ‘Top Gun GWB’ in his flight suit. In the end, I played half of the ‘Dr. Frankenstein and Monster’ pair (I was the monster, as C really is a scientist, and computer ‘science’ doesn’t count). Best costume seen: a very impressive onnagata kabuki player.
