Author: Justin

Justin Mason, the author of this weblog.

Coffee and Trivia

Published March 23, 2006

Just got a new cafetiere, so I can finally switch back from instant coffee to the real deal again for my morning coffee. My productivity has doubled. Still no DSL, though -- early next week is the current estimate, and I can hardly wait.

I went to a pub quiz last night with mates Macker, Tom and Alan -- a benefit for a new Dublin theatre company, I think. The prizes were:

First prize: several 50 Euron vouchers for various Dublin eateries
Second prize: two fancy scarves, a Nivea women's cosmetics kit, and a very metrosexual Nivea bath kit for a guy
Third prize: 4 bottles of nice wine

We did very nicely -- "aglet" was correctly defined for instance -- but not nicely enough. Put it this way: guess who's wearing Nivea deodorant?

Buying Consumer Electronics Online, in Ireland?

Published March 20, 2006

Hey lazyweb, hear my plea! What are my options for buying consumer electronics online, now that I'm back in Ireland?

I like online shopping. I dislike Argos, and I really hate Dixons, Currys and all the rest of the consumer-electronics high-street operations. Get me on the net and out of the nasty little shops and I'm happy. ;)

All in all, I'm a bit of an Amazon fan. However, now that I'm back in Ireland, I've been brought back to earth with a bang on that count; the prices are OK for items at both Amazon.com and .co.uk -- but shipping is turning out to be a total disaster.

Basically, I've put in two orders, paid through the nose for basic shipping, and neither has turned up. For example -- I ordered this phone a week and a half ago, on the 9th March, ponying up UKP 27 for the item -- and a painful UKP 7 for shipping by International Mail.

Delivery estimate on ordering was for between 5 and 7 days -- 14th to the 16th March. That was long enough -- but it still hasn't turned up, and Amazon.co.uk is still claiming that that is the current estimate, despite the 16th of March being 4 days ago ;)

On top of that, it appears they don't offer any way to track the packages using that shipping method, so who knows what's happening with the damn thing right now.

If I compare that with an order I made at Amazon.com last November, in which I nabbed a handy FM transmitter for my iPod -- in that case, I got it shipped by plain old US Postal Service for $4.51, which was handily discounted as Super Saver Shipping. That -- as with pretty much all my Amazon.com orders -- arrived in 3-4 days, and for a hell of a lot cheaper too. If I'd had to pay for shipping (which I didn't anyway), $4.51 vs UKP 7 works out as a third of the price, no less.

I'm guessing this is mainly down to Amazon.co.uk being shoddy in terms of how it deals with shipping to Ireland, and there are probably sites that use better-quality shipping partners.

Surely there must be better deals with vendors in Ireland, or even elsewhere in the Eurozone? Anyone know? Please drop us a line in the comments!

Update: the items arrived -- 14 days after ordering. This is a moot point now, though, since Amazon.co.uk are no longer selling 'PC & Video Games, Toys & Games, Gift items, Electronics & Photo and Home & Garden items' to Ireland; I guess it was easier to give up on the Irish market for now. Very disappointing -- but I'm waiting to see what happens next.

VAST.com

Published March 14, 2006

So, my new employer just launched today!

It's a new search service, VAST.com. As the blog says, 'we are building a search service that extracts classified ads from across the web, structures them, and then makes them available via an open REST API for commercial and non-commercial uses.'

Now you can see why I'm excited ;)

Greetings from 1996!

Published March 12, 2006

    --> Sending: ATZ
    ATZ
    OK
    --> Sending: ATQ0 V1 E1 S0=0 &C1 &D2
    ATQ0 V1 E1 S0=0 &C1 &D2
    OK
    --> Sending: ATH1
    ATH1
    OK
    --> Modem initialized.
    --> Sending: ATDT1892150150
    --> Waiting for carrier.
    ATDT1892150150
    CONNECT 45333

45 measly kilobits per second! This is incredibly painful -- and expensive at 5 cents a minute! I briefly considered getting around it by hiring a 3G data-card for the couple of weeks before my DSL is activated -- but that too is insanely overpriced.

Hurry up, DSL...

Disclosure

Published March 10, 2006

As of yesterday, I have a new day-job.

I won't be working on email spam as part of the job, which is an interesting turn of events. However, I'll be sticking with the open-source Apache SpamAssassin project, and keeping up the rate of work on that [*].

I'm not sure how much I can blog about the new place just yet, but I will say it's certainly looking like it'll be very interesting work ;)

[*: modulo the next couple of weeks while I'm waiting for my bloody DSL to be installed. argh!]

Apple Attempting to Patent RSS Aggregation

Published March 9, 2006

Miguel de Icaza quotes Dave Winer, pointing out two patent applications from Apple which seem intended to grab major chunks of the feed syndication space as Apple "IP".

The first application is news feed viewer, 20050289147, filed April 13 2005:

A computer-implemented method for displaying a plurality of articles, the method comprising: storing a first feed bookmark in a folder, the first feed bookmark indicating a first feed, the first feed comprising a first plurality of articles; storing a second feed bookmark in the folder, the second feed bookmark indicating a second feed, the second feed comprising a second plurality of articles; aggregating the first feed and the second feed to form a third feed; and displaying the third feed.

I think there were many RSS readers that implemented this, and others from the patent application, before April 2005. I know Liferea, the one I use, has had UI-level aggregation since September 2004, with its VFolders.

Next, news feed browser, 20050289468, filed April 13 2005. This one contains a wide range of claims, but here's one that stands out as particularly trivial:

A computer-implemented method for discovering a feed, the method comprising: receiving a request to display a file; determining that the file includes relationship XML; determining that a Uniform Resource Locator (URL) within the relationship XML indicates a file that comprises the feed; and displaying one of a group containing the feed and a link to the feed.

That's pretty much RSS autodiscovery, as described in 2002.

The listed inventors in both patents are: Kahn, Jessica; (San Francisco, CA) ; Alfke, Jens; (San Jose, CA) ; Wilkin, Sarah Anne; (Menlo Park, CA) ; Howard, Albert Riley JR.; (Sunnyvale, CA) ; Forstall, Scott James; (Mountain View, CA) ; Lemay, Stephen O.; (San Francisco, CA) ; Melton, Donald Dale; (San Carlos, CA) ; Loofbourrow, Wayne Russell; (San Jose, CA).

Thanks, Apple! and thanks, "inventors"!

It's important to note that this is still in the application stage, and as such can be invalidated, or narrowed down to a saner level, by using the techniques described here. I strongly recommend that people working in the syndication field with sufficient knowledge and expertise who feel strongly enough about this should spend a little time doing so, before the patent is issued and it becomes a multi-million-dollar task to invalidate it. (however, IANApatentL of course ;)

We Win

Published March 8, 2006

ongoing: The ASF Server:

Tim Bray: Which Apache project burns the most resources?

Mads: Spamassassin by a wide margin. [...]

Heh, we win ;)

Helios, the Zones server, has been an incredible resource for us. SpamAssassin isn't a traditional open-source software project in one respect: we use a lot of centralized "phone home" infrastructure to support rule and score generation. Having a virtualized server of this quality and horsepower to use for this has been fantastic.

(thanks to John O'Shea for the pointer!)

IBM Patents Closed-Loop Confirmation

Published March 8, 2006

Another day, another absurd IBM software patent. Via the IP list, here's United States Patent 7,003,497:

A method for confirming an electronic transaction, comprising the steps of: performing an electronic transaction between a first party and a second party; providing, by the first party to the second party, contact information of a third party service provider associated with the first party; contacting, by the second party, the third party service provider to obtain a location of a predetermined, private mailbox associated with the first party; sending, by the second party, a request for confirmation of the electronic transaction to the predetermined, private mailbox associated with the first party; accessing the private mailbox by the first party; and sending, by the first party, a reply message to the request for confirmation to thereby confirm authorization of the electronic transaction, wherein information regarding the private mailbox is not communicated to the second party during the electronic transaction.

There's lots of waffle in the background section about this being for electronic e-commerce transactions, but that claim, and claims 2 and 3 at least, are easily sufficiently broad to cover simple "confirmed opt-in" email subscription systems -- in other words, the system whereby a potential newsletter subscriber clicks on a link in order to "confirm" that they want to subscribe to a newsletter. That's the current best practice email subscription method used by pretty much everyone.

Filed December 31, 2001. There was plenty of prior art before this date, but who would want to go up against IBM, no less, to attempt to get this invalidated, especially now that it's been issued?

Thanks USPTO, you're doing a heck of a job!

US Things I Miss

Published March 7, 2006

So, I've been back in Ireland for several weeks now. How goes the culture shock? Well, let's make a list of the stuff I'm missing from California:

C, who's still back there finishing up her contract. Hurry up, C!
All my friends I left behind in the US :( Come visit!
The weather (well duh)
Trader Joes: low-cost, high-quality organic and near-organic food
The excellent Mexican and Southern food. Mmm, Taco Mesa
Super-cheap cocktails -- although having good Guinness makes up for a lot of this
The back country -- desert, mountains, snow, national parks. Ireland may have more surviving history dotted about, but it's just flat. I miss the mountains
Netflix -- haven't spotted a replacement for this yet. There are companies in Ireland that use a similar idea, but it appears every one just about manages to screw it up and render it useless, generally by introducing throttling, late fees, or slow turnaround. meh
The way my Irish accent meant I could get away with pretty much anything. That trick doesn't work in Ireland ;)

In other news: the broadband choices situation has pretty much gone to shit.

It turns out that all the good options are quite dependent on local-loop unbundling, which -- somehow -- still hasn't gotten around to my local exchange. As a result, guess who's going to be stuck on the wrong end of dialup, no less, for "2 to 3 weeks" until Eircom deign to switch on the bitstream access for my new BT-resold ADSL connection? Here's hoping there's a neighbour with broadband and wifi when I move back in. Joy.

DearAOL and GoodMail

Published March 2, 2006

Things have really been heating up recently around the AOL/Goodmail "pay to send" CertifiedMail scheme -- the EFF and a host of other groups have launched dearaol.com, stating:

This system would create a two-tiered Internet in which affluent mass emailers could pay AOL a fee that amounts to an "email tax" for every email sent, in return for a guarantee that such messages would bypass spam filters and go directly to AOL members' inboxes. Those who did not pay the "email tax" would increasingly be left behind with unreliable service. Your customers expect that your first obligation is to deliver all of their wanted mail, and this plan is a step away from that obligation.

While I dislike this proposal, too, as far as I can tell, AOL actually have pretty reasonable intentions with this program -- nowhere near as bad as the DearAOL.com site makes out.

However, they're doing a really really crappy job of getting this information out there, or committing to reasonable limits on the program, such as announcing that they will use it only for transactional emails, as Yahoo! have done.

I'd strongly recommend reading Carl Hutzler's posting on the subject. Carl was AOL's head of anti-spam operations until last year, so he really knows what he's talking about, and he lays it out clearly -- a lot more clearly than any corporate statements from AOL do. His blog contains a fair bit more on the subject, too.

But seriously -- why isn't there a press release on the AOL site about this scheme? Some front-channel communication about now might be useful, I'd suggest, before things really get hairy -- this crapstorm is coming about partly because AOL's comments are all filtering out in drips and drabs via third parties, and (AOLers say) are being misconstrued and misrepresented in the process. It's a classic case of missing the cluetrain.

I'd also really encourage the EFF people to tone done the rhetoric; statements like "senders will have no guarantee that their emails will be delivered" is scare-mongering, given that SMTP email already provides no such guarantee.

Update: wow, MoveOn went really overboard -- "threatening the Internet as we know it ... The very existence of online civic participation and the free Internet as we know it are under attack." OMG the sky is falling!

Side Issue: The Spam Definition

Also, another note to EFF: defining spam as "whatever you don't want to read" is a terrible mistake to make. That confuses a good, clear, enforceable and automatable definition of spam -- unsolicited bulk email -- and makes it effectively unenforceable by law, unpoliceable by ISPs, impossible to detect automatically, and incompatible with existing, effective EU and Australian legislation.

Listen to your own Chairman of the Board; he's right on this count.

PS: any luck fixing up the non-confirmed signups issue? Last time I checked I could still subscribe any address to the EFF Action Alerts without a cross-check, which is not a good thing.

Another script: goog-love.pl

Published March 2, 2006

A quick hack --

goog-love.pl - find out where your site's google juice comes from

This script will grind through your web site's "access.log" file (which must be in the "combined" log format). It'll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love -- in other words, the searches that your site 'wins' on.

The output is in plain text and a chunk of HTML.

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.

For example, here are the current results for this site. You can immediately see some interesting stuff that's not immediately obvious otherwise, such as my site being the top hit for [beardy justin] ;)

#1 for kriskat225: http://taint.org/2006/01/20/220239a.html
#1 for kriskat224: http://taint.org/
#1 for mailman rss: http://taint.org/mmrss/index.html
#1 for ray is naked: http://taint.org/2005/05/27/195421a.html
#1 for beardy justin: http://taint.org/2005/09/10/002323a.html
#1 for threadless rss: http://taint.org/2005/05/25/060857a.html
#1 for louis fitzgerald: http://taint.org/2005/05/12/020118a.html
#1 for download JusteTune: http://taint.org/index.php?tag=apple
#1 for mobile repair delhi: http://taint.org/2005/11/11/032651a.html
#1 for site:taint.org mythtv: http://taint.org/index.php?tag=hdtv
#1 for "Google Map" IDS rulesets: http://taint.org/2005/09/
#1 for spam email "prank a friend": http://taint.org/2004/11/
#1 for site:taint.org mythtv freevo: http://taint.org/index.php?tag=mythtv
#1 for world map desktop background: http://taint.org/xplanet/
#1 for kate thornton + Samuel L jackson: http://taint.org/2003/12/10/185721a.html
#1 for when did chris horn leave iona technologies?: http://taint.org/2003/05/
#2 for natkat224: http://taint.org/
#2 for itms linux: http://taint.org/2005/09/20/022107a.html
#2 for msn IDs hacking software: http://taint.org/index.php?tag=hacking
#3 for gmail spam filter: http://taint.org/2004/04/15/033025a.html
#3 for live world map on desktop: http://taint.org/xplanet/
#4 for moin mozex: http://taint.org/2004/10/08/081409a.html
#4 for editable p45: http://taint.org/2005/01/27/025238a.html
#4 for urban dead exploits: http://taint.org/index.php?tag=games
#4 for gmail spam filtering: http://taint.org/2004/04/15/033025a.html
#4 for world map desktop wallpaper: http://taint.org/xplanet/
#5 for cdwow.ie: http://taint.org/2003/12/04/185038a.html
#5 for life hacking: http://taint.org/2005/10/17/210751a.html
#5 for Adelphi Charter: http://taint.org/index.php?tag=politics
#6 for irish SME: http://taint.org/2005/06/23/212513a.html
#6 for urbandead: http://taint.org/index.php?tag=hacks
#6 for SKY NEWS IRELAND: http://taint.org/2004/05/12/205717a.html
#7 for daniel cuthbert: http://taint.org/2005/10/12/205836a.html
#7 for SAMUEL L. JACKSON QUOTES: http://taint.org/2003/12/10/185721a.html
#7 for cool background pictures: http://taint.org/xplanet/
#8 for CDWOW: http://taint.org/2003/12/04/185038a.html
#8 for urban dead: http://taint.org/2005/10/29/224403a.html
#8 for korea porn: http://taint.org/2003/07/12/031422a.html
#8 for BBC port 8998: http://taint.org/2003/08/
#8 for iftop documentation wrt: http://taint.org/index.php?tag=freevo
#8 for php mail injection spam: http://taint.org/2005/12/08/202248a.html
#8 for fake open source software : http://taint.org/index.php?tag=open-source
#9 for faad symbian: http://taint.org/index.php?tag=apple
#9 for sky news ireland: http://taint.org/2004/05/12/205717a.html
#9 for telemarketing counter speech: http://taint.org/2002/11/12/130851a.html
#10 for "Scratch Heads Over": http://taint.org/2003/07/12/031422a.html
#10 for web scraper linux console: http://taint.org/2004/06/05/023726a.html

Download here (5 KiB perl script).

Notes:

if you see a lot of "502 Bad Gateway" errors, it's probably over-zealous anti-bot ACLs on Google's side. Try from another host.
Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)

Vint Cerf speaking at Google on Thursday

Published February 28, 2006

Heads-up, Dublin geeks: Vint Cerf will be speaking at the Dublin Googleplex on Thursday.

Sadly, I won't be able to make it myself -- I had to visit the UK this week. Pity; I would have loved to hear him speak :(

Dublin Riots

Published February 27, 2006

While driving around Ireland on a wedding-location-scouting trip, we started receiving texts talking about riots in Dublin; I texted a friend, and got a reply along these lines: "Celtic-topped scobes run riot through O'Connell St, torching cars in Nassau street, hospitalising cops and Charlie Bird. madness!"

I thought he was joking, but nope. A load of IRA-slogan-shouting scumbags really had been allowed to run riot -- with paving stones of all things left unsecured in their midst! -- and it quickly got way, way out of hand.

The blog coverage is excellent, with lots of photos. I suggest starting with Indymedia Ireland, these Flickr photos and the links on this weblog. It appears the gardai really fell down on this one.

For what it's worth, I was in town a few hours later, and the rest of Dublin was trouble-free -- just the usual Saturday night goings-on. O'Connell St. was still a rubble-strewn mess when I passed through on Sunday, though.

SourceForge.net now offering public Subversion

Published February 23, 2006

Good news. It appears that SourceForge are now offering full, public use of Subversion for all projects on sf.net!

The SourceForge.net: Subversion (Version Control for Source Code) document contains full details on their setup. Notable key points:

It's using authenticated HTTPS -- which is great, going by my experiences with the ASF's setup
Imports are done from either an existing SF.net CVS repository using cvs2svn, from a Subversion 'svnadmin dump' file, or from a CVS repository tarball
CIAbot support is offered as standard ;)

Awesome. I'll be trying this out with Uffizi, which I registered as a Sourceforge project a few weeks ago just to try this out. ;)

DataMation Anti-Spam Product of the Year!

Published February 22, 2006

Hooray!

SpamAssassin has been voted DataMation Anti-Spam Product of the Year for 2006, earning three times as many votes as the next contender.

This is the second year in a row, which is fantastic -- and our margin is increasing each year. ;)

TREC Spam Corpus

Published February 20, 2006

Some news from TREC's Gordon Cormack:

The TREC 2005 Corpus (92,000 messages - 42,000 ham; 50,000 spam) is now available for self-serve download.

TREC Spam Evaluation is a NIST program to develop methods to measure spam filter accuracy and performance. More details here.

The corpus can be picked up at Gordon's site. As far as I can tell, this should be a pretty solid corpus for spam researchers and developers.

Four Things

Published February 20, 2006

I don't do silly blog antics much, but I got tagged by Mat for the Four Things meme. Looking around, it is indeed a bit more interesting than things like the usual LJ quiz, so why not!

I wrote this on the plane from LA to Dublin, which may have affected some of the selections in 4 places I would rather be right now at least ;)

4 jobs I've had:

I was Iona Technologies' first employee, and stayed there for no less than 7 years. I got to see the company grow from a handful of people, most of whom weren't getting paid (hence how I wound up as the first employee ;), all the way up to a 300-strong multinational, while the company itself formed a core of Ireland's mini dot-com boom. That was fantastic fun, and educational to boot.
my Dad's gun/fishing/sporting-goods shop. Was it really a good idea to have a teenager working near firearms? At least I wasn't the one who unplugged the fridge where the maggots were kept, so that they all hatched over the course of one weekend...
A horrible teenage job -- picking tomatoes. I can still feel the orange dust under my fingernails every time I smell fresh tomatoes :( I didn't last very long at that at all.
writing an Amiga-based kiosk system for virtually no pay whatsoever, at the age of 18 or 19. Ah, exploitation.

4 movies I can watch over and over:

Koyaanisqatsi -- it's dating a little now, since every ad agency through the 90s ripped it off. But still, the invention of a new format. I remember looking at the 405 freeway in LA, and thinking "looks like something out of Koyaanisqatsi" -- of course, it was.
Princess Mononoke -- either that, or Nausicaa. I just love the way the characters are coloured in shades of grey, rather than black and white.
the Lord of the Rings trilogy -- oh dear I'm a hopeless Tolkien fanboy.
Spinal Tap -- pure genius.

4 places I've lived:

Melbourne, Australia; around the time of the annoying TV drama, The Secret Lives Of Us;
Newport Beach, CA; around the time of the annoying TV drama, The O.C.;
Dublin, Ireland; no annoying TV drama -- so far
University of California Irvine, CA; while Irvine itself is the most soulless suburban hellhole I've ever visited, living on the UCI campus is quite fun by comparison. Take about 1000 grad students, post-docs and lecturers from around the world; put them all in the same square mile or so; remove all fun (and bars!) from the surrounding areas; watch them make their own entertainment, or go mad.

4 tv shows I love:

4 places I've vacationed:

Annapurna Base Camp, Nepal; we trekked our way up to there, then trekked back down again. Unforgettable. I really want to do another Nepal trek as a result
car-camping around the Australian state of Victoria; they have some fantastic national park campsites, which most tourists overlook
learning how to dive in Ko Tao, Thailand; great setting, great dive sites, pretty cheap too!
Yosemite; amazing, world-class natural beauty. Californians don't realise just how lucky they've got it ;)

4 of my favourite dishes:

A good Thai green curry
Laos-style green papaya salad with sticky rice
a good meaty cassoulet, from Fandango in San Luis Obispo. At least, that was the tastiest meal I've had in recent months ;)
Mangosteen -- the queen of fruit, according to the Thais. I could, and probably have, eaten hundreds of these

4 places I would rather be right now:

spending New Year's Day with a bunch of friends in rural West Cork or County Galway; until I moved to the US, this was one of my favourite annual traditions.
the Stag's Head Bar, Dublin, in the snug, again with a bunch of friends
sitting on the grass outside the Pavilion bar in TCD, on a sunny summer's day (hmm, that's a lot of bars!)
Chiang Mai, Thailand

4 sites I visit daily:

4 people I'm tagging:

The Return of Sneakernet

Published February 20, 2006

Keith Dawson sent this on -- an interview with Jim Gray, head of Microsoft's Bay Area Research Center and winner of the ACM Turing Award, talking about new transmission systems for truly massive data collections. Very interesting:

[One] option is to send whole computers. .... We're now into the 2-terabyte realm, so we can't actually send a single disk; we need to send a bunch of disks. It's convenient to send them packaged inside a metal box that just happens to have a processor in it. I know this sounds crazy -- but you get an NFS or CIFS server and most people can just plug the thing into the wall and into the network and then copy the data.

Dave Patterson, interviewer: What's the difference in cost between sending a disk and sending a computer?

JG: If I were to send you only one disk, the cost would be double -- something like $400 to send you a computer versus $200 to send you a disk. But I am sending bricks holding more than a terabyte of data -- and the disks are more than 50 percent of the system cost. Presumably, these bricks circulate and don't get consumed by one use.

DP: Are you sending them a whole PC?

JG: Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, and seven 300-GB disks -- all for about $3,000.

DP: It's your capital cost to implement the Jim Gray version of "Netflicks." (jm: sic)

JG: Right. We built more than 20 of these boxes we call TeraScale SneakerNet boxes. Three of them are in circulation. We have a dozen doing TeraServer work; we have about eight in our lab for video archives, backups, and so on. It's real convenient to have 40 TB of storage to work with if you are a database guy. Remember the old days and the original eight-inch floppy disks? These are just much bigger.

DP: "Sneaker net" was when you used your sneakers to transport data?

JG: In the old days, sneaker net was the notion that you would pull out floppy disks, run across the room in your sneakers, and plug the floppy into another machine. This is just TeraScale SneakerNet. You write your terabytes onto this thing and ship it out to your pals. Some of our pals are extremely well connected -- they are part of Internet 2, Virtual Business Networks (VBNs), and the Next Generation Internet (NGI). Even so, it takes them a long time to copy a gigabyte. Copy a terabyte? It takes them a very, very long time across the networks they have.

E-Pending

Published February 20, 2006

Boing Boing has an interesting case today:

"I filled out a web form for a contest from Miller using a throwaway junk email address and then, months after I dumped the throwaway account, I got this to my main account! Not sure I like the idea of companies tracking me down like this."

I sent a mail to follow up on this, but it's worth blogging here too.

This is, unfortunately, common practice among the "legitimate" bulk mailer companies; it's called "e-pending" (short for "email address appending"). Basically, the advertiser contacts one of the big data-mining companies, provides them with the data they have about the customer -- name, postal address, etc., and gets them to match that against their database; the data-miner then provides any other email addresses they may have on file for that user, even if those email addrs were provided for bills, promotional use for other companies, etc.

The advertisers contend that permission was given by the person who's being mailed; the recipients contend that permission was given to send to a specific address, not all of that person's addresses in perpetuity.

Here's a few more examples of e-pending gone bad: two Jennifer Millers, Sony scraping ancient Internic contact addresses, Spamvertized.org comment on the practice, Joe St. Sauver comments.

It's exclusively a US phenomenon, as far as I know; I think most cases of e-pending are rendered illegal under EU data protection law. Handy. ;)

Update: Brian at the Spam Kings weblog notes that 'this spooky little spam was the work of Equifax, the big credit reporting agency that shut down its Boca Raton-based spam operation, Naviant, in 2003, due to the impending passage of CAN-SPAM.'

RFID in the Grauniad, and back in Dublin

Published February 16, 2006

Greetings from sunny Dublin, Ireland! (really!)

I'm now back in taint.org's native timezone, although precariously set up and experiencing occasional interruptions. If you're waiting for a mail from me, it may take a little more time.

I did have time to be interviewed last week by Karlin Lillington for this Guardian story:

To make sure customs agents could read his cat's chip to match him to his Pet Passport on return to Europe, Mason bought his own scanner at a cost of some Â£200. "I didn't want to risk the cat being impounded for six months' quarantine at Heathrow," he sighs.

It's true.

Happy to be back -- I think. Looking forward to my first pints, in over a year, of creamy Guinness in its native habitat. I also have a couple of half-written weblog entries I wrote on the plane, too...

Yahoo! delete b3ta newsletter mailing list?

Published February 10, 2006

Today's top item on the b3ta front page, under Site News:

Yahoo please talk to us! Help! - our yahoogroups list (with over 100,000 subscribers) has been deleted. We don't know why. If you work at Yahoo and can help us sort this out please contact me at robmanuel AT gmail dot com.

posted by rob on 10th Feb at 2pm

B3ta is a long-established UK humour site who send out a weekly newsletter, every Friday afternoon, using Yahoo! Groups as their mailing list service. They've been doing this for years. Yep, that's 100,000 subscribers.

Anyway, if anyone from Y!Groups, or anyone who knows someone there, is reading, please do get in touch with the b3ta guys -- this is a very serious catastrophe for them. I'd be curious to hear how/why this happened.

To tie this into spam-filtering and email operational topics, it brought this posting from Jeremy Zawodny to mind:

This all makes me wonder if it's worth it for smaller organizations to bother running their own mail servers anymore. If Google offered small business mail the way Yahoo does, there'd be some serious competition in the market and it'd make a lot of people's lives much easier.

While Jeremy was talking about a different service from list hosting, I think we're seeing the other side of the email-outsourcing coin, here.

Update: fwiw, it's back:

Yahoo update - on Friday Yahoo deleted our list of 100,000 newsletter readers email addresses, hence we didn't send a newsletter. Today they've been in touch and have promised a response by Tuesday. Fingers crossed. UPDATE: It looks like it's back! Hooray for Yahoo!

Broadband choices in Ireland

Published February 9, 2006

Perfect timing! Just 5 days before I return to Ireland, Damien Mulley posts 'Broadband choices in Ireland', a good overview of the options available for consumer broadband internet connection.

I've been out of the loop for quite a while, and spoilt by the options available in suburban Southern California (which are, of course, pretty good). But this is a lot better than what was on the table when I left, 3 years ago.

What strikes me is that the upload/download speeds are quite reasonable and pretty close to what you'd see in the US. Similarly, the prices are finally near to the going rate in the US, once the various limitations and add-ons (required 'bundles', state taxes etc.) are taken into consideration.

However, virtually all of these deals use the horrendous concept of download capping! Given that I use this stuff for work, and routinely rsync around 30GB chunks of email corpora between central offices, colo servers, and my desktop, this just won't fly. It could be argued that I'm therefore not a typical broadband consumer, who these deals have been carefully designed to cater for. But seriously -- if a telecommuting software developer isn't a typical broadband consumer, who the hell is? Hey telcos: a little flexibility goes a long way -- don't fence me in. ;)

All in all, it looks like Smart Telecom are the winners; 3Mb/s download, 512Kb upload -- and most importantly, no cap -- for EUR 35 per month. (And check out that XHTML/WAI-compliant website!)

I probably would have gone with Irish Broadband, but for the past 6 months the only thing I've been hearing about them via word-of-mouth has been bad news, detailing customer service meltdown after meltdown. Even the legendarily incompetent 'biddies' of Eircom seem to be getting better reviews nowadays.

Talking of Eircon, our dear old dirty-tricks-wielding celtic-tiger-throttling incumbent telco: the top Sponsored Link on a Google search for irish broadband is:

Irish Broadband

www.eircom.ie -- More speed, prices reduced by 25%, free modem & a free connection!

Scum.

Spamhaus comment on the AOL/Goodmail deal

Published February 7, 2006

AOL and Yahoo! have been making a lot of headlines with their plans to reduce their whitelist-management workload -- and make a little pay-to-send money on the side -- with a deal with Goodmail.

Now Spamhaus have gone on the record against the plan:

On Monday, Richard Cox, chief information officer at antispam organization Spamhaus, said that "an e-mail charge will destroy the spirit of the Internet."

"The Internet has become what it is because of freedom of communication. Open discussion is what gives it value. There should be no cost for particular services, and e-mail should be free and accessible to all. This will disenfranchise people."

RFID “e-Passports”

Published February 3, 2006

This is what passports containing RFID chips will look like:

Note the little rectangular logo at the bottom. According to Ed Hasbrouck, that's the ICAO standard logo indicating that this is an RFID passport, and therefore:

identity thieves, terrorists, direct marketers, data aggregators, malicious governments, or anyone else with a radio receiver within 10 meters (30+ feet) or more whenever your passport is read at a border crossing, airport, etc. can secretly and remotely track you, log your movements through the unique "collision avoidance" ID number sent by the chip, and intercept and decrypt all the data (including your digital photo and, in some countries, your digitized fingerprints) needed to "clone" a perfect copy of your passport, forge other identity credentials, or impersonate you.

Of relevance are the comments over at Bruce Schneier's weblog entry regarding the Riscure research into the Dutch Biometric Passport's lousy security.

Interestingly, as one commenter there notes, breaking the crypto may be overkill; the knowledge that a person is carrying a passport from a certain country, or set of countries, may be enough for certain attackers.

I asked the Irish Passport Office about their RFID plans last April:

I'm an Irish citizen and passport-holder. I have been following recent discussions in the US regarding the addition of RFID computer chips to US passports, and I note that the US Department of State is now indicating that this measure was made necessary due to recent International Civil Aviation Organization (ICAO) standards -- namely ICAO Doc 9303.

As a result, since Ireland is a signatory to ICAO regulations, this raises the question as to whether Irish passports shall shortly include similar RFID or "contactless chip" technology.

Can you tell me:

if this is planned?

is there a mechanism for public comment on this process?

who could I further email to ask about this, if you do not know?

Disappointingly, I never received a reply. :( Someday I should really chase this up.

Update, Oct 17 2006: Well, they never bothered replying. They did, however, introduce RFID chips to Irish passports:

The chip technology allows the information stored in an Electronic Passport to be read by special chip readers at a close distance. The chip incorporates digital signature technology to verify the authenticity of the data stored on the chip.

OpenWRT Wifi Repeater Recipe

Published February 3, 2006

Seeing as I've moved house, and am staying at a friend's temporarily until I head back to .ie, internet access has become a bit of a problem. Hence, I'm posting this via some neighbour's leeched wifi ;)

To do this, I came up with some seriously hacky IP infrastructure, to wit a repeater setup composed of two off-the-shelf router/NAT/AP boxes, since the signal is pretty weak and needed a boost to cover the useful parts of the house. If you're curious. the details can be read over here.

Weblog Spam and Adversarial Classification

Published January 31, 2006

Dr. Dave, author of the Spam Karma WordPress antispam plugin, has posted an interesting article about new weblog-spammer tactics:

These spams do not present most of the idiotic traits of their lower colleagues: they do not try cramming hundreds of URLs or inserting hundreds of easily spotted junk keywords in the comment content. Instead, they use only the dedicated name and homepage fields to sneak in spam URL and keywords. The comment content is often perfectly innocuous, sometimes even topical (by copying parts of another comment or a trackbacking post). All in all, these spams could easily be missed by a human moderator who wouldn't look carefully at the contact name and URL.

(Thanks to Kelson Vibber for the pointer to this.)

In other words, he is noting what we noticed in email anti-spam; that what works well one year, is likely to degrade over time as the spammers attempt to evade it, and one has to keep working to keep up.

The best term for this appears to be adversarial classification. Anti-spam activities fall into this category, and it often means that classic text classification algorithms aren't suitable -- after all, the Reuters-21578 dataset never tried to evade your classifier ;)

In a similar vein, this MS research paper is interesting:

Previous work on adversarial classification has made the unrealistic assumption that the attacker has perfect knowledge of the classifier. .... We present efficient algorithms for reverse engineering linear classifiers with either continuous or Boolean features and demonstrate their effectiveness using real data from the domain of spam filtering.

It's akin to John Graham-Cumming's work looking into how a spammer could get past a bayesian filter "from the outside", but with more techniques, and examining MS' MaxEnt algorithm, too. PDF here, well worth a read.

(By the way, I'm in the process of moving house, so if you send me an email, it may take a while for me to reply. This situation is likely to prevail for the next few weeks, for what it's worth -- fun.)

Raw Food Crackpottery

Published January 23, 2006

Via RobotWisdom, a review of a new Primrose Hill cafe:

No wheat. No gluten. No sugar. No GMO. No dairy. No yeast. No shoes.

Yep, no shoes. If you want to enjoy the detoxifying glories of London's first raw-food cafe, then please leave your clod-hoppers at the door, along with your high stress levels and your smart-arse scepticism.

I know of another cafe elsewhere which also offered a largely-raw menu. This one, however, shared a back alleyway with a shop where a friend of mine worked.

He noted that on several occasions, he'd seen rats near, or on, the pallets of plastic-wrapped fruit and vegetables. You see, the raw food was delivered to the kitchen door, where it laid outside for a short while -- in the rat-infested alleyway. Rats crawling over your food, naturally, is not a good thing.

There's a very good reason why some smart stone-age ancestor invented cooking our food -- because it kills the germs that'll make us sick!

Devotees claim that because the enzymes are destroyed when food is heated above 48C, our bodies have to utilise our own enzymes to break down the food, which can result in us feeling tired and run-down.

Yeah, devotees are pretty much talking crap there. ;) If anything, cooked food is easier to digest than raw. And good luck with the whole 'getting by without using enzymes' thing!

What a load of quackery.

Happy Spam-Solved Day!

Published January 23, 2006

Happy BillG-Scheduled Spam Solved Day!

"Two years from now, spam will be solved," Microsoft's Bill Gates said [at the 2004 World Economic Forum in Switzerland].

So is it? Weeeeell.....

To "solve" the problem for consumers in the short run doesn't require eliminating spam entirely, said Ryan Hamlin, the general manager who oversees [Microsoft]'s anti-spam programs. Rather, he said, the idea is to contain it to the point that its impact on in-boxes is minor.

In that way, Hamlin said, Gates' prediction has come true for people using the right tactics and advanced filtering technology.

Ha. I am reminded of 'weapons of mass destruction-related program activities'.

As one slashdotter says, 'when you fail, try try again; or conversely, change the requirements and make it look like a success, which is exactly what BG has done.'

It's not washing, though, unsurprisingly. The poll on the same page, asks 'do you agree with Microsoft's contention that the spam problem has been "solved"?' Right now, with 1169 votes, it has 7.2% (in other words, the MS employees) agreeing, and a whopping 92.8% not going for it.

SweetheartsConnection.com – Interesting Dating Scam

Published January 20, 2006

Here's an interesting online scam. An anonymous friend, working in anti-spam, writes:

'I've been covertly looking into rumours of a myspace scam and thought you might like to blog it - I don't want to be attached to this in any way otherwise I'd write about it myself (I have a profile on there that I want to keep around in case other scams show up, but I don't really want to advertise the profile).

It works like this:

You sign up for a myspace account and fill in your profile details. Then in a couple of days someone contacts you pretending they're using their friend's account because they haven't signed up yet. They say something along the lines of "I saw your profile and thought you were cute, if you're interested email me at (random)@yahoo". If you email them, you get a reply back being all bubbly and cute, and a link to a web page that sort of looks like a "My First Homepage" - it even says "I'm taking a course at the community college in HTML". There are pics on the page of a very cute girl, but at the bottom a teaser saucy picture in lingerie, and an Adult Pass signup to get more pics. Of course the signup is $40.

It's a subtle scam, but definitely a scam. Here's an example of the type of site you get sent to:

http://www.honesthost5mb.com/kristenssite/

Note the hosting service. Now delete the /kristenssite/ part and it looks legit, right? Until you click on a few links and realise they have nothing to sell.

Google has no knowledge of honesthost5mb - nobody links to them, so how did Kristen find them?

It's indeed quite funny that there's a terribly similar hosting service out there: http://www.jagflyhosting.com/ - yet for some reason all their links seem to work, and they have an accessible phone number. Shock. Horror!

I'm pretty sure the account being (ab)used on myspace is a stolen one - it looks pretty legit, including linked in friends and comments, so I'm suspecting a cracked password.

Anyway, thought you could blog this to warn others about it (feel free to advertise the above link - though I guess that'll ruin the whole "google doesn't know" thing ;-) I wish I had the guts to sign up for the extra pics to see what you end up with!'

They also passed on the email content, noting 'here's the email sent from yahoo webmail from an AOL account (sadly AOL proxies all web content so I can't track it any further than New York proxies)':

Hi [redacted] ! Hey you found me! I was a little worried you wouldn't be able to :P so, how are you? I'm ok.. I'm sneaking a email in at work before my boss comes back in, so sorry if it's a little short! I promise to write more later :)

So I promised you some pics:P well I will have to send you some of me when I get home (don't have the pics here at work). In the meantime you can check out my personal homepage. It's kind of playground while I'm taking this intro to HTML class, kind of like my blog page. Here is the link: http://www.honesthost5mb.com/kristenssite It's not much yet but it's getting there. hehe

So tell me more about yourself, are you a work to live or live to work kinda person? What are you looking for in a girl? Do you like myspace? I think I'll make a profile soon, it's free right? and you can add your own HTML? That would be cool.. So how is your 2006 going? Mine is ok, one thing I'm excited about though is that today is exactly 1 week before my birthday. Hey, maybe if we hit it off, we can go on a first date on my birthday, that would be really cool. :)

Anyways, enough with the 20 questions right? oh, I prefer to chat on IM, its more personal you know? Do you have AIM? im kriskat224 on there, msg me sometime ok?

Well I should log off and get some work done.. Write back soon! and take care!

xoxo ~ Kristen

Sure enough, a little further research on Google yields the following examples...

The earliest is this story at Jiveworld.net, of 2004-05-24, noting:

Aaron recently received an e-mail from someone he supposedly chatted with on Match.com:

Aaron: I had actually been chatting with someone I might have met there a LONG time ago. I couldn't remember, so I gave her the benefit of the doubt. I thought it was SPAM, but hey, even my own e-mails sounds like SPAM sometimes. She sent me a picture in her e-mail, but the mail service she was using didn't like it. So she sent me the link to her "website." It initially seemed like a real personal web space until the big ADULT BUREAU logo appeared. Oh yes, very legitimate.

This was a unique experience for me since someone actually wrote a tailored response to my e-mail, responding to specific things I had mentioned. Even though the bulk of the e-mail seemed form generated, this had to have been a time intensive process for damn near no return. Well, after the ADULT thing, I thought my response to her e-mail was inventive. Since I haven't received another response, it's obvious she (Or he) took the hint.

Another: a thread at FordPower.net, 2004-09-24, with a link to http://www.4mbwickedweb.com/sites/melissa/ (since expired);

Another: a Fark thread posting, 2005-01-28, scroll down to the posting of '2005-01-28 10:42:28 AM' by 'XavierCrutch', linking to http://www.stepstonehost.com/jesshomepage/ (since expired);

Another: this weblog post, scroll down to March 13, 2005, 'Personal ads and the great porn conspiracy', where the poster is snared, via IM with AIM user natkat224 this time, and is sent another link to a site using http://adultbureau.sweetheartsconnection.com/ to collect the $40 fee;

Another: another weblog post, 2005-10-28.

A google search for the AIM username 'natkat224' reveals plenty more hits.

So here's a list of the sites found from those links, and via google, so far:

The common host, at all stages, is 'SWEETHEARTSCONNECTION.COM', registered to

INTERTRANS TRADING OVERSEAS LIMITED
VASILEOS OTHONOS 21, FANEROMENIX COMPLEX, OFFICE 102, 6030 LARNACA
N/A
N/A, CA N/A
CY

lots more detail here. SweetheartsConnection.com has terms and conditions that appear to prohibit spamming -- but it turns out that they themselves have a pretty scary entry at RipoffReport.com, anyway, noting:

If you want a free LIFE TIME PASSWORD with Adult Bureau.. you have to apply for a 1 month membership @$39.95 to Sweetheartsconnection.com A DATING SERIVCE ..... charge appears as IT INTERNET SERVICES.

No matter if you request cancellation of service this company will continue to bill you " it gets better " then send you to there home made collection company " Secure debt collections, " two companies in one both fraud

Phony Notices will be sent to the home demanding final payment of a service NEVER USED. They will contact you, try intimidate you into paying a Balance of $200.00 (Sweetheartsconnecton.com automatically rebills your credit card every month @$39.95.

eek.

This weblog post, of 2005-10-28. is shaping up to be the canonical support group for victims of this scam; worth reading the comments there.

Quite a scam, and interesting to note the "personal touch" via email and IM.

The C=64-izer

Published January 20, 2006

Ever wondered what today's internet meme images would look like on mid-'80's home computing hardware?

Wonder no longer!

What Works in Software Development

Published January 19, 2006

I already posted this to the link-blog yesterday, but it's so good it's worth promoting more widely. If you write software for a living, you really ought to read the slides for Michael Schwern's excellent 'What Works In Software Development' talk.

It's a long presentation (108 slides!), but during the course of that, he covers:

effective teamwork
dealing with bad customers
dealing with bad management
classic coding mistakes
classic project management mistakes
classic design mistakes
test-driven development
refactoring
patterns

It's a really good synthesis of what I think are the best bits of good OO design, XP, CPAN and perl's design and coding styles, without most of the cruft. I'll be pointing people at this for years to come, I think...

(Found via yoz.)

Planet Antispam: Beta No More

Published January 18, 2006

Planet Antispam has been working pretty nicely for the last couple of weeks -- can't say I've noticed any trouble, and its RSS feed is turning out to be a nice aggregation of anti-spam news. On top of that, John Levine was kind enough to set up a CNAME for it at a more appropriate URL -- http://planet.spam.abuse.net/.

As a result, it's now fully-fledged, and fit to lose the 'beta' qualifier. Please bookmark, subscribe to the feeds, and pass on the URL to others you think may be interested!

Moving Home — De-Cluttering

Published January 15, 2006

I'm moving home.

The flights are booked -- Feb 14th, Valentine's Day, I'll be leaving Orange County and heading back to Dublin permanently. In the meantime, I've been selling stuff, throwing stuff out, decommissioning servers, and making backups.

The server

My erstwhile desktop, later my trusty back-room server, 'jalapeno', was sold earlier today. Thankfully, I bought a 250GB hard drive recently, so I actually had the room to back up its 70GB somewhere beforehand.

Being security-conscious, I overwrote its partitions using pseudo-random data before passing it on ('dd if=/dev/urandom of=/dev/hda9 bs=1024k'). However, being lazy, I did this while the machine was up and running, over an SSH link.

Watching as 'df' produced gibberish output, and as later commands started producing nothing but bus errors, was odd -- a very strange feeling to be actively destroying the disk's data like that. Here's hoping the backups worked...

The yard sale

We had one, in the process selling about $1000 worth of IKEA furniture, books, camping equipment, bits of hardware, sports equipment, and a pink xmas tree:

The local bargain hunters starting knocking on the door at 8:15am, despite the sign's posted start time of 9am. Once we did start bringing items out to the front lawn to sell, there were already about 10 people, which quickly swelled to a mob of 20 by 8:45am. They were keen!

By the end of Saturday, we've sold pretty much all the furniture, all of the sports and camping equipment, most of the hardware that isn't total crap, and only 2 of the books. One shopper's explanation: 'she didn't have the time to read books'.

Still, the yard sale has netted $345. Not bad, and a good feeling to de-clutter so successfully.

Music, and iPod Shuffle

Published January 14, 2006

I've realised I like the endings of songs; whether I like a song or not, entirely depends on how it ends.

Apple's iPod shuffle algorithm is incredible. I've been spending quite a bit of time listening to it, and I'm sure it's not random; I think it's picking next tracks based partly on the similarity of metadata between the current and candidate tracks, which is quite neat as an automated mixing technique.

So is it random? Google says:

yes
no; a commenter on that article notes the same thing I'm talking about
yes
no; can't say I've noticed the Beatles getting a push on mine
yes
and finally, no answer here, but a pretty cool stats experiment

IPC::DirQueue 0.06 released

Published January 10, 2006

More details on the mailing list, if you're into that sort of thing. ;)

Google DRM and WON Authentication

Published January 9, 2006

So, Google have invented their own DRM, apparently. I'm keen to find out more details; Techdirt and Plasticbag.org are so far the only places I can find in the blogosphere to discuss it in any detail.

One tidbit worth noting from the LA Times coverage:

The Google copy-protection software also imposes a big restriction: The CBS shows, NBA games and other material protected by the software can be watched only on a computer that's connected to the Internet.

"I think it's going to be a problem," said Li, the Forrester analyst, adding that Google executives told her they were trying to fix it.

That's interesting. In my opinion, given that quote, I'll bet Google's DRM is something similar to the copy-protection systems used for many games since about id's Quake 3 and Valve's Half-Life; an online "key server" which validates codes, tracks player IDs, and who's viewing what, "live", as the video is cued up and played.

Some more info on the Half-Life WON authentication system can be found in this GamaSutra article; subscription required -- try viewing this google-cache version with Javascript off if you don't have a sub. That's historical now, of course, since that WON system has been replaced by a new auth protocol as part of Valve's 'Steam' system.

The key factor is the network, separating the dangerous, untrustworthy user machine from the trusted key server. Since the online key server can act as a platform for trusted, known-insubvertable code to run, along with the video server, both being under Google's control, it's actually possible to build reasonably solid DRM on this model. That's as opposed to the usual case, where a reasonably determined teenager can break it in a week of school-nights. ;)

Anyway, that's speculation. It remains to be seen if they've come up with something along the lines of WON authentication -- and if it's still easily subvertable or not.

Update: Aristotle Pagaltzis has a pretty good point in the comments:

Watching video, unlike playing a multiplayer game, is not an activity that inherently requires connecting to a server. Playing a multiplayer game, OTOH, inherently is.

So cracking a multiplayer gameâ€™s key check is fruitless, because then you canâ€™t play online anymore, which was the whole point of the game in the first place. In contrast, a video player with a cracked key check still fulfills its purpose just fine.

I think he's right. That's a key point, demonstrating how WON authentication still can't help -- media playback, as a task, is itself fundamentally crackable.

Wedding Plans

Published January 7, 2006

Myself and the lovely C are planning on getting married, hopefully sometime this year. I've just come across some details about Japanese weddings, and apparently:

'If you are attending a Japanese wedding reception, you are expected to bring cash for a gift (called Oshugi). The amount depends on your relationship with the couple and the region, unless the fixed amount is indicated on the invitation card. The average is 30,000yen ($250) for a friend's wedding. It's important that the cash is enclosed in a special envelope called Shugi-bukuro and your name is written on the front.' ... 'It is a grave insult to give less than $200.'

That gives me a great idea... ;)

Planet Antispam at abuse.net

Published January 5, 2006

Planet Antispam now has a better URL -- http://planet.spam.abuse.net/ . Much better!

Planet Antispam

Published January 4, 2006

So a few weeks back, I mooted the idea of an anti-spam Planet site, similar to Planet GNOME, Planet Java, Planet Perl et al.

Here's the results: Planet Antispam.

It's still got a few rough edges; notably, the URL is not permanent -- I'd prefer something at a more spam-themed domain -- and the logo is the generic "PlanetPlanet" one. But it's up and running in a beta-ish fashion.

Feel free to bookmark, subscribe, post the URL on, etc.; ~~and if you'd like to give it a better home with an A record at a spam-themed domain, drop me a line.~~

Update, Jan 17: Thanks to John Levine, it now has a permanent home at http://planet.spam.abuse.net/ . After several weeks of operation, I think it's turning out to be pretty solid, too!

By the way, it also needs more source feeds. If you know of people with blogs, working on/writing about anti-spam (of the email variety), with RSS feeds that work, include the post text, and permit further redistribution of that text, drop us a line and I'll add them.

Finally, here's a picture of a Starbucks SPAM(r) Sandwich. (shudder)

A couple of links while del.icio.us is ill

Published December 19, 2005

Happy birthday, Perl! --

Perl was 18 today. In many jurisdictions, it can now drink intoxicating liquors, vote, and join the armed forces.

Global Warming Sceptic Bingo:

Just tick the box when they use the argument next to it. Get four in a row and you win!

Get well soon, del.icio.us.

Irish MEPs on Data Retention

Published December 15, 2005

So, the bad news -- it appears that the European Parliament has passed the 'Data Retention' Directive, ~~introducing~~ requiring EU states to introduce mandatory electronic surveillance of all European citizens.

Tuppenceworth.ie has looked up how the Irish MEPs voted on the Directive. I was appalled to discover that Proinsias De Rossa (Labour) was the only Irish MEP to vote for this surveillance.

I generally give a high preference to Labour when voting, and before that, Democratic Left, and I've voted for him several times in the past. However, I think this may be the deal-breaker. I'm extremely disappointed.

By the way if party line was the issue -- that didn't stop Gay Mitchell (Fine Gael), who broke party line on this, saying:

I do not know why this proposal was rushed. The extremely accelerated legislation procedure has meant that there was little time for discussion, and translations were sometimes unavailable. There was also no time for a technology assessment or for a study on the impact on the internal market.

Major credit to him.

My ApacheCon Roundup

Published December 15, 2005

Back from ApacheCon!

I've got to say, I found it really useful this year. Last year, I was pretty new to the ASF, and found that my expectations of ApacheCon didn't quite match reality; it wasn't a rip-roaring success exactly, for me, as a result.

However, many details of how the ASF works -- and how the conference itself works and is organised -- are much clearer after you've spent some time lurking and absorbing practices in the meantime. (The visibility one gets into the process as a member of the ASF makes this a lot easier.)

Result: it was much more of a success for me this time around. Plenty of networking, putting faces to the names, hanging out, and discussing many aspects of our work.

The hackathon really worked out, too; while we didn't produce a hell of a lot of code per se, it made for a good 'developer summit' and I think we established solid agreement on SpamAssassin's short-term directions and goals. (summary: rules, and faster).

On top of that, I got to meet up with Colm MacCarthaigh and Cory Doctorow for discussion of Digital Rights Ireland. Looks like I'll be spending a bit of time on that next year ;)

Finally: Solaris. On Monday night, I got to sit down with Daniel Price, one of the kernel engineers behind Solaris Zones, work through a quick demo of a bug I was running into with chroot(2) and zones on our rule-QA buildbot server, and watch as he visually traced it through the OpenSolaris kernel source on the web. From this -- and from talking to Daniel -- it's pretty clear that things have changed at Sun. Pretty much the entire Solaris operating system is now a full-on open-source project; it's not just a marketing gimmick. The source is up there on the web, that's the source for the code they're running now, and there's no half-assed 'freeze it, cut out the good bits, and throw it over the wall' fake-open-source tricks.

The concept of getting this level of access to Solaris source code and engineers, would have blown my mind when I was Iona's sysadmin back in the 1990s ;) I'm very impressed.

Windows Live Local and Firefox

Published December 9, 2005

Windows Live Local, with its isometric, Sim City, "bird's eye" view, is quite nice.

However, what gets me is -- do MS do this deliberately? I'm referring, of course, to the way it's broken on Firefox 1.5, requiring you to drag twice to get it scrolling around the viewport, and the jumpy, clunky UI on that browser.

Pretty lame -- and lazy, too. By now, it's essential for a new fancy website to work under Firefox; even if only 20% of your users will be using it, a good proportion of those are the bleeding-edge, 'taste-maker' types who'll be blogging about it, writing reviews for newspapers and news sites, and generally generating buzz for you, and thereby attracting the other 80%.

I'm told it works great in IE, but there's no way I'm starting Windows and opening up that app. If I want to be infected by 700 different malwares within seconds, I'll ask. ;)

On top of that, coverage seems spotty -- Ireland is AWOL, of course.

As a result, my one line summary would have to be: idea = cool, dataset = probably cool, execution = half-assed and crappy. I'm looking forward to Google doing a much better job with their implementation of the Sim City viewpoint.

Email Injection attacks in PHP via mail()

Published December 8, 2005

Apparently, spammers are now exploiting a hole, or holes, in multiple PHP scripts which use the mail() API.

The holes are described at the SecurePHP wiki; basically, the script author inserts CGI fields directly into a message template without stripping newlines, and this allows attackers to create new headers, take over the message body, and generally take over the mail message and destinations entirely.

Funnily enough, these are the same holes Ronald F. Guilmette and I found in FormMail 1.9, and described in our Jan 2002 advisory Anonymous Mail Forwarding Vulnerabilities in FormMail 1.9 (PDF) on page 10, Exploitation of email and realname CGI Parameters. Ah, plus ca change...

Worth noting that perl's venerable taint checking would have spotted these, if it were used.

ApacheCon US 2005

Published December 5, 2005

In a couple of weeks, I'll be going to San Diego for ApacheCon US 2005 (including the hackathon beforehand). There'll be quite a few other SpamAssassin committers there, too, so if you're working with SA, or interested in getting some face time with the developers, there's no better way of doing so.

Digital Rights Ireland launch, next Tuesday

Published December 3, 2005

DRI's formal launch is next Tuesday:

December 6th sees the formal launch of Digital Rights Ireland, with a press conference in the Conference Room, Pearse St. Library, Dublin 2 at 11.30am. We would like to invite to you to come along - we'd welcome your support, and the chance to chat with you about your concerns after the main conference. Please feel free to invite anyone else who you think would be interested in digital rights. To give us an idea of numbers, we'd appreciate an email to <contact AT digitalrights.ie> to let us know if you're planning on coming along.

New SpamAssassin Rule Development Tools

Published November 24, 2005

Recently, I've been working on new systems to develop SpamAssassin rules faster, and with a lower 'barrier to entry' to the core ruleset. Some highlights seem bloggable, seeing as it's all web-based and I can link to it!

The 'preflight' BuildBot:

This uses the fantastic BuildBot continuous-integration system to monitor changes to our Subversion repository.

Every time something is checked into SVN, this wakes up and immediately runs mass-checks using that latest code and rules, allowing near-real-time viewing of changes in rule behaviour. (A 'mass-check' is a massive run of SpamAssassin across a corpus of hundreds of thousands of emails, en masse, to measure rule hit-rates.)

The corpus it mass-checks is split in a certain way so that results will be available very quickly -- typically in under 10 minutes -- with increasing quantities of results becoming available as time elapses.

Progress of the mass-checks are visible at the BuildBot here; as they complete, their results become visible on the Rule-QA app (below). (More info, if you're curious.)

The Rule-QA App:

To date, we've used the basic "freqs" table -- output from the hit-frequencies command-line script -- as the UI for rule QA and evaluation. This is fine for a small number of developers, but it scales badly and (like mass-checks) requires a pretty complex setup on the developer's machine.

This new component is a web application, which takes the "freqs" table, and "webifies" it -- demo.

Some major improvements are also made possible; the most important, that it can now display 'freqs' for multiple revisions during the day, and keeps historical data for comparison. It adds several new reports from 'hit-frequencies'; a score-map, overlaps, a performance measurement, and a boolean 'promoteability' measurement.

Finally, a really useful new report is the graph of rule hit-rate, as it changes over time. Here's a cached demo, or see the same data produced 'live'. This gives a totally new insight into how the rule hits for various people's corpora, how that changed over time, and allows a whole new type of rule analysis. (In fact, it also allows pretty good corpus analysis, too; can you tell which submitters bounce high-scoring spam at receipt time?)

(More info on these.)

Product idea: RAID Backup Enclosures

Published November 23, 2005

Cory Doctorow at Boing Boing links to an article at TechCrunch that lists Better and Cheaper Online File Storage as a product that needs to be made. However, Ben Laurie does the sums on online storage as a useful backup medium, and found them not exactly compelling (e.g. 100GB of data will take 75 days to upload over an 128Kbps link).

I tend to agree. An online host isn't great as a backup host, since, in my experience, there are two types of backups required:

The important small files (for example: encrypted password lists, my address book, my ~/bin directory)
The massive big filesets (for example: MP3s, photos)

The first kind of fileset is amenable to an online backup-storage service, at first glance. However -- in my opinion you're better off going the whole hog for these files, and using the distributed, versioned backup method of putting it in a good networked revision control system, and checking it out everywhere, so you can also make changes and check in from any host; otherwise, you face the perils of syncing up a single backup from multiple "writers", without conflicts. So far, none of the online file storage services offer SVN as an access method, so a shell account at a colo server still seems more useful on that count.

The second kind of fileset, as Ben notes, will take donkey's years to upload and sync as a backup mechanism; and the economics are hardly compelling for the service provider.

I think I prefer Brad Templeton's idea to deal with large-data backups --

I propose a software RAID-5, done over a LAN with 3 to 5 drives scattered over several machines on the LAN.

Slow as hell, of course, having to read and write your data out over the LAN even at 100mbits. Gigabit would obviously be better. But what is it we have that's taking up all this disk space ? it?s video, music and photos. Things which, if just being played back, don?t need to be accessed very fast. If you're not editing video or music, in particular, you can handle having it on a very slow device. (Photos are a bigger issue, as they do sometimes need fast access when building thumbnails etc.)

This could even be done among neighbours over 802.11g, with suitable encryption. In theory.

As a commenter notes, Linux has support for this already, in the form of software RAID and the network block device.

So: take an external IDE enclosure, add a GumStix board running Linux with software RAID, LVM, and nbd, and add wifi. Then add DAV, SMB and NFS export of the disk, and some decent UI code to organise the volumes into a single exported RAID volume (hopefully automatically!), and it'd be a pretty compelling product, in my opinion!

(hey Craig! I said GumStix! ;)

Wisdom Teeth — Complete!

Published November 21, 2005

On Friday, I got my lower-left wisdom tooth extracted. That's the last one that should cause any trouble; there's only one remaining, and it's fully out so shouldn't act up. After a few years of on-again-off-again twinges, and lots of irresponsible putting-off of surgery, I've finally taken care of it.

The downside: I'm totally zonked on painkillers, so I won't be doing much for the next few days apart from what's required for day-to-day day-job stuff.

Urban Dead HUD; added Inventory Sorting

Published November 14, 2005

I've updated the Urban Dead HUD Greasemonkey userscript; it now offers inventory sorting, inspired by Ikko's userscript (albeit a little different in implementation). Here's a screenshot:

Right now, UD is reasonably interesting -- our team of plucky survivors have been helping out with the defence of Caiger Mall, a major mall towards the north-west of the city. We've repulsed the Church of the Resurrection's attempts to wipe us out, but that seems to have made us quite a juicy target; there are now no less than three separate Zombie groups ganging up on us. For now, we're still holding out.

Mobile phone repair at Karol Bagh Market

Published November 11, 2005

I love these pictures:

I link-blogged that article ages ago, but I keep thinking of it, so it's worth a proper post in its own right, to expand on that.

These guys work at an Indian mobile phone repair stall in Karol Bagh Market, in Delhi. The blog entry notes:

As in China, many of the mobile phone shops and street kiosks offer mobile phone repair service. Many of these guys can strip and rebuild a mobile phone in minutes. ... a lot of the hyperbole surrounding western hacker culture makes me smile compared to what these guys are doing day in day out.

Also, a commenter notes: 'in india, for about 1$, you can convert a CDMA phone to GSM !! also, they can unlock phones and do a veriety of hacks for little money.'

There's so many lessons I'm getting from it:

I've had a shoe resoled in 5 minutes for next to nothing at a stall not too different from that -- but this is a mobile phone. It's amazing to think of that level of hardware hacking taking place every day at a back-street market stall.
Those phones were doubtless planned, as a product, with a 'ship back to manufacturer' support plan. That clearly isn't going to fly without that developed-world luxury, Fedex. So this is the developing-world street finding its own uses for things, and working around the dependencies on systems that are optimised for the developed world.
It's the flip-side of Joshua Ellis' grim meathook future, where we're not facing down the barrel of a New-Orleans-style descent into barbarity if the power suddenly cuts out; tech can go on. It may be a little chunkier, though, and with more duct tape, but hey.
It's also a beautiful demonstration of how those of us in the developed world who assume that developing-worlders cannot find a use for high tech, are talking shit. (cf. Ethan Zuckerman as a good example of someone who gets this, more than almost anyone else I can think of.)

I think this is one of the most important lessons I learned while travelling through India and SE Asia a few years back -- the developing world is using high tech, and it's not using it in the same ways we do -- or even the ways we anticipated, and we have plenty to learn from them too.

Found at Jan Chipchase's site, which is full of great contemplation on this stuff. (The story on Seoul's selca culture is nuts, too -- it's like Flickr^1000.)

(PS: I have a wisdom tooth extraction scheduled for next Friday... wish me luck. That's another thing you don't want to happen in the developing world, although I daresay it'd rock in Bangkok!)

(Update: clarification -- my cite of Ethan Z was meant as a compliment ;)

IFSO Seminar In Dublin

Published November 4, 2005

Passing this on for readers in Ireland -- this sounds like an interesting event. From the FSFE-IE mailing list:

On the morning of Friday November 18th, IFSO is organising an event hosted by MEP Proinsias De Rossa about preventing software patents in the EU. Topics covered will be:

An analysis of the software patent directive;
a discussion of Free Software and computer security;
an introduction to IFSO/FSFE and their work;
the future of legislative obstacles to the development and distribution of software.

The event will be held in the European Parliament Office in Ireland, and spaces are limited. Participants are therefore asked to register their intent to attend. See here for more details.

Producing Open Source Software

Published November 2, 2005

Plug: Producing Open Source Software, a new book by Karl Fogel (of the Subversion and CVS projects), readable online as HTML or in ground-up wood formats.

It's got a whole load of solid-gold good advice on open-source development best practices, and even includes a section on dealing with the dreaded Reply-To munging issue.

Looks excellent -- this is definitely one to read.

Urban Dead HUD

Published October 29, 2005

I've been playing a bit of Urban Dead recently. Urban Dead is a very low-key, web-based MMORPG -- you play a 3-minute turn once every 24 hours. It needs some rebalancing and some new features, especially given the organised nature of some of the bigger marauding zombie hordes, but I'm still finding it fun.

To scratch a couple of itches, I've written a Greasemonkey user script for UD called the Urban Dead HUD. It adds several nifty features to the user interface:

keyboard accelerator access keys for the action buttons, and your inventory -- very handy when you're attacking an enemy repeatedly;
an on-page long-distance map of the surrounding squares;
a distance tracker, which tracks the distances to "important" locations for you

There's screenshots on the download page, so you can see what I'm talking about.

Greasemonkey is a fantastic tool, as is Mark Pilgrim's Dive Into Greasemonkey, which has repeatedly turned out to be an excellent, well-written reference while hacking this. Thanks guys!

trueColor() bug in GD::Graph

Published October 28, 2005

Hacking on a new rule-QA subsystem for SpamAssassin, I came across this bug in GD::Graph. If:

you are drawing a graph using GD::Graph;
outputting in PNG or GIF format;
and the 'box' area -- the margins outside the graph -- keeps coming up as black, instead of white as you've specified;

check your code for calls to GD::Image->trueColor(1);, or the third argument to the GD::Image->new() constructor being 1. It appears that there's a bug in the current version of GD (or GD::Graph) where graphing to a true-colour buffer is concerned, in that the 'box' area continually comes out in black.

(Seen in versions: perl 5.8.7, GD 2.23, GD::Graph 1.43 on Linux ix86; perl 5.8.6, GD 2.28, GD::Graph 1.43 on Solaris 5.10.)

False Positive ‘Reports’ != FP Measurement

Published October 26, 2005

John Graham-Cumming writes an excellent monthly newsletter on anti-spam, concentrating on technical aspects of detecting and filtering spam. Me, I have a habit of sending follow-up emails in response ;)

This month, it was this comment, from a techie at another software company making anti-spam products:

When I look at the stats produced on our spam traps, which get millions of messages per day from 11 countries all over the world, I see our spam catch rate being consistently over 98% and over 99% most of the time. We also don't get more than 1 or 2 false positive reports from our customers per week, which can give an impression of our FP rate, considering the number of mailboxes we protect.

My response:

'Worth noting that a "false positive report from our customer" is NOT the same thing as a "false positive" (although in fairness, [the sender] does note only that it will "give an impression" of their FP rate).

This is something that I've seen increasingly in the commercial anti-spam world -- attempting to measure false positive rates from what gets reported "upstream" via the support channels.

In reality, the false positives are still happening -- it's just that there are obstacles between the end-user noticing them, and the FP report arriving on a developer's desk; changes to the organisational structure, surly tech support staff, or even whether the user was too busy to send that report, will affect whether the FP is counted.

Many FPs will go uncounted as a result. As a result, IMO it is not a valid approach to measurement.'

I've been saying this a lot in private circles recently, so in my opinion that's a good reason to post it here...

Wired on the Motorola ROKR iTunes phone

Published October 26, 2005

Via Cory at Boing Boing, here's a great Wired post-mortem on how all the corporate vested interests (including Apple!) turned a nice concept for a new, music-playing mobile phone, into a useless, DRM-hogtied, designed-by-committee turd.

That's worth a read, in itself. However, what really blew my mind was this:

Anssi Vanjoki, executive vice president of Nokia and head of its multimedia group, has bad news for the [music] labels. ... He pushes a couple of buttons on the [phone's] keypad. Up pops Symella, a new peer-to-peer downloading program from Hungary. As the name suggests, Symella is a Symbian application that runs on Gnutella, the P2P network that hosts desktop file-sharing apps like BearShare and Limewire. It was created earlier this year by two students at a Budapest engineering school that for four years has been exploring mobile P2P in conjunction with a local Nokia research center.

Symella doesn't come installed on the N91; Vanjoki downloaded it from the university Web site. "Now I am connected to a number of peers," he continues, "and I can just go and search for music or any other files. If I find some music I like and it's 5 megabytes and I want to download it - the carriers will love this. It will give them a lot of traffic."

I had no idea the platform was that open, at this stage. It'll be interesting to see what happens next...

UK ATM fraud in the 1990s

Published October 24, 2005

The Register: How ATM fraud nearly brought down British banking. This story is mind-boggling; it claims that UK ATM security had two major issues that have been kept secret since the 1990s:

An insecure data format used for the data on the magnetic stripes in one bank's cards;
Another bank's computing department "going rogue", "cracking PINs and taking money from customers' accounts with abandon" as the story puts it. Yikes.

The latter problem is scary, but in my opinion the former problem is more interesting from a computer security point of view.

This is a classic example of bad data format design, as it left the PIN and the account details individually rewritable -- in other words, an attacker could (and did) change one while keeping the other intact.

This British Computer Society abstract provides more details on the who, how and where:

... it was revealed that UKP 130,000 had been stolen from Abbey National cardholders during 1994 and 1995 with counterfeit cards. Andrew Stone, a bank security consultant who had been advising Which?, the magazine of the Consumers' Association, was jailed for five and a half years for the theft. This fraud involved spying on Abbey customers as they used their cards in automated teller machines (ATMs) or cash dispensers... [Stone] recorded card details and personal identification numbers (PINs) using powerful video cameras. The details were then encoded on the magnetic strips of other cards.

Finally, another quote from the Reg story:

why is he telling this explosive story now? Because chip and PIN has been deployed across the UK ATM network. "The vulnerability in the UK ATM network was still there to be exploited -- if someone had chanced upon it."

I wonder if other banking systems worldwide are still vulnerable, however? Did any other banks elsewhere license the vulnerable systems from UK banks, without knowing about these vulnerabilities? How long did it take for them to be fixed, if they were fixed?

Avian Flu, Health vs. IP Protection

Published October 22, 2005

Over at O'Reilly Radar, a question came up as to whether Roche's patent on Tamiflu should be respected if, in the event of a pandemic, people were dying on a large scale due to an inability for Roche to produce Tamiflu in sufficient quantities.

James Love of cptech.org recently pointed out that the WTO made an exception for a situation like this, allowing importation of medicines from foreign countries in violation of local patent licenses in the case of an emergency, in a 30 August 2003 decision:

Your country would benefit from importing generic medicines produced under a compulsory license, in order to build up adequate stockpiles or to obtain needed medicines in the event of a crisis.

However, many developed-world countries have explicitly made a commitment never to use this limited TRIPS waiver, namely the following:

Australia, Austria, Belgium, Canada, Denmark, Finland, France, Germany, Greece, Iceland, Ireland, Italy, Japan, Luxembourg, Netherlands, New Zealand, Norway, Portugal, Spain, Sweden, Switzerland, United Kingdom and the US.

Another 10 countries about to join the EU said they would only use the system to import in national emergencies or other circumstances of extreme urgency, and would not import once they had joined the EU: Czech Republic, Cyprus, Estonia, Hungary, Latvia, Lithuania, Malta, Poland, Slovak Republic and Slovenia.

So there you have it; the trade representatives for many developed-world countries took some kind of 'strong IP' high moral stand, and gave up this ability. I'll bet national health authorities are, right now, wandering government halls around the world, looking for trade representative asses to kick...

‘Internet Stamps’: ‘Sender Pays’ Is Back From The Dead

Published October 22, 2005

Jeremy Zawodny mentions that Tim Bray has proposed something he calls 'Internet Stamps' to solve the blog-spam problem; here's Tim's description of how it works:

An Internet Stamp is an assertion, signed by a Post Office, that some chunk of text was issued by someone who paid for the stamp. At least one major Post Office will be required by government statute to sell stamps to anyone in the world for either US$0.01 or EUR 0.01, and no stamp-selling organization will be recognized which sells stamps for less than this amount. For this to work, the number of stamp-selling organizations needs to be small and the organizations stable; another reason why Post Offices are plausible candidates.

It works like this: if you want to buy stamps, you sign up for an account with your Post Office; it works like paper stamps, you buy a bunch at a time in advance, in small amounts like $20 or EUR 10. Then the Post Office offers a Web Service where you connect to a port, authenticate yourself and send along some text; the Post Office decrements your account and sends back the stamp. There are a variety of digesting/signing/PKI techniques that could be applied to implement the stamps; a standard is required but should be easy.

Apparently himself and a few other guys chatted about it at the first Foo Camp, back in 2003. Funnily enough, in the anti-spam community, we were having our own chats about it, but it sounds like our paths didn't cross for some reason...

We call this idea 'sender pays'. Earlier in 2003, in June, John Levine published what I'd consider the canonical wrap-up of why it will not work, in 'An Overview of e-Postage'.

That report demolishes the use of 'sender pays' for e-mail anti-spam, on three main counts:

Creating a transaction system large enough for e-postage would be prohibitively expensive. The nearest parallel is the credit card transaction system, which deals with 1% of the transaction volume per day, and with much larger profit margins to make it worth their while.
The true financial, administrative, and social costs of e-postage are completely unknown. What do you do when a 'bad guy' steals the e-postage stamps off Aunt Millie's hard disk, without her knowledge? How much is the Fraud Handling Department going to cost? Is she just going to be out of luck when this happens? Will you need to use whitelisting and a content-based anti-spam filter as well, to filter out the messages sent using valid, but stolen, stamps?
Users hate micropayments. In short, see Andrew Odlyzko's research.

Now, using it on weblog spam is a little more practical than e-mail spam, for one because it has a lower daily volume of transactions; but these objections still stand, in my opinion.

John Levine is one of the foremost authorities in anti-spam, and this report has been a mainstay of the anti-spam canon for two years. Anyone discussing a new anti-spam concept really ought to know this report backwards and forwards by this stage, and go into some detail as to how their proposal deals with the issues raised, if it's to be taken seriously.

‘I Go Chop Your Dollar’, the video

Published October 21, 2005

Wow! videos.antville.org (via robotwisdom) came through with the goods. Go check out the video for Nkem Owoh (aka Osuofia) singing "I Go Chop Your Dollar", which turns out to be pretty catchy!

Here's the lyrics so all us oyinbos can sing along:

I don suffer no be small
Upon say I get sense
Poverty no good at all, no
Na im make I join this business
419 no be thief, its just a game
Everybody dey play am
if anybody fall mugu, ha! my brother I go chop am

Chorus
National Airport na me get am
National Stadium na me build am
President na my sister brother
You be the mugu, I be the master
Oyinbo I go chop your dollar, I go take your money dissapear
419 is just a game, you are the loser I am the winner
The refinery na me get am,
The contract, na you I go give am
But you go pay me small money make I bring am
you be the mugu, I be the master... na me be the master ooo!!!!

When Oyinbo play wayo, them go say na new style
When country man do im own, them go de shout bring am, kill am, die!
Oyinbo people greedy, I say them greedy
I don see them tire thats why when them fall enter my trap o!
I dey show them fire

Lyrics from here; there's a few other funny comments there too:

just saw the "i go chop your dollar"......i am glad we are blessed with a natural comedian as good as Nkem Owoh.....thank God say oyibo (sic) no sabi pidgin if not dis song for give them small panic........

Heh, looks like the 'small panic' is now underway ;)

‘I Will Eat Your Dollars’

Published October 21, 2005

An excellent, eye-opening interview with Samuel, an ex-419 scammer.

There's even a theme tune:

Their anthem, "I Go Chop Your Dollars," hugely popular in Lagos, hit the airwaves a few months ago as a CD penned by an artist called Osofia:

"419 is just a game, you are the losers, we are the winners.
White people are greedy, I can say they are greedy
White men, I will eat your dollars, will take your money and disappear.
419 is just a game, we are the masters, you are the losers."

Reportedly, Lagos inhabitants paint "This House Is Not For Sale" in big letters on their homes, in case someone posing as the owner tries to put it on the market.

Regarding the workings of the scam:

[Samuel] sent 500 e-mails a day and usually received about seven replies. Shepherd would then take over. "When you get a reply [to a 419 spam], it's 70% sure that you'll get the money," Samuel said.

(via Nelson.)

Daniel Cuthbert’s Travesty of Justice

Published October 12, 2005

The Samizdata weblog posts more details about the Daniel Cuthbert case, where a UK techie was arrested for allegedly attempting to hack a tsunami-donation site. Here's what happened:

Daniel Cuthbert saw the devastating images of the Tsunami disaster and decided to donate UKP30 via the website that was hastily set up to be able to process payments. He is a computer security consultant, regarded in his field as an expert and respected by colleagues and employers alike. He entered his full personal details (home address, number, name and full card details). He did not receive confirmation of payment or a reference and became concerned as he has had issues with fraud on his card on a previous occasion. He then did a couple of very basic penetration tests. If they resulted in the site being insecure as he suspected, he would have contacted the authorities, as he had nothing to gain from doing this for fun and keeping the fact to himself that he suspected the site to be a phishing site and all this money pledged was going to some South American somewhere in South America.

The first test he used was the (dot dot slash, 3 times) http://taint.org/ sequence. The ../ command is called a Directory Traversal which allows you to move up the hierarchy of a file. The triple sequence amounts to a DTA (Directory Traversal Attack), allows you to move three times. It is not a complete attack as that would require a further command, it was merely a light 'knock on the door'. The other test, which constituted an apostrophe (`) was also used. He was then satisfied that the site was safe as his received no error messages in response to his query, then went about his work duties. There were no warnings or dialogue boxes showing that he had accessed an unauthorised area.

20 days later he was arrested at his place of work and had his house searched.

(His actions were detected by the IDS software used by British Telecom.)

In my opinion, this is a travesty of justice.

His actions were entirely understandable, under the circumstances, IMO. They were not hostile activities in themselves -- they might have been the prelude to hostility, in other cases, but, as his later activity proved, not in this one.

Instead of making parallels with "rattling the doorknob" or "lurking around the back door of a bank", a better parallel would be looking through the bank's front window, from the street!

If only law enforcement took this degree of interest in genuine phishing cases, where innocent parties find their bank accounts emptied by real criminals, like the unprosected phisher in Quebec discussed in this USA Today article!

Appalling.

Harpers: The Uses of Disaster

Published October 12, 2005

In this month's Harpers -- The Uses of Disaster contains a passages that rings bells, post-Katrina:

You can see the grounds for that anxiety in the aftermath of the 1985 Mexico City earthquake, which was the beginning of the end for the one-party rule of the PRI over Mexico. The earthquake, measuring 8.0 on the Richter scale, hit Mexico City early on the morning of September 19 and devastated the central city, the symbolic heart of the nation. An aftershock nearly as large hit the next evening. About ten thousand people died, and as many as a quarter of a million became homeless.

The initial response made it clear that the government cared a lot more about the material city of buildings and wealth than the social city of human beings. In one notorious case, local sweatshop owners paid the police to salvage equipment from their destroyed factories. No effort was made to search for survivors or retrieve the corpses of the night-shift seamstresses. It was as though the earthquake had ripped away a veil concealing the corruption and callousness of the government. International rescue teams were rebuffed, aid money was spent on other programs, supplies were stolen by the police and army, and, in the end, a huge population of the displaced poor was obliged to go on living in tents for many years.

However, there's a happy ending there:

That was how the government of Mexico reacted. The people of Mexico, however, had a different reaction. 'Not even the power of the state,' wrote political commentator Carlos MonsivÃ¡s, 'managed to wipe out the cultural, political, and psychic consequences of the four or five days in which the brigades and aid workers, in the midst of rubble and desolation, felt themselves in charge of their own behavior and responsible for the other city that rose into view.' As in San Francisco in 1906, in the ruins of the city of architecture and property, another city came into being made of nothing more than the people and their senses of solidarity and possibility. Citizens began to demand justice, accountability, and respect. They fought to keep the sites of their rent-controlled homes from being redeveloped as more lucrative projects. They organized neighborhood groups. And eventually they elected a left-wing mayor -- a key step in breaking the PRI's monopoly on power in Mexico.

Photo Update

Published October 12, 2005

Photoblog! We recently ticked off another of California's national parks with a trip to Joshua Tree, and saw this:

Scary desert people. Also, I got to be in a fractal:

Beardy progress continues, as you can see!

In other pics, Catherine cooked me an amazing birthday cake:

Also: I ate the most sacrilicious food ever -- mochi that tastes like green-tea-filled Eucharist wafer!

Ah, the blessed sacrament of the (green tea) body and (red bean) blood. The textural resemblance really was phenomenal; I guess it never came up in product taste tests. Quite funny. Very tasty too, by the way.

Bruce Sterling on J. G. Ballard

Published September 28, 2005

Ballardian.com just posted an interview with Bruce Sterling about J.G. Ballard by Chris Nakashima-Brown. One of my favourite authors talks about the other -- it's amazing!

A couple of highlights:

... The assumptions behind The Crystal World were so radically different and ontologically disturbing compared to common pulp-derived SF. If you just look at the mechanisms of the suspension of disbelief in The Crystal World, it's like, okay, time is vibrating on itself and this has caused the growth of a leprous crystal ... whatever. There's never any kind of fooforah about how the scientist in his lab is going to understand this phenomenon, and reverse it, and save humanity. It's not even a question of anybody needing to understand what's going on in any kind of instrumental way. On the contrary, the whole structure of the thing is just this kind of ecstatic surreal acceptance. All Ballard disaster novels are vehicles of psychic fulfilment.

....

My suspicion is that in another four to five years you're going to find people writing about climate change in the same way they wrote about the nuclear threat in the 50s. It's just going to be in every story every time. People are going to come up with a set of climate-change tropes, like three-eyed mutants and giant two-headed whatevers, because this is the threat of our epoch and it just becomes blatantly obvious to everybody. Everybody's going to pile on to the bandwagon and probably reduce the whole concept to kindling. That may be the actual solution to a genuine threat of Armageddon -- to talk about it so much that it becomes banal.

To me these late-Ballard pieces, these Shepperton pieces -- Cocaine Nights, Super-Cannes and so forth -- really seem like gentle chiding from somebody who's recognized that his civilisation really has gone mad. They're a series of repetitions that say, 'Look, we're heading for a world where consensus reality really is just plain unsustainable, and the ideas that the majority of our people hold in their heart of hearts are just not connected to reality'. I think that may be a very prophetic assessment on his part. I think we may in fact be in such a world right now -- where people have really just lost touch with the 'reality-based community' and are basically just living in self-generated fantasy echo chambers that have no more to do with the nature of geopolitical reality than Athanasius Kircher or Castaneda's Don Juan.

Kitty vs. International RFID Standardisation

Published September 27, 2005

So, I've just bought myself an RFID implant reader.

However, don't jump to conclusions -- it's not that I'm hoping that possession will put me on the right side of the New World Order 21st-century pervasive-RFID-tracking security infrastructure or anything -- it's for my cat. Here's why...

Many years ago, back in Ireland, we had an RFID chip implanted in our cat, as you do. Then 3 years ago, we entered the US, bringing the cat with us, and started looking into what we'd have to do to bring him back again.

Ireland and the UK are rabies-free, and have massive paranoia about pets that may harbour it; as a result, pets imported into those countries generally have to stay in a quarantine facility for 6 months. Obviously 6 months sans kitty is something that we want to avoid, and thankfully a recent innovation, the Pet Travel Scheme allows this. It allows pets to be imported into the UK from the USA, once they pass a few bureaucratic conditions, and from there they can travel easily to Ireland legally. (BTW Matt, this still applies; we checked!)

One key condition is that the pet be first microchipped with an RFID chip, then tested for rabies, with those results annotated with the chip ID number. Once the animal arrives in the UK on the way back, the customs officials there verify his RFID implant chip's ID number against the number on the test result documentation, and (assuming they match and all is in order) he skips the 6 month sentence.

So far, it seems pretty simple; the cat's already chipped, we just have to go to the vet, get him titred, and all should proceed simply enough from there. Right? Wrong.

We spent a while going to various vets and animal shelters; unfortunately, almost everyone who works in a vet's office in California seem to be incompetent grandmothers who just work there because they like giving doggies a bath, couldn't care less about funny foreign European microchips, and will pretty much say anything to shut you up. Tiring stuff, and unproductive; eventually, after many fruitless attempts to read the chip, I gave up on that angle and just researched online.

Despite what all the grannies claimed, as this page describes, the US doesn't actually use the ISO 11784/11785 standard for pet RFID chips. Instead it uses two alternative standards, one called FECAVA, and another FECAVA-based standard called AVID-Encrypted. They are, of course, entirely incompatible with ISO 11784/11785, although, to spread confusion, the FECAVA standard appears to be colloquially referred to in parts of the US vet industry, as "European" or even "ISO standard". I think it was originally developed in Europe, and may have been partially ISO-11784-compliant to a degree, but the readers have proven entirely incompatible with the chip we had, which is referred to as "ISO" in the UK and Ireland at least. They don't even use the same frequencies; FECAVA/AVID are on 125 KHz, while ISO FDX-B is on 134.2 KHz.

(BTW, a useful point for others: you can also tell the difference at the data level; FECAVA/AVID use 10-digit ID numbers, while ISO numbers are 15-digit. Also, "FDX-B" seems to accurately describe the current Euro-compatible ISO-standard chip system.)

Now, a few years back, it appears that one company attempted to introduce ISO-FDX-B-format readers and chips to the FECAVA-dominated marketplace, in the form of the Banfield 'Crystal Tag' chip and reader system.

That attempt foundered last year, thanks to what looks a lot like some MS-style dirty tricks -- patent infringement lawsuits and some 'your-doggy-is-in-danger' FUD:

what we have here is a different, foreign chip that's being brought in and it's caused a lot of confusion with pet owners, with shelters, and veterinarians.

(Note 'foreign' -- a little petty nationalism goes a long way.) The results can be seen in this press story on the product's withdrawal:

Although ISO FDX-B microchips are being used in some European countries and parts of Australia, acceptance of ISO FDX-B microchips is not universal and the standard on which they are based continues to generate controversy, in part due to concerns about ID code duplication.

FUD-bomb successful!

Anyway, this left us in a bad situation; our cat's chip was unreadable in the US, and possibly even illegal given the patent litigation ;) . We had two choices: either we got the cat re-chipped with a US chip, paying for that, or we could find our own ISO-compatible reader.

We sprung for the latter; although the re-chipping and re-registration would probably cost less than the $220 the reader would cost, we'd need to buy a US reader in addition, since the readers at London Heathrow airport are ISO readers, not FECAVA/AVID-compatible. On top of that, this way gives me a little more peace of mind about compatibility issues when we eventually get the cat to Heathrow; we now know that the cat's chip will definitely be readable there, instead of taking a risk on the obviously-quite-confusing nest of snakes that is international RFID standardisation.

Anyway, having decided to buy a reader, that wasn't the last hurdle. Apparently due to the patent infringement lawsuit noted above, no ISO/FDX-B-compatible readers were on sale in the US! A little research found an online vendor overseas, and with a few phone calls, we bought a reader of our very own.

This arrived this morning; with a little struggling from the implantee, we tried it out, and verified that his ID number was readable. Success!

PRNGs and Groove Theory

Published September 23, 2005

Urban Dead is a new browser-based MMORPG that's become popular recently. I'm not planning to talk about the game itself, at least not until I've played it a bit!, but there's something worth noting here -- a cheat called Groove Theory:

Groove Theory was a cheat for Urban Dead that claimed to exploit an apparent lack [sic] of a random number generator in the game, [so] that performing an action exactly eight seconds after a successful action would also be successful.

Kevan, the Urban Dead developer, confirmed that Groove Theory did indeed work, and made this comment after fixing the bug:

There is some pattern to the random numbers, playing around with them; "srand(time())" actually brings back some pretty terrible patterns, and an eight-second wait will catch some of these.

So -- here's my guess as to how this happened.

It appears that Urban Dead is implemented as a CGI script. I'll speculate that somewhere near the top of the script, there's a line of code along the lines of srand(time()), as Kevan mentioned. With a sufficiently fast network connection, and a sufficiently unloaded server, you can be reasonably sure that hitting "Refresh" will cause that srand call to be executed on the server within a fraction of a second of your button-press. In other words, through careful timing, the remote user can force the pseudo-random-number generator used to implement rand() into a desired set of states!

As this perl script demonstrates, the output from perl's rand() is perfectly periodic in its low bits on a modern Linux machine, if constantly reseeded using srand() -- the demo script's output decrements from 3 to 0 by 1 every 2 seconds, then repeats the cycle, indefinitely.

I don't know if Urban Dead is a perl script, PHP, or whatever; but whatever language it's written in, I'd guess that language uses the same PRNG implementation as perl is using on my Linux box.

As it turns out, this PRNG failing is pretty well-documented in the manual page for rand(3):

on older rand() implementations, and on current implementations on different systems, the lower-order bits are much less random than the higher-order bits. Do not use this function in applications intended to be portable when good randomness is needed.

That manual page also quotes Numerical Recipes in C: The Art of Scientific Computing (William H. Press, Brian P. Flannery, Saul A. Teukolsky, William T. Vetterling; New York: Cambridge University Press, 1992 (2nd ed., p. 277)) as noting:

"If you want to generate a random integer between 1 and 10, you should always do it by using high-order bits, as in

j=1+(int) (10.0*rand()/(RAND_MAX+1.0));

and never by anything resembling

j=1+(rand() % 10);

(which uses lower-order bits)."

I think Groove Theory demonstrates this nicely!

Update: I need to be clearer here.

Most of the Groove Theory issue is caused by the repeated use of srand(). If the script could be seeded once, instead of at every request, or if the seed data came from a secure, non-predictable source like /dev/random, things would be a lot safer.

However, the behaviour of rand() is still an issue though, due to how it's implemented. The classic UNIX rand() uses the srand() seed directly, to entirely replace the linear congruential PRNG's state; on top of that, the arithmentic used means that the low-order bits have an extremely obvious, repeating, simple pattern, mapping directly to that seed's value. This is what gives Groove Theory its practicability by a human, without computer aid; with a more complex algorithm, it'd still be guessable with the aid of a computer, but with the simple PRNG, it's guessable, unaided.

Update 2: as noted in a comment, Linux glibc's rand(3) is apparently quite good at producing decent numbers. However, perl's rand() function doesn't use that; it uses drand48(), which in glibc is still a linear congruential PRNG and displays the 'low randomness in low-order bits' behaviour.

Buying Music From iTMS in Linux

Published September 20, 2005

On saturday, I spent a little time trying to work out how to give Steve Jobs my money; more accurately, I wanted to get some way to buy music from the iTunes Music Store from my Linux desktop, and this isn't as easy as it really should be, because the official iTMS is a mess of proprietary Mac- and Windows-only DRM-laden badness.

Here's a quick walkthrough of how this went:

install iTunes in my VMWare Windows install
sign up for iTMS, and give Apple all my personal info, including super-s3kr1t card verification codes, eek
buy a song
find the DRM'd file in the filesystem; it's an .m4p file, and xine doesn't seem to like it
do some googling for 'iTunes DRM remove linux'; that leads to Jon Lech Johansen's JusteTune
download and run JusteTune installer
get obscure hexadecimal error code dialog. hmm! what could that mean?
download and run .NET runtime, link on JusteTune page
rerun JusteTune -- it works this time
select Account -> Authorize, enter login info
drag and drop file -- it's decrypted!

So, that yields a decrypted AAC file, which I can play on Linux using xine. That's the hard part done!

However, I want to play my purchases in JuK, the very nice iTunes-style music player app for KDE.

While the gstreamer audio framework supports playback of AAC files with the gstreamer0.8-faad package ('sudo apt-get install gstreamer0.8-faad'), JuK itself can't find the file or read its metadata, so it doesn't show up in the music collection as playable. I don't want to go hacking code from CVS into my desktop's music player -- possibly the most essential app on the desktop -- so transcoding them to MP3 seems to be the best option.

Somebody's already been here before, though -- that's one of the benefits of being a late adopter! Here's a script to convert .m4a files to .mp3 using the 'faad' tool ('sudo apt-get install faad').

During this work, I came across Jon Lech Johansen's latest masterwork -- SharpMusique, a fully operational native Linux interface to the iTMS. Building on Ubuntu Hoary was a simple matter of tar xvfz, configure, make, sudo make install, and it works great -- and automatically de-DRMs the files on the fly as it downloads them! Now that's the way to enjoy the iTMS on Linux, at least until Apple's engineers break it again.

Update, May 2006: Apple's engineers broke it. Thanks Wilfredo ;)

End result: a brand new, complete, high-quality copy of Dengue Fever's new album, Escape From Dragon House. Previously I'd only had a couple of tracks off this, so I'm now a happy camper, music-wise.

BTW, I was also considering trying out the new Yahoo! Music Store, but it too uses fascist DRM tricks and is platform-limited, and I'm not sure how breakable it is. On top of that, the prospect of not being able to try it out before handing over credit-card details put me off. As far as I can see, I can't even look up the albums offered before subscribing. All combined, I'll stick with iTMS for now.

Don’t Dumb Me Down

Published September 20, 2005

A great Guardian 'Bad Science' column by Ben Goldacre, Don't dumb me down. An excellent article on how mainstream journalists fail miserably in their attempts to report science stories accurately, and how this fundamentally misrepresents science to society at large.

Being a geek (of the computing persuasion) who hangs out with other geeks (of various science persuasions), I would up discussing this problem myself a month or two ago. This paragraph sums up where I think the failure lies:

There is one university PR department in London that I know fairly well - it's a small middle-class world after all - and I know that until recently, they had never employed a single science graduate. This is not uncommon. Science is done by scientists, who write it up. Then a press release is written by a non-scientist, who runs it by their non-scientist boss, who then sends it to journalists without a science education who try to convey difficult new ideas to an audience of either lay people, or more likely - since they'll be the ones interested in reading the stuff - people who know their way around a t-test a lot better than any of these intermediaries. Finally, it's edited by a whole team of people who don't understand it. You can be sure that at least one person in any given "science communication" chain is just juggling words about on a page, without having the first clue what they mean, pretending they've got a proper job, their pens all lined up neatly on the desk.

I'd throw in the extra step of a paper in Nature. Apart from that, in my opinion, he's spot on.

Other disciplines don't have this problem:

Because papers think you won't understand the "science bit", all stories involving science must be dumbed down, leaving pieces without enough content to stimulate the only people who are actually going to read them - that is, the people who know a bit about science. Compare this with the book review section, in any newspaper. The more obscure references to Russian novelists and French philosophers you can bang in, the better writer everyone thinks you are. Nobody dumbs down the finance pages. Imagine the fuss if I tried to stick the word "biophoton" on a science page without explaining what it meant. I can tell you, it would never get past the subs or the section editor. But use it on a complementary medicine page, incorrectly, and it sails through.

Statistics are what causes the most fear for reporters, and so they are usually just edited out, with interesting consequences. Because science isn't about something being true or not true: that's a humanities graduate parody. It's about the error bar, statistical significance, it's about how reliable and valid the experiment was, it's about coming to a verdict, about a hypothesis, on the back of lots of bits of evidence.

Fingerprinting and False Positives

Published September 20, 2005

New Scientist News - How far should fingerprints be trusted? (via jwz):

Evidence from qualified fingerprint examiners suggests a higher error rate. These are the results of proficiency tests cited by Cole in the Journal of Criminal Law & Criminology (vol 93, p 985). From these he estimates that false matches occurred at a rate of 0.8 per cent on average, and in one year were as high as 4.4 per cent. Even if the lower figure is correct, this would equate to 1900 mistaken fingerprint matches in the US in 2002 alone.

This is why I'm so unhappy about getting fingerprinted as part of US immigration's US-VISIT program and similar. My fingerprints have been collected on several occasions as part of that program, and as a result will now be shared throughout the US government, and internationally, and will be retained for 75 to 100 years, whether I like it or not.

As a result, with sufficient bad luck, I may become one of those false positives. Fingers crossed all those government and international partner agencies are competent enough to avoid that!

Update: oh wow, this snippet from the New Scientist editorial clearly demonstrates one case where it all went horribly wrong:

Last year, an Oregon lawyer named Brandon Mayfield was held in connection with the Madrid bombings after his fingerprint was supposedly found on a bag in the Spanish capital. Only after several weeks did the Spanish police attribute the print to Ouhnane Daoud, an Algerian living in Spain.

eek! Coverage from the National Assoc of Criminal Defense Lawyers, and the Washington Post.

ToorCon

Published September 15, 2005

ToorCon this year looks good. I'm not going, but I wish I'd gotten it together. There's a couple of spam/phishing-related talks, and a data-visualisation talk by Christopher Abad; hopefully he might diverge into some of this phishing data he talks about in this First Monday paper. Dan Kaminsky's talk looks interesting, too --

Application-layer attacks against MD5

We will show how web pages and other executable environments can be manipulated to emit arbitrarily different content with identical MD5 hashes.

Sounds like fun!

TiVo Co-Opts Anti-Spam Terminology

Published September 15, 2005

This is pathetic. As noted in the link-blog a couple of days ago (as well as everywhere else), TiVo's new DRM features have been spotted 'in the wild', protecting the valuable Intellectual Property that is Family Guy and Simpsons reruns.

The icing on the cake is that TiVo have come up with a hilarious hand-wavy explanation -- apparently it was line noise. Marc Hedlund of O'Reilly and Cory Doctorow are having none of it, and rightly so; as a bonus, Cory asked a group of DRM experts, who 'burst into positive howls of disbelief' that line noise could corrupt the DRM bits and the corresponding checksums to match.

From my angle, though, there's another noteworthy factor:

"During the test process, we came across people who had false positives because of noisy analog signals. We actually delayed development (of the new TiVo software) to address those false positives." (-- Jim Denney, director of product marketing for TiVo)

Interesting use of the term 'false positive' there. Sounds more like a good old-fashioned bug if you ask me ;)

Anyway, I'm glad I went for the home-built option. It was pretty obvious that TiVo are in the cross-hairs, and their product is only going to get worse as the DRM industry push harder...

SpamAssassin 3.1.0

Published September 15, 2005

'ANNOUNCE: SpamAssassin 3.1.0 available!' - MARC

Phew. That took a while, but it's worth it ;)

Bart Simpson vs. Missing-Person Data Entry

Published September 13, 2005

Ka-Ping Yee notes that Google have launched Katrina People Search, using Ping's speedily-created People Finder Interchange Format. This is good, especially since there's still no sign of the FEMA/Microsoft effort.

In passing -- it's disappointing to note how many appearances are made by a Mr. Heywood J. Ablohmie...

DnsblAccuracy082005 – Spamassassin Wiki

Published September 11, 2005

Do you use anti-spam DNS blocklists? If so, you should probably go take a look at DnsblAccuracy082005 on the SpamAssassin wiki; I've collated the results from our recent mass-check rescoring runs for 3.1.0, to produce have up-to-date measurements of the accuracy and hit-rates for most of the big DNS blocklists.

A few highlights:

highest hit-rate of an IP blocklist: the Distributed Sender Blackhole List, with 39.84% of spam hit
most accurate IP blocklist: the Spamhaus Exploits Block List, with a miniscule 0.04% false positive rate
highest hit-rate network test, overall: the OB SURBL list, 51.93% of spam hit

We don't have accurate figures for the new URIBL.COM lists, btw -- only the rulesets that are distributed with SpamAssassin were measured.

Bogus Challenge-Response Bounces: I’ve Had Enough

Published September 11, 2005

I get quite a lot of spam. For one random day last month (Aug 21st), I got 48 low-scoring spam mails (between 5 and 10 points according to SpamAssassin), and 955 high-scorers (anything over 10). I don't know how much malware I get, since my virus filter blocks them outright, instead of delivering to a folder.

That's all well and good, because spam and viruses are now relatively easy to filter -- and if I recall correctly, they were all correctly filed, no FPs or FNs (well, I'm not sure about the malware, but fingers crossed ;).

The hard part is now 'bogus bounces' -- the bounces from 'good' mail systems, responding to the forged use of my addresses as the sender of malware/spam mails. There were 306 of those, that day.

Bogus bounces are hard to filter as spam, because they're not spam -- they're 'bad' traffic originating from 'good', but misguided, email systems. They're not malware, either. They're a whole new category of abusive mail traffic.

I say 'misguided', because a well-designed mail system shouldn't produce these. By only performing bounce rejection with a 4xx or 5xx response as part of the SMTP transaction, when the TCP/IP connection is open between the originator and the receiving MX MTA, you avoid most of the danger of 'spamming' a forged sender address. However, many mail systems were designed before spammers and malware writers started forging on a massive scale, and therefore haven't fixed this yet.

I've been filtering these for a while using this SpamAssassin ruleset; it works reasonably well at filtering bounces in general, catching almost all of the bounces. (There is a downside, though, which is that it catches more than just bogus bounces -- it also catches real bounces, those in response to mails I sent. At this stage, though, I consider that to be functionality I'm willing to lose.)

The big remaining problem is challenge-response messages.

C-R is initially attractive. If you install it, your spam load will dwindle to zero (or virtually zero) immediately -- it'll appear to be working great. What you won't see, however, is what's happening behind the scenes:

your legitimate correspondents are getting challenges, will become annoyed (or confused), and may be unwilling or unable to get themselves whitelisted;
spam that fakes other, innocent third party addresses as the sender, will be causing C-R challenges to be sent to innocent, uninvolved parties.

The latter is the killer. In effect, you're creating spam, as part of your attempts to reduce your own spam load. C-R shifts the cost of spam-filtering from the recipient and their systems, to pretty much everyone else, and generates spam in the process. I'm not alone in this opinion.

That's all just background -- just establishing that we already know that C-R is abusive. But now, it's time for the next step for me -- I've had enough.

I initially didn't mind the bogus-bounce C-R challenges too much, but the levels have increased. Each day, I'm now getting a good 10 or so C-R challenges in response to mails I didn't send. Worse, these are the ones that get past the SpamAssassin ruleset I've written to block them, since they don't include an easy-to-filter signature signifying that they're C-R messages, such as Earthlink's 'spamblocker-challenge' SMTP sender address or UOL's 'AntiSpam UOL' From address. There seems to be hundreds of half-assed homegrown C-R filters out there!

So now, when I get challenge-response messages in response to spam which forges one of my addresses as the 'From' address, and it doesn't get blocked by the ruleset, I'm going to jump through their hoops so the spam is delivered to the C-R-protected recipient. Consider it a form of protest; creating spam, in order to keep youself spam-free, is simply not acceptable, and I've had enough.

And if you're using one of these C-R filters -- get a real spam filter. Sure they cost a bit of CPU time -- but they work, without pestering innocent third parties in the process.

Beardy Justin

Published September 10, 2005

Yes, I've been growing a beard. Strangely, it seems to be going quite well! Here's a good pic of beardy Justin, standing on a bridge over the Merced river in Yosemite:

Lots more pics from the holiday should be appearing here shortly, if you're curious.

Mosquitos, Snakes and a Bear

Published September 7, 2005

Well, I'm back... it appears that Google Maps link I posted wasn't too much use in deciphering where I was going; sorry about that. Myself and C spent a fun week and a bit, driving up to Kings Canyon and Yosemite, backpacking around for a few days, then driving back down via the 395 via Bishop, Mammoth Lakes, Lone Pine and so on.

Kings Canyon: Unfortunately, not so much fun; we had the bad luck of encountering what must be the tail end of the mosquito season, and spent most of our 2 days there running up and down the Woods Creek trail without a break, entirely surrounded by clouds of mozzies. Possibly this headlong dashing explains how we ran into so much other wildlife -- including a (harmless) California Mountain King Snake and, less enjoyably -- and despite wearing bear bells on our packs to avoid this -- a black bear...

We rounded a corner on the trail, and there it was, munching on elderberries. Once we all spotted each other, there were some audible sounds of surprise from both bear and humans, and the bear ran off in the opposite direction; the humans, however, did not. We were about 500 feet from our camp for the night, so we needed to get past where the bear had been, or face a long walk back.

Despite some fear (hey, this was our first bear encounter!), we stuck around, shouted, waved things, and took the various actions you take. It all went smoothly, the bear had probably long since departed, but we took it slow regardless, and had a very jittery night in our tent afterwards. After that, and the unceasing mozzie onslaught, we were in little hurry to carry on around the planned loop, so we cut short our Kings Canyon trip by a day and just returned down the trail to its base.

Yosemite: a much more successful trip. There were many reasons, primarily that the mosquito population was much, much lower, and discovering that the Tuolumne Meadows Lodge -- comfortable tent cabins, excellent food, and fantastic company -- provided a truly excellent base camp.

But I'd have to say that the incredible beauty of Tuolumne Meadows and the Vogelsang Pass really blew me away. I don't think I've seen any landscape quite like that, since trekking to Annapurna Base Camp in Nepal. I'm with John Muir -- Yosemite and its surrounds are a wonder of the world.

Lee Vining: had to pick up a sarnie at the world-famous Whoa Nellie Deli. Yum! After all the camping, we stayed in a hotel with TV, got some washing done, and watched scenes from a J.G. Ballard novel play out on NBC and CNN. Mind-boggling.

Mammoth Lakes: A quick kvetch. Mammoth is possibly the most pedestrian-hostile town I've ever visited. They have a hilarious section of 100 feet of sidewalk, where I encountered a fellow pedestrian using those ski-pole-style hiking walking sticks, and entirely in seriousness. Was the concept of walking so foreign in that town that long-distance walking accessories were required? I don't know, but it didn't make up for the other 90% of the streets where peds were shoved off onto the shoulder, in full-on 'sidewalk users aren't welcome here' Orange County style.

On top of that, the single pedestrian crossing in the main street spans five lanes of traffic, with no lighting, warning signs, or indeed any effective way for drivers to know whether peds were crossing or not. Unsurprisingly we nearly got run over when we tried using the damn thing. Best avoided.

I'm amazed -- it's like they designed the town to be ped-hostile. Surely allowing peds to get around your town is a bonus when you're a ski resort for half of the year? Meh.

Anyway, back again, a little refreshed. Once more into the fray...

Off on Holidays

Published August 27, 2005

I'm taking a week off to go hiking in some of the amazing back country that California has to offer. Assuming I don't get eaten by a bear, I'll see you all around Sep 6....

Faster string search alternative to Boyer-Moore: BloomAV

Published August 26, 2005

An interesting technique, from the ClamAV development list -- using Bloom filters to speed up string searching. This kind of thing works well when you've got 1 input stream, and a multitude of simple patterns that you want to match against the stream. Bloom filters are a hashing-based technique to perform extremely fast and memory-efficient, but false-positive-prone, binary lookups.

The mailing list posting ('Faster string search alternative to Boyer-Moore') gives some benchmarks from the developers' testing, along with the core (GPL-licensed) code:

Regular signatures (28,326) :

Extended Boyer-Moore: 11 MB/s

BloomAV 1-byte: 89 MB/s

BloomAV 4-bytes: 122 MB/s

Some implementation details:

the (implementation) we chose is a simple bit array of (256 K * 8) bits. The filter is at first initialized to all zeros. Then, for every virus signature we load, we take the first 7 bytes, and hash it with four very fast hash functions. The corresponding four bits in the bloom filter are set to 1s.

Our intuition is that if the filter is small enough to fit in the CPU cache, we should be able to avoid memory accesses that cost around 200 CPU cycles each.

Also, in followup discussion, the following paper was mentioned: A paper describing hardware-level Bloom filters in the Snort IDS -- S. Dharmapurikar, P. Krishnamurthy, T. Sproull, and J. W. Lockwood, "Deep packet inspection using parallel Bloom filters," in Hot Interconnects, (Stanford, CA), pp. 44--51, Aug. 2003.

This system is dubbed 'BloomAV'. Pretty cool. It's unclear if the ClamAV developers were keen to incorporate it, though, but it does point at interesting new techniques for spam signatures.

Tech Camp Ireland

Published August 25, 2005

Irish techies, mark your calendars! Various Irish bloggers are proposing a Tech Camp Ireland geek get-together, similar to Bar Camp in approach, for Saturday October 15th.

Ed Byrne and James Corbett are both blogging up a storm already. I'd go, but it'd be a hell of a trip ;)

I would say it needs a little less blog, a little more code, and a little more open source, but it does look very exciting, and it's great to see the Bar Camp spirit hitting Ireland.

More on ‘Bluetooth As a Laptop Sensor’

Published August 23, 2005

Bluetooth As a Laptop Sensor in Cambridge, England.

I link-blogged this yesterday, where it got picked up by Waxy, and thence to Boing Boing -- where some readers are reportedly considering it doubtful. Craig also expressed some skepticism. However, I think it's for real.

Check out the comments section of Schneier's post -- there's a few notable points:

Some Bluetooth-equipped laptops will indeed wake from suspend to respond to BT signals.
Davi Ottenheimer reports that the current Bluetooth spec offers "always-on discoverability" as a feature. (Obviously the protocol designers let usability triumph over security on that count.)
Many cellphones are equipped with Bluetooth, and can therefore be used to detect other 'discoverable' BT devices in range.
Walking around a UK hotel car park, while pressing buttons on a mobile phone, would be likely to appear innocuous -- I know I've done it myself on several occasions. ;)

Finally -- this isn't the first time the problem has been noted. The same problem was reported at Disney World, in the US:

Here's the interesting part: every break-in in the past month (in the Disney parking lots) had involved a laptop with internal bluetooth. Apparently if you just suspend the laptop the bluetooth device will still acknowledge certain requests, allowing the thief to target only cars containing these laptops.

Mind you, perhaps this is a 'chinese whispers' case of the Disney World thefts being amplified. Perhaps it was noted as happening in Disney World, reported in an 'emerging threats' forum where the Cambridgeshire cop heard it, and he then picked it up as something worth warning the public about, without knowing for sure that it was happening locally.

Update: aha. An observant commenter on Bruce Schneier's post has hit on a possibly good reason why laptops implement wake-on-Bluetooth:

On my PowerBook, the default Bluetooth settings were "Discoverable" and "Wake-on-Bluetooth" -- the latter so that a Bluetooth keyboard or mouse can wake the computer up after it has gone to sleep.

Emergent Chaos: I’m a Spamateur

Published August 23, 2005

Emergent Chaos: I'm a Spamateur:

In private email to Justin "SpamAssassin" Mason, I commented about blog spam and "how to fix it," then realized that my comments were really dumb. In realizing my stupidity, I termed the word "spamateur," which is henceforth defined as someone inexperienced enough to think that any simple solution has a hope of fixing the problem.

I think this is my new favourite spam neologism ;)

How convenient does the ‘right thing’ have to be?

Published August 17, 2005

Environment: Kung Fu Monkey: Hybrids and Hypotheses. A great discussion of the Toyota Prius:

Kevin Drum recently quoted a study which re-iterated that there's no "real" advantage to buying a hybrid. It's only just as convenient -- so if you're driving a hybrid, you're doing it for some other reason than financial incentive.

That made me think: what a perfect example of just how fucking useless as a society we've become. We can't even bring ourselves to do the right thing when it's only JUST as convenient as doing the wrong thing. And that's not even considered odd. Even sadder.

Running on WordPress!

Published August 15, 2005

I've decided to try out the real deal -- a 'proper' weblogging platform, namely WordPress. Be sure to comment if you spot problems...

Grumpiness and Cigarettes

Published August 13, 2005

Meta: My apologies if you wound up running into me online at some stage this week -- I've been in a lousy mood.

I gave up smoking cigarettes at the end of May, and switched to patches. That went pretty well, dropping from 21mg patches, to 14mg, to 7mg. But this week I finally hit the end of the line, stopped applying a patch every morning, and became fully nicotine-free. Only, ouch -- it's not quite as easy as I thought!

Cigarette addiction is (apparently) composed of two conceptual lumps -- the physical addiction to nicotine, and the mental addiction to the 'idea' of smoking. Through the patches, I've successfully nailed the mental addiction, but I'm now facing the physical withdrawal. I'm sweating, dizzy, can't focus my eyes, can't concentrate, my skin is going crazy, and I'm INCREDIBLY grouchy. It's amazing how much havoc the act of withholding nicotine can cause, especially when you consider that it's not a required nutrient for the human body -- it's an 'optional extra' that I never should have gone near in the first place.

Wierdly, though, I don't want a cigarette. Instead, I want a patch ;)

Xen and UKUUG 2005

Published August 11, 2005

Linux: PingWales' round-up of UKUUG Linux 2005 Day 3 includes this snippet:

As well as running (Virtual Machines), Xen allows them to be migrated on the fly. If a physical system is overloaded, or showing signs of failure, a virtual machine can be migrated to a spare node. This process takes time, but causes very little interruption to service. The machine state is first copied in its entirety, then the changes are copied repeatedly until there are a small enough number than the machine can be stopped, the remaining changes copied and the new version started. This usually provides a service interruption of under 100ms - a small enough jitter that people playing Quake 3 on a server in a virtual machine did not notice when it was moved to a different node.

Now that is cool.

Jim Winstead’s A9 on foot

Published August 11, 2005

Images: Jim Winstead's walk up Broadway from a few days ago has already garnered a few interested parties, since he's Creative-Commons-licensed all the photos, and they're easily findable via Google and on Flickr.

I find this interesting; the collision between open source, photography and cartography is cool. The result is a version of maps.A9.com, where you can actually use the images legally in your own work. More people should do this for other cities.

Where the ‘cursor’ came from

Published August 9, 2005

Stuff: So C is a massive antiques nut, and got tickets for the Antiques Roadshow next month in LA. As a result, we've been shopping around for interesting stuff for her to bring along.

Here's what I found at the antiques market last weekend:

Click on the pic to check out my multiplication skills!

The Life of a SpamAssassin Rule

Published August 6, 2005

Spam: during a recent discussion on the SpamAssassin dev list, the question came up as to how long a rule could expect to maintain its effectiveness once it was public -- the rule secrecy issue.

In order to make a point -- that certain types of very successful rules can indeed last a long time -- I picked out one rule, MIME_BOUND_DD_DIGITS. Here's a smartened-up copy of what I found out.

This rule matches a certain format of MIME boundary, one observed in 17.4637% of our spam collection and with 0 nonspam hits. Since we have a massive collection of mails, received between Jan 2004 to May 2005, and a rule with a known history, we can then graph its effectiveness over time.

The rule's history was:

bug 3396: the initial contribution from Bob Menschel, May 15 2004
r10692: arrived in SVN: May 16 2004
r20178: promoted to 'MIME_BOUND_DD_DIGITS': May 20 2004 (funnily enough, with a note speculating about its lifetime from felicity!)
released in the SpamAssassin 3.0.0 release: mid-Sep 2004

So, we would expect to see a drop in its effectiveness against spam in late May 2004 and onwards, if the spammers were reacting to SVN changes; or post September 2004, if they react to what's released.

By graphing the number of hits on mails within each 2-hour window, we can get a good idea of its effectiveness over time:

The red bars are total spam mails in each time period; green bars, the number of spam mails that hit the rule in each period. May 15 2004 and Sep 20 2004 are marked; Jan 2004 is at the left, and May 2005 is at the right-most extreme of the graph. (There's a massive spike in spam volume at the right -- I think this is Sober.Q output, which disappears after a week or so.)

It appears that the rule remains about even in effectiveness in the 4 months it's in SVN, but unreleased; it declines a little more after it makes it into a SpamAssassin release. However, it trails off very slowly -- even in May 2005, it's still hitting a good portion of spam.

Given this, I suspect that most spammers are not changing structural aspects of their spam in response to SpamAssassin with any particular alacrity, or at least are not capable of doing so.

To speculate on the latter, I think many spammers are using pirated copies of the spamware apps, so cannot get their hands on updated versions through 'legitimate' channels.

Speculating on the former -- in my opinion there's a very good chance that SpamAssassin just isn't a particular big target for them to evade, compared to the juicy pool of gullible targets behind AOL's filters, for example. ;)

‘Irish EFF’

Published August 6, 2005

Ireland: There's been some discussion about 'an Irish EFF' recently, reminding me of the old days of Electronic Frontier Ireland in the 1990s.

I was reminded of this by Danny O'Brien's article in The Guardian, where he notes an interesting point -- half of the effectiveness of the EFF in the US, is because they have a few full-time people sitting in an office, answering phone calls. Essentially they act as a human PBX, being the go-to guy connecting journalists to activists and experts.

Now that is something that could really work, and is needed in Ireland, which is in the same boat as the UK in this respect; the journalists don't know who to ask for a reliable opposing opinion when the BSA, ICT Ireland, or the IRMA put out incorrect statements. It has to be someone who's always available for a quote at the drop of a hat, over the phone. From experience, this takes dedication -- and without getting paid for it, it's hard to keep the motivation going.

IrelandOffline have done it pretty well for the telecoms issue; ICTE have done a brilliant job, the best I've seen in Europe IMO, of grabbing hold of the e-voting issue to the stage where they own it; but for online privacy, software patenting, and other high-tech-meets-society issues, there's nobody doing it that successfully.

(Update: added ICTE, slipped my mind! Sorry Colm!)

Happy Birthday to the RISKS Forum!

Published August 6, 2005

Tech: One of the first online periodicals I started reading regularly, when I first got access to USENET back in 1989 or so, was comp.risks -- Peter G. Neumann's RISKS Forum. Since then, I've been reading it religiously, in various formats over the years.

It appears that RISKS has just celebrated its 20th anniversary.

Every couple of weeks it provides a hefty dose of computing reality to counter the dreams of architecture astronauts and the more tech-worshipping members of our society, who fail to realise that just because something uses high technology, doesn't necessarily make it safer.

I got to meet PGN a couple of weeks ago at CEAS, and I was happy to be able to give my thanks -- RISKS has been very influential on my code and my outlook on computing and technology.

Nowadays, with remote code execution exploits for e-voting machines floating about, and National Cyber-Security Czars, I'd say RISKS is needed more than ever. Long may it continue!

Stupid ‘Ph’ Neologisms Considered Harmful

Published August 6, 2005

Words: 'Pharming'. I recently came across this line in a discussion document:

'Wait, isn't this exactly the kind of attack pharmers mount?'

I was under the impression that 'pharming' was a transgenics term: 'In pharming, ... genetically modified (transgenic) animals are
mostly used to make human proteins that have medicinal value. The protein encoded by the transgene is secreted into the animal's milk, eggs or blood, and then collected and purified. Livestock such as cattle, sheep, goats, chickens, rabbits and pigs have already been modified in this way to produce several useful proteins and drugs.'

Obviously this wasn't what was being referred to. So I got googling. It appears the sales and marketing community of various security/filtering/etc. companies, have been getting all het up about various phishing-related dangers.

The earliest article I could find was this -- GCN: Is a new ID theft scam in the wings? (2005-01-14):

''Pharming is a next-generation phishing attack,' said Scott Chasin, CTO of MX Logic. 'Pharming is a malicious Web redirect,' in which a person trying to reach a legitimate commercial site is sent to the phony site without his knowledge. 'We don't have any hard evidence that pharming is happening yet,' Chasin said. 'What we do know is that all the ingredients to make it happen are in place.'

Oooh scary! The article is short on technical detail (but long on scary), but I think he's talking about DNS cache poisoning, whereby an attacker implants incorrect data in the victim's DNS cache, to cause them to visit the wrong IP address when they resolve a name. This Wired article (2005-03-14) seems to confirm this.

But wait! Another meaning is offered by Green Armor Solutions, who use the term to talk about the Panix and Hushmail domain hijacks, where an attacker social-engineered domain transfers from their registrars. There's no date on the page, but it appears to be post-March 2005.

Finally, yet another meaning is offered in this article at CSO Online: How Can We Stop Phishing and Pharming Scams? (May 2005): 'The Computing Technology Industry Association has reported that pharming occurrences are up for the third straight year.' What?! Call Scott Chasin!

Steady on -- it appears that the 'pharming' CSO Online is talking about, has devolved to the stage where it's simply a pop-up window that attempts to emulate a legit site's input -- no DNS trickery involved. (This trick has, indeed, been used in phish for years.)

So right there we have three different meanings for 'pharming', or four if you count the biotech one.

It may be impossible to get the marketeers to stop referring to 'pharming'. But please, if you're a techie, don't use that term, it's lack of clarity renders it useless. Anyway, the biotech people were there first, by several years...

Stunning round-up of alleged election fraud in Ohio

Published August 5, 2005

Voting: None Dare Call It Stolen - Ohio, the Election, and America's Servile Press, by Mark Crispin Miller.

Miller and many others have obviously been spending a lot of work chasing down each incident in Ohio since last November, and there's quite a lot of them. It's impressive the degree to which recounts were evaded, if these allegations are true. There's many shocking cases alleged than I could really fit here -- but here's some of the lowest points:

On December 13, 2004, it was reported by Deputy Director of Hocking County Elections Sherole Eaton, that a Triad GSI employee had changed the computer that operated the tabulating machine, and had "advised election officials how to manipulate voting machinery to ensure that preliminary hand recount matched the machine count." This same Triad employee said he worked on machines in Lorain, Muskingum, Clark, Harrison, and Guernsey counties.
it strongly appears that Triad and its employees engaged in a course of behavior to provide "cheat sheets" to those counting the ballots. The cheat sheets told them how many votes they should find for each candidate, and how many over and under votes they should calculate to match the machine count. In that way, they could avoid doing a full county-wide hand recount mandated by state law.
In Union County, Triad replaced the hard drive on one tabulator. In Monroe County, "after the 3 percent hand count had twice failed to match the machine count, a Triad employee brought in a new machine and took away the old one. (That machine's count matched the hand count.)"

The willingness to throw away functioning, reliable election systems, and replacing them with new, easy-to-subvert ones, is astounding. But on top of that, when concerned parties investigate and find danger signs, it's easily buried:

Miller emphasizes that, even after the National Election Data Archive Project, on March 31, 2005, "released its study demonstrating that the exit polls had probably been right, it made news only in the Akron Beacon-Journal," while "the thesis that the exit polls were flawed had been reported by the Associated Press, the Washington Post, the Chicago Tribune, USA Today, the San Francisco Chronicle, the Columbus Dispatch, CNN.com, MSNBC, and ABC."

Miller's conclusion: 'the press has unilaterally disarmed'.

Lean’s got a weblog

Published August 3, 2005

Friends: the ex-Iona readers, and those with an interest in urban design, might like to go take a look at citynoise.blogspot.com -- Lean Doody's new urban design weblog.

SpikeSource, Open Source, and Bongo

Published July 26, 2005

Open Source: so I was just looking at OSCON 2005's website, and I noticed that it listed Kim Polese, of SpikeSource, as a presenter.

I don't really pay any attention to what's happening in Java these days, but it appears that SpikeSource launched last year to provide 'enterprise support services for open-source software' with a Java/enterprise slant.

Funnily enough, my last encounter with a Kim-Polese-headed company did indeed have a big effect on me, open-source-wise.

That company was Marimba, and they made an excellent Java GUI builder called Bongo. In those days (nearly ten years ago!), I was working on a product for Iona as a developer, in Java and C++, and we needed to provide a GUI on a number of Java tools. I chose to use Bongo, as it had a great feature set and looked reliable.

Wow, was I wrong! The software was reliable -- sadly, the same couldn't be said about the vendor. What I hadn't considered was the possibility that the company might decide to discontinue the product, and not offer any migration help to its customers -- and that's exactly what happened, Sometime around 1998, Marimba decided that Bongo wasn't quite as important as their Castanet 'push' product, and dropped it. Despite calls from the Bongo-using community to release the code so that the community could maintain it and avoid code-rot, they never did, and as a result apps using Bongo had to be laboriously rewritten to remove the Bongo dependencies.

I learned an important lesson about writing software -- if at all possible, build your products on open source, instead of relying on a fickle commercial software vendor. It's a lot harder to have the rug pulled out from under you, that way.

Update: Well, it seems it was quite far off the mark about Marimba. Someone who worked at Marimba at the time read the blog entry, and got in touch via email:

I was an employee of Marimba in the early days, and was around when we developed Bongo, and still later, when we discontinued it, and still later, when Bongo *was* released to the open-source community (jm: appears to be around the start of 1999 I think). It was hosted on a site called freebongo.org and continued to be enhanced with new features and a lot of new and cool widgets. It was ultimately discontinued a few years later due to lack of interest.
It was hosted and primarily maintained in the open-source community by one of the original Bongo engineers. Here's a link from the Java Gazette from the days when it was called Free Bongo.
So don't go blaming Marimba. We did listen to our users and release the code!

Fair enough -- and they deserve a lot more credit than I'd initially assumed. I guess I must have missed this later development after leaving Iona. Apologies, ex-Marimbans!

Patents and Laches

Published July 25, 2005

Patents: This has come up twice recently in discussions of software patenting, so it's worth posting a blog entry as a note.

There's a common misconception that a patenter does not necessarily need to enforce a patent in the courts, for it to remain valid. This isn't true in the US at least, where there is the legal doctrine of 'laches', defined as follows in the Law.com dictionary:

Laches - the legal doctrine that a legal right or claim will not be enforced or allowed if a long delay in asserting the right or claim has prejudiced the adverse party (hurt the opponent) as a sort of 'legal ambush'.

The Bohan Mathers law firm have a good paragraph explaining this:

...the patent holder has an obligation to protect and defend the rights granted under patent law. Just as permitting the public to freely cross one's property may lead to the permanent establishment of a public right of way and the diminishment of one's property rights, so the knowing failure to enforce one's patent rights (one legal term for this is laches) against infringement by others may result in the forfeiture of some or all of the rights granted in a particular patent.

See also this and this page for discussion of cases where it was relevant. It seems by no means clear-cut, but the doctrine is there.

CEAS

Published July 25, 2005

Spam: back from CEAS. The schedule with links to full papers is up, so anyone can go along and check 'em out, if you're curious.

Overall, it was pretty good -- not as good as last year's, but still pretty worthwhile. I didn't find any of the talks to be quite up to the standards of last year's TCP damping or Chung-Kwei papers; but the 'hallway track' was unbeatable ;)

Here's my notes:

AOL's introductory talk had some good figures; a Pew study reported that 41% of people check email first thing in morning, 40% have checked in the middle of the night, and 26% don't go more than 2-3 days without checking mail. It also noted that URLs spimmed (spammed via IM) are not the same as URLs spammed -- but the obfuscation techniques are the same; and they're using 2 learning databases, per-user and global, and the 'Report as Spam' button feeds both.

Experiences with Greylisting: John Levine's talk had some useful data -- there are still senders that treat a 4xx SMTP response (temp fail) as 5xx (permanent fail), particularly after end of the DATA phase of the transaction, such as an 'old version of Lotus Notes'; and there are some legit senders, such as Kodak's mail-out systems, which regenerate the body in full on each send, even after a temp fail, so the body will look different. He found that less than 4% of real mail from real MTAs is delayed, and overall, 17% of his mail traffic was temp-failed. The 4% of nonspam that was delayed was delayed with peaks at 400 and 900 seconds between first tempfail and eventual delivery.

As usual, there were a variety of 'antispam via social networks' talks -- there always are. Richard Clayton had a great point about all that: paraphrasing, I trust my friends and relatives on some things, and they are in my social networks -- but I don't trust their judgement of what is and is not spam. (If you've ever talked to your mother about how she always considers mails from Amazon to be spam, you'll know what he means.)

Combating Spam through Legislation: A Comparative Analysis of US and European Approaches:
the EU 'opt-in' directive is now transposed everywhere in the EU; EU citizens who are spammed by a citizen from another EU country, the reports should be sent to the antispam authority in the sender's country; and there's something called 'ECNSA', an EU contact network of spam authorities, which sounds interesting (although ungoogleable).

Searching For John Doe: Finding Spammers and Phishers: MS' antispam attorney, Aaron Kornblum, had a good talk discussing their recent court cases. Notably, he found one cases where an Austrian domain owner had set up a redirector site which sounded like it was expressly set up for spam use -- news to me (and worrying).

A Game Theoretic Model of Spam E-Mailing: Ion Androutsopoulos gave a very interesting talk on a game theoretic approach to anti-spam -- it was a little too complex for the time allotted, but I'd say the paper is worth a read.

Understanding How Spammers Steal Your E-Mail Address: An Analysis of the First Six Months of Data from Project Honey Pot: Matthew Prince of Project Honeypot had some excellent data in this talk; recommended. He's found that there's an exponential relationship between google Page Rank and spam received at scraped addresses, which matches with my theory of how scrapers work; and that only 3.2% of address-harvesting IPs are in proxy/zombie lists compared to 14% of spam SMTP delivery IPs. (BTW, my theory is that address scraping generally uses Google search results as a seed, which explains the former.)

Computers beat Humans at Single Character Recognition in Reading based Human Interaction Proofs (HIPs): this presented some great demonstrations of how a neural network can be used to solve HIPs (aka CAPTCHAs) automatically. However, I'm unsure how useful this data is, given that the NN required 90000 training characters to achieve the accuracy levels noted in the paper; unless the attacker has access to their own copy of the HIP implementation they can run themselves, they'd have to spend months performing HIPs to train it, before an attack is viable.

Throttling Outgoing SPAM for Webmail Services: cites Goodman in ACM E-Commerce 2004 as saying that ESP webmail services are a 'substantial source of spam', which was news to me! (less than 1% of spam corpora, I'd guess). It then discusses requiring the submitter of email via an ESP webmail system to perform a hashcash-style proof-of-work before their message is delivered. By using a Bayesian spam filter to classify submitted messages, the ESP can cause spammers to perform more work than non-spammers, thereby reducing their throughput. Didn't strike me as particularly useful -- Yahoo!'s Miles Libbey got right to the heart of the matter, asking if they'd considered a situation where spammers have access to more than one computer; they had not. A better paper for this situation would be Alan Judge's USENIX LISA 2003 one which discusses more industry-standard rate-limiting techniques.

SMTP Path Analysis: IBM Research's anti-spam team discuss something very similar to several techniques used in SpamAssassin; our versions have been around for a while, such as the auto-whitelist (which tracks the submitter's IP address rounded to the nearest /16 boundary), since 2001 or 2002, and the Bayes tweaks we added from bug 2384, back in 2003.

Naive Bayes Spam Filtering Using Word-Position-Based Attributes: an interesting tweak to Bayesian classification using a 'distance from start' metric for the tokens in a message. Worth trying out for Bayesian-style filters, I think.

Good Word Attacks on Statistical Spam Filters: not so exciting. A bit of a rehash of several other papers -- jgc's talk at the MIT conference on attacking a Bayesian-style spam filter, the previous year's CEAS paper on using a selection of good words from the SpamBayes guys, and it entirely missed something we found in our own tech report -- that effective attacks will result in poisoned training data, with a significant bias towards false positives. In my opinion, the latter is a big issue that needs more investigation.

Stopping Outgoing Spam by Examining Incoming Server Logs: Richard Clayton's talk. Well worth a read. It's an interesting technique for ISPs -- detecting outgoing spam by monitoring hits to your MX from your own dialup pools which uses known ratware patterns.