Justin's Linklog Posts

VAST.com

So, my new employer just launched today!

It’s a new search service, VAST.com. As the blog says, ‘we are building a search service that extracts classified ads from across the web, structures them, and then makes them available via an open REST API for commercial and non-commercial uses.’

Now you can see why I’m excited ;)

Greetings from 1996!

    --> Sending: ATZ
    ATZ
    OK
    --> Sending: ATQ0 V1 E1 S0=0 &C1 &D2
    ATQ0 V1 E1 S0=0 &C1 &D2
    OK
    --> Sending: ATH1
    ATH1
    OK
    --> Modem initialized.
    --> Sending: ATDT1892150150
    --> Waiting for carrier.
    ATDT1892150150
    CONNECT 45333

45 measly kilobits per second! This is incredibly painful — and expensive at 5 cents a minute! I briefly considered getting around it by hiring a 3G data-card for the couple of weeks before my DSL is activated — but that too is insanely overpriced.

Hurry up, DSL…

Disclosure

As of yesterday, I have a new day-job.

I won’t be working on email spam as part of the job, which is an interesting turn of events. However, I’ll be sticking with the open-source Apache SpamAssassin project, and keeping up the rate of work on that [*].

I’m not sure how much I can blog about the new place just yet, but I will say it’s certainly looking like it’ll be very interesting work ;)

[*: modulo the next couple of weeks while I’m waiting for my bloody DSL to be installed. argh!]

Apple Attempting to Patent RSS Aggregation

Miguel de Icaza quotes Dave Winer, pointing out two patent applications from Apple which seem intended to grab major chunks of the feed syndication space as Apple “IP”.

The first application is news feed viewer, 20050289147, filed April 13 2005:

A computer-implemented method for displaying a plurality of articles, the method comprising: storing a first feed bookmark in a folder, the first feed bookmark indicating a first feed, the first feed comprising a first plurality of articles; storing a second feed bookmark in the folder, the second feed bookmark indicating a second feed, the second feed comprising a second plurality of articles; aggregating the first feed and the second feed to form a third feed; and displaying the third feed.

I think there were many RSS readers that implemented this, and others from the patent application, before April 2005. I know Liferea, the one I use, has had UI-level aggregation since September 2004, with its VFolders.
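To show how little is being claimed, here is a minimal Python sketch of "aggregating the first feed and the second feed to form a third feed". It is not Liferea's VFolder code or anything from the application itself; the `Article` class and `aggregate` function are names I made up for illustration, and it assumes each feed is already sorted newest-first.

```python
from dataclasses import dataclass
from datetime import datetime
import heapq


@dataclass(frozen=True)
class Article:
    title: str
    published: datetime
    feed: str  # name of the feed this article came from (hypothetical field)


def aggregate(*feeds):
    """Merge several newest-first article lists into one newest-first list."""
    # heapq.merge expects every input to be sorted already; with reverse=True
    # that means largest (newest) first, which is how feeds usually arrive.
    return list(heapq.merge(*feeds, key=lambda a: a.published, reverse=True))


first_feed = [
    Article("B", datetime(2005, 4, 12), "first"),
    Article("A", datetime(2005, 4, 10), "first"),
]
second_feed = [
    Article("C", datetime(2005, 4, 11), "second"),
]

third_feed = aggregate(first_feed, second_feed)
for article in third_feed:
    print(article.published.date(), article.feed, article.title)
```

The "third feed" is just the interleaved list; displaying it is whatever the reader already does for a single feed.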

Next, news feed browser, 20050289468, filed April 13 2005. This one contains a wide range of claims, but here’s one that stands out as particularly trivial:

A computer-implemented method for discovering a feed, the method comprising: receiving a request to display a file; determining that the file includes relationship XML; determining that a Uniform Resource Locator (URL) within the relationship XML indicates a file that comprises the feed; and displaying one of a group containing the feed and a link to the feed.

That’s pretty much RSS autodiscovery, as described in 2002.
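For anyone who hasn't seen autodiscovery in action, this is roughly what it amounts to: scan a page for `<link rel="alternate">` elements with a feed MIME type and resolve the `href`. A rough sketch using only the Python standard library; `FeedLinkFinder` and `discover_feeds` are my own illustrative names, and the example URL is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collects feed URLs advertised via <link rel="alternate" ...> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        mime = attr.get("type", "").strip().lower()
        if "alternate" in rels and mime in FEED_TYPES and attr.get("href"):
            # hrefs are often relative, so resolve them against the page URL
            self.found.append(urljoin(self.base_url, attr["href"]))


def discover_feeds(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)  # HTMLParser.feed() parses the text; nothing to do with RSS
    return finder.found


page = """<html><head>
<link rel="stylesheet" type="text/css" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rdf">
</head><body>hello</body></html>"""

print(discover_feeds(page, "http://example.org/blog/"))
```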

The listed inventors in both patents are: Kahn, Jessica; (San Francisco, CA) ; Alfke, Jens; (San Jose, CA) ; Wilkin, Sarah Anne; (Menlo Park, CA) ; Howard, Albert Riley JR.; (Sunnyvale, CA) ; Forstall, Scott James; (Mountain View, CA) ; Lemay, Stephen O.; (San Francisco, CA) ; Melton, Donald Dale; (San Carlos, CA) ; Loofbourrow, Wayne Russell; (San Jose, CA).

Thanks, Apple! and thanks, “inventors”!

It’s important to note that this is still in the application stage, and as such can be invalidated, or narrowed down to a saner level, by using the techniques described here. I strongly recommend that people working in the syndication field with sufficient knowledge and expertise who feel strongly enough about this should spend a little time doing so, before the patent is issued and it becomes a multi-million-dollar task to invalidate it. (however, IANApatentL of course ;)

We Win

ongoing: The ASF Server:

Tim Bray: Which Apache project burns the most resources?

Mads: Spamassassin by a wide margin. […]

Heh, we win ;)

Helios, the Zones server, has been an incredible resource for us. SpamAssassin isn’t a traditional open-source software project in one respect: we use a lot of centralized “phone home” infrastructure to support rule and score generation. Having a virtualized server of this quality and horsepower to use for this has been fantastic.

(thanks to John O’Shea for the pointer!)

IBM Patents Closed-Loop Confirmation

Another day, another absurd IBM software patent. Via the IP list, here’s United States Patent 7,003,497:

  1. A method for confirming an electronic transaction, comprising the steps of: performing an electronic transaction between a first party and a second party; providing, by the first party to the second party, contact information of a third party service provider associated with the first party; contacting, by the second party, the third party service provider to obtain a location of a predetermined, private mailbox associated with the first party; sending, by the second party, a request for confirmation of the electronic transaction to the predetermined, private mailbox associated with the first party; accessing the private mailbox by the first party; and sending, by the first party, a reply message to the request for confirmation to thereby confirm authorization of the electronic transaction, wherein information regarding the private mailbox is not communicated to the second party during the electronic transaction.

There’s lots of waffle in the background section about this being for electronic e-commerce transactions, but that claim, and claims 2 and 3 at least, are easily sufficiently broad to cover simple “confirmed opt-in” email subscription systems — in other words, the system whereby a potential newsletter subscriber clicks on a link in order to “confirm” that they want to subscribe to a newsletter. That’s the current best practice email subscription method used by pretty much everyone.
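For reference, a confirmed opt-in loop is about this complicated. This is a rough sketch, not any particular list manager's code: the secret key, the `lists.example.org` URL and all function names are placeholders I've invented, and a real system would also expire tokens and actually send the mail.

```python
import hashlib
import hmac
from urllib.parse import urlencode

SECRET_KEY = b"replace-with-a-real-secret"  # hypothetical; never hard-code a real one


def confirmation_token(address):
    """Derive an unguessable token that is bound to one email address."""
    msg = address.lower().encode("utf-8")
    return hmac.new(SECRET_KEY, msg, hashlib.sha256).hexdigest()


def confirmation_link(address):
    """Build the link that goes into the 'please confirm' email."""
    query = urlencode({"addr": address, "token": confirmation_token(address)})
    return "https://lists.example.org/confirm?" + query  # placeholder URL


def confirm(address, token, subscribers):
    """Subscribe the address only if the token from the clicked link matches."""
    expected = confirmation_token(address).encode("ascii")
    supplied = token.encode("utf-8")
    if hmac.compare_digest(expected, supplied):  # constant-time comparison
        subscribers.add(address.lower())
        return True
    return False


subscribers = set()
print(confirmation_link("someone@example.org"))
print(confirm("someone@example.org", "forged-token", subscribers))
print(confirm("someone@example.org", confirmation_token("someone@example.org"), subscribers))
```

Only someone who can read mail sent to the address ever sees the valid link, which is the whole point of the technique.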

Filed December 31, 2001. There was plenty of prior art before this date, but who would want to go up against IBM, no less, to attempt to get this invalidated, especially now that it’s been issued?

Thanks USPTO, you’re doing a heck of a job!

US Things I Miss

So, I’ve been back in Ireland for several weeks now. How goes the culture shock? Well, let’s make a list of the stuff I’m missing from California:

  • C, who’s still back there finishing up her contract. Hurry up, C!

  • All my friends I left behind in the US :( Come visit!

  • The weather (well duh)

  • Trader Joe’s: low-cost, high-quality organic and near-organic food

  • The excellent Mexican and Southern food. Mmm, Taco Mesa

  • Super-cheap cocktails — although having good Guinness makes up for a lot of this

  • The back country — desert, mountains, snow, national parks. Ireland may have more surviving history dotted about, but it’s just flat. I miss the mountains

  • Netflix — haven’t spotted a replacement for this yet. There are companies in Ireland that use a similar idea, but it appears every one just about manages to screw it up and render it useless, generally by introducing throttling, late fees, or slow turnaround. meh

  • The way my Irish accent meant I could get away with pretty much anything. That trick doesn’t work in Ireland ;)

In other news: the broadband choices situation has pretty much gone to shit.

It turns out that all the good options are quite dependent on local-loop unbundling, which — somehow — still hasn’t gotten around to my local exchange. As a result, guess who’s going to be stuck on the wrong end of dialup, no less, for “2 to 3 weeks” until Eircom deign to switch on the bitstream access for my new BT-resold ADSL connection? Here’s hoping there’s a neighbour with broadband and wifi when I move back in. Joy.

DearAOL and GoodMail

Things have really been heating up recently around the AOL/Goodmail “pay to send” CertifiedMail scheme — the EFF and a host of other groups have launched dearaol.com, stating:

This system would create a two-tiered Internet in which affluent mass emailers could pay AOL a fee that amounts to an “email tax” for every email sent, in return for a guarantee that such messages would bypass spam filters and go directly to AOL members’ inboxes. Those who did not pay the “email tax” would increasingly be left behind with unreliable service. Your customers expect that your first obligation is to deliver all of their wanted mail, and this plan is a step away from that obligation.

While I dislike this proposal, too, as far as I can tell, AOL actually have pretty reasonable intentions with this program — nowhere near as bad as the DearAOL.com site makes out.

However, they’re doing a really really crappy job of getting this information out there, or committing to reasonable limits on the program, such as announcing that they will use it only for transactional emails, as Yahoo! have done.

I’d strongly recommend reading Carl Hutzler’s posting on the subject. Carl was AOL’s head of anti-spam operations until last year, so he really knows what he’s talking about, and he lays it out clearly — a lot more clearly than any corporate statements from AOL do. His blog contains a fair bit more on the subject, too.

But seriously — why isn’t there a press release on the AOL site about this scheme? Some front-channel communication about now might be useful, I’d suggest, before things really get hairy — this crapstorm is coming about partly because AOL’s comments are all filtering out in drips and drabs via third parties, and (AOLers say) are being misconstrued and misrepresented in the process. It’s a classic case of missing the cluetrain.

I’d also really encourage the EFF people to tone down the rhetoric; statements like “senders will have no guarantee that their emails will be delivered” are scare-mongering, given that SMTP email already provides no such guarantee.

Update: wow, MoveOn went really overboard — “threatening the Internet as we know it … The very existence of online civic participation and the free Internet as we know it are under attack.” OMG the sky is falling!

Side Issue: The Spam Definition

Also, another note to EFF: defining spam as “whatever you don’t want to read” is a terrible mistake to make. That confuses a good, clear, enforceable and automatable definition of spam — unsolicited bulk email — and makes it effectively unenforceable by law, unpoliceable by ISPs, impossible to detect automatically, and incompatible with existing, effective EU and Australian legislation.

Listen to your own Chairman of the Board; he’s right on this count.

PS: any luck fixing up the non-confirmed signups issue? Last time I checked I could still subscribe any address to the EFF Action Alerts without a cross-check, which is not a good thing.

Another script: goog-love.pl

A quick hack —

goog-love.pl – find out where your site’s google juice comes from

This script will grind through your web site’s “access.log” file (which must be in the “combined” log format). It’ll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love — in other words, the searches that your site ‘wins’ on.

The output is in plain text and a chunk of HTML.

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.

For example, here are the current results for this site. You can immediately see some interesting stuff that’s not immediately obvious otherwise, such as my site being the top hit for [beardy justin] ;)

Download here (5 KiB perl script).

Notes:

  • if you see a lot of “502 Bad Gateway” errors, it’s probably over-zealous anti-bot ACLs on Google’s side. Try from another host.

  • Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)
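For the curious, the first stage described above (picking the most common Google searches out of the referer field of a "combined" format log) can be sketched in a few lines of Python. To be clear, this is not the goog-love.pl source, and it skips the part that re-runs the searches through the Google API; the regex, the function names and the sample log lines are all my own illustration.

```python
import re
from collections import Counter
from urllib.parse import parse_qs, urlsplit

# combined format: host ident user [time] "request" status bytes "referer" "agent"
# (simplified: lines with escaped quotes inside the request will not match)
LINE_RE = re.compile(r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"')


def google_query(referer):
    """Return the search terms if the referer looks like a Google search, else None."""
    try:
        parts = urlsplit(referer)
    except ValueError:
        return None
    host = (parts.hostname or "").lower()
    if "google" not in host.split("."):  # crude check: www.google.ie, google.co.uk, ...
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0].strip().lower() if terms else None


def top_searches(lines, limit=100):
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            query = google_query(match.group(1))
            if query:
                counts[query] += 1
    return counts.most_common(limit)


sample_log = [
    '1.2.3.4 - - [10/Mar/2006:10:00:00 +0000] "GET / HTTP/1.1" 200 1234 '
    '"http://www.google.com/search?q=beardy+justin&hl=en" "Mozilla/5.0"',
    '5.6.7.8 - - [10/Mar/2006:10:01:00 +0000] "GET / HTTP/1.1" 200 1234 '
    '"http://www.google.ie/search?q=Beardy+Justin" "Mozilla/5.0"',
    '9.9.9.9 - - [10/Mar/2006:10:02:00 +0000] "GET /x HTTP/1.1" 404 0 "-" "Mozilla/5.0"',
]

for query, hits in top_searches(sample_log):
    print(hits, query)
```

To run it over a real log, pass an open file (or `sys.stdin`) to `top_searches` instead of the sample list.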

Dublin Riots

While driving around Ireland on a wedding-location-scouting trip, we started receiving texts talking about riots in Dublin; I texted a friend, and got a reply along these lines: “Celtic-topped scobes run riot through O’Connell St, torching cars in Nassau street, hospitalising cops and Charlie Bird. madness!”

I thought he was joking, but nope. A load of IRA-slogan-shouting scumbags really had been allowed to run riot — with paving stones of all things left unsecured in their midst! — and it quickly got way, way out of hand.

The blog coverage is excellent, with lots of photos. I suggest starting with Indymedia Ireland, these Flickr photos and the links on this weblog. It appears the gardai really fell down on this one.

For what it’s worth, I was in town a few hours later, and the rest of Dublin was trouble-free — just the usual Saturday night goings-on. O’Connell St. was still a rubble-strewn mess when I passed through on Sunday, though.

SourceForge.net now offering public Subversion

Good news. It appears that SourceForge are now offering full, public use of Subversion for all projects on sf.net!

The SourceForge.net: Subversion (Version Control for Source Code) document contains full details on their setup. Notable key points:

  1. It’s using authenticated HTTPS — which is great, going by my experiences with the ASF’s setup
  2. Imports are done from either an existing SF.net CVS repository using cvs2svn, from a Subversion ‘svnadmin dump’ file, or from a CVS repository tarball
  3. CIAbot support is offered as standard ;)

Awesome. I’ll be trying this out with Uffizi, which I registered as a Sourceforge project a few weeks ago just to try this out. ;)

TREC Spam Corpus

Some news from TREC’s Gordon Cormack:

The TREC 2005 Corpus (92,000 messages – 42,000 ham; 50,000 spam) is now available for self-serve download.

TREC Spam Evaluation is a NIST program to develop methods to measure spam filter accuracy and performance. More details here.

The corpus can be picked up at Gordon’s site. As far as I can tell, this should be a pretty solid corpus for spam researchers and developers.

Four Things

I don’t do silly blog antics much, but I got tagged by Mat for the Four Things meme. Looking around, it is indeed a bit more interesting than things like the usual LJ quiz, so why not!

I wrote this on the plane from LA to Dublin, which may have affected some of the selections in 4 places I would rather be right now at least ;)

4 jobs I’ve had:

  • I was Iona Technologies’ first employee, and stayed there for no less than 7 years. I got to see the company grow from a handful of people, most of whom weren’t getting paid (hence how I wound up as the first employee ;), all the way up to a 300-strong multinational, while the company itself formed a core of Ireland’s mini dot-com boom. That was fantastic fun, and educational to boot.

  • my Dad’s gun/fishing/sporting-goods shop. Was it really a good idea to have a teenager working near firearms? At least I wasn’t the one who unplugged the fridge where the maggots were kept, so that they all hatched over the course of one weekend…

  • A horrible teenage job — picking tomatoes. I can still feel the orange dust under my fingernails every time I smell fresh tomatoes :( I didn’t last very long at that at all.

  • writing an Amiga-based kiosk system for virtually no pay whatsoever, at the age of 18 or 19. Ah, exploitation.

4 movies I can watch over and over:

  • Koyaanisqatsi — it’s dating a little now, since every ad agency through the 90s ripped it off. But still, the invention of a new format. I remember looking at the 405 freeway in LA, and thinking “looks like something out of Koyaanisqatsi” — of course, it was.

  • Princess Mononoke — either that, or Nausicaa. I just love the way the characters are coloured in shades of grey, rather than black and white.

  • the Lord of the Rings trilogy — oh dear I’m a hopeless Tolkien fanboy.

  • Spinal Tap — pure genius.

4 places I’ve lived:

  • Melbourne, Australia; around the time of the annoying TV drama, The Secret Lives Of Us;

  • Newport Beach, CA; around the time of the annoying TV drama, The O.C.;

  • Dublin, Ireland; no annoying TV drama — so far

  • University of California Irvine, CA; while Irvine itself is the most soulless suburban hellhole I’ve ever visited, living on the UCI campus is quite fun by comparison. Take about 1000 grad students, post-docs and lecturers from around the world; put them all in the same square mile or so; remove all fun (and bars!) from the surrounding areas; watch them make their own entertainment, or go mad.

4 tv shows I love:

4 places I’ve vacationed:

  • Annapurna Base Camp, Nepal; we trekked our way up to there, then trekked back down again. Unforgettable. I really want to do another Nepal trek as a result

  • car-camping around the Australian state of Victoria; they have some fantastic national park campsites, which most tourists overlook

  • learning how to dive in Ko Tao, Thailand; great setting, great dive sites, pretty cheap too!

  • Yosemite; amazing, world-class natural beauty. Californians don’t realise just how lucky they’ve got it ;)

4 of my favourite dishes:

  • A good Thai green curry

  • Laos-style green papaya salad with sticky rice

  • a good meaty cassoulet, from Fandango in San Luis Obispo. At least, that was the tastiest meal I’ve had in recent months ;)

  • Mangosteen — the queen of fruit, according to the Thais. I could, and probably have, eaten hundreds of these

4 places I would rather be right now:

  • spending New Year’s Day with a bunch of friends in rural West Cork or County Galway; until I moved to the US, this was one of my favourite annual traditions.

  • the Stag’s Head Bar, Dublin, in the snug, again with a bunch of friends

  • sitting on the grass outside the Pavilion bar in TCD, on a sunny summer’s day (hmm, that’s a lot of bars!)

  • Chiang Mai, Thailand

4 sites I visit daily:

4 people I’m tagging:

The Return of Sneakernet

Keith Dawson sent this on — an interview with Jim Gray, head of Microsoft’s Bay Area Research Center and winner of the ACM Turing Award, talking about new transmission systems for truly massive data collections. Very interesting:

[One] option is to send whole computers. …. We’re now into the 2-terabyte realm, so we can’t actually send a single disk; we need to send a bunch of disks. It’s convenient to send them packaged inside a metal box that just happens to have a processor in it. I know this sounds crazy — but you get an NFS or CIFS server and most people can just plug the thing into the wall and into the network and then copy the data.

Dave Patterson, interviewer: What’s the difference in cost between sending a disk and sending a computer?

JG: If I were to send you only one disk, the cost would be double — something like $400 to send you a computer versus $200 to send you a disk. But I am sending bricks holding more than a terabyte of data — and the disks are more than 50 percent of the system cost. Presumably, these bricks circulate and don’t get consumed by one use.

DP: Are you sending them a whole PC?

JG: Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, and seven 300-GB disks — all for about $3,000.

DP: It’s your capital cost to implement the Jim Gray version of “Netflicks.” (jm: sic)

JG: Right. We built more than 20 of these boxes we call TeraScale SneakerNet boxes. Three of them are in circulation. We have a dozen doing TeraServer work; we have about eight in our lab for video archives, backups, and so on. It’s real convenient to have 40 TB of storage to work with if you are a database guy. Remember the old days and the original eight-inch floppy disks? These are just much bigger.

DP: “Sneaker net” was when you used your sneakers to transport data?

JG: In the old days, sneaker net was the notion that you would pull out floppy disks, run across the room in your sneakers, and plug the floppy into another machine. This is just TeraScale SneakerNet. You write your terabytes onto this thing and ship it out to your pals. Some of our pals are extremely well connected — they are part of Internet 2, Virtual Business Networks (VBNs), and the Next Generation Internet (NGI). Even so, it takes them a long time to copy a gigabyte. Copy a terabyte? It takes them a very, very long time across the networks they have.
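The arithmetic behind this is worth a quick sketch. The disk count and sizes come from the interview; the 24-hour courier time and the 100 Mbit/s link are my own assumptions, picked purely for illustration.

```python
def effective_mbps(bytes_shipped, hours_in_transit):
    """Bandwidth, in megabits per second, of a box of disks in a courier van."""
    return bytes_shipped * 8 / (hours_in_transit * 3600) / 1e6


def transfer_hours(bytes_to_send, link_mbps):
    """Hours needed to push the same data over a network link at full rate."""
    return bytes_to_send * 8 / (link_mbps * 1e6) / 3600


BOX_BYTES = 7 * 300e9   # seven 300-GB disks, as described in the interview
SHIPPING_HOURS = 24     # assumption: overnight courier
LINK_MBPS = 100         # assumption: a sustained 100 Mbit/s end to end

print(f"sneakernet: {effective_mbps(BOX_BYTES, SHIPPING_HOURS):.0f} Mbit/s")
print(f"network:    {transfer_hours(BOX_BYTES, LINK_MBPS):.1f} hours for the same data")
```

Under those assumptions the box in the post works out at roughly 194 Mbit/s, with terrible latency, and that's before allowing for the network link never actually sustaining its nominal rate.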

E-Pending

Boing Boing has an interesting case today:

“I filled out a web form for a contest from Miller using a throwaway junk email address and then, months after I dumped the throwaway account, I got this to my main account! Not sure I like the idea of companies tracking me down like this.”

I sent a mail to follow up on this, but it’s worth blogging here too.

This is, unfortunately, common practice among the “legitimate” bulk mailer companies; it’s called “e-pending” (short for “email address appending”). Basically, the advertiser contacts one of the big data-mining companies, provides them with the data they have about the customer (name, postal address, etc.), and gets them to match that against their database; the data-miner then provides any other email addresses they may have on file for that user, even if those email addrs were provided for bills, promotional use for other companies, etc.

The advertisers contend that permission was given by the person who’s being mailed; the recipients contend that permission was given to send to a specific address, not all of that person’s addresses in perpetuity.

Here’s a few more examples of e-pending gone bad: two Jennifer Millers, Sony scraping ancient Internic contact addresses, Spamvertized.org comment on the practice, Joe St. Sauver comments.

It’s exclusively a US phenomenon, as far as I know; I think most cases of e-pending are rendered illegal under EU data protection law. Handy. ;)

Update: Brian at the Spam Kings weblog notes that ‘this spooky little spam was the work of Equifax, the big credit reporting agency that shut down its Boca Raton-based spam operation, Naviant, in 2003, due to the impending passage of CAN-SPAM.’

RFID in the Grauniad, and back in Dublin

Greetings from sunny Dublin, Ireland! (really!)

I’m now back in taint.org’s native timezone, although precariously set up and experiencing occasional interruptions. If you’re waiting for a mail from me, it may take a little more time.

I did have time to be interviewed last week by Karlin Lillington for this Guardian story:

To make sure customs agents could read his cat’s chip to match him to his Pet Passport on return to Europe, Mason bought his own scanner at a cost of some £200. “I didn’t want to risk the cat being impounded for six months’ quarantine at Heathrow,” he sighs.

It’s true.

Happy to be back — I think. Looking forward to my first pints, in over a year, of creamy Guinness in its native habitat. I also have a couple of half-written weblog entries I wrote on the plane, too…

Yahoo! delete b3ta newsletter mailing list?

Today’s top item on the b3ta front page, under Site News:

Yahoo please talk to us! Help! – our yahoogroups list (with over 100,000 subscribers) has been deleted. We don’t know why. If you work at Yahoo and can help us sort this out please contact me at robmanuel AT gmail dot com.

posted by rob on 10th Feb at 2pm

B3ta is a long-established UK humour site who send out a weekly newsletter, every Friday afternoon, using Yahoo! Groups as their mailing list service. They’ve been doing this for years. Yep, that’s 100,000 subscribers.

Anyway, if anyone from Y!Groups, or anyone who knows someone there, is reading, please do get in touch with the b3ta guys — this is a very serious catastrophe for them. I’d be curious to hear how/why this happened.

To tie this into spam-filtering and email operational topics, it brought this posting from Jeremy Zawodny to mind:

This all makes me wonder if it’s worth it for smaller organizations to bother running their own mail servers anymore. If Google offered small business mail the way Yahoo does, there’d be some serious competition in the market and it’d make a lot of people’s lives much easier.

While Jeremy was talking about a different service from list hosting, I think we’re seeing the other side of the email-outsourcing coin, here.

Update: fwiw, it’s back:

Yahoo update – on Friday Yahoo deleted our list of 100,000 newsletter readers email addresses, hence we didn’t send a newsletter. Today they’ve been in touch and have promised a response by Tuesday. Fingers crossed. UPDATE: It looks like it’s back! Hooray for Yahoo!

Broadband choices in Ireland

Perfect timing! Just 5 days before I return to Ireland, Damien Mulley posts ‘Broadband choices in Ireland’, a good overview of the options available for consumer broadband internet connection.

I’ve been out of the loop for quite a while, and spoilt by the options available in suburban Southern California (which are, of course, pretty good). But this is a lot better than what was on the table when I left, 3 years ago.

What strikes me is that the upload/download speeds are quite reasonable and pretty close to what you’d see in the US. Similarly, the prices are finally near to the going rate in the US, once the various limitations and add-ons (required ‘bundles’, state taxes etc.) are taken into consideration.

However, virtually all of these deals use the horrendous concept of download capping! Given that I use this stuff for work, and routinely rsync around 30GB chunks of email corpora between central offices, colo servers, and my desktop, this just won’t fly. It could be argued that I’m therefore not a typical broadband consumer, who these deals have been carefully designed to cater for. But seriously — if a telecommuting software developer isn’t a typical broadband consumer, who the hell is? Hey telcos: a little flexibility goes a long way — don’t fence me in. ;)

All in all, it looks like Smart Telecom are the winners; 3Mb/s download, 512Kb upload — and most importantly, no cap — for EUR 35 per month. (And check out that XHTML/WAI-compliant website!)

I probably would have gone with Irish Broadband, but for the past 6 months the only thing I’ve been hearing about them via word-of-mouth has been bad news, detailing customer service meltdown after meltdown. Even the legendarily incompetent ‘biddies’ of Eircom seem to be getting better reviews nowadays.

Talking of Eircon, our dear old dirty-tricks-wielding celtic-tiger-throttling incumbent telco: the top Sponsored Link on a Google search for irish broadband is:

Irish Broadband

www.eircom.ie — More speed, prices reduced by 25%, free modem & a free connection!

Scum.

Spamhaus comment on the AOL/Goodmail deal

AOL and Yahoo! have been making a lot of headlines with their plans to reduce their whitelist-management workload — and make a little pay-to-send money on the side — with a deal with Goodmail.

Now Spamhaus have gone on the record against the plan:

On Monday, Richard Cox, chief information officer at antispam organization Spamhaus, said that “an e-mail charge will destroy the spirit of the Internet.”

“The Internet has become what it is because of freedom of communication. Open discussion is what gives it value. There should be no cost for particular services, and e-mail should be free and accessible to all. This will disenfranchise people.”

RFID “e-Passports”

This is what passports containing RFID chips will look like:

Note the little rectangular logo at the bottom. According to Ed Hasbrouck, that’s the ICAO standard logo indicating that this is an RFID passport, and therefore:

identity thieves, terrorists, direct marketers, data aggregators, malicious governments, or anyone else with a radio receiver within 10 meters (30+ feet) or more whenever your passport is read at a border crossing, airport, etc. can secretly and remotely track you, log your movements through the unique “collision avoidance” ID number sent by the chip, and intercept and decrypt all the data (including your digital photo and, in some countries, your digitized fingerprints) needed to “clone” a perfect copy of your passport, forge other identity credentials, or impersonate you.

Of relevance are the comments over at Bruce Schneier’s weblog entry regarding the Riscure research into the Dutch Biometric Passport’s lousy security.

Interestingly, as one commenter there notes, breaking the crypto may be overkill; the knowledge that a person is carrying a passport from a certain country, or set of countries, may be enough for certain attackers.

I asked the Irish Passport Office about their RFID plans last April:

I’m an Irish citizen and passport-holder. I have been following recent discussions in the US regarding the addition of RFID computer chips to US passports, and I note that the US Department of State is now indicating that this measure was made necessary due to recent International Civil Aviation Organization (ICAO) standards — namely ICAO Doc 9303.

As a result, since Ireland is a signatory to ICAO regulations, this raises the question as to whether Irish passports shall shortly include similar RFID or “contactless chip” technology.

Can you tell me:

  • if this is planned?

  • is there a mechanism for public comment on this process?

  • who could I further email to ask about this, if you do not know?

Disappointingly, I never received a reply. :( Someday I should really chase this up.

Update, Oct 17 2006: Well, they never bothered replying. They did, however, introduce RFID chips to Irish passports:

The chip technology allows the information stored in an Electronic Passport to be read by special chip readers at a close distance. The chip incorporates digital signature technology to verify the authenticity of the data stored on the chip.

OpenWRT Wifi Repeater Recipe

Seeing as I’ve moved house, and am staying at a friend’s temporarily until I head back to .ie, internet access has become a bit of a problem. Hence, I’m posting this via some neighbour’s leeched wifi ;)

To do this, I came up with some seriously hacky IP infrastructure, to wit a repeater setup composed of two off-the-shelf router/NAT/AP boxes, since the signal is pretty weak and needed a boost to cover the useful parts of the house. If you’re curious, the details can be read over here.

Weblog Spam and Adversarial Classification

Dr. Dave, author of the Spam Karma WordPress antispam plugin, has posted an interesting article about new weblog-spammer tactics:

These spams do not present most of the idiotic traits of their lower colleagues: they do not try cramming hundreds of URLs or inserting hundreds of easily spotted junk keywords in the comment content. Instead, they use only the dedicated name and homepage fields to sneak in spam URL and keywords. The comment content is often perfectly innocuous, sometimes even topical (by copying parts of another comment or a trackbacking post). All in all, these spams could easily be missed by a human moderator who wouldn’t look carefully at the contact name and URL.

(Thanks to Kelson Vibber for the pointer to this.)

In other words, he is noting what we noticed in email anti-spam: what works well one year is likely to degrade over time as the spammers attempt to evade it, and one has to keep working to keep up.

The best term for this appears to be adversarial classification. Anti-spam activities fall into this category, and it often means that classic text classification algorithms aren’t suitable — after all, the Reuters-21578 dataset never tried to evade your classifier ;)
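A toy illustration of that arms race, in Python. This is a deliberately naive sketch and has nothing to do with how SpamAssassin or Spam Karma actually work; the blocklist, the substitution table and the function names are all made up for the example.

```python
import re

BLOCKLIST = {"viagra", "casino", "poker"}  # illustrative tokens only


def naive_is_spam(text):
    """Round one: flag any message containing a blocklisted word."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return any(token in BLOCKLIST for token in tokens)


# Round two: undo the look-alike substitutions the spammer answered with.
# The table is deliberately incomplete; the adversary decides what comes next.
LOOKALIKES = str.maketrans({"1": "i", "0": "o", "3": "e", "4": "a",
                            "5": "s", "@": "a", "$": "s"})


def hardened_is_spam(text):
    normalised = text.lower().translate(LOOKALIKES)
    # drop separators wedged between letters, e.g. "c.a.s.i.n.o" or "p-o-k-e-r"
    collapsed = re.sub(r"(?<=[a-z])[^a-z\s](?=[a-z])", "", normalised)
    tokens = re.findall(r"[a-z]+", collapsed)
    return any(token in BLOCKLIST for token in tokens)


for message in ["cheap viagra here", "cheap v1agra here",
                "c.a.s.i.n.o bonus", "v i a g r a"]:
    print(f"{message!r}: naive={naive_is_spam(message)} "
          f"hardened={hardened_is_spam(message)}")
```

The last message gets past both versions, which is the point: a static test set never fights back, but a spammer reads your filter's behaviour and adapts.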

In a similar vein, this MS research paper is interesting:

Previous work on adversarial classification has made the unrealistic assumption that the attacker has perfect knowledge of the classifier. …. We present efficient algorithms for reverse engineering linear classifiers with either continuous or Boolean features and demonstrate their effectiveness using real data from the domain of spam filtering.

It’s akin to John Graham-Cumming’s work looking into how a spammer could get past a Bayesian filter “from the outside”, but with more techniques, and examining MS’ MaxEnt algorithm, too. PDF here, well worth a read.
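To make the “from the outside” idea a bit more concrete, here’s a toy sketch in Python. It is not the algorithm from the paper, just a simplified illustration of the general approach: the attacker can only ask a linear, Boolean-feature filter “spam or not?”, and uses those answers to discover words that push a message over to the ham side. The weights, threshold, word lists and function names are all made up for the example.

```python
from typing import Callable, Iterable, List, Set

# Hypothetical victim filter: linear weights over "word is present" features.
WEIGHTS = {"viagra": 3.0, "cheap": 1.5, "pills": 1.0,
           "meeting": -1.0, "thanks": -0.5, "perl": -2.0}
THRESHOLD = 2.0


def is_spam(words: Iterable[str]) -> bool:
    """Black-box oracle: the attacker only ever sees this True/False answer."""
    return sum(WEIGHTS.get(w, 0.0) for w in set(words)) > THRESHOLD


def find_borderline(spam: Set[str], oracle: Callable[[Set[str]], bool]) -> Set[str]:
    """Greedily drop words while the message is still classified as spam."""
    msg = set(spam)
    for word in sorted(spam):
        if oracle(msg - {word}):
            msg.discard(word)
    return msg


def good_words(borderline: Set[str], candidates: Iterable[str],
               oracle: Callable[[Set[str]], bool]) -> List[str]:
    """Candidates whose addition alone flips the borderline message to ham."""
    return [w for w in candidates if not oracle(borderline | {w})]


border = find_borderline({"viagra", "cheap", "pills"}, is_spam)
print(good_words(border, ["meeting", "thanks", "perl", "hello"], is_spam))
# ['meeting', 'perl']
```

The attacker never sees the weights, only the verdicts, yet still ends up with a list of words worth stuffing into the next spam run. That’s the adversarial part that a static corpus never exercises.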

(By the way, I’m in the process of moving house, so if you send me an email, it may take a while for me to reply. This situation is likely to prevail for the next few weeks, for what it’s worth — fun.)

Raw Food Crackpottery

Via RobotWisdom, a review of a new Primrose Hill cafe:

No wheat. No gluten. No sugar. No GMO. No dairy. No yeast. No shoes.

Yep, no shoes. If you want to enjoy the detoxifying glories of London’s first raw-food cafe, then please leave your clod-hoppers at the door, along with your high stress levels and your smart-arse scepticism.

I know of another cafe elsewhere which also offered a largely-raw menu. This one, however, shared a back alleyway with a shop where a friend of mine worked.

He noted that on several occasions, he’d seen rats near, or on, the pallets of plastic-wrapped fruit and vegetables. You see, the raw food was delivered to the kitchen door, where it lay outside for a short while, in the rat-infested alleyway. Rats crawling over your food, naturally, is not a good thing.

There’s a very good reason why some smart stone-age ancestor invented cooking our food — because it kills the germs that’ll make us sick!

Devotees claim that because the enzymes are destroyed when food is heated above 48C, our bodies have to utilise our own enzymes to break down the food, which can result in us feeling tired and run-down.

Yeah, devotees are pretty much talking crap there. ;) If anything, cooked food is easier to digest than raw. And good luck with the whole ‘getting by without using enzymes’ thing!

What a load of quackery.

Happy Spam-Solved Day!

Happy BillG-Scheduled Spam Solved Day!

“Two years from now, spam will be solved,” Microsoft’s Bill Gates said [at the 2004 World Economic Forum in Switzerland].

So is it? Weeeeell…..

To “solve” the problem for consumers in the short run doesn’t require eliminating spam entirely, said Ryan Hamlin, the general manager who oversees [Microsoft]’s anti-spam programs. Rather, he said, the idea is to contain it to the point that its impact on in-boxes is minor.

In that way, Hamlin said, Gates’ prediction has come true for people using the right tactics and advanced filtering technology.

Ha. I am reminded of ‘weapons of mass destruction-related program activities’.

As one slashdotter says, ‘when you fail, try try again; or conversely, change the requirements and make it look like a success, which is exactly what BG has done.’

It’s not washing, though, unsurprisingly. The poll on the same page asks ‘do you agree with Microsoft’s contention that the spam problem has been “solved”?’ Right now, with 1169 votes, it has 7.2% (in other words, the MS employees) agreeing, and a whopping 92.8% not going for it.

SweetheartsConnection.com – Interesting Dating Scam

Here’s an interesting online scam. An anonymous friend, working in anti-spam, writes:

‘I’ve been covertly looking into rumours of a myspace scam and thought you might like to blog it – I don’t want to be attached to this in any way otherwise I’d write about it myself (I have a profile on there that I want to keep around in case other scams show up, but I don’t really want to advertise the profile).

It works like this:

You sign up for a myspace account and fill in your profile details. Then in a couple of days someone contacts you pretending they’re using their friend’s account because they haven’t signed up yet. They say something along the lines of “I saw your profile and thought you were cute, if you’re interested email me at (random)@yahoo”. If you email them, you get a reply back being all bubbly and cute, and a link to a web page that sort of looks like a “My First Homepage” – it even says “I’m taking a course at the community college in HTML”. There are pics on the page of a very cute girl, but at the bottom a teaser saucy picture in lingerie, and an Adult Pass signup to get more pics. Of course the signup is $40.

It’s a subtle scam, but definitely a scam. Here’s an example of the type of site you get sent to:

http://www.honesthost5mb.com/kristenssite/

Note the hosting service. Now delete the /kristenssite/ part and it looks legit, right? Until you click on a few links and realise they have nothing to sell.

Google has no knowledge of honesthost5mb – nobody links to them, so how did Kristen find them?

It’s indeed quite funny that there’s a terribly similar hosting service out there: http://www.jagflyhosting.com/ – yet for some reason all their links seem to work, and they have an accessible phone number. Shock. Horror!

I’m pretty sure the account being (ab)used on myspace is a stolen one – it looks pretty legit, including linked in friends and comments, so I’m suspecting a cracked password.

Anyway, thought you could blog this to warn others about it (feel free to advertise the above link – though I guess that’ll ruin the whole “google doesn’t know” thing ;-) I wish I had the guts to sign up for the extra pics to see what you end up with!’

They also passed on the email content, noting ‘here’s the email sent from yahoo webmail from an AOL account (sadly AOL proxies all web content so I can’t track it any further than New York proxies)’:

Hi [redacted] ! Hey you found me! I was a little worried you wouldn’t be able to :P so, how are you? I’m ok.. I’m sneaking a email in at work before my boss comes back in, so sorry if it’s a little short! I promise to write more later :)

So I promised you some pics:P well I will have to send you some of me when I get home (don’t have the pics here at work). In the meantime you can check out my personal homepage. It’s kind of playground while I’m taking this intro to HTML class, kind of like my blog page. Here is the link: http://www.honesthost5mb.com/kristenssite It’s not much yet but it’s getting there. hehe

So tell me more about yourself, are you a work to live or live to work kinda person? What are you looking for in a girl? Do you like myspace? I think I’ll make a profile soon, it’s free right? and you can add your own HTML? That would be cool.. So how is your 2006 going? Mine is ok, one thing I’m excited about though is that today is exactly 1 week before my birthday. Hey, maybe if we hit it off, we can go on a first date on my birthday, that would be really cool. :)

Anyways, enough with the 20 questions right? oh, I prefer to chat on IM, its more personal you know? Do you have AIM? im kriskat224 on there, msg me sometime ok?

Well I should log off and get some work done.. Write back soon! and take care!

xoxo ~ Kristen

Sure enough, a little further research on Google yields the following examples…

The earliest is this story at Jiveworld.net, of 2004-05-24, noting:

Aaron recently received an e-mail from someone he supposedly chatted with on Match.com:

Aaron: I had actually been chatting with someone I might have met there a LONG time ago. I couldn’t remember, so I gave her the benefit of the doubt. I thought it was SPAM, but hey, even my own e-mails sounds like SPAM sometimes. She sent me a picture in her e-mail, but the mail service she was using didn’t like it. So she sent me the link to her “website.” It initially seemed like a real personal web space until the big ADULT BUREAU logo appeared. Oh yes, very legitimate.

This was a unique experience for me since someone actually wrote a tailored response to my e-mail, responding to specific things I had mentioned. Even though the bulk of the e-mail seemed form generated, this had to have been a time intensive process for damn near no return. Well, after the ADULT thing, I thought my response to her e-mail was inventive. Since I haven’t received another response, it’s obvious she (Or he) took the hint.

Another: a thread at FordPower.net, 2004-09-24, with a link to http://www.4mbwickedweb.com/sites/melissa/ (since expired);

Another: a Fark thread posting, 2005-01-28, scroll down to the posting of ‘2005-01-28 10:42:28 AM’ by ‘XavierCrutch’, linking to http://www.stepstonehost.com/jesshomepage/ (since expired);

Another: this weblog post, scroll down to March 13, 2005, ‘Personal ads and the great porn conspiracy’, where the poster is snared, via IM with AIM user natkat224 this time, and is sent another link to a site using http://adultbureau.sweetheartsconnection.com/ to collect the $40 fee;

Another: another weblog post, 2005-10-28.

A google search for the AIM username ‘natkat224’ reveals plenty more hits.

So here’s a list of the sites found from those links, and via google, so far:

The common host, at all stages, is ‘SWEETHEARTSCONNECTION.COM’, registered to

INTERTRANS TRADING OVERSEAS LIMITED
VASILEOS OTHONOS 21, FANEROMENIX COMPLEX, OFFICE 102, 6030 LARNACA
N/A
N/A, CA N/A
CY

Lots more detail here. SweetheartsConnection.com has terms and conditions that appear to prohibit spamming, but it turns out that they themselves have a pretty scary entry at RipoffReport.com anyway, noting:

If you want a free LIFE TIME PASSWORD with Adult Bureau.. you have to apply for a 1 month membership @$39.95 to Sweetheartsconnection.com A DATING SERIVCE ….. charge appears as IT INTERNET SERVICES.

No matter if you request cancellation of service this company will continue to bill you ” it gets better ” then send you to there home made collection company ” Secure debt collections, ” two companies in one both fraud

Phony Notices will be sent to the home demanding final payment of a service NEVER USED. They will contact you, try intimidate you into paying a Balance of $200.00 (Sweetheartsconnecton.com automatically rebills your credit card every month @$39.95.

eek.

This weblog post, of 2005-10-28, is shaping up to be the canonical support group for victims of this scam; worth reading the comments there.

Quite a scam, and interesting to note the “personal touch” via email and IM.

The C=64-izer

Ever wondered what today’s internet meme images would look like on mid-’80s home computing hardware?

Wonder no longer!

What Works in Software Development

I already posted this to the link-blog yesterday, but it’s so good it’s worth promoting more widely. If you write software for a living, you really ought to read the slides for Michael Schwern’s excellent ‘What Works In Software Development’ talk.

It’s a long presentation (108 slides!), but during the course of that, he covers:

  • effective teamwork
  • dealing with bad customers
  • dealing with bad management
  • classic coding mistakes
  • classic project management mistakes
  • classic design mistakes
  • test-driven development
  • refactoring
  • patterns

It’s a really good synthesis of what I think are the best bits of good OO design, XP, CPAN and perl’s design and coding styles, without most of the cruft. I’ll be pointing people at this for years to come, I think…

(Found via yoz.)

Planet Antispam: Beta No More

Planet Antispam has been working pretty nicely for the last couple of weeks — can’t say I’ve noticed any trouble, and its RSS feed is turning out to be a nice aggregation of anti-spam news. On top of that, John Levine was kind enough to set up a CNAME for it at a more appropriate URL — http://planet.spam.abuse.net/.

As a result, it’s now fully-fledged, and fit to lose the ‘beta’ qualifier. Please bookmark, subscribe to the feeds, and pass on the URL to others you think may be interested!

Moving Home — De-Cluttering

I’m moving home.

The flights are booked — Feb 14th, Valentine’s Day, I’ll be leaving Orange County and heading back to Dublin permanently. In the meantime, I’ve been selling stuff, throwing stuff out, decommissioning servers, and making backups.

The server

My erstwhile desktop, later my trusty back-room server, ‘jalapeno’, was sold earlier today. Thankfully, I bought a 250GB hard drive recently, so I actually had the room to back up its 70GB somewhere beforehand.

Being security-conscious, I overwrote its partitions using pseudo-random data before passing it on (‘dd if=/dev/urandom of=/dev/hda9 bs=1024k’). However, being lazy, I did this while the machine was up and running, over an SSH link.

Watching as ‘df’ produced gibberish output, and as later commands started producing nothing but bus errors, was odd: a very strange feeling to be actively destroying the disk’s data like that. Here’s hoping the backups worked.
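For the curious, here’s roughly what that ‘dd’ invocation does, as a minimal Python sketch. To keep it safe to play with, it assumes a regular file rather than a raw partition like /dev/hda9 (the size lookup used here doesn’t work on block devices), and the function name is my own invention.

```python
import os


def overwrite_with_random(path: str, chunk_size: int = 1024 * 1024) -> int:
    """Overwrite an existing regular file in place with random bytes.

    Writes in chunks (1 MiB by default, like bs=1024k) and returns the
    number of bytes written. The file keeps its original length.
    """
    size = os.path.getsize(path)
    written = 0
    with open(path, "r+b") as f:
        while written < size:
            n = min(chunk_size, size - written)
            f.write(os.urandom(n))
            written += n
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the data out to the disk
    return written
```

As with the real thing, run it only on something you have already backed up.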

The yard sale

We had one, in the process selling about $1000 worth of IKEA furniture, books, camping equipment, bits of hardware, sports equipment, and a pink xmas tree.

The local bargain hunters started knocking on the door at 8:15am, despite the sign’s posted start time of 9am. Once we did start bringing items out to the front lawn to sell, there were already about 10 people, which quickly swelled to a mob of 20 by 8:45am. They were keen!

By the end of Saturday, we’d sold pretty much all the furniture, all of the sports and camping equipment, most of the hardware that wasn’t total crap, and only 2 of the books. One shopper’s explanation: ‘she didn’t have the time to read books’.

Still, the yard sale has netted $345. Not bad, and a good feeling to de-clutter so successfully.

Music, and iPod Shuffle

I’ve realised I like the endings of songs; whether I like a song or not, entirely depends on how it ends.

Apple’s iPod shuffle algorithm is incredible. I’ve been spending quite a bit of time listening to it, and I’m sure it’s not random; I think it’s picking next tracks based partly on the similarity of metadata between the current and candidate tracks, which is quite neat as an automated mixing technique.

So is it random? Google says:

  • yes
  • no; a commenter on that article notes the same thing I’m talking about
  • yes
  • no; can’t say I’ve noticed the Beatles getting a push on mine
  • yes
  • and finally, no answer here, but a pretty cool stats experiment
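Just to illustrate the kind of thing I’m imagining, here’s a purely hypothetical Python sketch of a metadata-biased shuffle. This is my guess at a mechanism, not anything Apple has documented; the field names, the weighting formula and the ‘bias’ parameter are all assumptions.

```python
import random
from typing import Dict, List, Sequence

Track = Dict[str, str]


def similarity(a: Track, b: Track,
               fields: Sequence[str] = ("artist", "genre", "year")) -> int:
    """Count the metadata fields on which two tracks agree."""
    return sum(1 for f in fields if a.get(f) is not None and a.get(f) == b.get(f))


def next_track_weights(current: Track, candidates: List[Track],
                       bias: float = 1.0) -> List[float]:
    """Every track gets a base weight of 1, plus a bonus per shared field."""
    return [1.0 + bias * similarity(current, c) for c in candidates]


def pick_next(current: Track, candidates: List[Track],
              rng: random.Random, bias: float = 1.0) -> Track:
    """Weighted random choice: similar tracks are likelier, never certain."""
    weights = next_track_weights(current, candidates, bias)
    return rng.choices(candidates, weights=weights, k=1)[0]
```

With ‘bias’ set to 0 this collapses to a plain uniform shuffle, which makes it handy for comparing the two behaviours side by side.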

Google DRM and WON Authentication

So, Google have invented their own DRM, apparently. I’m keen to find out more details; Techdirt and Plasticbag.org are so far the only places I can find in the blogosphere to discuss it in any detail.

One tidbit worth noting from the LA Times coverage:

The Google copy-protection software also imposes a big restriction: The CBS shows, NBA games and other material protected by the software can be watched only on a computer that’s connected to the Internet.

“I think it’s going to be a problem,” said Li, the Forrester analyst, adding that Google executives told her they were trying to fix it.

That’s interesting. Given that quote, I’ll bet Google’s DRM is something similar to the copy-protection systems used for many games since about id’s Quake 3 and Valve’s Half-Life: an online “key server” which validates codes, tracks player IDs, and who’s viewing what, “live”, as the video is cued up and played.

Some more info on the Half-Life WON authentication system can be found in this GamaSutra article; subscription required — try viewing this google-cache version with Javascript off if you don’t have a sub. That’s historical now, of course, since that WON system has been replaced by a new auth protocol as part of Valve’s ‘Steam’ system.

The key factor is the network, separating the dangerous, untrustworthy user machine from the trusted key server. Since the online key server can act as a platform for trusted, known-insubvertable code to run, along with the video server, both being under Google’s control, it’s actually possible to build reasonably solid DRM on this model. That’s as opposed to the usual case, where a reasonably determined teenager can break it in a week of school-nights. ;)
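Here’s a minimal Python sketch of the server side of that model, as I understand it. It isn’t WON’s or Google’s actual protocol; the class, method names and the “one active client per key” rule are assumptions made for illustration.

```python
from typing import Dict, Set


class KeyServer:
    """Toy online key server: validates keys and tracks who is using them."""

    def __init__(self, valid_keys: Set[str]) -> None:
        self.valid_keys = set(valid_keys)
        self.active: Dict[str, str] = {}  # key -> client currently using it

    def start_session(self, key: str, client_id: str) -> bool:
        """Allow playback only for a known key not in use by another client."""
        if key not in self.valid_keys:
            return False
        holder = self.active.get(key)
        if holder is not None and holder != client_id:
            return False
        self.active[key] = client_id
        return True

    def end_session(self, key: str, client_id: str) -> None:
        """Release the key, but only if this client actually holds it."""
        if self.active.get(key) == client_id:
            del self.active[key]
```

The point is that this check runs on a machine the user can’t tamper with; all the client ever gets back is a yes or a no.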

Anyway, that’s speculation. It remains to be seen if they’ve come up with something along the lines of WON authentication — and if it’s still easily subvertable or not.

Update: Aristotle Pagaltzis has a pretty good point in the comments:

Watching video, unlike playing a multiplayer game, is not an activity that inherently requires connecting to a server. Playing a multiplayer game, OTOH, inherently is.

So cracking a multiplayer game’s key check is fruitless, because then you can’t play online anymore, which was the whole point of the game in the first place. In contrast, a video player with a cracked key check still fulfills its purpose just fine.

I think he’s right. That’s a key point, demonstrating how WON authentication still can’t help — media playback, as a task, is itself fundamentally crackable.

Wedding Plans

Myself and the lovely C are planning on getting married, hopefully sometime this year. I’ve just come across some details about Japanese weddings, and apparently:

‘If you are attending a Japanese wedding reception, you are expected to bring cash for a gift (called Oshugi). The amount depends on your relationship with the couple and the region, unless the fixed amount is indicated on the invitation card. The average is 30,000yen ($250) for a friend’s wedding. It’s important that the cash is enclosed in a special envelope called Shugi-bukuro and your name is written on the front.’ … ‘It is a grave insult to give less than $200.’

That gives me a great idea… ;)

Planet Antispam

So a few weeks back, I mooted the idea of an anti-spam Planet site, similar to Planet GNOME, Planet Java, Planet Perl et al.

Here are the results: Planet Antispam.

It’s still got a few rough edges; notably, the URL is not permanent — I’d prefer something at a more spam-themed domain — and the logo is the generic “PlanetPlanet” one. But it’s up and running in a beta-ish fashion.

Feel free to bookmark, subscribe, post the URL on, etc.; and if you’d like to give it a better home with an A record at a spam-themed domain, drop me a line.

Update, Jan 17: Thanks to John Levine, it now has a permanent home at http://planet.spam.abuse.net/ . After several weeks of operation, I think it’s turning out to be pretty solid, too!

By the way, it also needs more source feeds. If you know of people with blogs, working on/writing about anti-spam (of the email variety), with RSS feeds that work, include the post text, and permit further redistribution of that text, drop us a line and I’ll add them.

Finally, here’s a picture of a Starbucks SPAM(r) Sandwich. (shudder)

Allowing users to have steak knives

This post on the Wikipedia/Seigenthaler spat at Corante.com contains this excellent comment from Wikipedia’s Jimmy Wales:

Imagine that we are designing a restaurant. This restaurant will serve steak. Because we are going to be serving steak, we will have steak knives for the customers. Because the customers will have steak knives, they might stab each other. Therefore, we conclude, we need to put each table into separate metal cages, to prevent the possibility of people stabbing each other.

What would such an approach do to our civil society? What does it do to human kindness, benevolence, and a positive sense of community?

When we reject this design for restaurants, and then when, inevitably, someone does get stabbed in a restaurant (it does happen), do we write long editorials to the papers complaining that “The steakhouse is inviting it by not only allowing irresponsible vandals to stab anyone they please, but by also providing the weapons”?

No, instead we acknowledge that the verb “to allow” does not apply in such a situation. A restaurant is not allowing something just because they haven’t taken measures to forcibly prevent it a priori. It is surely against the rules of the restaurant, and of course against the laws of society. Just. Like. Libel. If someone starts doing bad things in a restaurant, they are forcibly kicked out and, if it’s particularly bad, the law can be called. Just. Like. Wikipedia. I do not accept the spin that Wikipedia “allows anyone to write anything” just because we do not metaphysically prevent it by putting authors in cages.

Irish MEPs on Data Retention

So, the bad news: it appears that the European Parliament has passed the ‘Data Retention’ Directive, requiring EU states to introduce mandatory electronic surveillance of all European citizens.

Tuppenceworth.ie has looked up how the Irish MEPs voted on the Directive. I was appalled to discover that Proinsias De Rossa (Labour) was the only Irish MEP to vote for this surveillance.

I generally give a high preference to Labour when voting, and before that, Democratic Left, and I’ve voted for him several times in the past. However, I think this may be the deal-breaker. I’m extremely disappointed.

By the way, if party line was the issue, that didn’t stop Gay Mitchell (Fine Gael), who broke party line on this, saying:

I do not know why this proposal was rushed. The extremely accelerated legislation procedure has meant that there was little time for discussion, and translations were sometimes unavailable. There was also no time for a technology assessment or for a study on the impact on the internal market.

Major credit to him.