
Justin's Linklog Posts

links for 2006-05-04

London’s Oyster RFID card to become a full cashless payment system

Apparently, Transport For London are planning ‘e-money’ trials (http://software.silicon.com/applications/0,39024653,39150647,00.htm) based on their remotely-readable Oyster RFID cards (http://www.rfidbuzz.com/wiki/Standards/MIFARE).

Combine that with Kevin Mahaffey of Flexilis’ talk at Black Hat last year, where he demonstrated apparatus to extend RFID read range from 4-6 inches to approximately 50 feet, and things could get messy. ;)

The slides for that talk are available here (PDF); slide 20 specifically mentions the Hong Kong "Octopus" cashless-payment card.

links for 2006-05-03

Blue Frog List Leaked?

Blue Frog is a company that operates a "Do Not Email" list, on the (optimistic) basis that spammers will vet their lists against it.

Reportedly, it’s been compromised. If this is true, I’m not surprised; as Dr. Aviel Rubin’s report to the FTC of May 2004 regarding a Do-Not-Email list notes:

The scrubbing approach [to running a D-N-E list] requires that a list of live email addresses exist. While the party owning that list may be well intentioned, it is unlikely that such a valuable list would not leak out. History is replete with insider attacks, as well as external break-ins to highly sensitive sites, such as the Pentagon computers. The Do Not Email Registry represents the kind of prize that attracts hackers. In this case, the prize has monetary value as well. Once the list is exposed, there is no way to undo it.

Also, it’s almost inevitable:

If this service were running for some time, it is more likely than not that the plaintext addresses would leak at some point, given the history of computer security incidents.

Update: it appears, according to this white paper, that the Blue Frog "Do Not Intrude" list is hashed, rather than plain-text. Rubin’s advice still applies:

Without hashing, a compromise of the registry database results in exposure of all of the registered email addresses. This is a total disaster. However, even exposure of a hashed list is a catastrophe. A spammer with a copy of a hashed list of email addresses is able to find out, for any email address, if the address is in the registry. The attacker simply hashes a candidate email address and sees if the hashed value is in the list. This is very powerful. [….]

Hashing provides absolutely no security against a marketer who obtains a scrubbed list and uses that to sell the addresses that were scrubbed by the registry. Whether or not the list is hashed has no impact on a malicious marketer in the scrubbing approach.
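
To make Rubin's point concrete, here is a minimal Python sketch of the membership test he describes. The hash function (SHA-256), the lower-casing step, and the example addresses are all assumptions for illustration; I don't know what scheme Blue Frog actually uses.

```python
import hashlib


def hash_address(address: str) -> str:
    """Hash a normalised email address (SHA-256 is an assumed choice)."""
    return hashlib.sha256(address.strip().lower().encode("utf-8")).hexdigest()


def build_registry(addresses):
    """Stand-in for a leaked registry: a set of hashes, no plaintext."""
    return {hash_address(a) for a in addresses}


def confirm_live_addresses(candidates, leaked_hashes):
    """Return the candidate addresses whose hashes appear in the leaked list."""
    return [c for c in candidates if hash_address(c) in leaked_hashes]


# Hypothetical data: the registry holds two addresses, the spammer guesses three.
leaked = build_registry(["alice@example.com", "bob@example.org"])
guesses = ["alice@example.com", "carol@example.net", "BOB@example.org"]
confirmed = confirm_live_addresses(guesses, leaked)
```

The spammer never needs to reverse a hash; hashing each address already on their list is enough to learn which ones are registered, and therefore live.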

SpamAssassin in the Google Summer of Code 2006

Are you a student, and interested in earning $4,500 for contributing to open source, and fighting spam, over the course of the summer?

If so, get thee hence to the Google Summer of Code 2006 site, and propose a project!

Last year, we in SpamAssassin didn’t get it together to mentor SoC projects. This year, however, we have a few prospective mentors (including myself), and a few sample project ideas lined up; we’re all ready to go! Here’s the Student FAQ. Be quick; applications end in a week and a bit.

Here’s hoping we get some interesting submissions ;)

links for 2006-04-29

Single-Letter Google Hits

Here’s what happens when you search for single letters on Google:

Interestingly I got to see the new Google search results page, with the sidebar, once. It must be in the process of rolling out…

links for 2006-04-27

links for 2006-04-26

Peoplefeeds and Quick Aggregation

peoplefeeds is cool.

I’ve been looking for something that can aggregate my Flickr, WordPress blog, and del.icio.us feeds into one venue where I can look up items by tag, in a single page-load.

Suprglu was my leading contender, although they weren’t there yet since they didn’t seem to support importing my blog posts with tags preserved — pretty much everything wound up tagged as "uncategorized". disappointing. :( so I was waiting for them to fix that.

This post by Richard MacManus pointed at another couple of options: 43Things and Peoplefeeds. I hadn’t actually noticed that 43Things was doing this kind of aggregation too; unfortunately, as far as I can see, they don’t support tag preservation and browsing, so there goes my desired feature. Shame.

However, Peoplefeeds was right on target, offering a ‘Unified Tagspace’ and a ‘Search All-Personal-Content’ mechanism. It works nicely, too. Here’s my personal aggregator, combining my Flickr feed, my weblog feed, and my del.icio.us feed into one — and with a unified tag-space; here’s my ‘hiking’ tag, hitting all 3 feeds. Perfect.

One other use for this — I’ve forgotten why I was looking for one of these, but I know I did want one ;) — it can be used to make a "private planet". If you have 3 or 4 feeds that you need to combine into one, this provides a very easy way to do that; just set up a userid at Peoplefeeds for that purpose.

Phishing and Inept Banks

John Graham-Cumming asks, ‘Are Citibank crazy?’:

I blogged a while ago about Thunderbird’s phishing filter trapping a seemingly innocent mail. Now, a reader has forwarded to me a genuine email from Citibank that he says was trapped by Thunderbird. I’m not going to reproduce the email here because it contains private details of the user, but it is a valid Citibank message.

Thunderbird thinks it’s a scam because Citibank uses one of the oldest phishing tricks in the book. They have a URL displayed in the message that, when clicked, goes to a totally different URL.

Sadly, this has proven to be really quite common. We’ve investigated using this as a phish-detection rule in SpamAssassin several times, without much luck. In fact, we’ve had to create a FAQ entry for it: since it’s such a superficially attractive, but ultimately useless, idea, many people have had long discussions on our lists about it!

The companies that produce these false positives in their mails include American Express, Bed Bath & Beyond, Universal Studios, Microsoft, Hilton Hotels — and now Citibank.

A couple of other examples from real mails:

  <a href="http://www65.americanexpress.com/clicktrk/Tracking?
    mid=MESSAGEID&msrc=ENG-ALERTS&url=
    https://www.americanexpress.com/estatement/?12345">
    https://www.americanexpress.com/estatement/?12345</a>

  <A HREF="http://echo.epsilon.com/WebServices/EchoEngine/T.aspx?l=ID">
    https://www.hilton.com/en/ww/email/tab_email_subscriptions.jhtml</A>
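
For illustration, here is a rough Python sketch of the "displayed URL differs from the real link" check. This is not SpamAssassin's code or Thunderbird's; the class name and the exact-hostname comparison are my own assumptions. It flags the genuine Hilton mail above, which is exactly the false-positive problem.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkMismatchFinder(HTMLParser):
    """Collect anchors whose visible text is a URL on a different host than the href."""

    def __init__(self):
        super().__init__()
        self._href = None      # href of the anchor currently open, if any
        self._text = []        # visible text collected inside that anchor
        self.mismatches = []   # (displayed_host, real_host) pairs

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            shown = "".join(self._text).strip()
            # Only compare when the visible text itself looks like a URL.
            if shown.lower().startswith(("http://", "https://")):
                shown_host = urlparse(shown).hostname
                real_host = urlparse(self._href).hostname
                if shown_host != real_host:
                    self.mismatches.append((shown_host, real_host))
            self._href = None


# One consistent link (hypothetical) and the real-world Hilton example.
MAIL_HTML = """
<a href="https://www.example.com/account">https://www.example.com/account</a>
<A HREF="http://echo.epsilon.com/WebServices/EchoEngine/T.aspx?l=ID">
  https://www.hilton.com/en/ww/email/tab_email_subscriptions.jhtml</A>
"""
finder = LinkMismatchFinder()
finder.feed(MAIL_HTML)
```

Any legitimate sender that routes clicks through a tracking host trips this check, so on its own it cannot separate phish from ordinary marketing mail.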

By the way, it really is quite impressive for a bank as heavily phished as Citibank to still be making this kind of basic mistake in their mail-outs! It reinforces a point I made in a mailing list posting recently:

As far as I can see, the approach taken by pretty much all banks to their online services is simply too bureaucratic, hide-bound, and fundamentally driven by their marketing departments, to ever cope effectively with phishing. :(

(For what it’s worth, I know Citi have some smart techies working there; but the rest of the company needs to start paying attention to them.)

Optimo vs. Bud Rising

Optimo have a new mix up — the First Hour Mix:

Here’s the fourth in a brief series of mixes where we present something a little different. This mix isn’t really a mix in the conventional sense but rather 17 tracks blended together. To us, the first hour of Optimo, or to be more accurate, the ‘Espacio’ part of Optimo (Espacio) is a vital part of the night. It is our chance to play absolutely what we like without thinking about the dancefloor.

It’s a great mix — certainly not dancy, but some really interesting tracks here. The Optimo guys put together some really great music.

In fact, I went to see them play last Saturday (or, at least, myself and a couple of mates tried to). Supposedly, they were supporting The Juan Maclean (http://www.budrising.ie/juanmclean.html) at the Bud Rising festival over the weekend, but the show was such a shambles, without anyone having a clue when it started or who was on stage at any time, that I’m pretty sure we missed their set entirely.

On top of that, it was EUR20 in, and to add insult to injury, the only lager on sale was Budweiser! I mean, I wouldn’t mind that if the "Bud Rising Festival" deal meant free entrance, but charging 20 squids and then cutting off the supply of decent booze as well, is just a crime.

Ah well, the Filthy Dukes were pretty good at least.

Google Calendar

So I’ve been using this for a few days now — and I’m loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I’ve used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that’s pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn’t go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome…

BT DSL’s Daily Disconnects

Argh! This is what happens every day to my DSL connection, at half past 12:

13 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 2: link down
14 Mon Apr 10 12:26:53 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
15 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 3: link up
26 Tue Apr 11 12:26:46 2006 PP12 -WARN  SNMP TRAP 2: link down
28 Tue Apr 11 12:26:48 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
29 Tue Apr 11 12:26:48 2006 PP12 -WARN  SNMP TRAP 3: link up
38 Wed Apr 12 12:26:56 2006 PP12 -WARN  SNMP TRAP 2: link down
40 Wed Apr 12 12:26:58 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
41 Wed Apr 12 12:26:58 2006 PP12 -WARN  SNMP TRAP 3: link up
50 Thu Apr 13 12:27:00 2006 PP12 -WARN  SNMP TRAP 2: link down
52 Thu Apr 13 12:27:03 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
53 Thu Apr 13 12:27:03 2006 PP12 -WARN  SNMP TRAP 3: link up

Worse than that, it will generally assign a different IP address to the connection when it reconnects! This buggers up any applications that rely on long-lived TCP connections, such as SSH shell logins, tunnels, remote-desktop sessions, and instant messaging; all get disconnected and have to be manually re-set up.

Initially, I thought this may have been a flaky connection. However, it appears not — check out those timestamps; that’s a scheduled, daily event. Also, there have been no other disconnections apart from those.
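
A quick way to check the "scheduled, daily" claim is to pull the time of day out of each "link down" line. This Python sketch assumes the log layout shown above (a sequence number followed by a ctime-style timestamp); the function names are mine.

```python
from datetime import datetime

# The four "link down" lines from the router log above.
LOG = """\
13 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 2: link down
26 Tue Apr 11 12:26:46 2006 PP12 -WARN  SNMP TRAP 2: link down
38 Wed Apr 12 12:26:56 2006 PP12 -WARN  SNMP TRAP 2: link down
50 Thu Apr 13 12:27:00 2006 PP12 -WARN  SNMP TRAP 2: link down
"""


def link_down_times(log_text):
    """Parse the timestamp of every 'link down' line."""
    times = []
    for line in log_text.splitlines():
        if not line.endswith("link down"):
            continue
        fields = line.split()
        # fields[0] is the log sequence number; the next five form the timestamp.
        stamp = " ".join(fields[1:6])
        times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return times


def seconds_into_day(moment):
    return moment.hour * 3600 + moment.minute * 60 + moment.second


drops = link_down_times(LOG)
offsets = [seconds_into_day(t) for t in drops]
# Difference between the earliest and latest time of day, in seconds.
spread = max(offsets) - min(offsets)
# Hours between consecutive disconnects.
gaps_hours = [(b - a).total_seconds() / 3600 for a, b in zip(drops, drops[1:])]
```

For these four days the drops land within 14 seconds of each other on the clock, and each gap is within a few seconds of 24 hours, which is hard to explain as line noise.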

A discussion on the IIU mailing list revealed the reason — it seems BT Ireland have a policy of resetting their customers’ connections daily. That could be OK, if they came right back up with the same IP — TCP/IP is designed to cope with that, and generally does — but it does not do that. Instead the IP address is reassigned every single time.

This is turning out to be quite a nuisance. Working over the internet requires quite a few VPN connections, tunnels, and remote logins, and having to re-set those up, daily, is turning out to be a pain in the neck.

I’m casting around for hacks to get around this. Right now, I have an assortment of jiggery-pokery involving ssh, a shell script ‘while’ loop, and screen(1), but it’s messy and not working out too well. Ideally, I’d set up another VPN (via IPSec or CIPE), and set it up to reconnect on link failure, then route all other VPNs and remote logins out via that — but I don’t have spare routable IPs to do this with. Anyone got any good suggestions?
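
For reference, the "while loop" part of that jiggery-pokery amounts to something like the following Python sketch. It is not my actual script: the function name, the delay, and the host in the usage comment are hypothetical, and it does nothing about the changed IP address; it just restarts the connection whenever it dies.

```python
import subprocess
import time


def keep_tunnel_up(command, max_restarts=None, delay=5.0,
                   run=subprocess.call, sleep=time.sleep):
    """Re-run `command` every time it exits, pausing `delay` seconds in between.

    `run` and `sleep` are injectable so the loop can be exercised without
    opening a real connection. With max_restarts=None it loops forever.
    """
    restarts = 0
    while max_restarts is None or restarts < max_restarts:
        run(command)    # blocks until the connection drops
        restarts += 1
        sleep(delay)    # avoid hammering the server while the link is down
    return restarts


# Hypothetical usage, e.g. inside a screen(1) session:
#   keep_tunnel_up(["ssh", "-N", "-o", "ServerAliveInterval=30",
#                   "-L", "8080:localhost:80", "user@shell.example.com"])
```

The ServerAliveInterval option matters here: without keepalives, ssh can sit on a dead TCP connection for a long time after the IP address changes, and the loop never gets a chance to restart it.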

By the way, it’s worth noting that their FAQ fails to mention this, instead giving some incorrect information about my IP being ‘removed’ when my web browsing session ends:

Is it a fixed IP?

No, the product is set up with dynamic IP Addressing. This means that every time you open your browser you will be allocated a different IP address for the duration of that session. When the session ends the IP Address is removed.

That is incorrect — this has nothing to do with web browsing sessions.

To be honest, I’d prefer not to have to switch ISPs to get away from this brokenness — the rest of the service is quite nice, good pings, good throughput, no other disconnections or outages — but this is quite a problem for someone using BT Broadband for telecommuting purposes. :(

My QuitMeter

I gave up smoking last year on May 26 — that anniversary isn’t too far away. Here’s how much money I’ve saved, courtesy of QuitMeter.com:



Wow — I could buy myself another iPod! ;)

Software Patenting and “Hot” Fields

Paul Graham’s recent essay on his experience with software patenting has been making the rounds recently.

Now Kevin Marks has commented. Worth reading, since he demonstrates nicely the kind of crap you see in a ‘hot’ field, such as video (which he worked on with Apple’s QuickTime):

I broadly agree with Paul Graham’s essay on Software Patents, but I do think he underestimates the damage from patent trolls, and from what he calls the mafia-like behaviour of some patent holders. Paul has been lucky in the field he has worked in, but in the Audio and Video area there are many patent thickets. … While I was at Apple on QuickTime, there was a steady stream of patent trolls claiming that Apple should pay them royalties; enough to keep several lawyers busy, and a lot of engineers spending time working on prior art evidence demonstrations. Several potential features were excluded from QuickTime due to patent thickets. The obvious one was the Unisys LZW patent that encumbered GIF, but there were other more subtle pressures that meant adopting open source codecs was discouraged. Working on the patent license agreements for MPEG meant that technology ready to ship was deferred pending legal agreement on more than one occasion.

In my experience, that’s what happens — once a field becomes "hot", patent trolls and other nuisance "inventors" start appearing en masse, and then you’ve got to waste a lot of time dealing with that crap.

RSS Feeds for Events in Dublin

So, now that I’m back in Dublin, I’ve taken a quick look around for ways to keep up to date on upcoming live gigs — and found that the situation, frankly, sucks. In particular, almost none of the sites are offering RSS or Atom feeds yet.

Having said that, Waxy and Leonard’s Upcoming.org is doing quite nicely for the Dublin metro area:

And lots of credit to the promoter, MCD, who seem to be just about the only Irish listings site offering RSS:

This is fantastic, but — naturally — they don’t cover events put on by their competitors. ;)

Apart from that, it’s pretty shoddy. Lots of late-90’s-looking websites out there, and no feeds in sight. Thankfully, Feed43 (http://feed43.com/), and some perl scripting, are on hand to allow me to take matters into my own hands.

Entertainment Ireland offer a pretty good music news section — but sans feed. Feed43 saves the day:

And, surprisingly, Ticketmaster, of all sites, is turning out to be a great way to find out what’s on in Dublin, listing pretty much all ticketed events in a nice, clean, succinct format. Unfortunately, the highest location resolution it offers for Ireland is the country as a whole. However, this can be worked around by subscribing to individual venues, such as Crawdaddy or The Village. (This has a happy side-effect of narrowing down the types of music — I can skip finding out that The Eagles are playing, since they won’t be playing at Crawdaddy ;)

For some reason, though, Ticketmaster haven’t got around to offering their own RSS feeds. Not a problem — in response I’ve hacked up tm2rss.cgi, a little script which scrapes the venue pages and produces RSS:

For other venues, simply take the venue URL (for example, http://www.ticketmaster.ie/venue/198641 for The Village), add the numeric venue ID in place of NNNNN in this URL: http://taint.org/scraped/tm2rss.cgi?v=NNNNN , then use that as the Feed URL in your feed reader.
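
If you'd rather not do the substitution by hand, the recipe above can be sketched in a few lines of Python. The helper name is made up; the two URL shapes are the ones given above, and the assumption is that every venue URL contains /venue/ followed by the numeric ID.

```python
import re

# Feed URL shape from the post, with NNNNN replaced by a format field.
FEED_TEMPLATE = "http://taint.org/scraped/tm2rss.cgi?v={venue_id}"


def feed_url_for_venue(venue_url: str) -> str:
    """Turn a Ticketmaster venue URL into the matching tm2rss feed URL."""
    match = re.search(r"/venue/(\d+)", venue_url)
    if match is None:
        raise ValueError(f"no numeric venue ID found in {venue_url!r}")
    return FEED_TEMPLATE.format(venue_id=match.group(1))


village_feed = feed_url_for_venue("http://www.ticketmaster.ie/venue/198641")
```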

A Gotcha With perl’s “each()”

It’s my bi-monthly perl blog entry, to earn my place on planet.perl.org! ;)

Here’s an interesting "gotcha". Take this code:

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    while(($k,$v)=each %t){print "1: $k\n"; last;}
    while(($k,$v)=each %t){print "2: $k\n";}'

In other words, iterate through all the key-value pairs in %t once, then do it again — but exit early in the first loop.

You would expect to get something like this output:

    1: 1
    2: 1
    2: 3
    2: 2

instead, you see:

    1: 1
    2: 3
    2: 2

The "1" entry in the second loop is AWOL. Here’s why — as "perldoc -f each" notes:

There is a single iterator for each hash, shared by all "each", "keys", and "values" function calls in the program

That’s all "each" calls, throughout the entire codebase, possibly in a different class entirely. Argh.

The workaround: reset the iterator using "keys" between calls to "each":

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    while(($k,$v)=each %t){print "1: $k\n"; last;}
    keys %t;
    while(($k,$v)=each %t){print "2: $k\n";}'

This got us in SpamAssassin: bug 4829 (http://issues.apache.org/SpamAssassin/show_bug.cgi?id=4829).

To be honest, having to call "keys" after the loop is kludgy — as you can see if you check the patch in bug 4829 there, we had to change from a "return inside loop" pattern to a "set variable and exit loop, reset state, then return" pattern. It’d be nice to have a scoped version of each(), instead of this global scope, so that this would work:

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    { while(($k,$v)=scoped_each %t){print "1: $k\n"; last;} }
    # that each() iterator is now out of scope, so GC'd;
    # the next call uses a new iterator, starting from scratch
    { while(($k,$v)=scoped_each %t){print "2: $k\n";} }'

Scoping, of course, has the benefit of allowing "return early" patterns to work; in my opinion, those are clearer, at the least because they require fewer lines of code ;)

Feed43 Rocks

I’ve just given Feed43 a go. It’s very nifty.

Basically, it’s a pattern-based HTML-to-RSS scraper — similar to my own Sitescooper in that respect ;) — but built entirely as a web app.
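
To give a flavour of what "pattern-based" means, here is a toy Python sketch: one regular expression pulls link/title pairs out of a page, and each match becomes an RSS item. This is not Feed43's pattern syntax or Sitescooper's; the pattern, the sample HTML, and the function name are all invented for illustration.

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical item pattern: one list entry per headline.
ITEM_PATTERN = re.compile(
    r'<li><a href="(?P<link>[^"]+)">(?P<title>[^<]+)</a></li>'
)


def scrape_to_rss(html, feed_title, feed_link):
    """Build a minimal RSS 2.0 document from every pattern match in `html`."""
    rss = ET.Element("rss", version="2.0")
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = feed_title
    ET.SubElement(channel, "link").text = feed_link
    ET.SubElement(channel, "description").text = "Scraped from " + feed_link
    for match in ITEM_PATTERN.finditer(html):
        item = ET.SubElement(channel, "item")
        ET.SubElement(item, "title").text = match.group("title").strip()
        ET.SubElement(item, "link").text = match.group("link")
    return ET.tostring(rss, encoding="unicode")


SAMPLE_HTML = """
<ul>
<li><a href="http://example.com/news/1">First headline</a></li>
<li><a href="http://example.com/news/2">Second headline</a></li>
</ul>
"""
feed_xml = scrape_to_rss(SAMPLE_HTML, "Example news", "http://example.com/news/")
```

The fragile part is always the pattern: a site redesign breaks the scraper, which is why a UI that previews the matches live is so handy.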

Until now, I’ve been hacking up scrapers one by one, using either Sitescooper or WWW::Mechanize, run from cron, and putting the output up on taint.org; for example, http://taint.org/scraped/ has the public ones: Threadless, Perry Bible Fellowship, and White Ninja comics.

Today, I came across a case where I wanted a new RSS feed, and since I’d been hearing of Feed43, thought I’d give it a try, to save running yet another cron on our server. It was reasonably simple, although still required a fair bit of knowledge of the concepts of scraping via pattern matching against HTML; but the UI was fantastic, with everything previewed using a clean AJAX UI, and within 3 minutes I had a new feed.

For the curious: the feed was for TCAL’s Ireland category, and the results are here: Feed43 (Feed For Free) : TCAL – Ireland. (go ahead and sign up if you like ;)

New web pattern, by the way — there’s a trend towards using "secret URLs" instead of username/password authentication for the kind of "trivial" auth task, like editing feed-scraper details. Good idea.

Public Transit == Crime

I just received a very nice info-pack through my front door regarding the new Dublin Metro line, which is in planning at the moment; it seems they’re soliciting feedback from residents near the proposed routes. Nicely done.

Right now, Dublin has an embarrassment of good public transit, at least when compared to my previous home in Orange County. There, public transit is actively campaigned against.

My favourite claim: that it ‘increases crime’ — in other words that poor people from Santa Ana would come down to Irvine and steal stuff, which they couldn’t do with vehicular transport, for some reason.

The OC Weekly thought it was pretty funny, too — and an opposing group comprehensively debunked it. Still, it seemed to work; while I was living in Irvine, I got to see the Centerline proposal gradually whittled down until it was finally killed off. During that time, in contrast, Dublin built the Luas.

Unfortunately it doesn’t exactly go where I want to go, but you can’t always have everything. ;)

DSL=GOT

finally!

Coffee and Trivia

Just got a new cafetiere, so I can finally switch back from instant coffee to the real deal again for my morning coffee. My productivity has doubled. Still no DSL, though — early next week is the current estimate, and I can hardly wait.

I went to a pub quiz last night with mates Macker, Tom and Alan — a benefit for a new Dublin theatre company, I think. The prizes were:

  • First prize: several 50 Euro vouchers for various Dublin eateries
  • Second prize: two fancy scarves, a Nivea women’s cosmetics kit, and a very metrosexual Nivea bath kit for a guy
  • Third prize: 4 bottles of nice wine

We did very nicely — "aglet" was correctly defined for instance — but not nicely enough. Put it this way: guess who’s wearing Nivea deodorant?

Buying Consumer Electronics Online, in Ireland?

Hey lazyweb, hear my plea! What are my options for buying consumer electronics online, now that I’m back in Ireland?

I like online shopping. I dislike Argos, and I really hate Dixons, Currys and all the rest of the consumer-electronics high-street operations. Get me on the net and out of the nasty little shops and I’m happy. ;)

All in all, I’m a bit of an Amazon fan. However, now that I’m back in Ireland, I’ve been brought back to earth with a bang on that count; the prices are OK for items at both Amazon.com and .co.uk — but shipping is turning out to be a total disaster.

Basically, I’ve put in two orders, paid through the nose for basic shipping, and neither has turned up. For example — I ordered this phone a week and a half ago, on the 9th March, ponying up UKP 27 for the item — and a painful UKP 7 for shipping by International Mail.

Delivery estimate on ordering was for between 5 and 7 days — 14th to the 16th March. That was long enough — but it still hasn’t turned up, and Amazon.co.uk is still claiming that that is the current estimate, despite the 16th of March being 4 days ago ;)

On top of that, it appears they don’t offer any way to track the packages using that shipping method, so who knows what’s happening with the damn thing right now.

Compare that with an order I made at Amazon.com last November, in which I nabbed a handy FM transmitter for my iPod. In that case, I got it shipped by plain old US Postal Service for $4.51, which was handily discounted as Super Saver Shipping. That, as with pretty much all my Amazon.com orders, arrived in 3-4 days, and for a hell of a lot cheaper too. If I’d had to pay for shipping (which I didn’t anyway), $4.51 vs UKP 7 works out as a third of the price, no less.

I’m guessing this is mainly down to Amazon.co.uk being shoddy in terms of how it deals with shipping to Ireland, and there are probably sites that use better-quality shipping partners.

Surely there must be better deals with vendors in Ireland, or even elsewhere in the Eurozone? Anyone know? Please drop us a line in the comments!

Update: the items arrived — 14 days after ordering. This is a moot point now, though, since Amazon.co.uk are no longer selling ‘PC & Video Games, Toys & Games, Gift items, Electronics & Photo and Home & Garden items’ to Ireland; I guess it was easier to give up on the Irish market for now. Very disappointing — but I’m waiting to see what happens next.

VAST.com

So, my new employer just launched today!

It’s a new search service, VAST.com. As the blog says, ‘we are building a search service that extracts classified ads from across the web, structures them, and then makes them available via an open REST API for commercial and non-commercial uses.’

Now you can see why I’m excited ;)

Greetings from 1996!

    --> Sending: ATZ
    ATZ
    OK
    --> Sending: ATQ0 V1 E1 S0=0 &C1 &D2
    ATQ0 V1 E1 S0=0 &C1 &D2
    OK
    --> Sending: ATH1
    ATH1
    OK
    --> Modem initialized.
    --> Sending: ATDT1892150150
    --> Waiting for carrier.
    ATDT1892150150
    CONNECT 45333

45 measly kilobits per second! This is incredibly painful — and expensive at 5 cents a minute! I briefly considered getting around it by hiring a 3G data-card for the couple of weeks before my DSL is activated — but that too is insanely overpriced.

Hurry up, DSL…

Disclosure

As of yesterday, I have a new day-job.

I won’t be working on email spam as part of the job, which is an interesting turn of events. However, I’ll be sticking with the open-source Apache SpamAssassin project, and keeping up the rate of work on that [*].

I’m not sure how much I can blog about the new place just yet, but I will say it’s certainly looking like it’ll be very interesting work ;)

[*: modulo the next couple of weeks while I’m waiting for my bloody DSL to be installed. argh!]

Apple Attempting to Patent RSS Aggregation

Miguel de Icaza quotes Dave Winer, pointing out two patent applications from Apple which seem intended to grab major chunks of the feed syndication space as Apple "IP".

The first application is news feed viewer, 20050289147, filed April 13 2005:

A computer-implemented method for displaying a plurality of articles, the method comprising: storing a first feed bookmark in a folder, the first feed bookmark indicating a first feed, the first feed comprising a first plurality of articles; storing a second feed bookmark in the folder, the second feed bookmark indicating a second feed, the second feed comprising a second plurality of articles; aggregating the first feed and the second feed to form a third feed; and displaying the third feed.

I think there were many RSS readers that implemented this, and others from the patent application, before April 2005. I know Liferea, the one I use, has had UI-level aggregation since September 2004, with its VFolders.
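
To show how little is being claimed, here is the whole of that first claim as a Python sketch: two feeds in a folder, merged into a third, newest first. It is a generic illustration under my own assumptions (articles as plain dicts with a "published" date), not how Liferea or any Apple product implements it.

```python
from datetime import datetime


def aggregate_feeds(*feeds):
    """Merge any number of feeds (lists of articles) into one, newest first."""
    combined = [article for feed in feeds for article in feed]
    return sorted(combined, key=lambda article: article["published"], reverse=True)


# Hypothetical articles from two bookmarked feeds.
first_feed = [
    {"title": "A1", "published": datetime(2005, 4, 1)},
    {"title": "A2", "published": datetime(2005, 4, 3)},
]
second_feed = [
    {"title": "B1", "published": datetime(2005, 4, 2)},
]
third_feed = aggregate_feeds(first_feed, second_feed)
```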

Next, news feed browser, 20050289468, filed April 13 2005. This one contains a wide range of claims, but here’s one that stands out as particularly trivial:

A computer-implemented method for discovering a feed, the method comprising: receiving a request to display a file; determining that the file includes relationship XML; determining that a Uniform Resource Locator (URL) within the relationship XML indicates a file that comprises the feed; and displaying one of a group containing the feed and a link to the feed.

That’s pretty much RSS autodiscovery, as described in 2002.
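
For anyone who hasn't seen it, autodiscovery just means looking for a link element with rel="alternate" and a feed MIME type in the page. A minimal Python sketch follows; the class name and the sample page are hypothetical, and real-world readers handle more edge cases than this.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate" ...> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        mime = (attributes.get("type") or "").lower()
        href = attributes.get("href")
        if "alternate" in rel and mime in FEED_TYPES and href:
            # Resolve relative hrefs against the page's own URL.
            self.feeds.append(urljoin(self.base_url, href))


PAGE = """<html><head>
<title>Example weblog</title>
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rdf">
<link rel="stylesheet" type="text/css" href="/style.css">
</head><body></body></html>"""
finder = FeedLinkFinder("http://example.com/blog/")
finder.feed(PAGE)
```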

The listed inventors in both patents are: Kahn, Jessica; (San Francisco, CA) ; Alfke, Jens; (San Jose, CA) ; Wilkin, Sarah Anne; (Menlo Park, CA) ; Howard, Albert Riley JR.; (Sunnyvale, CA) ; Forstall, Scott James; (Mountain View, CA) ; Lemay, Stephen O.; (San Francisco, CA) ; Melton, Donald Dale; (San Carlos, CA) ; Loofbourrow, Wayne Russell; (San Jose, CA).

Thanks, Apple! and thanks, "inventors"!

It’s important to note that this is still in the application stage, and as such can be invalidated, or narrowed down to a saner level, by using the techniques described here. I strongly recommend that people working in the syndication field with sufficient knowledge and expertise who feel strongly enough about this should spend a little time doing so, before the patent is issued and it becomes a multi-million-dollar task to invalidate it. (however, IANApatentL of course ;)

We Win

ongoing: The ASF Server:

Tim Bray: Which Apache project burns the most resources?

Mads: Spamassassin by a wide margin. […]

Heh, we win ;)

Helios, the Zones server, has been an incredible resource for us. SpamAssassin isn’t a traditional open-source software project in one respect: we use a lot of centralized "phone home" infrastructure to support rule and score generation. Having a virtualized server of this quality and horsepower to use for this has been fantastic.

(thanks to John O’Shea for the pointer!)

IBM Patents Closed-Loop Confirmation

Another day, another absurd IBM software patent. Via the IP list, here’s United States Patent 7,003,497 (http://patft.uspto.gov/netacgi/nph-Parser?Sect1=PTO2&Sect2=HITOFF&u=/netahtml/search-adv.htm&r=302&f=G&l=50&d=PTXT&s1=ISYMD-20060221&p=7&OS=ISD/02/21/2006&RS=ISD/02/21/2006):

  1. A method for confirming an electronic transaction, comprising the steps of: performing an electronic transaction between a first party and a second party; providing, by the first party to the second party, contact information of a third party service provider associated with the first party; contacting, by the second party, the third party service provider to obtain a location of a predetermined, private mailbox associated with the first party; sending, by the second party, a request for confirmation of the electronic transaction to the predetermined, private mailbox associated with the first party; accessing the private mailbox by the first party; and sending, by the first party, a reply message to the request for confirmation to thereby confirm authorization of the electronic transaction, wherein information regarding the private mailbox is not communicated to the second party during the electronic transaction.

There’s lots of waffle in the background section about this being for electronic e-commerce transactions, but that claim, and claims 2 and 3 at least, are easily sufficiently broad to cover simple "confirmed opt-in" email subscription systems — in other words, the system whereby a potential newsletter subscriber clicks on a link in order to "confirm" that they want to subscribe to a newsletter. That’s the current best practice email subscription method used by pretty much everyone.
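
As a reminder of how simple confirmed opt-in is, here is a bare-bones Python sketch. The class and method names are invented, and a real list manager would also expire tokens and actually send the confirmation mail; this only models the state change.

```python
import secrets


class ConfirmedOptInList:
    """Toy mailing list: an address is only subscribed once its token comes back."""

    def __init__(self):
        self.pending = {}          # token -> email awaiting confirmation
        self.subscribers = set()

    def request_subscription(self, email):
        """Record a request and return the token to embed in the confirmation link."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = email
        return token

    def confirm(self, token):
        """Called when the link in the confirmation mail is clicked."""
        email = self.pending.pop(token, None)
        if email is None:
            return False           # unknown or already-used token
        self.subscribers.add(email)
        return True
```

Only someone who can read the mailbox sees the token, so a successful confirm() shows the request really came from the address owner.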

Filed December 31, 2001. There was plenty of prior art before this date, but who would want to go up against IBM, no less, to attempt to get this invalidated, especially now that it’s been issued?

Thanks USPTO, you’re doing a heck of a job!

US Things I Miss

So, I’ve been back in Ireland for several weeks now. How goes the culture shock? Well, let’s make a list of the stuff I’m missing from California:

  • C, who’s still back there finishing up her contract. Hurry up, C!

  • All my friends I left behind in the US :( Come visit!

  • The weather (well duh)

  • Trader Joes: low-cost, high-quality organic and near-organic food

  • The excellent Mexican and Southern food. Mmm, Taco Mesa

  • Super-cheap cocktails — although having good Guinness makes up for a lot of this

  • The back country — desert, mountains, snow, national parks. Ireland may have more surviving history dotted about, but it’s just flat. I miss the mountains

  • Netflix — haven’t spotted a replacement for this yet. There are companies in Ireland that use a similar idea, but it appears every one just about manages to screw it up and render it useless, generally by introducing throttling, late fees, or slow turnaround. meh

  • The way my Irish accent meant I could get away with pretty much anything. That trick doesn’t work in Ireland ;)

In other news: the broadband choices situation has pretty much gone to shit.

It turns out that all the good options are quite dependent on local-loop unbundling, which — somehow — still hasn’t gotten around to my local exchange. As a result, guess who’s going to be stuck on the wrong end of dialup, no less, for "2 to 3 weeks" until Eircom deign to switch on the bitstream access for my new BT-resold ADSL connection? Here’s hoping there’s a neighbour with broadband and wifi when I move back in. Joy.

DearAOL and GoodMail

Things have really been heating up recently around the AOL/Goodmail "pay to send" CertifiedMail scheme — the EFF and a host of other groups have launched dearaol.com, stating:

This system would create a two-tiered Internet in which affluent mass emailers could pay AOL a fee that amounts to an "email tax" for every email sent, in return for a guarantee that such messages would bypass spam filters and go directly to AOL members’ inboxes. Those who did not pay the "email tax" would increasingly be left behind with unreliable service. Your customers expect that your first obligation is to deliver all of their wanted mail, and this plan is a step away from that obligation.

I dislike this proposal too, but as far as I can tell, AOL actually have pretty reasonable intentions with this program; it’s nowhere near as bad as the DearAOL.com site makes out.

However, they’re doing a really really crappy job of getting this information out there, or committing to reasonable limits on the program, such as announcing that they will use it only for transactional emails, as Yahoo! have done.

I’d strongly recommend reading Carl Hutzler’s posting on the subject. Carl was AOL’s head of anti-spam operations until last year, so he really knows what he’s talking about, and he lays it out clearly — a lot more clearly than any corporate statements from AOL do. His blog contains a fair bit more on the subject, too.

But seriously: why isn’t there a press release on the AOL site about this scheme? Some front-channel communication about now might be useful, I’d suggest, before things really get hairy. This crapstorm is coming about partly because AOL’s comments are all filtering out in dribs and drabs via third parties, and (AOLers say) are being misconstrued and misrepresented in the process. It’s a classic case of missing the cluetrain.

I’d also really encourage the EFF people to tone down the rhetoric; statements like "senders will have no guarantee that their emails will be delivered" are scare-mongering, given that SMTP email already provides no such guarantee.

Update: wow, MoveOn went really overboard — "threatening the Internet as we know it … The very existence of online civic participation and the free Internet as we know it are under attack." OMG the sky is falling!

Side Issue: The Spam Definition

Also, another note to EFF: defining spam as "whatever you don’t want to read" is a terrible mistake to make. That confuses a good, clear, enforceable and automatable definition of spam — unsolicited bulk email — and makes it effectively unenforceable by law, unpoliceable by ISPs, impossible to detect automatically, and incompatible with existing, effective EU and Australian legislation.

Listen to your own Chairman of the Board; he’s right on this count.

PS: any luck fixing up the non-confirmed signups issue? Last time I checked I could still subscribe any address to the EFF Action Alerts without a cross-check, which is not a good thing.

Another script: goog-love.pl

A quick hack —

goog-love.pl – find out where your site’s google juice comes from

This script will grind through your web site’s "access.log" file (which must be in the "combined" log format). It’ll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love — in other words, the searches that your site ‘wins’ on.

The output is in plain text and a chunk of HTML.
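To give a feel for the log-grinding step described above, here is a rough sketch in Python. It is not the Perl script itself, just an illustration under a few assumptions: the regex is my own approximation of the "combined" log format, the function names (`google_query`, `top_google_searches`) are made up for this example, and a referer only counts as a Google search if its hostname has a "google" label and its path is "/search". Re-running the searches against the Google API is left out entirely.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Approximation of the "combined" log format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Only the referer field is captured.
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referer>[^"]*)" "[^"]*"'
)


def google_query(referer):
    """Return normalised search terms if referer looks like a Google
    search URL, otherwise None. (Hypothetical helper for this sketch.)"""
    parts = urlsplit(referer)
    host = (parts.hostname or "").lower()
    # Assumption: any hostname with a "google" label (www.google.com,
    # www.google.ie, ...) and a /search path is a Google search.
    if "google" not in host.split(".") or parts.path != "/search":
        return None
    terms = parse_qs(parts.query).get("q")
    if not terms:
        return None
    # Lower-case and collapse whitespace so near-identical queries merge.
    query = " ".join(terms[0].lower().split())
    return query or None


def top_google_searches(lines, limit=100):
    """Count Google search queries found in the referer field of each
    log line and return the `limit` most common as (query, hits) pairs."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if not match:
            continue  # skip lines that are not in combined format
        query = google_query(match.group("referer"))
        if query:
            counts[query] += 1
    return counts.most_common(limit)
```

Feeding it an open log file, e.g. `top_google_searches(open("access.log"))`, gives the list of candidate searches; the real script then goes on to re-run each one and check where the site ranks.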

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.

For example, here are the current results for this site. You can immediately see some interesting stuff that’s not immediately obvious otherwise, such as my site being the top hit for [beardy justin] ;)

Download here (5 KiB perl script).

Notes:

  • if you see a lot of "502 Bad Gateway" errors, it’s probably over-zealous anti-bot ACLs on Google’s side. Try from another host.

  • Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)