
Category: Uncategorized

links for 2006-06-27

links for 2006-06-21

links for 2006-06-20

links for 2006-06-19

Vodafone Ireland’s flat rate mobile data card

Adrian Weckler posts details of Vodafone Ireland's new flat price datacard; costing 50 Euros per month, including VAT; fully flat rate (hooray, something useful at last!); and they claim that they'll be rolling out HSDPA, which offers 1.2Mbps to 11Mbps rates, 'starting in Dublin in October'.

Those are great numbers, but further info seems thin on the ground; they haven't bothered updating their own website yet, amazingly.

Anyone got further info? What rates does it offer right now? How would one order such a beast?

Holidaze

Quick note -- I'm off on vacation next week -- so I probably won't read any email while I'm there ;) Talk to you after the 17th.

links for 2006-06-07

Running Dapper

I took the plunge over the weekend, and live-upgraded to the new 'Dapper Drake' Ubuntu release -- ouch. Here are the two key lessons I learned:

  • Don't run "grub-install" in a misremembered attempt to update the current GRUB boot menu 'menu.lst' file with the new kernel; sadly, this will quietly remove important details from your old menu.lst, such as "initrd" lines, rendering those kernels unbootable. Moral: ensure brain is in gear before meddling with MBRs!

  • If you're a Kubuntu user, watch out. Ensure you run apt-get install ubuntu-base ubuntu-desktop -- bringing the entirety of GNOME up to date -- as well as apt-get install kubuntu-desktop after the upgrade; it appears that some part of a new hotplugging subsystem is not included as a dependency of kubuntu-desktop. Failure to do this results in an inability to use USB/hotpluggable devices, including internal devices like the Synaptics touchpad. No pointer devices (mice or touchpads) means no X server at boot, which is always a little annoying.

Some day I'll just do things the right way, and do a fresh-from-CD install instead. Ah well. The good stuff: the new kernel, or possibly Xorg, is proving to be a lot speedier -- window updates are noticeably smoother; and the new Ubuntu GNOME theme is similarly tasty.

SpamAssassin advisory CVE-2006-2447

CVE-2006-2447, in which Radoslaw Zielinski spotted a nasty in spamd's 'vpopmail' support in pretty much all recent versions of Apache SpamAssassin.

If you use spamd with vpopmail, go read the advisory and determine if you need to take action. Not many people will need to, I think; it's a very rare setup. Still, it's important to get the warning out there anyway.

The irony is that the bug is triggered partly by the "--paranoid" switch. This was intended to increase security, by increasing paranoia when possibly-unsafe situations arose -- hence providing a great demonstration of how the addition of optional code paths, even with the best intentions, can reduce security by allowing bugs to creep in unnoticed.

links for 2006-06-06

Web x, where x != 2.0

Regarding the O'Reilly/CMP "Web 2.0 (SM)" trademark shitstorm, Sean McGrath humorously suggested a workaround -- using a different revision number instead of "2.0", specifically e, 2.71....

However, it's not quite that simple in many jurisdictions, apparently. It seems that trademark law -- in the US, at least -- allows trademarks which include a number to also cover uses within roughly plus or minus 10 of that number. In other words, CMP's application will cover the range from Web -8.0 (SM) (assuming negative numbers are included?) to Web 12.0 (SM).

So much for "Web 3.0", "Web 2.1", "Web 2.71...", and so on. Back to the drawing board, Sean! ;)

(disclaimer: IANAL, of course. Credit to Craig for that tidbit.)

Update: doh, got the value of e wrong...

links for 2006-06-01

links for 2006-05-31

Blog Spam, and a ‘nofollow’ Post-Mortem

An interesting article on blog-spam countermeasures -- Google's embarrassing mistake. Quote:

I think it's time we all agreed that the 'nofollow' tag has been a complete failure.

For those of you new to the concept, nofollow is a tag that blogs can add to hyperlinks in blog comments. The tag tells Google not to use that link in calculating the PageRank for the linked site. [...]

Since its enthusiastic adoption a year and a half ago, by Google, Six Apart, WordPress, and of course the eminent Dave Winer, I think we can all agree that nofollow has done -- nothing. Comment spam? Thicker than ever. It's had absolutely no effect on the volume of spam. That's probably because comment spammers don't give a crap, because the marginal cost of spamming is so low. Also, nofollow-tagged links are still links, which means that humans can still click on them -- and if humans can click, there's a chance somebody might visit the linked sites after all.

I agree. At the time, I pointed at this comment from Mark Pilgrim:

Spammers have it in their heads now that weblog comments are a vector to exploit. They don't look at individual results and tweak their software to stop bothering individuals. They write generic software that works with millions of sites and goes after them en masse. So you would end up with just as much spam, it would just be displayed with unlinked URLs.

Spammers don't read blogs; they just write to them.

I still think he was spot on.

However, one part of the 'Google's embarrassing mistake' article is a red herring -- I think the chilling effect on "nonspam links" is not to be worried about; as Jeremy Zawodny said, life's too short to worry about dropping links purely in the hopes of giving yourself Page Rank. I don't know if I really want links that people are leaving purely for that reason. ;)

In fact, I wouldn't be surprised to hear that Google's crawler starts treating "nofollow" links as mildly non-spammy in a future revision, due to their wide use in wikis, blogs etc.

To be honest, though -- I don't see the problem of blog-spam much anymore. As I said here:

[Weblog] comment spam should be a lot easier to deal with than SMTP spam. ... With weblog comments, you control the protocol entirely, whereas with SMTP you're stuck with an existing protocol and very little "wiggle room".

On my WordPress weblog [ie. here] -- which, admittedly, gets only about 1/4 of the traffic plasticbag.org does -- I've instituted a very simple check stolen from Jeremy Zawodny. I simply include a form field which asks the comment poster for my first name, and if they fail to supply that, the comment is dropped. In addition, I've removed the form fields to post directly, requiring that all comments are previewed; this has the nice bonus of increasing comment quality, too.

Those are the only antispam measures I'm using there, and as a result of those two I get about 1 successful spam posted per week, which is a one-click moderation task in my email. That's it.

The key is to not use the same measures as everyone else -- if every weblog has a different set of protocols, with different form fields asking different simple questions, the only spammers that can beat that are the ones that write custom code for your site -- or use human operators sitting down to an IE window.
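To make the form-field check concrete, here's a minimal Python sketch of the idea. It is not the WordPress code running on this site; the field names (`challenge_answer`, `previewed`) and the function are hypothetical, made up purely for illustration.

```python
def accept_comment(form, expected_answer):
    """Return True if a submitted comment passes the two simple checks.

    `form` is a dict of submitted form fields. The field names used here
    are illustrative assumptions, not taken from any real plugin.
    """
    # Check 1: the site-specific question (e.g. "what is my first name?").
    answer = form.get("challenge_answer", "").strip().lower()
    if answer != expected_answer.strip().lower():
        return False  # generic spam software won't know to fill this in

    # Check 2: only accept comments that went through the preview step.
    return form.get("previewed") == "1"
```

The point is that the question, the field name, and the expected answer differ from site to site, so generic spam software has nothing consistent to target.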

Trackbacks, however -- turn that off. The protocol was designed poorly, with insufficient thought given to its abuse potential; there's no point keeping it around, now that it's a spam vector.

Finally, a "perfect" solution to blog spam, while allowing comments, is unachievable. There will always be one guy who's going to sit down at a real web browser to hand-type a comment extolling the virtues of some product or another. The goal is to get it to a level where you get one of those per week, and it's a one-click operation to discard them.

(Update: This story got Slashdotted! The poor server's been up and down repeatedly -- looks like it needs an upgrade. In the meantime, WP-Cache has proven worth its weight in gold; recommended...)

Retroactive Tagging With TagThe.Net

Hacky hack hack.

Ever since I enabled tags on taint.org, I've been mildly annoyed by the fact that there were thousands of older entries deprived of their folksonomic chunky goodness. A way to 'retroactively tag' those entries somehow would be cool.

Last week, Leonard posted a link on his linkblog to TagThe.net, a web service which offers a nifty REST API; simply upload a chunk of text, and it'll suggest a few tags for that text, like this:

echo 'Hi there, I am a tag-suggesting robot' | curl "http://tagthe.net/api/?text=`urlencode`"
<?xml version="1.0" encoding="UTF-8"?>
<memes>
  <meme source="urn:memanage:BAD542FA4948D12800AA92A7FAD420A1" updated="Tue May 30 20:20:39 CEST 2006">
    <dim type="topic">
      <item>robot</item>
    </dim>
    <dim type="language">
      <item>english</item>
    </dim>
  </meme>
</memes>

This looked promising.

Anyway, I've now implemented this -- it worked great! If you're curious, here are the details of how I did it. It's a bit hacky, since I'm only going to be doing this once -- and very UNIXy and perlish, because that's how I do these things -- but maybe somebody will find it useful.

How I Retroactively Tagged taint.org

This weblog runs WordPress -- so all the entries are stored in a MySQL database. I took the MySQL dump of the tables, and a quick script figured out that out of 1600-ish posts, there were 1352 that came from the pre-tag era, requiring tag inference. A mail to the TagThe.Net team established that they were happy with this level of usage.

I grepped the post IDs and text out of the SQL dump, threw those into a text file using the simple format 'id=NNN text=SQLHTMLSTRING' (where SQLHTMLSTRING was the nicely-escaped HTML text taken directly from the SQL dump), and ran them through this script.

That rendered the first 2k of each of those entries as a URL-encoded string, invoked the REST API with that, got the XML output, and extracted the tags into another UNIXy text-format output file. (It also added one tag for the 'proto-tag' system I used in the early days, where the first word of the entry was a single tag-style category name.)
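The original script was Perl, and isn't reproduced here. As a rough Python sketch of the same two steps (build the request URL from the first 2k of text, then pull the tags out of the XML), something like the following would do. The function names are my own for this sketch, and I'm assuming "2k" means 2048 characters; the endpoint and the `<dim type="topic">` layout are taken from the curl example above.

```python
import urllib.parse
import xml.etree.ElementTree as ET

API_URL = "http://tagthe.net/api/"  # endpoint as shown in the curl example


def build_request_url(text, limit=2048):
    """URL-encode the first `limit` characters of `text` as the `text` parameter."""
    return API_URL + "?" + urllib.parse.urlencode({"text": text[:limit]})


def extract_tags(xml_text, dim_type="topic"):
    """Return the <item> values found under <dim type=dim_type> in a response."""
    root = ET.fromstring(xml_text)
    return [
        item.text
        for dim in root.iter("dim")
        if dim.get("type") == dim_type
        for item in dim.iter("item")
    ]
```

Fetching the URL (with `urllib.request.urlopen`, say) is left out deliberately, so the sketch can be tried against a saved response without hitting the service.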

Next, I ran this script, which in turn took that intermediate output and converted it to valid PHP code, like so:

cat suggestedtags | ./taglist-to-php.pl  > addtags.php
scp addtags.php my.server:taint.org/wp-admin/

The generated page 'addtags.php' looks like this:

<?php
  require_once('admin.php');
  global $utw;
  $utw->SaveTags(997, array("music","all","audio","drm-free",
      "faq","lunchbox","destination","download","premiere","quote"));
  [...]
  $utw->SaveTags(998, array("software","foo","swf","tin","vnc"));
  $utw->SaveTags(999, array("oses","eek","longhorn","ram",
    "winsupersite","windows","amount","base","dog","preview","system"));
?>
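For the curious, the conversion step amounts to something like the Python sketch below. The real taglist-to-php.pl is Perl and its input format isn't shown in this post, so the 'id=NNN tags=a,b,c' line format here is an assumption for illustration, as are the function names.

```python
def php_string(s):
    """Quote a tag as a PHP double-quoted string literal."""
    escaped = s.replace("\\", "\\\\").replace('"', '\\"').replace("$", "\\$")
    return '"' + escaped + '"'


def taglist_to_php(lines):
    """Turn 'id=NNN tags=a,b,c' lines (assumed format) into an addtags.php page."""
    out = ["<?php", "  require_once('admin.php');", "  global $utw;"]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        id_part, tags_part = line.split(" ", 1)
        post_id = int(id_part.split("=", 1)[1])
        tags = [t for t in tags_part.split("=", 1)[1].split(",") if t]
        quoted = ",".join(php_string(t) for t in tags)
        out.append("  $utw->SaveTags(%d, array(%s));" % (post_id, quoted))
    out.append("?>")
    return "\n".join(out) + "\n"
```

Escaping the tag strings matters, since the tags come back from a third-party service and end up executed as PHP on the server.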

Once that page was in place, I just visited it in my (already logged in) web browser window, at http://taint.org/wp-admin/addtags.php, and watched as it gronked for a while. Eventually it stopped, and all those entries had been tagged. (If I wasn't so hackish, I might have put in a little UI text here -- but I didn't.)

The results are very good, I think.

A success: http://taint.org/tag/research has picked up a lot of the interesting older entries where I discussed things like IBM's Teiresias pattern-recognition algorithm. That's spot on.

A minor downside: it's not so good at proper nouns. This entry talks about Silicon Valley and geographical insularity, and mentions "Silicon Valley" prominently -- one or both of those words would seem to be a good thing to tag with, but it missed them.

Still, that's a minor issue -- the tags it has suggested are generally very appropriate and useful.

Next, I need to find a way to auto-generate titles for the really old entries ;)

links for 2006-05-29

Web 2.0 and Open Source

A commenter at this post on Colm MacCarthaigh's weblog writes:

I guess I still don't understand how Open Source makes sense for the developers, economically. I understand how it makes sense for adapters like me, who take an app like Xoops or Gecko and customize it gently for a contract. Saves me hundreds of hours of labour. The down side of this is that the whole software industry is seeing a good deal of undercutting aimed at sales to small and medium sized commercial institutions.

Similarly, in the follow-up to the O'Reilly "web 2.0" trademark shitstorm, there have been quite a few comments along the lines of "it's all hype anyway".

I disagree with that assertion -- and Joe Drumgoole has posted a great list of key Web 2.0 vs Web 1.0 differentiators, which nails down some key ideas about the new concepts, in a clear set of one-liners.

Both open source software companies, and "web 2.0" companies, are based on new economic ideas about software and the internet. There's still quite a lot of confusion, fear and doubt about both, I think.

Open Source

As I said in my comment at Colm's weblog -- open source is a network effect. If you think of the software market as a single buyer and seller, with the seller producing software and selling to the buyer, it doesn't make sense.

But that's not the real picture of a software market. If you expand the picture beyond that, to a more realistic picture of a larger community of all sorts of people at all levels, with various levels interacting in a more complex maze of conversation and transactions, open source creates new opportunities.

Here's one example, speaking from experience. As the developer of SpamAssassin, open source made sense for me because I could never compete with the big companies any other way.

If I had been considering it in terms of me (the seller) and a single customer (the buyer), economically I could make a case of 'proprietary SpamAssassin' being a viable situation -- but that's not the real situation; in reality there was me, the buyer, a few 800lb gorillas who could stomp all over any puny little underfunded Irish company I could put together, and quite a few other very smart people, who I could never afford to employ, who were happy to help out on 'open-source SpamAssassin' for free.

Given this picture, I'm quite sure that I made the right choice by open sourcing my code. Since then, I've basically had a career in SpamAssassin. In other words my open source product allowed me to make income that I wouldn't have had, any other way.

It's certainly not simple economics; it's risky and complicated, and many people don't believe it works -- but it's viable as an economic strategy for developers, in my experience. (I'm not sure how to make it work for an entire company, mind you, but for single developers it's entirely viable.)

Web 2.0

Similarly -- I feel some of the companies that have been tagged as "web 2.0" are using the core ideas of open source code, and applying them in other ways.

Consider Threadless, which encourages designers to make their designs available, essentially for free -- the designer doesn't get paid when their tee shirt is printed; they get entered into a contest to win prizes.

Or Upcoming.org, where event tracking is entirely user-contributed; there are no professional content writers scribbling reviews and leader text, just random people doing the same. For fun, wtf!

Or Flickr, where users upload their photos for free to create the social experience that is the site's unique selling point.

In other words -- these companies rely heavily on communities (or more correctly certain actors within the community) to produce part of the system -- exactly as open source development relies on bottom-up community contribution to help out a little in places.

The alternative is the traditional, "web 1.0" style; it's where you're Bill Gates in the late '90s, running a commercial software company from the top down.

  • You have the "crown jewels" -- your source code -- and the "users" don't get to see it; they just "use".
  • Then they get to pay for upgrades to the next version.
  • If you deal with users, it's via your sales "channels" and your tech support call centre.
  • User forums are certainly not to be encouraged, since it could be a PR nightmare if your users start getting together and talking about how buggy your products are.
  • Developers (er, I mean "engineers") similarly can't go talking to customers on those forums, since they'll get distracted and give away competitive advantage by accidentally leaking secrets.
  • Anyway, the best PR is the stuff that your PR staff put out -- if customers talk to engineers they'll just get confused by the over-technical messages!

Yeah, so, good luck with that. I remember doing all that back in the '90s and it really wasn't much fun being so bloody paranoid all the time ;)


(PS: The web2.0 companies aren't using all of the concepts of open-source, of course -- not all those web apps have their source code available for public reimplementation and cloning. I wish they were, but as I said, I can't see how that's entirely viable for every company. Not that it seems to stop the cloners, anyway. ;)

links for 2006-05-26

links for 2006-05-25

Pam on the AIDS/LifeCycle

My mate Pam is cycling in this year's AIDS/LifeCycle -- for a week from June 4 to 10, she'll be cycling from San Francisco to LA, for charity. That's 585 miles. Since she bought her bike to do this ride, she's clocked up a terrifying 2040 miles. Blimey.

It's for a good cause -- go on, make a donation!

links for 2006-05-23

Poll: keep ‘Fixing Email Weblog’ in Planet Antispam?

I added the Fixing Email weblog to Planet Antispam a while back -- however, I'm not entirely sure at this stage that its content (which seems to be primarily news syndication) fits with the "planet" concept (which is primarily intended for first-person posts).

So -- quick poll. Let me know what you think, pro or con, Planet readers: should I remove the Fixing Email feed from that site?

Update: that was a pretty resounding 'yes'. Done!

links for 2006-05-22

Dear Recruiters

Dear Recruiters,

If you're going to (a) scrape my CV page from my website, then (b) spam me, unsolicited, offering to represent me for jobs I don't want in places I don't live, in explicit contravention of the terms of use [*] of that document -- here's a tip.

Don't compound the problem by asking me to resend the document in bloody Microsoft Word format. FFS.

([*]: Those terms were, of course, added in an attempt to stem the tide of recruiter spam. Thanks to Colm MacCarthaigh for the idea...)

links for 2006-05-18

Bebo’s “Irish Invasion”

Reading this post at Piaras Kelly's blog, I was struck by something -- I never realised quite how bizarre the situation with Bebo is. If you check out the Google Trends 'country' tab, Ireland is the only country listed -- meaning that search volume for "bebo" is infinitesimal, by comparison, elsewhere! (Update: Ireland was the only country listed, because the URL used limited it to Ireland only. However, the point is still valid when other countries are included, too ;)

It is also destroying Myspace as a search term on the Irish internet. (Update: also fixed)

As a US-based company, they must be mystified by all this attention -- the Brazilian invasion of Orkut has nothing on this ;)

I'll recycle a comment I made on Joe Drumgoole's weblog as to why this happened:

My theory is that social networking systems, like Bebo, Myspace, LinkedIn, Friendster, Tribe.net, Orkut, Facebook etc. have all developed their own emergent specialisations. These are entirely driven by their users -- although the sites can attempt to push or pull in certain directions (such as Friendster banning 'non-person' accounts), fundamentally the users will drive it. All of those sites have massively different user populations; Tribe has the Burning Man crowd, Friendster the daters, Orkut the Brazilians etc.

Next, I think kids of school age form a set of small cliques. They don't want to appear cool to friends thousands of miles away, on the internet; they want to appear cool to their peer group in their local school. So all it takes is a group of influential 'tastemakers' -- the alpha males and females in a year -- to go onto Bebo, and it becomes the site for a certain school; and given enough of that, it'll spread to other schools, and soon Bebo becomes the SNS for the Irish school system. In other words, Irish kids couldn't really care less what US kids think of them; they want to be cool locally.

Also I think MySpace has a similar problem to Orkut -- it's already 'owned' by a population somewhere else, who are talking about stuff that makes little sense to Irish teenagers. As a result, it's not being used as a social system here in Ireland; instead, it's just used by musicians who want a cheap place to host a few tracks without having to set up their own website.

(Aside: part of the latter is driven by clueless local press coverage of the Arctic Monkeys -- they have latched onto their success, put the cart before the horse, and decided that they were somehow 'made' by hosting music on MySpace, rather than by the attention of their fans. duh!)

links for 2006-05-17

5 Years of taint.org

Five years ago, on 15 May 2001, I started writing this weblog.

The first post was a forward of something odd from the Forteana list -- 'Why Finns are sick of illnesses named after them'. I started the weblog to reduce the number of forwards I was passing on by email to other groups -- hence the preponderance of forteana posts early on.

Nowadays, by contrast, I try to write original ramblings^Wresearch for the main part of the site, and the occasional "fresh bits" I unearth elsewhere are kept separate, posted to the link-blog at del.icio.us/jm.

However, the real reason I started the thing was to act as an experiment in using WebMake as a blog platform -- at least, that was the excuse. It worked quite successfully, for what it's worth -- but in mid-August 2005, I eventually accepted that there weren't enough hours in the day to maintain a weblogging CMS, and its templates, as well as everything else, and that I didn't really need to test WebMake's abilities any more, and switched to WordPress. I'm glad I did; WP is a great piece of software.

So what's been the biggest hit on taint.org, by far? Here it is: http://taint.org/xfer/2004/kittens.jpg . Lots and lots of Google Image referrers, MySpace hotlinkers, etc. etc. ;) It's a top hit for a GIS search for [kittens], I think.

Random stats, based on April's logs:

  • About 81247 hits were received during April to the RSS 2.0 feed (the default), 9921 to the Atom feed, and 7795 for the RSS 1.0 rendering. That indicates that format-wars-wise, people just use the default. ;)
  • Assuming the RSS reader apps average out to 1 HTTP GET every 30 mins (as Bloglines and Apple's reader do), that means there are somewhere around (98963 / (30 * 24 * 2)) = 68 subscribers.
  • In terms of the old style browser-using readership -- there were 44926 hits on the front page using web browsers.
  • AWStats claims 2700 visits per day, from around 33000 visitors per month. I find the latter figure hard to believe.
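The subscriber estimate above is simple arithmetic; here it is spelled out as a small Python snippet, using the hit counts quoted in the list. The 30-day month and the one-GET-per-30-minutes polling rate are the same assumptions made above.

```python
# Feed hits for April, as quoted above.
feed_hits = {"rss2": 81247, "atom": 9921, "rss1": 7795}

total_hits = sum(feed_hits.values())   # 98963
days = 30                              # April
polls_per_day = 24 * 2                 # one HTTP GET every 30 minutes

# Each subscriber accounts for days * polls_per_day hits over the month.
subscribers = total_hits / (days * polls_per_day)

print(total_hits, round(subscribers, 1))   # 98963 68.7
```

Readers that poll less often than every 30 minutes would push the real number higher, so treat this as a rough lower bound.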

After the front page and the feeds, the scraped RSS feeds at http://taint.org/scraped/ come second, Threadless beating out Perry Bible Fellowship by a little bit.

Top stories last month, based on hits:

  • http://taint.org/2006/04/29/230814a.html -- Single-Letter Google Hits
  • http://taint.org/2006/01/20/220239a.html -- the SweetheartsConnection.com Scam (still attracting comments from scammees!)
  • http://taint.org/2004/04/15/033025a.html -- really outdated stats on GMail's spam filtering accuracy
  • http://taint.org/2006/04/20/213624a.html -- Automatically Invoking screen(1) on Remote Logins
  • http://taint.org/2006/04/15/134751a.html -- Google Calendar
  • http://taint.org/2006/04/03/121837a.html -- A Gotcha With perl's "each()"
  • http://taint.org/2005/08/06/024026a.html -- The Life of a SpamAssassin Rule
  • http://taint.org/2006/04/21/133432a.html -- Phishing and Inept Banks
  • http://taint.org/2006/04/06/210519a.html -- RSS Feeds for Events in Dublin
  • http://taint.org/2006/04/13/140841a.html -- BT DSL's Daily Disconnects

Technorati says there are 514 links from 105 sites. I still don't know what the hell that means. ;)

Update: I've remembered that, before I started blogging at taint.org, I kept a diary at Advogato, which dates all the way back to March 2000!

Also, here are some pretty graphs from the graph-top-referers script:

The several slashdottings and a Boing Boinging are quite clear ;)

links for 2006-05-12

Link-blog Networking

Cool -- del.icio.us just added a feature whereby you can now see who has you in their network, and, of course, you can further view their networks and see who's in them.

This'd be great to produce social-network graphs, although I daresay Joshua mightn't be so keen on the spidering load. ;) I've optimistically requested some form of dump, anyway.

The social networking aspect of link collection and link-blogging via del.icio.us is emerging nicely; I'm keen to see what's next in the pipeline.

A few interesting things:

  • Almost everyone who's using del.icio.us seriously for link collection -- ie. applying some quality control thresholds, and bothering to write one-line descriptions, at least -- has filled out their 'network' by now.

  • It'd be useful to have "groups", so that we can now assert things like "jm, boogah, n0wak, negatendo, tweebiscuit, leonardr, muckster and torrez form a group". I'm sure that'd provide useful info, although could probably be inferred anyway. (People are attempting to hack it by using a shared tag on all their postings, like the "irishblogs" tag, but that's an awful misuse of tagging in my opinion ;)

  • Also, it'll be interesting to see what'll happen once Google Co-op figures out a way to incorporate the del.icio.us network data. To be honest, I'm very surprised it wasn't already in there -- it seems like a no-brainer... maybe some Y!/G corporate rivalry is getting in the way.

Anyway, in the meantime it's producing lots of good fodder for my SpicyLinks feed.

SpicyLinks is an implementation of something that I mentioned in a comment on this weblog entry, regarding future methods of reading weblogs; in essence, it's an automated blog aggregation summariser. It reads other people's link-blogs, so I don't have to, and reports the stuff that proves popular in my personal collection of sources.
(Credit where due: HotLinks provided much of the inspiration, but doesn't support personalisation, hence the reimplementation.)

SpicyLinks is similar to Populicious, but that app really misses the point, in my opinion. I don't particularly want to know what everyone is pointing at; I want to know what a selected set of trusted sources (with good taste!) are pointing at.

This aggregation is pretty similar to the del.icio.us 'network' feed, but with much lower volume, and a higher signal/noise ratio, attained by dropping the 'one-off' items that only one person is pointing at. Initially, that may seem like a major failure, since you miss the 'fresh bits' -- but as long as you've got the right people in your source network, it actually works very well.

It'd be great if this was one of the features implemented in the del.icio.us 'network' system...
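The core of the filtering idea is tiny. Here's a minimal Python sketch of it, not the actual SpicyLinks code: the function name and the (source, url) input shape are assumptions for illustration.

```python
from collections import defaultdict


def popular_links(bookmarks, min_sources=2):
    """Return [(url, n_sources)] for URLs bookmarked by enough distinct sources.

    `bookmarks` is an iterable of (source, url) pairs, e.g. one pair per
    entry in each trusted link-blog feed. Results are sorted with the most
    widely shared URL first.
    """
    sources_by_url = defaultdict(set)
    for source, url in bookmarks:
        sources_by_url[url].add(source)   # a set, so repeats by one source count once

    ranked = [
        (url, len(sources))
        for url, sources in sources_by_url.items()
        if len(sources) >= min_sources    # drop the 'one-off' items
    ]
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked
```

Counting distinct sources rather than raw bookmark entries is what stops a single enthusiastic poster from promoting a link on their own.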

links for 2006-05-11

links for 2006-05-10

links for 2006-05-09

Script: new-referrer-rss

new-referrer-rss.pl - generate RSS feed of new referrer URLs from access_log

SYNOPSIS

new-referrer-rss.pl nameofsite [source ...] > new-referrers.xml

DESCRIPTION

Given the name of a web site, and a selection of Apache combined log format 'access_log' files containing referrer URL data, this will generate an RSS feed containing the latest referrers.

The script should be run periodically with 'fresh' access_log data, from cron.
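The script itself is Perl and isn't reproduced here. As a rough Python sketch of the log-parsing half of the job (finding referrers that haven't been seen before), assuming the standard Apache combined log layout; the regex, the function name, and the substring test for "own site" are my own simplifications, and RSS generation is left out entirely.

```python
import re

# host ident user [time] "request" status bytes "referer" "user-agent"
COMBINED = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "(?:[^"\\]|\\.)*" \d{3} \S+ '
    r'"((?:[^"\\]|\\.)*)" "(?:[^"\\]|\\.)*"'
)


def new_referrers(lines, site, seen):
    """Yield referrer URLs that are not from `site` and not already in `seen`.

    `seen` is a set that is updated in place, so it can be persisted
    between cron runs to remember which referrers were already reported.
    """
    for line in lines:
        m = COMBINED.match(line)
        if not m:
            continue                      # not a combined-format line
        ref = m.group(1)
        if ref in ("", "-") or site in ref or ref in seen:
            continue                      # no referrer, self-referral, or old news
        seen.add(ref)
        yield ref
```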

Todd Underwood on BlueSecurity DDoS

Renesys Blog: The Bluesecurity Fiasco -- in which Todd Underwood, CSO for Renesys Corporation, applies some real-world knowledge of how the internet works to the "timeline of events" press release, issued by BlueSecurity as part of their ongoing PR about the DDoS.

Judging by the comments at Slashdot, this really needs to be more widely read.

Here are some highlights:

The timeline from BlueSecurity [...] is frustratingly vague. It uses phrases like 'tampering with the Internet backbone using a technique called "Blackhole Filtering".' As Thomas Pogge, a philosophy professor of mine, used to say: that's not even wrong yet. There is no "Internet backbone", there is no technique known as "Blackhole Filtering", and blackhole routing is not normally described as tampering. So the whole explanation is nonsense. [...] Let's clear one thing up for the press and everyone else: this event just wasn't that interesting. The attack against bluesecurity was a run-of-the-mill denial of service attack.

His conclusion:

I believe that the PR engine from BS is in overdrive spinning this event as fast as they can. But the concrete facts being put out by them simply do not add up. In the process they seem to be doing two things: 1) trying to imply or state that someone at UUnet was bribed by a spammer. This is simply ridiculous. I know many of the people who work for UUnet and they are honest, hardworking and extraordinarily clever people. They would not be crooked, or stupid, enough to do such a thing and if they were, they would have been trivially caught by change-management procedures. Moreover, such a change at UUnet (or BTN) wouldn't have caused the event BS claims to have witnessed anyway. Additionally, 2) BS is trying to deflect attention from the damage that they caused at Six Apart. It would be much better if they could just claim ignorance of the DOS, apologize and move on. I recognize that that isn't going to happen, but it sure would make this whole thing easier to handle.

Well said.

Of course, this is pretty much immaterial -- the people who are using Blue Frog, and vocally supporting Blue Security, don't really care what happened. All they care about is that someone is taking some kind of direct action against spammers, in some way or another, and if there's a little "friendly fire" and some bending of the truth, why, this is a war! What, do you support the spammers?

It's disappointing -- the amount of disinformation being successfully pumped out (and accepted!) on this story is massive.

Outside My Window Right Now

Bubba, now safely back in Dublin after his 8000-mile flight from LAX, is getting back into exploring his old manor.

Here he is, ignoring a very brave magpie. Judging by the way the magpie was brazenly hopping around him, cawing, and the way that Bubba was ignoring him, I suspect there may be a nest nearby....

links for 2006-05-04

London’s Oyster RFID card to become a full cashless payment system

Apparently, Transport For London are planning 'e-money' trials based on their remotely-readable Oyster RFID cards.

Combine that with Kevin Mahaffey of Flexilis' talk at Black Hat last year, where he demonstrated apparatus to extend RFID read range from 4-6 inches to approximately 50 feet, and things could get messy. ;)

The slides for that talk are available here (PDF); slide 20 specifically mentions the Hong Kong "Octopus" cashless-payment card.

links for 2006-05-03

Blue Frog List Leaked?

Blue Frog is a service, run by Blue Security, which operates a "Do Not Email" list, on the (optimistic) basis that spammers will vet their lists against it.

Reportedly, it's been compromised. If this is true, I'm not surprised -- as Dr. Aviel Rubin's report to the FTC of May 2004 regarding a Do-Not-Email list notes:

The scrubbing approach [to running a D-N-E list] requires that a list of live email addresses exist. While the party owning that list may be well intentioned, it is unlikely that such a valuable list would not leak out. History is replete with insider attacks, as well as external break-ins to highly sensitive sites, such as the Pentagon computers. The Do Not Email Registry represents the kind of prize that attracts hackers. In this case, the prize has monetary value as well. Once the list is exposed, there is no way to undo it.

Also, it's almost inevitable:

If this service were running for some time, it is more likely than not that the plaintext addresses would leak at some point, given the history of computer security incidents.

Update: it appears, according to this white paper, that the Blue Frog "Do Not Intrude" list is hashed, rather than plain-text. Rubin's advice still applies:

Without hashing, a compromise of the registry database results in exposure of all of the registered email addresses. This is a total disaster. However, even exposure of a hashed list is a catastrophe. A spammer with a copy of a hashed list of email addresses is able to find out, for any email address, if the address is in the registry. The attacker simply hashes a candidate email address and sees if the hashed value is in the list. This is very powerful. [....]

Hashing provides absolutely no security against a marketer who obtains a scrubbed list and uses that to sell the addresses that were scrubbed by the registry. Whether or not the list is hashed has no impact on a malicious marketer in the scrubbing approach.
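To make Rubin's point about hashed lists concrete, here's a minimal sketch of the membership test he describes. The hash function (SHA-1), the lower-casing step, and all of the addresses are assumptions made for illustration; I don't know which hashing scheme the Blue Frog list actually uses.

```python
import hashlib

def hash_address(address: str) -> str:
    """Normalise and hash an address the way a hypothetical registry might."""
    return hashlib.sha1(address.strip().lower().encode("utf-8")).hexdigest()

def addresses_in_registry(candidates, hashed_registry):
    """Return the candidate addresses whose hashes appear in the leaked list."""
    return [a for a in candidates if hash_address(a) in hashed_registry]

# The leaked registry holds only hashes, never plaintext addresses.
leaked = {hash_address("alice@example.com"), hash_address("bob@example.com")}

# The spammer already has guesses to test, e.g. an existing mailing list.
guesses = ["alice@example.com", "carol@example.com", "Bob@Example.com"]

confirmed = addresses_in_registry(guesses, leaked)
print(confirmed)  # ['alice@example.com', 'Bob@Example.com']
```

No hash needs to be reversed: anyone holding a list of candidate addresses can confirm which of them are live, registered users.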

SpamAssassin in the Google Summer of Code 2006

Are you a student, and interested in earning $4,500 for contributing to open source, and fighting spam, over the course of the summer?

If so, get thee hence to the Google Summer of Code 2006 site, and propose a project!

Last year, we in SpamAssassin didn't get it together to mentor SoC projects. This year, however, we have a few prospective mentors (including myself), and a few sample project ideas lined up; we're all ready to go! Here's the Student FAQ. Be quick; applications end in a week and a bit.

Here's hoping we get some interesting submissions ;)

links for 2006-04-29

Single-Letter Google Hits

Here's what happens when you search for single letters on Google:

Interestingly I got to see the new Google search results page, with the sidebar, once. It must be in the process of rolling out...

links for 2006-04-27

links for 2006-04-26

Peoplefeeds and Quick Aggregation

peoplefeeds is cool.

I've been looking for something that can aggregate my Flickr, WordPress blog, and del.icio.us feeds into one venue where I can look up items by tag, in a single page-load.

Suprglu was my leading contender, although they weren't there yet since they didn't seem to support importing my blog posts with tags preserved -- pretty much everything wound up tagged as "uncategorized". disappointing. :( so I was waiting for them to fix that.

This post by Richard MacManus pointed at another couple of options: 43Things and Peoplefeeds. I hadn't actually noticed that 43Things was doing this kind of aggregation too; unfortunately, as far as I can see, they don't support tag preservation and browsing, so there goes my desired feature. shame.

However, Peoplefeeds was right on target, offering a 'Unified Tagspace' and a 'Search All-Personal-Content' mechanism. It works nicely, too. Here's my personal aggregator, combining my Flickr feed, my weblog feed, and my del.icio.us feed into one -- and with a unified tag-space; here's my 'hiking' tag, hitting all 3 feeds. Perfect.
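For the curious, the "unified tag-space" idea boils down to something like the sketch below. This is my guess at the general shape, not a description of how Peoplefeeds implements it; the feed names, titles and tags are made-up placeholders.

```python
from collections import defaultdict

def unified_tagspace(feeds):
    """Map each tag to the (source, title) pairs carrying it, across all feeds."""
    index = defaultdict(list)
    for source, items in feeds.items():
        for item in items:
            for tag in item["tags"]:
                # Fold case so "Hiking" and "hiking" land in the same bucket.
                index[tag.lower()].append((source, item["title"]))
    return dict(index)

# Placeholder items standing in for three already-parsed feeds.
feeds = {
    "flickr": [{"title": "Ridge at sunset", "tags": ["hiking", "photos"]}],
    "weblog": [{"title": "Weekend walk", "tags": ["Hiking"]}],
    "del.icio.us": [{"title": "Trail maps", "tags": ["hiking", "maps"]}],
}

tags = unified_tagspace(feeds)
print(tags["hiking"])
```

The important part is that tags survive the import; once everything lands in one index keyed by tag, the single-page lookup is a dictionary access.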

One other use for this -- I've forgotten why I was looking for one of these, but I know I did want one ;) -- it can be used to make a "private planet". If you have 3 or 4 feeds that you need to combine into one, this provides a very easy way to do that; just set up a userid at Peoplefeeds for that purpose.

Phishing and Inept Banks

John Graham-Cumming asks, 'Are Citibank crazy?':

I blogged a while ago about Thunderbird's phishing filter trapping a seemingly innnocent mail. Now, a reader has forwarded to me a genuine email from Citibank that he says was trapped by Thunderbird. I'm not going to reproduce the email here because it contains private details of the user, but it is a valid Citibank message.

Thunderbird thinks it's a scam because Citibank uses one of the oldest phishing tricks in the book. The have a URL displayed in the message then when clicked goes to a totally different URL.

Sadly, this has proven to be really quite common. We've investigated using this as a phish-detection rule in SpamAssassin several times, without much luck. In fact, we've had to create a FAQ entry for it -- since it's such a superficially attractive, but ultimately useless, idea, many people have had long discussions on our lists about it!

The companies that produce these false positives in their mails include American Express, Bed Bath & Beyond, Universal Studios, Microsoft, Hilton Hotels -- and now Citibank.

A couple of other examples from real mails:

  <a href="http://www65.americanexpress.com/clicktrk/Tracking?
    mid=MESSAGEID&msrc=ENG-ALERTS&url=
    https://www.americanexpress.com/estatement/?12345">
    https://www.americanexpress.com/estatement/?12345</a>

  <A HREF="http://echo.epsilon.com/WebServices/EchoEngine/T.aspx?l=ID">
    https://www.hilton.com/en/ww/email/tab_email_subscriptions.jhtml</A>
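To show why the rule misfires, here's a rough sketch of the naive check: flag any link whose visible text looks like a URL but whose href points at a different host. This is an illustration only, not the SpamAssassin rule code or Thunderbird's filter; the class and function names are invented for the example.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

class LinkCollector(HTMLParser):
    """Collect (href, visible text) pairs for every <a> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names for us.
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.links.append((self._href, "".join(self._text).strip()))
            self._href = None

def mismatched_links(html):
    """Return links whose displayed URL names a different host than the href."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    hits = []
    for href, text in parser.links:
        if not text.lower().startswith(("http://", "https://")):
            continue  # visible text is not a URL, so nothing to compare
        if urlsplit(href).hostname != urlsplit(text).hostname:
            hits.append((href, text))
    return hits

sample = (
    '<A HREF="http://echo.epsilon.com/WebServices/EchoEngine/T.aspx?l=ID">'
    "https://www.hilton.com/en/ww/email/tab_email_subscriptions.jhtml</A>"
)
print(mismatched_links(sample))
```

The Hilton mail above trips the check, and so would the American Express one (www65 vs. www). A phisher and a click-tracking marketing department produce exactly the same markup, which is why the rule is useless in practice.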

By the way, it really is quite impressive for a bank as heavily phished as Citibank to still be making this kind of basic mistake in their mail-outs! It reinforces a point I made in a mailing list posting recently:

As far as I can see, the approach taken by pretty much all banks to their online services is simply too bureaucratic, hide-bound, and fundamentally driven by their marketing departments, to ever cope effectively with phishing. :(

(For what it's worth, I know Citi have some smart techies working there; but the rest of the company needs to start paying attention to them.)

Optimo vs. Bud Rising

Optimo have a new mix up -- the First Hour Mix:

Here's the fourth in a brief series of mixes where we present something a little different. This mix isn't really a mix in the conventional sense but rather 17 tracks blended together. To us, the first hour of Optimo, or to be more accurate, the 'Espacio' part of Optimo (Espacio) is a vital part of the night. It is our chance to play absolutely what we like without thinking about the dancefloor.

It's a great mix -- certainly not dancy, but some really interesting tracks here. The Optimo guys put together some really great music.

In fact, I went to see them play last Saturday -- or, at least, myself and a couple of mates tried to. Supposedly, they were supporting The Juan Maclean at the Bud Rising festival over the weekend, but the show was such a shambles, without anyone having a clue when it started or who was on stage at any time, I'm pretty sure we missed their set entirely.

On top of that, it was EUR20 in, and to add insult to injury, the only lager on sale was Budweiser! I mean, I wouldn't mind that if the "Bud Rising Festival" deal meant free entrance, but charging 20 squids and then cutting off the supply of decent booze as well, is just a crime.

Ah well, the Filthy Dukes were pretty good at least.

Google Calendar

So I've been using this for a few days now -- and I'm loving it. A calendaring system that deals coherently with the web:

I keep finding little things that make perfect sense, and just feel more logical than what I've used elsewhere. This rocks!

One thing still needs work, though: the links to Mapping fail spectacularly, for non-US addresses at least. But that's pretty minor.

By the way, I have a feeling that Mac.com had parts of this, but really, you had to drink a lot of Apple kool-aid to use that, and I just didn't go for that. Sorry Jobs fans.

Do you know what would be cool now? If Upcoming.org published venue/location-specific iCal feeds. Oh look, they do! Awesome...

BT DSL’s Daily Disconnects

Argh! This is what happens every day to my DSL connection, at half past 12:

13 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 2: link down
14 Mon Apr 10 12:26:53 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
15 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 3: link up
26 Tue Apr 11 12:26:46 2006 PP12 -WARN  SNMP TRAP 2: link down
28 Tue Apr 11 12:26:48 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
29 Tue Apr 11 12:26:48 2006 PP12 -WARN  SNMP TRAP 3: link up
38 Wed Apr 12 12:26:56 2006 PP12 -WARN  SNMP TRAP 2: link down
40 Wed Apr 12 12:26:58 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
41 Wed Apr 12 12:26:58 2006 PP12 -WARN  SNMP TRAP 3: link up
50 Thu Apr 13 12:27:00 2006 PP12 -WARN  SNMP TRAP 2: link down
52 Thu Apr 13 12:27:03 2006 PP12  INFO  ppp_ready: ch:8056167c, iface:80419f14
53 Thu Apr 13 12:27:03 2006 PP12 -WARN  SNMP TRAP 3: link up

Worse than that, it will generally assign a different IP address to the connection when it reconnects! This buggers up any applications that rely on long-lived TCP connections, such as SSH shell logins, tunnels, remote-desktop sessions, and instant messaging; all get disconnected and have to be manually re-set up.

Initially, I thought this may have been a flaky connection. However, it appears not -- check out those timestamps; that's a scheduled, daily event. Also, there have been no other disconnections apart from those.
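If you don't trust eyeballing, the regularity is easy to check mechanically. A small sketch, assuming the log layout shown above (sequence number, then a ctime-style timestamp); only the "link down"/"link up" lines are reproduced here.

```python
from datetime import datetime

LOG = """\
13 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 2: link down
15 Mon Apr 10 12:26:53 2006 PP12 -WARN  SNMP TRAP 3: link up
26 Tue Apr 11 12:26:46 2006 PP12 -WARN  SNMP TRAP 2: link down
29 Tue Apr 11 12:26:48 2006 PP12 -WARN  SNMP TRAP 3: link up
38 Wed Apr 12 12:26:56 2006 PP12 -WARN  SNMP TRAP 2: link down
41 Wed Apr 12 12:26:58 2006 PP12 -WARN  SNMP TRAP 3: link up
50 Thu Apr 13 12:27:00 2006 PP12 -WARN  SNMP TRAP 2: link down
53 Thu Apr 13 12:27:03 2006 PP12 -WARN  SNMP TRAP 3: link up
"""

def link_down_times(log_text):
    """Return the timestamp of every 'link down' event in the log."""
    times = []
    for line in log_text.splitlines():
        if "link down" not in line:
            continue
        fields = line.split()
        # fields[1:6] is e.g. ['Mon', 'Apr', '10', '12:26:53', '2006']
        stamp = " ".join(fields[1:6])
        times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return times

downs = link_down_times(LOG)
gaps = [later - earlier for earlier, later in zip(downs, downs[1:])]
for gap in gaps:
    print(gap)
```

Every gap comes out within about ten seconds of 24 hours, which is not what a flaky line looks like.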

A discussion on the IIU mailing list revealed the reason -- it seems BT Ireland have a policy of resetting their customers' connections daily. That could be OK, if they came right back up with the same IP -- TCP/IP is designed to cope with that, and generally does -- but it does not do that. Instead the IP address is reassigned every single time.

This is turning out to be quite a nuisance. Working over the internet requires quite a few VPN connections, tunnels, and remote logins, and having to re-set those up, daily, is turning out to be a pain in the neck.

I'm casting around for hacks to get around this. Right now, I have an assortment of jiggery-pokery involving ssh, a shell script 'while' loop, and screen(1), but it's messy and not working out too well. Ideally, I'd set up another VPN (via IPSec or CIPE), and set it up to reconnect on link failure, then route all other VPNs and remote logins out via that -- but I don't have spare routable IPs to do this with. Anyone got any good suggestions?
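For reference, the 'while' loop part of that jiggery-pokery amounts to roughly the following, re-expressed in Python rather than shell. The tunnel command is a placeholder: the user, host and forwarded ports are invented, and the ServerAlive settings are just one plausible way to make ssh notice a dead link quickly.

```python
import subprocess
import time

def keep_alive(cmd, delay=5, max_runs=None, run=subprocess.call, sleep=time.sleep):
    """Re-run cmd every time it exits, pausing `delay` seconds in between.

    max_runs=None loops forever; `run` and `sleep` are injectable so the
    loop can be exercised without actually starting ssh.
    Returns the list of exit codes seen.
    """
    codes = []
    while max_runs is None or len(codes) < max_runs:
        codes.append(run(cmd))
        if max_runs is None or len(codes) < max_runs:
            sleep(delay)
    return codes

# Hypothetical tunnel: user, host and ports are placeholders.
TUNNEL = [
    "ssh", "-N",
    "-o", "ServerAliveInterval=15",   # probe the server every 15 seconds
    "-o", "ServerAliveCountMax=3",    # give up after 3 missed probes
    "-L", "8080:localhost:80",
    "user@gateway.example.org",
]

# keep_alive(TUNNEL)  # runs until interrupted, e.g. inside screen(1)
```

This only re-establishes the tunnel itself; anything that was running over the old connection still dies with it, which is why it's not much of a fix.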

By the way, it's worth noting that their FAQ fails to mention this, instead giving some incorrect information about my IP being 'removed' when my web browsing session ends:

Is it a fixed IP?

No, the product is set up with dynamic IP Addressing. This means that every time you open your browser you will be allocated a different IP address for the duration of that session. When the session ends the IP Address is removed.

That is incorrect -- this has nothing to do with web browsing sessions.

To be honest, I'd prefer not to have to switch ISPs to get away from this brokenness -- the rest of the service is quite nice, good pings, good throughput, no other disconnections or outages -- but this is quite a problem for someone using BT Broadband for telecommuting purposes. :(

My QuitMeter

I gave up smoking last year on May 26 -- that anniversary isn't too far away. Here's how much money I've saved, courtesy of QuitMeter.com:


QuitMeter Counter courtesy of www.quitmeter.com.

Wow -- I could buy myself another iPod! ;)

Software Patenting and “Hot” Fields

Paul Graham's recent essay on his experience with software patenting has been making the rounds recently.

Now Kevin Marks has commented. Worth reading, since he demonstrates nicely the kind of crap you see in a 'hot' field, such as video (which he worked on with Apple's Quicktime):

I broadly agree with Paul Graham's essay on Software Patents, but I do think he underestimates the damage from patent trolls, and from what he calls the mafia-like behaviour of some patent holders. Paul has been lucky in the field he has worked in, but in the Audio and Video area there are many patent thickets. ... While I was at Apple on QuickTime, there was a steady stream of patent trolls claiming that Apple should pay them royalties; enough to keep several lawyers busy, and a lot of engineers spending time working on prior art evidence demonstrations. Several potential features were excluded from QuickTime due to patent thickets. The obvious one was the Unisys LZW patent that encumbered GIF, but there were other more subtle pressures that meant adopting open source codecs was discouraged. Working on the patent license agreements for MPEG meant that technology ready to ship was deferred pending legal agreement on more than one occasion.

In my experience, that's what happens -- once a field becomes "hot", patent trolls and other nuisance "inventors" start appearing en masse, and then you've got to waste a lot of time dealing with that crap.

RSS Feeds for Events in Dublin

So, now that I'm back in Dublin, I've taken a quick look around for ways to keep up to date on upcoming live gigs -- and found that the situation, frankly, sucks. In particular, almost none of the sites are offering RSS or Atom feeds yet.

Having said that, Waxy and Leonard's Upcoming.org is doing quite nicely for the Dublin metro area:

And lots of credit to the promoter, MCD, who seem to be just about the only Irish listings site that offers RSS:

This is fantastic, but -- naturally -- they don't cover events put on by their competitors. ;)

Apart from that, it's pretty shoddy. Lots of late-90's-looking websites out there, and no feeds in sight. Thankfully, Feed43, and some perl scripting, is on hand to allow me to take matters into my own hands.

Entertainment Ireland offer a pretty good music news section -- but sans feed. Feed43 saves the day:

And, surprisingly, Ticketmaster, of all sites, is turning out to be a great way to find out what's on in Dublin, listing pretty much all ticketed events in a nice, clean, succinct format. Unfortunately, the highest location resolution it offers for Ireland is the country as a whole. However, this can be worked around by subscribing to individual venues, such as Crawdaddy or The Village. (This has a happy side-effect of narrowing down the types of music -- I can skip finding out that The Eagles are playing, since they won't be playing at Crawdaddy ;)

For some reason, though, Ticketmaster haven't got around to offering their own RSS feeds. Not a problem -- in response I've hacked up tm2rss.cgi, a little script which scrapes the venue pages and produces RSS:

For other venues, simply take the venue URL (for example, http://www.ticketmaster.ie/venue/198641 for The Village), substitute its numeric venue ID for NNNNN in this URL: http://taint.org/scraped/tm2rss.cgi?v=NNNNN , then use that as the Feed URL in your feed reader.
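If you'd rather not do the substitution by hand, it can be scripted. A tiny sketch; the helper name is made up, and it assumes venue URLs always contain "/venue/" followed by the numeric ID, as in the example above.

```python
import re

FEED_TEMPLATE = "http://taint.org/scraped/tm2rss.cgi?v={venue_id}"

def feed_url_for_venue(venue_url):
    """Build the tm2rss.cgi feed URL for a Ticketmaster venue page URL."""
    match = re.search(r"/venue/(\d+)", venue_url)
    if match is None:
        raise ValueError(f"no numeric venue ID found in {venue_url!r}")
    return FEED_TEMPLATE.format(venue_id=match.group(1))

print(feed_url_for_venue("http://www.ticketmaster.ie/venue/198641"))
# http://taint.org/scraped/tm2rss.cgi?v=198641
```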

A Gotcha With perl’s “each()”

It's my bi-monthly perl blog entry, to earn my place on planet.perl.org! ;)

Here's an interesting "gotcha". Take this code:

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    while(($k,$v)=each %t){print "1: $k\n"; last;}
    while(($k,$v)=each %t){print "2: $k\n";}'

In other words, iterate through all the key-value pairs in %t once, then do it again -- but exit early in the first loop.

You would expect to get something like this output:

    1: 1
    2: 1
    2: 3
    2: 2

Instead, you see:

    1: 1
    2: 3
    2: 2

The "1" entry in the second loop is AWOL. Here's why -- as "perldoc -f each" notes:

There is a single iterator for each hash, shared by all "each", "keys", and "values" function calls in the program

That's all "each" calls, throughout the entire codebase, possibly in a different class entirely. Argh.

The workaround: reset the iterator using "keys" between calls to "each":

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    while(($k,$v)=each %t){print "1: $k\n"; last;}
    keys %t;
    while(($k,$v)=each %t){print "2: $k\n";}'

This got us in SpamAssassin -- bug 4829.

To be honest, having to call "keys" after the loop is kludgy -- as you can see if you check the patch in bug 4829 there, we had to change from a "return inside loop" pattern to a "set variable and exit loop, reset state, then return" pattern. It'd be nice to have a scoped version of each(), instead of this global scope, so that this would work:

    perl -e '%t=map{$_=>1}qw/1 2 3/;
    { while(($k,$v)=scoped_each %t){print "1: $k\n"; last;} }
    # that each() iterator is now out of scope, so GC'd;
    # the next call uses a new iterator, starting from scratch
    { while(($k,$v)=scoped_each %t){print "2: $k\n";} }'

Scoping, of course, has the benefit of allowing "return early" patterns to work; in my opinion, those are clearer -- at the least because they require fewer lines of code ;)
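For comparison, here's the same shape of problem sketched in Python. This is an analogy, not a statement about perl internals: Python dicts have no hidden per-hash iterator, so the sketch fakes one by deliberately reusing a single iterator object, then shows the "scoped" behaviour you get from asking for a fresh iterator each time.

```python
t = {"1": 1, "2": 1, "3": 1}

# One iterator shared by both loops, mimicking perl's per-hash iterator.
shared = iter(t.items())

first = []
for k, v in shared:
    first.append(k)
    break                      # exit early, like "last" in the perl version

# The second loop resumes where the first stopped: "1" goes missing.
second = [k for k, v in shared]

# A fresh iterator per loop behaves like the wished-for scoped_each().
fresh = [k for k, v in iter(t.items())]

print(first, second, fresh)
```

The second list is missing the first key, for the same reason the perl output above is: the iteration state outlived the loop that created it.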

Feed43 Rocks

I've just given Feed43 a go. It's very nifty.

Basically, it's a pattern-based HTML-to-RSS scraper -- similar to my own Sitescooper in that respect ;) -- but built entirely as a web app.

Until now, I've been hacking up scrapers one by one, using either Sitescooper or WWW::Mechanize, run from cron, and putting the output up on taint.org; for example, http://taint.org/scraped/ has the public ones: Threadless, Perry Bible Fellowship, and White Ninja comics.

Today, I came across a case where I wanted a new RSS feed, and since I'd been hearing of Feed43, thought I'd give it a try, to save running yet another cron on our server. It was reasonably simple, although still required a fair bit of knowledge of the concepts of scraping via pattern matching against HTML; but the UI was fantastic, with everything previewed using a clean AJAX UI, and within 3 minutes I had a new feed.

For the curious -- the feed was for TCAL's Ireland category, and the results are here: Feed43 (Feed For Free) : TCAL - Ireland. (go ahead and sign up if you like ;)

New web pattern, by the way -- there's a trend towards using "secret URLs" instead of username/password authentication for the kind of "trivial" auth task, like editing feed-scraper details. Good idea.

Public Transit == Crime

I just received a very nice info-pack through my front door regarding the new Dublin Metro line, which is in planning at the moment; it seems they're soliciting feedback from residents near the proposed routes. Nicely done.

Right now, Dublin has an embarrassment of good public transit, at least when compared to my previous home in Orange County. There, public transit is actively campaigned against.

My favourite claim: that it 'increases crime' -- in other words that poor people from Santa Ana would come down to Irvine and steal stuff, which they couldn't do with vehicular transport, for some reason.

The OC Weekly thought it was pretty funny, too -- and an opposing group comprehensively debunked it. Still, it seemed to work; while I was living in Irvine, I got to see the Centerline proposal gradually whittled down until it was finally killed off. During that time, in contrast, Dublin built the Luas.

Unfortunately it doesn't exactly go where I want to go, but you can't always have everything. ;)

DSL=GOT

finally!

Coffee and Trivia

Just got a new cafetiere, so I can finally switch back from instant coffee to the real deal again for my morning coffee. My productivity has doubled. Still no DSL, though -- early next week is the current estimate, and I can hardly wait.

I went to a pub quiz last night with mates Macker, Tom and Alan -- a benefit for a new Dublin theatre company, I think. The prizes were:

  • First prize: several 50 Euro vouchers for various Dublin eateries
  • Second prize: two fancy scarves, a Nivea women's cosmetics kit, and a very metrosexual Nivea bath kit for a guy
  • Third prize: 4 bottles of nice wine

We did very nicely -- "aglet" was correctly defined for instance -- but not nicely enough. Put it this way: guess who's wearing Nivea deodorant?

Buying Consumer Electronics Online, in Ireland?

Hey lazyweb, hear my plea! What are my options for buying consumer electronics online, now that I'm back in Ireland?

I like online shopping. I dislike Argos, and I really hate Dixons, Currys and all the rest of the consumer-electronics high-street operations. Get me on the net and out of the nasty little shops and I'm happy. ;)

All in all, I'm a bit of an Amazon fan. However, now that I'm back in Ireland, I've been brought back to earth with a bang on that count; the prices are OK for items at both Amazon.com and .co.uk -- but shipping is turning out to be a total disaster.

Basically, I've put in two orders, paid through the nose for basic shipping, and neither has turned up. For example -- I ordered this phone a week and a half ago, on the 9th March, ponying up UKP 27 for the item -- and a painful UKP 7 for shipping by International Mail.

Delivery estimate on ordering was for between 5 and 7 days -- 14th to the 16th March. That was long enough -- but it still hasn't turned up, and Amazon.co.uk is still claiming that that is the current estimate, despite the 16th of March being 4 days ago ;)

On top of that, it appears they don't offer any way to track the packages using that shipping method, so who knows what's happening with the damn thing right now.

Compare that with an order I made at Amazon.com last November, in which I nabbed a handy FM transmitter for my iPod. In that case, I got it shipped by plain old US Postal Service for $4.51, which was handily discounted as Super Saver Shipping. That -- as with pretty much all my Amazon.com orders -- arrived in 3-4 days, and for a hell of a lot cheaper too. Even if I'd had to pay for shipping (which I didn't), $4.51 vs UKP 7 works out as a third of the price, no less.

I'm guessing this is mainly down to Amazon.co.uk being shoddy in terms of how it deals with shipping to Ireland, and there are probably sites that use better-quality shipping partners.

Surely there must be better deals with vendors in Ireland, or even elsewhere in the Eurozone? Anyone know? Please drop us a line in the comments!

Update: the items arrived -- 14 days after ordering. This is a moot point now, though, since Amazon.co.uk are no longer selling 'PC & Video Games, Toys & Games, Gift items, Electronics & Photo and Home & Garden items' to Ireland; I guess it was easier to give up on the Irish market for now. Very disappointing -- but I'm waiting to see what happens next.

VAST.com

So, my new employer just launched today!

It's a new search service, VAST.com. As the blog says, 'we are building a search service that extracts classified ads from across the web, structures them, and then makes them available via an open REST API for commercial and non-commercial uses.'

Now you can see why I'm excited ;)

Greetings from 1996!

    --> Sending: ATZ
    ATZ
    OK
    --> Sending: ATQ0 V1 E1 S0=0 &C1 &D2
    ATQ0 V1 E1 S0=0 &C1 &D2
    OK
    --> Sending: ATH1
    ATH1
    OK
    --> Modem initialized.
    --> Sending: ATDT1892150150
    --> Waiting for carrier.
    ATDT1892150150
    CONNECT 45333

45 measly kilobits per second! This is incredibly painful -- and expensive at 5 cents a minute! I briefly considered getting around it by hiring a 3G data-card for the couple of weeks before my DSL is activated -- but that too is insanely overpriced.

Hurry up, DSL...

Disclosure

As of yesterday, I have a new day-job.

I won't be working on email spam as part of the job, which is an interesting turn of events. However, I'll be sticking with the open-source Apache SpamAssassin project, and keeping up the rate of work on that [*].

I'm not sure how much I can blog about the new place just yet, but I will say it's certainly looking like it'll be very interesting work ;)

[*: modulo the next couple of weeks while I'm waiting for my bloody DSL to be installed. argh!]

Apple Attempting to Patent RSS Aggregation

Miguel de Icaza quotes Dave Winer, pointing out two patent applications from Apple which seem intended to grab major chunks of the feed syndication space as Apple "IP".

The first application is news feed viewer, 20050289147, filed April 13 2005:

A computer-implemented method for displaying a plurality of articles, the method comprising: storing a first feed bookmark in a folder, the first feed bookmark indicating a first feed, the first feed comprising a first plurality of articles; storing a second feed bookmark in the folder, the second feed bookmark indicating a second feed, the second feed comprising a second plurality of articles; aggregating the first feed and the second feed to form a third feed; and displaying the third feed.

I think there were many RSS readers that implemented this, and others from the patent application, before April 2005. I know Liferea, the one I use, has had UI-level aggregation since September 2004, with its VFolders.

Next, news feed browser, 20050289468, filed April 13 2005. This one contains a wide range of claims, but here's one that stands out as particularly trivial:

A computer-implemented method for discovering a feed, the method comprising: receiving a request to display a file; determining that the file includes relationship XML; determining that a Uniform Resource Locator (URL) within the relationship XML indicates a file that comprises the feed; and displaying one of a group containing the feed and a link to the feed.

That's pretty much RSS autodiscovery, as described in 2002.
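To underline how little there is to that claim, here's a short sketch of autodiscovery as I understand it: look for link elements with rel="alternate" and a feed MIME type, and resolve the href. The class name and sample page are invented for illustration; this isn't taken from any particular reader's code.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}

class FeedLinkFinder(HTMLParser):
    """Collect feed URLs advertised via <link rel="alternate"> elements."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.feeds = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = {name: (value or "") for name, value in attrs}
        rels = attr.get("rel", "").lower().split()
        is_feed = attr.get("type", "").lower() in FEED_TYPES
        if "alternate" in rels and is_feed and attr.get("href"):
            # Relative hrefs are resolved against the page's own URL.
            self.feeds.append(urljoin(self.base_url, attr["href"]))

def discover_feeds(html, base_url):
    finder = FeedLinkFinder(base_url)
    finder.feed(html)
    finder.close()
    return finder.feeds

page = """<html><head>
<link rel="stylesheet" href="/style.css">
<link rel="alternate" type="application/rss+xml" title="RSS" href="/index.rdf">
</head><body>hello</body></html>"""

print(discover_feeds(page, "http://example.org/blog/"))
# ['http://example.org/index.rdf']
```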

The listed inventors in both patents are: Kahn, Jessica; (San Francisco, CA) ; Alfke, Jens; (San Jose, CA) ; Wilkin, Sarah Anne; (Menlo Park, CA) ; Howard, Albert Riley JR.; (Sunnyvale, CA) ; Forstall, Scott James; (Mountain View, CA) ; Lemay, Stephen O.; (San Francisco, CA) ; Melton, Donald Dale; (San Carlos, CA) ; Loofbourrow, Wayne Russell; (San Jose, CA).

Thanks, Apple! and thanks, "inventors"!

It's important to note that this is still in the application stage, and as such can be invalidated, or narrowed down to a saner level, by using the techniques described here. I strongly recommend that people working in the syndication field with sufficient knowledge and expertise who feel strongly enough about this should spend a little time doing so, before the patent is issued and it becomes a multi-million-dollar task to invalidate it. (however, IANApatentL of course ;)

We Win

ongoing: The ASF Server:

Tim Bray: Which Apache project burns the most resources?

Mads: Spamassassin by a wide margin. [...]

Heh, we win ;)

Helios, the Zones server, has been an incredible resource for us. SpamAssassin isn't a traditional open-source software project in one respect: we use a lot of centralized "phone home" infrastructure to support rule and score generation. Having a virtualized server of this quality and horsepower to use for this has been fantastic.

(thanks to John O'Shea for the pointer!)

IBM Patents Closed-Loop Confirmation

Another day, another absurd IBM software patent. Via the IP list, here's United States Patent 7,003,497:

  1. A method for confirming an electronic transaction, comprising the steps of: performing an electronic transaction between a first party and a second party; providing, by the first party to the second party, contact information of a third party service provider associated with the first party; contacting, by the second party, the third party service provider to obtain a location of a predetermined, private mailbox associated with the first party; sending, by the second party, a request for confirmation of the electronic transaction to the predetermined, private mailbox associated with the first party; accessing the private mailbox by the first party; and sending, by the first party, a reply message to the request for confirmation to thereby confirm authorization of the electronic transaction, wherein information regarding the private mailbox is not communicated to the second party during the electronic transaction.

There's lots of waffle in the background section about this being for e-commerce transactions, but that claim, and claims 2 and 3 at least, are easily sufficiently broad to cover simple "confirmed opt-in" email subscription systems -- in other words, the system whereby a potential newsletter subscriber clicks on a link in order to "confirm" that they want to subscribe to a newsletter. That's the current best practice email subscription method used by pretty much everyone.
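For anyone who hasn't built one, confirmed opt-in is about this complicated. A bare-bones sketch, with sending the mail left out; the class, the URL and the token length are all invented for illustration and don't describe any particular mailing-list package.

```python
import secrets

class Subscriptions:
    """Minimal confirmed opt-in: nothing is subscribed until the link is hit."""

    def __init__(self):
        self.pending = {}       # token -> address awaiting confirmation
        self.confirmed = set()

    def request(self, address):
        """Record a signup request and return the link to mail to `address`."""
        token = secrets.token_urlsafe(16)
        self.pending[token] = address
        # Hypothetical confirmation URL; only the mailbox owner ever sees it.
        return f"https://lists.example.org/confirm?t={token}"

    def confirm(self, token):
        """Complete the signup if the token is known; tokens are single-use."""
        address = self.pending.pop(token, None)
        if address is None:
            return False
        self.confirmed.add(address)
        return True

subs = Subscriptions()
link = subs.request("reader@example.com")
token = link.split("t=", 1)[1]
print(subs.confirm(token), subs.confirm(token))  # True False
```

A request goes to a private mailbox and a reply from its owner closes the loop. Compare that against the wording of claim 1 and draw your own conclusions.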

Filed December 31, 2001. There was plenty of prior art before this date, but who would want to go up against IBM, no less, to attempt to get this invalidated, especially now that it's been issued?

Thanks USPTO, you're doing a heck of a job!

US Things I Miss

So, I've been back in Ireland for several weeks now. How goes the culture shock? Well, let's make a list of the stuff I'm missing from California:

  • C, who's still back there finishing up her contract. Hurry up, C!

  • All my friends I left behind in the US :( Come visit!

  • The weather (well duh)

  • Trader Joes: low-cost, high-quality organic and near-organic food

  • The excellent Mexican and Southern food. Mmm, Taco Mesa

  • Super-cheap cocktails -- although having good Guinness makes up for a lot of this

  • The back country -- desert, mountains, snow, national parks. Ireland may have more surviving history dotted about, but it's just flat. I miss the mountains

  • Netflix -- haven't spotted a replacement for this yet. There are companies in Ireland that use a similar idea, but it appears every one just about manages to screw it up and render it useless, generally by introducing throttling, late fees, or slow turnaround. meh

  • The way my Irish accent meant I could get away with pretty much anything. That trick doesn't work in Ireland ;)

In other news: the broadband choices situation has pretty much gone to shit.

It turns out that all the good options are quite dependent on local-loop unbundling, which -- somehow -- still hasn't gotten around to my local exchange. As a result, guess who's going to be stuck on the wrong end of dialup, no less, for "2 to 3 weeks" until Eircom deign to switch on the bitstream access for my new BT-resold ADSL connection? Here's hoping there's a neighbour with broadband and wifi when I move back in. Joy.

DearAOL and GoodMail

Things have really been heating up recently around the AOL/Goodmail "pay to send" CertifiedMail scheme -- the EFF and a host of other groups have launched dearaol.com, stating:

This system would create a two-tiered Internet in which affluent mass emailers could pay AOL a fee that amounts to an "email tax" for every email sent, in return for a guarantee that such messages would bypass spam filters and go directly to AOL members' inboxes. Those who did not pay the "email tax" would increasingly be left behind with unreliable service. Your customers expect that your first obligation is to deliver all of their wanted mail, and this plan is a step away from that obligation.

While I dislike this proposal, too, as far as I can tell, AOL actually have pretty reasonable intentions with this program -- nowhere near as bad as the DearAOL.com site makes out.

However, they're doing a really really crappy job of getting this information out there, or committing to reasonable limits on the program, such as announcing that they will use it only for transactional emails, as Yahoo! have done.

I'd strongly recommend reading Carl Hutzler's posting on the subject. Carl was AOL's head of anti-spam operations until last year, so he really knows what he's talking about, and he lays it out clearly -- a lot more clearly than any corporate statements from AOL do. His blog contains a fair bit more on the subject, too.

But seriously -- why isn't there a press release on the AOL site about this scheme? Some front-channel communication about now might be useful, I'd suggest, before things really get hairy -- this crapstorm is coming about partly because AOL's comments are all filtering out in dribs and drabs via third parties, and (AOLers say) are being misconstrued and misrepresented in the process. It's a classic case of missing the cluetrain.

I'd also really encourage the EFF people to tone down the rhetoric; statements like "senders will have no guarantee that their emails will be delivered" are scare-mongering, given that SMTP email already provides no such guarantee.

Update: wow, MoveOn went really overboard -- "threatening the Internet as we know it ... The very existence of online civic participation and the free Internet as we know it are under attack." OMG the sky is falling!

Side Issue: The Spam Definition

Also, another note to EFF: defining spam as "whatever you don't want to read" is a terrible mistake to make. That confuses a good, clear, enforceable and automatable definition of spam -- unsolicited bulk email -- and makes it effectively unenforceable by law, unpoliceable by ISPs, impossible to detect automatically, and incompatible with existing, effective EU and Australian legislation.

Listen to your own Chairman of the Board; he's right on this count.

PS: any luck fixing up the non-confirmed signups issue? Last time I checked I could still subscribe any address to the EFF Action Alerts without a cross-check, which is not a good thing.

Another script: goog-love.pl

A quick hack --

goog-love.pl - find out where your site's google juice comes from

This script will grind through your web site's "access.log" file (which must be in the "combined" log format). It'll pick out the top 100 Google searches found in the referer field, re-run those searches, and determine which ones are giving your website all the linky Google love -- in other words, the searches that your site 'wins' on.

The output is in plain text and a chunk of HTML.

usage:

goog-love.pl sitehost google-api-key < access.log > out.html

e.g.

cat /var/www/logs/taint.org.* | goog-love.pl \
  taint.org 0xb0bd0bb5yourgoogleapikeyhere0xdeadbeef | tee out.html

NOTE: this script requires the SOAP::Lite module be installed. Install it using apt-get install libsoap-lite-perl or cpan SOAP::Lite. It also requires a Google API key.

For example, here are the current results for this site. You can immediately see some interesting stuff that isn't obvious otherwise, such as my site being the top hit for [beardy justin] ;)

Download here (5 KiB perl script).

Notes:

  • if you see a lot of "502 Bad Gateway" errors, it's probably over-zealous anti-bot ACLs on Google's side. Try from another host.

  • Read the comments for notes on a bug in recent releases of SOAP::Lite; please let me know if you hear of them getting fixed ;)
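
For the curious, the first stage described above (picking the top Google searches out of the referer field of a "combined" format log) can be sketched roughly as follows. This is an illustrative Python sketch, not the code from goog-love.pl: the regex, the function names, and the assumption that Google search referers look like http://www.google.*/search?q=... are all mine. It does not re-run the searches, and it will skip log lines that contain escaped quotes.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Combined log format:
#   host ident user [date] "request" status bytes "referer" "user-agent"
# Only the referer field is captured here.
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]*\] "[^"]*" \d{3} \S+ '
    r'"(?P<referer>[^"]*)" "[^"]*"\s*$'
)


def google_query(referer):
    """Return the normalised search terms if the referer looks like a
    Google search URL, else None. (Assumed URL shape: /search?q=...)"""
    try:
        parts = urlsplit(referer)
    except ValueError:
        return None
    host = (parts.hostname or "").lower()
    if "google" not in host.split(".") or parts.path != "/search":
        return None
    terms = parse_qs(parts.query).get("q")
    if not terms:
        return None
    # Collapse whitespace and case so near-identical searches are counted together.
    return " ".join(terms[0].split()).lower()


def top_google_searches(lines, limit=100):
    """Count Google searches found in the referer field of each log line
    and return the `limit` most common as (query, hits) pairs."""
    counts = Counter()
    for line in lines:
        match = COMBINED_RE.match(line)
        if not match:
            continue  # not a combined-format line; skip it
        query = google_query(match.group("referer"))
        if query:
            counts[query] += 1
    return counts.most_common(limit)
```

Feeding it a log would look something like top_google_searches(open("access.log")), with the resulting list then handed to whatever re-runs the searches.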

Dublin Riots

While driving around Ireland on a wedding-location-scouting trip, we started receiving texts talking about riots in Dublin; I texted a friend, and got a reply along these lines: "Celtic-topped scobes run riot through O'Connell St, torching cars in Nassau street, hospitalising cops and Charlie Bird. madness!"

I thought he was joking, but nope. A load of IRA-slogan-shouting scumbags really had been allowed to run riot -- with paving stones of all things left unsecured in their midst! -- and it quickly got way, way out of hand.

The blog coverage is excellent, with lots of photos. I suggest starting with Indymedia Ireland, these Flickr photos and the links on this weblog. It appears the gardai really fell down on this one.

For what it's worth, I was in town a few hours later, and the rest of Dublin was trouble-free -- just the usual Saturday night goings-on. O'Connell St. was still a rubble-strewn mess when I passed through on Sunday, though.

SourceForge.net now offering public Subversion

Good news. It appears that SourceForge are now offering full, public use of Subversion for all projects on sf.net!

The SourceForge.net: Subversion (Version Control for Source Code) document contains full details on their setup. Notable key points:

  1. It's using authenticated HTTPS -- which is great, going by my experiences with the ASF's setup
  2. Imports are done from either an existing SF.net CVS repository using cvs2svn, from a Subversion 'svnadmin dump' file, or from a CVS repository tarball
  3. CIAbot support is offered as standard ;)

Awesome. I'll be trying this out with Uffizi, which I registered as a SourceForge project a few weeks ago for exactly this purpose. ;)

TREC Spam Corpus

Some news from TREC's Gordon Cormack:

The TREC 2005 Corpus (92,000 messages - 42,000 ham; 50,000 spam) is now available for self-serve download.

TREC Spam Evaluation is a NIST program to develop methods to measure spam filter accuracy and performance. More details here.

The corpus can be picked up at Gordon's site. As far as I can tell, this should be a pretty solid corpus for spam researchers and developers.

Four Things

I don't do silly blog antics much, but I got tagged by Mat for the Four Things meme. Looking around, it is indeed a bit more interesting than things like the usual LJ quiz, so why not!

I wrote this on the plane from LA to Dublin, which may have affected some of the selections -- in "4 places I would rather be right now", at least ;)

4 jobs I've had:

  • I was Iona Technologies' first employee, and stayed there for no less than 7 years. I got to see the company grow from a handful of people, most of whom weren't getting paid (hence how I wound up as the first employee ;), all the way up to a 300-strong multinational, while the company itself formed a core of Ireland's mini dot-com boom. That was fantastic fun, and educational to boot.

  • my Dad's gun/fishing/sporting-goods shop. Was it really a good idea to have a teenager working near firearms? At least I wasn't the one who unplugged the fridge where the maggots were kept, so that they all hatched over the course of one weekend...

  • A horrible teenage job -- picking tomatoes. I can still feel the orange dust under my fingernails every time I smell fresh tomatoes :( I didn't last very long at that at all.

  • writing an Amiga-based kiosk system for virtually no pay whatsoever, at the age of 18 or 19. Ah, exploitation.

4 movies I can watch over and over:

  • Koyaanisqatsi -- it's dating a little now, since every ad agency through the 90s ripped it off. But still, the invention of a new format. I remember looking at the 405 freeway in LA, and thinking "looks like something out of Koyaanisqatsi" -- of course, it was.

  • Princess Mononoke -- either that, or Nausicaa. I just love the way the characters are coloured in shades of grey, rather than black and white.

  • the Lord of the Rings trilogy -- oh dear I'm a hopeless Tolkien fanboy.

  • Spinal Tap -- pure genius.

4 places I've lived:

  • Melbourne, Australia; around the time of the annoying TV drama, The Secret Lives Of Us;

  • Newport Beach, CA; around the time of the annoying TV drama, The O.C.;

  • Dublin, Ireland; no annoying TV drama -- so far

  • University of California Irvine, CA; while Irvine itself is the most soulless suburban hellhole I've ever visited, living on the UCI campus is quite fun by comparison. Take about 1000 grad students, post-docs and lecturers from around the world; put them all in the same square mile or so; remove all fun (and bars!) from the surrounding areas; watch them make their own entertainment, or go mad.

4 tv shows I love:

4 places I've vacationed:

  • Annapurna Base Camp, Nepal; we trekked our way up to there, then trekked back down again. Unforgettable. I really want to do another Nepal trek as a result

  • car-camping around the Australian state of Victoria; they have some fantastic national park campsites, which most tourists overlook

  • learning how to dive in Ko Tao, Thailand; great setting, great dive sites, pretty cheap too!

  • Yosemite; amazing, world-class natural beauty. Californians don't realise just how lucky they've got it ;)

4 of my favourite dishes:

  • A good Thai green curry

  • Laos-style green papaya salad with sticky rice

  • a good meaty cassoulet, from Fandango in San Luis Obispo. At least, that was the tastiest meal I've had in recent months ;)

  • Mangosteen -- the queen of fruit, according to the Thais. I could, and probably have, eaten hundreds of these

4 places I would rather be right now:

  • spending New Year's Day with a bunch of friends in rural West Cork or County Galway; until I moved to the US, this was one of my favourite annual traditions.

  • the Stag's Head Bar, Dublin, in the snug, again with a bunch of friends

  • sitting on the grass outside the Pavilion bar in TCD, on a sunny summer's day (hmm, that's a lot of bars!)

  • Chiang Mai, Thailand

4 sites I visit daily:

4 people I'm tagging:

The Return of Sneakernet

Keith Dawson sent this on -- an interview with Jim Gray, head of Microsoft's Bay Area Research Center and winner of the ACM Turing Award, talking about new transmission systems for truly massive data collections. Very interesting:

[One] option is to send whole computers. .... We're now into the 2-terabyte realm, so we can't actually send a single disk; we need to send a bunch of disks. It's convenient to send them packaged inside a metal box that just happens to have a processor in it. I know this sounds crazy -- but you get an NFS or CIFS server and most people can just plug the thing into the wall and into the network and then copy the data.

Dave Patterson, interviewer: What's the difference in cost between sending a disk and sending a computer?

JG: If I were to send you only one disk, the cost would be double -- something like $400 to send you a computer versus $200 to send you a disk. But I am sending bricks holding more than a terabyte of data -- and the disks are more than 50 percent of the system cost. Presumably, these bricks circulate and don't get consumed by one use.

DP: Are you sending them a whole PC?

JG: Yes, an Athlon with a Gigabit Ethernet interface, a gigabyte of RAM, and seven 300-GB disks -- all for about $3,000.

DP: It's your capital cost to implement the Jim Gray version of "Netflicks." (jm: sic)

JG: Right. We built more than 20 of these boxes we call TeraScale SneakerNet boxes. Three of them are in circulation. We have a dozen doing TeraServer work; we have about eight in our lab for video archives, backups, and so on. It's real convenient to have 40 TB of storage to work with if you are a database guy. Remember the old days and the original eight-inch floppy disks? These are just much bigger.

DP: "Sneaker net" was when you used your sneakers to transport data?

JG: In the old days, sneaker net was the notion that you would pull out floppy disks, run across the room in your sneakers, and plug the floppy into another machine. This is just TeraScale SneakerNet. You write your terabytes onto this thing and ship it out to your pals. Some of our pals are extremely well connected -- they are part of Internet 2, Virtual Business Networks (VBNs), and the Next Generation Internet (NGI). Even so, it takes them a long time to copy a gigabyte. Copy a terabyte? It takes them a very, very long time across the networks they have.
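
To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The brick size (seven 300-GB disks) comes from the interview; the 24-hour shipping time and the 100 Mbit/s link speed used in the usage note are my own assumptions, purely for illustration, and the GB figures are taken as decimal.

```python
# Capacity of one TeraScale SneakerNet box: seven 300-GB disks (from the
# interview), assuming decimal gigabytes.
BRICK_BYTES = 7 * 300 * 10**9


def transfer_hours(num_bytes, link_bits_per_sec):
    """Hours needed to push num_bytes over a link sustaining the given
    bit rate (ignores protocol overhead)."""
    return num_bytes * 8 / link_bits_per_sec / 3600


def shipping_equivalent_mbps(num_bytes, shipping_hours):
    """Effective bandwidth, in Mbit/s, of physically shipping num_bytes
    in shipping_hours (ignores the time to copy data on and off the disks)."""
    return num_bytes * 8 / (shipping_hours * 3600) / 1e6
```

With those assumptions, shipping_equivalent_mbps(BRICK_BYTES, 24) comes out at roughly 194 Mbit/s sustained, while transfer_hours(BRICK_BYTES, 100e6) is a little under 47 hours on a fully saturated 100 Mbit/s link.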

E-Pending

Boing Boing has an interesting case today:

"I filled out a web form for a contest from Miller using a throwaway junk email address and then, months after I dumped the throwaway account, I got this to my main account! Not sure I like the idea of companies tracking me down like this."

I sent a mail to follow up on this, but it's worth blogging here too.

This is, unfortunately, common practice among the "legitimate" bulk mailer companies; it's called "e-pending" (short for "email address appending"). Basically, the advertiser contacts one of the big data-mining companies, provides them with the data they have about the customer -- name, postal address, etc. -- and gets them to match that against their database; the data-miner then provides any other email addresses they may have on file for that user, even if those email addresses were provided for bills, promotional use for other companies, etc.

The advertisers contend that permission was given by the person who's being mailed; the recipients contend that permission was given to send to a specific address, not all of that person's addresses in perpetuity.

Here's a few more examples of e-pending gone bad: two Jennifer Millers, Sony scraping ancient Internic contact addresses, Spamvertized.org comment on the practice, Joe St. Sauver comments.

It's exclusively a US phenomenon, as far as I know; I think most cases of e-pending are rendered illegal under EU data protection law. Handy. ;)

Update: Brian at the Spam Kings weblog notes that 'this spooky little spam was the work of Equifax, the big credit reporting agency that shut down its Boca Raton-based spam operation, Naviant, in 2003, due to the impending passage of CAN-SPAM.'

RFID in the Grauniad, and back in Dublin

Greetings from sunny Dublin, Ireland! (really!)

I'm now back in taint.org's native timezone, although precariously set up and experiencing occasional interruptions. If you're waiting for a mail from me, it may take a little more time.

I did have time to be interviewed last week by Karlin Lillington for this Guardian story:

To make sure customs agents could read his cat's chip to match him to his Pet Passport on return to Europe, Mason bought his own scanner at a cost of some £200. "I didn't want to risk the cat being impounded for six months' quarantine at Heathrow," he sighs.

It's true.

Happy to be back -- I think. Looking forward to my first pints, in over a year, of creamy Guinness in its native habitat. I also have a couple of half-written weblog entries I wrote on the plane...

Yahoo! delete b3ta newsletter mailing list?

Today's top item on the b3ta front page, under Site News:

Yahoo please talk to us! Help! - our yahoogroups list (with over 100,000 subscribers) has been deleted. We don't know why. If you work at Yahoo and can help us sort this out please contact me at robmanuel AT gmail dot com.

posted by rob on 10th Feb at 2pm

B3ta is a long-established UK humour site who send out a weekly newsletter, every Friday afternoon, using Yahoo! Groups as their mailing list service. They've been doing this for years. Yep, that's 100,000 subscribers.

Anyway, if anyone from Y!Groups, or anyone who knows someone there, is reading, please do get in touch with the b3ta guys -- this is a very serious catastrophe for them. I'd be curious to hear how/why this happened.

To tie this into spam-filtering and email operational topics, it brought this posting from Jeremy Zawodny to mind:

This all makes me wonder if it's worth it for smaller organizations to bother running their own mail servers anymore. If Google offered small business mail the way Yahoo does, there'd be some serious competition in the market and it'd make a lot of people's lives much easier.

While Jeremy was talking about a different service from list hosting, I think we're seeing the other side of the email-outsourcing coin, here.

Update: fwiw, it's back:

Yahoo update - on Friday Yahoo deleted our list of 100,000 newsletter readers email addresses, hence we didn't send a newsletter. Today they've been in touch and have promised a response by Tuesday. Fingers crossed. UPDATE: It looks like it's back! Hooray for Yahoo!

Broadband choices in Ireland

Perfect timing! Just 5 days before I return to Ireland, Damien Mulley posts 'Broadband choices in Ireland', a good overview of the options available for consumer broadband internet connection.

I've been out of the loop for quite a while, and spoilt by the options available in suburban Southern California (which are, of course, pretty good). But this is a lot better than what was on the table when I left, 3 years ago.

What strikes me is that the upload/download speeds are quite reasonable and pretty close to what you'd see in the US. Similarly, the prices are finally near to the going rate in the US, once the various limitations and add-ons (required 'bundles', state taxes etc.) are taken into consideration.

However, virtually all of these deals use the horrendous concept of download capping! Given that I use this stuff for work, and routinely rsync around 30GB chunks of email corpora between central offices, colo servers, and my desktop, this just won't fly. It could be argued that I'm therefore not a typical broadband consumer, who these deals have been carefully designed to cater for. But seriously -- if a telecommuting software developer isn't a typical broadband consumer, who the hell is? Hey telcos: a little flexibility goes a long way -- don't fence me in. ;)

All in all, it looks like Smart Telecom are the winners; 3Mb/s download, 512Kb/s upload -- and most importantly, no cap -- for EUR 35 per month. (And check out that XHTML/WAI-compliant website!)

I probably would have gone with Irish Broadband, but for the past 6 months the only thing I've been hearing about them via word-of-mouth has been bad news, detailing customer service meltdown after meltdown. Even the legendarily incompetent 'biddies' of Eircom seem to be getting better reviews nowadays.

Talking of Eircon, our dear old dirty-tricks-wielding celtic-tiger-throttling incumbent telco: the top Sponsored Link on a Google search for irish broadband is:

Irish Broadband

www.eircom.ie -- More speed, prices reduced by 25%, free modem & a free connection!

Scum.

Spamhaus comment on the AOL/Goodmail deal

AOL and Yahoo! have been making a lot of headlines with their plans to reduce their whitelist-management workload -- and make a little pay-to-send money on the side -- with a deal with Goodmail.

Now Spamhaus have gone on the record against the plan:

On Monday, Richard Cox, chief information officer at antispam organization Spamhaus, said that "an e-mail charge will destroy the spirit of the Internet."

"The Internet has become what it is because of freedom of communication. Open discussion is what gives it value. There should be no cost for particular services, and e-mail should be free and accessible to all. This will disenfranchise people."