
Category: Uncategorized

Irish ISPs in record company crosshairs

RTE reports that four record companies -- EMI, Sony BMG, Universal Music and Warner Music -- have brought a High Court action to compel Eircom, Ireland's largest ISP, to prevent its networks being used for the illegal downloading of music:

Willie Kavanagh, Managing Director of EMI Ireland and chairman of IRMA, said because of illegal downloading and other factors, the Irish music industry was experiencing a "dramatic and accelerating decline" in income. He said sales in the Irish market dropped 30% in the six years up to 2007.

EMI and the other companies are challenging Eircom's refusal to use filtering technology or other measures to voluntarily block or filter illegally downloaded material. Last October Eircom told the companies it was not in a position to use the filtering software.

(I wonder if those dropping sales in the Irish market comprise only CDs sold by Irish shops? 2001 to 2007 is also the time period when physical sales have given way to online shopping on a gigantic scale, especially for music.)

The Irish Times coverage includes another interesting factoid, which appears in a lot of press regarding this case:

Latest figures available, for 2006, indicate that 20 billion music files were illegally downloaded worldwide that year. The music industry estimates that for every single legal download, there are 20 illegal ones.

A little research reveals that the figure comes from the IFPI Digital Music Report 2008. I'd have a totally different take on it, however. In my opinion, the figure is probably correct, but not for the reasons the IFPI want it to be. There are a number of factors:

There's more commentary on the 20-to-1 figure here.

The IFPI Digital Music Report 2008 also notes:

“2007 was the year ISP responsibility started to become an accepted principle. 2008 must be the year it becomes reality”

Governments are starting to accept that Internet Service Providers (ISPs) should take a far bigger role in protecting music on the internet, but urgent action is needed to translate this into reality, a new report from the international music industry says today.

ISP cooperation, via systematic disconnection of infringers and the use of filtering technologies, is the most effective way copyright theft can be controlled. Independent estimates say up to 80 per cent of ISP traffic comprises distribution of copyright-infringing files.

The IFPI Digital Music Report 2008 points to French President Sarkozy’s November 2007 plan for ISP cooperation in fighting piracy as a groundbreaking example internationally. Momentum is also gathering in the UK, Sweden and Belgium. The report calls for legislative action by the European Union and other governments where existing discussions between the music industry and record companies fail to progress.

So it seems Ireland is the vanguard of an international effort by IFPI members to force ISPs to install filtering, worldwide. It seems the same happened in Belgium last year -- and I reckon there'll be similar cases elsewhere soon.

Either way, I doubt this will be good for Irish internet users.

(PS: while I'm talking about buying MP3s online -- a quick plug for 7digital. Last time I used them, I had a pretty crappy experience, but the situation is a lot better nowadays. They now have a great website that works perfectly in Firefox on Linux; they sell brand new releases like the Hercules and Love Affair album as 320kbps DRM-free MP3s; they support PayPal payments; and downloads are fast and simple -- right click, "Save As". hooray!)

Some other blog coverage: Lex Ferenda with some details about the legal situation, and Jim Carroll.

Update: EMI Ireland seem to be singing from a different hymn-sheet than their head office... interesting.

Update 2: I've taken a look at the Copysense filtering technology, and how it can be evaded.

Announcing IrishPulse

As I previously threatened, I've gone ahead and created a "Microplanet" for Irish twitterers, similar to Portland's Pulse of PDX -- an aggregator of the "stream of consciousness" that comes out of our local Twitter community: IrishPulse.

Here's what you can do:

Add yourself: if you're an Irish Twitter user, follow the user 'irishpulse'. This will add you to the sources list.

Publicise it: feel free to pass on the URL to other Irish Twitter users, and blog about it.

Read it: bookmark and take a look now and again!

In terms of implementation, it's just a (slightly patched) copy of Venus and a perl script using Net::Twitter to generate an OPML file of the Twitter followers. Here's the source. I'd love to see more "Pulse" sites using this...
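For anyone wanting to build something similar, here's a rough sketch of the OPML-generation step. It's in Python rather than Perl, and it is not the actual script: the follower names and feed URLs are made-up placeholders, and fetching the follower list from Twitter is left out entirely.

```python
import xml.etree.ElementTree as ET

def followers_to_opml(followers, title="IrishPulse sources"):
    """Build an OPML document with one RSS outline per follower.

    `followers` is a list of (screen_name, feed_url) pairs. How they are
    fetched from Twitter is outside the scope of this sketch.
    """
    opml = ET.Element("opml", version="1.1")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for screen_name, feed_url in followers:
        # Keyword arguments become XML attributes on the <outline> element.
        ET.SubElement(body, "outline", type="rss",
                      text=screen_name, xmlUrl=feed_url)
    return ET.tostring(opml, encoding="unicode")

# Hypothetical data, for illustration only.
example = [
    ("alice", "http://example.com/alice.rss"),
    ("bob", "http://example.com/bob.rss"),
]
opml_text = followers_to_opml(example)
print(opml_text)
```

The resulting file is what gets handed to the aggregator as its list of sources.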

Google’s CAPTCHA – not entirely broken after all?

A couple of weeks ago, WebSense posted this article with details of a spammer's attack on Google's CAPTCHA puzzle, using web services running on two centralized servers:

[...] It is observed that two separate hosts active on same domain are contacted during the entire process. These two hosts work collaboratively during the CAPTCHA break process. [...]

Why [use 2 hosts]? Because of variations included in the Google CAPTCHA image, chances are that host 1 may fail breaking the code. Hence, the spammers have a backup or second CAPTCHA-learning host 2 that tries to learn and break the CAPTCHA code. However, it is possible that spammers also use these two hosts to check the efficiency and accuracy of both hosts involved in breaking one CAPTCHA code at a time, with the ultimate goal of having a successful CAPTCHA breaking process.

To be specific, host 1 has a similar concept that was used to attack Live mail CAPTCHA. This involved extracting an image from a victim’s machine in the form of a bitmap file, bearing BM.. file headers and breaking the code. Host 2 uses an entirely different concept wherein the CAPTCHA image is broken into segments and then sent as a portable image / graphic file bearing PV..X file headers as requests. [...]

While it doesn't say as such, some have read the post to mean that Google's CAPTCHA has been solved algorithmically. I'm pretty sure this isn't the case. Here's why.

Firstly, the FAQ text that appears on "host 1" (thanks Alex for the improved translation!):


FAQ

If you cannot recognize the image or if it doesn’t load (a black or empty image gets displayed), just press Enter.

Whatever happens, do not enter random characters!!!

If there is a delay in loading images, exit from your account, refresh the page, and log in again.

The system was tested in the following browsers: Internet Explorer, Mozilla Firefox

Before each payment, recognized images are checked by the admin. We pay only for correctly recognized images!!!

Payment is made once per 24 hours. The minimum payment amount is $3. To request payment, send your request to the admin by ICQ. If the admin is free, your request will be processed within 10-15 minutes, and if he is busy, it will be processed as soon as possible.

If you have any problems (questions), ICQ the admin.

That reads to me a lot like instructions to human "CAPTCHA farmers", working as a distributed team via a web interface.

Secondly, take a look at the timestamps in this packet trace:

(image: packet trace)

The interesting point is that there's a 40-second gap between the invocation on "Captcha breaking host 1" and the invocation on "Captcha breaking host 2". There is then a short gap of 5 seconds before the invocations occur on the Gmail websites.

Here's my theory: "host 1" is a web service gateway, proxying for a farm of human CAPTCHA solvers. "host 2", however, is an algorithm-driven server, with no humans involved. A human may take 40 seconds to solve a CAPTCHA, but pure code should be a lot speedier.

Interesting to note that they're running both systems in parallel, on the same data. By doing this, the attackers can

  1. collect training data for a machine-learning algorithm (this is implied by the 'do not enter random characters!' warning from the FAQ -- they don't want useless training data)

  2. collect test cases for test-driven development of improvements to the algorithm

  3. measure success/failure rates of their algorithms, "live", as the attack progresses

Worth noting this, too:

Observation*: On average, only 1 in every 5 CAPTCHA breaking requests are successfully including both algorithms used by the bot, approximating a success rate of 20%. The second algorithm (segmentation) has very poor performance that sometimes totally fails and returns garbage or incorrect answers.

So their algorithm is unreliable, and hasn't yet caught up with the human farmers. Good news for Google -- and for the CAPTCHA farmers of Romania ;)

Update: here's the NYTimes' take, with broadly agreeing comments from Brad Taylor of Google. (The Register coverage is off-base, however.)

On the effects of lowering your SpamAssassin threshold

So I was chatting to Danny O'Brien a few days ago. He noted that he'd reduced his SpamAssassin "this is spam" threshold from the default 5.0 points to 3.7, and was wondering what that meant:

I know what it means in raw technical terms -- spamassassin now marks anything >3.7 as spam, as opposed to the default of five. But given the genetic algorithm way that SA calculates the rule scoring, what does lowering the score mean? That I'm more confident that stuff marked ham is stuffed marked ham than the average person? That my bayesian scoring is now really good?

Do people usually do this without harmful side-effects? What does it mean about them if they do it?

Does it make me a good person? Will I smell of ham? These are the things that keep me awake at night.

It's a good question! Here's what I responded with -- it occurs to me that this is probably quite widely speculated about, so let's blog it here, too.

As you tweak the threshold, it gets more or less aggressive.

By default, we target a false positive rate of less than 0.1% -- that means 1 FP, a ham marked as spam incorrectly, per 1000 ham messages. Last time the scores were generated, we ran our usual accuracy estimation tests, and got a false positive rate of 0.06% (1 in 1667 hams) and a false negative rate of 1.49% (1 in 67 spams) for the default threshold of 5.0 points. That's assuming you're using network tests (you should be) and have Bayes training (this is generally the case after running for a few weeks with autolearning on).

If you lower the threshold, then, that trades off the false negatives (reducing them -- less spam getting past) in exchange for more false positives (hams getting caught). In those tests, here are some figures for other thresholds:

SUMMARY for threshold 3.0: False positives: 290 0.43% False negatives: 313 0.26%

SUMMARY for threshold 4.0: False positives: 104 0.15% False negatives: 1084 0.91%

SUMMARY for threshold 4.5: False positives: 68 0.10% False negatives: 1345 1.13%

so you can see FPs rise quite quickly as the threshold drops. At 4.0 points, the nearest to 3.7, roughly 1 in 667 ham messages (0.15%) will be marked incorrectly as spam. That's 2.5 times as many FPs as the default setting's value (0.06%). On the other hand, only about 1 in 110 spams (0.91%) will be mis-filed, compared to 1 in 67 at the default.
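If you want to do the "1 in N" sums yourself, here's a small Python sketch. The percentages are the ones quoted in this post; the function name and the rounding to the nearest whole message are my own choices for illustration.

```python
def one_in_n(rate_percent):
    """Convert an error rate in percent to the N in '1 in N' (rounded)."""
    return round(100.0 / rate_percent)

# (false positive %, false negative %) quoted above, keyed by threshold.
rates = {
    3.0: (0.43, 0.26),
    4.0: (0.15, 0.91),
    4.5: (0.10, 1.13),
    5.0: (0.06, 1.49),
}

for threshold, (fp, fn) in sorted(rates.items()):
    print(f"threshold {threshold}: 1 in {one_in_n(fp)} hams caught, "
          f"1 in {one_in_n(fn)} spams missed")
```

Bear in mind these are estimates from one test corpus; your own mail mix will give different numbers.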

Here's the reports from the last release, with all those figures for different thresholds -- should be useful for figuring out the likelihoods!

In fact, let's get some graphs from that report. Here is a graph of false positives (in orange) vs false negatives (in blue) as the threshold changes...

and, to illustrate the details a little better, zoom in to the area between 0% and 1%...

You can see that the default threshold of 5.0 isn't where the FP% and FN% rates meet; instead, it's got a much lower FP% rate than FN%. This is because we consider FPs to be much more dangerous than missed spams, so we try to avoid them to a higher degree.

An alternative, more standardized way to display this info is as a Receiver Operating Characteristic curve, which is basically a plot of the true positive rate vs the false positive rate, on a scale from 0 to 1.
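To make that concrete, here's a sketch of how the FP%/FN% figures map onto ROC coordinates, treating spam as the "positive" class: the x value is the false positive rate, and the y value (true positive rate) is one minus the false negative rate. Only the four thresholds quoted in this post are used, so it gives a handful of points rather than a full curve.

```python
# (threshold, FP%, FN%) as quoted in this post.
figures = [
    (3.0, 0.43, 0.26),
    (4.0, 0.15, 0.91),
    (4.5, 0.10, 1.13),
    (5.0, 0.06, 1.49),
]

def roc_point(fp_percent, fn_percent):
    """Return (false positive rate, true positive rate) on a 0..1 scale."""
    return fp_percent / 100.0, 1.0 - fn_percent / 100.0

# One (threshold, fpr, tpr) triple per threshold.
roc = [(t, *roc_point(fp, fn)) for t, fp, fn in figures]

for threshold, fpr, tpr in roc:
    print(f"threshold {threshold}: FPR={fpr:.4f} TPR={tpr:.4f}")
```

Keeping the threshold alongside each point is exactly the labelling that makes the plot useful for picking a setting.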

Here's the SpamAssassin ROC curve:

More usefully, here's the ROC curve zoomed in nearer the "perfect accuracy" top-left corner:

Unfortunately, this type of graph isn't much use for picking a SpamAssassin threshold. GNUplot doesn't allow individual points to be marked with the value from a certain column, otherwise this would be much more useful, since we'd be able to tell which threshold value corresponds to each point. C'est la vie!

Update: this is possible with GNUplot 4.2 onwards, it seems. Great news! Hat tip to Philipp K Janert for the advice. Here are updated graphs using this feature:

(GNUplot commands to render these graphs are here.)

Update again: much better interactive Flash graphs here.

Microplanets

Intriguing! Via Glynn Moody comes an interesting new site, Pulse of Open Source:

To highlight open source activity on Twitter, I have launched a new web application today called The Pulse of Open Source. This is the stream of collective consciousness from the open source community on Twitter. You can follow this stream by simply bookmarking the site and visiting regularly or by adding the RSS feed to your feed reader. You can also create a Twitter account and add the individuals you’d like to follow to your own Twitter friends list if you’d prefer. There is also a mobile version of the site for on-the-go viewing.

I'm not entirely convinced it makes sense -- the "open source community" is a pretty wide and amorphous concept, covering "enterprisey" types like Iona, to conference organisers, to web standards guys to GNOME developers. That's a wide range.

However, that site links to the original, and a version which resonates better: PulseOfPDX.com, 'the stream of Portland's collective consciousness'. Basically, this is a local syndication site, with microblogging from a community of local Twitterers. Similar to the "Planet" concept, which aggregates posts from multiple weblogs into a new 'river of news' combined feed, as seen on Planet Antispam, Planet Perl, Planet.journals.ie, but for off-the-cuff Twitter microblog comments. It's a microplanet, to coin a phrase.

I think I might set up one of these for Ireland... what a great idea!

Update: Ted Leung posted about this today as well, I see, linking to this call for an "out-of-the-box" Twitter aggregator:

In theory, this whole pulse idea could be packaged up to be as easily deployable as “planet” sites. Here, “pulse” is the operational brand-name of aggregating Twitter accounts, where as “planet” is the tried and true operational brand-name of aggregating blogs.

I think I still prefer "microplanet" ;)

Update 2: check out IrishPulse!

Bea picture of the week


Another super-cute photo of Bea, from the latest batch. Nowadays, my photos are all-Bea, all of the time...


Plug plug

It's been a while since I've posted about good shopping experiences I've had. Here's a couple:

SoleTrader.co.uk: I'm a terrible shopper. I hate shops, I always wind up having to visit them at their busiest times on the weekend, and the last time I tried to go shopping for a new pair of shoes, I got caught in torrential rain, fell over and broke my thumb instead. seriously. So feck that.

Instead, I resolved to buy them online, and that I did -- from SoleTrader. They had a great range of trainers, I found what I was after, the price was grand, and delivery on time. Shoes are always the same size -- their sizes are standardised, after all -- so naturally they fit fine. All in all, it worked out great.

Be Organic: these guys operate in North Dublin, delivering bags of organic fruit and vegetables to your door, weekly. We get the Essential Fruit Bag and the Mini Box, with a bi-weekly bag of spuds on top, for EUR 32 per week. The quality of the food is absolutely fantastic, there's never any spoilage or wilting, and it's always fresh and delicious. Compared to supermarket fare, it's leagues ahead. They've also been grand and flexible when we need to tweak the order slightly -- for example we have a veto on celery, and that's not an issue at all. The only problem would be that they've recently increased their prices... but unfortunately that seems to be a general problem in Ireland these days!

Vote for Dustin on Saturday

A friend of a friend writes:

Unless you are pretty good at avoiding the media, you will be aware that Dustin the Turkey has been chosen as one of six finalists for RTE's Eurosong, the winner of which will go on to represent Ireland in the Eurovision Song Contest in Serbia in May.

What you may not be aware of is that I wrote and recorded the song with him and need your votes to help get me to Serbia!!!

The TV show will be broadcast live on RTE this Saturday Feb 23rd, at 7pm. It is a televote (a la X-factor format), so get your mobile phones ready. The results are at 9:45pm.

The song, Irlande Douze Points, is a parody on the current types of songs, acts and block-voting in the Eurovision. It may make your ears bleed a bit, you may ask yourself why, but what the hell, send someone you know to the final!!!

Apparently, Dustin urges the contest judges to "give douze points to Ireland, for its lowlands and its highlands, for Terry Wogan's wig and Bono's leather pants. We brought you Guinness and Westlife, 800-years of war and strife, but we all apologise for Riverdance."

Check out the outraged reactions from Ireland's past Eurovision "winners":

Frank McNamara, who wrote two of the Irish Eurovision winners, asked whether RTE, the state broadcaster that selected the six acts, was “giving two fingers” to Irish 'song'writers. “I think it is absolutely disgraceful."

Shay Healy, who wrote Johnny Logan’s Eurovision hit What’s Another Year?, wondered “how any bunch of grown-ups could come up with this as a solution”

Phil Coulter thought that Eurovision was going “down the tubes”.

The choice on Saturday is between a turkey puppet taking the piss in a Northside accent, and such po-faced "serious pop" mawkfests as '"Double Cross My Heart" performed by Donal Skehan' and '"Time to Rise" performed by Maya'. snore. You know it's got to be the turkey.

Here's the official Bebo page, and the Facebook group -- and here's the song itself:

Update: actually, here's another, higher quality clip -- with an entirely different song! Let's hope this is the one...

Update 2: he won. Dana and the other professional Eurovision types have been chewing wasps, it's hilarious!

A historical DailyWTF moment

Today, in work, we wound up discussing this classic DailyWTF.com article -- "Remember, the enterprisocity of an application is directly proportionate to the number of constants defined":

public class SqlWords
{
  public const string SELECT = " SELECT ";
  public const string TOP = " TOP ";
  public const string DISTINCT = " DISTINCT ";
  /* etc. */
}

public class SqlQueries
{
  public const string SELECT_ACTIVE_PRODCUTS =
    SqlWords.SELECT +
    SqlWords.STAR +
    SqlWords.FROM +
    SqlTables.PRODUCTS +
    SqlWords.WHERE +
    SqlColumns.PRODUCTS_ISACTIVE +
    SqlWords.EQUALS +
    SqlMisc.NUMBERS_ONE;
  /* etc. */
}

This made me recall the legendary source code for the original Bourne shell, in Version 7 Unix. As this article notes:

Steve Bourne, at Bell Labs, worked on his version of shell starting from 1974 and this shell was released in 1978 as Bourne shell. Steve previously was involved with the development of Algol-68 compiler and he transferred general approach and some syntax sugar to his new project.

"Some syntax sugar" is an understatement. Here's an example, from cmd.c:

LOCAL REGPTR    syncase(esym)
        REG INT esym;
{
        skipnl();
        IF wdval==esym
        THEN    return(0);
        ELSE    REG REGPTR      r=getstak(REGTYPE);
                r->regptr=0;
                LOOP wdarg->argnxt=r->regptr;
                     r->regptr=wdarg;
                     IF wdval ORF ( word()!=')' ANDF wdval!='|' )
                     THEN synbad();
                     FI
                     IF wdval=='|'
                     THEN word();
                     ELSE break;
                     FI
                POOL
                r->regcom=cmd(0,NLFLG|MTFLG);
                IF wdval==ECSYM
                THEN    r->regnxt=syncase(esym);
                ELSE    chksym(esym);
                        r->regnxt=0;
                FI
                return(r);
        FI
}

Here are the #define macros Bourne used to "Algolify" the C compiler, in mac.h:

/*
 *      UNIX shell
 *
 *      S. R. Bourne
 *      Bell Telephone Laboratories
 *
 */

#define LOCAL   static
#define PROC    extern
#define TYPE    typedef
#define STRUCT  TYPE struct
#define UNION   TYPE union
#define REG     register

#define IF      if(
#define THEN    ){
#define ELSE    } else {
#define ELIF    } else if (
#define FI      ;}

#define BEGIN   {
#define END     }
#define SWITCH  switch(
#define IN      ){
#define ENDSW   }
#define FOR     for(
#define WHILE   while(
#define DO      ){
#define OD      ;}
#define REP     do{
#define PER     }while(
#define DONE    );
#define LOOP    for(;;){
#define POOL    }


#define SKIP    ;
#define DIV     /
#define REM     %
#define NEQ     ^
#define ANDF    &&
#define ORF     ||

#define TRUE    (-1)
#define FALSE   0
#define LOBYTE  0377
#define STRIP   0177
#define QUOTE   0200

#define EOF     0
#define NL      '\n'
#define SP      ' '
#define LQ      '`'
#define RQ      '\''
#define MINUS   '-'
#define COLON   ':'

#define MAX(a,b)        ((a)>(b)?(a):(b))

Having said all that, the Bourne shell was an awesome achievement; many of the coding constructs we still use in modern Bash scripts, 30 years later, are identical to the original design.

Technorati bloginfo API weirdness

For the benefit of other Technorati API users...

In a comment on this entry, Padraig Brady mentioned that his blog had mysteriously disappeared from the Irish Blogs Top 100 list.

I investigated, and found something odd -- it seems Technorati has made a change to their bloginfo API. It now lists some weblogs with their 'rank', but with important metadata like 'inboundblogs' and 'inboundlinks' left empty, and with a 'lastupdate' time set to the epoch (1970-01-01 00:00:00 GMT). Here's an example:

<!-- generator="Technorati API version 1.0" -->
<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN"
                 "http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
<document>
    <result>
        <url>http://www.pixelbeat.org</url>
                    <weblog>
                <name>Pádraig Brady</name>
                <url>http://www.pixelbeat.org</url>
                <rssurl></rssurl>
                <atomurl></atomurl>
                <inboundblogs></inboundblogs>
                <inboundlinks></inboundlinks>
                <lastupdate>1970-01-01 00:00:00 GMT</lastupdate>
                <rank>74830</rank>
            </weblog>
                            </result>
</document>
</tapi>

Compare that with this lookup result, on my own blog:

<?xml version="1.0" encoding="utf-8"?>
<!-- generator="Technorati API version 1.0" -->
<!DOCTYPE tapi PUBLIC "-//Technorati, Inc.//DTD TAPI 0.02//EN"
                 "http://api.technorati.com/dtd/tapi-002.xml">
<tapi version="1.0">
<document>
    <result>
        <url>http://taint.org</url>
                    <weblog>
                <name>taint.org: Justin Mason’s Weblog</name>
                <url>http://taint.org</url>
                <rssurl>http://taint.org/feed</rssurl>
                <atomurl>http://taint.org/feed/atom</atomurl>
                <inboundblogs>143</inboundblogs>
                <inboundlinks>227</inboundlinks>
                <lastupdate>2008-02-12 11:48:10 GMT</lastupdate>
                <rank>43404</rank>
            </weblog>
                            <inboundblogs>143</inboundblogs>
                            <inboundlinks>227</inboundlinks>
            </result>
</document>
</tapi>

This bug had caused a number of blogs to be dropped from the list, since I was using "inboundblogs and inboundlinks == 0" as an indication that a blog was not registered with Technorati.

It's now worked around in my code, although a side-effect is that blogs affected by this will appear with question-marks in the 'inboundblogs' and 'inboundlinks' columns, and will perform poorly in the 'ranked by inbound link count' table (unsurprisingly).
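For other API users hitting the same thing, here's a sketch of the kind of workaround involved: treat an empty element as "unknown" rather than as zero. It's Python, not my actual code. The sample XML is a trimmed-down version of the response above, and treating a missing weblog element as "not registered" is an assumption on my part.

```python
import xml.etree.ElementTree as ET

# Trimmed-down version of the odd response shown above.
SAMPLE = """<tapi version="1.0"><document><result>
<weblog>
  <name>Example</name>
  <inboundblogs></inboundblogs>
  <inboundlinks></inboundlinks>
  <lastupdate>1970-01-01 00:00:00 GMT</lastupdate>
  <rank>74830</rank>
</weblog>
</result></document></tapi>"""

def int_or_none(text):
    """Empty or missing element text means 'unknown', not zero."""
    text = (text or "").strip()
    return int(text) if text else None

def parse_bloginfo(xml_text):
    weblog = ET.fromstring(xml_text).find("./document/result/weblog")
    if weblog is None:
        return None  # assumption: no <weblog> at all means not registered
    return {
        "rank": int_or_none(weblog.findtext("rank")),
        "inboundblogs": int_or_none(weblog.findtext("inboundblogs")),
        "inboundlinks": int_or_none(weblog.findtext("inboundlinks")),
    }

info = parse_bloginfo(SAMPLE)
print(info)
```

A blog with a rank but `None` for the inbound counts is then kept in the list, and shown with question-marks.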

I've posted a query to the support forum -- let's see what the story is.

Interesting Irish Blog Awards shortlistee

This year's Irish Blog Awards shortlists were posted yesterday. I maintain the Irish Blogs Technorati Top 100 list, so good sources of Irish blog URLs are always welcome; I took the shortlisted blogs and added them all.

Interestingly, straight in at number 2 went towleroad.com (warning: not worksafe!). It has a staggering Technorati rank of 1074 -- way ahead of Donncha's 5831 or Mulley's 8678. I was pretty curious as to how an Irish blog could hit those heights without me having heard of it, so I took a look.

Let's just say the content isn't quite what you'd expect to find in a blog shortlisted for 'Best News/Current Affairs Blog' -- a little bit short on Irish news, but heavy on pictures of naked guys getting off with each other. ;)

I took a quick glance, and I couldn't spot any Irish content. WHOIS says the publisher is LA-based, so I'm curious as to what qualified it as an "Irish blog"...

(By the way, I tried to leave a comment on the blog entry, but it appears Akismet is marking my comments as spam on a number of Wordpress-based blogs at the moment. Yes, I am aware of the irony. No, if SpamAssassin was a blog-spam filter, it wouldn't do that ;) )

Update: it's sorted -- they're now gone. Also, it appears I've been removed from the Akismet blacklist, yay.

More on the Trend Micro patent

Dutch free knowledge and culture advocacy group ScriptumLibre.org has called for a worldwide boycott of Trend Micro products. Their chairman, Wiebe van der Worp, claims Trend Micro's aggressive use of litigation is "well beyond the borders of decency".

Also, this Linux.com feature has a great quote from Jim Zemlin, the executive director of the Linux Foundation:

"A company that files a patent claim against code coming from a widely adopted open source project vastly underestimates the self-inflicted damage to its customer and community relationships. In today's world, all of our customers in the software industry are enjoying the benefits of a wide variety of open source projects that provide stability and vendor-neutral solutions to the most basic of their computing needs. I talk to those customers every day. They consider these claims short-sighted and those that assert them to be fearful of their ability to compete in today's economy."

Well said.

Plug: Lenovo service still rocks

I needed to buy a new laptop for work a few months back, and after a little agonizing between the MacBook Pro and a Thinkpad T61p, I plumped for the latter. As I noted at the time, one of the major selling points was the quality of IBM/Lenovo's after-sales warranty service, compared to the atrocious stories I'd heard about AppleCare in Europe. I was, however, taking a leap of faith -- I had used IBM service to great effect in the US, but had never actually tried it out in Ireland.

Sadly, I had to put this to the test today, after the hard disk started producing these warnings:

/var/log/messages:Feb  7 11:21:13 wall kernel: [2075890.116000] end_request: I/O error, dev sda, sector 116189461
/var/log/messages:Feb  7 11:21:38 wall kernel: [2075914.824000] end_request: I/O error, dev sda, sector 116189460
/var/log/messages:Feb  7 11:24:18 wall kernel: [2076075.072000] end_request: I/O error, dev sda, sector 116189462
/var/log/messages:Feb  7 11:25:05 wall kernel: [2076121.932000] end_request: I/O error, dev sda, sector 116189463

It's a brand new machine, and a Hitachi TravelStar 7K100 drive, with a good reputation for reliability -- but these things do happen. :(

Interestingly, I thought this was a case of the Bathtub curve in action -- but this comprehensive CMU study of hard drive reliability notes that the 'infant mortality' concept doesn't seem to apply to current hard-drive technology:

Replacement rates [of hard drives in a cluster] are rising significantly over the years, even during early years in the lifecycle. Replacement rates in HPC1 nearly double from year 1 to 2, or from year 2 to 3. This observation suggests that wear-out may start much earlier than expected, leading to steadily increasing replacement rates during most of a system’s useful life. This is an interesting observation because it does not agree with the common assumption that after the first year of operation, failure rates reach a steady state for a few years, forming the “bottom of the bathtub”.

Anyway, I digress.

I ran the BIOS hard disk self-test, got the expected failure, then rang up Lenovo's International Warranty line for Ireland. I got through immediately to a helpful guy in India, and gave him my details and the BIOS error message; he had no tricky questions, no guff about me using Linux rather than Windows, and there were no attempts to sting me for shipping.

There's now a replacement HD (and a set of spare recovery disks, bonus!) winging their way via 2-day shipping, expected on Tuesday; I'm to hand over the broken HD to the courier once it arrives. Fantastic stuff!

Assuming the courier doesn't screw up, this is yet another major win for IBM/Lenovo support, and I feel vindicated. ;)

Update: the HD arrived this morning at 10am -- a day early. Very impressive!

CEAS needs your ham

CEAS 2008 is doing another Spam Challenge test of various spam-filters, and as part of this, they need samples of ham mail messages.

As part of the data collection effort, we have set up a website through which it is possible to donate non-sensitive legitimate email, to be used in the evaluation. Any kind of email that the recipient considers legitimate is welcome, including computer generated (non-spam) messages.

After the CEAS evaluation, the benchmark data will be made publicly available to facilitate future research and development in the field of spam prevention.

Here is the collection site; they accept UNIX mbox format, and tar.gz or zip files of same, with an 8MB upload limit.

Remote sound playback through a Nokia 770

For a while now, I've been using various hacks to play music from my Linux laptop, holding my main music collection, to client systems which drive the speakers.

Previously, I used this setup to play via my MythTV box. Nowadays, however, my TV isn't in the room where I want to listen to music. Instead, I have my Nokia 770 hooked up to the speakers; this plays the BBC Radio 4 RealAudio streams nicely, and also the laptop's MP3 collection using a UPnP AV MediaServer.

I specifically use TwonkyMedia right now, playing back via the N770's Media Streamer app. (That works pretty well -- UPnP AV is one of those standards plagued with incompatibilities, but TwonkyMedia and Media Streamer seem to be a reliable combination.)

However, TwonkyMedia sometimes fails to notice updates of the library, and nothing has quite as good a music-player user interface as JuK, the KDE music player and organiser app, so a way to play directly from the laptop instead of via UPnP would be nice...

A weekend's hacking reveals that this is pretty easily done nowadays, thanks to some cool features in pulseaudio, the current standard sound server on Ubuntu gutsy, and the Esound server running on the N770.

Unfortunately, the N770 doesn't (yet) support pulseaudio directly, otherwise we could use its seriously cool support for RTP multicast streams. Still, we can hack something up using the venerable "esd" protocol (again!) Here's how to set it up...

On the N770:

You need to fix the N770's "esd" sound server to allow public connections. Set up your wifi network's DHCP server to give the N770 a static IP address. Log in over SSH, or fire up an xterm. Run the following:

mv /usr/bin/esd /usr/bin/esd.real

cat > /usr/bin/esd <<'EOM'
#!/bin/sh
# Wrapper: run the real esd listening on TCP port 5678, passing on any arguments
exec /usr/bin/esd.real -tcp -public -promiscuous -port 5678 "$@"
EOM

chmod 755 /usr/bin/esd
/etc/init.d/esd restart

On the server:

Download this file, and save it as n770.pa. Edit it, and change server=n770:5678 on the fourth line to use the IP address or hostname of your Nokia 770 instead of n770. Then run:

cp n770.pa ~/.n770.pa

cat > ~/bin/sound_n770 <<EOM
#!/bin/sh
pulseaudio -k; pulseaudio -nF $HOME/.n770.pa &
EOM

cat > ~/bin/sound_here <<EOM
#!/bin/sh
pulseaudio -k; pulseaudio &
EOM

chmod 755 ~/bin/sound_here ~/bin/sound_n770

Now you just need to run '~/bin/sound_n770' to redirect sound playback to the N770, and '~/bin/sound_here' to reset back to laptop speaker output, for the entire desktop environment. Nifty!

Update: it appears that things may work more reliably if you add "rate=22050" at the end of the "load-module module-esound-sink" line -- this halves the bitrate of the network stream, which copes better with harsh wifi network conditions. The n770.pa file above now includes this.
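The hand-edit of n770.pa described above can also be scripted. This is only a rough sketch: the function name set_n770_host is made up for illustration, and it assumes the file contains the literal text server=n770:5678 exactly as described, which I haven't verified against every version of the file.

```sh
#!/bin/sh
# Hypothetical helper: write a copy of n770.pa that points at a given host.
# $1 = IP address or hostname of the N770
# $2 = the downloaded n770.pa
# $3 = where to write the edited copy
set_n770_host() {
    host=$1
    src=$2
    dest=$3
    # Replace the placeholder hostname "n770", leaving the port untouched.
    sed "s/server=n770:5678/server=${host}:5678/" "$src" > "$dest"
}

# Example use (192.168.1.50 is a placeholder address):
#   set_n770_host 192.168.1.50 n770.pa "$HOME/.n770.pa"
```

Used this way it replaces the "cp n770.pa ~/.n770.pa" step, since it writes the edited copy directly.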

Irish crumblies don’t trust blogs

It appears a public relations firm, Edelman's, recently performed a phone survey which concluded that bloggers are the "least trusted" source of information in Ireland. This has been widely reported:

on Edelman Dublin's blog:

When we consider who we trust the most as a spokesperson in Ireland, the most trusted sources of information include, financial or industry analysts at 62%, followed by a doctor or healthcare specialist at 57%, an NGO representative at 57% and academics at 53%. Bloggers are the least trusted at 7%.

at Silicon Republic:

Bloggers have emerged as the “least trusted” group in the country.

and on ElectricNews.net:

"What has been interesting to note in this year's findings is the apparent low standings of bloggers and social media in general," said [Mark Cahalane, managing director of Edelman Dublin]. "One interpretation of the survey would be that bloggers have now entered the mainstream and people no longer distinguish between blogs and ordinary websites. This is also reflected by the fact that numerous high profile bloggers are widely quoted in the media."

However, as Damien noted, Piaras Kelly raised a very significant point about this -- 'the people surveyed for the research had to fit a certain demographic, including having to be aged between 35-64.' [...] 'A Generational gap is evident.' This press release corroborates that. Sure enough, most blog readers (and writers) would tend to be of the younger generation -- a pretty key point, one would assume, but one that most of the non-blogger coverage has omitted ;)

(Update: the term "authority figure" wasn't quite correct; replaced with what Edelman themselves use, "source of information".)

Trend Micro’s attack on open source

Trend Micro are demanding that Barracuda Networks pay licensing fees, alleging that they infringe U.S. Patent No. 5,623,600 with their use of the open-source anti-virus tool ClamAV. Here's a Barracuda press release, and here's some details from Barracuda:

Trend Micro alleges that Barracuda Networks and ClamAV infringe on Trend Micro's U.S. Patent No. 5,623,600. Barracuda Networks believes that the patent is invalid due to prior art and further believes that neither its products nor the ClamAV software infringe the patent.

On Sept. 21, 2006, Trend Micro sent Barracuda Networks a letter regarding a license to Trend Micro's '600 patent. After several discussions on paying a license for the patent, Trend Micro demanded Barracuda Networks either remove ClamAV from its products or pay a patent license fee. Barracuda Networks felt it had no choice other than to file for a declaratory judgment in early 2007 in U.S. Federal Court to invalidate Trend Micro's '600 patent and end continued legal threats against Barracuda Networks for use of the free and open source ClamAV software.

Trend Micro subsequently responded to that declaratory action and more recently, Trend Micro filed a claim with the International Trade Commission (ITC). The ITC voted to investigate the claim in December 2007. Trend Micro's ITC claim alleges that Barracuda Networks infringes on Trend Micro's '600 patent, but effectively implies that anyone using the free and open source ClamAV software at the gateway infringes the patent.

The interesting aspects of this case, from my point of view, are twofold -- the patent is a classic bad software patent, very broad and totally obvious both now and at the time it was issued; and it hinges on Barracuda's use of the free software antivirus product, ClamAV. Given Apache SpamAssassin's prevalence in many anti-spam mail filtering appliances (including Barracuda!), this is a very worrying precedent for us -- our product could be next, for some other patent troll company's extortion scheme.

For what it's worth, it appears this patent has long been a licensing moneyspinner for Trend. In 1997, once the patent was issued, Trend went on a spree; McAfee, Symantec and Integralis were sued, eventually buying licenses, as did Electric Mail Company. 2 years ago, Fortinet were sued and settled in their case.

I happily gave Barracuda a quote for their press release on this:

"Trend Micro's actions are clearly an attack on free and open source software and its users, as well as on Barracuda Networks. The '600 patent covers a trivial method, one which was obvious to anyone skilled in the art at the time the patent was written, and should be rendered invalid as soon as possible. I hope that Barracuda Networks is successful in its attempts to defend all users from this patent shakedown."

If you know of prior art for this patent, please head over to Barracuda's site and provide details -- helping to fend off this protection racket would be good for all of us. Barracuda say:

People should look for art dated prior to Trend Micro's filing date of September 26, 1995. The '600 patent is entitled "Virus Detection And Removal Apparatus For Computer Networks." We are interested in all material, including software, code, publications or papers, patents, communications, other media or Web sites that relate to the technology described prior to the filing date.

In particular, this prior art should show antivirus scanning on a firewall or gateway. However, many of the claims do not require virus detection at a gateway. So any material that illustrates virus scanning on a file server is also of interest.

We also believe that a product called MIMESweeper 1.0 from a company called Clearswift, Authentium, or Integralis anticipates several claims of the '600 patent. We have yet to locate a copy of this product and would appreciate anyone who has a copy sending it our way.

Some more coverage:

  • Don Marti at LinuxWorld: 'Regardless of the decision in this case, software patent trolls will continue to be a problem for all software companies, Eben Moglen says. "Getting them to [not operate] in your neighborhood is the best you can do."'

  • Matt Asay at C|Net: 'Antivirus and antispam innovation has tended to come from open source, not the large proprietary vendors. Trend Micro's lawsuit is designed to put cash in its pocket but will end up hurting the consumer.' (Matt led with my quote ;)

  • GrokLaw: 'Anyone using ClamAV, should Trend Micro be successful, is potentially a target.'

  • Ars Technica: 'The patent is very clearly without merit, but that hasn't stopped Trend Micro from using it to threaten ClamAV and extort money from several companies. Situations like this demonstrate a very urgent need for patent reform and illuminate the risks posed by broad software patents, particularly in the area of security.'

Interview with two phish-scene infiltrators

/. posted a link to this interview with Nitesh Dhanjani and Billy Rios, two guys who have infiltrated the "phishing underground".

It's a good article -- lots of detail on the current toolset of a typical phisher, and some details on the community itself:

I had always thought that most phishers were clever hackers evading authorities using the latest evasion techniques and tools. The reality of the matter is most of the phishers we tracked were sloppy and unsophisticated. The tools they used were rarely created by the phisher deploying the actual scam, and for the most part it seemed the phisher merely downloaded kits and tools from some place and reused over and over and over again. It also seemed that many phishers don't even really understand how the phishing kits they've deployed work! We also came across many phishing kits and tools that had simple backdoors written into the source code (essentially, phishers phishing phishers). These backdoors are easily spotted by anyone who has even a basic idea of how the source code flow worked, yet was undetected by many phishers. Maybe a few phishers out there are skilled, but the majority are clueless.

Here's something I've noted about spammers, too -- there's no honour among thieves:

The number of backdoors we saw was staggering. The servers serving the phishing sites had backdoors, the code used in the phishing kits had backdoors, the tools used by phishers had backdoors. Phishers aren't afraid to steal from regulars people and they are also not afraid to steal from other phishers. Some of the backdoors were meant to keep control over a compromised server, while other simply stole information that had been stolen by other phishers! We came across several forums where phishers, scammers, and carders basically identified other phishers, scammers, and carders that had scammed them. These shady characters may work with each other but they sure don't trust each other, that's for sure.

And this is a very important point about blacklists:

Phishers are likely to abuse the blacklists published for [anti-phishing] plugins for their own benefit. The blacklists are a list of known phishing sites that the plugins consume in order to identify what websites are fraudulent. These blacklists therefore contain IP addresses and host names of servers hosting phishing sites. Since phishing sites are commonly installed on servers that have been compromised, and phishers don't bother to patch systems they have installed their kits on, this list translates to a 'list of easily compromisable hosts' for other phishers.

On the latter point, this is one of the key benefits of DNS blocklists, compared to the downloaded, text-based style that Google initially used for its anti-phishing toolbar. To query a DNSBL, you need to know the address you're looking for first of all; but with a text file, you can read the lists in their entirety, without knowing the address in advance. (Google is now apparently tending to use the enchash format, which fixes this.)
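To make the "you need to know the address first" point concrete, here's a rough sketch of how a DNSBL query name is built for an IPv4 address: the octets are reversed and the list's zone is appended, which is the usual DNSBL convention. The zone dnsbl.example.org and the function name are placeholders, not a real list. Phishing lists are often keyed on hostnames rather than IPs, but the principle is the same.

```sh
#!/bin/sh
# Build the DNS name to look up for one IPv4 address in a DNSBL zone.
# $1 = IPv4 address to check, $2 = blocklist zone (placeholder here)
dnsbl_query_name() {
    ip=$1
    zone=$2
    # Reverse the four octets, then append the blocklist zone.
    echo "$ip" | awk -F. -v zone="$zone" '{ print $4 "." $3 "." $2 "." $1 "." zone }'
}

dnsbl_query_name 192.0.2.99 dnsbl.example.org
# A real check would then resolve that name, for example:
#   dig +short A "$(dnsbl_query_name 192.0.2.99 dnsbl.example.org)"
# An empty answer means the address isn't listed.
```

There's no query that means "give me everything you've got", which is exactly what a downloadable text file hands over.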

And a final word:

For the next few years, we are going to continue to apply band-aids around the problem of data leakage, and continue to play whack-a-mole with the phishers without solving the actual problem at hand. In order to make any significant progress, we must come up with a brand new system that does away with depending on static identifiers. We will know we've accomplished this when we will be able to publish our credit reports publicly without fearing for our identities.

(I'd place more importance on the liability of the financial institutions, myself -- I think they get away with placing too much blame on the victims of fraud and identity theft.)

Good interview -- worth reading.

Insane Dell.ie markup

A good deal came up on a mailing list I'm on: SAMSUNG 245BW Black High Glossy 24" 5ms DVI Widescreen LCD Monitor for $459.99, or $409.99 after rebate, via Newegg.

A follow-up from a German poster: he'd just picked up a Dell 2407WFP-HC 'for the low, low price of 659 EUR'.

We marvelled at the price difference -- then I looked up Dell.ie for comparison. I thought 659 EUR was bad, but Dell.ie is asking for 1,117.74 Euros inc VAT for the same product -- insane!!

What possible excuse could there be for that? EUR 458.74 worth of shipping maybe? Do they encase it in platinum? That's nearly three times the price of the Newegg monitor.

Update: Duh. I'm an idiot. That's a 2707WFP, not a 2407WFP; it's 3" bigger and quite a bit fancier. It appears Dell.ie is no longer selling the 2407WFP.

Bad law in North Dakota

This is very bad news for North Dakota-based anti-spammers -- a guy called David Ritz is being sued there by alleged porn spammer Jerry Reynolds, for performing DNS lookups, a DNS zone transfer and a Whois lookup. It appears the judge has found Ritz guilty.

This is astonishingly bad lawmaking by the judge. These are entirely innocuous tools, part of every network administrator's toolkit for debugging and examining internet traffic legitimately. There's nothing remotely criminal or malicious in their use, and the judge has allowed himself to be misled.

North Dakota Judge Gets it Wrong:

'Ritz's behavior in conducting a zone transfer was unauthorized within the meaning of the North Dakota Computer Crime Law. A zone transfer is simply asking a DNS server for all the particular public info it provides about a given domain. This is a common task performed by system administrators for many purposes. The judge is saying that DNS zone transfers are now illegal in North Dakota.'
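For anyone who hasn't used them, this is roughly what the three kinds of query look like with the standard dig and whois command-line tools. It's a sketch only: the function prints the commands rather than running them, and example.com and ns1.example.com are placeholder names, not the hosts involved in the case.

```sh
#!/bin/sh
# Print (rather than run) the three kinds of lookup discussed above.
# $1 = domain name, $2 = one of its authoritative nameservers
lookup_commands() {
    domain=$1
    nameserver=$2
    # Ordinary DNS lookup: ask for the domain's address record.
    echo "dig +short A $domain"
    # Zone transfer (AXFR): ask one nameserver for every record in the zone.
    # The server is free to refuse, and many are configured to do so.
    echo "dig @$nameserver $domain AXFR"
    # Whois lookup: ask for the domain's public registration details.
    echo "whois $domain"
}

lookup_commands example.com ns1.example.com
```

All three simply ask a public server a question, and the server decides whether to answer.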

More details from Ed Falk

David's legal defense fund

My Commodore 64 demos

I recently came across my record at the Commodore Scene Database, and was happy to find that someone had found and uploaded two demos I had written, back in my days as a member of the C=64 demo scene between 1988 and 1990:

(I was a member of the groups 'Excess' and 'Thundertronix' / 'TNT', going by the handle of 'Mantis'.)

With the help of CBA, I was overjoyed to track down another long-lost demo, my crowning achievement on the platform:

If you're curious, feel free to go read those wiki pages or download the .d64's -- they run fine in VICE, the Commodore emulator (amazingly). If you've only got time to check one, check Rhaphanadosis; it's much better than the others.

I'm very impressed with VICE. As far as I can tell, it's perfectly bug-for-bug compatible with the real hardware, playing all of the demos perfectly (apart from a little additional speed due to differing hardware performance). If you haven't already got VICE set up, bear in mind that after installing it, you'll need a copy of the C=64's ROM images; here's a local set.

Also, the Commodore Scene Database is pretty awesome -- it's a full-scale IMDB-style setup, tracking the history of the Commodore demo scene in massive detail. Nice work guys!

The demos were written 100% in 6502/6510 assembly. I developed them using an Action Replay cartridge's built-in monitor; it had an assembler, but one which didn't support symbolic addressing. In other words, every piece of assembly used hand-computed branch offsets, and every variable and subroutine was tracked -- on paper -- by memory location, rather than using symbolic labels. If you want to know what the monitor was like, the VICE built-in monitor is almost identical!
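To give a flavour of what "hand-computed branch offsets" means: a 6502 relative branch (BNE, BEQ and friends) stores a signed one-byte offset, counted from the address of the instruction following the two-byte branch. Here's a rough sketch of the sum I used to do on paper; the function name and the example addresses are made up for illustration.

```sh
#!/bin/sh
# Offset byte for a 6502 relative branch.
# $1 = address of the branch opcode, $2 = target address (hex as 0x....)
branch_offset() {
    from=$(( $1 ))
    target=$(( $2 ))
    # The offset is counted from the instruction after the two-byte branch.
    off=$(( target - (from + 2) ))
    if [ "$off" -lt -128 ] || [ "$off" -gt 127 ]; then
        echo "target out of branch range" >&2
        return 1
    fi
    # Encode as a two's-complement byte.
    printf '%02X\n' $(( off & 0xFF ))
}

branch_offset 0xC000 0xC010   # forward branch
branch_offset 0xC010 0xC000   # backward branch
```

This prints 0E for the forward branch and EE for the backward one. Insert a single instruction in between and every affected offset has to be recalculated by hand, which is why symbolic labels are such a luxury.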

I wrote these when I was 16; part 4 of Rhaphanadosis notes the date as being 20 May 1989.

It's interesting reading the scrollers, and doing web and CSDB searches in follow-up to see what happened next -- one of the other Excess members, Raistlin, is now Robert Troughton, a successful game developer in the UK with several major titles under his belt.

A Google search for Thundertronix finds a copy of "sex'n'crime" zine, issue 17, July 1990, which notes:

one of the new groups formed in 1990 (jm: slightly off, I think) is THUNDERTRONIX, better known as TNT. they are based in ireland and are doing very well for themselves. they have, in my mind, one of the best coders in the uk, namely MANTIS. he is currently coding a game with many new routines, etc... hopefully he should get some demos out soon!

woo! Er, unfortunately that game never went anywhere. ah well. ;)

BTW, it's funny reading my scrollers in those demos. At the time, I was convinced that the c=64 was a dead platform -- yet here we are in 2008, and there's still a thriving demo scene on the Commodore. Incredible!

Vincent Browne on RTE’s coke habit

Before Christmas, it seemed you could hardly read a newspaper, listen to the radio or watch TV in Ireland without being bombarded with stories about how the country was awash in cocaine.

It's an attractive story, tying in nicely with the death of lingerie model Katy French, hand-wringing over Ireland's recent 'celtic tiger' wealth, a supposed loss of our traditions, etc. etc. RTE, our national broadcaster, made a tabloid series called 'High Society', which cashed in on the issue in a particularly crass way -- crappy "reconstructions" of actors chopping lines with voiceovers, dodgy-looking men handing over money to ominous music, that kind of thing.

Well, just before Christmas, Vincent Browne wrote a fantastic op-ed in the Irish Times regarding this. I have to quote this particularly perceptive passage:

Cocaine abuse is a social problem, but the thrust of much of RTE's coverage of the phenomenon is to suggest that it is a widespread, pervasive problem. There are no recent statistics available on the prevalence of cocaine consumption in Ireland - the last survey was done four years ago. The National Advisory Committee on Drugs (NACD) will be publishing a prevalence report next month and we will know then the size of the phenomenon.

But we have some indicators about the scale of cocaine use. The European drug agency EMCDDA estimates that 3 per cent of all adults in Europe aged between 15 and 64 have used cocaine at least once in their lives.

A third of these took cocaine during the previous year and half of these took cocaine during the previous month. This means that about 0.5 per cent of the adult population took cocaine over the previous month. And the data suggests that, for at least two-thirds of those who have ever taken cocaine, the drug is not a problem for them.

In the US the statistics are higher. Almost 15 per cent of the population aged between 12 and 64 have taken cocaine in their lives and 2.5 per cent took cocaine over the previous year. Again, this is suggestive that cocaine use for most people is not a problem, otherwise the number of people who took cocaine during the previous year as a proportion of the number of people who ever took cocaine would be far higher.

The figures for Ireland are likely to be that about 4 per cent of the adult population have taken cocaine in their lifetime, with about 1 per cent having taken cocaine in the previous year and 0.5 per cent having taken cocaine in the previous month.

It would be better if people did not take cocaine, but the prevalent contention that the consumption of cocaine at all is necessarily harmful and addictive is obviously false.

It would also be better if people did not drink here, for the problems related to the consumption of alcohol are far, far greater than in the case of cocaine.

Instead of presenting a balanced picture of the cocaine phenomenon, RTE has greatly exaggerated the issue, in a way more typically associated with tabloid journalism.

Well said!

Spambots stealing GMail and Hotmail passwords?

I just received this mail from a friend:

Dear friend

Welcome to stwoxy.com ! We are one of the largest electronic distributors and wholesalers in Beijing China. We offer qualified digital products: Motorcycles?TVs, Notebooks, phones. PSP, projectors, GPS, DVD, DV, DC, MP3/4 and so on, which are of world famous brands, such as Sony, IBM, PHILIPS, NOKIA, DELL and so on. All our items are brand new from the manufactures and they come with 1-3 years' after service. These days we are expanding our overseas market, and every item is sold in extremely low price. Such chances should never be missed, ladies and gentlemen, do come to stwoxy.com! you will surely have a big surprise! We are looking forward to hearing from you!

It was sent from a HTTP connection into GMail, and was delivered from there using valid DKIM, DomainKeys and SPF signatures. In addition, it was sent to all the addresses in his address book. In other words, this was no run-of-the-mill impersonation spam -- for this one, the spammer obtained my friend's username and password somehow, logged into GMail, scraped the address book, and then sent spam via GMail that way.

My friend says he didn't access GMail using a desktop mail client, but did have his Google password saved in his web browser (a pretty typical configuration). My theory is that some virus/malware has infected his desktop machine, captured the saved-passwords file from the web browser configuration, and used that to log into GMail. Alternatively, it could also be a guessable username and password which was picked up via dictionary attack, I guess...

This is the first case I've heard of where spammers are actively stealing user account authentication tokens, in order to take over the accounts for spamming. (We'd long predicted it, of course, since it's a natural response to "pay for mail" schemes... but since there's no widely-used pay-for-mail system available yet, it's premature!)

It seems this is not just a GMail thing, btw. Here's a report of the same thing happening to some French guy via HotMail last month (or in english). I don't speak Dutch, but this forum post looks like it might be the same situation.

If you're curious, here's a copy of the spam, delivered to a Yahoo! group; it appears these spammers aren't too sophisticated in terms of the text they're sending, since they haven't morphed that text, HTML, or even the domain in the link yet. It's just the malware that's sophisticated, at this stage.

GNOME, Google and the UNIX user interface

Recently, after a flurry of annoying user interface issues, I've switched my RSS reader from Liferea to Google Reader. Interestingly, I've found that Google Reader actually fits better with the traditional UNIX user interface concept.

What triggered this was an upgrade from Liferea 1.0.x to 1.4.4 as part of Ubuntu Gutsy; this brought with it a lot of changed behaviours, such as 'drag-and-drop of feed URL to HTML view no longer subscribes', and one crucial UI issue, '"Skim through articles" only works with ctrl+space'.

I've been a long-time UNIX user, dating back to the days when curses-based interfaces were the norm. As such, I tend to drive commonly-used applications using keyboard commands where possible. (This isn't a purely UNIX thing; Windows has the phenomenon of the keyboard-wielding "power user", too.)

Liferea was attractive, since it offered the ability to skim through articles quickly by just pressing the "Space" key; simply press space to page down, or to skip to the next unread article if at the end of the current one. Unfortunately, Liferea 1.4.x breaks this, and it wasn't going to be fixed, since apparently a GNOME app shouldn't behave this way:

GTK explicitely does implement <Space> as a key binding for several of it's widgets. Rebinding means to break the default behaviour for such widgets (tree views, buttons, input fields). [....] Liferea as a web-browsing application should behave like any other web browser and like every other GNOME/GTK application as much as possible.

Now, I don't know if it's GNOME's fault, or what, but for a UNIX desktop app to break with UNIX UI conventions, that's a bad move in my opinion. I gave it a bit of argument in the bug tracker, but eventually gave up as I clearly wasn't getting anywhere. :(

Instead, based on recommendation from friends, I gave Google Reader a try, and quickly figured out its extensive collection of keyboard shortcuts. Now, I'm skimming through my feeds in even less time than it took with Liferea, simply by hitting "ga" to go to my "all unread items" list, then "j", "j", "j" to skip through the postings one by one. Sweet!

It's interesting to note that other Google web apps use the same concepts; Gmail also has a hefty set, and can be driven using them in a manner very reminiscent of the classic UNIX mailreader, Mutt. So, despite being designed with end-users in mind by extremely clever professional user experience designers, these apps still find space for power-user keyboard operation. Take note, GNOME.

Anyway, I'm not too bothered. Google Reader brings other benefits, such as fixing this bug: 'please add ability to go to previous entry in Unread feed', avoiding 'constant memory leak requires daily restarts', and, of course, the utility of being able to track the same set of feeds and keep track of which items I've read in two places (work and home).

If only it was open source ;)

Planet Antispam update

A brief update on Planet Antispam...

I've just added MailChannels' Anti-Spam Blog. Now -- in the interests of disclosure -- I'm a member of MailChannels' Technical Advisory Board. However, that didn't affect this -- their blog has had consistently good, interesting posts dealing with anti-spam-related topics, and without too much plugging of their own products. ;)

Also added recently:

If you know of any other good email anti-spam-related blogs, drop a line in the comments here. (Note that I'm trying to keep it email-related, however, so we're not covering web-spam.)

Spammers “giving up” according to Google

According to this Wired story, Google reckons spammers are giving up on spam:

a remarkable trend is underfoot, according to Brad Taylor, a staff software engineer at Google: The number of spam attempts -- that is, the number of junk messages sent out by spammers -- is flat, and may even be declining for the first time in years.

Actually, this is a wilful misunderstanding of what the Googler in question really said, which was that 'attempts to spam Gmail users have been leveling off over the last year and more recently, even declining slightly'. In other words, they didn't make an observation about the state of the spam problem on an internet-wide basis -- just about the "local" situation as it pertains to Gmail. Bad reporting there, Wired.

But, in passing...

David Berlind at ZDNet recently blogged a rather grumpy response to InfoWorld coverage of CEAS 2007. He raised a very important point:

If I could say something to the author of that story, it would be that so long as any anti-spam solution is not deployed universally throughout the Internet's e-mail system (in other words, so long as some anti-spam tech is not a standard), that anti-spam solution actually makes the spam problem worse. You read that right. Worse. Proprietary anti-spam solutions make the global spam problem worse. They are digging us deeper into the hole that the Internet is already in because everyone who makes those solutions is under the false belief that "s/he who is finally successful at filtering out all spam while allowing the legitimate mail in wins."

Google's blog post is a case in point: 'we're keeping more spam out of your inbox than ever before, so more and more, you can use Gmail for things you enjoy without even realizing that the spam filter is there most of the time.'

That's great -- but it doesn't help anyone except Gmail. It's a myopic view of the spam problem, and David's point stands.

(I disagree with his later conclusion that the only way forward is for Google, MS, AOL and Yahoo! to get together and 'commit to jointly supporting the same technical solutions' -- when the usual BigCos get together, they tend to focus on their own priorities. Take what happened back in 2005 with nofollow for blog-spam -- while it helped the search giants with their own overriding priority, which was to tweak their algorithms to filter out the spam on the search results page, it did nothing to slow the spam flood itself, which has continued unabated.)

We need more open-source, and open-data, anti-spam work.

Informed

This should be in the running for "least informative dialog ever".

(The information in question was that Firefox had been upgraded by the Ubuntu Gutsy Update Manager app, if you're curious...)

Working around O2 Ireland

I'm pretty conservative with my mobile phones -- until recently, my mobiles were all cheap, low-end, super-lightweight Nokias with long battery life and low "worry factor" (ie. not a big deal if they were lost or stolen). Very sensible.

I've finally started catching up with the gadgetorati, though -- my current phone is now a Sony Ericsson K550i, which is still small and light, but has nice features like a 2 megapixel camera, a decent amount of onboard flash space, and a good implementation of Java, hence support for GMail and Google Maps. (Thanks to Joe for the recommendation!)

The only downside is that it came from my operator, O2 Ireland, with some broken configuration settings. (This shouldn't be surprising, of course -- I don't think I've ever heard of a phone arriving with working data connectivity, from any operator, anywhere in the world.)

Anyway, here's what I've done so far to fix it. Hopefully this might be helpful for random google searchers.

1. "Failed to resolve hostname" when publishing photos:

Generally, when I'd try to publish a photo using its Blogger support, I'd get a "failed to resolve hostname" error message. Investigating further, I found that the "O2 WAP" service used a proxy server -- turning that off fixed the problem nicely. Nice reliable proxy you've got there, O2 ;)

Here's how to do that. Open the menu, then select Settings -> Connectivity -> Internet settings -> Internet Profiles. Select O2 WAP and hit More -> Settings. Select Use proxy and change it to No, then hit Save. Problem solved.

2. Cannot send email from the device:

O2's default mail server has a tendency to refuse to accept outbound mail from the phone. Switching to GMail for outbound SMTP works fine. Notice a trend here?

Open the menu, Messaging -> Email -> Settings -> New account. Set the Account name to "gmail". Scroll down to Email address, set it to "yourname@gmail.com". Connection type is "POP3", Username and Password are whatever your GMail account uses. Outgoing server is "smtp.gmail.com". Enter Advanced settings, and set Encryption to "TLS/SSL". Set Outgoing port to "25". Press the back button, then select the "gmail" account's tickbox to make it active, before pressing back again to exit the configuration screen.

3. The "side" buttons go online:

By default, if you hit the "globe" button or the "open window" button on the side of the phone, to the left and right of the main joystick, it's set to open various URLs at www.o2.ie. These buttons are prime UI real estate, and easily accidentally hit; I don't want to go online (and possibly incur a charge) if they're pressed.

Easily fixed. Open the menu, then select Settings -> Connectivity -> Internet settings -> Internet Profiles. Select O2 WAP and hit More -> Advanced, then Change homepage and enter "file:///" under Address and hit Save. It'll now issue an ugly warning if you press those buttons, but at least it won't go online. (It'd be nice to get a nicer fix for this.)

I'm sure there's plenty more; if you've got this phone and have any tips to share, feel free to drop a comment below.

In particular, I'd love to know how to further "de-O2ify" the UI; the top 3 buttons on the menu screen are taken up with worthless operator spam ("O2 Music Store", "O2 Menu" and "Entertainment", all of which go to various URLs at www.o2.ie), while the useful Applications and Alarm screens, which I use all the time, are hidden in a submenu. ugh.

Investing in real estate

Screen real estate, that is -- 3600x1050 pixels of it:

(That's a Samsung SyncMaster 226bw connected to a Thinkpad T61p running Ubuntu Gutsy, if you're curious.)

‘Dead spammer’ story: yep, spam

Remember the 'Russian 'make penis fast' spammer murdered' fake blog posting I wrote about last month? I was right -- the site has now become a spammer link farm.

There's now a new category in the right-hand sidebar of the fake blog post. See if you can spot the odd one out:

  • Programming
  • Personal
  • Web 2.0
  • Python
  • Penis exercises
  • Uncategorized

Sure enough, "Penis exercises" is the only valid outlink from the page (all the others lead to the 'sorry, closed due to too much traffic' page). It leads to a page discussing the usual 'make penis fast' topics, with a batch more links to more pages along the same lines. If you follow the links a little, the whole thing appears to be hawking some device called "Size Genetics". Totally spammy.

New job!

So, as I've hinted previously, I've left Vast to work full-time at a new gig: PutPlace.

I'll be working on more EC2/S3/SQS-related large-scale cluster stuff, and on their open-source plans... looking forward to that. They're a great team -- lots of familiar faces from the Iona days -- and it finally gets me out of telecommuting from home, back into an office again after 5 years ;)

Joe has put up a nice blog post welcoming me. Cheers Joe!

Now to get to grips with Python. (I still love Perl though. ;)

Fedex Ireland and unfair duty charges

I've been on vacation for a week, introducing Bea to the many joys of the bogs of Connemara. I think she liked it.

While I was away, I appeared in Ireland's newspaper of record, the Irish Times, specifically in Conor Pope's 'Pricewatch' consumer-affairs column, under the headline "Shopped to the taxman". Here's a cut-and-paste of some relevant snippets:

Justin Mason [hey, that's me] contacted Pricewatch after being hit with just such a charge. In August, he and his wife, who were expecting a baby, received a package from friends in the US [thanks Nishad and Janet!] containing amongst other things, some hats, socks and a little hoodie for their baby.

"It was shipped via FedEx, got here in good time and was very cute," he says. The couple were delighted, until a couple of weeks later, when they received an invoice from FedEx looking for EUR 34.47, made up of EUR 2.49 duty, EUR 19.88 VAT and EUR 10 in "administration fees", plus an additional EUR 2.10 VAT on the "administration fee".

"This strikes me as pretty unfair, maybe there's duty payable, but I've never had to pay VAT on a gift I've received before? On top of that, being charged one-third of the price as an administrative fee? Ouch!"

The couple disputed the fee and were told if they didn't pay, the invoice would be sent to a debt collection agency and non-payment would affect their credit rating. A couple of weeks later, another gift arrived from the US, followed by another invoice looking for EUR 7.84 in duty, plus the EUR 10 administration fee and EUR 2.10 VAT on that fee. Mason disputed the charge and was eventually told it would be waived as it had a value of less than $50 (EUR 34.70) and was clearly labelled as a gift. There is tax relief called Small Parcel Standard Relief on goods purchased from outside the EU, which is EUR 22 for bought goods and EUR 45 for gifts, so the tax should never have been applied by FedEx.

We contacted FedEx and UPS, highlighting our readers' concerns. A spokesman for FedEx said the administration charge has always been in place in Ireland and was applied "to ensure customers receive their packages quickly".

He said that if it did not pay the VAT and duty, "packages would not be cleared through customs until the customer has paid them, thus adding severe delays to the delivery process".
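As a quick sanity check, the figures quoted for the first invoice do add up. Here's a minimal sketch of the arithmetic; the variable names are mine, and the amounts are just the ones reported in the column above:

```python
from decimal import Decimal

# Amounts (in EUR) as quoted in the Pricewatch column for the first invoice.
duty = Decimal("2.49")
vat = Decimal("19.88")
admin_fee = Decimal("10.00")
vat_on_fee = Decimal("2.10")

total = duty + vat + admin_fee + vat_on_fee

# VAT rate implied by the charge on the administration fee.
fee_vat_rate = vat_on_fee / admin_fee

# Share of the invoice taken up by the fee plus the VAT on it.
fee_share = (admin_fee + vat_on_fee) / total

print(total)                # 34.47
print(fee_vat_rate)         # 0.21
print(round(fee_share, 2))  # 0.35
```

So the fee, with its VAT, makes up roughly a third of the total.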

So, to be honest, I'm not impressed at all with Fedex' response here. I was hoping they'd be more helpful, especially once it hit the most significant consumer-affairs column in the country -- but not at all :(

To recap -- since Conor didn't mention it -- here are my problems with the charges:

  • the packages were both genuine, unsolicited gifts. Surely duty shouldn't apply to a gift; having to pay it certainly makes receiving one a particularly unpleasant experience!

  • the first package contained baby clothes, which are VAT-free in Irish tax law anyway.

  • we cannot seem to get contact details for someone at Customs and Excise to talk to about this, and Fedex have failed to get back to us since then.

Not sure what the next step is...

There's also a little follow-on discussion at Conor's blog.

Update: good news. A couple of days ago, a letter arrived from Fedex UK, containing 2 credit notes; both invoices had been reduced to EUR 0.00, citing "incorrect application of duty" for one, and "customer satisfaction policy" for the other. Hooray!

Surprise smash hit in the Irish Blogs Top 100

Damien posted an interesting suggestion for the Irish Blogs Top 100 the other day -- during discussion of which, it emerged that there were a few overlooked Irish blogs which hadn't yet shown up on the planet.journals.ie Irish blogs aggregator, and therefore were not appearing in the Top 100. These were:

Anyway, they're in now. When I first spun up the script and checked the results, though, I was a bit shocked and had to do a double-take -- at number 1, far ahead of Damien at number 2, was InPhotos.org, with a Technorati Rank of 1 and 102,857 inbound links from 88,772 blogs, compared to Damien's Rank of 7946 with 1,606 links from 519 blogs.

Insane! I guess being in the default WordPress install makes a bit of difference there ;)

Interestingly, InPhotos.org, with a Technorati Authority of 88,434, is far beyond the most popular blog listed on the Technorati Popular Blogs page. It seems that page is a hand-tweaked set of blogs, and not just a "Technorati global Top 100", then, despite what one might naively assume...

PS: Damien's original suggestion, btw, was to measure blog popularity using Google Reader and Feedburner's audience stats. However, I can't do that without a public API I'm allowed to scrape. Does anyone know of one?

Also worth noting that I recently added del.icio.us bookmarks as a metric of popularity, to go with the Technorati stuff. It's interesting to see how those rankings differ -- bloggers and bookmarkers don't always agree, with bookmarkers preferring MP3s, Second Life, and politics, I reckon.

the Ron Paul spam scandal

A US presidential candidate called Ron Paul has been advertised in spam. There's currently a massive shitstorm raging about the true source of the spam -- it was delivered via an infected consumer broadband machine, so the source is of course untraceable from the email alone.

Of course, being spam, I received a copy ;) Here's a spample, if you're curious.

The unusual "Content-Type" header format (matching the STOX_REPLY_TYPE SpamAssassin rule) has been seen in a lot of pump-and-dump stock spam recently. (It's also shown up in Storm output, but this isn't from Storm.) It's been around for at least 6 months, so it's probably a built-in behaviour of a downloaded spamware app, rather than a frequently-updated web-hosted spamware site.

My guess -- I'd say the spam was sent using the same spamware application that one of the larger, recent pump-and-dump spammers has been using -- so a reasonably sophisticated app, and not just an ancient copy of DarkMailer or whatever.

It'll be interesting to see how this pans out...

Changes to the Irish learner driver system

The Irish Road Safety Authority have just revised Irish law as it relates to 'learner drivers', the 15% of drivers who haven't yet passed a driving test. (This includes me -- my US driving license doesn't allow me to drive a manual-transmission car in Ireland, so I'm still a learner over here!)

They helpfully released the details as a rather broad PDF entitled 'Road Safety Strategy 2007-2012', which covers the changes along with other plans and statistics; and a more focused document, 'Learner Permit and Changes to the Driver Licensing System', dealing with just the learner-permit system.

Unfortunately, the latter was released as an MS Word document. Given the problems this raises -- lack of searchability, integration with the web, etc. -- I thought it'd be helpful for searchers if I put up the text in full here, so here it is.

Introduction of Learner Permit and Changes to the Driver Licensing System - Changes to the Driver Licensing System announced on 25 October 2007

In this document you will find information about changes to the driver licensing regime. These changes affect learner drivers and recognise the fact that learner drivers are a vulnerable group of road users. The changes also serve to emphasise the importance of the learning phase for drivers, one element of this is the replacement of provisional licences with learner permits. The changes also highlight the important role played by the driver who accompanies a learner driver.

Over time the intention is to expand the range of conditions applying to a learner permit and to develop a graduated licensing system where there will be a number of different restrictions/conditions applying at different stages. These restrictions will apply while driving with a learner permit and in the initial years of driving with a full driving licence.

Specific details about each of the current changes together with questions and answers on the impact of each change are set out below.

Provisional licences are being replaced by learner permits to emphasise the fact that the holder is a probationary driver and is learning to drive. Existing provisional licences will continue in force until their expiry date. On renewal the person will be issued with a learner permit.

Q: When will learner permits start to issue?

A: Learner permits will issue as and from 30 October 2007.

Q: Does the learner permit system apply to all driving licence categories?

A: Yes, the learner permit system will apply to all licence categories.

Q: Is there any change to the period of validity or the fee for a learner permit compared to that for a provisional licence?

A: No, the duration and fee remain the same as applied to provisional licences.

Q: Are there any changes to apply under the learner permit system?

A: A number of changes detailed below are being introduced for drivers with a learner permit. These are also being applied to drivers with a current provisional licence.

The holder of category B (Car) learner permit (provisional licence) must be accompanied by and under the supervision of a qualified person at all times. This change removes an exemption that, up to now, allowed a person on a second provisional licence to drive unaccompanied. To drive unaccompanied will be a penal offence and the person will be subject to prosecution.

Q: When does this new rule come into effect?

A: This is coming into effect as and from 30 October 2007.

Q: I am currently on a second (provisional licence) learner permit for driving a car, and was not required to be accompanied heretofore with this (provisional licence) learner permit. Must I now be accompanied?

A: Yes, you must be accompanied at all times when driving with a (provisional licence) learner permit for a car.

Q: I have passed the driving test in a vehicle with an automatic transmission and now hold a (provisional licence) learner permit for driving a car with a manual transmission, can I drive this car unaccompanied.

A: No, you must be accompanied by a qualified person until such time as you pass the driving test for a manual transmission car.

Q: In respect of which licence categories is a person who holds a (provisional licence) learner permit required to be accompanied by a qualified person?

A: Drivers with a (provisional licence) learner permit for vehicles of category B, C1, C, D1, D, EB, EC1, EC, ED1 or ED, (Cars, Trucks, Buses and Articulated Vehicles) must be accompanied by and under the supervision of a qualified person.

An accompanying qualified person must hold a full driving licence for the vehicle category for at least two years. It will be a penal offence for the driver not to be accompanied by a qualified person so licenced to drive.

Q. When is this change coming into effect?

A. This change will apply as and from 30 October 2007.

Q: If I am a learner driver driving a car and the accompanying person has held a driving licence for two years in respect of a motorcycle, or a tractor/work vehicle, can this person act as an accompanying qualified person?

A: No, the accompanying qualified person must hold a driving licence for two years for the category of vehicle you are driving.

Q: If a person has passed a driving test to drive the vehicle category, can this person act as an accompanying qualified person?

A: No.

Q: If a person has held a full driving licence for an automatic vehicle for two years, may this person act as the accompanying person?

A: Yes, but only if the learner driver is driving an automatic transmission vehicle in the same category. If s/he is driving a manual transmission vehicle, the accompanying qualified person has to hold a full driving licence for at least two years for a manual transmission vehicle.

Q: If I have a learner permit (provisional licence) in category C1 (small truck) can I be accompanied by a person who holds a full driving licence for category B for two years and for category C1 for one year?

A: No, the accompanying qualified person must hold a full driving licence for two years in respect of the vehicle category which you wish to drive, in this case category C1.

Q: If the accompanying driver has held his/her driving licence since six years ago but has been disqualified for 2 of the last 3 years, may he/she act as an accompanying driver?

A: No, the accompanying qualified person, at the time you are driving, must hold a full driving licence for two years in respect of the vehicle category which you wish to drive. He/she must not have been disqualified for any period of the previous two years.

The carrying of a passenger by a motorcyclist with a (provisional licence) learner permit is a penal offence.

Q. When is this change coming into effect?

A. This change will apply as and from 30 October 2007.

Q: Can I carry a passenger on any motorcycle category for which I hold a learner permit (provisional licence) ?

A: No, you must have a full driving licence for the motorcycle in order to be able to carry a passenger.

Q: Can I carry a passenger on a category A motorcycle for which I hold a learner permit/ provisional licence if I have a full driving licence for category A1?

A: No.

Q: If I pass the motorcycle driving test, can I carry a passenger?

A: No, you must first exchange your certificate of competency (driving test pass certificate) for a full driving licence to be able to carry a passenger.

It is a penal offence for a holder of a category W (Tractor/Works vehicle) learner permit (provisional licence) to carry a passenger unless the vehicle is constructed or adapted to carry a passenger and the passenger is a qualified person, ie. a person who holds a full driving licence for the vehicle category for at least two years.

Q. When is this change coming into effect?

A. This change will apply as and from 30 October 2007.

Q: When can I carry a passenger?

A: When the passenger holds a driving licence for the vehicle category for at least two years, and where the vehicle is constructed or adapted to carry a passenger.

Q: Can I carry a passenger who is a qualified person if there is no passenger seat?

A: No, the vehicle must be constructed/ adapted for the carriage of a passenger.

It is a penal offence for the holder of a learner permit (provisional licence) in respect of any licence category to carry in the vehicle any passenger for reward.

Q. When is this change coming into effect?

A. This change will apply as and from 30 October 2007.

Q: Can I carry a passenger for reward in the course of my employment?

A: No, you may not do so while driving under a learner permit (provisional licence).

Q: If I have a category D1 learner permit (provisional licence) to drive a minibus, can I carry a passenger for reward?

A: No, you may not do so while driving under a learner permit (provisional licence).

It is a penal offence for the holder of a learner permit (provisional licence) for vehicles of category B, C1, C, D1, D, EB, EC1, EC, ED1 or ED, to drive such a vehicle unless there are displayed on the vehicle rectangular plates or signs bearing the letter 'L' not less than 15 centimetres high in red on a white ground, in clearly visible vertical positions to the front and rear of the vehicle.

Q. When is this change coming into effect?

A. This change will apply as and from 30 October 2007.

Q: If I have a category B full driving licence and a learner permit for category C (truck) or category D1 (minibus) must I display L plates?

A: Yes, you must display L plates on the truck or minibus if driving on a learner permit.

It will be a penal offence for the holder of a learner permit (provisional licence) for vehicles of category B, C1, C, D1 or D, to drive such a vehicle while the vehicle is drawing a trailer.

Q: If I have a category B driving licence and a learner permit for category C1 (small truck) can I draw a trailer?

A: No, you may not drive a truck while drawing a trailer if you hold a learner permit (provisional licence) for a truck. You must have the trailer entitlement for the category on the learner permit (provisional licence) in order to draw a trailer.

Learner Motorcyclist to display 'L' plates on a high visibility tabard.

Q: From what date will motorcyclists have to display L plates on a high visibility tabard?

A: It takes effect as and from 1 December 2007.

Q: Which learner motorcyclists are required to display L plates on a high visibility tabard?

A: All persons with a learner permit (provisional licence) for category A, A1, or M, must when driving such a vehicle display a yellow fluorescent tabard bearing the letter 'L' not less than 15 centimetres high in red on a white ground, in clearly visible vertical positions worn over the chest clothing. The 'L' plates are to be to the front and rear of the person's torso. It will be a penal offence not to so display L plates.

A person who is a first time holder of a learner permit (provisional licence) cannot take a driving test for a six month period after the commencement date of the permit (provisional licence).

Q. When is this change coming into effect?

A. This change will apply to driving test applicants with an appointment date for a test on or after 1 December 2007 and who hold a learner permit (provisional licence) for less than six months. At this point driving tests are scheduled up to this date and the change will not affect existing appointment holders.

Q: Does the change apply to all licence categories?

A: Yes, It applies to all licence categories.

Q: Why is the six month limitation being applied?

A: The purpose of the provisional licence/learner permit is to allow a learner driver to gain experience of driving. Research shows that the longer a learner is supervised while driving, the less likely s/he is to be involved in an accident. For this reason the six months limitation is being applied.

Q: I hold a first learner permit (provisional licence ) for less than six months. I have an appointment already arranged for a driving test. Can I take the test?

A: Yes, the change is being introduced with effect from 1 December 2007 and should not affect existing appointments for driving tests.

Upcoming Mike Culver talk about AWS

Mike Culver, Amazon's "Web Services Evangelist", will be in Dublin next week to evangelize about the goodness that is Amazon S3, EC2, SQS and so on. It seems he'll be talking at the following locations:

  • in the Auditorium of the Digital Exchange, Crane Street, Dublin 8 on Tuesday October 30th, 3-5pm; here's a flyer the Amazonites have been passing around. (upcoming.org page)

  • according to Damien, later that evening, he's in the Westin Hotel on Westmoreland St., D2, starting at 7pm; note, it seems you need to book places at this, see Damien's post.

  • and again at the Irish Linux User's Group on Thursday November 1st at 19:30 in the Irish Computer Society in Dublin (map).

I guess these are all going to be the same talk, bar the Q&A ;)

There was some kind of an ICTE get-together mooted for Friday 2nd.

Also, the ILUG annual general meeting is scheduled on the following Saturday, 3rd November, also at the ICS. Gareth Eason notes 'we're hoping to start at 3pm sharp, with talks from Dave Wilson (HEAnet), Frank Duignan, John Looney (Google), and others, followed by a relaxing wind-down in the Schoolhouse pub later on.' (upcoming.org page)

Hopefully I'll get to at least one of the AWS talks (probably the Digital Exchange one) and the ILUG AGM... busy week!

BBC’s iPlayer — what a mess

I haven't paid a whole lot of attention to the BBC's "iPlayer" project, since, as a non-UK resident, I'm not allowed to use it anyway. But this interview at Groklaw with Mark Taylor, President of the UK Open Source Consortium, was really quite eye-opening. Here are some choice snippets.

On the management team's Microsoft links:

The iPlayer is not what it claimed to be, it is built top-to-bottom on a Microsoft-only stack. The BBC management team who are responsible for the iPlayer are a checklist of senior employees from Microsoft who were involved with Windows Media. A gentleman called Erik Huggers who's responsible for the iPlayer project in the BBC, his immediately previous job was director at Microsoft for Europe, Middle East & Africa responsible for Windows Media. He presided over the division of Windows Media when it was the subject of the European Commission's antitrust case. He was the senior director responsible. He's now shown up responsible for the iPlayer project.

On their attempts to bullshit the BBC Trust on the cross-platform issue:

In the consultations that the BBC Trust made, there were 10,000 responses from the public. And the overwhelming majority of them, over 80% -- which is an unheard-of figure in these kind of things -- said, we don't like the platform. We don't like it being single-platform. So it's a big issue. And the BBC Trust said to us, "Why the vehemence? Why have people reacted this way?" And I explained the 'Auntie' analogy. It's people don't expect that from the BBC. It's got this huge history of integrity, doing the right thing, standing up to bullies. (laughter) They've done this for a very long time. And people find that it's surprising. And they said, "Yeah, but," you know, the BBC guys said, "Well, trust us. This is going to be cross-platform." And we said, "Well, how? It's completely single-platform." They say that, but we haven't been able to find anyone who's been able to explain how they're going to achieve that at the moment, even though they're entirely locked into one single platform.

(aside: MS did this at one point with Internet Explorer -- remember, there was some mystery team in Germany that supposedly had IE ported to Solaris, hence it therefore qualified as 'cross-platform'.)

On the architecture of the product:

Q: it's a Verisign Kontiki architecture, it's peer-to-peer, and in fact one of the more worrying aspects is that you have no control over your node. It loads at boot time under Windows, the BBC can use as much of your bandwidth as they please (laughter), in fact I think OFCOM ... made some kind of estimate as to how many hundreds of millions of pounds that would cost everyone [...]. There is a hidden directory called "My Deliveries" which pre-caches large preview files, it phones home to the Microsoft DRM servers of course, it logs all the iPlayer activity and errors with identifiers in an unencrypted file. Now, does this assessment agree with what you've looked at?

Mark Taylor: Yes.

Q: What are the privacy implications for an implementation like this?

Mark Taylor: Well, just briefly going back to the assessment thing, yes it does log precisely RSS and stuff like that and more importantly, anyone technically informed who's had a look at it -- even more importantly, the user's assessment as well and -- frankly horrified if you go and spend some time in the BBC iPlayer forums, it's eye-opening to see the sheer horror of the users, some of them technically not -- you know, relatively early-stage users -- but when it gets explained to them by some of the longer-using users of it, it's concentrated misery. (laughter)

[...]

it's a remarkable thing with them as well, there's a lot of pain going on in the user forums, and some of the main technical support questions in there are "how do I remove Kontiki from my computer?" See, it's not just while iPlayer is running that Kontiki is going, it's booted up. When the machine boots up, it runs in the background, and it's eating people's bandwidth all the time. (laughter) In the UK we still have massive amounts of people who've got bandwidth capping from their ISPs and we've got poor users on the online forums saying, "Well, my internet connection has just finished, my ISP tells me I've used up all of my bandwidth."

Q: It uses up their quota, but they can't throttle it, they can't reduce it --

Mark Taylor: No, they can't throttle it. [...] It's malware as well as spyware.

And to top this off, there's a (frankly insane) budget of UKP 130,000,000 to build this -- that's $266,000,000 -- for something that could be built better by just hiring the guys behind UKNova and simply negotiating with the rights-holders directly.

Holy crap. Talk about a technical disaster masquerading as a solution to a business problem...

Plug: Decorama stickers

Plug plug! We picked up some really cute stencils for the nursery a few months back, but took our time putting them up -- we were a bit daunted by the instructions -- and only got around to putting them up last week. (We needn't have worried -- it was really easy.)

They're Decorama vinyl stickers from Bored Inc. I can't recommend them enough -- their art is fantastic, the quality's great, and Bored Inc. were really friendly and helpful about the whole transaction.

If you're looking to do something similar, I'd definitely recommend their stuff.

‘Blended threat’ = Storm

Commtouch have apparently released an 'Email Threats Trend Report' for the third quarter of 2007, which contains this factoid:

Blended threat messages -- or spam messages with links to malicious URLs -- accounted for up to 8% of all global email traffic during the peaks of various attacks during the quarter [...]

Spam with malware hyperlinks inside: One technique which reached a new high during the quarter was innocent-appearing spam messages that contained hyperlinks to malware-sites. This type of spam utilizes vast zombie botnets to launch 'drive-by downloads' and evade detection by most anti-virus engines. Several blended spam attacks of this type focused on leisure-time activities, such as sports and video games. Messages invited consumers to download "fun" software such as NFL game-tracking and video games from what appeared to be legitimate websites. Instead, consumers voluntarily downloaded malware onto their computers.

Those short messages that invited downloads of NFL game-tracking software ("Get Your Free NFL Game Tracker", "Football Fan Essentials", "Are you ready for football season?" etc.), and video games ("Wow, free games!", "New game software, with over 1000 games---FREE", "Holy cow, 1000 free games online" etc.), are all output from the Storm worm -- I wouldn't call it a new kind of "blended threat" per se. I'm surprised that Commtouch didn't name it; maybe they don't realise it's Storm?

I'd say its output is higher than 8% of my incoming spam, although it has reduced its spam output quite a bit recently.

‘Dead spammer’ story a hoax

Update: yep, it's spam.

Earlier today, Digg and Reddit featured this story:

Alexey Tolstokozhev (btw, in Russian his name means 'Thick Skin'), a Russian spammer, found murdered in his luxury house near Moscow. He has been shot several times with one bullet stuck in his head. According to authorities, this last head shot is a clear mark of russian hit men (known as "killers" in Russia).

Since then, it's received plenty of attention -- I even posted it to my link blog myself. Unfortunately, I'm now certain it's a fake. (Igor at the McAfee AVERT blog concurs.)

Here are my reasons:

  • There are still no corroborating stories in the press, several hours later;

  • 'Alexey Tolstokozhev' doesn't appear in ROKSO, or even Google;

  • The entire site claims to have been shut down due to load, all except for that one page -- there isn't a single link that can be reached that works;

  • Indeed, Google has no other pages indexed on that site, which is pretty odd for a weblog;

  • And most fishy of all, the domain was registered yesterday, using a privacy-protection service, on Estdomains (which has a poor reputation).

All very fishy. My guess is that in a week's time, that page will be a linkfarm, picking up all that Google juice for free. In other words, loonov.com is a spam site...

Update: Greetings, Slashdot comment readers! Hopefully that uncritical article (which was posted after this one) will be fixed to note the hoax soon...

Other voices have since added their agreement -- Alex Eckelberry at Sunbelt software added his a few minutes after I posted this, and the Register wrote an article this morning about it.

(BTW, just to save some face -- I'd like to note that I smelled a rat at the time I posted it initially, qualifying the link with a sceptical 'hmm'. I'm not that gullible ;)

Update 2: the /. story was fixed by Zonk: 'Good story. Unfortunately, probably a fake.'

Scary Storm figure

This study of the Storm worm (via) contains this rather terrifying factoid:

Figure 12 illustrates a time-volume graph of TCP packets, SMTP packets, spam messages, and smtp servers. Our analysis of this graph reveals the following findings. First, we find that except for the first 5 minutes almost all the TCP communication is dominated by spam. Second, we measured that hosts generate on average of 100 successful spam messages per five minutes, which translates to 1200 spam messages per hour or 28,800 messages per day. If we multiply this by the estimated size for the Storm network (which we suspect varies between 1 million and 5 million), we derive that the total number of spam messages that could be generated by Storm is somewhere between 28 billion and 140 billion per day.

While such numbers might be mind-boggling they are inline with observed spam volumes in the Internet, e.g., overall volume of spam messages in the Internet per day in 2006 was estimated to be around 140 billion [2]; Spamhaus claims to have been blocking over 50 billion spam messages per day in October 2006 [10], and AOL was blocking 1.5 billion spam messages per day in its network in June 2006 [5]. These numbers suggest that Storm could be responsible for anywhere between 17% and 50% of all spam that is generated on the Internet.

28 to 140 billion messages per day. That is a lot of spam.
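The arithmetic is easy enough to check. Here's a minimal sketch using only the figures quoted above (100 messages per host per five minutes, and 1 to 5 million hosts); the function name is just mine, for illustration:

```python
# Back-of-the-envelope check of the paper's estimate.
MSGS_PER_5_MIN = 100                      # successful spam messages per host (from the paper)

per_hour = MSGS_PER_5_MIN * (60 // 5)     # twelve 5-minute slots per hour
per_day = per_hour * 24

def storm_daily_volume(hosts):
    """Total messages per day if every host sends at the measured rate."""
    return hosts * per_day

low = storm_daily_volume(1_000_000)       # paper's lower bound on network size
high = storm_daily_volume(5_000_000)      # paper's upper bound on network size

print(f"{per_hour} per hour, {per_day} per day per host")
print(f"{low:,} to {high:,} messages per day")
```

Strictly that works out at 28.8 to 144 billion, which the paper rounds down to 28 and 140 billion. It also assumes every infected host is sending flat out all day, which is probably generous.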

Minor nitpick with the paper -- it notes that

Storm retrieves emails found in [certain] files and gathers information about possible hosts, users, and mailing lists that are referenced in these files. In particular, it looks for strings like “yahoo.com”, “gmail.com”, “rating@”, “f-secur”, “news”, “update”, “anyone@”, “bugs@”, “contract@”, “feste”, “gold-certs@”, “help@”, “info@”, “nobody@”, “noone@”, “kasp”, “admin”, “icrosoft”, “support”, “ntivi”, “unix”, “bsd”, “linux”, “listserv”, “certific”, “sopho”, “@foo”, “@iana”, “free-av”, “@messagelab”, “winzip”, “google”, “winrar”, “samples” , “abuse”, “panda”, “cafee”, “spam”, “pgp”, “@avp.” , “noreply” , “local”, “root@”, and “postmaster@”.

I would postulate that those strings are a stoplist -- that in fact the worm avoids sending spam to addresses containing those strings. The presence of "abuse" and "postmaster" in particular would suggest that.
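To illustrate what I mean by a stoplist, here's a rough sketch. This is my guess at the behaviour, not something taken from the paper or from Storm's code; the substrings are a small subset of the list quoted above, and the function names and sample addresses are made up:

```python
# Hypothetical illustration of stoplist-style filtering of harvested addresses.
# STOPLIST is a subset of the strings quoted from the paper; everything else is assumed.
STOPLIST = ("abuse", "postmaster@", "root@", "spam", "noreply", "admin", "support")

def should_skip(address):
    """True if the address contains any stoplist substring (case-insensitive)."""
    addr = address.lower()
    return any(s in addr for s in STOPLIST)

def filter_targets(addresses):
    """Keep only the addresses that don't match the stoplist."""
    return [a for a in addresses if not should_skip(a)]

harvested = [
    "alice@example.com",
    "abuse@example.net",
    "postmaster@example.org",
    "bob@example.com",
]
print(filter_targets(harvested))   # ['alice@example.com', 'bob@example.com']
```

Under that reading, the addresses most likely to belong to admins, anti-virus vendors and abuse desks simply never get mailed.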

Long-lived spam via Yahoo! search

Back in May, I noticed some spam in my Moin Moin wiki, and fixed it.

As this Yahoo! Site Explorer view of taint.org demonstrates, Yahoo!'s search is still partly showing these results. Although the spam content was deleted long ago (example), the results still show the spam title and URL, even though the title and text no longer contain those spam keywords.

Annoyingly, I'm still seeing referrer clickthroughs from search.yahoo.com to these deleted pages from lusers looking for porn, as a result. Come on Yahoo!, fix your search to notice the title change at least, so people don't think the pages still contain porn!

Eircom WEP key-generation algorithm reversed

Over the weekend, this really hit the Irish blogosphere -- several Irish guys have apparently figured out the algorithm used by Eircom to generate WEP keys.

I blogged that page in the link-blog this morning, but it's worth writing about a little more. WEP is apparently easy to crack nowadays, so in a way all those wifi users were insecure anyway -- but this is interesting as a case study of how not to write a key generator:

  • Compiled code != secret: the first mistake Eircom made was to generate the WEP key entirely from a little "secret" text, some "secret" shuffles, and the serial number of the hardware. There should always be some randomness in there. Compiled code running on a user's desktop is not secret.

  • Don't share secrets: Secondly, it's a good demo of why you don't generate two separate key values from the same source data. In this case, both the WEP key and the SSID are generated from the Netopia router's serial number -- and sufficient bits are accidentally exposed in the SSID to enable computation of the WEP key. (This is kind of moot in many cases, since the serial number is also exposed in the MAC address, in even more detail.)
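The contrast between the two approaches can be sketched in a few lines. This is emphatically not Eircom's algorithm: the SHA-1 step, the baked-in string and the function names are hypothetical stand-ins, chosen only to show a key that is a pure function of public data next to one that comes from a random source.

```python
import hashlib
import secrets


def derived_key(serial: str, secret_text: str = "not-really-secret") -> str:
    """Anti-pattern: the key is fully determined by the serial and a baked-in string.

    Anyone who extracts `secret_text` from the binary and learns the serial
    (for example via the SSID or MAC address) can recompute the key.
    """
    digest = hashlib.sha1((serial + secret_text).encode("ascii")).hexdigest()
    return digest[:26]  # 26 hex digits = 104 bits of WEP key material


def random_key() -> str:
    """Better: 104 bits from the OS CSPRNG, unrelated to serial, SSID or MAC."""
    return secrets.token_hex(13)
```

Calling `derived_key` twice with the same serial always gives the same answer, which is exactly the problem; `random_key` gives a different value each time, so it would have to be stored on the device and printed on the sticker at manufacture.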

As far as I can tell -- although it's not quite clear who did what -- that guy Kevin Devine did a pretty great job of reversing this code. Nice one.

I'm impressed that there's now an app which detects the static tables (S-boxes, constants etc.) used in crypto algorithms -- that idea seems very clever in retrospect, hadn't occurred to me.

Here's a boards.ie thread where this exploit was discussed; there are plenty more details there, if you're curious. It seems this has been quietly floating around back-channels since the start of September.

(By the way, am I missing something, or did Eircom ship unstripped binaries for the key generator library? I could swear that when I looked at the Boards thread earlier today, there was a cut-and-paste from IDA Pro listing a function prototype. Oh dear; if so, add that to the 'case study' list above. ;)

It seems Eircom are now recommending all customers switch to WPA -- good luck with that, since it'll break all those Nintendo DSes. That won't be popular!

Update: the original page seems to be down, but here's the source for the command-line decoder: dessid.c. See also EirWep.

Oh noes!


dsc05400
Originally uploaded by jmason

Sorry to readers of Planet Antispam -- it had stopped updating for a week, after the server move. I'd forgotten to restart the cron job... now fixed.

Taint.org Has Moved

I'm moving pretty much all my home sites and infrastructure from the venerable "dogma.boxhost.net" to a new host, "soman.fdntech.com". This weblog has just made the jump. Please leave a comment if you notice anything awry.

There may be a few rough edges, since I upgraded to WordPress 2.2.2 in the process; for example, my sooper-s3kr1t "what is my name" anti-spam protocol was set to not require a preview of all posted comments, or the correct answer -- in just over an hour I received 25 spam comments... so it's good to know it's working ;)

Dublin-area Intro To Open Streetmap

A last-minute notice -- the Irish Linux Users' Group are organising an introduction to Open Streetmap tomorrow:

Open Streetmap : An Intro

The ILUG committee is organising an introduction to the Open Streetmap project on Saturday, 1st September, 2007 in Dublin.

This will include info on how to use your GPS and upload your data to the project, to contribute to a free and open map of the world.

The Hamlet Pub, Balbriggan (N 53.61396 W 6.20608 degrees)

Sat, 1st Sep 2007 2pm ~ 5pm

If you have a GPS and a laptop, please feel free to bring them. Wireless internet is available in the venue.

To register interest, please e-mail chairman-at-linux.ie

Not Cosmo

So, we were all set to name our new arrival Cosmo, assuming it was a boy. We were certain it was going to be a boy. Guess what? It wasn't... so now we have to narrow down the girl-name shortlist in a hurry!

Isn't she lovely? Lots more photees at Flickr.

Anyway, I may be hard to get hold of for a while... this lady will be keeping me busy I think ;)

Update: Looks like the name is Beatrice Lily Mason, although there's still a fair bit of indecision, unfortunately ;)

Update 2: Beatrice Lily Gray Mason. Final answer!

Stupid Unicode Tricks

Cool Unicode trick, via Mantari -- cut and paste this character into a Unicode-aware application (like this post's comment box!), then type something and see what happens:

‫‬‭‮‪‫‬‭‮҉
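If you'd rather see what's in a string like that before pasting it anywhere, a few lines of Python will list the code points. I'm assuming here that the trick relies on bidirectional control characters such as U+202E; run `describe` over the pasted text to check for yourself.

```python
import unicodedata


def describe(text: str):
    """Return (code point, Unicode name) pairs for every character in `text`."""
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in text
    ]


if __name__ == "__main__":
    # Two characters written as escapes so the source file itself stays readable.
    for code_point, name in describe("\u202e\u0489"):
        print(code_point, name)
```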

My Nokia 770

A couple of weeks back, there was quite a bit of buzz in the Irish blogosphere and elsewhere about the Nokia 770; prices for new N770s had dropped from $290ish to a very reasonable $140 / EUR130-ish price-point. I, along with a good few others, bought one.

I bought mine through Expansys, with a free 1GB RS-MMC memory card. They've sold out and no longer have any N770s listed; however, Buy.com still seem to have them in stock, so if you're interested, you can probably still pick one up. (It seems Nokia is trying to sell off their remaining N770 stock, cheap, with plans to drop support for the software platform. I'm fine with this, but it may put other buyers off.)

I've now been using it for a while, and am still happy. ;) Here are my recommended top apps:

Slimserver. Originally designed to operate as the backend software for the Squeezebox thin-client MP3 player, this has a fantastic UI built for the N770, and its MP3 stream output works perfectly on the tablet.

This is by far the neatest way to get at a 6000-song music library without a laptop; there was some talk in the GNOME community of making a decent DAAP client, but so far there are no working results there that I could find. :(

maemo-mapper. This is a fantastic mapping app for the tablet; it presents map tiles downloaded from OpenStreetMap or Google Maps in an N770-optimized format, with the usual nice draggable UI. Bonus: it'll work offline, so you can follow a route while online, then take the tablet along to help navigate.

Tip: once you start maemo-mapper, click the "Download..." button in the "Repository Manager" and it'll download details for the 5 most useful map repositories, including Google and Virtual Earth.

FBReader. A very nice document reader; much nicer than trying to read long HTML pages in the builtin web browser, especially since it allows you to turn the device on its side.

In general, the Opera Mini browser works fine; be sure to enable Javascript and set up a swap file on the RS-MMC card first. It does all the basic HTML and rudimentary AJAX; Google Calendar is a no-go, but GMail and even Google Maps work adequately, modulo minor bugs. Plain Old HTML sites like Wikipedia, IMDB and so on all work great.

As long as you're realistic about the platform, it won't disappoint -- video requires custom transcoding, for example, and proprietary apps like Flash and RealPlayer lag behind their desktop equivalents, but as far as I can tell that's the case for every embedded platform. (Since I spent a couple of years developing such a platform, I'm quite comfortable with this.)

A really really nifty thing about the N770 is that it's now entirely hackable -- within 30 minutes of powering on, I was able to get a terminal window open with a root prompt, and was adding ext3 partitions to the RS-MMC card. Apps are installed using "apt-get". The terminal even has a word-completion system optimized for the UNIX command-line - nice ;)

This SomethingAwful thread contains plenty more good tips. I'm happy I bought it -- so many of these gadgets can wind up as an overpriced door-stop, but this is easily worth what I paid for it.

Update: this thread at InternetTabletTalk seems pretty chock-full of good advice, too.

Test my auto-generated ruleset

(I posted this to the SA users and dev lists, too.)

I've been working on a new way to auto-generate body rules recently (see previous posts). The results are checked into SVN trunk daily in the "rulesrc/sandbox/jm/20_sought.cf" file.

We haven't had much time to figure out how to produce auto-generated 3.2.x rule updates for our entire ruleset at updates.SpamAssassin.org, so instead of dealing with that, I've taken a shortcut around it ;) I'm now making just the "20_sought.cf" ruleset available as a standalone, unofficial sa-update ruleset at sought.rules.yerp.org.

Before using it, you'll need the GPG key:

  wget http://yerp.org/rules/GPG.KEY
  sudo sa-update --import GPG.KEY

then use this to update:

  sudo sa-update \
        --gpgkey 6C6191E3 --channel sought.rules.yerp.org \
        [...other channels...] \
        --channel updates.spamassassin.org

(similar to how you'd use Daryl's sa-update version of the SARE rulesets.)

Feel free to run sa-update as frequently as you like.

Please consider it alpha; I may take it down in a few months depending on how it goes, or if we can get it working as part of the core updates. In the meantime though, I'm curious to hear how you get on with it. (In particular, copies of false positives would be very welcome.)

Update: it's been very successful, so I'd now consider it in production.

The Prime Time Group pump-and-dump

Spamnation.info links to an interesting article by Computerworld's Gregg Keizer about the massive PRTH.PK spam run.

As usual, there is no shortage of suckers:

The spam blast did drive up Prime Time's share price from Monday's low of around 7 cents to Wednesday's high of 11 cents, a 57% jump. Thursday morning, however, the bottom dropped out, and the stock fell to under 7 cents. Trading volumes peaked Wednesday as well, at around 1.7 million shares, substantially higher than any day in the month prior. "You can actually see the wave of activity in the stock and compare it with the volume of spam that we trapped," said [Sophos analyst Ron] O'Brien.

But here's an interesting new tactic by the good guys:

Last Wednesday afternoon, Prime Time announced that it was ordering a Non Objecting Beneficial Owners (NOBO) list to get a clearer picture of who owned its shares. "The NOBO list will be used to determine the naked short positions in Prime Time Group Inc.," the company said in a statement. "The finding will then be reported to the [National Association of Securities Dealers] to take action against the violators of the naked short regulations."

"Naked short" is a investment term that refers to selling short, essentially a bet that the price will drop, but with a twist: "naked" means that the investor sells short without first making sure he can borrow the shares from another investor holding a "long" position on the stock.

I hope this works; it'd be great to see the profit mechanism behind pump-and-dump spam killed off.

Spamnation notes:

Incidentally, the greeting card spam that built the botnet used to promote PRTH.PK and CYTV.OB also continues. It has iterated through another couple of generations: the current incarnation tells recipients to collect their custom Musical ecard or custom Movie-quality ecard or other variants on that theme. We've seen about 150 of these in the past three days, suggesting that the unknown senders are probably well on their way to building up another botnet for their next stock spam run.

Spreading trojans via greeting-card spam is a trademark of the gigantic Storm botnet, AFAIK: SecureWorks info, MessageLabs info, spam levels causing DDoS for Canadian networks, DDoS threat for EDU sector.

The Haughey 419 returns

A few months back, Blogorrah noted an amazing 419 scam, claiming to be a missive from ex-Taoiseach of Ireland Charlie Haughey's wife, Maureen. It's really quite appropriate that Charlie should become the subject of a scam himself, given what he did to this country. But anyway... over the weekend, a new variant on the theme emerged:

From Mrs Maureen Haughey, ROI

My Dear Friend,

I am Maureen Haughey, widow of former Taoiseach of the Republic of Ireland, Charles J. Haughey and daughter of former Taoiseach of the Republic of Ireland and heir to de Valera, Sean F. Lemass.The Press has written a lot about unresolved mysteries and corruption surrounding Charles’s dealings, but I tell you something,my Charlie was a good man. He was human and he did whatever he did.

People marvel why I stuck with Charlie and didn’t speak during the mess that came with the exposure of his affairs with Terry Keane (I just hate to think of her). I had to stand by him through the tribunal times…. it was to do with what I’m doing now. No one knew the details of all Charlie’s financial dealings but me. I remain the only one who knows all who got loans from Charlie and didn’t come back to pay when he was disgraced. I am the only one who knows about these monies and the other Ansbacher accounts.

I write to you, an old weary woman, sick and almost tired of living. My end is near but I will not depart until my final mission is accomplished and I also write this with an unshaken belief in the power of aspirations and dreams of a human being. The Irish government thinks it can shave and reduce me to a poor widow but I have the winning ace. A few years ago, when we weren’t sure if my Charlie would be convicted, he kept some money in trust for me in a Security and Finance company. He did not open the account in our names so it will not be traced to us to enable the past remain the past. The name on the account is Cedric de Vregille. I never thought Charlie would leave me so soon and it never occurred to me to ask if this name were fictitious or not or a name of any of his friends. I have tried to find this man but to no avail. The amount he deposited in this name is 30,000,000 (Thirty Million Euros).

I want an honest person to come forward and lay claims to this amount, moreover to use the funds as instructed by me. I have all the documents needed, I just need a face for the name. I have mapped out 30% of the funds for you, as you will help us (you and I) execute this job.

As soon as I receive your acceptance for this work I shall give you necessary details of my solicitor who will facilitate the release of the funds in your name. Please reply me via my personal email: maureen_haughey67@yahoo.co.uk


For my security and the sake of letting sleeping dogs lie, I strongly advice that you keep our dealings confidential. You can read more about my charlie from:

http://www.ireland.com/focus/haughey/ITstories/story11.htm

http://www.teachersparadise.com/ency/en/wikipedia/c/ch/charles_haughey.html

http://www.everything2.com/index.pl?node_id=548983&lastnode_id=0

Thank You.


Message sent using UebiMiau 2.7.2

It was sent via a webmail system at nildram.co.uk, from a proxy in Australia.

The writing is amazingly ornate -- 'I write to you, an old weary woman, sick and almost tired of living', 'the Irish government thinks it can shave and reduce me to a poor widow but I have the winning ace', etc. Very odd stuff. Also, it looks spell-checked. And, once again, poor old cyclist Cedric de Vregille gets dragged into it, too! I wonder what he did to deserve that ;)

If you fancy scambaiting, 'maureen_haughey67@yahoo.co.uk' is the one to go for. These guys seem to be having a good go of it -- 'The thought of the Irish government trying to shave an old woman has shocked and appauled me, so I will assist in anyway possible.' ha!

Rule Discovery Progress Update

Back in March, I wrote a post about a new rule discovery algorithm I'd come up with, based on the BLAST bioinformatics algorithm. I'm still hacking on that; it's gradually meandering towards production status, as time permits, so here's an update on that progress.

There have been various tweaks to improve memory efficiency; I won't go into those here, since they're all in SVN history anyway. But the results are that the algorithm can now extract rules from 3500 spam and 50000 ham messages without consuming more than 36 MB of RAM, or hitting disk. It can also now generate a SpamAssassin rules file directly, and apply a basic set of QA parameters (required hit rate, required length of pattern, etc.).

On top of this, I've come up with a workflow to automatically generate a usable batch of rules, on a daily basis, from a spam and ham corpus. This works as follows:

  • Take a sample of the past 4 days' traffic from our spamtrap network. Today this was about 3000 messages.

  • add the hand-vetted spam from my own accounts over the same period (this helps reduce bias, since spamtraps tend to collect a certain type of spam), about 3400 messages.

  • discard spams that scored over 10 points (to concentrate on the stuff we're missing).

  • Pass the remaining 3517 spams, and text strings from over 50000 nonspam messages, into the "seek-phrases-in-log" script, specifying a minimum pattern length of 30 characters, and a minimum hitrate of 1% (in today's corpus, a rule would have to hit at least 34 messages to qualify).

  • That script gronks for a couple of minutes, then produces an output rules file, in this case containing 28 rules, for human vetting. (Since I've started this workflow, I've only had to remove a couple of rules at this step, and not for false positives; instead, they were leaking spamtrap addresses.)

  • Once I've vetted it, I check it into rulesrc/sandbox/jm/20_sought.cf for testing by the SpamAssassin rule QA system.
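To give a feel for the selection criteria in that workflow, here's a deliberately naive sketch. It is not the BLAST-derived algorithm and not the "seek-phrases-in-log" script: it brute-forces fixed-length substrings, and the function name and parameters are made up for illustration. It keeps any substring that hits at least the minimum fraction of spam and appears in no ham at all.

```python
from collections import Counter


def candidate_patterns(spam_bodies, ham_bodies, length=30, min_hitrate=0.01):
    """Return substrings of `length` chars that hit >= min_hitrate of spam and no ham."""
    hits = Counter()
    for body in spam_bodies:
        # Use a set so each message counts at most once per substring.
        seen = {body[i:i + length] for i in range(len(body) - length + 1)}
        hits.update(seen)

    threshold = min_hitrate * len(spam_bodies)
    return sorted(
        pattern
        for pattern, count in hits.items()
        if count >= threshold
        and not any(pattern in ham for ham in ham_bodies)
    )
```

A brute-force pass like this would be far too slow and memory-hungry on a real corpus, which is the whole reason for the cleverer algorithm; it just shows what "minimum pattern length" and "minimum hitrate" mean in practice.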

The QA results for the ruleset from yesterday (Aug 3) can be seen here, and give a pretty good idea of how these rules have been performing over the past week or two; out of the nearly 70000 messages hit by the rules, only 2 ham mails are hit -- 0.0009%.

In fact, I measured the ruleset's overall performance in the logs from the 4 mass-check contributors who provided up-to-date data in yesterday's nightly mass-check, across 5 corpora: bb-jm, jm, daf, dos, and theo (all SpamAssassin committers):

Contributor   Hits    Spams    Percent
bb-jm         4249    24996    17.00%
jm            3450    14994    23.00%
daf           1236    35563     3.48%
dos          32867   100223    32.79%
theo         28077   382562     7.34%

(bb-jm and jm are both me; they scan different subsets of my mail.)

The "Percent" column measures the percentage of their spam collection that is hit by at least one of these rules; it works out to an average of 16.72% across all contributors. This is underestimating the true hitrate on "fresh" spam, too, since the mass-check corpora also include some really old spam collections (daf's collection, for example, looks like it hasn't been updated since the start of July).
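For anyone who wants to check the sums, this sketch recomputes the figures from the Hits and Spams columns above. The variable names are mine, and the "pooled" figure is an extra calculation that isn't part of the QA output.

```python
# (hits, spams) per corpus, copied from the table above.
RESULTS = {
    "bb-jm": (4249, 24996),
    "jm": (3450, 14994),
    "daf": (1236, 35563),
    "dos": (32867, 100223),
    "theo": (28077, 382562),
}


def hit_percent(hits: int, spams: int) -> float:
    """Percentage of a spam collection hit by at least one rule."""
    return 100.0 * hits / spams


percents = {name: hit_percent(h, s) for name, (h, s) in RESULTS.items()}

# Unweighted mean: every corpus counts equally, whatever its size.
average = sum(percents.values()) / len(percents)

# Pooled rate: total hits over total spam, so big corpora dominate.
pooled = hit_percent(
    sum(h for h, _ in RESULTS.values()),
    sum(s for _, s in RESULTS.values()),
)

if __name__ == "__main__":
    for name, pct in percents.items():
        print(f"{name:6s} {pct:6.2f}%")
    print(f"average {average:.2f}%  pooled {pooled:.2f}%")
```

The unweighted average comes out at about 16.72%, matching the figure above. The pooled rate is lower, about 12.5%, because theo's very large collection has a below-average hit rate.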

Even better, a look at the score-map for these rules shows that they are, indeed, hitting the low-scoring spam that other rules don't hit.

That's pretty good going for an entirely-automated ruleset!

The next step is to come up with scores, and publish these for end-user use. I haven't figured out how this'll work yet; possibly we could even put them into the default "sa-update" channel, although the automated nature of these rules may mean this isn't a goer.

If you're interested, the hits-over-time graph for one of the rules (body JM_SEEK_ICZPZW / Home Networking For Dummies 3rd Edition \$10 /) can be viewed here.

Host monitoring with Jaiku

A few weeks back, we were having trouble with dogma, our shared server where taint.org is hosted, which would occasionally be unavailable for unknown reasons. We needed to monitor its availability so that it could be fixed when it crashed again, and we'd be able to investigate quickly. Since it was happening mostly out of working hours, SMS notification was essential.

Normally, that kind of monitoring is pretty basic stuff, and there are plenty of services out there that can do it, from Host-Tracker.com to the more complex self-hosted apps like monit and Nagios. But looking around, I found that none of them offered SMS notification for free, and since this was our personal-use server, I wasn't willing to sign up for a $10-per-month paid account to support it, or buy any hardware to act as a private SMS gateway.

Instead, I thought of Jaiku -- the Finnish company which offers a microblogging/presence platform similar to Twitter. Jaiku had a couple of cool features:

  • SMS notifications
  • it's possible to broadcast messages to a "channel", which others could subscribe to, IRC-style
  • it has an open API

This would allow me to notify any interested party of dogma's downtime, allowing subscribers to subscribe and unsubscribe using whatever notification systems Jaiku support.

With a little perl and LWP, I rigged up a quick monitoring script to check http://taint.org/ via HTTP, and report if it was unavailable over the course of 5 retries in 50 seconds. If it was broken, the script sends a JSON-formatted POST request to Jaiku's "presence.send" method, informing the target channel of the issue. (Perl source here.)
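The real script is the Perl one linked above; what follows is just a rough Python sketch of the same retry logic. The 10-second interval is my reading of "5 retries in 50 seconds", and `notify` is a hypothetical callback standing in for the Jaiku POST, whose request format I haven't tried to reproduce here.

```python
import time
import urllib.request


def is_up(url: str, timeout: float = 5.0) -> bool:
    """One HTTP check: True on a 2xx/3xx response, False on a network or HTTP error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except OSError:  # URLError, HTTPError and socket timeouts are OSError subclasses
        return False


def check_with_retries(url, retries=5, interval=10.0, probe=is_up, sleep=time.sleep):
    """Return True as soon as one probe succeeds; False if all `retries` probes fail."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)  # wait before trying again, but not after the last try
    return False


def monitor(url, notify, **kwargs):
    """Call `notify` with a short message if the site stays down across all retries."""
    if not check_with_retries(url, **kwargs):
        notify(f"{url} is unreachable")
```

Run from cron, `monitor("http://taint.org/", send_to_channel)` would stay silent while the site is up and fire one message per run while it's down; `send_to_channel` is whatever function posts to the notification service.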

You can see the '#dogmastatus' channel here -- as you can see, we fixed the problem with dogma just over 2 weeks ago ;)

It's worth noting that I had to set up an additional user, "downtimebot", on Jaiku to send the messages -- otherwise I'd never see them on my configured mobile phone! Jaiku uses the optimisation that, if I sent the message, there's no need to cc me with a copy of what I just sent; logical enough.

Anyway, if you're interested in dogma's availability (there might be one or two taint.org readers who are), feel free to add yourself to the #dogmastatus channel and receive any updates.

Update: Fergal noted that it's pretty simple to use Cape Clear's assembly framework to perform a HTTP ping test with output to Jabber/XMPP. nifty!

A fishy Challenge-Response press release

I have a Google News notification set up for mentions of "SpamAssassin", which is how I came across this press release on PRNewsWire:

Study: Challenge-Response Surpasses Other Anti-Spam Technologies in Performance, User Satisfaction and Reliability; Worst Performing are Filter-based ISP Solutions

NORTHBOROUGH, Mass., July 17 /PRNewswire/ -- Brockmann & Company, a research and consulting firm, today released findings from its independent, self-funded "Spam Index Report-- Comparing Real-World Performance of Anti-Spam Technologies."

The study evaluated eight anti-spam technologies from the three main technology classes -- filters, real-time black list services and challenge- response servers. The technologies were evaluated using the Spam Index, a new method in anti-spam performance measurement that leverages users' real-world experiences.

[...] The report finds that the best performing anti-spam technology is challenge-response, based on that technology's lowest average Spam Index score of 160.

[...] Filter - Open Source software-(Spam Index: 388): This technology is frequently configured to work in conjunction with PC email client filters. The server adds * * SPAM * * to the subject line so that the client filter can move the message into the junk folder. This class of software includes projects such as ASSP, Mail Washer and SpamAssassin, among others.

The "Spam Index" is a proprietary measurement of spam filtering, created by Brockmann and Company. A lower "Spam Index" score is better, apparently, so C/R wins! (Funny that. The author, Peter Brockmann, seems to have some kind of relationship with C/R vendor Sendio, being quoted in Sendio press releases like this one and this one, and providing a testimonial on the Sendio.com front page.)

However, there's a fundamental flaw with that "Spam Index" measurement: it's designed to make C/R look good. Here's how it's supposed to work. Take these four measurements:

  • Average number of spam messages each day x 20 (to get approximate number per work-month)
  • Average minutes spent dealing with spam each day x 20 (to get approximate minutes per work-month)
  • Number of resend requests last month
  • Number of trapped messages last month

Then sum them, and that gives you a "Spam Index".
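Written out as code, the calculation as I understand it from the press release looks like this. The function and parameter names are mine, and the sample numbers are invented purely to show the arithmetic.

```python
WORK_DAYS_PER_MONTH = 20  # the multiplier used for the two per-day measurements


def spam_index(spam_per_day, minutes_per_day, resend_requests, trapped_messages):
    """Sum of the four measurements; a lower score is supposed to be better."""
    return (
        spam_per_day * WORK_DAYS_PER_MONTH
        + minutes_per_day * WORK_DAYS_PER_MONTH
        + resend_requests
        + trapped_messages
    )


if __name__ == "__main__":
    # Invented figures: 5 spams/day, 2 minutes/day, 1 resend request, 40 trapped.
    print(spam_index(5, 2, 1, 40))
```

Note that messages, minutes and requests are all simply added together, one for one.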

First off, let's translate that into conventional spam filter accuracy terms. The 'minutes spent dealing with spam each day' measures false negatives, since having to 'deal with' (ie delete) spam means that the spam got past the filter into the user's inbox. The 'number of trapped messages' means, presumably, both true positives -- spam marked correctly as spam -- and false positives -- nonspam marked incorrectly as spam. The 'number of resend requests last month' also measures false positives, although it will vastly underestimate them.

Now, here's the first problem. The "Spam Index" therefore considers a false negative as about as important as a false positive. However, in real terms, if a user's legit mail is lost by a spam filter, that's a much bigger failure than letting some more spam through. When measuring filters, you have to consider false positives as much more serious! (In fact, when we test SpamAssassin, we consider FPs to be 50 times more costly than a false negative.)
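Here's a small sketch of why the weighting matters. The factor of 50 is the one mentioned above; the simple weighted sum, the function name and the two example filters are illustrative assumptions, not SpamAssassin's actual evaluation code.

```python
FP_WEIGHT = 50  # a false positive counts as 50 false negatives


def weighted_cost(false_positives, false_negatives, fp_weight=FP_WEIGHT):
    """Total cost of a filter's mistakes, with false positives weighted more heavily."""
    return fp_weight * false_positives + false_negatives


if __name__ == "__main__":
    # Two invented filters: A loses one legit mail, B leaks three times as much spam.
    filters = {"A": (1, 20), "B": (0, 60)}
    for name, (fp, fn) in filters.items():
        print(name, "unweighted:", weighted_cost(fp, fn, fp_weight=1),
              "weighted:", weighted_cost(fp, fn))
```

Counted one for one, filter A looks much better than B; once the lost legit mail is weighted properly, B comes out ahead.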

Here's the second problem. Spam is sent using forged sender info, so if a spammer's mail is challenged by a Challenge/Response filter, the challenge will be sent to one of:

  • (a) an address that doesn't exist, and be discarded (this is fine); or
  • (b) to an invalid address on an innocent third-party system (wasting that system's resources); or
  • (c) to an innocent third-party user on an innocent third-party system (wasting that system's resources and, worst of all, the user's time).

The "Spam Index" doesn't measure the latter two failure cases in any way, so C/R isn't penalised for that kind of abusive traffic it generates.

Also, if a good, nonspam mail is challenged, either

  • (a) the sender will receive the challenge and take the time to jump through the necessary hoops to get their mail delivered ("visit this web page, type in this CAPTCHA, click on this button" etc.); or
  • (b) they'll receive the challenge, and not bother jumping through hoops (maybe they don't consider the mail that important); or
  • (c) they'll not be able to act on the challenge at all (for example, if an automated mail is challenged).

Again, the "Spam Index" doesn't measure the latter two failure cases.

In other words, the situations where C/R fails are ignored. Is it any wonder C/R wins when the criteria are skewed to make that happen?

Stop with the fake phish data

An anonymous friend in the anti-phishing community writes:

For those of you who blog and/or have contacts in the general computer user 'go fight 'em' community:

Is there any way you can get the word out that dropping a couple hundred fake logins on a phishing site is NOT appreciated??

It creates havoc for those monitoring the drop since it's an unbelieveable waste of time and resources to clean up the file. Also, for those drop files that 'recycle' after every 10 entries, valid data is lost.

It also creates havoc for those who get these files and try to notify victims. They waste time, too .. pulling legit info from amongst the trash.

I know there are programs out there that create/dump this stuff onto sites and some who call themselves 'phish phighters' enjoy the harassment aspect. But it wastes the time/effort of those who are seriously working these things.

New Science Gallery in Dublin

I just got this missive from the new Science Gallery at Trinity College Dublin:

The SCIENCE GALLERY is seeking EXPRESSIONS OF INTEREST for Festival of Light projects.

Calling all techno-artists, playful scientists, renegade engineers, architects, sculptors, lighting designers, fashion designers, guerilla projectionists and inventors...

The Science Gallery at Trinity College Dublin is developing a two week FESTIVAL OF LIGHT as its launching programme in January 2008 which will celebrate the art, science and technology of light through a range of installations and events in the Science Gallery and around Dublin's city centre.

We are seeking proposals for installations, events and workshops. You can download our Expression of Interest form here. We would like this to reach far and wide so please forward this onto anyone you think may be interested in submitting!

If you would like to discuss your ides with us or would like further information prior to submitting an Expression of Interest Submission please contact Elizabeth Allen at elizabeth.allen /at/ sciencegallery.org .

I'm looking forward to seeing what happens with this; hope it works out well.

T9 in Ireland

Tobias DiPasquale notes that the iPhone's dictionary can correct the word 'f***ing' right out of the box. Handy!

The vagaries of various companies' autocompletion dictionaries are always worth a comment. I've noticed that swearing is generally omitted, presumably for prudish reasons to do with tabloid PR fears. But as an Irishman, I find it particularly galling that Nokia's T9 dictionary cycles through the following entries for "pints":

  • Shots
  • Pious
  • Riots
  • Pints

When I type "pints" (which happens a lot), believe me, I never mean to type "pious". Stupid phone!
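The reason those four words collide is that they all map to the same key sequence on a standard phone keypad. A quick sketch to check that (the keypad layout is the standard one; the function name is mine):

```python
# Standard phone keypad letter groups.
T9_KEYS = {
    "2": "abc", "3": "def", "4": "ghi", "5": "jkl",
    "6": "mno", "7": "pqrs", "8": "tuv", "9": "wxyz",
}
LETTER_TO_KEY = {
    letter: key for key, letters in T9_KEYS.items() for letter in letters
}


def t9_digits(word: str) -> str:
    """Return the keypad digits you would press to type `word`."""
    return "".join(LETTER_TO_KEY[ch] for ch in word.lower())


if __name__ == "__main__":
    for word in ("shots", "pious", "riots", "pints"):
        print(word, t9_digits(word))
```

All four come out as the same five digits, so the only thing the dictionary controls is the order in which they're offered.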

Planet Antispam unborked

Those of you who visit Planet Antispam may have noticed that it hadn't been updating for a few days. Somehow or other, the Planet software had corrupted its cache, and was dying with this error:

Traceback (most recent call last):
  File "planet.py", line 167, in ?
    main()
  File "planet.py", line 160, in main
    my_planet.run(planet_name, planet_link, template_files, offline)
  File "/home/planet/antispam/planet-2.0/planet/__init__.py", line 240, in run
    channel = Channel(self, feed_url)
  File "/home/planet/antispam/planet-2.0/planet/__init__.py", line 527, in __init__
    self.cache_read_entries()
  File "/home/planet/antispam/planet-2.0/planet/__init__.py", line 569, in cache_read_entries
    item = NewsItem(self, key)
  File "/home/planet/antispam/planet-2.0/planet/__init__.py", line 845, in __init__
    self.cache_read()
  File "/home/planet/antispam/planet-2.0/planet/cache.py", line 74, in cache_read
    self._type[key] = self._cache[cache_key + " type"]
  File "/usr/lib/python2.3/bsddb/__init__.py", line 116, in __getitem__
    return self.db[key]
KeyError: 'tag:blogger.com,1999:blog-9336495.post-117499582419244211 feedburner_origlink type'

Ah, Berkeley DB, always good for the infrequent inscrutable, yet fatal, error. A wipe of the contents of the cache directory, and it seems to be working again.

Unfortunately, I had to drop the RSS feed for Aunty Spam; it seems the domain has lapsed, and I can't seem to find an RSS feed that contains just the spam-related Aunty Spam posts any more.

‘I Go Chop Your Dollar’ star arrested

The Register is reporting that 'Nigerian comedian and actor Nkem Owoh' has been arrested in Amsterdam as a suspected 419 scammer:

Nigerian comedian and actor Nkem Owoh was one of the 111 suspected 419 scammers arrested in Amsterdam recently as part of a seven month investigation, dubbed Operation Apollo.

Owoh became a well known star within the Nigerian film industry, sometimes colloquially known as Nollywood because of its trite plots, poor dialogue, terrible sound, and low production standards.

Owoh starred in the 2003 film Osuofia, and a year later was one of several actors temporarily banned from appearing in movies by Nigeria's Association of Movie Marketers and Producers because he demanded excessive fees and unreasonable contract demands.

Owoh became internationally known for his song "I Go Chop Your Dollar", the anthem for 419 scammers ("Oyinbo man I go chop your dollar, I go take your money and disappear / 419 is just a game, you are the loser, I am the winner", full lyrics here), which was banned in Nigeria after many complaints.

The song was the title track from the comedy, "The Master", starring Owoh as a scheming 419er.

The alleged scammers are suspected of running a series of lottery-based (AKA 419-lite) scams.

Here's the video for "I Go Chop Your Dollar".

It's not exactly cut and dried, though. This thread suggests that he wasn't arrested for fraud; instead that the Dutch authorities detained pretty much everyone at his concert. This article suggests similar:

The Netherlands police were said to have stormed the venue of the show in a helicopter about 2a.m and arrested practically everybody at the venue. [...]

"Over 200 of them (Nigerians) were arrested that night. It was a big haul; they came with helicopter and cars and circled the whole area. As I speak with you, over 70 of those apprehended that night have been deported for possession of expired or fake immigration papers.

"Osuofia was also whisked away but was released hours after," the source said.

Update: It appears Osuofia was not arrested after all; lots more details here.

Hunting the wily mangosteen

A few weeks ago, I was in Tesco Clearwater when I spotted something I wasn't expecting; a tray of fruit labelled "Mangosteen".

Mangosteen are delicious. In Thailand, they're called "the queen of fruit" (with the oh-so-stinky and not quite as enjoyable Durian as the king). We once spent a week on a Thai beach snacking on bags of the things; they're so good.

Unfortunately the tray was empty. :(

Ever since then, every time I've gone back to that Tesco, there's been no sign of the mangosteen; not even another empty tray! Thing is, I now know they're importing them, so I'm really jonesing... if any Dublin taint.org readers happen to spot some, please (a) be sure to buy some for yourself and (b) let us know where you found it!

Linking for charidee

Tom tagged me with another blog link-meme -- a worthwhile one, though; the idea is to improve the page rank of charities in Ireland, by linking to them. Fair enough!

The list of charities so far is:

And I'll add Focus Ireland (who seem to have broken their website!). Thanks to Dorothy for the suggestion.

Who to pass it on to? How's about Una, James and Donncha?

NSAI invites comments on OOXML/OpenXML standard

Antoin writes:

NSAI (the Irish national standards body) has posted an invitation for comments on its site regarding the proposed new Office Open XML standard (ISO/IEC DIS 29500). NSAI has established an ad hoc committee to consider the matter, and I am a member of that committee, together with a number of far more important and qualified people.

Anyway, we are anxious to hear from anyone who has a view on what way NSAI should vote on this standard when it reaches committee. If you can provide links to any relevant articles, that would also be very helpful. If you have time, please review the documents and leave your comments either here or send them to the committee.

So if you've been following the ongoing drama (to be honest, I haven't), please feel free to make a submission; the deadline is 11 July.

UPS Ireland suck

I'm waiting for a replacement battery from Dell, covered under warranty. Dell service have been great, but UPS, not so much...

On Monday (25th June), after a little back-and-forth to establish that the battery was faulty, I got a mail from Dell saying:

The Part (Battery) will be with you tomorrow pre 17:00 (Next Business Day). Please note that you will require to return the faulty part at the same point of time, the courier person would not be delivering the part until you return the defective part.

Great! That's good warranty service. I'm happy.

So I wait... and wait. Finally, 2 days later, today (Wednesday 27th), at 17:45, a courier appears to pick up the faulty part. Unfortunately, he doesn't have the replacement with him.

I go online to see what's up via online tracking, and see this:

Location             Date        Local Time  Description
DUBLIN, IE           27/06/2007  16:41       A CORRECT STREET NAME IS NEEDED FOR DELIVERY. UPS IS ATTEMPTING TO OBTAIN THIS INFORMATION
                     27/06/2007  4:13        IN-TRANSIT SCAN
                     27/06/2007  4:12        IMPORT SCAN
DUBLIN, IE           26/06/2007  18:31       IMPORT SCAN
                     26/06/2007  5:59        IMPORT SCAN
                     26/06/2007  5:58        OUT FOR DELIVERY
                     26/06/2007  3:59        ARRIVAL SCAN
KOELN (COLOGNE), DE  26/06/2007  4:39        DEPARTURE SCAN
                     26/06/2007  4:14        DEPARTURE SCAN
HERKENBOSCH, NL      25/06/2007  10:09       ORIGIN SCAN
NL                   25/06/2007  14:02       BILLING INFORMATION RECEIVED

So, what, the street name is "INCORRECT" despite one UPS driver having no problem? I suspect someone just couldn't be arsed.

I rang up UPS, provided a hint, and it seems the delivery is now rescheduled for Friday. So much for "next business day" delivery! Lucky the laptop works on AC without the battery, otherwise I'd be quite annoyed.

I wonder if I can provide feedback to Dell about this? There's a possibility they might switch courier company if they get enough complaints about crappy service. It also makes me wonder if there's any decent international parcel delivery service in Ireland. At least UPS haven't yet required me to schlep over to a "local" depot 5 miles away to pick up the package myself, like An Post does...

How I wound up with a pond

My weekend went like this:

  1. buy a Green Cone composting system
  2. read instructions
  3. find out I had to dig a 3' by 2' deep hole
  4. spend all Saturday afternoon digging massive hole in the back garden, horny-handed son of toil style
  5. just as I finish, the skies open
  6. watch in horror as the hole rapidly becomes a pond
  7. since the green cone requires a dry hole, wait for it to drain...
  8. ...and wait...
  9. ...and wait...

I'm still waiting. :(

I just hope the flooded state of the pit is a side effect of the monsoon levels of rain over the last week, and will drain soon, rather than the normal situation for the garden. Otherwise, I'll have to fill the hole and give up on the Green Cone entirely... argh. I should have gone for the wormery option, like lisey suggested!

Update: Enda left a good tip in the comments -- dig deeper into the clay and fill in with more gravel. I did that and it looks like it's working... Let's see if the worms like it. I'll keep yis posted ;)

How to solve a maze with Photoshop

wow, this is cool. lod3n, confronted by this heinous puzzle, wrote:

'2 minutes in Photoshop. All too easy. So, where do I pick up my cake?

  1. Increase contrast.
  2. Select the right wall of the maze using the magic wand.
  3. Select > Modify > Expand 4 pixels
  4. Create new layer.
  5. Fill with Red.
  6. Select > Modify > Contract 2 pixels.
  7. Delete. Now you've got a line tracing the solution.
  8. Manually clean up the outer edge, and connect the dots.
  9. Cake!'

Here's the result. Seriously nifty!
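As far as I can tell, the trick works because the walls of a maze like this fall into two connected pieces, and the solution runs along the seam between them. Here's a rough grid analogue in Python. It is not lod3n's actual Photoshop procedure: the MAZE layout and the function names are made up for illustration, and it assumes a maze with no loops and exactly two openings in the outer wall.

```python
from collections import deque

# Hypothetical test maze: '#' is wall, '.' is open floor.
# Entrance at the top (row 0, col 1), exit at the bottom (row 6, col 5).
MAZE = [
    "#.#####",
    "#...#.#",
    "###.#.#",
    "#.....#",
    "#.###.#",
    "#.#...#",
    "#####.#",
]

def label_walls(maze):
    """Flood-fill wall cells into 4-connected components; returns {(row, col): label}."""
    labels = {}
    next_label = 0
    for r, row in enumerate(maze):
        for c, ch in enumerate(row):
            if ch != "#" or (r, c) in labels:
                continue
            labels[(r, c)] = next_label
            queue = deque([(r, c)])
            while queue:
                cr, cc = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = cr + dr, cc + dc
                    if (0 <= nr < len(maze) and 0 <= nc < len(maze[nr])
                            and maze[nr][nc] == "#" and (nr, nc) not in labels):
                        labels[(nr, nc)] = next_label
                        queue.append((nr, nc))
            next_label += 1
    return labels

def solution_cells(maze):
    """Open cells whose 8 neighbours include walls from two different components."""
    labels = label_walls(maze)
    path = set()
    for r, row in enumerate(maze):
        for c, ch in enumerate(row):
            if ch == "#":
                continue
            touching = {
                labels[(r + dr, c + dc)]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (r + dr, c + dc) in labels
            }
            if len(touching) >= 2:
                path.add((r, c))
    return path

def render(maze, path):
    """Redraw the maze with solution cells marked '*'."""
    return [
        "".join("*" if (r, c) in path else ch for c, ch in enumerate(row))
        for r, row in enumerate(maze)
    ]

if __name__ == "__main__":
    print("\n".join(render(MAZE, solution_cells(MAZE))))
```

Dead ends are surrounded by a single wall piece, so they drop out; only cells squeezed between the two pieces get marked.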

(Update: wow, this got Dugg heavily -- 17000 pageviews from Digg alone! Unfortunately that caused a bit of a server meltdown. Should be back now though...)

7digital – a bit risky

Apparently EMI are now offering their DRM-free MP3s via 7digital, so I thought I'd give the newly-revamped 7digital site a go. Results were a little mixed, unfortunately.

I found a couple of tracks I wanted which were available in MP3 format, clicked the "purchase" button beside them, and they were added to the "basket" on the right-hand side. Pretty typical stuff, if you've used EMusic or iTunes. Then I created an account, chose to pay using Paypal, paid a couple of quid and all was well!

The good stuff:

  • the website works great in Firefox on Linux, and was nice and speedy.

  • the range of music seems pretty good; most of the catalogue is WMA-only unfortunately, but most of the new releases now seem to be coming out with MP3 as an option.

  • it's very easy to pay by credit card or with Paypal.

There were a couple of glitches, however.

First, it allowed me to buy a file, then not give it to me. My first tester track was the Soulwax remix of 'Standing in the Way of Control' by Gossip. I happily added it to my basket, checked out, and paid -- then when I got to my 'Your downloads' page, I was presented with this:

Gossip - Standing In The Way Of Control (Soulwax Nite Version) / 6:54 / Released 24.06.2007

No download links etc... hmm. A quick check of today's date reveals that the 24th is a week from now -- the track hasn't been released yet! It seems this isn't yet "available as a digital release" for some reason, despite the fact that as far as I can tell it's been out for ages on CD. The only way to spot this in advance of purchase is to look at the "Digital release date" on the album info page and compare with today's date; there's no other notification that you'll be buying a prerelease, and will have to wait to get your digital mitts on what you buy. Grrrr.

OK, next one; my other tester track was the title track from the new White Stripes, Icky Thump. At least this one was available. Now, supposedly we're getting 320kbps MP3s, right? Not so, it seems -- this one was 192kbps, a fact that's only revealed once you've already paid for the tracks. Double grrr...

(it turns out, by the way, that only the "EMI content" is delivered in 320kbps format. I guess the other MP3 labels are sticking with 192kbps.)

So, two for two, both of the test downloads turned out to be wonky in one way or another. A bit disappointing. I hope they'll improve though -- there seems to be a new willingness to offer a decent MP3 music-download service there... and this is still more convenient for me than having to boot up a Windows virtual machine to use the iTunes Music Store.

They could really do with signposting exactly what you're getting more clearly, though; in particular, being able to search by available format and bitrate would really help.

Lyris’ low SpamAssassin threshold

via jgc's newsletter, Lyris' latest ISP Deliverability Report (Q1 2007) makes an interesting point about legitimate bulk mail and SpamAssassin:

Contrary to popular belief among marketers, message content is not a major cause of deliverability challenges for most email marketers. This finding is a result of testing the content of more than 1,705 unique emails, using [Lyris] EmailAdvisor's content scoring tool. The content scoring function is based on the content scoring rules of the widely adopted Spam Assassin open source project. The emails tested had an average content point score of 1.04 well below the filter's generally accepted spam identification level of 3.0 or higher.

Now, that's broadly good advice -- SpamAssassin hasn't really given much strength to signatures found in message body text in the past couple of years, since the signatures from other sources (especially DNS blocklists and URI blocklists) are much more reliable.

However, note the bit I emphasised. Since when is 3.0 the 'generally accepted spam identification level'? Only the most paranoid user would ever go that low, since at that level, they'd expect to find 2.22% of their nonspam mail going into the spam folder (according to our own tests). In reality, our recommended level has always been 5.0 points, and that's what we optimise for. I'm mystified as to where they're getting 3.0 from...
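To make the difference between the two thresholds concrete, here's a tiny sketch. The `is_spam` helper is hypothetical (its parameter is just named after SpamAssassin's `required_score` setting), and the 3.5 score is a made-up borderline message, not a figure from the report.

```python
def is_spam(score, required_score=5.0):
    """Flag a message once its total score reaches the threshold."""
    return score >= required_score

LYRIS_AVERAGE = 1.04  # average content score quoted in the Lyris report
BORDERLINE = 3.5      # made-up score, for illustration only

for threshold in (3.0, 5.0):
    print(
        "threshold %.1f: average mail spam? %s, borderline mail spam? %s"
        % (threshold, is_spam(LYRIS_AVERAGE, threshold), is_spam(BORDERLINE, threshold))
    )
```

The average message passes either way; it's only the borderline stuff that flips when the threshold drops from 5.0 to 3.0.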

Irish medical tourism

Just got a mail from an old friend, Caelen, who's got a new start-up going with an interesting angle. Caelen and his (now-) wife, Barbara, spent a while travelling around Asia around the same time as we did. As I noted back in 2003, one thing he tried out, which I found particularly intriguing at the time, was to have some minor surgery in Bangkok:

This may seem foolish at first, but despite being in the heart of South East Asia, in what is generally thought to be a developing country, the Thai medical system is unbelievably good. Not only is it the medical hub for expatriates throughout the region, but tens of thousands fly here each year to have elective surgery, from laser eye treatments to boob jobs and face lifts. There are lots of reasons why they come to Bangkok but invariably quality of surgery and care comes top of the list. Simply put, medical care in Thailand is amongst the best in the world, available at a fraction of the cost.

The Thai government sees health care as the next logical step in its hospitality industry. As holiday makers in Thailand reach saturation point, growth has to come from other sectors and international healthcare has many of the same requirements as the tourism industry: good flight connections, plentiful accommodation and above all staff that are understanding and friendly. Gleaming hospitals, which could be mistaken for 5 star hotels, not only have rooms with all amenities but also have suites, restaurants, shops and cinemas. Menus from the finest restaurants in town are placed in the best rooms. Going to hospital doesn't mean you have to stop having fun - this is Bangkok after all. This is a long way from the cold greasy egg served by the kitchen's 'Miserable Person of the Year' award winner we get at home.

Back in 2002, this was pretty unprecedented -- of course, nowadays, the concept is a lot more widely practiced, what with healthcare costs rising in the US and waiting lists rising in the UK.

I can vouch that the quality of care in Bangkok was fantastic, by all accounts; fastidiously clean and professional. (I never did it myself, but many people I knew at the time took advantage of the opportunity, rather than risk something flaring up in the less, er, reliable settings of Luang Prabang or Phnom Penh.)

Anyway, turns out Caelen has come up with a new site that is related to this -- Reva Health Network. He says, 'basically, we are a medical tourism search engine where consumers can find and compare hospitals and clinics from around the world. We cover everything although the bulk of our business is currently in dental.'

If you're looking for some work done, it might be worth taking a look; it's at revahealthnetwork.com.

Update 2010-08-16: They've moved! The new URL is http://www.whatclinic.com , which makes much more sense really. Apparently they're getting 500,000 visitors a month, and proxy through 800 phone calls a day to clinics. Cool -- sounds like it's going well...

IKEA Dublin gets planning permission

Given that I'm trying to get a new house in order, here's a topic close to my heart right now -- massive IKEA store approved for Dublin:

An Bord Pleanála has given the go-ahead for the construction of a massive IKEA outlet in the Ballymun area of Dublin. Legal restrictions on the size of retail developments had already been changed to allow the Swedish furniture giant to build a 30,000 square foot shop in the area. However, several objections were received from the National Roads Authority, Green Party TD Eamon Ryan and a number of businesses which said they would be adversely affected by a huge increase in traffic on the M50 motorway. An Bord Pleanála has now decided to grant permission for the project, subject to 30 conditions aimed at preventing traffic congestion, protecting the visual amenity of the area and promoting sustainable development.

This is long overdue, and something Ireland's been crying out for -- the price and quality of furniture here is dire. I'm glad to see it.

The details are up on An Bord Pleanala's site, including the Board's conditions. For ease of reading, I've converted it to HTML using OpenOffice.

This one strikes me as potentially annoying:

A schedule of parking charges shall be applied to car park users (other than coaches and buses which shall not be charged for parking during opening hours) [...]

At least two months prior to the opening of the proposed development for trading, an initial schedule of charges shall be agreed in writing with the planning authority. Where the daily peak hour two-way traffic flows as measured by the automatic traffic counters do not comply with the thresholds set above, the schedule of parking charges shall be varied as directed by the planning authority until compliance is achieved, save that breaches or non-compliances of a very minor or trivial nature or arising from exceptional circumstances may be disregarded at the discretion of the planning authority.

Reason: To minimise traffic impacts and avoid serious traffic congestion.

Patronising pregnancy

Via Yoz comes this great article: Zoe Williams: Being pregnant and receiving unscientific advice go hand in hand. Here's a sample:

Listeria has been my particular bugbear ever since a midwife - that is, a trained prenatal professional who, unless I develop complications, represents the highest medical authority I can expect to deal with throughout my pregnancy - told me that I could get listeriosis, thereby brain-damaging my foetus, without knowing about it. Now, listeriosis is an incredibly serious disease, with extremely serious symptoms, taken extremely seriously by epidemiologists nationwide. Get it without noticing it? If I got listeriosis, the national papers would know about it. It would be the third outbreak that has occurred in [the UK] in the past 20 years.

Here are some other things that are wantonly untrue: pasteurisation, in fact, has nothing to do with a cheese's ability to harbour the listeria bacteria. The bacteria that characterise different cheeses are introduced after the pasteurisation process anyway. Listeria flourishes in moist environments, so parmesan is safe where camembert isn't, but even rinded and soft cheeses are safe once they have been cooked. But food hygiene is a much more important factor than moisture - raw fish does not come out of the sea carrying listeria, but contracts the bacteria from contact with dirty hands. Of the past two outbreaks of listeria in Britain, one was from butter and the other from lettuce (there have been other instances of product recalls, but no human contamination).

In fact the three worst recorded cases of listeria since 1992 have all been in France, and were all from pork tongue in jelly, which nobody in their right mind would ever eat. Of the past 10 listeriosis outbreaks in America, only two were from cheese, and one of those was a Mexican homemade cheese. The notion that there are pregnant people out there whipping themselves into a frenzy of guilt because they have eaten some gorgonzola is just infuriating.

This patronising "pregnant women mustn't do X" paranoia is C's pet hate of the moment; being a (pregnant) scientist, she's been checking them against Medline, looking into the extent of the real research these claims are based on, and generally writing them off one by one. I've been trying to persuade her to write a blog post about this for taint.org, so far with no luck though...

MAAWG Talk

Here's the talk I gave at MAAWG, entitled New Features in SpamAssassin 3.2.0 Of Interest To Large Receivers:

Abstract:

Many ISPs and mail receivers, at all scales, use SpamAssassin as part of their spam-filtering arsenal. The recent release of SpamAssassin 3.2.0 introduces much new functionality, and some of this is of particular interest to the large-scale mail receiver; in particular, rules compiled to parallel-matching native object code for increased speed, early short-circuiting based on administrator-specified rules, the new "msa_networks" setting to specify MSA hosts or pools, a new ruleset to detect spam/virus backscatter bounces, a way to run SpamAssassin in the Apache httpd server using mod_perl, and support for Amazon's EC2 virtual server farm. In this talk, I'll discuss each of these in detail, and discuss why it may be useful to you.

If you were at MAAWG, hope you enjoyed it ;)

DSPAM acquired by Sensory Networks

whoa, didn't see that coming. Quoting Jonathan Zdziarski via jgc's newsletter:

...The [DSPAM] project had grown to a point where it would take others - with enough free time - to bring DSPAM to the next level as a widely accepted enterprise-class solution, and [I] decided that it would be in the best interest of the project to entrust it to someone with the technical knowhow and dedication to reach these goals. Many of you are aware of my work in the past with Sensory Networks in developing a hardware-accelerated version of DSPAM (capable of supporting multi-megabit speeds in large carrier environments). I've spent a considerable amount of time with SN's team over the past several years and when we initially discussed working together, they had shown to be very excited and motivated about the project.

After careful consideration and many discussions at length, I decided to allow Sensory Networks to acquire the rights to the project, and continue development on it with their own team. SN has displayed a strong commitment to the open source community and has been working closely with other leading projects such as Snort, Clam Antivirus, and SpamAssassin. They assured me that the project will remain open-source and available to all, and at the same time the project will receive exposure in commercial environments it has not seen before, as many of you have been asking for. We've now completed the acquisition for the project, and I'd like to encourage you to support them in helping them move forward as it grows into new areas.

More details at zdziarski.com.

Dealing with backscatter, revisited

Back in January, I wrote about how I deal with email backscatter nowadays. Since then, I've made a notable tweak.

The change is that I no longer reject "null-sender" traffic during the SMTP transaction. It turned out that doing so broke Exim's implementation of Sender Address Verification, which performs the SAV check using a MAIL FROM of <>, rendering it indistinguishable from a bounce during the SMTP transaction.

Now, I've complained about SAV, but I have to be pragmatic anyway (Postel's law and all that!) -- so it was better to just allow other sites to perform SAV lookups against our server, and fix the anti-bounce stuff some other way.

The new method (below) does this. It accepts null-sender SMTP traffic; Postfix rejects bounces that arrive via SMTP in RFC-3464 format; and bounces that slip past are then dealt with in a more CPU-intensive manner using the SpamAssassin "VBounce" ruleset (which is part of the now-released SpamAssassin 3.2.0, btw).

This increases the load, since some bounces cannot be rejected at MAIL FROM time now, and instead we have to wait 'til DATA -- but CPU hasn't been a problem recently, so this is ok.

Here are the updated instructions:

In Postfix

In my Postfix configuration, on the machine that acts as MX for my domains -- edit '/etc/postfix/header_checks', and add these lines:

/^Content-Type: multipart\/report; report-type=delivery-status\;/  REJECT no third-party DSNs
/^Content-Type: message\/delivery-status; /     REJECT no third-party DSNs

Edit '/etc/postfix/main.cf', and ensure it contains:

header_checks = regexp:/etc/postfix/header_checks

Then run:

sudo /etc/init.d/postfix restart

This catches most of the bounces -- RFC-3464-format Delivery-Status-Notification messages from other mail servers.
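If you want to sanity-check what those two patterns will and won't match before restarting Postfix, here's a rough Python approximation. It's only a sketch: Python's `re` is not the same engine as Postfix's regexp tables, the `would_reject` helper and the sample headers are made up, and I've added `re.IGNORECASE` because Postfix regexp tables match case-insensitively by default.

```python
import re

# The two header_checks patterns from above, minus the escapes Python doesn't need.
DSN_PATTERNS = [
    re.compile(r"^Content-Type: multipart/report; report-type=delivery-status;",
               re.IGNORECASE),
    re.compile(r"^Content-Type: message/delivery-status; ", re.IGNORECASE),
]

def would_reject(header_line):
    """True if either DSN pattern matches the given header line."""
    return any(p.search(header_line) for p in DSN_PATTERNS)

# Hypothetical sample headers, for illustration only.
SAMPLES = [
    'Content-Type: multipart/report; report-type=delivery-status; boundary="abc"',
    'Content-Type: message/delivery-status; charset=us-ascii',
    'Content-Type: text/plain; charset=us-ascii',
    'Content-Type: multipart/report; boundary="abc"; report-type=delivery-status',
]

for header in SAMPLES:
    print(would_reject(header), header)
```

Note the last sample: a DSN that puts its parameters in a different order gets past these patterns, which is one reason the SpamAssassin second line of defence is still needed.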

In SpamAssassin

As before, install the Virus-bounce ruleset and set it up. This will catch challenge-response mails, "out of office" noise, "virus scanner detected blah" crap, and bounce mails generated by really broken groupware MTAs -- the stuff that gets past the Postfix front-line.

Dead laptop time

Argh. My Thinkpad's power socket must have received a knock during the move. It no longer works with either of the two power bricks I have here -- so it looks like it's time to either (a) buy a soldering iron and some screwdrivers (incl Torx ones?) or (b) renew my IBM warranty service and send it in for some fixing :(

Bad timing.

Update: oh look, it's working again! phew. I guess I should probably set aside some time for warranty service here anyway though...

Back

Hey -- I'm back, rested and full of tasty, tasty Niçois and Provencal cuisine.

I got back just in time to vote, for what good that did with Bertie's gang leading strongly in the current counts... argh!

For what it's worth, I gave Patricia McKenna a preference, in the end. I was reminded that she'd been entirely on our side on software patents during her time as an MEP -- so credit where it's due, there; on top of that, a vote for the Greens is better than a vote going to Sinn Fein, after all, no matter what. ;)

Carbon offsetting

I'm off to Nice on vacation for two weeks, starting tomorrow -- back on May 25th. See ya then!

In the meantime, and appropriately enough given the jet fuel I'll be consuming, here's some interesting stuff from my mate Eoin on carbon offsetting...

'It's a fecking minefield to figure out. There are many conflicting standards, some of which sound impressive but are useless in reality.

Steer clear of tree planting, especially outside Europe; even a well-run forestry in Europe will take decades to make any difference.

The best quality-mark appears to be the CDM Gold Standard. The Gold Standard is a recent introduction, a response to the weak, conflicting Kyoto standards and many ad hoc government ones. Gold Standard specifically excludes tree plantations.

The following operators are the only ones I found that are Gold Standarded and also pass the bullshit smell test (which is far more stringent ;-) ). Thanks to all who supplied links etc. -- eoin

  • My Climate -- Seem good. run out of Switzerland. Professional vibe. Mainly projects in the developing world.
  • Atmosfair -- like the Swiss one except smaller and German. Again, seems professional, their projects page in particular reads well. Doing a German schools project as well as developing world ones.
  • Climate Friendly -- Aussies. Mainly wind power, in Oz & NZ. Again seem good, have been around for a few years. Website is decent if a bit all over the place.
  • Sustainable Travel International -- more an eco-holidays travel agent than offsetting per se. Useful bookmark.
  • Puretrust.org.uk -- These guys seem good. Interesting business model. They buy high quality carbon credits, from mainly Gold Standard providers, and retire these credits. Permanent retirement, I think, though this wasn't 100% clear on their site. So they both support the providers directly by doing business with them, and also jack up the market price by reducing supply. This supply choke isn't something that the rest of them do, at first glance anyway. Clever idea. As the market price gets higher it will put pressure on companies to reduce their emissions, not just buy their way out of it.'

Now it's worth noting that this is the state of play as of May 2007; it'll definitely change pretty quickly as time goes on. Good info, though.

Eircom broadband — it’s never easy

Argh, it's never easy.

After this post, the consensus was that nowadays, Eircom have a pretty good quality of service for their DSL offerings, taking both price and service into account. I was happy enough to go with that, so I ordered their "Eircom broadband always on 2MB and Eircom talktime anytime bundle", back around the middle of April.

I had a great call with the sales agent, Hazel. Everything went swimmingly, we were all set for the modem to be delivered and the service to be up and running in 10 working days -- by April 30th. I asked for an order reference number and she said I didn't need one, it was all handled in their system. Great!

Unfortunately it seems the call centre staff never got that quality-of-service memo.

Come May 1st, there was no sign of the modem, so I rang Eircom's order line to see how things were going. To my horror, the staff I talked to told me that there was no record of my previous order, or call... it was as if that call had never taken place at all. No part of the order had even started.

As a result, I've had to reorder from scratch. The previous 10 working days we've waited counts for nothing. (The agents lie through their teeth about this, though -- one agent says they'll send it out in the "next 3-5 days", the next agent insists that we have to wait the full 10 days, and the next says somewhere in between -- anything to get us off the line within 4 minutes.)

This is bad news, since we're waiting on the broadband to move in -- since I work from home, we can't move in until we have a good 'net connection.

We can't even make a complaint to Eircom about this fuckup, because they refuse to take complaints without the original order number to reference -- the one that "Hazel" told me wasn't needed anymore. Now that's bureaucracy. Attempts at escalation just wound up with a dead end, where supervisors had no names and had left the office at 10am anyway. >:(

Best of all, their online complaints system now takes a maximum message length of 400 characters, so you can't even provide a detailed written complaint online anymore. (That is, not unless you submit the complaint in 15 separate parts...)

What a fiasco.

So we now have to wait until May the 15th. We've submitted the complaint via the aforementioned 15 parts, and postally; if they don't take action on those, we'll complain to Comreg (and let's see what that's worth).

But here's a question -- assuming they fail to deliver the second order within time this time around, can we cancel at that stage? There's a minimum contract length of 6 months, but since the service hasn't been delivered, I would hope that hasn't started yet. The terms and conditions document says:

"Ready for Service date" (otherwise "RFS date") means the date on which eircom establishes the Facility for the Customer.

3.1 This Agreement shall commence on the Ready for Service date and shall be for the Initial Period. Provided that this Agreement has not been terminated in accordance with its terms or in accordance with the Regulations, this Agreement shall thereafter automatically renew for successive six-month periods. For the purposes of this clause 3, a six-month period will be calculated from the anniversary of the RFS date.

3.2 The Customer may cancel its order for the Facility at any time prior to the RFS date. In the event of such cancellation by the Customer it shall be obliged to return any Kit, which may have been provided to it by eircom. Any Kit shall be returned to eircom by posting it to the freepost address detailed in the welcome pack. In the event of any Kit not being returned to eircom within fourteen (14) days of the cancellation of the Order for the Facility, the Customer shall be charged by eircom and shall pay to eircom such sum as is set out in the Regulations as being the charge payable in respect of the non-return of any Kit.

So I guess as long as the facility -- the ADSL line -- is not up and running, I'm clear to cancel, right? It's a little worrying that the "facility" doesn't include the "kit" -- ie. the broadband modem, though; if they fuck up sending out the modem, but the line is up, am I liable for 200 Euros?

In terms of who are viable options to switch to -- in my opinion it's got to be fixed wireless, since everyone else now would have to go via Eircom's exchanges anyway, and be delayed there. So -- Irish Broadband. I know they had some pretty massive problems 2 or 3 years ago, but recently I've been hearing good things about them, Boards.ie has some reasonably good-sounding recent experiences, and half of my new neighbours (srsly!) are using them with great results. Anyone got recent news about how useful they are with service quality and install speed for their Breeze product in the D9/D11 area?

Alternatively, Ripwave might make a reasonable stop-gap option? 120 euros is the minimum fee (6 months at 18.95 per month), which is better than the money I'm paying now to live in two houses...

Alternatively anyone know an Eircom engineer in D9/D11 that can nip over to the exchange and plug in my connection on the DSLAM? ;)

Moin Moin attachment spam

Here's a new trick used by the web spammers -- attachments on a Moin Moin wiki. The taint.org/wk RecentChanges list illustrates it well:

2007-05-07  set bookmark
[UPDATED]       UserPreferences         04:17   Info    ?StepStep [1-21]        
  #01 Upload of attachment 'big-cocks.html'.
  #02 Upload of attachment 'big-cock.html'.
  #03 Upload of attachment 'big-boobs.html'.
  #04 Upload of attachment 'big-ass.html'.
  #05 Upload of attachment 'bdsm.html'.
  #06 Upload of attachment 'bbw.html'.
  #07 Upload of attachment 'bang-bros.html'.
  #08 Upload of attachment 'bangbros.html'.
  #09 Upload of attachment 'baby.html'.
  #10 Upload of attachment 'asian-porn.html'.
  #11 Upload of attachment 'asian-girls.html'.
  #12 Upload of attachment 'anime-porn.html'.
  #13 Upload of attachment 'anime-girls.html'.
  #14 Upload of attachment 'angelina-jolie.html'.
  #15 Upload of attachment 'amature.html'.
  #16 Upload of attachment 'amatuer.html'.
  #17 Upload of attachment 'adult-videos.html'.
  #18 Upload of attachment 'adult-stories.html'.
  #19 Upload of attachment 'adult-games.html'.
  #20 Upload of attachment '69.html'.
  #21 Upload of attachment '3d.html'.

Great. Lots of spam. This first started appearing on Feb 27 2007, in a multi-upload attack on a single page ("FindPage"), from IP address 212.26.129.162; then reoccurred on Apr 27 and May 7 from the (insecure open proxy) proxy.drevlanka.ru.

Annoyingly my "subscribe to wiki changes" patch doesn't catch this -- these aren't gatewayed through as "changes" via mail for review. I need to fix that in my copious free time. :(

Also, the RecentChanges RSS feed doesn't list them, although the HTML form does.

So unfortunately, the only way I can see to block this is either to review by visiting the RecentChanges page in a web browser regularly (how retro!), and delete them retrospectively, or simply to turn off attachments entirely -- which is what I've done, by editing "wikiconfig.py" and adding:

    actions_excluded = ['AttachFile']
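For anyone who'd rather keep attachments switched on, the manual review could be partly scripted. Here's a minimal sketch in Python, assuming you've already saved the text of the HTML RecentChanges page into a string; the function name and the regular expression are my own illustration, based only on the "Upload of attachment" lines shown above, and are not part of Moin Moin:

```python
import re

# Matches lines like:  #01 Upload of attachment 'some-file.html'.
# (pattern inferred from the RecentChanges listing above; an assumption)
UPLOAD_RE = re.compile(r"Upload of attachment '([^']*)'")


def uploaded_attachments(recent_changes_text):
    """Return the attachment filenames mentioned in a RecentChanges dump."""
    return [name.strip() for name in UPLOAD_RE.findall(recent_changes_text)]
```

The resulting list can then be eyeballed, or compared against the attachments you expect to see, before deleting anything by hand.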

It looks like quite a few other wikis around the web are running into the issue too :(

SpamAssassin 3.2.0!

W00t! SpamAssassin 3.2.0 has finally gone gold!

This release is a big one -- it's the first major release since 3.1.0, back in September 2005, just over a year and a half ago. Here is the release announcement mail, containing a list of major changes since version 3.1.8. There are a few major new features that I feel worth picking out in more detail and editorialising about:

sa-compile

This is a biggie. This new script takes the active SpamAssassin ruleset, and uses code contributed by Matt Sergeant to produce input for re2c. re2c in turn compiles the ruleset into a deterministic finite automaton, which can match multiple regular expressions in parallel. That's not all, though; re2c then compiles that DFA into C code -- which is then compiled into native object code. SpamAssassin will then load that object code and use it to replace the slower perl regexp tests, if it's available at scan-time.
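To give a rough feel for the "match many rules in one pass" idea, here's a small Python analogy. It is only an analogy: Python's re module is a backtracking engine rather than a DFA, nothing here is compiled to C, and the rule names and patterns are made up for illustration rather than taken from the SpamAssassin ruleset.

```python
import re

# Hypothetical body rules, for illustration only.
RULES = {
    "VIAGRA": r"v[i1]agra",
    "FREE_MONEY": r"free\s+money",
    "CLICK_HERE": r"click\s+here",
}

# Fold every rule into one alternation, with a named group per rule,
# so a single scan of the text can report which rules matched.
COMBINED = re.compile(
    "|".join("(?P<%s>%s)" % (name, pattern) for name, pattern in RULES.items()),
    re.IGNORECASE,
)


def rules_hit(body):
    """Return the names of the rules that matched somewhere in body."""
    return {match.lastgroup for match in COMBINED.finditer(body)}
```

One caveat with this simplified approach: overlapping matches are only reported once, for whichever alternative matches first at that position, which is one of the things a real multi-pattern matcher has to handle properly.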

Now, it's been a long time since SpamAssassin's ruleset consisted mainly of rudimentary regular expressions matched against the body text -- a good portion of SpamAssassin's ruleset these days operates against headers, performs network lookups, analyzes URLs extracted from the body, uses the more advanced features supported by Perl's NFA regexp engine, and so on. But even given that, the effects of 'sa-compile' seem to average between a 15% and 25% speedup, in my testing. That's good ;)

Many of the commercial versions of SpamAssassin include their own body-rule speedups -- but this is the first time anything similar has made it into the open source code.

Short-circuiting

Another good one for performance. There are some rules that, in a well-configured setup, you can reasonably assume will only ever hit nonspam mail, and others that will only ever hit spam. For example, a hit on "ALL_TRUSTED" should mean that the message never traversed an untrusted network, therefore it cannot be spam, so why bother applying the expensive tests? It should be reasonable to "short-circuit" and immediately return a "ham" score for that mail.

This new plugin implements that algorithm -- and efficiently, too, which historically has been the hard part!

I've been using this for a while with a ruleset like this one -- in my experience, it's cut overall CPU time spent scanning mail by 20%.

It is pretty flexible, too -- there's a lot of tweakage that can be done with this functionality to suit your own setup.
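The general idea can be sketched in a few lines of Python. This is a toy model under my own assumptions, not the plugin's code: the Rule class, its fields, the priority ordering and the threshold of 5.0 are all invented for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Rule:
    name: str
    test: Callable[[dict], bool]        # returns True if the rule hits
    score: float
    priority: int = 0                   # lower values run earlier
    shortcircuit: Optional[str] = None  # "ham", "spam", or None


def scan(msg, rules, threshold=5.0):
    """Return (is_spam, names_of_rules_hit), stopping early on a short-circuit hit."""
    total = 0.0
    hits = []
    # Run the cheap, decisive rules first so they can cut the scan short.
    for rule in sorted(rules, key=lambda r: r.priority):
        if not rule.test(msg):
            continue
        hits.append(rule.name)
        total += rule.score
        if rule.shortcircuit == "ham":
            return False, hits   # skip every remaining (expensive) test
        if rule.shortcircuit == "spam":
            return True, hits
    return total >= threshold, hits
```

The important design point is the ordering: a short-circuiting rule only saves CPU time if it runs before the expensive tests, so it needs an early priority.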

Reduced memory footprint

One aim of this release has been to reduce the memory usage of SpamAssassin; the core code now uses less RAM than 3.1.x does, when tested with the same ruleset. (Unfortunately we've added lots more rules in the interim, so it's a bit of a wash overall. ;)

The VBounce anti-bounce ruleset

Detects spurious bounce messages sent by broken mail systems in response to spam or viruses. More info about that here.

Apache-spamd

apache-spamd implements spamd as a mod_perl module. This was contributed by Radoslaw Zielinski, as a Google Summer of Code project last year. Thanks Radoslaw!

There are plenty more new, useful features and rules -- these are just the top ones, in my opinion. Pretty cool stuff!

Patricia McKenna and MMR, again

Great! Patricia McKenna just called around, canvassing our area -- and just got a serious telling off from the wife ;)

Catherine -- unsurprisingly, given that she's a zoology Ph.D -- was fantastic, hitting every key point of the issue: that we're both long-time Green voters who've been forced to not vote Green this time around, due to this MMR issue and the anti-science/pro-hokum angle it represents.

Interestingly, McKenna claimed that her stance on MMR was always her own point of view, and that it wasn't party policy -- and that the story about it being mentioned on the party website was a rumour put about by the PDs.

While it turns out that Dr. Ruairi Hanley, the author of this letter to the Indo is indeed a PD (didn't realise that!), Treasa at Winds and Breezes also noted it appearing on the Green Party site, as follows:

Questioning the Benefits of Immunisation

There are significant question marks about the effectiveness of mass immunisation programs. We would launch a major study of the benefits of these programs looking at all aspects of health

So Treasa -- are you a stealth PD rumour-monger? ;)

Worth noting that at no time did McKenna reassure C that her policy would not become government policy if the Greens were elected... as an elected representative, surely her own policies would influence the government's thinking?

Screenclick devolve again

After a short period where things were looking up, Screenclick have once again reverted to type, by ditching the lovely simple Netflix-style queue they seemed to be using, and instead instituting some new kind of bizarre homebrew weirdness.

It looks like a queue, with a line-by-line listing of movies -- but then beside each title, there are 3 radio buttons: "High", "Medium", and "Low".

The instructions run as follows:

All titles are sorted in alphabetical order within their priority group
  • - High: Please deliver these titles as soon as possible
  • - Medium: Please deliver these titles as they become available
  • - Low: I don't mind when you send these titles

So what -- does this mean that if I put a title in as "High", I'm going to receive it next, or not, or what? And what's with the alphabetical order? WTF is going on? Argh.

Anyway, I just got out "Amores Perros", presumably due to this alphabetical ordering thing. Not what I wanted at all. What a mess.

A week of Bertiespam

We're in the run-up to a general election here in Ireland, and I live in Bertie's constituency. For the past year or so, things have been pretty quiet, but in the last week there's been a sudden flurry of activity and direct postal mail from Bertie's office -- and from many departments of local government, too:

Mon Apr 23:

  • Fianna Fail: "Fianna Fail delivers on education in Dublin Central", tabloid newspaper.

  • direct from the office of Bertie: a photocopied letter from the Environmental Health Officers of Dublin City Council about the standards of rented houses "in my area".

Tues Apr 24:

  • HSE: "Parents Who Listen, Protect" leaflet, a full-colour glossy handbook "on building good communication in families and communities" "as part of a national initiative on child protection".

  • Dept of Environment: a leaflet on the "National Climate Change Strategy, 2007-2012, Main Points". Printed on recycled paper, naturally ;)

Fri Apr 27:

  • Fianna Fail Senator Cyprian Brady: "dear resident, please vote for me" -- one-page full-colour glossy.

  • Spring 2007 "Central News", "Official Voice of Fianna Fail in Dublin Central", a 16-page tabloid newspaper, featuring stories like "Smithfield: the Temple Bar of the Northside" (like Temple Bar, but with more winos and Children's Court, and less stuff!)

Mon Apr 30:

  • HSE: "Need a doctor urgently? Call D-DOC out-of-hours GP service", full-colour glossy leaflet.

  • from Bertie: Evening of Election Letter. "Good evening constituents" etc.

It's a veritable flood of full-colour glossies! Could be worse, I suppose -- I hear the PDs have been blanketing selected Dublin constituencies in free books. However I suspect grimy Dublin 7 is a little off their list (see "winos", above).

It's worth noting that a good half of this flood (which I've dubbed "Bertiespam") isn't from Bertie's constituency office -- it's from government departments like the HSE and the Department of Environment. It's funny that we hadn't heard a peep from them all year, then once an election looms -- "here come the voters! look busy!" ;)

What bertiespam have you been getting?

Hog’s Chip

Hey Google --

Since Fido.ie is throwing errors at me, and since you're probably a more searchable (and more global) database anyway -- the Trovan FDX-B RFID transponder number 956000000659388 is that of "Hog Dempsey", a small female black and white cat, whose owners can be contacted via any address on this page. Cheers!

HOWTO do a DOS-based BIOS upgrade without Windows

Wow, I can't believe I still have to do this in 2007 -- Taiwan really needs to discover FreeDOS! Here's how to run a DOS BIOS update on a PC without using Windows (in my case, it's a Dell laptop).

  • Unpack the FreeDOS boot floppy image (FDSTD.288.gz), create a mount point, and loop-mount the image:
  gunzip FDSTD.288.gz
  mkdir -p /tmp/bootiso
  sudo mount -t msdos -o loop `pwd`/FDSTD.288 /tmp/bootiso
  • Ensure there's enough space, and copy the BIOS update app into the disk image:
  df /tmp/bootiso
  sudo cp ME051A10.EXE /tmp/bootiso
  • Unmount the image, so that the copied file is flushed into FDSTD.288:
  sudo umount /tmp/bootiso
  • Then make an ISO, using mkisofs' "-b" option to ensure it's bootable:
  mkdir /tmp/floppycopy
  cp -p FDSTD.288 /tmp/floppycopy
  mkisofs -pad -b FDSTD.288 -R -o /tmp/cd.iso /tmp/floppycopy
  • And burn it:
  sudo cdrecord dev=0,0,0 -pad -v -eject /tmp/cd.iso
  • Now, take the burned CDROM, and boot it.

Answer "N" to all questions when booting, otherwise you're likely to see an error like "Cannot operate in Protected environment" when you run the BIOS update.

Thanks to the Motherboard Flash Boot CD from Linux Mini HOWTO; very helpful. I hope the next time I have to do this, they just issue a bootable ISO image instead...

Update, Sep 2013:

Wayno Guerrini emailed to say: 'I used your recipe to update the bios on a old Dell Dimension 8400. Worked like a champ, with a couple of modifications. I am running 64 bit debian wheezy.

apparently the mkisofs has been replaced by genisoimage. Syntax the same.

instead of cdrecord I had to use wodim: sudo wodim dev=/dev/sg1 -pad -v -eject /tmp/cd.iso

Thank you. Recipe worked very well. I will point people to this article, but add the changes as appropriate to my website.'

Using qpsmtpd for traps.spamassassin.org

Like many anti-spam systems these days, SpamAssassin operates a network of spamtraps. One set of these runs off traps.SpamAssassin.org, a server kindly donated by ISP Sonic.net.

Large-scale spam-trapping systems like this are generally run in quite a secretive manner, but we're an open source project -- so it may be interesting if I give some details of our setup. Here's a potted history of how this spamtrap server has run over the years...

The beginning

The architecture was initially very simple. The MX was Postfix, delivering to the "trapper" user, which in turn ran procmail, which directly ran a perl script. This perl script then performed the trap actions, namely: DoS prevention, discarding viruses and malware, discarding backscatter bounces, extraction and cleanup of the incoming mails, then onward reporting, archival, and further distribution.

Given that this was a target for spam -- and we want as much spam as possible here! -- this would predictably run into load issues. Right at the beginning, back in around 2001/2002, I ran this on our shared server, where it pretty quickly caused trouble for delivery of other, more useful mail. It was around this time that Sonic kindly donated the server.

With dedicated hardware, we weren't seeing much trouble -- it was enough to just wait a few hours for a traffic spike to pass, and the Postfix queue would then clear.

Clearing the queues

After a few months, though, this wasn't enough -- the queue would get consistently clogged, and the backlog became enough to result in the incoming spam being delayed for days before it made it from the MX to the trap archives. For a spamtrap, you want fresh spam, but not necessarily all spam -- so I installed a cron job to simply clear the queue on a nightly basis. (I also had to restart the Postfix server, since it'd occasionally get hung and stop accepting connections on port 25, presumably due to load issues.)

IPC::DirQueue

The next problem was that the procmail/perl script end couldn't process the mail fast enough for the MTA to keep up with the incoming connections, with follow-on problems caused by the load from the perl script impacting the MX's activity. To work around these, I designed a new queueing backend, based around IPC::DirQueue. This allowed a new split architecture. The procmail-run perl script became extremely lightweight, delivering all inbound mail to a dirqueue and exiting quickly, allowing the MX to get back to the next inbound spam message. The trap processing script was then split into a web of dirqueues, allowing each individual part of the trap backend pipeline to operate independently.

There were several benefits to this:

  1. Since dirqueues operate as a batch-processing model, load spikes become irrelevant; the load incurred is limited by how many dequeuer processes are run.
  2. The time taken in backend tasks becomes irrelevant to the MX throughput, since that is bottlenecked only by the lightweight perl script and its write speed to the "incoming" dirqueue.
  3. By splitting the backend work into multiple queues, outages in the spam-reporting systems or onward forwardings become much less of a problem, since they won't affect inbound spam, archival, outbound delivery to other reporting systems, forwards, etc.

Again, the dirqueues were cleared on a frequent basis, to discard the "spiky" traffic and ensure we were just seeing samples of the freshest spam. The dirqueues use a tmpfs as the backing storage directory, so the queued mail never hits the disk at all.
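For readers who haven't met the pattern, here's a minimal Python sketch of a directory-backed queue of this general kind. To be clear, this is my own toy illustration: the class, method names and directory layout are assumptions, and it is not the IPC::DirQueue API or its on-disk format. The trick it relies on is that a rename within one filesystem is atomic, so producers and consumers never see a half-written message.

```python
import os
import time
import uuid


class DirQueue:
    """Toy directory-backed queue (hypothetical layout, not IPC::DirQueue's)."""

    def __init__(self, root):
        self.root = root
        for sub in ("tmp", "ready", "active"):
            os.makedirs(os.path.join(root, sub), exist_ok=True)

    def _path(self, sub, name):
        return os.path.join(self.root, sub, name)

    def enqueue(self, data):
        """Write the message in tmp/, then atomically publish it into ready/."""
        name = "%020d.%s" % (time.time_ns(), uuid.uuid4().hex)
        with open(self._path("tmp", name), "wb") as fh:
            fh.write(data)
        os.rename(self._path("tmp", name), self._path("ready", name))
        return name

    def dequeue(self):
        """Claim the oldest ready message by renaming it; None if queue is empty."""
        for name in sorted(os.listdir(os.path.join(self.root, "ready"))):
            try:
                os.rename(self._path("ready", name), self._path("active", name))
            except FileNotFoundError:
                continue  # another dequeuer process claimed it first
            with open(self._path("active", name), "rb") as fh:
                data = fh.read()
            # Dropped before processing: fine for a spamtrap, where losing
            # the odd message on a crash doesn't matter.
            os.unlink(self._path("active", name))
            return data
        return None

    def clear(self):
        """Discard everything still waiting, e.g. from a cron job after a spike."""
        dropped = 0
        for name in os.listdir(os.path.join(self.root, "ready")):
            try:
                os.unlink(self._path("ready", name))
                dropped += 1
            except FileNotFoundError:
                pass
        return dropped
```

The enqueue side does almost no work, which is what lets the mail-receiving end return quickly; the number of dequeuer processes you choose to run is what caps the load on the backend.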

This worked pretty well for several years -- from 80 megabytes of spam per day to the current level, which is around 130MB per day. However, we still occasionally saw problems from load spikes, where high load caused the traps to refuse incoming SMTP connections -- purely because the volume of inbound connections was too high for the Postfix MX to accept them all in a timely fashion.

qpsmtpd

Last weekend, I had a go at a project I'd been thinking of trying out for a long time -- switching from Postfix to qpsmtpd. A while back, Matt Sergeant rewrote qpsmtpd to use Danga::Socket, Danga Interactive / Six Apart's insanely scalable event-driven asynchronous socket class, as used in mogilefsd, perlbal and djabberd. This article notes that 'two large antispam companies' high-traffic spam traps have used this effectively since the second quarter of 2005, delivering concurrency as high as 10,000 on some occasions', so it seemed likely to work ;)

Sure enough, results have been great... we now have a pure-perl system handling heavy volumes without breaking a sweat, certainly compared to the previous system. qpsmtpd's plugin system was elegant, allowing me to annotate inbound spam with more details of the SMTP transaction, write plugins to deliver mail to a dirqueue directly instead of to an MTA, and do some conditional code (ie. basic "deliver this RCPT TO to this queue") where needed.

Full details are over on the QpsmtpdSpamtrap page on the taint.org wiki, for the curious.

Don’t worry about Blacklist.ie

Irish techies -- wondering what the next website to put the fear into your parents will be? Here it is: Blacklist.ie. It's been getting a bit of coverage from the Irish technology press recently, it seems, as the new site from IE Internet.

(IE Internet are the Irish internet company that puts out a press release every month or so telling us how much of their mail is being filtered as spam, which Silicon Republic et al dutifully report as news, month after month.)

I got a call from my mother last week, telling me that she'd been "blacklisted", and asking how to fix it. Sure enough, when I found out that she'd heard this on blacklist.ie, I went to the site, and her IP address was indeed listed -- as was mine:

The IP address 212.2.169.61 is blacklisted.

RBLs checked:

Spam Haus not listed

Spam Cop not listed

Mailwall RBL not listed

Abuse At not listed

SORBS not listed

NJABL listed: Dynamic/Residential IP range listed by NJABL dynablock - http://njabl.org/dynablock.html

510 SG not listed

Naturally, that IP is listed -- it's entirely ok for a home-user broadband machine to appear in SORBS or NJABL as a dynablock-listed IP. (Dynablock, for those who don't know, is a set of records for addresses which are known to be residential/end-user "dynamic" addresses, rather than mail relays -- so obviously most end-user desktop machines would fall under this category.)
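For the curious, a check like this boils down to a DNS lookup. Here's a minimal Python sketch of the usual DNSBL convention: reverse the IPv4 octets, append the list's zone, and look up an A record, where an answer in 127.0.0.0/8 means "listed" and no answer means "not listed". The function names and the zone "dnsbl.example.org" are placeholders of mine, not a real list and not whatever blacklist.ie actually runs.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on bad input
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return "%s.%s" % (reversed_octets, zone)


def is_listed(ip, zone):
    """Return True if the DNSBL zone returns a 127.x.x.x answer for this IP."""
    try:
        answer = socket.gethostbyname(dnsbl_query_name(ip, zone))
    except socket.gaierror:
        # No record (or the lookup failed): treated as "not listed" here.
        return False
    return answer.startswith("127.")
```

For example, dnsbl_query_name("212.2.169.61", "dnsbl.example.org") builds the name 61.169.2.212.dnsbl.example.org. Note that a bare yes/no answer like this says nothing about why an address is listed, which is exactly the distinction that matters for a dynablock-style list.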

Unfortunately, this distinction isn't mentioned anywhere on the blacklist.ie page... just a large, red, "The IP address is blacklisted" warning.

Worried readers might then reasonably go on to read the site's Frequently Asked Questions list -- which, incredibly, includes a helpful suggestion that you sign up with IE Internet to avoid being listed in future! I'd be curious how that's supposed to help a home user get off the NJABL dynablock list... a little fishy, if you ask me!

Bar Camp Dublin next weekend

Dublin hackers/software people -- don't forget! Bar Camp Dublin is happening on April 21st -- that's 9 days from now.

It should be interesting -- there are 93 attendees signed up already, and I see a good few familiar names I haven't run into in a while! The last Bar Camp was a good opportunity to meet up for some very informal talks, and this looks likely to be the same.

Sign up here, go on...

Screenclick improve their site

Yay! They now have a proper queue! Also member reviews and other improvements -- it seems a lot better.

Can't figure out how to change my password, though ;)

Don’t vote Green in Dublin Central!

I've long held green views, and have always voted green -- I believe climate change, damage to the environment and pollution are extremely serious problems, especially for Ireland. At the same time, I also believe that science and technology have a key place in a better, greener future -- a Viridian, bright green / electric green viewpoint, in other words.

Given this, I was really shocked and appalled to hear (via the lovely C) of an interview on Today FM with Patricia McKenna, a Green Party candidate for my local constituency of Dublin Central -- one I've voted for before, no less! -- in which she revealed that she believes in the thoroughly discredited scaremongering regarding a link between the MMR vaccine and autism, and has taken the appallingly irresponsible position of not allowing her children to be vaccinated.

This blog post discusses the interview, which was broadcast on Today FM's The Last Word show on Tuesday 13 March. Here's an archived podcast of that interview so you can listen to it yourself, and here's a local copy of that WMV file in case that first link expires any time soon.

Here's a transcript of the part of the interview once the issue of vaccination is brought up. Matt Cooper is the host of the show. Keith Redmond is an opposing candidate, for the PDs. The timestamps are in minutes and seconds from the start of the audio file.

  • 8:30: Patricia McKenna: Parents have the right to choose what they opt to do, and in relation to some vaccinations, there are serious question marks hanging over them but that's not what we're talking about here...

  • 8:44: Matt Cooper (clearly annoyed): No it's not, but now that it's up there, couldn't it be irresponsible for parents not to vaccinate children against serious issues (sic), if they don't have reputable scientific facts to back up the decision not to vaccinate?

  • 8:54: Patricia McKenna: Many parents in this country have chosen not to vaccinate their children in relation to the MMR because of the links to autism.

  • 9:00: Matt Cooper: Utterly untrue, totally unproven, absolutely bogus and false.

  • 9:02: Patricia McKenna: Hold on a second...

  • 9:03: Matt Cooper: Andrew Wakefield has been utterly and totally discredited in relation to that. Anyone who doesn't give the MMR vaccine to their children because of a fear of autism is almost in danger of endangering their child themselves. We're going to have a rise of measles again in this country because of people not actually giving the vaccine.

  • 9:17: Patricia McKenna: First of all, we're moving away from the issue...

  • 9:22: Matt Cooper: Yeah we are, but it's come up now, let's deal with it...

  • 9:23: Patricia McKenna: It's come up, right. Eh, have you had the measles? I've had the measles, and I've got over them well, I have a strong immune system, my 10 year old son has had the measles...

  • 9:30: Matt Cooper: And you are aware that unhandled the measles can have very serious side effects?

  • 9:33: Patricia McKenna: Look -- the side effects that are linked to the measles are in relation to... there are other things linked to it in relation to the child's well being initially. Now you just look at the number of people when you were young, all of your peers I would say have had the measles as with mine, and I think we have a tendency to over-indulge in vaccinating our children and vaccinating ourselves, because what we need -- our immune systems are getting weaker and weaker by the day, it's a -- I think we need to be very careful about how we actually approach this so that when medicines are necessary, we will not be immune to them...

  • 10:08: Matt Cooper (interrupting): Do you know that children have died of the measles in this country in the last 5 years?

  • Keith Redmond: because of views like that.

  • Patricia McKenna: Well I'm saying is that, as far as I'm concerned...

  • 10:18: Matt Cooper (repeats): Do you know that children have died of the measles in this country in the last 5 years?

  • 10:30: Patricia McKenna: The children that have died of the measles because of other complications (sic), not the measles themselves.

  • Keith Redmond: that have not been vaccinated.

  • Patricia McKenna: Not the measles themselves, but other complications, right? Now if you're saying that parents should -- it's a bit like --

  • Keith Redmond: Matt, can I just come back to...

  • 10:32: Matt Cooper: Sorry, one second Keith. Would you also concede Patricia, that there is absolutely no link between the MMR and autism, that that link was a bogus link put up by Andrew Wakefield who has been completely and utterly discredited and it has done an awful lot of damage, the misrepresentation of his views in relation to the MMR and autism.

  • 10:50: Patricia McKenna: Well in relation to the MMR, I am not satisfied that it's safe, and I am not satisfied with the idea of lumping a whole lot of vaccines -- different vaccinations together en masse, inducing them (sic) to our children -- but having said that, parents should have the right to choose and decide what is best for their children...

  • 11:06: Matt Cooper: But would you concede that Andrew Wakefield, who is the man that pushed that whole agenda, was exposed as a fraud?

  • 11:11: Patricia McKenna: But the jury is still out in relation to...

  • 11:15: Matt Cooper: No, it's not.

  • 11:16: Patricia McKenna: Yeah well I'm sorry but the jury is still out in relation to how safe the MMR is. And I think it's unfair to label all parents who decide for their own children's safety, that they may not want to go down the route of vaccination, that they're being irresponsible, because I wouldn't consider myself irresponsible, I would consider I want what's best for my child.

  • 11:37: Keith Redmond: [again says something]

  • Matt Cooper: Give Keith a chance to come in.

  • 11:41: Keith Redmond: This totally exemplifies the Greens' approach to any kind of science. We have a woman there who knows, in her heart of hearts, that her argument is wrong but refuses to admit it because it relies on science. Now, we have exactly the same issue with fluoridation -- we know the science, we know the facts, and we still have this scaremongering every now and again. And the Green Party are totally irresponsible and you're right, they are frightening parents across the country right now and it's absolutely reprehensible.

My god, this insanity has me agreeing with a feckin' PD!

This is luddism, pure and simple. Matt Cooper is spot on the money -- children are dying in Dublin because of this "my child, my rules" selfishness and simple inability to understand the science surrounding vaccination as a public health policy.

This is appalling. To put it bluntly, there is no fucking way I'll be voting Green if this kind of cargo-cult, anti-science superstition is the kind of shite they're espousing these days. ...and if you think I'm feeling strongly about this, you should hear my (zoologist) wife.

But it goes on -- here's a letter to the Irish Independent on this issue from Feb 9 2007, which raises another worrying factor:

... until two days ago, there was a statement on the Green Party website informing voters that there were "serious question marks about the benefit of mass vaccination programs".

Furthermore, the party promised that there would be a "major review" of vaccination if they were returned to office.

Now that these statements have apparently been removed from the Green party website are we to take it that they are no longer Green policy?

This blog posting at Winds and Breezes also notes this. So -- is this official Green policy or not?

Update: In the comments, it was noted that McKenna is pretty much acting alone in this; it, apparently, is not Green Party policy at all. I've updated the title to reflect that it's only one constituency's candidate that needs to be shunned.

Also, Conor O'Neill has a great idea over here:

I was thinking further on this yesterday and I realised what the Greens need to do in order to be taken seriously... They need to become the “Party of Science”. Proper environmentalism is based on rigorous science and strategic thinking. Every policy they define should be backed up with rock-solid science and a detailed long-term financial analysis proving why it is in our best interests to adopt them.

Man, I would love to see that!

Eircom broadband?

I'm moving house. Naturally, first priority after getting the keys is getting the broadband set up ;)

Current broadband: BT DSL. Supposedly "up to" 3Mbps -- however, as with most DSL connections in Ireland, it's rate-adaptive RADSL, which means it trades off connection speed against distance to exchange and line quality.

Sadly, this has really deteriorated since the last time I checked! A "bing" test between the BT-supplied DSL router and the far end looks like this:

BING    10.18.72.1 (10.18.72.1) and 193.95.142.243 (193.95.142.243)
        44 and 108 data bytes (1024 bits)
193.95.142.243: minimum delay difference is zero, can't estimate link throughput
193.95.142.243:  6.966Mbps 0.147ms 0.143555us/bit
193.95.142.243: minimum delay difference is zero, can't estimate link throughput
193.95.142.243: 19.692Mbps 0.052ms 0.050781us/bit
193.95.142.243:  4.697Mbps 0.218ms 0.212891us/bit
193.95.142.243:  3.261Mbps 0.314ms 0.306641us/bit
193.95.142.243:  3.170Mbps 0.323ms 0.315430us/bit
193.95.142.243:  2.479Mbps 0.413ms 0.403320us/bit
193.95.142.243:  2.723Mbps 0.376ms 0.367187us/bit
193.95.142.243:  2.688Mbps 0.381ms 0.372070us/bit
193.95.142.243:  2.716Mbps 0.377ms 0.368164us/bit
193.95.142.243:  2.065Mbps 0.496ms 0.484375us/bit
193.95.142.243:  1.984Mbps 0.516ms 0.503906us/bit
193.95.142.243:  1.270Mbps 0.806ms 0.787109us/bit
193.95.142.243:  1.017Mbps 1.007ms 0.983398us/bit
193.95.142.243:  1.002Mbps 1.022ms 0.998047us/bit
193.95.142.243:  1.008Mbps 1.016ms 0.992187us/bit
193.95.142.243: 983.670Kbps 1.041ms 1.016602us/bit
193.95.142.243: 993.210Kbps 1.031ms 1.006836us/bit
193.95.142.243: 987.464Kbps 1.037ms 1.012695us/bit

--- 10.18.72.1 statistics ---
bytes   out    in   dup  loss   rtt (ms): min       avg       max   std dev
   44   762   758          0%           2.524     3.858    19.083     2.194
  108   762   762          0%           2.639     4.187    58.273     3.079

--- 193.95.142.243 statistics ---
bytes   out    in   dup  loss   rtt (ms): min       avg       max   std dev
   44   762   761          0%          13.061    20.025    78.689     8.226
  108   762   760          0%          14.213    17.954    61.137     4.697

--- estimated link characteristics ---
host                              bandwidth       ms
193.95.142.243                      987.464Kbps      10.536

987Kbps is not 3Mbps any more, not by a long shot. I'd say I now have a lot of new friends adding contention at the ol' DSLAM. I'm paying way too much money for what I'm getting :(
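As a sanity check on how to read those numbers: the per-line estimates in the output above look consistent with dividing 1024 bits by the measured delay difference between the small and large packets. Here's a small Python sketch of that arithmetic. The factor of two (the extra payload counted once in each direction) is my own inference from the "(1024 bits)" figure in the header for packets of 44 and 108 bytes, not something taken from the bing source, and the function name is made up.

```python
def estimate_throughput_bps(small_bytes, large_bytes, delay_diff_seconds):
    """Estimate link throughput from the RTT difference between two packet sizes.

    Assumes the extra payload crosses the link twice (echo request and reply).
    Returns None when the delay difference is zero or negative, mirroring the
    "can't estimate link throughput" lines in the output.
    """
    extra_bits = (large_bytes - small_bytes) * 8 * 2
    if delay_diff_seconds <= 0:
        return None
    return extra_bits / delay_diff_seconds


# The last sample line above: a 1.037ms delay difference.
final_kbps = estimate_throughput_bps(44, 108, 1.037e-3) / 1000
```

Plugging in the 1.037ms figure gives roughly 987Kbps, which matches the final estimate that bing reports.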

(Update: actually, it may not be contention. Judging by boards.ie traffic, high-contention situations in Ireland are usually faster in the mornings and daytime, then slower from 4pm-9pm as the commuters and kids get home -- however, this slowdown is pretty consistent across all times of day.)

(Update 2: as of right now, late afternoon on Apr 12, it's the worst I've seen it -- packet rates of 600Kbps, and packet loss of 5%-20%.)

On top of this, they have the really annoying daily disconnection policy, which I have hacked around with IPv6 and a VPN, but which still manages to waste my time and cause aggravation, even after frickin' months of pissing about.

For this, and the packaged phone service, I'm paying just under EUR 60 per month, including all call charges and VAT.

At that price, Eircom are offering a pretty good bundle -- free connection, free modem, 2Mbps downstream, 256Kbps upstream, unlimited free local and national calls at all times, 5% off calls to mobiles, 10c/min calls to the UK and US.

Now, a drop to 2Mbps may seem a lot, but bear in mind I'm getting just under 1 right now! I'm pretty sure the new gaff will have similar-quality lines and exchanges. Also, if I get the 2Mbps line, and the attenuation and S/N statistics indicate that it can support 3Mbps, I can always upgrade pretty easily.

The only problem now is getting over my revulsion at buying from Eircom, ugh...

Am I missing something? Does that Eircom bundle not include line rental maybe?

About the title change

The eagle-eyed may have spotted a change that took place a month or two ago in the taint.org configuration -- I ditched the old weblog tagline.

Previously, this weblog was titled "taint.org: Happy Software Prole". This title had been in place since around October 2003, when Daniel Lyons wrote a particularly idiotic article for Forbes entitled "Linux's Hit Men", which I took umbrage at:

Here we go again -- the old 'free software is communism' line [...] The article goes on to bemoan how software companies who write proprietary extensions into GPL-licensed software, have to comply with the terms of the license. It's all a bit of an obvious dig -- but I am looking forward to the follow-up article -- that's the one where the author bemoans how commercial software companies send out their 'enforcers' to extort money from companies who don't bother paying the royalties and runtime license fees their licenses require.

As a free/open-source-software guy, I happily adopted 'happy software prole' as an absurd tagline, in the spirit of detournement. Fast-forward to 3.5 years on, however, and I'd say most people can't even remember the Forbes article, or that Daniel Lyons guy! So that tagline was a bit old and busted, really.

On top of this, I'd noticed something I do in my weblog reading -- I've started renaming blogs in the feed reader from their fancy title, to simply the name of the author.

I've found that when reading blogs, I'm interested in who's writing. When skimming through the feeds of a morning, having to spend 5 seconds to recall that "ByteSurgery.com" is Robin Blandford is just a wee bit superfluous, sorry Robin. ;)

As a favour for readers, I've saved them the trouble, and renamed the blog to be quite explicit about who's writing; the taint.org tagline is now just "taint.org: Justin Mason's Weblog". Let's face it -- it's a bit functional. Hopefully it's helpful, though!

(And finally, it gives me the edge in the ongoing Google war against the non-me "Justin Masons" out there... and against a heart surgeon and a Texan basketball player, I need it. ;)