Justin's Linklog Posts

Links for 2010-02-05

Links for 2010-02-03

Links for 2010-02-02

Links for 2010-01-28

Links for 2010-01-27

Links for 2010-01-26

Links for 2010-01-22

Links for 2010-01-21

Links for 2010-01-20

Links for 2010-01-14

Links for 2010-01-12

Links for 2010-01-11

Links for 2010-01-06

  • Una “UnaRocks” Mullally on the state of Irish blogs : ‘I think that ‘first wave’ of Irish blogging was over a long time ago, probably around the time Blogorrah hit the dirt, but in spite of time and an increase of participants and bigger audience there seems to be no real drive to improve content. People will always read something good – online or offline – and until that something good (hopefully in plural) starts to emerge and while good bloggers log off indefinitely, Irish blogging, for what it’s worth, is in a state of disarray.’
    (tags: irish irishblogs ireland writing blogosphere blogging unarocks)

SAY2K10 Doh

Happy new year! Or maybe not. Doh.

Over a year ago, Lee Maguire noticed that a contributed SpamAssassin rule, __FH_DATE_PAST_20XX__, was naively written — simply to match any date in the year 2010 or later — and would start to false-positive on all mail in 14 months. We made the trivial fix to avoid this (for at least 10 years, by which point the rule would have obsoleted itself through normal means), and I committed it to SVN.
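To make the failure mode concrete, here is a rough sketch of a "match any year from 2010 on" pattern next to a version that buys another ten years. Both regexes are assumptions written for illustration; they are not copied from the SpamAssassin ruleset, and the sample Date headers are made up.

```python
import re

# Hypothetical patterns, for illustration only (not the real rule text).
NAIVE = re.compile(r"\b20[1-9][0-9]\b")  # matches any year 2010-2099
FIXED = re.compile(r"\b20[2-9][0-9]\b")  # matches any year 2020-2099


def hits(pattern, date_header):
    """Return True if the pattern fires on the given Date header value."""
    return pattern.search(date_header) is not None


# Made-up Date header values on either side of the new year.
old_year = "Thu, 31 Dec 2009 23:55:00 +0000"
new_year = "Fri, 01 Jan 2010 00:05:00 +0000"

for header in (old_year, new_year):
    print(header, "naive:", hits(NAIVE, header), "fixed:", hits(FIXED, header))
```

Under these assumptions the naive pattern is silent on 2009 mail and fires on every correctly dated message from 1 January 2010, while the widened pattern stays quiet until 2020.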

Problem solved, right? Nope. I’d committed to trunk, but in a moment of inattention had forgotten to backport the fix to the stable release branch, 3.2.x, as well. Nobody else noticed the mistake, and several months later, boom:

Bugger.

Annoyingly, the GA had assigned this rule 3.5 points in the 3.2.0 rescoring run. This meant that the effective default threshold had been lowered from 5.0 points to 1.5, which produced a 2% false positive rate during the first 13 hours of the new year.
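The arithmetic behind that "effective threshold" can be sketched as follows. The 5.0 and 3.5 figures come from the text above; the helper name, the example scores for the other rules, and the assumption that a message is flagged once its total reaches the threshold are mine.

```python
DEFAULT_THRESHOLD = 5.0  # default spam threshold, from the post
BAD_RULE_SCORE = 3.5     # score the GA assigned to the broken rule, from the post


def is_flagged(other_rule_scores, bad_rule_fires):
    """Hypothetical helper: sum the rule scores and compare to the threshold."""
    total = sum(other_rule_scores)
    if bad_rule_fires:
        total += BAD_RULE_SCORE
    return total >= DEFAULT_THRESHOLD


# With the bad rule firing on everything, any mail whose other rules
# add up to this much gets flagged.
effective_threshold = DEFAULT_THRESHOLD - BAD_RULE_SCORE

# Made-up scores for a legitimate message that trips two minor rules.
minor_hits = [1.0, 0.6]
print(effective_threshold)
print(is_flagged(minor_hits, bad_rule_fires=False))
print(is_flagged(minor_hits, bad_rule_fires=True))
```

A legitimate message that picks up a couple of minor hits is nowhere near 5.0 on its own, but with 3.5 points added to every message it crosses the line.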

After that point, the fix was pushed to the sa-update channel, and anyone who runs sa-update regularly (as they should!) was brought back to normal filtering behaviour.

The rule is superfluous anyway, since it overlaps with a better-written "eval" rule, DATE_IN_FUTURE_96_XX. Accordingly, the most likely scenario is that it’ll be removed.

Personally, I see a few lessons from this:

  • Obviously, I need to pay more attention. This is easier said than done though, since SpamAssassin has nothing to do with my day job anymore; it’s a spare-time thing nowadays, and that’s a rare resource, unfortunately. :( But still, a chastening result, and I’m very sorry for my part in this screwup.

  • We need more active committers on Apache SpamAssassin. If we’d had more eyes, the fact that I’d forgotten to backport the fix might have been spotted. We’re definitely in a better situation now in this regard than we were 6 months ago, so that’s good.

  • IMO, this is a good demonstration of how too many simple rules are risky; without careful vetting and moderation, it’s easy for a bad one to slip past. Perhaps we need to move more towards a DNSBL/network-rule driven approach, although this has its downsides too. Still thinking about this.

  • It’d be good to fix the GA so that it wouldn’t assign such high points to simple rules like this, without some indication that a human has vetted them and believes them trustworthy.

Daryl posted a good comment on /.:

‘Clearly we dropped the ball on this one. As far as I know it’s our first big rule screw up in the project’s 10 years. If you’re going to screw up you might as well do it well.’

+1 to that!

And to everyone who had to clean up the fallout and spend a holiday recovering lost mails from spam folders… sorry :(

Sup Rocks

For the past 2 years or so, I’ve been using GMail to handle my main mail feed for jmason.org. I’m an absolute convert to its "river of threads"/search-based workflow.

Since starting at Amazon, I’ve had to start dealing with a heavy volume of work mail. Previous jobs have either had low mail volumes, or used Google Apps hosting for their mail, but Amazon’s volumes are high and (obviously) they’re not using Google. ;) For a while, I tried using Thunderbird, but it just didn’t really cut it; I could never keep track of mails I wanted archived, or remember which folder they were in, etc.: the same old problems that GMail solved.

Enter Sup. It’s a console-based *nix email client, with a Mutt-like curses interface, which offers something closely approximating the GMail experience:


‘Sup is a console-based email client for people with a lot of email. It supports tagging, very fast full-text search, automatic contact-list management, custom code insertion via a hook system, and more. If you’re the type of person who treats email as an extension of your long-term memory, Sup is for you.’

Inbox Zero is a daily occurrence for my work email now; I can simply archive pretty much everything, and reliably know the excellent full-text search support will allow me to find it again in an instant when I need it. The new-user guide is well worth a read to get an idea of its featureset and UI.

Setting it up

The process of getting it set up is quite hairy; here are some instructions for Ubuntu, which thoroughly failed to work for me on 9.04. I had a similarly tricky time using some Ruby packages on the Red Hat work desktop, but eventually avoided it by just building vanilla Ruby from source, then using that to install "gem" and from that, "sudo gem install sup". Much easier…

Next step is to get the mail. From some reading, it appears the most reliable way to deal with an MS Exchange 2007 server is to use offlineimap to sync it to a local set of maildirs, then add those as Sup "sources" using sup-add, one by one. This is very well supported in Sup. Offlineimap is very easy to install on Ubuntu, and can easily be built from source if that’s not an option. My config is pretty much a vanilla copy of the minimal config.

There’s a good Sup hook to run "offlineimap" every poll interval, and rescan synced sources that contain new mail. It works well.

Sup has an interesting approach to mail storage — it doesn’t. Instead, it stores pointers to the messages’ locations in their source storage. This is a great idea, since bugs in Sup therefore cannot lose your mail — just your metadata about your mail. However, it means that if the source changes in a way which moves or removes messages, you need to tell Sup to rescan (using "sup-sync"), but that’s no big deal in practice; in the more usual case, if new mail arrives, it’s automatically rescanned.
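The pointer-based idea can be sketched in a few lines. This is not Sup's implementation (Sup is written in Ruby and its index is far more involved); the class names, the "inbox" default label, and the simplification of a source to one flat directory of message files are all assumptions made for the example.

```python
import os
from dataclasses import dataclass, field


@dataclass
class Entry:
    path: str                                  # pointer to the message in its source
    labels: set = field(default_factory=set)   # metadata owned by the index


class PointerIndex:
    """Hypothetical index that stores locations and labels, never message bodies."""

    def __init__(self):
        self.entries = {}  # path -> Entry

    def rescan(self, source_dir):
        """Bring the pointers in line with the files now present in source_dir."""
        present = {os.path.join(source_dir, name) for name in os.listdir(source_dir)}
        # Forget pointers whose message was moved or removed at the source.
        for path in list(self.entries):
            if path.startswith(source_dir + os.sep) and path not in present:
                del self.entries[path]
        # Add pointers for newly arrived messages, keeping existing labels.
        for path in sorted(present):
            self.entries.setdefault(path, Entry(path, {"inbox"}))

    def read(self, path):
        """Fetch the body from the source file; the index holds no copy."""
        with open(self.entries[path].path, encoding="utf-8", errors="replace") as fh:
            return fh.read()
```

Losing or corrupting an index like this costs you the labels, not the mail: the message files are never written to, and a fresh rescan rebuilds the pointers.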

I have just under 7000 mail messages in my Sup index, and rescans are speedy and searches super-fast. It’s very nicely done.

Outbound mail is delivered using /usr/sbin/sendmail by default, which should be working on any decent *nix desktop anyway ;)

Recommended Hooks

The Hooks wiki page has a few good hooks that you should install:

  • ~/.sup/hooks/before-poll.rb: the above-mentioned offlineimap poll hook
  • ~/.sup/hooks/mime-decode.rb: ‘uses w3m to translate all HTML attachments that don’t have a text/html alternative.’ Well worth installing.
  • ~/.sup/hooks/before-add-message.rb: essential to filter out cron noise and the like so it doesn’t hit the inbox; unfortunately Sup doesn’t (yet) support GMail’s "filter messages like this" UI.

Bad Points

  • Long URIs: unfortunately, very long URIs are broken by Sup’s renderer, and it doesn’t offer a native way to "activate" URIs and have them displayed in the browser; instead one has to cut and paste them. This is pretty lame. I’ve hacked up a perl script that will reconstruct the full URLs from the broken rendering, when the text is piped to it, but that’s a horrible hack.

  • Index Corruption: I’ve had the misfortune (once, in the month since I started) of corrupting my search index, causing Ruby exception stack traces when I attempted to run "sup-sync" to scan new mail. The only fix appeared to be to restore my index from a "sup-dump" backup. Thankfully all seems fine now, but it was a definite reminder of the product’s beta status.

  • Calendaring: still as painful as it’s ever been with UNIX command line email.

  • HTML: A good-quality, email-oriented, native HTML renderer would be awesome.

  • MIME: Sup again takes the traditional approach from UNIX command line clients of delegating to the mailcap file and its rules; unfortunately my RHEL5 desktop is too crappy to have a good mailcap setup. So I’ve had to write this from scratch to deal with the usual .docs and .xls’s etc., flying about.

  • Inconsistent Key Mapping: Given that it shares so much UI with GMail in other respects, it’s a little annoying that Sup doesn’t have the same key mapping. Not a big deal, as it took only a couple of hours to get the hang of Sup’s, though.
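On the long-URI problem in the list above: the general shape of a rejoining filter might look like the sketch below. It is not the perl script mentioned there, and it assumes the breakage is a hard wrap at a fixed column with the rest of the URL continuing on the next line; the function name and the default width of 80 are made up.

```python
import re

# A line "ends in a URL" if everything after the last http(s):// is non-whitespace.
URL_TAIL = re.compile(r"https?://\S+$")


def rejoin_urls(text, width=80):
    """Rejoin URLs assumed to have been hard-wrapped at `width` columns."""
    lines = text.splitlines()
    out = []
    i = 0
    while i < len(lines):
        line = lines[i]
        last_len = len(line)
        # Keep gluing on the next line while the current one ends mid-URL
        # and the most recent physical line ran all the way to the wrap column.
        while URL_TAIL.search(line) and last_len >= width and i + 1 < len(lines):
            i += 1
            nxt = lines[i]
            last_len = len(nxt)
            line += nxt.lstrip()
        out.append(line)
        i += 1
    return "\n".join(out)


wrapped = "see http://example.c\nom/a/b/c for more\nbye"
print(rejoin_urls(wrapped, width=20))
```

This is only a heuristic: a URL that happens to end exactly at the wrap column will be glued to whatever follows, so the output still wants a quick look before pasting.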

Overall

If you’re happy enough to spend a day or two getting the damn thing installed, and aren’t afraid of a little dalliance with the bleeding edge, I strongly recommend it. It’s definitely the best *NIX mail reader at the moment.

Links for 2009-12-15

Links for 2009-12-09