Justin's Linklog Posts
How do we kick our synchronous addiction? : great post on the hazards of programming in an async framework, and how damn hard it is. good comments thread too (via jzawodny)
(tags: via:jzawodny coding python javascript scalability ruby concurrency erlang async node.js twisted)
PeteSearch: How to split up the US : wow. fascinating results from social-network cluster analysis of Facebook, splitting up the entire USA into 7 clusters
(tags: clusters facebook data statistics maps culture analytics datamining demographics socialnetworking graph dataviz)
Inside View from Ireland: Analysing Electronic Forensics Evidence : fascinating note from Bernie Goldbach: ‘MORE THAN 20 YEARS ago, I worked with message traffic and the work told me the importance of verifying source material.’
(tags: bernie spam anti-spam authentication spoofing security phishing)
Op-Ed Contributor – Microsoft’s Creative Destruction – NYTimes.com : MS internal politics routinely torpedoed cool new projects. surprise, surprise. ‘Engineers in the Windows group falsely claimed [ClearType] made the display go haywire when certain colors were used. The head of Office products said it was fuzzy and gave him headaches. The VP for pocket devices was blunter: he’d support ClearType and use it, but only if I transferred the program and the programmers to his control.’
(tags: cleartype microsoft software bureaucracy politics culture management corporate nytimes)
Dublin City Development Plan 2011-2017: Public Consultation – boards.ie : Dublin City Council is offering public consultation via a Boards forum. cool
(tags: boards dublin council consultation politics civic)
Trojan torrent sites – why you should never reuse passwords : ‘for a number of years, a person has been creating torrent sites that require a login and password as well as creating forums set up for torrent site usage and then selling these purportedly well-crafted sites and forums to other people innocently looking to start a download site of their very own. However, these sites came with a little extra — security exploits and backdoors throughout the system. This person then waited for the forums and sites to get popular and then used those exploits to get access to the username, email address, and password of every person who had signed up.’
(tags: passwords security torrents warning twitter accounts)
What Second Life can teach your datacenter about scaling Web apps : good scaling advice from Linden Lab’s Ian Wilkes (who doesn’t seem to have a blog, sadly)
(tags: linden ian-wilkes scaling datacenters scalability deployment ops services)
Lift View First : explaining Lift’s code-free “display only” templating system. I like it. Very similar concept to WebMake’s “scraped templates”: http://webmake.taint.org/doc/scraping.html , nearly 10 years old now!
(tags: java scala lift templates templating scraping)
Daily Links Posts from pinboard.in : hmm. may be one for the TODO list
(tags: pinboard tags blog wordpress rss links)
Ross Anderson and Steven J Murdoch rip into Verified By VISA : ‘this is yet another case where security economics trumps security engineering, but in a predatory way that leaves cardholders less secure.’
(tags: verified-by-visa security phishing web banks banking money authentication finance visa 3dsecure papers)
Spamalyser : a custom pastebin for spam messages. cool
(tags: spamalyser spam anti-spam paste pastebin web)
DNS Pre-fetch Exposure on Thunderbird and Webmail : Ugh, very bad idea indeed. A backchannel for spammers/phishers/attackers from the mail reader is something we definitely do not want to provide. This is why we chose to cut URLs at the registrar boundary for URIBL lookups in SpamAssassin
(tags: privacy email dns mozilla thunderbird prefetching urls abuse security spam)
Pricewatch – The route of the problem : great article about Dublin Bus’ shortcomings, featuring an interview with Antoin! Very interesting to hear about the upcoming GPS-based accurate bus timetabling service to be visible via their website, that’ll be fantastic
(tags: gps busses dublin-bus dublin mass-transit commute travel)
explanation of the PS3 exploit : good walk-through by Nate Lawson
(tags: ps3 root hypervisor exploits mod-chips consoles reversing)
The SAY2K10 bug [LWN.net] : LWN follows up on the FH_DATE_PAST_20XX fiasco. ‘It would appear that what SpamAssassin needs is some dedicated maintenance talent which is not dependent on evening hours put in by developers committed to other projects.’ I wish
(tags: spamassassin say2k10 bugs maintainance lwn commentary)
Whisky Map of Distilleries in Scotland (Malt Madness Distillery Data) : wow. my new shopping list. also: now do one for Ireland ;)
(tags: whisky yum reference maps geodata distilleries single-malts)
The Apache Software Foundation Announces Apache SpamAssassin Version 3.3.0 : w00t!
(tags: asf apache spamassassin releases 3.3.0 anti-spam)
The New Data Center Rack From … IKEA? : the LACKRack — IKEA’s “LACK” side tables have exactly 19 inches of space, perfect for rackmounted hardware with a little hacking
(tags: lack ikea funny furniture hardware datacenter rackmount)
Waiting for the Apple Tablet, with Joel Johnson : possibly the best article written yet about the iTablet
(tags: itablet apple civilization vans bulldogs off-the-grid products consumerism joel-johnson)
Dublin & Wicklow Walks » Lugnaquilla : this is the plan for tomorrow — looks good!
(tags: lugnaquilla walks wicklow dublin ireland hiking)
AOL sacks pretty much the entire US postmaster team : ‘This is a totally devastating blow to everyone’
(tags: aol anti-spam layoffs postmaster email smtp)
One Mutation per 15 Cigarettes Smoked : aka, lung cancer develops after 50 pack-years of smoking. sobering thought
(tags: cancer lung-cancer smoking tobacco risk mutation)
The Top Google Search Result for each Unicode character : exactly what it says on the tin
(tags: google search unicode hublog)
How would you serve 100,000 simultaneous comet requests with node.js? : C10K microbenchmarking fun in Javascript (via:simonw)
(tags: web http javascript scaling comet c10k node.js long-poll)
French Anti-Piracy Organisation Hadopi Uses Pirated Font In Own Logo : ‘Of course you have to appreciate the irony – the agency in charge of enforcing France’s new anti-piracy legislation using a pirated proprietary font in its very own logo.’ hoho! hoist by their own petard
(tags: hadopi piracy copyright design fail france fonts typography logos ip)
YouTube – Mass Effect 2 Launch Trailer : whoa. really looking forward to this, Mass Effect was one of the best games I’ve ever played
(tags: mass-effect games via:colmbrophy xbox scifi video youtube trailers)
Auto-appendectomy in the Antarctic: case report — Rogozov and Bermel 339: b4965 — BMJ : holy shit. This is absolutely amazing, a first-person account of auto-appendectomy (via infovore)
(tags: history science russian medicine antarctica medical amazing appendectomy surgery)
Google Translate fail : Google reckons that the English translation of “Amhran na bhFiann” — the Irish national anthem — is “Save The Queen”. ie. part of the *English* national anthem. the perils of machine learning (via Adam Maguire)
(tags: via:AdamMaguire funny fail google translation machine-learning)
Google Agrees to Censor Encyclopedia Dramatica Entry in Australia : nice work, Aussies! this is very stupid indeed (via Waxy)
(tags: censorship google satire australia stupid encyclopedia-dramatica trolling)
Mobile Internet access data retention (not!) : so, it seems the wireless ISPs don’t have sufficient IPv4 space for their customers, and are filtering access to the internet via NAT; unfortunate side effect is that this breaks data retention as defined in the UK. wonder if the same applies here?
(tags: uk data-retention privacy nat isps wireless mobile phones networking internet filtering)
I was a Doctor at an online pharmacy : Reddit thread of answers from a “doctor” at a dodgy online prescription-drugs store, supposedly not a spamvertized one though
(tags: medicine pharma spam reddit iama scummy illegal law)
Semi-Realtime Satellite Desktop Backgrounds : Russ Garrett with another set of near-realtime desktop weather imagery (cf. http://taint.org/xplanet/ )
(tags: weather desktop image satellite realtime backgrounds)
Upload and store your files in the cloud with Google Docs : no sync or automated backup yet, so more like sendspace than dropbox, limited usefulness
(tags: google backup online-backup sync storage)
the MagicJack : a GSM femtocell for the home — USB-driven, the size of a pack of cards, $40. this won’t last long
(tags: femtocells gsm phone home voip telephone)
Zamberlan Snow Chains : chains — for your shoes. basically crampon overshoes, to deal with ice and snow, EUR45
(tags: chains ice snow shoes boots footwear weather crampons)
Irish Weather Network : live weather-station data from across Ireland, overlaid on a Google Map, using amateur and professional stations. fascinating
(tags: weather data mapping ireland live)
Malicious App In Android Market : phisher creates a banking app for Android phones which relays the authorization details to another site, possible because of insufficient app vetting (via Mulley)
(tags: apps iphone android smartphones phones mobile phishing security banking fraud)
fixing a frozen condensate trap on a condensing boiler : another day, another broken boiler
(tags: boilers home maintainance diy fix cold frozen)
Two Gentlemen of Lebowski : nicely done; Lebowski a la Shakespeare (via Waxy)
(tags: via:waxy shakespeare writing humor lebowski movies parody funny)
Una “UnaRocks” Mullally on the state of Irish blogs : ‘I think that ‘first wave’ of Irish blogging was over a long time ago, probably around the time Blogorrah hit the dirt, but in spite of time and an increase of participants and bigger audience there seems to be no real drive to improve content. People will always read something good – online or offline – and until that something good (hopefully in plural) starts to emerge and while good bloggers log off indefinitely, Irish blogging, for what it’s worth, is in a state of disarray.’
(tags: irish irishblogs ireland writing blogosphere blogging unarocks)
Happy new year! Or maybe not. Doh.
Over a year ago, Lee Maguire noticed that a contributed SpamAssassin rule, FH_DATE_PAST_20XX, was naively written — simply to match any date in the year 2010 or later — and would start to false-positive on all mail in 14 months. We made the trivial fix to avoid this (for at least 10 years, by which point the rule would have obsoleted itself through normal means), and I committed it to SVN.
Problem solved, right? Nope. I’d committed to trunk, but in a moment of inattention had forgotten to backport the fix to the stable release branch, 3.2.x, as well. Nobody else noticed the mistake, and several months later, boom:
Bugger.
Annoyingly, the GA had assigned this rule 3.5 points in the 3.2.0 rescoring run. This meant that the effective default threshold had been lowered from 5.0 points to 1.5, which produced a 2% false positive rate during the first 13 hours of the new year.
After that point, the fix was pushed to the sa-update channel, and anyone who runs sa-update regularly (as they should!) was brought back to normal filtering behaviour.
The rule is superfluous anyway, since it overlaps with a better-written "eval" rule, DATE_IN_FUTURE_96_XX. Accordingly, the most likely scenario is that it’ll be removed.
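To make the failure mode concrete, here is a minimal Python sketch. SpamAssassin rules are really Perl regexps in a rules file, so the two patterns below are illustrative stand-ins and not the actual text of FH_DATE_PAST_20XX; the function names are made up for the example. It shows a naive "2010 or later" match firing on an ordinary January 2010 Date header, a narrowed pattern that stays quiet for another ten years, and how much headroom a 3.5-point hit leaves under the default 5.0 threshold.

```python
import re

# Hypothetical stand-ins for the rule's pattern; the real rule is a Perl
# regexp in the SpamAssassin ruleset and may differ in detail.
NAIVE_YEAR = re.compile(r"\b20[1-9][0-9]\b")   # any year 2010-2099
FIXED_YEAR = re.compile(r"\b20[2-9][0-9]\b")   # any year 2020-2099

DEFAULT_THRESHOLD = 5.0   # default required score, as stated in the post
RULE_SCORE = 3.5          # score assigned to the rule, as stated in the post


def rule_hits(pattern, date_header):
    """Return True if the pattern matches anywhere in the Date header."""
    return pattern.search(date_header) is not None


def remaining_headroom(threshold=DEFAULT_THRESHOLD, score=RULE_SCORE):
    """Points other rules still need to add before mail is marked as spam."""
    return threshold - score


if __name__ == "__main__":
    legit = "Fri, 01 Jan 2010 09:15:00 +0000"
    print(rule_hits(NAIVE_YEAR, legit))   # does the naive pattern fire?
    print(rule_hits(FIXED_YEAR, legit))   # does the narrowed pattern fire?
    print(remaining_headroom())           # headroom left for other rules
```

With the naive pattern, every message dated 2010 starts 3.5 points up, so any legitimate mail that would otherwise have scored between 1.5 and 5.0 tips over into the spam folder.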
Personally, I see a few lessons from this:
- Obviously, I need to pay more attention. This is easier said than done though, since SpamAssassin has nothing to do with my day job anymore; it’s a spare-time thing nowadays, and that’s a rare resource, unfortunately. :( But still, a chastening result, and I’m very sorry for my part in this screwup.
- We need more active committers on Apache SpamAssassin. If we’d had more eyes, the fact that I’d forgotten to backport the fix might have been spotted. We’re definitely in a better situation now in this regard than we were 6 months ago, so that’s good.
- IMO, this is a good demonstration of how too many simple rules are risky; without careful vetting and moderation, it’s easy for a bad one to slip past. Perhaps we need to move more towards a DNSBL/network-rule driven approach, although this has its downsides too. Still thinking about this.
- It’d be good to fix the GA so that it wouldn’t assign such high points to simple rules like this, without some indication that a human has vetted them and believes them trustworthy.
Daryl posted a good comment on /.:
Clearly we dropped the ball on this one. As far as I know it’s our first big rule screw up in the project’s 10 years. If you’re going to screw up you might as well do it well.
+1 to that!
And to everyone who had to clean up the fallout and spend a holiday recovering lost mails from spam folders… sorry :(
Atheist Ireland Publishes 25 Blasphemous Quotes : in protest against the Fianna Fail religious right’s ludicrous new blasphemy law
(tags: blasphemy ireland law legal censorship democracy atheism religion quotes)
Body By Victoria – Secure Computing: Sec-C : Dr. Neal Krawetz brings the science on detecting Photoshop retouching
(tags: pixels images forensics jpeg photoshop fake analysis detection)
jwz – How to use Facebook with a feed reader : “Justin Mason likes this”
(tags: jwz facebook feeds rss atom howto syndication)
Parselets.com : ‘free, open, developer-generated APIs for a wide variety of websites. Parselets.com is a place to create and share them. [..] Check out [..] ways to use parselets from our web service, Ruby, Python, C/C++, or the *nix command-line.’
(tags: parselets scraping html web regexps sitescooper json)
RegExr: Online Regular Expression Testing Tool : a very nice interactive editor in Flash, supporting lots of the usual perlish stuff. via Joe
(tags: via:jdrumgoole regexps regular-expressions spamassassin rule-dev flash regex flex utilities)
For the past 2 years or so, I’ve been using GMail to handle my main mail feed for jmason.org. I’m an absolute convert to its "river of threads"/search-based workflow.
Since starting at Amazon, I’ve had to start dealing with a heavy volume of work mail. Previous jobs have either had low mail volumes, or used Google Apps hosting for their mail, but Amazon’s volumes are high and — obviously — they’re not using Google. ;) For a while, I tried using Thunderbird, but it just didn’t really cut it; I could never keep track of mails I wanted archived, or remember which folder they were in, etc. — the same old problems that GMail solved.
Enter Sup. It’s a console-based *nix email client, with a Mutt-like curses interface, which offers something closely approximating the GMail experience:
Sup is a console-based email client for people with a lot of email. It supports tagging, very fast full-text search, automatic contact-list management, custom code insertion via a hook system, and more. If you’re the type of person who treats email as an extension of your long-term memory, Sup is for you.
Inbox Zero is a daily occurrence for my work email now; I can simply archive pretty much everything, and reliably know the excellent full-text search support will allow me to find it again in an instant when I need it. The new-user guide is well worth a read to get an idea of its featureset and UI.
Setting it up
The process of getting it set up is quite hairy; here are some instructions for Ubuntu, which thoroughly failed to work for me on 9.04. I had a similarly tricky time using some Ruby packages on the Red Hat work desktop, but eventually avoided it by just building vanilla Ruby from source, then using that to install "gem" and from that, "sudo gem install sup". Much easier…
Next step is to get the mail. From some reading, it appears the most reliable way to deal with a MS Exchange 2007 server is to use offlineimap to sync it to a local set of maildirs, then add those as Sup "sources" using sup-add, one by one. This is very well supported in Sup, and works well. Offlineimap is very easy to install on Ubuntu, and can easily be built from source if that’s not an option. My config is pretty much a vanilla copy of the minimal config.
There’s a good Sup hook to run "offlineimap" every poll interval, and rescan synced sources that contain new mail. It works well.
Sup has an interesting approach to mail storage — it doesn’t. Instead, it stores pointers to the messages’ locations in their source storage. This is a great idea, since bugs in Sup therefore cannot lose your mail — just your metadata about your mail. However, it means that if the source changes in a way which moves or removes messages, you need to tell Sup to rescan (using "sup-sync"), but that’s no big deal in practice; in the more usual case, if new mail arrives, it’s automatically rescanned.
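As a rough illustration of that design, here is a toy Python model of an index that holds pointers and metadata but never the message bodies. It is a sketch only: Sup itself is written in Ruby, and the class names, fields and rescan logic below are invented for the example rather than taken from Sup's code or on-disk format.

```python
from dataclasses import dataclass, field


@dataclass
class Pointer:
    """Where a message lives in its source; the index never copies the body."""
    source: str      # e.g. a maildir path (hypothetical naming)
    location: str    # e.g. a filename inside that maildir


@dataclass
class Index:
    """Toy index: message-id -> pointer, plus user metadata (labels)."""
    pointers: dict = field(default_factory=dict)
    labels: dict = field(default_factory=dict)

    def add(self, msg_id, source, location):
        """Record a pointer for a message and give it a default label."""
        self.pointers[msg_id] = Pointer(source, location)
        self.labels.setdefault(msg_id, {"inbox"})

    def rescan(self, source, current):
        """Re-sync one source against `current`, a dict of msg_id -> location.

        New messages are added, moved ones get their pointer updated, and
        pointers to messages that vanished from the source are dropped.
        Labels on surviving messages are left untouched.
        """
        for msg_id, location in current.items():
            if msg_id in self.pointers:
                self.pointers[msg_id] = Pointer(source, location)
            else:
                self.add(msg_id, source, location)
        gone = [m for m, p in self.pointers.items()
                if p.source == source and m not in current]
        for msg_id in gone:
            del self.pointers[msg_id]
            self.labels.pop(msg_id, None)
```

The point of the model is the failure mode: wiping or corrupting `Index` loses labels and pointers, which a rescan of the source can largely rebuild, while the mail files themselves are never written to.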
I have just under 7000 mail messages in my Sup index, and rescans are speedy and searches super-fast. It’s very nicely done.
Outbound mail is delivered using /usr/sbin/sendmail by default, which should be working on any decent *nix desktop anyway ;)
Recommended Hooks
The Hooks wiki page has a few good hooks that you should install:
- ~/.sup/hooks/before-poll.rb: the above-mentioned offlineimap poll hook
- ~/.sup/hooks/mime-decode.rb: ‘uses w3m to translate all HTML attachments that don’t have a text/html alternative.’ Well worth installing.
- ~/.sup/hooks/before-add-message.rb: essential to filter out cron noise and the like so it doesn’t hit the inbox; unfortunately Sup doesn’t (yet) support GMail’s "filter messages like this" UI.
Bad Points
- Long URIs: unfortunately, very long URIs are broken by Sup’s renderer, and it doesn’t offer a native way to "activate" URIs and have them displayed in the browser; instead one has to cut and paste them. This is pretty lame. I’ve hacked up a perl script that will reconstruct the full URLs from the broken rendering, when the text is piped to it, but that’s a horrible hack.
- Index Corruption: I’ve had the misfortune (once, in the month since I started) of corrupting my search index, causing Ruby exception stack traces when I attempted to run "sup-sync" to scan new mail. The only fix appeared to be to restore my index from a "sup-dump" backup. Thankfully all seems fine now, but it was a definite reminder of the product’s beta status.
- Calendaring: still as painful as it’s ever been with UNIX command line email.
- HTML: A good-quality, email-oriented, native HTML renderer would be awesome.
- MIME: Sup again takes the traditional approach from UNIX command line clients of delegating to the mailcap file and its rules; unfortunately my RHEL5 desktop is too crappy to have a good mailcap setup. So I’ve had to write this from scratch to deal with the usual .docs and .xls’s etc., flying about.
- Inconsistent Key Mapping: Given that it shares so much UI with GMail in other respects, it’s a little annoying that Sup doesn’t have the same key mapping. Not a big deal, as it took only a couple of hours to get the hang of Sup’s, though.
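On the long-URI point: the Perl script isn't reproduced here, but the general idea can be sketched in Python. This is not that script, and it rests on a guess about what the breakage looks like, namely that a long URL is hard-wrapped at the display width, so a line that is filled to the edge and ends in a URL continues with the first unbroken run of characters on the next line. The function name and the `width` parameter are made up for the example.

```python
import re

# A URL that runs right up to the end of the line.
URL_AT_END = re.compile(r"https?://\S+$")


def rejoin_wrapped_urls(lines, width=80):
    """Rejoin URLs that were split by hard line wrapping.

    Assumption: a URL counts as wrapped when the line it ends is at least
    `width` characters long; the leading non-space run of the next line
    is then treated as the rest of the URL.
    """
    out = []
    continuing = False  # True when the previous raw line ended mid-URL
    for raw in lines:
        line = raw.rstrip("\n")
        if continuing and line and not line[0].isspace():
            head, _, rest = line.partition(" ")
            out[-1] += head
            # Still mid-URL only if this line also ran to the edge unbroken.
            continuing = " " not in line and len(line) >= width
            if rest:
                out.append(rest)
        else:
            out.append(line)
            continuing = (len(line) >= width
                          and URL_AT_END.search(line) is not None)
    return out
```

Feeding it `sys.stdin` and printing each returned line gives a pipe-friendly filter. The heuristic will misfire on text that happens to end a full-width line with a complete URL, which is part of why this sort of thing stays a hack.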
Overall
If you’re happy enough to spend a day or two getting the damn thing installed, and aren’t afraid of a little dalliance with the bleeding edge, I strongly recommend it. It’s definitely the best *NIX mail reader at the moment.
Deployment is just a part of dev/ops cooperation, not the whole thing : metrics, monitoring, instrumentation, fault tolerance, load mitigation called out as other factors by Allspaw
(tags: ops deployment operations engineering metrics devops monitoring fault-tolerance load)
Build Web Apps for iPhone using Dashcode : hmm, not too tricky
(tags: iphone html css js dev coding dashcode)
Fill and span DVD archives with Discspan : filed under “about time I did another DVD backup”
(tags: backup dvd spanning via:donncha linux storage offline recovery)
mnot’s Weblog: HTTP + Politics = ? : how the Great Firewall of Oz breaks so much more than the web browser
(tags: http web politics australia internet proxies filtering)
Play framework : ‘a Java framework made by Web developers. Discover a clean alternative to bloated enterprise Java stacks. Play focuses on developer productivity and targets RESTful architectures.’
(tags: java rails webdev mvc webapps play playframework)
Turing-incomplete Lua? : discussion thread on the cons of using Turing-complete general-purpose programming languages in places where it’s not necessary, such as configuration files
(tags: configuration turing-complete safety coding software lua)
Why it’s time to lighten up about “weird” Japan : ‘Being majime (too serious) is not cool in Japan; likewise it is important for voyeurs of Japanese culture to recognize that most everything pop-culture-y that is exported to the West comes at us with a wink. If you’re all up in arms about it, then maybe the joke is on you.’
(tags: japan majime seriousness fun weird news journalism)
GameFAQs: Assassin’s Creed II (X360) Puzzle/Codex FAQ : linked by Nelson; will return to this once i’ve gotten into the game
(tags: assassins-creed games via:nelson toread xbox)
How to build a Google Chrome extension in 15 minutes : wow. that _is_ easy; wonder if it’d be nearly as easy to write an extension as it is nowadays to write userscripts in Firefox
(tags: user-scripts google chrome firefox extensions coding html css)
Useful Google Chrome Extensions : from Nelson. looks like it’s becoming a viable browser, maybe I’ll give it a go
(tags: chrome google extensions web nelson-minar)
The Beer with the Green Label : Sierra Nevada tries to reclaim its cred – CHOW : ‘Ask a craft brewer which other brewers he most admires, and he’s likely to mention Sierra Nevada. The Chico, California, brewery is considered to be sacred ground, and its beers expertly crafted. “When you die as a brewer, you go to Chico,” says Matthew Brynildson, brewmaster of Firestone Walker in Southern California.’ paging Ben
(tags: sierra-nevada beer ipa yum via:torrez)
Code: Flickr Developer Blog » Flipping Out : Flickr don’t use branches. mental
(tags: branching integration branch version-control coding flickr sysadmin wtf deployment)
best Comic Sans story ever : MeFi commenter ftw
(tags: comic-sans mefi funny morbid comments fonts via:fp)