Category: Uncategorized

Links for 2009-03-20

4chan Memes, circa 1889

In the comments to this unremarkable story about 4chan's Boxxy fad, I came across this gem from CSClark:

I don't know why I didn't think to see if this sort of phenomenon was covered in Extraordinary Popular Delusions... Of course, it is.

Walk where we will, we cannot help hearing from every side a phrase repeated with delight, and received with laughter, by men with hard hands and dirty faces, by saucy butcher lads and errand-boys, by loose women, by hackney coachmen, cabriolet-drivers, and idle fellows who loiter at the corners of streets. Not one utters this phrase without producing a laugh from all within hearing. It seems applicable to every circumstance, and is the universal answer to every question; in short, it is the favourite slang phrase of the day, a phrase that, while its brief season of popularity lasts, throws a dash of fun and frolicsomeness over the existence of squalid poverty and ill-requited labour, and gives them reason to laugh as well as their more fortunate fellows in a higher stage of society.

Wherein we also learn that the FAIL of the day was Quoz:

When a disputant was desirous of throwing a doubt upon the veracity of his opponent, and getting summarily rid of an argument which he could not overturn, he uttered the word Quoz, with a contemptuous curl of his lip, and an impatient shrug of his shoulders. The universal monosyllable conveyed all his meaning, and not only told his opponent that he lied, but that he erred egregiously if he thought that any one was such a nincompoop as to believe him.

I'm also sure I've read of a fad - Greek, Roman, 18th century, something like that - where a group of young (aristocratic?) men who would suddenly grab a common woman and proclaim her Helen and make her their queen and swear to die for her and so on. And the tearing down of such idols could be seen, if you were wont to be pretentious like me, as part of Frazer's Golden Bough's Sacrificial King idea, although I'm not sure script kiddies care if the crops grow. (One other problem with that is that Frazer was romancing; but so are the more literal memecists, so yah!)

Since then, however, it appears that "quoz" has entirely flipped meaning, according to UrbanDictionary:

slang for quality, a cockney term for something good. usually accompanied with a hand action of slaping ur index finger against the stationary thumb and middle finger. 'thats quoz man! propa quoz.' finger slappy hand thingy

Links for 2009-03-19

Links for 2009-03-18

“Fundamentally flawed”

Killer presentation -- "RPC And Its Offspring: Convenient, Yet Fundamentally Flawed" from Steve Vinoski, who presented it at QCon London last week. It's full of reminders of the mid-'90s, hacking away on CORBA technology -- Steve was one of the key players at Iona while I was there.

But never mind where we've been; let me hit you with the summary slide to show where Steve's going:

  • RPC is a convenient but flawed accident of history

    • 1980s research focused on monoliths of programming languages, distributed applications, and operating systems
    • each computer vendor of the time owned their own full stack, from language to hardware and network, and you used what they gave you
    • imperative languages won back then simply because of their superior performance at that time
  • It’s almost 2010, folks — we can do WAY better

    • pull your head from the imperative language sand and learn functional programming
    • the world is many-core and highly distributed, and the old ways aren’t going to keep working much longer

Awesome ;)

Links for 2009-03-16

A plug for Kiva.org

I just made a loan using Kiva.org to a weaver in Nepal and a group of Vietnamese broom makers.

You can go to Kiva's website and lend to someone in the developing world who needs a loan for their business. Each loan has a picture of the entrepreneur, a description of their business and how they plan to use the loan so you know exactly how your money is being spent -- and you get updates letting you know how the entrepreneur is going.

The best part is, when the entrepreneur pays back their loan you get your money back - and Kiva's loans are managed by microfinance institutions on the ground who have a lot of experience doing this, so you can trust that your money is being handled responsibly.

Kiva's microfinancing seems like a nice way of helping the developing world, and I've heard good things about it. Here's hoping it works out well for my two recipients!

Links for 2009-03-13

Links for 2009-03-12

Links for 2009-03-11

Google Reader productivity hack: change your Home

So, if you use Google Reader, read your news with the "All items" page, and are subscribed to hundreds of feeds, it can be pretty overwhelming. I've found a better way to deal with this.

Select a 'most important' subset of feeds. For each of those, click through to the feed details page, hit the "Feed Settings..." menu, and select "Change folders...". Put the feed into a new "top" folder (creating it if necessary).

Now go to "Settings" -> "Preferences" and check out the "Start page" preference. By default, it's set to "Home"; change it to "Folders and Tags: top".

Hey presto -- now, when you load Google Reader, it'll come up with your "top" items. You can get through those quickly enough, and get on to other more important tasks. When you're bored and need something to read, though, just hit "Navigation" -> "All items" (or even just type 'ga'), and every other feed is now there for your delectation. Sweet!

Links for 2009-03-10

Links for 2009-03-05

Ready for the blackout?

Reminder -- Ireland's Blackout Week starts tomorrow:

Take part in Blackout Week

  1. To demonstrate your feelings about [IRMA's censorship demands], you can make your avatar black on any websites you have a presence on.
  2. This is inspired by Creative Freedom New Zealand's blackout campaign.
  3. From Black Thursday on the 5th of March, for one week, set your picture on sites like Facebook, Bebo, Twitter, MSN, etc black to raise awareness for Blackout Ireland.
  4. On that Thursday we encourage you to express yourself publicly about this issue, whether by blog posts, letters to newspapers or any form of communication you can think of.

Links for 2009-03-03

  • Locale : 'Locale allows you to create Situations, which specify Conditions under which your Settings should change; e.g. your "At Work" situation might notice when your location condition is "1600 Amphitheatre Parkway," and trigger your ringer to vibrate.' in essence, rule-based AI for your phone. want it! and the phone too while I'm at it!
    (tags: want android phone apps google location mapping)

Using VC to track system config changes by mail

Here's a great idea from a thread on the SpamAssassin users list, from Roger Marquis:

Karsten Bräckelmann [questioning the utility of a mechanism to dump the entire contents of the SpamAssassin configuration database]:

'postconf' without the handy -n switch dumps about 500 lines. The equivalent dump for SA including the rules is about 6000 lines. And that's a plain dump, without following and unfolding meta rules or anything.

Whether 6K or 60K would not necessarily make a difference to how I would like to use an SA 'postconf -n' equivalent. That use is change management. The intent is not in the full report itself but in its deltas.

As full time mail/systems admins we get invaluable data from tripwire/integrit, 'postconf -n', dconf, 'rpm -qa', 'dpkg -l *', 'pkg_info -a', ... whose output is checked in to RCS daily. This provides a nice configuration snapshot and historical record but its real usefulness comes from rcsdiff piped into a daily report. These are (usually) relatively concise, and IMO, absolutely essential for monitoring production Unix/Linux systems.

I like it! I think I'd check it into a git repo, though. The concept of applying VC smarts to traditional sysadmin tasks is definitely a meme on the way up -- see also etckeeper.
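The snapshot-and-diff idea is easy to sketch. Here's a minimal Python illustration, not Roger's actual setup: it uses the standard library's difflib as a stand-in for rcsdiff, and the function name `config_delta` and the sample Postfix settings are made up for the example.

```python
import difflib


def config_delta(old_snapshot: str, new_snapshot: str,
                 name: str = "postconf -n") -> str:
    """Return a unified diff between two snapshots of a command's output.

    Stands in for the "rcsdiff piped into a daily report" step; an empty
    string means nothing changed since the previous snapshot.
    """
    diff = difflib.unified_diff(
        old_snapshot.splitlines(),
        new_snapshot.splitlines(),
        fromfile=f"{name} (yesterday)",
        tofile=f"{name} (today)",
        lineterm="",
        n=0,  # no context lines: the report should show only what changed
    )
    return "\n".join(diff)


# Hypothetical snapshots, for illustration only.
yesterday = "myhostname = mail.example.com\nsmtpd_banner = $myhostname ESMTP\n"
today = "myhostname = mail.example.com\nsmtpd_banner = $myhostname\n"
report = config_delta(yesterday, today)
print(report)
```

In practice the snapshots would come from running the real commands and committing their output to RCS or git, at which point the version-control tool's own diff does this job.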

Links for 2009-03-02

Links for 2009-02-27

Blackout Ireland – a response to IRMA’s censorship demands

As Adrian noted last week, IRMA are demanding that Eircom block the Pirate Bay -- first on a list of websites they don't like -- on pain of being sued. On top of that, they intend for the other Irish ISPs to follow suit -- here's a key line from the letter they sent to Blacknight MD Michele Neylon:

in the event of a positive response to this letter it is proposed to make practical arrangements with Blacknight of a like nature to those made with eircom.

If that comes to pass, this will be an appalling situation for Irish internet users, and we need to act to ensure it doesn't happen. Digital Rights Ireland:

The net effect of this scheme, if it is allowed to go into effect, will be to impose an internet death penalty on two groups. On users, who will be cut off on the allegation of a private body, with no court involvement, and on websites, which could be blocked to Irish users based on a court hearing where only one side is heard.

Pace Mulley:

So first they’ll start with the Pirate Bay. Then comes Mininova, IsoHunt, then comes YouTube (they have dodgy stuff, right?), how long before we have Boards.ie because someone quoted a newspaper article or a section of a book?

Digital Rights Ireland have posted an excellent document detailing the following plan of action for Irish internet users concerned about this:

  • Contact your ISP and let them know that this is a key issue for you, as their customer.

  • Join up with your fellow netizens. Subscribe to the Blackout Ireland blog. Follow the #blackoutirl hashtag on Twitter. Join the Blackout Ireland Facebook group. It looks likely that there'll be a week-long blackout campaign starting next Thursday, March 5th.

  • Contact politicians. This is likely to cause irreparable damage to the Irish internet, so our pols should be very worried. See the DRI post for details on getting in touch with Minister for Communications Eamon Ryan.

New Zealand is running their own blackout campaign right now, so that may help our planning.

International readers -- make no mistake, you're next. IRMA in this case is acting as the local delegate of IFPI, which stated in 2007 that this was one of the 3 technical options for ISPs to control piracy:

Here's some other interesting coverage:

Fantastic interview with BitBuzz CEO Alex French:

If ISPs, including Eircom, agree not to oppose blocking access to The Pirate Bay and other similar websites, is this not an agreement to web censorship? “I don’t think there is any other way to interpret it,” said French.

“They are essentially agreeing to censor certain websites at the behest of the recording industry, without these websites ever having necessarily shown to be illegal in the Republic of Ireland. I would have a huge concern over what other websites may be blocked and what other industries will pile in now that the precedent has been set.”

Some sample letters:

And further discussion -- here's a massive boards.ie discussion thread, now closed in favour of this newer thread.

Update: here's the letter I sent to the Minister, if you're curious or need inspiration.

Links for 2009-02-26

Links for 2009-02-25

Ubuntu to bundle Eucalyptus

Introducing Karmic Koala, Ubuntu 9.10:

What if you want to build an EC2-style cloud of your own? Of all the trees in the wood, a Koala's favourite leaf is Eucalyptus. The Eucalyptus project, from UCSB, enables you to create an EC2-style cloud in your own data center, on your own hardware. It's no coincidence that Eucalyptus has just been uploaded to universe and will be part of Jaunty - during the Karmic cycle we expect to make those clouds dance, with dynamically growing and shrinking resource allocations depending on your needs.

A savvy Koala knows that the best way to conserve energy is to go to sleep, and these days even servers can suspend and resume, so imagine if we could make it possible to build a cloud computing facility that drops its energy use virtually to zero by napping in the midday heat, and waking up when there's work to be done. No need to drink at the energy fountain when there's nothing going on. If we get all of this right, our Koala will help take the edge off the bear market.

AWESOME -- exactly where the Linux server needs to go. Eucalyptus is the future of server farms. Really looking forward to this...

Links for 2009-02-24

Blimey, I won

Somehow or other, I seem to have won the 2009 Irish Blog Award for Best Technology Blog/Blogger! To be honest, for the last year I haven't been spending as much time on the blog as before, due mainly to a rather compelling distraction, so I'm doubly grateful for winning.

Unfortunately, I was out of the country, at Nishad and Janet's wedding, so missed my chance to get up on stage and thank my fellow bloggers in person -- but I asked John to do so instead. Seems he in turn got stage fright and delegated to his missus, who picked up the trophy. Thanks Fiona! That's probably just as well, since I'm pretty incoherent in that kind of situation myself.

Cheers to my fellow nominees, Eoghan, Robin, Michele and Pat. One of you guys should totally have won ;)

And last of all -- cheers to BitBuzz for sponsoring the category, and Mulley for the whole bash. I definitely have to turn up next year!

Now I need to put more time in this year to really earn that award...

Links for 2009-02-16

Plenty of money for Dublin’s bikes

So it seems that JC Decaux have been complaining about the costs of running the Velib scheme in Paris:

Since the scheme's launch, nearly all the original bicycles have been replaced at a cost of 400 euros each.

Of course, this won't be a problem in Dublin. Going by Newstalk's estimates of what the advertising space given to JC Decaux for free, in exchange for the (as yet nonexistent) 450 bikes, would have cost, each bike comes at a public cost of 111,000 Euros. That should cover a lot of "velib extreme".

(OK, that may be overestimating it. The Irish Times puts a more sober figure of EUR 1m per year on it; that works out at roughly EUR 2,200 per bike per year. That should still cover a few broken bikes.)
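For anyone who wants to check the sums, here's the per-bike arithmetic as a few lines of Python. It only uses the figures quoted above; the variable names are mine.

```python
# Figures quoted in the post; nothing here is new data.
BIKES_PROMISED = 450
NEWSTALK_COST_PER_BIKE_EUR = 111_000       # per-bike figure derived from Newstalk's estimate
IRISH_TIMES_COST_PER_YEAR_EUR = 1_000_000  # Irish Times figure, per year

# Irish Times figure spread across the promised bikes.
per_bike_per_year = IRISH_TIMES_COST_PER_YEAR_EUR / BIKES_PROMISED

# Total implied by the Newstalk-derived per-bike figure.
implied_total = NEWSTALK_COST_PER_BIKE_EUR * BIKES_PROMISED

print(f"Irish Times figure: EUR {per_bike_per_year:,.0f} per bike per year")
print(f"Newstalk-derived figure implies EUR {implied_total:,} in total")
```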

A quick reminder:

Paris | Dublin
20,000 bikes | 450 promised
~1,600 billboards | ~120 installed
~12.5 bikes per billboard | ~3.8 bikes per billboard
10km range (from 15e to 19e arrondissement) | 4km range (from the Mater Hospital to the Grand Canal)

And, of course, there's no sign of the bikes here yet... assuming they ever arrive. Heck of a job, Dublin City Council.

BTW, here's the rate card for advertising on the "Metropole" ad platforms, if you're curious, via the charmingly-titled Go Ask Me Bollix.

Links for 2009-02-13

Fixing the Gmail Tasks window bug

Hey Gmail users! If you're using Tasks, there's a slightly annoying bug in Gmail right now -- you may see the "Use this link to open Tasks" tip window appear every time you access the inbox page.

Several other people have reported it, and apparently the Google guys are 'working to resolve it' at the moment. In the meantime, though, here's a way to work around the issue without losing Tasks (you will, unfortunately, lose the offline-gmail functionality, though). Simply disable Offline Gmail (Settings -> Offline -> "Disable Offline Gmail for this computer"), and the bug no longer manifests itself.

You can allow Gmail to keep the stored mail on your computer if you like, which will be handy for when the bug is fixed and Offline can be re-enabled -- hopefully sooner rather than later.

Continuous deployment

This is awesome, if a little insane. Continuous Deployment at IMVU: Doing the impossible fifty times a day:

Continuous Deployment means running all your tests, all the time. That means tests must be reliable. We’ve made a science out of debugging and fixing intermittently failing tests. When I say reliable, I don’t mean “they can fail once in a thousand test runs.” I mean “they must not fail more often than once in a million test runs.” We have around 15k test cases, and they’re run around 70 times a day. That’s a million test cases a day. Even with a literally one in a million chance of an intermittent failure per test case we would still expect to see an intermittent test failure every day. It may be hard to imagine writing rock solid one-in-a-million-or-better tests that drive Internet Explorer to click ajax frontend buttons executing backend apache, php, memcache, mysql, java and solr. I am writing this blog post to tell you that not only is it possible, it’s just one part of my day job.
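The arithmetic in that quote is worth spelling out. A quick Python check, using only the numbers quoted above and assuming each test execution fails spuriously independently of the others (my assumption, not IMVU's):

```python
test_cases = 15_000   # "around 15k test cases"
runs_per_day = 70     # "run around 70 times a day"
p_flaky = 1e-6        # one-in-a-million spurious failure per test execution

executions_per_day = test_cases * runs_per_day

# Expected number of spurious failures per day.
expected_failures = executions_per_day * p_flaky

# Chance of seeing at least one spurious failure on a given day,
# assuming executions fail independently.
p_at_least_one = 1 - (1 - p_flaky) ** executions_per_day

print(executions_per_day, expected_failures, round(p_at_least_one, 3))
```

So even at one-in-a-million reliability, a spurious red build on most days is the expected outcome, which is the point the IMVU post is making.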

OK, so far, so sensible. But this is where it gets really hairy:

Back to the deploy process, nine minutes have elapsed and a commit has been greenlit for the website. The programmer runs the imvu_push script. The code is rsync’d out to the hundreds of machines in our cluster. Load average, cpu usage, php errors and dies and more are sampled by the push script, as a basis line. A symlink is switched on a small subset of the machines throwing the code live to its first few customers. A minute later the push script again samples data across the cluster and if there has been a statistically significant regression then the revision is automatically rolled back. If not, then it gets pushed to 100% of the cluster and monitored in the same way for another five minutes. The code is now live and fully pushed. This whole process is simple enough that it’s implemented by a handfull of shell scripts.

Mental. So what we've got here is:

  • phased rollout: automated gradual publishing of a new version to small subsets of the grid.

  • stats-driven: rollout/rollback is controlled by statistical analysis of error rates, again on an automated basis.
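To make the stats-driven part concrete, here's a rough Python sketch of one way such a rollback decision could be made. The IMVU post doesn't say which statistical test their push script uses, so the one-sided two-proportion z-test, the function name `should_roll_back` and the threshold below are all my own assumptions.

```python
import math


def should_roll_back(base_errors: int, base_requests: int,
                     canary_errors: int, canary_requests: int,
                     z_threshold: float = 2.33) -> bool:
    """One-sided two-proportion z-test.

    Returns True if the error rate on the machines running the new code
    (the "canary" subset) is significantly higher than the baseline rate
    sampled before the push. A threshold of 2.33 is roughly the 1% level.
    """
    p_base = base_errors / base_requests
    p_canary = canary_errors / canary_requests
    pooled = (base_errors + canary_errors) / (base_requests + canary_requests)
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / base_requests + 1 / canary_requests))
    if se == 0:
        # Identical all-or-nothing rates on both sides: nothing to compare.
        return False
    z = (p_canary - p_base) / se
    return z > z_threshold


# Hypothetical samples: baseline 50 errors in 100,000 requests,
# canary subset 40 errors in 10,000 requests.
print(should_roll_back(50, 100_000, 40, 10_000))
```

A real push script would presumably look at several metrics (load average, CPU usage, PHP errors) rather than a single error count, but the shape of the decision is the same: sample before, sample after, and roll back only on a significant regression.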

Worth noting some stuff from the comments. MySQL schema changes break this system:

Schema changes are done out of band. Just deploying them can be a huge pain. Doing an expensive alter on the master requires one-by-one applying it to our dozen read slaves (pulling them in and out of production traffic as you go), then applying it to the master’s standby and failing over. It’s a two day affair, not something you roll back from lightly. In the end we have relatively standard practices for schemas (a pseudo DBA who reviews all schema changes extensively) and sometimes that’s a bottleneck to agility. If I started this process today, I’d probably invest some time in testing the limits of distributed key value stores which in theory don’t have any expensive manual processes.

They use an interesting two-phased approach to publishing of the deploy file tree:

We have a fixed queue of 5 copies of the website on each frontend. We rsync with the “next” one and then when every frontend is rsync’d we go back through them all and flip a symlink over.
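The symlink flip is the classic trick here. IMVU do it with shell scripts; the following is just a Python sketch of the same idea on a POSIX system, with made-up directory names and a helper name (`flip_symlink`) of my own.

```python
import os
import tempfile


def flip_symlink(link_path: str, new_target: str) -> None:
    """Atomically repoint link_path at new_target.

    A new symlink is created under a temporary name and then renamed over
    the live one. On POSIX, rename() replaces the old link in one step, so
    readers see either the old tree or the new one, never a missing link.
    """
    tmp_link = link_path + ".tmp"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)  # clear out a leftover from an interrupted flip
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)


# Demo in a scratch directory: two hypothetical release trees and a
# "current" link that the web server would serve from.
root = tempfile.mkdtemp()
for release in ("release-1", "release-2"):
    os.mkdir(os.path.join(root, release))
current = os.path.join(root, "current")
flip_symlink(current, os.path.join(root, "release-1"))
flip_symlink(current, os.path.join(root, "release-2"))
print(os.readlink(current))
```

Doing the slow rsync into the "next" directory first, and only then flipping links everywhere, keeps the window in which frontends disagree about the live version as short as possible.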

All in all, this is very intriguing stuff, and way ahead of most sites. Cool!

(thanks to Chris for the link)

Links for 2009-02-11

Config management as cookery

Interesting to see Chef, a configuration management framework using cooking as a metaphor.

Back in the early '90s in Iona, I wrote a user/group synchronization tool called "greenpages" which used a cooking metaphor; "spice" (data) was added to "raw" (template) files to produce "cooked" output. Great minds, eh!

Links for 2009-02-09

IR book recommendation

Thanks to Pierce for pointing me at this review of an interesting-sounding book called Introduction to Information Retrieval. The book sounds quite useful, but I wanted to pick out a particularly noteworthy quote, on compression:

One benefit of compression is immediately clear. We need less disk space.

There are two more subtle benefits of compression. The first is increased use of caching ... With compression, we can fit a lot more information into main memory. [For example,] instead of having to expend a disk seek when processing a query ... we instead access its postings list in memory and decompress it ... Increased speed owing to caching -- rather than decreased space requirements -- is often the prime motivator for compression.

The second more subtle advantage of compression is faster transfer of data from disk to memory ... We can reduce input/output (IO) time by loading a much smaller compressed posting list, even when you add on the cost of decompression. So, in most cases, the retrieval system runs faster on compressed postings lists than on uncompressed postings lists.

This is something I've been thinking about recently -- we're getting to the stage where CPU speed has so far outstripped disk I/O speed and network bandwidth that pervasive compression may be worthwhile. It's simply worth keeping data compressed for longer, since CPU is cheap. There's certainly little point in not compressing data travelling over the internet, anyway.
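As a toy illustration of the trade-off, here's a Python sketch that compresses a made-up postings list and rebuilds it on demand. It uses zlib, a general-purpose compressor, purely as a stand-in for the postings-specific encodings a real retrieval system would use; the document IDs are invented.

```python
import itertools
import struct
import zlib

# A hypothetical postings list: sorted document IDs for one term.
doc_ids = list(range(0, 300_000, 3))

# Store the gaps between consecutive IDs rather than the IDs themselves;
# gaps are small and repetitive, so they compress far better.
gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]

raw = struct.pack(f"<{len(gaps)}I", *gaps)  # 4 bytes per entry, uncompressed
packed = zlib.compress(raw)                 # what we'd keep in memory or on disk

# At query time: decompress, then rebuild the IDs with a running sum.
restored_gaps = struct.unpack(f"<{len(gaps)}I", zlib.decompress(packed))
restored = list(itertools.accumulate(restored_gaps))

print(len(raw), len(packed), restored == doc_ids)
```

The CPU cost is the decompress-and-sum step on every access; the win is that far more postings lists fit in memory, and each one is cheaper to read from disk.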

On other topics, it looks equally insightful; the quoted paragraphs on Naive Bayes and feature selection algorithms are both things I learned myself, "in the field", so to speak, working on classifiers -- I really should have read this book years ago I think ;)

The entire book is online here, in PDF and HTML. One to read in that copious free time...

Good reasons to host inelastically on EC2

Recently, there's been a bit of discussion online about whether or not it makes sense for companies to host server infrastructure at Amazon EC2, or on traditional colo infrastructure. Generally, these discussions have focussed on one main selling point of EC2: its elasticity, the ability to horizontally scale the number of server instances at a moment's notice.

If you're in a position to gain from elasticity, that's great. But it is still worth noting that even if you aren't in that position, there's another good reason to host at an EC2-like cloud: you may want to deploy another copy of the app, either from a different version-control branch (dev vs staging vs production deployments), or to run separate apps with customizations for different customers. These aren't scaling an existing app up; they're creating new copies of the app, and EC2 works nicely for this.

If you can deploy a set of servers with one click from a source code branch, this is entirely viable and quite useful.

Another reason: EC2-to-S3 traffic is extremely fast and cheap compared to external-to-S3. So if you're hosting your data on S3, EC2 is a great way to crunch on it efficiently. Update: Walter observed this too on the backend for his Twitter Mosaic service.

Ice Cycling

I seem to have invented a new extreme sport on the way into work: Ice Cycling. The roads were like an ice-skating rink. Scary stuff :(

Here's some advice for anyone in the same boat:

  • use a high gear: avoid using low gear if possible, even when starting off. Low revs mean you're more likely to get traction.

  • try to avoid turns: keep the bike as upright as possible.

  • try to avoid braking: braking is very likely to start a skid in icy conditions.

  • use busy roads: ride where the cars have been, since their traffic will have melted the ice.

  • ride away from the gutters: they're more likely to be iced over than the centre of a lane. Again, ride where the cars have been.

  • avoid road markings: it seems these were much icier than the other parts of the road; possibly because their high albedo meant the ice on them hadn't been melted by the sun yet. So look out for that.

Here's a good thread on cyclechat.co.uk, and don't miss icebike.org: 'Whether commuting to work, or just out for a romp in the woods, you arrive feeling very alive, refreshed, and surrounded with the aura of a cycling god. You will be looked upon with the smile of respect by friends and co-workers. - - - Or was that the sneer of derision...no matter, ICEBIKING is a blast!' o-kay.

Their recommendations are pretty sane, though. ;)

Links for 2009-02-05

Links for 2009-02-03

Links for 2009-01-30

UK’s proposed anti-filesharing quango

Wow. The IFPI's strategy of "divide and conquer" by taking individual ISPs to court to force them to institute a 3 strikes policy, as successfully deployed against Eircom this week, is possibly marginally better than this insane obsolete-business-model handout proposed by the UK government in their Digital Britain report:

Lord Carter of Barnes, the Communications Minister, will propose the creation of a quango, paid for by a charge that could amount to £20 a year per broadband connection.

The agency would act as a broker between music and film companies and internet service providers (ISPs). It would provide data about serial copyright-breakers to music and film companies if they obtained a court order. It would be paid for by a levy on ISPs, who inevitably would pass the cost on to consumers.

Jeremy Hunt, the Shadow Culture Secretary, said: “A new quango and additional taxes seem a bizarre way to stimulate investment in the digital economy. We have a communications regulator; why, when times are tough, should business have to fund another one?”

Well said. An incredibly bad idea.

By the way, I've noticed some misconceptions about the Eircom settlement. Telcos selling Eircom bitstream DSL (i.e. the 2Mbps or 3Mbps DSL packages) are immune right now.

They are, however, next on the music industry's hit-list, reportedly...

Links for 2009-01-29

Eircom forced to implement “3 strikes and you’re out” for filesharers

Eircom has been forced to implement "3 strikes and you're out", according to Adrian Weckler:

If the music labels come to it with IP addresses that they have identified as illegal file-sharers, Eircom will, in its own words:

"1) inform its broadband subscribers that the subscribers IP address has been detected infringing copyright and

"2) warn the subscriber that unless the infringement ceases the subscriber will be disconnected and

"3) in default of compliance by the subscriber with the warning it will disconnect the subscriber."

My thoughts -- it's technically better than installing Audible Magic appliances to filter all outbound and inbound traffic, at least.

However, there's no indication of the degree to which Eircom will verify the "proof" provided by the music labels, or that there's any penalty for the labels when they accuse your laser printer of filesharing. I foresee a lot of false positives.

Update: LINX reports that the investigative company used will be Dtecnet, a 'company that identifies copyright infringers by participating in P2P file-sharing networks'. TorrentFreak says:

DtecNet [...] stems from the anti-piracy lobby group Antipiratgruppen, which represents the music and movie industry in Denmark. There are more direct ties to the music industry though. Kristian Lakkegaard, one of DtecNet’s employees, used to work for the RIAA’s global partner, IFPI. [...]

Just like most (if not all) anti-piracy outfits, they simply work from a list of titles their client wishes to protect and then hunts through known file-sharing networks to find them, in order to track the IP addresses of alleged infringers.

Their software appears as a normal client in, for example, BitTorrent swarms, while collecting IP addresses, file names and the unique hash values associated with the files. All this information is filtered in order to present the allegations to the appropriate ISP, in order that they can send off a letter admonishing their own customer, in line with their commitments under the MoU.

[...] it will be a big surprise if [Dtecnet's evidence is] of a greater ‘quality’ than the data provided by MediaSentry.

More coverage of the issues raised by the RIAA's international lobbying for the 3-strikes penalty:

Links for 2009-01-28

Links for 2009-01-23

Links for 2009-01-21

Links for 2009-01-20

Switched to Magnet

I've switched my home broadband from Eircom's 3Mbps all-in-one package to Magnet's 10Mbps LLU package. It's about a tenner a month cheaper, and significantly faster of course.

The modem arrived last Friday, about 2 weeks after ordering; that night, when I went to check my mail, I noticed that the DSL had gone down, and indeed so had the phone. I was dreading a weekend without the interwebs, it being 9pm on Friday night -- but lo, when I plugged in the Magnet router, it all came up perfectly first time!

Great instructions too. Extremely readable and quite comprehensible for a reasonably non-techie person, I'd reckon. So far, they've provided great service, too.

I'm not actually getting the full 10Mbps, unfortunately; it's RADSL, and I'm only getting 5Mbps when I test it. Just as well I didn't pay the extra tenner to get their 24Mbps package. Still, that's a hell of a lot faster than the sub-1Mbps speeds I've been getting from Eircom.

It's hard to notice an effective difference when browsing though, as that kind of traffic is dominated by latency effects rather than throughput.
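A back-of-the-envelope model shows why. This is a deliberately crude Python sketch: every figure in it (page size, number of round trips, round-trip time) is hypothetical, and it ignores the parallel connections real browsers use.

```python
def page_load_seconds(round_trips: int, rtt_ms: float,
                      page_bytes: int, bandwidth_mbps: float) -> float:
    """Crude model: serialized round trips plus raw transfer time."""
    latency_part = round_trips * rtt_ms / 1000.0
    transfer_part = page_bytes * 8 / (bandwidth_mbps * 1_000_000)
    return latency_part + transfer_part


# Hypothetical page: 100 KB in total, 30 serialized round trips, 50 ms RTT.
slow = page_load_seconds(30, 50, 100_000, 1.0)  # roughly the old line
fast = page_load_seconds(30, 50, 100_000, 5.0)  # roughly the new line

print(f"{slow:.2f}s at 1Mbps vs {fast:.2f}s at 5Mbps")
```

In this model a fivefold increase in bandwidth buys well under a twofold improvement in load time, because the round trips cost the same on both lines.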

I haven't even tried their "PCTV" digital TV system; it seems a bit pointless really, I have a networked PVR already, and anyway I doubt they support Linux.

One thing that's weird: when my wife attempts to view video on news.bbc.co.uk on her Mac running Firefox, it stalls with the spinny "loading video" image, and the status line claims that it's downloading from "ad.doubleclick.net". This worked fine (of course) on Eircom. If I switch to my user account and use Firefox there, it works fine, too -- possible difference being that I'm using AdBlock Plus and she's not. Something to do with the number of simultaneous TCP connections to multiple hosts, maybe? Very odd anyway. It'd be nice to get some time to sit down with tcpdump and figure this one out... any suggestions?

Links for 2009-01-19

Links for 2009-01-15

Google.ie HTTPS fail

Check out what happens when you visit https://www.google.ie/ :

Clicking through Firefox's ridiculous hoops gets me these dialogs:

Good work, Google and Firefox respectively!

Links for 2009-01-14

Links for 2009-01-13

Hack: reassassinate

A coworker today, returning from a couple of weeks' holiday, bemoaned the quantities of spam he had to wade through. I mentioned a hack I often use in this situation: discard the spam, download the 2 weeks of supposed-nonspam as a huge mbox, and rescan it all with spamassassin. Since the intervening 2 weeks give plenty of time for the URLs to be blacklisted by URIBLs and the IPs to be listed by DNSBLs, this generally results in better spamfilter accuracy, at least in terms of reducing false negatives (the "missed spam"). In other words, it gets rid of most of the remaining spam nicely.

Chatting about this, it occurred to us that it'd be easy enough to generalize this hack into something more widely useful by hooking up the Mail::IMAPClient CPAN module with Mail::SpamAssassin, and in fact, it'd be pretty likely that someone else would already have done so.

Sure enough, a search threw up this node on perlmonks.org, containing a script which did pretty much all that. Here's a minor freshening: download

reassassinate - run SpamAssassin on an IMAP mailbox, then reupload

Usage: ./reassassinate --user jmason --host mail.example.com --inbox INBOX --junkfolder INBOX.crap

Runs SpamAssassin over all mail messages in an IMAP mailbox, skipping ones it's processed before. It then reuploads the rewritten messages to two locations depending on whether they are spam or not; nonspam messages are simply re-saved to the original mailbox, spam messages are sent to the mailbox specified in "--junkfolder".

This is especially handy if some time passed since the mails were originally delivered, allowing more of the message contents of spam mails to be blacklisted by third-party DNSBLs and URIBLs in the meantime.

Prerequisites:

  • Mail::IMAPClient
  • Mail::SpamAssassin
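For the curious, the overall shape of the hack looks something like this. To be clear, this is not the Perl script itself; it's a rough Python sketch of the same idea using the standard imaplib module and the `spamassassin` command-line tool. The function names are mine, and unlike the real script it doesn't skip messages it has already processed.

```python
import email
import imaplib
import subprocess


def rescan(raw_message: bytes) -> bytes:
    """Pipe one message through the `spamassassin` command-line tool, which
    writes the rewritten message (with X-Spam-* headers) to stdout."""
    result = subprocess.run(["spamassassin"], input=raw_message,
                            capture_output=True, check=True)
    return result.stdout


def is_spam(rewritten: bytes) -> bool:
    """SpamAssassin marks spam with an 'X-Spam-Flag: YES' header."""
    msg = email.message_from_bytes(rewritten)
    return (msg.get("X-Spam-Flag") or "").strip().upper() == "YES"


def reassassinate(host: str, user: str, password: str,
                  inbox: str, junkfolder: str) -> None:
    """Rescan every message in `inbox`, re-upload it to `inbox` or
    `junkfolder` depending on the verdict, and delete the original."""
    conn = imaplib.IMAP4_SSL(host)
    conn.login(user, password)
    conn.select(inbox)
    _, data = conn.search(None, "ALL")
    for num in data[0].split():
        _, parts = conn.fetch(num, "(RFC822)")
        rewritten = rescan(parts[0][1])
        target = junkfolder if is_spam(rewritten) else inbox
        conn.append(target, None, None, rewritten)
        conn.store(num, "+FLAGS", "\\Deleted")
    conn.expunge()
    conn.logout()
```

Calling `reassassinate("mail.example.com", "jmason", password, "INBOX", "INBOX.crap")` would mirror the usage line above. Try it on a scratch mailbox first, since it deletes the originals once the rewritten copies have been uploaded.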

Links for 2009-01-09

Links for 2009-01-08

  • Map/Reduce and Queues for MySQL using Gearman : A talk by Eric Day and Brian Aker at the upcoming MySQL Conference in April: '[Gearman] development is now active again with an optimized rewrite in C, along with features such as persistent message queues, queue replication, improved statistics, and advanced job monitoring. For MySQL, there is also a new user defined function to run Gearman jobs, as well as the possibility to write your own aggregate UDFs using Gearman. This gives you the ability to run functions in separate processes, separate servers, and in other languages. The Gearman framework gives you a robust interface to also run these functions reliably in the “cloud”. This session will introduce these concepts and give examples of sample applications.' Persistent queues (at last)? Gearman integration directly in the DB? excellent!
    (tags: gearman queueing mysql databases brian-aker mapreduce sql conferences talks papers)

Links for 2009-01-07

Links for 2009-01-06

Links for 2009-01-02

Links for 2008-12-28

Links for 2008-12-22

Links for 2008-12-21

Links for 2008-12-19

Links for 2008-12-18

Links for 2008-12-17

If only this were true

Some people, when facing a problem, think "I'll use regular expressions." Now they have HORDES OF CUTE PEOPLE WANTING TO SLEEP WITH THEM

-- Yoz, on twitter

Listening to music over wifi?

Hey lazyweb! Long time, no write.

I'm wondering what setup people use to deal with the following situation. Upstairs, I have an Ubuntu 8.04 server with 71GB of MP3s. Downstairs, I have a stereo system. In between the two is a wireless network. How can I listen to the music downstairs, without simply copying the lot (or subsets thereof) onto a local disk on some appliance down there?

Currently, I'm using a VNC client on a Nokia 770 to control a JuK window on the server. This works great, believe it or not! KDE 3 can be coaxed into providing a fantastic UI for a small touchscreen. On the server, PulseAudio then transmits JuK's sound output using the ESD protocol over TCP to the ESD server on the N770, and the N770 plays back the sound.

Until a few months ago, this worked great. However, something (either hardware changes, network topology changes, or an upgrade to Ubuntu 8.04 on the server) has resulted in effective bitrates between the server and the N770 dropping frequently -- hence the audio drops out or changes pitch, rendering it unlistenable :(

I've tried using UPnP servers (specifically mediatomb, ushare, and Twonkymedia), with the built-in Media Streamer app on the N770. All fail. MP3s cut off near the end, M3U playlists aren't supported, and sometimes Media Streamer just locks up. In addition, it's pretty messy trying to get the UPnP servers to notice changes to the MP3 collection.

I've also tried using SqueezeCenter (née SlimServer), but the MP3 stream playback support on the N770 is pretty atrocious; there are audible decoding artifacts.

So -- anyone got a suggestion? Even something involving iTunes might be helpful -- as long as it can at least preserve the Linux server. I'm unlikely to host the full MP3 collection on anything else...

Links for 2008-12-11

Links for 2008-12-10

Links for 2008-12-09

Links for 2008-12-08

Links for 2008-12-07

Links for 2008-12-03

Links for 2008-11-26

Recession Hits The Digital Depot

The Digital Depot is 'an innovative, state-of-the-art building specifically designed to meet the needs of fast growing digital media companies [...] developed as a joint initiative of Enterprise Ireland, Dublin City Council and The Digital Hub Development Agency.' Generally, it's a pretty nice place to work, and a great resource for startups and small tech companies.

However, recently, it looks like they've been embarking on some innovative, state-of-the-art cost-cutting exercises.

There's a little canteen area, for companies to make tea and coffee, wash up their mugs, etc. Check out this snapshot from the canteen this morning, courtesy of JK's phone cam:

Notice anything odd about that bottle of washing-up liquid?

Yum yum! Nothing nicer than washing your mug with a dash of toilet cleaner.