
Category: Uncategorized

Links for 2009-05-07

Links for 2009-05-06

Links for 2009-05-01

Irish Examiner innumeracy

Here's a great example of numerical illiteracy spotted by my mate Tom:

some classic reporting in the Irish Examiner today...

"Department staff clocked up 20,000 sick days in the three years" is the headline. Closer examination of the article reveals there are 5,000 people in the department. Do the maths (which the paper doesn't - I wonder why) and that's a SHOCKING 1.3 sick days a year.

Even better is this quote: "Department of Agriculture staff clocked up 3,095 uncertified sick days last year - 653 of these on a Monday"

So that would be about a fifth of the sick days being taken on one of the five working days in the week. DISGRACE!
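For anyone who wants to check Tom's sums, here's the arithmetic as a tiny Python sketch. The figures are the ones quoted above; the variable names are just mine.

```python
# Figures as quoted from the Irish Examiner article.
sick_days = 20_000          # total sick days over the period
years = 3
staff = 5_000

# Average sick days per person per year.
per_person_per_year = sick_days / years / staff

# Share of uncertified sick days that fell on a Monday.
uncertified = 3_095
on_monday = 653
monday_share = on_monday / uncertified

# With five working days in the week, an even spread would put 1/5 on each day.
even_share = 1 / 5

print(f"{per_person_per_year:.2f} sick days per person per year")
print(f"{monday_share:.1%} on Mondays vs {even_share:.0%} if spread evenly")
```

In other words, Mondays are almost exactly as popular as you'd expect any weekday to be.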

Let's hear it for old media's commitment to quality journalism!

Links for 2009-04-30

Links for 2009-04-29

Links for 2009-04-25

Links for 2009-04-24

Links for 2009-04-22

Links for 2009-04-21

Links for 2009-04-20

Links for 2009-04-17

Reminder: Irish computing history talk next Monday

Don't forget -- next Monday, the Heritage Society of Engineers Ireland, in association with The Irish Computer Society, and the ICT and Electronic and Electrical Divisions of Engineers Ireland, will be hosting an evening lecture entitled "Reminiscences of Early days of Computing in Ireland", by Gordon Clarke (M.A., CEng., F.B.C.S., C.I.T.P., F.I.C.S). Sounds like it'll be great. More details.

Update: it starts at 8pm; useful info! Also, the event's flyer can be found on this page, which notes:

For those new to using our webcast facility, please see www.engineersireland.ie/webcast for information on how to set-up and access our webcasts. To view the event, please log onto the url below: https://engineersireland.webex.com/engineersireland/onstage/g.php?t=a&d=841959965 The password: computer

Links for 2009-04-16

Linux per-process I/O performance: measuring the wrong thing

A while back, I linkblogged about "iotop", a very useful top-like UNIX utility to show which processes are initiating the most I/O bandwidth.

Teodor Milkov left a comment which is well worth noting, though:

Definitely iotop is a step in the right direction.

Unfortunately it's still hard to tell who's wasting most disk IO in too many situations.

Suppose you have two processes - dd and mysqld.

dd is doing massive linear IO and its throughput is 10MB/s. Let's say dd reads from a slow USB drive and it's limited to 10MB/s because of the slow reads from the USB.

At the same time MySQL is doing a lot of very small but random IO. A modern SATA 7200 rpm disk drive is only capable of about 90 IO operations per second (IOPS).

So ultimately most of the disk time would be occupied by the mysqld. Still iotop would show dd as the bigger IO user.

He goes into more detail on his blog. Fundamentally, iotop works based on what the Linux kernel offers for per-process I/O accounting, which is I/O bandwidth per second, not I/O operations per second. Most contemporary storage in desktops and low-end server equipment is IOPS-bound ('A modern 7200 rpm SATA drive is only capable of about 90 IOPS'). Good point! Here's hoping a future change to the Linux per-process I/O API allows measurement of IOPS as well...
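To put rough numbers on Teodor's example, here's a back-of-envelope Python sketch. The 90 IOPS and 10MB/s figures come from his comment; the 16KB request size for mysqld is purely my assumption for illustration, as are the function names.

```python
SATA_RANDOM_IOPS = 90             # from the comment: a 7200 rpm SATA disk
DD_BYTES_PER_SEC = 10_000_000     # from the comment: dd limited to 10MB/s
MYSQLD_REQUEST_BYTES = 16 * 1024  # assumption: small 16KB random reads


def bandwidth(iops, request_bytes):
    """Bytes per second moved by a process issuing `iops` requests a second."""
    return iops * request_bytes


def disk_busy_fraction(iops, max_iops):
    """Fraction of the disk's random-I/O capacity a process consumes."""
    return min(1.0, iops / max_iops)


# mysqld saturating the disk with random I/O still moves only about 1.5MB/s...
mysqld_bw = bandwidth(SATA_RANDOM_IOPS, MYSQLD_REQUEST_BYTES)
mysqld_busy = disk_busy_fraction(SATA_RANDOM_IOPS, SATA_RANDOM_IOPS)

# ...so a tool that ranks by bandwidth puts dd on top.
ranked_by_bandwidth = sorted(
    {"dd": DD_BYTES_PER_SEC, "mysqld": mysqld_bw}.items(),
    key=lambda item: item[1],
    reverse=True,
)
print(ranked_by_bandwidth, mysqld_busy)
```

Under those assumptions mysqld keeps its disk 100% busy while showing up with less than a sixth of dd's bandwidth.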

Links for 2009-04-14

Big table desking

We have an extremely open-plan layout in work -- no partitions, just long benches of keyboards and monitors. It looks a bit like this, but with less designer furniture and more Office Depot:

Aman pointed out that this is a new trend in workplace design, which Workalicious calls "Big Table Desking":

I'm still not sure what to make of the frequent instances of Big Table Desking. While this kind of workstation arrangement is no doubt a new trend, the no-privacy work place is a throwback to the 1950s office pool, a line up of identical desks classroom style. Is it the peer to peer seating position that overcomes this? How would it? By building community? As opposed the pilot and passenger 747, catholic church model of everybody facing "forward". Does the Big Table Desk break down this heirarchy by facing people towards one another, sharing a big desk instead of staking out territory? Is the big table desk a microcosm, a representation of a healthy organizational structure?

No comment ;)

It seems to be popular with designers, presumably due to their collaborative working needs.

Mind you, it also looks a bit like a Taylorist workplace layout from 1904, of which Wired says:

American engineer Frederick Taylor was obsessed with efficiency and oversight and is credited as one of the first people to actually design an office space. Taylor crowded workers together in a completely open environment while bosses looked on from private offices, much like on a factory floor.

YouBloom plug

Last week I got a very nice mail looking to plug a new music site:

'I'm not sure if this would interest you at all but wanted to pass on the link to a new website called YouBloom.

It's a new social networking and e-commerce website set up with independent artists in mind - to help them to make real money (unlike MySpace etc which just make money from the artists)! It was set up by Irish Musician Phil Harrington and is backed by Sir Bob Geldof.

Admittedly I am involved with the website. I have been helping bring artists on site for the last few months, since I was introduced to the concept by a friend, but would love for you to take a look at the site anyway - even if it turns out to be of no interest to you.'

I normally wouldn't post these, but I'm a sucker for flattery ;) and the poster had taken the time to read my blog a little. It also looks like the site allows bands to offer free MP3 downloads of their tunes, which IMO is a key factor for bands trying to get promotion.

UPC.ie’s new Channel 4 frequency for MythTV

So, after spending an hour or two attempting to figure out where the hell UPC had moved Channel 4 to, I eventually found out that it was now being broadcast on 543 MHz. I also found out that this wasn't part of the standard list of A1 to A30 channels in the "pal-ireland" range. :(

Thankfully, I then found this Frequency to MythTV channel converter page; here are the correct values to use on the MythWeb channels page:

  • Freqid = 30
  • Finetune = -4

Links for 2009-04-10

Links for 2009-04-08

Links for 2009-04-07

Links for 2009-04-06

“you are, in fact, in the message queue business”

Oh man, this Twitter Ruby-vs-Scala language spat is hilarious; talk about handbags at dawn. I loved this exchange in the comments to this post in particular:

BJ Clark:

I'm mostly surprised that a guy who wrote the book on Scala comes out and says that Scala is better than everything else and someone actually listened and took him seriously. He has a vested interest in saying that Scala is the next big thing and I've yet to see any evidence that Kestrel is better (at anything) than RabbitMQ.

And frankly, I still get fail whales at Twitter on a daily basis, so, what exactly are they so proud about over there?

Steve Jenson:

Kestrel pages queues to disk: if you get more messages than you have memory, it's fine. If RabbitMQ gets more messages than memory, it crashes. We talked to them extensively about this problem and they're going to address it. We were hoping we'd be able to use RabbitMQ or another message queue. We didn't want to be in the message queue business. At this point, given that we know the code and it's performance inside and out, it makes sense to continue using and developing it.

BJ Clark:

I don't feel like arguing with you but your logic isn't clear to me. It would make sense that if you don't want to be in the message queue business, you'd submit patches against an established message queue to make it work in your situation instead of writing your own message queue, twice. This is overlooking the fact that twitter is basically a massive message queue and you are, in fact, in the message queue business.

Zing!

Links for 2009-04-05

URL shortening services: my experience

A good post from Joshua Schachter about URL shortening services.

For what it's worth, I ran into the unwanted-interstitial risk. At one stage, before I'd bothered registering jmason.org, sitescooper.taint.org or my other domains, I used a URL-shortening service to provide a memorable, short URL for an open-source application I wrote -- http://zap.to/snarfnews/.

At some point a few years down the line, the forwarding process started accreting ads; eventually they became soft-porn in content, and I was forced to apologise to users for the forwarding I could no longer control!

By now, 10 years down the line, it seems to hijack the page entirely, returning a page in Cyrillic I can't even read :( (apparently it's a page of Flash games; thanks, Alexandr Ciornii, for the interpretation!)

Anyway, lesson learned.

Links for 2009-04-03

“Report Says Deal”

Twitter has this "Trending Topics" sidebar now, which lists the following topics:

Trending Topics

  • TGIF
  • National Cleavage
  • G20
  • Easter
  • #grammarsongs
  • France
  • #rp09
  • French
  • Grand National
  • Report Says Deal

Now, I'm not going to go into the topic of National Cleavage right now. 'Report Says Deal' is intriguing because it makes no sense, until you click through to see:

Real-time results for "Report Says Deal"

  1. dlloydsecret Google to Buy Twitter? Report Says Deal is in the Works http://bit.ly/Wt1Wb half a minute ago from twitterfeed
  2. dlloydthemlmpro Google to Buy Twitter? Report Says Deal is in the Works http://bit.ly/Wt1Wb 1 minute ago from twitterfeed
  3. techupdates [PCWrld] Google to Buy Twitter? Report Says Deal is in the Works http://tinyurl.com/c63ont 3 minutes ago from twitterfeed
  4. icidade Google to Buy Twitter? Report Says Deal is in the Works. http://is.gd/quu9 4 minutes ago from TweetDeck
  5. chrisgraves Retweeting @CinWomenBlogger: Retweeting @ays: Google to Buy Twitter? Report Says Deal is in the Works - PC World http://bitly.com/LhT4 6 minutes ago from twhirl

So I'd say that Twitter's "Trending Topics" uses N-grams of between 1 and 3 "words" for topic identification. In this case, rather than "Report Says Deal", a better topic string would be something like:

Google to Buy Twitter? Report Says Deal is in the Works - PC World

or even:

Google to Buy Twitter? Report Says Deal is in the Works - PC World http://bitly.com/LhT4

Funnily enough this is exactly the issue I ran into while developing this algorithm. The trick at this point is to apply a variant of the BLAST pattern-discovery algorithm, expanding the patterns sideways while they still match the same subsets of the corpus until they're maximal.

Twitter folks, if you can read Perl, "assemble_regexps()" in seek-phrases-in-log in SpamAssassin SVN does this pretty nicely, and reasonably efficiently, and is licensed under the ASL 2.0. ;)
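For the non-Perl-readers, here's a minimal Python sketch of the "expand sideways until maximal" idea. To be clear, this is not the SpamAssassin code and not real BLAST; it's a toy illustration under my own assumptions (whitespace tokenisation, and the made-up helper names find_all, matching_docs and expand_phrase).

```python
def find_all(tokens, phrase):
    """Start offsets at which `phrase` occurs as a contiguous run in `tokens`."""
    n = len(phrase)
    return [i for i in range(len(tokens) - n + 1)
            if tuple(tokens[i:i + n]) == phrase]


def matching_docs(docs, phrase):
    """Indices of the documents that contain `phrase`."""
    return frozenset(d for d, tokens in enumerate(docs)
                     if find_all(tokens, phrase))


def expand_phrase(docs, phrase):
    """Grow `phrase` one token at a time, left or right, for as long as the
    longer phrase still matches exactly the same set of documents."""
    phrase = tuple(phrase)
    target = matching_docs(docs, phrase)
    if not target:
        return phrase
    sample = docs[min(target)]  # take neighbouring tokens from one matching doc
    changed = True
    while changed:
        changed = False
        for start in find_all(sample, phrase):
            end = start + len(phrase)
            candidates = []
            if start > 0:
                candidates.append((sample[start - 1],) + phrase)
            if end < len(sample):
                candidates.append(phrase + (sample[end],))
            for candidate in candidates:
                if matching_docs(docs, candidate) == target:
                    phrase = candidate
                    changed = True
                    break
            if changed:
                break
    return phrase


tweets = [
    "Google to Buy Twitter? Report Says Deal is in the Works http://bit.ly/Wt1Wb",
    "[PCWrld] Google to Buy Twitter? Report Says Deal is in the Works http://tinyurl.com/c63ont",
    "Retweeting @ays: Google to Buy Twitter? Report Says Deal is in the Works - PC World http://bitly.com/LhT4",
]
docs = [tweet.split() for tweet in tweets]
topic = " ".join(expand_phrase(docs, ("Report", "Says", "Deal")))
print(topic)
```

Starting from the 3-gram, the phrase grows in both directions and stops where the tweets start to differ (the shortened URLs and retweet prefixes), which is the behaviour you'd want from a topic string.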

Links for 2009-04-02

Links for 2009-04-01

Links for 2009-03-30

OSSBarCamp this weekend

It's two days until OSSBarCamp, a free open-source-focussed Bar Camp unconference at Kevin Street DIT, this Saturday. I'm looking forward to it -- although unfortunately I missed the boat on giving a talk. (Unlike the traditional Bar Camp model, this is using a pre-booked talk system.)

Particularly if you're working with open source in Ireland, you should come along!

I have high hopes for John Looney's discussion of cloud computing and how it interacts with open source. Let's hope he's not too Google-biased in his definition of "cloud computing". ;)

Also of interest -- Fintan Boyle's "An Introduction To Developing With Flex". To be honest, I hadn't even realised that Adobe Flex was now open source. Cool.

Links for 2009-03-25

Links for 2009-03-24

Links for 2009-03-23

Talk: Early days of Computing in Ireland

On Monday April 20th, the Heritage Society of Engineers Ireland, in association with The Irish Computer Society, and the ICT and Electronic and Electrical Divisions of Engineers Ireland, will be hosting an evening lecture: 'Reminiscences of Early days of Computing in Ireland':

In 1957 the Irish Sugar Company installed the first stored program computer in Ireland. Other large organisations slowly followed suit.

Gordon Clarke will discuss how the early computers enhanced the electro-mechanical systems that had developed over the previous 60 years. He will talk about their specifications, a few of the first applications and tell the story of the very early years of designing and developing computer based systems.

All Welcome. Admission Free. No booking required. This event will be web-cast

For Details: www.engineersireland.ie, or Con Kehely: (01) 6860113 (con.kehely /at/ dublincity.ie)

Location: Engineers Ireland, 22 Clyde Road D4

Sounds great! Thanks to Frank Duignan on the ILUG list for forwarding the notice.

Links for 2009-03-20

4chan Memes, circa 1889

In the comments to this unremarkable story about 4chan's Boxxy fad, I came across this gem from CSClark:

I don't know why I didn't think to see if this sort of phenomenon was covered in Extraordinary Popular Delusions... Of course, it is.

Walk where we will, we cannot help hearing from every side a phrase repeated with delight, and received with laughter, by men with hard hands and dirty faces, by saucy butcher lads and errand-boys, by loose women, by hackney coachmen, cabriolet-drivers, and idle fellows who loiter at the corners of streets. Not one utters this phrase without producing a laugh from all within hearing. It seems applicable to every circumstance, and is the universal answer to every question; in short, it is the favourite slang phrase of the day, a phrase that, while its brief season of popularity lasts, throws a dash of fun and frolicsomeness over the existence of squalid poverty and ill-requited labour, and gives them reason to laugh as well as their more fortunate fellows in a higher stage of society.

Wherein we also learn that the FAIL of the day was Quoz:

When a disputant was desirous of throwing a doubt upon the veracity of his opponent, and getting summarily rid of an argument which he could not overturn, he uttered the word Quoz, with a contemptuous curl of his lip, and an impatient shrug of his shoulders. The universal monosyllable conveyed all his meaning, and not only told his opponent that he lied, but that he erred egregiously if he thought that any one was such a nincompoop as to believe him.

I'm also sure I've read of a fad - Greek, Roman, 18th century, something like that - where a group of young (aristocratic?) men who would suddenly grab a common woman and proclaim her Helen and make her their queen and swear to die for her and so on. And the tearing down of such idols could be seen, if you were wont to be pretentious like me, as part of Frazer's Golden Bough's Sacrificial King idea, although I'm not sure script kiddies care if the crops grow. (One other problem with that is that Frazer was romancing; but so are the more literal memecists, so yah!)

Since then however, it appears that "quoz" has entirely flipped meaning, according to UrbanDictionary:

slang for quality, a cockney term for something good. usually accompanied with a hand action of slaping ur index finger against the stationary thumb and middle finger. 'thats quoz man! propa quoz.' finger slappy hand thingy

Links for 2009-03-19

Links for 2009-03-18

“Fundamentally flawed”

Killer presentation -- "RPC And Its Offspring: Convenient, Yet Fundamentally Flawed" from Steve Vinoski, who presented it at QCon London last week. It's full of reminders of the mid-'90s, hacking away on CORBA technology -- Steve was one of the key players at Iona while I was there.

But never mind where we've been; let me hit you with the summary slide to show where Steve's going:

  • RPC is a convenient but flawed accident of history

    • 1980s research focused on monoliths of programming languages, distributed applications, and operating systems
    • each computer vendor of the time owned their own full stack, from language to hardware and network, and you used what they gave you
    • imperative languages won back then simply because of their superior performance at that time
  • It’s almost 2010, folks — we can do WAY better

    • pull your head from the imperative language sand and learn functional programming
    • the world is many-core and highly distributed, and the old ways aren’t going to keep working much longer

Awesome ;)

Links for 2009-03-16

A plug for Kiva.org

I just made a loan using Kiva.org to a weaver in Nepal and a group of Vietnamese broom makers.

You can go to Kiva's website and lend to someone in the developing world who needs a loan for their business. Each loan has a picture of the entrepreneur, a description of their business and how they plan to use the loan so you know exactly how your money is being spent -- and you get updates letting you know how the entrepreneur is going.

The best part is, when the entrepreneur pays back their loan you get your money back - and Kiva's loans are managed by microfinance institutions on the ground who have a lot of experience doing this, so you can trust that your money is being handled responsibly.

Kiva's microfinancing seems like a nice way of helping the developing world, and I've heard good things about it. Here's hoping it works out well for my two recipients!

Links for 2009-03-13

Links for 2009-03-12

Links for 2009-03-11

Google Reader productivity hack: change your Home

So, if you use Google Reader, read your news with the "All items" page, and are subscribed to hundreds of feeds, it can be pretty overwhelming. I've found a better way to deal with this.

Select a 'most important' subset of feeds. For each of those, click through to the feed details page, hit the "Feed Settings..." menu, and select "Change folders...". Put the feed into a new "top" folder (creating it if necessary).

Now go to "Settings" -> "Preferences" and check out the "Start page" preference. By default, it's set to "Home"; change it to "Folders and Tags: top".

Hey presto -- now, when you load Google Reader, it'll come up with your "top" items. You can get through those quickly enough, and get on to other more important tasks. When you're bored and need something to read, though, just hit "Navigation" -> "All items" (or even just type 'ga'), and every other feed is now there for your delectation. Sweet!

Links for 2009-03-10

Links for 2009-03-05

Ready for the blackout?

Reminder -- Ireland's Blackout Week starts tomorrow:

Take part in Blackout Week

  1. To demonstrate your feelings about [IRMA's censorship demands], you can make your avatar black on any websites you have a presence on.
  2. This is inspired by Creative Freedom New Zealand's blackout campaign.
  3. From Black Thursday on the 5th of March, for one week, set your picture on sites like Facebook, Bebo, Twitter, MSN, etc black to raise awareness for Blackout Ireland.
  4. On that Thursday we encourage you to express yourself publicly about this issue, whether by blog posts, letters to newspapers or any form of communication you can think of.

Links for 2009-03-03

  • Locale : 'Locale allows you to create Situations, which specify Conditions under which your Settings should change; e.g. your "At Work" situation might notice when your location condition is "1600 Amphitheatre Parkway," and trigger your ringer to vibrate.' in essence, rule-based AI for your phone. want it! and the phone too while I'm at it!
    (tags: want android phone apps google location mapping)

Using VC to track system config changes by mail

Here's a great idea from a thread on the SpamAssassin users list, from Roger Marquis:

Karsten Bräckelmann [questioning the utility of a mechanism to dump the entire contents of the SpamAssassin configuration database]:

'postconf' without the handy -n switch dumps about 500 lines. The equivalent dump for SA including the rules is about 6000 lines. And that's a plain dump, without following and unfolding meta rules or anything.

Whether 6K or 60K would not necessarily make a difference to how I would like to use an SA 'postconf -n' equivalent. That use is change management. The intent is not in the full report itself but in its deltas.

As full time mail/systems admins we get invaluable data from tripwire/integrit, 'postconf -n', dconf, 'rpm -qa', 'dpkg -l *', 'pkg_info -a', ... whose output is checked in to RCS daily. This provides a nice configuration snapshot and historical record but its real usefulness comes from rcsdiff piped into a daily report. These are (usually) relatively concise, and IMO, absolutely essential for monitoring production Unix/Linux systems.

I like it! I think I'd check it into a git repo, though. The concept of applying VC smarts to traditional sysadmin tasks is definitely a meme on the way up -- see also etckeeper.
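Here's a rough Python sketch of the "snapshot, then report the deltas" idea. It uses difflib and plain files as a stand-in for RCS or git so that it's self-contained; in practice you'd commit the snapshot and mail the output of the VC system's own diff. The command list is taken from Roger's mail, and everything else (function names, file layout, the injectable runner) is my own assumption.

```python
import difflib
import subprocess
from pathlib import Path

# A couple of the commands mentioned in the mail; adjust for the system at hand.
COMMANDS = {
    "postconf": ["postconf", "-n"],
    "rpm": ["rpm", "-qa"],
}


def run(cmd):
    """Return the command's stdout as text (assumes the tool is installed)."""
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def snapshot(name, cmd, state_dir, runner=run):
    """Save the latest output under state_dir and return a unified diff
    against the previous snapshot (empty string if nothing changed)."""
    path = Path(state_dir) / f"{name}.txt"
    old = path.read_text().splitlines(keepends=True) if path.exists() else []
    new_text = runner(cmd)
    path.write_text(new_text)
    new = new_text.splitlines(keepends=True)
    return "".join(difflib.unified_diff(
        old, new, f"{name} (previous)", f"{name} (current)"))


def daily_report(state_dir, commands=COMMANDS, runner=run):
    """Concatenate the non-empty deltas, ready to be piped into a mail."""
    deltas = (snapshot(name, cmd, state_dir, runner)
              for name, cmd in commands.items())
    return "\n".join(delta for delta in deltas if delta)
```

Run daily_report() from cron and mail the result whenever it's non-empty; a quiet day produces no report at all, which is rather the point.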

Links for 2009-03-02

Links for 2009-02-27

Blackout Ireland – a response to IRMA’s censorship demands

As Adrian noted last week, IRMA are demanding that Eircom block the Pirate Bay -- first on a list of websites they don't like -- on pain of being sued. On top of that, they intend for the other Irish ISPs to follow suit -- here's a key line from the letter they sent to Blacknight MD Michele Neylon:

in the event of a positive response to this letter it is proposed to make practical arrangements with Blacknight of a like nature to those made with eircom.

If that comes to pass, this will be an appalling situation for Irish internet users, and we need to act to ensure it doesn't happen. Digital Rights Ireland:

The net effect of this scheme, if it is allowed to go into effect, will be to impose an internet death penalty on two groups. On users, who will be cut off on the allegation of a private body, with no court involvement, and on websites, which could be blocked to Irish users based on a court hearing where only one side is heard.

Pace Mulley:

So first they’ll start with the Pirate Bay. Then comes Mininova, IsoHunt, then comes YouTube (they have dodgy stuff, right?), how long before we have Boards.ie because someone quoted a newspaper article or a section of a book?

Digital Rights Ireland have posted an excellent document detailing the following plan of action for Irish internet users concerned about this:

  • Contact your ISP and let them know that this is a key issue for you, as their customer.

  • Join up with your fellow netizens. Subscribe to the Blackout Ireland blog. Follow the #blackoutirl hashtag on Twitter. Join the Blackout Ireland Facebook group. It looks likely that there'll be a week-long blackout campaign starting next Thursday, March 5th.

  • Contact politicians. This is likely to cause irreparable damage to the Irish internet, so our pols should be very worried. See the DRI post for details on getting in touch with Minister for Communications Eamon Ryan.

New Zealand is running their own blackout campaign right now, so that may help our planning.

International readers -- make no mistake, you're next. IRMA in this case is acting as the local delegate of IFPI, which stated in 2007 that this was one of the 3 technical options for ISPs to control piracy:

Here's some other interesting coverage:

Fantastic interview with BitBuzz CEO Alex French:

If ISPs, including Eircom, agree not to oppose blocking access to The Pirate Bay and other similar websites, is this not an agreement to web censorship? “I don’t think there is any other way to interpret it,” said French.

“They are essentially agreeing to censor certain websites at the behest of the recording industry, without these websites ever having necessarily shown to be illegal in the Republic of Ireland. I would have a huge concern over what other websites may be blocked and what other industries will pile in now that the precedent has been set.”

Some sample letters:

And further discussion -- here's a massive boards.ie discussion thread, now closed in favour of this newer thread.

Update: here's the letter I sent to the Minister, if you're curious or need inspiration.

Links for 2009-02-26

Links for 2009-02-25

Ubuntu to bundle Eucalyptus

Introducing Karmic Koala, Ubuntu 9.10:

What if you want to build an EC2-style cloud of your own? Of all the trees in the wood, a Koala's favourite leaf is Eucalyptus. The Eucalyptus project, from UCSB, enables you to create an EC2-style cloud in your own data center, on your own hardware. It's no coincidence that Eucalyptus has just been uploaded to universe and will be part of Jaunty - during the Karmic cycle we expect to make those clouds dance, with dynamically growing and shrinking resource allocations depending on your needs.

A savvy Koala knows that the best way to conserve energy is to go to sleep, and these days even servers can suspend and resume, so imagine if we could make it possible to build a cloud computing facility that drops its energy use virtually to zero by napping in the midday heat, and waking up when there's work to be done. No need to drink at the energy fountain when there's nothing going on. If we get all of this right, our Koala will help take the edge off the bear market.

AWESOME -- exactly where the Linux server needs to go. Eucalyptus is the future of server farms. Really looking forward to this...

Links for 2009-02-24

Blimey, I won

Somehow or other, I seem to have won the 2009 Irish Blog Award for Best Technology Blog/Blogger! To be honest, for the last year I haven't been spending as much time on the blog as before, due mainly to a rather compelling distraction, so I'm doubly grateful for winning.

Unfortunately, I was out of the country, at Nishad and Janet's wedding, so missed my chance to get up on stage and thank my fellow bloggers in person -- but I asked John to do so instead. Seems he in turn got stage fright and delegated to his missus, who picked up the trophy. Thanks Fiona! That's probably just as well, since I'm pretty incoherent in that kind of situation myself.

Cheers to my fellow nominees, Eoghan, Robin, Michele and Pat. One of you guys should totally have won ;)

And last of all -- cheers to BitBuzz for sponsoring the category, and Mulley for the whole bash. I definitely have to turn up next year!

Now I need to put more time in this year to really earn that award...

Links for 2009-02-16

Plenty of money for Dublin’s bikes

So it seems that JC Decaux have been complaining about the costs of running the Velib scheme in Paris:

Since the scheme's launch, nearly all the original bicycles have been replaced at a cost of 400 euros each.

Of course, this won't be a problem in Dublin. Going by Newstalk's estimates of how much the advertising space provided to JC Decaux for free, in exchange for the (as yet nonexistent) 450 bikes would have cost, each bike comes at a public cost of 111,000 Euros. That should cover a lot of "velib extreme".

(OK, that may be overestimating it. The Irish Times puts a more sober figure of EUR 1m per year on it; that works out as roughly EUR 2,200 per bike per year. Still should cover a few broken bikes.)

A quick reminder:

Paris | Dublin
20,000 bikes | 450 promised
~1,600 billboards | ~120 installed
~12.5 bikes per billboard | ~3.8 bikes per billboard
10km range (from 15e to 19e arrondissement) | 4km range (from the Mater Hospital to the Grand Canal)

And, of course, there's no sign of the bikes here yet... assuming they ever arrive. Heck of a job, Dublin City Council.

BTW, here's the rate card for advertising on the "Metropole" ad platforms, if you're curious, via the charmingly-titled Go Ask Me Bollix.

Links for 2009-02-13

Fixing the Gmail Tasks window bug

Hey Gmail users! If you're using Tasks, there's a slightly annoying bug in Gmail right now -- you may see the "Use this link to open Tasks" tip window appear every time you access the inbox page.

Several other people have reported it, and apparently the Google guys are 'working to resolve it' at the moment. In the meantime, though, here's a way to work around the issue without losing Tasks (you will, unfortunately, lose the offline-gmail functionality, though). Simply disable Offline Gmail (Settings -> Offline -> "Disable Offline Gmail for this computer"), and the bug no longer manifests itself.

You can allow Gmail to keep the stored mail on your computer if you like, which will be handy for when the bug is fixed and Offline can be re-enabled -- hopefully sooner rather than later.

Continuous deployment

This is awesome, if a little insane. Continuous Deployment at IMVU: Doing the impossible fifty times a day:

Continuous Deployment means running all your tests, all the time. That means tests must be reliable. We’ve made a science out of debugging and fixing intermittently failing tests. When I say reliable, I don’t mean “they can fail once in a thousand test runs.” I mean “they must not fail more often than once in a million test runs.” We have around 15k test cases, and they’re run around 70 times a day. That’s a million test cases a day. Even with a literally one in a million chance of an intermittent failure per test case we would still expect to see an intermittent test failure every day. It may be hard to imagine writing rock solid one-in-a-million-or-better tests that drive Internet Explorer to click ajax frontend buttons executing backend apache, php, memcache, mysql, java and solr. I am writing this blog post to tell you that not only is it possible, it’s just one part of my day job.
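The sums in that quote are worth spelling out. A quick Python sketch, using the quoted figures and assuming (my assumption, not IMVU's claim) that each test case run fails independently with the same probability:

```python
test_cases = 15_000          # "around 15k test cases"
runs_per_day = 70            # "run around 70 times a day"
p_flaky = 1e-6               # one-in-a-million intermittent failure rate

executions_per_day = test_cases * runs_per_day

# Expected number of intermittent failures per day.
expected_failures = executions_per_day * p_flaky

# Probability of seeing at least one intermittent failure on a given day,
# assuming independent failures.
p_at_least_one = 1 - (1 - p_flaky) ** executions_per_day

print(executions_per_day, expected_failures, round(p_at_least_one, 2))
```

So at one-in-a-million reliability you'd still expect about one spurious failure a day, and a roughly two-in-three chance of at least one.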

OK, so far, so sensible. But this is where it gets really hairy:

Back to the deploy process, nine minutes have elapsed and a commit has been greenlit for the website. The programmer runs the imvu_push script. The code is rsync’d out to the hundreds of machines in our cluster. Load average, cpu usage, php errors and dies and more are sampled by the push script, as a basis line. A symlink is switched on a small subset of the machines throwing the code live to its first few customers. A minute later the push script again samples data across the cluster and if there has been a statistically significant regression then the revision is automatically rolled back. If not, then it gets pushed to 100% of the cluster and monitored in the same way for another five minutes. The code is now live and fully pushed. This whole process is simple enough that it’s implemented by a handfull of shell scripts.

Mental. So what we've got here is:

  • phased rollout: automated gradual publishing of a new version to small subsets of the grid.

  • stats-driven: rollout/rollback is controlled by statistical analysis of error rates, again on an automated basis.
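To make those two points concrete, here's a minimal Python sketch of a stats-driven phased rollout. This is not IMVU's push script (theirs is shell, and the post doesn't say which statistical test they use); the two-proportion z-test, the threshold, the 5% canary size and the deploy/rollback/sample hooks are all my own assumptions.

```python
import math


def regression_detected(base_errors, base_requests, new_errors, new_requests,
                        z_threshold=3.0):
    """One-sided two-proportion z-test: True if the new error rate is
    significantly higher than the baseline rate."""
    pooled = (base_errors + new_errors) / (base_requests + new_requests)
    std_err = math.sqrt(pooled * (1 - pooled)
                        * (1 / base_requests + 1 / new_requests))
    if std_err == 0:
        return False  # identical all-or-nothing rates: nothing to compare
    z = (new_errors / new_requests - base_errors / base_requests) / std_err
    return z > z_threshold


def phased_rollout(hosts, deploy, rollback, sample, canary_fraction=0.05):
    """deploy(host) and rollback(host) act on one host; sample(hosts) returns
    (errors, requests) for a list of hosts. All three are hypothetical hooks."""
    baseline = sample(hosts)
    n_canary = max(1, int(len(hosts) * canary_fraction))
    canary, rest = hosts[:n_canary], hosts[n_canary:]

    # Phase 1: push to a small subset and compare against the baseline.
    for host in canary:
        deploy(host)
    if regression_detected(*baseline, *sample(canary)):
        for host in canary:
            rollback(host)
        return False

    # Phase 2: push everywhere, then check again.
    for host in rest:
        deploy(host)
    if regression_detected(*baseline, *sample(hosts)):
        for host in hosts:
            rollback(host)
        return False
    return True
```

A real version would wait between deploying and sampling (the post mentions one minute, then five), and would look at more than one metric.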

Worth noting some stuff from the comments. MySQL schema changes break this system:

Schema changes are done out of band. Just deploying them can be a huge pain. Doing an expensive alter on the master requires one-by-one applying it to our dozen read slaves (pulling them in and out of production traffic as you go), then applying it to the master’s standby and failing over. It’s a two day affair, not something you roll back from lightly. In the end we have relatively standard practices for schemas (a pseudo DBA who reviews all schema changes extensively) and sometimes that’s a bottleneck to agility. If I started this process today, I’d probably invest some time in testing the limits of distributed key value stores which in theory don’t have any expensive manual processes.

They use an interesting two-phased approach to publishing of the deploy file tree:

We have a fixed queue of 5 copies of the website on each frontend. We rsync with the “next” one and then when every frontend is rsync’d we go back through them all and flip a symlink over.
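The symlink flip is the neat bit, since it can be made atomic. A small Python sketch of that step, assuming a POSIX filesystem; the slot naming and function names are hypothetical, and only the "5 copies" figure comes from the quote.

```python
import os

SLOTS = 5  # the quoted comment mentions a fixed queue of 5 copies


def next_slot(current):
    """Index of the slot to rsync into next, wrapping around the queue."""
    return (current + 1) % SLOTS


def flip_symlink(link_path, new_target):
    """Point link_path at new_target without a window where it is missing:
    build the new link under a temporary name, then rename it into place."""
    tmp_link = link_path + ".new"
    if os.path.lexists(tmp_link):
        os.remove(tmp_link)
    os.symlink(new_target, tmp_link)
    os.replace(tmp_link, link_path)  # rename(2) replaces the old link atomically
```

Phase one would rsync the new tree into the next slot on every frontend; only once all of them have it would phase two call flip_symlink() on each, so no request ever sees a half-copied tree.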

All in all, this is very intriguing stuff, and way ahead of most sites. Cool!

(thanks to Chris for the link)

Links for 2009-02-11

Config management as cookery

Interesting to see Chef, a configuration management framework using cooking as a metaphor.

Back in the early '90s in Iona, I wrote a user/group synchronization tool called "greenpages" which used a cooking metaphor; "spice" (data) was added to "raw" (template) files to produce "cooked" output. Great minds, eh!

Links for 2009-02-09

IR book recommendation

Thanks to Pierce for pointing me at this review of an interesting-sounding book called Introduction to Information Retrieval. The book sounds quite useful, but I wanted to pick out a particularly noteworthy quote, on compression:

One benefit of compression is immediately clear. We need less disk space.

There are two more subtle benefits of compression. The first is increased use of caching ... With compression, we can fit a lot more information into main memory. [For example,] instead of having to expend a disk seek when processing a query ... we instead access its postings list in memory and decompress it ... Increased speed owing to caching -- rather than decreased space requirements -- is often the prime motivator for compression.

The second more subtle advantage of compression is faster transfer data from disk to memory ... We can reduce input/output (IO) time by loading a much smaller compressed posting list, even when you add on the cost of decompression. So, in most cases, the retrieval system runs faster on compressed postings lists than on uncompressed postings lists.

This is something I've been thinking about recently -- we're getting to the stage where CPU speed has so far outstripped disk I/O speed and network bandwidth that pervasive compression may be worthwhile. It's simply worth keeping data compressed for longer, since CPU is cheap. There's certainly little point in not compressing data travelling over the internet, anyway.
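As a quick illustration of the postings-list case, here's a small Python sketch. It gap-encodes a sorted list of document IDs and then deflates it with zlib. Note that zlib is just a convenient stand-in from the standard library, and the function names and fixed 4-byte integer layout are my own assumptions; this isn't the encoding scheme the book describes.

```python
import struct
import zlib
from itertools import accumulate


def compress_postings(doc_ids):
    """Gap-encode a sorted list of document IDs, then deflate the result.

    Storing the differences between consecutive IDs produces lots of small,
    repetitive numbers, which compress far better than the raw IDs.
    """
    if not doc_ids:
        return zlib.compress(b"")
    gaps = [doc_ids[0]] + [b - a for a, b in zip(doc_ids, doc_ids[1:])]
    # Pack each gap as a little-endian unsigned 32-bit integer.
    raw = struct.pack("<%dI" % len(gaps), *gaps)
    return zlib.compress(raw)


def decompress_postings(blob):
    """Invert compress_postings: inflate, unpack, and sum the gaps back up."""
    raw = zlib.decompress(blob)
    gaps = struct.unpack("<%dI" % (len(raw) // 4), raw)
    return list(accumulate(gaps))
```

The trade is the one the quote describes: a bit of CPU spent in decompress_postings, in exchange for fewer bytes read from disk and more lists fitting in memory.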

On other topics, it looks equally insightful; the quoted paragraphs on Naive Bayes and feature selection algorithms are both things I learned myself, "in the field", so to speak, working on classifiers -- I really should have read this book years ago I think ;)

The entire book is online here, in PDF and HTML. One to read in that copious free time...

Good reasons to host inelastically on EC2

Recently, there's been a bit of discussion online about whether or not it makes sense for companies to host server infrastructure at Amazon EC2, or on traditional colo infrastructure. Generally, these discussions have focussed on one main selling point of EC2: its elasticity, the ability to horizontally scale the number of server instances at a moment's notice.

If you're in a position to gain from elasticity, that's great. But it is still worth noting that even if you aren't in that position, there's another good reason to host at an EC2-like cloud: it makes it easy to deploy another copy of the app, either from a different version-control branch (dev vs staging vs production deployments), or to run separate apps with customizations for different customers. These cases aren't about scaling an existing app up; they're about creating new copies of the app, and EC2 works nicely for this.

If you can deploy a set of servers with one click from a source code branch, this is entirely viable and quite useful.

Another reason: EC2-to-S3 traffic is extremely fast and cheap compared to external-to-S3. So if you're hosting your data on S3, EC2 is a great way to crunch on it efficiently. Update: Walter observed this too on the backend for his Twitter Mosaic service.

Ice Cycling

I seem to have invented a new extreme sport on the way into work: Ice Cycling. The roads were like an ice-skating rink. Scary stuff :(

Here's some advice for anyone in the same boat:

  • use a high gear: avoid using low gear if possible, even when starting off. Low revs mean you're more likely to get traction.

  • try to avoid turns: keep the bike as upright as possible.

  • try to avoid braking: braking is very likely to start a skid in icy conditions.

  • use busy roads: in icy conditions, you should ride where the cars have been, since their traffic will have melted the ice.

  • ride away from the gutters: they're more likely to be iced over than the centre of a lane. Again, ride where the cars have been.

  • avoid road markings: it seems these were much icier than the other parts of the road; possibly because their high albedo meant the ice on them hadn't been melted by the sun yet. So look out for that.

Here's a good thread on cyclechat.co.uk, and don't miss icebike.org: 'Whether commuting to work, or just out for a romp in the woods, you arrive feeling very alive, refreshed, and surrounded with the aura of a cycling god. You will be looked upon with the smile of respect by friends and co-workers. - - - Or was that the sneer of derision...no matter, ICEBIKING is a blast!' o-kay.

Their recommendations are pretty sane, though. ;)

Links for 2009-02-05

Links for 2009-02-03

Links for 2009-01-30

UK’s proposed anti-filesharing quango

Wow. The IFPI's strategy of "divide and conquer" by taking individual ISPs to court to force them to institute a 3 strikes policy, as successfully deployed against Eircom this week, is possibly marginally better than this insane obsolete-business-model handout proposed by the UK government in their Digital Britain report:

Lord Carter of Barnes, the Communications Minister, will propose the creation of a quango, paid for by a charge that could amount to £20 a year per broadband connection.

The agency would act as a broker between music and film companies and internet service providers (ISPs). It would provide data about serial copyright-breakers to music and film companies if they obtained a court order. It would be paid for by a levy on ISPs, who inevitably would pass the cost on to consumers.

Jeremy Hunt, the Shadow Culture Secretary, said: “A new quango and additional taxes seem a bizarre way to stimulate investment in the digital economy. We have a communications regulator; why, when times are tough, should business have to fund another one?”

Well said. An incredibly bad idea.

By the way, I've noticed some misconceptions about the Eircom settlement. Telcos selling Eircom bitstream DSL (i.e. the 2MB or 3MB DSL packages) are immune right now.

They are, however, next on the music industry's hit-list, reportedly...

Links for 2009-01-29

Eircom forced to implement “3 strikes and you’re out” for filesharers

Eircom has been forced to implement "3 strikes and you're out", according to Adrian Weckler:

If the music labels come to it with IP addresses that they have identified as illegal file-sharers, Eircom will, in its own words:

"1) inform its broadband subscribers that the subscribers IP address has been detected infringing copyright and

"2) warn the subscriber that unless the infringement ceases the subscriber will be disconnected and

"3) in default of compliance by the subscriber with the warning it will disconnect the subscriber."

My thoughts -- it's technically better than installing Audible Magic appliances to filter all outbound and inbound traffic, at least.

However, there's no indication of the degree to which Eircom will verify the "proof" provided by the music labels, or that there's any penalty for the labels when they accuse your laser printer of filesharing. I foresee a lot of false positives.

Update: LINX reports that the investigative company used will be Dtecnet, a 'company that identifies copyright infringers by participating in P2P file-sharing networks'. TorrentFreak says:

DtecNet [...] stems from the anti-piracy lobby group Antipiratgruppen, which represents the music and movie industry in Denmark. There are more direct ties to the music industry though. Kristian Lakkegaard, one of DtecNet’s employees, used to work for the RIAA’s global partner, IFPI. [...]

Just like most (if not all) anti-piracy outfits, they simply work from a list of titles their client wishes to protect and then hunts through known file-sharing networks to find them, in order to track the IP addresses of alleged infringers.

Their software appears as a normal client in, for example, BitTorrent swarms, while collecting IP addresses, file names and the unique hash values associated with the files. All this information is filtered in order to present the allegations to the appropriate ISP, in order that they can send off a letter admonishing their own customer, in line with their commitments under the MoU.

[...] it will be a big surprise if [Dtecnet's evidence is] of a greater ‘quality’ than the data provided by MediaSentry.

More coverage of the issues raised by the RIAA's international lobbying for the 3-strikes penalty:

Links for 2009-01-28