Justin's Linklog – Page 95 – (Things I found interesting recently.)

More parallel string-match algorithm hacking: re2xs

Published August 17, 2006

Last week, Matt Sergeant released a great little perl script, re2xs, which takes a set of simplified regexps, converts them to the subset of regular expression language supported by re2c, then uses that to build an XS module.

In other words, it offers the chance for SpamAssassin rules to be compiled into a trie structure in C code, matching multiple patterns in parallel. Given that this is then compiled down to native machine code, it has the potential to be the fastest method possible, apart from using dedicated hardware co-processors (http://www.sensorynetworks.com/pressreleases/PR0060_2006_05_02_NCASA-formatted.pdf).

Sure enough, Matt’s results were pretty good — he says, ‘I managed to match 10k regexps against 10k strings in 0.3s with it, which I think is fairly good.’ ;)

Unfortunately, turning this into something that works with SpamAssassin hasn’t been quite so easy. SpamAssassin rules are free to use the full perl regular expression language — and this language supports many features that re2c’s subset does not. So we need to extract/translate the rule regexps to simplified subsets. This has generally been the case with all parallel matching systems, anyway, so that’s not a massive problem.

More problematically, re2c itself does not support nested patterns — if one token is contained within another, e.g. "FOO" within "FOOD", then the subsumed token will not be listed as a match. SpamAssassin rules, of course, are free to overlap or subsume each other, so an automated way to detect this is required.

For simple text patterns, this is easy enough to do using substring matching, e.g. "FOOD" =~ /\QFOO\E/ . Unfortunately, once any kind of sophisticated regexp functionality is available, plain substring matching no longer works: consider /FOO*OD/ vs /FOO/ , /F[A-Z]OD/ vs /FO[M-P]/ , /F(?:OO|U)D/ vs /F(?:O|UU)?O/ .
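To make the literal case concrete, here's a minimal Python sketch (not SpamAssassin or re2xs code) that finds which plain-string tokens are contained within others. The function name `subsumed_pairs` and the sample tokens are hypothetical, and it assumes the rules have already been reduced to literal strings; the three regexp pairs above are exactly where this kind of check falls apart.

```python
from itertools import permutations


def subsumed_pairs(tokens):
    """Return sorted (inner, outer) pairs where literal token `inner` occurs inside `outer`.

    Only valid for plain literal strings: once tokens are real regexps
    (quantifiers, character classes, alternation), a substring test like
    this says nothing reliable about overlapping matches.
    """
    return sorted(
        (inner, outer)
        for inner, outer in permutations(tokens, 2)
        if inner in outer
    )


tokens = ["FOO", "FOOD", "OOD", "BAR"]
print(subsumed_pairs(tokens))  # [('FOO', 'FOOD'), ('OOD', 'FOOD')]
```

Each reported pair is a case where re2c would only list the outer token as a match, so the inner rule would need special handling.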

The only way to do this is to either (a) fully parse the regexp, build the trie, and basically reimplement most of re2c to do this in advance; or (b) change the trie-generation code in re2c to support states returning multiple patterns, as Aho-Corasick does (http://en.wikipedia.org/wiki/Aho-Corasick_algorithm).
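For reference, here's a rough Python sketch of the core idea behind option (b): a textbook Aho-Corasick automaton over literal strings, where each state carries every pattern that ends there, so a subsumed token like "FOO" is still reported when "FOOD" matches. It assumes literal-only tokens; it is not re2c's generator, and the class name is made up for illustration.

```python
from collections import deque


class AhoCorasick:
    """Minimal Aho-Corasick automaton over literal strings.

    Each state keeps the full set of patterns ending there (including those
    inherited via failure links), so nested tokens are all reported.
    """

    def __init__(self, patterns):
        self.goto = [{}]    # state -> {char: next_state}
        self.fail = [0]     # state -> failure-link state
        self.out = [set()]  # state -> patterns that end at this state
        for pat in patterns:
            state = 0
            for ch in pat:
                if ch not in self.goto[state]:
                    self.goto.append({})
                    self.fail.append(0)
                    self.out.append(set())
                    self.goto[state][ch] = len(self.goto) - 1
                state = self.goto[state][ch]
            self.out[state].add(pat)
        self._build_failure_links()

    def _build_failure_links(self):
        # Breadth-first, so shallower states are finished before deeper ones.
        queue = deque(self.goto[0].values())  # depth-1 states fail to the root
        while queue:
            state = queue.popleft()
            for ch, nxt in self.goto[state].items():
                queue.append(nxt)
                f = self.fail[state]
                while f and ch not in self.goto[f]:
                    f = self.fail[f]
                self.fail[nxt] = self.goto[f].get(ch, 0)
                # Merge outputs: one state may now report several patterns.
                self.out[nxt] |= self.out[self.fail[nxt]]

    def search(self, text):
        """Yield (end_index, pattern) for every match, including nested ones."""
        state = 0
        for i, ch in enumerate(text):
            while state and ch not in self.goto[state]:
                state = self.fail[state]
            state = self.goto[state].get(ch, 0)
            for pat in sorted(self.out[state]):
                yield i, pat


ac = AhoCorasick(["FOO", "FOOD", "OOD"])
print(list(ac.search("FOOD")))  # [(2, 'FOO'), (3, 'FOOD'), (3, 'OOD')]
```

The merged output sets are the "states returning multiple patterns" part; a single pass over the text reports all three overlapping tokens.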

I requested support for this in re2c (http://sourceforge.net/tracker/index.php?func=detail&aid=1540845&group_id=96864&atid=616203), but got a brush-off, unfortunately. So work continues…

In other news, that food poisoning thing I had back at the end of June has lingered on. It’s now pretty clear that it isn’t food poisoning or a stomach bug… but I still have no idea what it actually is. No fun :(

“Stretch-to-fit Textareas” Greasemonkey User Script

Published August 10, 2006

Here’s another quick-hack Greasemonkey user script I wrote recently.

Stretch-to-fit Textareas is a user script which improves the usability of editable textareas; it causes them to "stretch" vertically to fit their contents, as you type. This behaviour was inspired by that of textareas in FogBugz.

It can be inhibited by turning off the small checkbox to the right of each textarea.

Update: it’s worth noting that this is different from the Resizeable Textareas Firefox extension. Whereas the latter allows the user to resize the textareas by hand, this user script does that action automatically, based on the contents of the field; no manual resize-handle-searching and dragging is required. On the other hand, this user script will only stretch textareas vertically, whereas the extension allows them to be dragged in both dimensions. In fact, the two are complementary — I’m running both, and I suggest you do too ;)

Update 2: here’s a Firefox extension version — Greasemonkey not required!

LKML discusses anti-spam moderation

Published August 9, 2006

LKML: Alexey Zaytsev: Time to forbid non-subscribers from posting to the list? — the linux-kernel mailing list discusses list moderation as an anti-spam strategy.

Spam really sucks; anything that deals with email now has to include some set of anti-spam features because of it. The LKML has important properties that militate against simply closing the list to non-subscribers, such as being the place where bug reports are submitted, so this is a thorny issue for them.

For what it’s worth, I have written a system to further automate moderation beyond the basic features provided by Mailman and ezmlm. http://taint.org/wk/ModerateList describes this in detail; in essence, it’s a specialised mail user agent designed to moderate lists quickly and efficiently, with an outboard spam filter built in (SpamAssassin, of course, via its perl API).

I moderate about a thousand messages per week using this (last time I checked), and it takes about 30 seconds per day to do so, so it’s pretty efficient.

In other news: wow, talking to a good accountant can really mitigate complicated tax issues… phew.

Wedding Poems

Published August 5, 2006

OK — looks like I’ve found the perfect poem for our wedding ceremony; allow me to present "Gravity of Love":

One day, one day I asked myself
What is the right number or symbol?
What is the perfect equation?
What truly is LOGIC?
And who decides right reasoning?

In cause of no answer to my quest,
I traveled through the physical and metaphysical,
I traveled through the delusional and mystical
And at last back to the physical.

I made most important invention of my life career
That it’s only in the mysterious equation; logic of love
Any logical; mystical and psychological reasoning can be found.
It’s you in me I only believe that’s true and real

All I can say is — Wow.

Underwhelmed by ScreenClick

Published August 3, 2006

For the past few years, I’ve been a very happy user of Netflix (http://www.netflix.com/), the innovative web site which lets US residents receive DVDs by post for a flat monthly fee. When I got back to Dublin, I was very happy to see that there was a local equivalent, in the form of ScreenClick, so I signed up.

However, I’ve become increasingly disillusioned with their service, for the same reasons Adrian Weckler writes about here (http://www.yourtechstuff.com/techwire/2006/03/screenclickcom_.html)…

Turnaround time: this varies wildly, and it can take nearly a week from dropping a DVD in the postbox to receiving the next one. Netflix was reliably two days for me, out in suburban Orange County, California; even this Kansas blogger (http://raven.phoenyx.net/netflix/2005/08/netflix-bliss.html) noted that the longest they’d waited was 4 days.

This may seem to be an externality for Screenclick — but really, it shouldn’t be. Their business is built on the postal service, and they have to have decent results for it to work.

The ‘wishlist’ model: Netflix uses a queue, operating on a first-in, first-out model, while Screenclick uses something they call a ‘wishlist’, where the DVDs are delivered based both on position in the list and availability — in other words, you can find you’ve been delivered the DVD at number 10 in your list, instead of whatever’s at the top.

Again, superficially a minor point. However, one important factor is that these services are bought by households, not by individuals. Chez jm, that means that we operated a pretty strict alternating system in our Netflix queue — one movie for me, one movie for the lovely C, repeat. This is now thoroughly scuppered with a random ‘lucky dip’ system. On top of that, forget about watching a serial in order. The end result is a mess.

The website: it’s atrocious, a hodge-podge of ads for third-party sites, press coverage of Screenclick, more ads for Screenclick (hey, I’m already a customer!), and news clippings (http://www.screenclick.com/NewsReviewsDetail.aspx?NewsID=234) I couldn’t care less about, with finally a few tiny sidebar boxes containing the things I want (login, search box and wishlist). My impression: it’s designed to sell the company to investors and advertisers, not for customer use.

On top of that, it’s all squished into a tiny window — Irish web designers need to buy bigger screens! That late-’90’s Jakob Nielsen thing about users not knowing how to scroll? They’ve learned by now.

That’s not even talking about the awful Javascript that’s used to edit the wishlist ordering, where little buttons need to be clicked repetitively, one by one, to reorder the list. Surely someone took a look around at other sites first — Amazon perhaps — to see how other sites do it?

Anyway, on this count, I sent in a mail containing a batch of bug reports and unsolicited opinions, and got no reply. ;)

Less bang-for-buck: pretty simple. Netflix: 3 movies at a time, more movies in the collection, $17.99 per month; Screenclick, 2 movies at a time, EUR 19.99 ($25.56, $10 more expensive than the equivalent Netflix service) per month. Surprisingly, this is actually a minor issue compared to the others, though, since it’s made plain from the outset.

These may seem to be minor points, but when selling a disposable-income service to consumers, the line between an essential leisure-time service and a waste of pocket money is a very fine one. Looks like Adrian eventually cancelled (http://www.yourtechstuff.com/techwire/2006/06/bye_bye_screenc.html). I’m not at that point yet, but it’s heading that way…

‘Bugzilla See Earlier Comments’ User Script

Published August 1, 2006

Here’s a new Greasemonkey user script which fixes a minor annoyance in the Bugzilla user interface. When viewing the ‘Create a New Attachment’ page, this will transclude the previous comments onto the bottom of that page, for reference while editing: bz_see_earlier_comments.user.js

Thanks to Jesse Ruderman for the nifty AJAXish iframe-transclusion trick it uses.

What Jeff Killed

Published July 31, 2006

What Jeff Killed is a blog from Shadow Hills, CA, documenting the murderous antics of Jeff, a large ginger tomcat:

we provide Jeff with food and water; however, this does little to lessen his killer instinct. To humans, Jeff is an exceptionally good-tempered and friendly cat; to rodents and other small animals, he is death itself. It could be that Jeff likes to bring us gifts to repay our hospitality. Perhaps he is simply a hardwired killing machine. All we know for certain is that he hunts down a wide variety of small animals and disembowels, decapitates, and dines on them. Often.

This was passed on by the lovely C, who noted ‘number of kills is about the same, cat for cat’ — indeed, Bubba, our cat, certainly had a similar career in Irvine, CA. However, I notice that as yet, there are no cases where Jeff has left the entrails and decapitated head of a rabbit lying up against the sandals of the neighbour’s 6 year old daughter… that was fun.

Kick.ie

Published July 26, 2006

I just noticed an interesting new site on the Irish web: kick.ie (http://kick.ie/).

It’s closely based on the model of Digg, with a community of contributors who post new stories, comment, and "kick" stories they like so that those stories are given top billing. The interesting twist is that it’s not as general as Digg: instead of one very broad "news" site covering all bases, there is a smaller set of topic-focused "kick" sites. Using this model for the relatively small Irish weblogging scene works pretty well, I think.

It’s nicely done: fast, clean, and with nifty touches like RSS feeds throughout and reader-contributed tagging. Nice work by Gavin Joyce (http://weblogs.asp.net/gavinjoyce/)!

Well worth subscribing to.

(Also, it’s cool to see that one of my posts discussing Irish road deaths (http://taint.org/2006/07/12/130111a.html) managed to amass 7 ‘kicks’ (http://www.kick.ie/regional/Road_Deaths_in_Ireland) a couple of weeks back ;)

Year 2038 Bug Strikes Early

Published July 20, 2006

Noted previously in the link-blog, here are more details on the first known instance of the Year 2038 UNIX epoch rollover bug, where AOLServer installs hung because a 1-billion-second (roughly 32-year) timeout value pushed the computed expiry time past the end of the 32-bit epoch.

It appears that it was caused by an ‘official workaround’ for an Oracle driver bug, where an infinite timeout was desired. Instead of implementing true support for infinite timeouts, the developer just used a very large value — one BILLION seconds, Dr. Evil-style. Unfortunately, this led to the overflow issue.
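The arithmetic is easy to check. Here's a small Python model, assuming a signed 32-bit time_t that wraps around in two's-complement fashion (typical in practice, though signed overflow is formally undefined in C); `wrapped_deadline` and `clamped_deadline` are hypothetical helpers, not AOLserver functions.

```python
from datetime import datetime, timezone

INT32_MAX = 2**31 - 1
TIMEOUT = 1_000_000_000  # the "effectively infinite" MaxOpen/MaxIdle value


def to_int32(value):
    """Wrap an integer into the signed 32-bit range, as a 32-bit time_t would."""
    value &= 0xFFFFFFFF
    return value - 2**32 if value > INT32_MAX else value


def wrapped_deadline(now, timeout=TIMEOUT):
    # now + timeout in 32-bit arithmetic: silently wraps to a negative time.
    return to_int32(now + timeout)


def clamped_deadline(now, timeout=TIMEOUT):
    # One possible fix: saturate at the largest representable time instead.
    return min(now + timeout, INT32_MAX)


first_bad = 2**31 - TIMEOUT  # first 'now' whose deadline no longer fits
print(first_bad)                                           # 1147483648
print(datetime.fromtimestamp(first_bad, tz=timezone.utc))  # 2006-05-13 01:27:28+00:00
print(wrapped_deadline(first_bad - 1))                     # 2147483647
print(wrapped_deadline(first_bad))                         # -2147483648
print(clamped_deadline(first_bad))                         # 2147483647
```

That UTC time is 02:27:28 BST, matching the thread below. Clamping is just one way to make a huge timeout safe; a true infinite-timeout code path would avoid the addition entirely.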

Here are some key snippets from the mailing list thread:

Bas Scheffers:

On 17 May 2006, at 21:34, Dossy Shiobara wrote:

Dave Siktberg seems to have narrowed it down to 2006-05-12 21:25.

In what timezone? It sounds like that could equate to "Sat May 13 02:27:28 BST 2006", or 1147483648 seconds since epoch, which makes it exactly 1,000,000,000 seconds until expiry of 32 bit time. Coincidence? Seems too strange as to a computer that is not a nice round number.

‘Jesus’ Jeff Rogers:

I had problems starting at the exact same time but on Solaris, where they manifested as an EINVAL return from pthread_cond_timedwait. After a day of tracing the problem with debug builds and working with my sysadmin to track what changed (of course, nothing had) I came to the same 1 billion second issue.

Which coincidentally is the expiry time (MaxOpen and MaxIdle) set on my database connections. My system is ACS-derived, so I wouldn’t be surprised if these database settings are common in other ACS-derived systems.

The only bug is that Ns_CondTimedWait doesn’t do any wraparound on the time parameter. All the same, I’ve been enjoying telling people that I hit my first y2038 bug.

Andrew Piskorski:

For those interested in ancient trivia, I think it was TWO bugs, one in the Oracle driver and/or OCI libraries (most likely OCI), and one in AOLserver. I think the workaround dates from before I ever used AOLserver, but I have these old comments in my AOLserver config file:

MaxIdle and MaxOpen:

Setting these to 1000000000 is a historical bug workaround. Could now probably set this to some normal number, or set to 0 to disable entirely. E.g., in this thread Rob Mayoff says:

http://www.arsdigita.com/bboard/q-and-a-fetch-msg?msg%5fid=000Ibq

It is a bug workaround. Many Linux users (including me) saw that when AOLserver tried to close a database connection, it would hang in the Oracle driver. So people started setting MaxOpen and MaxIdle to a very large number to keep connections from closing. You can also set them to zero, but at the time the bug was discovered, AOLserver had a bug that prevented you from setting them to zero.

I believe the bug was also seen, very rarely, on Solaris.

Curtis Galloway managed to get Oracle to investigate. They suggested two workarounds: use IPC or TCP to connect (which is what I do on my system), or set bequeath_detach=yes in sqlnet.ora.

2002/01/10 14:22 EST

Uselessly, the arsdigita thread URL is now a victim of needless website reorganisation, and redirects to their front page. Still, I think that’s enough info.

This is certainly going to be one of the first widely-recorded Y2038 rollover bugs, I think…

A Little Downtime

Published July 19, 2006

Quick note: taint.org, and the other sites on the same host, will be down for somewhere between 30 minutes and an hour tomorrow, at 1000 UTC, as the host moves to a new datacenter (and a new IP address).

Handily, the host will also get a hefty RAM upgrade, which should improve matters the next time we get slashdotted ;)

(If you need to get in touch during the downtime, jmason at gmail dot com will be the best bet.)

Update: this is now complete.

‘Small Engine Repair’

Published July 19, 2006

Last Friday, I visited the Galway Film Fleadh (http://www.galwayfilmfleadh.com/home.htm) to see the Irish premiere of a new feature-length movie called Small Engine Repair, which was directed by a mate of mine called Niall Heery.

I loved it — funny, extremely black comedy, reminded me a lot of The Deer Hunter in visual style, but unmistakably Irish at the same time. (Blog movie reviews seem to be out of favour right now, so I’ll leave it at that.)

Here’s hoping it picks up wider distribution very soon — it deserves to be big, I think. Nice one, Niall! Happily, the voters of the Fleadh agreed — it went on to win the Best First Feature award.

Actually, it’s been a good year for friends and family at the Fleadh; I note that my cousin, Eoin Ryan, picked up first prize for Best Irish Short Animation with his excellent short, Demon. Cool!

Road Deaths in Ireland

Published July 12, 2006

Road deaths are a hot topic in Ireland. They’re actually lower, per capita, than rates in other countries, but are given plenty of column inches and headlines here, and have become a government priority as a result.

Here’s the latest headline:

[Gay Byrne, head of the Road Safety Authority] claimed young people were ignoring road safety campaigns and that all he could do was to warn people to reduce speed and not to drink and drive. "I don’t know what else we can do. We have done all the horror ads, but there are obviously a great number of people who don’t look at television, listen to radio, or read newspapers and don’t get the message," he said.

Ads. Great. Well, one thing that could be done is fixing the unsafe roads, and building decent ones; Irish country roads, while picturesque, are unable to deal with the levels of traffic they’re now facing. It’s time to apply modern safety standards, instead of considering a 2-lane boreen to be adequate.

There’s been a bit of improvement here; the roads from Dublin to Sligo, and from Dublin to Dundalk, for example, are both now fantastic, well-designed roads, and safe as a result. But try to get from Sligo to anywhere that isn’t Dublin, and you’re right back on those boreens again — with maniacs overtaking on blind corners into oncoming traffic and so on.

But here’s the real reason for the post. I have to reserve some special scorn for this idiot:

Hotelier Declan Corbett, who employed both siblings, yesterday called on Mr Byrne to resign following his comments.

"I am after coming down from the Frewen family house and if Gay Byrne or Michael McDowell were after witnessing what I saw he wouldn’t be coming out this morning with this ranting and blaming the young people of Ireland," he said. […]

"Gay Byrne was given this job and he shouldn’t have been given this job. It’s typical Dublin 4 job-for-the-boys. A job like this should be given to someone in rural Ireland – somebody like Sean Og O’hAilpin that young people look up to."

Sean Og O’hAilpin, eh? As Paul Moloney noted — that’d be the same Sean Og who ~~ended his Gaelic football career when he~~ overtook a car on a bend, at speed, crashing head-on into oncoming traffic? A great example, indeed.

I think that might be the problem.

A Released Perl With Trie-based Regexps!

Published July 7, 2006

Good news! From the Perl 5.9.2 ‘perl592delta’ change log (http://search.cpan.org/~rgarcia/perl-5.9.2/pod/perl592delta.pod#Performance_Enhancements):

The regexp engine now implements the trie optimization : it’s able to factorize common prefixes and suffixes in regular expressions. A new special variable, ${^RE_TRIE_MAXBUF}, has been added to fine-tune this optimization.

In other words, the trie-optimization patch contributed by demerphq back in March 2005 is now in a released build of Perl. Yay!

Here’s a writeup of what it does:

A trie is a way of storing keys in a tree structure where the branching logic is determined by the value of the digits of the key. Ie: if we have "car", "cart", "carp", "call", "cull" and "cars" we can build a trie like this:
c + a + r + t
|   |   |
|   |   + p
|   |   |
|   |   + s
|   | 
|   + l - l
|   
+ u - l - l
What the patch does is make /a | list | of | words/ into a trie that matches those words. This means that we can efficiently tell if any of the words are at a given location in a string by simply walking the string and trie at the same time. In many cases we can rule out the entire list by looking at only one character of the input. The current way perl handles this would require looking at N chars where N is the number of words involved. (BTW: That's the beauty of a trie: its lookup time is independent of the number of words it stores, and depends instead on the key length of the word being looked up.)
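Here's a quick Python sketch of the lookup described in that writeup, using the same word list. It illustrates the data structure only, not how the patch implements it inside perl's C regexp engine, and the helper names are invented.

```python
def build_trie(words):
    """Nested-dict trie; the key None marks the end of a stored word."""
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = word
    return root


def words_at(trie, text, pos):
    """Return every stored word starting at text[pos], walking text and trie together."""
    found = []
    node = trie
    for ch in text[pos:]:
        node = node.get(ch)
        if node is None:  # no stored word continues with this character
            break
        if None in node:
            found.append(node[None])
    return found


trie = build_trie(["car", "cart", "carp", "call", "cull", "cars"])
print(words_at(trie, "the carts", 4))  # ['car', 'cart']
print(words_at(trie, "the carts", 0))  # []
```

The scan from position 0 gives up after the single character 't', and no scan runs much beyond the length of the longest stored word, however many words are in the trie.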

SpamAssassin is, of course, both (a) very regular-expression-intensive and (b) searches a single block of text for a large number of independent patterns in parallel. I’d love to see someone coming up with a patch to SpamAssassin that uses trie-compatible regexps when the perl version is >= 5.9.2, and gets increased performance that way. hint ;)

BTW, the Regexp::Trie module on CPAN is related — in that it, similar to Regexp::Optimizer, Regex::PreSuf, or Regexp::Assemble, will compile a list of words or regular expressions into a super-efficient trie-style regexp. However, without the trie patch to the regexp engine itself, this would be a minor efficiency tweak at best; although having said that, Regexp::Assemble’s POD notes:

You should realise that large numbers of alternations are processed in perl’s regular expression engine in O(n) time, not O(1). If you are still having performance problems, you should look at using a trie. Note that Perl’s own regular expression engine will implement trie optimisations in perl 5.10 (they are already available in perl 5.9.3 if you want to try them out). Regexp::Assemble will do the right thing when it knows it’s running on a trie’d perl. (At least in some version after this one).
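Here's a rough Python sketch of the kind of prefix factoring those modules perform: build a trie from a word list, then emit a single alternation that shares common prefixes. It is not the code of Regexp::Trie or Regexp::Assemble (which handle full regexps, suffixes and much more); the function names are hypothetical, and it only handles literal words.

```python
import re


def build_trie(words):
    root = {}
    for word in words:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[None] = {}  # end-of-word marker
    return root


def trie_to_regex(node):
    """Emit a regex matching exactly the words stored below `node`, sharing prefixes."""
    chars = sorted(k for k in node if k is not None)
    if not chars:
        return ""
    branches = [re.escape(ch) + trie_to_regex(node[ch]) for ch in chars]
    word_ends_here = None in node
    if len(branches) == 1 and not word_ends_here:
        return branches[0]
    group = "(?:" + "|".join(branches) + ")"
    # If a word also ends here, everything after this point is optional.
    return group + "?" if word_ends_here else group


words = ["car", "cart", "carp", "call", "cull", "cars"]
pattern = trie_to_regex(build_trie(words))
print(pattern)  # c(?:a(?:ll|r(?:p|s|t)?)|ull)
```

The same syntax works in perl. Per the perl592delta note above, the 5.9.2 engine does this kind of factoring itself, which is why pre-assembling patterns matters less on a trie'd perl.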

(PS: interestingly, demerphq mentioned back in March 2005 that he was working on Aho-Corasick matching next. A-C is a great parallel-matching algorithm, and I would imagine it would increase performance yet more. I wonder what happened to that…)

Linksys NSLU2 Contemplation

Published July 7, 2006

These days, I shouldn’t have time for after-hours hobby projects; I should be organising weddings and so on. But it’s a compulsion. ;)

As a result, here are some notes I’ve been keeping on building a home NAS (network-attached storage) server, using the nifty little Linksys NSLU2: http://taint.org/wk/BuildingNasServer

Anyone done this? Care to leave a comment noting the results? I’m curious.

Smithfield’s Decay

Published July 5, 2006

I live in Dublin 7, on the north side of Dublin. Historically, the north side has been run-down and under-developed, always losing out to the more well-maintained, and well-funded, south side.

A few years ago, though, it looked like this was changing; the Spire in O’Connell St. was erected, new bars and shops opened, and the Luas line was installed. One site, Smithfield Square in Dublin 7, was radically overhauled; its derelict buildings were renovated or knocked down, new construction was going up, and fantastic architecture was being put in place. The future was looking bright.

That was back around 2000/2001; in fact, I remember walking past the avenue of braziers on Millennium night. Fast forward: I’ve been back in Dublin 6 months now, and as far as I can tell, all that has petered out while I was away. This Frank McDonald article in the Irish Times sums it up perfectly:

The cafes, bars and restaurants that were meant to be part of [Smithfield] are nowhere to be seen. The promoters had promised residents "an entire lifestyle on your doorstep, extended by the possibilities of the city and beyond". There was to be an eclectic mix of restaurants and stylish bars – "a unique mix of offerings, ranging from food to culture to entertainment and leisure in a family-friendly development", according to Paddy Kelly.

In November 2003, his son Chris said: "We are hoping it will emulate the New York example where everything – from your launderette, hairdresser and your masseuse – is only a block away, and that people will live, work and socialise within the same area". On another occasion, London’s Covent Garden was cited as the urban model.

Incredibly, the lower end of Smithfield – through which Luas runs – remains unfinished six years after the rest of it was re-paved in an award-winning scheme by McGarry Ni Eanaigh Architects. It also has a redundant stone-clad structure, which served briefly as a plug-in point for open-air concerts.

The only real entertainment available in the area is the annual Christmas ice rink or the seriously indigenous and pre-existing horse fair, still being held on the first Sunday of every month.

Otherwise, the plaza attracts an assortment of winos, or juvenile offenders on their way to the Children’s Court, handcuffed to prison warders.

The little stage set up for open-air concerts is now covered in graffiti, and hosts a solid crew of junkies and winos; the braziers are no longer lit; the square boasts a permanent encrustation of construction fencing. The fruit and veg market that used to be held in one of the buildings has been bought out and moved on to somewhere on the outskirts of town, replaced by "Fresh", which — while it sells the odd bit of interesting food, like the nice Bretzel bakery bread — is really just an upscale Spar. Even the local Indian takeaway has dropped in quality, and is now shipping out generic dishes that aren’t even made with Indian spices.

To be quite honest, Smithfield (and much of the north side) gives the impression it’s been abandoned again, after only one or two years of short-term investment and no long-term thinking.

What happened?

(PS: it’s not over for Dublin 7, though — about a half-mile from Smithfield, a flashy new restaurant is set to open this weekend. But who’s to say that Capel St. won’t find itself similarly forgotten in a year or two?)

Blogorrah

Published July 5, 2006

Blurred Keys: Blogorrah.com – the start of empire building with ‘very few overheads’. Blurred Keys, "an Irish media blog", brings the revelation that Blogorrah "copies" Gawker.com.

Honestly, though, this is blatantly obvious — and I’d consider it unfair to call this "copying". It’s simply taking a successful format and adapting it to the local market, and doing so very well indeed if you ask me.

Blogorrah is a hilarious read. If you’re Irish and you’re not subscribed, you’re really missing out… it’s the funniest thing on the Irish web these days.

Daily Links Posting Off Again

Published July 5, 2006

I’ve turned this off again; even though it provides a nice way for people to comment and discuss link posts (which del.icio.us doesn’t provide, unfortunately), it does tend to break up the flow of the "main" article part of the weblog, and isn’t entirely popular I think.

If you’re interested in the links, your best bet is to read either the main page itself in your browser, where the link-blog appears over there —> , or one of these RSS feeds:

links for 2006-07-04

Published July 4, 2006

Richi’Blog: Hotmail Has Many, Many Spamtraps

Use old user accounts; reject with “550 user unknown” for 6 months; recycle into a spamtrap. This is the technique Matt Sergeant and I have used for several years; I don’t think I’ve ever noted it on a web-accessible URL though, so here it is.
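As a rough illustration of that lifecycle, here's a Python sketch of the per-address decision. `address_disposition` and the 182-day quarantine constant are assumptions for illustration, not code from any real MTA configuration. (Presumably the rejection period gives legitimate senders and list software time to see bounces and prune the address before it starts counting as a trap.)

```python
from datetime import date, timedelta
from typing import Optional

# Assumed length of the "reject as unknown" period (roughly 6 months).
QUARANTINE = timedelta(days=182)


def address_disposition(closed_on: Optional[date], today: date) -> str:
    """Decide how to treat mail for an address under the recycle-into-spamtrap scheme.

    - account still open             -> deliver normally
    - closed less than ~6 months ago -> reject with "550 user unknown"
    - closed longer ago than that    -> accept and feed to the spamtrap
    """
    if closed_on is None:
        return "deliver"
    if today - closed_on < QUARANTINE:
        return "reject: 550 user unknown"
    return "spamtrap"


today = date(2006, 7, 4)
print(address_disposition(None, today))              # deliver
print(address_disposition(date(2006, 5, 1), today))  # reject: 550 user unknown
print(address_disposition(date(2005, 6, 1), today))  # spamtrap
```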

(tags: anti-spam spam hotmail spamtraps honeypots)
Janek Simon and his Carpet Invaders

‘Janek Simon unites the old geometric designs of Caucasian and Armenian carpets with the low-resolution abstractness of the Space Invaders’ (via deepdisco)

(tags: carpets space-invaders games art via:deepdisco janek-simon)

links for 2006-07-03

Published July 3, 2006

85363f-deathwind.gif (GIF Image, 250×100 pixels)

This GIF is both (a) an imitation-Apple ][-screenshot and (b) valid, compilable C code for Hunt The Wumpus. amazing! it reads: “COMPILE THIS FLAG: gcc -no-integrated-cpp -DGIF89a=”char *s=\”” -x c -W flag.gif”

(tags: fyad-flag gcc gif hacks somethingawful apple-ii hunt-the-wumpus)
InternetNews coverage of Google’s architecture, from Urs Hoelzle at EclipseCon 2005

covering MapReduce, GFS, and — a new one to me — Global Work Queue: ‘like old-time batch processing .. schedules queries into batch jobs and places them on pools of machines. The setup is optimized for running random computations over tons of data.’

(tags: queueing batch-jobs google distribution massively-parallel ipc-dirqueue global-work-queue)
A Search Engine That’s Becoming an Inventor – New York Times

more info on the Google backend systems

(tags: google backend distribution massively-parallel queueing)
Wooster Collective: Another Crate Piece from Melbourne

milk crates hold special status down under — this is excellent

(tags: crates melbourne australia street-art art tetris)

links for 2006-07-02

Published July 2, 2006

AdamMaguire.com: The Government prepares itself for the stem cell debate

hmm; either the Irish government is hedging its bets regarding stem cells — or the left hand doesn’t know what the right is doing. Better, but still unclear….

(tags: stem-cells ireland science research forfas)
TechWire: Dublin City Council “not interested” in city-wide mesh/wi-fi broadband

‘the Council (a) doesn’t get it (b) isn’t interested and (c) doesn’t think anyone else would care enough for it to be worth its while’. pathetic — that could do incredible things for Dublin

(tags: dublin wifi broadband ireland bureaucrats)

Ecch – that must have been poisonous!

Published June 30, 2006

Since consuming a misjudged sossie at a BBQ last Saturday, I’ve been suffering from a stomach bug, causing nausea, sweating and the occasional vomit (never fun). On top of this, I spent Monday to Wednesday in Serbia on a work trip.

The result — I’ve managed to miss the entirety of ApacheCon EU 2006 in Dublin. I considered dropping down to catch the end of it this morning, but had to abort the attempt due to a bout of in-transit nausea.

All in all, a pretty miserable week. :(

Update: here’s something vaguely uplifting — a cover of Europe’s ‘Final Countdown’ in Khmer.

Update 2: wow, that little stomach bug has been wreaking havoc: over the weekend, 3 more people in our social group were laid low. Sorry all…

links for 2006-06-29

Published June 29, 2006

The Daily WTF – One Version to Rule Them All

‘I’ve noticed that in several places (most prominently, Help-About), there is the product version, build number, etc. … We don’t want the customers knowing this information and need it removed.’ genius (via Donal)

(tags: daily-wtf via:donal funny versioning marketing cluetrain customers software)
BreakingNews.ie: Dermot Ahern makes stem cell research pledge to Pope

‘The Government will ban any EU funding for stem cell research in Ireland, the Irish Foreign Affairs Minister told the Pope today.’ Amazing. This is not the Ireland I was hoping to return to! Are we back in 1980 again? wtf…

(tags: dermot-ahern fundamentalism ireland pope research science progress eu)
A Shout Out to My Pepys – THE FUTURE LIES AHEAD

‘Ladies and gentlemen, I’m in a select club of the first victims of the Year 2038 Bug.’

(tags: year-2038 bugs ouch software aolserver unix epoch time)
Emergent Chaos: Email Thread Visualization

fancy infoviz of discussion threads; as I comment on the page, I think this is overkill, and GMail’s “conversation” view does just fine

(tags: threading email usenet infoviz discussion)
Google Account Authentication

“Google TypeKey” in other words

(tags: distributed-authentication authentication google web)

links for 2006-06-27

Published June 27, 2006

Light Blue Touchpaper: Ignoring the “Great Firewall of China”

just firewall out RSTs, and the Great Firewall’s keyword blocker is defeated

(tags: china censorship half-baked richard-clayton tcp-ip firewalls great-firewall-of-china)
Essentials, 2006 edition

Mark Pilgrim’s suggested apps for an Ubuntu desktop — some quite good suggestions here, with lots of KDE goodness. I just wish amaroK was as user-friendly and usable as the amazing (but not well-maintained) JuK, though

(tags: linux kde unix desktops software applications)

links for 2006-06-26

Published June 26, 2006

Emergent Chaos: I’m Joining Microsoft

wow, Adam Shostack joins MS!

(tags: adam-shostack microsoft jobs work security software)

links for 2006-06-24

Published June 24, 2006

Defense Tech: Damn It! ’24’ Stars Meet Homeland Security Bigs

The set designer for ’24’ helped design the operations center at the National Counterterrorism Center, apparently. I bet that really helps (via substitute)

(tags: via:substitute funny absurd dhs government tax-dollars 24 tv-vs-reality)

links for 2006-06-22

Published June 22, 2006

separated by a common language

‘Observations on British and American English by an American linguist in the UK’, via Ben. I fall between these two stools on a regular basis, or three if you count Hiberno-English as well

(tags: english language speech transatlantic)

links for 2006-06-21

Published June 21, 2006

Amazon.com: Lance James’ Blog

author of “Phishing Exposed”, general smart guy where phishing attacks are concerned. (also: Amazon does blogs now?)

(tags: phishing lance-james anti-spam weblogs amazon)
ESV Bible Blog: Mechanical Turk Recap

Bible annotation using Amazon’s “Mechanical Turk” HIT service; a success. However, they did invite their blog readers to participate, which would have skewed the results by providing willing participants

(tags: amazon mechanical-turk hit web bible esv)
Daring Fireball: Interoperability and DRM Are Mutually Exclusive

a great post from John Gruber, pointing out the key problem with DRM — it forces vendor lock-in, and precludes interoperability, as a core design goal

(tags: interoperability drm vendor-lock-in apple itunes aac mp3 music bpi)
Conference on Email and Anti-Spam

this year’s CEAS, July 27-28 2006. CEAS is reliably the best anti-spam conference; worth attending, although I won’t be this year

(tags: ceas anti-spam)
Mark Jason Dominus: Higher-Order Perl

‘[the book] is about functional programming techniques in Perl. It’s about how to write functions that can modify and manufacture other functions.’ wow, missed this — sounds AWESOME

(tags: functional-programming perl eval mjd books toread wishlist)
Softguide Dublin City Centre Maps

it’s pretty hard to find decent maps of Dublin online. These are very good; although not quite Google-maps-shiny, they surpass GMaps’ quality in terms of data (via Sander Temme)

(tags: maps dublin softguide)

links for 2006-06-20

Published June 20, 2006

The Rise and Fall of CORBA

Michi Henning (!) slates the history of CORBA extensively, blaming the OMG’s process and praising the open source community. wow (via slashdot)

(tags: via:slashdot corba michi-henning distobj rpc networking oo omg)
BLDGBLOG: Your Concrete Utopia

for sale: one partially plutonium-contaminated Pacific atoll, 718 miles from its nearest neighbour; unfortunately the golf course is closed. see also Ballard’s ‘Terminal Beach’

(tags: land via:bldgblog islands utopias pacific cold-war nuclear-tests johnston-atoll terminal-beach ballard)

Winding stair, Aiguafreda

Published June 20, 2006

Taken last week in Aiguafreda on the Costa Brava, Catalunya, Spain.

links for 2006-06-19

Published June 19, 2006

Juggling oranges [dive into mark]

Mail.app proprietary crapness. ‘I’m forced to migrate all my mail yet again from yet another proprietary format, and the best documentation I’ve found so far is on LiveJournal. .. somebody deserves to be fired for that.’

(tags: mail mbox fidelity mail.app apple mark-pilgrim open-data openness data future-proofing proprietary)
Micronomicon Abroad: MONEY!

Maya meticulously recorded almost every penny/baht/kip/ringgit spent over the course of her 6-month travels through SE Asia. about right, going by my own experience; I wish I’d bought more souvenirs

(tags: travel backpacking asia holidays vacation)

Vodafone Ireland’s flat rate mobile data card

Published June 19, 2006

Adrian Weckler posts details of Vodafone Ireland’s new flat-rate datacard. It costs 50 Euros per month, including VAT, and is fully flat rate (hooray, something useful at last!); they also claim that they’ll be rolling out HSDPA, which offers 1.2Mbps to 11Mbps rates, ‘starting in Dublin in October’.

Those are great numbers, but further info seems thin on the ground; they haven’t bothered updating their own website yet, amazingly.

Anyone got further info? What rates does it offer right now? How would one order such a beast?

links for 2006-06-08

Published June 8, 2006

DoubleTwist Ventures

‘focuses on the development of interoperability solutions for digital media, and the reverse engineering of proprietary systems for which licensing options are non-existent or impractical’ — and have hired Jon Lech Johansen

(tags: reverse-engineering drm copyright jon-lech-johansen)

Holidaze

Published June 8, 2006

Quick note — I’m off on vacation next week — so I probably won’t read any email while I’m there ;) Talk to you after the 17th.

links for 2006-06-07

Published June 7, 2006

Haystack

C|Net’s distributed filesystem, à la GFS and MogileFS (via acme)

(tags: via:acme distributed filesystems gfs mogilefs cnet haystack)
Understanding the Network-Level Behavior of Spammers [PDF, slides]

good data on large-scale spammer behaviour as of 2006, presented at NANOG37. Relay-IP-based techniques not so good any more, but we knew that. Unfortunately doesn’t analyze SURBL/URIBL content-oriented DNSBLs, which have picked up the slack nicely

(tags: dnsbls blocklists nanog anti-spam nick-feamster anirudh-ramachandran botnets bgp routes)

Running Dapper

Published June 7, 2006

I took the plunge over the weekend and live-upgraded to the new ‘Dapper Drake’ Ubuntu release. Ouch. Here are the two key lessons I learned:

Don’t run "grub-install" in a misremembered attempt to update the current GRUB boot menu ‘menu.lst’ file with the new kernel; sadly, this will quietly remove important details from your old menu.lst, such as "initrd" lines, rendering those kernels unbootable. Moral: ensure brain is in gear before meddling with MBRs!
If you’re a Kubuntu user, watch out. Ensure you run apt-get install ubuntu-base ubuntu-desktop — bringing the entirety of GNOME up to date — as well as apt-get install kubuntu-desktop after the upgrade; it appears that some part of a new hotplugging subsystem is not included as a dependency of kubuntu-desktop. Failure to do this results in an inability to use USB/hotpluggable devices, including internal devices like the Synaptics touchpad. No pointer devices (mice or touchpads) means no X server at boot, which is always a little annoying.

Some day I’ll just do things the right way, and do a fresh-from-CD install instead. Ah well. The good stuff: the new kernel, or possibly Xorg, is proving to be a lot speedier — window updates are noticeably smoother; and the new Ubuntu GNOME theme is similarly tasty.

SpamAssassin advisory CVE-2006-2447

Published June 7, 2006

CVE-2006-2447, in which Radoslaw Zielinski spotted a nasty in spamd’s ‘vpopmail’ support in pretty much all recent versions of Apache SpamAssassin.

If you use spamd with vpopmail, go read the advisory and determine if you need to take action. Not many people will need to, I think; it’s a very rare setup. Still, it’s important to get the warning out there anyway.

The irony is that the bug is triggered partly by the "--paranoid" switch. This was intended to increase security by increasing paranoia when possibly-unsafe situations arose; hence it provides a great demonstration of how adding optional code paths, even with the best intentions, can reduce security by allowing bugs to creep in unnoticed.

links for 2006-06-06

Published June 6, 2006

Gallows humor from inside Enron

fake 419s, ‘How to Explain Enron to Your Children’, and ‘we falsify commodity markets so that we can deliver physical commodities to our customers at a ridiculously unsustainable price’ — all scraped from the Enron mail corpus

(tags: enron funny mail corporate corruption gallows-humour)
Optimizing Javascript for Execution Speed

great notes on speeding up javascript; I have a Greasemonkey script this will be useful with, once I get some tuits (via yoz)

(tags: via:yoz javascript optimization speed coding toread greasemonkey userscripts)

links for 2006-06-05

Published June 5, 2006

ITworld.com – Even the Builders of Windows Find Tech Support a Challenge

Microsoft CEO Steve Ballmer attends wedding; a parent asks if he’d have a look at their PC; Ballmer spends _no less than two days_ attempting to rid it of encrusted malware infestations — before giving up and shipping it back to Redmond. hilarious

(tags: malware steve-ballmer ceos spyware viruses ms-windows microsoft funny)

Web x, where x != 2.0

Published June 2, 2006

Regarding the O’Reilly/CMP "Web 2.0 (SM)" trademark shitstorm, Sean McGrath humorously suggested a workaround: using a different revision number instead of "2.0", specifically e, 2.71….

However, it’s not quite that simple in many jurisdictions, apparently. It seems that trademark law — in the US, at least — allows trademarks which include a number to also cover uses within roughly plus or minus 10 of that number. In other words, CMP’s application will cover the range from Web -8.0 (SM) (assuming negative numbers are included?) to Web 12.0 (SM).

So much for "Web 3.0", "Web 2.1", "Web 2.71…", and so on. Back to the drawing board, Sean! ;)

(disclaimer: IANAL, of course. Credit to Craig for that tidbit.)

Update: doh, got the value of e wrong…

links for 2006-06-01

Published June 1, 2006

WP-Cache

I got slashdotted yesterday! Unfortunately, stock WordPress falls over pretty quickly. Once I managed to get this plugin installed, though, things were a lot better… thumbs up for WP-Cache

(tags: slashdot slashdotting load wordpress plugins caching weblogs)
how to reduce the size of an XP vmware image

I need to do this soon; damn copy-on-write disk images are chewing up my disk space

(tags: vmware disk-images emulation windows-xp xplite disks vmplayer)
Schneier on Security: Common Passwords

one large website’s password list analysed; 1.4% of passwords were “123456”, and 2.5% overall began with 1234

(tags: passwords security i-love-to-count)
Live-upgrading to Ubuntu 6.06 “Dapper Drake”

Dapper is now released — and is live-upgradable via apt-get. am I stupid enough to do this? quite possibly; I’ve done it for the past 5 upgrades

(tags: ubuntu dapper linux upgrades debian apt)
Pingerati

a message router for pings, for web pages containing microformat data. Interesting to see that Upcoming.org is currently the only ping producer — their pings are then consumed by evdb, the only third-party ping receiver listed

(tags: evdb upcoming open-apis apis pingerati technorati message-routers)
slashdotting.png (PNG Image, 1024×768 pixels)

graph of request frequency over the past few days at taint.org; that spike was pretty major

(tags: graphs weblog meta)
