Justin's Linklog – Page 96 – (Things I found interesting recently.)

links for 2006-05-31

Published May 31, 2006

ongoing: On Grids

great article on current grid computing, featuring MPI, MapReduce, Hadoop, and promising a new UNIXy thing from tbray called Sigrid (ha!). Mind-boggling quote from Jim Gray: ‘Memory is the new disk. Disk is the new tape.’

(tags: grid-computing parallel tim-bray mapreduce hadoop mpi sigrid jim-gray server-farms)
“patent goo” — self-replicating Paxil

spontaneously converts the off-patent anhydrous form of the drug into the patented hemihydrate form, which then successively converts more and more of the anhydrous form, Ice-9-style. Never mind “viral” licenses, this takes the biscuit! (via substitute)

(tags: via:substitute viral-licenses gray-goo paxil drugs chemistry bizarre polymorph ice-9 patents)

Blog Spam, and a ‘nofollow’ Post-Mortem

Published May 31, 2006

An interesting article on blog-spam countermeasures — Google’s embarrassing mistake. Quote:

I think it’s time we all agreed that the ‘nofollow’ tag has been a complete failure.

For those of you new to the concept, nofollow is a tag that blogs can add to hyperlinks in blog comments. The tag tells Google not to use that link in calculating the PageRank for the linked site. […]

Since its enthusiastic adoption a year and a half ago, by Google, Six Apart, Wordpress, and of course the eminent Dave Winer, I think we can all agree that nofollow has done — nothing. Comment spam? Thicker than ever. It’s had absolutely no effect on the volume of spam. That’s probably because comment spammers don’t give a crap, because the marginal cost of spamming is so low. Also, nofollow-tagged links are still links, which means that humans can still click on them — and if humans can click, there’s a chance somebody might visit the linked sites after all.

I agree. At the time, I pointed at this comment from Mark Pilgrim:

Spammers have it in their heads now that weblog comments are a vector to exploit. They don’t look at individual results and tweak their software to stop bothering individuals. They write generic software that works with millions of sites and goes after them en masse. So you would end up with just as much spam, it would just be displayed with unlinked URLs.

Spammers don’t read blogs; they just write to them.

I still think he was spot on.

However, one part of the ‘Google’s embarrassing mistake’ article is a red herring — I think the chilling effect on "nonspam links" is not to be worried about; as Jeremy Zawodny said, life’s too short to worry about dropping links purely in the hopes of giving yourself Page Rank. I don’t know if I really want links that people are leaving purely for that reason. ;)

In fact, I wouldn’t be surprised to hear that Google’s crawler starts treating "nofollow" links as mildly non-spammy in a future revision, due to their wide use in wikis, blogs etc.

To be honest, though — I don’t see the problem of blog-spam much anymore. As I said here:

[Weblog] comment spam should be a lot easier to deal with than SMTP spam. … With weblog comments, you control the protocol entirely, whereas with SMTP you’re stuck with an existing protocol and very little "wiggle room".

On my WordPress weblog [ie. here] — which, admittedly, gets only about 1/4 of the traffic plasticbag.org does — I’ve instituted a very simple check stolen from Jeremy Zawodny. I simply include a form field which asks the comment poster for my first name, and if they fail to supply that, the comment is dropped. In addition, I’ve removed the form fields to post directly, requiring that all comments are previewed; this has the nice bonus of increasing comment quality, too.

Those are the only antispam measures I’m using there, and as a result of those two I get about 1 successful spam posted per week, which is a one-click moderation task in my email. That’s it.

The key is to not use the same measures as everyone else — if every weblog has a different set of protocols, with different form fields asking different simple questions, the only spammers that can beat that are the ones that write custom code for your site — or use human operators sitting down to an IE window.
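For the curious, here's a minimal sketch of that kind of per-site check, in Python rather than the WordPress PHP this site actually runs. The field names `owner_first_name` and `previewed`, the expected answer, and the function name are all made up for illustration; the only things taken from the text above are the two ideas themselves (ask a simple site-specific question, and insist on a preview step).

```python
# Hypothetical sketch of a site-specific comment check. None of these names
# come from WordPress or from the real taint.org setup.

EXPECTED_ANSWER = "justin"  # assumed answer to "what is my first name?"


def should_accept_comment(form: dict) -> bool:
    """Return True only if the submitted form passes both simple checks."""
    # Check 1: the poster must answer the site-specific question.
    answer = form.get("owner_first_name", "").strip().lower()
    if answer != EXPECTED_ANSWER:
        return False
    # Check 2: the comment must arrive via the preview page, never directly.
    if form.get("previewed") != "1":
        return False
    return True
```

Since the question and field names differ on every site, a generic spam-bot posting the usual fields simply fails the first check and the comment is dropped.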

Trackbacks, however — turn that off. The protocol was designed poorly, with insufficient thought given to its abuse potential; there’s no point keeping it around, now that it’s a spam vector.

Finally, a "perfect" solution to blog spam, while allowing comments, is unachievable. There will always be one guy who’s going to sit down at a real web browser to hand-type a comment extolling the virtues of some product or another. The goal is to get it to a level where you get one of those per week, and it’s a one-click operation to discard them.

(Update: This story got Slashdotted! The poor server’s been up and down repeatedly; looks like it needs an upgrade. In the meantime, WP-Cache has proven worth its weight in gold; recommended…)

links for 2006-05-30

Published May 30, 2006

One year without nicotine!

yay me!

(tags: non-smoking nicotine addiction cigarettes life progress)

Retroactive Tagging With TagThe.Net

Published May 30, 2006

Hacky hack hack.

Ever since I enabled tags on taint.org, I’ve been mildly annoyed by the fact that there were thousands of older entries deprived of their folksonomic chunky goodness. A way to ‘retroactively tag’ those entries somehow would be cool.

Last week, Leonard posted a link on his linkblog to TagThe.net, a web service which offers a nifty REST API; simply upload a chunk of text, and it’ll suggest a few tags for that text, like this:

echo 'Hi there, I am a tag-suggesting robot' | curl "http://tagthe.net/api/?text=`urlencode`"
<?xml version="1.0" encoding="UTF-8"?>
<memes>
  <meme source="urn:memanage:BAD542FA4948D12800AA92A7FAD420A1" updated="Tue May 30 20:20:39 CEST 2006">
    <dim type="topic">
      <item>robot</item>
    </dim>
    <dim type="language">
      <item>english</item>
    </dim>
  </meme>
</memes>

This looked promising.

Anyway, I’ve now implemented this — it worked great! If you’re curious, here’s details of how I did it. It’s a bit hacky, since I’m only going to be doing this once — and very UNIXy and perlish, because that’s how I do these things — but maybe somebody will find it useful.

How I Retroactively Tagged taint.org

This weblog runs WordPress — so all the entries are stored in a MySQL database. I took the MySQL dump of the tables, and a quick script figured out that out of somewhere over 1600-ish posts, there were 1352 that came from the pre-tag era, requiring tag inference. A mail to the TagThe.Net team established that they were happy with this level of usage.

I grepped the post IDs and text out of the SQL dump, threw those into a text file using the simple format ‘id=NNN text=SQLHTMLSTRING’ (where SQLHTMLSTRING was the nicely-escaped HTML text taken directly from the SQL dump), and ran them through this script.

That rendered the first 2k of each of those entries as a URL-encoded string, invoked the REST API with that, got the XML output, and extracted the tags into another UNIXy text-format output file. (It also added one tag for the ‘proto-tag’ system I used in the early days, where the first word of the entry was a single tag-style category name.)
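The real script was Perl, but the truncate / encode / call / extract steps can be sketched in Python roughly as follows. This is a sketch under assumptions: "2k" is taken to mean 2048 characters, only the "topic" dimension is treated as tags, the function names are hypothetical, and the network call itself is left out (the parser is exercised against the sample response shown earlier instead).

```python
import urllib.parse
import xml.etree.ElementTree as ET

API_BASE = "http://tagthe.net/api/"  # endpoint as used in the curl example
MAX_CHARS = 2048                     # assumption: "first 2k" = 2048 characters


def build_request_url(entry_text: str) -> str:
    """Truncate the entry and URL-encode it into the ?text= parameter."""
    snippet = entry_text[:MAX_CHARS]
    return API_BASE + "?" + urllib.parse.urlencode({"text": snippet})


def extract_topic_tags(xml_response: str) -> list:
    """Collect the <item> values inside <dim type="topic">."""
    root = ET.fromstring(xml_response.encode("utf-8"))
    return [item.text for item in root.findall(".//dim[@type='topic']/item")]


# The sample response from the curl example above.
SAMPLE_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<memes>
  <meme source="urn:memanage:BAD542FA4948D12800AA92A7FAD420A1" updated="Tue May 30 20:20:39 CEST 2006">
    <dim type="topic">
      <item>robot</item>
    </dim>
    <dim type="language">
      <item>english</item>
    </dim>
  </meme>
</memes>"""

print(build_request_url("Hi there"))
print(extract_topic_tags(SAMPLE_RESPONSE))
```

Fetching the URL (with `urllib.request.urlopen`, say) and pausing politely between requests is left out on purpose.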

Next, I ran this script, which in turn took that intermediate output and converted it to valid PHP code, like so:

cat suggestedtags | ./taglist-to-php.pl  > addtags.php
scp addtags.php my.server:taint.org/wp-admin/

The generated page ‘addtags.php’ looks like this:

<?php
  require_once('admin.php');
  global $utw;
  $utw->SaveTags(997, array("music","all","audio","drm-free",
      "faq","lunchbox","destination","download","premiere","quote"));
  [...]
  $utw->SaveTags(998, array("software","foo","swf","tin","vnc"));
  $utw->SaveTags(999, array("oses","eek","longhorn","ram",
    "winsupersite","windows","amount","base","dog","preview","system"));
?>
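For illustration, here's roughly what that conversion step does, as a Python sketch rather than the original Perl. The intermediate line format used here ("id=NNN tags=a,b,c") is an assumption, since the real file format isn't described, and the function names are hypothetical; the generated PHP follows the sample page above.

```python
def php_quote(s: str) -> str:
    """Quote a string as a PHP double-quoted literal."""
    # Escape the backslash first so later escapes are not doubled up;
    # "$" is escaped to stop PHP variable interpolation.
    for ch in ("\\", '"', "$"):
        s = s.replace(ch, "\\" + ch)
    return '"' + s + '"'


def taglist_to_php(lines) -> str:
    """Turn 'id=NNN tags=a,b,c' lines into a PHP page of SaveTags() calls."""
    out = ["<?php", "  require_once('admin.php');", "  global $utw;"]
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        id_field, tags_field = line.split(" ", 1)
        post_id = int(id_field.split("=", 1)[1])
        tags = [t for t in tags_field.split("=", 1)[1].split(",") if t]
        quoted = ",".join(php_quote(t) for t in tags)
        out.append(f"  $utw->SaveTags({post_id}, array({quoted}));")
    out.append("?>")
    return "\n".join(out) + "\n"


print(taglist_to_php(["id=998 tags=software,foo,swf,tin,vnc"]))
```

Forcing the post ID through `int()` and escaping every tag means nothing from the tag service ends up in the generated page unquoted, which seems prudent for code that runs inside wp-admin.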

Once that page was in place, I just visited it in my (already logged in) web browser window, at http://taint.org/wp-admin/addtags.php, and watched as it gronked for a while. Eventually it stopped, and all those entries had been tagged. (If I wasn’t so hackish, I might have put in a little UI text here — but I didn’t.)

The results are very good, I think.

A success: http://taint.org/tag/research has picked up a lot of the interesting older entries where I discussed things like IBM’s Teiresias pattern-recognition algorithm. That’s spot on.

A minor downside: it’s not so good at nouns. This entry talks about Silicon Valley and geographical insularity, and mentions "Silicon Valley" prominently — one or both of those words would seem to be a good thing to tag with, but it missed them.

Still, that’s a minor issue — the tags it has suggested are generally very appropriate and useful.

Next, I need to find a way to auto-generate titles for the really old entries ;)

links for 2006-05-29

Published May 29, 2006

iTunes Music Store, MP3, and iTunes v6.x

argh. avoid iTunes 6 like the plague; Apple changed the DRM again, it’s as yet unbroken, and once you purchase a track, your account is “locked” to the new DRM. This page gives details of the (laborious) process required to escape this nasty trap

(tags: apple itunes itms dedrms hymn-project drm mp3 music sharpmusique i-paid-for-this)
polypaudio: FAQ

Polypaudio looks like Linux sound done right (at last). questions 21-24 of this FAQ list hint at awesome possibilities for LAN-networked speaker systems, even better than http://taint.org/wk/RemotePlaybackWithEsd .

(tags: polypaudio sound music linux software operating-systems gstreamer esd alsa)
IrishHistoricMaps.ie

the Ordnance Survey has set up an online shop to sell access to out-of-copyright, public domain maps of Ireland. thanks lads, but I think there’s a word for paying for something that one should be getting for free

(tags: ordnance-survey mapping open-data ireland rip-off)

Web 2.0 and Open Source

Published May 29, 2006

A commenter at this post on Colm MacCarthaigh’s weblog writes:

I guess I still don’t understand how Open Source makes sense for the developers, economically. I understand how it makes sense for adapters like me, who take an app like Xoops or Gecko and customize it gently for a contract. Saves me hundreds of hours of labour. The down side of this is that the whole software industry is seeing a good deal of undercutting aimed at sales to small and medium sized commercial institutions.

Similarly, in the follow-up to the O’Reilly "web 2.0" trademark shitstorm, there’s been quite a few comments along the lines of "it’s all hype anyway".

I disagree with that assertion — and Joe Drumgoole has posted a great list of key Web 2.0 vs Web 1.0 differentiators, which nails down some key ideas about the new concepts, in a clear set of one-liners.

Both open source software companies, and "web 2.0" companies, are based on new economic ideas about software and the internet. There’s still quite a lot of confusion, fear and doubt about both, I think.

Open Source

As I said in my comment at Colm’s weblog — open source is a network effect. If you think of the software market as a single buyer and seller, with the seller producing software and selling to the buyer, it doesn’t make sense.

But that’s not the real picture of a software market. If you expand the picture beyond that, to a more realistic picture of a larger community of all sorts of people at all levels, with various levels interacting in a more complex maze of conversation and transactions, open source creates new opportunities.

Here’s one example, speaking from experience. As the developer of SpamAssassin, open source made sense for me because I could never compete with the big companies any other way.

If I had been considering it in terms of me (the seller) and a single customer (the buyer), economically I could make a case of ‘proprietary SpamAssassin’ being a viable situation — but that’s not the real situation; in reality there was me, the buyer, a few 800lb gorillas who could stomp all over any puny little underfunded Irish company I could put together, and quite a few other very smart people, who I could never afford to employ, who were happy to help out on ‘open-source SpamAssassin’ for free.

Given this picture, I’m quite sure that I made the right choice by open sourcing my code. Since then, I’ve basically had a career in SpamAssassin. In other words my open source product allowed me to make income that I wouldn’t have had, any other way.

It’s certainly not simple economics; it’s risky and complicated, and many people don’t believe it works. But in my experience, it’s viable as an economic strategy for developers. (I’m not sure how to make it work for an entire company, mind you, but for single developers it’s entirely viable.)

Web 2.0

Similarly — I feel some of the companies that have been tagged as "web 2.0" are using the core ideas of open source code, and applying them in other ways.

Consider Threadless, which encourages designers to make their designs available, essentially for free — the designer doesn’t get paid when their tee shirt is printed; they get entered into a contest to win prizes.

Or Upcoming.org, where event tracking is entirely user-contributed; there’s no professional content writers scribbling reviews and leader text, just random people doing the same. For fun, wtf!

Or Flickr, where users upload their photos for free to create the social experience that is the site’s unique selling point.

In other words — these companies rely heavily on communities (or more correctly certain actors within the community) to produce part of the system — exactly as open source development relies on bottom-up community contribution to help out a little in places.

The alternative is the traditional, "web 1.0" style; it’s where you’re Bill Gates in the late 90’s, running a commercial software company from the top down.

You have the "crown jewels" — your source code — and the "users" don’t get to see it; they just "use".
Then they get to pay for upgrades to the next version.
If you deal with users, it’s via your sales "channels" and your tech support call centre.
User forums are certainly not to be encouraged, since it could be a PR nightmare if your users start getting together and talking about how buggy your products are.
Developers (er, I mean "engineers") similarly can’t go talking to customers on those forums, since they’ll get distracted and give away competitive advantage by accidentally leaking secrets.
Anyway, the best PR is the stuff that your PR staff put out — if customers talk to engineers they’ll just get confused by the over-technical messages!

Yeah, so, good luck with that. I remember doing all that back in the ’90’s and it really wasn’t much fun being so bloody paranoid all the time ;)

(PS: The web2.0 companies aren’t using all of the concepts of open-source, of course — not all those web apps have their source code available for public reimplementation and cloning. I wish they were, but as I said, I can’t see how that’s entirely viable for every company. Not that it seems to stop the cloners, anyway. ;)

links for 2006-05-26

Published May 26, 2006

the rise of “Nevaeh” as a first name for girls since 2000

‘The surge of Nevaeh can be traced to a single event: the appearance of a Christian rock star, Sonny Sandoval of P.O.D., on MTV in 2000 with his baby daughter, Nevaeh. “Heaven spelled backwards,” he said.’ you stupid, stupid people

(tags: stupidity nevaeh names lemmings fashion-victims christians sonny-sandoval funny mtv)
O’Reilly Radar: Controversy about our “Web 2.0” service mark

oh dear. tip: allowing your “VP of Corporate Communications” to respond is not the way to do it cluetrain-style

(tags: oh-dear cluetrain oreilly it-at-cork web-2.0 trademarks lawyers ip)
EAST VILLAGE RADIO New York City: The GBH Radio Show

‘Tom from GBH and guests, playing Robot-Rock, Distortion-Disko, Electronic, Rock, New Wave Hip-hop, house, punk, electro, downbeat and classics.’ lots of good mashups and remixes, one 2-hour 128kbps MP3 every week

(tags: mashups music mp3 podcasts radio gbh new-york remixes)

links for 2006-05-25

Published May 25, 2006

Tuangou: flashmobbing for group discounts

‘That evening, Ms. Li and her brother joined 15 strangers at the store to demand a group discount on a new television, refrigerator, and washing machine.’ wow (via EirePreneur)

(tags: via:eirepreneur tuangou flashmobs team-buying community haggling bargains shopping)
Llamasoft games in the public domain

old Llamasoft game images may be distributed and used free of charge to and by anyone. awesome!

(tags: jeff-minter llamasoft gaming games commodore-64 vic-20 retrogaming emulation)
O’Reilly trademarks “Web 2.0” and sets lawyers on IT@Cork

any mention of “web 2.0” in a conference, and O’Reilly are firing legal letters — even for events outside the US

(tags: oreilly web-2.0 it-at-cork conferences tom-raftery lawyers ip trademarks)

links for 2006-05-24

Published May 24, 2006

Emergent Chaos: Counting In Background Checks

Criminal Records Bureau’s “erring on the side of caution” has resulted in around a 9.7% false positive rate, with 2,700 UK job-seekers falsely listed as being convicted criminals

(tags: criminal-records-bureau statistics false-positives uk risks bureaucracy)

Pam on the AIDS/LifeCycle

Published May 24, 2006

My mate Pam is cycling in this year’s AIDS/LifeCycle (http://www.aidslifecycle.org/6081): for a week, from June 4 to 10, she’ll be cycling from San Francisco to LA for charity. That’s 585 miles. Since she bought her bike to do this ride, she’s clocked up a terrifying 2040 miles. Blimey.

It’s for a good cause, so go on: make a donation (https://www.aidslifecycle.org/donate/6081)!

links for 2006-05-23

Published May 23, 2006

Tax preparation companies blocking simpler tax returns in the US

I was wondering why this was such a shambles; now it makes sense. ‘Inefficiency has become a virtue in government’ (via waxy)

(tags: via:waxy taxes hr-block california us-politics government tax)
photos of first working OLPC laptop prototype

Actual running hardware! Looks a lot more realistic than the last mock-ups. I’m more positive now that I hear they have Chris Blizzard and Jim Gettys involved, too

(tags: olpc laptop one-laptop-per-child hardware mit)

Poll: keep ‘Fixing Email Weblog’ in Planet Antispam?

Published May 23, 2006

I added the Fixing Email weblog to Planet Antispam a while back. However, I’m not entirely sure at this stage that its content (which seems to be primarily news syndication) fits with the "planet" concept (which is primarily intended for first-person posts).

So — quick poll. Let me know what you think, pro or con, Planet readers: should I remove the Fixing Email feed from that site?

Update: that was a pretty resounding ‘yes’. Done!

links for 2006-05-22

Published May 22, 2006

This Year’s Leeroy Jenkins

the guy behind the “more DoTs more DoTs more DoTs! 50 DKP MINUS!!” WoW voice-chat recording. I don’t play WoW, but this control freak’s incoherent freakout is hilarious even without knowing all the details

(tags: control-freak world-of-warcraft gaming funny geek rage freakout)
Continuations for Web Applications – a bad idea

I’ve come around to this conclusion too — attempting to use continuations to implement a web app ‘requires you to write your code in such a way that it can tolerate sudden halts, thread switches, rewinding, and forking of execution’ (via Miguel de Icaza)

(tags: continuations web-apps web via:miguel ian-griffiths coding software languages)
New York Magazine: David Edelstein’s Plagiarism Stunt

‘The response to my essay on plagiarism last week (“Where Have I Read That Before?”) was swift, so here goes: Yes, it is plagiarized. 99% of it. The only original lines, in fact, are the first and the last two’

(tags: plagiarism prank stunts writing media journalism)
The 25 Best Music Websites, according to Entertainment Weekly

actually quite accurate! Deserved props for eMusic, Stereogum, Fluxblog, KCRW, Lemon-Red, ILM, and Music For Robots; missed the Hype Machine, though. mind you, that may be just as well

(tags: music mp3s entertainment-weekly stereogum mp3blogs)
The ‘Secure Your Computer’ campaign

a new website-ribbon campaign from ISIPP, aimed at educating less-techie users on virus/malware avoidance; if you run a consumer-facing website, it’d be fantastic to get this up there

(tags: security secure-your-computer campaigns isipp education)

links for 2006-05-19

Published May 19, 2006

Beta user gives Sun Niagara the nod over Itanium

Trial a Niagara, get a free trip to SF! nice one Colm ;)

(tags: colm-maccarthaigh sun niagara t2000 hardware javaone san-francisco)

Dear Recruiters

Published May 19, 2006

Dear Recruiters,

If you’re going to (a) scrape my CV page from my website, then (b) spam me, unsolicited, offering to represent me for jobs I don’t want in places I don’t live, in explicit contravention of the terms of use [*] of that document — here’s a tip.

Don’t compound the problem by asking me to resend the document in bloody Microsoft Word format. FFS.

([*]: Those terms were, of course, added in an attempt to stem the tide of recruiter spam. Thanks to Colm MacCarthaigh for the idea…)

links for 2006-05-18

Published May 18, 2006

Good demo of how much info can be gleaned from an Apache access_log

cat-and-mouse fun with the Bank of England; interesting to hear that Google’s cache is still trackable via CSS references

(tags: css html tracking access_log referers privacy google-cache)
Kamaelia

A python framework based on one-way pipes and generators, from the BBC, used to build their “Macro” super-PVR. May be some ideas for IPC::DirQueue here

(tags: kamaelia python macro bbc pipes generators coroutines yield programming coding frameworks)

Bebo’s “Irish Invasion”

Published May 18, 2006

Reading this post (http://www.pkellypr.com/blog/2006/0516/the-twelve-days-of-a-changing-irish-society/) at Piaras Kelly’s blog, I was struck by something: I never realised quite how bizarre the situation with Bebo is. ~~If you check out the Google Trends ‘country’ tab, Ireland is the only country listed, meaning that search volume for "bebo" is infinitesimal, by comparison, elsewhere!~~ (Update: Ireland was the only country listed, because the URL used limited it to Ireland only. However, the point is still valid when other countries are included, too ;)

It is also destroying Myspace as a search term on the Irish internet. (Update: also fixed)

As a US-based company, they must be mystified by all this attention — the Brazilian invasion of Orkut has nothing on this ;)

I’ll recycle a comment I made on Joe Drumgoole’s weblog as to why this happened:

My theory is that social networking systems, like Bebo, Myspace, LinkedIn, Friendster, Tribe.net, Orkut, Facebook etc. have all developed their own emergent specialisations. These are entirely driven by their users; although the sites can attempt to push or pull in certain directions (such as Friendster banning ‘non-person’ accounts), fundamentally the users will drive it. All of those sites have massively different user populations; Tribe has the Burning Man crowd, Friendster the daters, Orkut the Brazilians etc.

Next, I think kids of school age form a set of small cliques. They don’t want to appear cool to friends thousands of miles away, on the internet; they want to appear cool to their peer group in their local school. So all it takes is a group of influential ‘tastemakers’ (the alpha males and females in a year) to go onto Bebo, and it becomes the site for a certain school; and given enough of that, it’ll spread to other schools, and soon Bebo becomes the SNS for the Irish school system. In other words, Irish kids couldn’t really care less what US kids think of them; they want to be cool locally.

Also I think MySpace has a similar problem to Orkut — it’s already ‘owned’ by a population somewhere else, who are talking about stuff that makes little sense to Irish teenagers. As a result, it’s not being used as a social system here in Ireland; instead, it’s just used by musicians who want a cheap place to host a few tracks without having to set up their own website.

(Aside: part of the latter is driven by clueless local press coverage of the Arctic Monkeys — they have latched onto their success, put the cart before the horse, and decided that they were somehow ‘made’ by hosting music on MySpace, rather than by the attention of their fans. duh!)

links for 2006-05-17

Published May 17, 2006

The N.S.A.’s Math Problem – New York Times

according to social-network graph analysis of the Enron mail corpus, “one of the ‘central’ players was Ken Lay’s secretary”. ha! (via robotwisdom)

(tags: via:robotwisdom graphs social-networks networks terrorism nytimes nsa mass-surveillance)
Critical Security Issues with Diebold TSx [PDF]

Harri Hursti’s report for BlackBoxVoting.org; it appears the boot loader will automatically reflash itself, if presented with a suitably-named file on PCMCIA media, and access to the PCMCIA slot is protected only by a few standard Philips-head screws. wow

(tags: diebold security voting elections computer-security e-voting blackboxvoting.org harri-hursti)
How American are Startups? (plasticbag.org)

great thread of comments sparked off by Paul Graham’s rather ill-informed presentation at XTech2006. Cory’s comment is spot-on, on both sides

(tags: startups ireland usa high-tech silicon-valley business work eu uk)

links for 2006-05-16

Published May 16, 2006

Google Notebook: jm

Google’s scrapbook-clone service. first impressions: Firefox extension = good, lots of Flash, URL’s hardly catchy, no sign of RSS feeds

(tags: google scrapbook web firefox bookmarks google-notebook)

links for 2006-05-15

Published May 15, 2006

SpamOrHam.org progress

spam filters beating humans at performing spam classification quite a lot, it turns out. Everyone should give SpamOrHam.org a go!

(tags: anti-spam spamorham jgc classification machine-learning)
Long Tail evidence from Safari and Google Book Search

good data; there does seem to be an appreciable effect

(tags: google-books books oreilly safari long-tail hard-numbers)

5 Years of taint.org

Published May 15, 2006

Five years ago, on 15 May 2001, I started writing this weblog.

Subject matter started with a forward of something odd from the Forteana list — ‘Why Finns are sick of illnesses named after them’. In terms of subject matter, I started the weblog to reduce the amount of forwards I was passing on by email to other groups — hence the preponderance of forteana posts early on.

Nowadays, by contrast, I try to write original ramblings^Wresearch for the main part of the site, and the occasional "fresh bits" I unearth elsewhere are kept separate, posted to the link-blog at del.icio.us/jm.

However, the real reason I started the thing was to act as an experiment in using WebMake as a blog platform — at least, that was the excuse. It worked quite successfully, for what it’s worth — but in mid-August 2005, I eventually accepted that there weren’t enough hours in the day to maintain a weblogging CMS, and its templates, as well as everything else, and that I didn’t really need to test WebMake’s abilities any more, and switched to WordPress. I’m glad I did; WP is a great piece of software.

So what’s been the biggest hit on taint.org, by far? Here it is: http://taint.org/xfer/2004/kittens.jpg . Lots and lots of Google Image referrers, MySpace hotlinkers, etc. etc. ;) It’s a top hit for a GIS search for [kittens], I think.

Random stats, based on April’s logs:

About 81247 hits were received during April to the RSS 2.0 feed (the default), 9921 to the Atom feed, and 7795 for the RSS 1.0 rendering. That indicates that format-wars-wise, people just use the default. ;)
Assuming the RSS reader apps average out to 1 HTTP GET every 30 mins (as Bloglines and Apple’s reader do), that means there are somewhere around (98963 / (30 * 24 * 2)) = 68 subscribers.
In terms of the old style browser-using readership — there were 44926 hits on the front page using web browsers.
AWStats claims 2700 visits per day, from around 33000 visitors per month. I find the latter figure hard to believe.
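The subscriber estimate is just back-of-the-envelope arithmetic, and can be checked with a few lines of Python. The function name is made up for illustration, and the one-poll-every-30-minutes rate is the same assumption as in the text, not a measured figure.

```python
def estimate_subscribers(feed_hits, days=30, polls_per_hour=2):
    """Estimate subscribers, assuming every reader polls at a fixed rate."""
    polls_per_subscriber = days * 24 * polls_per_hour  # 30 * 24 * 2 = 1440
    return feed_hits // polls_per_subscriber


# April hit counts quoted above: RSS 2.0, Atom, RSS 1.0.
total_feed_hits = 81247 + 9921 + 7795
print(total_feed_hits, estimate_subscribers(total_feed_hits))  # prints: 98963 68
```

Readers that poll more or less often than every half hour would shift the number accordingly, so treat it as a rough guide only.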

After the front page and the feeds, the scraped RSS feeds at http://taint.org/scraped/ come second, Threadless beating out Perry Bible Fellowship by a little bit.

links for 2006-05-14

Published May 14, 2006

the wires! the wires!

photo of Coldcut’s live setup — structured cabling system required

(tags: wires coldcut music djs photo flickr ninjatune spaghetti)

links for 2006-05-12

Published May 12, 2006

Jailtime.org

Downloadable filesystem images for Xen; all Linux so far, modified to run as Xen guests out of the box

(tags: xen linux filesystems virtual-machines domu fedora debian centos gentoo slackware)
Singleton Considered Stupid

‘what if your Singleton has a handle to some limited resource, like a database or file handle? I guess you get to keep that sucker open until your program ends’. YES (via mjd)

(tags: via:mjd programming software design-patterns singleton funny rants coding)
The Five Essential Phone Screen Questions

excellent software-development interview advice

(tags: coding software hiring jobs work steve-yegge programming)

Link-blog Networking

Published May 12, 2006

Cool — del.icio.us just added a feature whereby you can now see who has you in their network, and, of course, you can further view their networks and see who’s in them.

This’d be great to produce social-network graphs, although I daresay Joshua mightn’t be so keen on the spidering load. ;) I’ve optimistically requested some form of dump, anyway.

The social networking aspect of link collection and link-blogging via del.icio.us is emerging nicely; I’m keen to see what’s next in the pipeline.

A few interesting things:

Almost everyone who’s using del.icio.us seriously for link collection — ie. applying some quality control thresholds, and bothering to write one-line descriptions, at least — has filled out their ‘network’ by now.
It’d be useful to have "groups", so that we can now assert things like "jm, boogah, n0wak, negatendo, tweebiscuit, leonardr, muckster and torrez form a group". I’m sure that’d provide useful info, although could probably be inferred anyway. (People are attempting to hack it by using a shared tag on all their postings, like the "irishblogs" tag, but that’s an awful misuse of tagging in my opinion ;)
Also, it’ll be interesting to see what’ll happen once Google Co-op figures out a way to incorporate the del.icio.us network data. To be honest, I’m very surprised it wasn’t already in there — it seems like a no-brainer… maybe some Y!/G corporate rivalry is getting in the way.

Anyway, in the meantime it’s producing lots of good fodder for my SpicyLinks feed.

SpicyLinks is an implementation of something that I mentioned in a comment on this weblog entry, regarding future methods of reading weblogs; in essence, it’s an automated blog aggregation summariser. It reads other people’s link-blogs, so I don’t have to, and reports the stuff that proves popular in my personal collection of sources.
(Credit where due: HotLinks provided much of the inspiration, but doesn’t support personalisation, hence the reimplementation.)

SpicyLinks is similar to Populicious, but that app really misses the point, in my opinion. I don’t particularly want to know what everyone is pointing at; I want to know what a selected set of trusted sources (with good taste!) are pointing at.

This aggregation is pretty similar to the del.icio.us ‘network’ feed, but with much lower volume, and a higher signal/noise ratio, attained by dropping the ‘one-off’ items that only one person is pointing at. Initially, that may seem like a major failure, since you miss the ‘fresh bits’ — but as long as you’ve got the right people in your source network, it actually works very well.
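The core of that filtering is simple enough to sketch. This is not the actual SpicyLinks code, just a minimal Python illustration of the idea: the function name, the (user, url) input format and the example.com URLs are all hypothetical, and the threshold of two distinct sources is my reading of "dropping the one-off items".

```python
from collections import defaultdict


def popular_links(bookmarks, min_sources=2):
    """Rank URLs by how many distinct trusted sources bookmarked them.

    bookmarks: iterable of (source_user, url) pairs from the chosen link-blogs.
    URLs with fewer than min_sources distinct sources are dropped.
    """
    sources_by_url = defaultdict(set)
    for user, url in bookmarks:
        sources_by_url[url].add(user)  # a set, so repeat posts count once
    ranked = [
        (url, len(users))
        for url, users in sources_by_url.items()
        if len(users) >= min_sources
    ]
    # Most widely linked first; ties broken alphabetically by URL.
    ranked.sort(key=lambda pair: (-pair[1], pair[0]))
    return ranked


sample = [
    ("boogah", "http://example.com/a"),
    ("leonardr", "http://example.com/a"),
    ("boogah", "http://example.com/a"),
    ("n0wak", "http://example.com/b"),
    ("torrez", "http://example.com/c"),
    ("n0wak", "http://example.com/c"),
    ("boogah", "http://example.com/c"),
]
print(popular_links(sample))
```

The quality of the output depends entirely on who is in the source list, which is the whole point: the code does no ranking beyond counting heads.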

It’d be great if this was one of the features implemented in the del.icio.us ‘network’ system…

links for 2006-05-11

Published May 11, 2006

FM3: The Buddha Machine

iPod-sized ambient hardware loop player from China; the tee-shirts are fantastic

(tags: WANT buddha-machine fm3 china buddha tee-shirts ambient)
Donnie Darko director investigated for terrorist links

DHS ineptitude strikes again

(tags: donnie-darko richard-kelly funny inept bureaucracy homeland-security dhs cannes movies film)
Defense Tech: NSA Sweep “Waste of Time,” Analyst Says

some really authoritative thumbs-down comments from Valdis Krebs and John Robb

(tags: nsa mass-surveillance privacy defensetech social-networks valdis-krebs john-robb)

links for 2006-05-10

Published May 10, 2006

Cambridge Security Research lab researchers on the Shell Chip-and-PIN break

interesting further notes; apparently the Trintech Smart 5000 PINPad terminals run Linux, and can be managed remotely

(tags: security trintech shell chip-and-pin compromises hacking linux)
Jeffrey Kalmikoff from skinnyCorp on community

‘Communities are human business debuggers. Why not know the problems, address them and prove that they’re fixed all in public?’ excellent article, with the solid testimonial of Threadless backing it up

(tags: community cluetrain threadless skinnycorp jeffrey-kalmikoff)
The New Yorker on 419 scams

‘former Iowa congressman Edward Mezvinsky was caught up in a 419 scam, and stole from his law clients, friends, and even his mother-in-law .. He is serving more than six years in prison after pleading guilty to thirty-one counts of fraud.’ bloody hell

(tags: 419 nigerian-scam gottschalk edward-mezvinsky scams spam)

links for 2006-05-09

Published May 9, 2006

Where’s the we in WeMedia?

Suw Charman and a load of others (see comments) lay into the BBC’s “citizen journalism” conference: “a complete waste of time”. ouch

(tags: bbc new-media london suw-charman blogging wemedia media-types)
BlueSecurity fans’ astroturf campaign

‘Please visit and take a minute to post positive comments about BlueSecurity. BlueSecurity is encouraging us to do such things so let’s help them spread the good word.’ explains a lot; several other astroturf coordination forums at castlecops.com, too

(tags: bluesecurity blue-frog astroturf sock-puppets pr cluetrain)

Script: new-referrer-rss

Published May 9, 2006

new-referrer-rss.pl – generate RSS feed of new referrer URLs from access_log

SYNOPSIS

new-referrers-rss nameofsite [source ...] > new-referrers.xml

DESCRIPTION

Given the name of a web site, and a selection of Apache combined log format ‘access_log’ files containing referrer URL data, this will generate an RSS feed containing the latest referrers.

The script should be run periodically with ‘fresh’ access_log data, from cron.
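For illustration only, here's a rough Python sketch of the same idea. It is not the Perl script itself: the regular expression, the function names (`extract_referrers`, `render_rss`) and the RSS layout are all assumptions. It also only de-duplicates within the log lines it is given, and does not remember referrers between runs.

```python
import re
import sys
from urllib.parse import urlparse
from xml.sax.saxutils import escape

# Apache combined log format:
#   host ident user [time] "request" status bytes "referer" "user-agent"
# Simplification: lines with escaped quotes inside a field will not match.
COMBINED_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "[^"]*" \d{3} \S+ "(?P<referrer>[^"]*)" "[^"]*"'
)

ITEM_TEMPLATE = (
    "    <item>\n"
    "      <title>{url}</title>\n"
    "      <link>{url}</link>\n"
    "    </item>"
)


def extract_referrers(lines, site_host):
    """Return external referrer URLs in first-seen order, without repeats."""
    seen = set()
    ordered = []
    for line in lines:
        match = COMBINED_RE.match(line)
        if not match:
            continue
        referrer = match.group("referrer")
        if referrer in ("", "-"):
            continue  # no referrer was sent
        host = urlparse(referrer).hostname or ""
        if host == site_host or host.endswith("." + site_host):
            continue  # skip links from within the site itself
        if referrer not in seen:
            seen.add(referrer)
            ordered.append(referrer)
    return ordered


def render_rss(site_name, referrers):
    """Render the referrer list as a bare-bones RSS 2.0 document."""
    items = "\n".join(ITEM_TEMPLATE.format(url=escape(url)) for url in referrers)
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<rss version="2.0">\n'
        "  <channel>\n"
        "    <title>New referrers for {name}</title>\n"
        "    <link>http://{name}/</link>\n"
        "    <description>Referrer URLs seen in access_log</description>\n"
        "{items}\n"
        "  </channel>\n"
        "</rss>\n"
    ).format(name=escape(site_name), items=items)


if __name__ == "__main__" and len(sys.argv) > 2:
    site = sys.argv[1]
    log_lines = []
    for path in sys.argv[2:]:
        with open(path, encoding="utf-8", errors="replace") as handle:
            log_lines.extend(handle)
    sys.stdout.write(render_rss(site, extract_referrers(log_lines, site)))
```

Saved under a hypothetical name like `referrers_sketch.py`, it would be run as `python referrers_sketch.py example.org access_log > new-referrers.xml`.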

Todd Underwood on BlueSecurity DDoS

Published May 9, 2006

Renesys Blog: The Bluesecurity Fiasco — in which Todd Underwood, CSO for Renesys Corporation, applies some real-world knowledge of how the internet works to the "timeline of events" press release, issued by BlueSecurity as part of their ongoing PR about the DDoS.

Judging by the comments at Slashdot, this really needs to be more widely read.

Here are some highlights:

The timeline from BlueSecurity […] is frustratingly vague. It uses phrases like ‘tampering with the Internet backbone using a technique called "Blackhole Filtering".’ As Thomas Pogge, a philosophy professor of mine, used to say: that’s not even wrong yet. There is no "Internet backbone", there is no technique known as "Blackhole Filtering", and blackhole routing is not normally described as tampering. So the whole explanation is nonsense. […] Let’s clear one thing up for the press and everyone else: this event just wasn’t that interesting. The attack against bluesecurity was a run-of-the-mill denial of service attack.

His conclusion:

I believe that the PR engine from BS is in overdrive spinning this event as fast as they can. But the concrete facts being put out by them simply to not add up. In the process they seem to be doing two things: 1) trying to imply or state that someone at UUnet was bribed by a spammer. This is simply ridiculous. I know many of the people who work for UUnet and they are honest, hardworking and extraordinarily clever people. They would not be crooked, or stupid, enough to do such a thing and if they were, they would have been trivially caught by change-management procedures. Moreover, such a change at UUnet (or BTN) wouldn’t have caused the event BS claims to have witnessed anyway. Additionally, 2) BS is trying to deflect attention from the damage that they caused at Six Apart. It would be much better if they could just claim ignorance of the DOS, apologize and move on. I recognize that that isn’t going to happen, but it sure would make this whole thing easier to handle.

Well said.

Of course, this is pretty much immaterial — the people who are using Blue Frog, and vocally supporting Blue Security, don’t really care what happened. All they care about is that someone is taking some kind of direct action against spammers, in some way or another, and if there’s a little "friendly fire" and some bending of the truth, why, this is a war! What, do you support the spammers?

It’s disappointing — the amount of disinformation being successfully pumped out (and accepted!) on this story is massive.

links for 2006-05-08

Published May 8, 2006

Amazon UK Shift the Goalposts

they’re no longer shipping games, electronics, or home/garden items to Ireland. what with this and the crappy shipping, looks like they’ve written off the Irish market for some reason

(tags: ireland amazon amazon.co.uk shopping consumer-electronics)

Outside My Window Right Now

Published May 8, 2006

Bubba, now safely back in Dublin after his 8000-mile flight from LAX, is getting back into exploring his old manor.

Here he is, ignoring a very brave magpie. Judging by the way the magpie was brazenly hopping around him, cawing, and the way that Bubba was ignoring him, I suspect there may be a nest nearby….

links for 2006-05-07

Published May 7, 2006

Enterprise Ireland to invest in Irish VC companies

good points from Joe Drumgoole; what works for Irish VCs isn’t necessarily aligned with what’s good for Ireland’s high-tech industry

(tags: ireland venture-capital startups funding government enterprise-ireland)

links for 2006-05-05

Published May 5, 2006

UTF-8 history

First doodled on a placemat by Ken Thompson and Rob Pike for Plan 9 in 1992 (via era)

(tags: via:era utf-8 encoding text plan-9 rob-pike ken-thompson unix bell-labs ibm x-open)

links for 2006-05-04

Published May 4, 2006

QDN: The dishonor of Blue Security

Blue Security accidentally took down large chunks of the blogosphere in an attempt to evade the DDoS targeting them; impressively inept. also, they really need to tone down their sock-puppet commenter squad (via torrez)

(tags: via:torrez funny sad blue-security blue-frog ddos six-apart livejournal typepad anti-spam)
OpenStreetMap To Free The Isle of Wight

open geodata creation from OSM in a 3-day mapping-fest this weekend. great explanation of why open geodata is important in the UK and Ireland, too

(tags: geodata geowanking openstreetmap isle-of-wight mapping ordnance-survey)
Drunk Men Work Here – On Bots

behavioural analysis on web-search engine bots, with some pretty pics (via waxy)

(tags: via:waxy search google yahoo msnbot googlebot scraping web web-search)
Wandering Spoon – Vietnamese Coffee – Ca Phe

YUM. wonder if I can find condensed milk around here

(tags: yum ca-phe vietnam coffee food)

London’s Oyster RFID card to become a full cashless payment system

Published May 4, 2006

Apparently, Transport For London are planning ‘e-money’ trials (http://software.silicon.com/applications/0,39024653,39150647,00.htm) based on their remotely-readable Oyster RFID cards (http://www.rfidbuzz.com/wiki/Standards/MIFARE).

Combine that with Kevin Mahaffey of Flexilis’ talk at Black Hat last year, where he demonstrated apparatus to extend RFID read range from 4-6 inches to approximately 50 feet, and things could get messy. ;)

The slides for that talk are available here (PDF); slide 20 specifically mentions the Hong Kong "Octopus" cashless-payment card.

links for 2006-05-03

Published May 3, 2006

Blue Frog leak as video game

Some users of the Blue Frog software are considering this leak to be some kind of Churchillian challenge to their resolve, instead of a failure on Blue Frog’s part! amazing

(tags: funny blue-frog blue-security ddos anti-spam do-not-email-list john-levine)
Jason Scott: The Great Failure of Wikipedia

‘What Wikipedia has taught us .. is that in a vacuum of politics, politics will be created. There is no vacuum of politics.’ interesting article

(tags: wikipedia jason-scott talks transcripts wikis politics community)

links for 2006-05-01

Published May 1, 2006

BlueSecurity blog entry about the alleged list leak

‘This spammer is using mailing lists he already owns and is now sending millions of such messages’ — hasn’t hit any of our thousands of spamtraps, which is quite impressive in that case

(tags: anti-spam blue-frog blue-security do-not-email-list)

Blue Frog List Leaked?

Published May 1, 2006

Blue Security, the company behind Blue Frog, operates a "Do Not Email" list, on the (optimistic) basis that spammers will vet their lists against it.

Reportedly, it’s been compromised. If this is true, I’m not surprised; as Dr. Aviel Rubin’s report to the FTC of May 2004 regarding a Do-Not-Email list notes:

The scrubbing approach [to running a D-N-E list] requires that a list of live email addresses exist. While the party owning that list may be well intentioned, it is unlikely that such a valuable list would not leak out. History is replete with insider attacks, as well as external break-ins to highly sensitive sites, such as the Pentagon computers. The Do Not Email Registry represents the kind of prize that attracts hackers. In this case, the prize has monetary value as well. Once the list is exposed, there is no way to undo it.

Also, it’s almost inevitable:

If this service were running for some time, it is more likely than not that the plaintext addresses would leak at some point, given the history of computer security incidents.

Update: it appears, according to this white paper, that the Blue Frog "Do Not Intrude" list is hashed, rather than plain-text. Rubin’s advice still applies:

Without hashing, a compromise of the registry database results in exposure of all of the registered email addresses. This is a total disaster. However, even exposure of a hashed list is a catastrophe. A spammer with a copy of a hashed list of email addresses is able to find out, for any email address, if the address is in the registry. The attacker simply hashes a candidate email address and sees if the hashed value is in the list. This is very powerful. [….]

Hashing provides absolutely no security against a marketer who obtains a scrubbed list and uses that to sell the addresses that were scrubbed by the registry. Whether or not the list is hashed has no impact on a malicious marketer in the scrubbing approach.
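To make Rubin's two points concrete, here's a rough Python sketch of both attacks. The hashing scheme is an assumption (SHA-1 over the lowercased address); what the real list uses isn't specified here. The function names and addresses are invented for illustration.

```python
import hashlib


def hash_address(address):
    # Assumption: SHA-1 of the trimmed, lowercased address. The real
    # registry's hashing scheme is not described in the text above.
    normalised = address.strip().lower()
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def registered_candidates(candidates, hashed_registry):
    """Attack 1: test candidate addresses against a leaked hashed list.

    The attacker hashes each address they already hold and checks whether
    the hash appears in the registry, confirming it is live and registered.
    """
    return [addr for addr in candidates if hash_address(addr) in hashed_registry]


def scrubbed_out(original_list, scrubbed_list):
    """Attack 2: diff a mailing list against its scrubbed version.

    Whatever the scrubbing removed must be in the registry, so hashing
    gives no protection against a marketer who simply compares the two.
    """
    kept = {addr.strip().lower() for addr in scrubbed_list}
    return [addr for addr in original_list if addr.strip().lower() not in kept]


if __name__ == "__main__":
    registry = {hash_address("alice@example.com")}
    guesses = ["alice@example.com", "bob@example.com"]
    print(registered_candidates(guesses, registry))
    print(scrubbed_out(guesses, ["bob@example.com"]))
```

Neither attack involves reversing the hash; both only need addresses the spammer already has, which is exactly what a spammer has plenty of.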

SpamAssassin in the Google Summer of Code 2006

Published April 30, 2006

Are you a student, and interested in earning $4,500 for contributing to open source, and fighting spam, over the course of the summer?

If so, get thee hence to the Google Summer of Code 2006 site, and propose a project!

Last year, we in SpamAssassin didn’t get it together to mentor SoC projects. This year, however, we have a few prospective mentors (including myself), and a few sample project ideas lined up; we’re all ready to go! Here’s the Student FAQ. Be quick; applications end in a week and a bit.

Here’s hoping we get some interesting submissions ;)

links for 2006-04-29

Published April 29, 2006

Your Tube, Whose Dime? – Forbes.com

YouTube’s bandwidth bill ‘may be approaching $1 million a month’. holy crap (via waxy)

(tags: via:waxy video internet youtube business bandwidth cdns)
ZoomIn

a really nice Flickr-like take on mapping; every street has user-contributed location geodata included; open REST API; social aspects; Google-friendly. Best mapping site I’ve seen

(tags: geowanking geodata mapping new-zealand web)
‘Everybody loves Eric Raymond’ source available

‘Everything needed to make this episode is available in the eler-source directory in a bzipped tarball. … Creative Commons Attribution Share-Alike license’

(tags: creative-commons eler jokes funny open-source comics)
