Justin's Linklog Posts

Links for 2008-07-31

Del.icio.us 2.0 goes live yay! I’ve been waiting for this for yonks

10 years of Boards.ie massive ~50GB RDF/XML dump, for open crunching, to generate interesting “SIOC Semantic Web” apps

Postmaster.comcast.net how to get mail delivered successfully to Comcast, the usual stuff

Why we’ll never replace SMTP ‘The reason that e-mail is uniquely useful is that you can exchange mail with people you don’t already know. The reason that spam exists is that you can exchange mail with people you don’t already know.’ +1

“Bikes-for-Billboards” scheme exposes major planning flaws ‘what was initially hailed as “free bikes” has become one of the biggest planning controversies to hit Dublin in years.’ No shit. 70% of sites are on the Northside, rather than the richer Southside; and each bike will cost over EUR300k in ad revenue!

Rob Enderle’s page on Wikipedia detailing this analyst’s hilariously wrong pro-SCO, anti-Apple/Linux predictions over the years. John Gruber: ‘the only way it would be worthwhile for reporters to [quote him] would be if they were willing to describe him as “almost always utterly wrong”‘

Links for 2008-07-30

soc.culture.irish on “Cuil” meaning knowledge ‘eagerness, fearsomeness, a gnat, a horsefly, a beetle, a bluebottle, and (with the addition of a fada) a rear end, a reserve or backup, a corner, and an arse. The one thing it isn’t, according to the four dictionaries I just checked, is knowledge.’

Neocon search terms

I’m back from a week in Cornwall. I’d like to say I was rested, but chasing after an 11-month-old baby in a caravan isn’t all that restful. Still, it was sunny, and good for a change of pace ;)

Via b1ff.org, here’s the Nexis search that US Department of Justice White House liaisons ran on job candidates to determine their political leanings:

[first name of a candidate] and pre/2 [last name of a candidate] w/7 bush or gore or republican! or democrat! or charg! or accus! or criticiz! or blam! or defend! or iran contra or clinton or spotted owl or florida recount or sex! or controvers! or racis! or fraud! or investigat! or bankrupt! or layoff! or downsiz! or PNTR or NAFTA or outsourc! or indict! or enron or kerry or iraq or wmd! or arrest! or intox! or fired or sex! or racis! or intox! or slur! or arrest! or fired or controvers! or abortion! or gay! or homosexual! or gun! or firearm!

This Nexis reference says the “w/n” keyword searches for ‘words .. within 5 or 10 words of each other, Ex: “Enron w/5 investigation”‘.

This is just a smidgen away from the concept of a SpamAssassin-style scoring filter. Crazy stuff.

Best of all, it’s buggy and over-sensitive, according to one librarian: ‘If that is really their search string, they were going through 99% unrelated citations. There need to be a very nested set of parentheses to make the terms work, starting with one after the w/7. Fired and sex are OR’ed twice and need to be nested, at least in the case of Fired and the OR’d terms immeadiately following.’

Update: good Slashdot comment thread here. This comment indicates that the above librarian might be off-base regarding the w/7 parentheses, since the OR operator has higher priority. Here is an even better walkthrough of the query statement logic. Finally, here’s an explanation of the “spotted owl” curiosity…
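
To make the "w/n" and "!" operators a bit more concrete, here is a rough Python sketch of a proximity match with truncation wildcards. It is not how Nexis works internally; the function names (`tokenize`, `term_matches`, `within`, `anywhere`) and the sample sentence are made up for illustration, and multi-word terms like "spotted owl" are not handled. It doesn't settle which operator precedence Nexis really uses either; it only shows why the grouping matters.

```python
import re


def tokenize(text):
    """Lower-case the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def term_matches(term, word):
    """Treat a trailing '!' as a truncation wildcard (prefix match)."""
    if term.endswith("!"):
        return word.startswith(term[:-1])
    return word == term


def within(text, left, right_terms, n):
    """True if any of right_terms occurs within n words of `left`."""
    words = tokenize(text)
    for i, word in enumerate(words):
        if not term_matches(left, word):
            continue
        # Look at up to n words on either side of the matched position.
        start, end = max(0, i - n), min(len(words), i + n + 1)
        for j in range(start, end):
            if j != i and any(term_matches(t, words[j]) for t in right_terms):
                return True
    return False


def anywhere(text, terms):
    """True if any term occurs anywhere in the text, with no proximity limit."""
    return any(term_matches(t, w) for w in tokenize(text) for t in terms)


text = ("Jane Doe opened a bakery. Years later, in an unrelated "
        "story, a firearm was found.")

# Grouped reading: doe w/7 (gun! or firearm!)
grouped = within(text, "doe", ["gun!", "firearm!"], 7)

# Ungrouped reading: (doe w/7 gun!) or firearm!
ungrouped = within(text, "doe", ["gun!"], 7) or anywhere(text, ["firearm!"])

print(grouped, ungrouped)
```

Under the grouped reading the sample sentence is not a hit, since "firearm" is more than 7 words from the name; under the ungrouped reading it is, which is the "99% unrelated citations" failure mode the librarian describes.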

Links for 2008-07-22

ZSFA — I Want The Mutt Of Feed Readers Zed recommends Newsbeuter. must take a look

We Want A Dead Simple Web Tablet For $200. Help Us Build It. having worked on a project to do just this, believe me, this is doomed. DOOMED

Science Clouds ‘compute cycles in the cloud for scientific communities .. allows you to provision customized compute nodes .. that you have full control over using a leasing model based on the Amazon’s EC2 service.’ Wonder if they’d like to give SA some time ;)

Links for 2008-07-21

O2 Leaking Customer Photos (updated) the JBoss/Tomcat install leaks the “secret” URLs through its default status page. this is the 3rd helping of FAIL for O2’s web team; 2 previous occasions in the last year exposed customer data through “secret” URL manipulation

Avant Window Navigator “a ‘dock-like’ (cough) navigator bar for the Linux desktop” (via Danny, again!)

trickle ‘user-space bandwidth shaper’, ie. like nice(1) for network bandwidth (via Danny)

RFC 5218 – What Makes For a Successful Protocol? ‘Based on case studies, this document identifies some of the factors influencing success and failure of protocol designs.’ (via spicylinks)

“Roommate” 419 Scam

Here’s an interesting form of advance fee fraud I hadn’t heard of before; it’s a good example of 419 scammers ruining yet another casual online marketplace.

Let’s say you have a room you want to rent. You put up a “housemate wanted” ad on Craigslist or wherever. Here’s the reply you’ll get:

Hi There,

How re you doing? I hope all is well. I’m martha Robot , am 26 yrs old and Am originally from chester united Kingdom . Graduate of I have a master degree in fashion design and I work as a professional fashion designer. I’m am not in the united kingdom right now, i am presently in West africa . I am currently working on contract for a company call (African Family Home Fashions) here in West Africa which the contract will be ending soon. I will be returning to your place soon. I enjoy traveling, It is very interesting to get more knowledge about the new countries, new people and traditions. It’s great to have such a possibility. As i was searching through the web i saw the advert of your place . I would like to know maybe it’s still available becasue i’m extremely interested in it. Here are the questions i would like to know about the room before planing to move in to the following questions below:

A}I will like to know the major intersection nearest your neighbourhood.like shopping mall,Churches,bus line e.t.c

B}I will like to know the total cost for the my initial move as in first month rent and if you accept deposit.

C}I will like to know if there is any garage or parking space cos I will have my own car come over.

D}I will like to have the rent fee per month plus the utilities.

E}I will like to have the description of the place, size, and the equipments in there.

F}I will also like to know Your payment mode.

G}I will like to know if I can make an advance payment ahead my arrival that will be stand as a kind of commitment that I am truely coming over and for you to hold the place down for me.

I will be very glad to have all this questions answered with out leaving a stone unturned…You can Call my Landlord for more references in UK ..+447024046815.

Email me back:

Thanks. Martha.

Needless to say, this is a scam. Here’s how it works (courtesy of this post): The interested “applicant” will send a cashier’s check or money order for the deposit, the value of which greatly exceeds the actual amount requested. They will then claim the overpayment to be an honest error based on their confusion about how these things work, and ask the victim to send back a money order refunding that amount, or to send it on to a “travel agent” who is supposedly booking the scammer’s flight. The payment will be made via a non-refundable mechanism like the 419er’s favourite, Western Union. It will be a matter of great urgency, as they will claim to need the funds to make the trip over. Her money order will clear, theirs will not, and there’s no way to refund the payment, so it’s gone. This is a classic advance-fee fraud trick, it seems.

Got to love that nom de plume, though — “Martha Robot”. GREE-TINGS MAR-THA RO-BOT!

Googling for ‘major intersection nearest your neighbourhood’ churches bus finds plenty more:

Finally, a Washington-based realtor has written up a good walkthrough of the scam. He notes:

I recently ran an ad on craigslist.com to see if they were still working it. Craigslist has posted many warnings against responding to such solicitations and I was curious if the scammers had moved on to more fertile ground. They have not; I received 16 such inquiries in one day to a simple ad offering a room for rent in Bellevue. I used a fictitious identity and a newly created email address. I’ll use the emails from just one of them as an example. This particular scammer managed to have a check on my doorstep by the next day!

(thanks to nimbus9 for the headsup)

links for 2008-07-09

links for 2008-07-08

links for 2008-07-07

links for 2008-07-04

links for 2008-07-03

Amazon EC2’s spam and malware problems

Over the past few weeks, I’ve increasingly heard of spam and abuse problems originating in Amazon EC2.

This has culminated in a blog post yesterday by Brian Krebs at the Washington Post:

It took me by surprise this weekend to discover that that mounds of porn spam and junk e-mail laced with computer viruses are actively being blasted from digital real estate leased to [Amazon].

He goes on to discuss how EC2 space is now actively blocked by Outblaze, and has been listed by Spamhaus in their PBL. A spokesperson for Amazon said:

“We have a clear acceptable use policy and whenever we have received a complaint of spam or malware coming through Amazon EC2, we have moved swiftly to strictly enforce the use policy by network isolating (or even terminating) any offending instances,” Kinton said. She added that Amazon has since taken action against the EC2 systems hosting the [malware].

However, as Seth Breidbart noted in the comments, ‘note that Amazon will terminate the instance. That means that the spammer just creates another instance, which gets a new IP address, and continues spamming.’ True enough: as described, instance termination simply isn’t good enough.

My recommendations:

  • as John Levine noted, it’s likely that Amazon need to treat EC2-originated traffic similarly to how an ISP treats their DSL pools — filtering outbound traffic for nastiness, in particular rate-limiting port 25/tcp connections on a per-customer basis, so that an instance run by (or infiltrated by) a spammer cannot produce massive quantities of spam before it is detected and cut off.

    However, I’m not talking about blocking port 25/tcp outbound entirely. That’s not appropriate — an EC2 instance is analogous to a leased colo box in a server farm, and not being able to send mail from our instances would really suck for EC2 users (like myself and my employers).

  • It would help if there were a way to look up customer IDs from the IP address of the EC2 nodes they’re using — either via WHOIS or through rDNS. Even an opaque customer ID string would allow anti-abuse teams to correlate a single customer’s activity as they cycle through EC2 instances. This would allow those teams to deal with the reputation of Amazon’s customers, instead of Amazon’s own rep, analogous to how “traditional” hosters use SWIP to publicize their reassignments of IPs between their customers.
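
As a rough illustration of the first recommendation, here is a minimal Python sketch of per-customer rate-limiting for outbound port 25/tcp connections, using a token bucket. Everything in it is an assumption for illustration: the class names (`TokenBucket`, `SmtpLimiter`), the default rate and burst, and the idea that the limiter can see a stable customer ID. It is not a description of anything Amazon actually runs.

```python
import time


class TokenBucket:
    """Classic token bucket: refills at `rate` tokens/sec, up to `capacity`."""

    def __init__(self, rate, capacity, now):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = now

    def allow(self, now):
        # Refill in proportion to the time elapsed since the last check.
        elapsed = max(0.0, now - self.last)
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class SmtpLimiter:
    """Limits new outbound SMTP connections per customer, not per IP."""

    def __init__(self, rate=1.0, burst=10, clock=time.monotonic):
        self.rate = rate      # hypothetical: sustained connections per second
        self.burst = burst    # hypothetical: short-term allowance
        self.clock = clock
        self.buckets = {}

    def allow_connection(self, customer_id):
        now = self.clock()
        bucket = self.buckets.get(customer_id)
        if bucket is None:
            bucket = TokenBucket(self.rate, self.burst, now)
            self.buckets[customer_id] = bucket
        return bucket.allow(now)


limiter = SmtpLimiter(rate=1.0, burst=3)
print([limiter.allow_connection("customer-a") for _ in range(5)])
```

The bucket is keyed on the customer rather than the instance or IP address, so terminating an instance and starting another does not reset the allowance. That is the same reasoning as the customer-ID lookup suggestion: the reputation that matters is the customer's.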

There’s some more discussion buried in a load of knee-jerking on the NANOG thread. Here are a few good snippets:

Jon Lewis: ‘I got the impression the only thing Amazon considers abuse is use of their servers and not paying the bill. If you’re a paying customer, you can do whatever you like.’ (ouch.)

Ken Simpson: ‘IMHO, Amazon will eventually be forced to bifurcate their EC2 IP space into a section that is for “newbies” and a section for established customers. The newbie space will be widely black-listed, but will also have a lower rate of abuse complaint enforcement. The only scalable way to deal with a system like EC2 is to provide clear demarcations of where the crap is likely to originate from.’

Bill Herrin: ‘From an address-reputation perspective EC2 is no different than, say, China. Connections from China start life much closer to my filtering threshold that connections from Europe because a far lower percentage of the connections from China are legitimate. EC2 will get the same treatment.’

There’s also an earlier thread here.

Anyway, this issue is on fire — Amazon need to get the finger out and deal with it quickly and effectively, before EC2 does start to run into widespread blocks. I’m already planning migration of our mail-sending components off of EC2; we’re already seeing blocks of mail sent from it, and it’s looking likely that these will increase. :(

(It’s worth noting that a block of EC2’s netblocks today will produce a load of false positives, mainly on transactional mail, if you’re contemplating it. So I wouldn’t recommend it. But a lot of sites are willing to accept a few FPs, it seems.)

Hack: twitter_no_popups.user.js

Twitter has this nasty habit: if you come across a tweet in your feed reader containing a URL, and you want to follow that link, you can’t, because Twitter doesn’t auto-link URLs in its RSS feeds. Instead, you have to click on the feed item itself, wait for that to open in the browser, then click on the link in the new browser tab. That link will, in turn, open in another new tab.

Here’s a quick-hack Greasemonkey user script to inhibit this second new-tab:

twitter_no_popups.user.js

links for 2008-07-01

How To Eat a Mangosteen

‘You’ll know what my riddle means
When you’ve eaten mangosteens.’
The Crab That Played with the Sea, by Rudyard Kipling

When I travelled through Thailand, I got rightly hooked on the delicious mangosteen, traditionally dubbed the “Queen of Fruit” by the Thais. I’ve been keeping an eye out ever since, through our travels to the US and back, without any luck. (In particular, they’ve been blocked by US customs for a long time, although reportedly this is changing nowadays.)

Finally, last year, they appeared in our local Tesco supermarket here in Ireland — or at least, an empty box appeared, sans fruit! That was it, though, until a couple of weeks ago, when my friend Bob was lucky enough to come across a few, and grabbed 4 for me. (Thanks Bob!)

It appears they’re in season around the start of June, which is when they make it to Tesco’s. Naturally, they’re much more expensive here — Tesco were selling them for about EUR 1.20 each, whereas a bag of 30 were about 50 cents when we used to buy them at the street-side in Ko Chang. But that’s to be expected, really.

Since they’re tricky enough to get hold of, I thought I should document exactly what to do with them once you get ’em ;)

They start off looking like this, roughly tomato-sized fruit with a thick, papery rind:

img

Get your thumbnail into the rind (not too deep, though!) and tear it off like so:

img

Look at the rind’s great colour! Watch out for it, though, as it stains clothing easily. Discard the rind, and pluck out the fleshy, juicy white segments:

img

(Pay no attention to their resemblance to testicles. ;)

Finally you’ll wind up with 6 or so seedless segments, and 1 or 2 seed-bearing segments, larger than the others, containing a large inedible seed along with a fair bit of flesh:

img

Eat ’em and enjoy the flavour — it’s a bit like a tart, vanilla-y peach, but juicier, creamier and much smoother in texture. Mmmm, truly delicious. I’m looking forward to picking up some more soon!

I considered planting the seeds, but unfortunately, you can forget about growing a tree in your back yard; the mangosteen tree requires a tropical climate:

‘The mangosteen is ultra-tropical. It cannot tolerate temperatures below 40º F (4.44º C), nor above 100º F (37.78º C). Nursery seedlings are killed at 45º F (7.22º C).’

Ah well. Seems I’ll be at Tesco’s mercy for more.

links for 2008-06-30

links for 2008-06-27

links for 2008-06-26

links for 2008-06-25

links for 2008-06-23

VCS and the 1993 internet

Joey Hess suggests that current discussions about the superfluity of DVCS systems have a parallel in how the internet protocol world, circa 1993, played out:

I’m reminded of 1993. Using the internet at that time involved using a mishmash of stuff — Telnet, FTP, Gopher, strange things called Archie and Veronica. Or maybe this CERN “web” thing that Tim Berners-Lee had just invented a few years before, but that mostly was useful to particle physicists.

Then in 1994 a few more people put up web sites, then more and more, and suddenly there was an inflection point. Suddenly we were all browsing the web and all that other stuff seemed much more specialised and marginalised.

I would disagree, a little. Back in the early ’90s, I was a sysadmin playing around with internet- and intranet-facing TCP/IP services (although in those days, the term “intranet” hadn’t been coined yet), so I gained a fair bit of experience at the coal-face in this regard. The mish-mash of protocols (telnet, gopher, Archie, WAIS, FTP, NNTP, and so on) all had their own worlds and their own views of the ‘net. What changed this in 1993 was not so much the arrival of HTTP, but TimBL’s other creation: the URL.

The URL allowed all those balkanized protocols to be supported by one WWW client, and allowed a HTML document to “link” to any other protocol —

The WWW browsers can access many existing data systems via existing protocols (FTP, NNTP) or via HTTP and a gateway. In this way, the critical mass of data is quickly exceeded, and the increasing use of the system by readers and information suppliers encourage each other.

This was a great “embrace and extend” manoeuvre by TimBL, in my opinion — by embracing the existing base of TCP/IP protocols, the WWW client became the ideal user interface to all of them. Once NCSA Mosaic came along, there really was no alternative to rival the Web’s ease of use. This was the case even if you didn’t have a HTTP server of your own; you could still access HTML documents and remote URLs.

In essence, HTML and the URL were the trojan horse, paving the way for HTTP (as HTML’s native distribution protocol) to succeed. It wasn’t the web sites that helped the WWW “win”, but embrace-and-extend via the URL.
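
To illustrate the point about the URL unifying those protocols, here is a small Python sketch of how a single client can pick a protocol handler purely from the URL's scheme. The `HANDLERS` table, the `describe` function and the example.org hostnames are invented for illustration; only `urllib.parse.urlsplit` is a real library call.

```python
from urllib.parse import urlsplit

# Hypothetical dispatch table: one client, many pre-existing protocols.
HANDLERS = {
    "http": "HTTP request",
    "ftp": "FTP retrieval",
    "gopher": "Gopher menu or document fetch",
    "nntp": "NNTP article fetch",
    "telnet": "interactive Telnet session",
}


def describe(url):
    """Return (scheme, host, action) for a URL, based only on its scheme."""
    parts = urlsplit(url)
    action = HANDLERS.get(parts.scheme, "unsupported scheme")
    return parts.scheme, parts.hostname, action


for url in (
    "http://www.example.org/hypertext/WWW/TheProject.html",
    "ftp://ftp.example.org/pub/README",
    "gopher://gopher.example.org/1/docs",
    "telnet://library.example.org",
):
    print(describe(url))
```

The hyperlink itself carries the protocol choice, so a document author can point at an FTP archive or a Gopher menu as easily as at another HTML page, without the server at the other end needing to change at all.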

For what it’s worth, I think there is an interesting parallel in today’s DVCS world: git-svn.

links for 2008-06-18

Firefox Download Evening

Happy Firefox Download Day — or rather, Firefox Download Evening!

It turns out that the “day” in question has been defined as a 24-hour period starting at 10am Pacific Time; rather than compensating for the effects of timezones around the world, they’ve just picked an arbitrary 24-hour period.

That’s 6pm in Irish time, for example. At least I’m not one of the 57,000 Japanese pledgers, who’d be waiting up until 2am to kick off their download. It seems a little bizarre that there’s little leeway provided for non-US downloaders, who are right now twiddling their thumbs, waiting, while their “day” passes.
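
The timezone arithmetic can be checked with a short Python sketch. It assumes the 24-hour window opens at 10am Pacific on June 17, 2008, and uses fixed summer-time UTC offsets (PDT = UTC-7, Irish Summer Time = UTC+1, JST = UTC+9) instead of a timezone database, so it is only valid for that date.

```python
from datetime import datetime, timedelta, timezone

# Fixed offsets in force in June 2008 (assumed; no DST rules are modelled).
PDT = timezone(timedelta(hours=-7), "PDT")
IST = timezone(timedelta(hours=1), "IST")
JST = timezone(timedelta(hours=9), "JST")

# Start of the "day": 10am Pacific on the launch date.
start = datetime(2008, 6, 17, 10, 0, tzinfo=PDT)
end = start + timedelta(hours=24)


def local(moment, tz):
    """Format a moment as local wall-clock time in the given zone."""
    return moment.astimezone(tz).strftime("%Y-%m-%d %H:%M %Z")


for tz in (PDT, IST, JST):
    print(local(start, tz), "to", local(end, tz))
```

For Ireland the window opens at 18:00 on the 17th, and for Japan at 02:00 on the 18th, which matches the 6pm and 2am figures above.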

Annoyingly, the main world record page simply says ‘the official date for the launch of Firefox 3 is June 17, 2008’ — no mention of a starting time or official timezone at all!

This is the top thread on their forum right now — in addition to the omission of an entire continent ;)

links for 2008-06-16

adding to the “Going Dark” and DVCS debate

On programmers “going dark” — Aristotle Pagaltzis writes:

Jeff Atwood argues that open source projects are in real danger of programmers “going dark,” which means they lock themselves away silently for a long time, then surface with a huge patch that implements a complex feature.

It seems to me that this is as much a technological problem as a social issue… and that we have the technological solution figured out: it’s called distributed version control. It means that that lone developer who locked himself in a room need not resurface with a single huge patch – instead, he can come back with a branch implementing the feature in individually comprehensible steps. At the same time, it allows the lone programmer to experiment in private and throw away the most embarrassing mistakes, addressing part of the social problem.

However, I don’t think he realised that the Jeff Atwood story he responded to was in fact an echo of Ben Collins-Sussman’s original article, where he specifically picked out DVCS as a source of this danger:

A friend of mine works on several projects that use git or mercurial. He gave me this story recently. Basically, he was working with two groups on a project. One group published changes frequently…

“…and as a result, I was able to review consistently throughout the semester, offering design tweaks and code reviews regularly. And as a result of that, [their work] is now in the mainline, and mostly functional. The other group […] I haven’t heard a peep out of for 5 months. Despite many emails and IRC conversations inviting them to discuss their design and publish changes regularly, there is not a single line of code anywhere that I can see it. […] Last weekend, one of them walked up to me with a bug […] and I finally got to see the code to help them debug. I failed, because there are about 5000 lines of crappy code, and just reading through a single file I pointed out two or three major design flaws and a dozen wonky implementation issues. I had admonished them many times during these 5 months to publish their changes, so that we (the others) could take a look and offer feedback… but each time met with stony silence. I don’t know if they were afraid to publish it, or just don’t care. But either way, given the code I’ve seen, the net result is 5 wasted months.”

Before you scream; yes yes, I know that the potential for cave-hiding and writing code bombs is also possible with a centralized version control system like Subversion. But my friend has an interesting point:

“I think this failure is at least partially due to the fact that [DVCS] makes it so damn easy to wall yourself into a cave. Had we been using svn, I think the barrier to caving would have been too high, and I’d have seen the code.”

In other words, yes, this was fundamentally a social problem. A team was embarrassed to share code. But because they were using distributed version control, it gave them a sense of false security. “See, we’re committing changes to our repository every day… making progress!” If they had been using Subversion, it’s much less likely they would have sat on a 5000 line patch in their working copy for 5 months; they would have had to share the work much earlier.

To be honest, I’d tend to agree with Aristotle; just because centralized VC makes it harder to maintain a “private branch” with this “high barrier to caving”, and this therefore imposes a technical pressure to fix a social problem, doesn’t mean that is a good thing. I’d prefer to fix the DVCS to apply social pressure, and have both working tools and a working social organisation.

Another commenter on Ben’s original post put it well:

I [..] disagree, strongly, that DVCS makes code hiding any more difficult than single-branch VCS. When using a single branch, it’s usually a very small group of people who are allowed to commit. Any patches from non-core contributors get lost in a tangle of IRC pastebins, mailing lists, bug trackers, and blog posts. Furthermore, even if these patches are eventually committed, they have lost all their associated version information — the destructive rebase you complain about. DVCS allows anybody to branch from trunk, record their changes, and publish their branch in a service like Launchpad or github. For an example of this, look at the mass of user-created branches for popular projects like GNOME Do or AWN.

It’s very interesting to see those Launchpad sites, in my opinion.

I’ve spent many years shepherding contributions to SpamAssassin through our Bugzilla. We’ve often lost rule contributors, who are particularly hard to attract for some reason, due to delays and human overhead involved in this method. :( So an improved interface for this would be very useful…

links for 2008-06-12

links for 2008-06-11