Links for 2013-10-11

Published October 11, 2013

New faculty positions versus new PhDs

The ever-plummeting chances of a PhD finding a faculty job:
Since 1982, almost 800,000 PhDs were awarded in science and engineering fields, whereas only about 100,000 academic faculty positions were created in those fields within the same time frame. The number of S&E PhDs awarded annually has also increased over this time frame, from ~19,000 in 1982 to ~36,000 in 2011. The number of faculty positions created each year, however, has not changed, with roughly 3,000 new positions created annually.
(via Javier Omar Garcia)
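For scale, here is the arithmetic behind those figures as a quick sketch. The numbers are the rounded ones quoted above; the variable names are mine.

```python
# Rounded figures quoted in the excerpt above.
phds_awarded_since_1982 = 800_000
faculty_positions_created = 100_000

# Overall: PhDs awarded per faculty position created over the whole period.
phds_per_position_overall = phds_awarded_since_1982 / faculty_positions_created  # 8.0

# Per year: PhD output grew while new positions stayed at roughly 3,000.
new_positions_per_year = 3_000
phds_per_position_1982 = 19_000 / new_positions_per_year  # about 6.3
phds_per_position_2011 = 36_000 / new_positions_per_year  # 12.0

print(phds_per_position_overall, phds_per_position_1982, phds_per_position_2011)
```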

(tags: via:javier career academia phd science work study research)
The Ethics of Autonomous Cars

Sometimes good judgment can compel us to act illegally. Should a self-driving vehicle get to make that same decision?

(tags: ethics stories via:chris-horn the-atlantic driving cars law robots self-driving-vehicles)
Timecop

'A Ruby gem providing "time travel" and "time freezing" capabilities, making it dead simple to test time-dependent code. It provides a unified method to mock Time.now, Date.today, and DateTime.now in a single call.' This is about the nicest mock-time library I've found so far. (via Ben)
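Timecop itself is Ruby. For comparison, here is a rough Python analogue of the "time freezing" idea using the standard library's `unittest.mock`. This is not Timecop's API, and `seconds_until` is a made-up function standing in for the time-dependent code under test.

```python
import time
from unittest import mock


def seconds_until(deadline):
    """Hypothetical time-dependent code: seconds left before a deadline."""
    return deadline - time.time()


# Freeze the clock at an arbitrary instant for the duration of the block.
with mock.patch("time.time", return_value=1_000.0):
    remaining = seconds_until(1_060.0)

print(remaining)  # 60.0, regardless of when the test actually runs
```

Unlike Timecop, this only patches `time.time`; `datetime.now()` and friends would need their own patches.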

(tags: time ruby testing coding unit-tests mocking timecop via:ben)
The 29 Stages Of A Twitterstorm

this is brilliant

(tags: uk twitter media funny pricehound racism outrage pitchforks rage social-media)

Links for 2013-10-10

Published October 10, 2013

'Experience of software engineers using TLA+, PlusCal and TLC' [slides] [pdf]

by Chris Newcombe, an AWS principal engineer. Several Amazonians sharing their results in simulating tricky distributed-systems problems using formal methods

(tags: tla+ pluscal tlc formal-methods simulation proving aws amazon architecture design)
LinkBench: A database benchmark for the social graph

However, the gold standard for database benchmarking is to test the performance of a system on the real production workload, since synthetic benchmarks often don't exercise systems in the same way. When making decisions about a significant component of Facebook's infrastructure, we need to understand how a database system will really perform in Facebook's production workload. [....] LinkBench addresses these needs by replicating the data model, graph structure, and request mix of our MySQL social graph workload.
Mentioned in a presentation from Peter Bailis, http://www.hpts.ws/papers/2013/bailis-hpts-2013.pdf

(tags: graph databases mysql facebook performance testing benchmarks workloads)

Links for 2013-10-09

Published October 9, 2013

pt-summary

from the Percona toolkit. 'Conveniently summarizes the status and configuration of a server. It is not a tuning tool or diagnosis tool. It produces a report that is easy to diff and can be pasted into emails without losing the formatting. This tool works well on many types of Unix systems.' --- summarises OOM history, top, netstat connection table, interface stats, network config, RAID, LVM, disks, inodes, disk scheduling, mounts, memory, processors, and CPU.

(tags: percona tools cli unix ops linux diagnosis raid netstat oom)
How much can an extra hour's sleep change you?

What they discovered is that when the volunteers cut back from seven-and-a-half to six-and-a-half hours' sleep a night, genes that are associated with processes like inflammation, immune response and response to stress became more active. The team also saw increases in the activity of genes associated with diabetes and risk of cancer. The reverse happened when the volunteers added an hour of sleep.

(tags: sleep health rest cancer bbc science)
Kovet

some great phone cases from an Irish company, with nifty art by Irish illustrators and artists including Fatti Burke and Chris Judge

(tags: chris-judge fatti-burke illustrators art ireland iphone cases)

Links for 2013-10-08

Published October 8, 2013

What drives JVM full GC duration

Interesting empirical results using JDK 7u21:
Full GC duration depends on the number of objects allocated and the locality of their references. It does not depend that much on actual heap size.
Reference locality has a surprisingly high effect.

(tags: java jvm data gc tuning performance cms g1)
Rhizome | Occupy.here: A tiny, self-contained darknet

Occupy.here began two years ago as an experiment for the encampment at Zuccotti Park. It was a wifi router hacked to run OpenWrt Linux (an operating system mostly used for computer networking) and a small "captive portal" website. When users joined the wifi network and attempted to load any URL, they were redirected to http://occupy.here. The web software offered up a simple BBS-style message board providing its users with a space to share messages and files.
Nifty project from Dan Phiffer.

(tags: occupy.here openwrt hacking wifi network community)
Whatever Happened to "Due Process" ?

Mark Jeftovic is on fire after receiving yet another "take down this domain or else" mail from the City of London police:
We have an obligation to our customers and we are bound by our Registrar Accreditation Agreements not to make arbitrary changes to our customers settings without a valid FOA (Form of Authorization). To supersede that we need a legal basis. To get a legal basis something has to happen in court. [...] What gets me about all of this is that the largest, most egregious perpetrators of online criminal activity right now are our own governments, spying on their own citizens, illegally wiretapping our own private communications and nobody cares, nobody will answer for it, it's just an out-of-scope conversation that is expected to blend into the overall background malaise of our ever increasing serfdom. If I can't make various governments and law enforcement agencies get warrants or court orders before they crack my private communications then I can at least require a court order before I takedown my own customer.

(tags: city-of-london police takedowns politics mark-jeftovic easydns registrars dns via:tjmcintyre)
Intellectual Ventures' Evil Knows No Bounds: Buys Patent AmEx Donated For Public Good... And Starts Suing

The problem with software patents, part XVII.
So you have a situation where even when the original patent holder donated the patent for "the public good," sooner or later, an obnoxious patent troll like IV comes along and turns it into a weapon. Again: AmEx patented those little numbers on your credit card, and then for the good of the industry and consumer protection donated the patent to a non-profit, who promised not to enforce the patent against banks... and then proceeded to sell the patent to Intellectual Ventures who is now suing banks over it.

(tags: intellectual-ventures scams patents swpats shakedown banking cvv american-express banks amex cmaf)

Links for 2013-10-07

Published October 7, 2013

SPSC revisited part III - FastFlow + Sparse Data

holy moly. This is some heavily-optimized mechanical-sympathy Java code. By using a sparse data structure, cache-aligned fields, and wait-free low-level CAS concurrency primitives via sun.misc.Unsafe, a single-producer/single-consumer queue implementation goes pretty damn fast compared to the current state of the art
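To give a flavour of the general shape, here is a minimal Python sketch of a single-producer/single-consumer ring buffer in which an empty slot (`None`) is the only signal passed between the two sides, so neither side reads the other's index. That design choice is my assumption about this family of queues, not code from the post, and Python has none of the cache-alignment, sparse layout, or `sun.misc.Unsafe` tricks that make the Java version fast.

```python
class SpscRingBuffer:
    """Toy SPSC queue: slot contents signal full/empty. Items must not be None."""

    def __init__(self, capacity):
        self._slots = [None] * capacity
        self._head = 0  # next slot to read; touched only by the consumer
        self._tail = 0  # next slot to write; touched only by the producer

    def offer(self, item):
        """Producer side: returns False if the queue is full."""
        if self._slots[self._tail] is not None:
            return False  # consumer has not emptied this slot yet
        self._slots[self._tail] = item
        self._tail = (self._tail + 1) % len(self._slots)
        return True

    def poll(self):
        """Consumer side: returns None if the queue is empty."""
        item = self._slots[self._head]
        if item is None:
            return None
        self._slots[self._head] = None  # hand the slot back to the producer
        self._head = (self._head + 1) % len(self._slots)
        return item


queue = SpscRingBuffer(capacity=2)
accepted = [queue.offer(1), queue.offer(2), queue.offer(3)]  # third offer hits a full queue
drained = [queue.poll(), queue.poll(), queue.poll()]         # third poll hits an empty queue
```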

(tags: nitsanw optimization concurrency java jvm cas spsc queues data-structures algorithms)
Non-blocking transactional atomicity

interesting new distributed atomic transaction algorithm from Peter Bailis

(tags: algorithms database distributed scalability storage peter-bailis distcomp)

Links for 2013-10-06

Published October 6, 2013

ZeroMQ: Helping us Block Malicious Domains in Real Time - Umbrella Security Labs

nice writeup of a ZeroMQ/Hadoop event processing pipeline architecture

(tags: zeromq hadoop event-processing architecture dns backend reputation)

the coming world of automated mass anti-terror false positives

Published October 6, 2013

Man sues RMV after driver's license mistakenly revoked by automated anti-terror false positive:

John H. Gass hadn’t had a traffic ticket in years, so the Natick resident was surprised this spring when he received a letter from the Massachusetts Registry of Motor Vehicles informing him to cease driving because his license had been revoked. [...] After frantic calls and a hearing with Registry officials, Gass learned the problem: An antiterrorism computerized facial recognition system that scans a database of millions of state driver’s license images had picked his as a possible fraud. “We send out 1,500 suspension letters every day,” said Registrar Rachel Kaprielian. [...] “There are mistakes that can be made.”

See also this New Scientist story. This story notes that the system's pretty widespread:

Massachusetts bought the system with a $1.5 million grant from the Department of Homeland Security. At least 34 states use such systems, which law enforcement officials say help prevent identity theft and ID fraud.

In my opinion, this kind of thing -- trial by inaccurate, false-positive-prone algorithm -- is one of the most worrying things about the post-PRISM world.

When we created SpamAssassin, we were well aware of the risk of automated misclassification. Any machine-learning classifier will always make mistakes. The key is to carefully calibrate the expected false-positive/false-negative ratio so that the more harmful a kind of misclassification is, the more rarely it is expected to happen.

These anti-terrorism machine learning systems are calibrated to catch as many potential cases as possible, but by aiming to reduce false negatives to this degree, they become wildly prone to false positives. And when they're applied as a dragnet across all citizens' interactions with the state -- or even in the case of PRISM, all citizens' interactions that can be surveilled en masse -- it's going to create buckets of bureaucratic false-positive horror stories, as random innocent citizens are incorrectly tagged as criminals due to software bugs and poor calibration.
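
To make the base-rate problem concrete, here is a small worked example. Every number in it is made up for illustration (the population size, the number of genuine cases, and both error rates); none of them come from the RMV system or any real deployment.

```python
def flagged_breakdown(population, genuine_cases, true_positive_rate, false_positive_rate):
    """Split the people a classifier flags into correct and incorrect hits."""
    innocent = population - genuine_cases
    true_positives = genuine_cases * true_positive_rate
    false_positives = innocent * false_positive_rate
    # Fraction of flagged people who are genuine cases.
    precision = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, precision


# Hypothetical dragnet: 1,000,000 people, 100 genuine cases, and a classifier
# tuned to catch 99% of them at the cost of a 1% false-positive rate.
true_positives, false_positives, precision = flagged_breakdown(
    population=1_000_000,
    genuine_cases=100,
    true_positive_rate=0.99,
    false_positive_rate=0.01,
)
print(round(true_positives), round(false_positives), round(precision, 4))
```

Under these assumed rates, roughly 9,999 innocent people are flagged alongside about 99 genuine cases, so fewer than 1 in 100 of the people flagged are actually guilty.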

Links for 2013-10-05

Published October 5, 2013

Rapid read protection in Cassandra 2.0.2

Nifty new feature -- if a request takes over the 99th percentile for requests to that server, it'll be repeated against another replica. Unnecessary for Voldemort, of course, which queries all replicas anyway!
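The general idea can be sketched in a few lines of Python. This is only an illustration of "retry against another replica once a latency threshold passes", not Cassandra's implementation or configuration; `speculative_read`, the replica functions and the 50ms threshold are all hypothetical.

```python
import concurrent.futures as cf
import time


def speculative_read(replicas, key, p99_seconds):
    """Ask the first replica; if it is slower than p99_seconds, also ask a second."""
    pool = cf.ThreadPoolExecutor(max_workers=2)
    try:
        first = pool.submit(replicas[0], key)
        done, _ = cf.wait([first], timeout=p99_seconds)
        if done:
            return first.result()
        # The first replica exceeded the threshold: repeat the request elsewhere
        # and take whichever answer arrives first.
        second = pool.submit(replicas[1], key)
        done, _ = cf.wait([first, second], return_when=cf.FIRST_COMPLETED)
        return done.pop().result()
    finally:
        pool.shutdown(wait=False)


def slow_replica(key):
    time.sleep(0.5)  # simulates a replica stuck in a GC pause or similar
    return ("slow", key)


def fast_replica(key):
    return ("fast", key)


answer = speculative_read([slow_replica, fast_replica], "user:42", p99_seconds=0.05)
print(answer)
```

The trade-off is a small amount of duplicated work on the slowest requests in exchange for a shorter latency tail.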

(tags: cassandra nosql replication distcomp latency storage)

Links for 2013-10-04

Published October 4, 2013

Attacking Tor: how the NSA targets users' online anonymity

As part of the Turmoil system, the NSA places secret servers, codenamed Quantum, at key places on the internet backbone. This placement ensures that they can react faster than other websites can. By exploiting that speed difference, these servers can impersonate a visited website to the target before the legitimate website can respond, thereby tricking the target's browser to visit a Foxacid server.
whoa, I missed this before.

(tags: nsa gchq packet-injection attacks security backbone http latency)
GCHQ report on 'MULLENIZE' program to 'stain' anonymous electronic traffic

By modifying the User-Agent: header string, each HTTP transaction is "stained" to allow tracking. huh

(tags: gchq nsa snooping sniffing surveillance user-agent http browsers leaks)

Links for 2013-10-03

Published October 3, 2013

Giving Docker/LXC containers a routable IP address

ugh, this is a mess. Docker, automate this crap

(tags: docker routing linux ops networking containers virtualization)
How the feds took down the Dread Pirate Roberts | Ars Technica

Well-written, comprehensive writeup of the Silk Road takedown, and the libertarian craziness of Ross William Ulbricht, its alleged owner and operator

(tags: silk-road drugs crazy ross-william-ulbricht fbi libertarian murder tor)
Patent troll Lodsys chickens out, folds case rather than face Eugene Kaspersky

In Kaspersky's view, patent trolls are no better than the extortionists who cropped up in Russia after the fall of the Soviet Union, when crime ran rampant. Kaspersky saw more and more people becoming victims of various extortion schemes. US patent trolls seemed very similar. "Kaspersky's view was that paying patent trolls was like paying a protection racket," said Kniser. He wasn't going to do it.
yay! pity it didn't manage to establish precedent, though. But go Kaspersky!

(tags: eugene-kaspersky shakedowns law east-texas swpats patents patent-trolls)
Sergio Bossa's thoughts about Datomic

good comments from Sergio, particularly about the scalability of the single transactor in the Datomic architecture. I agree it's a worrying design flaw

(tags: clojure nosql datomic sergio-bossa transactor spof architecture storage)
Codex Seraphinianus: A new edition of the strangest book in the world

Excited! one commenter claims a paperback of the new edition of Luigi Serafini's masterwork should cost about $75 when it comes out in a couple of months. sign me up, this is an amazing work

(tags: codex-seraphinianus art weird strange books luigi-serafini)
The Snowden files: why the British public should be worried about GCHQ

When the Guardian offered John Lanchester access to the GCHQ files, the journalist and novelist was initially unconvinced. But what the papers told him was alarming: that Britain is sliding towards an entirely new kind of surveillance society

(tags: john-lanchester gchq guardian surveillance snooping police-state nsa privacy government)

Links for 2013-10-02

Published October 2, 2013

Groundbreaking Results for High Performance Trading with FPGA and x86 Technologies

The enhancement in performance was achieved by providing a fast-path where trades are executed directly by the FPGA under the control of trigger rules processed by the x86 based functions. The latency is reduced further by two additional techniques in the FPGA – inline parsing and pre-emption. As market data enters the switch, the Ethernet frame is parsed serially as bits arrive, allowing partial information to be extracted and matched before the whole frame has been received. Then, instead of waiting until the end of a potential triggering input packet, pre-emption is used to start sending the overhead part of a response which contains the Ethernet, IP, TCP and FIX headers. This allows completion of an outgoing order almost immediately after the end of the triggering market feed packet.
Insane stuff. (Via Martin Thompson)

(tags: via:martin-thompson insane speed low-latency fpga fast-path trading stock-markets performance optimization ethernet)
Why Tellybug moved from Cassandra to Amazon DynamoDB

Summary: poor reliability on Cassandra; better latencies and cheaper (!) on DynamoDB

(tags: aws dynamodb cassandra nosql storage tellybug counters scalability reliability latency)
The Best Bike Lock

Interviews with 2 New York bike thieves (one bottom feeder, one professional), reviewing the current batch of bicycle locks. Summary: U-locks are good, when used correctly, particularly the Kryptonite New York Lock ($80). On the other hand, Dublin's recent spate of thefts is largely driven by the wide availability of battery-powered angle grinders (thanks Lidl!), which, according to this article, are relatively quiet and extremely fast. :(

(tags: bike review locks cycling u-locks theft security)
Fingerprints are Usernames, not Passwords

I could see some value, perhaps, in a tablet that I share with my wife, where each of us have our own accounts, with independent configurations, apps, and settings. We could each conveniently identify ourselves by our fingerprint. But biometrics cannot, and absolutely must not, be used to authenticate an identity. For authentication, you need a password or passphrase. Something that can be independently chosen, changed, and rotated. [...] Once your fingerprint is compromised (and, yes, it almost certainly already is, if you've crossed an international border or registered for a driver's license in most US states), how do you change it? Are you starting to see why this is a really bad idea?

(tags: biometrics apple security fingerprints passwords authentication authorization identity)
Silk Road busted

This is a pretty good summary of the salient points from the criminal complaint against Ross William Ulbricht -- I'd say it's pretty bad news for any users of the dodgy site, particularly given this:
"During the 60-day period from May 24, 2013 to July 23, 2013, there were approximately 1,217,218 communications sent between Silk Road users through Silk Road's private-message system."
According to the complaint, those are now in the FBI's hands -- likely unencrypted.

(tags: crime silk-road drugs busts tor ross-william-ulbricht fbi)
Vitamin T: Hold the Salsa, New York Times! We've Got Something to Taco ‘Bout - Digest - Los Angeles magazine

ouch. some serious slagging here, along with taco science. (BTW we have the same problem with carne asada in Ireland, our taquerias use the cheater method too, sadly)

(tags: la tacos mexican food new-york slagging burritos taquerias carne-asada)
Edward Snowden's E-Mail Provider Defied FBI Demands to Turn Over SSL Keys, Documents Show

Levison lost [in secret court against the government's order]. In a work-around, Levison complied the next day by turning over the private SSL keys as an 11 page printout in 4-point type. The government called the printout “illegible” and the court ordered Levison to provide a more useful electronic copy.
Nice try though! Bottom line is they demanded the SSL private key. (via Waxy)

(tags: government privacy security ssl tls crypto fbi via:waxy secrecy snooping)
Poisson Rouge: Crowdfunding Red Fish style

the fantastic French kids' site is now crowdfunding new work -- first off being a German Alphabet part of the site. My kids love their stuff, so -- bonne chance!

(tags: french poisson-rouge flash web kids children education)

Links for 2013-10-01

Published October 1, 2013

How an Engineer Earned 1.25 Million Air Miles By Buying Pudding

An amazing hack. 'Air Miles are awesome, they can be used to score free flights, hotel stays and if you’re really lucky, the scorn and hatred of everyone you come in contact with who has to pay full price when they travel. The king of all virtually free travelers is one David Phillips, a civil engineer who teaches at the University of California, Davis. David came to the attention of the wider media when he managed to convert about 12,150 cups of Healthy Choice chocolate pudding [costing $3000] into over a million Air Miles. Ever since, David and his entire family have been travelling the world for next to nothing.' (via al3xandru)

(tags: via:al3xandru hacks cool pudding small-print air-miles free)
Down the Rabbit Hole

An adventure that takes you through several popular Java language features and shows how they compile to bytecode and eventually JIT to assembly code.

(tags: charles-nutter java jvm compilation reversing talks slides)

Links for 2013-09-30

Published September 30, 2013

Model checking for highly concurrent code

Applied formal methods in order to test distributed systems -- specifically GlusterFS:
I'll use an example from my own recent experience. I'm developing a new kind of replication for GlusterFS. To make sure the protocol behaves correctly even across multiple failures, I developed a Murphi model for it. [...] I added a third failure [to the simulated model]. I didn't expect a three-node system to continue working if more than one of those were concurrent (the model allows the failures to be any mix of sequential and concurrent), but I expected it to fail cleanly without reaching an invalid state. Surprise! It managed to produce a case where a reader can observe values that go back in time. This might not make much sense without knowing the protocol involved, but it might give some idea of the crazy conditions a model checker will find that you couldn't possibly have considered. [...] So now I have a bug to fix, and that's a good thing. Clearly, it involves a very specific set of ill-timed reads, writes, and failures. Could I have found it by inspection or ad-hoc analysis? Hell, no. Could I have found it by testing on live systems? Maybe, eventually, but it probably would have taken months for this particular combination to occur on its own. Forcing it to occur would require a lot of extra code, plus an exerciser that would amount to a model checker running 100x slower across machines than Murphi does. With enough real deployments over enough time it would have happened, but the only feasible way to prevent that was with model checking. These are exactly the kinds of bugs that are hardest to fix in the field, and that make users distrust distributed systems, so those of us who build such systems should use every tool at our disposal to avoid them.
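For a feel of what a model checker does, here is a toy explicit-state checker in plain Python. It is not Murphi and the model is not the GlusterFS protocol: it is a made-up example of two processes doing a non-atomic increment (read, then write), explored exhaustively across all interleavings until an invariant breaks.

```python
from collections import deque


def successors(state):
    """Yield every state reachable by letting one process take one step."""
    counter, procs = state
    for i, (pc, local) in enumerate(procs):
        if pc == 0:    # step 1: read the shared counter into a local
            stepped, new_counter = (1, counter), counter
        elif pc == 1:  # step 2: write local + 1 back to the shared counter
            stepped, new_counter = (2, local), local + 1
        else:          # process has finished
            continue
        yield (new_counter, procs[:i] + (stepped,) + procs[i + 1:])


def check(initial, invariant):
    """Breadth-first search; returns the shortest path to a violating state, or None."""
    seen = {initial}
    queue = deque([(initial, [initial])])
    while queue:
        state, path = queue.popleft()
        if not invariant(state):
            return path
        for nxt in successors(state):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, path + [nxt]))
    return None


def no_lost_update(state):
    """Invariant: once both processes are done, the counter must be 2."""
    counter, procs = state
    return not all(pc == 2 for pc, _ in procs) or counter == 2


initial = (0, ((0, 0), (0, 0)))  # (shared counter, ((pc, local), (pc, local)))
counterexample = check(initial, no_lost_update)
for step in counterexample:
    print(step)
```

The checker finds the classic lost update: both processes read 0, both write 1, and the run ends with the counter at 1. Real protocols have far larger state spaces, which is where a dedicated tool earns its keep.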

(tags: model-checking formal-methods modelling murphi distcomp distributed-systems glusterfs testing protocols)
Is Trypophobia a Real Phobia? | Popular Science

ie. "fear of small, clustered holes". Sounds like it's not so much a "phobia" as some kind of innate, visceral disgust response; I get it. 'As for who actually made the word up, that distinction probably belongs to a blogger in Ireland named Louise, Andrews says. According to an archived Geocities page, Louise settled on "trypophobia" (Greek for "boring holes" + "fear") after corresponding with a representative at the Oxford English Dictionary. Louise, Andrews and trypophobia Facebook group members have petitioned the dictionary to include the word. The term will need to be used for years and have multiple petitions and scholarly references before the dictionary accepts it, Andrews says. I, for one, would prefer to forget about it forever.'

(tags: disgusting revulsion fear phobias trypophobia holes ugh innate)
Common phobia you have never heard of: Fear of holes may stem from evolutionary survival response

"We think that everyone has trypophobic tendencies even though they may not be aware of it," said Dr Cole. "We found that people who don't have the phobia still rate trypophobic images as less comfortable to look at than other images. It backs up the theory that we are set-up to be fearful of things which hurt us in our evolutionary past. We have an innate predisposition to be wary of things that can harm us."

(tags: trypophobia holes fear aversion disgust ugh evolution innate)

Links for 2013-09-26

Published September 26, 2013

Mesosphere · Docker on Mesos

This is cool. Deploy Docker container images onto a Mesos cluster: key point, in the description of the Redis example: 'there’s no need to install Redis or its supporting libraries on your Mesos hosts.'

(tags: mesos docker deployment ops images virtualization containers linux)
Call me maybe: Kafka

Aphyr takes a look at Kafka 0.8's replication with the Jepsen test suite. It doesn't go great. Jay Kreps responds here: http://blog.empathybox.com/post/62279088548/a-few-notes-on-kafka-and-jepsen

(tags: jay-kreps kafka replication distributed-systems distcomp networking reliability fault-tolerance jepsen)

Links for 2013-09-25

Published September 25, 2013

The Hole in Our Collective Memory: How Copyright Made Mid-Century Books Vanish - Rebecca J. Rosen - The Atlantic

A book published during the presidency of Chester A. Arthur has a greater chance of being in print today than one published during the time of Reagan.
This is not a gently sloping downward curve. Publishers seem unwilling to sell their books on Amazon for more than a few years after their initial publication. The data suggest that publishing business models make books disappear fairly shortly after their publication and long before they are scheduled to fall into the public domain. Copyright law then deters their reappearance as long as they are owned. On the left side of the graph before 1920, the decline presents a more gentle time-sensitive downward sloping curve.

(tags: business books legal copyright law public-domain reading history publishers amazon papers)

Links for 2013-09-24

Published September 24, 2013

Horse_ebooks is human after all

Curated dissociated text. That's great

(tags: ebooks art horse_ebooks internet twitter markov-chains)
The Slow Winter

(tags: coding funny processors multicore multiprocessing branch-prediction hardware)
To my daughter's high school programming teacher

During the first semester of my daughter's junior/senior year, she took her first programming class. She knew I'd be thrilled, but she did it anyway. When my daughter got home from the first day of the semester, I asked her about the class. "Well, I'm the only girl in class," she said. Fortunately, that didn't bother her, and she even liked joking around with the guys in class. My daughter said that you noticed and apologized to her because she was the only girl in class. And when the lessons started (Visual Basic? Seriously??), my daughter flew through the assignments. After she finished, she'd help classmates who were behind or struggling in class. Over the next few weeks, things went downhill. While I was attending SC '12 in Salt Lake City last November, my daughter emailed to tell me that the boys in her class were harassing her. "They told me to get in the kitchen and make them sandwiches," she said. I was painfully reminded of the anonymous boys who left comments on a Linux Pro Magazine blog post I wrote a few years ago, saying the exact same thing.
I am sick to death of this 'brogrammer' bullshit.

(tags: brogrammers sexism culture tech teaching coding software education)
"The cricket bat that died for Ireland"

The bat had the misfortune of being on display in the shop front of Elvery’s store on O’Connell Street, then Sackville Street, during the Easter Rising. J.W. Elvery & Co. was Ireland’s oldest sports store, specialising in sporting goods and waterproofed wear, with branches in Dublin, Cork (Patrick Street) and London (Conduit Street). [...] Its location, about one block from the GPO, meant it was in the middle of the cross-fire and general destruction of the main street.

(tags: ireland cricket 1916 history easter-rising crossfire sports elverys)
_Availability in Globally Distributed Storage Systems_ [pdf]

empirical BigTable and GFS failure numbers from Google are orders of magnitude higher than naïve independent-failure models predict. (via kragen)
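A back-of-the-envelope sketch of why the independence assumption can be so far off. All the probabilities here are invented for illustration and are not figures from the paper; the "shared event" stands in for anything that takes out every replica at once (a rack, a power domain, a bad rollout).

```python
def naive_all_replicas_down(p_node_down, replicas):
    """Independent-failure model: every replica must fail on its own."""
    return p_node_down ** replicas


def correlated_all_replicas_down(p_node_down, replicas, p_shared_event):
    """Same model, plus a shared event that takes out every replica together."""
    return p_shared_event + (1 - p_shared_event) * p_node_down ** replicas


# Hypothetical numbers: each node is down 0.1% of the time, data is stored on
# 3 replicas, and a shared failure hits all of them 0.001% of the time.
naive = naive_all_replicas_down(0.001, 3)
correlated = correlated_all_replicas_down(0.001, 3, 1e-5)
ratio = correlated / naive
print(naive, correlated, ratio)
```

With these made-up inputs, even a rare shared failure dominates the result, and the unavailability estimate ends up about four orders of magnitude above the naive one.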

(tags: via:kragen failure bigtable gfs statistics outages reliability)
Why We Hate Infographics (And Why You Should)

YES. (via Des Traynor)

(tags: via:destraynor infographics visualization dataviz graphics fail)

Links for 2013-09-23

Published September 23, 2013

Apple iOS 7 surprises as first with new multipath TCP connections - Network World

iOS 7 includes -- and uses -- multipath TCP, right now for device-to-Siri communications.
MPTCP is a TCP extension that enables the simultaneous use of several IP addresses or interfaces. Existing applications – completely unmodified -- see what appears to be a standard TCP interface. But under the covers, MPTCP is spreading the connection’s data across several subflows, sending it over the least congested paths.

(tags: ios7 ios networking apple mptcp tcp protocols fault-tolerance)
_How Hard Can It Be? Designing and Implementing a Deployable Multipath TCP_ [pdf]

(tags: mptcp tcp protocols networking ip)
DynamoDB Local

'a client-side database that supports the complete DynamoDB API, but doesn't manipulate any tables or data in DynamoDB itself. You can write code while sitting in a tree, on the beach, or in the desert. When you are ready to deploy your application, you simply instruct it to connect to the actual DynamoDB endpoint. No other modifications will be needed.' This is good -- an in-memory data store for integration testing is absolutely vital for production usage. (Voldemort does this well, for example.)

(tags: dynamodb aws ec2 testing integration-testing unit-tests)
Excellent Rob Pike quote about algorithmic complexity

'Fancy algorithms are slow when n is small, and n is usually small.' -- Rob Pike
Been there, bought the t-shirt ;)

(tags: rob-pike quotes algorithms big-o complexity coding)
Raft: The Understandable Distributed Consensus Protocol

good slides explaining the Raft protocol

(tags: raft slides presentation distcomp algorithms)

Links for 2013-09-22

Published September 22, 2013

RSA warns developers not to use RSA products

In case you're missing the story here, Dual_EC_DRBG (which I wrote about yesterday) is the random number generator voted most likely to be backdoored by the NSA. The story here is that -- despite many valid concerns about this generator -- RSA went ahead and made it the default generator used for all cryptography in its flagship cryptography library. The implications for RSA and RSA-based products are staggering. In a modestly bad but by no means worst case, the NSA may be able to intercept SSL/TLS connections made by products implemented with BSafe.

(tags: bsafe rsa crypto backdoors nsa security dual_ec_drbg rngs randomness)
A Case Against Cucumber

This is exactly my problem with Cucumber and similar BDD test frameworks.
When I write a Cucumber feature, I have to write the Gherkin that describes the acceptance criteria, and the Ruby code that implements the step definitions. Since the code to implement the step definitions is just normal RSpec (or whichever testing library you use), if someone else is writing the Gherkin, the amount of setup to create a working test should be about the same. So you’re only breaking even! However, I don’t believe that it would really be breaking even. Cucumber adds another layer of indirection on top of your tests. When I’m trying to see why a specific scenario is failing, first I need to find the step that is failing. Since these steps are defined with regular expressions, I have to grep for the step definition.

(tags: ruby testing bdd cucumber rspec coding)
Gamasutra - Opinion: The tragedy of Grand Theft Auto V

This is watching your sharp, witty father start telling old fart jokes as his mind slows down. And as much as the internet is habituated to defending GTA as "satire," what is it satirizing, if everything is either sad or awful? Where is the "satire" when the awful parts no longer seem edgy or provocative, just attempts at catch-all "offense" that aren't honed enough to even connect? Here's a series that has been creating real, meaningful friction with conventional entertainment for as long as I can remember, and rather than push the envelope by creating new kinds of monsters, it's reciting the same old gangland fantasies, like a college boy who can't stop staring at the Godfather II poster on his wall, talking about how he's gonna be a big Hollywood director in between bong rips. You call the trading index BAWSAQ? Oh, bro, you're so funny, you're gonna be huge.

(tags: gamasutra games gaming gta gta-v via:skamille)
CCC | Chaos Computer Club breaks Apple TouchID

"We hope that this finally puts to rest the illusions people have about fingerprint biometrics. It is plain stupid to use something that you can't change and that you leave everywhere every day as a security token", said Frank Rieger, spokesperson of the CCC. "The public should no longer be fooled by the biometrics industry with false security claims. Biometrics is fundamentally a technology designed for oppression and control, not for securing everyday device access." iPhone users should avoid protecting sensitive data with their precious biometric fingerprint not only because it can be easily faked, as demonstrated by the CCC team. Also, you can easily be forced to unlock your phone against your will when being arrested. Forcing you to give up your (hopefully long) passcode is much harder under most jurisdictions than just casually swiping your phone over your handcuffed hands.

Links for 2013-09-18

Published September 18, 2013

Piracy is a 'minority activity', pirates spend more on content, and piracy rates dropped in the UK during 2012

Ofcom has published a report on online piracy, which found that the practice is becoming less common and that pirates tend to spend more on legitimate content than non-pirates. The research, which was not funded by the entertainment industry, was conducted by Kantar Media among 21,474 participants and took place in 2012 across four separate stages. Over that time, the ratio of legal to illegal content fell -- confirming a suspected trend as legal streaming options became more available. It also confirmed another suspicion -- that a relatively small number of web users are responsible for most piracy. In Ofcom's data, just two percent of users conducted three quarters of all piracy. Ofcom described piracy as "a minority activity". Of those surveyed, 58 percent accessed music, movie or TV content online, while 17 percent accessed illegal content sources. Those who admitted pirating content spent on average £26 every three months on legitimate content, set against an average spend of £16 among non-pirates.

(tags: wired piracy studies ofcom streaming)
Want to back an Irish Microbrewery?

The excellent Trouble Brewing are looking for investors

(tags: trouble-brewing ireland brewing beer business investment crowdfunding microbreweries)
Tips for Tuning the Garbage First Garbage Collector

(tags: g1gc gc java jvm tuning ops optimization)
_An Improved Construction For Counting Bloom Filters_

'A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally'
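For reference, here is a minimal sketch of the plain counting Bloom filter that the paper improves on (this is not the d-left variant). The class name, the number of counters and the salted-SHA-256 hashing scheme are illustrative assumptions, not taken from the paper.

```python
import hashlib


class CountingBloomFilter:
    """Plain counting Bloom filter: an array of small counters instead of bits."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes counter indexes by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only safe for items that really were added; removing a false
        # positive would corrupt counters shared with other items.
        if item not in self:
            raise KeyError(item)
        for pos in self._positions(item):
            self.counters[pos] -= 1

    def __contains__(self, item):
        # May report false positives, never false negatives.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Deletion is the whole point of the counters: a plain bit array cannot tell whether a bit is still needed by some other item.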

(tags: bloom-filter data-structures algorithms counting cbf storage false-positives d-left-hashing hashing)

Links for 2013-09-17

Published September 17, 2013

To solve hard problems, you need to use bricolage

In a talk about a neat software component he designed, Bruce Haddon observed that there is no way that the final structure and algorithmic behavior of this component could have been predicted, designed, or otherwise anticipated. Haddon observed that computer science serves as a source of core ideas: it provides the data structures and algorithms that are the building blocks. Meanwhile, he views software engineering as a useful set of methods to help design reliable software without losing your mind. Yet he points out that neither captures the whole experience. That’s because much of the work is what Haddon calls hacking, but what others would call bricolage. Simply put, there is much trial and error: we put ideas together and see where it goes.
This is a great post, and I agree (broadly). IMO, most software engineering requires little CS, but there are occasional moments where a single significant aspect of a project requires a particular algorithm, and would be kludgy, hacky, or over-complex to solve without it.

(tags: bricolage hacking cs computer-science work algorithms)
Getting Real About Distributed System Reliability

I have come around to the view that the real core difficulty of [distributed] systems is operations, not architecture or design. Both are important but good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations. This is quite different from the view of unbreakable, self-healing, self-operating systems that I see being pitched by the more enthusiastic NoSQL hypesters. Worse yet, you can’t easily buy good operations in the same way you can buy good software—you might be able to hire good people (if you can find them) but this is more than just people; it is practices, monitoring systems, configuration management, etc.

(tags: reliability nosql distributed-systems jay-kreps ops)
Don't use Hadoop - your data isn't that big

see also HN comments: https://news.ycombinator.com/item?id=6398650 , particularly davidmr's great one:
I suppose all of this is to say that the amount of required parallelization of a problem isn't necessarily related to the size of the problem set as is mentioned most in the article, but also the inherent CPU and IO characteristics of the problem. Some small problems are great for large-scale map-reduce clusters, some huge problems are horrible for even bigger-scale map-reduce clusters (think fluid dynamics or something that requires each subdivision of the problem space to communicate with its neighbors). I've had a quote printed on my door for years: Supercomputers are an expensive tool for turning CPU-bound problems into IO-bound problems.
I love that quote!

(tags: hadoop big-data scaling map-reduce)
Gilt Tech

Gilt ran a stress-test of Riak to replace Voldemort (I think) in a shadow stack, with good results:
Riak’s strong performance suggests that, should we pursue implementation, it will withstand our unique traffic needs and prove reliable. As for the Gilt-Basho team’s strong performance: It was amazing that we were able to accomplish so much in just a week’s time! Thanks again to Seth and Steve for making this possible.

(tags: riak testing shadow-stack voldemort storage gilt)
THE LONG DARK, a first-person post-disaster survival sim by Hinterland — Kickstarter

wow this looks great.
The Long Dark is a thoughtful, first-person survival simulation that emphasizes quiet exploration in a stark, yet hauntingly beautiful, post-disaster setting. The breathtakingly picturesque Pacific Northwest frames the backdrop for the drama of The Long Dark.

(tags: games survival via:fp eclaire the-long-dark kickstarter)
The Rational Choices of Crack Addicts - NYTimes.com

“The key factor is the environment, whether you’re talking about humans or rats,” Dr. Hart said. “The rats that keep pressing the lever for cocaine are the ones who are stressed out because they’ve been raised in solitary conditions and have no other options. But when you enrich their environment, and give them access to sweets and let them play with other rats, they stop pressing the lever.”

(tags: crack drugs policy science addiction society)

Links for 2013-09-16

Published September 16, 2013

Inside the mind of NSA chief Gen Keith Alexander | Glenn Greenwald

featuring some mental pics of the "Information Dominance Center", the Star Trek bridge which NSA chief Keith Alexander built with taxpayer money

(tags: big-brother nsa politics keith-alexander star-trek funny bizarre)
Schneier on Security: Reforming the NSA

Regardless of how we got here, the NSA can't reform itself. Change cannot come from within; it has to come from above. It's the job of government: of Congress, of the courts, and of the president. These are the people who have the ability to investigate how things became so bad, rein in the rogue agency, and establish new systems of transparency, oversight, and accountability. Any solution we devise will make the NSA less efficient at its eavesdropping job. That's a trade-off we should be willing to make, just as we accept reduced police efficiency caused by requiring warrants for searches and warning suspects that they have the right to an attorney before answering police questions. We do this because we realize that a too-powerful police force is itself a danger, and we need to balance our need for public safety with our aversion of a police state.

(tags: nsa politics us-politics surveillance snooping society government police public-safety police-state)
Biometric authentication failing in Mysore

Biometrics was rolled out for food distribution in order to cut down on fraud, but it's now resulting in a subset of users being unable to authenticate:
The biometric authentication system installed at the PDS outlets fails to establish the identity of many genuine beneficiaries, mostly workers, as their daily grind in the agricultural fields, construction sites or as domestic help have eroded the lines on their thumb resulting in distorted impressions.

(tags: fail risks biometrics authentication mysore security india fingerprinting)
Sketch of the Day – Frugal Streaming

ha, this is very clever! If you have enough volume, this is a nice estimation algorithm to compute stream quantiles in very little RAM
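A rough sketch of the one-unit-of-memory variant (Frugal-1U) as I understand it from the post: the estimate is nudged up or down by 1 per item, with the nudge probability set by the target quantile. The function name, the `start` value and the injectable `rng` parameter are my own choices for illustration.

```python
import random


def frugal_quantile(stream, q=0.5, start=0, rng=random.random):
    """Estimate the q-quantile of an integer stream using a single counter."""
    estimate = start
    for value in stream:
        r = rng()
        # Step up with probability q when the item is above the estimate...
        if value > estimate and r > 1 - q:
            estimate += 1
        # ...and step down with probability 1 - q when it is below.
        elif value < estimate and r > q:
            estimate -= 1
    return estimate
```

Because each step moves by only 1, the estimate needs a lot of items to converge, which is why this only makes sense at volume.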

(tags: memory streaming stream-processing clever algorithms hacks streams)
How not to stop spammers

Spam Arrest is a company that sells an anti-spam service. They attempted to sue some spammers and, as has been widely reported, lost badly. This case emphasizes three points that litigious antispammers seem not to grasp: Under CAN SPAM, a lot of spam is legal. Judges hate plaintiffs who try to be too clever, and hate sloppy preparation even more. Never, ever, file a spam suit in Seattle.

(tags: anti-spam spam law seattle us can-spam spamarrest sentient-jets)
Benchmarking Redis on AWS ElastiCache

good data points, but could do with latency percentiles

(tags: latency redis measurement benchmarks ec2 elasticache aws storage tests)

Links for 2013-09-15

Published September 15, 2013

Being poor changes your thinking about everything

Very interesting research into poverty and scarcity, in the Washington Post:
The scarcity trap captures this notion we see again and again in many domains. When people have very little, they undertake behaviors that maintain or reinforce their future disadvantage. If you have very little, you often behave in such a way so that you'll have little in the future. In economics, people talk about the poverty trap. We're generalizing that, saying this happens a lot, and we've experienced it.

(tags: poor poverty society economics scarcity washington-post)
Good SSL for your website is absurdly difficult in practice

Yet again, security software fails on packaging and UI. via Tony Finch

(tags: security ssl tls packaging via:fanf)
Former NSA and CIA director says terrorists love using Gmail

At one point, Hayden expressed a distaste for online anonymity, saying "The problem I have with the Internet is that it's anonymous." But he noted, there is a struggle over that issue even inside government. The issue came to a head during the Arab Spring movement when the State Department was funding technology [presumably Tor?] to protect the anonymity of activists so governments could not track down or repress their voices. "We have a very difficult time with this," Hayden said. He then asked, "is our vision of the World Wide Web the global digital commons -- at this point you should see butterflies flying here and soft background meadow-like music -- or a global free fire zone?" Given that Hayden also compared the Internet to the wild west and Somalia, Hayden clearly leans toward the "global free fire zone" vision of the Internet.
well, that's a good analogy for where we're going -- a global free-fire zone.

(tags: gmail cia nsa surveillance michael-hayden security snooping law tor arab-spring)

Links for 2013-09-14

Published September 14, 2013

Google swaps out MySQL, moves to MariaDB

When we asked Sallner to quantify the scale of the migration he said, "They're moving it all. Everything they have. All of the MySQL servers are moving to MariaDB, as far as I understand." By moving to MariaDB, Google can free itself of any dependence on technology dictated by Oracle – a company whose motivations are unclear, and whose track record for working with the wider technology community is dicey, to say the least. Oracle has controlled MySQL since its acquisition of Sun in 2010, and the key InnoDB storage engine since it got ahold of Innobase in 2005. [...] We asked Cole why Google would shift from MySQL to MariaDB, and what the key technical differences between the systems were. "From my perspective, they're more or less equivalent other than if you look at specific features and how they implement them," Cole said, speaking in a personal capacity and not on behalf of Google. "Ideologically there are lots of differences."
So -- AWS, when will RDS offer MariaDB as an option?

(tags: google mysql mariadb sql open-source licensing databases storage innodb oracle)
FBI Admits It Controlled Tor Servers Behind Mass Malware Attack

The code’s behavior, and the command-and-control server’s Virginia placement, is also consistent with what’s known about the FBI’s “computer and internet protocol address verifier,” or CIPAV, the law enforcement spyware first reported by WIRED in 2007. Court documents and FBI files released under the FOIA have described the CIPAV as software the FBI can deliver through a browser exploit to gather information from the target’s machine and send it to an FBI server in Virginia. The FBI has been using the CIPAV since 2002 against hackers, online sexual predators, extortionists, and others, primarily to identify suspects who are disguising their location using proxy servers or anonymity services, like Tor. Prior to the Freedom Hosting attack, the code had been used sparingly, which kept it from leaking out and being analyzed.

(tags: cipav fbi tor malware spyware security wired)
Creating Flight Recordings

lots more detail on the new "Java Mission Control" feature in Hotspot 7u40 JVMs, and how to use it to start and stop profiling in a live, production JVM from a separate "jcmd" command-line client. If the overhead is small, this could be really neat -- turn on profiling for 1 minute every hour on a single instance, and collect realtime production profile data on an automated basis for post-facto analysis if required
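A hedged sketch of what the "1 minute every hour" idea could look like when driven from a scheduler. It assumes `jcmd` is on the PATH and that the target JVM was started with Flight Recorder enabled (on 7u40-era Oracle JDKs I believe that means `-XX:+UnlockCommercialFeatures -XX:+FlightRecorder`). The pid and output filename are placeholders.

```python
import subprocess


def jfr_start_command(pid, duration_s=60, filename="/tmp/recording.jfr"):
    """Build the jcmd argv that starts a time-limited flight recording."""
    return [
        "jcmd",
        str(pid),
        "JFR.start",
        f"duration={duration_s}s",
        f"filename={filename}",
    ]


def start_recording(pid, duration_s=60, filename="/tmp/recording.jfr"):
    """Run jcmd against a live JVM; raises CalledProcessError on failure."""
    subprocess.run(jfr_start_command(pid, duration_s, filename), check=True)
```

Calling `start_recording` from an hourly cron job on one instance would give the periodic production profile described above; the overhead still needs measuring first.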

(tags: instrumentation logging profiling java jvm ops)

Links for 2013-09-12

Published September 12, 2013

Necessary and Proportionate -- In Which Civil Society is Caught Between a Cop and a Spy

Modern telecommunications technology implied the development of modern telecommunications surveillance, because it moved the scope of action from the physical world (where intelligence, generally seen as part of the military mission, had acted) to the virtual world -- including the scope of those actions that could threaten state power. While the public line may have been, as US Secretary of State Henry Stimson said in 1929, “gentlemen do not open each other’s mail”, you can bet that they always did keep a keen eye on the comings and goings of each other’s shipping traffic. The real reason that surveillance in the context of state intelligence was limited until recently was because it was too expensive, and it was too expensive for everyone. The Westphalian compromise demands equality of agency as tied to territory. As soon as one side gains a significant advantage, the structure of sovereignty itself is threatened at a conceptual level -- hence Oppenheimer as the death of any hope of international rule of law. Once surveillance became cheap enough, all states were (and will increasingly be) forced to attempt it at scale, as a reaction to this pernicious efficiency. The US may be ahead of the game now, but Moore’s law and productization will work their magic here.

(tags: government telecoms snooping gchq nsa surveillance law politics intelligence spying internet)

Links for 2013-09-11

Published September 11, 2013

Observability at Twitter

Bit of detail into Twitter's TSD metric store.
There are separate online clusters for different data sets: application and operating system metrics, performance critical write-time aggregates, long term archives, and temporal indexes. A typical production instance of the time series database is based on four distinct Cassandra clusters, each responsible for a different dimension (real-time, historical, aggregate, index) due to different performance constraints. These clusters are amongst the largest Cassandra clusters deployed in production today and account for over 500 million individual metric writes per minute. Archival data is stored at a lower resolution for trending and long term analysis, whereas higher resolution data is periodically expired. Aggregation is generally performed at write-time to avoid extra storage operations for metrics that are expected to be immediately consumed. Indexing occurs along several dimensions–service, source, and metric names–to give users some flexibility in finding relevant data.

(tags: twitter monitoring metrics service-metrics tsd time-series storage architecture cassandra)
NSA: Possibly breaking US laws, but still bound by laws of computational complexity

I didn’t clearly explain that there’s an enormous continuum between, on the one hand, a full break of RSA or Diffie-Hellman (which still seems extremely unlikely to me), and on the other, “pure side-channel attacks” involving no new cryptanalytic ideas. Along that continuum, there are many plausible places where the NSA might be. For example, imagine that they had a combination of side-channel attacks, novel algorithmic advances, and sheer computing power that enabled them to factor, let’s say, ten 2048-bit RSA keys every year. In such a case, it would still make perfect sense that they’d want to insert backdoors into software, sneak vulnerabilities into the standards, and do whatever else it took to minimize their need to resort to such expensive attacks. But the possibility of number-theoretic advances well beyond what the open world knows certainly wouldn’t be ruled out. Also, as Schneier has emphasized, the fact that NSA has been aggressively pushing elliptic-curve cryptography in recent years invites the obvious speculation that they know something about ECC that the rest of us don’t.

(tags: ecc rsa crypto security nsa gchq snooping sniffing diffie-hellman pki key-length)
Low Overhead Method Profiling with Java Mission Control now enabled in the most recent HotSpot JVM release

Built into the HotSpot JVM [in JDK version 7u40] is something called the Java Flight Recorder. It records a lot of information about/from the JVM runtime, and can be thought of as similar to the Data Flight Recorders you find in modern airplanes. You normally use the Flight Recorder to find out what was happening in your JVM when something went wrong, but it is also a pretty awesome tool for production time profiling. Since Mission Control (using the default templates) normally doesn’t cause more than a per cent overhead, you can use it on your production server.
I'm intrigued by the idea of always-on profiling in production. This could be cool.

(tags: performance java measurement profiling jvm jdk hotspot mission-control instrumentation telemetry metrics)

Links for 2013-09-09

Published September 9, 2013

How the NSA Spies on Smartphones

One of the US agents' tools is the use of backup files established by smartphones. According to one NSA document, these files contain the kind of information that is of particular interest to analysts, such as lists of contacts, call logs and drafts of text messages. To sort out such data, the analysts don't even require access to the iPhone itself, the document indicates. The department merely needs to infiltrate the target's computer, with which the smartphone is synchronized, in advance. Under the heading "iPhone capability," the NSA specialists list the kinds of data they can analyze in these cases. The document notes that there are small NSA programs, known as "scripts," that can perform surveillance on 38 different features of the iPhone 3 and 4 operating systems. They include the mapping feature, voicemail and photos, as well as the Google Earth, Facebook and Yahoo Messenger applications.
and, of course, the alternative means of backup is iCloud.... wonder how secure those backups are.

(tags: nsa surveillance gchq iphone smartphones backups icloud security)
Behind the Screens at Loggly

Boost ASIO at the front end (!), Kafka 0.8, Storm, and ElasticSearch

(tags: boost scalability loggly logging ingestion cep stream-processing kafka storm architecture elasticsearch)
Schneier on Security: Excess Automobile Deaths as a Result of 9/11

The inconvenience of extra passenger screening and added costs at airports after 9/11 cause many short-haul passengers to drive to their destination instead, and, since airline travel is far safer than car travel, this has led to an increase of 500 U.S. traffic fatalities per year. Using DHS-mandated value of statistical life at $6.5 million, this equates to a loss of $3.2 billion per year, or $32 billion over the period 2002 to 2011 (Blalock et al. 2007).

(tags: risk security death 9-11 politics screening dhs air-travel driving road-safety)

Links for 2013-09-08

Published September 8, 2013

Perhaps I'm out of step and Britons just don't think privacy is important | Henry Porter | Comment is free | The Observer

The debate has been stifled in Britain more successfully than anywhere else in the free world and, astonishingly, this has been with the compliance of a media and public that regard their attachment to liberty to be a matter of genetic inheritance. So maybe it is best for me to accept that the BBC, together with most of the newspapers, has moved with society, leaving me behind with a few old privacy-loving codgers, wondering about the cause of this shift in attitudes. Is it simply the fear of terror and paedophiles? Are we so overwhelmed by the power of the surveillance agencies that we feel we can't do anything? Or is it that we have forgotten how precious and rare truly free societies are in history?

(tags: privacy uk politics snooping spies gchq society nsa henry-porter)
Big data is watching you

Some great street art from Brighton, via Darach Ennis

(tags: via:darachennis street-art graffiti big-data snooping spies gchq nsa art)
Blocking The Pirate Bay appears to have 'no lasting net impact' on illegal downloading

In the fight against the unauthorised sharing of copyright protected material, aka piracy, Dutch Internet Service Providers have been summoned by courts to block their subscribers’ access to The Pirate Bay (TPB) and related sites. This paper studies the effectiveness of this approach towards online copyright enforcement, using both a consumer survey and a newly developed non-infringing technology for BitTorrent monitoring. While a small group of respondents download less from illegal sources or claim to have stopped, and a small but significant effect is found on the distribution of Dutch peers, no lasting net impact is found on the percentage of the Dutch population downloading from illegal sources.

(tags: fail blocking holland pirate-bay tpb papers via:tjmcintyre internet isps)
How Advanced Is the NSA's Cryptanalysis — And Can We Resist It?

Bruce Schneier's suggestions:
Assuming the hypothetical NSA breakthroughs don’t totally break public-cryptography — and that’s a very reasonable assumption — it’s pretty easy to stay a few steps ahead of the NSA by using ever-longer keys. We’re already trying to phase out 1024-bit RSA keys in favor of 2048-bit keys. Perhaps we need to jump even further ahead and consider 3072-bit keys. And maybe we should be even more paranoid about elliptic curves and use key lengths above 500 bits. One last blue-sky possibility: a quantum computer. Quantum computers are still toys in the academic world, but have the theoretical ability to quickly break common public-key algorithms — regardless of key length — and to effectively halve the key length of any symmetric algorithm. I think it extraordinarily unlikely that the NSA has built a quantum computer capable of performing the magnitude of calculation necessary to do this, but it’s possible. The defense is easy, if annoying: stick with symmetric cryptography based on shared secrets, and use 256-bit keys.

(tags: bruce-schneier cryptography wired nsa surveillance snooping gchq cryptanalysis crypto future key-lengths)
DevOps Eye for the Coding Guy: Metrics

a pretty good description of the process of adding service metrics to a Django webapp using graphite and statsd. Bookmarking mainly for the great real-time graphing hack at the end...
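For anyone who hasn't seen it, the statsd wire format is simple enough to emit by hand. This is a minimal sketch without the Django plumbing from the post; the metric name is made up, and the host and port assume a statsd daemon listening on its usual UDP port 8125 on localhost.

```python
import socket


def format_metric(name, value, metric_type):
    """Render one metric in the statsd line format: <name>:<value>|<type>."""
    return f"{name}:{value}|{metric_type}"


def send_metric(line, host="127.0.0.1", port=8125):
    """Fire-and-forget a single metric line over UDP."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.sendto(line.encode("ascii"), (host, port))
    finally:
        sock.close()


# Hypothetical usage: count a page hit ("c" = counter, "ms" = timer).
# send_metric(format_metric("webapp.home.hits", 1, "c"))
```

UDP means the app never blocks on the metrics daemon, at the cost of silently dropping data if nothing is listening.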

(tags: statsd django monitoring metrics python graphite)
Probabilistic Scraping of Plain Text Tables

a nifty hack.
Recently I have been banging my head trying to import a ton of OCR acquired data expressed in tabular form. I think I have come up with a neat approach using probabilistic reasoning combined with mixed integer programming. The method is pretty robust to all sorts of real world issues. In particular, the method leverages topological understanding of tables, encodes it declaratively into a mixed integer/linear program, and integrates weak probabilistic signals to classify the whole table in one go (at sub second speeds). This method can be used for any kind of classification where you have strong logical constraints but noisy data.
(via proggit)

(tags: scraping tables ocr probabilistic linear-programming optimization machine-learning via:proggit)
vimeo/timeserieswidget

'Plugin to make highly interactive graphite graph objects (i.e. graphs where you can interactively toggle on/off individual series, inspect datapoints, zoom in realtime, etc). Supports Flot (canvas), Rickshaw (svg) and standard graphite png images (in case you're nostalgic and don't like interactivity).'

(tags: graphs graphing graphite dataviz flot rickshaw svg canvas javascript)

Links for 2013-09-05

Published September 5, 2013

modern JVM concurrency primitives are broken if the system clock steps backwards

'The implementation of the concurrency primitive LockSupport.parkNanos(), the function that controls *every* concurrency primitive on the JVM, is flawed, and any NTP sync, or system time change, can potentially break it with unexpected results across the board when running a 64bit JVM on Linux 64bit.' Basically, LockSupport.parkNanos() calls pthread_cond_timedwait() using a CLOCK_REALTIME instead of CLOCK_MONOTONIC. 'tinker step 0' in ntp.conf may be a viable workaround.
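The same trap exists in any language. As an analogy only (this is Python, not the JVM code in question), here is a sketch of a timed wait whose deadline is measured on a monotonic clock, so that an NTP step or manual clock change cannot stretch or truncate it. The function and parameter names are illustrative.

```python
import time


def wait_until(predicate, timeout_s, poll_s=0.01, clock=time.monotonic):
    """Poll predicate until it returns true or timeout_s has elapsed.

    The deadline uses a monotonic clock; using time.time() here would
    reproduce the bug class above if the wall clock stepped backwards.
    """
    deadline = clock() + timeout_s
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        time.sleep(poll_s)
```

Swapping `clock=time.time` in is all it takes to make the wait depend on the wall clock again.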

(tags: clocks timing ntp slew sync step pthreads java jvm timers clock_realtime clock_monotonic)
Schneier on Security: The NSA Is Breaking Most Encryption on the Internet

The new Snowden revelations are explosive. Basically, the NSA is able to decrypt most of the Internet. They're doing it primarily by cheating, not by mathematics. It's joint reporting between the Guardian, the New York Times, and ProPublica. I have been working with Glenn Greenwald on the Snowden documents, and I have seen a lot of them. These are my two essays on today's revelations. Remember this: The math is good, but math has no agency. Code has agency, and the code has been subverted.

(tags: encryption communication government nsa security bruce-schneier crypto politics snooping gchq guardian journalism)

Links for 2013-09-04

Published September 4, 2013

How To Buffer Full YouTube Videos Before Playing

summary: turn off DASH (Dynamic Adaptive Streaming over HTTP) using a userscript.

(tags: chrome youtube google video dash mpeg streaming)
Voldemort on Solid State Drives [paper]

'This paper and talk was given by the LinkedIn Voldemort Team at the Workshop on Big Data Benchmarking (WBDB May 2012).'
With SSD, we find that garbage collection will become a very significant bottleneck, especially for systems which have little control over the storage layer and rely on Java memory management. Big heapsizes make the cost of garbage collection expensive, especially the single threaded CMS Initial mark. We believe that data systems must revisit their caching strategies with SSDs. In this regard, SSD has provided an efficient solution for handling fragmentation and moving towards predictable multitenancy.

(tags: voldemort storage ssd disk linkedin big-data jvm tuning ops gc)

Links for 2013-09-03

Published September 3, 2013

Streaming MapReduce with Summingbird

Before Summingbird at Twitter, users that wanted to write production streaming aggregations would typically write their logic using a Hadoop DSL like Pig or Scalding. These tools offered nice distributed system abstractions: Pig resembled familiar SQL, while Scalding, like Summingbird, mimics the Scala collections API. By running these jobs on some regular schedule (typically hourly or daily), users could build time series dashboards with very reliable error bounds at the unfortunate cost of high latency. While using Hadoop for these types of loads is effective, Twitter is about real-time and we needed a general system to deliver data in seconds, not hours. Twitter’s release of Storm made it easy to process data with very low latencies by sacrificing Hadoop’s fault tolerant guarantees. However, we soon realized that running a fully real-time system on Storm was quite difficult for two main reasons: Recomputation over months of historical logs must be coordinated with Hadoop or streamed through Storm with a custom log loading mechanism; Storm is focused on message passing and random-write databases are harder to maintain. The types of aggregations one can perform in Storm are very similar to what’s possible in Hadoop, but the system issues are very different. Summingbird began as an investigation into a hybrid system that could run a streaming aggregation in both Hadoop and Storm, as well as merge automatically without special consideration of the job author. The hybrid model allows most data to be processed by Hadoop and served out of a read-only store. Only data that Hadoop hasn’t yet been able to process (data that falls within the latency window) would be served out of a datastore populated in real-time by Storm. But the error of the real-time layer is bounded, as Hadoop will eventually get around to processing the same data and will smooth out any error introduced. 
This hybrid model is appealing because you get well understood, transactional behavior from Hadoop, and up to the second additions from Storm. Despite the appeal, the hybrid approach has the following practical problems: Two sets of aggregation logic have to be kept in sync in two different systems; Keys and values must be serialized consistently between each system and the client. The client is responsible for reading from both datastores, performing a final aggregation and serving the combined results. Summingbird was developed to provide a general solution to these problems.
Very interesting stuff. I'm particularly interested in the design constraints they've chosen to impose to achieve this -- data formats which require associative merging in particular.
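To make the associative-merging constraint concrete, here is a toy sketch in plain Python (not Summingbird's actual Scala API). If the aggregate is something like per-key counts, the batch and realtime views can be combined in any grouping and still give the same answer. The store names and the `serve` helper are hypothetical.

```python
from collections import Counter


def merge_counts(left, right):
    """Associative merge of two per-key count maps: add values key by key."""
    merged = Counter(left)
    merged.update(right)  # Counter.update adds counts rather than replacing
    return merged


def serve(key, batch_store, realtime_store):
    """Client-side read: combine the batch view with the realtime window."""
    return merge_counts(batch_store, realtime_store)[key]
```

Associativity is what lets the same logic run per-batch in Hadoop and per-event in Storm without the job author caring how the data was chunked.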

(tags: mapreduce streaming big-data twitter storm summingbird scala pig hadoop aggregation merging)
Thoughts on Granby Park, the recent pop-up park off Parnell St

We mentioned above that pop-up spaces have become popular across Europe because they allow developers and city councils to harness urban creativity in order to drive up real estate prices without ceding control of a given site. Those who produce the space through hard work, collaboration and passion move on, making way for property development and speculation. The international research in this area is very clear on this point and it has been documented in places from Lower-East Side Manhattan to Berlin’s Kreuzberg. Most perversely, increased property prices make it even more difficult for creativity to flourish in a given area and end up driving out long-term working class communities, migrants and young people. But what can we do? If every attempt we make to make our city a better place simply ends up being captured in the calculations of real estate players, surely the situation is hopeless? Is it better, then, to do nothing? We don’t think it is better to do nothing and, like Upstart, we still believe we can find a way together through experimentation and collaboration. However, this means questioning, reflecting on and publicly discussing the relationship between our efforts to make a city more after our heart’s desire and the process of gentrification. As noted above, this is especially the case with pop-up spaces given their temporary nature. It is really necessary that we think about how to make sure our activities don’t contribute to gentrification in the long term, but instead benefit the city as a whole. We certainly don’t have the solutions, but if we sweep these awkward questions under the carpet we risk contributing to the very forces we want to challenge and alienating those who will perceive us as the ‘front-line’ of gentrification.

(tags: gentrification pop-up parks dublin ireland cities upstart spaces urban-planning)
[#CASSANDRA-5582] Replace CustomHsHaServer with better optimized solution based on LMAX Disruptor

Disruptor: decimating P99s since 2011

(tags: disruptor cassandra java p99 latency speed performance concurrency via:kellabyte)
Time is a Dimension

I love these.
Photographic prints are great because they don’t need power to be displayed. They are more or less permanent. Videos are great because they record a sequence of time which shows reality almost like how we experience. Is it possible to combine the two? And not via long exposure photography where often details are lost from motion. So I played around with the tools of digital photography and post processing to give you this series: Time is a dimension. This series of images are mostly landscapes, seascapes and cityscapes, and they are a single composite made from sequences that span 2-4 hours, mostly of sunrises and sunsets. The basic structure of a landscape is present in every piece. But each panel or concentric layer shows a different slice of time, which is related to the adjacent panel/layer. The transition from daytime to night is gradual and noticeable in every piece, but would not be something you expect to see in a still image.

(tags: photography beautiful photos art time dimensions prints via:matthaughey)

Links for 2013-09-02

Published September 2, 2013

WTF Visualizations

'Visualizations that make no sense.' Some of these are unintentional comedy gold -- pie charts feature heavily, of course. (via Des Traynor)

(tags: via:destraynor infographics wtf visualization dataviz data fail funny graphics pie-charts)
Non-blocking transactional atomicity

Peter Bailis with an interesting distributed-storage atomicity algorithm for performing multi-record transactional updates

(tags: algorithms nbta transactions databases storage distcomp distributed atomic coding eventual-consistency crdts)
Interview with the Github Elasticsearch Team

good background on Github's Elasticsearch scaling efforts. Some rather horrific split-brain problems under load, and crashes due to OpenJDK bugs (sounds like OpenJDK *still* isn't ready for production). painful

(tags: elasticsearch github search ops scaling split-brain outages openjdk java jdk jvm)
The Irish Times, terminations and Holles Street: The story that wasn’t there.

Summarising a very shoddy tale from our paper of record.
I don’t know what happened here. I don’t know whether there ever was a woman who met the description given by the Irish Times who suffered a medical crisis during pregnancy. I don’t know why a group of men in positions of authority in the Irish Times decided that, if there was such a woman, they had any right to tell the rest of the country about her experiences. I don’t know why, when they discovered that a mistake had been made in the one legal fact used to justify that decision they didn’t immediately apologise. And I don’t know what happened between the 23rd August 2013 and 31st August 2013 to prompt them to print a shoulder shrugging ‘acceptance’ that the case ‘hadn’t happened’ and limit the paper’s apology to an institution, as opposed to its readers. But, from what I’ve seen this week, I do know one thing. Whatever questions readers might have, The Irish Times isn’t interested in giving them any answers.

(tags: irish-times fail shoddy abortion health public-interest journalism pregnancy corrections)
Blueflood by rackerlabs

Rackspace's large-scale TSD storage system, built on Cassandra, Java, ASL2

(tags: cassandra tsd storage time-series data open-source java rackspace)

Links for 2013-08-31

Published August 31, 2013

Reversing Sinclair's amazing 1974 calculator hack - half the ROM of the HP-35

Amazing reverse engineering.
In a hotel room in Texas, Clive Sinclair had a big problem. He wanted to sell a cheap scientific calculator that would grab the market from expensive calculators such as the popular HP-35. Hewlett-Packard had taken two years, 20 engineers, and a million dollars to design the HP-35, which used 5 complex chips and sold for $395. Sinclair's partnership with calculator manufacturer Bowmar had gone nowhere. Now Texas Instruments offered him an inexpensive calculator chip that could barely do four-function math. Could he use this chip to build a $100 scientific calculator? Texas Instruments' engineers said this was impossible - their chip only had 3 storage registers, no subroutine calls, and no storage for constants such as π. The ROM storage in the calculator held only 320 instructions, just enough for basic arithmetic. How could they possibly squeeze any scientific functions into this chip? Fortunately Clive Sinclair, head of Sinclair Radionics, had a secret weapon - programming whiz and math PhD Nigel Searle. In a few days in Texas, they came up with new algorithms and wrote the code for the world's first single-chip scientific calculator, somehow programming sine, cosine, tangent, arcsine, arccos, arctan, log, and exponentiation into the chip. The engineers at Texas Instruments were amazed. How did they do it? Up until now it's been a mystery. But through reverse engineering, I've determined the exact algorithms and implemented a simulator that runs the calculator's actual code. The reverse-engineered code along with my detailed comments is in the window below.

(tags: reversing reverse-engineering history calculators sinclair ti hp chips silicon hacks)

Links for 2013-08-30

Published August 30, 2013

Microsoft CEO Steve Ballmer retires: A firsthand account of the company’s employee-ranking system

LOL MS. Sadly, this talk of "core competencies" and "visibility" is pretty reminiscent of Amazon's review season, too:
This illustrated another problem with [stack ranking]: It destroyed trust between individual contributors and management, because the stack rank required that all lower-level managers systematically lie to their reports. Why? Because for years Microsoft did not admit the existence of the stack rank to nonmanagers. Knowledge of the process gradually leaked out, becoming a recurrent complaint on the much-loathed (by Microsoft) Mini-Microsoft blog, where a high-up Microsoft manager bitterly complained about organizational dysfunction and was joined in by a chorus of hundreds of employees. The stack rank finally made it into a Vanity Fair article in 2012, but for many years it was not common knowledge, inside or outside Microsoft. It was presented to the individual contributors as a system of objective assessment of “core competencies,” with each person being judged in isolation. When review time came, and programmers would fill out a short self-assessment talking about their achievements, strengths, and weaknesses, only some of them knew that their ratings had been more or less already foreordained at the stack rank. [...] If you did know about the stack rank, you weren’t supposed to admit it. So you went through the pageantry of the performance review anyway, arguing with your manager in the rhetoric of “core competencies.” The managers would respond in kind. Since the managers had little control over the actual score and attendant bonus and raise (if any), their job was to write a review to justify the stack rank in the language of absolute merit. (“Higher visibility” was always a good catch-all: Sure, you may be a great coder and work 80 hours a week, but not enough people have heard of you!)

(tags: amazon stack-ranking employees ranking work microsoft core-competencies)

Links for 2013-08-29

Published August 29, 2013

BBC News - How one man turns annoying cold calls into cash

This is hilarious. Quid pro quo!
Once he had set up the 0871 line, every time a bank, gas or electricity supplier asked him for his details online, he submitted it as his contact number. He added he was "very honest" and the companies did ask why he had a premium number. He told the programme he replied: "Because I'm getting annoyed with PPI phone calls when I'm trying to watch Coronation Street so I'd rather make 10p a minute." He said almost all of the companies he dealt with were happy to use it and if they refused he asked them to email.

(tags: spam cold-calls phone ads uk funny 0871 premium-rate ppi)
The Edge Minecraft cover

This is brilliant. Half of the office now wants prints.
Massive congratulations to Edge magazine. The stellar publication has been around for 20 years! To celebrate, their 258th issue comes in 20 different flavours, and one of those flavours includes the earthly overtones of both Minecraft and Dungeons & Dragons. Junkboy drew it, and I [Owen] worded it a few weeks ago.

(tags: covers images edge minecraft gaming funny dungeons-and-dragons retro dnd)
Forecast Blog

Forecast.io are doing such a great job of applying modern machine-learning to traditional weather data. "Quicksilver" is their neural-net-adjusted global temperature geodata, and here's how it's built

(tags: quicksilver forecast forecast.io neural-networks ai machine-learning algorithms weather geodata earth temperature)
_MillWheel: Fault-Tolerant Stream Processing at Internet Scale_ [paper, pdf]

from VLDB 2013:
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees. This paper describes MillWheel’s programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel’s features are used. MillWheel’s programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel’s unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.

(tags: millwheel google data-processing cep low-latency fault-tolerance scalability papers event-processing stream-processing)

Links for 2013-08-28

Published August 28, 2013

GCHQ tapping at least 14 EU fiber-optic cables

Süddeutsche Zeitung (SZ) had already revealed in late June that the British had access to the cable TAT-14, which connects Germany with the USA, UK, Denmark, France and the Netherlands. In addition to TAT-14, the other cables that GCHQ has access to include Atlantic Crossing 1, Circe North, Circe South, Flag Atlantic-1, Flag Europa-Asia, SeaMeWe-3 and SeaMeWe-4, Solas, UK France 3, UK Netherlands-14, Ulysses, Yellow and the Pan European Crossing.

(tags: sz germany cables fiber-optic tapping snooping tat-14 eu politics gchq)
In historic vote, New Zealand bans software patents | Ars Technica

This is amazing news. Paying attention, Sean Sherlock?
A major new patent bill, passed in a 117-4 vote by New Zealand's Parliament after five years of debate, has banned software patents. The relevant clause of the patent bill actually states that a computer program is "not an invention." Some have suggested that was a way to get around the wording of the TRIPS intellectual property treaty, which requires patents to be "available for any inventions, whether products or processes, in all fields of technology." [...] One Member of Parliament who was deeply involved in the debate, Clare Curran, quoted several heads of software firms complaining about how the patenting process allowed "obvious things" to get patented and that "in general software patents are counter-productive." Curran quoted one developer as saying, "It's near impossible for software to be developed without breaching some of the hundreds of thousands of patents granted around the world for obvious work." "These are the heavyweights of the new economy in software development," said Curran. "These are the people that needed to be listened to, and thankfully, they were."

(tags: new-zealand nz patents swpats law trips ip software-patents yay)
Docker: Git for deployment

Docker is to deployment as Git is to development. Developers are able to leverage Git's performance and flexibility when building applications. Git encourages experiments and doesn't punish you when things go wrong: start your experiments in a branch, if things fall down, just git rebase or git reset. It's easy to start a branch and fast to push it. Docker encourages experimentation for operations. Containers start quickly. Building images is a snap. Using another image as a base image is easy. Deploying whole images is fast, and last but not least, it's not painful to roll back. Fast + flexible = deployments are about to become a lot more enjoyable.

(tags: docker deployment sysadmin ops devops vms vagrant virtualization containers linux git)

Links for 2013-08-27

Published August 27, 2013

Using set cover algorithm to optimize query latency for a large scale distributed graph | LinkedIn Engineering

how LI solved a tricky graph-database-query latency problem with a set-cover algorithm
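
For anyone who hasn't met set cover before, here's a minimal sketch of the classic greedy heuristic. This is not LinkedIn's actual implementation; the node names, the partition numbers and the `greedy_set_cover` helper are all made up for illustration. The idea is to pick a small set of replica nodes that, between them, hold every partition a query needs.

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets greedily until every element of `universe` is covered.

    `subsets` maps a (hypothetical) node name to the set of partitions it holds.
    Returns the chosen node names in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Take the node covering the most still-uncovered partitions;
        # iterating over sorted names makes tie-breaking deterministic.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("cannot cover: %r" % sorted(uncovered))
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical cluster: four nodes, six partitions.
nodes = {
    "node-a": {1, 2, 3},
    "node-b": {3, 4},
    "node-c": {4, 5, 6},
    "node-d": {1, 6},
}
picked = greedy_set_cover(range(1, 7), nodes)
print(picked)
```

Greedy isn't guaranteed to find the smallest cover, but it is simple and fast, which matters more than optimality when the goal is shaving query latency.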

(tags: linkedin algorithms coding distributed-systems graph databases querying set-cover set replication)
How might the feds have snooped on Lavabit?

"I have been told that they cannot change your fundamental business practices," said Callas, who unlike Levison was able to say SilentCircle has received no NSLs or court orders of any kind. "I presume that would mean things like getting SSL keys because that would mean they could impersonate your servers. That would be like setting up a store front that says your business name and putting [government agents] in your company uniforms." Similarly, he added: "They cannot make changes to existing operating systems. They can't make you change source code." To which [Lavabit's] Levison replied: "That was always my understanding, too. That's why this is so important. Like [Callas] at SilentCircle said, the assumption has been that the government can't force us to change our business practices like that and compromise that information. Like I said, I don't hold those beliefs anymore."

(tags: ars-technica security privacy nsls ssl silentcircle jon-callas crypto)
Lock-Based vs Lock-Free Concurrent Algorithms

An excellent post from Martin Thompson showing a new JSR166 concurrency primitive, StampedLock, compared against a number of alternatives in a simple microbenchmark. The most interesting thing for me is how much the lock-free, AtomicReference.compareAndSet()-based approach blows away all the lock-based approaches -- even in the 1-reader-1-writer case. Its code is extremely simple, too: https://github.com/mjpt777/rw-concurrency/blob/master/src/LockFreeSpaceship.java

(tags: concurrency java threads lock-free locking compare-and-set cas atomic jsr166 microbenchmarks performance)
Juniper Adds Puppet support

This is super-cool. 'Network engineering no longer should be mundane tasks like conf, set interfaces fe-0/0/0 unit 0 family inet address 10.1.1.1/24. How does mindless CLI work translate to efficiently spent time? What if you need to change 300 devices? What if you are writing it by hand? An error-prone waste of time. Juniper today announced Puppet support for their 12.2R3,5 JUNOS code. This is compatible with EX4200, EX4550, and QFX3500 switches. These are top end switches, but this start is directly aimed at their DC and enterprise devices. Initially, the manifest interactions offered are interface, layer 2 interface, vlan, port aggregation groups, and device names.' Based on what I saw in the Network Automation team in Amazon, this is an amazing leap forward; it'd instantly render obsolete a bunch of horrific SSH-CLI automation cruft.

(tags: ssh cli automation networking networks puppet ops juniper cisco)
awscli

The future of the AWS command line tools is awscli, a single, unified, consistent command line tool that works with almost all of the AWS services. Here is a quick list of the services that awscli currently supports: Auto Scaling, CloudFormation, CloudSearch, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, ElastiCache, Elastic Beanstalk, Elastic Transcoder, ELB, EMR, Identity and Access Management, Import/Export, OpsWorks, RDS, Redshift, Route 53, S3, SES, SNS, SQS, Storage Gateway, Security Token Service, Support API, SWF, VPC. Support for the following appears to be planned: CloudFront, Glacier, SimpleDB. The awscli software is being actively developed as an open source project on Github, with a lot of support from Amazon. You’ll note that the biggest contributors to awscli are Amazon employees with Mitch Garnaat leading. Mitch is also the author of boto, the amazing Python library for AWS.

(tags: aws awscli cli tools command-line ec2 s3 amazon api)

Links for 2013-08-26

Published August 26, 2013

Let Me Explain Why Miley Cyrus’ VMA Performance Was Our Top Story This Morning | The Onion - America's Finest News Source

Absolute genius from The Onion.
Those of us watching on Google Analytics saw the number of homepage visits skyrocket the second we put up that salacious image of Miley Cyrus dancing half nude on the VMA stage. But here’s where it gets great: We don’t just do a top story on the VMA performance and call it a day. No, no. We also throw in a slideshow called “Evolution of Miley,” which, for those of you who don’t know, is just a way for you to mindlessly click through 13 more photos of Miley Cyrus. And if we get 500,000 of you to do that, well, 500,000 multiplied by 13 means we can get 6.5 million page views on that slideshow alone. Throw in another slideshow titled “6 ‘don’t miss’ VMA moments,” and it’s starting to look like a pretty goddamned good Monday, numbers-wise. Also, there are two videos -- one of the event and then some bullshit two-minute clip featuring our “entertainment experts” talking about the performance. Side note: Advertisers, along with you idiots, love videos. Another side note: The Miley Cyrus story was in the same top spot we used for our 9/11 coverage.

(tags: humor journalism cnn miley-cyrus vma news funny advertising ads)
Why wireless mesh networks won't save us from censorship

I'm not saying mesh networks don't work ever; the people in the wireless mesh community I've met are all great people doing fantastic work. What I am saying is that unplanned wireless mesh networks never work at scale. I think it's a great problem to think about, but in terms of actual allocation of time and resources I think there are other, more fruitful avenues of action to fight Internet censorship.
(via Kragen)

(tags: wireless censorship internet networking mesh mesh-networks organisation scaling wifi)

Links for 2013-08-25

Published August 25, 2013

Information on Google App Engine's recent US datacenter relocations - Google Groups

or, really, 'why we had some glitches and outages recently'. A few interesting tidbits about GAE innards though (via Bill De hOra)

(tags: gae google app-engine outages ops paxos eventual-consistency replication storage hrd)

Links for 2013-08-24

Published August 24, 2013

Newest YouTube user to fight a takedown is copyright guru Lawrence Lessig

This is lovely. Here's hoping it provides a solid precedent.
Illegitimate or simply unnecessary copyright claims are, unfortunately, commonplace in the Internet era. But if there's one person who's probably not going to back down from a claim of copyright infringement, it's Larry Lessig, one of the foremost writers and thinkers on digital-age copyright. [..] If Liberation Music was thinking they'd have an easy go of it when they demanded that YouTube take down a 2010 lecture of Lessig's entitled "Open," they were mistaken. Lessig has teamed up with the Electronic Frontier Foundation to sue Liberation, claiming that its overly aggressive takedown violates the DMCA and that it should be made to pay damages.

(tags: liberation-music eff copyright law larry-lessig fair-use)
TCP is UNreliable

Great account from Cliff Click describing an interesting edge-case risk of using TCP without application-level acking, and how it caused a messy intermittent bug in production.
In all these failures the common theme is that the receiver is very heavily loaded, with many hundreds of short-lived TCP connections being opened/read/closed every second from many other machines. The sender sends a ‘SYN’ packet, requesting a connection. The sender (optimistically) sends 1 data packet; optimistic because the receiver has yet to acknowledge the SYN packet. The receiver, being much overloaded, is very slow. Eventually the receiver returns a ‘SYN-ACK’ packet, acknowledging both the open and the data packet. At this point the receiver’s JVM has not been told about the open connection; this work is all opening at the OS layer alone. The sender, being done, sends a ‘FIN’ which it does NOT wait for acknowledgement (all data has already been acknowledged). The receiver, being heavily overloaded, eventually times-out internally (probably waiting for the JVM to accept the open-call, and the JVM being overloaded is too slow to get around to it) – and sends a RST (reset) packet back…. wiping out the connection and the data. The sender, however, has moved on – it already sent a FIN & closed the socket, so the RST is for a closed connection. Net result: sender sent, but the receiver reset the connection without informing either the JVM process or the sender.

(tags: tcp protocols SO_LINGER FIN RST connections cliff-click ip)
The ultimate SO_LINGER page, or: why is my tcp not reliable

If we look at the HTTP protocol, there data is usually sent with length information included, either at the beginning of an HTTP response, or in the course of transmitting information (so called ‘chunked’ mode). And they do this for a reason. Only in this way can the receiving end be sure it received all information that it was sent. Using the shutdown() technique above really only tells us that the remote closed the connection. It does not actually guarantee that all data was received correctly by program B. The best advice is to send length information, and to have the remote program actively acknowledge that all data was received.
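
The "send length information, and have the remote actively acknowledge" advice can be sketched roughly like this. It's a minimal illustration, not code from the linked page: the 4-byte length header, the one-byte `ACK` value and the helper names are all my own assumptions, and a local socket pair stands in for a real TCP connection.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # assumed framing: 4-byte big-endian payload length
ACK = b"\x06"                 # assumed one-byte application-level acknowledgement


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed after %d of %d bytes" % (len(buf), n))
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, payload):
    """Send a length-prefixed payload and block until the receiver acks it."""
    sock.sendall(HEADER.pack(len(payload)) + payload)
    if recv_exact(sock, 1) != ACK:
        raise ConnectionError("unexpected reply instead of ack")


def recv_message(sock):
    """Receive one length-prefixed payload, then acknowledge it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    payload = recv_exact(sock, length)
    sock.sendall(ACK)  # only sent once the application really has the data
    return payload


# Demo: the receiver runs in a thread, playing the part of "program B".
sender, receiver = socket.socketpair()
received = []
worker = threading.Thread(target=lambda: received.append(recv_message(receiver)))
worker.start()
send_message(sender, b"hello, program B")
worker.join()
sender.close()
receiver.close()
print(received)
```

The point is that `send_message` doesn't return until the receiving application (not just its kernel) has confirmed the data, so a reset after the sender's FIN shows up as an exception instead of silent loss.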

(tags: SO_LINGER sockets tcp ip networking linux protocols shutdown FIN RST)

Links for 2013-08-23

Published August 23, 2013

NZ police affidavits show use of PRISM for surveillance of Kim "Megaupload" Dotcom

The discovery was made by blogger Keith Ng who wrote on his On Point blog (http://publicaddress.net/onpoint/ich-bin-ein-cyberpunk/) that the Organised and Financial Crime Agency New Zealand (OFCANZ) requested assistance from the Government Communications Security Bureau (GCSB), the country's signals intelligence unit, which is in charge of surveilling the Pacific region under the Five-Eyes agreement. A list of so-called selectors or search terms were provided to GCSB by the police [PDF, redacted] for the surveillance of emails and other data traffic generated by Dotcom and his Megaupload associates. 'Selectors' is the term used for the National Security Agency (NSA) XKEYSCORE categorisation system that Australia and New Zealand contribute to and which was leaked by Edward Snowden as part of his series of PRISM revelations. Some "selectors of interest" have been redacted out, but others such as Kim Dotcom's email addresses, the mail proxy server used for some of the accounts and websites, remain in the documents.
So to recap: police investigating an entirely non-terrorism-related criminal case in NZ were given access to live surveillance traffic for surveillance of an NZ citizen. Scary stuff.

(tags: surveillance prism nsa new-zealand xkeyscore gcsb kim-dotcom piracy privacy data-retention megaupload filesharing)
"Scalable Eventually Consistent Counters over Unreliable Networks" [paper, pdf]

Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network "likes". Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation, and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.
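
To make the "state size is linear in the number of actors" point concrete, here's a minimal sketch of the naive G-Counter the abstract mentions. This is not the paper's Handoff Counters; the class and method names are my own.

```python
class GCounter:
    """Grow-only counter CRDT: one slot per actor, merged with element-wise max."""

    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.counts = {}  # actor id -> increments recorded for that actor

    def increment(self, n=1):
        self.counts[self.actor_id] = self.counts.get(self.actor_id, 0) + n

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # max() is commutative, associative and idempotent, so replicas can
        # exchange state in any order, any number of times, and still converge.
        for actor, count in other.counts.items():
            self.counts[actor] = max(self.counts.get(actor, 0), count)


a = GCounter("replica-a")
b = GCounter("replica-b")
a.increment(3)
b.increment(2)
a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
print(a.value(), b.value(), len(a.counts))
```

Every actor that ever increments gets a permanent entry in `counts` on every replica. That is the identity explosion the paper sets out to avoid.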

(tags: pdf papers eventual-consistency counters distributed-systems distcomp cap-theorem ecdc handoff-counters crdts data-structures g-counters)

Links for 2013-08-21

Published August 21, 2013

LMDB response to a LevelDB-comparison blog post

This seems like a good point to note about LMDB in general:
We state quite clearly that LMDB is read-optimized, not write-optimized. I wrote this for the OpenLDAP Project; LDAP workloads are traditionally 80-90% reads. Write performance was not the goal of this design, read performance is. We make no claims that LMDB is a silver bullet, good for every situation. It’s not meant to be – but it is still far better at many things than all of the other DBs out there that *do* claim to be good for everything.

(tags: lmdb leveldb databases openldap storage persistent)
How to avoid crappy ISP caches when viewing YouTube video

Must give this a try when I get home -- I frequently have latency problems watching YT on my UPC connection, and I bet they have a crappily-managed, overloaded cache box on their network.

(tags: streaming youtube caching isps caches firewalls iptables hacks video networking)
How to configure ntpd so it will not move time backwards

The "-x" switch will expand the step/slew boundary from 128ms to 600 seconds, ensuring the time is slewed (drifted slowly towards the correct time at a max of 5ms per second) rather than "stepped" (a sudden jump, potentially backwards). Since slewing has a max of 5ms per second, time can never "jump backwards", which is important to avoid some major application bugs (particularly in Java timers).
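
A quick back-of-envelope sketch of what slewing costs in elapsed time. The slew rate is a parameter here because the numbers differ: the text above says 5ms per second, whereas the ntpd documentation describes a 500 PPM cap, i.e. 0.5ms per second, which is the default I've assumed. The `slew_seconds` helper is hypothetical.

```python
def slew_seconds(offset_seconds, slew_rate=0.0005):
    """Wall-clock seconds needed to slew away `offset_seconds` of clock error.

    `slew_rate` is seconds of correction per elapsed second;
    0.0005 is 500 PPM, 0.005 is the 5ms-per-second figure quoted above.
    """
    if slew_rate <= 0:
        raise ValueError("slew_rate must be positive")
    return abs(offset_seconds) / slew_rate


for offset in (0.128, 1.0, 600.0):
    hours = slew_seconds(offset) / 3600
    print("offset %7.3fs -> about %.2f hours to slew at 500 PPM" % (offset, hours))
```

At 500 PPM, an offset near the 600-second boundary takes roughly two weeks to slew away. That is the price of never stepping the clock backwards.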

(tags: ntpd time ntp ops sysadmin slew stepping time-synchronization linux unix java bugs)
Snowizard

'a Java port of Twitter's Snowflake thrift service presented as an HTTP-based Dropwizard service'.
an HTTP-based service for generating unique ID numbers at high scale with some simple guarantees. supports returning ID numbers as: JSON and JSONP; Google's Protocol Buffers; Plain text. At GE, we were more interested in the uncoordinated aspects of Snowflake than its throughput requirements, so HTTP was fine for our needs. We also exposed the core of Snowflake as an embeddable module so it can be directly integrated into our applications. We don't have the guarantees that the Snowflake-Zookeeper integration was providing, but that was also acceptable to us. In places where we really needed high throughput, we leveraged the snowizard-core embeddable module directly.
Odd OSS license, though -- BSDish?
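
For a feel of how uncoordinated ID generation works, here's a minimal Snowflake-style sketch. It assumes the bit layout and custom epoch from Twitter's original Snowflake (41 bits of millisecond timestamp, 10 bits of datacenter and worker ID, 12 bits of sequence). It is not Snowizard's code, and the `IdGenerator` class is my own.

```python
import threading
import time

EPOCH_MS = 1288834974657  # custom epoch from Twitter's Snowflake (assumed here)
WORKER_BITS = 10          # datacenter id + worker id, combined
SEQUENCE_BITS = 12
SEQUENCE_MASK = (1 << SEQUENCE_BITS) - 1


class IdGenerator:
    """Roughly time-ordered 64-bit ids with no coordination between workers."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to generate ids")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & SEQUENCE_MASK
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: spin until the next one.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            timestamp = now - EPOCH_MS
            return ((timestamp << (WORKER_BITS + SEQUENCE_BITS))
                    | (self.worker_id << SEQUENCE_BITS)
                    | self.sequence)


gen = IdGenerator(worker_id=7)
ids = [gen.next_id() for _ in range(5000)]
print(ids[0], ids[-1])
```

Uniqueness across machines comes purely from each worker having a distinct `worker_id`. Handing those out safely is, as far as I understand it, the job the Zookeeper integration did in the original.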

(tags: java open-source ids soa services snowflake http)
Containers and Docker: How Secure Are They?

pretty extensive article. (via Tony Finch)

(tags: via:fanf security containerization docker containers lxc linux ops)

Links for 2013-08-20

Published August 20, 2013

Groklaw - Forced Exposure ~pj

I loved doing Groklaw, and I believe we really made a significant contribution. But even that turns out to be less than we thought, or less than I hoped for, anyway. My hope was always to show you that there is beauty and safety in the rule of law, that civilization actually depends on it. How quaint. If you have to stay on the Internet, my research indicates that the short term safety from surveillance, to the degree that is even possible, is to use a service like Kolab for email, which is located in Switzerland, and hence is under different laws than the US, laws which attempt to afford more privacy to citizens. I have now gotten for myself an email there, p.jones at mykolab.com in case anyone wishes to contact me over something really important and feels squeamish about writing to an email address on a server in the US. But both emails still work. It's your choice. My personal decision is to get off of the Internet to the degree it's possible. I'm just an ordinary person. But I really know, after all my research and some serious thinking things through, that I can't stay online personally without losing my humanness, now that I know that ensuring privacy online is impossible. I find myself unable to write. I've always been a private person. That's why I never wanted to be a celebrity and why I fought hard to maintain both my privacy and yours. Oddly, if everyone did that, leap off the Internet, the world's economy would collapse, I suppose. I can't really hope for that. But for me, the Internet is over. So this is the last Groklaw article. I won't turn on comments. Thank you for all you've done. I will never forget you and our work together. I hope you'll remember me too. I'm sorry I can't overcome these feelings, but I yam what I yam, and I tried, but I can't.

(tags: nsa surveillance privacy groklaw law us-politics data-protection snooping mail kolab)
Nelson's Weblog: tech / bad / failure-of-encryption

One of the great failures of the Internet era has been giving up on end-to-end encryption. PGP dates back to 1991, 22 years ago. It gave us the technical means to have truly secure email between two people. But it was very difficult to use. And in 22 years no one has ever meaningfully made email encryption really usable. [...] We do have SSL/HTTPS, the only real end-to-end encryption most of us use daily. But the key distribution is hopelessly centralized, authority rooted in 40+ certificates. At least 4 of those certs have been compromised by blackhat hackers in the past few years. How many more have been subverted by government agencies? I believe the SSL Observatory is the only way we’d know.
We do also have SSH. Maybe more services need to adopt that model?

(tags: ssh ssl tls pki crypto end-to-end pgp security surveillance)
Recordinality

a new, and interesting, sketching algorithm, with a Java implementation:
Recordinality is unique in that it provides cardinality estimation like HLL, but also offers "distinct value sampling." This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of 'A' and one occurrence each of 'B' - 'Z', the probability of any letter appearing in our sample is equal. Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like "do the elements we've sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?"
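
The "distinct value sampling" part can be sketched as: keep the k distinct elements with the largest hash values, and count how often each of those is seen. This is only my reading of the description above, not the linked Java implementation. It leaves out the cardinality estimator entirely, and SHA-256 stands in for whatever hash a real implementation uses.

```python
import hashlib


def hash64(item):
    """Stable 64-bit hash of a string (a stand-in for a faster non-crypto hash)."""
    return int.from_bytes(hashlib.sha256(item.encode("utf-8")).digest()[:8], "big")


class DistinctSampler:
    """Keep the k distinct elements with the largest hashes, plus their counts."""

    def __init__(self, k):
        self.k = k
        self.sample = {}  # element -> (hash, times observed)

    def observe(self, item):
        if item in self.sample:
            h, count = self.sample[item]
            self.sample[item] = (h, count + 1)
            return
        h = hash64(item)
        if len(self.sample) < self.k:
            self.sample[item] = (h, 1)
            return
        weakest = min(self.sample, key=lambda e: self.sample[e][0])
        if h > self.sample[weakest][0]:
            # The smallest retained hash only ever grows, so an evicted element
            # can never re-enter, and counts for survivors stay exact.
            del self.sample[weakest]
            self.sample[item] = (h, 1)


letters = [chr(c) for c in range(ord("B"), ord("Z") + 1)]

heavy = DistinctSampler(k=5)
for item in ["A"] * 10000 + letters:
    heavy.observe(item)

light = DistinctSampler(k=5)
for item in ["A"] + letters:
    light.observe(item)

print(sorted(heavy.sample), sorted(light.sample))
```

Whether 'A' makes it into the sample depends only on its hash, not on how many times it occurs, so both streams produce the same sample.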

(tags: sketching coding algorithms recordinality cardinality estimation hll hashing murmurhash java)

Links for 2013-08-19

Published August 19, 2013

You can't "waste your vote"!

A fantastic infographic explaining Australia's Preferential Voting system, featuring Dennis the Election Koala and Ken the Voting Dingo

(tags: infographics funny pr voting australia images via:fp)
David Miranda, schedule 7 and the danger that all reporters now face | Alan Rusbridger | Comment is free | The Guardian

The man was unmoved. And so one of the more bizarre moments in the Guardian's long history occurred – with two GCHQ security experts overseeing the destruction of hard drives in the Guardian's basement just to make sure there was nothing in the mangled bits of metal which could possibly be of any interest to passing Chinese agents. "We can call off the black helicopters," joked one as we swept up the remains of a MacBook Pro. Whitehall was satisfied, but it felt like a peculiarly pointless piece of symbolism that understood nothing about the digital age. We will continue to do patient, painstaking reporting on the Snowden documents, we just won't do it in London. The seizure of Miranda's laptop, phones, hard drives and camera will similarly have no effect on Greenwald's work. The state that is building such a formidable apparatus of surveillance will do its best to prevent journalists from reporting on it. Most journalists can see that. But I wonder how many have truly understood the absolute threat to journalism implicit in the idea of total surveillance, when or if it comes – and, increasingly, it looks like "when". We are not there yet, but it may not be long before it will be impossible for journalists to have confidential sources. Most reporting – indeed, most human life in 2013 – leaves too much of a digital fingerprint. Those colleagues who denigrate Snowden or say reporters should trust the state to know best (many of them in the UK, oddly, on the right) may one day have a cruel awakening. One day it will be their reporting, their cause, under attack. But at least reporters now know to stay away from Heathrow transit lounges.

(tags: nsa gchq surveillance spying snooping guardian reporters journalism uk david-miranda glenn-greenwald edward-snowden)
al3x/sovereign

'Sovereign is a set of Ansible playbooks that you can use to build and maintain' your own GMail/Google calendar/etc. on a VPS. Some up-to-date hosting tips, basically

(tags: sovereign gmail google vps ansible al3x hosting)

Links for 2013-08-17

Published August 17, 2013

New Tweets per second record, and how | Twitter Blog

How Twitter scaled up massively in 3 years -- replacing Ruby with the JVM, adopting SOA and custom sharding. Good summary post, looking forward to more techie details soon

(tags: twitter performance scalability jvm ruby soa scaling)

Links for 2013-08-16

Published August 16, 2013

Massive Overblocking Hits Hundreds Of UK Sites | Techdirt

Customers of UK ISPs Virgin Media and Be Broadband found they were unable to access hundreds of sites, including the Radio Times and Zooniverse, due to a secret website-blocking court order from the Premier League. PC Pro believe that 3 other ISPs' customers were also affected. According to customers' reverse-engineering, it looks like the court order incorrectly demanded the blocking of "http-redirection-a.dnsmadeeasy.com", an HTTP redirector operated by the DNS operator DNSMadeEasy.
The fact that the court could issue an order which didn’t see this coming and that the ISPs would act on it without checking that what they were doing was sensible is, in my opinion, extremely worrying.

(tags: overblocking censorship org uk sky be-broadband virgin-media dnsmadeeasy filtering premier-league false-positives isps)

Links for 2013-08-15

Published August 15, 2013

Beating the CAP Theorem Checklist

'Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment advocates a way to beat the CAP theorem. Your idea will not work. Here is why it won't work:' lovely stuff, via Bill De hOra

(tags: via:dehora funny cap cs distributed-systems distcomp networking partitions state checklists)
'Sparrow: Scalable Scheduling for Sub-Second Parallel Jobs' [tech report]

(tags: scheduling sparrow load-balancing algorithms distributed-systems distcomp papers)

Links for 2013-08-14

Published August 14, 2013

From derelict to delightful: Art Tunnel Smithfield

I do like the Art Tunnel. Smithfield is a great demo of reclaiming Dublin's increasing dereliction and I hope the DCC allow this to continue

(tags: smithfield d7 dublin ireland art art-tunnel reclamation derelict economy dcc)
How A 'Deviant' Philosopher Built Palantir, A CIA-Funded Data-Mining Juggernaut - Forbes

Palantir -- the free-market state-surveillance data-retention nightmare. At the end of this slightly overenthusiastic puff piece we get to:
Katz-Lacabe wasn’t impressed. Palantir’s software, he points out, has no default time limits -- all information remains searchable for as long as it’s stored on the customer’s servers. And its auditing function? “I don’t think it means a damn thing,” he says. “Logs aren’t useful unless someone is looking at them.” [...] What if Palantir’s audit logs -- its central safeguard against abuse -- are simply ignored? Karp responds that the logs are intended to be read by a third party. In the case of government agencies, he suggests an oversight body that reviews all surveillance -- an institution that is purely theoretical at the moment. “Something like this will exist,” Karp insists. “Societies will build it, precisely because the alternative is letting terrorism happen or losing all our liberties.” Palantir’s critics, unsurprisingly, aren’t reassured by Karp’s hypothetical court. Electronic Privacy Information Center activist Amie Stepanovich calls Palantir “naive” to expect the government to start policing its own use of technology. The Electronic Frontier Foundation’s Lee Tien derides Karp’s argument that privacy safeguards can be added to surveillance systems after the fact. “You should think about what to do with the toxic waste while you’re building the nuclear power plant,” he argues, “not some day in the future.”

(tags: palantir data-retention privacy surveillance state cia forbes andy-greenberg eff epic snooping)

Links for 2013-08-13

Published August 13, 2013

London orders rubbish bins to stop collecting smartphone data

Good call.
AUTHORITIES IN LONDON’S financial district have ordered a company using high-tech rubbish bins to collect smartphone data from passers-by to cease its activities, and referred the firm to the privacy watchdog. The City of London Corporation, which manages the so-called “Square Mile” around St Paul’s Cathedral, said such data collection “needs to stop” until there could be a public debate about it.
(via Daragh O'Brien)

(tags: via:dobrien privacy phones wifi mac-address data-protection data-retention renew london bins snooping sniffing)
The Irish State wishes to uninvent computers with new FOI Bill

Mark Coughlan noticed this:
The FOI body shall take reasonable steps to search for and extract the records to which the request relates, having due regard to the steps that would be considered reasonable if the records were held in paper format.
In other words, pretend that computerised database technology, extant since the 1960s, does not exist. Genius (via Simon McGarr)

(tags: funny irish ireland foi open-data freedom computerisation punch-cards paper databases)
Hamlet is Banned in the British Library

Pretty hilarious account of the usual, run-of-the-mill overblocking in the British Library from last weekend:
I asked [the information desk] if they saw the problem, perhaps just the symbolism, of Hamlet being banned in the British Library. They shrugged. The IT department said there was nothing to be done, as it was only the British Library's wifi service that was blocking Hamlet, and the British Library's wifi service, they seemed sure, had nothing to do with the British Library. They were merely ships that passed in the night. Children crying to each other from either bank of an uncrossable river.

(tags: censorship filters overblocking hamlet shakespeare literature funny sad british-library blocking)
The algorithm for a perfectly balanced photo gallery – Summit Stories from Crispy Mountain

Nice application of a partitioning algorithm, using dynamic programming in place of an exhaustive search (via Tom)
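A minimal sketch of the general idea, assuming the gallery problem boils down to the classic "linear partition" problem: split a sequence of photo weights (say, scaled aspect ratios) into k contiguous rows so that the heaviest row is as light as possible. The function name and the weights are illustrative, not taken from the linked post.

```python
from itertools import accumulate


def linear_partition(weights, k):
    """Split weights into at most k contiguous groups, minimising the largest group sum."""
    n = len(weights)
    if n == 0:
        return []
    k = min(k, n)
    prefix = [0] + list(accumulate(weights))
    inf = float("inf")
    # cost[i][j]: smallest achievable "largest group sum" for the first i items in j groups
    cost = [[inf] * (k + 1) for _ in range(n + 1)]
    # cut[i][j]: index where the last of those j groups starts
    cut = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            for x in range(j - 1, i):
                # The last group holds items x..i-1; the rest is an already-solved subproblem.
                candidate = max(cost[x][j - 1], prefix[i] - prefix[x])
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    cut[i][j] = x
    # Walk the cut table backwards to recover the groups.
    groups = []
    i, j = n, k
    while j > 0:
        x = cut[i][j]
        groups.append(weights[x:i])
        i, j = x, j - 1
    groups.reverse()
    return groups


rows = linear_partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)
```

The table has n*k cells and each takes O(n) work to fill, so this is O(k*n^2) rather than trying every possible set of cut points.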

(tags: algorithms javascript python dynamic-programming partitioning images gallery)
Soviets map America

An amazing Soviet map of the US economy from 1979. Wonderful piece of cold war memorabilia

(tags: cold-war ussr usa mapping maps soviet economy memorabilia)

Links for 2013-08-12

Published August 12, 2013

Randomly Failed! The State of Randomness in Current Java Implementations

This would appear to be the paper which sparked off the drama around BitCoin thefts from wallets generated on Android devices:
The SecureRandom PRNG is the primary source of randomness for Java and is used e.g., by cryptographic operations. This underlines its importance regarding security. Some of fallback solutions of the investigated implementations [are] revealed to be weak and predictable or capable of being influenced. Very alarming are the defects found in Apache Harmony, since it is partly used by Android.
More on the BitCoin drama: https://bitcointalk.org/index.php?topic=271486.40 , http://bitcoin.org/en/alert/2013-08-11-android

(tags: android java prng random security bugs apache-harmony apache crypto bitcoin papers)
The Getty Museum offers a huge chunk of their collection for free use

We’ve launched the Open Content Program to share, freely and without restriction, as many of the Getty’s digital resources as possible. The initial focus of the Open Content Program is to make available all images of public domain artworks in the Getty’s collections. Today we’ve taken a first step toward this goal by making roughly 4,600 high-resolution images of the Museum’s collection free to use, modify, and publish for any purpose. Why open content? Why now? The Getty was founded on the conviction that understanding art makes the world a better place, and sharing our digital resources is the natural extension of that belief. This move is also an educational imperative. Artists, students, teachers, writers, and countless others rely on artwork images to learn, tell stories, exchange ideas, and feed their own creativity. In its discussion of open content, the most recent Horizon Report, Museum Edition stated that “it is now the mark—and social responsibility—of world-class institutions to develop and share free cultural and educational resources.” I agree wholeheartedly.

(tags: getty art via:tupp_ed open-content free images pictures paintings museums)
The NSA Is Commandeering the Internet - Bruce Schneier

You, an executive in one of those companies, can fight. You'll probably lose, but you need to take the stand. And you might win. It's time we called the government's actions what it really is: commandeering. Commandeering is a practice we're used to in wartime, where commercial ships are taken for military use, or production lines are converted to military production. But now it's happening in peacetime. Vast swaths of the Internet are being commandeered to support this surveillance state. If this is happening to your company, do what you can to isolate the actions. Do you have employees with security clearances who can't tell you what they're doing? Cut off all automatic lines of communication with them, and make sure that only specific, required, authorized acts are being taken on behalf of government. Only then can you look your customers and the public in the face and say that you don't know what is going on -- that your company has been commandeered.

(tags: nsa america politics privacy data-protection data-retention law google microsoft security bruce-schneier)
We are the Operations team at Etsy. Ask us anything! : IAmA

great AMA from Etsy ops staff (via Nelson)

(tags: etsy reddit devops ops architecture ama via:nelson)

Links for 2013-08-09

Published August 9, 2013

Building a panopticon: The evolution of the NSA’s XKeyscore

This is an amazing behind-the-scenes look at the architecture of XKeyscore, and how it evolved from an earlier large-scale packet interception system, Narus' Semantic Traffic Analyzer. XKeyscore is a federated, distributed system, with distributed packet-capture agents running on Linux, built with protocol-specific plugins, which write 3 days of raw packet data, and 30 days of intercept metadata, to local buffer stores. Central queries are then 'distributed across all of the XKeyscore tap sites, and any results are returned and aggregated'. Dunno about you, but this is pretty much how I would have built something like this, IMO....

(tags: panopticon xkeyscore nsa architecture scalability packet-capture narus sniffing snooping interception lawful-interception li tapping)

Links for 2013-08-08

Published August 8, 2013

Police may block recording with Apple patent

Creeptastic, Apple.
Apple has patented a piece of technology which would allow government and police to block transmission of information, including video and photographs, from any public gathering or venue they deem “sensitive”, and “protected from externalities.” In other words, these powers will have control over what can and cannot be documented on wireless devices during any public event. And while the company says the affected sites are to be mostly cinemas, theaters, concert grounds and similar locations, Apple Inc. also says “covert police or government operations may require complete ‘blackout’ conditions.”

(tags: apple iphone via:devore creepy police photos recording remote-control phones blackout)

Links for 2013-08-07

Published August 7, 2013

Ivan Ristić: Defending against the BREACH attack

One interesting response to this HTTPS compression-based MITM attack:
The award for least-intrusive and entirely painless mitigation proposal goes to Paul Querna who, on the httpd-dev mailing list, proposed to use the HTTP chunked encoding to randomize response length. Chunked encoding is a HTTP feature that is typically used when the size of the response body is not known in advance; only the size of the next chunk is known. Because chunks carry some additional information, they affect the size of the response, but not the content. By forcing more chunks than necessary, for example, you can increase the length of the response. To the attacker, who can see only the size of the response body, but not anything else, the chunks are invisible. (Assuming they're not sent in individual TCP packets or TLS records, of course.) This mitigation technique is very easy to implement at the web server level, which makes it the least expensive option. There is only a question about its effectiveness. No one has done the maths yet, but most seem to agree that response length randomization slows down the attacker, but does not prevent the attack entirely. But, if the attack can be slowed down significantly, perhaps it will be as good as prevented.
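A rough sketch of the chunking idea, not Paul Querna's actual proposal or any httpd code: the same body is framed as HTTP/1.1 chunks of random sizes, so the number of bytes on the wire varies while the decoded content does not. The function names and the `max_chunk` parameter are made up for illustration.

```python
import random


def chunk_randomly(body: bytes, rng: random.Random, max_chunk: int = 64) -> bytes:
    """Frame body using chunked transfer coding with randomly sized chunks."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        piece = body[pos:pos + rng.randint(1, max_chunk)]
        # Each chunk is: hex length, CRLF, data, CRLF.
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
        pos += len(piece)
    out += b"0\r\n\r\n"  # zero-length chunk terminates the body
    return bytes(out)


def decode_chunked(data: bytes) -> bytes:
    """Minimal decoder (no trailers or chunk extensions) to show the content is unchanged."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the chunk data and its trailing CRLF


payload = b"secret-token=abc123;" * 50
wire = chunk_randomly(payload, random.Random())
```

As the quote says, this only pads the length an attacker observes; it makes the length a noisier signal rather than removing it.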

(tags: mitm attacks hacking security compression http https protocols tls ssl tcp chunked-encoding apache)
Totoro Isn't All Cute. For Some, He's the God of Death.

"Everyone, do not worry," read the Studio Ghibli statement. "There's absolutely no truth or configuration that Totoro is the God of Death or that Mei is dead in My Neighbor Totoro."

(tags: totoro studio-ghibli death morbid japan film movies urban-legends alternate plot)

Links for 2013-08-06

Published August 6, 2013

Hogan describes bin charge increases as ‘opportunistic’ - Environmental News | The Irish Times

LOL Greyhound.
Greyhound Recycling last month announced increases of 50 cents a month for customers on a flat monthly charge, 50 cents for each black bin collection for customers who pay by the lift and two cents a kilo for customers who pay by weight only. In a letter to customers, it described the levy as “tax imposed by the Government of Ireland on the people of Ireland”. However, following a complaint to the [National Consumer Agency] that the by-weight increase was 76 per cent more than the [government landfill levy] increase, Greyhound reduced the charge to an additional one cent a kilo.

(tags: greyhound ireland dublin rubbish recycling consumer ripoffs tax)
IrelandOffline broadband availability map

Marking the locations of broadband options in your area, along with VDSL cabinets, local exchanges, and wireless ISP coverage, and the landing sites of submarine cables (presumably from submarinecablemap.com data)

(tags: irelandoffline cables network internet ireland coverage wisps vdsl broadband)

Links for 2013-08-05

Published August 5, 2013

Filters 'not a silver bullet' that will stop perverts, warns Interpol chief - Independent.ie

Sunday Independent interview with Interpol assistant director Mick Moran:
Moran spoke out after child welfare organisations here called on the Government to follow the UK's example by placing anti-pornography filters on Irish home broadband connections. The Irish Society for the Prevention of Cruelty to Children argued that pornography was damaging to young children and should be removed from their line of sight. But Moran warned this would only lull parents into a false sense of security. "If we imagine the access people had to porn in the past – that access is now complete and total. They have access to the most horrific material out there. We now need to focus on parental responsibility about how kids are using the internet."

(tags: mick-moran cam interpol policing ispcc filtering parenting children broadband)
Coordinated Omission

Gil Tene raises an extremely good point about load testing, high-percentile response-time measurement, and behaviour when testing a system under load:
I've been harping for a while now about a common measurement technique problem I call "Coordinated Omission" for a while, which can often render percentile data useless. [...] I believe that this problem occurs extremely frequently in test results, but it's usually hard to deduce it's existence purely from the final data reported. But every once in a while, I see test results where the data provided is enough to demonstrate the huge percentile-misreporting effect of Coordinated Omission based purely on the summary report. I ran into just such a case in Attila's cool posting about log4j2's truly amazing performance, so I decided to avoid polluting his thread with an elongated discussion of how to compute 99.9%'ile data, and started this topic here. That thread should really be about how cool log4j2 is, and I'm certain that it really is cool, even after you correct the measurements. [...] Basically, I think that the 99.99% observation computation is wrong, and demonstrably (using the data in the graph data posted) exhibits the classic "coordinated omission" measurement problem I've been preaching about. This test is not alone in exhibiting this, and there is nothing to be ashamed of when you find yourself making this mistake. I only figured it out after doing it myself many many times, and then I noticed that everyone else seems to also be doing it but most of them haven't yet figured it out. In fact, I run into this issue so often in percentile reporting and load testing that I'm starting to wonder if coordinated omission is there in 99.9% of latency tests ;-)
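To make the effect concrete, here is a small sketch assuming a load generator that meant to send one request every 10ms but blocked for a full second on a single slow response. The requests it failed to send during the stall never show up in the raw data. The correction back-fills the samples that would have been recorded had the schedule been kept; this is similar in spirit to the expected-interval correction in Gil Tene's HdrHistogram, but the function names here are hypothetical.

```python
def correct_for_coordinated_omission(latencies_ms, expected_interval_ms):
    """Back-fill the samples a stalled load generator never got to record."""
    corrected = []
    for latency in latencies_ms:
        corrected.append(latency)
        # Requests that should have been issued during the stall would each
        # have waited one interval less than the one before.
        missing = latency - expected_interval_ms
        while missing >= expected_interval_ms:
            corrected.append(missing)
            missing -= expected_interval_ms
    return corrected


def percentile(samples, pct):
    """Nearest-rank percentile for an integer pct in 1..100."""
    ordered = sorted(samples)
    rank = max(1, -(-len(ordered) * pct // 100))  # integer ceiling
    return ordered[rank - 1]


raw = [1] * 99 + [1000]  # 99 fast responses and one 1-second stall
fixed = correct_for_coordinated_omission(raw, expected_interval_ms=10)

print(percentile(raw, 99))    # 1
print(percentile(fixed, 99))  # 990
```

The raw 99th percentile looks like 1ms because the stall counts as a single sample; with the omitted samples restored it is close to a second.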

(tags: measurement testing latency load-testing gil-tene coordinated-omission validity log4j percentiles)
Xerox scanners/photocopiers randomly alter numbers in scanned documents · D. Kriesel

Pretty major Xerox fail: photocopied/scanned docs are found to have replaced the digit '6' with '8', due to a poor choice of compression techniques:
Several mails I got suggest that the xerox machines use JBIG2 for compression. This algorithm creates a dictionary of image patches it finds “similar”. Those patches then get reused instead of the original image data, as long as the error generated by them is not “too high”. Makes sense. This also would explain, why the error occurs when scanning letters or numbers in low resolution (still readable, though). In this case, the letter size is close to the patch size of JBIG2, and whole “similar” letters or even letter blocks get replaced by each other.
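A toy illustration of that failure mode. This is not the real JBIG2 codec, just the pattern-matching-and-substitution idea from the quote: each glyph is compared against a dictionary of patches already seen, and a "similar enough" patch is reused in place of the original pixels. The 3x5 bitmaps and the `max_error` threshold are invented for the example.

```python
def hamming(a, b):
    """Number of differing pixels between two equal-sized bitmaps."""
    return sum(x != y for x, y in zip(a, b))


def encode(glyphs, max_error):
    """Replace each glyph with the index of the first 'similar enough' dictionary patch."""
    dictionary, indices = [], []
    for glyph in glyphs:
        for i, patch in enumerate(dictionary):
            if hamming(glyph, patch) <= max_error:
                indices.append(i)  # reuse an existing patch, losing the difference
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decode(dictionary, indices):
    return [dictionary[i] for i in indices]


# 3x5 bitmaps, written row by row.
EIGHT = "###" "#.#" "###" "#.#" "###"
SIX = "###" "#.." "###" "#.#" "###"
ONE = ".#." * 5

dictionary, indices = encode([EIGHT, SIX, ONE], max_error=1)
scanned = decode(dictionary, indices)
```

At this resolution the '6' and '8' differ by a single pixel, so with `max_error=1` the '6' comes back as an '8', while the '1' is different enough to get its own patch.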

(tags: jbig2 compression xerox photocopying scanning documents fonts arial image-compression images)

Links for 2013-08-03

Published August 3, 2013

The 1940s origins of Whataboutery

The exchange is indicative of a rhetorical strategy known as 'whataboutism', which occurs when officials implicated in wrongdoing whip out a counter-example of a similar abuse from the accusing country, with the goal of undermining the legitimacy of the criticism itself. (In Latin, this rhetorical defense is called tu quoque, or "you, too.")

(tags: history language whataboutism whataboutery politics 1940s russia ussr)
etcd

A highly-available key value store for shared configuration and service discovery. etcd is inspired by zookeeper and doozer, with a focus on: Simple: curl'able user facing API (HTTP+JSON); Secure: optional SSL client cert authentication; Fast: benchmarked 1000s of writes/s per instance; Reliable: Properly distributed using Raft; Etcd is written in go and uses the raft consensus algorithm to manage a highly availably replicated log.
One of the core components of CoreOS -- http://coreos.com/ .

(tags: configuration distributed raft ha doozer zookeeper go replication consensus-algorithm etcd coreos)
_In Search of an Understandable Consensus Algorithm_, Diego Ongaro and John Ousterhout, Stanford

Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election and log replication, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety. Results from a user study demonstrate that Raft is easier for students to learn than Paxos.
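The "overlapping majorities" point can be sketched in a few lines, assuming the joint-consensus rule from the paper: while membership is changing, a decision needs a majority of the old configuration and a majority of the new one. The server names and helper functions are illustrative only.

```python
from itertools import combinations


def has_majority(votes, members):
    """True if strictly more than half of members appear in votes."""
    return len(votes & members) > len(members) // 2


def joint_quorum(votes, old_config, new_config):
    """During a membership change, require separate majorities of both configurations."""
    return has_majority(votes, old_config) and has_majority(votes, new_config)


def majorities(members):
    """Every subset of members that forms a majority."""
    need = len(members) // 2 + 1
    for size in range(need, len(members) + 1):
        for subset in combinations(sorted(members), size):
            yield set(subset)


old = {"a", "b", "c"}
new = {"c", "d", "e"}

# Any two majorities of one configuration share at least one server.
all_overlap = all(m1 & m2 for m1 in majorities(old) for m2 in majorities(old))

only_old = joint_quorum({"a", "b"}, old, new)        # majority of old only
both = joint_quorum({"a", "b", "d", "e"}, old, new)  # majority of old and of new
```

Because every joint quorum contains a majority of the old cluster, it always intersects whatever majority the old cluster could have formed on its own, which is what rules out two independent decisions during the transition.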

(tags: distributed algorithms paxos raft consensus-algorithms distcomp leader-election replication clustering)

Links for 2013-08-02

Published August 2, 2013

Extract from 1973 HM Treasury document concerning post-nuclear-attack responses

'Extract from 1973 HM Treasury document concerning post-nuclear-attack monetary policy' includes this amazing snippet:
[Contingency] ...(d) a total nuclear attack employing high power missiles which would destroy all but a small percentage of the UK population and almost all physical assets or civilised life. [...] As for (d), the money policy would of course be absurdly unrealistic for the few surviving administrators and politicians as they struggled to organise food and shelter for the tiny bands of surviving able-bodied and the probably larger number of sick and dying. Most of the other departments contingency planning might also be irrelevant in such a situation. Within a fairly short time the survivors would evacuate the UK and try to find some sort of life in less-effected countries (southern Ireland?).
Hey, at least they were considering these scenarios. (via Charlie Stross)

(tags: nuclear attack contingency government monetary policy uk ireland history 1960s via:cstross insane fallout)
WhatClinic.com’s zombie recruitment video. We want your brains...

BRAAAAAAINS

(tags: whatclinic braaaaaains zombies funny video recruitment)
Guacamole Norteño

A very tasty-looking guac recipe, from h2g market veteran Lily Ramirez-Foran -- her family's traditional one. I like the addition of pomegranate seeds

(tags: guacamole avocados pomegranate recipes lily-ramirez-foran food h2g)
RA Forum: Button Factory - August 14th Simonetti (Goblin) Horror Project

LIVE - for the first time ever in Ireland, Claudio Simonetti (Goblin) & band will perform the classics of horror movie scores by seminal Italian progressive rock band Goblin, Simonetti himself and possibly one or two curve-balls! Horror rock maestro Claudio Simonetti will fulfill fans’ dreams and nightmares as the band perform the notably eerie soundtracks from Suspiria, Tenebre, Dawn of the Dead, Creepers, Demons and more! This epic show will also feature an intense A/V screening element featuring the electric scenes from some of these revered classics of horror and giallo.

(tags: goblin bands music horror movies claudio-simonetti)

Links for 2013-07-31

Published July 31, 2013

Python Infrastructure Status - SSL Verification Errors on PyPI

There appears to be a problem affecting a number of users where SSL verification errors will be shown saying "pypi.python.org" does not match "addvocate.com". As Best we can tell this appears to be related to the ISP. It seems to be affecting folks using O2 or O2 related companies. We've also reports of it affecting people using Free. Cause appears to be one of the IP addresses returned in the Geo DNS for Europe returning a certificate for addvocate.com. It's not clear at this time *why* that IP address is returning a certificate for addvocate.com.
Turned out to be a routing loop in the fast.ly London POP (via Mick Twomey)

(tags: via:micktwomey o2 censorship filtering internet ssl tls pypi python geodns pki)
"Toxic" behaviour in games is largely from "usually good" people

Only 5% of toxic behavior comes from toxic people; 77% of it comes from people who are usually good. That finding has all sorts of implications for how to stop toxic behavior in an online community. It’s not enough to just ban the jerks; good people have bad days too. Instead you have to teach the whole community what the community standards are. And quickly identify people who are having a bad day, intervene before their toxicity infects too many other people.
Great post by Nelson.

(tags: gaming toxic bad-behaviour trolls abuse online games league-of-legends)

Links for 2013-07-30

Published July 30, 2013

Setting up FamilyShield

OpenDNS's simple DNS-based blocking of dodgy content. Will need to set this up on the home router now that the kids are surfing...

(tags: opendns dns blocking filtering home porn familyshield)
Mail from the (Velvet) Cybercrime Underground

Brian Krebs manages to thwart an attempted framing for possession of Silk Road heroin. bloody hell

(tags: silk-road drugs bitcoin ecommerce brian-krebs crime framed cybercrime russia scary law-enforcement)
Clare dolphin attacks fourth swimmer in a month as Dusty protects her patch

Dusty the Dolphin has gone bad!
Locals say the three-metre long mammal has been responsible for injuring a number of people over the past two years, with several of those being hospitalised with significant injuries. She struck a 40-year-old woman in the abdomen earlier this month. In response, lifeguards now fly the red danger flag any time the dolphin enters the area. The Irish Whale and Dolphin Group has also erected warning posters at Doolin pier. IWDG coordinator Dr Simon Berrow said: “It is our policy to discourage people swimming with whales and dolphins in Ireland. “We’ve drafted a poster recommending people do not swim with Dusty, but if they must, then they should respect her as a wild dolphin and not grab, lunge or chase after her. If she shows aggressive behaviour or is boisterous they should leave the water.”

(tags: dusty dolphins wildlife nature fanore county-clare ireland swimming doolin animals)

Links for 2013-07-29

Published July 29, 2013

Why YouTube buffers: The secret deals that make -- and break -- online video

Should ISPs be required to ensure they have sufficient upstream bandwidth to video sites like YouTube and Netflix?
"Verizon has chosen to sell its customers a product [Netflix] that they hope those customers don't actually use," Schaeffer said. "And when customers use it and request movies, they have not ensured there is adequate connectivity to get that video content back to their customers."

(tags: netflix youtube streaming video isps net-neutrality peering comcast bandwidth upstream)
ISPAI Responds to Porn Filtering Debacle

Quite a strong statement:
The issue of access to age-inappropriate content is not a new matter and it is important not to have “knee-jerk” reactions which don’t solve the perceived problem and have major implications for the public’s right to access information in general. Notably the European Commission, as stated by vice-president Nellie Kroes [jm: sic], has come out strongly against blocking of the Internet, seeing it as an important platform for freedom of speech and she intends to “guarantee access without restriction.” We in Ireland would do well to consider carefully the impact that any rash adoption or attempted copying of UK measures might have here in the light of current and future EU legislation and policy.

(tags: ispai filtering overblocking david-cameron porn internet ireland politics blocking web uk)
Forecast.io

Excellent weather site, displaying beautifully interpolated rainfall visualization, from the team behind the Dark Sky app

(tags: weather ireland dark-sky apps iphone ipad forecast rain dataviz mapping via:marcomorain)

Links for 2013-07-27

Published July 27, 2013

Applied Cryptography, Cryptography Engineering, and how they need to be updated

Whoa, I had no idea my knowledge of crypto was so out of date! For example:
ECC is going to replace RSA within the next 10 years. New systems probably shouldn’t use RSA at all.
This blogpost is full of similar useful guidelines and rules of thumb. Here's hoping I don't need to work on a low-level cryptosystem any time soon, as the risk of screwing it up is always high, but if I do this is a good reference for how it needs to be done nowadays.

(tags: thomas-ptacek crypto cryptography coding design security aes cbc ctr ecb hmac side-channels rsa ecc)
When 'Smart Homes' Get Hacked: I Haunted A Complete Stranger's House Via The Internet - Forbes

Hardware designers do their usual trick -- omit the whole security part:
[Trustwave's Crowley] found security flaws that would allow a digital intruder to take control of a number of sensitive devices beyond the Insteon systems, from the Belkin WeMo Switch to the Satis Smart Toilet. Yes, they found that a toilet was hackable. You only have to have the Android app for the $5,000 toilet on your phone and be close enough to the toilet to communicate with it. “It connects through Bluetooth, with no username or password using the pin ‘0000’,” said Crowley. “So anyone who has the application on their phone and was connected to the network could control anyone else’s toilet. You could turn the bidet on while someone’s in there.”

(tags: home automation insteon security hardware fail attacks bluetooth han trustwave belkin satis)

Links for 2013-07-26

Published July 26, 2013

France Kills Three Strikes

Missed bookmarking this news --
After years of debate and controversy the French Government has finally backtracked on the law which allowed errant subscribers to be disconnected from the Internet. This morning a decree was published which removed the possibility for file-sharers to have their connections cut for copyright infringement. Instead, those caught by rightsholders will now be subjected to a system of automated fines.

(tags: france legal ip piracy filesharing three-strikes)
BBC News - Chinese firm Huawei controls net filter praised by PM

TalkTalk's porn-filtering system, praised by David Cameron in the UK as a model for porn filtering for the country's ISPs, is operated by Huawei. Of course, there are no possible problems with allowing Huawei, with its alleged close ties to the Chinese government, to operate a state-wide internet censorship system in the UK without any functioning oversight, right? ;) Also worth noting: all TalkTalk traffic passes through the Huawei filtering infrastructure, even when the customer has "opted in".

(tags: huawei talk-talk oversight overblocking politics china uk david-cameron filtering censorship)
Branded to death | Features | Times Higher Education

The most abominable monster now threatening the intellectual health and the integrity of pure enquiry as well as conscientious teaching is the language of advertising, or better, the machinery of propaganda. Any number of critics from within university walls have warned the people at large and academics in particular of the way the helots of advertising and the state police of propaganda bloat and distort the language of thoughtful description, peddle with a confident air generalisations without substance, and serenely circulate orotund lies while ignoring their juniors’ rebuttals and abuse.
Relevant to this argument -- http://arstechnica.com/tech-policy/2013/07/the-webs-longest-nightmare-ends-eolas-patents-are-dead-on-appeal/ notes that 'the role of the University of California [was] one of the most perplexing twists in the Eolas saga. The university kept a low profile during the lead-up to trial; but once in Texas, Eolas' lawyers constantly reminded the jury they were asserting "these University of California patents." A lawyer from UC's patent-licensing division described support for Eolas at trial by simply saying that the university "stands by its licensees."'

(tags: branding advertising newspeak universities third-level eolas higher-education education research university-of-california ucb patents ip swpats)

Links for 2013-07-25

Published July 25, 2013

Twilio Billing Incident Post-Mortem

At 1:35 AM PDT on July 18, a loss of network connectivity caused all billing redis-slaves to simultaneously disconnect from the master. This caused all redis-slaves to reconnect and request full synchronization with the master at the same time. Receiving full sync requests from each redis-slave caused the master to suffer extreme load, resulting in performance degradation of the master and timeouts from redis-slaves to redis-master. By 2:39 AM PDT the host’s load became so extreme, services relying on redis-master began to fail. At 2:42 AM PDT, our monitoring system alerted our on-call engineering team of a failure in the Redis cluster. Observing extreme load on the host, the redis process on redis-master was misdiagnosed as requiring a restart to recover. This caused redis-master to read an incorrect configuration file, which in turn caused Redis to attempt to recover from a non-existent AOF file, instead of the binary snapshot. As a result of that failed recovery, redis-master dropped all balance data. In addition to forcing recovery from a non-existent AOF, an incorrect configuration also caused redis-master to boot as a slave of itself, putting it in read-only mode and preventing the billing system from updating account balances.
See also http://antirez.com/news/60 for antirez' response. Here are the takeaways I'm getting from it: 1. network partitions happen in production, and cause cascading failures. this is a great demo of that. 2. don't store critical data in Redis. this wasn't the case for Twilio -- as far as I can tell they were using Redis as a front-line cache for billing data -- but it's worth saying anyway. ;) 3. Twilio were just using Redis as a cache, but a bug in their code meant that the writes to the backing SQL store were not being *read*, resulting in repeated billing and customer impact. In other words, it turned a (fragile) cache into the authoritative store. 4. they should probably have designed their code so that write failures would not result in repeated billing for customers -- that's a bad failure path. Good post-mortem anyway, and I'd say their customers are a good deal happier to see this published, even if it contains details of the mistakes they made along the way.
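To make takeaway 4 concrete, here's a minimal Python sketch of an idempotent charge path. This is not Twilio's code: `Ledger`, `charge`, and the `charge_id` idempotency key are all hypothetical names, and an in-memory dict stands in for the authoritative SQL store. The idea is that a retry after a failed or unacknowledged write replays the same charge ID, so it cannot bill twice.

```python
class Ledger:
    """Hypothetical authoritative store; dicts stand in for SQL tables."""

    def __init__(self):
        self.balances = {}  # account_id -> balance in cents
        self.applied = {}   # charge_id -> amount already applied

    def charge(self, charge_id, account_id, amount_cents):
        """Apply a charge at most once per charge_id.

        Returns True if this call applied the charge, False if the charge
        had already been applied (i.e. this call was a retry).
        """
        if charge_id in self.applied:
            return False
        balance = self.balances.get(account_id, 0)
        if balance < amount_cents:
            raise ValueError("insufficient balance")
        self.balances[account_id] = balance - amount_cents
        self.applied[charge_id] = amount_cents
        return True


ledger = Ledger()
ledger.balances["acct-1"] = 1000
first = ledger.charge("charge-42", "acct-1", 300)
retry = ledger.charge("charge-42", "acct-1", 300)  # same ID, e.g. after a timeout
```

In a real system the "already applied" check and the balance update would need to happen in one transaction; the sketch skips that.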

(tags: redis caching storage networking network-partitions twilio postmortems ops billing replication)
Tuning and benchmarking Java 7's Garbage Collectors: Default, CMS and G1

Rudiger Moller runs through a typical GC-tuning session, in exhaustive detail

(tags: java gc tuning jvm cms g1 ops)
Censum

[JVM] GC is a difficult, specialised area that can be very frustrating for busy developers or devops folks to deal with. The JVM has a number of Garbage Collectors and a bewildering array of switches that can alter the behaviour of each collector. Censum does all of the parsing, number crunching and statistical analysis for you, so you don't have to go and get that PhD in Computer Science in order to solve your GC performance problem. Censum: gives you straight answers as opposed to a ton of raw data; can eat any GC log you care to throw at it; is easy to install and use.
Commercial software, UKP 495 per license.

(tags: censum gc tuning ops java jvm commercial)

Links for 2013-07-24

Published July 24, 2013

The Web’s longest nightmare ends: Eolas patents are dead on appeal | Ars Technica

Ding dong, the troll is dead! Ars Technica with a great description of the Eolas web patent fiasco, and the UC system's sorry role. I blame Bayh-Dole for creating this insane mindset where places of learning are forced to "monetize" their research.
Under Doyle's conception of his own invention, practically any modern website owed him royalties. Playing a video online or rotating an image on a shopping website were "interactive" features that infringed his patents. And unlike many "patent trolls" who simply settle for settlements just under the cost of litigation, Doyle's company had the chops, the lawyers, and the early filing date needed to extract tens of millions of dollars from the accused companies. [...] The role of the University of California is one of the most perplexing twists in the Eolas saga. The university kept a low profile during the lead-up to trial; but once in Texas, Eolas lawyers constantly reminded the jury they were asserting "these University of California patents." A lawyer from UC's patent-licensing division described support for Eolas at trial by simply saying that the university "stands by its licensees." (Eolas was technically an exclusive licensee of the UC-owned patent, which also gives it the right to sue.) At the same time, the University of California, and the Berkeley campus in particular, was a key institution in creating early web technology. While UC lawyers cooperated with the plaintiffs, two UC Berkeley-trained computer scientists were key witnesses in the effort to demolish the Eolas patents. Pei-Yuan Wei created the pioneering Viola browser, a key piece of prior art, while he was a student at UC-Berkeley in the early 1990s. Scott Silvey, another UC-Berkeley student at that time, testified about a program he made called VPlot, which allowed users to rotate an image of an airplane using Wei's browser. VPlot and Viola were demonstrated to Sun Microsystems in May 1993, months before Doyle claimed to have conceived of his invention.

(tags: patents swpats eolas web patent-trolls ucb universities research viola plugins berkeley)
Irish Comms Minister Pat Rabbitte ignores calls for State role in blocking online porn

Good call.
Mr Rabbitte says that legal concerns attached to mandatory filters, as well as a fear of imposing censorship, have persuaded him against trying to force ISPs to impose mandatory pornography-blocking internet filters. "I remain to be convinced that blanket censorship or a default-on blocker is the correct or workable response," he said. "Even if it were possible to ensure that such measures were not easily circumvented or didn't inadvertently block perfectly acceptable content, the principled question of whether the State should be encouraging service providers to filter or block content to all users, regardless of whether there are children resident, would still arise."

(tags: pat-rabbitte internet filtering censorship blocking porn overblocking default-on isps ireland)
Grove

Hosted IRC, 20 users for $50/month. Useful now that Google have fecked up Chat entirely

(tags: irc chat collaboration groupware hosted-services)

Links for 2013-07-23

Published July 23, 2013

UK Internet censorship plan no less stupid than it was last year - Boing Boing

Cory Doctorow's long list of articles describing how the UK's censorware-for-all plan is going to fail. I like this bit:
When we argued our case to the vendor's representative, he was categorical: any nudity, anywhere on [Boing Boing], makes it into a "nudity site" for the purposes of blocking. The vendor went so far as to state that a single image of Michelangelo's David, on one page among hundreds of thousands on a site, would be sufficient grounds for a nudity classification. I suspect that none of the censorship advocates in the Lords understand that the offshore commercial operators they're proposing to put in charge of the nation's information access apply this kind of homeopathic standard to objectionable material.
I guess this means the Daily Mail will be similarly classified as containing "nudity" and blocked, given their smut column on every page?

(tags: daily-mail fail censorship censorware boing-boing michelangelo sculpture nudity uk politics filtering overblocking web internet)
Content Aware Typography

Photoshop's "Content Aware Fill" applied to text. some very cool results

(tags: images cool art typography algorithms via:pentadact photoshop)
A Tour Inside CloudFlare's Latest Generation Servers

great transparency from CloudFlare! Looking at their current 4th-gen rackmount server buildout -- now with HP after Dell and ZT. Shitloads of SSDs for lower power and greater predictability in failure rates. 128GB RAM. consistent hashing to address stores instead of RAID. Sandy Bridge chipset. Solarflare SFC9020 10Gbps network cards. This is really impressive openness for a high-scale custom datacenter server platform...
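For anyone unfamiliar with the "consistent hashing instead of RAID" idea, here's a rough Python sketch of a hash ring. It is only an illustration: the store names, the `HashRing` class, and the virtual-node count are my assumptions, not details from CloudFlare's post. The useful property is that losing one store only remaps the keys that lived on it.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a 64-bit integer position on the ring."""
    return int.from_bytes(hashlib.sha256(key.encode("utf-8")).digest()[:8], "big")


class HashRing:
    def __init__(self, stores, vnodes=100):
        # Each store gets `vnodes` points on the ring to even out the load.
        self._points = sorted(
            (_hash(f"{store}#{i}"), store)
            for store in stores
            for i in range(vnodes)
        )
        self._keys = [point for point, _ in self._points]

    def store_for(self, key):
        """Return the store owning `key`: the first ring point after its hash."""
        idx = bisect.bisect_right(self._keys, _hash(key)) % len(self._points)
        return self._points[idx][1]


stores = ["ssd-0", "ssd-1", "ssd-2", "ssd-3"]
ring = HashRing(stores)
before = {f"object-{n}": ring.store_for(f"object-{n}") for n in range(1000)}

# Simulate losing one SSD: only keys that lived on it should move.
smaller = HashRing([s for s in stores if s != "ssd-2"])
moved = [k for k, s in before.items() if smaller.store_for(k) != s]
```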

(tags: datacenter cloudflare hardware rackmount ssds intel)
3D-Printer Manufacturer Creates Software Filter To Prevent Firearm Printing

'[Create It REAL], which sells 3D printer component parts and software, recently announced that it has come up with a firearm component detection algorithm that will give 3D printers the option to block any gun parts. The software compares each component a user is trying to print with a database of potential firearms parts, and shuts down the modeling software if it senses the user is trying to make a gun.'

(tags: blocklists filtering guns weapons 3d-printing future firearms)
Fund it :: Upstart Granby Park

help fund Granby Park, a pop-up park to replace a vacant site on the corner of Dominick St and Parnell St in Dublin 1: http://upstart.ie/

(tags: fund-it granby-park dublin d1 parks pop-up city funding grassroots)
Rooting SIM cards

the details of Karsten Nohl's attack against SIM cards, allowing remote-root malware via SMS.
Cracking SIM update keys: [Over The Air] commands, such as software updates, are cryptographically-secured SMS messages, which are delivered directly to the SIM. While the option exists to use state-of-the-art AES or the somewhat outdated 3DES algorithm for OTA, many (if not most) SIM cards still rely on the 70s-era DES cipher. [...] To derive a DES OTA key, an attacker starts by sending a binary SMS to a target device. The SIM does not execute the improperly signed OTA command, but does in many cases respond to the attacker with an error code carrying a cryptographic signature, once again sent over binary SMS. A rainbow table resolves this plaintext-signature tuple to a 56-bit DES key within two minutes on a standard computer.
2 minutes. Sic transit gloria DES. The next step after that is to send a signed request to run a Java applet, then exploit a hole in the JVM sandbox, and the SIM card is rooted. Looking forward to the full paper on July 31st...

(tags: des 3des crypto security sms sim-cards smartcards java applets ota rainbow-tables cracking karsten-nohl)
Machine Learning Speeds TCP

Cool. A machine-learning-generated TCP congestion control algorithm which handily beats sfqCoDel, Vegas, Reno et al. But:
"Although the [computer-generated congestion control algorithms] appear to work well on networks whose parameters fall within or near the limits of what they were prepared for -- even beating in-network schemes at their own game and even when the design range spans an order of magnitude variation in network parameters -- we do not yet understand clearly why they work, other than the observation that they seem to optimize their intended objective well. We have attempted to make algorithms ourselves that surpass the generated RemyCCs, without success. That suggests to us that Remy may have accomplished something substantive. But digging through the dozens of rules in a RemyCC and figuring out their purpose and function is a challenging job in reverse-engineering. RemyCCs designed for broader classes of networks will likely be even more complex, compounding the problem." So are network engineers willing to trust an algorithm that seems to work but has no explanation as to why it works other than optimizing a specific objective function? As AI becomes increasingly successful the question could also be asked in a wider context.
(via Bill de hOra)

(tags: via-dehora machine-learning tcp networking hmm mit algorithms remycc congestion)

Links for 2013-07-22

Published July 22, 2013

Street Cuffs: L.A. Sees Big Jump In Bike Thefts

Some [LA] bike messengers last month took justice into their own hands when they caught two suspected thieves, teenage boys who attended a local Catholic high school. According to police, the messengers stripped down the teens to their boxer shorts before taking their cellphones, backpacks and clothes. “They meted out street justice. We don’t condone street justice. They never threatened them. But they made it clear: don’t mess with another person’s property,” Los Angeles Police Lt. Paul Vernon said. “This incident and the arrests are the tip of the iceberg when it comes to people stealing bicycles.” Vernon said the two boys told police they were robbed by about 20 men on bicycles at 6th Street and Grand Avenue about 3 p.m. on Jan. 12. Investigators said they cannot prove the boys were stealing bikes and continue to look for the assailants.

(tags: cycling theft robbery bike-theft la crime vigilantes cycle-couriers)
ICO’s Tame Investigation Of Google Street View Data Slurping

“People will yet again be asking whether Google has been let off without the kind of full and rigorous investigation that you would expect after this kind of incident,” Nick Pickles, director of the Big Brother Watch, told TechWeekEurope. “Let’s not forget that information was collected without permission from thousands of people’s Wi-Fi networks, in a way that if an individual had done so they would have almost certainly have been prosecuted. It seems strange that ICO [the UK's Data Protection regulatory agency] did not want to inspect the [datacenter] cages housing the data, while it is also troubling that Google’s assurances were taken at face value, despite this not being the first incident where consumers have seen their privacy violated by the company.”

(tags: privacy google ico regulation data-protection snooping wifi sniffing network-traffic street-view)
Mexican Pickled Potatoes

'My researches on the pickling matter had led me to conclude that Mexico was, in fact, one of the few places where pickled potatoes were “a thing” and, in discussing same with Lily last month at her Mexican food stall in the Honest To Goodness market, I discovered that her soon-to-be-visiting Mexican mama was, in fact, a maker of such pickles. Not long afterward, I watched as Lily sat down with her mother, querying the ways of her pickled potatoes, translating and scribbling instructions for me as the details were recalled, not in an orderly series of steps, but in a series of asides and by-the-ways, by one for whom the practice of pickling potatoes was entirely second nature.'

(tags: pickling yum food mexico potatoes spuds recipes)
Porn to be Blocked in the UK – “What’s new?” Say Pirate Bay Users | TorrentFreak

It seems likely that the ISPs will implement a system similar to the one currently being used by TalkTalk, as the prime minister will specifically single the ISP out for praise in his speech. TalkTalk’s HomeSafe is a system which filters out URLs based on a remote blocklist provided and maintained by…. well, no one quite knows. This is worrying since when things don’t go quite to plan there’s no one to complain to. As previously reported, when TalkTalk customers are asked whether they want to block file-sharing sites, TorrentFreak.com is rendered inaccessible. Despite our pleas and complaints that we are a news resource, the company said it would not remove us from their blocklist. We doubt we’re the only ones being silenced.

(tags: talktalk blocking uk isps torrentfreak politics filtering david-cameron porn overblocking)

Links for 2013-07-20

Published July 20, 2013

The Trello Tech Stack

Good description of how Fog Creek built out their Trello product; client-side JS rendering, model synced across the wire, HAProxy, Redis, and WebSockets. Bookmarked notably for this paragraph, which doesn't ameliorate my fear of WebSockets as a tech:
The Socket.io server currently has some problems with scaling up to more than 10K [jm: oh dear] simultaneous client connections when using multiple processes and the Redis store, and the client has some issues that can cause it to open multiple connections to the same server, or not know that its connection has been severed.

(tags: websockets javascript architecture fog-creek trello ajax push)
Log4j 2: Performance close to insane

Nice writeup on Log4j 2's new AsyncAppender implementation, based on the LMAX Disruptor. sounds pretty excellent:
“One nice little detail I should mention is that both Async Loggers and Async Appenders fix something that has always bothered me in Log4j-1.x, which is that they will flush the buffer after logging the last event in the queue. With Log4j-1.x, if you used buffered I/O, you often could not see the last few log events, as they were still stuck in the memory buffer. Your only option was setting immediateFlush to true, which forces disk I/O on every single log event and has a performance impact. With Async Loggers and Appenders in Log4j-2.0 your log statements are all flushed to disk, so they are always visible, but this happens in a very efficient manner.”

(tags: logging java performance async disruptor low-latency)
Chronicle

an ultra low latency, high throughput, persisted, messaging and event driven in memory database. The typical latency is as low as 80 nano-seconds and supports throughputs of 5-20 million messages/record updates per second. This library also supports distributed, durable, observable collections (Map, List, Set). The performance depends on the data structures used, but simple data structures can achieve throughputs of 5 million elements or key/value pairs in batches (eg addAll or putAll) and 500K elements or key/values per second when added/updated/removed individually. It uses almost no heap, trivial GC impact, can be much larger than your physical memory size (only limited by the size of your disk) and can be shared between processes with better than 1/10th latency of using Sockets over loopback. It can change the way you design your system because it allows you to have independent processes which can be running or not at the same time (as no messages are lost). This is useful for restarting services and testing your services from canned data. e.g. like sub-microsecond durable messaging. You can attach any number of readers, including tools to see the exact state of the data externally.

(tags: library messaging performance java chronicle disk mmap)
Stayhold

a completely new patent pending product designed in Ireland that is going to change the way people use their cars for carrying goods. It is a solid plastic product that grips the carpet in your car and acts as a barrier to hold loose items securely against the side wall in your car trunk or boot.
Found out about this online -- a US-based acquaintance raving about them being worth the shipping from Ireland. nice work!

(tags: stayhold transportation cars boot gadgets toget)

Links for 2013-07-18

Published July 18, 2013

Docker

'the Linux container engine'. I totally misunderstood what Docker was -- this is cool.
Heterogeneous payloads: Any combination of binaries, libraries, configuration files, scripts, virtualenvs, jars, gems, tarballs, you name it. No more juggling between domain-specific tools. Docker can deploy and run them all. Any server: Docker can run on any x64 machine with a modern linux kernel - whether it's a laptop, a bare metal server or a VM. This makes it perfect for multi-cloud deployments. Isolation: Docker isolates processes from each other and from the underlying host, using lightweight containers. Repeatability: Because each container is isolated in its own filesystem, they behave the same regardless of where, when, and alongside what they run.

(tags: lxc containers virtualization cloud ops linux docker deployment)
Next Generation Continuous Integration & Deployment with dotCloud’s Docker and Strider

Since Docker treats its images as a tree of derivations from a source image, you have the ability to store an image at each stage of a build. This means we can provide full binary images of the environment in which the tests failed. This allows you to run locally bit-for-bit the same container as the CI server ran. Due to the magic of Docker and AUFS Copy-On-Write filesystems, we can store this cheaply. Often tests pass when built in a CI environment, but when built in another (e.g. production) environment break due to subtle differences. Docker makes it trivial to take exactly the binary environment in which the tests pass, and ship that to production to run it.

(tags: docker strider continuous-integration continuous-deployment deployment devops ops dotcloud lxc virtualisation copy-on-write images)

Links for 2013-07-17

Published July 17, 2013

Pinterest's follower graph store, built on Redis

This is a good, high-availability Redis configuration; sharded by userid across 8192 shards, with a Redis master/slave pair of instances for each set of N shards. I like their use of two redundancy systems -- hot slave and backup snapshots:
We run our cluster in a Redis master-slave configuration, and the slaves act as hot backups. Upon a master failure, we failover the slave as the new master and either bring up a new slave or reuse the old master as the new slave. We rely on ZooKeeper to make this as quick as possible. Each master Redis instance (and slave instance) is configured to write to AOF on Amazon EBS. This ensures that if the Redis instances terminate unexpectedly then the loss of data is limited to 1 second of updates. The slave Redis instances also perform BGsave hourly which is then loaded to a more permanent store (Amazon S3). This copy is also used by Map Reduce jobs for analytics. As a production system, we need many failure modes to guard ourselves. As mentioned, if the master host is down, we will manually failover to slave. If a single master Redis instance reboots, monit restart restores from AOF, implying a 1 second window of data loss on the shards on that instance. If the slave host goes down, we bring up a replacement. If a single slave Redis instance goes down, we rely on monit to restart using the AOF data. Because we may encounter AOF or BGsave file corruption, we BGSave and copy hourly backups to S3. Note that large file sizes can cause BGsave induced delays but in our cluster this is mitigated by smaller Redis data due to the sharding scheme.
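A minimal sketch of what that sharding scheme could look like in Python. The 8192 figure is from the article; the modulo mapping, the function names, and the value of 64 shards per instance (the article's "N") are my assumptions, since the post doesn't spell them out.

```python
NUM_SHARDS = 8192         # from the article
SHARDS_PER_INSTANCE = 64  # assumption: stand-in for the article's "N"


def shard_for(user_id):
    """Map a user ID to one of the fixed virtual shards."""
    return user_id % NUM_SHARDS


def instance_for(shard_id):
    """Map a shard to the master/slave pair of Redis instances hosting it."""
    return shard_id // SHARDS_PER_INSTANCE


user_id = 123456789
shard = shard_for(user_id)
instance = instance_for(shard)
```

Keeping the shard count fixed and much larger than the instance count means shards can be moved between instances later without rehashing users.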

(tags: graph redis architecture ha high-availability design redundancy sharding)
Flower Filter

'A simple time-decaying approximate membership filter' -- like a Bloom filter with time decay. See also http://eng.42go.com/flower-filter-an-update/ for some notes on the non-independence of survival probabilities, and how that imposes negligible differences in practice.
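To give a feel for "Bloom filter with time decay", here's a rough Python sketch of a simpler, timestamp-based variant. To be clear, this is NOT the Flower Filter algorithm itself; it's a swapped-in technique where each slot stores a last-seen time rather than a bit, and the class name, sizes, and explicit `now` parameter are all my own assumptions.

```python
import hashlib


class DecayingFilter:
    """Bloom-style filter whose slots hold last-seen times instead of bits."""

    def __init__(self, size=1024, hashes=3, ttl=60.0):
        self.size = size
        self.hashes = hashes
        self.ttl = ttl
        self.slots = [None] * size  # None means "never set"

    def _indexes(self, item):
        # Derive `hashes` slot indexes by salting one hash function.
        for salt in range(self.hashes):
            digest = hashlib.sha256(f"{salt}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item, now):
        for i in self._indexes(item):
            self.slots[i] = now

    def seen(self, item, now):
        """True if every slot for `item` was touched within the last `ttl` seconds.

        Like a Bloom filter, this can give false positives, but not a false
        negative for an item added within the window.
        """
        return all(
            self.slots[i] is not None and now - self.slots[i] <= self.ttl
            for i in self._indexes(item)
        )


f = DecayingFilter(ttl=60.0)
f.add("user-1", now=0.0)
recent = f.seen("user-1", now=30.0)
expired = f.seen("user-1", now=120.0)
```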

(tags: bloom-filter algorithms coding probabilistic approximate time decay)
Spybike

This is brilliant. 'covert bicycle GPS tracker; Notifies you by SMS if your bicycle moves; Online tracking'. 'Spybike is a covert tracking device that is hidden inside your bicycle steerer tube. The device is disguised to look like a normal head set cap to avoid suspicion. If someone steals your bike, you can use SpyBike to track their movements online and on your mobile.' More details: http://www.integratedtrackers.com/GPSTrack/pdf/Spybike_Instructions_2.pdf

(tags: spybike cycling theft gps tracking)
No Time To Spare [infographic]

'On August 2, 2005, a fully-loaded Air France Airbus A340 arriving from Paris crash-landed at Toronto's Pearson International Airport and caught fire. Only 4 of the 8 exits were usable, yet all 309 people on board made it off the aircraft in two minutes, before it was consumed by flames. Here, five of the passengers recount their escape.'

(tags: infographics travel air accidents fire airbus safety escape a340)

Links for 2013-07-16

Published July 16, 2013

Merkel call for data protection rules puts Ireland in spotlight - Technology News

Irish Times on EU unhappiness with Ireland's "light touch" data protection regime:
Hawkes’s appearance last month on RTÉ’s Morning Ireland regarding the US Prism surveillance programme, since posted to YouTube, reheated lingering resentment among many European data authorities. His admission that he “knew in a general way” about such programmes and didn’t “regard this particular revelation as particularly new” was a red rag to his European colleagues who fear Ireland is the transmission point of wholesale EU data to the US.

(tags: eu ireland data-protection privacy billy-hawkes regulation dpc)
Java Garbage Collection Distilled

a great summary of the state of JVM garbage collection from Martin Thompson

(tags: jvm java gc garbage-collection tuning memory performance martin-thompson)

Links for 2013-07-15

Published July 15, 2013

Improved HTTPS Performance with Early SSL Termination

This is a neat hack. Since SSL/TLS connection establishment requires lots of consecutive round trips before the connection is ready, by performing that closer to the user and reusing an existing region-to-region connection behind the scenes, the overall latency is greatly improved. Works for HTTP as well
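Some back-of-envelope arithmetic shows why this helps. The numbers below are made up for illustration (20 ms from user to a nearby terminator, 150 ms from there to the origin region), and I'm assuming 3 handshake round trips (1 for TCP, 2 for a full TLS handshake) plus 1 for the request itself.

```python
def time_to_first_byte(handshake_rtt_ms, request_rtt_ms, handshake_round_trips=3):
    """Rough time to first byte: handshake round trips plus one request round trip."""
    return handshake_round_trips * handshake_rtt_ms + request_rtt_ms


USER_TO_EDGE_MS = 20     # assumed RTT from the user to a nearby terminator
EDGE_TO_ORIGIN_MS = 150  # assumed RTT from the terminator to the origin region

# Direct: every round trip, handshake included, crosses the full path.
full_path = USER_TO_EDGE_MS + EDGE_TO_ORIGIN_MS
direct_ms = time_to_first_byte(full_path, full_path)

# Early termination: the handshake only goes as far as the edge; the request
# then rides an already-open edge-to-origin connection.
early_ms = time_to_first_byte(USER_TO_EDGE_MS, full_path)
```

Under those assumptions the direct case costs 680 ms and the early-termination case 230 ms; the saving comes entirely from keeping the chatty handshake on the short leg.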

(tags: http https ssl architecture aws ec2 performance latency internet round-trip nginx tls)
How to secure your webapp

Locking down a webapp with current strict HTTPS policies.
It’s impossible to get to 100% security but there are steps you can take to secure your webapp for your users, to help mitigate against different types of attacks both against you, your webapp and your customers themselves. These are all things we’ve implemented with Server Density v2 to help harden the product as much as possible. These tips are in addition to security best practices such as protecting against SQL injection, filtering, session handling, and XSRF protection. Check out the OWASP cheat sheets and top 10 lists to ensure you’re covered for the basics before implementing the suggestions below.

(tags: https ssl security web webdev tls)
Breakthrough silicon scanning discovers backdoor in military chip [PDF]

Wow, I'd missed this:
This paper is a short summary of the first real world detection of a backdoor in a military grade FPGA. Using an innovative patented technique we were able to detect and analyse in the first documented case of its kind, a backdoor inserted into the Actel/Microsemi ProASIC3 chips for accessing FPGA configuration. The backdoor was found amongst additional JTAG functionality and exists on the silicon itself, it was not present in any firmware loaded onto the chip. Using Pipeline Emission Analysis (PEA), our pioneered technique, we were able to extract the secret key to activate the backdoor, as well as other security keys such as the AES and the Passkey. This way an attacker can extract all the configuration data from the chip, reprogram crypto and access keys, modify low-level silicon features, access unencrypted configuration bitstream or permanently damage the device. Clearly this means the device is wide open to intellectual property (IP) theft, fraud, re-programming as well as reverse engineering of the design which allows the introduction of a new backdoor or Trojan. Most concerning, it is not possible to patch the backdoor in chips already deployed, meaning those using this family of chips have to accept the fact they can be easily compromised or will have to be physically replaced after a redesign of the silicon itself.

(tags: chips hardware backdoors security scanning pea jtag actel microsemi silicon fpga trojans)

Links for 2013-07-10

Published July 10, 2013

small town council in Oz has been snooping on mobile phone records to catch litterbugs and owners of unregistered pets

Privacy advocates have slammed Wyndham council for spying on residents’ mobile phone data and email records almost 50 times in the past three years, “not to hunt down terrorists but to catch litterbugs and owners of unregistered pets”. Figures from the attorney-general’s department reveal Wyndham is the only Victorian council that has been snooping on personal data, seizing residents’ information 31 times during 2010-11 and 2011-12. Council’s acting chief executive Kelly Grigsby told the Weekly there had been another 18 authorisations in the past 12 months to chase people for unauthorised advertising, unregistered pets and illegal littering.

(tags: victoria australia oz privacy snooping data-retention metadata overreach)
Traditional AQM is not enough!

Jim Gettys on modern web design, HTTP, buffering, and FIFO queues in the network.
Web surfing is putting impulses of packets, without congestion avoidance, into FIFO queues where they do severe collateral damage to anything sharing the link (including itself!). So today’s web behavior incurs huge collateral damage on itself, data centers, the edge of the network, and in particular any application that hopes to have real time behavior. How do we solve this problem?
tl;dr: fq_codel. Now I want it!

(tags: buffering networking internet web http protocols tcp bufferbloat jim-gettys codel fq_codel)

Links for 2013-07-09

Published July 9, 2013

We interrupt this program to warn the Emergency Alert System is hackable | Ars Technica

Private SSH key included in a firmware update. Oh dear:
The US Emergency Alert System, which interrupts live TV and radio broadcasts with information about national emergencies in progress, is vulnerable to attacks that allow hackers to remotely disseminate bogus reports and tamper with gear, security researchers warned. The remote takeover vulnerability affects the DASDEC-I and DASDEC-II application servers made by a company called Digital Alert Systems. It stems from a recent firmware update that mistakenly included the private secure shell (SSH) key, according to an advisory published Monday by researchers from security firm IOActive. Administrators use such keys to remotely log in to a server to gain unfettered "root" access. The publication of the key makes it trivial for hackers to gain unauthorized access on Digital Alert System appliances that run default settings on older firmware. "An attacker who gains control of one or more DASDEC systems can disrupt these stations' ability to transmit and could disseminate false emergency information over a large geographic area," the IOActive advisory warned. "In addition, depending on the configuration of this and other devices, these messages could be forwarded and mirrored by other DASDEC systems."

(tags: ssh security fail emergency alert warning tv radio)
The Architecture Twitter Uses to Deal with 150M Active Users, 300K QPS, a 22 MB/S Firehose, and Send Tweets in Under 5 Seconds

Good read.
Twitter is primarily a consumption mechanism, not a production mechanism. 300K QPS are spent reading timelines and only 6000 requests per second are spent on writes.
* their approach of precomputing the timeline for the non-search case is a good example of optimizing for the more frequently-exercised path.
* MySQL and Redis are the underlying stores. Redis is acting as a front-line in-RAM cache. they're pretty happy with it: https://news.ycombinator.com/item?id=6011254
* these further talks go into more detail, apparently (haven't watched them yet): http://www.infoq.com/presentations/Real-Time-Delivery-Twitter http://www.infoq.com/presentations/Twitter-Timeline-Scalability http://www.infoq.com/presentations/Timelines-Twitter
* funny thread of comments on HN, from a big-iron fan: https://news.ycombinator.com/item?id=6008228
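The timeline precomputation (fan-out on write) can be sketched roughly like this in Python. It's a toy, not Twitter's implementation: in-process dicts stand in for Redis, and the function names and the 800-entry cap are assumptions for illustration.

```python
from collections import defaultdict, deque

TIMELINE_LENGTH = 800  # assumed cap on cached timeline entries

followers = defaultdict(set)  # author -> set of follower IDs
# user -> newest-first tweet IDs, bounded so old entries fall off the end
timelines = defaultdict(lambda: deque(maxlen=TIMELINE_LENGTH))


def follow(follower, author):
    followers[author].add(follower)


def post_tweet(author, tweet_id):
    """Write path: do the expensive work once, pushing to every follower."""
    for follower in followers[author]:
        timelines[follower].appendleft(tweet_id)


def read_timeline(user, count=20):
    """Read path: a cheap slice of the precomputed list, no joins or sorting."""
    return list(timelines[user])[:count]


follow("alice", "bob")
follow("alice", "carol")
post_tweet("bob", 1)
post_tweet("carol", 2)
post_tweet("bob", 3)
alice_timeline = read_timeline("alice")
```

With reads outnumbering writes 50 to 1, paying per-follower cost on the rare write to make the common read trivial is the right trade.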

(tags: scale architecture scalability twitter high-scalability redis mysql)
Lightning Memory-Mapped Database

Sounds like a good potential replacement for Berkeley DB, at least for cases where LevelDB isn't proving practical.
LMDB is a database storage engine similar to LevelDB or BDB which database authors often use as a base for building databases on top of. LMDB was designed as a replacement for BDB within the OpenLDAP project but it has been pretty useful to use with other databases as well. Its API design is highly influenced by BDB so that replacing BDB is straightforward.
Licensed under the OpenLDAP Public License (is that BSDish?)

(tags: openldap lmdb databases bdb berkeley-db storage persistence oss open-source)

Links for 2013-07-08

Published July 8, 2013

ssh - fabric appears to start apache2 but doesn't - Stack Overflow

fabric fail. pty=False fixes the bug

(tags: fabric fail bugs pty ssh automation ops)
'Copysets: Reducing the Frequency of Data Loss in Cloud Storage' [paper]

An improved replica-selection algorithm for replicated storage systems.
We present Copyset Replication, a novel general purpose replication technique that significantly reduces the frequency of data loss events. We implemented and evaluated Copyset Replication on two open source data center storage systems, HDFS and RAMCloud, and show it incurs a low overhead on all operations. Such systems require that each node’s data be scattered across several nodes for parallel data recovery and access. Copyset Replication presents a near optimal tradeoff between the number of nodes on which the data is scattered and the probability of data loss. For example, in a 5000-node RAMCloud cluster under a power outage, Copyset Replication reduces the probability of data loss from 99.99% to 0.15%. For Facebook’s HDFS cluster, it reduces the probability from 22.8% to 0.78%.
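Here's a rough Python sketch of the permutation-based idea as I understand it from the paper; treat it as my reading rather than the authors' implementation. The function names, the 9-node cluster, and the parameter values are assumptions. Nodes are shuffled a few times, each shuffle is carved into groups of R, and all replicas of a chunk must land inside one such group (a "copyset").

```python
import random


def build_copysets(nodes, replication=3, scatter_width=4, seed=0):
    """Permutation phase: carve random permutations of the nodes into groups of R."""
    rng = random.Random(seed)
    permutations = -(-scatter_width // (replication - 1))  # ceil(S / (R - 1))
    copysets = []
    for _ in range(permutations):
        shuffled = nodes[:]
        rng.shuffle(shuffled)
        # Leftover nodes that don't fill a whole group are skipped.
        for i in range(0, len(shuffled) - replication + 1, replication):
            copysets.append(frozenset(shuffled[i:i + replication]))
    return copysets


def place_replicas(primary, copysets, rng):
    """Replication phase: pick one copyset containing the primary node."""
    candidates = [c for c in copysets if primary in c]
    return rng.choice(candidates)


nodes = list(range(9))
copysets = build_copysets(nodes)
placement = place_replicas(4, copysets, random.Random(1))
```

Data is only lost if every node in some copyset fails at once, so with at most 6 copysets here instead of the 84 possible 3-node combinations, far fewer simultaneous-failure patterns are fatal.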

(tags: storage cloud-storage replication data reliability fault-tolerance copysets replicas data-loss)

Links for 2013-07-07

Published July 7, 2013

Testing your database backups: the test environment database refresh pattern | #F80046

Great idea. I'm surprised I hadn't come across this before

(tags: backups restoring ops sysadmin testing)

Links for 2013-07-03

Published July 3, 2013

Clean Code Cheat Sheet [pdf]

'principles, patterns, smells and guidelines for clean code, class and package design, TDD, Acceptance Test Driven Development, and CI'

(tags: clean-code code-smells coding tdd testing continous-integration patterns pdf)
Applegate's Law

'Over time, the probability of someone drawing a cock with your [user-generated content] app approaches one.'

(tags: cocks time-to-penis user-generated-content content ugc via:rob-manuel qwghlm funny applegates-law web b3ta lol)

Links for 2013-07-02

Published July 2, 2013

Fat Tails

Nice d3.js demo of the fat-tailed distribution:
A fat-tailed distribution looks normal but the parts far away from the average are thicker, meaning a higher chance of huge deviations. [...] Fat tails don't mean more variance; just different variance. For a given variance, a higher chance of extreme deviations implies a lower chance of medium ones.
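The "same variance, different shape" point can be checked with a few lines of arithmetic. This is a minimal sketch, assuming a Laplace distribution as the fat-tailed example (my choice, not necessarily what the demo uses), scaled to variance 1 and compared against a standard normal:

```python
import math


def normal_tail(x):
    """P(|X| > x) for a normal distribution with mean 0 and variance 1."""
    return math.erfc(x / math.sqrt(2))


def laplace_tail(x):
    """P(|X| > x) for a Laplace distribution with mean 0 and variance 1."""
    # A Laplace distribution has variance 2 * scale**2, so this scale gives variance 1.
    scale = 1 / math.sqrt(2)
    return math.exp(-x / scale)


def band(tail, lo, hi):
    """P(lo < |X| <= hi), computed from a two-sided tail function."""
    return tail(lo) - tail(hi)


if __name__ == "__main__":
    for name, tail in (("normal", normal_tail), ("laplace", laplace_tail)):
        medium = band(tail, 1, 2)   # "medium" deviations: 1 to 2 standard deviations
        extreme = tail(4)           # "extreme" deviations: beyond 4 standard deviations
        print(f"{name:8s} medium={medium:.4f} extreme={extreme:.6f}")
```

With equal variance, the Laplace distribution should come out with more mass beyond 4 standard deviations and less in the 1-to-2 band, which is the tradeoff the quote describes.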

(tags: dataviz via:hn statistics visualization distributions fat-tailed kurtosis d3.js javascript variance deviation)
Google Cloud Messaging for Android

GCM is a service that allows you to send data from your server to your users' Android-powered device, and also to receive messages from devices on the same connection. The GCM service handles all aspects of queueing of messages and delivery to the target Android application running on the target device. GCM is completely free no matter how big your messaging needs are, and there are no quotas.

(tags: gcm messaging android google push)
packetdrill - network stack testing tool

[Google's] packetdrill scripting tool enables quick, precise tests for entire TCP/UDP/IPv4/IPv6 network stacks, from the system call layer down to the NIC hardware. packetdrill currently works on Linux, FreeBSD, OpenBSD, and NetBSD. It can test network stack behavior over physical NICs on a LAN, or on a single machine using a tun virtual network device.

(tags: testing networking tun google linux papers tcp ip udp freebsd openbsd netbsd)
the TCP bounded buffer deadlock problem

I've wound up mentioning this twice in the past week, so it's worth digging up and bookmarking!
Under certain circumstances a TCP connection can end up in a "deadlock", where neither the client nor the server is able to write data out or read data in. This is caused by two factors. First, a client or server cannot perform two transactions at once; a read cannot be performed if a write transaction is in progress, and vice versa. Second, the buffers that exist at either end of the TCP connection are of limited size. The deadlock occurs when both the client and server are trying to send an amount of data that is larger than the combined input and output buffer size.
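The bounded-buffer half of this is easy to reproduce on loopback. A rough sketch, assuming non-blocking sockets and a short select() timeout as the "stalled" signal (both are my own choices for the demo; a real blocking client would simply hang in send() at this point):

```python
import select
import socket

CHUNK = b"x" * 65536


def send_until_stalled(sock, wait=0.2):
    """Write to sock until it stays unwritable for `wait` seconds; return bytes sent."""
    sock.setblocking(False)
    sent = 0
    while True:
        _, writable, _ = select.select([], [sock], [], wait)
        if not writable:
            return sent  # buffers are full and the peer isn't reading
        try:
            sent += sock.send(CHUNK)
        except BlockingIOError:
            pass  # lost a race with the kernel; select() again


def demo():
    """Both ends write without ever reading, so both ends stall."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.create_connection(listener.getsockname())
    server, _ = listener.accept()
    try:
        return send_until_stalled(client), send_until_stalled(server)
    finally:
        for s in (client, server, listener):
            s.close()


if __name__ == "__main__":
    client_sent, server_sent = demo()
    print(f"client stalled after {client_sent} bytes, server after {server_sent} bytes")
```

The byte counts depend on the OS buffer sizes, so they will vary between machines. The point is that each side gets stuck once the buffers in its direction are full, and neither will ever drain the other.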

(tags: tcp ip bounded-buffer deadlock bugs buffering connections distributed-systems)
An excellent writeup of the TCP bounded-buffer deadlock problem

on pages 146-149 of 'TCP/IP Sockets in C: Practical Guide for Programmers' by Michael J. Donahoo and Kenneth L. Calvert.

(tags: tcp ip bounded-buffer deadlock bugs buffering connections distributed-systems)

Links for 2013-07-01

Published July 1, 2013

How The Copyright Industry Pushed For Internet Surveillance | TorrentFreak

Rick Falkvinge with a good point:
The reason for the copyright industry to push for surveillance is simple: any digital communications channel can be used for private conversation, but it can also be used to share culture and knowledge that is under copyright monopoly. In order to tell which communications is which, you must sort all of it – and to do that, you must look at all of it. In other words, if enforcing the copyright monopoly is your priority, you need to kill privacy, and specifically anonymity and secrecy of correspondence.
This was exactly my biggest worry -- a side-effect of effective copyright filtering is the creation of infrastructure for online oppression by the state.

(tags: copyright privacy state data-protection rick-falkvinge copyfight internet filtering surveillance anonymity)
Aer Lingus set to resume flights to San Francisco from Dublin

Yay!
Google, Apple and Facebook have persuaded Aer Lingus to reopen the San Francisco to Dublin route, according to sources in the US. The technology giants have their European headquarters in Dublin and their American bases in San Francisco. According to insiders, Aer Lingus will make an announcement soon having received assurances that Silicon Valley companies will take up seats.

(tags: flights travel ireland san-francisco sf aer-lingus)
Comics For Children…. a visual list…. | The Forbidden Planet International Blog

some great recommendations here. Hildafolk has been popular with my 5-year-old, must pick up a few more

(tags: comics kids children books reading library toget toread)

Links for 2013-06-28

Published June 28, 2013

_Measuring Mobile Web Performance_ [slides]

Notable slide is #13, displaying a graph of HSDPA packet RTTs measured from a train. Max RTT gets up to 20,266ms. ouch

(tags: rtt packets latency hsdpa mobile internet trains packet-loss)
Latest leak of EU Data Protection Regulation makes fines impossible

Well, isn't this convenient. The leaked proposed regulation document from the Irish EU presidency contains the following changes from current law:
what is new is a set of prescriptive conditions which, if adopted, appears to make a Monetary Penalty Notice (MPN) almost impracticable to serve. This is because the [Data Protection] Commissioner would have to consider a dozen factors (many of which will no doubt give rise to appeal). [...] In addition, the fines in the Regulation require consideration of the actual damage caused; this compares unfavourably with the current MPN where large fines have been contingent on grave security errors on the part of the data controller (i.e. the MPN of the UK DPA does not need damage to data subjects – only the likelihood of substantial distress or damage which should have been preventable/foreseeable).

(tags: data-protection law eu ec ireland privacy fines regulation mpn)
Google Translate of "Lorem ipsum"

The perils of unsupervised machine learning... here's what GTranslate reckons "lorem ipsum" translates to:
We will be sure to post a comment. Add tomato sauce, no tank or a traditional or online. Until outdoor environment, and not just any competition, reduce overall pain. Cisco Security, they set up in the throat develop the market beds of Cura; Employment silently churn-class by our union, very beginner himenaeos. Monday gate information. How long before any meaningful development. Until mandatory functional requirements to developers. But across the country in the spotlight in the notebook. The show was shot. Funny lion always feasible, innovative policies hatred assured. Information that is no corporate Japan

(tags: lorem-ipsum boilerplate machine-learning translation google translate probabilistic tomato-sauce cisco funny)

Links for 2013-06-27

Published June 27, 2013

how RAID fits in with Riak

Write heavy, high performance applications should probably use RAID 0 or avoid RAID altogether and consider using a larger n_val and cluster size. Read heavy applications have more options, and generally demand more fault tolerance with the added benefit of easier hardware replacement procedures.
Good to see official guidance on this (via Bill de hOra)

(tags: via:dehora riak cluster fault-tolerance raid ops)
Locally Repairable Codes

Facebook’s new erasure coding algorithm (via High Scalability).
Disk I/O and network traffic were reduced by half compared to RS codes. The LRC required 14% more storage than RS (ie. 60% of data size). Repair times were much lower thanks to the local repair codes. Much greater reliability thanks to fast repairs. Reduced network traffic makes them suitable for geographic distribution.
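The "14% more storage (ie. 60% of data size)" figure works out if you assume a Reed-Solomon code with 10 data and 4 parity blocks, against an LRC with 10 data and 6 parity blocks. Those block counts are my assumption, not something stated in the summary above. A quick check:

```python
def overhead(data_blocks, parity_blocks):
    """Storage overhead as a fraction of the original data size."""
    return parity_blocks / data_blocks


def extra_storage(base_overhead, new_overhead):
    """Fractional increase in total stored bytes when moving from base to new."""
    return (1 + new_overhead) / (1 + base_overhead) - 1


if __name__ == "__main__":
    rs = overhead(10, 4)    # assumed RS layout: 10 data + 4 parity blocks
    lrc = overhead(10, 6)   # assumed LRC layout: 10 data + 6 parity blocks
    print(f"RS overhead:  {rs:.0%} of data size")
    print(f"LRC overhead: {lrc:.0%} of data size")
    print(f"LRC stores {extra_storage(rs, lrc):.1%} more than RS")
```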

(tags: erasure-coding facebook redundancy repair algorithms papers via:highscalability data storage fault-tolerance)
Boundary's Early Warnings alarm

Anomaly detection on network throughput metrics, alarming if throughputs on selected flows deviate by 1, 2, or 3 standard deviations from a historical baseline.
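The basic idea is simple enough to sketch. This is not Boundary's implementation, just a minimal illustration assuming the "historical baseline" is a plain list of past throughput samples and the alarm level is the number of whole standard deviations from its mean, capped at 3:

```python
import statistics


def deviation_level(baseline, value, max_level=3):
    """Return 0..max_level: how many whole standard deviations `value` is from the baseline mean."""
    mean = statistics.mean(baseline)
    sd = statistics.stdev(baseline)  # sample standard deviation; needs at least 2 samples
    if sd == 0:
        # A perfectly flat baseline: any change at all is treated as the top level.
        return 0 if value == mean else max_level
    return min(int(abs(value - mean) / sd), max_level)


if __name__ == "__main__":
    history = [100, 102, 98, 101, 99]  # hypothetical throughput samples
    for sample in (100.5, 102, 104, 110):
        print(sample, "-> level", deviation_level(history, sample))
```

A flat mean and standard deviation ignores time-of-day and weekly seasonality, which is usually where the false positives come from.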

(tags: network-monitoring throughput boundary service-metrics alarming ops statistics)
My email to Irish Times Editor, sent 25th June

Daragh O'Brien noting 3 stories on 3 consecutive days voicing dangerously skewed misinformation about data protection and privacy law in Ireland:
There is a worrying pattern in these stories. The first two decry the Data Protection legislation (current and future) as being dangerous to children and damaging to the genealogy trade. The third sets up an industry “self-regulation” straw man and heralds it as progress (when it is decidedly not, serving only to further confuse consumers about their rights). If I was a cynical person I would find it hard not to draw the conclusion that the Irish Times, the “paper of record” has been stooged by organisations who are resistant to the defence of and validation of fundamental rights to privacy as enshrined in the Data Protection Acts and EU Treaties, and in the embryonic Data Protection Regulation. That these stories emerge hot on the heels of the pendulum swing towards privacy concerns that the NSA/Prism revelations have triggered is, I must assume, a co-incidence. It cannot be the case that the Irish Times blindly publishes press releases without conducting cursory fact checking on the stories contained therein? Three stories over three days is insufficient data to plot a definitive trend, but the emphasis is disconcerting. Is it the Irish Times’ editorial position that Data Protection legislation and the protection of fundamental rights is a bad thing and that industry self-regulation that operates in ignorance of legislation is the appropriate model for the future? It surely cannot be that press releases are regurgitated as balanced fact and news by the Irish Times without fact checking and verification? If I was to predict a “Data Protection killed my Puppy” type headline for tomorrow’s edition or another later this week would I be proved correct?

(tags: daragh-obrien irish-times iab bias advertising newspapers press-releases journalism data-protection privacy ireland)
_Bolt-On Causal Consistency_ [slides]

SIGMOD 2013 presentation from Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica -- adding consistency to an eventually-consistent store by tracking dependencies

(tags: eventual-consistency state cap-theorem storage peter-bailis)

Links for 2013-06-26

Published June 26, 2013

Facebook announce Wormhole

Over the last couple of years, we have built and deployed a reliable publish-subscribe system called Wormhole. Wormhole has become a critical part of Facebook's software infrastructure. At a high level, Wormhole propagates changes issued in one system to all systems that need to reflect those changes – within and across data centers.
Facebook's Kafka-alike, basically, although with some additional low-latency guarantees. FB appear to be using it for multi-region and multi-AZ replication. Proprietary.

(tags: pub-sub scalability facebook realtime low-latency multi-region replication multi-az wormhole)
gnuplot's dumb terminal

Turns out gnuplot has a pretty readable ASCII terminal rendering mode; combined with 'watch' it makes for a nifty graphing one-liner

(tags: gnuplot plotting charts graphs cli command-line unix gnu hacks dataviz visualization ascii)

The easy way to find JMX metrics in the field using jmxsh

Published June 26, 2013

(oh look, a proper blog post!)

JMX is the de-facto standard in the Java and JVM-based world for exposing service metrics, and feeds nicely to tools like Graphite using JMXTrans and others. However, it's pretty obtuse and over-complex, and it can be hard to figure out what path the JMX metrics will show up under once deployed.

Unfortunately, once a JVM-based service is deployed to EC2, it becomes very difficult to use jconsole to connect to it, due to deficiencies and crappy design in the JMX RMI protocol (I love the way they reinvented the broken parts of IIOP in that respect). Don't even bother; instead, use jmxsh: https://code.google.com/p/jmxsh/ .

To use this, you need to modify the service process' command line to include the following JVM args, so that the remote JMX API is exposed:

-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=16660 -Dcom.sun.management.jmxremote.local.only=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false

Change the port number if there is already a process running on that port. Ensure the port isn't accessible from off-host; in EC2, this should be safe enough to use as long as that port number is not in the EC2 security group.
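If you're not sure whether something is already listening on the port, a quick check from the host helps. A minimal sketch, assuming a plain TCP connect to 127.0.0.1 is a good enough test (the helper name is made up for illustration):

```python
import socket


def port_in_use(port, host="127.0.0.1", timeout=0.5):
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise.
        return s.connect_ex((host, port)) == 0


if __name__ == "__main__":
    port = 16660  # the example JMX port used above
    state = "already in use" if port_in_use(port) else "free"
    print(f"port {port} is {state}")
```

This only tells you whether there's a listener on the loopback address; it says nothing about whether the port is reachable from off-host.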

Go to https://code.google.com/p/jmxsh/downloads/list and download the latest jmxsh-FOO.jar; e.g. 'wget https://jmxsh.googlecode.com/files/jmxsh-R5.jar'. Then on the host, as the UID the service is running under, run: 'java -jar jmxsh-R5.jar -h 127.0.0.1 -p 16660'. You can then hit "Enter" to go into "Browse Mode", and you'll get text menus like this:

 ====================================================

  Attribute List:

        1. -r- long        MaxFileDescriptorCount
        2. -r- long        OpenFileDescriptorCount
        3. -r- long        CommittedVirtualMemorySize
        4. -r- long        FreePhysicalMemorySize
        5. -r- long        FreeSwapSpaceSize
        6. -r- long        ProcessCpuTime
        7. -r- long        TotalPhysicalMemorySize
        8. -r- long        TotalSwapSpaceSize
        9. -r- String      Name
       10. -r- int         AvailableProcessors
       11. -r- String      Arch
       12. -r- double      SystemLoadAverage
       13. -r- String      Version

   SERVER: service:jmx:rmi:///jndi/rmi://127.0.0.1:16660/jmxrmi
   DOMAIN: java.lang
   MBEAN:  java.lang:type=OperatingSystem

 ====================================================

Navigate through the MBean tree looking for Attributes which would make good metrics (5 in the list above, for example). Note the MBean and the Attribute names.

Links for 2013-06-25

Published June 25, 2013

Liberty issues claim against British Intelligence Services over PRISM and Tempora privacy scandal

James Welch, Legal Director for Liberty, said: “Those demanding the Snoopers’ Charter seem to have been indulging in out-of-control snooping even without it – exploiting legal loopholes and help from Uncle Sam. “No-one suggests a completely unpoliced internet but those in power cannot swap targeted investigations for endless monitoring of the entire globe.”
Go Liberty! Take note, ICCL, this is how a civil liberties group engages with internet issues.

(tags: prism nsa gchq surveillance liberty civil-liberties internet snooping)
shades

A command-line utility in Ruby to perform (a) OLAP cubing and (b) histogramming, given whitespace-delimited line data

(tags: ruby olap number-crunching data histograms cli)
'If I was your cloud provider, I'd never let you down'

This is the thing that's put me off Joyent. They make claims like this one from October 2012:
We’ve given our other partners 99.9999% uptime.
This despite a 10-day outage of their BingoDisk and Strongspace storage services in January 2008, 1734 days previously (http://www.datacenterknowledge.com/archives/2008/01/21/joyent-services-back-after-8-day-outage/). If you assume that is the only outage they've had since then, that works out as 99.4% uptime. Quite a few less nines...
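The arithmetic, for anyone who wants to check it. A small sketch using only the figures quoted above (a 10-day outage over 1734 days), and assuming that was the only downtime in the period:

```python
def uptime_percent(outage_days, period_days):
    """Uptime as a percentage, given total downtime over a period."""
    return 100.0 * (1 - outage_days / period_days)


def allowed_downtime_seconds(target_percent, period_days):
    """Downtime budget in seconds for a given uptime target over a period."""
    return period_days * 86400 * (100.0 - target_percent) / 100.0


if __name__ == "__main__":
    print(f"actual uptime: {uptime_percent(10, 1734):.2f}%")
    budget = allowed_downtime_seconds(99.9999, 1734)
    print(f"six-nines budget over 1734 days: {budget:.0f} seconds")
```

Six nines over that period leaves a downtime budget of a couple of minutes, not days.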

(tags: joyent marketing uptime two-nines fail strongdisk)
js-hll

Good UI for exploration of HyperLogLog set intersections and unions.
One of the first things that we wanted to do with HyperLogLog when we first started playing with it was to support and expose it natively in the browser. The thought of allowing users to directly interact with these structures -- perform arbitrary unions and intersections on effectively unbounded sets all on the client -- was exhilarating to us. [...] we are pleased to announce the open-source release of AK’s HyperLogLog implementation for JavaScript, js-hll. We are releasing this code under the Apache License, Version 2.0. We knew that we couldn’t just release a bunch of JavaScript code without allowing you to see it in action — that would be a crime. We passed a few ideas around and the one that kept bubbling to the top was a way to kill two birds with one stone. We wanted something that would showcase what you can do with HLL in the browser and give us a tool for explaining HLLs. It is typical for us to explain how HLL intersections work using a Venn diagram. You draw some overlapping circles with a border that represents the error and you talk about how if that border is close to or larger than the intersection then you can’t say much about the size of that intersection. This works just ok on a whiteboard but what you really want is to just build a visualization that allows you to select from some sets and see the overlap. Maybe even play with the precision a little bit to see how that changes the result. Well, we did just that!

(tags: javascript ui hll hyperloglog algorithms sketching js sets intersection union apache open-source)
Sketch of the Day: K-Minimum Values

Another sketching algorithm -- this one supports set union and intersection operations more easily than HyperLogLog when there are more than 2 sets
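For reference, the core of KMV fits in a few lines. A minimal sketch, assuming SHA-1 as the hash and k=256 (both arbitrary choices of mine), covering estimation and union only; intersections are not shown:

```python
import hashlib
import heapq


def unit_hash(item):
    """Hash an item to a float in [0, 1)."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


def kmv_sketch(items, k=256):
    """Sorted list of the k smallest distinct hash values seen."""
    # For clarity this hashes everything up front; a streaming version
    # would keep only a bounded heap of the k smallest values.
    return heapq.nsmallest(k, {unit_hash(i) for i in items})


def kmv_union(a, b, k=256):
    """Sketch of the union: the k smallest values across both sketches."""
    return sorted(set(a) | set(b))[:k]


def kmv_estimate(sketch, k=256):
    """Estimate the number of distinct items from a sketch."""
    if len(sketch) < k:
        return len(sketch)          # fewer than k distinct values: the count is exact
    return (k - 1) / sketch[k - 1]  # (k - 1) divided by the k-th smallest hash value


if __name__ == "__main__":
    a = kmv_sketch(range(0, 10000))
    b = kmv_sketch(range(5000, 15000))
    print("estimated size of union:", round(kmv_estimate(kmv_union(a, b))))
```

The union needs nothing beyond the two sketches, which is a big part of why set operations are comparatively easy with KMV.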

(tags: algorithms coding space-saving cardinality streams stream-processing estimation sets sketching)

Links for 2013-06-24

Published June 24, 2013

Skype's principal architect explains why they no longer have end-to-end crypto

Mobile devices can't handle the CPU and constantly-online requirements, and there's now an increased reliance on dedicated routing supernodes to avoid Windows-client monoculture and p2p network fragility (via the IP list, via kragen)

(tags: skype p2p mobile architecture networking internet snooping crypto via:ip via:kragen phones windows)
Accuweather long-range forecast accuracy questionable

"questionable" is putting it mildly:
Now to the point: Are the 25-day forecasts any good? In a word, no. Specifically, after running this data, I would not trust a forecast high temperature more than a week out. I’d rather look at the normal (historical average) temperature for that day than the forecast. Similarly, I would not even look at a precipitation forecast more than 6 days in advance, and I wouldn’t start to trust it for anything important until about 3 days ahead of time.

(tags: accuweather accuracy fail graphs data weather forecasting philadelphia)
Setting up Perfect Forward Secrecy for nginx or stud

Matt Sergeant writes up a pretty solid HOWTO:
There has been a lot of discussion recently about Perfect Forward Secrecy (PFS) and the benefits it can bring you, especially in terms of any kind of traffic sniffing attack. Unfortunately setting this up I found very few guides telling you exactly what you need to do. The downside to PFS [via ECDHE] is that it uses more CPU power than other ciphers. This is a trade-off between security and cost.

(tags: ecdhe elliptic-curve crypto pfs ssl tls howto nginx stud)

Links for 2013-06-21

Published June 21, 2013

Java Concurrent Counters By Numbers

threadsafe counters in the JVM compared. AtomicLong, Doug Lea's LongAdder, a ThreadLocal counter, and a field-on-the-Thread-object counter int (via Darach Ennis). Nitsan's posts on concurrency are fantastic

(tags: counters concurrency threads java jvm atomic)
Ultimate Tic-Tac-Toe

Tic-Tac-Toe Inception. whoa

(tags: games tic-tac-toe inception recursion boardgames via:fp)
hlld

a high-performance C server which is used to expose HyperLogLog sets and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached. HyperLogLogs are a relatively new sketching data structure. They are used to estimate cardinality, i.e. the unique number of items in a set. They are based on the observation that any bit in a "good" hash function is independent of any other bit and that the probability of getting a string of N bits all set to the same value is 1/(2^N). There is a lot more in the math, but that is the basic intuition. What is even more incredible is that the storage required to do the counting is log(log(N)). So with a 6 bit register, we can count well into the trillions. For more information, it's best to read the papers referenced at the end. TL;DR: HyperLogLogs enable you to have a set with about 1.6% variance, using 3280 bytes, and estimate sizes in the trillions.
(via:cscotta)
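The 1.6% and 3280-byte figures can be reproduced from the standard HyperLogLog error formula, 1.04/sqrt(m) for m registers. This is a back-of-envelope sketch under two assumptions of mine that I haven't confirmed against hlld itself: a precision of 12 (4096 registers), and 6-bit registers packed five to a 32-bit word:

```python
import math


def hll_standard_error(precision):
    """Relative standard error of a HyperLogLog with 2**precision registers."""
    return 1.04 / math.sqrt(2 ** precision)


def packed_bytes(precision, registers_per_word=5, word_bytes=4):
    """Bytes needed if registers are packed into fixed-size words (assumed layout)."""
    words = math.ceil(2 ** precision / registers_per_word)
    return words * word_bytes


if __name__ == "__main__":
    p = 12  # assumed precision
    print(f"registers: {2 ** p}")
    print(f"standard error: {hll_standard_error(p):.3%}")
    print(f"storage: {packed_bytes(p)} bytes")
```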

(tags: hyper-log-log hlld hll data-structures memcached daemons sketching estimation big-data cardinality algorithms via:cscotta)
SSL/TLS overhead

'The TLS handshake has multiple variations, but let’s pick the most common one – anonymous client and authenticated server (the connections browsers use most of the time).' Works out to 4 packets, in addition to the TCP handshake's 3, and about 6.5k bytes on average.

(tags: network tls ssl performance latency speed networking internet security packets tcp handshake)
McLibel leaflet was co-written by undercover police officer Bob Lambert | UK news | guardian.co.uk

The true identity of one of the authors of the "McLibel leaflet" is Bob Lambert, a police officer who used the alias Bob Robinson in his five years infiltrating the London Greenpeace group. [...] McDonald's famously sued green campaigners over the roughly typed leaflet, in a landmark three-year high court case, that was widely believed to have been a public relations disaster for the corporation. Ultimately the company won a libel battle in which it spent millions on lawyers. Lambert was deployed by the special demonstration squad (SDS) – a top-secret Metropolitan police unit that targeted political activists between 1968 and 2008, when it was disbanded. He co-wrote the defamatory six-page leaflet in 1986 – and his role in its production has been the subject of an internal Scotland Yard investigation for several months. At no stage during the civil legal proceedings brought by McDonald's in the 1990s was it disclosed that a police infiltrator helped author the leaflet.

(tags: infiltration police mcdonalds libel greenpeace bob-lambert undercover 1980s uk-politics)

Links for 2013-06-20

Published June 20, 2013

Project Voldemort: measuring BDB space consumption

HOWTO measure this using the BDB-JE command line tools. this is exposed through JMX as the CleanerBacklog metric, too, I think, but good to bookmark just in case

(tags: voldemort cleaner bdb ops space storage monitoring debug)
rendering pcm with simulated phosphor persistence

This is something readily applicable to display of sampled time-series metric data -- it really makes regular patterns visible (and is nicely retro to boot).
When PCM waveforms and similar function plots are displayed on screen, computational speed is often preferred over beauty and information content. For example, Audacity only draws the local maximum envelope amplitude and (what appears to be) RMS power when zoomed out, and when zoomed in, displays a very straightforward linear interpolation between samples. Analogue oscilloscopes, on the other hand, do things differently. An electron beam scans a phosphor screen at a constant X velocity, lighting a dot everywhere it hits. The dot brightness is proportional to the time the electron beam was directed at it. Because the X speed of the beam is constant and the Y position is modulated by the waveform, brightness gives information about the local derivative of the function. Now how cool is that? It looks like an X-ray of the signal. We can see right away that the beep is roughly a square wave, because there's light on top and bottom of the oscillation envelope but mostly darkness in between. Minute changes in the harmonic content are also visible as interesting banding and ribbons.
(via an _amazing_ kragen post on ghetto electronics)

(tags: via:kragen pcm waveforms oscilloscopes analog analogue dataviz time-series waves ui phosphor retro)
stuff Google has learned from their hiring data

A. On the hiring side, we found that [interview] brainteasers are a complete waste of time. How many golf balls can you fit into an airplane? How many gas stations in Manhattan? A complete waste of time. They don’t predict anything. They serve primarily to make the interviewer feel smart. Instead, what works well are structured behavioral interviews, where you have a consistent rubric for how you assess people, rather than having each interviewer just make stuff up. Behavioral interviewing also works — where you’re not giving someone a hypothetical, but you’re starting with a question like, “Give me an example of a time when you solved an analytically difficult problem.” The interesting thing about the behavioral interview is that when you ask somebody to speak to their own experience, and you drill into that, you get two kinds of information. One is you get to see how they actually interacted in a real-world situation, and the valuable “meta” information you get about the candidate is a sense of what they consider to be difficult.
This makes sense, and matches what I learned in Amazon. Bad news for Microsoft though! (Correction: Adam Shostack got in touch to note that MS haven't done this for 10+ years either.)
Also, I like this:
A. One of the things we’ve seen from all our data crunching is that G.P.A.’s are worthless as a criteria for hiring, and test scores are worthless — no correlation at all except for brand-new college grads, where there’s a slight correlation. Google famously used to ask everyone for a transcript and G.P.A.’s and test scores, but we don’t anymore, unless you’re just a few years out of school. We found that they don’t predict anything. What’s interesting is the proportion of people without any college education at Google has increased over time as well. So we have teams where you have 14 percent of the team made up of people who’ve never gone to college.

(tags: google hiring interviewing interviews brainteasers gpa microsoft star amazon)

Links for 2013-06-19

Published June 19, 2013

Java Garbage Collection Distilled

Martin Thompson lays it out:
Serial, Parallel, Concurrent, CMS, G1, Young Gen, New Gen, Old Gen, Perm Gen, Eden, Tenured, Survivor Spaces, Safepoints, and the hundreds of JVM start-up flags. Does this all baffle you when trying to tune the garbage collector while trying to get the required throughput and latency from your Java application? If it does then don’t worry, you are not alone. Documentation describing garbage collection feels like man pages for an aircraft. Every knob and dial is detailed and explained but nowhere can you find a guide on how to fly. This article will attempt to explain the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload.

(tags: gc java garbage-collection coding cms g1 jvm optimization)
DRI needs your help

Appalled by mass surveillance scandals? So are we. We’re doing something about it – and you can too. In 2006 we started a case challenging Irish and European laws that require your mobile phone company and ISP to monitor your location, your calls, your texts and your emails and to store that information for up to two years. That case has now made it to the European Court of Justice and will be heard on July 9th. If we are successful, it will strike down these laws for all of Europe and will declare illegal this type of mass surveillance of the entire population. Here’s where you come in. You can take part by: making a donation to help us pay for the expenses we incur; following our updates and keeping abreast of the issues; spreading the word on social media. With your help, we can strike a blow for the privacy of all citizens.

(tags: activism privacy politics ireland dri digital-rights data-protection data-retention)
3-D Printer Brings Dexterity To Children With No Fingers

'A South African man who lost part of his hand in a home carpentry accident and an American puppeteer he met via YouTube have teamed up to make 3D-printable hands for children who have no fingers. So far, over 100 children have been given "robohands" for free, and a simplified version released just yesterday snaps together like LEGO bricks and costs just $5 in materials.' This is incredible. Check out the video of Liam and his robohand in action: http://www.youtube.com/watch?v=kB53-D_N8Uc

(tags: 3d-printing 3d makers robohands hands prosthetics future youtube via:gruverja)

Links for 2013-06-18

Published June 18, 2013

Open Rights Group - EU Commission caved to US demands to drop anti-PRISM privacy clause

Reports this week revealed that the US successfully pressed the European Commission to drop sections of the Data Protection Regulation that would, as the Financial Times explains, “have nullified any US request for technology and telecoms companies to hand over data on EU citizens. The article [...] would have prohibited transfers of personal information to a third country under a legal request, for example the one used by the NSA for their PRISM programme, unless “expressly authorized by an international agreement or provided for by mutual legal assistance treaties or approved by a supervisory authority.” The Article was deleted from the draft Regulation proper, which was published shortly afterwards in January 2012. The reports suggest this was due to intense pressure from the US. Commission Vice-President Viviane Reding favoured keeping the clause, but other Commissioners seemingly did not grasp the significance of the article.

(tags: org privacy us surveillance fisaaa viviane-reding prism nsa ec eu data-protection)
Verified by Visa and MasterCard SecureCode kill 10-12% of your business

As Chris Shiflett noted: not only are they bad for security, they're bad for business too.
12 percent of users consider abandoning [an online shopping transaction] when they see either the Verified by Visa or the American Express SafeKey logos, while 10 percent will consider abandoning when they see the MasterCard Secure card logo.

(tags: ecommerce vbv online-shopping mastercard visa securecode security fail)
The Cold Hard Facts of Freezing to Death

an amazing account of near-death from hypothermia (via Dor)

(tags: via:dor hypothermia cold medicine science non-fiction)

Links for 2013-06-17

Published June 17, 2013

Atelier olschinsky - "Cities III 05"

Fine Art Print on Hahnemuehle Photo Rag Bright White, 310g: 40x50cm up to 70x100cm. Some great art based on decayed urban landscape shots, from a Vienna-based design studio. See also http://english.mashkulture.net/2011/10/17/atelier-olschinsky-cities-iii/ , http://www.mascontext.com/tag/atelier-olschinsky/

(tags: olschinsky cities urban decay landscape art prints want)
Possible ban on 'factory food' in French restaurants

I am very much in favour of this in Ireland, too. The pre-prepared food thing makes for crappy food:
In an attempt to crack down on the proliferation of restaurants serving boil-in-a-bag or microwave-ready meals, which could harm France’s reputation for good food, MP Daniel Fasquelle is putting a new law to parliament this month. [...] The proposed law would limit the right to use the word “restaurant” to eateries where food is prepared on site using raw ingredients, either fresh or frozen. Exceptions would be made for some prepared products, such as bread, charcuterie and ice cream.

(tags: restaurants food france cuisine boil-in-the-bag microwave cooking daniel-fasquelle)
On Scala

great, comprehensive review of the language, its pros and misfeatures, from Bill de hOra

(tags: scala languages coding fp reviews)
Introducing Kale « Code as Craft

Etsy have implemented a tool to perform auto-correlation of service metrics, and detection of deviation from historic norms:
at Etsy, we really love to make graphs. We graph everything! Anywhere we can slap a StatsD call, we do. As a result, we’ve found ourselves with over a quarter million distinct metrics. That’s far too many graphs for a team of 150 engineers to watch all day long! And even if you group metrics into dashboards, that’s still an awful lot of dashboards if you want complete coverage. Of course, if a graph isn’t being watched, it might misbehave and no one would know about it. And even if someone caught it, lots of other graphs might be misbehaving in similar ways, and chances are low that folks would make the connection. We’d like to introduce you to the Kale stack, which is our attempt to fix both of these problems. It consists of two parts: Skyline and Oculus. We first use Skyline to detect anomalous metrics. Then, we search for that metric in Oculus, to see if any other metrics look similar. At that point, we can make an informed diagnosis and hopefully fix the problem.
It'll be interesting to see if they can get this working well. I've found it can be tricky to get working with low false positives, without massive volume to "smooth out" spikes caused by normal activity. Amazon had one particularly successful version driving severity-1 order drop alarms, but it used massive event volumes and still had periodic false positives. Skyline looks like it will alarm on a single anomalous data point, and in the comments Abe notes "our algorithms err on the side of noise and so alerting would be very noisy."
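To make the false-positive concern concrete, here is a minimal sketch of a detector that only alarms once several datapoints in a row deviate from the historic norm, rather than on a single spike. This is not Skyline's actual algorithm; the function name, the 3-sigma threshold and the `consecutive` parameter are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def is_anomalous(history, recent, sigmas=3.0, consecutive=3):
    """Return True if the last `consecutive` points of `recent` all lie
    more than `sigmas` standard deviations from the mean of `history`.
    (Hypothetical helper, not part of Skyline.)"""
    if len(history) < 2 or len(recent) < consecutive:
        return False  # not enough data to judge
    mu = mean(history)
    sd = stdev(history)
    tail = recent[-consecutive:]
    if sd == 0:
        # Flat history: any change at all counts as a deviation.
        return all(x != mu for x in tail)
    return all(abs(x - mu) > sigmas * sd for x in tail)

history = [10, 11, 9, 10, 12, 10, 9, 11]
print(is_anomalous(history, [10, 50, 10]))   # a lone spike followed by recovery
print(is_anomalous(history, [50, 52, 49]))   # a sustained deviation
```

Setting `consecutive=1` gives the alarm-on-a-single-datapoint behaviour, which is where low-volume metrics tend to get noisy.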

(tags: etsy monitoring service-metrics alarming deviation correlation data search graphs oculus skyline kale false-positives)
Paper: "Root Cause Detection in a Service-Oriented Architecture" [pdf]

LinkedIn have implemented an automated root-cause detection system:
This paper introduces MonitorRank, an algorithm that can reduce the time, domain knowledge, and human effort required to find the root causes of anomalies in such service-oriented architectures. In the event of an anomaly, MonitorRank provides a ranked order list of possible root causes for monitoring teams to investigate. MonitorRank uses the historical and current time-series metrics of each sensor as its input, along with the call graph generated between sensors to build an unsupervised model for ranking. Experiments on real production outage data from LinkedIn, one of the largest online social networks, shows a 26% to 51% improvement in mean average precision in finding root causes compared to baseline and current state-of-the-art methods.
This is a topic close to my heart after working on something similar for 3 years in Amazon! Looks interesting, although (a) I would have liked to see more case studies and examples of "real world" outages it helped with; and (b) it's very much a machine-learning paper rather than a systems one, and there is no discussion of fault tolerance in the design of the detection system, which would leave me worried that in the case of a large-scale outage event, the system itself will disappear when its help is most vital. (This was a major design influence on our team's work.) Overall, particularly given those 2 issues, I suspect it's not in production yet. Ours certainly was ;)

(tags: linkedin soa root-cause alarming correlation service-metrics machine-learning graphs monitoring)
Announcing Zuul: Edge Service in the Cloud

Netflix's library for implementing "edge services" -- i.e. a front end to their API, web servers, and streaming servers. Some interesting features: dynamic filtering using Groovy scripts; Hystrix for software load balancing, fault tolerance, and error handling for originated HTTP requests; fine-grained service metrics; Archaius for configuration; and canary requests to detect overload risks. Pretty complex, though.

(tags: edge-services api netflix zuul archaius canary-requests http groovy hystrix load-balancing fault-tolerance error-handling configuration)
CloudFlare, PRISM, and Securing SSL Ciphers

Matthew Prince of CloudFlare has an interesting theory on the NSA's capabilities:
It is not inconceivable that the NSA has data centers full of specialized hardware optimized for SSL key breaking. According to data shared with us from a survey of SSL keys used by various websites, the majority of web companies were using 1024-bit SSL ciphers and RSA-based encryption through 2012. Given enough specialized hardware, it is within the realm of possibility that the NSA could within a reasonable period of time reverse engineer 1024-bit SSL keys for certain web companies. If they'd been recording the traffic to these web companies, they could then use the broken key to go back and decrypt all the transactions. While this seems like a compelling theory, ultimately, we remain skeptical this is how the PRISM program described in the slides actually works. Cracking 1024-bit keys would be a big deal and likely involve some cutting-edge cryptography and computational power, even for the NSA. The largest SSL key that is known to have been broken to date is 768 bits long. While that was 4 years ago, and the NSA undoubtedly has some of the best cryptographers in the world, it's still a considerable distance from 768 bits to 1024 bits -- especially given the slide suggests Microsoft's key would have to had been broken back in 2007. Moreover, the slide showing the dates on which "collection began" for various companies also puts the cost of the program at $20M/year. That may sound like a lot of money, but it is not for an undertaking like this. Just the power necessary to run the server farm needed to break a 1024-bit key would likely cost in excess of $20M/year. While the NSA may have broken 1024-bit SSL keys as part of some other program, if the slide is accurate and complete, we think it's highly unlikely they did so as part of the PRISM program. 
A not particularly glamorous alternative theory is that the NSA didn't break the SSL key but instead just cajoled rogue employees at firms with access to the private keys -- whether the companies themselves, partners they'd shared the keys with, or the certificate authorities who issued the keys in the first place -- to turn them over. That very well may be possible on a budget of $20M/year. [....] Google is a notable anomaly. The company uses a 1024-bit key, but, unlike all the other companies listed above, rather than using a default cipher suite based on the RSA encryption algorithm, they instead prefer the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) cipher suites. Without going into the technical details, a key difference of ECDHE is that they use a different private key for each user's session. This means that if the NSA, or anyone else, is recording encrypted traffic, they cannot break one private key and read all historical transactions with Google. The NSA would have to break the private key generated for each session, which, in Google's case, is unique to each user and regenerated for each user at least every 28-hours. While ECDHE arguably already puts Google at the head of the pack for web transaction security, to further augment security Google has publicly announced that they will be increasing their key length to 2048-bit by the end of 2013. Assuming the company continues to prefer the ECDHE cipher suites, this will put Google at the cutting edge of web transaction security.
2048-bit keys combined with the ECDHE cipher suites sound like the way to go, and CloudFlare now support that too.

(tags: prism security nsa cloudflare ssl tls ecdhe elliptic-curve crypto rsa key-lengths)
Record companies to target 20 more pirate sites after court ruling - Independent.ie

Looks like IRMA are following the lead of the UK's BPI, by chasing the proxy sites next:
Up to 20 internet sites are to be targeted by an organisation representing record companies in a move to stamp out the illegal pirating of music and other copyright material. The Irish Recorded Music Association (IRMA) said it would be immediately moving against the 20 "worst offenders" to "take out" internet sites involved in the illegal downloading of copyright work.
However, looks like this will involve more court time:
Last night IRMA director general, Dick Doyle said the High Court ruling was only the first step in "taking out many internet sites involved in illegally downloading music". "We will be back in court very shortly to take out five to 10 other sites. We have already selected a total of 20 of the worst offender sites and we will go after the next five in the very near future," he said.
That's not going to be cheap!

(tags: courts ireland law irma piracy pirate-bay bpi proxies filesharing copyright)
Building a Modern Website for Scale (QCon NY 2013) [slides]

some great scalability ideas from LinkedIn. Particularly interesting are the best practices suggested for scaling web services: 1. store client-call timeouts and SLAs in ZooKeeper for each REST endpoint; 2. isolate backend calls using async/threadpools; 3. cancel work on failures; 4. avoid sending requests to GC'ing hosts; 5. rate limits on the server. #4 is particularly cool. They do this using a "GC scout" request before every "real" request: a cheap TCP request to a dedicated "scout" Netty port, which replies near-instantly. If it comes back with a 1-packet response within 1 millisecond, send the real request, else fail over immediately to the next host in the failover set. There's still a potential race condition where the "GC scout" request completes quickly, then a GC starts just before the "real" request is issued. But the incidence of GC-blocked requests is probably massively reduced. It also helps against packet loss on the rack or server host, since packet loss will cause the drop of one of the TCP packets, and the TCP retransmit timeout will certainly be higher than 1 ms, causing the deadline to be missed. (UDP would probably work just as well, for this reason.) However, in the case of packet loss in the client's network vicinity, it will be vital to still attempt to send the request to the final host in the failover set regardless of a GC-scout failure, otherwise all requests may be skipped. The GC-scout system also helps balance request load off heavily-loaded hosts, or hosts with poor performance for other reasons; they'll fail to achieve their 1 ms deadline and the request will be shunted off elsewhere. For service APIs with real low-latency requirements, this is a great idea.
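The host-selection logic described above can be sketched roughly as follows. This is only an illustration, not LinkedIn's implementation: `choose_host`, `tcp_scout`, the `ping` payload and the scout port are hypothetical names, and a real client would presumably probe over an already-open connection rather than paying for a TCP handshake on every probe as this sketch does.

```python
import socket
import time

def tcp_scout(host, port, deadline_s=0.001):
    """Hypothetical probe: send a tiny request to the host's scout port and
    report whether a non-empty reply arrived within the deadline."""
    try:
        with socket.create_connection((host, port), timeout=deadline_s) as s:
            start = time.monotonic()
            s.sendall(b"ping\n")
            reply = s.recv(16)
            return bool(reply) and time.monotonic() - start <= deadline_s
    except OSError:  # covers timeouts, refused connections, resets
        return False

def choose_host(failover_set, scout):
    """Return the first host whose scout probe succeeds. The final host is
    returned without consulting its probe, so that packet loss near the
    client cannot cause every host to be skipped."""
    if not failover_set:
        raise ValueError("failover set is empty")
    for host in failover_set[:-1]:
        if scout(host):
            return host
    return failover_set[-1]

# Usage with a stand-in probe (no network needed): only "b" answers in time.
print(choose_host(["a", "b", "c"], lambda h: h == "b"))
```

Passing the probe in as a callable keeps the failover policy separate from the transport, so the same policy would work with a UDP probe.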

(tags: gc-scout gc java scaling scalability linkedin qcon async threadpools rest slas timeouts networking distcomp netty tcp udp failover fault-tolerance packet-loss)
Why I won’t give the European Parliament the data protection analysis it wanted

Holy crap. Simon Davies rips into the EU data-protection reform disaster with gusto:
The situation was an utter disgrace. The advertising industry even gave an award to an Irish Minister for destroying some of the rights in the regulation while the UK managed to force a provision that would make the direct marketing industry a “legitimate” processing operation in its own right, putting it on the same level of lawful processing as fraud prevention. Things got to the point where even the most senior data protection officials in Europe stopped trying to influence events and had told me “let the chips fall as they may”. [...] But let’s take a step back for a moment from this travesty. Out on the streets – while most may not know what data protection is – people certainly know what it is supposed to protect. People value their privacy and they will be vocal about attempts to destroy it. I had said as much to the joint parliamentary meeting, observing “the one element that has been left out of all these efforts is the public”. However, as the months rolled on, the only message being sent to the public was that data protection is an anachronism stitched together with self interest and impracticality. [...] I wasn’t aware at the time that there was a vast stitch-up to kill the reforms. I cannot bring myself to present a temperate report with measured wording that pretends this is all just normal business. It isn’t normal business, and it should never be normal business in any civilized society. How does one talk in measured tones about such endemic hypocrisy and deception? If you want to know who the real enemy of privacy is, don’t just look to the American agencies. The real enemy is right here in the European Parliament in the guise of MEPs who have knowingly sold our rights away to maintain powerful relationships. I’d like to say they were merely hoodwinked into supporting the vandalism, but many are smart people who knew exactly what they were doing.
Nice work, Irish presidency! His bottom line:
Is there a way forward? I believe so. First, governments should yield to common decency and scrap the illegitimate and poisoned Irish Council draft and hand the task to the Lithuanian Presidency that commences next month. Second, the Irish and British governments should be infinitely more transparent about their cooperation with intrusive interests that fuelled the deception.

(tags: ireland eu europe reform law data-protection privacy simon-davies meps iab)
Persuading David Simon (Pinboard Blog)

Maciej Ceglowski with a strongly-argued rebuttal of David Simon's post about the NSA's PRISM. This point in particular is key:
The point is, you don't need human investigators to find leads, you can have the algorithms do it [based on the call graph or network of who-calls-who]. They will find people of interest, assemble the watch lists, and flag whomever you like for further tracking. And since the number of actual terrorists is very, very, very small, the output of these algorithms will consist overwhelmingly of false positives.
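The base-rate arithmetic behind that claim can be worked through with a short sketch. Every number below is purely hypothetical and chosen only to show the shape of the problem; the function name is likewise an assumption.

```python
def flagged_breakdown(population, true_targets, hit_rate, false_positive_rate):
    """Return (true_positives, false_positives, precision) for a screening
    algorithm applied to an entire population."""
    innocents = population - true_targets
    true_pos = true_targets * hit_rate
    false_pos = innocents * false_positive_rate
    precision = true_pos / (true_pos + false_pos)
    return true_pos, false_pos, precision

# Hypothetical inputs: 300 million people, 1,000 real targets,
# a 99% hit rate and a 0.1% false-positive rate.
tp, fp, precision = flagged_breakdown(300_000_000, 1_000, 0.99, 0.001)
print(f"{tp:.0f} real targets flagged, {fp:.0f} innocents flagged, "
      f"precision {precision:.2%}")
```

Even with an algorithm this accurate, the flagged innocents outnumber the real targets by roughly 300 to 1 under these assumed inputs, because the innocent population is so much larger.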

(tags: false-positives maciej privacy security nsa prism david-simon accuracy big-data filtering anti-spam)
Schneier on Security: Blowback from the NSA Surveillance

Unintended consequences on US-focused governance of the internet and cloud computing:
Writing about the new Internet nationalism, I talked about the ITU meeting in Dubai last fall, and the attempt of some countries to wrest control of the Internet from the US. That movement just got a huge PR boost. Now, when countries like Russia and Iran say the US is simply too untrustworthy to manage the Internet, no one will be able to argue. We can't fight for Internet freedom around the world, then turn around and destroy it back home. Even if we don't see the contradiction, the rest of the world does.

(tags: internet freedom cloud-computing amazon google hosting usa us-politics prism nsa surveillance)

Links for 2013-06-15

Published June 15, 2013

EU unlocks a great new source of online innovation

Today the European Parliament voted to formally agree new rules on open data – effectively making a reality of the proposal which I first put forward just over 18 months ago, and making it easier to open up huge amounts of public sector data.
Great news -- wonder how it'll affect the Ordnance Survey of Ireland?

(tags: osi mapping open-data open data europe eu neelie-kroes)
UK ISPs Secretly Start Blocking Torrent Site Proxies | TorrentFreak

The next step of cat-and-mouse. Let's see what the pirate sites do next...
The blocking orders are intended to deter online piracy and were requested by the music industry group BPI on behalf of a variety of major labels. Thus far they’ve managed to block access to The Pirate Bay, Kat.ph, H33T and Fenopy, and preparations are being made to add many others. The effectiveness of these initial measures has been called into doubt, as they are relatively easy to bypass. For example, in response to the blockades hundreds of proxy sites popped up, allowing subscribers to reach the prohibited sites via a detour. However, as of this week these proxies are also covered by the same blocklist they aim to circumvent, without a new court ruling. The High Court orders give music industry group BPI the authority to add sites to the blocklist without oversight. Until now some small changes have been made, mostly in response to The Pirate Bay’s domain hopping endeavors, but with the latest blocklist update a whole new range of websites is being targeted.

(tags: bittorrent blocking filesharing copyright bpi piracy pirate-bay proxies fenopy kat.ph h33t filtering uk)

Links for 2013-06-14

Published June 14, 2013

There's a map for that

'Not long ago, we began rendering 3D models on GitHub. Today we're excited to announce the latest addition to the visualization family - geographic data. Any .geojson file in a GitHub repository will now be automatically rendered as an interactive, browsable map, annotated with your geodata.' As this HN comment notes, https://news.ycombinator.com/item?id=5875693 -- 'I'd much rather Github cleaned up the UI for existing features than added these little flourishes that I can't imagine even 1% of users use.' Something is seriously wrong in how GitHub decides product direction if this kind of wankology (and that Judy-array crap) is what gets prioritised. :( (via Marc O'Morain)

(tags: via:marc github mapping maps geojson hacking product-management ui pull-requests)
Lawsuit Filed To Prove Happy Birthday Is In The Public Domain; Demands Warner Pay Back Millions Of License Fees | Techdirt

The issue [...] is that it's just not cost effective for anyone to actually stand up and challenge Warner Music, who has strong financial incentive to pretend the copyright is still valid. Well, apparently, someone is pissed off enough to try. The creatively named Good Morning to You Productions, a documentary film company planning a film about the song Happy Birthday, has now filed a lawsuit concerning the copyright of Happy Birthday and are seeking to force Warner/Chappell to return the millions of dollars it has collected over the years. That's going to make this an interesting case.

(tags: music copyright law via:bwalsh public-domain happy-birthday songs warner-music lawsuits)
graphite-metrics

metric collectors for various stuff not (or poorly) handled by other monitoring daemons. Core of the project is a simple daemon (harvestd), which collects metric values and sends them to graphite carbon daemon (and/or other configured destinations) once per interval. Includes separate data collection components ("collectors") for processing of: /proc/slabinfo for useful-to-watch values, not everything (configurable). /proc/vmstat and /proc/meminfo in a consistent way. /proc/stat for irq, softirq, forks. /proc/buddyinfo and /proc/pagetypeinfo (memory fragmentation). /proc/interrupts and /proc/softirqs. Cron log to produce start/finish events and duration for each job into separate metrics, adapts jobs to metric names with regexes. Per-system-service accounting using systemd and its cgroups. sysstat data from sadc logs (use something like sadc -F -L -S DISK -S XDISK -S POWER 60 to have more stuff logged there) via sadf binary and its json export (sadf -j, supported since sysstat-10.0.something, iirc). iptables rule "hits" packet and byte counters, taken from ip{,6}tables-save, mapped via separate "table chain_name rule_no metric_name" file, which should be generated along with firewall rules (I use this script to do that).
Pretty exhaustive list of system metrics -- could have some interesting ideas for Linux OS-level metrics to monitor in future.
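As a rough idea of what such a collector boils down to, here is a minimal sketch that parses /proc/meminfo-style text and formats it for carbon's plaintext protocol (one "path value timestamp" line per metric). It is not taken from graphite-metrics; the function names and the `host1.meminfo` prefix are assumptions for illustration.

```python
import re
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: integer value}.
    Values stay in whatever unit the kernel reports (usually kB)."""
    metrics = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        fields = rest.split()
        if sep and fields and fields[0].isdigit():
            # Replace characters such as "(" and ")" so the name is a
            # safe metric path component.
            safe = re.sub(r"[^A-Za-z0-9_]", "_", name.strip())
            metrics[safe] = int(fields[0])
    return metrics

def to_carbon_lines(metrics, prefix, timestamp=None):
    """Format metrics as carbon plaintext lines: '<path> <value> <timestamp>'."""
    ts = int(time.time() if timestamp is None else timestamp)
    return [f"{prefix}.{name} {value} {ts}"
            for name, value in sorted(metrics.items())]

sample = "MemTotal:       16384 kB\nMemFree:         1024 kB\n"
for line in to_carbon_lines(parse_meminfo(sample), "host1.meminfo"):
    print(line)
```

On a real host the text would come from reading /proc/meminfo, and the lines would be joined with newlines and written to carbon's plaintext listener (TCP port 2003 by default).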

(tags: graphite monitoring metrics unix linux ops vm iptables sysadmin)
Former NSA Boss: We Don't Data Mine Our Giant Data Collection, We Just Ask It Questions

'Well, that's - no, we're going to use it. But we're not going to use it in the way that some people fear. You put these records, you store them, you have them. It's kind of like, I've got the haystack now. And now let's try to find the needle. And you find the needle by asking that data a question. I'm sorry to put it that way, but that's fundamentally what happens. All right. You don't troll through the data looking for patterns or anything like that. The data is set aside. And now I go into that data with a question that - a question that is based on articulable(ph), arguable, predicate to a terrorist nexus.'
Yep, that's data mining.

(tags: data-mining questions haystack needle nsa usa politics privacy data-protection michael-hayden)
fastutil

fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files. It is free software distributed under the Apache License 2.0. It requires Java 6 or newer.
used by Facebook (along with Apache Giraph, Netty, Unsafe) to speed up "weekend Hive jobs" to "coffee breaks". http://www.slideshare.net/nitayj/2013-0603-berlin-buzzwords

(tags: via:highscalability facebook giraph optimization java speed fastutil collections data-structures)
Big Memory, Part 4

good microbenchmarking of a bunch of Java collections; Trove, fastutil, PCJ, mahout-collections, hppc

(tags: java collections benchmarks performance speed coding data-structures optimization)

Links for 2013-06-13

Published June 13, 2013

Spamalot reigns: the spoils of Ireland’s EU kingship | The Irish Times - Thu, Jun 13, 2013

The spam presidency. As European citizens are made the miserable targets of unimpeded “direct marketing”, that may be how Ireland’s stint in the EU presidency seat is recalled for years to come. Under the guiding hand of Minister for Justice Alan Shatter, the Council of the European Union has submitted proposals for amendments to a proposed new data protection regulation, all of which overwhelmingly favour business and big organisations, not citizens. The most obviously repugnant and surprising element in the amendments is a watering down of existing protections for EU citizens against the willy-nilly marketing Americans are forced to endure. In the US there are few meaningful restrictions on what businesses can do with people’s personal information when pitching products and services at them. In the EU, this has always been strictly controlled; information gathered for one purpose cannot be used by a business to sell whatever it wants – unless you have opted in to receive such solicitations. This means you are not constantly bombarded by emails and junk mail, nor do you get non-stop phone calls from telemarketers. Under the proposed amendments to the draft data protection regulation, direct marketing would become a legal form of data processing. In effect, this would legitimise spam email, junk print mail and marketing calls. This unexpected provision signals just how successful powerful corporate lobbyists have been in convincing ministers that business matters more than privacy or giving citizens reasonable control over their personal information. Far worse is contained in other amendments, which in effect turn the original draft of the regulation upside down.
Fantastic article from Karlin Lillington in today's Times on the terrible amendments proposed for the EU's data protection law.

(tags: eu law prism data-protection privacy ireland ec marketing spam anti-spam email)

Links for 2013-06-12

Published June 12, 2013

Vagrant and Chef to provision dev test environments

We have recently switched from a manually configured development environment to a nearly fully automated one using Vagrant, Chef, and a few other tools. With this transition, we’ve moved to an environment where data on the dev boxes is considered disposable and only what’s checked into the SCM is “real”. This is where we’ve always wanted to be, but without the ability to easily rebuild the dev environment from scratch, it’s hard to internalize this behavior pattern.

(tags: dev osx chef vagrant testing vms coding)
Rapid Response: The NSA Prism Leak

'The biggest leak in the history of US security or nothing to worry about? A breach of trust and a data protection issue or a necessary secret project to protect American interests? [Tomorrow] lunchtime Science Gallery Rapid Response event [sic] will pick through the jargon, examine the minutiae of the National Security Agency's PRISM project and the whistle blower Edward Snowden's revelations, and discuss what it means for you and everyone. And we'll look at the bigger picture too. Journalist Una Mullally will chair a panel of guests on the story that everyone is talking about.'

(tags: science-gallery panel-discussions dublin nsa prism panel)
Music firms secure orders blocking access to Pirate Bay - Crime & Law News from Ireland & Abroad | The Irish Times - Wed, Jun 12, 2013

Four major music companies have secured court orders requiring six internet service providers to block access by subscribers to various Pirate Bay websites within some 30 days in a bid to prevent illegal downloading of copyright music and other material. [...] Today, Mr Justice Brian McGovern said he was satisfied to make the order in circumstances including that new copyright laws here and in the EU permitted such orders to be made. He said he fully agreed with a previous High Court judge who had said he would make such blocking orders if the law permitted and noted the law now allowed for such orders. The form of the orders means the music companies will not have to make fresh applications to court if Pirate Bay changes its location on the internet.

(tags: pirate-bay blocking filtering internet ireland upc eircom vodafone digiweb three imagine o2 copyright)
Labour TD ignores tough questions on web case

I [Tom Murphy] have asked [Sean Sherlock] a question: Does he have any comment about the lawsuit between EMI and UPC (and a raft of other ISPs too btw) which is using his SI to attempt to block PirateBay? A court case he said would not happen. Now, I am blocked from following him on Twitter. This is not how a proper political system works.

(tags: politics ireland twitter sean-sherlock tom-murphy boards devore copyright)

Links for 2013-06-11

Published June 11, 2013

PRISM explains the wider lobbying issues surrounding EU data protection reform | EDRI

The US has very successfully and expertly lobbied against the [EU] data protection package directly, it has mobilised and supported US industry lobbying. US industry has lobbied in its own name and mobilised malleable European trade associations to lobby on their behalf to amplify their message, “independent” “think tanks” have been created to amplify their message again. The result is not just the biggest lobbying effort that Brussels has ever seen, but also the broadest. Compliant Members of the European Parliament (MEPs) and EU Member States [...] have been imposing a “death by a thousand cuts” on the Regulation. Where previously there was a clear obligation to collect the “minimum necessary” data for any given service, the vague requirement to retain “not excessive” data is now preferred. Where previously companies could only use data for purposes that were “compatible” with the original reason for collecting the data, the Irish EU Presidency (pdf) has proposed a comical definition of “compatible” based on five elements, only one of which is related to the dictionary definition of the word. Members of the European Parliament and EU Member States are falling over themselves to ensure that the EU does not maintain its strategic advantage over the US. In addition to dismantling the proposed Regulation, countries like the UK desperately seek to delay the whole process and subsume it into the EU-US free trade agreement (the so-called “investment partnership” TTIP/TAFTA), which would subordinate a fundamental rights discussion in a trade negotiation. The UK government is even prepared to humiliate itself by arguing in favour of the US position on the basis that two and a half years (see Communication from 2010, pdf) of discussion is too fast!

(tags: edri data-protection eu ec ireland politics usa meps privacy uk free-trade)

Links for 2013-06-10

Published June 10, 2013

Microsoft admits US government can access EU-based cloud data

interesting point from an MS Q&A back in 2011, quite relevant nowadays:
Q: Can Microsoft guarantee that EU-stored data, held in EU based datacenters, will not leave the European Economic Area under any circumstances — even under a request by the Patriot Act? A: Frazer explained that, as Microsoft is a U.S.-headquartered company, it has to comply with local laws (the United States, as well as any other location where one of its subsidiary companies is based). Though he said that "customers would be informed wherever possible," he could not provide a guarantee that they would be informed — if a gagging order, injunction or U.S. National Security Letter permits it. He said: "Microsoft cannot provide those guarantees. Neither can any other company." While it has been suspected for some time, this is the first time Microsoft, or any other company, has given this answer. Any data which is housed, stored or processed by a company, which is a U.S. based company or is wholly owned by a U.S. parent company, is vulnerable to interception and inspection by U.S. authorities.

(tags: microsoft privacy cloud-computing eu data-centers data-protection nsa fisa usa)

Links for 2013-06-09

Published June 9, 2013

IAB Europe awards MEP Sean Kelly for standing up for data privacy rights (video) - Ireland’s CIO and strategy news and reports service – Siliconrepublic.com

An Irish MEP, serving as a rapporteur on reform of the EU data protection regime, was given an award by an advertising trade group last month:
Sean Kelly, Fine Gael MEP for Ireland South [who serves as the EU’s Industry Committee Rapporteur for the General Data Protection Regulation], has been selected to receive the prestigious IAB Europe Award for Leadership and Excellence for his approach to dealing with privacy concerns over shortcomings in the European Commission’s data protection proposal. IAB Europe represents more than 5,500 online advertising media, research and analytics organisations.

(tags: iab-europe awards spam sean-kelly ireland meps politics eu data-protection privacy ec)
The CAP FAQ by henryr

No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, so as to help those new to the theorem get up to speed quickly, and to settle some common misconceptions or points of disagreement.

(tags: database distributed nosql cap consistency cap-theorem faqs)
seeing into the UV spectrum after Cataract Surgery with Crystalens

I've been very happy so far with the Crystalens implant for Cataract Surgery [...] one unexpected/interesting aspect is I see a violet glow that others do not - perhaps I'm more sensitive to the low end of the visible light spectrum.
(via Tony Finch)

(tags: via:fanf science perception augmentation uv light sight cool cataracts surgery lens eyes)
Instagram: Making the Switch to Cassandra from Redis, a 75% 'Insta' Savings

shifting data out of RAM and onto SSDs -- unsurprisingly, big savings.
a 12 node cluster of EC2 hi1.4xlarge instances; we store around 1.2TB of data across this cluster. At peak, we're doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We've been really impressed with how well Cassandra has been able to drop into that role.

(tags: ram ssd cassandra databases nosql redis instagram storage ec2)
Council of the European Union Releases Draft Compromise Text on the Proposed EU Data Protection Regulation

Oh god. This sounds like an impending privacy and anti-spam disaster. "business-focussed":
Overall, the [Irish EC Presidency’s] draft compromise text can be seen as a more business-focused, pragmatic approach. For example, the Presidency has drafted an additional recital (Recital 3a), clarifying the right to data protection as a qualified right, highlighting the principle of proportionality and importance of other competing fundamental rights, including the freedom to conduct a business.
and some pretty serious relaxation of how consent for use of personal data is measured:
The criterion for valid consent is amended from “explicit” to “unambiguous,” except in the case of processing special categories of data (i.e., sensitive personal data) (Recital 25 and Article 9(2)). This reverts to the current position under the Data Protection Directive and is a concession to the practical difficulty of obtaining explicit consent in all cases. The criteria for valid consent are further relaxed by the ability to obtain consent in writing, orally or in an electronic manner, and where technically feasible and effective, valid consent can be given using browser settings and other technical solutions. Further, the requirement that the controller bear the burden of proof that valid consent was obtained is limited to a requirement that the controller be able to “demonstrate” that consent was obtained (Recital 32 and Article 7(1)). The need for “informed” consent is also relaxed from the requirement to provide the full information requirements laid out in Article 14 to the minimal requirements that the data subject “at least” be made aware of: (1) the identity of the data controller, and (2) the purpose(s) of the processing of their personal data (Recitals 33 and 48).

(tags: anti-spam privacy data-protection spam ireland eu ec regulation)
LobbyPlag

wow, great view of which MEPs are eviscerating the EU's data protection regime:
Currently the EU is negotiating about new data privacy laws. This new EU Regulation will replace all existing national laws on data privacy. Here you can see a general overview which Members of the European Parliament (MEPs) are pushing for more or less data privacy. Choose a country, a political group or a MEP from the “Top 10” list to find out more.

(tags: europe eu privacy data-protection datap ec regulation meps)

Links for 2013-06-07

Published June 7, 2013

EDRI's comments on EU proposals to reform privacy law

Amendments 762, 764 and 765 in particular seem to move portions of the law from "confirmed opt-in required" to "opt-out is ok" -- which sounds like a risk where spam and unsolicited actions on a person's data are concerned

(tags: law privacy anti-spam eu spam edri)

Links for 2013-06-06

Published June 6, 2013

EC2Instances.info

'Easy Amazon EC2 Instance Comparison'. A nice UI listing the various EC2 instance types on offer, with their key attributes. It doesn't show which instance types are available as EBS-optimized, though.

(tags: amazon ec2 aws comparison pricing)
HyperLevelDB: A High-Performance LevelDB Fork

'HyperLevelDB improves on LevelDB in two key ways: Improved parallelism: HyperLevelDB uses more fine-grained locking internally to provide higher throughput for multiple writer threads. Improved compaction: HyperLevelDB uses a different method of compaction that achieves higher throughput for write-heavy workloads, even as the database grows.'

(tags: leveldb storage key-value-stores persistence unix libraries open-source)
EU Council deals killer blow to privacy reforms

'In an extraordinary result for corporate lobbying, direct marketing would by default be considered a legitimate data process and would therefore – by default – be lawful.'

(tags: eu politics data-protection privacy anti-spam spam eu-council direct-marketing)

Links for 2013-06-05

Published June 5, 2013

Care and Feeding of Large Scale Graphite Installations [slides]

good docs for large-scale graphite use: 'Tip and tricks of using and scaling graphite. First presented at DevOpsDays Austin Texas 2013-05-01'

(tags: graphite devops ops metrics dashboards sysadmin)
Low-latency stock trading "jumps the gun" due to default NTP configuration settings

On June 3, 2013, trading in SPY exploded at 09:59:59.985, which is 15 milliseconds before the ISM's Manufacturing number released at 10:00:00. Activity in the eMini (traded in Chicago), exploded at 09:59:59.992, which is 8 milliseconds before the news release, but 7 milliseconds after SPY. Note how SPY and the eMini traded within a millisecond for the Consumer Confidence release last week, but the eMini lagged SPY by about 7 milliseconds for the ISM Manufacturing release. The simultaneous trading on Consumer Confidence is because that number is released at the same time in both NYC and Chicago. The ISM Manufacturing number is probably released on a low latency feed in NYC, and then takes 5-7 milliseconds, due to the speed of light, to reach Chicago. Either the clock used to release the ISM number was 15 milliseconds fast, or someone (correctly) jumped the gun. Update: [...] The clock used to release the ISM was indeed, 15 milliseconds fast. This could be from using the default setting of many NTP clients, which allows the clock to drift up to about 16 milliseconds before adjusting time.
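The arithmetic in that excerpt can be checked with a rough sketch. The timestamps come from the quote; the NYC-to-Chicago distance and the fibre refractive index are my own approximate assumptions, not figures from the article:

```python
from datetime import datetime, timedelta

ONE_MS = timedelta(milliseconds=1)

# Timestamps quoted in the excerpt (June 3, 2013).
release = datetime(2013, 6, 3, 10, 0, 0)
spy_burst = datetime(2013, 6, 3, 9, 59, 59, 985000)
emini_burst = datetime(2013, 6, 3, 9, 59, 59, 992000)

# Dividing one timedelta by another with // yields a whole number of milliseconds.
spy_lead_ms = (release - spy_burst) // ONE_MS
emini_lead_ms = (release - emini_burst) // ONE_MS
emini_lag_ms = (emini_burst - spy_burst) // ONE_MS

C_KM_PER_S = 299_792.458   # speed of light in vacuum
NYC_CHICAGO_KM = 1_150     # ASSUMPTION: approximate great-circle distance
FIBRE_INDEX = 1.47         # ASSUMPTION: typical refractive index of optical fibre


def one_way_ms(distance_km, refractive_index=1.0):
    """One-way propagation delay in milliseconds over the given distance."""
    return distance_km * refractive_index / C_KM_PER_S * 1000.0


vacuum_ms = one_way_ms(NYC_CHICAGO_KM)
fibre_ms = one_way_ms(NYC_CHICAGO_KM, FIBRE_INDEX)

print(f"SPY traded {spy_lead_ms} ms before the release")
print(f"eMini traded {emini_lead_ms} ms before the release, {emini_lag_ms} ms after SPY")
print(f"NYC to Chicago: {vacuum_ms:.2f} ms in vacuum, {fibre_ms:.2f} ms in fibre")
```

Under those assumptions the straight-line fibre delay comes out between 5 and 6 ms. Real fibre routes are longer than the great-circle path, so the quoted 5-7 ms looks plausible.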

(tags: ntp time synchronization spy trading stocks low-latency clocks internet)
the infamous 2008 S3 single-bit-corruption outage

Neat, I didn't realise this was publicly visible. A single corrupted bit infected the S3 gossip network, taking down the whole S3 service in (iirc) one region:
We've now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect. We use MD5 checksums throughout the system, for example, to prevent, detect, and recover from corruption that can occur during receipt, storage, and retrieval of customers' objects. However, we didn't have the same protection in place to detect whether [gossip state] had been corrupted. As a result, when the corruption occurred, we didn't detect it and it spread throughout the system causing the symptoms described above. We hadn't encountered server-to-server communication issues of this scale before and, as a result, it took some time during the event to diagnose and recover from it. During our post-mortem analysis we've spent quite a bit of time evaluating what happened, how quickly we were able to respond and recover, and what we could do to prevent other unusual circumstances like this from having system-wide impacts. Here are the actions that we're taking: (a) we've deployed several changes to Amazon S3 that significantly reduce the amount of time required to completely restore system-wide state and restart customer request processing; (b) we've deployed a change to how Amazon S3 gossips about failed servers that reduces the amount of gossip and helps prevent the behavior we experienced on Sunday; (c) we've added additional monitoring and alarming of gossip rates and failures; and, (d) we're adding checksums to proactively detect corruption of system state messages so we can log any such messages and then reject them.
This is why you checksum all the things ;)
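A minimal sketch of that last fix, (d), assuming a made-up message format (JSON payload prefixed with its MD5 hex digest). The function names and the gossip state fields are hypothetical and have nothing to do with S3's real wire format:

```python
import hashlib
import json


def encode(state: dict) -> bytes:
    """Serialize the state and prefix it with an MD5 digest of the payload."""
    payload = json.dumps(state, sort_keys=True).encode("utf-8")
    digest = hashlib.md5(payload).hexdigest().encode("ascii")
    return digest + b"\n" + payload


def decode(message: bytes) -> dict:
    """Verify the digest before trusting the payload; reject on mismatch."""
    digest, _, payload = message.partition(b"\n")
    if hashlib.md5(payload).hexdigest().encode("ascii") != digest:
        raise ValueError("checksum mismatch: rejecting corrupted message")
    return json.loads(payload.decode("utf-8"))


def flip_bit(message: bytes, byte_index: int, bit: int) -> bytes:
    """Simulate a single-bit corruption in transit."""
    corrupted = bytearray(message)
    corrupted[byte_index] ^= 1 << bit
    return bytes(corrupted)


# Hypothetical gossip state, for illustration only.
state = {"server": "s3-node-17", "healthy": True}
message = encode(state)

# Flip the lowest bit of the final "7": the payload is still valid JSON,
# but it now names a different server.
corrupted = flip_bit(message, message.rindex(b"17") + 1, 0)
unchecked = json.loads(corrupted.partition(b"\n")[2].decode("utf-8"))
print("without a checksum we would accept:", unchecked)

try:
    decode(corrupted)
except ValueError as err:
    print("with a checksum:", err)
```

The corrupted message is "still intelligible", as the post-mortem puts it, which is exactly why a parser alone won't catch it. MD5 is fine for detecting accidental corruption like this; it isn't a defence against deliberate tampering.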

(tags: s3 aws post-mortems network outages failures corruption grey-failures amazon gossip)

Links for 2013-06-04

Published June 4, 2013

The network is reliable

Aphyr and Peter Bailis collect an authoritative list of known network partition and outage cases from published post-mortem data:
This post is meant as a reference point -- to illustrate that, according to a wide range of accounts, partitions occur in many real-world environments. Processes, servers, NICs, switches, local and wide area networks can all fail, and the resulting economic consequences are real. Network outages can suddenly arise in systems that are stable for months at a time, during routine upgrades, or as a result of emergency maintenance. The consequences of these outages range from increased latency and temporary unavailability to inconsistency, corruption, and data loss. Split-brain is not an academic concern: it happens to all kinds of systems -- sometimes for days on end. Partitions deserve serious consideration.
I honestly cannot understand people who don't think this is the case. I guess 3 years of reading (and occasionally auto-cutting) Amazon's network-outage tickets as part of AWS network monitoring will do that to you ;)

(tags: networking outages partition cap failure fault-tolerance)
Cities 05

from Atelier Olschinsky. 'Fine Art Print on Hahnemuehle Photo Rag Bright White 310g; Limited Edition / Numbered and signed by the artist'

(tags: art graphics cities prints want via:bdif)

Links for 2013-05-30

Published May 30, 2013

incompetent error-handling code in the mongo-java-driver project

An unexplained invocation of Math.random() in the exception-handling block of this MongoDB Java driver class causes roflscale lols in the GitHub commit notes. http://stackoverflow.com/a/16833798 has more explanation.

(tags: github commits mongodb webscale roflscale random daily-wtf wtf)
Hermetic Servers

'What is a Hermetic Server? The short definition would be a “server in a box”. If you can start up the entire server on a single machine that has no network connection AND the server works as expected, you have a hermetic server! This is a special case of the more general “hermetic” concept which applies to an isolated system not necessarily on a single machine. Why is it useful to have a hermetic server? Because if your entire [system under test] is composed of hermetic servers, it could all be started on a single machine for testing; no network connection necessary! The single machine could be a physical or virtual machine.' I think these also qualify as "fakes", in the terminology Martin Fowler suggests at http://martinfowler.com/bliki/TestDouble.html
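A rough sketch of the idea using only the Python standard library: the server binds to loopback on an ephemeral port and its only backend is an in-memory fake, so it runs with no network connection. All class and function names here are hypothetical, not anything from the Google post:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class InMemoryStore:
    """Fake backend: stands in for a real database, keeps data in a dict."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key):
        return self._data.get(key)


def make_server(store):
    """Build an HTTP server on 127.0.0.1 with an OS-assigned port."""

    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            value = store.get(self.path.lstrip("/"))
            if value is None:
                self.send_response(404)
                self.end_headers()
                return
            body = value.encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, *args):
            pass  # keep test output quiet

    return HTTPServer(("127.0.0.1", 0), Handler)


def fetch(server, key):
    """GET a key from the server, bypassing any configured HTTP proxy."""
    port = server.server_address[1]
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(f"http://127.0.0.1:{port}/{key}") as resp:
        return resp.read().decode("utf-8")


def demo():
    store = InMemoryStore()
    store.put("greeting", "hello")
    server = make_server(store)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return fetch(server, "greeting")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo())
```

Binding to port 0 lets the OS pick a free port, so several such servers can share one test machine without colliding.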

(tags: google testing hermetic-servers test test-doubles unit-testing)
Don’t Overuse Mocks

Hooray, sanity from the Google Testing blog. Overuse of mocks has been a major cause of pain for me in the past, in the form of tricky rewrites of mock-heavy unit test code.
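A small illustration of the kind of pain I mean, with entirely hypothetical names: a refactoring that keeps behaviour identical breaks the mock-heavy test, because that test pins down which calls are made rather than what comes back. The same test written against a simple fake keeps passing:

```python
from unittest import mock


class FakePriceRepo:
    """Hand-written fake with real (if trivial) behaviour."""

    def __init__(self, prices):
        self._prices = dict(prices)

    def get_price(self, item_id):
        return self._prices[item_id]

    def get_prices(self, item_ids):
        return [self._prices[i] for i in item_ids]


def total_v1(repo, item_ids):
    """Original implementation: one lookup per item."""
    return sum(repo.get_price(i) for i in item_ids)


def total_v2(repo, item_ids):
    """Refactored implementation: one batched lookup, same result."""
    return sum(repo.get_prices(item_ids))


def mock_heavy_test(total):
    repo = mock.Mock()
    repo.get_price.side_effect = [3, 4]
    assert total(repo, ["a", "b"]) == 7
    # Asserts on the interaction, not just the result.
    repo.get_price.assert_has_calls([mock.call("a"), mock.call("b")])


def fake_based_test(total):
    repo = FakePriceRepo({"a": 3, "b": 4})
    assert total(repo, ["a", "b"]) == 7


for impl in (total_v1, total_v2):
    fake_based_test(impl)  # passes for both implementations

mock_heavy_test(total_v1)  # passes against the original
try:
    mock_heavy_test(total_v2)
    print("mock-heavy test survived the refactoring")
except (TypeError, AssertionError) as err:
    print("mock-heavy test broke on the refactoring:", type(err).__name__)
```

Neither implementation is wrong here; only the mock-based test needs rewriting after the change.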

(tags: mocking testing tests google mocks unit-testing)