New Tweets per second record, and how | Twitter Blog
How Twitter scaled up massively in 3 years -- replacing Ruby with the JVM, adopting SOA and custom sharding. Good summary post, looking forward to more techie details soon
(tags: twitter performance scalability jvm ruby soa scaling)
Massive Overblocking Hits Hundreds Of UK Sites | Techdirt
Customers of UK ISPs Virgin Media and Be Broadband found they were unable to access hundreds of sites, including the Radio Times and Zooniverse, due to a secret website-blocking court order from the Premier League. PC Pro believe that 3 other ISPs' customers were also affected. According to customers' reverse-engineering, it looks like the court order incorrectly demanded the blocking of "http-redirection-a.dnsmadeeasy.com", an HTTP redirector operated by the DNS operator DNSMadeEasy.
The fact that the court could issue an order which didn’t see this coming and that the ISPs would act on it without checking that what they were doing was sensible is, in my opinion, extremely worrying.
(tags: overblocking censorship org uk sky be-broadband virgin-media dnsmadeeasy filtering premier-league false-positives isps)
Beating the CAP Theorem Checklist
'Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment advocates a way to beat the CAP theorem. Your idea will not work. Here is why it won't work:' lovely stuff, via Bill De hOra
(tags: via:dehora funny cap cs distributed-systems distcomp networking partitions state checklists)
'Sparrow: Scalable Scheduling for Sub-Second Parallel Jobs' [tech report]
(tags: scheduling sparrow load-balancing algorithms distributed-systems distcomp papers)
From derelict to delightful: Art Tunnel Smithfield
I do like the Art Tunnel. Smithfield is a great demo of reclaiming Dublin's increasing dereliction and I hope the DCC allow this to continue
(tags: smithfield d7 dublin ireland art art-tunnel reclamation derelict economy dcc)
How A 'Deviant' Philosopher Built Palantir, A CIA-Funded Data-Mining Juggernaut - Forbes
Palantir -- the free-market state-surveillance data-retention nightmare. At the end of this slightly overenthusiastic puff piece we get to:
Katz-Lacabe wasn’t impressed. Palantir’s software, he points out, has no default time limits -- all information remains searchable for as long as it’s stored on the customer’s servers. And its auditing function? “I don’t think it means a damn thing,” he says. “Logs aren’t useful unless someone is looking at them.” [...] What if Palantir’s audit logs -- its central safeguard against abuse -- are simply ignored? Karp responds that the logs are intended to be read by a third party. In the case of government agencies, he suggests an oversight body that reviews all surveillance -- an institution that is purely theoretical at the moment. “Something like this will exist,” Karp insists. “Societies will build it, precisely because the alternative is letting terrorism happen or losing all our liberties.” Palantir’s critics, unsurprisingly, aren’t reassured by Karp’s hypothetical court. Electronic Privacy Information Center activist Amie Stepanovich calls Palantir “naive” to expect the government to start policing its own use of technology. The Electronic Frontier Foundation’s Lee Tien derides Karp’s argument that privacy safeguards can be added to surveillance systems after the fact. “You should think about what to do with the toxic waste while you’re building the nuclear power plant,” he argues, “not some day in the future.”
(tags: palantir data-retention privacy surveillance state cia forbes andy-greenberg eff epic snooping)
London orders rubbish bins to stop collecting smartphone data
Good call.
AUTHORITIES IN LONDON’S financial district have ordered a company using high-tech rubbish bins to collect smartphone data from passers-by to cease its activities, and referred the firm to the privacy watchdog. The City of London Corporation, which manages the so-called “Square Mile” around St Paul’s Cathedral, said such data collection “needs to stop” until there could be a public debate about it.
(via Daragh O'Brien)(tags: via:dobrien privacy phones wifi mac-address data-protection data-retention renew london bins snooping sniffing)
The Irish State wishes to uninvent computers with new FOI Bill
Mark Coughlan noticed this:
The FOI body shall take reasonable steps to search for and extract the records to which the request relates, having due regard to the steps that would be considered reasonable if the records were held in paper format.
In other words, pretend that computerised database technology, extant since the 1960s, does not exist. Genius (via Simon McGarr)(tags: funny irish ireland foi open-data freedom computerisation punch-cards paper databases)
Hamlet is Banned in the British Library
Pretty hilarious account of the usual, run-of-the-mill overblocking in the British Library from last weekend:
I asked [the information desk] if they saw the problem, perhaps just the symbolism, of Hamlet being banned in the British Library. They shrugged. The IT department said there was nothing to be done, as it was only the British Library's wifi service that was blocking Hamlet, and the British Library's wifi service, they seemed sure, had nothing to do with the British Library. They were merely ships that passed in the night. Children crying to each other from either bank of an uncrossable river.
(tags: censorship filters overblocking hamlet shakespeare literature funny sad british-library blocking)
The algorithm for a perfectly balanced photo gallery – Summit Stories from Crispy Mountain
Nice application of a partitioning exhaustive search algorithm using dynamic programming (via Tom)
(tags: algorithms javascript python dynamic-programming partitioning images gallery)
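For anyone curious about the partitioning step, here is a minimal sketch of the classic linear partition problem solved with dynamic programming. It is not the code from the linked post: the function name `linear_partition` is made up for illustration, and using photo aspect ratios as the weights (so that each group becomes one gallery row) is an assumption about how it would be applied.

```python
from itertools import accumulate


def linear_partition(weights, k):
    """Split weights, in order, into at most k contiguous groups so that
    the largest group sum is as small as possible."""
    n = len(weights)
    if n == 0:
        return []
    k = min(k, n)
    prefix = [0] + list(accumulate(weights))
    inf = float("inf")
    # cost[i][j]: smallest achievable "largest group sum" when the first
    # i items are split into exactly j groups.
    cost = [[inf] * (k + 1) for _ in range(n + 1)]
    # split[i][j]: index where the last of those j groups starts.
    split = [[0] * (k + 1) for _ in range(n + 1)]
    cost[0][0] = 0
    for i in range(1, n + 1):
        for j in range(1, min(i, k) + 1):
            for x in range(j - 1, i):
                # The last group holds items x .. i-1.
                candidate = max(cost[x][j - 1], prefix[i] - prefix[x])
                if candidate < cost[i][j]:
                    cost[i][j] = candidate
                    split[i][j] = x
    # Walk the split table backwards to recover the groups.
    groups = []
    i, j = n, k
    while j > 0:
        x = split[i][j]
        groups.append(weights[x:i])
        i, j = x, j - 1
    groups.reverse()
    return groups
```

Called as `linear_partition(aspect_ratios, rows)`, each returned group would be one row of the gallery; the triple loop makes it O(k·n²), which should be fine for a page of photos.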
-
An amazing Soviet map of the US economy from 1979. Wonderful piece of cold war memorabilia
(tags: cold-war ussr usa mapping maps soviet economy memorabilia)
Randomly Failed! The State of Randomness in Current Java Implementations
This would appear to be the paper which sparked off the drama around BitCoin thefts from wallets generated on Android devices:
The SecureRandom PRNG is the primary source of randomness for Java and is used e.g., by cryptographic operations. This underlines its importance regarding security. Some of fallback solutions of the investigated implementations [are] revealed to be weak and predictable or capable of being influenced. Very alarming are the defects found in Apache Harmony, since it is partly used by Android.
More on the BitCoin drama: https://bitcointalk.org/index.php?topic=271486.40 , http://bitcoin.org/en/alert/2013-08-11-android (tags: android java prng random security bugs apache-harmony apache crypto bitcoin papers)
The Getty Museum offers a huge chunk of their collection for free use
We’ve launched the Open Content Program to share, freely and without restriction, as many of the Getty’s digital resources as possible. The initial focus of the Open Content Program is to make available all images of public domain artworks in the Getty’s collections. Today we’ve taken a first step toward this goal by making roughly 4,600 high-resolution images of the Museum’s collection free to use, modify, and publish for any purpose. Why open content? Why now? The Getty was founded on the conviction that understanding art makes the world a better place, and sharing our digital resources is the natural extension of that belief. This move is also an educational imperative. Artists, students, teachers, writers, and countless others rely on artwork images to learn, tell stories, exchange ideas, and feed their own creativity. In its discussion of open content, the most recent Horizon Report, Museum Edition stated that “it is now the mark—and social responsibility—of world-class institutions to develop and share free cultural and educational resources.” I agree wholeheartedly.
(tags: getty art via:tupp_ed open-content free images pictures paintings museums)
The NSA Is Commandeering the Internet - Bruce Schneier
You, an executive in one of those companies, can fight. You'll probably lose, but you need to take the stand. And you might win. It's time we called the government's actions what it really is: commandeering. Commandeering is a practice we're used to in wartime, where commercial ships are taken for military use, or production lines are converted to military production. But now it's happening in peacetime. Vast swaths of the Internet are being commandeered to support this surveillance state. If this is happening to your company, do what you can to isolate the actions. Do you have employees with security clearances who can't tell you what they're doing? Cut off all automatic lines of communication with them, and make sure that only specific, required, authorized acts are being taken on behalf of government. Only then can you look your customers and the public in the face and say that you don't know what is going on -- that your company has been commandeered.
(tags: nsa america politics privacy data-protection data-retention law google microsoft security bruce-schneier)
We are the Operations team at Etsy. Ask us anything! : IAmA
great AMA from Etsy ops staff (via Nelson)
(tags: etsy reddit devops ops architecture ama via:nelson)
Building a panopticon: The evolution of the NSA’s XKeyscore
This is an amazing behind-the-scenes look at the architecture of XKeyscore, and how it evolved from an earlier large-scale packet interception system, Narus' Semantic Traffic Analyzer. XKeyscore is a federated, distributed system, with distributed packet-capture agents running on Linux, built with protocol-specific plugins, which write 3 days of raw packet data, and 30 days of intercept metadata, to local buffer stores. Central queries are then 'distributed across all of the XKeyscore tap sites, and any results are returned and aggregated'. Dunno about you, but this is pretty much how I would have built something like this, IMO....
(tags: panopticon xkeyscore nsa architecture scalability packet-capture narus sniffing snooping interception lawful-interception li tapping)
Police may block recording with Apple patent
Creeptastic, Apple.
Apple has patented a piece of technology which would allow government and police to block transmission of information, including video and photographs, from any public gathering or venue they deem “sensitive”, and “protected from externalities.” In other words, these powers will have control over what can and cannot be documented on wireless devices during any public event. And while the company says the affected sites are to be mostly cinemas, theaters, concert grounds and similar locations, Apple Inc. also says “covert police or government operations may require complete ‘blackout’ conditions.”
(tags: apple iphone via:devore creepy police photos recording remote-control phones blackout)
Ivan Ristić: Defending against the BREACH attack
One interesting response to this HTTPS compression-based MITM attack:
The award for least-intrusive and entirely painless mitigation proposal goes to Paul Querna who, on the httpd-dev mailing list, proposed to use the HTTP chunked encoding to randomize response length. Chunked encoding is a HTTP feature that is typically used when the size of the response body is not known in advance; only the size of the next chunk is known. Because chunks carry some additional information, they affect the size of the response, but not the content. By forcing more chunks than necessary, for example, you can increase the length of the response. To the attacker, who can see only the size of the response body, but not anything else, the chunks are invisible. (Assuming they're not sent in individual TCP packets or TLS records, of course.) This mitigation technique is very easy to implement at the web server level, which makes it the least expensive option. There is only a question about its effectiveness. No one has done the maths yet, but most seem to agree that response length randomization slows down the attacker, but does not prevent the attack entirely. But, if the attack can be slowed down significantly, perhaps it will be as good as prevented.
(tags: mitm attacks hacking security compression http https protocols tls ssl tcp chunked-encoding apache)
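To make the idea concrete, here is a rough sketch of length randomization via chunked encoding. It is not the patch proposed on httpd-dev: the function names, the `max_chunk` parameter and the choice of uniformly random chunk sizes are all assumptions for illustration. Each chunk adds a hex size line and two CRLFs, so the bytes on the wire vary between runs while the decoded body stays identical.

```python
import random


def chunked_encode(body: bytes, rng: random.Random, max_chunk: int = 16) -> bytes:
    """Encode body with HTTP chunked framing, using random chunk sizes."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        size = rng.randint(1, max_chunk)
        chunk = body[pos:pos + size]
        # Each chunk is: <size in hex> CRLF <data> CRLF
        out += f"{len(chunk):x}\r\n".encode("ascii") + chunk + b"\r\n"
        pos += len(chunk)
    out += b"0\r\n\r\n"  # zero-length chunk terminates the body
    return bytes(out)


def chunked_decode(data: bytes) -> bytes:
    """Decode the framing produced above (no trailers or chunk extensions)."""
    body = bytearray()
    pos = 0
    while True:
        eol = data.index(b"\r\n", pos)
        size = int(data[pos:eol], 16)
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += data[start:start + size]
        pos = start + size + 2  # skip the CRLF after the chunk data
```

Encoding the same body with differently seeded `random.Random` instances gives outputs of different lengths that all decode to the same bytes. As the quoted post says, this probably only slows the attacker down, since the added noise can be averaged out over many requests.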
Totoro Isn't All Cute. For Some, He's the God of Death.
"Everyone, do not worry," read the Studio Ghibli statement. "There's absolutely no truth or configuration that Totoro is the God of Death or that Mei is dead in My Neighbor Totoro."
(tags: totoro studio-ghibli death morbid japan film movies urban-legends alternate plot)
Hogan describes bin charge increases as ‘opportunistic’ - Environmental News | The Irish Times
LOL Greyhound.
Greyhound Recycling last month announced increases of 50 cents a month for customers on a flat monthly charge, 50 cents for each black bin collection for customers who pay by the lift and two cents a kilo for customers who pay by weight only. In a letter to customers, it described the levy as “tax imposed by the Government of Ireland on the people of Ireland”. However, following a complaint to the [National Consumer Agency] that the by-weight increase was 76 per cent more than the [government landfill levy] increase, Greyhound reduced the charge to an additional one cent a kilo.
(tags: greyhound ireland dublin rubbish recycling consumer ripoffs tax)
IrelandOffline broadband availability map
Marking the locations of broadband options in your area, along with VDSL cabinets, local exchanges, and wireless ISP coverage, and the landing sites of submarine cables (presumably from submarinecablemap.com data)
(tags: irelandoffline cables network internet ireland coverage wisps vdsl broadband)
Filters 'not a silver bullet' that will stop perverts, warns Interpol chief - Independent.ie
Sunday Independent interview with Interpol assistant director Mick Moran:
Moran spoke out after child welfare organisations here called on the Government to follow the UK's example by placing anti-pornography filters on Irish home broadband connections. The Irish Society for the Prevention of Cruelty to Children argued that pornography was damaging to young children and should be removed from their line of sight. But Moran warned this would only lull parents into a false sense of security. "If we imagine the access people had to porn in the past – that access is now complete and total. They have access to the most horrific material out there. We now need to focus on parental responsibility about how kids are using the internet."
(tags: mick-moran cam interpol policing ispcc filtering parenting children broadband)
-
Gil Tene raises an extremely good point about load testing, high-percentile response-time measurement, and behaviour when testing a system under load:
I've been harping for a while now about a common measurement technique problem I call "Coordinated Omission" for a while, which can often render percentile data useless. [...] I believe that this problem occurs extremely frequently in test results, but it's usually hard to deduce it's existence purely from the final data reported. But every once in a while, I see test results where the data provided is enough to demonstrate the huge percentile-misreporting effect of Coordinated Omission based purely on the summary report. I ran into just such a case in Attila's cool posting about log4j2's truly amazing performance, so I decided to avoid polluting his thread with an elongated discussion of how to compute 99.9%'ile data, and started this topic here. That thread should really be about how cool log4j2 is, and I'm certain that it really is cool, even after you correct the measurements. [...] Basically, I think that the 99.99% observation computation is wrong, and demonstrably (using the data in the graph data posted) exhibits the classic "coordinated omission" measurement problem I've been preaching about. This test is not alone in exhibiting this, and there is nothing to be ashamed of when you find yourself making this mistake. I only figured it out after doing it myself many many times, and then I noticed that everyone else seems to also be doing it but most of them haven't yet figured it out. In fact, I run into this issue so often in percentile reporting and load testing that I'm starting to wonder if coordinated omission is there in 99.9% of latency tests ;-)
(tags: measurement testing latency load-testing gil-tene coordinated-omission validity log4j percentiles)
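A toy simulation of the effect, as I understand it. This is not Gil Tene's code or HdrHistogram's correction; the function names and the numbers (a 10 ms intended send interval, a 1 ms normal service time, one 1000 ms stall) are assumptions. The "naive" figures time each request from when the blocked tester actually sent it; the "corrected" ones time it from when it should have been sent.

```python
import math


def simulate(service_times_ms, interval_ms):
    """Return (naive, corrected) latency lists for a single-threaded tester
    that intends to send one request every interval_ms."""
    naive, corrected = [], []
    clock = 0.0
    for i, service in enumerate(service_times_ms):
        intended_start = i * interval_ms
        # The tester cannot send until the previous response has arrived.
        actual_start = max(clock, intended_start)
        finish = actual_start + service
        naive.append(finish - actual_start)        # what most tools report
        corrected.append(finish - intended_start)  # includes time spent queued
        clock = finish
    return naive, corrected


def percentile(samples, pct):
    """Nearest-rank percentile."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# 1000 requests, one of which stalls for a full second.
service_times = [1.0] * 1000
service_times[500] = 1000.0
naive, corrected = simulate(service_times, interval_ms=10)
```

With these inputs the naive data contains exactly one bad sample, so its 99th percentile still looks like 1 ms, while the corrected data shows that over a hundred requests were held up behind the stall.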
Xerox scanners/photocopiers randomly alter numbers in scanned documents · D. Kriesel
Pretty major Xerox fail: photocopied/scanned docs are found to have replaced the digit '6' with '8', due to a poor choice of compression techniques:
Several mails I got suggest that the xerox machines use JBIG2 for compression. This algorithm creates a dictionary of image patches it finds “similar”. Those patches then get reused instead of the original image data, as long as the error generated by them is not “too high”. Makes sense. This also would explain, why the error occurs when scanning letters or numbers in low resolution (still readable, though). In this case, the letter size is close to the patch size of JBIG2, and whole “similar” letters or even letter blocks get replaced by each other.
(tags: jbig2 compression xerox photocopying scanning documents fonts arial image-compression images)
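A toy model of how that kind of substitution can go wrong. This is not JBIG2 itself, and certainly not Xerox's implementation: the 3x5 bitmaps, the Hamming-distance similarity test and the `max_error` threshold are all assumptions made for illustration.

```python
# Tiny 3x5 glyph bitmaps; at this "resolution" 6 and 8 differ by one pixel.
EIGHT = ("###", "#.#", "###", "#.#", "###")
SIX = ("###", "#..", "###", "#.#", "###")
ONE = (".#.", "##.", ".#.", ".#.", "###")


def hamming(a, b):
    """Number of differing pixels between two same-sized bitmaps."""
    return sum(x != y for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


def compress(glyphs, max_error):
    """Build a patch dictionary, reusing any stored patch that is within
    max_error pixels of the incoming glyph."""
    dictionary, indices = [], []
    for glyph in glyphs:
        for idx, patch in enumerate(dictionary):
            if hamming(glyph, patch) <= max_error:
                indices.append(idx)  # "similar enough": reuse the old patch
                break
        else:
            dictionary.append(glyph)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary, indices):
    return [dictionary[i] for i in indices]
```

Compressing the page `[EIGHT, ONE, SIX]` with `max_error=1` and decompressing it again gives `[EIGHT, ONE, EIGHT]`: the output looks perfectly crisp, and the 6 has silently become an 8. With `max_error=0` the page round-trips exactly.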
The 1940s origins of Whataboutery
The exchange is indicative of a rhetorical strategy known as 'whataboutism', which occurs when officials implicated in wrongdoing whip out a counter-example of a similar abuse from the accusing country, with the goal of undermining the legitimacy of the criticism itself. (In Latin, this rhetorical defense is called tu quoque, or "you, too.")
(tags: history language whataboutism whataboutery politics 1940s russia ussr)
-
A highly-available key value store for shared configuration and service discovery. etcd is inspired by zookeeper and doozer, with a focus on: Simple: curl'able user facing API (HTTP+JSON); Secure: optional SSL client cert authentication; Fast: benchmarked 1000s of writes/s per instance; Reliable: Properly distributed using Raft; Etcd is written in go and uses the raft consensus algorithm to manage a highly available replicated log.
One of the core components of CoreOS -- http://coreos.com/ .(tags: configuration distributed raft ha doozer zookeeper go replication consensus-algorithm etcd coreos)
_In Search of an Understandable Consensus Algorithm_, Diego Ongaro and John Ousterhout, Stanford
Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election and log replication, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety. Results from a user study demonstrate that Raft is easier for students to learn than Paxos.
(tags: distributed algorithms paxos raft consensus-algorithms distcomp leader-election replication clustering)
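A small sketch of why the "overlapping majorities" bit matters for membership changes. It is only an illustration of the quorum arithmetic, not Raft itself; the helper names and the example clusters are made up. The joint rule below follows the paper's joint consensus idea of requiring separate majorities from both the old and the new configuration.

```python
from itertools import combinations


def majorities(members):
    """All subsets of members that form a strict majority."""
    need = len(members) // 2 + 1
    ordered = sorted(members)
    return [set(c) for size in range(need, len(ordered) + 1)
            for c in combinations(ordered, size)]


def all_overlap(quorums_a, quorums_b):
    """True if every quorum in one list shares a node with every quorum
    in the other, so no two quorums can decide independently."""
    return all(a & b for a in quorums_a for b in quorums_b)


def joint_quorum(votes, old, new):
    """During a membership change, require a majority of BOTH configs."""
    return len(votes & old) > len(old) // 2 and len(votes & new) > len(new) // 2


old = {"a", "b", "c"}
new = {"c", "d", "e"}
```

Any two majorities of `old` share a node, but a majority of `old` (say `{a, b}`) and a majority of `new` (say `{d, e}`) need not, which is how a naive switch-over could elect two leaders at once. A vote set that passes `joint_quorum` contains a majority of each configuration, so it overlaps every majority of either one.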
Extract from 1973 HM Treasury document concerning post-nuclear-attack responses
'Extract from 1973 HM Treasury document concerning post-nuclear-attack monetary policy' includes this amazing snippet:
[Contingency] ...(d) a total nuclear attack employing high power missiles which would destroy all but a small percentage of the UK population and almost all physical assets or civilised life. [...] As for (d), the money policy would of course be absurdly unrealistic for the few surviving administrators and politicians as they struggled to organise food and shelter for the tiny bands of surviving able-bodied and the probably larger number of sick and dying. Most of the other departments contingency planning might also be irrelevant in such a situation. Within a fairly short time the survivors would evacuate the UK and try to find some sort of life in less-effected countries (southern Ireland?).
Hey, at least they were considering these scenarios. (via Charlie Stross)(tags: nuclear attack contingency government monetary policy uk ireland history 1960s via:cstross insane fallout)
WhatClinic.com’s zombie recruitment video. We want your brains...
BRAAAAAAINS
(tags: whatclinic braaaaaains zombies funny video recruitment)
-
A very tasty-looking guac recipe, from h2g market veteran Lily Ramirez-Foran -- her family's traditional one. I like the addition of pomegranate seeds
(tags: guacamole avocados pomegranate recipes lily-ramirez-foran food h2g)
RA Forum: Button Factory - August 14th Simonetti (Goblin) Horror Project
LIVE - for the first time ever in Ireland, Claudio Simonetti (Goblin) & band will perform the classics of horror movie scores by seminal Italian progressive rock band Goblin, Simonetti himself and possibly one or two curve-balls! Horror rock maestro Claudio Simonetti will fulfill fans’ dreams and nightmares as the band perform the notably eerie soundtracks from Suspiria, Tenebre, Dawn of the Dead, Creepers, Demons and more! This epic show will also feature an intense A/V screening element featuring the electric scenes from some of these revered classics of horror and giallo.
Python Infrastructure Status - SSL Verification Errors on PyPI
There appears to be a problem affecting a number of users where SSL verification errors will be shown saying "pypi.python.org" does not match "addvocate.com". As Best we can tell this appears to be related to the ISP. It seems to be affecting folks using O2 or O2 related companies. We've also reports of it affecting people using Free. Cause appears to be one of the IP addresses returned in the Geo DNS for Europe returning a certificate for addvocate.com. It's not clear at this time *why* that IP address is returning a certificate for addvocate.com.
Turned out to be a routing loop in the fast.ly London POP (via Mick Twomey)(tags: via:micktwomey o2 censorship filtering internet ssl tls pypi python geodns pki)
"Toxic" behaviour in games is largely from "usually good" people
Only 5% of toxic behavior comes from toxic people; 77% of it comes from people who are usually good. That finding has all sorts of implications for how to stop toxic behavior in an online community. It’s not enough to just ban the jerks; good people have bad days too. Instead you have to teach the whole community what the community standards are. And quickly identify people who are having a bad day, intervene before their toxicity infects too many other people.
Great post by Nelson.(tags: gaming toxic bad-behaviour trolls abuse online games league-of-legends)
-
OpenDNS's simple DNS-based blocking of dodgy content. Will need to set this up on the home router now that the kids are surfing...
(tags: opendns dns blocking filtering home porn familyshield)
Mail from the (Velvet) Cybercrime Underground
Brian Krebs manages to thwart an attempted framing for possession of Silk Road heroin. bloody hell
(tags: silk-road drugs bitcoin ecommerce brian-krebs crime framed cybercrime russia scary law-enforcement)
Clare dolphin attacks fourth swimmer in a month as Dusty protects her patch
Dusty the Dolphin has gone bad!
Locals say the three-metre long mammal has been responsible for injuring a number of people over the past two years, with several of those being hospitalised with significant injuries. She struck a 40-year-old woman in the abdomen earlier this month. In response, lifeguards now fly the red danger flag any time the dolphin enters the area. The Irish Whale and Dolphin Group has also erected warning posters at Doolin pier. IWDG coordinator Dr Simon Berrow said: “It is our policy to discourage people swimming with whales and dolphins in Ireland. “We’ve drafted a poster recommending people do not swim with Dusty, but if they must, then they should respect her as a wild dolphin and not grab, lunge or chase after her. If she shows aggressive behaviour or is boisterous they should leave the water.”
(tags: dusty dolphins wildlife nature fanore county-clare ireland swimming doolin animals)
Why YouTube buffers: The secret deals that make -- and break -- online video
Should ISPs be required to ensure they have sufficient upstream bandwidth to video sites like YouTube and Netflix?
"Verizon has chosen to sell its customers a product [Netflix] that they hope those customers don't actually use," Schaeffer said. "And when customers use it and request movies, they have not ensured there is adequate connectivity to get that video content back to their customers."
(tags: netflix youtube streaming video isps net-neutrality peering comcast bandwidth upstream)
ISPAI Responds to Porn Filtering Debacle
Quite a strong statement:
The issue of access to age-inappropriate content is not a new matter and it is important not to have “knee-jerk” reactions which don’t solve the perceived problem and have major implications for the public’s right to access information in general. Notably the European Commission, as stated by vice-president Nellie Kroes [jm: sic], has come out strongly against blocking of the Internet, seeing it as an important platform for freedom of speech and she intends to “guarantee access without restriction.” We in Ireland would do well to consider carefully the impact that any rash adoption or attempted copying of UK measures might have here in the light of current and future EU legislation and policy.
(tags: ispai filtering overblocking david-cameron porn internet ireland politics blocking web uk)
-
Excellent weather site, displaying beautifully interpolated rainfall visualization, from the team behind the Dark Sky app
(tags: weather ireland dark-sky apps iphone ipad forecast rain dataviz mapping via:marcomorain)
Applied Cryptography, Cryptography Engineering, and how they need to be updated
Whoa, I had no idea my knowledge of crypto was so out of date! For example:
ECC is going to replace RSA within the next 10 years. New systems probably shouldn’t use RSA at all.
This blogpost is full of similar useful guidelines and rules of thumb. Here's hoping I don't need to work on a low-level cryptosystem any time soon, as the risk of screwing it up is always high, but if I do this is a good reference for how it needs to be done nowadays.(tags: thomas-ptacek crypto cryptography coding design security aes cbc ctr ecb hmac side-channels rsa ecc)
When 'Smart Homes' Get Hacked: I Haunted A Complete Stranger's House Via The Internet - Forbes
Hardware designers do their usual trick -- omit the whole security part:
[Trustwave's Crowley] found security flaws that would allow a digital intruder to take control of a number of sensitive devices beyond the Insteon systems, from the Belkin WeMo Switch to the Satis Smart Toilet. Yes, they found that a toilet was hackable. You only have to have the Android app for the $5,000 toilet on your phone and be close enough to the toilet to communicate with it. “It connects through Bluetooth, with no username or password using the pin ‘0000’,” said Crowley. “So anyone who has the application on their phone and was connected to the network could control anyone else’s toilet. You could turn the bidet on while someone’s in there.”
(tags: home automation insteon security hardware fail attacks bluetooth han trustwave belkin satis)
-
Missed bookmarking this news --
After years of debate and controversy the French Government has finally backtracked on the law which allowed errant subscribers to be disconnected from the Internet. This morning a decree was published which removed the possibility for file-sharers to have their connections cut for copyright infringement. Instead, those caught by rightsholders will now be subjected to a system of automated fines.
(tags: france legal ip piracy filesharing three-strikes)
BBC News - Chinese firm Huawei controls net filter praised by PM
TalkTalk's porn-filtering system, praised by David Cameron in the UK as a model for porn filtering for the country's ISPs, is operated by Huawei. Of course, there are no possible problems with allowing Huawei, with its alleged close ties to the Chinese government, to operate a state-wide internet censorship system in the UK without any functioning oversight, right? ;) Also worth noting: all TalkTalk traffic passes through the Huawei filtering infrastructure, even when the customer has "opted out".
(tags: huawei talk-talk oversight overblocking politics china uk david-cameron filtering censorship)
Branded to death | Features | Times Higher Education
The most abominable monster now threatening the intellectual health and the integrity of pure enquiry as well as conscientious teaching is the language of advertising, or better, the machinery of propaganda. Any number of critics from within university walls have warned the people at large and academics in particular of the way the helots of advertising and the state police of propaganda bloat and distort the language of thoughtful description, peddle with a confident air generalisations without substance, and serenely circulate orotund lies while ignoring their juniors’ rebuttals and abuse.
Relevant to this argument -- http://arstechnica.com/tech-policy/2013/07/the-webs-longest-nightmare-ends-eolas-patents-are-dead-on-appeal/ notes that 'the role of the University of California [was] one of the most perplexing twists in the Eolas saga. The university kept a low profile during the lead-up to trial; but once in Texas, Eolas' lawyers constantly reminded the jury they were asserting "these University of California patents." A lawyer from UC's patent-licensing division described support for Eolas at trial by simply saying that the university "stands by its licensees."'(tags: branding advertising newspeak universities third-level eolas higher-education education research university-of-california ucb patents ip swpats)
Twilio Billing Incident Post-Mortem
At 1:35 AM PDT on July 18, a loss of network connectivity caused all billing redis-slaves to simultaneously disconnect from the master. This caused all redis-slaves to reconnect and request full synchronization with the master at the same time. Receiving full sync requests from each redis-slave caused the master to suffer extreme load, resulting in performance degradation of the master and timeouts from redis-slaves to redis-master. By 2:39 AM PDT the host’s load became so extreme, services relying on redis-master began to fail. At 2:42 AM PDT, our monitoring system alerted our on-call engineering team of a failure in the Redis cluster. Observing extreme load on the host, the redis process on redis-master was misdiagnosed as requiring a restart to recover. This caused redis-master to read an incorrect configuration file, which in turn caused Redis to attempt to recover from a non-existent AOF file, instead of the binary snapshot. As a result of that failed recovery, redis-master dropped all balance data. In addition to forcing recovery from a non-existent AOF, an incorrect configuration also caused redis-master to boot as a slave of itself, putting it in read-only mode and preventing the billing system from updating account balances.
See also http://antirez.com/news/60 for antirez' response. Here are the takeaways I'm getting from it:
1. Network partitions happen in production, and cause cascading failures. This is a great demo of that.
2. Don't store critical data in Redis. This wasn't the case for Twilio -- as far as I can tell they were using Redis as a front-line cache for billing data -- but it's worth saying anyway. ;)
3. Twilio were just using Redis as a cache, but a bug in their code meant that the writes to the backing SQL store were not being *read*, resulting in repeated billing and customer impact. In other words, it turned a (fragile) cache into the authoritative store.
4. They should probably have designed their code so that write failures would not result in repeated billing for customers -- that's a bad failure path.
Good post-mortem anyway, and I'd say their customers are a good deal happier to see this published, even if it contains details of the mistakes they made along the way.
(tags: redis caching storage networking network-partitions twilio postmortems ops billing replication)
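On the last takeaway, one common way to make a billing path safe to retry is to key every charge on a unique id in the authoritative store, so repeating the operation after a cache failure cannot charge twice. The sketch below is purely hypothetical and has nothing to do with Twilio's actual code: `Ledger` is a dict standing in for the SQL store (one account only, to keep it short), and `FlakyCache` stands in for a Redis that is refusing writes.

```python
class Ledger:
    """Stand-in for the authoritative store; tracks a single account."""

    def __init__(self):
        self.charges = {}  # charge_id -> amount

    def charge(self, charge_id, amount):
        """Record a charge once; repeats with the same id are no-ops."""
        if charge_id in self.charges:
            return False
        self.charges[charge_id] = amount
        return True


class FlakyCache:
    """Cache whose first `failures` writes raise, like a read-only master."""

    def __init__(self, failures):
        self.failures = failures
        self.balances = {}

    def set_balance(self, account, value):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("cache is read-only")
        self.balances[account] = value


def bill(ledger, cache, account, charge_id, amount, starting_balance, retries=3):
    """Charge once, then refresh the cached balance, retrying on cache errors."""
    for _ in range(retries):
        ledger.charge(charge_id, amount)  # idempotent, so safe to repeat
        # The balance is always derived from the ledger, never from the cache.
        balance = starting_balance - sum(ledger.charges.values())
        try:
            cache.set_balance(account, balance)
            return balance
        except ConnectionError:
            continue
    raise RuntimeError("cache update failed; ledger is still correct")
```

Even if the cache write fails on every attempt, the customer is charged exactly once, and the worst case is a stale cached balance.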
Tuning and benchmarking Java 7's Garbage Collectors: Default, CMS and G1
Rudiger Moller runs through a typical GC-tuning session, in exhaustive detail
-
[JVM] GC is a difficult, specialised area that can be very frustrating for busy developers or devops folks to deal with. The JVM has a number of Garbage Collectors and a bewildering array of switches that can alter the behaviour of each collector. Censum does all of the parsing, number crunching and statistical analysis for you, so you don't have to go and get that PhD in Computer Science in order to solve your GC performance problem. Censum gives you straight answers as opposed to a ton of raw data, can eat any GC log you care to throw at it, and is easy to install and use.
Commercial software, UKP 495 per license.
The Web’s longest nightmare ends: Eolas patents are dead on appeal | Ars Technica
Ding dong, the troll is dead! Ars Technica with a great description of the Eolas web patent fiasco, and the UC system's sorry role. I blame Bayh-Dole for creating this insane mindset where places of learning are forced to "monetize" their research.
Under Doyle's conception of his own invention, practically any modern website owed him royalties. Playing a video online or rotating an image on a shopping website were "interactive" features that infringed his patents. And unlike many "patent trolls" who simply settle for settlements just under the cost of litigation, Doyle's company had the chops, the lawyers, and the early filing date needed to extract tens of millions of dollars from the accused companies. [...] The role of the University of California is one of the most perplexing twists in the Eolas saga. The university kept a low profile during the lead-up to trial; but once in Texas, Eolas lawyers constantly reminded the jury they were asserting "these University of California patents." A lawyer from UC's patent-licensing division described support for Eolas at trial by simply saying that the university "stands by its licensees." (Eolas was technically an exclusive licensee of the UC-owned patent, which also gives it the right to sue.) At the same time, the University of California, and the Berkeley campus in particular, was a key institution in creating early web technology. While UC lawyers cooperated with the plaintiffs, two UC Berkeley-trained computer scientists were key witnesses in the effort to demolish the Eolas patents. Pei-Yuan Wei created the pioneering Viola browser, a key piece of prior art, while he was a student at UC-Berkeley in the early 1990s. Scott Silvey, another UC-Berkeley student at that time, testified about a program he made called VPlot, which allowed users to rotate an image of an airplane using Wei's browser. VPlot and Viola were demonstrated to Sun Microsystems in May 1993, months before Doyle claimed to have conceived of his invention.
(tags: patents swpats eolas web patent-trolls ucb universities research viola plugins berkeley)
Irish Comms Minister Pat Rabbitte ignores calls for State role in blocking online porn
Good call.
Mr Rabbitte says that legal concerns attached to mandatory filters, as well as a fear of imposing censorship, have persuaded him against trying to force ISPs to impose mandatory pornography-blocking internet filters. "I remain to be convinced that blanket censorship or a default-on blocker is the correct or workable response," he said. "Even if it were possible to ensure that such measures were not easily circumvented or didn't inadvertently block perfectly acceptable content, the principled question of whether the State should be encouraging service providers to filter or block content to all users, regardless of whether there are children resident, would still arise."
(tags: pat-rabbitte internet filtering censorship blocking porn overblocking default-on isps ireland)
-
Hosted IRC, 20 users for $50/month. Useful now that Google have fecked up Chat entirely
(tags: irc chat collaboration groupware hosted-services)
UK Internet censorship plan no less stupid than it was last year - Boing Boing
Cory Doctorow's long list of articles describing how the UK's censorware-for-all plan is going to fail. I like this bit:
When we argued our case to the vendor's representative, he was categorical: any nudity, anywhere on [Boing Boing], makes it into a "nudity site" for the purposes of blocking. The vendor went so far as to state that a single image of Michelangelo's David, on one page among hundreds of thousands on a site, would be sufficient grounds for a nudity classification. I suspect that none of the censorship advocates in the Lords understand that the offshore commercial operators they're proposing to put in charge of the nation's information access apply this kind of homeopathic standard to objectionable material.
I guess this means the Daily Mail will be similarly classified as containing "nudity" and blocked, given their smut column on every page?
(tags: daily-mail fail censorship censorware boing-boing michelangelo sculpture nudity uk politics filtering overblocking web internet)
-
Photoshop's "Content Aware Fill" applied to text. some very cool results
(tags: images cool art typography algorithms via:pentadact photoshop)
A Tour Inside CloudFlare's Latest Generation Servers
great transparency from CloudFlare! Looking at their current 4th-gen rackmount server buildout -- now with HP after Dell and ZT. Shitloads of SSDs for lower power and greater predictability in failure rates. 128GB RAM. consistent hashing to address stores instead of RAID. Sandy Bridge chipset. Solarflare SFC9020 10Gbps network cards. This is really impressive openness for a high-scale custom datacenter server platform...
(tags: datacenter cloudflare hardware rackmount ssds intel)
3D-Printer Manufacturer Creates Software Filter To Prevent Firearm Printing
'[Create It REAL], which sells 3D printer component parts and software, recently announced that it has come up with a firearm component detection algorithm that will give 3D printers the option to block any gun parts. The software compares each component a user is trying to print with a database of potential firearms parts, and shuts down the modeling software if it senses the user is trying to make a gun.'
(tags: blocklists filtering guns weapons 3d-printing future firearms)
Fund it :: Upstart Granby Park
help fund Granby Park, a pop-up park to replace a vacant site on the corner of Dominick St and Parnell St in Dublin 1: http://upstart.ie/
(tags: fund-it granby-park dublin d1 parks pop-up city funding grassroots)
-
the details of Karsten Nohl's attack against SIM cards, allowing remote-root malware via SMS.
Cracking SIM update keys: [Over The Air] commands, such as software updates, are cryptographically-secured SMS messages, which are delivered directly to the SIM. While the option exists to use state-of-the-art AES or the somewhat outdated 3DES algorithm for OTA, many (if not most) SIM cards still rely on the 70s-era DES cipher. [...] To derive a DES OTA key, an attacker starts by sending a binary SMS to a target device. The SIM does not execute the improperly signed OTA command, but does in many cases respond to the attacker with an error code carrying a cryptographic signature, once again sent over binary SMS. A rainbow table resolves this plaintext-signature tuple to a 56-bit DES key within two minutes on a standard computer.
2 minutes. Sic transit gloria DES. The next step after that is to send a signed request to run a Java applet, then exploit a hole in the JVM sandbox, and the SIM card is rooted. Looking forward to the full paper on July 31st...
(tags: des 3des crypto security sms sim-cards smartcards java applets ota rainbow-tables cracking karsten-nohl)
-
Cool. A machine-learning-generated TCP congestion control algorithm which handily beats sfqCoDel, Vegas, Reno et al. But:
"Although the [computer-generated congestion control algorithms] appear to work well on networks whose parameters fall within or near the limits of what they were prepared for -- even beating in-network schemes at their own game and even when the design range spans an order of magnitude variation in network parameters -- we do not yet understand clearly why they work, other than the observation that they seem to optimize their intended objective well. We have attempted to make algorithms ourselves that surpass the generated RemyCCs, without success. That suggests to us that Remy may have accomplished something substantive. But digging through the dozens of rules in a RemyCC and figuring out their purpose and function is a challenging job in reverse-engineering. RemyCCs designed for broader classes of networks will likely be even more complex, compounding the problem." So are network engineers willing to trust an algorithm that seems to work but has no explanation as to why it works other than optimizing a specific objective function? As AI becomes increasingly successful the question could also be asked in a wider context.
(via Bill de hOra)
(tags: via-dehora machine-learning tcp networking hmm mit algorithms remycc congestion)
Street Cuffs: L.A. Sees Big Jump In Bike Thefts
Some [LA] bike messengers last month took justice into their own hands when they caught two suspected thieves, teenage boys who attended a local Catholic high school. According to police, the messengers stripped down the teens to their boxer shorts before taking their cellphones, backpacks and clothes. “They meted out street justice. We don’t condone street justice. They never threatened them. But they made it clear: don’t mess with another person’s property,” Los Angeles Police Lt. Paul Vernon said. “This incident and the arrests are the tip of the iceberg when comes to people stealing bicycles.” Vernon said the two boys told police they were robbed by about 20 men on bicycles at 6th Street and Grand Avenue about 3 p.m. on Jan. 12. Investigators said they cannot prove the boys were stealing bikes and continue to look for the assailants.
(tags: cycling theft robbery bike-theft la crime vigilantes cycle-couriers)
ICO’s Tame Investigation Of Google Street View Data Slurping
“People will yet again be asking whether Google has been let off without the kind of full and rigorous investigation that you would expect after this kind of incident,” Nick Pickles, director of the Big Brother Watch, told TechWeekEurope. “Let’s not forget that information was collected without permission from thousands of people’s Wi-Fi networks, in a way that if an individual had done so they would have almost certainly have been prosecuted. It seems strange that ICO [the UK's Data Protection regulatory agency] did not want to inspect the [datacenter] cages housing the data, while it is also troubling that Google’s assurances were taken at face value, despite this not being the first incident where consumers have seen their privacy violated by the company.”
(tags: privacy google ico regulation data-protection snooping wifi sniffing network-traffic street-view)
-
'My researches on the pickling matter had lead me to conclude that Mexico was, in fact, one of the few places where pickled potatoes were “a thing” and, in discussing same with Lily last month at her Mexican food stall in the Honest To Goodness market, I discovered that her soon-to-be-visiting Mexican mama was, in fact, a maker of such pickles. Not long afterward, I watched as Lily sat down with her mother, querying the ways of her pickled potatoes, translating and scribbling instructions for me as the details were recalled, not in an orderly series of steps, but in a series of asides and by-the-ways, by one for whom the practice of pickling potatoes was entirely second nature.'
Porn to be Blocked in the UK – “What’s new?” Say Pirate Bay Users | TorrentFreak
It seems likely that the ISPs will implement a system similar to the one currently being used by TalkTalk, as the prime minister will specifically single the ISP out for praise in his speech. TalkTalk’s HomeSafe is a system which filters out URLs based on a remote blocklist provided and maintained by…. well, no one quite knows. This is worrying since when things don’t go quite to plan there’s no one to complain to. As previously reported, when TalkTalk customers are asked whether they want to block file-sharing sites, TorrentFreak.com is rendered inaccessible. Despite our pleas and complaints that we are a news resource, the company said it would not remove us from their blocklist. We doubt we’re the only ones being silenced.
(tags: talktalk blocking uk isps torrentfreak politics filtering david-cameron porn overblocking)
-
Good description of how Fog Creek built out their Trello product; client-side JS rendering, model synced across the wire, HAProxy, Redis, and WebSockets. Bookmarked notably for this paragraph, which doesn't ameliorate my fear of WebSockets as a tech:
The Socket.io server currently has some problems with scaling up to more than 10K [jm: oh dear] simultaneous client connections when using multiple processes and the Redis store, and the client has some issues that can cause it to open multiple connections to the same server, or not know that its connection has been severed.
(tags: websockets javascript architecture fog-creek trello ajax push)
Log4j 2: Performance close to insane
Nice writeup on Log4j 2's new AsyncAppender implementation, based on the LMAX Disruptor. sounds pretty excellent:
“One nice little detail I should mention is that both Async Loggers and Async Appenders fix something that has always bothered me in Log4j-1.x, which is that they will flush the buffer after logging the last event in the queue. With Log4j-1.x, if you used buffered I/O, you often could not see the last few log events, as they were still stuck in the memory buffer. Your only option was setting immediateFlush to true, which forces disk I/O on every single log event and has a performance impact. With Async Loggers and Appenders in Log4j-2.0 your log statements are all flushed to disk, so they are always visible, but this happens in a very efficient manner.”
(tags: logging java performance async disruptor low-latency)
-
an ultra low latency, high throughput, persisted, messaging and event driven in memory database. The typical latency is as low as 80 nano-seconds and supports throughputs of 5-20 million messages/record updates per second. This library also supports distributed, durable, observable collections (Map, List, Set) The performance depends on the data structures used, but simple data structures can achieve throughputs of 5 million elements or key/value pairs in batches (eg addAll or putAll) and 500K elements or key/values per second when added/updated/removed individually. It uses almost no heap, trivial GC impact, can be much larger than your physical memory size (only limited by the size of your disk) and can be shared between processes with better than 1/10th latency of using Sockets over loopback. It can change the way you design your system because it allows you to have independent processes which can be running or not at the same time (as no messages are lost) This is useful for restarting services and testing your services from canned data. e.g. like sub-microsecond durable messaging. You can attach any number of readers, including tools to see the exact state of the data externally.
(tags: library messaging performance java chronicle disk mmap)
-
a completely new patent pending product designed in Ireland that is going to change the way people use their cars for carrying goods. It is a solid plastic product that grips the carpet in your car and acts as a barrier to hold loose items securely against the side wall in your car trunk or boot.
Found out about this online -- a US-based acquaintance raving about them being worth the shipping from Ireland. nice work!
-
'the Linux container engine'. I totally misunderstood what Docker was -- this is cool.
Heterogeneous payloads: Any combination of binaries, libraries, configuration files, scripts, virtualenvs, jars, gems, tarballs, you name it. No more juggling between domain-specific tools. Docker can deploy and run them all. Any server: Docker can run on any x64 machine with a modern linux kernel - whether it's a laptop, a bare metal server or a VM. This makes it perfect for multi-cloud deployments. Isolation: Docker isolates processes from each other and from the underlying host, using lightweight containers. Repeatability: Because each container is isolated in its own filesystem, they behave the same regardless of where, when, and alongside what they run.
(tags: lxc containers virtualization cloud ops linux docker deployment)
Next Generation Continuous Integration & Deployment with dotCloud’s Docker and Strider
Since Docker treats its images as a tree of derivations from a source image, you have the ability to store an image at each stage of a build. This means we can provide full binary images of the environment in which the tests failed. This allows you to run locally bit-for-bit the same container as the CI server ran. Due to the magic of Docker and AUFS Copy-On-Write filesystems, we can store this cheaply. Often tests pass when built in a CI environment, but when built in another (e.g. production) environment break due to subtle differences. Docker makes it trivial to take exactly the binary environment in which the tests pass, and ship that to production to run it.
(tags: docker strider continuous-integration continuous-deployment deployment devops ops dotcloud lxc virtualisation copy-on-write images)
Pinterest's follower graph store, built on Redis
This is a good, high-availability Redis configuration; sharded by userid across 8192 shards, with a Redis master/slave pair of instances for each set of N shards. I like their use of two redundancy systems -- hot slave and backup snapshots:
We run our cluster in a Redis master-slave configuration, and the slaves act as hot backups. Upon a master failure, we failover the slave as the new master and either bring up a new slave or reuse the old master as the new slave. We rely on ZooKeeper to make this as quick as possible. Each master Redis instance (and slave instance) is configured to write to AOF on Amazon EBS. This ensures that if the Redis instances terminate unexpectedly then the loss of data is limited to 1 second of updates. The slave Redis instances also perform BGsave hourly which is then loaded to a more permanent store (Amazon S3). This copy is also used by Map Reduce jobs for analytics. As a production system, we need many failure modes to guard ourselves. As mentioned, if the master host is down, we will manually failover to slave. If a single master Redis instance reboots, monit restart restores from AOF, implying a 1 second window of data loss on the shards on that instance. If the slave host goes down, we bring up a replacement. If a single slave Redis instance goes down, we rely on monit to restart using the AOF data. Because we may encounter AOF or BGsave file corruption, we BGSave and copy hourly backups to S3. Note that large file sizes can cause BGsave induced delays but in our cluster this is mitigated by smaller Redis data due to the sharding scheme.
(tags: graph redis architecture ha high-availability design redundancy sharding)
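A rough sketch of the sharding arithmetic described above: userid to one of 8192 logical shards, then shard to the master/slave pair that hosts it. The modulo mapping and the value of N (`SHARDS_PER_INSTANCE`) are my assumptions for illustration; the post only says "sharded by userid" with N shards per instance pair.

```python
NUM_SHARDS = 8192          # from the post
SHARDS_PER_INSTANCE = 64   # hypothetical value for "N"; the post doesn't say


def shard_for(user_id):
    """Map a numeric user id to a logical shard (assumed: simple modulo)."""
    return user_id % NUM_SHARDS


def instance_for(shard):
    """Map a logical shard to the index of the master/slave pair hosting it."""
    return shard // SHARDS_PER_INSTANCE


if __name__ == "__main__":
    uid = 123456789
    shard = shard_for(uid)
    print("user", uid, "-> shard", shard, "-> instance pair", instance_for(shard))
```

Keeping the logical shard count fixed and much larger than the instance count means shards can be moved between instances later without rehashing users.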
-
'A simple time-decaying approximate membership filter' -- like a Bloom filter with time decay. See also http://eng.42go.com/flower-filter-an-update/ for some notes on the non-independence of survival probabilities, and how that imposes negligible differences in practice.
(tags: bloom-filter algorithms coding probabilistic approximate time decay)
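To get a feel for "Bloom filter with time decay", here is a sketch of a much cruder, swapped-in technique: two Bloom filter generations that rotate after a fixed number of inserts, so old items eventually drop out. This is explicitly not the algorithm from the linked posts; the `RotatingBloom` class and all its parameters are hypothetical.

```python
import hashlib


class RotatingBloom:
    """Approximate membership with crude decay: two generations, rotated by insert count."""

    def __init__(self, bits=1 << 16, hashes=3, capacity=1000):
        self.bits = bits          # size of each generation's bit array
        self.hashes = hashes      # number of hash positions per item
        self.capacity = capacity  # inserts per generation before rotating
        self.current = 0          # bit arrays stored as Python ints
        self.previous = 0
        self.count = 0

    def _positions(self, item):
        for i in range(self.hashes):
            digest = hashlib.sha256(("%d:%s" % (i, item)).encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.bits

    def add(self, item):
        if self.count >= self.capacity:
            # Decay step: the oldest generation is forgotten wholesale.
            self.previous, self.current, self.count = self.current, 0, 0
        for pos in self._positions(item):
            self.current |= 1 << pos
        self.count += 1

    def __contains__(self, item):
        positions = list(self._positions(item))
        return any(
            all((generation >> pos) & 1 for pos in positions)
            for generation in (self.current, self.previous)
        )
```

An item survives between one and two generations' worth of inserts, which is a step function rather than the smooth decay the real thing is after.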
-
This is brilliant. 'covert bicycle GPS tracker; Notifies you by SMS if your bicycle moves; Online tracking'. 'Spybike is a covert tracking device that is hidden inside your bicycle steerer tube. The device is disguised to look like a normal head set cap to avoid suspicion. If someone steals your bike, you can use SpyBike to track their movements online and on your mobile.' More details: http://www.integratedtrackers.com/GPSTrack/pdf/Spybike_Instructions_2.pdf
No Time To Spare [infographic]
'On August 2, 2005, a fully-loaded Air France Airbus A340 arriving from Paris crash-landed at Toronto's Pearson International Airport and caught fire. Only 4 of the 8 exits were usable, yet all 309 people on board made it off the aircraft in two minutes, before it was consumed by flames. Here, five of the passengers recount their escape.'
(tags: infographics travel air accidents fire airbus safety escape a340)
Merkel call for data protection rules puts Ireland in spotlight - Technology News
Irish Times on EU unhappiness with Ireland's "light touch" data protection regime:
Hawkes’s appearance last month on RTÉ’s Morning Ireland regarding the US Prism surveillance programme, since posted to YouTube, reheated lingering resentment among many European data authorities. His admission that he “knew in a general way” about such programmes and didn’t “regard this particular revelation as particularly new” was a red rag to his European colleagues who fear Ireland is the transmission point of wholesale EU data to the US.
(tags: eu ireland data-protection privacy billy-hawkes regulation dpc)
Java Garbage Collection Distilled
a great summary of the state of JVM garbage collection from Martin Thompson
(tags: jvm java gc garbage-collection tuning memory performance martin-thompson)
Improved HTTPS Performance with Early SSL Termination
This is a neat hack. Since SSL/TLS connection establishment requires lots of consecutive round trips before the connection is ready, by performing that closer to the user and reusing an existing region-to-region connection behind the scenes, the overall latency is greatly improved. Works for HTTP as well.
(tags: http https ssl architecture aws ec2 performance latency internet round-trip nginx tls)
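Some back-of-the-envelope arithmetic for why this helps. The sketch assumes a classic setup of 1 round trip for the TCP handshake plus 2 for a full TLS handshake, then 1 more for the request itself; the function names and the example RTT figures are made up for illustration.

```python
def direct_ms(origin_rtt_ms, handshake_round_trips=3):
    """Time to first response when the user talks straight to the far-away origin."""
    return (handshake_round_trips + 1) * origin_rtt_ms


def early_termination_ms(edge_rtt_ms, backbone_rtt_ms, handshake_round_trips=3):
    """Time to first response when a nearby edge terminates TCP/TLS.

    The handshakes only cross the short user<->edge hop; the request then rides
    an already-established edge<->origin connection (one backbone round trip).
    """
    return (handshake_round_trips + 1) * edge_rtt_ms + backbone_rtt_ms


if __name__ == "__main__":
    # Hypothetical numbers: 150ms user<->origin, or 20ms to the edge + 140ms edge<->origin.
    print("direct:", direct_ms(150), "ms")
    print("early termination:", early_termination_ms(20, 140), "ms")
```

The saving comes entirely from paying the handshake round trips over the short hop instead of the long one.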
-
Locking down a webapp with current strict HTTPS policies.
It’s impossible to get to 100% security but there are steps you can take to secure your webapp for your users, to help mitigate against different types of attacks both against you, your webapp and your customers themselves. These are all things we’ve implemented with Server Density v2 to help harden the product as much as possible. These tips are in addition to security best practices such as protecting against SQL injection, filtering, session handling, and XSRF protection. Check out the OWASP cheat sheets and top 10 lists to ensure you’re covered for the basics before implementing the suggestions below.
Breakthrough silicon scanning discovers backdoor in military chip [PDF]
Wow, I'd missed this:
This paper is a short summary of the first real world detection of a backdoor in a military grade FPGA. Using an innovative patented technique we were able to detect and analyse in the first documented case of its kind, a backdoor inserted into the Actel/Microsemi ProASIC3 chips for accessing FPGA configuration. The backdoor was found amongst additional JTAG functionality and exists on the silicon itself, it was not present in any firmware loaded onto the chip. Using Pipeline Emission Analysis (PEA), our pioneered technique, we were able to extract the secret key to activate the backdoor, as well as other security keys such as the AES and the Passkey. This way an attacker can extract all the configuration data from the chip, reprogram crypto and access keys, modify low-level silicon features, access unencrypted configuration bitstream or permanently damage the device. Clearly this means the device is wide open to intellectual property (IP) theft, fraud, re-programming as well as reverse engineering of the design which allows the introduction of a new backdoor or Trojan. Most concerning, it is not possible to patch the backdoor in chips already deployed, meaning those using this family of chips have to accept the fact they can be easily compromised or will have to be physically replaced after a redesign of the silicon itself.
(tags: chips hardware backdoors security scanning pea jtag actel microsemi silicon fpga trojans)
-
Privacy advocates have slammed Wyndham council for spying on residents’ mobile phone data and email records almost 50 times in the past three years, “not to hunt down terrorists but to catch litterbugs and owners of unregistered pets”. Figures from the attorney-general’s department reveal Wyndham is the only Victorian council that has been snooping on personal data, seizing residents’ information 31 times during 2010-11 and 2011-12. Council’s acting chief executive Kelly Grigsby told the Weekly there had been another 18 authorisations in the past 12 months to chase people for unauthorised advertising, unregistered pets and illegal littering.
(tags: victoria australia oz privacy snooping data-retention metadata overreach)
Traditional AQM is not enough!
Jim Gettys on modern web design, HTTP, buffering, and FIFO queues in the network.
Web surfing is putting impulses of packets, without congestion avoidance, into FIFO queues where they do severe collateral damage to anything sharing the link (including itself!). So today’s web behavior incurs huge collateral damage on itself, data centers, the edge of the network, and in particular any application that hopes to have real time behavior. How do we solve this problem?
tl;dr: fq_codel. Now I want it!
(tags: buffering networking internet web http protocols tcp bufferbloat jim-gettys codel fq_codel)
We interrupt this program to warn the Emergency Alert System is hackable | Ars Technica
Private SSH key included in a firmware update. Oh dear:
The US Emergency Alert System, which interrupts live TV and radio broadcasts with information about national emergencies in progress, is vulnerable to attacks that allow hackers to remotely disseminate bogus reports and tamper with gear, security researchers warned. The remote takeover vulnerability affects the DASDEC-I and DASDEC-II application servers made by a company called Digital Alert Systems. It stems from a recent firmware update that mistakenly included the private secure shell (SSH) key, according to an advisory published Monday by researchers from security firm IOActive. Administrators use such keys to remotely log in to a server to gain unfettered "root" access. The publication of the key makes it trivial for hackers to gain unauthorized access on Digital Alert System appliances that run default settings on older firmware. "An attacker who gains control of one or more DASDEC systems can disrupt these stations' ability to transmit and could disseminate false emergency information over a large geographic area," the IOActive advisory warned. "In addition, depending on the configuration of this and other devices, these messages could be forwarded and mirrored by other DASDEC systems."
-
Good read.
Twitter is primarily a consumption mechanism, not a production mechanism. 300K QPS are spent reading timelines and only 6000 requests per second are spent on writes.
* their approach of precomputing the timeline for the non-search case is a good example of optimizing for the more frequently-exercised path.
* MySQL and Redis are the underlying stores. Redis is acting as a front-line in-RAM cache. they're pretty happy with it: https://news.ycombinator.com/item?id=6011254
* these further talks go into more detail, apparently (haven't watched them yet): http://www.infoq.com/presentations/Real-Time-Delivery-Twitter http://www.infoq.com/presentations/Twitter-Timeline-Scalability http://www.infoq.com/presentations/Timelines-Twitter
* funny thread of comments on HN, from a big-iron fan: https://news.ycombinator.com/item?id=6008228
(tags: scale architecture scalability twitter high-scalability redis mysql)
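The "precompute the timeline" idea boils down to fan-out on write: do the expensive work once per tweet so the (far more frequent) reads are a cheap lookup. A toy in-memory sketch, assuming nothing about Twitter's real code; the `Timelines` class, its method names and the length cap are all hypothetical.

```python
from collections import defaultdict, deque
from itertools import islice


class Timelines:
    """Toy fan-out-on-write home timelines, held in plain Python dicts."""

    def __init__(self, max_length=800):
        self.max_length = max_length        # hypothetical cap per timeline
        self.followers = defaultdict(set)   # author -> set of followers
        self.home = {}                      # user -> deque of tweet ids, newest first

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def post(self, author, tweet_id):
        # Write path: push the tweet id into every follower's precomputed timeline.
        for user in self.followers[author]:
            timeline = self.home.setdefault(user, deque(maxlen=self.max_length))
            timeline.appendleft(tweet_id)

    def read(self, user, count=20):
        # Read path: no joins, no sorting, just return the head of the list.
        return list(islice(self.home.get(user, ()), count))


if __name__ == "__main__":
    t = Timelines()
    t.follow("alice", "bob")
    t.post("bob", 1)
    t.post("bob", 2)
    print(t.read("alice"))
```

The cost is shifted to writes, which scale with follower count; with reads outnumbering writes as heavily as the quoted figures suggest, that looks like the right side to pay on.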
Lightning Memory-Mapped Database
Sounds like a good potential replacement for Berkeley DB, at least for cases where LevelDB isn't proving practical.
LMDB is a database storage engine similar to LevelDB or BDB which database authors often use as a base for building databases on top of. LMDB was designed as a replacement for BDB within the OpenLDAP project but it has been pretty useful to use with other databases as well. Its API design is highly influenced by BDB so that replacing BDB is straightforward.
Licensed under the OpenLDAP Public License (is that BSDish?)
(tags: openldap lmdb databases bdb berkeley-db storage persistence oss open-source)
ssh - fabric appears to start apache2 but doesn't - Stack Overflow
fabric fail. pty=False fixes the bug
'Copysets: Reducing the Frequency of Data Loss in Cloud Storage' [paper]
An improved replica-selection algorithm for replicated storage systems.
We present Copyset Replication, a novel general purpose replication technique that significantly reduces the frequency of data loss events. We implemented and evaluated Copyset Replication on two open source data center storage systems, HDFS and RAMCloud, and show it incurs a low overhead on all operations. Such systems require that each node’s data be scattered across several nodes for parallel data recovery and access. Copyset Replication presents a near optimal tradeoff between the number of nodes on which the data is scattered and the probability of data loss. For example, in a 5000-node RAMCloud cluster under a power outage, Copyset Replication reduces the probability of data loss from 99.99% to 0.15%. For Facebook’s HDFS cluster, it reduces the probability from 22.8% to 0.78%.
(tags: storage cloud-storage replication data reliability fault-tolerance copysets replicas data-loss)
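The intuition, as far as I can tell from the abstract: data is lost when all replicas of some chunk fail together, so the fewer distinct replica sets ("copysets") in use, the less likely a random simultaneous failure hits one. A deliberately simplified sketch comparing random placement with placement restricted to a small fixed list of copysets. It uses a single shuffle and ignores the scatter-width side of the tradeoff that the paper is actually about; all function names are mine.

```python
import random


def random_replicas(nodes, r, rng):
    """Random replication: any r distinct nodes may hold a chunk's replicas."""
    return frozenset(rng.sample(nodes, r))


def build_copysets(nodes, r, rng):
    """Simplified copyset construction: shuffle once, cut into disjoint groups of r."""
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    return [frozenset(shuffled[i:i + r]) for i in range(0, len(shuffled) - r + 1, r)]


def copyset_replicas(copysets, rng):
    """Restricted placement: a chunk's replicas must be exactly one of the copysets."""
    return rng.choice(copysets)


if __name__ == "__main__":
    rng = random.Random()
    nodes = list(range(30))
    copysets = build_copysets(nodes, 3, rng)
    random_sets = {random_replicas(nodes, 3, rng) for _ in range(1000)}
    restricted_sets = {copyset_replicas(copysets, rng) for _ in range(1000)}
    print("distinct replica sets, random placement:", len(random_sets))
    print("distinct replica sets, copyset placement:", len(restricted_sets))
```

With random placement nearly every chunk creates a new way to lose data; with copysets the number of fatal node combinations stays fixed no matter how many chunks are stored.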
-
'principles, patterns, smells and guidelines for clean code, class and package design, TDD, Acceptance Test Driven Development, and CI'
(tags: clean-code code-smells coding tdd testing continous-integration patterns pdf)
-
'Over time, the probability of someone drawing a cock with your [user-generated content] app approaches one.'
(tags: cocks time-to-penis user-generated-content content ugc via:rob-manuel qwghlm funny applegates-law web b3ta lol)
-
Nice d3.js demo of the fat-tailed distribution:
A fat-tailed distribution looks normal but the parts far away from the average are thicker, meaning a higher chance of huge deviations. [...] Fat tails don't mean more variance; just different variance. For a given variance, a higher chance of extreme deviations implies a lower chance of medium ones.
(tags: dataviz via:hn statistics visualization distributions fat-tailed kurtosis d3.js javascript variance deviation)
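The "same variance, different shape" claim is easy to check numerically. The sketch below compares two-sided tail probabilities for a unit-variance normal against a unit-variance Laplace distribution, which I'm using as a convenient heavier-tailed stand-in (it is not necessarily the distribution in the demo). The function names are mine.

```python
import math


def normal_tail(x):
    """P(|X| > x) for a normal distribution with mean 0 and variance 1."""
    return math.erfc(x / math.sqrt(2))


def laplace_tail(x):
    """P(|X| > x) for a Laplace distribution with mean 0 and variance 1.

    Laplace variance is 2*b**2, so unit variance means scale b = 1/sqrt(2),
    and the two-sided tail is exp(-x / b).
    """
    return math.exp(-x * math.sqrt(2))


if __name__ == "__main__":
    for x in (1, 2, 3, 4):
        print("x=%d  normal=%.6f  laplace=%.6f" % (x, normal_tail(x), laplace_tail(x)))
```

At one standard deviation the heavier-tailed distribution is the less likely of the two to be exceeded; by four standard deviations it is far more likely, which is the trade the quote describes.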
Google Cloud Messaging for Android
GCM is a service that allows you to send data from your server to your users' Android-powered device, and also to receive messages from devices on the same connection. The GCM service handles all aspects of queueing of messages and delivery to the target Android application running on the target device. GCM is completely free no matter how big your messaging needs are, and there are no quotas.
packetdrill - network stack testing tool
[Google's] packetdrill scripting tool enables quick, precise tests for entire TCP/UDP/IPv4/IPv6 network stacks, from the system call layer down to the NIC hardware. packetdrill currently works on Linux, FreeBSD, OpenBSD, and NetBSD. It can test network stack behavior over physical NICs on a LAN, or on a single machine using a tun virtual network device.
(tags: testing networking tun google linux papers tcp ip udp freebsd openbsd netbsd)
the TCP bounded buffer deadlock problem
I've wound up mentioning this twice in the past week, so it's worth digging up and bookmarking!
Under certain circumstances a TCP connection can end up in a "deadlock", where neither the client nor the server is able to write data out or read data in. This is caused by two factors. First, a client or server cannot perform two transactions at once; a read cannot be performed if a write transaction is in progress, and vice versa. Second, the buffers that exist at either end of the TCP connection are of limited size. The deadlock occurs when both the client and server are trying to send an amount of data that is larger than the combined input and output buffer size.
(tags: tcp ip bounded-buffer deadlock bugs buffering connections distributed-systems)
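A minimal sketch of the bounded-buffer half of the problem, assuming a loopback TCP connection. Each side writes without ever reading, and we count how many bytes the kernel accepts before the socket stops being writable. The helper names (`tcp_pair`, `fill`), the chunk size and the 0.2s stall timeout are all hypothetical choices of mine, and the byte counts depend on the OS buffer settings.

```python
import select
import socket

CHUNK = b"x" * 65536
LIMIT = 1 << 30  # safety cap so the loop always terminates


def tcp_pair():
    """Return two connected TCP sockets over loopback."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(listener.getsockname())
    server, _ = listener.accept()
    listener.close()
    return client, server


def fill(sock, timeout=0.2):
    """Write while the peer never reads; return bytes accepted before stalling."""
    sock.setblocking(False)
    sent = 0
    while sent < LIMIT:
        _, writable, _ = select.select([], [sock], [], timeout)
        if not writable:
            break  # local send buffer and remote receive buffer are both full
        try:
            sent += sock.send(CHUNK)
        except BlockingIOError:
            break  # treat an unexpected would-block as a stall too
    return sent


if __name__ == "__main__":
    a, b = tcp_pair()
    try:
        print("a stalled after", fill(a), "bytes")
        print("b stalled after", fill(b), "bytes")
    finally:
        a.close()
        b.close()
```

If both peers used blocking sends for payloads larger than those counts, and each only started reading after its own write finished, neither write would ever complete. That is the deadlock described above.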
An excellent writeup of the TCP bounded-buffer deadlock problem
on pages 146-149 of 'TCP/IP Sockets in C: Practical Guide for Programmers' by Michael J. Donahoo and Kenneth L. Calvert.
(tags: tcp ip bounded-buffer deadlock bugs buffering connections distributed-systems)
How The Copyright Industry Pushed For Internet Surveillance | TorrentFreak
Rick Falkvinge with a good point:
The reason for the copyright industry to push for surveillance is simple: any digital communications channel can be used for private conversation, but it can also be used to share culture and knowledge that is under copyright monopoly. In order to tell which communications is which, you must sort all of it – and to do that, you must look at all of it. In other words, if enforcing the copyright monopoly is your priority, you need to kill privacy, and specifically anonymity and secrecy of correspondence.
This was exactly my biggest worry -- a side-effect of effective copyright filtering is the creation of infrastructure for online oppression by the state.
(tags: copyright privacy state data-protection rick-falkvinge copyfight internet filtering surveillance anonymity)
Aer Lingus set to resume flights to San Francisco from Dublin
Yay!
Google, Apple and Facebook have persuaded Aer Lingus to reopen the San Francisco to Dublin route, according to sources in the US. The technology giants have their European headquarters in Dublin and their American bases in San Francisco. According to insiders, Aer Lingus will make an announcement soon having received assurances that Silicon Valley companies will take up seats.
(tags: flights travel ireland san-francisco sf aer-lingus)
Comics For Children…. a visual list…. | The Forbidden Planet International Blog
some great recommendations here. Hildafolk has been popular with my 5-year-old, must pick up a few more
(tags: comics kids children books reading library toget toread)
_Measuring Mobile Web Performance_ [slides]
Notable slide is #13, displaying a graph of HSDPA packet RTTs measured from a train. Max RTT gets up to 20,266ms. ouch
(tags: rtt packets latency hsdpa mobile internet trains packet-loss)
Latest leak of EU Data Protection Regulation makes fines impossible
Well, isn't this convenient. The leaked proposed regulation document from the Irish EU presidency contains the following changes from current law:
what is new is a set of prescriptive conditions which, if adopted, appears to make a Monetary Penalty Notice (MPN) almost impracticable to serve. This is because the [Data Protection] Commissioner would have to consider a dozen factors (many of which will no doubt give rise to appeal). [...] In addition, the fines in the Regulation require consideration of the actual damage caused; this compares unfavourably with the current MPN where large fines have been contingent on grave security errors on the part of the data controller (i.e. the MPN of the UK DPA does not need damage to data subjects – only the likelihood of substantial distress or damage which should have been preventable/foreseeable).
(tags: data-protection law eu ec ireland privacy fines regulation mpn)
Google Translate of "Lorem ipsum"
The perils of unsupervised machine learning... here's what GTranslate reckons "lorem ipsum" translates to:
We will be sure to post a comment. Add tomato sauce, no tank or a traditional or online. Until outdoor environment, and not just any competition, reduce overall pain. Cisco Security, they set up in the throat develop the market beds of Cura; Employment silently churn-class by our union, very beginner himenaeos. Monday gate information. How long before any meaningful development. Until mandatory functional requirements to developers. But across the country in the spotlight in the notebook. The show was shot. Funny lion always feasible, innovative policies hatred assured. Information that is no corporate Japan
(tags: lorem-ipsum boilerplate machine-learning translation google translate probabilistic tomato-sauce cisco funny)
-
Write heavy, high performance applications should probably use RAID 0 or avoid RAID altogether and consider using a larger n_val and cluster size. Read heavy applications have more options, and generally demand more fault tolerance with the added benefit of easier hardware replacement procedures.
Good to see official guidance on this (via Bill de hOra)
(tags: via:dehora riak cluster fault-tolerance raid ops)
-
Facebook’s new erasure coding algorithm (via High Scalability).
Disk I/O and network traffic were reduced by half compared to RS codes. The LRC required 14% more storage than RS (ie. 60% of data size). Repair times were much lower thanks to the local repair codes. Much greater reliability thanks to fast repairs. Reduced network traffic makes them suitable for geographic distribution.
(tags: erasure-coding facebook redundancy repair algorithms papers via:highscalability data storage fault-tolerance)
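The storage figures above can be sanity-checked with a little arithmetic. This sketch assumes the block counts are RS with 10 data + 4 parity blocks and LRC with 10 data + 6 parity blocks; those parameters are my recollection of the paper's configuration and are not stated in the summary quoted here.

```python
def storage_overhead(data_blocks, parity_blocks):
    """Extra storage as a fraction of the original data size."""
    return parity_blocks / data_blocks


def relative_increase(data_blocks, parity_a, parity_b):
    """How much more total storage scheme B needs compared to scheme A."""
    return (data_blocks + parity_b) / (data_blocks + parity_a) - 1


# Assumed parameters, not taken from the quoted summary.
DATA, RS_PARITY, LRC_PARITY = 10, 4, 6

if __name__ == "__main__":
    print(f"RS overhead:  {storage_overhead(DATA, RS_PARITY):.0%}")
    print(f"LRC overhead: {storage_overhead(DATA, LRC_PARITY):.0%}")
    print(f"LRC vs RS:    {relative_increase(DATA, RS_PARITY, LRC_PARITY):.0%} more storage")
```

Under those assumptions the LRC overhead is 60% of the data size, and the total stored is about 14% more than with RS, matching the quoted numbers.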
Boundary's Early Warnings alarm
Anomaly detection on network throughput metrics, alarming if throughputs on selected flows deviate by 1, 2, or 3 standard deviations from a historical baseline.
(tags: network-monitoring throughput boundary service-metrics alarming ops statistics)
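A rough sketch of the 1/2/3 standard deviation idea, assuming the historical baseline is just a list of past throughput samples. The function name `severity` and the sample numbers are made up for illustration; this is not Boundary's implementation.

```python
from statistics import mean, stdev


def severity(baseline, value):
    """Whole standard deviations (capped at 3) between value and the baseline mean."""
    mu = mean(baseline)
    sigma = stdev(baseline)
    if sigma == 0:
        # Flat baseline: any change at all is maximally surprising.
        return 0 if value == mu else 3
    return min(3, int(abs(value - mu) / sigma))


if __name__ == "__main__":
    # Hypothetical throughput samples with mean 100 and sample stdev 2.
    history = [100, 102, 98, 101, 99, 100, 103, 97]
    for sample in (101, 103, 105, 110):
        print(sample, "-> alarm level", severity(history, sample))
```

A level of 0 means no alarm. Levels 1 to 3 correspond to the three thresholds in the description.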
My email to Irish Times Editor, sent 25th June
Daragh O'Brien noting 3 stories on 3 consecutive days voicing dangerously skewed misinformation about data protection and privacy law in Ireland:
There is a worrying pattern in these stories. The first two decry the Data Protection legislation (current and future) as being dangerous to children and damaging to the genealogy trade. The third sets up an industry “self-regulation” straw man and heralds it as progress (when it is decidedly not, serving only to further confuse consumers about their rights). If I was a cynical person I would find it hard not to draw the conclusion that the Irish Times, the “paper of record” has been stooged by organisations who are resistant to the defence of and validation of fundamental rights to privacy as enshrined in the Data Protection Acts and EU Treaties, and in the embryonic Data Protection Regulation. That these stories emerge hot on the heels of the pendulum swing towards privacy concerns that the NSA/Prism revelations have triggered is, I must assume, a co-incidence. It cannot be the case that the Irish Times blindly publishes press releases without conducting cursory fact checking on the stories contained therein? Three stories over three days is insufficient data to plot a definitive trend, but the emphasis is disconcerting. Is it the Irish Times’ editorial position that Data Protection legislation and the protection of fundamental rights is a bad thing and that industry self-regulation that operates in ignorance of legislation is the appropriate model for the future? It surely cannot be that press releases are regurgitated as balanced fact and news by the Irish Times without fact checking and verification? If I was to predict a “Data Protection killed my Puppy” type headline for tomorrow’s edition or another later this week would I be proved correct?
(tags: daragh-obrien irish-times iab bias advertising newspapers press-releases journalism data-protection privacy ireland)
_Bolt-On Causal Consistency_ [slides]
SIGMOD 2013 presentation from Peter Bailis, Ali Ghodsi, Joseph M. Hellerstein, Ion Stoica -- adding consistency to an eventually-consistent store by tracking dependencies
(tags: eventual-consistency state cap-theorem storage peter-bailis)
-
Over the last couple of years, we have built and deployed a reliable publish-subscribe system called Wormhole. Wormhole has become a critical part of Facebook's software infrastructure. At a high level, Wormhole propagates changes issued in one system to all systems that need to reflect those changes – within and across data centers.
Facebook's Kafka-alike, basically, although with some additional low-latency guarantees. FB appear to be using it for multi-region and multi-AZ replication. Proprietary.
(tags: pub-sub scalability facebook realtime low-latency multi-region replication multi-az wormhole)
-
Turns out gnuplot has a pretty readable ASCII terminal rendering mode; combined with 'watch' it makes for a nifty graphing one-liner
(tags: gnuplot plotting charts graphs cli command-line unix gnu hacks dataviz visualization ascii)
(oh look, a proper blog post!)
JMX is the de-facto standard in the Java and JVM-based world for exposing service metrics, and feeds nicely to tools like Graphite using JMXTrans and others. However, it's pretty obtuse and over-complex, and it can be hard to figure out what path the JMX metrics will show up under once deployed.
Unfortunately, once a JVM-based service is deployed to EC2, it becomes very difficult to use jconsole to connect to it, due to deficiencies and crappy design in the JMX RMI protocol (I love the way they reinvented the broken parts of IIOP in that respect). Don't even bother; instead, use jmxsh: https://code.google.com/p/jmxsh/ .
To use this, you need to modify the service process' command line to include the following JVM args, so that the remote JMX API is exposed:
-Dcom.sun.management.jmxremote -Dcom.sun.management.jmxremote.port=16660 -Dcom.sun.management.jmxremote.local.only=true -Dcom.sun.management.jmxremote.authenticate=false -Dcom.sun.management.jmxremote.ssl=false
Change the port number if there is already a process running on that port. Ensure the port isn't accessible from off-host; in EC2, this should be safe enough to use as long as that port number is not in the EC2 security group.
Go to https://code.google.com/p/jmxsh/downloads/list and download the latest jmxsh-FOO.jar; e.g. 'wget https://jmxsh.googlecode.com/files/jmxsh-R5.jar'. Then on the host, as the UID the service is running under, run: 'java -jar jmxsh-R5.jar -h 127.0.0.1 -p 16660'. You can then hit "Enter" to go into "Browse Mode", and you'll get text menus like this:
====================================================
Attribute List:
1. -r- long MaxFileDescriptorCount
2. -r- long OpenFileDescriptorCount
3. -r- long CommittedVirtualMemorySize
4. -r- long FreePhysicalMemorySize
5. -r- long FreeSwapSpaceSize
6. -r- long ProcessCpuTime
7. -r- long TotalPhysicalMemorySize
8. -r- long TotalSwapSpaceSize
9. -r- String Name
10. -r- int AvailableProcessors
11. -r- String Arch
12. -r- double SystemLoadAverage
13. -r- String Version
SERVER: service:jmx:rmi:///jndi/rmi://127.0.0.1:16660/jmxrmi
DOMAIN: java.lang
MBEAN: java.lang:type=OperatingSystem
====================================================
Navigate through the MBean tree looking for Attributes which would make good metrics (5 in the list above, for example). Note the MBean and the Attribute names.
Liberty issues claim against British Intelligence Services over PRISM and Tempora privacy scandal
James Welch, Legal Director for Liberty, said: “Those demanding the Snoopers’ Charter seem to have been indulging in out-of-control snooping even without it – exploiting legal loopholes and help from Uncle Sam. “No-one suggests a completely unpoliced internet but those in power cannot swap targeted investigations for endless monitoring of the entire globe.”
Go Liberty! Take note, ICCL, this is how a civil liberties group engages with internet issues.
(tags: prism nsa gchq surveillance liberty civil-liberties internet snooping)
-
A command-line utility in Ruby to perform (a) OLAP cubing and (b) histogramming, given whitespace-delimited line data
(tags: ruby olap number-crunching data histograms cli)
'If I was your cloud provider, I'd never let you down'
This is the thing that's put me off Joyent. They make claims like this one from October 2012:
We’ve given our other partners 99.9999% uptime.
This despite a 10-day outage of their BingoDisk and Strongspace storage services in January 2008, 1734 days previously (http://www.datacenterknowledge.com/archives/2008/01/21/joyent-services-back-after-8-day-outage/). If you assume that is the only outage they've had since then, that works out as 99.4% uptime. Quite a few less nines...
-
Good UI for exploration of HyperLogLog set intersections and unions.
One of the first things that we wanted to do with HyperLogLog when we first started playing with it was to support and expose it natively in the browser. The thought of allowing users to directly interact with these structures -- perform arbitrary unions and intersections on effectively unbounded sets all on the client -- was exhilarating to us. [...] we are pleased to announce the open-source release of AK’s HyperLogLog implementation for JavaScript, js-hll. We are releasing this code under the Apache License, Version 2.0. We knew that we couldn’t just release a bunch of JavaScript code without allowing you to see it in action — that would be a crime. We passed a few ideas around and the one that kept bubbling to the top was a way to kill two birds with one stone. We wanted something that would showcase what you can do with HLL in the browser and give us a tool for explaining HLLs. It is typical for us to explain how HLL intersections work using a Venn diagram. You draw some overlapping circles with a border that represents the error and you talk about how if that border is close to or larger than the intersection then you can’t say much about the size of that intersection. This works just ok on a whiteboard but what you really want is to just build a visualization that allows you to select from some sets and see the overlap. Maybe even play with the precision a little bit to see how that changes the result. Well, we did just that!
(tags: javascript ui hll hyperloglog algorithms sketching js sets intersection union apache open-source)
Sketch of the Day: K-Minimum Values
Another sketching algorithm -- this one supports set union and intersection operations more easily than HyperLogLog when there are more than 2 sets
(tags: algorithms coding space-saving cardinality streams stream-processing estimation sets sketching)
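For a feel of how K-Minimum Values works, here is a minimal sketch. It keeps the k smallest hash values, estimates cardinality as (k - 1) divided by the k-th smallest value, and derives the intersection from a Jaccard estimate over the merged sketch. The class and function names, the choice of SHA-1 and k = 1024 are my own assumptions and not taken from the linked post.

```python
import hashlib
import heapq


def unit_hash(item):
    """Map an item to a pseudo-uniform float in [0, 1) using SHA-1."""
    digest = hashlib.sha1(str(item).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") / 2**64


class KMV:
    """Keeps the k smallest distinct hash values seen so far."""

    def __init__(self, k=1024):
        self.k = k
        self._heap = []       # max-heap of kept hashes, stored negated
        self.members = set()  # the same hashes, for duplicate checks and set ops

    def add(self, item):
        self._add_hash(unit_hash(item))

    def _add_hash(self, h):
        if h in self.members:
            return
        if len(self._heap) < self.k:
            heapq.heappush(self._heap, -h)
            self.members.add(h)
        elif h < -self._heap[0]:
            # New value beats the current k-th smallest: swap it in.
            evicted = -heapq.heappushpop(self._heap, -h)
            self.members.discard(evicted)
            self.members.add(h)

    def estimate(self):
        if len(self._heap) < self.k:
            return float(len(self._heap))  # sketch not full: the count is exact
        return (self.k - 1) / -self._heap[0]

    def union(self, other):
        merged = KMV(min(self.k, other.k))
        for h in self.members | other.members:
            merged._add_hash(h)
        return merged


def intersection_estimate(a, b):
    """Jaccard estimate from the union sketch, scaled by the union's size."""
    u = a.union(b)
    if not u.members:
        return 0.0
    in_both = sum(1 for h in u.members if h in a.members and h in b.members)
    return in_both / len(u.members) * u.estimate()


if __name__ == "__main__":
    a, b = KMV(), KMV()
    for i in range(20000):
        a.add(i)
    for i in range(10000, 30000):
        b.add(i)
    print("union ~", round(a.union(b).estimate()))
    print("intersection ~", round(intersection_estimate(a, b)))
```

The same merge-then-count approach extends to more than two sets, since a union of sketches is itself a sketch.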
Skype's principal architect explains why they no longer have end-to-end crypto
Mobile devices can't handle the CPU and constantly-online requirements, and there's an increased reliance on dedicated routing supernodes to avoid Windows-client monoculture and p2p network fragility (via the IP list, via kragen)
(tags: skype p2p mobile architecture networking internet snooping crypto via:ip via:kragen phones windows)
Accuweather long-range forecast accuracy questionable
"questionable" is putting it mildly:
Now to the point: Are the 25-day forecasts any good? In a word, no. Specifically, after running this data, I would not trust a forecast high temperature more than a week out. I’d rather look at the normal (historical average) temperature for that day than the forecast. Similarly, I would not even look at a precipitation forecast more than 6 days in advance, and I wouldn’t start to trust it for anything important until about 3 days ahead of time.
(tags: accuweather accuracy fail graphs data weather forecasting philadelphia)
Setting up Perfect Forward Secrecy for nginx or stud
Matt Sergeant writes up a pretty solid HOWTO:
There has been a lot of discussion recently about Perfect Forward Secrecy (PFS) and the benefits it can bring you, especially in terms of any kind of traffic sniffing attack. Unfortunately setting this up I found very few guides telling you exactly what you need to do. The downside to PFS [via ECDHE] is that it uses more CPU power than other ciphers. This is a trade-off between security and cost.
(tags: ecdhe elliptic-curve crypto pfs ssl tls howto nginx stud)
Java Concurrent Counters By Numbers
threadsafe counters in the JVM compared. AtomicLong, Doug Lea's LongAdder, a ThreadLocal counter, and a field-on-the-Thread-object counter int (via Darach Ennis). Nitsan's posts on concurrency are fantastic
-
Tic-Tac-Toe Inception. whoa
(tags: games tic-tac-toe inception recursion boardgames via:fp)
-
a high-performance C server which is used to expose HyperLogLog sets and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached. HyperLogLogs are a relatively new sketching data structure. They are used to estimate cardinality, i.e. the unique number of items in a set. They are based on the observation that any bit in a "good" hash function is independent of any other bit and that the probability of getting a string of N bits all set to the same value is 1/(2^N). There is a lot more in the math, but that is the basic intuition. What is even more incredible is that the storage required to do the counting is log(log(N)). So with a 6 bit register, we can count well into the trillions. For more information, it's best to read the papers referenced at the end. TL;DR: HyperLogLogs enable you to have a set with about 1.6% variance, using 3280 bytes, and estimate sizes in the trillions.
(via:cscotta)
(tags: hyper-log-log hlld hll data-structures memcached daemons sketching estimation big-data cardinality algorithms via:cscotta)
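The "about 1.6%" and "3280 bytes" figures are roughly what you get from the usual HyperLogLog error formula, 1.04 / sqrt(m) for m registers. The sketch below assumes 4096 registers (2^12) of 6 bits each, which is my guess at the configuration behind the quoted numbers and not something the README excerpt states.

```python
import math


def hll_standard_error(registers):
    """Typical relative error of a HyperLogLog estimate with this many registers."""
    return 1.04 / math.sqrt(registers)


def register_bytes(registers, bits_per_register=6):
    """Bytes needed for the packed registers alone, ignoring any headers."""
    return registers * bits_per_register // 8


# Assumed configuration: precision 12, i.e. 2**12 registers.
REGISTERS = 2 ** 12

if __name__ == "__main__":
    print(f"standard error: {hll_standard_error(REGISTERS):.1%}")
    print(f"register storage: {register_bytes(REGISTERS)} bytes")
```

The registers alone come to 3072 bytes under that assumption, so the remaining couple of hundred bytes in the quoted total would presumably be per-set overhead.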
-
'The TLS handshake has multiple variations, but let’s pick the most common one – anonymous client and authenticated server (the connections browsers use most of the time).' Works out to 4 packets, in addition to the TCP handshake's 3, and about 6.5k bytes on average.
(tags: network tls ssl performance latency speed networking internet security packets tcp handshake)
McLibel leaflet was co-written by undercover police officer Bob Lambert | UK news | guardian.co.uk
The true identity of one of the authors of the "McLibel leaflet" is Bob Lambert, a police officer who used the alias Bob Robinson in his five years infiltrating the London Greenpeace group. [...] McDonald's famously sued green campaigners over the roughly typed leaflet, in a landmark three-year high court case, that was widely believed to have been a public relations disaster for the corporation. Ultimately the company won a libel battle in which it spent millions on lawyers. Lambert was deployed by the special demonstration squad (SDS) – a top-secret Metropolitan police unit that targeted political activists between 1968 until 2008, when it was disbanded. He co-wrote the defamatory six-page leaflet in 1986 – and his role in its production has been the subject of an internal Scotland Yard investigation for several months. At no stage during the civil legal proceedings brought by McDonald's in the 1990s was it disclosed that a police infiltrator helped author the leaflet.
(tags: infiltration police mcdonalds libel greenpeace bob-lambert undercover 1980s uk-politics)
Project Voldemort: measuring BDB space consumption
HOWTO measure this using the BDB-JE command line tools. this is exposed through JMX as the CleanerBacklog metric, too, I think, but good to bookmark just in case
(tags: voldemort cleaner bdb ops space storage monitoring debug)
rendering pcm with simulated phosphor persistence
This is something readily applicable to display of sampled time-series metric data -- it really makes regular patterns visible (and is nicely retro to boot).
When PCM waveforms and similar function plots are displayed on screen, computational speed is often preferred over beauty and information content. For example, Audacity only draws the local maximum envelope amplitude and (what appears to be) RMS power when zoomed out, and when zoomed in, displays a very straightforward linear interpolation between samples. Analogue oscilloscopes, on the other hand, do things differently. An electron beam scans a phosphor screen at a constant X velocity, lighting a dot everywhere it hits. The dot brightness is proportional to the time the electron beam was directed at it. Because the X speed of the beam is constant and the Y position is modulated by the waveform, brightness gives information about the local derivative of the function. Now how cool is that? It looks like an X-ray of the signal. We can see right away that the beep is roughly a square wave, because there's light on top and bottom of the oscillation envelope but mostly darkness in between. Minute changes in the harmonic content are also visible as interesting banding and ribbons.
(via an _amazing_ kragen post on ghetto electronics)
(tags: via:kragen pcm waveforms oscilloscopes analog analogue dataviz time-series waves ui phosphor retro)
stuff Google has learned from their hiring data
A. On the hiring side, we found that [interview] brainteasers are a complete waste of time. How many golf balls can you fit into an airplane? How many gas stations in Manhattan? A complete waste of time. They don’t predict anything. They serve primarily to make the interviewer feel smart. Instead, what works well are structured behavioral interviews, where you have a consistent rubric for how you assess people, rather than having each interviewer just make stuff up. Behavioral interviewing also works — where you’re not giving someone a hypothetical, but you’re starting with a question like, “Give me an example of a time when you solved an analytically difficult problem.” The interesting thing about the behavioral interview is that when you ask somebody to speak to their own experience, and you drill into that, you get two kinds of information. One is you get to see how they actually interacted in a real-world situation, and the valuable “meta” information you get about the candidate is a sense of what they consider to be difficult.
This makes sense, and matches what I learned in Amazon. Bad news for Microsoft though! (Correction: Adam Shostack got in touch to note that MS haven't done this for 10+ years either.)
Also, I like this:
A. One of the things we’ve seen from all our data crunching is that G.P.A.’s are worthless as a criteria for hiring, and test scores are worthless — no correlation at all except for brand-new college grads, where there’s a slight correlation. Google famously used to ask everyone for a transcript and G.P.A.’s and test scores, but we don’t anymore, unless you’re just a few years out of school. We found that they don’t predict anything. What’s interesting is the proportion of people without any college education at Google has increased over time as well. So we have teams where you have 14 percent of the team made up of people who’ve never gone to college.
(tags: google hiring interviewing interviews brainteasers gpa microsoft star amazon)
Java Garbage Collection Distilled
Martin Thompson lays it out:
Serial, Parallel, Concurrent, CMS, G1, Young Gen, New Gen, Old Gen, Perm Gen, Eden, Tenured, Survivor Spaces, Safepoints, and the hundreds of JVM start-up flags. Does this all baffle you when trying to tune the garbage collector while trying to get the required throughput and latency from your Java application? If it does then don’t worry, you are not alone. Documentation describing garbage collection feels like man pages for an aircraft. Every knob and dial is detailed and explained but nowhere can you find a guide on how to fly. This article will attempt to explain the tradeoffs when choosing and tuning garbage collection algorithms for a particular workload.
(tags: gc java garbage-collection coding cms g1 jvm optimization)
-
Appalled by mass surveillance scandals? So are we. We’re doing something about it – and you can too. In 2006 we started a case challenging Irish and European laws that require your mobile phone company and ISP to monitor your location, your calls, your texts and your emails and to store that information for up to two years. That case has now made it to the European Court of Justice and will be heard on July 9th. If we are successful, it will strike down these laws for all of Europe and will declare illegal this type of mass surveillance of the entire population. Here’s where you come in. You can take part by: making a donation to help us pay for the expenses we incur; following our updates and keeping abreast of the issues; spreading the word on social media. With your help, we can strike a blow for the privacy of all citizens.
(tags: activism privacy politics ireland dri digital-rights data-protection data-retention)
3-D Printer Brings Dexterity To Children With No Fingers
'A South African man who lost part of his hand in a home carpentry accident and an American puppeteer he met via YouTube have teamed up to make 3D-printable hands for children who have no fingers. So far, over 100 children have been given "robohands" for free, and a simplified version released just yesterday snaps together like LEGO bricks and costs just $5 in materials.' This is incredible. Check out the video of Liam and his robohand in action: http://www.youtube.com/watch?v=kB53-D_N8Uc
(tags: 3d-printing 3d makers robohands hands prosthetics future youtube via:gruverja)
Open Rights Group - EU Commission caved to US demands to drop anti-PRISM privacy clause
Reports this week revealed that the US successfully pressed the European Commission to drop sections of the Data Protection Regulation that would, as the Financial Times explains, “have nullified any US request for technology and telecoms companies to hand over data on EU citizens. The article [...] would have prohibited transfers of personal information to a third country under a legal request, for example the one used by the NSA for their PRISM programme, unless “expressly authorized by an international agreement or provided for by mutual legal assistance treaties or approved by a supervisory authority.” The Article was deleted from the draft Regulation proper, which was published shortly afterwards in January 2012. The reports suggest this was due to intense pressure from the US. Commission Vice-President Viviane Reding favoured keeping the clause, but other Commissioners seemingly did not grasp the significance of the article.
(tags: org privacy us surveillance fisaaa viviane-reding prism nsa ec eu data-protection)
Verified by Visa and MasterCard SecureCode kill 10-12% of your business
As Chris Shiflett noted: not only are they bad for security, they're bad for business too.
12 percent of users consider abandoning [an online shopping transaction] when they see either the Verified by Visa or the American Express SafeKey logos, while 10 percent will consider abandoning when they see the MasterCard Secure card logo.
(tags: ecommerce vbv online-shopping mastercard visa securecode security fail)
The Cold Hard Facts of Freezing to Death
an amazing account of near-death from hypothermia (via Dor)
(tags: via:dor hypothermia cold medicine science non-fiction)
Atelier olschinsky - "Cities III 05"
Fine Art Print on Hahnemuehle Photo Rag Bright White, 310g: 40x50cm up to 70x100cm. Some great art based on decayed urban landscape shots, from a Vienna-based design studio. See also http://english.mashkulture.net/2011/10/17/atelier-olschinsky-cities-iii/ , http://www.mascontext.com/tag/atelier-olschinsky/
(tags: olschinsky cities urban decay landscape art prints want)
Possible ban on 'factory food' in French restaurants
I am very much in favour of this in Ireland, too. The pre-prepared food thing makes for crappy food:
In an attempt to crack down on the proliferation of restaurants serving boil-in-a-bag or microwave-ready meals, which could harm France’s reputation for good food, MP Daniel Fasquelle is putting a new law to parliament this month. [...] The proposed law would limit the right to use the word “restaurant” to eateries where food is prepared on site using raw ingredients, either fresh or frozen. Exceptions would be made for some prepared products, such as bread, charcuterie and ice cream.
(tags: restaurants food france cuisine boil-in-the-bag microwave cooking daniel-fasquelle)
-
great, comprehensive review of the language, its pros and misfeatures, from Bill de hOra
Introducing Kale « Code as Craft
Etsy have implemented a tool to perform auto-correlation of service metrics, and detection of deviation from historic norms:
at Etsy, we really love to make graphs. We graph everything! Anywhere we can slap a StatsD call, we do. As a result, we’ve found ourselves with over a quarter million distinct metrics. That’s far too many graphs for a team of 150 engineers to watch all day long! And even if you group metrics into dashboards, that’s still an awful lot of dashboards if you want complete coverage. Of course, if a graph isn’t being watched, it might misbehave and no one would know about it. And even if someone caught it, lots of other graphs might be misbehaving in similar ways, and chances are low that folks would make the connection. We’d like to introduce you to the Kale stack, which is our attempt to fix both of these problems. It consists of two parts: Skyline and Oculus. We first use Skyline to detect anomalous metrics. Then, we search for that metric in Oculus, to see if any other metrics look similar. At that point, we can make an informed diagnosis and hopefully fix the problem.
It'll be interesting to see if they can get this working well. I've found it can be tricky to get working with low false positives, without massive volume to "smooth out" spikes caused by normal activity. Amazon had one particularly successful version driving severity-1 order drop alarms, but it used massive event volumes and still had periodic false positives. Skyline looks like it will alarm on a single anomalous data point, and in the comments Abe notes "our algorithms err on the side of noise and so alerting would be very noisy."
(tags: etsy monitoring service-metrics alarming deviation correlation data search graphs oculus skyline kale false-positives)
Paper: "Root Cause Detection in a Service-Oriented Architecture" [pdf]
LinkedIn have implemented an automated root-cause detection system:
This paper introduces MonitorRank, an algorithm that can reduce the time, domain knowledge, and human effort required to find the root causes of anomalies in such service-oriented architectures. In the event of an anomaly, MonitorRank provides a ranked order list of possible root causes for monitoring teams to investigate. MonitorRank uses the historical and current time-series metrics of each sensor as its input, along with the call graph generated between sensors to build an unsupervised model for ranking. Experiments on real production outage data from LinkedIn, one of the largest online social networks, shows a 26% to 51% improvement in mean average precision in finding root causes compared to baseline and current state-of-the-art methods.
This is a topic close to my heart after working on something similar for 3 years in Amazon! Looks interesting, although (a) I would have liked to see more case studies and examples of "real world" outages it helped with; and (b) it's very much a machine-learning paper rather than a systems one, and there is no discussion of fault tolerance in the design of the detection system, which would leave me worried that in the case of a large-scale outage event, the system itself will disappear when its help is most vital. (This was a major design influence on our team's work.) Overall, particularly given those 2 issues, I suspect it's not in production yet. Ours certainly was ;)
(tags: linkedin soa root-cause alarming correlation service-metrics machine-learning graphs monitoring)
Announcing Zuul: Edge Service in the Cloud
Netflix' library to implement "edge services" -- ie. a front end to their API, web servers, and streaming servers. Some interesting features: dynamic filtering using Groovy scripts; Hystrix for software load balancing, fault tolerance, and error handling for originated HTTP requests; fine-grained service metrics; Archaius for configuration; and canary requests to detect overload risks. Pretty complex though
(tags: edge-services api netflix zuul archaius canary-requests http groovy hystrix load-balancing fault-tolerance error-handling configuration)
CloudFlare, PRISM, and Securing SSL Ciphers
Matthew Prince of CloudFlare has an interesting theory on the NSA's capabilities:
It is not inconceivable that the NSA has data centers full of specialized hardware optimized for SSL key breaking. According to data shared with us from a survey of SSL keys used by various websites, the majority of web companies were using 1024-bit SSL ciphers and RSA-based encryption through 2012. Given enough specialized hardware, it is within the realm of possibility that the NSA could within a reasonable period of time reverse engineer 1024-bit SSL keys for certain web companies. If they'd been recording the traffic to these web companies, they could then use the broken key to go back and decrypt all the transactions. While this seems like a compelling theory, ultimately, we remain skeptical this is how the PRISM program described in the slides actually works. Cracking 1024-bit keys would be a big deal and likely involve some cutting-edge cryptography and computational power, even for the NSA. The largest SSL key that is known to have been broken to date is 768 bits long. While that was 4 years ago, and the NSA undoubtedly has some of the best cryptographers in the world, it's still a considerable distance from 768 bits to 1024 bits -- especially given the slide suggests Microsoft's key would have to had been broken back in 2007. Moreover, the slide showing the dates on which "collection began" for various companies also puts the cost of the program at $20M/year. That may sound like a lot of money, but it is not for an undertaking like this. Just the power necessary to run the server farm needed to break a 1024-bit key would likely cost in excess of $20M/year. While the NSA may have broken 1024-bit SSL keys as part of some other program, if the slide is accurate and complete, we think it's highly unlikely they did so as part of the PRISM program. 
A not particularly glamorous alternative theory is that the NSA didn't break the SSL key but instead just cajoled rogue employees at firms with access to the private keys -- whether the companies themselves, partners they'd shared the keys with, or the certificate authorities who issued the keys in the first place -- to turn them over. That very well may be possible on a budget of $20M/year. [....] Google is a notable anomaly. The company uses a 1024-bit key, but, unlike all the other companies listed above, rather than using a default cipher suite based on the RSA encryption algorithm, they instead prefer the Elliptic Curve Diffie-Hellman Ephemeral (ECDHE) cipher suites. Without going into the technical details, a key difference of ECDHE is that they use a different private key for each user's session. This means that if the NSA, or anyone else, is recording encrypted traffic, they cannot break one private key and read all historical transactions with Google. The NSA would have to break the private key generated for each session, which, in Google's case, is unique to each user and regenerated for each user at least every 28-hours. While ECDHE arguably already puts Google at the head of the pack for web transaction security, to further augment security Google has publicly announced that they will be increasing their key length to 2048-bit by the end of 2013. Assuming the company continues to prefer the ECDHE cipher suites, this will put Google at the cutting edge of web transaction security.
2048-bit keys with ECDHE cipher suites sound like the way to go, and CloudFlare now support that too.
(tags: prism security nsa cloudflare ssl tls ecdhe elliptic-curve crypto rsa key-lengths)
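To make the per-session-key point from the CloudFlare excerpt concrete, here is a minimal Python sketch of an ephemeral key exchange. It swaps in classic finite-field Diffie-Hellman with a toy 127-bit prime in place of the elliptic-curve maths that real ECDHE uses, so it is illustrative only and not secure. The parameters and the function name are assumptions made up for this example.

```python
import secrets

# Toy parameters, assumed for illustration only: a 127-bit Mersenne prime
# and a small base. Real ECDHE works over elliptic-curve groups instead.
P = 2**127 - 1
G = 3

def new_session_secret():
    """Run one ephemeral key exchange and return both sides' shared secret."""
    # Each side picks a fresh private value that is used for this session only.
    client_priv = secrets.randbelow(P - 3) + 2
    server_priv = secrets.randbelow(P - 3) + 2
    # Public values are what an eavesdropper recording traffic would see.
    client_pub = pow(G, client_priv, P)
    server_pub = pow(G, server_priv, P)
    # Each side combines its own private value with the peer's public value.
    client_secret = pow(server_pub, client_priv, P)
    server_secret = pow(client_pub, server_priv, P)
    return client_secret, server_secret

session_1 = new_session_secret()
session_2 = new_session_secret()
```

Both sides of a session arrive at the same secret, while two sessions end up with unrelated secrets. Because the private values are thrown away afterwards, breaking one session's key says nothing about any other recorded session.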
Record companies to target 20 more pirate sites after court ruling - Independent.ie
Looks like IRMA are following the lead of the UK's BPI, by chasing the proxy sites next:
Up to 20 internet sites are to be targeted by an organisation representing record companies in a move to stamp out the illegal pirating of music and other copyright material. The Irish Recorded Music Association (IRMA) said it would be immediately moving against the 20 "worst offenders" to "take out" internet sites involved in the illegal downloading of copyright work.
However, looks like this will involve more court time:
Last night IRMA director general, Dick Doyle said the High Court ruling was only the first step in "taking out many internet sites involved in illegally downloading music". "We will be back in court very shortly to take out five to 10 other sites. We have already selected a total of 20 of the worst offender sites and we will go after the next five in the very near future," he said.
That's not going to be cheap!
(tags: courts ireland law irma piracy pirate-bay bpi proxies filesharing copyright)
Building a Modern Website for Scale (QCon NY 2013) [slides]
Some great scalability ideas from LinkedIn. Particularly interesting are the best practices suggested for scaling web services: 1. store client-call timeouts and SLAs in ZooKeeper for each REST endpoint; 2. isolate backend calls using async/threadpools; 3. cancel work on failures; 4. avoid sending requests to GC'ing hosts; 5. enforce rate limits on the server. #4 is particularly cool. They do this using a "GC scout" request before every "real" request; a cheap TCP request to a dedicated "scout" Netty port, which replies near-instantly. If it comes back with a 1-packet response within 1 ms, send the real request, else fail over immediately to the next host in the failover set. There's still a potential race condition where the "GC scout" succeeds quickly, then a GC starts just before the "real" request is issued. But the incidence of GC-blocking-request is probably massively reduced. It also helps against packet loss on the rack or server host, since a dropped TCP packet is only resent after the TCP retransmit timeout, which will certainly be higher than 1 ms, causing the deadline to be missed. (UDP would probably work just as well, for this reason.) However, in the case of packet loss in the client's network vicinity, it will be vital to still attempt to send the request to the final host in the failover set regardless of a GC-scout failure, otherwise all requests may be skipped. The GC-scout system also helps balance request load off heavily-loaded hosts, or hosts with poor performance for other reasons; they'll fail to achieve their 1 ms deadline and the request will be shunted off elsewhere. For service APIs with real low-latency requirements, this is a great idea.
(tags: gc-scout gc java scaling scalability linkedin qcon async threadpools rest slas timeouts networking distcomp netty tcp udp failover fault-tolerance packet-loss)
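The GC-scout selection logic described above can be sketched in a few lines of Python. This is not LinkedIn's code: the one-byte probe payload, the function names and the fresh connection per probe are all assumptions for illustration. The one behaviour taken straight from the notes above is that the last host in the failover set is used regardless of its scout result.

```python
import socket
import time

SCOUT_DEADLINE_S = 0.001  # the 1 ms deadline mentioned in the notes above

def tcp_scout(host, port, deadline_s=SCOUT_DEADLINE_S):
    """Return True if a (hypothetical) scout port answers within the deadline."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=deadline_s) as sock:
            sock.settimeout(deadline_s)
            sock.sendall(b"?")      # probe payload is an assumption
            reply = sock.recv(1)    # any 1-byte answer counts as "alive"
    except OSError:
        # Covers timeouts, refused connections and unreachable hosts.
        return False
    return bool(reply) and (time.monotonic() - start) <= deadline_s

def choose_host(failover_set, scout=tcp_scout):
    """Pick the first host whose scout succeeds; assumes a non-empty list."""
    for host, port in failover_set[:-1]:
        if scout(host, port):
            return (host, port)
    # Never scout-gate the final host: if the packet loss is near the client,
    # every scout fails and the request must still go somewhere.
    return failover_set[-1]
```

Passing the scout in as a parameter keeps the selection rule testable without a network, and would make it easy to try the UDP variant suggested above.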
Why I won’t give the European Parliament the data protection analysis it wanted
Holy crap. Simon Davies rips into the EU data-protection reform disaster with gusto:
The situation was an utter disgrace. The advertising industry even gave an award to an Irish Minister for destroying some of the rights in the regulation while the UK managed to force a provision that would make the direct marketing industry a “legitimate” processing operation in its own right, putting it on the same level of lawful processing as fraud prevention. Things got to the point where even the most senior data protection officials in Europe stopped trying to influence events and had told me “let the chips fall as they may”. [...] But let’s take a step back for a moment from this travesty. Out on the streets – while most may not know what data protection is – people certainly know what it is supposed to protect. People value their privacy and they will be vocal about attempts to destroy it. I had said as much to the joint parliamentary meeting, observing “the one element that has been left out of all these efforts is the public”. However, as the months rolled on, the only message being sent to the public was that data protection is an anachronism stitched together with self interest and impracticality. [...] I wasn’t aware at the time that there was a vast stitch-up to kill the reforms. I cannot bring myself to present a temperate report with measured wording that pretends this is all just normal business. It isn’t normal business, and it should never be normal business in any civilized society. How does one talk in measured tones about such endemic hypocrisy and deception? If you want to know who the real enemy of privacy is, don’t just look to the American agencies. The real enemy is right here in the European Parliament in the guise of MEPs who have knowingly sold our rights away to maintain powerful relationships. I’d like to say they were merely hoodwinked into supporting the vandalism, but many are smart people who knew exactly what they were doing.
Nice work, Irish presidency! His bottom line:
Is there a way forward? I believe so. First, governments should yield to common decency and scrap the illegitimate and poisoned Irish Council draft and hand the task to the Lithuanian Presidency that commences next month. Second, the Irish and British governments should be infinitely more transparent about their cooperation with intrusive interests that fuelled the deception.
(tags: ireland eu europe reform law data-protection privacy simon-davies meps iab)
Persuading David Simon (Pinboard Blog)
Maciej Ceglowski with a strongly-argued rebuttal of David Simon's post about the NSA's PRISM. This point in particular is key:
The point is, you don't need human investigators to find leads, you can have the algorithms do it [based on the call graph or network of who-calls-who]. They will find people of interest, assemble the watch lists, and flag whomever you like for further tracking. And since the number of actual terrorists is very, very, very small, the output of these algorithms will consist overwhelmingly of false positives.
(tags: false-positives maciej privacy security nsa prism david-simon accuracy big-data filtering anti-spam)
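The false-positive point above is base-rate arithmetic, and a short Python sketch shows how lopsided it gets. Every number here is invented purely for illustration (population size, number of real targets, detection rate, false-positive rate); none of them comes from the post.

```python
def flagged_breakdown(population, true_targets, hit_rate, false_positive_rate):
    """Return (true hits, false hits, fraction of flagged people who are real targets)."""
    true_hits = true_targets * hit_rate
    false_hits = (population - true_targets) * false_positive_rate
    precision = true_hits / (true_hits + false_hits)
    return true_hits, false_hits, precision

# Hypothetical inputs: 300 million people, 3,000 real targets, an algorithm
# that catches 99% of them and wrongly flags only 0.1% of everyone else.
true_hits, false_hits, precision = flagged_breakdown(
    population=300_000_000,
    true_targets=3_000,
    hit_rate=0.99,
    false_positive_rate=0.001,
)
```

Even with that implausibly accurate classifier, roughly 300,000 innocent people are flagged alongside fewer than 3,000 real targets, so only about 1 in 100 names on the list is a genuine lead.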
Schneier on Security: Blowback from the NSA Surveillance
Unintended consequences on US-focused governance of the internet and cloud computing:
Writing about the new Internet nationalism, I talked about the ITU meeting in Dubai last fall, and the attempt of some countries to wrest control of the Internet from the US. That movement just got a huge PR boost. Now, when countries like Russia and Iran say the US is simply too untrustworthy to manage the Internet, no one will be able to argue. We can't fight for Internet freedom around the world, then turn around and destroy it back home. Even if we don't see the contradiction, the rest of the world does.
(tags: internet freedom cloud-computing amazon google hosting usa us-politics prism nsa surveillance)
EU unlocks a great new source of online innovation
Today the European Parliament voted to formally agree new rules on open data – effectively making a reality of the proposal which I first put forward just over 18 months ago, and making it easier to open up huge amounts of public sector data.
Great news -- wonder how it'll affect the Ordnance Survey of Ireland?
(tags: osi mapping open-data open data europe eu neelie-kroes)
UK ISPs Secretly Start Blocking Torrent Site Proxies | TorrentFreak
The next step of cat-and-mouse. Let's see what the pirate sites do next...
The blocking orders are intended to deter online piracy and were requested by the music industry group BPI on behalf of a variety of major labels. Thus far they’ve managed to block access to The Pirate Bay, Kat.ph, H33T and Fenopy, and preparations are being made to add many others. The effectiveness of these initial measures has been called into doubt, as they are relatively easy to bypass. For example, in response to the blockades hundreds of proxy sites popped up, allowing subscribers to reach the prohibited sites via a detour. However, as of this week these proxies are also covered by the same blocklist they aim to circumvent, without a new court ruling. The High Court orders give music industry group BPI the authority to add sites to the blocklist without oversight. Until now some small changes have been made, mostly in response to The Pirate Bay’s domain hopping endeavors, but with the latest blocklist update a whole new range of websites is being targeted.
(tags: bittorrent blocking filesharing copyright bpi piracy pirate-bay proxies fenopy kat.ph h33t filtering uk)
-
'Not long ago, we began rendering 3D models on GitHub. Today we're excited to announce the latest addition to the visualization family - geographic data. Any .geojson file in a GitHub repository will now be automatically rendered as an interactive, browsable map, annotated with your geodata.' As this HN comment notes, https://news.ycombinator.com/item?id=5875693 -- 'I'd much rather Github cleaned up the UI for existing features than added these little flourishes that I can't imagine even 1% of users use.' Something is seriously wrong in how GitHub decides product direction if this kind of wankology (and that Judy-array crap) is what gets prioritised. :( (via Marc O'Morain)
(tags: via:marc github mapping maps geojson hacking product-management ui pull-requests)
-
The issue [...] is that it's just not cost effective for anyone to actually stand up and challenge Warner Music, who has strong financial incentive to pretend the copyright is still valid. Well, apparently, someone is pissed off enough to try. The creatively named Good Morning to You Productions, a documentary film company planning a film about the song Happy Birthday, has now filed a lawsuit concerning the copyright of Happy Birthday and are seeking to force Warner/Chappell to return the millions of dollars it has collected over the years. That's going to make this an interesting case.
(tags: music copyright law via:bwalsh public-domain happy-birthday songs warner-music lawsuits)
-
metric collectors for various stuff not (or poorly) handled by other monitoring daemons Core of the project is a simple daemon (harvestd), which collects metric values and sends them to graphite carbon daemon (and/or other configured destinations) once per interval. Includes separate data collection components ("collectors") for processing of: /proc/slabinfo for useful-to-watch values, not everything (configurable). /proc/vmstat and /proc/meminfo in a consistent way. /proc/stat for irq, softirq, forks. /proc/buddyinfo and /proc/pagetypeinfo (memory fragmentation). /proc/interrupts and /proc/softirqs. Cron log to produce start/finish events and duration for each job into a separate metrics, adapts jobs to metric names with regexes. Per-system-service accounting using systemd and it's cgroups. sysstat data from sadc logs (use something like sadc -F -L -S DISK -S XDISK -S POWER 60 to have more stuff logged there) via sadf binary and it's json export (sadf -j, supported since sysstat-10.0.something, iirc). iptables rule "hits" packet and byte counters, taken from ip{,6}tables-save, mapped via separate "table chain_name rule_no metric_name" file, which should be generated along with firewall rules (I use this script to do that).
Pretty exhaustive list of system metrics -- could have some interesting ideas for Linux OS-level metrics to monitor in future.
(tags: graphite monitoring metrics unix linux ops vm iptables sysadmin)
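For a feel of what one such collector involves, here is a minimal Python sketch that parses /proc/meminfo-style text and formats it for Graphite's carbon daemon using its plaintext protocol ("path value timestamp", one metric per line, TCP port 2003 by default). It is not taken from harvestd; the function names and the metric prefix are assumptions for illustration.

```python
import re
import socket
import time

def parse_meminfo(text):
    """Parse /proc/meminfo-style text into {name: integer value}."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if name and fields and fields[0].isdigit():
            values[name.strip()] = int(fields[0])
    return values

def to_carbon_lines(prefix, values, timestamp=None):
    """Format metrics in carbon's plaintext protocol."""
    ts = int(time.time() if timestamp is None else timestamp)
    lines = []
    for name, value in sorted(values.items()):
        # Names such as "Active(anon)" are sanitised into safe path components.
        safe = re.sub(r"[^A-Za-z0-9_]", "_", name)
        lines.append(f"{prefix}.{safe} {value} {ts}\n")
    return "".join(lines)

def send_to_carbon(payload, host="localhost", port=2003):
    """Ship the formatted lines; host and port are assumed defaults."""
    with socket.create_connection((host, port), timeout=5) as sock:
        sock.sendall(payload.encode("ascii"))
```

A once-per-interval loop would read /proc/meminfo, call `parse_meminfo` and `to_carbon_lines`, and hand the result to `send_to_carbon`.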
Former NSA Boss: We Don't Data Mine Our Giant Data Collection, We Just Ask It Questions
'Well, that's - no, we're going to use it. But we're not going to use it in the way that some people fear. You put these records, you store them, you have them. It's kind of like, I've got the haystack now. And now let's try to find the needle. And you find the needle by asking that data a question. I'm sorry to put it that way, but that's fundamentally what happens. All right. You don't troll through the data looking for patterns or anything like that. The data is set aside. And now I go into that data with a question that - a question that is based on articulable(ph), arguable, predicate to a terrorist nexus.'
Yep, that's data mining.
(tags: data-mining questions haystack needle nsa usa politics privacy data-protection michael-hayden)
-
fastutil extends the Java™ Collections Framework by providing type-specific maps, sets, lists and queues with a small memory footprint and fast access and insertion; provides also big (64-bit) arrays, sets and lists, and fast, practical I/O classes for binary and text files. It is free software distributed under the Apache License 2.0. It requires Java 6 or newer.
used by Facebook (along with Apache Giraph, Netty, Unsafe) to speed up "weekend Hive jobs" to "coffee breaks". http://www.slideshare.net/nitayj/2013-0603-berlin-buzzwords
(tags: via:highscalability facebook giraph optimization java speed fastutil collections data-structures)
-
good microbenchmarking of a bunch of Java collections; Trove, fastutil, PCJ, mahout-collections, hppc
(tags: java collections benchmarks performance speed coding data-structures optimization)
Spamalot reigns: the spoils of Ireland’s EU kingship | The Irish Times - Thu, Jun 13, 2013
The spam presidency. As European citizens are made the miserable targets of unimpeded “direct marketing”, that may be how Ireland’s stint in the EU presidency seat is recalled for years to come. Under the guiding hand of Minister for Justice Alan Shatter, the Council of the European Union has submitted proposals for amendments to a proposed new data protection regulation, all of which overwhelmingly favour business and big organisations, not citizens. The most obviously repugnant and surprising element in the amendments is a watering down of existing protections for EU citizens against the willy-nilly marketing Americans are forced to endure. In the US there are few meaningful restrictions on what businesses can do with people’s personal information when pitching products and services at them. In the EU, this has always been strictly controlled; information gathered for one purpose cannot be used by a business to sell whatever it wants – unless you have opted in to receive such solicitations. This means you are not constantly bombarded by emails and junk mail, nor do you get non-stop phone calls from telemarketers. Under the proposed amendments to the draft data protection regulation, direct marketing would become a legal form of data processing. In effect, this would legitimise spam email, junk print mail and marketing calls. This unexpected provision signals just how successful powerful corporate lobbyists have been in convincing ministers that business matters more than privacy or giving citizens reasonable control over their personal information. Far worse is contained in other amendments, which in effect turn the original draft of the regulation upside down.
Fantastic article from Karlin Lillington in today's Times on the terrible amendments proposed for the EU's data protection law.
(tags: eu law prism data-protection privacy ireland ec marketing spam anti-spam email)
Vagrant and Chef to provision dev test environments
We have recently switched from a manually configured development environment to a nearly fully automated one using Vagrant, Chef, and a few other tools. With this transition, we’ve moved to an environment where data on the dev boxes is considered disposable and only what’s checked into the SCM is “real”. This is where we’ve always wanted to be, but without the ability to easily rebuild the dev environment from scratch, it’s hard to internalize this behavior pattern.
Rapid Response: The NSA Prism Leak
'The biggest leak in the history of US security or nothing to worry about? A breach of trust and a data protection issue or a necessary secret project to protect American interests? [Tomorrow] lunchtime Science Gallery Rapid Response event [sic] will pick through the jargon, examine the minutiae of the National Security Agency's PRISM project and the whistle blower Edward Snowden's revelations, and discuss what it means for you and everyone. And we'll look at the bigger picture too. Journalist Una Mullally will chair a panel of guests on the story that everyone is talking about. '
(tags: science-gallery panel-discussions dublin nsa prism panel)
-
Four major music companies have secured court orders requiring six internet service providers to block access by subscribers to various Pirate Bay websites within some 30 days in a bid to prevent illegal downloading of copyright music and other material. [...] Today, Mr Justice Brian McGovern said he was satisfied to make the order in circumstances including that new copyright laws here and in the EU permitted such orders to be made. He said he fully agreed with a previous High Court judge who had said he would make such blocking orders if the law permitted and noted the law now allowed for such orders. The form of the orders means the music companies will not have to make fresh applications to court if Pirate Bay changes its location on the internet.
(tags: pirate-bay blocking filtering internet ireland upc eircom vodafone digiweb three imagine o2 copyright)
Labour TD ignores tough questions on web case
I [Tom Murphy] have asked [Sean Sherlock] a question: Does he have any comment about the lawsuit between EMI and UPC (and a raft of other ISPs too btw) which is using his SI to attempt to block PirateBay? A court case he said would not happen. Now, I am blocked from following him on Twitter. This is not how a proper political system works.
(tags: politics ireland twitter sean-sherlock tom-murphy boards devore copyright)
PRISM explains the wider lobbying issues surrounding EU data protection reform | EDRI
The US has very successfully and expertly lobbied against the [EU] data protection package directly, it has mobilised and supported US industry lobbying. US industry has lobbied in its own name and mobilised malleable European trade associations to lobby on their behalf to amplify their message, “independent” “think tanks” have been created to amplify their message again. The result is not just the biggest lobbying effort that Brussels has ever seen, but also the broadest. Compliant Members of the European Parliament (MEPs) and EU Member States [...] have been imposing a “death by a thousand cuts” on the Regulation. Where previously there was a clear obligation to collect the “minimum necessary” data for any given service, the vague requirement to retain “not excessive” data is now preferred. Where previously companies could only use data for purposes that were “compatible” with the original reason for collecting the data, the Irish EU Presidency (pdf) has proposed a comical definition of “compatible” based on five elements, only one of which is related to the dictionary definition of the word. Members of the European Parliament and EU Member States are falling over themselves to ensure that the EU does not maintain its strategic advantage over the US. In addition to dismantling the proposed Regulation, countries like the UK desperately seek to delay the whole process and subsume it into the EU-US free trade agreement (the so-called “investment partnership” TTIP/TAFTA), which would subordinate a fundamental rights discussion in a trade negotiation. The UK government is even prepared to humiliate itself by arguing in favour of the US position on the basis that two and a half years (see Communication from 2010, pdf) of discussion is too fast!
(tags: edri data-protection eu ec ireland politics usa meps privacy uk free-trade)
Microsoft admits US government can access EU-based cloud data
interesting point from an MS Q&A back in 2011, quite relevant nowadays:
Q: Can Microsoft guarantee that EU-stored data, held in EU based datacenters, will not leave the European Economic Area under any circumstances — even under a request by the Patriot Act? A: Frazer explained that, as Microsoft is a U.S.-headquartered company, it has to comply with local laws (the United States, as well as any other location where one of its subsidiary companies is based). Though he said that "customers would be informed wherever possible," he could not provide a guarantee that they would be informed — if a gagging order, injunction or U.S. National Security Letter permits it. He said: "Microsoft cannot provide those guarantees. Neither can any other company." While it has been suspected for some time, this is the first time Microsoft, or any other company, has given this answer. Any data which is housed, stored or processed by a company, which is a U.S. based company or is wholly owned by a U.S. parent company, is vulnerable to interception and inspection by U.S. authorities.
(tags: microsoft privacy cloud-computing eu data-centers data-protection nsa fisa usa)
-
An Irish MEP serving as a rapporteur on reform of the EU data protection regime was given an award by an advertising trade group last month:
Sean Kelly, Fine Gael MEP for Ireland South [who serves as the EU’s Industry Committee Rapporteur for the General Data Protection Regulation], has been selected to receive the prestigious IAB Europe Award for Leadership and Excellence for his approach to dealing with privacy concerns over shortcomings in the European Commission’s data protection proposal. IAB Europe represents more than 5,500 online advertising media, research and analytics organisations.
(tags: iab-europe awards spam sean-kelly ireland meps politics eu data-protection privacy ec)
-
No subject appears to be more controversial to distributed systems engineers than the oft-quoted, oft-misunderstood CAP theorem. The purpose of this FAQ is to explain what is known about CAP, so as to help those new to the theorem get up to speed quickly, and to settle some common misconceptions or points of disagreement.
(tags: database distributed nosql cap consistency cap-theorem faqs)
seeing into the UV spectrum after Cataract Surgery with Crystalens
I've been very happy so far with the Crystalens implant for Cataract Surgery [...] one unexpected/interesting aspect is I see a violet glow that others do not - perhaps I'm more sensitive to the low end of the visible light spectrum.
(via Tony Finch)
(tags: via:fanf science perception augmentation uv light sight cool cataracts surgery lens eyes)
Instagram: Making the Switch to Cassandra from Redis, a 75% 'Insta' Savings
shifting data out of RAM and onto SSDs -- unsurprisingly, big savings.
a 12 node cluster of EC2 hi1.4xlarge instances; we store around 1.2TB of data across this cluster. At peak, we're doing around 20,000 writes per second to that specific cluster and around 15,000 reads per second. We've been really impressed with how well Cassandra has been able to drop into that role.
(tags: ram ssd cassandra databases nosql redis instagram storage ec2)
-
Oh god. This sounds like an impending privacy and anti-spam disaster. "business-focussed":
Overall, the [Irish EC Presidency’s] draft compromise text can be seen as a more business-focused, pragmatic approach. For example, the Presidency has drafted an additional recital (Recital 3a), clarifying the right to data protection as a qualified right, highlighting the principle of proportionality and importance of other competing fundamental rights, including the freedom to conduct a business.
and some pretty serious relaxation of how consent for use of personal data is measured:
The criterion for valid consent is amended from “explicit” to “unambiguous,” except in the case of processing special categories of data (i.e., sensitive personal data) (Recital 25 and Article 9(2)). This reverts to the current position under the Data Protection Directive and is a concession to the practical difficulty of obtaining explicit consent in all cases. The criteria for valid consent are further relaxed by the ability to obtain consent in writing, orally or in an electronic manner, and where technically feasible and effective, valid consent can be given using browser settings and other technical solutions. Further, the requirement that the controller bear the burden of proof that valid consent was obtained is limited to a requirement that the controller be able to “demonstrate” that consent was obtained (Recital 32 and Article 7(1)). The need for “informed” consent is also relaxed from the requirement to provide the full information requirements laid out in Article 14 to the minimal requirements that the data subject “at least” be made aware of: (1) the identity of the data controller, and (2) the purpose(s) of the processing of their personal data (Recitals 33 and 48).
(tags: anti-spam privacy data-protection spam ireland eu ec regulation)
-
wow, great view of which MEPs are eviscerating the EU's data protection regime:
Currently the EU is negotiating about new data privacy laws. This new EU Regulation will replace all existing national laws on data privacy. Here you can see a general overview which Members of the European Parliament (MEPs) are pushing for more or less data privacy. Choose a country, a political group or a MEP from the “Top 10” list to find out more.
(tags: europe eu privacy data-protection datap ec regulation meps)
EDRI's comments on EU proposals to reform privacy law
Amendments 762, 764 and 765 in particular seem to move portions of the law from "confirmed opt-in required" to "opt-out is ok" -- which sounds like a risk where spam and unsolicited actions on a person's data are concerned
-
'Easy Amazon EC2 Instance Comparison'. a nice UI on the various EC2 instance types on offer with their key attributes. Misses out availability of EBS-optimized instances though
(tags: amazon ec2 aws comparison pricing)
HyperLevelDB: A High-Performance LevelDB Fork
'HyperLevelDB improves on LevelDB in two key ways: Improved parallelism: HyperLevelDB uses more fine-grained locking internally to provide higher throughput for multiple writer threads. Improved compaction: HyperLevelDB uses a different method of compaction that achieves higher throughput for write-heavy workloads, even as the database grows.'
(tags: leveldb storage key-value-stores persistence unix libraries open-source)
EU Council deals killer blow to privacy reforms
'In an extraordinary result for corporate lobbying, direct marketing would by default be considered a legitimate data process and would therefore – by default – be lawful.'
(tags: eu politics data-protection privacy anti-spam spam eu-council direct-marketing)
Care and Feeding of Large Scale Graphite Installations [slides]
good docs for large-scale graphite use: 'Tip and tricks of using and scaling graphite. First presented at DevOpsDays Austin Texas 2013-05-01'
Low-latency stock trading "jumps the gun" due to default NTP configuration settings
On June 3, 2013, trading in SPY exploded at 09:59:59.985, which is 15 milliseconds before the ISM's Manufacturing number released at 10:00:00. Activity in the eMini (traded in Chicago), exploded at 09:59:59.992, which is 8 milliseconds before the news release, but 7 milliseconds after SPY. Note how SPY and the eMini traded within a millisecond for the Consumer Confidence release last week, but the eMini lagged SPY by about 7 milliseconds for the ISM Manufacturing release. The simultaneous trading on Consumer Confidence is because that number is released at the same time in both NYC and Chicago. The ISM Manufacturing number is probably released on a low latency feed in NYC, and then takes 5-7 milliseconds, due to the speed of light, to reach Chicago. Either the clock used to release the ISM number was 15 milliseconds fast, or someone (correctly) jumped the gun. Update: [...] The clock used to release the ISM was indeed, 15 milliseconds fast. This could be from using the default setting of many NTP clients, which allows the clock to drift up to about 16 milliseconds before adjusting time.
(tags: ntp time synchronization spy trading stocks low-latency clocks internet)
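The "5-7 milliseconds, due to the speed of light" figure in the excerpt above can be sanity-checked with a few lines of Python. The distance and the fibre slowdown factor are rough assumptions of mine, not values from the article: about 1,150 km between New York and Chicago as the crow flies, and light in glass fibre travelling at about two thirds of its vacuum speed.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum
NYC_CHICAGO_KM = 1_150             # assumed rough straight-line distance
FIBRE_FACTOR = 2 / 3               # assumed: light in fibre moves at ~2/3 c

def one_way_ms(distance_km, speed_fraction=1.0):
    """One-way propagation delay in milliseconds at a fraction of c."""
    return distance_km / (SPEED_OF_LIGHT_KM_S * speed_fraction) * 1000

vacuum_ms = one_way_ms(NYC_CHICAGO_KM)
fibre_ms = one_way_ms(NYC_CHICAGO_KM, FIBRE_FACTOR)
```

Under those assumptions the delay is about 3.8 ms in a vacuum and about 5.8 ms in fibre. Real cable routes are longer than a straight line, so a 5-7 ms lag between SPY and the eMini is in the expected range.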
the infamous 2008 S3 single-bit-corruption outage
Neat, I didn't realise this was publicly visible. A single corrupted bit infected the S3 gossip network, taking down the whole S3 service in (iirc) one region:
We've now determined that message corruption was the cause of the server-to-server communication problems. More specifically, we found that there were a handful of messages on Sunday morning that had a single bit corrupted such that the message was still intelligible, but the system state information was incorrect. We use MD5 checksums throughout the system, for example, to prevent, detect, and recover from corruption that can occur during receipt, storage, and retrieval of customers' objects. However, we didn't have the same protection in place to detect whether [gossip state] had been corrupted. As a result, when the corruption occurred, we didn't detect it and it spread throughout the system causing the symptoms described above. We hadn't encountered server-to-server communication issues of this scale before and, as a result, it took some time during the event to diagnose and recover from it. During our post-mortem analysis we've spent quite a bit of time evaluating what happened, how quickly we were able to respond and recover, and what we could do to prevent other unusual circumstances like this from having system-wide impacts. Here are the actions that we're taking: (a) we've deployed several changes to Amazon S3 that significantly reduce the amount of time required to completely restore system-wide state and restart customer request processing; (b) we've deployed a change to how Amazon S3 gossips about failed servers that reduces the amount of gossip and helps prevent the behavior we experienced on Sunday; (c) we've added additional monitoring and alarming of gossip rates and failures; and, (d) we're adding checksums to proactively detect corruption of system state messages so we can log any such messages and then reject them.
This is why you checksum all the things ;)
(tags: s3 aws post-mortems network outages failures corruption grey-failures amazon gossip)
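As a minimal sketch of remedy (d) from the excerpt above, the Python below prefixes each state message with a checksum and rejects anything that fails verification. MD5 is used only because the post-mortem mentions it for customer objects; the message layout and function names are assumptions, not Amazon's actual gossip format.

```python
import hashlib

DIGEST_LEN = 16  # MD5 digests are 16 bytes long

def seal(payload: bytes) -> bytes:
    """Prefix the payload with its MD5 digest before sending."""
    return hashlib.md5(payload).digest() + payload

def unseal(message: bytes) -> bytes:
    """Verify the digest and return the payload, or raise on corruption."""
    digest, payload = message[:DIGEST_LEN], message[DIGEST_LEN:]
    if hashlib.md5(payload).digest() != digest:
        raise ValueError("checksum mismatch: rejecting corrupted message")
    return payload

def flip_bit(message: bytes, bit_index: int) -> bytes:
    """Simulate single-bit corruption in transit."""
    corrupted = bytearray(message)
    corrupted[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(corrupted)
```

With this in place a single flipped bit, whether it lands in the payload or in the digest, makes `unseal` raise, so the receiver can log and drop the message instead of gossiping it onward.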
-
Aphyr and Peter Bailis collect an authoritative list of known network partition and outage cases from published post-mortem data:
This post is meant as a reference point -- to illustrate that, according to a wide range of accounts, partitions occur in many real-world environments. Processes, servers, NICs, switches, local and wide area networks can all fail, and the resulting economic consequences are real. Network outages can suddenly arise in systems that are stable for months at a time, during routine upgrades, or as a result of emergency maintenance. The consequences of these outages range from increased latency and temporary unavailability to inconsistency, corruption, and data loss. Split-brain is not an academic concern: it happens to all kinds of systems -- sometimes for days on end. Partitions deserve serious consideration.
I honestly cannot understand people who didn't think this was the case. 3 years reading (and occasionally auto-cutting) Amazon's network-outage tickets as part of AWS network monitoring will do that to you I guess ;)
(tags: networking outages partition cap failure fault-tolerance)
-
from Atelier Olschinsky. 'Fine Art Print on Hahnemuehle Photo Rag Bright White 310g; Limited Edition / Numbered and signed by the artist'
incompetent error-handling code in the mongo-java-driver project
an unexplained invocation of Math.random() in the exception handling block of this MongoDB java driver class causes roflscale lols in the github commit notes. http://stackoverflow.com/a/16833798 has more explanation.
(tags: github commits mongodb webscale roflscale random daily-wtf wtf)
-
'What is a Hermetic Server? The short definition would be a “server in a box”. If you can start up the entire server on a single machine that has no network connection AND the server works as expected, you have a hermetic server! This is a special case of the more general “hermetic” concept which applies to an isolated system not necessarily on a single machine. Why is it useful to have a hermetic server? Because if your entire [system under test] is composed of hermetic servers, it could all be started on a single machine for testing; no network connection necessary! The single machine could be a physical or virtual machine.' These also qualify as "fakes", using the terminology Martin Fowler suggests at http://martinfowler.com/bliki/TestDouble.html , I think
(tags: google testing hermetic-servers test test-doubles unit-testing)
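To make the "server in a box" idea concrete, here is a rough sketch using only the Python standard library. It assumes a toy HTTP service; `HelloHandler` and `fetch_from_hermetic_server` are names invented for illustration and are not from the Google post. The server binds to the loopback interface on an OS-assigned port, and the client ignores any proxy settings, so nothing leaves the machine:

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Toy service standing in for the real server under test."""

    def do_GET(self):
        body = b"hello from a hermetic server"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep test output quiet


def fetch_from_hermetic_server() -> str:
    """Start the server on loopback, issue one request, then shut it down."""
    # Port 0 asks the OS for any free port, so parallel tests don't collide.
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        port = server.server_address[1]
        # An empty ProxyHandler stops urllib from routing via an HTTP proxy.
        opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
        with opener.open(f"http://127.0.0.1:{port}/", timeout=5) as resp:
            return resp.read().decode("utf-8")
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(fetch_from_hermetic_server())
```

A real hermetic server would also need its backends (databases, downstream services) replaced by local fakes; this sketch only shows the no-network startup part.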
-
hooray, sanity from the Google Testing blog. this has been a major cause of pain in the past, dealing with tricky rewrites of mock-heavy unit test code
Casalattico - Wikipedia, the free encyclopedia
How weird. Many of the well-known chippers in Ireland are run by families from the same comune in Italy.
In the late 19th and early 20th century a significant number of young people left Casalattico to work in Ireland, with many founding chip shops there. Most second, third and fourth generation Irish-Italians can trace their lineage back to the municipality, with names such as Magliocco, Fusco, Marconi, Borza, Macari, Rosato and Forte being the most common. Although the Forte family actually originates from the village of Mortale, renamed Mon Forte due to the achievements of the Forte family. It is believed that up to 8,000 Irish-Italians have ancestors from Casalattico. The village is home to an Irish festival every summer to celebrate the many families that moved from there to Ireland.
(via JK)
(tags: rome lazio italy ireland chip-shops chippers history emigration casalattico work irish-italians via:jk)
Videos from the Continuous Delivery track at QCon SF 2012
Think we'll be watching some of these in work soon -- Jez Humble's talk (the last one) in particular looks good:
Amazon, Etsy, Google and Facebook are all primarily software development shops which command enormous amounts of resources. They are, to use Christopher Little’s metaphor, unicorns. How can the rest of us adopt continuous delivery? That’s the subject of my talk, which describes four case studies of organizations that adopted continuous delivery, with varying degrees of success. One of my favourites – partly because it’s embedded software, not a website – is the story of HP’s LaserJet Firmware team, who re-architected their software around the principles of continuous delivery. People always want to know the business case for continuous delivery: the FutureSmart team provide one in the book they wrote that discusses how they did it.
(tags: continuous-integration continuous-delivery build release process dev deployment videos qcon towatch hp)
_Dynamic Histograms: Capturing Evolving Data Sets_ [pdf]
Currently, histograms are static structures: they are created from scratch periodically and their creation is based on looking at the entire data distribution as it exists each time. This creates problems, however, as data stored in DBMSs usually varies with time. If new data arrives at a high rate and old data is likewise deleted, a histogram’s accuracy may deteriorate fast as the histogram becomes older, and the optimizer’s effectiveness may be lost. Hence, how often a histogram is reconstructed becomes very critical, but choosing the right period is a hard problem, as the following trade-off exists: If the period is too long, histograms may become outdated. If the period is too short, updates of the histogram may incur a high overhead. In this paper, we propose what we believe is the most elegant solution to the problem, i.e., maintaining dynamic histograms within given limits of memory space. Dynamic histograms are continuously updateable, closely tracking changes to the actual data. We consider two of the best static histograms proposed in the literature [9], namely V-Optimal and Compressed, and modify them. The new histograms are naturally called Dynamic V-Optimal (DVO) and Dynamic Compressed (DC). In addition, we modified V-Optimal’s partition constraint to create the Static Average-Deviation Optimal (SADO) and Dynamic Average-Deviation Optimal (DADO) histograms.
(via d2fn)
(tags: via:d2fn histograms streaming big-data data dvo dc sado dado dynamic-histograms papers toread)
How I decoded the human genome - Salon.com
classic long-read article from John Sundman: 'We are becoming the masters of our own DNA. But does that give us the right to decide that my children should never have been born?' part two at http://www.salon.com/2003/10/22/genome_two/
(tags: human genome genomics eugenics politics life john-sundman disability health dna medicine salon long-reads children)
The “Meme Hustler” hustler: Evgeny Morozov’s Stupid Talk about Tim O’Reilly
great long-read blog post from John Sundman debunking Evgeny Morozov's takedown of Tim O'Reilly
(tags: debunking john-sundman evgeny-morozov tim-oreilly tech technological-solutionism futurism writing silicon-valley utopianism open-source oss)
Strange Passion Presents Chant Chant Chant, Choice & SM Corporation live
'We are delighted to announce, for one night only, 3 legendary Irish Post Punk bands performing live in Dublin after a 30 year hiatus. This follows on from the critically acclaimed release of the Strange Passion Irish Post Punk compilation in 2012. Post punk legends Chant Chant Chant will perform along with electronic music pioneers Choice and SM Corporation. '
(tags: choice music ireland post-punk electronic dublin strange-passion gigs)
'Mythbusting Modern Hardware to gain "Mechanical Sympathy"' [slides]
Martin Thompson's latest talk -- taking a few common concepts about modern hardware performance and debunking/confirming them, mythbusters-style
(tags: mythbusters hardware mechanical-sympathy martin-thompson java performance cpu disks ssd)
High home ownership can seriously damage labor market, new study suggests
Interesting -- a healthy rental market is needed to allow sufficient labour mobility. This matches what I heard and saw from friends and coworkers in the US, anecdotally
Concert Industry Struggles With ‘Bots’ That Siphon Off Tickets - NYTimes.com
Bots now buying more than 60% of tickets, one group requesting up to 200,000 per day; bot writers now charging $14 per 10k captchas (via Shane Naughton)
(tags: ticketmaster scalping tickets via:shane-naughton bots captchas abuse)
Instant artist statement: Arty Bollocks Generator
'My work explores the relationship between the body and vegetarian ethics. With influences as diverse as Munch and Francis Bacon, new synergies are created from both orderly and random narratives. Ever since I was a postgraduate I have been fascinated by the essential unreality of the moment. What starts out as undefined soon becomes corroded into a hegemony of greed, leaving only a sense of failing and the chance of a new order. As temporal replicas become transformed through diligent and undefined practice, the viewer is left with an impression of the darkness of our culture.'
(tags: funny humor art arty bollocks generator hacks via:leroideplywood)
Communication costs in real-world networks
Peter Bailis has generated some good real-world data about network performance and latency, measured using EC2 instances, between ec2 regions, between zones, and between hosts in a single AZ. good data (particularly as I was looking for this data in a public source not too long ago).
I wasn’t aware of any datasets describing network behavior both within and across datacenters, so we launched m1.small Amazon EC2 instances in each of the eight geo-distributed “Regions,” across the three us-east “Availability Zones” (three co-located datacenters in Virginia), and within one datacenter (us-east-b). We measured RTTs between hosts for a week at a granularity of one ping per second.
Some of the high-percentile measurements are undoubtedly the impact of host and VM behaviour, but that is still good data for a typical service built in EC2.
(tags: networks performance measurements benchmarks ops ec2 networking internet az latency)
Reducing MongoDB traffic by 78% with Redis | Crashlytics Blog
One for @roflscaletips. Crashlytics reduce MongoDB load by hacking in some hand-coded caching into their Rails app, instead of just using a front-line HTTP cache to reduce Rails *and* db load. duh. (via Oisin)
(tags: crashlytics fail roflscale rails caching redis ruby via:oisin)
Display Hidden Files in OS X Open and Save Dialog Boxes
yet another laughable UI kludge in OS X. ridiculous
(tags: usability osx apple ui kludges hidden-files dot-files command-shift-option-elbow magic)
-
"Dear Mr Tilman, this is the only way I can help you. saluti, Giorgio Moroder". I love it -- someone call Tufte
(tags: graphics giorgio-moroder history music ilx basslines donna-summer synths)
Hollywood Studios [attempt to censor] Pirate Bay Documentary
Probably not deliberate, but pretty damn inept.
Over the past weeks several movie studios have been trying to suppress the availability of TPB-AFK [the Pirate Bay documentary] by asking Google to remove links to the documentary from its search engine. The links are carefully hidden in standard DMCA takedown notices for popular movies and TV-shows. The silent attacks come from multiple Hollywood sources including Viacom, Paramount, Fox and Lionsgate and are being sent out by multiple anti-piracy outfits. Fox, with help from six-strikes monitoring company Dtecnet, asked Google to remove a link to TPB-AFK on Mechodownload. Paramount did the same with a link on the Warez.ag forums. Viacom sent at least two takedown requests targeting links to the Pirate Bay documentary on Mrworldpremiere and Rapidmoviez. Finally, Lionsgate jumped in by asking Google to remove a copy of TPB-AFK from a popular Pirate Bay proxy.
(tags: funny inept hollywood lionsgate fox viacom paramount dtecnet tpb-afk piratebay piracy copyright movies google)
Flashback: How Yahoo Killed Flickr and Lost the Internet
This is about the best tech journalism I've ever read on Flickr. nice one Mat Honan
(tags: gizmodo flickr acquisition mergers yahoo corporate-culture mat-honan tech journalism)
Resisting the lure of the Freeman movement | Workers Solidarity Movement
An anarchist critique of the Freeman movement from the WSM:
This has been a very brief overview of the Freeman movement that has tried to capture with broad strokes its nature and possible responses. There is room for much more work, including a more in-depth analysis of the various flaws in the approach to the law. The greatest danger however is allowing a movement to develop within anarchist circles that ignores the principle of mutual aid and implicitly promotes private ownership of resources, that by granting absolute right to individuals gives them the ability to ignore their responsibilities to the wider community and ecology that sustains them. In more traditional terms, the movement is one all about negative freedoms, ignoring positive freedom as a concept.
(tags: anarchism freeman-on-the-land politics ireland law wsm)
The Reactionary ‘Freeman-on-the-land’ and a Political Fracture
Another leftie view on the Freeman movement
(tags: freeman-on-the-land politics ireland left-wing anarchism law)
-
Well, apparently tomorrow, but close enough. Happy birthday to bradfitz' greatest creation and its wonderful slab allocator!
(tags: birthdays code via:alex-popescu open-source history malloc memory caching memcached)
Newegg nukes “corporate troll” Alcatel in third patent appeal win this year
I am loving this. Particularly this:
At trial in East Texas Cheng took the stand to tell Newegg's story. Alcatel-Lucent's corporate representative, at the heart of its massive licensing campaign, couldn't even name the technology or the patents it was suing Newegg over. "Successful defendants have their litigation managed by people who care," said Cheng. "For me, it's easy. I believe in Newegg, I care about Newegg. Alcatel Lucent, meanwhile, they drag out some random VP—who happens to be a decorated Navy veteran, who happens to be handsome and has a beautiful wife and kids—but the guy didn't know what patents were being asserted. What a joke." "Shareholders of public companies that engage in patent trolling should ask themselves if they're really well-served by their management teams," Cheng added. "Are they properly monetizing their R&D? Surely there are better ways to make money than to just rely on litigating patents. If I was a shareholder, I would take a hard look as to whether their management was competent."
(tags: patents ip swpats alcatel bell-labs newegg east-texas litigation lucent)
Call me maybe: Carly Rae Jepsen and the perils of network partitions
Kyle "aphyr" Kingsbury expands on his slides demonstrating the real-world failure scenarios that arise during some kinds of partitions (specifically, the TCP-hang, no clear routing failure, network partition scenario). Great set of blog posts clarifying CAP
(tags: distributed network databases cap nosql redis mongodb postgresql riak crdt aphyr)
-
Welcome to the Galapagos of Chinese “open” source. I call it “gongkai” (公开). Gongkai is the transliteration of “open” as applied to “open source”. I feel it deserves a term of its own, as the phenomenon has grown beyond the so-called “shanzhai” (山寨) and is becoming a self-sustaining innovation ecosystem of its own. Just as the Galapagos Islands is a unique biological ecosystem evolved in the absence of continental species, gongkai is a unique innovation ecosystem evolved with little western influence, thanks to political, language, and cultural isolation. Of course, just as the Galapagos was seeded by hardy species that found their way to the islands, gongkai was also seeded by hardy ideas that came from the west. These ideas fell on the fertile minds of the Pearl River delta, took root, and are evolving. Significantly, gongkai isn’t a totally lawless free-for-all. It’s a network of ideas, spread peer-to-peer, with certain rules to enforce sharing and to prevent leeching. It’s very different from Western IP concepts, but I’m trying to have an open mind about it.
(tags: gongkai bunnie-huang china phone mobile hardware devices open-source)
Stability Patterns and Antipatterns [slides]
Michael "Release It!" Nygard's slides from a recent O'Reilly event, discussing large-scale service reliability design patterns
(tags: michael-nygard design-patterns architecture systems networking reliability soa slides pdf)
Deep In The Game: Not The RTE Guide
Good interview with Alan Maguire, the satirist behind the very funny @NotTheRTEGuide on Twitter:
I’ve always been a huge fan of TV Go Home and Charlie Brooker in general and it seemed like Irish TV and culture was a good target for the kind of barbed surrealism that he does. (I’m not claiming I’m in his league or anything but he’s the main influence). I was really surprised that there hadn’t been a parody RTÉ Guide already. TV listings are 140-ish characters already and the RTÉ Guide has a kind of weird place in Irish culture where everybody knows it but nobody our age really has any idea of what’s in it anymore. We associate it with a small-c conservatism, or I did at least and I play that up occasionally with the account.
(tags: nottherteguide rte rte-guide ireland funny satire interviews)
-
'based on my observations while I was a Site Reliability Engineer at Google.' - by Rob Ewaschuk; very good, and matching the similar recommendations and best practices at Amazon for that matter
(tags: monitoring ops devops alerting alerts pager-duty via:jk)
Monitoring the Status of Your EBS Volumes
Page in the AWS docs which describes their derived metrics and how they are computed -- these are visible in the AWS Management Console, and alarmable, but not viewable in the Cloudwatch UI. grr. (page-joshea!)
(tags: ebs aws monitoring metrics ops documentation cloudwatch)
Interpol filter scope creep: ASIC ordering unilateral website blocks
Bloody hell. This is stupidity of the highest order, and a canonical example of "filter creep" by a government -- secret state censorship of 1200 websites due to a single investment scam site.
The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios. The instrument through which the ISPs are blocking the Interpol list of sites is Section 313 of the Telecommunications Act. Under the Act, the Australian Federal Police is allowed to issue notices to telcos asking for reasonable assistance in upholding the law. [...] Tonight Senator Conroy’s office revealed that the incident that resulted in Melbourne Free University and more than a thousand other sites being blocked originated from a different source — financial regulator the Australian Securities and Investment Commission. On 22 March this year, ASIC issued a media release warning consumers about the activities of a cold-calling investment scam using the name ‘Global Capital Wealth’, which ASIC said was operating several fraudulent websites — www.globalcapitalwealth.com and www.globalcapitalaustralia.com. In its release on that date, ASIC stated: “ASIC has already blocked access to these websites.”
(tags: scams australia filtering filter-creep false-positives isps asic fraud secrecy)
Obfuscatory pie-chart from Garda penalty-points corruption report
"Twitter / gavinsblog: For sake of clarity here is helpful pie chart of the 95.4% of fixed charge notices not terminated #missingthepoint" Paging Edward Tufte: classic example of an obfuscatory pie-chart, diagramming the wrong thing misleadingly. By presenting it like this, it appears that the 95.4% of cases where fixed charge notices were issued by the guards are relevant to the discussion of the other classes; in reality, that means that 4.6% of cases, 37,000 cases, were terminated, some for good reasons, others for not, and it's the difference between those two classes that are relevant. In my opinion, 2 separate pie charts would be better; one to show the dismissed-versus-undismissed count (which IMO could have been omitted entirely), and one to show the good-vs-not-so-good termination reason counts (which is the meat of the issue).
(tags: dataviz visualisation data obfuscation gardai police corruption penalty-points)
Berkeley DB Java Edition Architecture [PDF]
background white paper on the BDB-JE innards and design, from 2006. Still pretty accurate and good info
(tags: bdb-je java berkeley-db bdb design databases pdf white-papers trees)
-
This Court has developed a new awareness and understanding of a category of vexatious litigant. As we shall see, while there is often a lack of homogeneity, and some individuals or groups have no name or special identity, they (by their own admission or by descriptions given by others) often fall into the following descriptions: Detaxers; Freemen or Freemen-on-the-Land; Sovereign Men or Sovereign Citizens; Church of the Ecumenical Redemption International (CERI); Moorish Law; and other labels - there is no closed list. In the absence of a better moniker, I have collectively labelled them as Organized Pseudolegal Commercial Argument litigants [“OPCA litigants”], to functionally define them collectively for what they literally are. These persons employ a collection of techniques and arguments promoted and sold by ‘gurus’ (as hereafter defined) to disrupt court operations and to attempt to frustrate the legal rights of governments, corporations, and individuals. Over a decade of reported cases have proven that the individual concepts advanced by OPCA litigants are invalid. What remains is to categorize these schemes and concepts, identify global defects to simplify future response to variations of identified and invalid OPCA themes, and develop court procedures and sanctions for persons who adopt and advance these vexatious litigation strategies. One participant in this matter [...] appears to be a sophisticated and educated person, but is also an OPCA litigant. One of the purposes of these Reasons is, through this litigant, to uncover, expose, collate, and publish the tactics employed by the OPCA community, as a part of a process to eradicate the growing abuse that these litigants direct towards the justice and legal system we otherwise enjoy in Alberta and across Canada. I will respond on a point-by-point basis to the broad spectrum of OPCA schemes, concepts, and arguments advanced in this action by [him].
Via Ronan Lupton
(tags: via:ronanlupton law canada legal freeman opca court tax judgements)
-
This classic came up in discussions yesterday...
In the Linux Kernel community Rusty Russell came up with an API rating scheme to help us determine if our API is sensible, or not. It's a rating from -10 to 10, where 10 is perfect and -10 is hell. Unfortunately there are too many examples at the wrong end of the scale.
(tags: rusty-russell quality coding kernel linux apis design code-reviews code)
-
hooray! Command-line gmailish goodness returns. And with a signed gem, to boot
Martin Thompson, Luke "Snabb Switch" Gorrie etc. review the C10M presentation from ShmooCon
on the mechanical-sympathy mailing list. Some really interesting discussion on handling insane quantities of TCP connections using low volumes of hardware:
This talk has some good points and I think the subject is really interesting. I would take the suggested approach with serious caution. For starters the Linux kernel is nowhere near as bad as it made out. Last year I worked with a client and we scaled a single server to 1 million concurrent connections with async programming in Java and some sensible kernel tuning. I've heard they have since taken this to over 5 million concurrent connections. BTW Open Onload is an open source implementation. Writing a network stack is a serious undertaking. In a previous life I wrote a network probe and had to reassemble TCP streams and kept getting tripped up by edge cases. It is a great exercise in data structures and lock-free programming. If you need very high-end performance I'd talk to the Solarflare or Mellanox guys before writing my own. There are some errors and omissions in this talk. For example, his range of ephemeral ports is not quite right, and atomic operations are only 15 cycles on Sandy Bridge when hitting local cache. A big issue for me is when he defined C10M he did not mention the TIME_WAIT issue with closing connections. Creating and destroying 1 million connections per second is a major issue. A protocol like HTTP is very broken in that the server closes the socket and therefore has to retain the TCB until the specified timeout occurs to ensure no older packet is delivered to a new socket connection.
(tags: mechanical-sympathy hardware scaling c10m tcp http scalability snabb-switch martin-thompson)
-
This program creates an EBS snapshot for an Amazon EC2 EBS volume. To help ensure consistent data in the snapshot, it tries to flush and freeze the filesystem(s) first as well as flushing and locking the database, if applicable. Filesystems can be frozen during the snapshot. Prior to Linux kernel 2.6.29, XFS must be used for freezing support. While frozen, a filesystem will be consistent on disk and all writes will block. There are a number of timeouts to reduce the risk of interfering with the normal database operation while improving the chances of getting a consistent snapshot. If you have multiple EBS volumes in a RAID configuration, you can specify all of the volume ids on the command line and it will create snapshots for each while the filesystem and database are locked. Note that it is your responsibility to keep track of the resulting snapshot ids and to figure out how to put these back together when you need to restore the RAID setup.
Handy!
(tags: ubuntu ec2 aws linux ebs snapshots ops tools alestic)
Measuring & Optimizing I/O Performance
Another good writeup on iostat and EBS, from Ilya Grigorik
(tags: io optimization sysadmin performance iostat ebs aws ops)
AWS forum post on interpreting iostat output for EBS
Great post from AndrewC@EBS on interpreting iostat output on EBS volumes -- from 2009, but still looks reasonable enough
Operations is Dead, but Please Don’t Replace it with DevOps
This is so damn spot on.
Functional silos (and a standalone DevOps team is a great example of one) decouple actions from responsibility. Functional silos allow people to ignore, or at least feel disconnected from, the consequences of their actions. DevOps is a cultural change that encourages, rewards and exposes people taking responsibility for what they do, and what is expected from them. As Werner Vogels from Amazon Web Services says, “you build it, you run it”. So a “DevOps team” is a risky and ultimately doomed strategy. Sure there are some technical roles, specifically related to the enablement of DevOps as an approach and these roles and tools need to be filled and built. Self service platforms, collaboration and communication systems, tool chains for testing, deployment and operations are all necessary. Sure someone needs to deliver on that stuff. But those are specific technical deliverables and not DevOps. DevOps is about people, communication and collaboration. Organizations ignore that at their peril.
(tags: devops teams work ops silos collaboration organisations)
Universal Music Group adding audible "watermarks"
including on paid-for, losslessly-compressed digital audio music files:
Why isn't UMG's watermark talked about more? Maybe people think the audio quality problems are due to some kind of lossy compression, as I did, and ignore it completely, or blame the streaming service/distributor. The problem here is that the UMG watermark degrades the audio to about the equivalent of a 96 kbit MP3. My guess is that if consumers were informed about what is going on, they would care. Especially those who pay full retail price for digital downloads advertised as lossless audio.
(tags: lame audio drm media music umg universal watermarks noise consumer mp3)
“Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions”
Aphyr's epic RICON talk, exploring distributed-database failure modes through music. and what a lot of fail there is! Bottom line: CRDTs win
(tags: crdts data-structures storage ricon aphyr failures network partitions puns slides)
Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop
we are proud to announce the first production drop of Impala, which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.
Along with some great benchmark numbers against Hive. nifty stuff
(tags: cloudera impala sql querying etl olap hadoop analytics business-intelligence reports)
Alex Feinberg's response to Damien Katz' anti-Dynamoish/pro-Couchbase blog post
Insightful response, worth bookmarking. (the original post is at http://damienkatz.net/2013/05/dynamo_sure_works_hard.html ).
while you are saving on read traffic (online reads only go to the master), you are now decreasing availability (contrary to your stated goal), and increasing system complexity. You also do hurt performance by requiring all writes and reads to be serialized through a single node: unless you plan to have a leader election whenever the node fails to meet a read SLA (which is going to result in a disaster -- I am speaking from personal experience), you will have to accept that you're bottlenecked by a single node. With a Dynamo-style quorum (for either reads or writes), a single straggler will not reduce whole-cluster latency. The core point of Dynamo is low latency, availability and handling of all kinds of partitions: whether clean partitions (long term single node failures), transient failures (garbage collection pauses, slow disks, network blips, etc...), or even more complex dependent failures. The reality, of course, is that availability is neither the sole, nor the principal concern of every system. It's perfectly fine to trade off availability for other goals -- you just need to be aware of that trade off.
(tags: cap distributed-databases databases quorum availability scalability damien-katz alex-feinberg partitions network dynamo riak voldemort couchbase)
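The straggler point is easy to see with a toy model. This is a sketch under simplifying assumptions of my own, not anything from Alex's post: a read is sent to all N replicas in parallel, the client returns once R of them have answered, and the function names and latency numbers are invented for illustration.

```python
def master_read_latency(latencies_ms, master=0):
    """Single-master reads always wait for the master, however slow it is."""
    return latencies_ms[master]


def quorum_read_latency(latencies_ms, r):
    """Quorum reads finish when the R-th fastest replica has responded."""
    if not 1 <= r <= len(latencies_ms):
        raise ValueError("r must be between 1 and the number of replicas")
    return sorted(latencies_ms)[r - 1]


if __name__ == "__main__":
    # Replica 0 is stuck in a GC pause; the other two are healthy.
    latencies_ms = [250.0, 4.0, 5.0]
    print("master read:", master_read_latency(latencies_ms), "ms")
    print("quorum read (R=2 of N=3):", quorum_read_latency(latencies_ms, r=2), "ms")
```

With R=2 of N=3 the slow replica simply isn't waited for, whereas if that replica is the master, every read pays for its pause. The model ignores read repair, coordinator overhead and correlated slowness, all of which matter in a real cluster.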
CAP Confusion: Problems with ‘partition tolerance’
Another good clarification about CAP which resurfaced during last week's discussion:
So what causes partitions? Two things, really. The first is obvious – a network failure, for example due to a faulty switch, can cause the network to partition. The other is less obvious, but fits with the definition [...]: machine failures, either hard or soft. In an asynchronous network, i.e. one where processing a message could take unbounded time, it is impossible to distinguish between machine failures and lost messages. Therefore a single machine failure partitions it from the rest of the network. A correlated failure of several machines partitions them all from the network. Not being able to receive a message is the same as the network not delivering it. In the face of sufficiently many machine failures, it is still impossible to maintain availability and consistency, not because two writes may go to separate partitions, but because the failure of an entire ‘quorum’ of servers may render some recent writes unreadable.
(sorry, catching up on old interesting things posted last week...)
(tags: failure scalability network partitions cap quorum distributed-databases fault-tolerance)
Big-O Algorithm Complexity Cheat Sheet
nicely done, very readable
(tags: algorithms reference cheat-sheet big-o complexity estimation coding)
Did Conroy’s AFP filter wrongly block 1,200 sites?
Looks like many Aussie network operators were legally required to block 1,200 websites (presumably, one target and 1199 false positives), in secret. Quoting http://lists.ausnog.net/pipermail/ausnog/2013-April/017993.html : "You get a notice to block. You block or either get fined, go to jail or lose your carrier licence. It is a blunt instrument and it is a condition of being at 'the big boys table' i.e. you're a carrier or a carriage service provider."
(tags: australia law afp filtering internet blocking censorship secret eff)
Making sense out of BDB-JE fast stats
good info on the system metrics recorded by BDB-JE's EnvironmentStats code, particularly where cache and cleaner activity are concerned. Particularly useful for Voldemort
(tags: voldemort caching bdb bdb-je storage tuning ops metrics reference)
Approximate Heavy Hitters -The SpaceSaving Algorithm
nice, readable intro to SpaceSaving (which I've linked to before) -- a simple stream-processing top-K frequency estimation algorithm with bounded error.
(tags: algorithms coding space-saving cardinality streams stream-processing estimation)
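For reference, a minimal Python sketch of the SpaceSaving idea, assuming the textbook formulation: keep at most k counters, and when a new item arrives with no free counter, evict the smallest one and let the newcomer inherit its count. The function name and return shape are my own, and the linear scan for the minimum is only there to keep the sketch short; a real implementation would use a smarter structure.

```python
def space_saving(stream, k):
    """Track at most k counters over a stream.

    Returns {item: (estimated_count, max_overestimate)}.
    """
    counters = {}  # item -> [count, error]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < k:
            counters[item] = [1, 0]
        else:
            # Evict the item with the smallest count; the newcomer inherits
            # that count (+1) and records it as its possible overestimate.
            victim = min(counters, key=lambda x: counters[x][0])
            min_count = counters.pop(victim)[0]
            counters[item] = [min_count + 1, min_count]
    return {item: (count, error) for item, (count, error) in counters.items()}


stream = list("aaaaabbbbcdaeb")
top = space_saving(stream, k=3)
for item, (count, error) in sorted(top.items(), key=lambda kv: -kv[1][0]):
    print(item, count, "overestimated by at most", error)
```

The estimated count never undershoots the true count, and the recorded error bounds how far it can overshoot.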
Darach Ennis on CEP, Stream Processing, Messaging, OOP vs Functional Architecture
good interview -- lots of food for thought!
(tags: darach-ennis stream-processing messaging architecture qcon interviews erlang cep realtime rx comet events)
One Year Later, the Results of Tor Books UK Going DRM-Free
As it is, we’ve seen no discernible increase in piracy on any of our titles, despite them being DRM-free for nearly a year.
Understanding Elastic Block Store Availability and Performance [slides]
fantastic in-depth presentation on EBS usage; lots of good advice here if you're using EBS volumes with/without PIOPS
(tags: piops ebs performance aws ec2 ops storage amazon presentations)
-
Github get good results using Judy arrays to replace a Ruby hash. However: the whole blog post is a bit dodgy to me. It feels like there are much better ways to fix the problem: 1. the big one: don't do GC-heavy activity in the front-end web servers. Split that language-classification code into a separate service. Write its results to a cache and don't re-query needlessly. 2. why isn't this benchmarked against a C/C++ hash? it's only 36000 entries, loaded once at startup. lookups against that should be blisteringly fast even with the basic data structures, and that would also be outside the Ruby heap so avoid the GC overhead. Feels like the use of a Judy array was a "because I want to" decision. 3. personally, I'd have preferred they spend time fixing their uptime problems.... See also https://news.ycombinator.com/item?id=5639013 for more kvetching.
(tags: ruby github gc judy-arrays linguist hashes data-structures)
-
Mozilla's experience with Kanban. We've had good results in Amazon, too. good intro links in this post -- might start talking about it in Swrve...
(tags: kanban scheduling team agile mozilla)
Secret Bitcoin mining code added to game sparks outrage
Thunberg's admission that [the E-Sports Entertainment Association client software] ran Bitcoin-mining software without explicit user consent is startling. Aside from potentially opening the company up to huge legal liability, the move is likely to engender distrust among some of the company's most loyal fans. The nonchalance of some of Thunberg's comments may only add insult to the betrayal many users are likely to feel. "But for the record, I told jag he shouldn't be lazy and run the miner in a separate process," he wrote in a post, referring to one of his software engineers with the screen name Jaguar, who didn't take steps to conceal the Bitcoin miner. "Rookie move." In the later post he wrote: "100% of the funds are going into the s14 prize pot, so at the very least your melted gpus contributed to a good cause."
Gap's application of Knockout.js and the MVVM model
Interesting, first time I'd heard of it; the Model-View-ViewModel pattern.
(tags: mvvm architecture javascript web ui knockout-js martin-fowler json)
-
very nice single-purpose site -- figure out who represents any given Irish postal address
Lectures in Advanced Data Structures (6.851)
Good lecture notes on the current state of the art in data structure research.
Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures: TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible. GEOMETRY When data has more than one dimension (e.g. maps, database tables). DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close. MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache. HASHING Hashing is the most used data structure in computer science. And it's still an active area of research. INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible. DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes. STRINGS Searching for phrases in giant text (think Google or DNA). SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).
(via Tim Freeman)(tags: data-structures lectures mit video data algorithms coding csail strings integers hashing sorting bst memory)
Older Is Wiser: Study Shows Software Developers’ Skills Improve Over Time
At least in terms of StackOverflow rep:
For the first part of the study, the researchers compared the age of users with their reputation scores. They found that an individual’s reputation increases with age, at least into a user’s 40s. There wasn’t enough data to draw meaningful conclusions for older programmers. The researchers then looked at the number of different subjects that users asked and answered questions about, which reflects the breadth of their programming interests. The researchers found that there is a sharp decline in the number of subjects users weighed in on between the ages of 15 and 30 – but that the range of subjects increased steadily through the programmers’ 30s and into their early 50s. Finally, the researchers evaluated the knowledge of older programmers (ages 37 and older) compared to younger programmers (younger than 37) in regard to relatively recent technologies – meaning technologies that have been around for less than 10 years. For two smartphone operating systems, iOS and Windows Phone 7, the veteran programmers had a significant edge in knowledge over their younger counterparts. For every other technology, from Django to Silverlight, there was no statistically significant difference between older and younger programmers. “The data doesn’t support the bias against older programmers – if anything, just the opposite,” Murphy-Hill says.
Damn right ;)(tags: coding age studies software work stack-overflow ncsu knowledge skills life)
-
Test Double is a generic term for any case where you replace a production object for testing purposes. There are various kinds of double that Gerard lists: Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists. Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an InMemoryTestDatabase is a good example). Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Spies are stubs that also record some information based on how they were called. One form of this might be an email service that records how many messages it was sent. Mocks are pre-programmed with expectations which form a specification of the calls they are expected to receive. They can throw an exception if they receive a call they don't expect and are checked during verification to ensure they got all the calls they were expecting.
(tags: test-doubles naming patterns tdd testing mocking tests martin-fowler)
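A small Python sketch of three of those kinds of double, to make the distinctions concrete. The class and function names (StubRateSource, SpyMailer, notify_price) are hypothetical. Note that unittest.mock verifies calls after the fact rather than being pre-programmed with expectations, so it is only an approximation of the "mock" described above.

```python
from unittest.mock import Mock


class StubRateSource:
    """Stub: returns a canned answer and nothing else."""

    def rate(self, currency):
        return 2.0


class SpyMailer:
    """Spy: a stub that also records how it was called."""

    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


def notify_price(price, currency, rates, mailer, to):
    """Hypothetical code under test: convert a price and mail it."""
    converted = price * rates.rate(currency)
    mailer.send(to, f"Price: {converted:.2f}")
    return converted


# Stub + spy: inspect the recorded state after the call.
spy = SpyMailer()
result = notify_price(10, "EUR", StubRateSource(), spy, "a@example.com")
print(result, spy.sent)

# Mock: verify the interaction itself; raises AssertionError on mismatch.
mock_mailer = Mock()
notify_price(10, "EUR", StubRateSource(), mock_mailer, "a@example.com")
mock_mailer.send.assert_called_once_with("a@example.com", "Price: 20.00")
```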
Limerick-Tralee walking/cycling route blocked by farmers
Oh for god's sake. I know a few people who've made a trip to Mayo explicitly because the Greenway was there to visit. This is shocking, backwards stuff:
The success of [Mayo's] Great Western Greenway [trail] has overtaken that of others, such as the Great Southern Trail group, which has been working hard to install a walking and cycling route on sections of the former Limerick-Tralee railway line. On February 2nd, to mark the 50th anniversary of its closure, about 150 members and supporters of the Great Southern Trail set out from the old railway station at Abbeyfeale, Co Limerick, along the most recently developed section to cross the Kerry county boundary. The trailers were greeted by a barricade on the border, manned by more than 30 farmers, including the Listowel Fine Gael town councillor Denis Stack. A stand-off continued for three hours, with the Garda mediating in vain. The farmers were trying to lay claim to the land occupied by the disused railway line, even though Minister for Transport Leo Varadkar had made it clear that CIÉ “is the owner of the property [and] will object to any application by others to register these lands”.
(via Rossa McMahon)(tags: via:rossamcmahon cycling walking hiking trails ireland kerry limerick listowel denis-stack cie)
-
like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. [it] is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine, and expect it to work.
Nice tool. Needs to get into the Debian/Ubuntu apt repos pronto ;)(tags: jq tools cli via:peakscale json coding data sed unix)
"Clickwrap" licensing established as legal in Irish court
"The evidence does establish that there is a practice in the airline and online travel agency sectors of contractually binding web users by click wrapping or browse wrapping, which practice is generally and regularly followed by the operators in those sectors. In reality, it is difficult to see how online trade could be carried on in the absence of those devices. As regards the third question which arises from the MSG decision, in this case it is whether the defendant was aware or is presumed to have been aware of the practice. The evidence before the Court, in my view, clearly demonstrates that the defendant was aware of the practice, it being a practice which is generally and regularly followed when making bookings with online travel agents and with airlines and which, in the words of the Court in the MSG case, may be regarded as being a consolidated practice. Accordingly, in my view, by application of Article 23(1)(c), the defendant is bound by the jurisdiction clause in the Terms of Use on the plaintiff’s website by its use, either through the medium of an automaton or a manual operator or a third party data provider, of the website.”
(via Rossa McMahon)
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observable's. The Observable data type can be thought of as a "push" equivalent to Iterable which is "pull". With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
(tags: concurrency java jvm threads thread-safety coding rx frp fp functional-programming reactive functional async observable)
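The push/pull distinction is easy to see in a toy version. The sketch below is plain Python and deliberately not the RxJava API: the Observable class, from_iterable, and the two operators are stand-ins I made up, and everything runs synchronously on one thread.

```python
class Observable:
    """Toy push-based source: subscribing hands it a callback to push into."""

    def __init__(self, subscribe_fn):
        self._subscribe_fn = subscribe_fn

    def subscribe(self, on_next):
        self._subscribe_fn(on_next)

    def map(self, fn):
        # Wrap the downstream callback so every pushed value is transformed.
        return Observable(
            lambda on_next: self._subscribe_fn(lambda v: on_next(fn(v)))
        )

    def filter(self, pred):
        # Only forward values that satisfy the predicate.
        def subscribe_fn(on_next):
            self._subscribe_fn(lambda v: on_next(v) if pred(v) else None)

        return Observable(subscribe_fn)


def from_iterable(values):
    """Producer that pushes each value to whoever subscribes."""

    def subscribe_fn(on_next):
        for v in values:
            on_next(v)

    return Observable(subscribe_fn)


# Pull: the consumer drives the loop and asks for each value.
pulled = [x * 10 for x in range(1, 6) if x % 2 == 1]

# Push: the producer calls the consumer's callback when a value is ready.
pushed = []
(from_iterable(range(1, 6))
    .filter(lambda x: x % 2 == 1)
    .map(lambda x: x * 10)
    .subscribe(pushed.append))

print(pulled, pushed)
```

Because the consumer is just a callback, the producer is free to invoke it later or from another thread, which is where the flexibility mentioned in the quote comes from.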
You probably shouldn’t use a spreadsheet for important work
Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:
There are several critical problems with a tool like Excel that need to be widely known: * Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically; * Spreadsheets make code reviews impractical. To visually inspect the code, you need to click on each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake; * Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?
Agreed on all three, particularly on the impossibility of testing. IMO, everyone who may be in a job where automation via spreadsheet is likely needs training in SDE fundamentals: unit testing, the importance of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.(tags: spreadsheets excel coding errors bugs testability unit-testing testing quality sde sde-fundamentals dry)
Log4j2 Asynchronous Loggers for Low-Latency Logging - Apache Log4j 2
implemented using the LMAX Disruptor library -- very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though
(tags: disruptor coding java log4j logging async performance)
-
Google Drive and GMail have a built-in scripting engine. I had no idea
(tags: gmail evernote archival scripting coding hacks google-drive)
-
How the Irish media are partly to blame for the catastrophic property bubble, from a paper entitled _The Role Of The Media In Propping Up Ireland’s Housing Bubble_, by Dr Julien Mercille, in the _Social Europe Journal_:
“The overall argument is that the Irish media are part and parcel of the political and corporate establishment, and as such the news they convey tend to reflect those sectors’ interests and views. In particular, the Celtic Tiger years involved the financialisation of the economy and a large property bubble, all of it wrapped in an implicit neoliberal ideology. The media, embedded within this particular political economy and itself a constitutive element of it, thus mostly presented stories sustaining it. In particular, news organisations acquired direct stakes in an inflated real estate market by purchasing property websites and receiving vital advertising revenue from the real estate sector. Moreover, a number of their board members were current or former high officials in the finance industry and government, including banks deeply involved in the bubble’s expansion."
(tags: economics irish-times ireland newspapers media elite insiders bubble property-bubble property celtic-tiger papers news bias)
-
Ugh. low-end ISPs MITM'ing DNS queries:
Some ISP's are now using a technology called 'Transparent DNS proxy'. Using this technology, they will intercept all DNS lookup requests (TCP/UDP port 53) and transparently proxy the results. This effectively forces you to use their DNS service for all DNS lookups. If you have changed your DNS settings to an open DNS service such as Google, Comodo or OpenDNS expecting that your DNS traffic is no longer being sent to your ISP's DNS server, you may be surprised to find out that they are using transparent DNS proxying.
(via Nelson)
BitTorrent’s Secure Dropbox Alternative Goes Public
As kragen says, 'a decentralized way to sync a folder of large files, using BitTorrent instead of an untrustworthy central server'. Windows, OSX, and Linux supported
(tags: bittorrent dropbox cloud storage filesharing sharing sync synchronization)
DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
250 million tweets per day, 30-node HBase cluster, 400TB of storage, Kafka and 0mq. This is from 2011, hence this dated line: 'for a distributed application they thought AWS was too limited, especially in the network. AWS doesn’t do well when nodes are connected together and they need to talk to each other. Not low enough latency network. Their customers care about latency.' (Nowadays, it would be damn hard to build a lower-latency network than that attached to a cc2.8xlarge instance.)
(tags: datasift architecture scalability data twitter firehose hbase kafka zeromq)
Breaking the 1000 ms Time to Glass Mobile Barrier [slides]
Great presentation from Google on HTML5 CSS+JS render speed, 3G/4G network latency, etc. (via John G)
(tags: google slides 3g 4g lte networking telcos telecom css js html5 web via:jg)
Lucene 4 - Revisiting Problems For Speed [slides]
a presentation from Simon Willnauer on optimization work performed on Lucene in 2011. The most interesting part is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with an FSM trie, which is extremely cool -- benchmarked at 214 times faster!
(tags: benchmarks slides lucene search fuzzy-matching text-matching strings algorithms coding fsm tries)
Microsoft Code Digger extension
Miguel de Icaza says it's witchcraft -- I'm inclined to agree:
Code Digger analyzes possible execution paths through your .NET code. The result is a table where each row shows a unique behavior of your code. The table helps you understand the behavior of the code, and it may also uncover hidden bugs. Through the new context menu item "Generate Inputs / Outputs Table" in the Visual Studio editor, you can invoke Code Digger to analyze your code. Code Digger computes and displays input-output pairs. Code Digger systematically hunts for bugs, exceptions, and assertion failures.
(tags: testing constraint-solving solver witchcraft magic dot-net coding tests code-digger microsoft)
Swansea measles outbreak: was an MMR scare in the local press to blame?
Sixteen years ago, journalists had a much easier job assembling "balanced" stories about MMR in south Wales. When I wrote about the measles outbreak last week, I suggested that it was related to Andrew Wakefield's discredited 1998 Lancet research, but the Swansea contagion seems more likely to be the result of a separate scare a year earlier in the South Wales Evening Post. Before 1997, uptake of MMR in the distribution area of the Post was 91%, and 87.2% in the rest of Wales. After the Post's campaign, uptake in the distribution area fell to 77.4% (it was 86.8% in the rest of Wales). That's almost a 14% drop where the Post had influence, compared with less than 3% elsewhere. In the dry wording of the BMJ, "the [South West Evening Post] campaign is the most likely explanation". In other words, what we can see in Swansea is the local effect of local reporting – in all probability, just a taster of what happens when the news irresponsibly creates unfounded terror. [...] The 1997 coverage focused on a group of families who blamed MMR for various ailments in their children, including learning difficulties, digestive problems and autism – none of which have been found to have any connection with the vaccine. The Post's coverage was at the time deemed a success, and in 1998 it won a prize for investigative reporting in the BT Wales Press Awards. That year, the SWEP ran at least 39 stories related to the alleged dangers of MMR. And yes, it's true that the paper never directly endorsed non-vaccination. What it did do was publicise the idea of "vaccine damage" as a risk, one that parents would then likely weigh up against the risk of contracting measles, mumps or rubella. And this went beyond the reporting of parental anxieties – it was part of the Post's editorial line. One article is entitled "Young bodies cannot take it". The all-important "journalistic balance" was constantly available, thanks to campaigning parents and their solicitor Richard Barr.
(It was Barr who engaged Wakefield for a lawsuit, leading to the "fishing expedition" research that became the Lancet paper.) They were happy to provide a quote on the dangers of the "triple jab", which health authorities were then obliged to rebut politely. The Post also seemed to downplay the risk of measles, reporting on 6 July 1998 that "not a single child has been hit by the illness – despite a 13% drop in take-up levels". It's not parents who should feel embarrassed by the Swansea measles outbreak: some may have acted from overt dread at the prospect of harming their child, and some simply from omission, but all were encouraged by a press that focused on non-existent risks and downplayed the genuine horror of the diseases MMR prevents. The shame belongs to journalists: those of the South West Evening Post who allowed themselves to be recruited in the service of a speculative lawsuit, and any who let a specious devotion to "balance" overrule a duty to tell the truth.
(tags: south-wales wales mmr health vaccination scares journalism ethics disease measles south-wales-evening-post)
-
mostly a DynamoDB puff-piece from last week's Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)
(tags: dynamodb aws figures costs architecture ec2 dedupe cloud-connect slides)
Excel, untestability, and the reliability of quants
Wow, this is a great software-quality story -- I knew Excel was the most widely used programming environment out there, but this is a factor I'd overlooked:
In his remarks on the final panel, Frank Partnoy mentioned something I missed when it came out a few weeks ago: the role of Microsoft Excel in the “London Whale” trading debacle. [..] To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz [...] to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly, “After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR ...” I write periodically about the perils of bad software in the business world in general and the financial industry in particular, by which I usually mean back-end enterprise software that is poorly designed, insufficiently tested, and dangerously error-prone. But this is something different. [...] While Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets -- badly. 
Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way. This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. At the end of the day, it’s all software. While all software breaks occasionally, Excel spreadsheets break all the time. But they don’t tell you when they break: they just give you the wrong number.
(tags: excel reliability software coding ides jpmorgan value-at-risk finance london-whale quants spreadsheets unit-tests testability testing)
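The "divided by their sum instead of their average" error quoted above is worth working through, since it shows why the result was off by a factor of two. A minimal Python sketch; the function names and the two rates are made up, and the real model obviously did more than this one step.

```python
def relative_change_intended(old, new):
    """Change in rate relative to the average of the two rates."""
    return (new - old) / ((new + old) / 2)


def relative_change_buggy(old, new):
    """Same numerator, but divided by the sum instead of the average."""
    return (new - old) / (new + old)


# Made-up rates, purely for illustration.
old_rate, new_rate = 0.040, 0.050

intended = relative_change_intended(old_rate, new_rate)
buggy = relative_change_buggy(old_rate, new_rate)

# The sum is always twice the average, so the buggy figure is always
# half the intended one, whatever the inputs.
print(intended, buggy, intended / buggy)
```

A three-line unit test comparing the two against a hand-computed value would have caught it, which is rather the point of the article.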
Riak, CAP, and eventual consistency
Good (albeit draft) write-up of the implications of CAP, allow_mult, and last_write_wins conflict-resolution policies in Riak:
As Brewer's CAP theorem established, distributed systems have to make hard choices. Network partition is inevitable. Hardware failure is inevitable. When a partition occurs, a well-behaved system must choose its behavior from a spectrum of options ranging from "stop accepting any writes until the outage is resolved" (thus maintaining absolute consistency) to "allow any writes and worry about consistency later" (to maximize availability). Riak leans toward the availability end of the spectrum, but allows the operator and even the developer to tune read and write requests to better meet the business needs for any given set of data.
(tags: riak cap eventual-consistency distcomp distributed-systems partition last-write-wins voldemort allow_mult)
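On the "tune read and write requests" point: the usual Dynamo-style rule of thumb is that a read quorum of size R is guaranteed to overlap the latest write quorum of size W, out of N replicas, when R + W > N. The Python sketch below brute-forces that claim for a few settings. It is a toy check with names of my own choosing; it does not touch Riak's client API and ignores failure handling entirely.

```python
from itertools import combinations


def quorums_always_overlap(n, r, w):
    """True if every possible read set of size r shares at least one
    replica with every possible write set of size w (n replicas total)."""
    replicas = range(n)
    return all(
        set(read_set) & set(write_set)
        for read_set in combinations(replicas, r)
        for write_set in combinations(replicas, w)
    )


# Compare the brute-force answer with the R + W > N rule of thumb.
settings = [(3, 2, 2), (3, 1, 1), (3, 1, 3), (5, 2, 3)]
for n, r, w in settings:
    print((n, r, w), quorums_always_overlap(n, r, w), r + w > n)
```

Lower R and W buy latency and availability at the cost of possibly reading a stale value, which is the spectrum the quote describes.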
How You Can Help Save Upcoming.org, Posterous, and More
Yahoo! sucks. shutting down in days? ArchiveTeam Warrior to the rescue; install the VM!
(tags: archival yahoo shutdowns upcoming waxy archives virtualbox)
The Excel Depression - NYTimes.com
Krugman on the Reinhart-Rogoff Excel-bug fiasco.
What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.
(tags: paul-krugman economics excel coding bugs software austerity debt)
Vaccination 'herd immunity' demonstration
'Stochastic monte-carlo epidemic SIR model to reveal herd immunity'. Fantastic demo of this important medical concept (via Colin Whittaker)
(tags: via:colinwh stochastic herd-immunity random sir epidemics health immunity vaccination measles medicine monte-carlo-simulations simulations)
Fred's ImageMagick Scripts: SIMILAR
compute an image-similarity metric, to discover mostly-identical-but-slightly-tweaked images:
SIMILAR computes the normalized cross correlation similarity metric between two equal dimensioned images. The normalized cross correlation metric measures how similar two images are, not how different they are. The range of ncc metric values is between 0 (dissimilar) and 1 (similar). If mode=g, then the two images will be converted to grayscale. If mode=rgb, then the two images first will be converted to colorspace=rgb. Next, the ncc similarity metric will be computed for each channel. Finally, they will be combined into an rms value.
(via Dan O'Neill)(tags: image photos pictures similar imagemagick via:dano metrics similarity)
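For a feel of what a normalized cross correlation metric does, here is a pure-Python sketch over two flat lists of pixel values. It is an assumption on my part, not the script's code: it uses the common zero-mean form, which ranges from -1 to 1 rather than the 0 to 1 range described above, and the function name is my own.

```python
from math import sqrt


def ncc(a, b):
    """Zero-mean normalized cross correlation of two equal-length
    sequences of pixel values (assumed formula, see note above)."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denom = sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denom == 0:
        raise ValueError("a flat image has no variance to correlate")
    return sum(x * y for x, y in zip(da, db)) / denom


original = [1, 2, 3, 4]
brighter = [2, 4, 6, 8]   # same picture, contrast doubled
inverted = [4, 3, 2, 1]   # same picture, negated

print(ncc(original, brighter))
print(ncc(original, inverted))
```

Because the values are normalized by mean and variance, a brightness or contrast tweak barely moves the score, which is what makes this useful for spotting slightly-tweaked copies.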
-
a first-person game prototype in which players navigate a 3D space while picking up orbs that reduce the speed of light in increments. Custom-built, open-source relativistic graphics code allows the speed of light in the game to approach the player’s own maximum walking speed. Visual effects of special relativity gradually become apparent to the player, increasing the challenge of gameplay. These effects, rendered in realtime to vertex accuracy, include the Doppler effect (red- and blue-shifting of visible light, and the shifting of infrared and ultraviolet light into the visible spectrum); the searchlight effect (increased brightness in the direction of travel); time dilation (differences in the perceived passage of time from the player and the outside world); Lorentz transformation (warping of space at near-light speeds); and the runtime effect (the ability to see objects as they were in the past, due to the travel time of light). Players can choose to share their mastery and experience of the game through Twitter. A Slower Speed of Light combines accessible gameplay and a fantasy setting with theoretical and computational physics research to deliver an engaging and pedagogically rich experience.
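The time dilation effect mentioned in that description comes from the standard Lorentz factor, 1 / sqrt(1 - v^2/c^2). A quick Python sketch of how it blows up as the speed of light is lowered toward the player's speed; the walking speed and the values of c are made-up numbers, and this has nothing to do with the game's actual rendering code.

```python
from math import sqrt


def lorentz_factor(v, c):
    """Standard special-relativity gamma for speed v and light speed c."""
    if not 0 <= v < c:
        raise ValueError("need 0 <= v < c")
    return 1.0 / sqrt(1.0 - (v / c) ** 2)


walking_speed = 3.0  # made-up player speed, arbitrary units

# Lower the speed of light step by step toward the walking speed.
for c in (300.0, 30.0, 5.0, 3.1):
    print(c, lorentz_factor(walking_speed, c))
```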
Eventual Consistency Today: Limitations, Extensions, and Beyond - ACM Queue
Good overview of the current state of eventually-consistent data store research, covering CALM and CRDTs, from Peter Bailis and Ali Ghodsi
(tags: eventual-consistency data storage horizontal-scaling research distcomp distributed-systems via:martin-thompson crdts calm acid cap)
Latency's Worst Nightmare: Performance Tuning Tips and Tricks [slides]
the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides
(tags: benchmarks aws ec2 ebs piops services scaling scalability presentations)
Rob "b3ta" Manuel in Dublin next week
The Bottom Half Of The Internet -- "Racism; typos; filth; spam; ignorance; rage – that's all the bottom half of the internet is good for, right? Rob Manuel wants you to question the internet dictum, most beloved of high-profile columnists, that you should ignore all of the comments all of the time. The 'war on comments', he reckons, might just be an echo of a fourth estate that's having trouble adjusting to the idea of an unwashed public disagreeing with their sacred opinions. Sous les pavés, la plage." On Tuesday, le cool Dublin & Pilcrow present SPIEL. Rob Manuel is the flashy animator behind B3ta and he's joined by Ed Melvin, who wants to educate you on 'The Unreal Engines' of virtual currencies and economies.
(tags: rob-manuel b3ta dublin comments internet meetings talks lecool)
Reality, Reactivity, Relevance and Repeatability in Java Application Profiling
this product from JInspired appears to support runtime profiling of java apps with < 5% performance impact
(tags: profiling performance java coding measurement)
You Lookin' At Me? Reflections on Google Glass
ex-Nokia product design guru Jan Chipchase on Google Glass
(tags: google privacy technology google-glass pervasive-computing life future)
Not the ‘best in the world’ - The Medical Independent
Debunking this prolife talking point:
'Our maternity services are amongst the best in the world’. This phrase has been much hackneyed since the heartbreaking death of Savita Halappanavar was revealed in mid October. James Reilly and other senior politicians are particularly guilty of citing this inaccurate position. So what is the state of Irish maternity services and how do our figures compare with other comparable countries? Let’s start with the statistics.
The bottom line: Eight deaths per 100,000 is not bad, but it ranks our maternity services far from the best in the world and below countries such as Slovakia and Poland.
(tags: pro-choice ireland savita medicine health maternity morbidity statistics)
How Kaggle Is Changing How We Work - Thomas Goetz - The Atlantic
Founded in 2010, Kaggle is an online platform for data-mining and predictive-modeling competitions. A company arranges with Kaggle to post a dump of data with a proposed problem, and the site's community of computer scientists and mathematicians -- known these days as data scientists -- take on the task, posting proposed solutions. [...] On one level, of course, Kaggle is just another spin on crowdsourcing, tapping the global brain to solve a big problem. That stuff has been around for a decade or more, at least back to Wikipedia (or farther back, Linux, etc). And companies like TaskRabbit and oDesk have thrown jobs to the crowd for several years. But I think Kaggle, and other online labor markets, represent more than that, and I'll offer two arguments. First, Kaggle doesn't incorporate work from all levels of proficiency, professionals to amateurs. Participants are experts, and they aren't working for benevolent reasons alone: they want to win, and they want to get better to improve their chances of winning next time. Second, Kaggle doesn't just create the incidental work product, it creates a new marketplace for work, a deeper disruption in a professional field. Unlike traditional temp labor, these aren't bottom of the totem pole jobs. Kagglers are on top. And that disruption is what will kill Joy's Law. Because here's the thing: the Kaggle ranking has become an essential metric in the world of data science. Employers like American Express and the New York Times have begun listing a Kaggle rank as an essential qualification in their help wanted ads for data scientists. It's not just a merit badge for the coders; it's a more significant, more valuable, indicator of capability than our traditional benchmarks for proficiency or expertise. In other words, your Ivy League diploma and IBM resume don't matter so much as my Kaggle score. 
It's flipping the resume, where your work is measurable and metricized and your value in the marketplace is more valuable than the place you work.
(tags: academia datamining economics data kaggle data-science ranking work competition crowdsourcing contracting)
-
a good reference, with lots of sample output. Not clear if it takes 1.6/1.7 differences into account, though
Austerity policies founded on Excel typo
You've probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That's all thanks to a study conducted by Carmen Reinhart and Kenneth Rogoff for their book This Time Is Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff" and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error. Read Mike Konczal for the whole rundown, but I'll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly "the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent."
(tags: austerity politics excel coding errors bugs spreadsheets economics economy)
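To see how much a truncated range can move an average, here is a minimal sketch of the AVERAGE(L30:L44) vs AVERAGE(L30:L49) slip. The growth figures are made up for illustration (they are not the Reinhart/Rogoff data), and the `average` helper is hypothetical; only the row numbers come from the excerpt.

```python
from statistics import mean

# Hypothetical growth figures (percent), one per spreadsheet row 30..49.
# These are invented numbers, NOT the Reinhart/Rogoff data.
values = [1.0, -2.0, 0.5, 1.5, -1.0, 2.0, 0.0, -0.5, 1.0, -1.5,
          0.5, 1.0, -2.5, 0.0, -0.5,   # rows 30..44
          3.0, 2.5, 4.0, 3.5, 2.0]     # rows 45..49 (the rows left out)
rows = {30 + i: v for i, v in enumerate(values)}


def average(first_row, last_row):
    """Mimic AVERAGE(Lfirst:Llast) over an inclusive row range."""
    return mean(rows[r] for r in range(first_row, last_row + 1))


truncated = average(30, 44)   # the range as typed
full = average(30, 49)        # the range that was intended

print(f"AVERAGE(L30:L44) = {truncated:.3f}")
print(f"AVERAGE(L30:L49) = {full:.3f}")
```

With these invented figures, dropping five rows is enough to flip the sign of the average, which is the shape of the error described above.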
Is Your MySQL Buffer Pool Warm? Make It Sweat!
How GroupOn are warming up a failover warm MySQL spare, using Percona stuff and a "tee" of the live in-flight queries. (via Dave Doran)
(tags: via:dave-doran mysql databases warm-spares spares failover groupon percona replication)
So now you know who gets some of those excessive Ticketmaster fees….
Interesting evidence; it appears Irish music promoters are getting "rebates" from the massive TicketMaster "booking fee", on each ticket sold. This sounds like a cartel to me, and we need to regulate this. Where is the National Consumer Agency and Competition Authority?
The matter is something which should be of concern to every gig-going music fan, regardless of whether they go to Stradbally or not. For years, many have asked about TicketMaster's quasi-monopoly position in the marketplace and why this is so. We’ve always been told that promoters preferred to deal with one company rather than several and that TM’s systems and nationwide reach yadda yadda yadda was the bees’ knees etc. Other companies have tried to compete but no-one has been able to beat TM at this game. But why would promoters go elsewhere when they’re getting a slice of the TM fees back as rebates? Those past off-the-record attempts by and briefings from promoters blaming TM for those fees can now be seen as hypocritical. They’re sticking with TM because they’re receiving a take of the fees paid by punters who have no other choice in service provider if they want to get their hands on tickets. You wonder what the acts make of this cash-grab – perhaps some whip-smart agent is already making a claim for a percentage of the rebates because there would be no rebates in the first place without the act. Surely this is an issue for the Competition Authority and National Consumers Association too, given the manner in which the rebates are made and TM’s deals with the promoters? While promoters under TM deals are free to sell a certain proportion of their tickets with another provider, it’s usually only a very small percentage of the total and unlikely to trouble TM’s bottom line. Also, given that the rebates are volume-driven, it’s better for the promoters to keep the largest possible chunk of their business with TM. It seems that we have a new suspect in the blame game about why ticket prices are so high.
(tags: regulation ireland cartels competition ticketing tickets ticketmaster music gigs consumer)
Blog shines spotlight on Dublin city’s illegal dumping problem
Hooray, Eoin's activism gets some coverage!
THE SCALE OF Dublin’s dumping problem is laid bare in a blog that has seen contributors send in photos of chairs, fridges and heaps of rubbish strewn on city streets. Eoin Parker, one of organisers behind DublinLitterBlog.com, spoke to TheJournal.ie about the problem, saying that the blog was set up following the privatisation of waste management by Dublin City Council in 2012.
(tags: dumping dublin litter rubbish blogs dcc d1 activism community)
-
To our knowledge, Ked is the first scripting language to emerge from The People's Republic of Cork. Below is an account of what we know so far about the mysterious Corkonian language. Any suggested updates or contributions are encouraged.
Genius.
Just how bad are RTE’s finances?
A sobering examination by NAMAwinelake into the quagmire of Ireland's publicly-funded national broadcaster:
It seems that RTE has become a disaster zone, with libels and incompetence overseen by incapable management, and this is reflected in that organisation’s financial results. RTE still employs nearly 2,000 people and supports jobs and industry across independent producers and suppliers; it is a major business. But the time has come to call a halt to delusional management that is sinking the organization deeper into a quagmire which will ultimately need to be bailed out by the State. And Noel Curran is fobbing us off with flying a kite about a reduction in 65-year old Pat Kenny’s salary from €630,000 to €570,000?!
(tags: rte namawinelake public funding finances money mismanagement ireland incompetence tv news)
High Scalability - Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:
a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.
yeah, so, eek ;)
(tags: clustering sharding architecture aws scalability scaling pinterest via:matt-sergeant redis mysql memcached)
Expert in Savita inquiry confirms Irish women get lower standard of care with chorioamnionitis
Dr. Jen Gunter again:
Dr. Knowles’ testimony confirms for me that the law played a role, because her statements indicate the standard of care for treatment of chorioamnionitis is less aggressive in Ireland. This can only be because of the law as there is no medical evidence to support delaying delivery when chorioamnionitis is diagnosed. Standard of care is not to wait until a woman is sick enough to need a termination, the idea is to treat her, you know, before she gets sick enough. An elevated white count and ruptured membranes at 17 weeks is typically enough to make the diagnosis, so Dr. Knowles needs to testify as to what in Savita’s medical record made it safe to not recommend a delivery. By the way, I also disagree with Dr. Knowles about her interpretation of Savita’s medical record, the chart doesn’t have “subtle indicators” of infection, it screams chorioamnionitis long before Wednesday morning. In North America the standard of care with chorioamnionitis is to recommend delivery as soon as the diagnosis is made, not wait until women enter the antechamber of death in the hopes that we can somehow snatch them back from the brink. If Irish law, or the interpretation thereof, had nothing to do with Savita’s death no expert would be mentioning sick enough at all.
(tags: jen-gunter ob-gyn medicine savita law ireland abortion tragedy galway hospital)
Boundary Product Update: Trends Dashboard Now Available
Boundary implement week-on-week trend display. Pity they use silly "giant number" dashboard boxes showing comparisons of the current datapoint with the previous week's datapoint; there's no indication of smoothing being applied, and "giant number" dashboards are basically useless anyway compared to a time-series graph, for unsmoothed time-series data. Also, no prediction bands. :(
(tags: boundary time-series tsd prediction metrics smoothing dataviz dashboards)
ESB Networks | Power Check | Service Interruptions Map
real-time service outage information on a map, from Ireland's power network
Project Voldemort at Gilt Groupe: When Failure Isn't an Option [slides]
Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site is experiencing daily due to its particular way of selling their inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.
via Filippo
(tags: via:filippo database architecture nosql data voldemort gilt-groupe ops storage presentations)
The full timeline of Savita Halappanavar's mistreatment
a comment on Dr. Jen Gunter's blog puts it all together
(tags: timeline savita abortion malpractice ireland medicine fail)
-
No holds barred:
Speaking today, spokesman Charles Stanley-Smith said; "This idea is insane. This area has suffered from dumping due to a lack of enforcement - yet the council now propose to effectively withdraw services altogether. As numerous studies such as 'the broken window hypothesis' indicate, where a small problem is left un-tackled it is likely to become far worse rather than better. In other words, rather than increase enforcement to solve the problem, Dublin City Council is going to remove enforcement. How will this deal with the problem? Imagine if that logic were applied to crime; would the removal of police services in an area help resolve criminal behaviour - or increase it? The answer is obvious."
(tags: an-taisce environment cleaning dublin ireland dcc rubbish trash society d1)
-
Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as java.net.HttpURLConnection, Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android's GSON libraries for JSON.
Not quite as simple an API as Python's requests, sadly, but still an improvement on the verbose Apache HttpComponents API. Good support for unit testing via a built-in mock-response class. Still in beta.
(tags: google beta software http libraries json xml transports protocols)
Former IMF chief of mission to Ireland says not burning the bondholders was "a mistake"
Former IMF chief of mission to Ireland, Ashoka Mody, above left with Ajai Chopra in 2010. Melancholy of eye and large of loafer, Ashoka was involved in negotiating Ireland’s EU/IMF bailout. [...] This morning Ashok gave an interview to Gavin Jennings on Morning Ireland, in which he admitted Ireland’s bailout was riddled with mistakes, namely the non-burning of the senior bondholders and the program of austerity. Jennings: “So, if imposing austerity on Ireland was wrong, or a mistake; if not allowing any burning of bondholders, whether official, sovereign or private was a mistake; you were centrally involved in that program. I know Ajai Chopra was very much the public face of the IMF mission to Ireland. But you were centrally involved in constructing this bailout. How much responsibility do you take for those errors.” Mody: “Yes, so, obviously, I have to take the responsibility in…but I’m in very good company in taking responsibility in this. There were many parties involved. And my role really was to bring such matters to the attention of people who finally made these decisions.”
Great.
(tags: bondholders imf ireland economy default ajai-chopra ashoka-mody)
Savita Halappanavar’s inquest: the three questions that must be answered | Dr. Jen Gunter
A professional OB/GYN analyses the horrors coming to light in the Savita inquest. Here's one particular gem:
Fetal survival with ruptured membranes at 17 weeks is 0%, this is from prospective study. [...but] “real and substantial risk” to the woman’s life is what is required by the Irish constitution to terminate a pregnancy, *whether or not the foetus is viable*.
So the foetus had 0% chance of survival -- but still termination was not considered an option. Bloody hell.
(tags: religion ireland savita horrors malpractice galway guh hospitals hse health inquest abortion pro-choice pregnancy)
Minister Rabbitte welcomes EU agreement on re-use of Public Sector Information
Lots of talk about "charging regimes", "income-generating public sector bodies" etc., but not a single mention of open data or free access. Terrible stuff. :( (via conoro)
(tags: via:conoro open-access government public-sector ireland eu open-data public free)
Compression in Kafka: GZIP or Snappy ?
With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, decompressed and compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP. No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The higher compression savings in this test are due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
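A quick sanity check of the "N% higher" figures quoted above, as a small sketch. The throughput numbers are the ones from the excerpt; the `percent_higher` helper is just an illustrative name.

```python
def percent_higher(new, baseline):
    """Return how much higher `new` is than `baseline`, as a percentage."""
    return (new - baseline) / baseline * 100.0


# Producer throughput in MB/s, as quoted in the excerpt.
with_ack = percent_higher(22.3, 8.9)     # Snappy vs GZIP, waiting for the ack
no_ack = percent_higher(60.8, 18.5)      # Snappy vs GZIP, fire-and-forget

print(f"with ack: Snappy is {with_ack:.1f}% higher than GZIP")
print(f"no ack:   Snappy is {no_ack:.1f}% higher than GZIP")
```

Both quoted percentages hold up once truncated to whole numbers, so "150% higher" means roughly 2.5x the GZIP throughput, not 1.5x.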
The Bw-Tree: A B-tree for New Hardware - Microsoft Research
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.
(tags: bw-trees database paper toread research algorithms microsoft sql sql-server b-trees data-structures storage cache-friendly mechanical-sympathy)
Boundary Techtalk - Large-scale OLAP with Kobayashi
Boundary on their TSD-on-Riak store.
Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false-starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database. The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.
(tags: video boundary tsd riak eventual-consistency storage kobayashi olap time-series)
-
A few days old, but already an instant Streisand-Effect classic:
Sometimes people borrow [Colin Purrington's free guide about making scientific posters] without giving him credit. This happens fairly regularly, and when he finds out about it, he sends an e-mail asking them to take it down. Usually they do. But when he sent an e-mail to the Consortium for Plant Biotechnology Research, asking that a roughly 1,200-word, near-verbatim, uncredited chunk from his guide be removed from the consortium’s materials, the response was unexpected. Rather than apologise, a lawyer sent him a cease-and-desist letter accusing him of plagiarizing the consortium’s materials and demanding that he take down his guide or face a lawsuit seeking damages up to $150,000.
(tags: streisand-effect lawsuits law infringement copyright cpbr bullying science posters)
Kafka 0.8 Producer Performance
Great benchmarking from Piotr Kozikowski at the LiveRamp team, into performance of the upcoming Kafka 0.8 release
(tags: performance kafka apache benchmarks ops queueing)
Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node
an excellent writeup on Kafka 0.8's use and operation, including details of the new replication features
(tags: kafka replication queueing distributed ops)
-
'A €10 silver coin being offered for sale to the public in honour of James Joyce by the Central Bank tomorrow contains a misquote from the author. The line used on the coin from Chapter 3 of Ulysses includes a superfluous conjunction – a rogue ‘that’.' [..] The coin reads:
“Ineluctable modality of the visible: at least that if no more, thought through my eyes. Signatures of all things *that* I am here to read.”
(Incorrect 'that' emphasised)
(tags: for:robotwisdom james-joyce typos funny fail central-bank ireland coins minting errors ulysses)
Netflix ISP Speed Index for Ireland
Via Mulley. Magnet doing well, with UPC coming second; UPC have dropped a fair bit in the past month. Would love to see it broken down by region...
(tags: upc ireland isps speed bandwidth netflix broadband magnet eircom)
Why I'm Walking Away From CouchDB
In practice there are two gotchas that are so painful I am looking for a replacement with a different featureset than couchdb provides. The location tracking project icecondor.com uses couchdb to store 20,000 new records per day. It has more write traffic than read traffic and runs on modest hardware. Those two gotchas are: 1. View Index updates. While I have a vague understanding of why view index updates are slow and bulky and important, in practice it is unworkable. Every write sets up a trap for the first reader to come along after the write. The more writes there are, the bigger the trap for the first reader which has to wait on the couchdb process that refreshes the view index on an as-needed basis. I believe this trade-off was made to keep writes fast. No need to update the view index until all writes are actually complete, right? Write traffic is heavier than read traffic and the time needed for that index refresh causes the webapp to crash because it's not set up to handle timeouts from a database query. The workaround is as hackish as one can imagine - cron jobs to hit every map/reduce query to keep indexes fresh. 2. Append only database file. Append only is in theory a great way to ensure on-disk reliability. A system crash during an append should only affect that append. It's a crash during an update to existing parts of the file that risks the integrity of more than what's being updated. With so many layers of caching and optimizations in the kernel and the filesystem and now in the workings of SSD drives, I'm not sure append-only gives extra protection anymore. What it does do is create a huge operational headache. The on-disk file can never grow beyond half the available storage space. Record deletion uses new disk space and if the half-full mark approaches, vacuuming must be done. The entire database is rewritten to the filesystem, leaving out no longer needed records.
If the data file should happen to grow beyond half the partition, the system has essentially crashed because there is no way to compact the file and soon the partition will be full. This is a likely scenario when there is a lot of record deletion activity. The system in question does a lot of writes of temporary data that is followed up by deletes a few days later. There is also a lot of permanent storage that hardly gets used. Rewriting every byte of the records that are long-lived due to compaction is an enormous amount of wasted I/O - doubly so given SSD drives have a short write-cycle lifespan.
(tags: nosql couchdb consistency checkpointing databases data-stores indexing)
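For the curious, the "cron job to hit every map/reduce query" workaround might look roughly like the sketch below. It assumes CouchDB's standard view URL layout (`/db/_design/ddoc/_view/name`) and that a `limit=0` query is enough to force the index refresh without shipping rows back; the host, database, design document and view names are all hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen


def view_url(base, db, design_doc, view):
    """Build a view query URL; limit=0 asks for no rows back."""
    return "{}/{}/_design/{}/_view/{}?limit=0".format(
        base.rstrip("/"),
        quote(db, safe=""),
        quote(design_doc, safe=""),
        quote(view, safe=""),
    )


def warm_views(base, db, views, timeout=300):
    """Query each (design_doc, view) pair so a later reader finds a fresh index."""
    for design_doc, view in views:
        with urlopen(view_url(base, db, design_doc, view), timeout=timeout) as resp:
            resp.read()  # discard the (empty) result; the side effect is what matters


if __name__ == "__main__":
    # Hypothetical names; run this from cron every few minutes.
    warm_views("http://localhost:5984", "icecondor", [("locations", "by_date")])
```

The long timeout is deliberate: the whole point is that the warming job, rather than a user-facing request, eats the wait.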
CouchDB: not drinking the kool-aid
Jonathan Ellis on some CouchDB negatives:
Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project: Writes are serialized. Not serialized as in the isolation level, serialized as in there can only be one write active at a time. Want to spread writes across multiple disks? Sorry. CouchDB uses a MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes. Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less. CouchDB is simple. Gloriously simple. Why is that a negative? It's competing with systems (in the popular imagination, if not in its author's mind) that have been maturing for years. The reason PostgreSQL et al have those features is because people want them. And if you don't, you should at least ask a DBA with a few years of non-MySQL experience what you'll be missing. The majority of CouchDB fans don't appear to really understand what a good relational database gives them, just as a lot of PHP programmers don't get what the big deal is with namespaces. A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce. MapReduce is a great approach to trivially parallelizing certain classes of problem. The problem is, it's tedious and error-prone to write raw MapReduce code. This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively). Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages. It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.
(tags: cassandra couch nosql storage distributed databases consistency)
What is the CouchDB replication protocol? Is it like Git? - Stack Overflow
Good write up of CouchDB replication
(tags: protocols couchdb sync replication git mvcc databases merging timelines)
TouchDB's reverse-engineered write-up of the Couch replication protocol
There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.
(tags: couchdb protocols touchdb nosql replication sync mvcc revisions rest)
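As a rough mental model of "replication is just the REST API", here is a toy in-memory sketch of one push-replication pass. It is not a CouchDB client: `ToyDatabase` and `replicate` are made-up names, and the three methods only stand in for the roles played by the real `_changes`, `_revs_diff` and `_bulk_docs` endpoints. Revision trees, conflicts and attachments are ignored entirely.

```python
class ToyDatabase:
    """In-memory stand-in for a database; not a CouchDB client."""

    def __init__(self):
        self.revisions = {}   # doc_id -> {rev_id: body}
        self.changes = []     # append-only list of (seq, doc_id, rev_id)

    def put(self, doc_id, rev_id, body):
        """Store a revision under the given ID, once."""
        doc_revs = self.revisions.setdefault(doc_id, {})
        if rev_id not in doc_revs:
            doc_revs[rev_id] = body
            self.changes.append((len(self.changes) + 1, doc_id, rev_id))

    def changes_since(self, seq):
        """Plays the role of the _changes feed."""
        return [c for c in self.changes if c[0] > seq]

    def revs_diff(self, candidates):
        """Plays the role of _revs_diff: which offered revisions are missing here?"""
        return [(d, r) for d, r in candidates if r not in self.revisions.get(d, {})]


def replicate(source, target, checkpoint=0):
    """Copy revisions the target lacks; return the new checkpoint sequence."""
    changes = source.changes_since(checkpoint)
    missing = target.revs_diff([(d, r) for _, d, r in changes])
    for doc_id, rev_id in missing:
        # Plays the role of a _bulk_docs upload that keeps the source's revision IDs.
        target.put(doc_id, rev_id, source.revisions[doc_id][rev_id])
    return changes[-1][0] if changes else checkpoint


a, b = ToyDatabase(), ToyDatabase()
a.put("doc1", "1-abc", {"n": 1})
a.put("doc1", "2-def", {"n": 2})
checkpoint = replicate(a, b)
checkpoint = replicate(a, b, checkpoint)   # second pass: nothing new to copy
```

The checkpoint is what makes repeated passes cheap: the second call only looks at changes after the last sequence it saw.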
-
A good writeup of how to detect cases of copyright infringement for photography, art and other visual media.
Von Glitschka, Modern Dog and myriad others make clear that the support of the creative community is absolutely vital in raising awareness of copyright infringements. Sites like www.youthoughtwewouldntnotice.com name and shame clear breaches of copyright, while the Modern Dog case shows that there is no better IP tracing system than the eyes and ears of the design community itself. “It’s the industry at large that has kept me aware of infringements,” states Von. “Without that I would miss most of them because I don’t go looking – they find me via the eyes of others.”
(tags: photography art visual-media copyright infringement piracy ripping)
FastBit: An Efficient Compressed Bitmap Index Technology
an [LGPL] open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user's data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools. The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record. A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps.[...] Another innovation in FastBit is the multi-level bitmap encoding methods.
(tags: fastbit nosql algorithms indexing search compressed-bitmaps indexes wah bitmaps compression)
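A minimal sketch of the basic (uncompressed) bitmap-index idea, using Python ints as bitmaps: one bitmap per distinct column value, with queries answered by bitwise AND/OR. This is not FastBit's API or its WAH encoding; the column data and helper names are invented for illustration.

```python
from collections import defaultdict


def build_bitmap_index(values):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(values):
        index[value] |= 1 << row
    return index


def rows_of(bitmap):
    """Expand a bitmap back into the list of matching row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Two hypothetical columns of a six-row table.
colour = ["red", "blue", "red", "green", "blue", "red"]
size = ["S", "M", "M", "S", "M", "L"]

colour_idx = build_bitmap_index(colour)
size_idx = build_bitmap_index(size)

# WHERE colour = 'red' AND size = 'M'
red_and_medium = rows_of(colour_idx["red"] & size_idx["M"])
# WHERE colour = 'blue' OR colour = 'green'
blue_or_green = rows_of(colour_idx["blue"] | colour_idx["green"])
```

The update cost mentioned in the excerpt shows up here too: changing one row means touching the bitmaps for both its old and new value.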
-
The bit array data structure is implemented in Java as the BitSet class. Unfortunately, this fails to scale without compression. JavaEWAH is a word-aligned compressed variant of the Java bitset class. It uses a 64-bit run-length encoding (RLE) compression scheme. We trade-off some compression for better processing speed. We also have a 32-bit version which compresses better, but is not as fast. In general, the goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (as implemented in the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.
(tags: javaewah wah rle compression bitmaps bitmap-indexes bitset algorithms data-structures)
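To make "word-aligned run-length encoding" a bit more concrete, here is a simplified sketch: runs of all-zero or all-one 64-bit words collapse into a single token, and any mixed word is kept verbatim. This is only the general idea; it is not the actual EWAH word layout, and the token format is an assumption made up for the example.

```python
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1


def compress(words):
    """Collapse runs of all-zero / all-one words; keep other words verbatim."""
    out = []
    for w in words:
        if w in (0, ALL_ONES):
            if out and out[-1][0] == "run" and out[-1][1] == w:
                # Extend the current run of identical fill words.
                out[-1] = ("run", w, out[-1][2] + 1)
            else:
                out.append(("run", w, 1))
        else:
            out.append(("literal", w))
    return out


def decompress(tokens):
    """Inverse of compress(): expand runs back into individual words."""
    words = []
    for token in tokens:
        if token[0] == "run":
            words.extend([token[1]] * token[2])
        else:
            words.append(token[1])
    return words


# A sparse bitmap: long empty stretches around a few interesting words.
words = [0] * 1000 + [0b1011] + [ALL_ONES] * 3 + [0] * 500
tokens = compress(words)
print(len(words), "words ->", len(tokens), "tokens")
```

Because everything stays word-aligned, bitwise operations can skip over whole runs at once, which is the CPU-over-storage trade-off the excerpt describes.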
Measure Anything, Measure Everything « Code as Craft
the classic Etsy pro-metrics "measure everything" post. Some good basic rules and mindset
Testing Your Automation [slides]
Test-driven infrastructure, using Chef -- slides from Big Ruby 2013. Tools used: foodcritic (lol), Chefspec, minitest-chef-handler, fauxhai, cucumber chef. This is really good to see -- TDD applied to ops. Video at: http://confreaks.com/videos/2309-bigruby2013-testing-your-automation-ttd-for-chef-cookbooks
(tags: devops ops chef automation testing tdd infrastructure provisioning deployment)
Meet the nice-guy lawyers who want $1,000 per worker for using scanners | Ars Technica
Great investigative journalism, interviewing the legal team behind the current big patent-troll shakedown: the one over scanning documents with a button press, using a scanner attached to a network. They express whole-hearted belief in the legality of their actions, unsurprisingly -- they're exactly what you think they'd be like (via Nelson)
(tags: via:nelson ethics business legal patents swpats patent-trolls texas shakedown)
[#HADOOP-9448] Reimplement things - ASF JIRA
Pretty good April Fools from this year -- a patch to delete the entirety of Hadoop's codebase:
To avoid any bias to the existing code and make the same mistakes we should just delete trunk completely. Attached it is a script that deletes everything.
(tags: hadoop april-fools asf patches open-source oss)
Lucas Nussbaum’s Blog » Blog Archive » RVM: seriously?
+1. RVM is atrocious code -- some of the worst bash script I've seen. And it's not just installing as a command, it requires that it be sourced and hooks into your login shell. If you then use "set -e", it crashes; "set -u", it crashes; reset $HOME, crash. It's dire.
-
Next April 11th, at the IIEA in North Gt Georges St:
Rick Falkvinge, founder of the Swedish Pirate Party, will examine the case for reform of copyright and patent law in the EU. Legalised file sharing, free sampling and shortened copyright protection times are the main elements of a proposal co-authored by Mr. Falkvinge which was submitted to the European Parliament in 2012. He will question whether, in the context of ever-increasing online activity, existing legal frameworks pose a threat to users’ civil liberties.
(tags: rick-falkvinge pirate-party ireland iiea dublin copyright patents filesharing)
High Performance MongoDB Clusters with Amazon EBS Provisioned IOPS
yeah yeah, Mongo. bookmarking for the good data on EBS+PIOPS
(tags: ebs piops aws performance tips ops ec2 mongodb presentations)
-
These notes are intended to help users and system administrators maximize TCP/IP performance on their computer systems. They summarize all of the end-system (computer system) network tuning issues including a tutorial on TCP tuning, easy configuration checks for non-experts, and a repository of operating system specific instructions for getting the best possible network performance on these platforms.
Some tips for maximizing HPC network performance for the intra-DC case; recommended by the LinkedIn Kafka operations page.
(tags: tuning network tcp sysadmin performance ops kafka ec2)
Increasing EBS Performance - Amazon Elastic Compute Cloud
good docs from EC2
(tags: ec2 ebs performance piops docs)
-
an open source virtualized Ethernet networking stack. I am developing Snabb Switch in response to several exciting trends: x86 has risen to be a powerful networking platform. Virtualization and SDN are pulling more networking into servers. Optimized user-space software is out-performing kernel-space software. Snabb Switch's simple and fast software-only data plane makes developing networking software easier than ever before.
Written in LuaJIT but aiming to be very fast. cool stuff, worth watching
(tags: sdn software networking emulation snabb-switch luajit lua virtualization)
Abusing hash kernels for wildly unprincipled machine learning
what, is this the first time our spam filtering approach of hashing a giant feature space is hitting mainstream machine learning? that can't be right!
(tags: ai machine-learning python data hashing features feature-selection anti-spam spamassassin)
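A minimal sketch of the "hash a giant feature space" idea, for reference. It assumes plain string tokens and uses MD5 purely as a convenient stable hash; the function name and bucket count are made up for illustration and are not taken from the linked post or from any particular spam filter.

```python
import hashlib

def hash_features(tokens, n_buckets=2 ** 18):
    """Map tokens to a sparse {bucket_index: count} vector (the hashing trick).

    No vocabulary is stored: each token is hashed straight to a bucket, so
    unseen tokens need no special handling. Distinct tokens may collide in
    the same bucket; that is the accepted trade-off.
    """
    vec = {}
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        # Use the first 8 bytes of the digest as an integer, then fold it
        # into the fixed-size bucket range.
        idx = int.from_bytes(digest[:8], "big") % n_buckets
        vec[idx] = vec.get(idx, 0) + 1
    return vec

if __name__ == "__main__":
    print(hash_features("buy cheap pills buy now".split(), n_buckets=1024))
```

The resulting vector has a fixed dimensionality no matter how many distinct tokens show up, which is what makes it workable for a huge, open-ended feature space.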
-
Joel On Software weighs in (via Tony Finch):
The fastest growing industry in the US right now, even during this time of slow economic growth, is probably the patent troll protection racket industry.
(tags: joel-on-software patents swpats shakedown extortion us-politics patent-trolls via:fanf)
-
Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap’n Proto is INFINITY TIMES faster than Protocol Buffers.
Basically, marshalling like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but responds to this by suggesting compression (which is a fair point tbh). C++-only for now. I'm not seeing the same kind of support for optional data that protobufs has though. Overall I'm worried there's some useful features being omitted here...
(tags: serialization formats protobufs capn-proto protocols coding c++ rpc qnx messaging compression compatibility interoperability i14y)
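To illustrate the "aligned struct on the wire, then compress" trade-off, here is a rough sketch using Python's struct and zlib modules. The record layout (two uint32 fields and a float64) is hypothetical and this is not Cap'n Proto's actual wire format; it only shows why fixed-width encoding is cheap to read and write but bulky, and why general-purpose compression claws much of the space back.

```python
import struct
import zlib

# Hypothetical fixed layout: uint32 id, uint32 flags, float64 score.
# "<" means little-endian with standard sizes, so every record is 16 bytes.
RECORD = struct.Struct("<IId")

def encode(record_id, flags, score):
    """Pack one record into its fixed 16-byte form (no per-field parsing)."""
    return RECORD.pack(record_id, flags, score)

def decode(buf, offset=0):
    """Read one record back from a buffer at the given byte offset."""
    return RECORD.unpack_from(buf, offset)

if __name__ == "__main__":
    # Small values still occupy full-width fields, so the blob is mostly zeros.
    blob = b"".join(encode(i, 0, 0.0) for i in range(1000))
    print(len(blob), len(zlib.compress(blob)))
    print(decode(blob, 16 * 7))
```

Because every record is the same size, record N can be read directly at offset 16 * N without decoding anything before it.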
CRDTs - Commutative Replicated Data Types [pdf]
Shared read-only data is easy to scale by using well-understood replication techniques. However, sharing mutable data at a large scale is a difficult problem, because of the CAP impossibility result [5]. Two approaches dominate in practice. One ensures scalability by giving up consistency guarantees, for instance using the Last-Writer-Wins (LWW) approach [7]. The alternative guarantees consistency by serialising all updates, which does not scale beyond a small cluster [12]. Optimistic replication allows replicas to diverge, eventually resolving conflicts either by LWW-like methods or by serialisation [11]. In some (limited) cases, a radical simplification is possible. If concurrent updates to some datum commute, and all of its replicas execute all updates in causal order, then the replicas converge. We call this a Commutative Replicated Data Type (CRDT). The CRDT approach ensures that there are no conflicts, hence, no need for consensus-based concurrency control. CRDTs are not a universal solution, but, perhaps surprisingly, we were able to design highly useful CRDTs. This new research direction is promising as it ensures consistency in the large scale at a low cost, at least for some applications.
(tags: consistency algorithms concurrency crdts distcomp data)
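For a concrete feel of how replicas can converge without coordination, here is a small sketch of a grow-only counter. Note the assumption: this is the state-based (merge) flavour of CRDT rather than the operation-based, commutative-update design the abstract describes, and the class and method names are invented for illustration.

```python
class GCounter:
    """Grow-only counter: each replica only ever bumps its own slot."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments seen from that replica

    def increment(self, n=1):
        """Record n local increments against this replica's own slot."""
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + n

    def value(self):
        """Total across all replicas' slots."""
        return sum(self.counts.values())

    def merge(self, other):
        """Take the per-replica maximum.

        Max is commutative, associative and idempotent, so merges can arrive
        in any order, or more than once, and replicas still converge.
        """
        for rid, count in other.counts.items():
            self.counts[rid] = max(self.counts.get(rid, 0), count)

if __name__ == "__main__":
    a, b = GCounter("a"), GCounter("b")
    a.increment(2)
    b.increment(3)
    a.merge(b)
    b.merge(a)
    print(a.value(), b.value())
```

Both replicas end up with the same total, and repeating a merge changes nothing, so no consensus round is needed to resolve the concurrent increments.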
-
'The CRDT toolbox provides a collection of basic Conflict-free replicated data types as well as a common interface for defining your own CRDTs'. - in Eric Moritz' github. Also includes some more links to CRDT background reading.
(tags: crdt github eric-moritz python algorithms)
Eventually-Consistent Data Structures [slides]
implementing CRDTs in Riak and Voldemort
(tags: crdt algorithms distcomp riak voldemort distributed)
-
What do you get if you take one accountant with "a fondness for spreadsheets, finance and business" and mix with "a life-long passion for video games"? Well it's obvious isn't it? A turn-based RPG made and played entirely in Microsoft Excel.
(via Paul Moloney)
(tags: via:oceanclub arena.xlsm excel spreadsheets games gaming rpg)
serverspec - unit tests for servers
With serverspec, you can write RSpec tests for checking your servers are provisioned correctly. Serverspec tests your servers' actual state through SSH access, so you don't need to install any agent softwares on your servers and can use any provisioning tools, Puppet, Chef, CFEngine and so on.
(via Dave Doran)
(tags: via:dave-doran puppet testing chef cfengine unit-testing ops provisioning serverspec rspec ruby)
joshua's blog: overclocking the lecture
Joshua's old tip on watching videos at 2x speed using Perian
(tags: quicktime video hacks mac speed lectures presentations learning)
-
This seems pretty significant. Is the tide turning in the Texas Eastern District against patent trolls, at last? And does it establish sufficient precedent?
A federal judge has thrown out a patent claim against Rackspace, ruling that mathematical algorithms can’t be patented. The ruling in the Eastern District stemmed from a 2012 complaint filed by Uniloc USA asserting that processing of floating point numbers by the Linux operating system was a patent violation. Chief Judge Leonard Davis based the ruling on U.S. Supreme Court case law that prohibits the patenting of mathematical algorithms. According to Rackspace, this is the first reported instance in which the Eastern District of Texas has granted an early motion to dismiss finding a patent invalid because it claimed unpatentable subject matter. Red Hat, which supplies Linux to Rackspace, provided Rackspace’s defense. Red Hat has a policy of standing behind customers through its Open Source Assurance program.
See https://news.ycombinator.com/item?id=5455869 for more discussion.
(tags: east-texas patents swpats maths patenting law judges rackspace linux red-hat uniloc-usa floating-point)
Introducing Chronos: A Replacement for Cron
A distributed, fault-tolerant "cron" is something which comes up frequently -- it makes for a great fault-tolerance building block. This one sounds like it's too closely tied into Mesos, though (IMO).
Chronos is our replacement for cron. It is a distributed and fault-tolerant scheduler which runs on top of Mesos. It's a framework and supports custom mesos executors as well as the default command executor. Thus by default, Chronos executes SH (on most systems BASH) scripts. Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transferring files and executing them on a remote machine in the background and using asynchronous callbacks to notify Chronos of job completion or failures.
(tags: cron scheduling mesos stacks design airbnb chronos fault-tolerance distcomp distributed-computing scripts jobs)
One of CloudFlare's upstream providers on the "death of the internet" scare-mongering
Having a bad day on the Internet is nothing new. These are the types of events we deal with on a regular basis, and most large network operators are very good at responding quickly to deal with situations like this. In our case, we worked with Cloudflare to quickly identify the attack profile, rolled out global filters on our network to limit the attack traffic without adversely impacting legitimate users, and worked with our other partner networks (like NTT) to do the same. If the attacks had stopped here, nobody in the "mainstream media" would have noticed, and it would have been just another fun day for a few geeks on the Internet. The next part is where things got interesting, and is the part that nobody outside of extremely technical circles has actually bothered to try and understand yet. After attacking Cloudflare and their upstream Internet providers directly stopped having the desired effect, the attackers turned to any other interconnection point they could find, and stumbled upon Internet Exchange Points like LINX (in London), AMS-IX (in Amsterdam), and DE-CIX (in Frankfurt), three of the largest IXPs in the world. An IXP is an "interconnection fabric", or essentially just a large switched LAN, which acts as a common meeting point for different networks to connect and exchange traffic with each other. One downside to the way this architecture works is that there is a single big IP block used at each of these IXPs, where every network who interconnects is given 1 IP address, and this IP block CAN be globally routable. When the attackers stumbled upon this, probably by accident, it resulted in a lot of bogus traffic being injected into the IXP fabrics in an unusual way, until the IXP operators were able to work with everyone to make certain the IXP IP blocks weren't being globally re-advertised. Note that the vast majority of global Internet traffic does NOT travel over IXPs, but rather goes via direct private interconnections between specific networks.
The IXP traffic represents more of the "long tail" of Internet traffic exchange, a larger number of smaller networks, which collectively still adds up to be a pretty big chunk of traffic. So, what you actually saw in this attack was a larger number of smaller networks being affected by something which was a completely unrelated and unintended side-effect of the actual attacks, and thus *poof* you have the recipe for a lot of people talking about it. :) Hopefully that clears up a bit of the situation.
(tags: bandwidth internet gizmodo traffic cloudflare ddos hacking)
21 graphs that show America’s health-care prices are ludicrous
Excellent data, this. I'd heard a few of these prices, but these graphs really hit home. $26k for a caesarean section at the 95th percentile!? talk about out of control price gouging.
(tags: healthcare costs economics us-politics world comparison graphs charts data via:hn america)
Design for developers [presentation]
A nice set of practical web/UI/typography design guidelines, naming specific sources (via Rob C)
-
'13 Security Gotchas You Should Know About'