“Clickwrap” licensing established as legal in Irish court
“The evidence does establish that there is a practice in the airline and online travel agency sectors of contractually binding web users by click wrapping or browse wrapping, which practice is generally and regularly followed by the operators in those sectors. In reality, it is difficult to see how online trade could be carried on in the absence of those devices. As regards the third question which arises from the MSG decision, in this case it is whether the defendant was aware or is presumed to have been aware of the practice. The evidence before the Court, in my view, clearly demonstrates that the defendant was aware of the practice, it being a practice which is generally and regularly followed when making bookings with online travel agents and with airlines and which, in the words of the Court in the MSG case, may be regarded as being a consolidated practice. Accordingly, in my view, by application of Article 23(1)(c), the defendant is bound by the jurisdiction clause in the Terms of Use on the plaintiff’s website by its use, either through the medium of an automaton or a manual operator or a third party data provider, of the website.”
(via Rossa McMahon)
Justin's Linklog Posts
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observables. The Observable data type can be thought of as a “push” equivalent to Iterable which is “pull”. With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
(tags: concurrency java jvm threads thread-safety coding rx frp fp functional-programming reactive functional async observable)
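To make the pull/push distinction in the quote concrete, here is a minimal sketch in Python. It is not RxJava's API: `TinyObservable`, `from_values`, and the operator names are hypothetical stand-ins invented for illustration, and the sketch only covers the synchronous case.

```python
from typing import Callable, Iterable, List


def consume_by_pulling(source: Iterable[int]) -> List[int]:
    """Pull model: the consumer drives and asks the producer for each value."""
    received = []
    for value in source:  # each step waits until the producer hands over a value
        received.append(value)
    return received


class TinyObservable:
    """Hypothetical push-style source (an assumption, not RxJava's Observable)."""

    def __init__(self, subscribe_fn: Callable[[Callable[[int], None]], None]):
        self._subscribe_fn = subscribe_fn

    @staticmethod
    def from_values(values: Iterable[int]) -> "TinyObservable":
        def subscribe_fn(on_next: Callable[[int], None]) -> None:
            for value in values:
                on_next(value)  # push model: the producer calls the consumer
        return TinyObservable(subscribe_fn)

    def subscribe(self, on_next: Callable[[int], None]) -> None:
        self._subscribe_fn(on_next)

    def map(self, fn: Callable[[int], int]) -> "TinyObservable":
        # Compose a new source that transforms each value before pushing it on.
        return TinyObservable(
            lambda on_next: self.subscribe(lambda v: on_next(fn(v)))
        )

    def filter(self, predicate: Callable[[int], bool]) -> "TinyObservable":
        # Compose a new source that only pushes values matching the predicate.
        def subscribe_fn(on_next: Callable[[int], None]) -> None:
            self.subscribe(lambda v: on_next(v) if predicate(v) else None)
        return TinyObservable(subscribe_fn)


pulled = consume_by_pulling(iter([1, 2, 3]))

pushed: List[int] = []
(TinyObservable.from_values([1, 2, 3, 4])
    .filter(lambda v: v % 2 == 0)
    .map(lambda v: v * 10)
    .subscribe(pushed.append))
```

Nothing happens in the push chain until `subscribe` is called, which is what lets operators be composed first and executed later.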
You probably shouldn’t use a spreadsheet for important work
Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:
There are several critical problems with a tool like Excel that need to be widely known:
* Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically;
* Spreadsheets make code reviews impractical. To visually inspect the code, you need to click on each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake;
* Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?
Agreed on all three, particularly on the impossibility of testing. IMO, everyone in a job where automation via spreadsheet is likely needs training in SDE fundamentals: unit testing, the importance of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
(tags: spreadsheets excel coding errors bugs testability unit-testing testing quality sde sde-fundamentals dry)
Log4j2 Asynchronous Loggers for Low-Latency Logging – Apache Log4j 2
implemented using the LMAX Disruptor library — very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though
(tags: disruptor coding java log4j logging async performance)
-
Google Drive and GMail have a built-in scripting engine. I had no idea
(tags: gmail evernote archival scripting coding hacks google-drive)
-
How the Irish media are partly to blame for the catastrophic property bubble, from a paper entitled _The Role Of The Media In Propping Up Ireland’s Housing Bubble_, by Dr Julien Mercille, in the _Social Europe Journal_:
“The overall argument is that the Irish media are part and parcel of the political and corporate establishment, and as such the news they convey tend to reflect those sectors’ interests and views. In particular, the Celtic Tiger years involved the financialisation of the economy and a large property bubble, all of it wrapped in an implicit neoliberal ideology. The media, embedded within this particular political economy and itself a constitutive element of it, thus mostly presented stories sustaining it. In particular, news organisations acquired direct stakes in an inflated real estate market by purchasing property websites and receiving vital advertising revenue from the real estate sector. Moreover, a number of their board members were current or former high officials in the finance industry and government, including banks deeply involved in the bubble’s expansion.”
(tags: economics irish-times ireland newspapers media elite insiders bubble property-bubble property celtic-tiger papers news bias)
-
Ugh. low-end ISPs MITM’ing DNS queries:
Some ISPs are now using a technology called ‘Transparent DNS proxy’. Using this technology, they will intercept all DNS lookup requests (TCP/UDP port 53) and transparently proxy the results. This effectively forces you to use their DNS service for all DNS lookups. If you have changed your DNS settings to an open DNS service such as Google, Comodo or OpenDNS expecting that your DNS traffic is no longer being sent to your ISP’s DNS server, you may be surprised to find out that they are using transparent DNS proxying.
(via Nelson)
BitTorrent’s Secure Dropbox Alternative Goes Public
As kragen says, ‘a decentralized way to sync a folder of large files, using BitTorrent instead of an untrustworthy central server’. Windows, OSX, and Linux supported
(tags: bittorrent dropbox cloud storage filesharing sharing sync synchronization)
DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
250 million tweets per day, 30-node HBase cluster, 400TB of storage, Kafka and 0mq. This is from 2011, hence this dated line: ‘for a distributed application they thought AWS was too limited, especially in the network. AWS doesn’t do well when nodes are connected together and they need to talk to each other. Not low enough latency network. Their customers care about latency.’ (Nowadays, it would be damn hard to build a lower-latency network than that attached to a cc2.8xlarge instance.)
(tags: datasift architecture scalability data twitter firehose hbase kafka zeromq)
Breaking the 1000 ms Time to Glass Mobile Barrier [slides]
Great presentation from Google on HTML5 CSS+JS render speed, 3G/4G network latency, etc. (via John G)
(tags: google slides 3g 4g lte networking telcos telecom css js html5 web via:jg)
Lucene 4 – Revisiting Problems For Speed [slides]
a presentation from Simon Willnauer on optimization work performed on Lucene in 2011. The most interesting stuff here is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with an FSM trie, which is extremely cool: benchmarked at 214 times faster!
(tags: benchmarks slides lucene search fuzzy-matching text-matching strings algorithms coding fsm tries)
Microsoft Code Digger extension
Miguel de Icaza says it’s witchcraft — I’m inclined to agree:
Code Digger analyzes possible execution paths through your .NET code. The result is a table where each row shows a unique behavior of your code. The table helps you understand the behavior of the code, and it may also uncover hidden bugs. Through the new context menu item “Generate Inputs / Outputs Table” in the Visual Studio editor, you can invoke Code Digger to analyze your code. Code Digger computes and displays input-output pairs. Code Digger systematically hunts for bugs, exceptions, and assertion failures.
(tags: testing constraint-solving solver witchcraft magic dot-net coding tests code-digger microsoft)
Swansea measles outbreak: was an MMR scare in the local press to blame?
Sixteen years ago, journalists had a much easier job assembling “balanced” stories about MMR in south Wales. When I wrote about the measles outbreak last week, I suggested that it was related to Andrew Wakefield’s discredited 1998 Lancet research, but the Swansea contagion seems more likely to be the result of a separate scare a year earlier in the South Wales Evening Post. Before 1997, uptake of MMR in the distribution area of the Post was 91%, and 87.2% in the rest of Wales. After the Post’s campaign, uptake in the distribution area fell to 77.4% (it was 86.8% in the rest of Wales). That’s almost a 14% drop where the Post had influence, compared with less than 3% elsewhere. In the dry wording of the BMJ, “the [South Wales Evening Post] campaign is the most likely explanation”. In other words, what we can see in Swansea is the local effect of local reporting – in all probability, just a taster of what happens when the news irresponsibly creates unfounded terror. […] The 1997 coverage focused on a group of families who blamed MMR for various ailments in their children, including learning difficulties, digestive problems and autism – none of which have been found to have any connection with the vaccine. The Post’s coverage was at the time deemed a success, and in 1998 it won a prize for investigative reporting in the BT Wales Press Awards. That year, the SWEP ran at least 39 stories related to the alleged dangers of MMR. And yes, it’s true that the paper never directly endorsed non-vaccination. What it did do was publicise the idea of “vaccine damage” as a risk, one that parents would then likely weigh up against the risk of contracting measles, mumps or rubella. And this went beyond the reporting of parental anxieties – it was part of the Post’s editorial line. One article is entitled “Young bodies cannot take it”. The all-important “journalistic balance” was constantly available, thanks to campaigning parents and their solicitor Richard Barr.
(It was Barr who engaged Wakefield for a lawsuit, leading to the “fishing expedition” research that became the Lancet paper.) They were happy to provide a quote on the dangers of the “triple jab”, which health authorities were then obliged to rebut politely. The Post also seemed to downplay the risk of measles, reporting on 6 July 1998 that “not a single child has been hit by the illness – despite a 13% drop in take-up levels”. It’s not parents who should feel embarrassed by the Swansea measles outbreak: some may have acted from overt dread at the prospect of harming their child, and some simply from omission, but all were encouraged by a press that focused on non-existent risks and downplayed the genuine horror of the diseases MMR prevents. The shame belongs to journalists: those of the South Wales Evening Post who allowed themselves to be recruited in the service of a speculative lawsuit, and any who let a specious devotion to “balance” overrule a duty to tell the truth.
(tags: south-wales wales mmr health vaccination scares journalism ethics disease measles south-wales-evening-post)
-
mostly a DynamoDB puff-piece from last week’s Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)
(tags: dynamodb aws figures costs architecture ec2 dedupe cloud-connect slides)
Excel, untestability, and the reliability of quants
Wow, this is a great software-quality story — I knew Excel was the most widely used programming environment out there, but this is a factor I’d overlooked:
In his remarks on the final panel, Frank Partnoy mentioned something I missed when it came out a few weeks ago: the role of Microsoft Excel in the “London Whale” trading debacle. [..] To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz […] to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly, “After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR …” I write periodically about the perils of bad software in the business world in general and the financial industry in particular, by which I usually mean back-end enterprise software that is poorly designed, insufficiently tested, and dangerously error-prone. But this is something different. […] While Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets — badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way. 
This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. At the end of the day, it’s all software. While all software breaks occasionally, Excel spreadsheets break all the time. But they don’t tell you when they break: they just give you the wrong number.
(tags: excel reliability software coding ides jpmorgan value-at-risk finance london-whale quants spreadsheets unit-tests testability testing)
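The "divided by their sum instead of their average" bug quoted above is easy to reproduce. A minimal sketch, assuming the intended calculation was the change in rate relative to the average of the old and new rates; the function names and the sample rates are made up for illustration, not taken from the JPMorgan model.

```python
def relative_change_intended(old_rate: float, new_rate: float) -> float:
    """Change in rate divided by the AVERAGE of the two rates (the stated intent)."""
    return (new_rate - old_rate) / ((old_rate + new_rate) / 2)


def relative_change_buggy(old_rate: float, new_rate: float) -> float:
    """Change in rate divided by the SUM of the two rates (the spreadsheet error)."""
    return (new_rate - old_rate) / (old_rate + new_rate)


# Hypothetical rates, chosen only to show the effect.
old_rate, new_rate = 100.0, 110.0
intended = relative_change_intended(old_rate, new_rate)
buggy = relative_change_buggy(old_rate, new_rate)
ratio = intended / buggy
```

Since the sum is always twice the average, the buggy figure is half the intended one for any pair of rates, which matches the "factor of two" in the quote. A one-line unit test comparing the two would have caught it.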
Riak, CAP, and eventual consistency
Good (albeit draft) write-up of the implications of CAP, allow_mult, and last_write_wins conflict-resolution policies in Riak:
As Brewer’s CAP theorem established, distributed systems have to make hard choices. Network partition is inevitable. Hardware failure is inevitable. When a partition occurs, a well-behaved system must choose its behavior from a spectrum of options ranging from “stop accepting any writes until the outage is resolved” (thus maintaining absolute consistency) to “allow any writes and worry about consistency later” (to maximize availability). Riak leans toward the availability end of the spectrum, but allows the operator and even the developer to tune read and write requests to better meet the business needs for any given set of data.
(tags: riak cap eventual-consistency distcomp distributed-systems partition last-write-wins voldemort allow_mult)
How You Can Help Save Upcoming.org, Posterous, and More
Yahoo! sucks. shutting down in days? ArchiveTeam Warrior to the rescue; install the VM!
(tags: archival yahoo shutdowns upcoming waxy archives virtualbox)
The Excel Depression – NYTimes.com
Krugman on the Reinhart-Rogoff Excel-bug fiasco.
What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.
(tags: paul-krugman economics excel coding bugs software austerity debt)
Vaccination ‘herd immunity’ demonstration
‘Stochastic monte-carlo epidemic SIR model to reveal herd immunity’. Fantastic demo of this important medical concept (via Colin Whittaker)
(tags: via:colinwh stochastic herd-immunity random sir epidemics health immunity vaccination measles medicine monte-carlo-simulations simulations)
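For a feel of what a stochastic SIR simulation of this kind does, here is a rough sketch. It is not the demo's code: the contact model, the parameter values, and the assumption that vaccination gives perfect immunity are all simplifications made up for illustration.

```python
import random


def simulate_outbreak(population: int, vaccinated: int, contacts_per_day: int = 10,
                      transmission_prob: float = 0.2, infectious_days: int = 5,
                      seed: int = 0) -> int:
    """Return how many people were ever infected, including the index case."""
    if not 0 <= vaccinated < population:
        raise ValueError("need at least one unvaccinated person as the index case")
    rng = random.Random(seed)
    # People 0..vaccinated-1 start immune ("R"); everyone else is susceptible ("S").
    state = ["R"] * vaccinated + ["S"] * (population - vaccinated)
    state[vaccinated] = "I"  # the first unvaccinated person is the index case
    days_left = {vaccinated: infectious_days}
    ever_infected = 1
    while days_left:  # run until nobody is infectious any more
        newly_infected = []
        for person in list(days_left):
            for _ in range(contacts_per_day):
                contact = rng.randrange(population)  # uniformly random mixing
                if state[contact] == "S" and rng.random() < transmission_prob:
                    state[contact] = "I"
                    newly_infected.append(contact)
            days_left[person] -= 1
            if days_left[person] == 0:
                state[person] = "R"  # recovered and immune
                del days_left[person]
        for person in newly_infected:  # become infectious from the next day
            days_left[person] = infectious_days
            ever_infected += 1
    return ever_infected
```

Running it for the same population at a few different `vaccinated` counts (and several seeds) and comparing the outbreak sizes is the experiment the demo lets you do interactively.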
Fred’s ImageMagick Scripts: SIMILAR
compute an image-similarity metric, to discover mostly-identical-but-slightly-tweaked images:
SIMILAR computes the normalized cross correlation similarity metric between two equal dimensioned images. The normalized cross correlation metric measures how similar two images are, not how different they are. The range of ncc metric values is between 0 (dissimilar) and 1 (similar). If mode=g, then the two images will be converted to grayscale. If mode=rgb, then the two images first will be converted to colorspace=rgb. Next, the ncc similarity metric will be computed for each channel. Finally, they will be combined into an rms value.
(via Dan O’Neill)
(tags: image photos pictures similar imagemagick via:dano metrics similarity)
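A small sketch of the idea described in the quote, in pure Python. It is not the script's implementation: it uses the common zero-mean definition of normalized cross correlation (which can also go negative for anti-correlated images), treats each channel as a flat list of pixel values, and the function names are made up.

```python
import math
from typing import Sequence


def ncc(a: Sequence[float], b: Sequence[float]) -> float:
    """Zero-mean normalized cross correlation of two equal-sized pixel lists."""
    if len(a) != len(b) or not a:
        raise ValueError("images must be non-empty and have equal dimensions")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    if denominator == 0:
        raise ValueError("a completely flat image has no defined correlation")
    return sum(x * y for x, y in zip(da, db)) / denominator


def rms(values: Sequence[float]) -> float:
    """Root mean square, used here to combine per-channel scores into one number."""
    return math.sqrt(sum(v * v for v in values) / len(values))


# Hypothetical 2x2 grayscale images, flattened row by row.
original = [10, 20, 30, 40]
brightened = [20, 40, 60, 80]   # same picture, scaled brightness
tweaked = [10, 20, 30, 41]      # one pixel nudged

score_brightened = ncc(original, brightened)
score_tweaked = ncc(original, tweaked)
combined = rms([1.0, 1.0, 1.0])  # three perfectly matching channels
```

Because the metric normalizes away overall brightness and contrast, the brightened copy still scores as a match, while the nudged pixel pulls the score slightly below 1.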
-
a first-person game prototype in which players navigate a 3D space while picking up orbs that reduce the speed of light in increments. Custom-built, open-source relativistic graphics code allows the speed of light in the game to approach the player’s own maximum walking speed. Visual effects of special relativity gradually become apparent to the player, increasing the challenge of gameplay. These effects, rendered in realtime to vertex accuracy, include the Doppler effect (red- and blue-shifting of visible light, and the shifting of infrared and ultraviolet light into the visible spectrum); the searchlight effect (increased brightness in the direction of travel); time dilation (differences in the perceived passage of time from the player and the outside world); Lorentz transformation (warping of space at near-light speeds); and the runtime effect (the ability to see objects as they were in the past, due to the travel time of light). Players can choose to share their mastery and experience of the game through Twitter. A Slower Speed of Light combines accessible gameplay and a fantasy setting with theoretical and computational physics research to deliver an engaging and pedagogically rich experience.
Eventual Consistency Today: Limitations, Extensions, and Beyond – ACM Queue
Good overview of the current state of eventually-consistent data store research, covering CALM and CRDTs, from Peter Bailis and Ali Ghodsi
(tags: eventual-consistency data storage horizontal-scaling research distcomp distributed-systems via:martin-thompson crdts calm acid cap)
Latency’s Worst Nightmare: Performance Tuning Tips and Tricks [slides]
the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides
(tags: benchmarks aws ec2 ebs piops services scaling scalability presentations)
Rob “b3ta” Manuel in Dublin next week
The Bottom Half Of The Internet — “Racism; typos; filth; spam; ignorance; rage – that’s all the bottom half of the internet is good for, right? Rob Manuel wants you to question the internet dictum, most beloved of high-profile columnists, that you should ignore all of the comments all of the time. The ‘war on comments’, he reckons, might just be an echo of a fourth estate that’s having trouble adjusting to the idea of an unwashed public disagreeing with their sacred opinions. Sous les pavés, la plage.” On Tuesday, le cool Dublin & Pilcrow present SPIEL. Rob Manuel is the flashy animator behind B3ta and he’s joined by Ed Melvin, who wants to educate you on ‘The Unreal Engines’ of virtual currencies and economies.
(tags: rob-manuel b3ta dublin comments internet meetings talks lecool)
Reality, Reactivity, Relevance and Repeatability in Java Application Profiling
this product from JInspired appears to support runtime profiling of Java apps with < 5% performance impact
(tags: profiling performance java coding measurement)
You Lookin’ At Me? Reflections on Google Glass
ex-Nokia product design guru Jan Chipchase on Google Glass
(tags: google privacy technology google-glass pervasive-computing life future)
Not the ‘best in the world’ – The Medical Independent
Debunking this prolife talking point:
‘Our maternity services are amongst the best in the world’. This phrase has been much hackneyed since the heartbreaking death of Savita Halappanavar was revealed in mid October. James Reilly and other senior politicians are particularly guilty of citing this inaccurate position. So what is the state of Irish maternity services and how do our figures compare with other comparable countries? Let’s start with the statistics.
The bottom line: Eight deaths per 100,000 is not bad, but it ranks our maternity services far from the best in the world and below countries such as Slovakia and Poland.
(tags: pro-choice ireland savita medicine health maternity morbidity statistics)
How Kaggle Is Changing How We Work – Thomas Goetz – The Atlantic
Founded in 2010, Kaggle is an online platform for data-mining and predictive-modeling competitions. A company arranges with Kaggle to post a dump of data with a proposed problem, and the site’s community of computer scientists and mathematicians — known these days as data scientists — take on the task, posting proposed solutions. […] On one level, of course, Kaggle is just another spin on crowdsourcing, tapping the global brain to solve a big problem. That stuff has been around for a decade or more, at least back to Wikipedia (or farther back, Linux, etc). And companies like TaskRabbit and oDesk have thrown jobs to the crowd for several years. But I think Kaggle, and other online labor markets, represent more than that, and I’ll offer two arguments. First, Kaggle doesn’t incorporate work from all levels of proficiency, professionals to amateurs. Participants are experts, and they aren’t working for benevolent reasons alone: they want to win, and they want to get better to improve their chances of winning next time. Second, Kaggle doesn’t just create the incidental work product, it creates a new marketplace for work, a deeper disruption in a professional field. Unlike traditional temp labor, these aren’t bottom of the totem pole jobs. Kagglers are on top. And that disruption is what will kill Joy’s Law. Because here’s the thing: the Kaggle ranking has become an essential metric in the world of data science. Employers like American Express and the New York Times have begun listing a Kaggle rank as an essential qualification in their help wanted ads for data scientists. It’s not just a merit badge for the coders; it’s a more significant, more valuable, indicator of capability than our traditional benchmarks for proficiency or expertise. In other words, your Ivy League diploma and IBM resume don’t matter so much as my Kaggle score. 
It’s flipping the resume, where your work is measurable and metricized and your value in the marketplace is more valuable than the place you work.
(tags: academia datamining economics data kaggle data-science ranking work competition crowdsourcing contracting)
-
a good reference, with lots of sample output. Not clear if it takes 1.6/1.7 differences into account, though
Austerity policies founded on Excel typo
You’ve probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That’s all thanks to a study conducted by Carmen Reinhart and Kenneth Rogoff for their book This Time Is Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff” and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error. Read Mike Konczal for the whole rundown, but I’ll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly “the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent.”
(tags: austerity politics excel coding errors bugs spreadsheets economics economy)
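The AVERAGE(L30:L44) versus AVERAGE(L30:L49) slip can be mimicked in a few lines. This is only a sketch: the column values below are placeholders, not the Reinhart-Rogoff data, and `average_range` is a made-up helper standing in for the spreadsheet function.

```python
from typing import Dict


def average_range(column: Dict[int, float], first_row: int, last_row: int) -> float:
    """Mimic a spreadsheet AVERAGE over rows first_row..last_row, inclusive."""
    cells = [column[row] for row in range(first_row, last_row + 1)]
    return sum(cells) / len(cells)


# Placeholder figures: rows 30-44 hold 1.0, rows 45-49 hold 3.0.
column_l = {row: 1.0 for row in range(30, 45)}
column_l.update({row: 3.0 for row in range(45, 50)})

truncated = average_range(column_l, 30, 44)  # the range as typed
full = average_range(column_l, 30, 49)       # the range that was intended
rows_dropped = len(range(45, 50))
```

The truncated range silently drops five rows and still returns a perfectly plausible-looking number, which is exactly why this kind of error survives without tests or review.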
Is Your MySQL Buffer Pool Warm? Make It Sweat!
How Groupon are warming up a MySQL failover spare, using Percona stuff and a “tee” of the live in-flight queries. (via Dave Doran)
(tags: via:dave-doran mysql databases warm-spares spares failover groupon percona replication)
So now you know who gets some of those excessive Ticketmaster fees….
Interesting evidence; it appears Irish music promoters are getting “rebates” from the massive TicketMaster “booking fee”, on each ticket sold. This sounds like a cartel to me, and we need to regulate this. Where is the National Consumer Agency and Competition Authority?
The matter is something which should be of concern to every gig-going music fan, regardless of whether they go to Stradbally or not. For years, many have asked about TicketMaster’s quasi-monopoly position in the marketplace and why this is so. We’ve always been told that promoters preferred to deal with one company rather than several and that TM’s systems and nationwide reach yadda yadda yadda was the bees’ knees etc. Other companies have tried to compete but no-one has been able to beat TM at this game. But why would promoters go elsewhere when they’re getting a slice of the TM fees back as rebates? Those past off-the-record attempts by and briefings from promoters blaming TM for those fees can now be seen as hypocritical. They’re sticking with TM because they’re receiving a take of the fees paid by punters who have no other choice in service provider if they want to get their hands on tickets. You wonder what the acts make of this cash-grab – perhaps some whip-smart agent is already making a claim for a percentage of the rebates because there would be no rebates in the first place without the act. Surely this is an issue for the Competition Authority and National Consumers Association too, given the manner in which the rebates are made and TM’s deals with the promoters? While promoters under TM deals are free to sell a certain proportion of their tickets with another provider, it’s usually only a very small percentage of the total and unlikely to trouble TM’s bottom line. Also, given that the rebates are volume-driven, it’s better for the promoters to keep the largest possible chunk of their business with TM. It seems that we have a new suspect in the blame game about why ticket prices are so high.
(tags: regulation ireland cartels competition ticketing tickets ticketmaster music gigs consumer)
Blog shines spotlight on Dublin city’s illegal dumping problem
Hooray, Eoin’s activism gets some coverage!
THE SCALE OF Dublin’s dumping problem is laid bare in a blog that has seen contributors send in photos of chairs, fridges and heaps of rubbish strewn on city streets. Eoin Parker, one of the organisers behind DublinLitterBlog.com, spoke to TheJournal.ie about the problem, saying that the blog was set up following the privatisation of waste management by Dublin City Council in 2012.
(tags: dumping dublin litter rubbish blogs dcc d1 activism community)
-
To our knowledge, Ked is the first scripting language to emerge from The People’s Republic of Cork. Below is an account of what we know so far about the mysterious Corkonian language. Any suggested updates or contributions are encouraged.
Genius.
Just how bad are RTE’s finances?
A sobering examination by NAMAwinelake into the quagmire of Ireland’s publicly-funded national broadcaster:
It seems that RTE has become a disaster zone, with libels and incompetence overseen by incapable management, and this is reflected in that organisation’s financial results. RTE still employs nearly 2,000 people and supports jobs and industry across independent producers and suppliers; it is a major business. But the time has come to call a halt to delusional management that is sinking the organization deeper into a quagmire which will ultimately need to be bailed out by the State. And Noel Curran is fobbing us off with flying a kite about a reduction in 65-year old Pat Kenny’s salary from €630,000 to €570,000?!
(tags: rte namawinelake public funding finances money mismanagement ireland incompetence tv news)
High Scalability – Scaling Pinterest – From 0 to 10s of Billions of Page Views a Month in Two Years
wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:
a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.
yeah, so, eek ;)
(tags: clustering sharding architecture aws scalability scaling pinterest via:matt-sergeant redis mysql memcached)
Expert in Savita inquiry confirms Irish women get lower standard of care with chorioamnionitis
Dr. Jen Gunter again:
Dr. Knowles’ testimony confirms for me that the law played a role, because her statements indicate the standard of care for treatment of chorioamnionitis is less aggressive in Ireland. This can only be because of the law as there is no medical evidence to support delaying delivery when chorioamnionitis is diagnosed. Standard of care is not to wait until a woman is sick enough to need a termination, the idea is to treat her, you know, before she gets sick enough. An elevated white count and ruptured membranes at 17 weeks is typically enough to make the diagnosis, so Dr. Knowles needs to testify as to what in Savita’s medical record made it safe to not recommend a delivery. By the way, I also disagree with Dr. Knowles about her interpretation of Savita’s medical record, the chart doesn’t have “subtle indicators” of infection, it screams chorioamnionitis long before Wednesday morning. In North America the standard of care with chorioamnionitis is to recommend delivery as soon as the diagnosis is made, not wait until women enter the antechamber of death in the hopes that we can somehow snatch them back from the brink. If Irish law, or the interpretation thereof, had nothing to do with Savita’s death no expert would be mentioning sick enough at all.
(tags: jen-gunter ob-gyn medicine savita law ireland abortion tragedy galway hospital)
Boundary Product Update: Trends Dashboard Now Available
Boundary implement week-on-week trend display. Pity they use silly “giant number” dashboard boxes showing comparisons of the current datapoint with the previous week’s datapoint; there’s no indication of smoothing being applied, and “giant number” dashboards are basically useless anyway compared to a time-series graph, for unsmoothed time-series data. Also, no prediction bands. :(
(tags: boundary time-series tsd prediction metrics smoothing dataviz dashboards)
ESB Networks | Power Check | Service Interruptions Map
real-time service outage information on a map, from Ireland’s power network
Project Voldemort at Gilt Groupe: When Failure Isn’t an Option [slides]
Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site is experiencing daily due to its particular way of selling their inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.
via Filippo
(tags: via:filippo database architecture nosql data voldemort gilt-groupe ops storage presentations)
The full timeline of Savita Halappanavar’s mistreatment
a comment on Dr. Jen Gunter’s blog puts it all together
(tags: timeline savita abortion malpractice ireland medicine fail)
-
No holds barred:
Speaking today, spokesman Charles Stanley-Smith said; “This idea is insane. This area has suffered from dumping due to a lack of enforcement – yet the council now propose to effectively withdraw services altogether. As numerous studies such as ‘the broken window hypothesis’ indicate, where a small problem is left un-tackled it is likely to become far worse rather than better. In other words, rather than increase enforcement to solve the problem, Dublin City Council is going to remove enforcement. How will this deal with the problem? Imagine if that logic were applied to crime; would the removal of police services in an area help resolve criminal behaviour – or increase it? The answer is obvious.”
(tags: an-taisce environment cleaning dublin ireland dcc rubbish trash society d1)
-
Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as java.net.HttpURLConnection, Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android’s GSON libraries for JSON.
Not quite as simple an API as Python’s requests, sadly, but still an improvement on the verbose Apache HttpComponents API. Good support for unit testing via a built-in mock-response class. Still in beta.
(tags: google beta software http libraries json xml transports protocols)
Former IMF chief of mission to Ireland says not burning the bondholders was “a mistake”
Former IMF chief of mission to Ireland, Ashoka Mody, above left with Ajai Chopra in 2010. Melancholy of eye and large of loafer, Ashoka was involved in negotiating Ireland’s EU/IMF bailout. […] This morning Ashoka gave an interview to Gavin Jennings on Morning Ireland, in which he admitted Ireland’s bailout was riddled with mistakes, namely the non-burning of the senior bondholders and the program of austerity. Jennings: “So, if imposing austerity on Ireland was wrong, or a mistake; if not allowing any burning of bondholders, whether official, sovereign or private was a mistake; you were centrally involved in that program. I know Ajai Chopra was very much the public face of the IMF mission to Ireland. But you were centrally involved in constructing this bailout. How much responsibility do you take for those errors?” Mody: “Yes, so, obviously, I have to take the responsibility in…but I’m in very good company in taking responsibility in this. There were many parties involved. And my role really was to bring such matters to the attention of people who finally made these decisions.”
Great.
(tags: bondholders imf ireland economy default ajai-chopra ashoka-mody)
Savita Halappanavar’s inquest: the three questions that must be answered | Dr. Jen Gunter
A professional OB/GYN analyses the horrors coming to light in the Savita inquest. Here’s one particular gem:
Fetal survival with ruptured membranes at 17 weeks is 0%, this is from prospective study. […but] “real and substantial risk” to the woman’s life is what is required by the Irish constitution to terminate a pregnancy, *whether or not the foetus is viable*.
So the foetus had 0% chance of survival — but still termination was not considered an option. Bloody hell.
(tags: religion ireland savita horrors malpractice galway guh hospitals hse health inquest abortion pro-choice pregnancy)
Minister Rabbitte welcomes EU agreement on re-use of Public Sector Information
Lots of talk about “charging regimes”, “income-generating public sector bodies” etc., but not a single mention of open data or free access. Terrible stuff. :( (via conoro)
(tags: via:conoro open-access government public-sector ireland eu open-data public free)
Compression in Kafka: GZIP or Snappy ?
With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, decompressed and compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP. No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The higher compression savings in this test are due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
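The "N% higher" figures in that excerpt follow directly from the quoted throughputs. A quick sanity check in Python, using only the MB/s numbers quoted above (the helper name is mine):

```python
def percent_higher(new, baseline):
    """How much higher `new` is than `baseline`, as a percentage."""
    return (new / baseline - 1.0) * 100.0

# Throughput figures (MB/s) as quoted in the post: Snappy vs. GZIP.
with_ack = percent_higher(22.3, 8.9)
no_ack = percent_higher(60.8, 18.5)

print(f"with ack: Snappy is {with_ack:.1f}% higher than GZIP")
print(f"no ack:   Snappy is {no_ack:.1f}% higher than GZIP")
```

Both come out in line with the post's 150% and 228% (the latter truncated rather than rounded).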
The Bw-Tree: A B-tree for New Hardware – Microsoft Research
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.
(tags: bw-trees database paper toread research algorithms microsoft sql sql-server b-trees data-structures storage cache-friendly mechanical-sympathy)
Boundary Techtalk – Large-scale OLAP with Kobayashi
Boundary on their TSD-on-Riak store.
Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false-starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database. The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.
(tags: video boundary tsd riak eventual-consistency storage kobayashi olap time-series)
-
A few days old, but already an instant Streisand-Effect classic:
Sometimes people borrow [Colin Purrington’s free guide about making scientific posters] without giving him credit. This happens fairly regularly, and when he finds out about it, he sends an e-mail asking them to take it down. Usually they do. But when he sent an e-mail to the Consortium for Plant Biotechnology Research, asking that a roughly 1,200-word, near-verbatim, uncredited chunk from his guide be removed from the consortium’s materials, the response was unexpected. Rather than apologise, a lawyer sent him a cease-and-desist letter accusing him of plagiarizing the consortium’s materials and demanding that he take down his guide or face a lawsuit seeking damages up to $150,000.
(tags: streisand-effect lawsuits law infringement copyright cpbr bullying science posters)
Kafka 0.8 Producer Performance
Great benchmarking from Piotr Kozikowski at the LiveRamp team, into performance of the upcoming Kafka 0.8 release
(tags: performance kafka apache benchmarks ops queueing)
Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node
an excellent writeup on Kafka 0.8’s use and operation, including details of the new replication features
(tags: kafka replication queueing distributed ops)
-
‘A €10 silver coin being offered for sale to the public in honour of James Joyce by the Central Bank tomorrow contains a misquote from the author. The line used on the coin from Chapter 3 of Ulysses includes a superfluous conjunction – a rogue ‘that’.’ [..] The coin reads:
“Ineluctable modality of the visible: at least that if no more, thought through my eyes. Signatures of all things *that* I am here to read.”
(Incorrect ‘that’ emphasised)
(tags: for:robotwisdom james-joyce typos funny fail central-bank ireland coins minting errors ulysses)
Netflix ISP Speed Index for Ireland
Via Mulley. Magnet doing well, with UPC coming second; UPC have dropped a fair bit in the past month. Would love to see it broken down by region…
(tags: upc ireland isps speed bandwidth netflix broadband magnet eircom)
Why I’m Walking Away From CouchDB
In practice there are two gotchas that are so painful I am looking for a replacement with a different featureset than couchdb provides. The location tracking project icecondor.com uses couchdb to store 20,000 new records per day. It has more write traffic than read traffic and runs on modest hardware. Those two gotchas are: 1. View Index updates. While I have a vague understanding of why view index updates are slow and bulky and important, in practice it is unworkable. Every write sets up a trap for the first reader to come along after the write. The more writes there are, the bigger the trap for the first reader which has to wait on the couchdb process that refreshes the view index on an as-needed basis. I believe this trade-off was made to keep writes fast. No need to update the view index until all writes are actually complete, right? Write traffic is heavier than read traffic and the time needed for that index refresh causes the webapp to crash because it’s not set up to handle timeouts from a database query. The workaround is as hackish as one can imagine – cron jobs to hit every map/reduce query to keep indexes fresh. 2. Append only database file. Append only is in theory a great way to ensure on-disk reliability. A system crash during an append should only affect that append. It’s a crash during an update to existing parts of the file that risks the integrity of more than what’s being updated. With so many layers of caching and optimizations in the kernel and the filesystem and now in the workings of SSD drives, I’m not sure append-only gives extra protection anymore. What it does do is create a huge operational headache. The on-disk file can never grow beyond half the available storage space. Record deletion uses new disk space and if the half-full mark approaches, vacuuming must be done. The entire database is rewritten to the filesystem, leaving out no longer needed records.
If the data file should happen to grow beyond half the partition, the system has essentially crashed because there is no way to compact the file and soon the partition will be full. This is a likely scenario when there is a lot of record deletion activity. The system in question does a lot of writes of temporary data that is followed up by deletes a few days later. There is also a lot of permanent storage that hardly gets used. Rewriting every byte of the records that are long-lived due to compaction is an enormous amount of wasted I/O – doubly so given SSD drives have a short write-cycle lifespan.
(tags: nosql couchdb consistency checkpointing databases data-stores indexing)
CouchDB: not drinking the kool-aid
Jonathan Ellis on some CouchDB negatives:
Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project: Writes are serialized. Not serialized as in the isolation level, serialized as in there can only be one write active at a time. Want to spread writes across multiple disks? Sorry. CouchDB uses a MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes. Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less. CouchDB is simple. Gloriously simple. Why is that a negative? It’s competing with systems (in the popular imagination, if not in its author’s mind) that have been maturing for years. The reason PostgreSQL et al have those features is because people want them. And if you don’t, you should at least ask a DBA with a few years of non-MySQL experience what you’ll be missing. The majority of CouchDB fans don’t appear to really understand what a good relational database gives them, just as a lot of PHP programmers don’t get what the big deal is with namespaces. A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce. MapReduce is a great approach to trivially parallelizing certain classes of problem. The problem is, it’s tedious and error-prone to write raw MapReduce code. This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively). Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages. It’s a little verbose, and you might be bored with it, but it’s much better than writing low-level mapreduce code.
(tags: cassandra couch nosql storage distributed databases consistency)
What is the CouchDB replication protocol? Is it like Git? – Stack Overflow
Good write up of CouchDB replication
(tags: protocols couchdb sync replication git mvcc databases merging timelines)
TouchDB’s reverse-engineered write-up of the Couch replication protocol
There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.
(tags: couchdb protocols touchdb nosql replication sync mvcc revisions rest)
-
A good writeup of how to detect cases of copyright infringement for photography, art and other visual media.
Von Glitschka, Modern Dog and myriad others make clear that the support of the creative community is absolutely vital in raising awareness of copyright infringements. Sites like www.youthoughtwewouldntnotice.com name and shame clear breaches of copyright, while the Modern Dog case shows that there is no better IP tracing system than the eyes and ears of the design community itself. “It’s the industry at large that has kept me aware of infringements,” states Von. “Without that I would miss most of them because I don’t go looking – they find me via the eyes of others.”
(tags: photography art visual-media copyright infringement piracy ripping)
FastBit: An Efficient Compressed Bitmap Index Technology
an [LGPL] open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user’s data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools. The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record. A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps.[…] Another innovation in FastBit is the multi-level bitmap encoding methods.
(tags: fastbit nosql algorithms indexing search compressed-bitmaps indexes wah bitmaps compression)
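To make the bitmap-index idea concrete: a toy, uncompressed version in Python, using big integers as bitmaps. This is only a sketch of the general technique (one bitmap per distinct value, queries as bitwise AND/OR); the class and function names are made up, and it has none of FastBit's WAH compression or multi-level encoding:

```python
from collections import defaultdict

class BitmapIndex:
    """One bitmap per distinct column value; bit i is set when row i holds that value."""

    def __init__(self, column):
        self.nrows = len(column)
        self.bitmaps = defaultdict(int)
        for row, value in enumerate(column):
            self.bitmaps[value] |= 1 << row

    def equals(self, value):
        """Bitmap of rows where the column equals `value` (0 if never seen)."""
        return self.bitmaps.get(value, 0)

def rows(bitmap):
    """Row numbers of the set bits, lowest first."""
    out = []
    i = 0
    while bitmap:
        if bitmap & 1:
            out.append(i)
        bitmap >>= 1
        i += 1
    return out

colour = BitmapIndex(["red", "blue", "red", "green", "blue"])
size = BitmapIndex(["S", "S", "L", "L", "S"])

# colour == "blue" AND size == "S" is a single bitwise AND of two bitmaps.
hits = rows(colour.equals("blue") & size.equals("S"))
```

The slow-update caveat in the excerpt shows up here too: changing one record's value means touching two bitmaps.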
-
The bit array data structure is implemented in Java as the BitSet class. Unfortunately, this fails to scale without compression. JavaEWAH is a word-aligned compressed variant of the Java bitset class. It uses a 64-bit run-length encoding (RLE) compression scheme. We trade-off some compression for better processing speed. We also have a 32-bit version which compresses better, but is not as fast. In general, the goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (as implemented in the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.
(tags: javaewah wah rle compression bitmaps bitmap-indexes bitset algorithms data-structures)
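A rough idea of what word-aligned run-length encoding means, sketched in Python. This is a simplified stand-in, not the actual EWAH on-disk format or the JavaEWAH API: runs of all-zero or all-one 64-bit words collapse into a single token, and every other word is kept as a literal. The token layout and names are assumptions for illustration:

```python
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1

def compress(words):
    """Collapse runs of all-0 / all-1 words; keep mixed words as literals."""
    out = []
    for w in words:
        if w in (0, ALL_ONES):
            bit = 1 if w else 0
            if out and out[-1][0] == "run" and out[-1][1] == bit:
                # Extend the current run of identical fill words.
                out[-1] = ("run", bit, out[-1][2] + 1)
            else:
                out.append(("run", bit, 1))
        else:
            out.append(("literal", w))
    return out

def decompress(tokens):
    """Inverse of compress(): expand runs back into full words."""
    words = []
    for t in tokens:
        if t[0] == "run":
            words.extend([ALL_ONES if t[1] else 0] * t[2])
        else:
            words.append(t[1])
    return words

# A sparse bitmap: 1000 empty words, one mixed word, three full words.
sparse = [0] * 1000 + [0b1011] + [ALL_ONES] * 3
packed = compress(sparse)
```

Because everything stays aligned to whole words, AND/OR can skip over runs without unpacking them bit by bit, which is where the query-speed win comes from.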
Measure Anything, Measure Everything « Code as Craft
the classic Etsy pro-metrics “measure everything” post. Some good basic rules and mindset
Testing Your Automation [slides]
Test-driven infrastructure, using Chef — slides from Big Ruby 2013. Tools used: foodcritic (lol), Chefspec, minitest-chef-handler, fauxhai, cucumber chef. This is really good to see — TDD applied to ops. Video at: http://confreaks.com/videos/2309-bigruby2013-testing-your-automation-ttd-for-chef-cookbooks
(tags: devops ops chef automation testing tdd infrastructure provisioning deployment)
Meet the nice-guy lawyers who want $1,000 per worker for using scanners | Ars Technica
Great investigative journalism, interviewing the legal team behind the current big patent-troll shakedown: the one claiming a patent on scanning documents with a button press, using a scanner attached to a network. They express whole-hearted belief in the legality of their actions, unsurprisingly; they’re exactly what you think they’d be like (via Nelson)
(tags: via:nelson ethics business legal patents swpats patent-trolls texas shakedown)
[#HADOOP-9448] Reimplement things – ASF JIRA
Pretty good April Fools from this year — a patch to delete the entirety of Hadoop’s codebase:
To avoid any bias to the existing code and make the same mistakes we should just delete trunk completely. Attached it is a script that deletes everything.
(tags: hadoop april-fools asf patches open-source oss)
Lucas Nussbaum’s Blog » Blog Archive » RVM: seriously?
+1. RVM is atrocious code — some of the worst bash script I’ve seen. And it’s not just installing as a command, it requires that it be sourced and hooks into your login shell. If you then use “set -e”, it crashes; “set -u”, it crashes; reset $HOME, crash. It’s dire.
-
Next April 11th, at the IIEA in North Gt Georges St:
Rick Falkvinge, founder of the Swedish Pirate Party, will examine the case for reform of copyright and patent law in the EU. Legalised file sharing, free sampling and shortened copyright protection times are the main elements of a proposal co-authored by Mr. Falkvinge which was submitted to the European Parliament in 2012. He will question whether, in the context of ever-increasing online activity, existing legal frameworks pose a threat to users’ civil liberties.
(tags: rick-falkvinge pirate-party ireland iiea dublin copyright patents filesharing)
High Performance MongoDB Clusters with Amazon EBS Provisioned IOPS
yeah yeah, Mongo. bookmarking for the good data on EBS+PIOPS
(tags: ebs piops aws performance tips ops ec2 mongodb presentations)
-
These notes are intended to help users and system administrators maximize TCP/IP performance on their computer systems. They summarize all of the end-system (computer system) network tuning issues including a tutorial on TCP tuning, easy configuration checks for non-experts, and a repository of operating system specific instructions for getting the best possible network performance on these platforms.
Some tips for maximizing HPC network performance for the intra-DC case; recommended by the LinkedIn Kafka operations page.
(tags: tuning network tcp sysadmin performance ops kafka ec2)
Increasing EBS Performance – Amazon Elastic Compute Cloud
good docs from EC2
(tags: ec2 ebs performance piops docs)
-
an open source virtualized Ethernet networking stack. I am developing Snabb Switch in response to several exciting trends: x86 has risen to be a powerful networking platform. Virtualization and SDN are pulling more networking into servers. Optimized user-space software is out-performing kernel-space software. Snabb Switch’s simple and fast software-only data plane makes developing networking software easier than ever before.
Written in LuaJIT but aiming to be very fast. Cool stuff, worth watching.
(tags: sdn software networking emulation snabb-switch luajit lua virtualization)
Abusing hash kernels for wildly unprincipled machine learning
what, is this the first time our spam filtering approach of hashing a giant feature space is hitting mainstream machine learning? that can’t be right!
(tags: ai machine-learning python data hashing features feature-selection anti-spam spamassassin)
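The "hashing a giant feature space" trick in a few lines of Python, as a minimal sketch: hash each token to a bucket and count, so the feature vector has a fixed size no matter how large the vocabulary grows. The function name and bucket count are illustrative assumptions, and this is not SpamAssassin's code; `hashlib` is used because Python's built-in `hash()` for strings varies between runs:

```python
import hashlib

def hashed_features(tokens, n_buckets=16):
    """Map arbitrary tokens into a fixed-length count vector via hashing."""
    vec = [0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        idx = int.from_bytes(digest[:4], "big") % n_buckets
        vec[idx] += 1  # collisions simply add up; that's the accepted trade-off
    return vec

vec = hashed_features("cheap pills cheap watches click here".split())
```

Different tokens can land in the same bucket; in practice the bucket count is made large enough that this rarely matters.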
-
Joel On Software weighs in (via Tony Finch):
The fastest growing industry in the US right now, even during this time of slow economic growth, is probably the patent troll protection racket industry.
(tags: joel-on-software patents swpats shakedown extortion us-politics patent-trolls via:fanf)
-
Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap’n Proto is INFINITY TIMES faster than Protocol Buffers.
Basically, marshalling like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but responds to this by suggesting compression (which is a fair point tbh). C++-only for now. I’m not seeing the same kind of support for optional data that protobufs has though. Overall I’m worried there’s some useful features being omitted here…
(tags: serialization formats protobufs capn-proto protocols coding c++ rpc qnx messaging compression compatibility interoperability i14y)
CRDTs – Commutative Replicated Data Types [pdf]
Shared read-only data is easy to scale by using well-understood replication techniques. However, sharing mutable data at a large scale is a difficult problem, because of the CAP impossibility result [5]. Two approaches dominate in practice. One ensures scalability by giving up consistency guarantees, for instance using the Last-Writer-Wins (LWW) approach [7]. The alternative guarantees consistency by serialising all updates, which does not scale beyond a small cluster [12]. Optimistic replication allows replicas to diverge, eventually resolving conflicts either by LWW-like methods or by serialisation [11]. In some (limited) cases, a radical simplification is possible. If concurrent updates to some datum commute, and all of its replicas execute all updates in causal order, then the replicas converge. We call this a Commutative Replicated Data Type (CRDT). The CRDT approach ensures that there are no conflicts, hence, no need for consensus-based concurrency control. CRDTs are not a universal solution, but, perhaps surprisingly, we were able to design highly useful CRDTs. This new research direction is promising as it ensures consistency in the large scale at a low cost, at least for some applications.
(tags: consistency algorithms concurrency crdts distcomp data)
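The core claim (commuting updates mean replicas converge regardless of delivery order) is easy to see with a toy in Python. A sketch with made-up class names, not taken from the paper: a counter whose only operation is "add delta" converges under every ordering, while a plain overwrite register does not:

```python
import itertools

class Counter:
    """Op-based counter: every operation is 'add delta', and additions commute."""
    def __init__(self):
        self.value = 0

    def apply(self, delta):
        self.value += delta

class Register:
    """Plain overwrite register: 'set value' operations do NOT commute."""
    def __init__(self):
        self.value = None

    def apply(self, new_value):
        self.value = new_value

def outcomes(replica_type, ops):
    """Final values reached when the same ops arrive in every possible order."""
    results = set()
    for order in itertools.permutations(ops):
        replica = replica_type()
        for op in order:
            replica.apply(op)
        results.add(replica.value)
    return results

counter_results = outcomes(Counter, [+5, -2, +7, -1])
register_results = outcomes(Register, ["a", "b"])
```

The counter ends up with one possible value; the register ends up with two, which is exactly the situation where you'd need LWW or serialisation to pick a winner.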
-
‘The CRDT toolbox provides a collection of basic Conflict-free replicated data types as well as a common interface for defining your own CRDTs’. – in Eric Moritz’ github. Also includes some more links to CRDT background reading.
(tags: crdt github eric-moritz python algorithms)
Eventually-Consistent Data Structures [slides]
implementing CRDTs in Riak and Voldemort
(tags: crdt algorithms distcomp riak voldemort distributed)
-
What do you get if you take one accountant with “a fondness for spreadsheets, finance and business” and mix with “a life-long passion for video games”? Well it’s obvious isn’t it? A turn-based RPG made and played entirely in Microsoft Excel.
(via Paul Moloney)
(tags: via:oceanclub arena.xlsm excel spreadsheets games gaming rpg)
serverspec – unit tests for servers
With serverspec, you can write RSpec tests for checking your servers are provisioned correctly. Serverspec tests your servers’ actual state through SSH access, so you don’t need to install any agent software on your servers and can use any provisioning tools, Puppet, Chef, CFEngine and so on.
(via Dave Doran)
(tags: via:dave-doran puppet testing chef cfengine unit-testing ops provisioning serverspec rspec ruby)
joshua’s blog: overclocking the lecture
Joshua’s old tip on watching videos at 2x speed using Perian
(tags: quicktime video hacks mac speed lectures presentations learning)
-
This seems pretty significant. Is the tide turning in the Texas Eastern District against patent trolls, at last? And does it establish sufficient precedent?
A federal judge has thrown out a patent claim against Rackspace, ruling that mathematical algorithms can’t be patented. The ruling in the Eastern District stemmed from a 2012 complaint filed by Uniloc USA asserting that processing of floating point numbers by the Linux operating system was a patent violation. Chief Judge Leonard Davis based the ruling on U.S. Supreme Court case law that prohibits the patenting of mathematical algorithms. According to Rackspace, this is the first reported instance in which the Eastern District of Texas has granted an early motion to dismiss finding a patent invalid because it claimed unpatentable subject matter. Red Hat, which supplies Linux to Rackspace, provided Rackspace’s defense. Red Hat has a policy of standing behind customers through its Open Source Assurance program.
See https://news.ycombinator.com/item?id=5455869 for more discussion.
(tags: east-texas patents swpats maths patenting law judges rackspace linux red-hat uniloc-usa floating-point)
Introducing Chronos: A Replacement for Cron
A distributed, fault-tolerant “cron” is something which comes up frequently — it makes for a great fault-tolerance building block. This one sounds like it’s too closely tied into Mesos, though (IMO).
Chronos is our replacement for cron. It is a distributed and fault-tolerant scheduler which runs on top of Mesos. It’s a framework and supports custom mesos executors as well as the default command executor. Thus by default, Chronos executes SH (on most systems BASH) scripts. Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transferring files and executing them on a remote machine in the background and using asynchronous callbacks to notify Chronos of job completion or failures.
(tags: cron scheduling mesos stacks design airbnb chronos fault-tolerance distcomp distributed-computing scripts jobs)
One of CloudFlare’s upstream providers on the “death of the internet” scare-mongering
Having a bad day on the Internet is nothing new. These are the types of events we deal with on a regular basis, and most large network operators are very good at responding quickly to deal with situations like this. In our case, we worked with Cloudflare to quickly identify the attack profile, rolled out global filters on our network to limit the attack traffic without adversely impacting legitimate users, and worked with our other partner networks (like NTT) to do the same. If the attacks had stopped here, nobody in the “mainstream media” would have noticed, and it would have been just another fun day for a few geeks on the Internet. The next part is where things got interesting, and is the part that nobody outside of extremely technical circles has actually bothered to try and understand yet. After attacking Cloudflare and their upstream Internet providers directly stopped having the desired effect, the attackers turned to any other interconnection point they could find, and stumbled upon Internet Exchange Points like LINX (in London), AMS-IX (in Amsterdam), and DE-CIX (in Frankfurt), three of the largest IXPs in the world. An IXP is an “interconnection fabric”, or essentially just a large switched LAN, which acts as a common meeting point for different networks to connect and exchange traffic with each other. One downside to the way this architecture works is that there is a single big IP block used at each of these IXPs, where every network who interconnects is given 1 IP address, and this IP block CAN be globally routable. When the attackers stumbled upon this, probably by accident, it resulted in a lot of bogus traffic being injected into the IXP fabrics in an unusual way, until the IXP operators were able to work with everyone to make certain the IXP IP blocks weren’t being globally re-advertised. Note that the vast majority of global Internet traffic does NOT travel over IXPs, but rather goes via direct private interconnections between specific networks.
The IXP traffic represents more of the “long tail” of Internet traffic exchange, a larger number of smaller networks, which collectively still adds up to be a pretty big chunk of traffic. So, what you actually saw in this attack was a larger number of smaller networks being affected by something which was a completely unrelated and unintended side-effect of the actual attacks, and thus *poof* you have the recipe for a lot of people talking about it. :) Hopefully that clears up a bit of the situation.
(tags: bandwidth internet gizmodo traffic cloudflare ddos hacking)
21 graphs that show America’s health-care prices are ludicrous
Excellent data, this. I’d heard a few of these prices, but these graphs really hit home. $26k for a caesarean section at the 95th percentile!? talk about out of control price gouging.
(tags: healthcare costs economics us-politics world comparison graphs charts data via:hn america)
Design for developers [presentation]
A nice set of practical web/UI/typography design guidelines, naming specific sources (via Rob C)
-
’13 Security Gotchas You Should Know About’
Film4 Presents A Season Of Studio Ghibli Classics
hooray! Plenty of dubs, too, which is handy when you have little kids like mine ;)
(tags: studio-ghibli film4 movies anime animation to-watch tv)
The first pillar of agile sysadmin: We alert on what we draw
‘One of [the] purposes of monitoring systems was to provide data to allow us, as engineers, to detect patterns, and predict issues before they become production impacting. In order to do this, we need to be capturing data and storing it somewhere which allows us to analyse it. If we care about it – if the data could provide the kind of engineering insight which helps us to understand our systems and give early warning – we should be capturing it. ‘ …. ‘There are a couple of weaknesses in [Nagios’ design]. Assuming we’ve agreed that if we care about a metric enough to want to alert on it then we should be gathering that data for analysis, and graphing it, then we already have the data upon which to base our check. Furthermore, this data is not on the machine we’re monitoring, so our checks don’t in any way add further stress to that machine.’ I would add that if we are alerting on a different set of data from what we collect for graphing, then using the graphs to investigate an alarm may run into problems if they don’t sync up.
(tags: devops monitoring deployment production sysadmin ops alerting metrics)
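The "alert on what we draw" idea, roughly: the check reads from the same stored time series the graphs are drawn from, instead of polling the host separately. A minimal Python sketch under that assumption; the `store` layout (metric name mapped to a list of `(timestamp, value)` pairs), the function name and the state strings are all hypothetical, not any particular monitoring system's API:

```python
def check_threshold(store, metric, threshold, points=3):
    """Alert state computed from the stored series that also feeds the graphs.

    CRITICAL only when the last `points` datapoints all exceed the threshold,
    so a single spike does not page anyone.
    """
    recent = [value for _, value in store.get(metric, [])[-points:]]
    if len(recent) < points:
        return "UNKNOWN"  # not enough data collected yet
    return "CRITICAL" if all(v > threshold for v in recent) else "OK"

# Hypothetical metrics store: the same data a graph of this metric would plot.
store = {
    "web1.load": [(0, 0.5), (60, 4.2), (120, 4.8), (180, 5.1)],
    "web2.load": [(0, 0.4), (60, 6.0), (120, 0.6), (180, 0.5)],
}
states = {name: check_threshold(store, name, threshold=4.0) for name in store}
```

Since the alert and the graph share one dataset, whatever fired the alarm is guaranteed to be visible when you go and look at the graph.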
JPL Institutional Coding Standard for the Java Programming Language
From JPL’s Laboratory for Reliable Software (LaRS). Great reference; there are some really useful recommendations here, and good explanations of familiar ones like “prefer composition over inheritance”. Many are supported by FindBugs, too. Here’s the full list:
compile with checks turned on; apply static analysis; document public elements; write unit tests; use the standard naming conventions; do not override field or class names; make imports explicit; do not have cyclic package and class dependencies; obey the contract for equals(); define both equals() and hashCode(); define equals when adding fields; define equals with parameter type Object; do not use finalizers; do not implement the Cloneable interface; do not call nonfinal methods in constructors; select composition over inheritance; make fields private; do not use static mutable fields; declare immutable fields final; initialize fields before use; use assertions; use annotations; restrict method overloading; do not assign to parameters; do not return null arrays or collections; do not call System.exit; have one concept per line; use braces in control structures; do not have empty blocks; use breaks in switch statements; end switch statements with default; terminate if-else-if with else; restrict side effects in expressions; use named constants for non-trivial literals; make operator precedence explicit; do not use reference equality; use only short-circuit logic operators; do not use octal values; do not use floating point equality; use one result type in conditional expressions; do not use string concatenation operator in loops; do not drop exceptions; do not abruptly exit a finally block; use generics; use interfaces as types when available; use primitive types; do not remove literals from collections; restrict numeric conversions; program against data races; program against deadlocks; do not rely on the scheduler for synchronization; wait and notify safely; reduce code complexity
(tags: nasa java reference guidelines coding-standards jpl reliability software coding oo concurrency findbugs bugs)
KDE’s brush with git repository corruption: post-mortem
a barely-averted disaster… phew.
while we planned for the case of the server losing a disk or entirely biting the dust, or the total loss of the VM’s filesystem, we didn’t plan for the case of filesystem corruption, and the way the corruption affected our mirroring system triggered some very unforeseen and pathological conditions. […] the corruption was perfectly mirrored… or rather, due to its nature, imperfectly mirrored. And all data on the anongit [mirrors] was lost.
One risk demonstrated: by trusting in mirroring, rather than a schedule of snapshot backups covering a wide time range, they nearly had a major outage. Silent data corruption, and code bugs, happen; backups protect against this, but RAID, replication, and mirrors do not. Another risk: they didn’t have a rate limit on project-deletion, which resulted in the “anongit” mirrors deleting their (safe) data copies in response to the upstream corruption. Rate limiting to sanity-check automated changes is vital. What they should have had in place was described by the fix: ‘If a new projects file is generated and is more than 1% different than the previous file, the previous file is kept intact (at 1500 repositories, that means 15 repositories would have to be created or deleted in the span of three minutes, which is extremely unlikely).’
(tags: rate-limiting case-studies post-mortems kde git data-corruption risks mirroring replication raid bugs backups snapshots sanity-checks automation ops)
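The 1% rule quoted in the KDE fix is easy to sketch. This is not KDE's actual code: the function name, the use of sets of repository names, and the handling of an empty previous list are my assumptions. Integer arithmetic is used so the threshold comparison does not depend on floating-point rounding.

```python
def choose_projects(previous, candidate, max_percent=1):
    """Return the project list that should be published.

    `previous` and `candidate` are sets of repository names (an assumed
    representation). If the candidate differs from the previous list by more
    than `max_percent` percent, the previous list is kept intact.
    """
    if not previous:
        # Nothing to compare against, so accept the first list (assumption).
        return candidate
    # Symmetric difference counts both created and deleted repositories.
    changed = len(previous ^ candidate)
    if changed * 100 > max_percent * len(previous):
        return previous
    return candidate


if __name__ == "__main__":
    previous = {f"repo{i}" for i in range(1500)}
    corrupted = set()  # upstream corruption makes every project vanish
    kept = choose_projects(previous, corrupted)
    print(len(kept))
```

With 1500 repositories, a new list with every project missing is rejected and the previous list survives, which is exactly the case that bit the anongit mirrors.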
-
Metrics rule the roost — I guess there’s been a long history of telemetry in space applications.
To make software more visible, you need to know what it is doing, he said, which means creating “metrics on everything you can think of”…. Those metrics should cover areas like performance, network utilization, CPU load, and so on. The metrics gathered, whether from testing or real-world use, should be stored as it is “incredibly valuable” to be able to go back through them, he said. For his systems, telemetry data is stored with the program metrics, as is the version of all of the code running so that everything can be reproduced if needed. SpaceX has programs to parse the metrics data and raise an alarm when “something goes bad”. It is important to automate that, Rose said, because forcing a human to do it “would suck”. The same programs run on the data whether it is generated from a developer’s test, from a run on the spacecraft, or from a mission. Any failures should be seen as an opportunity to add new metrics. It takes a while to “get into the rhythm” of doing so, but it is “very useful”. He likes to “geek out on error reporting”, using tools like libSegFault and ftrace. Automation is important, and continuous integration is “very valuable”, Rose said. He suggested building for every platform all of the time, even for “things you don’t use any more”. SpaceX does that and has found interesting problems when building unused code. Unit tests are run from the continuous integration system any time the code changes. “Everyone here has 100% unit test coverage”, he joked, but running whatever tests are available, and creating new ones is useful. When he worked on video games, they had a test to just “warp” the character to random locations in a level and had it look in the four directions, which regularly found problems. “Automate process processes”, he said. Things like coding standards, static analysis, spaces vs. tabs, or detecting the use of Emacs should be done automatically. 
SpaceX has a complicated process where changes cannot be made without tickets, code review, signoffs, and so forth, but all of that is checked automatically. If static analysis is part of the workflow, make it such that the code will not build unless it passes that analysis step. When the build fails, it should “fail loudly” with a “monitor that starts flashing red” and email to everyone on the team. When that happens, you should “respond immediately” to fix the problem. In his team, they have a full-size Justin Bieber cutout that gets placed facing the team member who broke the build. They found that “100% of software engineers don’t like Justin Bieber”, and will work quickly to fix the build problem.
(tags: spacex dev coding metrics deplyment production space justin-bieber)
-
‘the story of ketchup is a story of globalization and centuries of economic domination by a world superpower. But the superpower isn’t America, and the century isn’t ours. Ketchup’s origins in the fermented sauces of China and Southeast Asia mean that those little plastic packets under the seat of your car are a direct result of Chinese and Asian domination of a single global world economy for most of the last millennium.’
(tags: ketchup china nam-pla food etymology condiments history trade)
-
now this is a neat trick — having been stuck having to flip to spares and do other antics while a long-running heap dump took place, this is a winner.
Dumping a JVM’s heap is an extremely useful tool for debugging problems with a J2EE application. Unfortunately, when a JVM explodes, using the standard jmap tool can take an inordinate amount of time to execute for lots of different reasons. This leads to extended downtime when a heap dump is attempted and even then, jmap regularly fails. This blog post is intended to outline an alternate method using [gdb] to achieve a heap dump that only requires mere seconds of additional downtime allowing the slow jmap process to happen once the application is back in service.
(tags: heap-dump gdb heap jvm java via:peakscale gcore core core-dump debugging)
-
‘Edition has a ‘design for life’ philosophy – we think that unique designer-made items can be a part of our everyday lives without costing the earth. We stock affordable, contemporary and functional products (mostly handmade), including jewellery, home-ware, accessories, art and toys. Every item has been carefully selected and are all designed here in Ireland.’
BBC Test Card image (1080p HD version)
via colinwh. The de-facto standard HTPC desktop background
(tags: htpc desktops hd 1080p bbc test-card tv scary-clowns)
-
Neil Fraser visits a school in Vietnam, and investigates their computer science curriculum. They are doing an incredible job, it looks like — very impressive!
(tags: vietnam programming education cs computer-science schools coding children)
TOSEC: Commodore C64 (2012-04-23) : Free Download & Streaming : Internet Archive
A massive, 6.5GB collection of C64 history.
There are an astounding 134,000+ disk, cassette and documentation items in this Commodore 64 collection, including games, demos, cractros, and compilations.
(tags: commodore c64 history computing software demos archive)
By the numbers: How Google Compute Engine stacks up to Amazon EC2
Scalr’s thoughts on Google’s EC2 competitor.
with Google Compute Engine, AWS has a formidable new competitor in the public cloud space, and we’ll likely be moving some of Scalr’s production workloads from our hybrid aws-rackspace-softlayer setup to it when it leaves beta. There’s a strong technical case for migrating heavy workloads to GCE, and I’ll be grabbing popcorn to eagerly watch as the battle unfolds between the giants.
-
realtime collaboration API. nifty! but can it collaborate on a per-app shared doc, or does it require that the app user auth to Google and access their own docs?
(tags: collaboration api realtime google javascript)
Percona Playback’s tcpdump plugin
Capture MySQL traffic via tcpdump, tee it over the network to replay against a second database. Even supports query execution times and pauses between queries to play back the same load level.
(tags: tcpdump production load-testing testing staging tee networking netcat percona replay mysql)
Riak CS is now ASL2 open source
‘Organizations and users can now access the source code on Github and download the latest packages from the downloads page. Also, today, we announced that Riak CS Enterprise is now available as commercial licensed software, featuring multi-datacenter replication technology and 24×7 Basho customer support.’
(tags: riak riak-cs nosql storage basho open-source github apache asl2)
Hadoop Operations at LinkedIn [slides]
another good Hadoop-at-scale presentation, from LI this time
Sift Science says it can sniff out cyber fraud — before it gets expensive
Great idea for a startup. This stuff is complex, right in the heart of every company’s ordering pipeline, and I can see a lot of customers for this
(tags: sift-science anti-fraud fraud b2b b2c ecommerce startups aws)
What would you do: Part 2, the Island of Surpyc
Amazing. ‘Cyprus Bailout Choose Your Own Adventure’, basically
(tags: cyoa adventure dice games cyprus politics eu bailouts ecb banking troika)
Running the Largest Hadoop DFS Cluster
Facebook’s 1PB Hadoop cluster. features improved NameNode availability work and 4 levels of data aging, with reduced replication and Reed-Solomon RAID encoding for colder data ages
(tags: aging data facebook hadoop hdfs reed-solomon error-correction replication erasure-coding)
The America Invents Act: Fighting Patent Trolls With “Prior Art”
Don Marti makes some suggestions regarding the America Invents Act: record your work’s timeline; use the new Post-Grant Challenging process; and use the new “prior user” defence, which lets you rely on your own non-public uses.
many of the best practices for tracking new versions of software and other digital assets can also help protect you against patent trolls. It’s a good time to talk to your lawyer about a defensive strategy, and to connect that strategy to your version control and deployment systems to make sure you’re collecting and retaining all of the information that could help you under this new law.
(tags: swpats patent-trolls patenting us prior-art)
Announcing the Voldemort 1.3 Open Source Release
new release from LinkedIn — better p90/p99 PUT performance, improvements to the BDB-JE storage layer, massively-improved rebalance performance
(tags: voldemort linkedin open-source bdb nosql)
Data Corruption To Go: The Perils Of sql_mode = NULL « Code as Craft
bloody hell. A load of cases where MySQL will happily accommodate all sorts of malformed and invalid input — thankfully with fixes
(tags: mysql input corrupt invalid validation coding databases sql)
-
a high-performance C server which is used to expose bloom filters and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.
(via Tony Finch)
(tags: via:fanf memcached bloomd open-source bloom-filters)
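For anyone who hasn't met the data structure that this server exposes, here is a minimal Bloom filter sketch. It is not bloomd's implementation and says nothing about its wire protocol; the class name, the default sizes, and the choice of salted SHA-256 as the hash family are assumptions picked for brevity.

```python
import hashlib


class BloomFilter:
    """A tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        # True means "possibly present"; False means "definitely absent".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(key)
        )


if __name__ == "__main__":
    seen = BloomFilter()
    seen.add("example.com")
    print("example.com" in seen)
```

A key that was added is always reported as present; a key that was never added is usually, but not always, reported as absent, which is the trade made for the small memory footprint.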
Thoughts on configuration file complexity
some interesting thoughts on the old “Turing complete configuration language” question
(tags: configuration turing-complete programming ops testing)
From a monolithic Ruby on Rails app to the JVM
How Soundcloud have ditched the monolithic Rails for nimbler, small-scale distributed polyglot services running on the JVM
(tags: soundcloud rails slides jvm scalability ruby scala clojure coding)
Opinion: The Internet is a surveillance state
Bruce Schneier op-ed on CNN.com.
So, we’re done. Welcome to a world where Google knows exactly what sort of porn you all like, and more about your interests than your spouse does. Welcome to a world where your cell phone company knows exactly where you are all the time. Welcome to the end of private conversations, because increasingly your conversations are conducted by e-mail, text, or social networking sites. And welcome to a world where all of this, and everything else that you do or is done on a computer, is saved, correlated, studied, passed around from company to company without your knowledge or consent; and where the government accesses it at will without a warrant. Welcome to an Internet without privacy, and we’ve ended up here with hardly a fight.
(tags: freedom surveillance legal privacy internet bruce-schneier web google facebook)
Single Producer/Consumer lock free Queue step by step
great dissection of Martin “Disruptor” Thompson’s lock-free single-producer/single-consumer queue data structure, with benchmark results showing crazy speedups. This is particularly useful since it’s a data structure that can be used to provide good lock-free speedups without adopting the entire Disruptor design pattern.
(tags: disruptor coding java jvm martin-thompson lock-free volatile atomic queue data-structures)
Roko’s basilisk – RationalWiki
Wacky transhumanists.
Roko’s basilisk is notable for being completely banned from discussion on LessWrong, where any mention of it is deleted. Eliezer Yudkowsky, founder of LessWrong, considers the basilisk would not work, but will not explain why because he does not consider open discussion of the notion of acausal trade with possible superintelligences to be provably safe. Silly over-extrapolations of local memes are posted to LessWrong quite a lot; almost all are just downvoted and ignored. But this one, Yudkowsky reacted to hugely, then doubled-down on his reaction. Thanks to the Streisand effect, discussion of the basilisk and the details of the affair soon spread outside of LessWrong. The entire affair is a worked example of spectacular failure at community management and at controlling purportedly dangerous information. Some people familiar with the LessWrong memeplex have suffered serious psychological distress after contemplating basilisk-like ideas — even when they’re fairly sure intellectually that it’s a silly problem.[5] The notion is taken sufficiently seriously by some LessWrong posters that they try to work out how to erase evidence of themselves so a future AI can’t reconstruct a copy of them to torture.[6]
(tags: transhumanism funny insane stupid singularity ai rokos-basilisk via:maciej lesswrong rationalism superintelligences striesand-effect absurd)
How the America Invents Act Will Change Patenting Forever
Bet you didn’t think the US software patents situation could get worse? wrong!
“Now it’s really important to be the first to file, and it’s really important to file before somebody else puts a product out, or puts the invention in their product,” says Barr, adding that it will “create a new urgency on the part of everyone to file faster — and that’s going to be a problem for the small inventor.”
(tags: first-to-file omnishambles uspto swpats patents software-patents law legal)
Distributed Systems Tracing with Zipkin
Twitter’s version of the “canary”/”tracer” request concept
(tags: twitter zipkin tracing tracer-requests canary-requests http debugging production live distributed-systems distcomp stack infrastructure ops)
Transitioning from Google Reader to feedly
[…] expecting for some time: We have been working on a project called Normandy which is a feedly clone of the Google Reader API – running on Google App Engine. When Google Reader shuts down, feedly will seamlessly transition to the Normandy back end.
Excellent stuff. I’ve just tried feedly and it’s looking good; in fact it may be a better UI overall anyway.
(tags: feedly google-reader transition rss atom feeds web)
Double vision: seeing both sides of Syria’s war
A skirmish is filmed, using HD video cameras, by both sides. Storyful pinpoint the location. War as panopticon
(tags: storyful war syria future tanks battle video youtube hd panopticon)
Using DiffMerge as your Git visual merge and diff tool
A decent 3-way-diff GUI merge tool which works with git on OSX. “git config” command-lines included in this blog post
(tags: git merge osx mac macosx diff mergetool merging cli diffmerge)
-
A bunch of magic command lines to set useful OS X prefs without pointy-clicky. at least some also seem to work on Mountain Lion
-
‘bootstrap an OSX development machine with a one-liner’.
Many teams use chef to manage their production machines, but developers often build their development boxes by hand. SoloWizard makes it painless to create a configurable chef solo script to get your development machine humming: mysql, sublime text, .bash_profile tweaks to OS-X settings – it’s all there!
(tags: osx chef mac build-out ops macosx deployment developers desktops laptops mysql rabbitmq activemq nginx)
-
‘Our results suggest that the Cablevision decision, [which was widely seen as easing certain ambiguities surrounding intellectual property], led to additional incremental investment in U.S. cloud computing firms that ranged from $728 million to approximately $1.3 billion over the two-and-a-half years after the decision. When paired with the findings of the enhanced effects of VC investment relative to corporate investment, this may be the equivalent of $2 to $5 billion in traditional R&D investment.’ via Fred Logue.
(tags: via:fplogue law ip copyright policy cablevision funding vc cloud-computing investment legal buffering)
A History Of Ireland In 100 Objects
Now free!
The Royal Irish Academy, the National Museum of Ireland, and The Irish Times are collaborating with the EU Presidency, the Department of Foreign Affairs and Trade and Adobe to bring you a gift of A History of Ireland in 100 objects ‘from the people of Ireland to the people of the world’ for St Patrick’s Day. It is available as an interactive app for Apple iPhone and iPad, for most Android tablets and on the Kindle Fire, from our website, as well as associated app stores. You can also experience the book on your computer, smartphone or eReader by clicking on the ‘eBook’ button below. The gift is free to download until the end of March.
(tags: free st-patricks-day museum ireland history objects eu apps iphone ipad android books ebooks)
First 5 Minutes Troubleshooting A Server
quite a good checklist of first steps for troubleshooting. Worth bookmarking for “dstat --top-io --top-bio” alone, which is an absolutely excellent tool and new to me
(tags: dstat server io disks hardware performance linux sysadmin ops troubleshooting checklists root-cause)
-
you really know you’ve made it as an inept Irish politician when Panti Bliss gets dressed up in her most senatorial wig to take the mickey out of you
(tags: funny comedy fidelma-healy-eames politics ireland social-media inept youtube video)
Confusion reigns over three “hijacked” ccTLDs
This kind of silliness is only likely to increase as the number of TLDs increases (and they become more trivial).
What seems to be happening here is that [two companies involved] have had some kind of dispute, and that as a result the registrants and the reputation of three countries’ ccTLDs have been harmed. Very amateurish.
(tags: tlds domains via:fanf amateur-hour dns cctlds registrars adamsnames)
-
interesting details about Riak’s support for secondary indexes. Not quite SQL, but still more powerful than plain old K/V storage (via dehora)
(tags: via:dehora riak indexes storage nosql key-value-stores 2i range-queries)
Metric Collection and Storage with Cassandra | DataStax
DataStax’ documentation on how they store TSD data in Cass. Pretty generic
(tags: datastax nosql metrics analytics cassandra tsd time-series storage)
Jeff Dean’s list of “Numbers Everyone Should Know”
from a 2007 Google all-hands, the list of typical latency timings ranging from an L1 cache reference (0.5 nanoseconds) to a CA->NL->CA IP round trip (150 milliseconds).
(tags: performance latencies google jeff-dean timing caches speed network zippy disks via:kellabyte)
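A quick bit of arithmetic shows how wide that range is. The two timings are the ones quoted above; the "stretch an L1 reference to one second" framing and the variable names are just my own illustration.

```python
# Timings quoted above, both expressed in nanoseconds.
L1_CACHE_REF_NS = 0.5
ROUND_TRIP_NS = 150 * 1_000_000  # 150 milliseconds

# How many L1 cache references fit into one CA->NL->CA round trip.
ratio = ROUND_TRIP_NS / L1_CACHE_REF_NS

# If an L1 reference took one second, the round trip would take this long.
SECONDS_PER_YEAR = 365.25 * 24 * 3600
years = ratio / SECONDS_PER_YEAR

print(f"{ratio:,.0f} L1 references per round trip")  # 300,000,000
print(f"about {years:.1f} years at human scale")     # about 9.5
```

So the two ends of the list are more than eight orders of magnitude apart.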
-
‘a columnar storage format that supports nested data’, from Twitter and Cloudera, encoded using Apache Thrift in a Dremel-based record shredding and assembly algorithm. Pretty crazy stuff:
We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem. Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces. Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented. Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult to set up dependencies.
(tags: twitter cloudera storage parquet dremel columns record-shredding hadoop marshalling columnar-storage compression data)
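A toy illustration of why a columnar layout with per-column encoding pays off. This is not Parquet's file format and it does not implement the Dremel record shredding algorithm for nested data; it only pivots flat rows into columns and run-length encodes one of them. The function names and sample data are assumptions.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of flat dicts (all sharing the same keys) into columns."""
    if not rows:
        return {}
    return {name: [row[name] for row in rows] for name in rows[0]}


def run_length_encode(values):
    """Collapse runs of equal adjacent values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


if __name__ == "__main__":
    rows = [
        {"country": "IE", "clicks": 3},
        {"country": "IE", "clicks": 7},
        {"country": "US", "clicks": 1},
    ]
    columns = to_columns(rows)
    # Repetitive columns compress well once their values sit side by side.
    print(run_length_encode(columns["country"]))
```

Each column can get the encoding that suits it: run-length encoding for the repetitive "country" column here, something else for the numeric one, which is the per-column choice the quote describes.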
Bunnie Huang’s “Hacking the Xbox” now available as a free PDF
‘No Starch Press and I have decided to release this free ebook version of Hacking the Xbox in honor of Aaron Swartz. As you read this book, I hope that you’ll be reminded of how important freedom is to the hacking community and that you’ll be inclined to support the causes that Aaron believed in. I agreed to release this book for free in part because Aaron’s treatment by MIT is not unfamiliar to me. In this book, you will find the story of when I was an MIT graduate student, extracting security keys from the original Microsoft Xbox. You’ll also read about the crushing disappointment of receiving a letter from MIT legal repudiating any association with my work, effectively leaving me on my own to face Microsoft. The difference was that the faculty of my lab, the AI laboratory, were outraged by this treatment. They openly defied MIT legal and vowed to publish my work as an official “AI Lab Memo,” thereby granting me greater negotiating leverage with Microsoft. Microsoft, mindful of the potential backlash from the court of public opinion over suing a legitimate academic researcher, came to a civil understanding with me over the issue.’ This is a classic text on hardware reverse-engineering and the freedom to tinker — strongly recommended.
(tags: hacking bunnie-huang xbox free hardware drm freedom-to-tinker books reading mit microsoft history)
Daemon Showdown: Upstart vs. Runit vs. Systemd vs. Circus vs. God
strangely, no mention of runit being total shite though
(tags: daemons runit upstart systemd supervisord circus god nannies processes unix crash-only-software linux ops)
-
Clojure-style lazy functional collections (via QCon via Caro)
(tags: via:caro collections java functional lazy-loading lazy-computation lazy clojure)