Justin's Linklog Posts

Links for 2013-04-23

Links for 2013-04-21

  • Swansea measles outbreak: was an MMR scare in the local press to blame?

    Sixteen years ago, journalists had a much easier job assembling “balanced” stories about MMR in south Wales. When I wrote about the measles outbreak last week, I suggested that it was related to Andrew Wakefield’s discredited 1998 Lancet research, but the Swansea contagion seems more likely to be the result of a separate scare a year earlier in the South Wales Evening Post. Before 1997, uptake of MMR in the distribution area of the Post was 91%, and 87.2% in the rest of Wales. After the Post’s campaign, uptake in the distribution area fell to 77.4% (it was 86.8% in the rest of Wales). That’s almost a 14% drop where the Post had influence, compared with less than 3% elsewhere. In the dry wording of the BMJ, “the [South Wales Evening Post] campaign is the most likely explanation”. In other words, what we can see in Swansea is the local effect of local reporting – in all probability, just a taster of what happens when the news irresponsibly creates unfounded terror. […] The 1997 coverage focused on a group of families who blamed MMR for various ailments in their children, including learning difficulties, digestive problems and autism – none of which have been found to have any connection with the vaccine. The Post’s coverage was at the time deemed a success, and in 1998 it won a prize for investigative reporting in the BT Wales Press Awards. That year, the SWEP ran at least 39 stories related to the alleged dangers of MMR. And yes, it’s true that the paper never directly endorsed non-vaccination. What it did do was publicise the idea of “vaccine damage” as a risk, one that parents would then likely weigh up against the risk of contracting measles, mumps or rubella. And this went beyond the reporting of parental anxieties – it was part of the Post’s editorial line. One article is entitled “Young bodies cannot take it”. The all-important “journalistic balance” was constantly available, thanks to campaigning parents and their solicitor Richard Barr. (It was Barr who engaged Wakefield for a lawsuit, leading to the “fishing expedition” research that became the Lancet paper.) They were happy to provide a quote on the dangers of the “triple jab”, which health authorities were then obliged to rebut politely. The Post also seemed to downplay the risk of measles, reporting on 6 July 1998 that “not a single child has been hit by the illness – despite a 13% drop in take-up levels”. It’s not parents who should feel embarrassed by the Swansea measles outbreak: some may have acted from overt dread at the prospect of harming their child, and some simply from omission, but all were encouraged by a press that focused on non-existent risks and downplayed the genuine horror of the diseases MMR prevents. The shame belongs to journalists: those of the South Wales Evening Post who allowed themselves to be recruited in the service of a speculative lawsuit, and any who let a specious devotion to “balance” overrule a duty to tell the truth.

    (tags: south-wales wales mmr health vaccination scares journalism ethics disease measles south-wales-evening-post)

  • Under the Covers of DynamoDB

    mostly a DynamoDB puff-piece from last week’s Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)

    (tags: dynamodb aws figures costs architecture ec2 dedupe cloud-connect slides)

Links for 2013-04-20

  • Excel, untestability, and the reliability of quants

    Wow, this is a great software-quality story — I knew Excel was the most widely used programming environment out there, but this is a factor I’d overlooked:

    In his remarks on the final panel, Frank Partnoy mentioned something I missed when it came out a few weeks ago: the role of Microsoft Excel in the “London Whale” trading debacle. [..] To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz […] to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly, “After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR …” I write periodically about the perils of bad software in the business world in general and the financial industry in particular, by which I usually mean back-end enterprise software that is poorly designed, insufficiently tested, and dangerously error-prone. But this is something different. […] While Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets — badly. 
    Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way. This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. At the end of the day, it’s all software. While all software breaks occasionally, Excel spreadsheets break all the time. But they don’t tell you when they break: they just give you the wrong number.
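
    To see why dividing by the sum rather than the average would mute the result "by a factor of two", here is a minimal Python sketch. The function names and the two rates are hypothetical, chosen only for illustration; this is not the actual JPMorgan spreadsheet logic.

```python
def relative_change_intended(old_rate, new_rate):
    """Change in rate divided by the AVERAGE of the two rates (what the modeler intended)."""
    return (new_rate - old_rate) / ((old_rate + new_rate) / 2.0)


def relative_change_buggy(old_rate, new_rate):
    """Change in rate divided by the SUM of the two rates (the error described in the quote)."""
    return (new_rate - old_rate) / (old_rate + new_rate)


# Hypothetical rates, not taken from the report.
old_rate, new_rate = 0.040, 0.050

intended = relative_change_intended(old_rate, new_rate)
buggy = relative_change_buggy(old_rate, new_rate)

# The sum is always twice the average, so the buggy figure is always half the intended one.
print(intended, buggy, intended / buggy)
```

    Since the sum of two numbers is exactly twice their average, the halving holds for any pair of rates, which is presumably why the error stayed invisible: every output looked plausible, just too calm.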

    (tags: excel reliability software coding ides jpmorgan value-at-risk finance london-whale quants spreadsheets unit-tests testability testing)

  • Riak, CAP, and eventual consistency

    Good (albeit draft) write-up of the implications of CAP, allow_mult, and last_write_wins conflict-resolution policies in Riak:

    As Brewer’s CAP theorem established, distributed systems have to make hard choices. Network partition is inevitable. Hardware failure is inevitable. When a partition occurs, a well-behaved system must choose its behavior from a spectrum of options ranging from “stop accepting any writes until the outage is resolved” (thus maintaining absolute consistency) to “allow any writes and worry about consistency later” (to maximize availability). Riak leans toward the availability end of the spectrum, but allows the operator and even the developer to tune read and write requests to better meet the business needs for any given set of data.
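
    A toy Python model of the two conflict-resolution policies named above may help. It is a sketch under my own assumptions, not the Riak client API: the `Write` class and both `resolve_*` functions are hypothetical, and real Riak tracks causality with vector clocks rather than a bare timestamp.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: str
    timestamp: float  # wall-clock time claimed by the writing client


def resolve_last_write_wins(writes):
    """last_write_wins-style policy: keep the newest write, silently drop the rest."""
    return max(writes, key=lambda w: w.timestamp)


def resolve_keep_siblings(writes):
    """allow_mult-style policy: keep every conflicting write for the application to merge."""
    return sorted(writes, key=lambda w: w.timestamp)


# Two clients update the same key on opposite sides of a network partition.
concurrent = [Write("cart=[book]", 100.0), Write("cart=[pen]", 100.5)]

winner = resolve_last_write_wins(concurrent)
siblings = resolve_keep_siblings(concurrent)

print("last write wins:", winner.value)
print("siblings kept:  ", [w.value for w in siblings])
```

    The trade-off is the point: the first policy is simple but loses the book, while the second loses nothing but pushes the merge logic onto the application.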

    (tags: riak cap eventual-consistency distcomp distributed-systems partition last-write-wins voldemort allow_mult)

  • How You Can Help Save Upcoming.org, Posterous, and More

    Yahoo! sucks. shutting down in days? ArchiveTeam Warrior to the rescue; install the VM!

    (tags: archival yahoo shutdowns upcoming waxy archives virtualbox)

  • The Excel Depression – NYTimes.com

    Krugman on the Reinhart-Rogoff Excel-bug fiasco.

    What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.

    (tags: paul-krugman economics excel coding bugs software austerity debt)

Links for 2013-04-19

  • Vaccination ‘herd immunity’ demonstration

    ‘Stochastic monte-carlo epidemic SIR model to reveal herd immunity’. Fantastic demo of this important medical concept (via Colin Whittaker)
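
    For anyone who wants to poke at the idea offline, here is a rough stochastic SIR sketch in Python. It is not the linked demo's code; the population size, contact rate, and probabilities are all assumptions I picked for illustration, and the population is treated as fully mixed.

```python
import random


def simulate_outbreak(population, vaccinated_fraction, contacts_per_day=10,
                      transmission_prob=0.15, recovery_prob=0.2, seed=0):
    """Run one random outbreak and return how many people were ever infected."""
    rng = random.Random(seed)
    n_vaccinated = round(population * vaccinated_fraction)
    # 'R' = removed (vaccinated or recovered), 'S' = susceptible, 'I' = infected.
    states = ["R"] * n_vaccinated + ["S"] * (population - n_vaccinated)
    states[-1] = "I"  # patient zero; assumes at least one unvaccinated person
    infected = [population - 1]
    total_infected = 1
    while infected:
        still_infected = []
        for person in infected:
            # Each infected person meets a few random people per day.
            for _ in range(contacts_per_day):
                other = rng.randrange(population)
                if states[other] == "S" and rng.random() < transmission_prob:
                    states[other] = "I"
                    still_infected.append(other)
                    total_infected += 1
            # Each day an infected person may recover and stop spreading.
            if rng.random() < recovery_prob:
                states[person] = "R"
            else:
                still_infected.append(person)
        infected = still_infected
    return total_infected


def mean_outbreak_size(vaccinated_fraction, runs=20):
    """Average outbreak size over several seeded runs (Monte Carlo)."""
    sizes = [simulate_outbreak(1000, vaccinated_fraction, seed=s) for s in range(runs)]
    return sum(sizes) / len(sizes)


mean_low_coverage = mean_outbreak_size(0.50)
mean_high_coverage = mean_outbreak_size(0.95)
print("50% vaccinated:", mean_low_coverage)
print("95% vaccinated:", mean_high_coverage)
```

    With these made-up parameters each case would infect roughly 10 × 0.15 / 0.2 = 7.5 others in an unprotected population, so the usual 1 − 1/R0 rule of thumb puts the herd-immunity threshold somewhere near 87% coverage.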

    (tags: via:colinwh stochastic herd-immunity random sir epidemics health immunity vaccination measles medicine monte-carlo-simulations simulations)

  • Fred’s ImageMagick Scripts: SIMILAR

    compute an image-similarity metric, to discover mostly-identical-but-slightly-tweaked images:

    SIMILAR computes the normalized cross correlation similarity metric between two equal dimensioned images. The normalized cross correlation metric measures how similar two images are, not how different they are. The range of ncc metric values is between 0 (dissimilar) and 1 (similar). If mode=g, then the two images will be converted to grayscale. If mode=rgb, then the two images first will be converted to colorspace=rgb. Next, the ncc similarity metric will be computed for each channel. Finally, they will be combined into an rms value.
    (via Dan O’Neill)
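
    As a rough illustration of the metric, here is a pure-Python sketch of the textbook zero-mean normalized cross correlation on grayscale pixel grids. It is an assumption-laden stand-in, not Fred's script: this form ranges from -1 to 1 rather than 0 to 1, it handles a single channel only, and the function name and sample images are mine.

```python
import math


def normalized_cross_correlation(image_a, image_b):
    """Zero-mean NCC between two equal-sized grayscale images (lists of pixel rows)."""
    flat_a = [pixel for row in image_a for pixel in row]
    flat_b = [pixel for row in image_b for pixel in row]
    if len(flat_a) != len(flat_b):
        raise ValueError("images must have the same dimensions")
    mean_a = sum(flat_a) / len(flat_a)
    mean_b = sum(flat_b) / len(flat_b)
    # Subtracting the mean makes the score insensitive to overall brightness shifts.
    dev_a = [p - mean_a for p in flat_a]
    dev_b = [p - mean_b for p in flat_b]
    numerator = sum(x * y for x, y in zip(dev_a, dev_b))
    denominator = math.sqrt(sum(x * x for x in dev_a) * sum(y * y for y in dev_b))
    if denominator == 0:
        raise ValueError("NCC is undefined for a completely flat image")
    return numerator / denominator


# Hypothetical 2x2 images: an original, a slightly brightened copy, and a negative.
original = [[10, 20], [30, 40]]
brightened = [[12, 22], [32, 42]]
negative = [[40, 30], [20, 10]]

print(normalized_cross_correlation(original, brightened))
print(normalized_cross_correlation(original, negative))
```

    The brightened copy scores as essentially identical, which is the "mostly-identical-but-slightly-tweaked" case this kind of metric is good at catching.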

    (tags: image photos pictures similar imagemagick via:dano metrics similarity)

  • A Slower Speed of Light

    a first-person game prototype in which players navigate a 3D space while picking up orbs that reduce the speed of light in increments. Custom-built, open-source relativistic graphics code allows the speed of light in the game to approach the player’s own maximum walking speed. Visual effects of special relativity gradually become apparent to the player, increasing the challenge of gameplay. These effects, rendered in realtime to vertex accuracy, include the Doppler effect (red- and blue-shifting of visible light, and the shifting of infrared and ultraviolet light into the visible spectrum); the searchlight effect (increased brightness in the direction of travel); time dilation (differences in the perceived passage of time from the player and the outside world); Lorentz transformation (warping of space at near-light speeds); and the runtime effect (the ability to see objects as they were in the past, due to the travel time of light). Players can choose to share their mastery and experience of the game through Twitter. A Slower Speed of Light combines accessible gameplay and a fantasy setting with theoretical and computational physics research to deliver an engaging and pedagogically rich experience.
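
    The effects listed above get dramatic only when the player's speed is a large fraction of the speed of light, which is why the game lowers the latter. A small Python sketch of the standard special-relativity formulas follows; the walking speed and the reduced speed of light are made-up numbers, not values from the game.

```python
import math


def lorentz_factor(speed, light_speed):
    """Gamma = 1 / sqrt(1 - v^2/c^2); governs time dilation and length contraction."""
    beta = speed / light_speed
    return 1.0 / math.sqrt(1.0 - beta * beta)


def doppler_factor_approaching(speed, light_speed):
    """Observed/emitted frequency ratio for a source approached head-on."""
    beta = speed / light_speed
    return math.sqrt((1.0 + beta) / (1.0 - beta))


walking_speed = 5.0          # m/s, hypothetical player speed
real_light_speed = 299_792_458.0
slow_light_speed = 6.25      # m/s, hypothetical in-game value after many orbs

print(lorentz_factor(walking_speed, real_light_speed))
print(lorentz_factor(walking_speed, slow_light_speed))
print(doppler_factor_approaching(walking_speed, slow_light_speed))
```

    At the real speed of light the factor is indistinguishable from 1; at the slowed-down value the player is moving at 80% of light speed and everything ahead is strongly blue-shifted.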

    (tags: games physics mit science light relativity)

  • Eventual Consistency Today: Limitations, Extensions, and Beyond – ACM Queue

    Good overview of the current state of eventually-consistent data store research, covering CALM and CRDTs, from Peter Bailis and Ali Ghodsi

    (tags: eventual-consistency data storage horizontal-scaling research distcomp distributed-systems via:martin-thompson crdts calm acid cap)

  • Latency’s Worst Nightmare: Performance Tuning Tips and Tricks [slides]

    the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides

    (tags: benchmarks aws ec2 ebs piops services scaling scalability presentations)

  • Rob “b3ta” Manuel in Dublin next week

    The Bottom Half Of The Internet — “Racism; typos; filth; spam; ignorance; rage – that’s all the bottom half of the internet is good for, right? Rob Manuel wants you to question the internet dictum, most beloved of high-profile columnists, that you should ignore all of the comments all of the time. The ‘war on comments’, he reckons, might just be an echo of a fourth estate that’s having trouble adjusting to the idea of an unwashed public disagreeing with their sacred opinions. Sous les pavés, la plage.” On Tuesday, le cool Dublin & Pilcrow present SPIEL. Rob Manuel is the flashy animator behind B3ta and he’s joined by Ed Melvin, who wants to educate you on ‘The Unreal Engines’ of virtual currencies and economies.

    (tags: rob-manuel b3ta dublin comments internet meetings talks lecool)

Links for 2013-04-18

  • Reality, Reactivity, Relevance and Repeatability in Java Application Profiling

    this product from JInspired appears to support runtime profiling of Java apps with < 5% performance impact

    (tags: profiling performance java coding measurement)

  • You Lookin’ At Me? Reflections on Google Glass

    ex-Nokia product design guru Jan Chipchase on Google Glass

    (tags: google privacy technology google-glass pervasive-computing life future)

  • Not the ‘best in the world’ – The Medical Independent

    Debunking this pro-life talking point:

    ‘Our maternity services are amongst the best in the world’. This phrase has been much hackneyed since the heartbreaking death of Savita Halappanavar was revealed in mid October. James Reilly and other senior politicians are particularly guilty of citing this inaccurate position. So what is the state of Irish maternity services and how do our figures compare with other comparable countries? Let’s start with the statistics.
    The bottom line:
    Eight deaths per 100,000 is not bad, but it ranks our maternity services far from the best in the world and below countries such as Slovakia and Poland.

    (tags: pro-choice ireland savita medicine health maternity morbidity statistics)

  • How Kaggle Is Changing How We Work – Thomas Goetz – The Atlantic

    Founded in 2010, Kaggle is an online platform for data-mining and predictive-modeling competitions. A company arranges with Kaggle to post a dump of data with a proposed problem, and the site’s community of computer scientists and mathematicians — known these days as data scientists — take on the task, posting proposed solutions. […] On one level, of course, Kaggle is just another spin on crowdsourcing, tapping the global brain to solve a big problem. That stuff has been around for a decade or more, at least back to Wikipedia (or farther back, Linux, etc). And companies like TaskRabbit and oDesk have thrown jobs to the crowd for several years. But I think Kaggle, and other online labor markets, represent more than that, and I’ll offer two arguments. First, Kaggle doesn’t incorporate work from all levels of proficiency, professionals to amateurs. Participants are experts, and they aren’t working for benevolent reasons alone: they want to win, and they want to get better to improve their chances of winning next time. Second, Kaggle doesn’t just create the incidental work product, it creates a new marketplace for work, a deeper disruption in a professional field. Unlike traditional temp labor, these aren’t bottom of the totem pole jobs. Kagglers are on top. And that disruption is what will kill Joy’s Law. Because here’s the thing: the Kaggle ranking has become an essential metric in the world of data science. Employers like American Express and the New York Times have begun listing a Kaggle rank as an essential qualification in their help wanted ads for data scientists. It’s not just a merit badge for the coders; it’s a more significant, more valuable, indicator of capability than our traditional benchmarks for proficiency or expertise. In other words, your Ivy League diploma and IBM resume don’t matter so much as my Kaggle score. 
    It’s flipping the resume, where your work is measurable and metricized and your value in the marketplace is more valuable than the place you work.

    (tags: academia datamining economics data kaggle data-science ranking work competition crowdsourcing contracting)

  • The useful JVM options

    a good reference, with lots of sample output. Not clear if it takes 1.6/1.7 differences into account, though

    (tags: jvm reference java ops hotspot command-line)

Links for 2013-04-16

  • Austerity policies founded on Excel typo

    You’ve probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That’s all thanks to a study conducted by Carmen Reinhart and Kenneth Rogoff for their book This Time Is Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff” and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error. Read Mike Konczal for the whole rundown, but I’ll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly “the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent.”
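
    The mechanics of that range slip are easy to mimic. In this Python sketch the `average_rows` helper is hypothetical and the growth figures are invented placeholders, emphatically not the Reinhart-Rogoff data; only the row ranges (30 to 44 versus 30 to 49) come from the quote above.

```python
def average_rows(column, first_row, last_row):
    """Mimic Excel's AVERAGE over an inclusive row range of a single column."""
    values = [column[row] for row in range(first_row, last_row + 1)]
    return sum(values) / len(values)


# Invented growth figures keyed by spreadsheet row; NOT the real dataset.
column_l = {row: 1.0 for row in range(30, 45)}        # rows 30..44
column_l.update({row: 3.0 for row in range(45, 50)})  # rows 45..49, the ones left out

truncated = average_rows(column_l, 30, 44)  # the range as typed: AVERAGE(L30:L44)
full = average_rows(column_l, 30, 49)       # the range as intended: AVERAGE(L30:L49)

print("as typed:   ", truncated)
print("as intended:", full)
```

    Nothing errors out and nothing looks odd: five rows simply vanish from the result, which is the untestability problem in miniature.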

    (tags: austerity politics excel coding errors bugs spreadsheets economics economy)

  • Is Your MySQL Buffer Pool Warm? Make It Sweat!

    How Groupon keep a failover MySQL spare warm, using Percona tools and a “tee” of the live in-flight queries. (via Dave Doran)

    (tags: via:dave-doran mysql databases warm-spares spares failover groupon percona replication)

Links for 2013-04-15

  • So now you know who gets some of those excessive Ticketmaster fees….

    Interesting evidence; it appears Irish music promoters are getting “rebates” from the massive TicketMaster “booking fee”, on each ticket sold. This sounds like a cartel to me, and we need to regulate this. Where is the National Consumer Agency and Competition Authority?

    The matter is something which should be of concern to every gig-going music fan, regardless of whether they go to Stradbally or not. For years, many have asked about TicketMaster’s quasi-monopoly position in the marketplace and why this is so. We’ve always been told that promoters preferred to deal with one company rather than several and that TM’s systems and nationwide reach yadda yadda yadda was the bees’ knees etc. Other companies have tried to compete but no-one has been able to beat TM at this game. But why would promoters go elsewhere when they’re getting a slice of the TM fees back as rebates? Those past off-the-record attempts by and briefings from promoters blaming TM for those fees can now be seen as hypocritical. They’re sticking with TM because they’re receiving a take of the fees paid by punters who have no other choice in service provider if they want to get their hands on tickets. You wonder what the acts make of this cash-grab – perhaps some whip-smart agent is already making a claim for a percentage of the rebates because there would be no rebates in the first place without the act. Surely this is an issue for the Competition Authority and National Consumers Association too, given the manner in which the rebates are made and TM’s deals with the promoters? While promoters under TM deals are free to sell a certain proportion of their tickets with another provider, it’s usually only a very small percentage of the total and unlikely to trouble TM’s bottom line. Also, given that the rebates are volume-driven, it’s better for the promoters to keep the largest possible chunk of their business with TM. It seems that we have a new suspect in the blame game about why ticket prices are so high.

    (tags: regulation ireland cartels competition ticketing tickets ticketmaster music gigs consumer)

  • Blog shines spotlight on Dublin city’s illegal dumping problem

    Hooray, Eoin’s activism gets some coverage!

    THE SCALE OF Dublin’s dumping problem is laid bare in a blog that has seen contributors send in photos of chairs, fridges and heaps of rubbish strewn on city streets. Eoin Parker, one of the organisers behind DublinLitterBlog.com, spoke to TheJournal.ie about the problem, saying that the blog was set up following the privatisation of waste management by Dublin City Council in 2012.

    (tags: dumping dublin litter rubbish blogs dcc d1 activism community)

  • Ked

    To our knowledge, Ked is the first scripting language to emerge from The People’s Republic of Cork. Below is an account of what we know so far about the mysterious Corkonian language. Any suggested updates or contributions are encouraged.
    Genius.

    (tags: coding cork jokes funny like languages programming)

  • Just how bad are RTE’s finances?

    A sobering examination by NAMAwinelake into the quagmire of Ireland’s publicly-funded national broadcaster:

    It seems that RTE has become a disaster zone, with libels and incompetence overseen by incapable management, and this is reflected in that organisation’s financial results. RTE still employs nearly 2,000 people and supports jobs and industry across independent producers and suppliers; it is a major business. But the time has come to call a halt to delusional management that is sinking the organization deeper into a quagmire which will ultimately need to be bailed out by the State. And Noel Curran is fobbing us off with flying a kite about a reduction in 65-year old Pat Kenny’s salary from €630,000 to €570,000?!

    (tags: rte namawinelake public funding finances money mismanagement ireland incompetence tv news)

  • High Scalability – Scaling Pinterest – From 0 to 10s of Billions of Page Views a Month in Two Years

    wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:

    a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.
    yeah, so, eek ;)

    (tags: clustering sharding architecture aws scalability scaling pinterest via:matt-sergeant redis mysql memcached)

Links for 2013-04-13

  • Expert in Savita inquiry confirms Irish women get lower standard of care with chorioamnionitis

    Dr. Jen Gunter again:

    Dr. Knowles’ testimony confirms for me that the law played a role, because her statements indicate the standard of care for treatment of chorioamnionitis is less aggressive in Ireland. This can only be because of the law as there is no medical evidence to support delaying delivery when chorioamnionitis is diagnosed. Standard of care is not to wait until a woman is sick enough to need a termination, the idea is to treat her, you know, before she gets sick enough. An elevated white count and ruptured membranes at 17 weeks is typically enough to make the diagnosis, so Dr. Knowles needs to testify as to what in Savita’s medical record made it safe to not recommend a delivery. By the way, I also disagree with Dr. Knowles about her interpretation of Savita’s medical record, the chart doesn’t have “subtle indicators” of infection, it screams chorioamnionitis long before Wednesday morning. In North America the standard of care with chorioamnionitis is to recommend delivery as soon as the diagnosis is made, not wait until women enter the antechamber of death in the hopes that we can somehow snatch them back from the brink. If Irish law, or the interpretation thereof, had nothing to do with Savita’s death no expert would be mentioning sick enough at all.

    (tags: jen-gunter ob-gyn medicine savita law ireland abortion tragedy galway hospital)

Links for 2013-04-12

Links for 2013-04-11

  • google-http-java-client

    Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as java.net.HttpURLConnection, Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android’s GSON libraries for JSON.
    Not quite as simple an API as Python’s requests, sadly, but still an improvement on the verbose Apache HttpComponents API. Good support for unit testing via a built-in mock-response class. Still in beta.

    (tags: google beta software http libraries json xml transports protocols)

  • Former IMF chief of mission to Ireland says not burning the bondholders was “a mistake”

    Former IMF chief of mission to Ireland, Ashoka Mody, above left with Ajai Chopra in 2010. Melancholy of eye and large of loafer, Ashoka was involved in negotiating Ireland’s EU/IMF bailout. […] This morning Ashoka gave an interview to Gavin Jennings on Morning Ireland, in which he admitted Ireland’s bailout was riddled with mistakes, namely the non-burning of the senior bondholders and the program of austerity. Jennings: “So, if imposing austerity on Ireland was wrong, or a mistake; if not allowing any burning of bondholders, whether official, sovereign or private was a mistake; you were centrally involved in that program. I know Ajai Chopra was very much the public face of the IMF mission to Ireland. But you were centrally involved in constructing this bailout. How much responsibility do you take for those errors?” Mody: “Yes, so, obviously, I have to take the responsibility in…but I’m in very good company in taking responsibility in this. There were many parties involved. And my role really was to bring such matters to the attention of people who finally made these decisions.”
    Great.

    (tags: bondholders imf ireland economy default ajai-chopra ashoka-mody)

  • Savita Halappanavar’s inquest: the three questions that must be answered | Dr. Jen Gunter

    A professional OB/GYN analyses the horrors coming to light in the Savita inquest. Here’s one particular gem:

    Fetal survival with ruptured membranes at 17 weeks is 0%, this is from prospective study. […but] “real and substantial risk” to the woman’s life is what is required by the Irish constitution to terminate a pregnancy, *whether or not the foetus is viable*.
    So the foetus had 0% chance of survival — but still termination was not considered an option. Bloody hell.

    (tags: religion ireland savita horrors malpractice galway guh hospitals hse health inquest abortion pro-choice pregnancy)

Links for 2013-04-10

  • Minister Rabbitte welcomes EU agreement on re-use of Public Sector Information

    Lots of talk about “charging regimes”, “income-generating public sector bodies” etc., but not a single mention of open data or free access. Terrible stuff. :( (via conoro)

    (tags: via:conoro open-access government public-sector ireland eu open-data public free)

  • Compression in Kafka: GZIP or Snappy ?

    With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, decompressed and compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP.

    No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The higher compression savings in this test are due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
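
    The percentages quoted follow directly from the throughput figures, as this quick Python check shows. The `percent_higher` helper is my own naming; the four MB/s values are the ones in the excerpt.

```python
def percent_higher(candidate_mb_s, baseline_mb_s):
    """How much higher the candidate throughput is than the baseline, in percent."""
    return (candidate_mb_s - baseline_mb_s) / baseline_mb_s * 100.0


# Snappy vs GZIP producer throughput, in MB/s, as quoted in the excerpt.
with_ack = percent_higher(22.3, 8.9)
no_ack = percent_higher(60.8, 18.5)

print(f"with ack: Snappy is {with_ack:.1f}% higher than GZIP")
print(f"no ack:   Snappy is {no_ack:.1f}% higher than GZIP")
```

    Both results land within a percentage point of the quoted 150% and 228%, so the article appears to be truncating rather than rounding.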

    (tags: gzip snappy compression kafka streaming ops)

  • The Bw-Tree: A B-tree for New Hardware – Microsoft Research

    The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.

    (tags: bw-trees database paper toread research algorithms microsoft sql sql-server b-trees data-structures storage cache-friendly mechanical-sympathy)

  • Boundary Techtalk – Large-scale OLAP with Kobayashi

    Boundary on their TSD-on-Riak store.

    Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false-starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database. The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.

    (tags: video boundary tsd riak eventual-consistency storage kobayashi olap time-series)

  • Adding Insult to Plagiary?

    A few days old, but already an instant Streisand-Effect classic:

    Sometimes people borrow [Colin Purrington’s free guide about making scientific posters] without giving him credit. This happens fairly regularly, and when he finds out about it, he sends an e-mail asking them to take it down. Usually they do. But when he sent an e-mail to the Consortium for Plant Biotechnology Research, asking that a roughly 1,200-word, near-verbatim, uncredited chunk from his guide be removed from the consortium’s materials, the response was unexpected. Rather than apologise, a lawyer sent him a cease-and-desist letter accusing him of plagiarizing the consortium’s materials and demanding that he take down his guide or face a lawsuit seeking damages up to $150,000.

    (tags: streisand-effect lawsuits law infringement copyright cpbr bullying science posters)

  • Kafka 0.8 Producer Performance

    Great benchmarking from Piotr Kozikowski at the LiveRamp team, into performance of the upcoming Kafka 0.8 release

    (tags: performance kafka apache benchmarks ops queueing)

  • Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node

    an excellent writeup on Kafka 0.8’s use and operation, including details of the new replication features

    (tags: kafka replication queueing distributed ops)

  • Ah Here (To Coin A Phrase)

    ‘A €10 silver coin being offered for sale to the public in honour of James Joyce by the Central Bank tomorrow contains a misquote from the author. The line used on the coin from Chapter 3 of Ulysses includes a superfluous conjunction – a rogue ‘that’.’ [..] The coin reads:

    “Ineluctable modality of the visible: at least that if no more, thought through my eyes. Signatures of all things *that* I am here to read.”
    (Incorrect ‘that’ emphasised)

    (tags: for:robotwisdom james-joyce typos funny fail central-bank ireland coins minting errors ulysses)

Links for 2013-04-09

  • Netflix ISP Speed Index for Ireland

    Via Mulley. Magnet doing well, with UPC coming second; UPC have dropped a fair bit in the past month. Would love to see it broken down by region…

    (tags: upc ireland isps speed bandwidth netflix broadband magnet eircom)

  • Why I’m Walking Away From CouchDB

    In practice there are two gotchas that are so painful I am looking for a replacement with a different featureset than couchdb provides. The location tracking project icecondor.com uses couchdb to store 20,000 new records per day. It has more write traffic than read traffic and runs on modest hardware. Those two gotchas are:

    1. View Index updates. While I have a vague understanding of why view index updates are slow and bulky and important, in practice it is unworkable. Every write sets up a trap for the first reader to come along after the write. The more writes there are, the bigger the trap for the first reader which has to wait on the couchdb process that refreshes the view index on an as-needed basis. I believe this trade-off was made to keep writes fast. No need to update the view index until all writes are actually complete, right? Write traffic is heavier than read traffic and the time needed for that index refresh causes the webapp to crash because it’s not set up to handle timeouts from a database query. The workaround is as hackish as one can imagine – cron jobs to hit every map/reduce query to keep indexes fresh.

    2. Append only database file. Append only is in theory a great way to ensure on-disk reliability. A system crash during an append should only affect that append. It’s a crash during an update to existing parts of the file that risks the integrity of more than what’s being updated. With so many layers of caching and optimizations in the kernel and the filesystem and now in the workings of SSD drives, I’m not sure append-only gives extra protection anymore. What it does do is create a huge operational headache. The on-disk file can never grow beyond half the available storage space. Record deletion uses new disk space and if the half-full mark approaches, vacuuming must be done. The entire database is rewritten to the filesystem, leaving out no longer needed records. If the data file should happen to grow beyond half the partition, the system has essentially crashed because there is no way to compact the file and soon the partition will be full. This is a likely scenario when there is a lot of record deletion activity. The system in question does a lot of writes of temporary data that is followed up by deletes a few days later. There is also a lot of permanent storage that hardly gets used. Rewriting every byte of the records that are long-lived due to compaction is an enormous amount of wasted I/O – doubly so given SSD drives have a short write-cycle lifespan.
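
    The "half the partition" limit falls out of simple arithmetic. A back-of-envelope sketch in Python, assuming the partition holds nothing but the database file and ignoring filesystem overhead (`can_compact` and the byte counts are made up for illustration, not anything CouchDB exposes):

```python
def can_compact(partition_bytes, db_file_bytes, live_bytes):
    """Compaction writes every live record to a new file before the old one
    can be removed, so the free space must hold all of the live data."""
    free_bytes = partition_bytes - db_file_bytes
    return free_bytes >= live_bytes


GB = 1024 ** 3

# Worst case: nothing has been deleted, so the whole file is live data.
under_half = can_compact(100 * GB, 40 * GB, 40 * GB)   # 60 GB free, 40 GB needed
over_half = can_compact(100 * GB, 60 * GB, 60 * GB)    # 40 GB free, 60 GB needed

# Past the half-way mark, compaction only works if enough records are dead.
mostly_dead = can_compact(100 * GB, 60 * GB, 30 * GB)  # 40 GB free, 30 GB needed
```

    So in the worst case (every record still live) the file has to stay under half the partition, which is the operational headache described above.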

    (tags: nosql couchdb consistency checkpointing databases data-stores indexing)

  • CouchDB: not drinking the kool-aid

    Jonathan Ellis on some CouchDB negatives:

    Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project: Writes are serialized.  Not serialized as in the isolation level, serialized as in there can only be one write active at a time.  Want to spread writes across multiple disks?  Sorry. CouchDB uses a MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes.  Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less. CouchDB is simple.  Gloriously simple.  Why is that a negative?  It’s competing with systems (in the popular imagination, if not in its author’s mind) that have been maturing for years.  The reason PostgreSQL et al have those features is because people want them.  And if you don’t, you should at least ask a DBA with a few years of non-MySQL experience what you’ll be missing.  The majority of CouchDB fans don’t appear to really understand what a good relational database gives them, just as a lot of PHP programmers don’t get what the big deal is with namespaces. A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce.  MapReduce is a great approach to trivially parallelizing certain classes of problem.  The problem is, it’s tedious and error-prone to write raw MapReduce code.  This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively).  Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages.  It’s a little verbose, and you might be bored with it, but it’s much better than writing low-level mapreduce code.

    (tags: cassandra couch nosql storage distributed databases consistency)

  • What is the CouchDB replication protocol? Is it like Git? – Stack Overflow

    Good write up of CouchDB replication

    (tags: protocols couchdb sync replication git mvcc databases merging timelines)

  • TouchDB’s reverse-engineered write-up of the Couch replication protocol

    There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.

    (tags: couchdb protocols touchdb nosql replication sync mvcc revisions rest)

  • Protect your designs

    A good writeup of how to detect cases of copyright infringement for photography, art and other visual media.

    Von Glitschka, Modern Dog and myriad others make clear that the support of the creative community is absolutely vital in raising awareness of copyright infringements. Sites like www.youthoughtwewouldntnotice.com name and shame clear breaches of copyright, while the Modern Dog case shows that there is no better IP tracing system than the eyes and ears of the design community itself. “It’s the industry at large that has kept me aware of infringements,” states Von. “Without that I would miss most of them because I don’t go looking – they find me via the eyes of others.”

    (tags: photography art visual-media copyright infringement piracy ripping)

  • FastBit: An Efficient Compressed Bitmap Index Technology

    an [LGPL] open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user’s data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools. The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record. A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps.[…] Another innovation in FastBit is the multi-level bitmap encoding methods.
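
    A rough sketch of what a bitmap index buys you, in Python. This is the plain, uncompressed idea only, not FastBit's API and without WAH compression; `build_bitmap_index`, `rows_of` and the sample columns are made up for illustration:

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when column[i] == value."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return dict(index)


def rows_of(bitmap):
    """Expand a bitmap back into a sorted list of row numbers."""
    rows = []
    row = 0
    while bitmap:
        if bitmap & 1:
            rows.append(row)
        bitmap >>= 1
        row += 1
    return rows


# Two hypothetical columns of the same five-row table.
colour = ["red", "blue", "red", "green", "blue"]
size = ["S", "M", "M", "S", "M"]

colour_index = build_bitmap_index(colour)
size_index = build_bitmap_index(size)

# WHERE colour = 'blue' AND size = 'M' becomes a single bitwise AND.
blue_and_medium = rows_of(colour_index["blue"] & size_index["M"])

# WHERE colour = 'red' OR colour = 'green' becomes a bitwise OR.
red_or_green = rows_of(colour_index["red"] | colour_index["green"])
```

    It also shows the update cost mentioned in the quote: changing one record means touching the bitmaps of both its old and its new value.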

    (tags: fastbit nosql algorithms indexing search compressed-bitmaps indexes wah bitmaps compression)

  • javaewah

    The bit array data structure is implemented in Java as the BitSet class. Unfortunately, this fails to scale without compression. JavaEWAH is a word-aligned compressed variant of the Java bitset class. It uses a 64-bit run-length encoding (RLE) compression scheme. We trade-off some compression for better processing speed. We also have a 32-bit version which compresses better, but is not as fast. In general, the goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (as implemented in the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.
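
    The word-aligned RLE idea in miniature. This is a toy in Python, not JavaEWAH's actual format or API; the `("fill", bit, count)` and `("literal", word)` tokens are made up for illustration:

```python
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1


def compress(words):
    """Collapse runs of all-zero or all-one words; keep mixed words verbatim."""
    tokens = []
    for word in words:
        if word in (0, ALL_ONES):
            fill_bit = 1 if word else 0
            last = tokens[-1] if tokens else None
            if last and last[0] == "fill" and last[1] == fill_bit:
                # Extend the current run instead of storing another word.
                tokens[-1] = ("fill", fill_bit, last[2] + 1)
            else:
                tokens.append(("fill", fill_bit, 1))
        else:
            tokens.append(("literal", word))
    return tokens


def decompress(tokens):
    """Rebuild the original list of 64-bit words."""
    words = []
    for token in tokens:
        if token[0] == "fill":
            words.extend([ALL_ONES if token[1] else 0] * token[2])
        else:
            words.append(token[1])
    return words


# A sparse bitmap: 1000 empty words, one mixed word, then three full words.
words = [0] * 1000 + [0b1011] + [ALL_ONES] * 3
tokens = compress(words)
```

    Because everything stays aligned to whole words, a mixed word can be ANDed or ORed directly without any bit-level unpacking, which is where the speed-for-space trade-off in the quote comes from.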

    (tags: javaewah wah rle compression bitmaps bitmap-indexes bitset algorithms data-structures)

Links for 2013-04-08

Links for 2013-04-06

Links for 2013-04-05

Links for 2013-04-04

Links for 2013-04-03

  • The Patent Protection Racket

    Joel On Software weighs in (via Tony Finch):

    The fastest growing industry in the US right now, even during this time of slow economic growth, is probably the patent troll protection racket industry.

    (tags: joel-on-software patents swpats shakedown extortion us-politics patent-trolls via:fanf)

  • Cap’n Proto

    Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap’n Proto is INFINITY TIMES faster than Protocol Buffers.
    Basically, marshalling works like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but the project responds to this by suggesting compression (which is a fair point tbh). C++-only for now. I’m not seeing the same kind of support for optional data that protobufs has, though. Overall I’m worried there are some useful features being omitted here…
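
    To illustrate the "aligned struct on the wire" point, a small Python sketch using the standard `struct` module. This is not the real Cap'n Proto wire format; the record layout and the `encode`/`read_price` helpers are invented for the example:

```python
import struct

# Hypothetical record: uint32 id, 4 padding bytes, float64 price, uint64 timestamp.
# The padding keeps the 8-byte fields on 8-byte boundaries, as a C compiler would.
LAYOUT = struct.Struct("<I4xdQ")
PRICE_OFFSET = 8


def encode(record_id, price, timestamp):
    """Serialise by writing the fields at their fixed offsets."""
    return LAYOUT.pack(record_id, price, timestamp)


def read_price(buffer):
    """Read a single field straight from its offset, with no parsing pass."""
    return struct.unpack_from("<d", buffer, PRICE_OFFSET)[0]


message = encode(7, 9.5, 1364947200)
price = read_price(message)
```

    The four padding bytes are the kind of wasted space that compression is meant to claw back, and a fixed layout like this has no obvious place to say "this field is absent", which is the optional-data worry above.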

    (tags: serialization formats protobufs capn-proto protocols coding c++ rpc qnx messaging compression compatibility interoperability i14y)

  • CRDTs – Commutative Replicated Data Types [pdf]

    Shared read-only data is easy to scale by using well-understood replication techniques. However, sharing mutable data at a large scale is a difficult problem, because of the CAP impossibility result [5]. Two approaches dominate in practice. One ensures scalability by giving up consistency guarantees, for instance using the Last-Writer-Wins (LWW) approach [7]. The alternative guarantees consistency by serialising all updates, which does not scale beyond a small cluster [12]. Optimistic replication allows replicas to diverge, eventually resolving conflicts either by LWW-like methods or by serialisation [11]. In some (limited) cases, a radical simplification is possible. If concurrent updates to some datum commute, and all of its replicas execute all updates in causal order, then the replicas converge. We call this a Commutative Replicated Data Type (CRDT). The CRDT approach ensures that there are no conflicts, hence, no need for consensus-based concurrency control. CRDTs are not a universal solution, but, perhaps surprisingly, we were able to design highly useful CRDTs. This new research direction is promising as it ensures consistency in the large scale at a low cost, at least for some applications.
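
    To make the commutativity point concrete, a toy sketch in Python (not from the paper; `CounterReplica` and the update values are made up for illustration). It delivers the same concurrent updates to a fresh replica in every possible order and collects the final states:

```python
import itertools


class CounterReplica:
    """One replica of a shared counter; the only update is add(delta)."""

    def __init__(self):
        self.value = 0

    def apply(self, delta):
        self.value += delta


# Three concurrent updates issued at different replicas.
updates = [+5, -2, +10]

# Additions commute, so every delivery order ends in the same state.
add_results = set()
for order in itertools.permutations(updates):
    replica = CounterReplica()
    for delta in order:
        replica.apply(delta)
    add_results.add(replica.value)

# Plain assignment does not commute: the final state depends on arrival order,
# which is why it needs something like last-writer-wins to pick a winner.
assign_results = set()
for order in itertools.permutations(updates):
    value = 0
    for new_value in order:
        value = new_value
    assign_results.add(value)
```

    Here `add_results` collapses to a single value while `assign_results` holds one value per possible "last" update.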

    (tags: consistency algorithms concurrency crdts distcomp data)

  • CRDT toolbox

    ‘The CRDT toolbox provides a collection of basic Conflict-free replicated data types as well as a common interface for defining your own CRDTs’. – in Eric Moritz’ github. Also includes some more links to CRDT background reading.

    (tags: crdt github eric-moritz python algorithms)

  • Eventually-Consistent Data Structures [slides]

    implementing CRDTs in Riak and Voldemort

    (tags: crdt algorithms distcomp riak voldemort distributed)

Links for 2013-04-02

Links for 2013-03-29

  • East Texas Judge Says Mathematical Algorithms Can’t Be Patented, Dismisses Uniloc Claim Against Rackspace

    This seems pretty significant. Is the tide turning in the Texas Eastern District against patent trolls, at last? And does it establish sufficient precedent?

    A federal judge has thrown out a patent claim against Rackspace, ruling that mathematical algorithms can’t be patented. The ruling in the Eastern District stemmed from a 2012 complaint filed by Uniloc USA asserting that processing of floating point numbers by the Linux operating system was a patent violation. Chief Judge Leonard Davis based the ruling on U.S. Supreme Court case law that prohibits the patenting of mathematical algorithms. According to Rackspace, this is the first reported instance in which the Eastern District of Texas has granted an early motion to dismiss finding a patent invalid because it claimed unpatentable subject matter. Red Hat, which supplies Linux to Rackspace, provided Rackspace’s defense. Red Hat has a policy of standing behind customers through its Open Source Assurance program.
    See https://news.ycombinator.com/item?id=5455869 for more discussion.

    (tags: east-texas patents swpats maths patenting law judges rackspace linux red-hat uniloc-usa floating-point)

  • Introducing Chronos: A Replacement for Cron

    A distributed, fault-tolerant “cron” is something which comes up frequently — it makes for a great fault-tolerance building block. This one sounds like it’s too closely tied into Mesos, though (IMO).

    Chronos is our replacement for cron. It is a distributed and fault-tolerant scheduler which runs on top of Mesos. It’s a framework and supports custom mesos executors as well as the default command executor. Thus by default, Chronos executes SH (on most systems BASH) scripts. Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transferring files and executing them on a remote machine in the background and using asynchronous callbacks to notify Chronos of job completion or failures.

    (tags: cron scheduling mesos stacks design airbnb chronos fault-tolerance distcomp distributed-computing scripts jobs)

Links for 2013-03-28

  • One of CloudFlare’s upstream providers on the “death of the internet” scare-mongering

    Having a bad day on the Internet is nothing new. These are the types of events we deal with on a regular basis, and most large network operators are very good at responding quickly to deal with situations like this. In our case, we worked with Cloudflare to quickly identify the attack profile, rolled out global filters on our network to limit the attack traffic without adversely impacting legitimate users, and worked with our other partner networks (like NTT) to do the same. If the attacks had stopped here, nobody in the “mainstream media” would have noticed, and it would have been just another fun day for a few geeks on the Internet. The next part is where things got interesting, and is the part that nobody outside of extremely technical circles has actually bothered to try and understand yet. After attacking Cloudflare and their upstream Internet providers directly stopped having the desired effect, the attackers turned to any other interconnection point they could find, and stumbled upon Internet Exchange Points like LINX (in London), AMS-IX (in Amsterdam), and DEC-IX (in Frankfurt), three of the largest IXPs in the world. An IXP is an “interconnection fabric”, or essentially just a large switched LAN, which acts as a common meeting point for different networks to connect and exchange traffic with each other. One downside to the way this architecture works is that there is a single big IP block used at each of these IXPs, where every network who interconnects is given 1 IP address, and this IP block CAN be globally routable. When the attackers stumbled upon this, probably by accident, it resulted in a lot of bogus traffic being injected into the IXP fabrics in an unusual way, until the IXP operators were able to work with everyone to make certain the IXP IP blocks weren’t being globally re-advertised. 
    Note that the vast majority of global Internet traffic does NOT travel over IXPs, but rather goes via direct private interconnections between specific networks. The IXP traffic represents more of the “long tail” of Internet traffic exchange, a larger number of smaller networks, which collectively still adds up to be a pretty big chunk of traffic. So, what you actually saw in this attack was a larger number of smaller networks being affected by something which was a completely unrelated and unintended side-effect of the actual attacks, and thus *poof* you have the recipe for a lot of people talking about it. :) Hopefully that clears up a bit of the situation.

    (tags: bandwidth internet gizmodo traffic cloudflare ddos hacking)

Links for 2013-03-27

Links for 2013-03-25

  • The first pillar of agile sysadmin: We alert on what we draw

    ‘One of [the] purposes of monitoring systems was to provide data to allow us, as engineers, to detect patterns, and predict issues before they become production impacting. In order to do this, we need to be capturing data and storing it somewhere which allows us to analyse it. If we care about it – if the data could provide the kind of engineering insight which helps us to understand our systems and give early warning – we should be capturing it.’ […] ‘There are a couple of weaknesses in [Nagios’ design]. Assuming we’ve agreed that if we care about a metric enough to want to alert on it then we should be gathering that data for analysis, and graphing it, then we already have the data upon which to base our check. Furthermore, this data is not on the machine we’re monitoring, so our checks don’t in any way add further stress to that machine.’ I would add that if we are alerting on a different set of data from what we collect for graphing, then using the graphs to investigate an alarm may run into problems if they don’t sync up.
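
    A minimal sketch of what "alert on what we draw" could look like in Python: the check reads the same stored samples the grapher plots, and never touches the monitored host. `evaluate_alert`, the sample data and the thresholds are all hypothetical, not taken from the article or from any particular monitoring tool:

```python
from statistics import mean


def evaluate_alert(series, window, warn, crit):
    """Classify the mean of the last `window` samples of a stored series."""
    recent = [value for _timestamp, value in series[-window:]]
    if not recent:
        return "UNKNOWN"
    level = mean(recent)
    if level >= crit:
        return "CRITICAL"
    if level >= warn:
        return "WARNING"
    return "OK"


# Hypothetical (unix_time, disk_used_percent) samples, as the grapher stores them.
disk_used = [(0, 71.0), (60, 74.5), (120, 88.0), (180, 91.0), (240, 93.5)]

status = evaluate_alert(disk_used, window=3, warn=80.0, crit=90.0)
```

    Since the alert and the graph share one data source, whatever fired the alarm is exactly what shows up on the graph when investigating it.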

    (tags: devops monitoring deployment production sysadmin ops alerting metrics)

  • JPL Institutional Coding Standard for the Java Programming Language

    From JPL’s Laboratory for Reliable Software (LaRS). Great reference; there’s some really useful recommendations here, and good explanations of familiar ones like “prefer composition over inheritance”. Many are supported by FindBugs, too. Here’s the full list:

    compile with checks turned on; apply static analysis; document public elements; write unit tests; use the standard naming conventions; do not override field or class names; make imports explicit; do not have cyclic package and class dependencies; obey the contract for equals(); define both equals() and hashCode(); define equals when adding fields; define equals with parameter type Object; do not use finalizers; do not implement the Cloneable interface; do not call nonfinal methods in constructors; select composition over inheritance; make fields private; do not use static mutable fields; declare immutable fields final; initialize fields before use; use assertions; use annotations; restrict method overloading; do not assign to parameters; do not return null arrays or collections; do not call System.exit; have one concept per line; use braces in control structures; do not have empty blocks; use breaks in switch statements; end switch statements with default; terminate if-else-if with else; restrict side effects in expressions; use named constants for non-trivial literals; make operator precedence explicit; do not use reference equality; use only short-circuit logic operators; do not use octal values; do not use floating point equality; use one result type in conditional expressions; do not use string concatenation operator in loops; do not drop exceptions; do not abruptly exit a finally block; use generics; use interfaces as types when available; use primitive types; do not remove literals from collections; restrict numeric conversions; program against data races; program against deadlocks; do not rely on the scheduler for synchronization; wait and notify safely; reduce code complexity

    (tags: nasa java reference guidelines coding-standards jpl reliability software coding oo concurrency findbugs bugs)

Links for 2013-03-24

  • KDE’s brush with git repository corruption: post-mortem

    a barely-averted disaster… phew.

    while we planned for the case of the server losing a disk or entirely biting the dust, or the total loss of the VM’s filesystem, we didn’t plan for the case of filesystem corruption, and the way the corruption affected our mirroring system triggered some very unforeseen and pathological conditions. […] the corruption was perfectly mirrored… or rather, due to its nature, imperfectly mirrored. And all data on the anongit [mirrors] was lost.
    One risk demonstrated: by trusting in mirroring, rather than a schedule of snapshot backups covering a wide time range, they nearly had a major outage. Silent data corruption, and code bugs, happen — backups protect against this, but RAID, replication, and mirrors do not. Another risk: they didn’t have a rate limit on project-deletion, which resulted in the “anongit” mirrors deleting their (safe) data copies in response to the upstream corruption. Rate limiting to sanity-check automated changes is vital. What they should have had in place was described by the fix: ‘If a new projects file is generated and is more than 1% different than the previous file, the previous file is kept intact (at 1500 repositories, that means 15 repositories would have to be created or deleted in the span of three minutes, which is extremely unlikely).’
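
    The 1% rule from the fix could be sketched like this in Python. `accept_new_projects_file` is a hypothetical name, and measuring "different" as the number of repositories added or removed is my assumption; the post-mortem doesn't spell out the exact comparison:

```python
def accept_new_projects_file(previous, new, max_change=0.01):
    """Return the repository list to use: `new` if it changed little enough,
    otherwise keep `previous` intact."""
    previous_set, new_set = set(previous), set(new)
    changed = len(previous_set ^ new_set)  # repositories added or removed
    if previous_set and changed / len(previous_set) > max_change:
        return previous  # suspiciously large change: refuse to apply it
    return new


previous = [f"repo{i}" for i in range(1500)]

# One new repository: well under 1%, so the new file is accepted.
small_change = accept_new_projects_file(previous, previous + ["new-repo"])

# Upstream corruption makes the list come back empty: the old file is kept.
after_corruption = accept_new_projects_file(previous, [])
```

    With a check like this on the mirrors, an empty or mangled projects file from upstream gets ignored instead of triggering mass deletion.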

    (tags: rate-limiting case-studies post-mortems kde git data-corruption risks mirroring replication raid bugs backups snapshots sanity-checks automation ops)

  • SpaceX software dev practices

    Metrics rule the roost — I guess there’s been a long history of telemetry in space applications.

    To make software more visible, you need to know what it is doing, he said, which means creating “metrics on everything you can think of”…. Those metrics should cover areas like performance, network utilization, CPU load, and so on. The metrics gathered, whether from testing or real-world use, should be stored as it is “incredibly valuable” to be able to go back through them, he said. For his systems, telemetry data is stored with the program metrics, as is the version of all of the code running so that everything can be reproduced if needed. SpaceX has programs to parse the metrics data and raise an alarm when “something goes bad”. It is important to automate that, Rose said, because forcing a human to do it “would suck”. The same programs run on the data whether it is generated from a developer’s test, from a run on the spacecraft, or from a mission. Any failures should be seen as an opportunity to add new metrics. It takes a while to “get into the rhythm” of doing so, but it is “very useful”. He likes to “geek out on error reporting”, using tools like libSegFault and ftrace. Automation is important, and continuous integration is “very valuable”, Rose said. He suggested building for every platform all of the time, even for “things you don’t use any more”. SpaceX does that and has found interesting problems when building unused code. Unit tests are run from the continuous integration system any time the code changes. “Everyone here has 100% unit test coverage”, he joked, but running whatever tests are available, and creating new ones is useful. When he worked on video games, they had a test to just “warp” the character to random locations in a level and had it look in the four directions, which regularly found problems. “Automate process processes”, he said. Things like coding standards, static analysis, spaces vs. tabs, or detecting the use of Emacs should be done automatically. 
    SpaceX has a complicated process where changes cannot be made without tickets, code review, signoffs, and so forth, but all of that is checked automatically. If static analysis is part of the workflow, make it such that the code will not build unless it passes that analysis step. When the build fails, it should “fail loudly” with a “monitor that starts flashing red” and email to everyone on the team. When that happens, you should “respond immediately” to fix the problem. In his team, they have a full-size Justin Bieber cutout that gets placed facing the team member who broke the build. They found that “100% of software engineers don’t like Justin Bieber”, and will work quickly to fix the build problem.

    (tags: spacex dev coding metrics deplyment production space justin-bieber)

  • on the etymology of “Ketchup”

    ‘the story of ketchup is a story of globalization and centuries of economic domination by a world superpower. But the superpower isn’t America, and the century isn’t ours. Ketchup’s origins in the fermented sauces of China and Southeast Asia mean that those little plastic packets under the seat of your car are a direct result of Chinese and Asian domination of a single global world economy for most of the last millennium.’

    (tags: ketchup china nam-pla food etymology condiments history trade)

Links for 2013-03-23

  • dumping a JVM heap using gdb

    now this is a neat trick — having been stuck having to flip to spares and do other antics while a long-running heap dump took place, this is a winner.

    Dumping a JVM’s heap is an extremely useful tool for debugging problems with a J2EE application. Unfortunately, when a JVM explodes, using the standard jmap tool can take an inordinate amount of time to execute for lots of different reasons. This leads to extended downtime when a heap dump is attempted and even then, jmap regularly fails. This blog post is intended to outline an alternate method using [gdb] to achieve a heap dump that only requires mere seconds of additional downtime allowing the slow jmap process to happen once the application is back in service.

    (tags: heap-dump gdb heap jvm java via:peakscale gcore core core-dump debugging)

  • Edition – Irish Design

    ‘Edition has a ‘design for life’ philosophy – we think that unique designer-made items can be a part of our everyday lives without costing the earth. We stock affordable, contemporary and functional products (mostly handmade), including jewellery, home-ware, accessories, art and toys. Every item has been carefully selected and are all designed here in Ireland.’

    (tags: edition design ireland art graphics jewellery toys)

Links for 2013-03-21

Links for 2013-03-20

Links for 2013-03-19

Links for 2013-03-18

Links for 2013-03-16

  • Roko’s basilisk – RationalWiki

    Wacky transhumanists.

    Roko’s basilisk is notable for being completely banned from discussion on LessWrong, where any mention of it is deleted. Eliezer Yudkowsky, founder of LessWrong, considers the basilisk would not work, but will not explain why because he does not consider open discussion of the notion of acausal trade with possible superintelligences to be provably safe. Silly over-extrapolations of local memes are posted to LessWrong quite a lot; almost all are just downvoted and ignored. But this one, Yudkowsky reacted to hugely, then doubled-down on his reaction. Thanks to the Streisand effect, discussion of the basilisk and the details of the affair soon spread outside of LessWrong. The entire affair is a worked example of spectacular failure at community management and at controlling purportedly dangerous information. Some people familiar with the LessWrong memeplex have suffered serious psychological distress after contemplating basilisk-like ideas — even when they’re fairly sure intellectually that it’s a silly problem.[5] The notion is taken sufficiently seriously by some LessWrong posters that they try to work out how to erase evidence of themselves so a future AI can’t reconstruct a copy of them to torture.[6]

    (tags: transhumanism funny insane stupid singularity ai rokos-basilisk via:maciej lesswrong rationalism superintelligences striesand-effect absurd)

  • How the America Invents Act Will Change Patenting Forever

    Bet you didn’t think the US software patents situation could get worse? Wrong!

    “Now it’s really important to be the first to file, and it’s really important to file before somebody else puts a product out, or puts the invention in their product,” says Barr, adding that it will “create a new urgency on the part of everyone to file faster — and that’s going to be a problem for the small inventor.”

    (tags: first-to-file omnishambles uspto swpats patents software-patents law legal)

Links for 2013-03-14

Links for 2013-03-13

Links for 2013-03-12

Links for 2013-03-11

  • Bunnie Huang’s “Hacking the Xbox” now available as a free PDF

    ‘No Starch Press and I have decided to release this free ebook version of Hacking the Xbox in honor of Aaron Swartz. As you read this book, I hope that you’ll be reminded of how important freedom is to the hacking community and that you’ll be inclined to support the causes that Aaron believed in. I agreed to release this book for free in part because Aaron’s treatment by MIT is not unfamiliar to me. In this book, you will find the story of when I was an MIT graduate student, extracting security keys from the original Microsoft Xbox. You’ll also read about the crushing disappointment of receiving a letter from MIT legal repudiating any association with my work, effectively leaving me on my own to face Microsoft. The difference was that the faculty of my lab, the AI laboratory, were outraged by this treatment. They openly defied MIT legal and vowed to publish my work as an official “AI Lab Memo,” thereby granting me greater negotiating leverage with Microsoft. Microsoft, mindful of the potential backlash from the court of public opinion over suing a legitimate academic researcher, came to a civil understanding with me over the issue.’ This is a classic text on hardware reverse-engineering and the freedom to tinker — strongly recommended.

    (tags: hacking bunnie-huang xbox free hardware drm freedom-to-tinker books reading mit microsoft history)

Links for 2013-03-07

  • 4 Things Java Programmers Can Learn from Clojure (without learning Clojure)

    ‘1. Use immutable values; 2. Do no work in the constructor; 3. Program to small interfaces; 4. Represent computation, not the world’. Strongly agreed with #1, and the others look interesting too

    (tags: clojure lisp design programming coding java)

  • Tactical Chat: How the U.S. Military Uses IRC to Wage War

    Excellent stuff. Lessons to be learned from this: IRC has some key features that mean it can be useful in this case. 1. simple text, everything supports it, no fancy UI clients are necessary; 2. resilient against lossy/transient/low-bandwidth/high-latency networks; 3. standards-compliant and “battle-hardened” (so to speak); 4. open-source/non-proprietary.

    Despite the U.S. military’s massive spending each year on advanced communications technology, the use of simple text chat or tactical chat has outpaced other systems to become one of the most popular paths for communicating practical information on the battlefield.  Though the use of text chat by the U.S. military first began in the early 1990s, in recent years tactical chat has evolved into a “primary ‘comms’ path, having supplanted voice communications as the primary means of common operational picture (COP) updating in support of situational awareness.”  An article from January 2012 in the Air Land Sea Bulletin describes the value of tactical chat as an effective and immediate communications method that is highly effective in distributed, intermittent, low bandwidth environments which is particularly important with “large numbers of distributed warfighters” who must “frequently jump onto and off of a network” and coordinate with other coalition partners.  Text chat also provides “persistency in situational understanding between those leaving and those assuming command watch duties” enabling a persistent record of tactical decision making. A 2006 thesis from the Naval Postgraduate School states that internet relay chat (IRC) is one of the most widely used chat protocols for military command and control (C2).  Software such as mIRC, a Windows-based chat client, or integrated systems in C2 equipment are used primarily in tactical conditions though efforts are underway to upgrade systems to newer protocols. 
    (via JK)
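
    To illustrate the "simple text" point: a rough sketch (mine, not from the article) of splitting one raw IRC protocol line into its parts, assuming the classic `[:prefix] COMMAND params [:trailing]` layout. The nick, channel and message in the usage note are made up.

```python
def parse_irc_line(line: str) -> dict:
    """Split one raw IRC protocol line into prefix, command and parameters."""
    line = line.rstrip("\r\n")
    prefix = None
    if line.startswith(":"):
        # Optional prefix (the sender), terminated by the first space.
        prefix, _, line = line[1:].partition(" ")
    # " :" introduces the trailing parameter, which may itself contain spaces.
    head, sep, trailing = line.partition(" :")
    parts = head.split()
    command, params = parts[0], parts[1:]
    if sep:
        params.append(trailing)
    return {"prefix": prefix, "command": command, "params": params}
```

    For example, `parse_irc_line(":alice!a@host PRIVMSG #ops :contact at grid 1234")` yields the command `PRIVMSG` with params `["#ops", "contact at grid 1234"]`. Empty lines are not handled; a real client would need to skip them.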

    (tags: via:jk war irc chat mirc us-military tactical-chat distcomp networking)

  • “Whataboutery”

    Great neologism from Mick Fealty:

    Familiar to anyone who’s followed public debate on Northern Ireland. Some define it as the often multiple blaming and finger pointing that goes on between communities in conflict. Political differences are marked by powerful emotional (often tribal) reactions as opposed to creative conflict over policy and issues. It’s beginning to be known well beyond the bounds of Northern Ireland. […] Evasion may not be the intention but it is the obvious effect. It occurs when individuals are confronted with a difficult or uncomfortable question. The respondent retrenches his/her position and rejigs the question, being careful to pick open a sore point on the part of questioner’s ‘tribe’. He/she then fires the original query back at the inquirer.

    (tags: words etymology whataboutery argument debate northern-ireland mick-fealty slugger-otoole)

  • Dropbox Sync API

    Give your app its own private Dropbox client and leave the syncing to us.

    (tags: apps dropbox synchronization sync ios android api)

  • the real reason Marissa Mayer canned remote Y! employees (apparently)

    After spending months frustrated at how empty Yahoo parking lots were, Mayer consulted Yahoo’s VPN logs to see if remote employees were checking in enough. Mayer discovered they were not, and her decision was made. We’re hearing from people close to Yahoo executives and employees that she made the right decision banning work from home. “The employees at Yahoo are thrilled,” says one source close to the company. “There isn’t massive uprising. The truth is, they’ve all been pissed off that people haven’t been working.”

    (tags: yahoo work remote-work teleworking slacking marissa-mayer funny)

Links for 2013-03-06

  • Online Schema Change for MySQL

    A tool written by Facebook to ease the pain of online MySQL schema-change migrations.

    Some ALTER TABLE statements take too long from the perspective of some MySQL users. The fast index create feature for the InnoDB plugin in MySQL 5.1 makes this less of an issue but this can still take minutes to hours for a large table and for some MySQL deployments that is too long. A workaround is to perform the change on a slave first and then promote the slave to be the new master. But this requires a slave located near the master. MySQL 5.0 added support for triggers and some replication systems have been built using triggers to capture row changes. Why not use triggers for this? The openarkkit toolkit did just that with oak-online-alter-table. We have published our version of an online schema change utility (OnlineSchemaChange.php aka OSC).
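
    A toy sketch of the trigger idea, to make it concrete. This is not Facebook's OSC code or oak-online-alter-table: it uses SQLite via Python's `sqlite3` purely so it is self-contained, and the `users` table, the `users_new` shadow table, the added `email` column and the three function names are all hypothetical. Triggers mirror live writes into a shadow table with the new schema, existing rows are backfilled, then the tables are swapped.

```python
import sqlite3

def start_migration(conn):
    """Create the shadow table (new schema) and triggers mirroring live writes."""
    conn.execute(
        "CREATE TABLE users_new (id INTEGER PRIMARY KEY, name TEXT, email TEXT)"
    )
    conn.execute("""
        CREATE TRIGGER users_ins AFTER INSERT ON users BEGIN
            INSERT OR REPLACE INTO users_new (id, name) VALUES (NEW.id, NEW.name);
        END""")
    conn.execute("""
        CREATE TRIGGER users_upd AFTER UPDATE ON users BEGIN
            DELETE FROM users_new WHERE id = OLD.id;
            INSERT OR REPLACE INTO users_new (id, name) VALUES (NEW.id, NEW.name);
        END""")
    conn.execute("""
        CREATE TRIGGER users_del AFTER DELETE ON users BEGIN
            DELETE FROM users_new WHERE id = OLD.id;
        END""")

def backfill(conn):
    """Copy pre-existing rows; IGNORE skips rows the triggers already mirrored."""
    conn.execute(
        "INSERT OR IGNORE INTO users_new (id, name) SELECT id, name FROM users"
    )

def cutover(conn):
    """Swap the shadow table in; dropping the old table drops its triggers too."""
    conn.execute("DROP TABLE users")
    conn.execute("ALTER TABLE users_new RENAME TO users")
    conn.commit()
```

    Writes made between `start_migration` and `cutover` land in both tables, which is the whole point. A real tool would also backfill in chunks and deal with locking and races during the swap, none of which this sketch attempts.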

    (tags: facebook mysql sql schema database migrations ops alter-table)

  • Netflix Queue: Data migration for a high volume web application

    There will come a time in the life of most systems serving data, when there is a need to migrate data to [another] data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems [SimpleDB to Cassandra].

    (tags: cassandra netflix migrations data schema simpledb storage)

  • Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans

    nice enough, but a lot of moving parts. It would be nice to see a simpler ZK+Graphite setup using the ‘mntr’ verb
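
    Roughly what I have in mind, as a sketch: `mntr` returns tab-separated `key<TAB>value` lines over the ZK client port, and Graphite's plaintext protocol wants `path value timestamp` lines, so the glue is tiny. The function name and the `zk.host1` prefix below are made up; fetching the `mntr` output and sending the lines to Graphite are left out.

```python
import time

def mntr_to_graphite(mntr_output, prefix, timestamp=None):
    """Convert ZooKeeper 'mntr' output into Graphite plaintext-protocol lines."""
    ts = int(time.time()) if timestamp is None else int(timestamp)
    lines = []
    for row in mntr_output.splitlines():
        key, sep, value = row.partition("\t")
        if not sep:
            continue  # not a key<TAB>value line
        value = value.strip()
        try:
            float(value)
        except ValueError:
            continue  # skip non-numeric values such as zk_version
        lines.append(f"{prefix}.{key.strip()} {value} {ts}")
    return lines
```

    Each returned line, newline-terminated, can be written as-is to a Carbon plaintext listener (commonly port 2003).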

    (tags: graphite monitoring ops zookeeper cassandra hadoop jmx jmxtrans graphs)

  • RFC 6585 – Additional HTTP Status Codes

    includes “429 Too Many Requests”, for rate limits
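
    A minimal client-side sketch of honouring a 429. The RFC says the response MAY carry a Retry-After header, which can be either a number of seconds or an HTTP date. The function name, the plain-dict headers and the 1-second fallback are my assumptions, not anything the RFC mandates.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def retry_delay(status, headers, now=None, default=1.0):
    """Return seconds to wait before retrying, or None if not rate-limited."""
    if status != 429:
        return None
    value = headers.get("Retry-After")
    if value is None:
        return default  # header is optional, so fall back to a fixed delay
    value = value.strip()
    if value.isdigit():
        return float(value)  # delay-seconds form
    try:
        when = parsedate_to_datetime(value)  # HTTP-date form
    except (TypeError, ValueError):
        return default
    if when.tzinfo is None:
        when = when.replace(tzinfo=timezone.utc)
    now = now or datetime.now(timezone.utc)
    return max(0.0, (when - now).total_seconds())
```

    Real header lookups should be case-insensitive; most HTTP client libraries provide a headers object that handles that.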

    (tags: api rfc http reference standards web rest)

  • Curator Framework: Reducing the Complexity of Building Distributed Systems | Marketing Technology

    good +1 for using Netflix’ Curator ZK client library

    (tags: zookeeper curator netflix oss libraries distributed)

  • Netflix Curator

    a high-level API that greatly simplifies using ZooKeeper. It adds many features that build on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations. Some of the features are: 1. Automatic connection management: there are potential error cases that require ZooKeeper clients to recreate a connection and/or retry operations. Curator automatically and transparently (mostly) handles these cases; 2. Cleaner API: simplifies the raw ZooKeeper methods, events, etc., and provides a modern, fluent interface; 3. Recipe implementations (see Recipes): Leader election, Shared lock, Path cache and watcher, Distributed Queue, Distributed Priority Queue.

    (tags: zookeeper java netflix distcomp libraries oss open-source distributed)

  • OscarGodson.js | What I Learned At Yammer

    some pretty interesting lessons, it turns out: a ‘take what you need’ vacation policy means nobody takes vacations (unsurprising); Yammer actively work to avoid employee burnout (good idea); Yammer A/B test every feature; and Yammer mgmt try to let their devs work autonomously.

    (tags: yammer startups testing analytics culture work)

  • moreutils

    Some really cool-looking UNIX command line utils, packaged in Debian (and therefore in Ubuntu too). A few of these I’ve reimplemented separately, but it’s always good to replace a hack with a more widely available “official” tool. Thanks, Joey Hess!

    chronic: runs a command quietly unless it fails; combine: combine the lines in two files using boolean operations; ifdata: get network interface info without parsing ifconfig output; ifne: run a program if the standard input is not empty; isutf8: check if a file or standard input is utf-8; lckdo: execute a program with a lock held; mispipe: pipe two commands, returning the exit status of the first; parallel: run multiple jobs at once; pee: tee standard input to pipes; sponge: soak up standard input until EOF, then write it to a file; ts: timestamp standard input; vidir: edit a directory in your text editor; vipe: insert a text editor into a pipe; zrun: automatically uncompress arguments to command
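
    The sponge trick is the one I've hacked up most often, so here is a sketch of the idea in Python (not the moreutils implementation; the function name and signature are mine): soak up all the input before opening the output, so a filter can safely rewrite the file it is reading from.

```python
def sponge(chunks, path):
    """Consume every chunk of input first, then write the result to path."""
    data = "".join(chunks)  # read to EOF before touching the output file
    with open(path, "w") as out:
        out.write(data)
```

    Passing a lazy generator that reads from `path` itself works, because the file is only opened for writing once the generator is exhausted.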

    (tags: bash shell cli unix scripting via:peakscale joey-hess debian ubuntu tools command-line commands)

  • Test-Driven Infrastructure with Chef

    Interesting idea.

    The book introduces “Infrastructure as Code,” test-driven development, Chef, and cucumber-chef, and then proceeds to a simple example using Chef to provision a shared Linux server. The recipes for the server are developed test-first, demonstrating both the technique and the workflow.

    (tags: tdd chef server provisioning build deploy linux coding ops sysadmin)

  • Peek and poke in the age of Linux

    Neat demo of using ptrace to inject into a running process, just like the good old days ;)

    Some time ago I ran into a production issue where the init process (upstart) stopped behaving properly. Specifically, instead of spawning new processes, it deadlocked in a transitional state. […] What’s worse, upstart doesn’t allow forcing a state transition and trying to manually create and send DBus events didn’t help either. That meant the sane options we were left with were: restart the host (not desirable at all in that scenario); start the process manually and hope auto-respawn will not be needed. Of course there are also some insane options. Why not cheat like in the old times and just PEEK and POKE the process in the right places? The solution used at the time involved a very ugly script driving gdb which probably summoned satan in some edge cases. But edge cases were not hit and majority of hosts recovered without issues.

    (tags: debugging memory linux upstart peek poke ptrace gdb processes hacks)

Links for 2013-03-05

  • The World Wide Web is Moving to AOL! | Brian Bailey

    brilliant parody of those “we’re so happy to be shutting down!” posts.

    Don’t worry, all of that hard work won’t be wasted. The World Wide Web will remain accessible for 30 days, which will give you plenty of time to update your readers and customers. Each of you will also receive a 30-day free trial for AOL. Look for your CD in the mail soon. Even better, we’ve created an import tool to make it easy to migrate everything you’ve put on the web to America Online! The address will change, of course, but now it will be available to every AOL member. You may find that you don’t need to bother, though. America Online already has groups and pages about almost every topic you can imagine. Take a look around first and you might save yourself a lot of time. There are only so many different ways to say that Citizen Kane was a good movie! We understand that not all of you will become AOL subscribers and not all web sites will move to the new platform. Just to be safe, be sure to print out all of your favorite pages before the end of the month.

    (tags: acquihired acquisitions aol www funny parody humour web)