Justin's Linklog Posts

Links for 2013-04-26

  • “Clickwrap” licensing established as legal in Irish court

    “The evidence does establish that there is a practice in the airline and online travel agency sectors of contractually binding web users by click wrapping or browse wrapping, which practice is generally and regularly followed by the operators in those sectors. In reality, it is difficult to see how online trade could be carried on in the absence of those devices. As regards the third question which arises from the MSG decision, in this case it is whether the defendant was aware or is presumed to have been aware of the practice. The evidence before the Court, in my view, clearly demonstrates that the defendant was aware of the practice, it being a practice which is generally and regularly followed when making bookings with online travel agents and with airlines and which, in the words of the Court in the MSG case, may be regarded as being a consolidated practice. Accordingly, in my view, by application of Article 23(1)(c), the defendant is bound by the jurisdiction clause in the Terms of Use on the plaintiff’s website by its use, either through the medium of an automaton or a manual operator or a third party data provider, of the website.”
    (via Rossa McMahon)

    (tags: clickwrap licensing ireland)

Links for 2013-04-25

  • Functional Reactive Programming in the Netflix API with RxJava

    Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.

    Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observable’s. The Observable data type can be thought of as a “push” equivalent to Iterable which is “pull”. With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
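
    The push/pull distinction in that quote can be sketched in a few lines. This is a minimal illustration only: the `Observable` class below is a hypothetical stand-in written for this example and is not the RxJava API.

```python
from typing import Callable, Iterable


class Observable:
    """Tiny push-based stream (hypothetical, for illustration; not RxJava)."""

    def __init__(self, subscribe_fn: Callable[[Callable[[object], None]], None]):
        self._subscribe_fn = subscribe_fn

    @staticmethod
    def from_iterable(values: Iterable) -> "Observable":
        # The producer walks the values and pushes each one to the callback.
        def subscribe_fn(on_next):
            for value in values:
                on_next(value)
        return Observable(subscribe_fn)

    def map(self, fn) -> "Observable":
        # Transform each pushed value before handing it downstream.
        return Observable(
            lambda on_next: self._subscribe_fn(lambda v: on_next(fn(v))))

    def filter(self, predicate) -> "Observable":
        # Only forward values for which the predicate holds.
        return Observable(
            lambda on_next: self._subscribe_fn(
                lambda v: on_next(v) if predicate(v) else None))

    def subscribe(self, on_next) -> None:
        self._subscribe_fn(on_next)


def pull_even_squares(values):
    # Pull model: the consumer drives the iteration and waits for each value.
    return [v * v for v in values if v % 2 == 0]


def push_even_squares(values):
    # Push model: the consumer registers a callback; the producer calls it.
    received = []
    (Observable.from_iterable(values)
        .filter(lambda v: v % 2 == 0)
        .map(lambda v: v * v)
        .subscribe(received.append))
    return received
```

    Both functions produce the same values here because the source is synchronous; the point of the push form is that the same operator chain would still work if the producer delivered values later, from another thread or an I/O callback.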

    (tags: concurrency java jvm threads thread-safety coding rx frp fp functional-programming reactive functional async observable)

  • You probably shouldn’t use a spreadsheet for important work

    Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:

    There are several critical problems with a tool like Excel that need to be widely known: * Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically; * Spreadsheets make code reviews impractical. To visually inspect the code, you need to click on each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake; * Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?
    Agreed on all three, particularly on the impossibility of testing. IMO, everyone in a job where automation via spreadsheet is likely needs training in SDE fundamentals: unit testing, the importance of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.

    (tags: spreadsheets excel coding errors bugs testability unit-testing testing quality sde sde-fundamentals dry)

  • Log4j2 Asynchronous Loggers for Low-Latency Logging – Apache Log4j 2

    implemented using the LMAX Disruptor library — very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though

    (tags: disruptor coding java log4j logging async performance)

Links for 2013-04-24

  • Archiving Gmail to Evernote

    Google Drive and Gmail have a built-in scripting engine. I had no idea.

    (tags: gmail evernote archival scripting coding hacks google-drive)

  • The Why

    How the Irish media are partly to blame for the catastrophic property bubble, from a paper entitled _The Role Of The Media In Propping Up Ireland’s Housing Bubble_, by Dr Julien Mercille, in the _Social Europe Journal_:

    “The overall argument is that the Irish media are part and parcel of the political and corporate establishment, and as such the news they convey tend to reflect those sectors’ interests and views. In particular, the Celtic Tiger years involved the financialisation of the economy and a large property bubble, all of it wrapped in an implicit neoliberal ideology. The media, embedded within this particular political economy and itself a constitutive element of it, thus mostly presented stories sustaining it. In particular, news organisations acquired direct stakes in an inflated real estate market by purchasing property websites and receiving vital advertising revenue from the real estate sector. Moreover, a number of their board members were current or former high officials in the finance industry and government, including banks deeply involved in the bubble’s expansion.”

    (tags: economics irish-times ireland newspapers media elite insiders bubble property-bubble property celtic-tiger papers news bias)

  • transparent DNS proxies

    Ugh. low-end ISPs MITM’ing DNS queries:

    Some ISP’s are now using a technology called ‘Transparent DNS proxy’. Using this technology, they will intercept all DNS lookup requests (TCP/UDP port 53) and transparently proxy the results. This effectively forces you to use their DNS service for all DNS lookups. If you have changed your DNS settings to an open DNS service such as Google, Comodo or OpenDNS expecting that your DNS traffic is no longer being sent to your ISP’s DNS server, you may be surprised to find out that they are using transparent DNS proxying.
    (via Nelson)

    (tags: via:nelson dns isps proxying mitm phorm attacks)

  • BitTorrent’s Secure Dropbox Alternative Goes Public

    As kragen says, ‘a decentralized way to sync a folder of large files, using BitTorrent instead of an untrustworthy central server’. Windows, OSX, and Linux supported

    (tags: bittorrent dropbox cloud storage filesharing sharing sync synchronization)

Links for 2013-04-23

Links for 2013-04-21

  • Swansea measles outbreak: was an MMR scare in the local press to blame?

    Sixteen years ago, journalists had a much easier job assembling “balanced” stories about MMR in south Wales. When I wrote about the measles outbreak last week, I suggested that it was related to Andrew Wakefield’s discredited 1998 Lancet research, but the Swansea contagion seems more likely to be the result of a separate scare a year earlier in the South Wales Evening Post. Before 1997, uptake of MMR in the distribution area of the Post was 91%, and 87.2% in the rest of Wales. After the Post’s campaign, uptake in the distribution area fell to 77.4% (it was 86.8% in the rest of Wales). That’s almost a 14% drop where the Post had influence, compared with less than 3% elsewhere. In the dry wording of the BMJ, “the [South West Evening Post] campaign is the most likely explanation”. In other words, what we can see in Swansea is the local effect of local reporting – in all probability, just a taster of what happens when the news irresponsibly creates unfounded terror. […] The 1997 coverage focused on a group of families who blamed MMR for various ailments in their children, including learning difficulties, digestive problems and autism – none of which have been found to have any connection with the vaccine. The Post’s coverage was at the time deemed a success, and in 1998 it won a prize for investigative reporting in the BT Wales Press Awards. That year, the SWEP ran at least 39 stories related to the alleged dangers of MMR. And yes, it’s true that the paper never directly endorsed non-vaccination. What it did do was publicise the idea of “vaccine damage” as a risk, one that parents would then likely weigh up against the risk of contracting measles, mumps or rubella. And this went beyond the reporting of parental anxieties – it was part of the Post’s editorial line. One article is entitled “Young bodies cannot take it”. The all-important “journalistic balance” was constantly available, thanks to campaigning parents and their solicitor Richard Barr. (It was Barr who engaged Wakefield for a lawsuit, leading to the “fishing expedition” research that became the Lancet paper.) They were happy to provide a quote on the dangers of the “triple jab”, which health authorities were then obliged to rebut politely. The Post also seemed to downplay the risk of measles, reporting on 6 July 1998 that “not a single child has been hit by the illness – despite a 13% drop in take-up levels”. It’s not parents who should feel embarrassed by the Swansea measles outbreak: some may have acted from overt dread at the prospect of harming their child, and some simply from omission, but all were encouraged by a press that focused on non-existent risks and downplayed the genuine horror of the diseases MMR prevents. The shame belongs to journalists: those of the South West Evening Post who allowed themselves to be recruited in the service of a speculative lawsuit, and any who let a specious devotion to “balance” overrule a duty to tell the truth.

    (tags: south-wales wales mmr health vaccination scares journalism ethics disease measles south-wales-evening-post)

  • Under the Covers of DynamoDB

    mostly a DynamoDB puff-piece from last week’s Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)

    (tags: dynamodb aws figures costs architecture ec2 dedupe cloud-connect slides)

Links for 2013-04-20

  • Excel, untestability, and the reliability of quants

    Wow, this is a great software-quality story — I knew Excel was the most widely used programming environment out there, but this is a factor I’d overlooked:

    In his remarks on the final panel, Frank Partnoy mentioned something I missed when it came out a few weeks ago: the role of Microsoft Excel in the “London Whale” trading debacle. […] To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz […] to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly, “After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR …” I write periodically about the perils of bad software in the business world in general and the financial industry in particular, by which I usually mean back-end enterprise software that is poorly designed, insufficiently tested, and dangerously error-prone. But this is something different. […] While Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets — badly. Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way. This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. At the end of the day, it’s all software. While all software breaks occasionally, Excel spreadsheets break all the time. But they don’t tell you when they break: they just give you the wrong number.
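
    The "factor of two" in that quote follows directly from the arithmetic: the sum of two rates is exactly twice their average. A minimal sketch, with function names and sample rates that are made up for illustration and are not taken from the JPMorgan model:

```python
def relative_change_intended(old_rate, new_rate):
    # Intended formula: difference divided by the AVERAGE of the two rates.
    return (new_rate - old_rate) / ((old_rate + new_rate) / 2)


def relative_change_buggy(old_rate, new_rate):
    # Formula as described in the report: difference divided by the SUM.
    return (new_rate - old_rate) / (old_rate + new_rate)


# Hypothetical rates, chosen only to make the numbers easy to follow.
old_rate, new_rate = 1.0, 1.5
intended = relative_change_intended(old_rate, new_rate)
buggy = relative_change_buggy(old_rate, new_rate)
print(f"intended={intended:.3f} buggy={buggy:.3f} ratio={intended / buggy:.1f}")
```

    A unit test comparing the two functions on any pair of rates would have flagged the halving immediately, which is the kind of check a spreadsheet cell cannot easily carry.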

    (tags: excel reliability software coding ides jpmorgan value-at-risk finance london-whale quants spreadsheets unit-tests testability testing)

  • Riak, CAP, and eventual consistency

    Good (albeit draft) write-up of the implications of CAP, allow_mult, and last_write_wins conflict-resolution policies in Riak:

    As Brewer’s CAP theorem established, distributed systems have to make hard choices. Network partition is inevitable. Hardware failure is inevitable. When a partition occurs, a well-behaved system must choose its behavior from a spectrum of options ranging from “stop accepting any writes until the outage is resolved” (thus maintaining absolute consistency) to “allow any writes and worry about consistency later” (to maximize availability). Riak leans toward the availability end of the spectrum, but allows the operator and even the developer to tune read and write requests to better meet the business needs for any given set of data.
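
    To make the two conflict-resolution policies concrete, here is a rough sketch of what each does with two writes accepted on opposite sides of a partition. It is not the Riak client API: the `Write` type, the timestamps, and the shopping-cart values are assumptions invented for this example, and real Riak tracks causality with vector clocks rather than a bare timestamp.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class Write:
    value: frozenset   # e.g. the contents of a shopping cart
    timestamp: float   # wall-clock time of the write (hypothetical)


def resolve_last_write_wins(writes):
    # last_write_wins-style policy: keep the newest write, silently drop the rest.
    return max(writes, key=lambda w: w.timestamp).value


def resolve_siblings(writes, merge):
    # allow_mult-style policy: every conflicting value is kept as a sibling
    # and the application decides how to merge them.
    siblings = [w.value for w in writes]
    return reduce(merge, siblings)


# Two clients update the same cart on different sides of a partition.
side_a = Write(frozenset({"book"}), timestamp=100.0)
side_b = Write(frozenset({"pen"}), timestamp=101.0)

print(resolve_last_write_wins([side_a, side_b]))            # the "book" update is lost
print(resolve_siblings([side_a, side_b], frozenset.union))  # both items survive
```

    The trade-off is the one the write-up describes: last-write-wins is simpler for the developer but can discard data, while siblings push the merge decision into application code.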

    (tags: riak cap eventual-consistency distcomp distributed-systems partition last-write-wins voldemort allow_mult)

  • How You Can Help Save Upcoming.org, Posterous, and More

    Yahoo! sucks. shutting down in days? ArchiveTeam Warrior to the rescue; install the VM!

    (tags: archival yahoo shutdowns upcoming waxy archives virtualbox)

  • The Excel Depression – NYTimes.com

    Krugman on the Reinhart-Rogoff Excel-bug fiasco.

    What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.

    (tags: paul-krugman economics excel coding bugs software austerity debt)

Links for 2013-04-19

  • Vaccination ‘herd immunity’ demonstration

    ‘Stochastic monte-carlo epidemic SIR model to reveal herd immunity’. Fantastic demo of this important medical concept (via Colin Whittaker)
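
    For a feel of what such a simulation does, here is a small sketch of a stochastic SIR model run Monte Carlo style. It is not the code behind the linked demo: it uses a simple chain-binomial (Reed-Frost) formulation, and the population size, R0 value, and vaccination fractions are assumptions picked for illustration.

```python
import random


def simulate_outbreak(population, vaccinated_fraction, r0, rng):
    """Run one outbreak; return how many people were ever infected."""
    vaccinated = min(int(round(population * vaccinated_fraction)), population - 1)
    infected = 1                                   # a single index case
    susceptible = population - vaccinated - infected
    recovered = 0
    p_contact = r0 / population                    # per-pair infection probability
    while infected > 0:
        # Chance that a given susceptible is infected by at least one case.
        p_infection = 1 - (1 - p_contact) ** infected
        new_cases = sum(rng.random() < p_infection for _ in range(susceptible))
        susceptible -= new_cases
        recovered += infected                      # cases recover after one generation
        infected = new_cases
    return recovered


def mean_outbreak_size(population, vaccinated_fraction, r0, runs, seed):
    # Monte Carlo: average the outcome over many independent random runs.
    rng = random.Random(seed)
    total = sum(simulate_outbreak(population, vaccinated_fraction, r0, rng)
                for _ in range(runs))
    return total / runs


if __name__ == "__main__":
    for fraction in (0.0, 0.5, 0.8, 0.95):
        size = mean_outbreak_size(200, fraction, r0=4.0, runs=100, seed=1)
        print(f"vaccinated={fraction:.0%} mean outbreak size={size:.1f}")
```

    Once enough of the population is vaccinated, each case infects fewer than one other person on average and outbreaks fizzle out, which also protects the people who are not vaccinated.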

    (tags: via:colinwh stochastic herd-immunity random sir epidemics health immunity vaccination measles medicine monte-carlo-simulations simulations)

  • Fred’s ImageMagick Scripts: SIMILAR

    compute an image-similarity metric, to discover mostly-identical-but-slightly-tweaked images:

    SIMILAR computes the normalized cross correlation similarity metric between two equal dimensioned images. The normalized cross correlation metric measures how similar two images are, not how different they are. The range of ncc metric values is between 0 (dissimilar) and 1 (similar). If mode=g, then the two images will be converted to grayscale. If mode=rgb, then the two images first will be converted to colorspace=rgb. Next, the ncc similarity metric will be computed for each channel. Finally, they will be combined into an rms value.
    (via Dan O’Neill)
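
    As a rough idea of the metric, here is a plain-Python sketch of normalized cross correlation on flat lists of pixel values, plus the per-channel RMS combination the description mentions. It is not Fred's script (which drives ImageMagick); the function names and the dict-of-channels image representation are assumptions. Note that this mean-subtracted form can return negative values for anti-correlated images, whereas the script documents a 0 to 1 range.

```python
import math


def ncc(pixels_a, pixels_b):
    """Normalized cross correlation of two equal-length pixel sequences."""
    if len(pixels_a) != len(pixels_b) or not pixels_a:
        raise ValueError("images must be non-empty and the same size")
    mean_a = sum(pixels_a) / len(pixels_a)
    mean_b = sum(pixels_b) / len(pixels_b)
    numerator = sum((a - mean_a) * (b - mean_b)
                    for a, b in zip(pixels_a, pixels_b))
    denominator = math.sqrt(sum((a - mean_a) ** 2 for a in pixels_a)
                            * sum((b - mean_b) ** 2 for b in pixels_b))
    if denominator == 0:
        return 0.0  # a completely flat image has no defined correlation
    return numerator / denominator


def rgb_similarity(image_a, image_b):
    """Combine per-channel NCC scores into a single RMS value."""
    scores = [ncc(image_a[channel], image_b[channel]) for channel in ("r", "g", "b")]
    return math.sqrt(sum(score * score for score in scores) / len(scores))
```

    Because the metric normalizes out mean and contrast, a uniformly brightened copy of an image still scores close to 1, which is what makes it useful for finding slightly tweaked duplicates.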

    (tags: image photos pictures similar imagemagick via:dano metrics similarity)

  • A Slower Speed of Light

    a first-person game prototype in which players navigate a 3D space while picking up orbs that reduce the speed of light in increments. Custom-built, open-source relativistic graphics code allows the speed of light in the game to approach the player’s own maximum walking speed. Visual effects of special relativity gradually become apparent to the player, increasing the challenge of gameplay. These effects, rendered in realtime to vertex accuracy, include the Doppler effect (red- and blue-shifting of visible light, and the shifting of infrared and ultraviolet light into the visible spectrum); the searchlight effect (increased brightness in the direction of travel); time dilation (differences in the perceived passage of time from the player and the outside world); Lorentz transformation (warping of space at near-light speeds); and the runtime effect (the ability to see objects as they were in the past, due to the travel time of light). Players can choose to share their mastery and experience of the game through Twitter. A Slower Speed of Light combines accessible gameplay and a fantasy setting with theoretical and computational physics research to deliver an engaging and pedagogically rich experience.
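
    Two of the listed effects reduce to short textbook formulas, sketched here. This is standard special relativity and has nothing to do with the game's actual rendering code; the walking speed and reduced light speed in the example are made-up numbers.

```python
import math


def lorentz_factor(beta):
    """Time-dilation factor gamma for a speed given as a fraction of c."""
    if not 0 <= beta < 1:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return 1 / math.sqrt(1 - beta * beta)


def doppler_factor(beta):
    """Observed/emitted frequency ratio when moving straight towards a source."""
    if not 0 <= beta < 1:
        raise ValueError("beta must satisfy 0 <= beta < 1")
    return math.sqrt((1 + beta) / (1 - beta))


# Hypothetical numbers: the player walks at 1.2 m/s and the orbs have
# slowed light down to 2 m/s, so beta = 0.6.
beta = 1.2 / 2.0
print(f"gamma={lorentz_factor(beta):.2f} blue-shift factor={doppler_factor(beta):.2f}")
```

    At everyday light speed, beta for a walking player is around 4e-9 and both factors are indistinguishable from 1, which is why the game has to slow light down before anything becomes visible.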

    (tags: games physics mit science light relativity)

  • Eventual Consistency Today: Limitations, Extensions, and Beyond – ACM Queue

    Good overview of the current state of eventually-consistent data store research, covering CALM and CRDTs, from Peter Bailis and Ali Ghodsi

    (tags: eventual-consistency data storage horizontal-scaling research distcomp distributed-systems via:martin-thompson crdts calm acid cap)

  • Latency’s Worst Nightmare: Performance Tuning Tips and Tricks [slides]

    the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides

    (tags: benchmarks aws ec2 ebs piops services scaling scalability presentations)

  • Rob “b3ta” Manuel in Dublin next week

    The Bottom Half Of The Internet — “Racism; typos; filth; spam; ignorance; rage – that’s all the bottom half of the internet is good for, right? Rob Manuel wants you to question the internet dictum, most beloved of high-profile columnists, that you should ignore all of the comments all of the time. The ‘war on comments’, he reckons, might just be an echo of a fourth estate that’s having trouble adjusting to the idea of an unwashed public disagreeing with their sacred opinions. Sous les pavés, la plage.” On Tuesday, le cool Dublin & Pilcrow present SPIEL. Rob Manuel is the flashy animator behind B3ta and he’s joined by Ed Melvin, who wants to educate you on ‘The Unreal Engines’ of virtual currencies and economies.

    (tags: rob-manuel b3ta dublin comments internet meetings talks lecool)

Links for 2013-04-18

  • Reality, Reactivity, Relevance and Repeatability in Java Application Profiling

    this product from JInspired appears to support runtime profiling of Java apps with < 5% performance impact

    (tags: profiling performance java coding measurement)

  • You Lookin’ At Me? Reflections on Google Glass

    ex-Nokia product design guru Jan Chipchase on Google Glass

    (tags: google privacy technology google-glass pervasive-computing life future)

  • Not the ‘best in the world’ – The Medical Independent

    Debunking this prolife talking point:

    ‘Our maternity services are amongst the best in the world’. This phrase has been much hackneyed since the heartbreaking death of Savita Halappanavar was revealed in mid October. James Reilly and other senior politicians are particularly guilty of citing this inaccurate position. So what is the state of Irish maternity services and how do our figures compare with other comparable countries? Let’s start with the statistics.
    The bottom line:
    Eight deaths per 100,000 is not bad, but it ranks our maternity services far from the best in world and below countries such as Slovakia and Poland.

    (tags: pro-choice ireland savita medicine health maternity morbidity statistics)

  • How Kaggle Is Changing How We Work – Thomas Goetz – The Atlantic

    Founded in 2010, Kaggle is an online platform for data-mining and predictive-modeling competitions. A company arranges with Kaggle to post a dump of data with a proposed problem, and the site’s community of computer scientists and mathematicians — known these days as data scientists — take on the task, posting proposed solutions. […] On one level, of course, Kaggle is just another spin on crowdsourcing, tapping the global brain to solve a big problem. That stuff has been around for a decade or more, at least back to Wikipedia (or farther back, Linux, etc). And companies like TaskRabbit and oDesk have thrown jobs to the crowd for several years. But I think Kaggle, and other online labor markets, represent more than that, and I’ll offer two arguments. First, Kaggle doesn’t incorporate work from all levels of proficiency, professionals to amateurs. Participants are experts, and they aren’t working for benevolent reasons alone: they want to win, and they want to get better to improve their chances of winning next time. Second, Kaggle doesn’t just create the incidental work product, it creates a new marketplace for work, a deeper disruption in a professional field. Unlike traditional temp labor, these aren’t bottom of the totem pole jobs. Kagglers are on top. And that disruption is what will kill Joy’s Law. Because here’s the thing: the Kaggle ranking has become an essential metric in the world of data science. Employers like American Express and the New York Times have begun listing a Kaggle rank as an essential qualification in their help wanted ads for data scientists. It’s not just a merit badge for the coders; it’s a more significant, more valuable, indicator of capability than our traditional benchmarks for proficiency or expertise. In other words, your Ivy League diploma and IBM resume don’t matter so much as my Kaggle score. It’s flipping the resume, where your work is measurable and metricized and your value in the marketplace is more valuable than the place you work.

    (tags: academia datamining economics data kaggle data-science ranking work competition crowdsourcing contracting)

  • The useful JVM options

    a good reference, with lots of sample output. Not clear if it takes 1.6/1.7 differences into account, though

    (tags: jvm reference java ops hotspot command-line)

Links for 2013-04-16

  • Austerity policies founded on Excel typo

    You’ve probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That’s all thanks to a study conducted by Carmen Reinhart and Kenneth Rogoff for their book This Time Is Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in “Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff” and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error. Read Mike Konczal for the whole rundown, but I’ll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly “the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent.”
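
    The mechanics of the range error are easy to reproduce outside a spreadsheet. In the sketch below, only the row numbers 30, 44, and 49 come from the quote; the growth figures are invented placeholders and are not the Reinhart-Rogoff data.

```python
def average_rows(column, first_row, last_row):
    """Mimic AVERAGE(L<first_row>:L<last_row>) over a dict of row -> value."""
    values = [column[row] for row in range(first_row, last_row + 1)]
    return sum(values) / len(values)


# Hypothetical column L: rows 30-44 hold 1.0, rows 45-49 hold 3.0.
column_l = {row: 1.0 for row in range(30, 45)}
column_l.update({row: 3.0 for row in range(45, 50)})

truncated = average_rows(column_l, 30, 44)   # the range as typed
full = average_rows(column_l, 30, 49)        # the range that was intended
omitted = sorted(set(range(30, 50)) - set(range(30, 45)))

print(f"truncated={truncated} full={full} omitted rows={omitted}")
```

    With the range in code, a one-line assertion that the number of averaged rows equals the number of countries would catch the omission; in the spreadsheet the wrong cell simply shows a plausible number.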

    (tags: austerity politics excel coding errors bugs spreadsheets economics economy)

  • Is Your MySQL Buffer Pool Warm? Make It Sweat!

    How Groupon are warming up a MySQL failover spare, using Percona tools and a “tee” of the live in-flight queries. (via Dave Doran)

    (tags: via:dave-doran mysql databases warm-spares spares failover groupon percona replication)

Links for 2013-04-15

  • So now you know who gets some of those excessive Ticketmaster fees….

    Interesting evidence; it appears Irish music promoters are getting “rebates” from the massive TicketMaster “booking fee”, on each ticket sold. This sounds like a cartel to me, and we need to regulate this. Where is the National Consumer Agency and Competition Authority?

    The matter is something which should be of concern to every gig-going music fan, regardless of whether they go to Stradbally or not. For years, many have asked about TicketMaster’s quasi-monopoly position in the marketplace and why this is so. We’ve always been told that promoters preferred to deal with one company rather than several and that TM’s systems and nationwide reach yadda yadda yadda was the bees’ knees etc. Other companies have tried to compete but no-one has been able to beat TM at this game. But why would promoters go elsewhere when they’re getting a slice of the TM fees back as rebates? Those past off-the-record attempts by and briefings from promoters blaming TM for those fees can now be seen as hypocritical. They’re sticking with TM because they’re receiving a take of the fees paid by punters who have no other choice in service provider if they want to get their hands on tickets. You wonder what the acts make of this cash-grab – perhaps some whip-smart agent is already making a claim for a percentage of the rebates because there would be no rebates in the first place without the act. Surely this is an issue for the Competition Authority and National Consumers Association too, given the manner in which the rebates are made and TM’s deals with the promoters? While promoters under TM deals are free to sell a certain proportion of their tickets with another provider, it’s usually only a very small percentage of the total and unlikely to trouble TM’s bottom line. Also, given that the rebates are volume-driven, it’s better for the promoters to keep the largest possible chunk of their business with TM. It seems that we have a new suspect in the blame game about why ticket prices are so high.

    (tags: regulation ireland cartels competition ticketing tickets ticketmaster music gigs consumer)

  • Blog shines spotlight on Dublin city’s illegal dumping problem

    Hooray, Eoin’s activism gets some coverage!

    THE SCALE OF Dublin’s dumping problem is laid bare in a blog that has seen contributors send in photos of chairs, fridges and heaps of rubbish strewn on city streets. Eoin Parker, one of the organisers behind DublinLitterBlog.com, spoke to TheJournal.ie about the problem, saying that the blog was set up following the privatisation of waste management by Dublin City Council in 2012.

    (tags: dumping dublin litter rubbish blogs dcc d1 activism community)

  • Ked

    To our knowledge, Ked is the first scripting language to emerge from The People’s Republic of Cork. Below is an account of what we know so far about the mysterious Corkonian language. Any suggested updates or contributions are encouraged.
    Genius.

    (tags: coding cork jokes funny like languages programming)

  • Just how bad are RTE’s finances?

    A sobering examination by NAMAwinelake into the quagmire of Ireland’s publicly-funded national broadcaster:

    It seems that RTE has become a disaster zone, with libels and incompetence overseen by incapable management, and this is reflected in that organisation’s financial results. RTE still employs nearly 2,000 people and supports jobs and industry across independent producers and suppliers; it is a major business. But the time has come to call a halt to delusional management that is sinking the organization deeper into a quagmire which will ultimately need to be bailed out by the State. And Noel Curran is fobbing us off with flying a kite about a reduction in 65-year old Pat Kenny’s salary from €630,000 to €570,000?!

    (tags: rte namawinelake public funding finances money mismanagement ireland incompetence tv news)

  • High Scalability – Scaling Pinterest – From 0 to 10s of Billions of Page Views a Month in Two Years

    wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:

    a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.
    yeah, so, eek ;)

    (tags: clustering sharding architecture aws scalability scaling pinterest via:matt-sergeant redis mysql memcached)

Links for 2013-04-13

  • Expert in Savita inquiry confirms Irish women get lower standard of care with chorioamnionitis

    Dr. Jen Gunter again:

    Dr. Knowles’ testimony confirms for me that the law played a role, because her statements indicate the standard of care for treatment of chorioamnionitis is less aggressive in Ireland. This can only be because of the law as there is no medical evidence to support delaying delivery when chorioamnionitis is diagnosed. Standard of care is not to wait until a woman is sick enough to need a termination, the idea is to treat her, you know, before she gets sick enough. An elevated white count and ruptured membranes at 17 weeks is typically enough to make the diagnosis, so Dr. Knowles needs to testify as to what in Savita’s medical record made it safe to not recommend a delivery. By the way, I also disagree with Dr. Knowles about her interpretation of Savita’s medical record, the chart doesn’t have “subtle indicators” of infection, it screams chorioamnionitis long before Wednesday morning. In North America the standard of care with chorioamnionitis is to recommend delivery as soon as the diagnosis is made, not wait until women enter the antechamber of death in the hopes that we can somehow snatch them back from the brink. If Irish law, or the interpretation thereof, had nothing to do with Savita’s death no expert would be mentioning sick enough at all.

    (tags: jen-gunter ob-gyn medicine savita law ireland abortion tragedy galway hospital)

Links for 2013-04-12

Links for 2013-04-11

  • google-http-java-client

    Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as java.net.HttpURLConnection, Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android’s GSON libraries for JSON.
    Not quite as simple an API as Python’s requests, sadly, but still an improvement on the verbose Apache HttpComponents API. Good support for unit testing via a built-in mock-response class. Still in beta.

    (tags: google beta software http libraries json xml transports protocols)

  • Former IMF chief of mission to Ireland says not burning the bondholders was “a mistake”

    Former IMF chief of mission to Ireland, Ashoka Mody, above left with Ajai Chopra in 2010. Melancholy of eye and large of loafer, Ashoka was involved in negotiating Ireland’s EU/IMF bailout. […] This morning Ashoka gave an interview to Gavin Jennings on Morning Ireland, in which he admitted Ireland’s bailout was riddled with mistakes, namely the non-burning of the senior bondholders and the program of austerity. Jennings: “So, if imposing austerity on Ireland was wrong, or a mistake; if not allowing any burning of bondholders, whether official, sovereign or private was a mistake; you were centrally involved in that program. I know Ajai Chopra was very much the public face of the IMF mission to Ireland. But you were centrally involved in constructing this bailout. How much responsibility do you take for those errors?” Mody: “Yes, so, obviously, I have to take the responsibility in…but I’m in very good company in taking responsibility in this. There were many parties involved. And my role really was to bring such matters to the attention of people who finally made these decisions.”
    Great.

    (tags: bondholders imf ireland economy default ajai-chopra ashoka-mody)

  • Savita Halappanavar’s inquest: the three questions that must be answered | Dr. Jen Gunter

    A professional OB/GYN analyses the horrors coming to light in the Savita inquest. Here’s one particular gem:

    Fetal survival with ruptured membranes at 17 weeks is 0%, this is from prospective study. […but] “real and substantial risk” to the woman’s life is what is required by the Irish constitution to terminate a pregnancy, *whether or not the foetus is viable*.
    So the foetus had 0% chance of survival — but still termination was not considered an option. Bloody hell.

    (tags: religion ireland savita horrors malpractice galway guh hospitals hse health inquest abortion pro-choice pregnancy)

Links for 2013-04-10

  • Minister Rabbitte welcomes EU agreement on re-use of Public Sector Information

    Lots of talk about “charging regimes”, “income-generating public sector bodies” etc., but not a single mention of open data or free access. Terrible stuff. :( (via conoro)

    (tags: via:conoro open-access government public-sector ireland eu open-data public free)

  • Compression in Kafka: GZIP or Snappy ?

    With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, decompressed and compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP. No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The higher compression savings in this test are due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
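    The “150% higher” and “228% higher” figures follow directly from the quoted throughput numbers. A quick sketch of the arithmetic, using only the MB/s values from the excerpt (the function name is my own):

```python
def percent_higher(new_mb_s: float, baseline_mb_s: float) -> float:
    """Return how much higher new_mb_s is than baseline_mb_s, as a percentage."""
    return (new_mb_s / baseline_mb_s - 1.0) * 100.0


# Throughput figures quoted above: Snappy vs. GZIP producer, in MB/s.
with_ack = percent_higher(22.3, 8.9)
no_ack = percent_higher(60.8, 18.5)

print(f"with ack: Snappy is {with_ack:.0f}% higher than GZIP")
print(f"no ack:   Snappy is {no_ack:.0f}% higher than GZIP")
```

    Note that “150% higher” means roughly 2.5x the GZIP throughput, not 1.5x.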

    (tags: gzip snappy compression kafka streaming ops)

  • The Bw-Tree: A B-tree for New Hardware – Microsoft Research

    The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.

    (tags: bw-trees database paper toread research algorithms microsoft sql sql-server b-trees data-structures storage cache-friendly mechanical-sympathy)

  • Boundary Techtalk – Large-scale OLAP with Kobayashi

    Boundary on their TSD-on-Riak store.

    Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false-starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database.  The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.

    (tags: video boundary tsd riak eventual-consistency storage kobayashi olap time-series)

  • Adding Insult to Plagiary?

    A few days old, but already an instant Streisand-Effect classic:

    Sometimes people borrow [Colin Purrington’s free guide about making scientific posters] without giving him credit. This happens fairly regularly, and when he finds out about it, he sends an e-mail asking them to take it down. Usually they do. But when he sent an e-mail to the Consortium for Plant Biotechnology Research, asking that a roughly 1,200-word, near-verbatim, uncredited chunk from his guide be removed from the consortium’s materials, the response was unexpected. Rather than apologise, a lawyer sent him a cease-and-desist letter accusing him of plagiarizing the consortium’s materials and demanding that he take down his guide or face a lawsuit seeking damages up to $150,000.

    (tags: streisand-effect lawsuits law infringement copyright cpbr bullying science posters)

  • Kafka 0.8 Producer Performance

    Great benchmarking from Piotr Kozikowski of the LiveRamp team on the performance of the upcoming Kafka 0.8 release.

    (tags: performance kafka apache benchmarks ops queueing)

  • Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node

    an excellent writeup on Kafka 0.8’s use and operation, including details of the new replication features

    (tags: kafka replication queueing distributed ops)

  • Ah Here (To Coin A Phrase)

    ‘A €10 silver coin being offered for sale to the public in honour of James Joyce by the Central Bank tomorrow contains a misquote from the author. The line used on the coin from Chapter 3 of Ulysses includes a superfluous conjunction – a rogue ‘that’.’ [..] The coin reads:

    “Ineluctable modality of the visible: at least that if no more, thought through my eyes. Signatures of all things *that* I am here to read.”
    (Incorrect ‘that’ emphasised)

    (tags: for:robotwisdom james-joyce typos funny fail central-bank ireland coins minting errors ulysses)

Links for 2013-04-09

  • Netflix ISP Speed Index for Ireland

    Via Mulley. Magnet doing well, with UPC coming second; UPC have dropped a fair bit in the past month. Would love to see it broken down by region…

    (tags: upc ireland isps speed bandwidth netflix broadband magnet eircom)

  • Why I’m Walking Away From CouchDB

    In practice there are two gotchas that are so painful I am looking for a replacement with a different featureset than couchdb provides. The location tracking project icecondor.com uses couchdb to store 20,000 new records per day. It has more write traffic than read traffic and runs on modest hardware. Those two gotchas are: 1. View Index updates. While I have a vague understanding of why view index updates are slow and bulky and important, in practice it is unworkable. Every write sets up a trap for the first reader to come along after the write. The more writes there are, the bigger the trap for the first reader which has to wait on the couchdb process that refreshes the view index on an as-needed basis. I believe this trade-off was made to keep writes fast. No need to update the view index until all writes are actually complete, right? Write traffic is heavier than read traffic and the time needed for that index refresh causes the webapp to crash because it’s not set up to handle timeouts from a database query. The workaround is as hackish as one can imagine – cron jobs to hit every map/reduce query to keep indexes fresh. 2. Append only database file. Append only is in theory a great way to ensure on-disk reliability. A system crash during an append should only affect that append. It’s a crash during an update to existing parts of the file that risks the integrity of more than what’s being updated. With so many layers of caching and optimizations in the kernel and the filesystem and now in the workings of SSD drives, I’m not sure append-only gives extra protection anymore. What it does do is create a huge operational headache. The on-disk file can never grow beyond half the available storage space. Record deletion uses new disk space and if the half-full mark approaches, vacuuming must be done. The entire database is rewritten to the filesystem, leaving out no longer needed records. If the data file should happen to grow beyond half the partition, the system has essentially crashed because there is no way to compact the file and soon the partition will be full. This is a likely scenario when there is a lot of record deletion activity. The system in question does a lot of writes of temporary data that is followed up by deletes a few days later. There is also a lot of permanent storage that hardly gets used. Rewriting every byte of the records that are long-lived due to compaction is an enormous amount of wasted I/O – doubly so given SSD drives have a short write-cycle lifespan.

    (tags: nosql couchdb consistency checkpointing databases data-stores indexing)

  • CouchDB: not drinking the kool-aid

    Jonathan Ellis on some CouchDB negatives:

    Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project: Writes are serialized.  Not serialized as in the isolation level, serialized as in there can only be one write active at a time.  Want to spread writes across multiple disks?  Sorry. CouchDB uses a MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes.  Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less. CouchDB is simple.  Gloriously simple.  Why is that a negative?  It’s competing with systems (in the popular imagination, if not in its author’s mind) that have been maturing for years.  The reason PostgreSQL et al have those features is because people want them.  And if you don’t, you should at least ask a DBA with a few years of non-MySQL experience what you’ll be missing.  The majority of CouchDB fans don’t appear to really understand what a good relational database gives them, just as a lot of PHP programmers don’t get what the big deal is with namespaces. A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce.  MapReduce is a great approach to trivially parallelizing certain classes of problem.  The problem is, it’s tedious and error-prone to write raw MapReduce code.  This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively).  Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages.  It’s a little verbose, and you might be bored with it, but it’s much better than writing low-level mapreduce code.

    (tags: cassandra couch nosql storage distributed databases consistency)

  • What is the CouchDB replication protocol? Is it like Git? – Stack Overflow

    Good write up of CouchDB replication

    (tags: protocols couchdb sync replication git mvcc databases merging timelines)

  • TouchDB’s reverse-engineered write-up of the Couch replication protocol

    There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.

    (tags: couchdb protocols touchdb nosql replication sync mvcc revisions rest)

  • Protect your designs

    A good writeup of how to detect cases of copyright infringement for photography, art and other visual media.

    Von Glitschka, Modern Dog and myriad others make clear that the support of the creative community is absolutely vital in raising awareness of copyright infringements. Sites like www.youthoughtwewouldntnotice.com name and shame clear breaches of copyright, while the Modern Dog case shows that there is no better IP tracing system than the eyes and ears of the design community itself. “It’s the industry at large that has kept me aware of infringements,” states Von. “Without that I would miss most of them because I don’t go looking – they find me via the eyes of others.”

    (tags: photography art visual-media copyright infringement piracy ripping)

  • FastBit: An Efficient Compressed Bitmap Index Technology

    an [LGPL] open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user’s data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools. The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record. A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps.[…] Another innovation in FastBit is the multi-level bitmap encoding methods.

    (tags: fastbit nosql algorithms indexing search compressed-bitmaps indexes wah bitmaps compression)

  • javaewah

    The bit array data structure is implemented in Java as the BitSet class. Unfortunately, this fails to scale without compression. JavaEWAH is a word-aligned compressed variant of the Java bitset class. It uses a 64-bit run-length encoding (RLE) compression scheme. We trade-off some compression for better processing speed. We also have a 32-bit version which compresses better, but is not as fast. In general, the goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (as implemented in the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.
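    To make the idea of word-aligned run-length encoding concrete, here is a deliberately simplified sketch. It is not JavaEWAH’s actual format or API; the token tuples and function names are made up for illustration. The bitmap is treated as a list of 64-bit words, runs of all-zero or all-one words collapse into a single “fill” token, and any other word is kept verbatim as a “literal”.

```python
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1


def compress(words):
    """Collapse runs of all-zero / all-one words; keep mixed words verbatim."""
    tokens = []
    for word in words:
        if word in (0, ALL_ONES):
            fill_bit = 1 if word == ALL_ONES else 0
            last = tokens[-1] if tokens else None
            if last and last[0] == "fill" and last[1] == fill_bit:
                # Extend the current run instead of storing another word.
                tokens[-1] = ("fill", fill_bit, last[2] + 1)
            else:
                tokens.append(("fill", fill_bit, 1))
        else:
            tokens.append(("literal", word))
    return tokens


def decompress(tokens):
    """Expand tokens back into the original list of 64-bit words."""
    words = []
    for token in tokens:
        if token[0] == "fill":
            words.extend([ALL_ONES if token[1] else 0] * token[2])
        else:
            words.append(token[1])
    return words


# A sparse bitmap: 1000 empty words, one mixed word, then 3 full words.
bitmap = [0] * 1000 + [0b1010] + [ALL_ONES] * 3
tokens = compress(bitmap)
print(len(bitmap), "words compress to", len(tokens), "tokens")
```

    Because everything stays aligned to whole words, operations such as AND/OR can skip over fill runs without decoding individual bits, which is the speed-for-space trade-off the excerpt describes.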

    (tags: javaewah wah rle compression bitmaps bitmap-indexes bitset algorithms data-structures)

Links for 2013-04-08

Links for 2013-04-06

Links for 2013-04-05

Links for 2013-04-04

Links for 2013-04-03

  • The Patent Protection Racket

    Joel On Software weighs in (via Tony Finch):

    The fastest growing industry in the US right now, even during this time of slow economic growth, is probably the patent troll protection racket industry.

    (tags: joel-on-software patents swpats shakedown extortion us-politics patent-trolls via:fanf)

  • Cap’n Proto

    Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap’n Proto is INFINITY TIMES faster than Protocol Buffers.
    Basically, marshalling like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but responds to this by suggesting compression (which is a fair point tbh). C++-only for now. I’m not seeing the same kind of support for optional data that protobufs has though. Overall I’m worried there’s some useful features being omitted here…
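    To illustrate what I mean by “writing an aligned C struct to the wire”, here is a rough sketch using Python’s struct module. This is not Cap’n Proto’s real encoding; the three-field record layout is invented for the example. The point is just that with fixed offsets, a reader can pull out one field without parsing the rest of the message.

```python
import struct

# Hypothetical record layout: uint32 id, uint16 flags, 2 padding bytes
# (so the double starts on an 8-byte boundary), then a float64 score.
RECORD = struct.Struct("<IHxxd")

wire_bytes = RECORD.pack(42, 7, 1.5)

# Fixed layout means fixed offsets: read a single field directly.
(score,) = struct.unpack_from("<d", wire_bytes, 8)
(record_id,) = struct.unpack_from("<I", wire_bytes, 0)

print(RECORD.size, record_id, score)
```

    The padding bytes are the “wasteful on space” part: they carry no information, but they compress very well.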

    (tags: serialization formats protobufs capn-proto protocols coding c++ rpc qnx messaging compression compatibility interoperability i14y)

  • CRDTs – Commutative Replicated Data Types [pdf]

    Shared read-only data is easy to scale by using well-understood replication techniques. However, sharing mutable data at a large scale is a difficult problem, because of the CAP impossibility result [5]. Two approaches dominate in practice. One ensures scalability by giving up consistency guarantees, for instance using the Last-Writer-Wins (LWW) approach [7]. The alternative guarantees consistency by serialising all updates, which does not scale beyond a small cluster [12]. Optimistic replication allows replicas to diverge, eventually resolving conflicts either by LWW-like methods or by serialisation [11]. In some (limited) cases, a radical simplification is possible. If concurrent updates to some datum commute, and all of its replicas execute all updates in causal order, then the replicas converge. We call this a Commutative Replicated Data Type (CRDT). The CRDT approach ensures that there are no conflicts, hence, no need for consensus-based concurrency control. CRDTs are not a universal solution, but, perhaps surprisingly, we were able to design highly useful CRDTs. This new research direction is promising as it ensures consistency in the large scale at a low cost, at least for some applications.
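    The convergence claim is easy to see with a toy example. The sketch below is my own illustration, not code from the paper: it assumes all updates are concurrent (so any delivery order is allowed) and applies them in every possible order. A counter whose updates are increments and decrements always ends in the same state, while a plain register whose updates are assignments does not.

```python
import itertools


class Counter:
    """Updates are deltas; addition commutes, so order does not matter."""

    def __init__(self):
        self.value = 0

    def apply(self, delta):
        self.value += delta


class Register:
    """Updates are assignments; these do not commute."""

    def __init__(self):
        self.value = None

    def apply(self, new_value):
        self.value = new_value


def final_states(make_replica, updates):
    """Apply the same updates in every order; collect the distinct end states."""
    states = set()
    for order in itertools.permutations(updates):
        replica = make_replica()
        for update in order:
            replica.apply(update)
        states.add(replica.value)
    return states


counter_states = final_states(Counter, [+5, -2, +7, -1])
register_states = final_states(Register, [1, 2, 3])
print("counter:", counter_states)
print("register:", register_states)
```

    The counter has a single end state regardless of delivery order; the register has one per possible “last” write, which is why it needs something like LWW or serialisation to pick a winner.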

    (tags: consistency algorithms concurrency crdts distcomp data)

  • CRDT toolbox

    ‘The CRDT toolbox provides a collection of basic Conflict-free replicated data types as well as a common interface for defining your own CRDTs’. – in Eric Moritz’ github. Also includes some more links to CRDT background reading.

    (tags: crdt github eric-moritz python algorithms)

  • Eventually-Consistent Data Structures [slides]

    implementing CRDTs in Riak and Voldemort

    (tags: crdt algorithms distcomp riak voldemort distributed)

Links for 2013-04-02

Links for 2013-03-29

  • East Texas Judge Says Mathematical Algorithms Can’t Be Patented, Dismisses Uniloc Claim Against Rackspace

    This seems pretty significant. Is the tide turning in the Texas Eastern District against patent trolls, at last? And does it establish sufficient precedent?

    A federal judge has thrown out a patent claim against Rackspace, ruling that mathematical algorithms can’t be patented. The ruling in the Eastern District stemmed from a 2012 complaint filed by Uniloc USA asserting that processing of floating point numbers by the Linux operating system was a patent violation. Chief Judge Leonard Davis based the ruling on U.S. Supreme Court case law that prohibits the patenting of mathematical algorithms. According to Rackspace, this is the first reported instance in which the Eastern District of Texas has granted an early motion to dismiss finding a patent invalid because it claimed unpatentable subject matter. Red Hat, which supplies Linux to Rackspace, provided Rackspace’s defense. Red Hat has a policy of standing behind customers through its Open Source Assurance program.
    See https://news.ycombinator.com/item?id=5455869 for more discussion.

    (tags: east-texas patents swpats maths patenting law judges rackspace linux red-hat uniloc-usa floating-point)

  • Introducing Chronos: A Replacement for Cron

    A distributed, fault-tolerant “cron” is something which comes up frequently — it makes for a great fault-tolerance building block. This one sounds like it’s too closely tied into Mesos, though (IMO).

    Chronos is our replacement for cron. It is a distributed and fault-tolerant scheduler which runs on top of Mesos. It’s a framework and supports custom mesos executors as well as the default command executor. Thus by default, Chronos executes SH (on most systems BASH) scripts. Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transferring files and executing them on a remote machine in the background and using asynchronous callbacks to notify Chronos of job completion or failures.

    (tags: cron scheduling mesos stacks design airbnb chronos fault-tolerance distcomp distributed-computing scripts jobs)

Links for 2013-03-28

  • One of CloudFlare’s upstream providers on the “death of the internet” scare-mongering

    Having a bad day on the Internet is nothing new. These are the types of events we deal with on a regular basis, and most large network operators are very good at responding quickly to deal with situations like this. In our case, we worked with Cloudflare to quickly identify the attack profile, rolled out global filters on our network to limit the attack traffic without adversely impacting legitimate users, and worked with our other partner networks (like NTT) to do the same. If the attacks had stopped here, nobody in the “mainstream media” would have noticed, and it would have been just another fun day for a few geeks on the Internet. The next part is where things got interesting, and is the part that nobody outside of extremely technical circles has actually bothered to try and understand yet. After attacking Cloudflare and their upstream Internet providers directly stopped having the desired effect, the attackers turned to any other interconnection point they could find, and stumbled upon Internet Exchange Points like LINX (in London), AMS-IX (in Amsterdam), and DE-CIX (in Frankfurt), three of the largest IXPs in the world. An IXP is an “interconnection fabric”, or essentially just a large switched LAN, which acts as a common meeting point for different networks to connect and exchange traffic with each other. One downside to the way this architecture works is that there is a single big IP block used at each of these IXPs, where every network who interconnects is given 1 IP address, and this IP block CAN be globally routable. When the attackers stumbled upon this, probably by accident, it resulted in a lot of bogus traffic being injected into the IXP fabrics in an unusual way, until the IXP operators were able to work with everyone to make certain the IXP IP blocks weren’t being globally re-advertised. Note that the vast majority of global Internet traffic does NOT travel over IXPs, but rather goes via direct private interconnections between specific networks. The IXP traffic represents more of the “long tail” of Internet traffic exchange, a larger number of smaller networks, which collectively still adds up to be a pretty big chunk of traffic. So, what you actually saw in this attack was a larger number of smaller networks being affected by something which was a completely unrelated and unintended side-effect of the actual attacks, and thus *poof* you have the recipe for a lot of people talking about it. :) Hopefully that clears up a bit of the situation.

    (tags: bandwidth internet gizmodo traffic cloudflare ddos hacking)

Links for 2013-03-27

Links for 2013-03-25

  • The first pillar of agile sysadmin: We alert on what we draw

    ‘One of [the] purposes of monitoring systems was to provide data to allow us, as engineers, to detect patterns, and predict issues before they become production impacting. In order to do this, we need to be capturing data and storing it somewhere which allows us to analyse it. If we care about it – if the data could provide the kind of engineering insight which helps us to understand our systems and give early warning – we should be capturing it. ‘ …. ‘There are a couple of weaknesses in [Nagios’ design]. Assuming we’ve agreed that if we care about a metric enough to want to alert on it then we should be gathering that data for analysis, and graphing it, then we already have the data upon which to base our check. Furthermore, this data is not on the machine we’re monitoring, so our checks don’t in any way add further stress to that machine.’ I would add that if we are alerting on a different set of data from what we collect for graphing, then using the graphs to investigate an alarm may run into problems if they don’t sync up.
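    A minimal sketch of what “alert on what we draw” could look like in practice. Everything here is hypothetical (the in-memory store, the series name, the threshold); the point is only that the check reads the same stored datapoints the graphs are drawn from, rather than polling the monitored host itself.

```python
from statistics import mean

# Hypothetical stand-in for the time-series store that also feeds the graphs:
# series name -> list of (unix_timestamp, value) datapoints.
METRIC_STORE = {
    "web01.load.1min": [(0, 0.8), (60, 1.1), (120, 4.9), (180, 5.3), (240, 5.1)],
}


def check_threshold(store, series, window, threshold):
    """Alert if the mean of the last `window` stored datapoints exceeds threshold."""
    recent = store[series][-window:]
    average = mean(value for _, value in recent)
    status = "CRITICAL" if average > threshold else "OK"
    return status, average


status, average = check_threshold(METRIC_STORE, "web01.load.1min", window=3, threshold=4.0)
print(status, round(average, 2))
```

    Since the alert and the graph share one data source, whoever gets paged sees exactly the datapoints that triggered the alarm when they open the graph.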

    (tags: devops monitoring deployment production sysadmin ops alerting metrics)

  • JPL Institutional Coding Standard for the Java Programming Language

    From JPL’s Laboratory for Reliable Software (LaRS). Great reference; there’s some really useful recommendations here, and good explanations of familiar ones like “prefer composition over inheritance”. Many are supported by FindBugs, too. Here’s the full list:

    compile with checks turned on; apply static analysis; document public elements; write unit tests; use the standard naming conventions; do not override field or class names; make imports explicit; do not have cyclic package and class dependencies; obey the contract for equals(); define both equals() and hashCode(); define equals when adding fields; define equals with parameter type Object; do not use finalizers; do not implement the Cloneable interface; do not call nonfinal methods in constructors; select composition over inheritance; make fields private; do not use static mutable fields; declare immutable fields final; initialize fields before use; use assertions; use annotations; restrict method overloading; do not assign to parameters; do not return null arrays or collections; do not call System.exit; have one concept per line; use braces in control structures; do not have empty blocks; use breaks in switch statements; end switch statements with default; terminate if-else-if with else; restrict side effects in expressions; use named constants for non-trivial literals; make operator precedence explicit; do not use reference equality; use only short-circuit logic operators; do not use octal values; do not use floating point equality; use one result type in conditional expressions; do not use string concatenation operator in loops; do not drop exceptions; do not abruptly exit a finally block; use generics; use interfaces as types when available; use primitive types; do not remove literals from collections; restrict numeric conversions; program against data races; program against deadlocks; do not rely on the scheduler for synchronization; wait and notify safely; reduce code complexity

    (tags: nasa java reference guidelines coding-standards jpl reliability software coding oo concurrency findbugs bugs)

Links for 2013-03-24

  • KDE’s brush with git repository corruption: post-mortem

    a barely-averted disaster… phew.

    while we planned for the case of the server losing a disk or entirely biting the dust, or the total loss of the VM’s filesystem, we didn’t plan for the case of filesystem corruption, and the way the corruption affected our mirroring system triggered some very unforeseen and pathological conditions. […] the corruption was perfectly mirrored… or rather, due to its nature, imperfectly mirrored. And all data on the anongit [mirrors] was lost.
    One risk demonstrated: by trusting in mirroring, rather than a schedule of snapshot backups covering a wide time range, they nearly had a major outage. Silent data corruption, and code bugs, happen — backups protect against this, but RAID, replication, and mirrors do not. Another risk: they didn’t have a rate limit on project-deletion, which resulted in the “anongit” mirrors deleting their (safe) data copies in response to the upstream corruption. Rate limiting to sanity-check automated changes is vital. What they should have had in place was described by the fix: ‘If a new projects file is generated and is more than 1% different than the previous file, the previous file is kept intact (at 1500 repositories, that means 15 repositories would have to be created or deleted in the span of three minutes, which is extremely unlikely).’
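    The sanity check described in the fix is simple enough to sketch. This is my own illustration, not KDE’s code: the function name is invented, project lists are modelled as sets of repository names, and treating exactly 1% as acceptable is an assumption.

```python
def choose_projects_file(previous, candidate, max_change_percent=1):
    """Return the project list the mirrors should act on.

    The candidate is rejected (and the previous list kept) when the number of
    created plus deleted repositories exceeds max_change_percent of the
    previous list.
    """
    previous, candidate = set(previous), set(candidate)
    if not previous:
        # Assumption: with no history to compare against, accept the first file.
        return candidate
    changed = len(previous ^ candidate)  # repositories created or deleted
    # Integer arithmetic avoids floating-point edge cases at the boundary.
    if changed * 100 > max_change_percent * len(previous):
        return previous
    return candidate


repos = {f"repo-{i}" for i in range(1500)}
small_change = (repos - {"repo-0", "repo-1"}) | {"new-repo"}
corrupted = set()  # e.g. an upstream listing emptied by filesystem corruption

print(len(choose_projects_file(repos, small_change)))  # small change is accepted
print(len(choose_projects_file(repos, corrupted)))     # mass deletion is ignored
```

    With this in place, a corrupted upstream listing leaves the mirrors holding their existing copies instead of deleting them.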

    (tags: rate-limiting case-studies post-mortems kde git data-corruption risks mirroring replication raid bugs backups snapshots sanity-checks automation ops)

  • SpaceX software dev practices

    Metrics rule the roost — I guess there’s been a long history of telemetry in space applications.

    To make software more visible, you need to know what it is doing, he said, which means creating “metrics on everything you can think of”…. Those metrics should cover areas like performance, network utilization, CPU load, and so on. The metrics gathered, whether from testing or real-world use, should be stored as it is “incredibly valuable” to be able to go back through them, he said. For his systems, telemetry data is stored with the program metrics, as is the version of all of the code running so that everything can be reproduced if needed. SpaceX has programs to parse the metrics data and raise an alarm when “something goes bad”. It is important to automate that, Rose said, because forcing a human to do it “would suck”. The same programs run on the data whether it is generated from a developer’s test, from a run on the spacecraft, or from a mission. Any failures should be seen as an opportunity to add new metrics. It takes a while to “get into the rhythm” of doing so, but it is “very useful”. He likes to “geek out on error reporting”, using tools like libSegFault and ftrace. Automation is important, and continuous integration is “very valuable”, Rose said. He suggested building for every platform all of the time, even for “things you don’t use any more”. SpaceX does that and has found interesting problems when building unused code. Unit tests are run from the continuous integration system any time the code changes. “Everyone here has 100% unit test coverage”, he joked, but running whatever tests are available, and creating new ones is useful. When he worked on video games, they had a test to just “warp” the character to random locations in a level and had it look in the four directions, which regularly found problems. “Automate process processes”, he said. Things like coding standards, static analysis, spaces vs. tabs, or detecting the use of Emacs should be done automatically. SpaceX has a complicated process where changes cannot be made without tickets, code review, signoffs, and so forth, but all of that is checked automatically. If static analysis is part of the workflow, make it such that the code will not build unless it passes that analysis step. When the build fails, it should “fail loudly” with a “monitor that starts flashing red” and email to everyone on the team. When that happens, you should “respond immediately” to fix the problem. In his team, they have a full-size Justin Bieber cutout that gets placed facing the team member who broke the build. They found that “100% of software engineers don’t like Justin Bieber”, and will work quickly to fix the build problem.

    (tags: spacex dev coding metrics deplyment production space justin-bieber)

  • on the etymology of “Ketchup”

    ‘the story of ketchup is a story of globalization and centuries of economic domination by a world superpower. But the superpower isn’t America, and the century isn’t ours. Ketchup’s origins in the fermented sauces of China and Southeast Asia mean that those little plastic packets under the seat of your car are a direct result of Chinese and Asian domination of a single global world economy for most of the last millennium.’

    (tags: ketchup china nam-pla food etymology condiments history trade)

Links for 2013-03-23

  • dumping a JVM heap using gdb

    now this is a neat trick — having been stuck having to flip to spares and do other antics while a long-running heap dump took place, this is a winner.

    Dumping a JVM’s heap is an extremely useful tool for debugging problems with a J2EE application. Unfortunately, when a JVM explodes, using the standard jmap tool can take an inordinate amount of time to execute for lots of different reasons. This leads to extended downtime when a heap dump is attempted and even then, jmap regularly fails. This blog post is intended to outline an alternate method using [gdb] to achieve a heap dump that only requires mere seconds of additional downtime allowing the slow jmap process to happen once the application is back in service.

    (tags: heap-dump gdb heap jvm java via:peakscale gcore core core-dump debugging)

  • Edition – Irish Design

    ‘Edition has a ‘design for life’ philosophy – we think that unique designer-made items can be a part of our everyday lives without costing the earth. We stock affordable, contemporary and functional products (mostly handmade), including jewellery, home-ware, accessories, art and toys. Every item has been carefully selected, and all are designed here in Ireland.’

    (tags: edition design ireland art graphics jewellery toys)

Links for 2013-03-21

Links for 2013-03-20

Links for 2013-03-19

Links for 2013-03-18

Links for 2013-03-16

  • Roko’s basilisk – RationalWiki

    Wacky transhumanists.

    Roko’s basilisk is notable for being completely banned from discussion on LessWrong, where any mention of it is deleted. Eliezer Yudkowsky, founder of LessWrong, considers the basilisk would not work, but will not explain why because he does not consider open discussion of the notion of acausal trade with possible superintelligences to be provably safe. Silly over-extrapolations of local memes are posted to LessWrong quite a lot; almost all are just downvoted and ignored. But this one, Yudkowsky reacted to hugely, then doubled-down on his reaction. Thanks to the Streisand effect, discussion of the basilisk and the details of the affair soon spread outside of LessWrong. The entire affair is a worked example of spectacular failure at community management and at controlling purportedly dangerous information. Some people familiar with the LessWrong memeplex have suffered serious psychological distress after contemplating basilisk-like ideas — even when they’re fairly sure intellectually that it’s a silly problem.[5] The notion is taken sufficiently seriously by some LessWrong posters that they try to work out how to erase evidence of themselves so a future AI can’t reconstruct a copy of them to torture.[6]

    (tags: transhumanism funny insane stupid singularity ai rokos-basilisk via:maciej lesswrong rationalism superintelligences streisand-effect absurd)

  • How the America Invents Act Will Change Patenting Forever

    Bet you didn’t think the US software patent situation could get worse? Wrong!

    “Now it’s really important to be the first to file, and it’s really important to file before somebody else puts a product out, or puts the invention in their product,” says Barr, adding that it will “create a new urgency on the part of everyone to file faster — and that’s going to be a problem for the small inventor.”

    (tags: first-to-file omnishambles uspto swpats patents software-patents law legal)

Links for 2013-03-14

Links for 2013-03-13

Links for 2013-03-12

Links for 2013-03-11

  • Bunnie Huang’s “Hacking the Xbox” now available as a free PDF

    ‘No Starch Press and I have decided to release this free ebook version of Hacking the Xbox in honor of Aaron Swartz. As you read this book, I hope that you’ll be reminded of how important freedom is to the hacking community and that you’ll be inclined to support the causes that Aaron believed in. I agreed to release this book for free in part because Aaron’s treatment by MIT is not unfamiliar to me. In this book, you will find the story of when I was an MIT graduate student, extracting security keys from the original Microsoft Xbox. You’ll also read about the crushing disappointment of receiving a letter from MIT legal repudiating any association with my work, effectively leaving me on my own to face Microsoft. The difference was that the faculty of my lab, the AI laboratory, were outraged by this treatment. They openly defied MIT legal and vowed to publish my work as an official “AI Lab Memo,” thereby granting me greater negotiating leverage with Microsoft. Microsoft, mindful of the potential backlash from the court of public opinion over suing a legitimate academic researcher, came to a civil understanding with me over the issue.’ This is a classic text on hardware reverse-engineering and the freedom to tinker — strongly recommended.

    (tags: hacking bunnie-huang xbox free hardware drm freedom-to-tinker books reading mit microsoft history)