Justin's Linklog Posts

Links for 2013-03-04

Links for 2013-03-01

  • Denominator: A Multi-Vendor Interface for DNS

    the latest good stuff from Netflix.

    Denominator is a portable Java library for manipulating DNS clouds. Denominator has pluggable back-ends, initially including AWS Route53, Neustar Ultra, DynECT, and a mock for testing. We also ship a command line version so it’s easy for anyone to try it out. The reason we built Denominator is that we are working on multi-region failover and traffic sharing patterns to provide higher availability for the streaming service during regional outages caused by our own bugs and AWS issues. To do this we need to directly control the DNS configuration that routes users to each region and each zone. When we looked at the features and vendors in this space we found that we were already using AWS Route53, which has a nice API but is missing some advanced features; Neustar UltraDNS, which has a SOAP based API; and DynECT, which has a REST API that uses a quite different pseudo-transactional model. We couldn’t find a Java based API that grouped together a common set of capabilities that we are interested in, so we created one. The idea is that any feature that is supported by more than one vendor API is the highest common denominator, and that functionality can be switched between vendors as needed, or in the event of a DNS vendor outage.

    (tags: dns netflix java tools ops route53 aws ultradns dynect)

  • Making Really Executable Jars

    Who knew? You can make a runnable JAR file!

    There has long been a hack known in some circles, but not widely known, to make jars really executable, in the chmod +x sense. The hack takes advantage of the fact that jar files are zip files, and zip files allow arbitrary cruft to be prepended to the zip file itself (this is how self-extracting zip files work).

    (tags: jars via:netflix shell java executable chmod zip hacks command-line cli)
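
    The prepend trick described in the item above can be sketched in a few lines. This is a minimal illustration, not the linked article's script: the stub contents, `make_executable_jar` and `demo_jar_bytes` are names made up here, and the stub assumes a `java` binary on the PATH.

```python
import io
import os
import stat
import zipfile

# Hypothetical launcher stub: a shell script that re-invokes java on this very file.
STUB = b'#!/bin/sh\nexec java -jar "$0" "$@"\n'


def make_executable_jar(jar_bytes, out_path):
    """Write STUB followed by the jar bytes to out_path and mark it executable."""
    with open(out_path, "wb") as out:
        out.write(STUB)
        out.write(jar_bytes)
    mode = os.stat(out_path).st_mode
    os.chmod(out_path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)


def demo_jar_bytes():
    """Build a tiny stand-in archive in memory (a real jar would hold classes)."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("META-INF/MANIFEST.MF", "Manifest-Version: 1.0\n")
    return buf.getvalue()
```

    Zip readers locate the archive's index from the end of the file, which is why the leading shell script does not get in the way; Python's own `zipfile` module will still open the result.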

Links for 2013-02-28

  • Two surgeons debate the use of cycle helmets

    ‘I am a neurosurgeon and a cyclist, and I am also married to a dedicated cyclist. I wear a cycling helmet and encourage cyclists to wear one. I don’t find that wearing one impedes me in any way. I am under no illusion that it will save me in the event of a high speed collision with a car or lorry (nothing will), but most cycling accidents aren’t of the high-speed variety.’ versus: ‘I am a consultant Trauma orthopaedic surgeon working in Edinburgh and have many years of experience treating cyclists after serious road traffic, cycle sport and commuting cycle injuries. I believe there is no justification for helmet laws or promotional campaigns that portray cycling as a particularly ‘dangerous’ activity, or that make unfounded claims about the effectiveness of helmets. By reducing cycle use even slightly, helmet laws or promotion campaigns are likely to cause a significant net disbenefit to public health, regardless of the effectiveness or otherwise of helmets.’ Generally a lot of sense on either side.

    (tags: helmets cycling bicycles health safety surgeons doctors)

  • Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing

    Yahoo! are going big with Storm for their next-generation internal cloud platform: ‘Yahoo! engineering teams are developing technologies to enable Storm applications and Hadoop applications to be hosted on a single cluster. • We have enhanced Storm to support Hadoop style security mechanism (including Kerberos authentication), and thus enable Storm applications authorized to access Hadoop datasets on HDFS and HBase. • Storm is being integrated into Hadoop YARN for resource management. Storm-on-YARN enables Storm applications to utilize the computation resources in our tens of thousands of Hadoop computation nodes. YARN is used to launch Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).’

    (tags: yahoo yarn cloud-computing private-clouds big-data latency storm hadoop elastic-computing hbase)

  • Trojan paralyses speed cameras in Moscow

    what a coincidence! (via Tony Finch)

    (tags: virus trojans malware via:fanf kaspersky)

  • IOS TCP wifi optimizer

    Basically, tweaking a few suboptimal sysctls to optimize for 802.11b/n; requires a jailbroken iOS device. I’m surprised that Apple defaulted segment size to 512 to be honest, and disabling delayed ACKs sounds like it might be useful (see also http://www.stuartcheshire.org/papers/NagleDelayedAck/).

    TCP optimizer modifies a few settings inside iOS, including increasing the TCP receive buffer from 131072 to 292000, disabling TCP delayed ACK’s, allowing a maximum of 16 un-ACK’d packets instead of 8 and set the default packet size to 1460 instead of 512. These changes won’t only speed up your YouTube videos, they’ll also improve your internet connection’s performance overall, including Wi-Fi network connectivity.

    (tags: tcp performance tuning ios apple wifi wireless 802.11n sysctl ip)

  • It’s the Sugar, Folks

    A study published in the Feb. 27 issue of the journal PLoS One links increased consumption of sugar with increased rates of diabetes by examining the data on sugar availability and the rate of diabetes in 175 countries over the past decade. And after accounting for many other factors, the researchers found that increased sugar in a population’s food supply was linked to higher diabetes rates independent of rates of obesity. In other words, according to this study, obesity doesn’t cause diabetes: sugar does. The study demonstrates this with the same level of confidence that linked cigarettes and lung cancer in the 1960s. As Rob Lustig, one of the study’s authors and a pediatric endocrinologist at the University of California, San Francisco, said to me, “You could not enact a real-world study that would be more conclusive than this one.”

    (tags: nytimes health food via:fanf sugar eating diabetes papers medicine)

Links for 2013-02-26

Links for 2013-02-25

  • UnoDNS

    ‘Watch Netflix USA, Hulu, Pandora, BBC iPlayer, and more in [sic] anywhere you live!’ — seems to use similar techniques to tunlr.net, looks like it works for my Netflix

    (tags: netflix dns tv tunnelling drm networking spotify hulu)

  • Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack

    reasonably good whole-stack performance testing and analysis; HBase, Riak, MongoDB, and Cassandra compared. Riak did pretty badly :(

    (tags: riak mongodb cassandra hbase performance analytics hadoop hive big-data storage databases nosql)

  • Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.

    Good interview with the Cassandra guys at Netflix, and some top Mongo-bashing in the comments

    (tags: cassandra netflix user-stories testimonials nosql storage ec2 mongodb)

  • Werner Knaupp – Acrylbilder

    my favourite art of the moment. Thick, heavy layers of acrylic black and white paint, evoking the stormy Atlantic (brr). Gallery Bode, which showed this in Nuremberg in 2011, wrote the following at http://www.bode-galerie.de/en/exhibitions/schwarz_weiss :

    Gallery Bode is pleased to constitute the cooperation with Werner Knaupp with an exhibition of a new workseries. The exhibition showcases artworks out of the series “Westmen Isles”. […] The journeys to Iceland are a background to the development of this new workseries. These paintings are telling of a forbidding nature. The beholder can’t take a [safe] position but he is involved into the event which becomes comprehensible in a nearly physical way. These pictures of a overwhelming nature could be traced back to Knaupp’s confrontation with the force of nature while his journeys. The experience of this force pushes the limits of human being and evokes primal fear. With the abdication of colours the artworks reach dynamic. This foots on the consistency of colour and on the changing between reality and abstraction. In an art historical view the new black and white paintings detached themselves from traditional landscape painting. Werner Knaupp implements the pure force of nature into pure painting, to visualise the force fields of nature. The beholder experiences with these artworks a nature without human dimension. In Werner Knaupp’s Oeuvre the “Westmen Isles” paintings are a new expression of his examination with existential fundamental questions.

    (tags: germany art painting werner-knaupp paintings monochrome sea iceland)

Links for 2013-02-22

  • Indymedia: It’s time to move on

    Our decision to curtail publishing on the Nottingham Indymedia site and call a meeting is an attempt to create a space for new ideas. We are not interested in continuing along the slow but certain path to total irrelevance but want to draw in new people and start off in new directions whilst remaining faithful to the underlying principles of Indymedia.

    (tags: indymedia community communication web anonymity publishing left-wing)

  • How to revert a faulty merge in git

    omgwtf, this is pretty horrific.

    (tags: merging git merge omgwtf version-control branching)

  • #AltDevBlogADay » Latency Mitigation Strategies

    John Carmack on the low-latency coding techniques used to support head mounted display devices.

    Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience. Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached. A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.

    (tags: head-mounted-display display ui latency vision coding john-carmack)

Links for 2013-02-21

  • Distributed Streams Algorithms for Sliding Windows [PDF]

    ‘Massive data sets often arise as physically distributed, parallel data streams, and it is important to estimate various aggregates and statistics on the union of these streams. This paper presents algorithms for estimating aggregate functions over a “sliding window” of the N most recent data items in one or more streams. […] Our results are obtained using a novel family of synopsis data structures called waves.’

    (tags: waves papers streaming algorithms percentiles histogram distcomp distributed aggregation statistics estimation streams)

  • good blog post on histogram-estimation stream processing algorithms

    After reviewing several dozen papers, a score or so in depth, I identified two data structures that appear to enable us to answer these recency and frequency queries: exponential histograms (from “Maintaining Stream Statistics Over Sliding Windows” by Datar et al.) and waves (from “Distributed Streams Algorithms for Sliding Windows” by Gibbons and Tirthapura). Both of these data structures are used to solve the so-called counting problem, the problem of determining, with a bound on the relative error, the number of 1s in the last N units of time. In other words, the data structures are able to answer the question: how many 1s appeared in the last n units of time within a factor of Error (e.g., 50%). The algorithms are neat, so I’ll present them briefly.

    (tags: streams streaming stream-processing histograms percentiles estimation waves statistics algorithms)
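
    For the counting problem described in the item above, here is a rough sketch in the spirit of the exponential histogram idea: keep buckets whose sizes are powers of two, allow at most two buckets of each size, and merge the two oldest when a third appears. It is a simplified variant written for illustration (class and method names are invented here), not a transcription of the paper or the blog post; with two buckets per size the estimate stays within 50% of the true count.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` bits of a stream."""

    def __init__(self, window):
        self.window = window
        self.time = 0
        # Each bucket is (timestamp of its most recent 1, number of 1s), newest first.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop buckets whose most recent 1 has slid out of the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if not bit:
            return
        self.buckets.insert(0, (self.time, 1))
        # Sizes never decrease towards the old end, so equal sizes at i and i + 2
        # mean three buckets of one size: merge the two oldest of them.
        i = 0
        while i + 2 < len(self.buckets) and self.buckets[i][1] == self.buckets[i + 2][1]:
            newer_ts, size = self.buckets[i + 1]
            self.buckets[i + 1:i + 3] = [(newer_ts, size * 2)]
            i += 1

    def estimate(self):
        """All buckets count in full except the oldest, which counts for half."""
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        return total - self.buckets[-1][1] / 2
```

    Only the oldest bucket can straddle the window boundary, so it is the only source of error, and the number of buckets grows with the logarithm of the window rather than with the window itself.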

  • Timelike 2: everything fails all the time

    Fantastic post on large-scale distributed load balancing strategies from @aphyr. Random and least-conns routing comes out on top in his simulation (although he hasn’t yet tried Marc Brooker’s two-randoms routing strategy)

    (tags: via:hn routing distributed least-conns load-balancing round-robin distcomp networking scaling)

  • Marc Brooker’s “two-randoms” load balancing approach

    Marc Brooker on this interesting load-balancing algorithm, including simulation results:

    Using stale data for load balancing leads to a herd behavior, where requests will herd toward a previously quiet host for much longer than it takes to make that host very busy indeed. The next refresh of the cached load data will put the server high up the load list, and it will become quiet again. Then busy again as the next herd sees that it’s quiet. Busy. Quiet. Busy. Quiet. And so on. One possible solution would be to give up on load balancing entirely, and just pick a host at random. Depending on the load factor, that can be a good approach. With many typical loads, though, picking a random host degrades latency and reduces throughput by wasting resources on servers which end up unlucky and quiet. The approach taken by the studies surveyed by Mitzenmacher is to try two hosts, and pick the one with the least load. This can be done directly (by querying the hosts) but also works surprisingly well on cached load data. […] Best of 2 is good because it combines the best of both worlds: it uses real information about load to pick a host (unlike random), but rejects herd behavior much more strongly than the other two approaches.
    Having seen what Marc has worked on, and written, inside Amazon, I’d take this very seriously… cool to see he is blogging externally too.

    (tags: algorithm load-balancing distcomp distributed two-randoms marc-brooker least-conns)
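
    The best-of-two choice described in the item above fits in a few lines. This is only a toy sketch, not Marc Brooker's simulation: `pick_best_of_two` and `simulate` are names invented here, the load figures are read live rather than from a stale cache, and requests never complete.

```python
import random


def pick_best_of_two(loads, rng):
    """Sample two distinct hosts at random and return the index of the less loaded one."""
    a, b = rng.sample(range(len(loads)), 2)
    return a if loads[a] <= loads[b] else b


def simulate(num_hosts, num_requests, seed=0):
    """Assign requests one at a time; loads only grow (no request ever completes)."""
    rng = random.Random(seed)
    loads = [0] * num_hosts
    for _ in range(num_requests):
        loads[pick_best_of_two(loads, rng)] += 1
    return loads
```

    Because only two hosts are compared per request, a host that looks quiet in the load data can attract at most the requests that happened to sample it, which is the property that limits herding.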

  • Can regular expressions parse HTML?

    ‘a summary of the main points: The “regular expressions” used by programmers have very little in common with the original notion of regularity in the context of formal language theory. Regular expressions (at least PCRE) can match all context-free languages. As such they can also match well-formed HTML and pretty much all other programming languages. Regular expressions can match at least some context-sensitive languages. Matching of regular expressions is NP-complete. As such you can solve any other NP problem using regular expressions.’

    (tags: compsci regexps regular-expressions programming np-complete chomsky-grammar context-free languages)

Links for 2013-02-19

Links for 2013-02-18

  • Fatcache

    from Twitter — ‘a cache for your big data. Even though memory is thousand times faster than SSD, network connected SSD-backed memory makes sense, if we design the system in a way that network latencies dominate over the SSD latencies by a large factor. To understand why network connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases. Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.’

    (tags: twitter ssd cache caching memcached memcache memory network storage)

  • Passively Monitoring Network Round-Trip Times – Boundary

    ‘how Boundary uses [TCP timestamps] to calculate round-trip times (RTTs) between any two hosts by passively monitoring TCP traffic flows, i.e., without actively launching ICMP echo requests (pings). The post is primarily an overview of this one aspect of TCP monitoring, it also outlines the mechanism we are using, and demonstrates its correctness.’

    (tags: tcp boundary monitoring network ip passive-monitoring rtt timestamping)
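
    As a rough illustration of the idea in the item above: every TCP segment that carries the timestamps option holds a TSval (the sender's clock) and a TSecr (an echo of the last TSval received from the peer), so a passive observer can time how long a TSval takes to come back. The sketch below is an assumption about how such matching might look, not Boundary's implementation; the packet tuple layout and the `passive_rtts` name are invented for the example.

```python
def passive_rtts(packets):
    """Estimate round-trip times from already-captured TCP timestamp options.

    packets: iterable of (capture_time, src, dst, tsval, tsecr) in capture order.
    Returns (src, dst, rtt) samples, where rtt is the time between first seeing
    src send a TSval to dst and first seeing dst echo that value back.
    """
    first_seen = {}   # (src, dst, tsval) -> capture time the value first appeared
    matched = set()   # keys that have already produced a sample
    samples = []
    for when, src, dst, tsval, tsecr in packets:
        key = (src, dst, tsval)
        if key not in first_seen:
            first_seen[key] = when
        # This segment's TSecr echoes a TSval that dst sent to src earlier.
        echoed = (dst, src, tsecr)
        if echoed in first_seen and echoed not in matched:
            matched.add(echoed)
            samples.append((dst, src, when - first_seen[echoed]))
    return samples
```

    A sample measures the trip from the capture point to the echoing host and back, so it depends on where the monitor sits, and delayed ACKs on the echoing side will inflate it.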

  • drug cartel-controlled mobile comms networks

    “The Mexican military has recently broken up several secret telecommunications networks that were built and controlled by drug cartels so they could coordinate drug shipments, monitor their rivals and orchestrate attacks on the security forces. A network that was dismantled just last week provided cartel members with cellphone and radio communications across four northeastern states. The network had coverage along almost 500 miles of the Texas border and extended nearly another 500 miles into Mexico’s interior. Soldiers seized 167 antennas, more than 150 repeaters and thousands of cellphones and radios that operated on the system. Some of the remote antennas and relay stations were powered with solar panels.”

    (tags: mexico drugs networks mobile-phones crime)

  • Heroku finds out that distributed queueing is hard

    Stage 3 of the Rap Genius/Heroku blog drama. Summary (as far as I can tell): Heroku gave up on a fully-synchronised load-balancing setup (“intelligent routing”), since it didn’t scale, in favour of randomised queue selection; they didn’t sufficiently inform their customers, and metrics and docs were not updated to make this change public; the pessimal case became pretty damn pessimal; a customer eventually noticed and complained publicly, creating a public shit-storm. Comments: 1. this is why you monitor real HTTP request latency (scroll down for crazy graphs!). 2. include 90/99 percentiles to catch the “tail” of poorly-performing requests. 3. Load balancers are hard. http://aphyr.com/posts/277-timelike-a-network-simulator has more info on the intricacies of distributed load balancing — worth a read.

    (tags: heroku rap-genius via:hn networking distcomp distributed load-balancing ip queueing percentiles monitoring)
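
    To make the point about 90/99 percentiles in the item above concrete, here is a small sketch using the nearest-rank definition. The latency figures are invented for illustration, not taken from the Heroku or Rap Genius posts.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples <= it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical request latencies in milliseconds: mostly fast, with a slow tail.
latencies_ms = [20] * 85 + [200] * 10 + [900] * 5
mean_ms = sum(latencies_ms) / len(latencies_ms)
```

    With this made-up data the median is 20 ms and the mean is 82 ms, while `percentile(latencies_ms, 90)` is 200 and `percentile(latencies_ms, 99)` is 900; the tail only shows up in the high percentiles.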

  • Unhelpful Graphite Tips

    10 particularly good — actually helpful — tips on using the Graphite metric graphing system

    (tags: graphite ops metrics service-metrics graphing ui dataviz)

  • Literate Jenks Natural Breaks and How The Idea Of Code is Lost

    A crazy amount of code archaeology to discover exactly how an algorithm, specifically ‘Jenks natural breaks’, works, after decades of cargo-cult copying (via Nelson): ‘I spent a day reading the original text and decoding as much as possible of the code’s intention, so that I could write a ‘literate’ implementation. My definition of literate is highly descriptive variable names, detailed and narrative comments, and straightforward code with no hijinks. So: yes, this isn’t the first implementation of Jenks in Javascript. And it took me several times longer to do things this way than to just get the code working. But the sad and foreboding state of this algorithm’s existing implementations said that to think critically about this code, its result, and possibilities for improvement, we need at least one version that’s clear about what it’s doing.’

    (tags: jenks-natural-breaks algorithms chloropleth javascript reverse-engineering history software copyright via:nelson)

  • don’t order a Raspberry Pi from RS

    I’ve been waiting 24 days for mine so far. Frankly amazing they are so apparently inept, particularly since it seems in breach of EU distance selling regulation if they go beyond 30 days without an update. They’ve just posted this:

    Quick update- we received our delivery of raspberry pi’s last week and as of Friday we had shipped up to order reference 1010239854. We will continue daily to get your orders shipped out as quickly as we possibly can; so that you will all receive your raspberry pi’s shortly. Many thanks everyone for your patience and again apologies for the delay in the dispatch update message on the Pi Store which I know has caused some confusion.

    (tags: rs raspberry-pi inept etailers uk e-commerce shopping hardware)

  • more details on the UK distance selling regulations governing Raspberry Pi RS orders

    ‘my understanding is that according to the Distance Selling Regulations […], unless you agreed otherwise with RS, then they were obligated to fulfill their side of the contract within thirty days from the day after you ordered, and if they were unable to do so they were also obligated to inform you that they could not and repay you within thirty days’

    (tags: rs shopping etailers inept distance-selling uk law)

Links for 2013-02-12

Links for 2013-02-11

Links for 2013-02-09

Links for 2013-02-07

  • High Scalability – Analyzing billions of credit card transactions and serving low-latency insights in the cloud

    Hadoop, a batch-generated read-only Voldemort cluster, and an intriguing optimal-storage histogram bucketing algorithm:

    The optimal histogram is computed using a random-restart hill climbing approximated algorithm. The algorithm has been shown very fast and accurate: we achieved 99% accuracy compared to an exact dynamic algorithm, with a speed increase of one factor. […] The amount of information to serve in Voldemort for one year of BBVA’s credit card transactions on Spain is 270 GB. The whole processing flow would run in 11 hours on a cluster of 24 “m1.large” instances. The whole infrastructure, including the EC2 instances needed to serve the resulting data would cost approximately $3500/month.

    (tags: scalability scaling voldemort hadoop batch algorithms histograms statistics bucketing percentiles)

  • Splout

    ‘Splout is a scalable, open-source, easy-to-manage SQL big data view. Splout is to Hadoop + SQL what Voldemort or Elephant DB are to Hadoop + Key/Value. Splout serves a read-only, partitioned SQL view which is generated and indexed by Hadoop.’ Some FAQs: ‘What’s the difference between Splout SQL and Dremel-like solutions such as BigQuery, Impala or Apache Drill? Splout SQL is not a “fast analytics” Dremel-like engine. It is more thought to be used for serving datasets under web / mobile high-throughput, many lookups, low-latency applications. Splout SQL is more like a NoSQL database in the sense that it has been thought for answering queries under sub-second latencies. It has been thought for performing queries that impact a very small subset of the data, not queries that analyze the whole dataset at once.’

    (tags: splout sql big-data hadoop read-only scaling queries analytics)

  • Goonwaffe Stories: A Guide For Newbies [PDF]

    impressively high-quality newbie’s guide from the Goonswarm Federation — as themittani.com describes it, ‘frankly a work of art: a 1950’s Pulp Scifi magazine full of internet spaceships and sociopathy.’

    (tags: eve-online space goonswarm gaming mmo pdf pulp science-fiction)

Links for 2013-02-06

  • Evasi0n Jailbreak’s Userland Component

    Good writeup of the exploit techniques used in the new iOS jailbreak.

    Evasi0n is interesting because it escalates privileges and has full access to the system partition all without any memory corruption.  It does this by exploiting the /var/db/timezone vulnerability to gain access to the root user’s launchd socket.  It then abuses launchd to load MobileFileIntegrity with an inserted codeless library, which is overriding MISValidateSignature to always return 0.

    (tags: jailbreak ios iphone ipad exploits evasi0n via:nelson)

Links for 2013-02-05

  • Programming Language Checklist

    ‘You appear to be advocating a new: [ ] functional [ ] imperative [ ] object-oriented [ ] procedural [ ] stack-based [ ] “multi-paradigm” [ ] lazy [ ] eager [ ] statically-typed [ ] dynamically-typed [ ] pure [ ] impure [ ] non-hygienic [ ] visual [ ] beginner-friendly [ ] non-programmer-friendly [ ] completely incomprehensible programming language. Your language will not work. Here is why it will not work.’

    (tags: humor programming funny coding languages)

  • Jetty-9 goes fast with Mechanical Sympathy

    This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: “False sharing” on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build and look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% fewer Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.

    (tags: jetty java mechanical-sympathy optimization coding tries)
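
    For a flavour of what a branchless hex-to-byte conversion looks like, here is one well-known arithmetic form, shown in Python for brevity. Whether it matches Jetty's statement exactly is an assumption; the function name is invented here.

```python
def hex_digit_to_value(c):
    """Map the code point of one ASCII hex digit to 0..15 without any branch.

    Input that is not a hex digit yields a meaningless number, not an error.
    """
    # c & 0x1F keeps the low five bits: 0x10..0x19 for '0'-'9', 1..6 for letters.
    # c >> 6 is 0 for digits and 1 for letters, so letters get an extra 0x19.
    # Subtracting 0x10 then lands digits on 0..9 and letters on 10..15.
    return (c & 0x1F) + ((c >> 6) * 0x19) - 0x10
```

    The point of avoiding `if`/`else` here is that the hot parsing loop has no data-dependent branch for the CPU to mispredict.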

  • Event Bars – Craft Beer

    craft beer kegs for hire in Dublin, Sligo, Limerick and Galway. Needs more Metalman, of course ;)

    (tags: beer ireland craft-beer keg-hire events parties)

Links for 2013-02-04

Links for 2013-02-01

  • IPMI: Freight Train To Hell

    ‘Intel’s Intelligent Platform Management Interface (IPMI), which is implemented and added onto by all server vendors, grant system administrators with a means to manage their hardware in an Out of Band (OOB) or Lights Out Management (LOM) fashion. However there are a series of design, utilization, and vendor issues that cause complex, pervasive, and serious security infrastructure problems. The BMC is an embedded computer on the motherboard that implements IPMI; it enjoys an asymmetrical relationship with its host, with the BMC able to gain full control of memory and I/O, while the server is both blind and impotent against the BMC. Compromised servers have full access to the private IPMI network. The BMC uses reusable passwords that are infrequently changed, widely shared among servers, and stored in clear text in its storage. The passwords may be disclosed with an attack on the server, over the network against the BMC, or with a physical attack against the motherboard (including after the server has been decommissioned.) IT’s reliance on IPMI to reduce costs, the near-complete lack of research, 3rd party products, or vendor documentation on IPMI and the BMC security, and the permanent nature of the BMC on the motherboard make it currently very difficult to defend, fix or remediate against these issues.’ (via Tony Finch)

    (tags: via:fanf security ipmi power-management hardware intel passwords bios)

  • java – Given that HashMaps in jdk1.6 and above cause problems with multi-threading, how should I fix my code – Stack Overflow

    Massive Java concurrency fail in recent 1.6 and 1.7 JDK releases — the java.util.HashMap type now spin-locks on an AtomicLong in its constructor. Here’s the response from the author: ‘I’ll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?’ Oh dear. Assumptions of “typical” like this are not how you design a fundamental data structure. fail. For now there is a hacky reflection-based workaround, but this is lame and needs to be fixed as soon as possible. (Via cscotta)

    (tags: java hashmap concurrency bugs fail security hashing jdk via:cscotta)

  • High Scalability – geo-aware traffic load balancing and caching at CNBC.com

    Dyn’s anycast DNS service, as used by CNBC.com

    (tags: anycast dns scalability dyn failover geographical load-balancing)

Links for 2013-01-31

  • Using Statsd and Graphite From a Rails App

    Reasonably simple, from the looks of it

    (tags: rails graphite metrics service-metrics ruby)

  • The colour of London’s commute

    Nice visualisation. ‘What the map shows is the mix of transport to work of residents living in each part of London*, using ONS data at Middle Super Output Area (MSOA) level. Each MSOA is given an RGB colour determined by the modal share, with red colours representing travel by car, taxi or motorbike, blue travel by public transport and green cycling or walking. The result is a fairly simple pattern, with motor vehicles predominating on London’s fringes, public transport in the inner suburbs and cycling and walking in the very centre. Those tendrils of blue reaching out presumably represent major public transport links.’

    (tags: data visualisation dataviz london mapping via:ldoody)
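
    The colouring scheme described in the item above (red for motor vehicles, blue for public transport, green for cycling or walking) could be computed along these lines. The linear 0 to 255 scaling and the function name are assumptions made for illustration; the original post does not spell out its exact mapping.

```python
def modal_share_to_rgb(motor, public, active):
    """Scale three commuter counts for one area into a (red, green, blue) triple."""
    total = motor + public + active
    if total == 0:
        return (0, 0, 0)  # no commuters recorded: fall back to black
    return (
        round(255 * motor / total),   # red: car, taxi or motorbike
        round(255 * active / total),  # green: cycling or walking
        round(255 * public / total),  # blue: public transport
    )
```

    An area where everyone drives comes out pure red, and mixed areas blend towards purple, teal or yellow depending on which two modes dominate.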

Links for 2013-01-30

Links for 2013-01-29

Links for 2013-01-27

  • Ironfan

    ‘an expressive toolset for constructing scalable, resilient [service] architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error).’ Looks like a pretty neat cluster deployment tool; driven from a single configuration file, using Chef, integrating closely with AWS and providing many useful additional features

    (tags: chef deployment clusters knife services aws ec2 ops ironfan demo)

  • Fox DMCA Takedowns Order Google to Remove Fox DMCA Takedowns

    Chilling Effects is set up to stop the ‘chilling effects’ of Internet censorship. Google sees this as a good thing and sends takedown requests it receives to be added to the database. Fox sends takedown requests to Google for pages which the company says contain links to material it holds the copyright to. Those pages include those on Chilling Effects which show which links Fox wants taken down. Google delists the Chilling Effects pages from its search engine, thus completing the circle and defeating the very reason Chilling Effects was set up for in the first place.

    (tags: chilling-effects copyright internet legal dmca google law)

  • PUBLIC joho / 7XX-rfc

    At Railscamp X it became clear there is a gap in the current HTTP specification. There are many ways for a developer to screw up their implementation, but no code to share the nature of the error with the end user. We humbly suggest the following status codes are included in the HTTP spec in the 7XX range.
    Includes such useful status codes as “724 – This line should be unreachable”.

    (tags: http standards humour funny jokes)

  • How Newegg crushed the “shopping cart” patent and saved online retail

    Very cool account of Newegg’s battle against a ludicrous patent-troll shakedown. Great quote from their Chief Legal Officer, Lee Cheng:

    Patent trolling is based upon deficiencies in a critical, but underdeveloped, area of the law. The faster we drive these cases to verdict, and through appeal, and also get legislative reform on track, the faster our economy will be competitive in this critical area. We’re competing with other economies that are not burdened with this type of litigation. China doesn’t have this, South Korea doesn’t have this, Europe doesn’t have this. […] It’s actually surprising how quickly people forget what Lemelson did. [referring to Jerome Lemelson, an infamous patent troll who used so-called “submarine patents” to make billions in licensing fees.] This activity is very similar. Trolls right now “submarine” as well. They use timing, like he used timing. Then they pop up and say “Hello, surprise! Give us your money or we will shut you down!” Screw them. Seriously, screw them. You can quote me on that.

    (tags: patent-trolls east-texas newegg shopping-cart swpat software-patents patents ecommerce soverain)

  • Implementing strcmp, strlen, and strstr using SSE 4.2 instructions – strchr.com

    Using new Intel Core i7 instructions to speed up string manipulation.
    Fascinating stuff. SSE ftw

    (tags: sse optimization simd assembly intel i7 intel-core strstr strings string-matching strchr strlen coding)

Links for 2013-01-26

  • All polar bears descended from one Irish grizzly

    ‘THE ARCTIC’S DWINDLING POPULATION of polar bears all descend from a single mamma brown bear which lived 20,000 to 50,000 years ago in present-day Ireland, new research suggests. DNA samples from the great white carnivores – taken from across their entire range in Russia, Canada, Greenland, Norway and Alaska – revealed that every individual’s lineage could be traced back to this Irish forebear.’ More than the average bear, I guess

    (tags: animals biology science dna history ireland bears polar-bears grizzly-bears via:ben)

  • Basho | Alert Logic Relies on Riak to Support Rapid Growth

    ‘The new [Riak-based] analytics infrastructure performs statistical and correlation processing on all data […] approximately 5 TB/day. All of this data is processed in real-time as it streams in. […] Alert Logic’s analytics infrastructure, powered by Riak, achieves performance results of up to 35k operations/second across each node in the cluster – performance that eclipses the existing MySQL deployment by a large margin on single node performance. In real business terms, the initial deployment of the combination of Riak and the analytic infrastructure has allowed Alert Logic to process in real-time 7,500 reports, which previously took 12 hours of dedicated processing every night.’ Twitter discussion here: https://twitter.com/fisherpk/status/294984960849367040 , which notes ‘heavily cached SAN storage, 12 core blades and 90% get to put ops’, and ‘3 riak nodes, 12-cores, 30k get heavy riak ops/sec. 8 nodes driving ops to that cluster’. Apparently the use of SAN storage on all nodes is historic, but certainly seems to have produced good iops numbers as an (expensive) side-effect…

    (tags: iops riak basho ops systems alert-logic storage nosql databases)

  • Turn a Raspberry Pi Into an AirPlay Receiver for Streaming Music in Your Living Room

    hooray, a viable domestic Raspberry Pi use case at last ;)

    (tags: raspberry-pi audio music mp3 home hardware)

  • Antigua Government Set to Launch “Pirate” Website To Punish United States

    oh the lulz.

    The Government of Antigua is planning to launch a website selling movies, music and software, without paying U.S. copyright holders. The Caribbean island is taking the unprecedented step because the United States refuses to lift a trade “blockade” preventing the island from offering Internet gambling services, despite several WTO decisions in Antigua’s favor. The country now hopes to recoup some of the lost income through a WTO approved “warez” site.

    (tags: us-politics antigua piracy filesharing pirate gambling wto ip blockades)

Links for 2013-01-25

  • Big Data Lambda Architecture

    An article by Nathan “Storm” Marz describing the system architecture he’s been talking about for a while: a Hadoop-driven batch view, a Storm-driven “speed view”, and an API that merges the two

    (tags: storm systems architecture lambda-architecture design Hadoop)
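
    A rough sketch of the three pieces, in plain Python rather than Hadoop or Storm. Everything here (the page-view events, `batch_view`, `SpeedView`, `query`) is a made-up illustration of the idea, not code from the article:

```python
from collections import Counter


def batch_view(master_dataset):
    """Batch layer: recompute counts from scratch over the full dataset."""
    return Counter(event["page"] for event in master_dataset)


class SpeedView:
    """Speed layer: incrementally count events newer than the last batch run."""

    def __init__(self):
        self.counts = Counter()

    def update(self, event):
        self.counts[event["page"]] += 1


def query(page, batch, speed):
    """Merging API: batch result plus the realtime delta."""
    return batch.get(page, 0) + speed.counts.get(page, 0)


# Events already covered by the last batch run.
master = [{"page": "/a"}, {"page": "/b"}, {"page": "/a"}]
batch = batch_view(master)

# Events that arrived since then, seen only by the speed layer.
speed = SpeedView()
for event in [{"page": "/a"}, {"page": "/c"}]:
    speed.update(event)

print(query("/a", batch, speed))
```

    When the next batch run absorbs the recent events, the speed view for that window can simply be thrown away, so any errors in the incremental path are short-lived.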

  • Network graph viz of Irish politicians and organisations on Twitter

    generated by the Clique Research Cluster at UCD and DERI. ‘a visualization of the unified graph representation for the users in the data, produced using Gephi and sigma.js. Users are coloured according to their community (i.e. political affiliation). The size of each node is proportional to its in-degree (i.e. number of incoming links).’ sigma.js provides a really user-friendly UI to the graphs, although — as with most current graph visualisations — it’d be particularly nice if it was possible to ‘tease out’ and focus on interesting nodes, and get a pasteable URL of the result, in context. Still, the most usable graph viz I’ve seen in a while…

    (tags: graphs dataviz ucd research ireland twitter networks community sigma.js javascript canvas gephi)

  • 50 Watts

    Incredible blog of book covers and illustrations, much from the 1970s

    (tags: illustration art prints 1970s graphics)

  • Namazu-e: Earthquake catfish prints

    ‘In November 1855, the Great Ansei Earthquake struck the city of Edo (now Tokyo), claiming 7,000 lives and inflicting widespread damage. Within days, a new type of color woodblock print known as namazu-e (lit. “catfish pictures”) became popular among the residents of the shaken city. These prints featured depictions of mythical giant catfish (namazu) who, according to popular legend, caused earthquakes by thrashing about in their underground lairs. In addition to providing humor and social commentary, many prints claimed to offer protection from future earthquakes.’

    (tags: japan art namazu-e ukiyo-e catfish earthquakes myth)

Links for 2013-01-24

Links for 2013-01-23

  • fail0verflow ::

    Excellent demo of how use of a block cipher with a known secret key makes an insecure MAC. “In short, CBC-MAC is a Message Authentication Code, not a strong hash function. While MACs can be built out of hash functions (e.g. HMAC), and hash functions can be built out of block ciphers like AES, not all MACs are also hash functions. CBC-MAC in particular is completely unsuitable for use as a hash function, because it only allows two parties with knowledge of a particular secret key to securely transmit messages between each other. Anyone with knowledge of that key can forge the messages in a way that keeps the MAC (“hash value”) the same. All you have to do is run the forged message through CBC-MAC as usual, then use the AES decryption operation on the original hash value to find the last intermediate state. XORing this state with the CBC-MAC for the forged message yields a new block of data which, when appended to the forged message, will cause it to have the original hash value. Because the input is taken backwards, you can either modify the first block of the file, or just run the hash function backwards until you reach the block that you want to modify. You can make a forged file pass the hash check as long as you can modify an arbitrary aligned 16-byte block in it.”

    (tags: crypto hashing security cbc mac sha1 aes)
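
    The forgery described in the quote can be sketched in a few lines. One loud assumption: Python's standard library has no AES, so a toy 4-round Feistel cipher built on SHA-256 stands in for it here. The attack only needs a block cipher that anyone holding the key can run in both directions, so the structure is the same. All function names are mine, not fail0verflow's:

```python
import hashlib

BLOCK = 16  # block size in bytes, same as AES


def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))


def _round(key, i, half):
    # Round function for the toy Feistel cipher (NOT AES, illustration only).
    return hashlib.sha256(key + bytes([i]) + half).digest()[:8]


def encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right


def decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right


def cbc_mac(key, message):
    """CBC-MAC with a zero IV over a block-aligned message."""
    assert len(message) % BLOCK == 0
    state = bytes(BLOCK)
    for off in range(0, len(message), BLOCK):
        state = encrypt_block(key, xor(state, message[off:off + BLOCK]))
    return state


def forge(key, target_mac, forged_message):
    """Append one block so forged_message ends up with target_mac."""
    # Decrypting the target MAC gives the cipher input that must precede it.
    needed_input = decrypt_block(key, target_mac)
    # XOR with the forged message's own MAC state yields the patch block.
    patch = xor(needed_input, cbc_mac(key, forged_message))
    return forged_message + patch


key = b"known-secret-key"
original = b"pay alice 10 eur".ljust(32, b".")
target = cbc_mac(key, original)

forged = forge(key, target, b"pay mallory 9999".ljust(32, b"."))
print(cbc_mac(key, forged) == target)
```

    This variant appends the patch block at the end; the quote notes that the same trick works for any aligned 16-byte block by running the chain backwards from the final value.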

Leaving Amazon

So, after just over 3 and a half years, I’m leaving Amazon.

It’s been great fun — I can honestly say, even with my code being used by hundreds of millions of users in SpamAssassin and elsewhere, I hadn’t really had to come to grips with the distributed systems problems that an Amazon-scale service involves.

During my time at Amazon, I’ve had the pleasure of building out a brand-new, groundbreaking, innovative internal service, from scratch to its current status where it’s deployed in production datacenters worldwide. It’s a low-latency service, used to monitor Amazon’s internal networks using massive quantities of measurement data and machine learning algorithms. It’s really very nifty, and I’m quite proud of what we’ve achieved. I was lucky to work closely with some very smart people during this, too — Amazon has some top-notch engineers.

But time to move on! In a week’s time, I’ll be joining Swrve to work on the server-side architecture of their system. Swrve have a very interesting product, extending the A/B-testing model into gaming, and a great team; and it’ll be nice to get back into startup-land once again, for a welcome change. (It’s not all roses working for a big company. ;) I’m looking forward to it. Who knows, I may even start blogging here again…

Pity about losing those 12 phone tool icons though!

Links for 2013-01-18

  • CES: Worse Products Through Software

    ‘The companies out there that know how to make decent software have been steadily eating their way into and through markets previously dominated by the hardware guys. Apple with music players, TiVo with video recording, even Microsoft with its decade-old Xbox Live service, which continues to embarrass the far weaker offerings from Sony and Nintendo. (And, yes, iOS is embarrassing all three console makers.)’ See also Mat Honan’s article at http://www.wired.com/gadgetlab/2012/12/internet-tv-sucks/ : ‘Smart TVs are just too complicated. They have terrible user interfaces that differ wildly from device to device. It’s not always clear what content is even available — for example, after more than two years on the market, you still can’t watch Hulu Plus on your Google TV. […] They give us too many options for apps most people will never use, and they do so at the expense of making it simple to find the shows and movies we want to watch, no matter where they are, be it online or on the air. As NPD puts it in the conclusion to its report, “OEMs and retailers need to focus less on new innovation in this space and more on simplification of the user experience and messaging if they want to drive additional, and new, behaviors on the TV.” Which is a more polite way of saying, clean up your horrible interface, Samsung.’ (via Craig)

    (tags: via:craig design ui tv hardware television sony ces software)

  • Fast Packed String Matching for Short Patterns [paper, PDF]

    ‘Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like NLP, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. […] In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns.’ Reminds me of http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm , but taking advantage of SIMD extensions, which should make things nice and speedy, at the cost of tying it to specific hardware platforms. (via Tony Finch)

    (tags: rabin-karp algorithms strings string-matching papers via:fanf)
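
    For comparison, here is the plain Rabin-Karp approach mentioned above, as a minimal sketch. This is the textbook rolling-hash search, not the paper's SSE-based algorithm; the `base` and `mod` defaults are arbitrary choices of mine:

```python
def rabin_karp(text, pattern, base=256, mod=1_000_000_007):
    """Return the start index of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []

    # Weight of the leading character in an m-character window.
    high = pow(base, m - 1, mod)

    p_hash = t_hash = 0
    for i in range(m):
        p_hash = (p_hash * base + ord(pattern[i])) % mod
        t_hash = (t_hash * base + ord(text[i])) % mod

    hits = []
    for i in range(n - m + 1):
        # Hashes can collide, so confirm with a real comparison.
        if p_hash == t_hash and text[i:i + m] == pattern:
            hits.append(i)
        if i < n - m:
            # Roll the window: drop text[i], append text[i + m].
            t_hash = ((t_hash - ord(text[i]) * high) * base
                      + ord(text[i + m])) % mod
    return hits


print(rabin_karp("abracadabra", "abra"))
```

    The rolling update is what keeps each window step at constant cost; the packed-SIMD approach in the paper gets its speed differently, by comparing many characters per instruction.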

  • Irish EU Council Presidency proposes destruction of right to privacy | EDRI

    ‘For example, based on the current situation in Ireland, the idea is that all companies can do whatever they want with personal data, without fear of sanction. Sanctions, such as fines, “should be optional or at least conditional upon a prior warning or reprimand”. In other words, do what you want, the worst that can happen is that you will receive a warning.’ Shame! Daragh O’Brien’s comment: ‘utter idiocy’. ( at https://twitter.com/daraghobrien/status/292041500873850880 )

    (tags: privacy ireland eu fail data-protection data-privacy politics)

Links for 2013-01-17

Links for 2013-01-15

  • The Neurocritic: Fisher-Price Synesthesia

    ‘Synesthesia [jm: sic] is a rare perceptual phenomenon in which the stimulation of one sensory modality, or exposure to one type of stimulus, leads to a sensory (or cognitive) experience in a different, non-stimulated modality. For instance, some synesthetes have colored hearing while others might taste shapes. GRAPHEME-COLOR SYNESTHESIA is the condition in which individual printed letters are perceived in a specific, constant color. This occurs involuntarily and in the absence of colored font. […] A new study has identified 11 synesthetes whose grapheme-color mappings appear to be based on the Fisher Price plastic letter set made between 1972-1990.’ (via Dave Green)

    (tags: fisher-price synesthesia synaesthesia colors colours sight neuroscience brain via-dave-green toys)

  • Extreme Performance with Java – Charlie Hunt [slides, PDF]

    presentation slides for Charlie Hunt’s 2012 QCon presentation, where he discusses ‘what you need to know about a modern JVM in order to be effective at writing a low latency Java application’. The talk video is at http://www.infoq.com/presentations/Extreme-Performance-Java

    (tags: low-latency charlie-hunt performance java jvm presentations qcon slides pdf)

  • Leopold’s Day Map

    ‘Bloomsday Map Of Dublin Based On Ulysses’. Beautiful! ‘The Leopold’s Day map is a stunning marriage of typography and cartography plotting all the streets alluded to by Joyce in Ulysses which were in existence on June 16th 1904. It is accompanied by a comprehensive and beautifully typeset directory with over 400 entries noting the landmarks, business and people of Dublin that were referenced in the text. The Leopold’s Day map is an exquisitely detailed, limited edition piece. It has an impressive dimension of 1000mm x 700mm which means it can also fit into a ready made frame. Price: €125.00’

    (tags: bloomsday ulysses dublin ireland maps james-joyce art prints)

  • aaw/hyperloglog-redis – GitHub

    ‘This gem is a pure Ruby implementation of the HyperLogLog algorithm for estimating cardinalities of sets observed via a stream of events. A Redis instance is used for storing the counters.’

    (tags: cardinality sets redis algorithms ruby gems hyperloglog)
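
    To give a feel for how the algorithm works, here is a minimal HyperLogLog sketch in Python. It is not the gem's code: the counters live in an in-memory list rather than Redis, SHA-1 is just a convenient deterministic hash, and the class and parameter names are mine. It includes the standard small-range correction but none of the later refinements:

```python
import hashlib
import math


class HyperLogLog:
    def __init__(self, p=10):
        self.p = p                  # number of index bits
        self.m = 1 << p             # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # Take 64 bits of the hash: the first p bits pick a register,
        # the rest feed the "position of first 1 bit" statistic.
        h = int.from_bytes(hashlib.sha1(item.encode()).digest()[:8], "big")
        index = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        # Bias constant from the HyperLogLog paper, valid for m >= 128.
        alpha = 0.7213 / (1 + 1.079 / self.m)
        estimate = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(10000):
    hll.add("user-%d" % i)
print(round(hll.count()))
```

    With 1,024 registers the expected relative error is around 3%, and re-adding an item never changes the estimate, which is what makes it suitable for streams with many repeats.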

Links for 2013-01-14

  • Tunlr

    ‘uses DNS witchcraft to allow you to access US/UK-only audio and video services like Hulu.com, BBC iPlayer, etc. without using a VPN or Web proxy.’ According to http://superuser.com/questions/461316/how-does-tunlr-work , it proxies the initial connection setup and geo-auth, then mangles the stream address to stream directly, not via proxy. Sounds pretty useful

    (tags: proxy network vpn dns tunnel content video audio iplayer bbc hulu streaming geo-restriction)

  • OmniTI’s Experiences Adopting Chef

    A good, in-depth writeup of OmniTI’s best practices with respect to build-out of multiple customer deployments, using multi-tenant Chef from a version-controlled repo. Good suggestions, and I am really looking forward to this bit: ‘Chef tries to turn your system configuration into code. That means you now inherit all the woes of software engineering: making changes in a coordinated manner and ensuring that changes integrate well are now an even greater concern. In part three of this series, we’ll look at applying software quality assurance and release management practices to Chef cookbooks and roles.’

    (tags: chef deployment ops omniti systems vagrant automation)

  • Effective Scala

    Twitter’s Scala style guide. ‘While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ “purely functional style”, and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of best practices. Our use of Scala is mainly for creating high volume services that form distributed systems — and our advice is thus biased — but most of the advice herein should translate naturally to other domains.’

    (tags: twitter scala coding style)

  • Notes on Distributed Systems for Young Bloods — Something Similar

    ‘Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.’ This is a pretty nice list, a little over-stated, but that’s the format. I particularly like the following: ‘Exploit data-locality’; ‘Learn to estimate your capacity’; ‘Metrics are the only way to get your job done’; ‘Use percentiles, not averages’; ‘Extract services’.

    (tags: systems distributed distcomp cap metrics coding)
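
    A quick illustration of the ‘Use percentiles, not averages’ point, with made-up latency numbers. The `percentile` helper uses the simple nearest-rank definition; other definitions interpolate and will give slightly different values:

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest sample with pct% of values <= it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# 98 fast requests and two very slow ones (milliseconds, invented data).
latencies_ms = [10] * 98 + [2000, 3000]

mean = sum(latencies_ms) / len(latencies_ms)
print("mean:", mean)
print("p50:", percentile(latencies_ms, 50))
print("p99:", percentile(latencies_ms, 99))
```

    The mean here describes no request that actually happened, while the median shows the typical case and the 99th percentile shows the tail that some users really hit.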

Links for 2013-01-11

  • check_graphite

    ‘a Nagios plugin to poll Graphite’. Necessary, since service metrics are the true source of service health information

    (tags: nagios graphite service-metrics ops)
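
    The general shape of such a check is simple enough to sketch. This is not check_graphite's code, just an illustration under a few assumptions: the input is the JSON that Graphite's render endpoint returns with `format=json`, the most recent non-null datapoint is compared against thresholds, and the result maps onto the standard Nagios plugin exit codes. The metric name and thresholds are invented:

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check(render_json, warn, crit):
    """Map the latest non-null Graphite datapoint to a Nagios status."""
    series = json.loads(render_json)
    # Each series looks like {"target": ..., "datapoints": [[value, ts], ...]}.
    values = [value
              for s in series
              for value, _timestamp in s["datapoints"]
              if value is not None]
    if not values:
        return UNKNOWN, "UNKNOWN: no datapoints"
    latest = values[-1]
    if latest >= crit:
        return CRITICAL, "CRITICAL: %s >= %s" % (latest, crit)
    if latest >= warn:
        return WARNING, "WARNING: %s >= %s" % (latest, warn)
    return OK, "OK: %s" % latest


sample = ('[{"target": "app.errors", "datapoints": '
          '[[1.0, 1357900000], [7.5, 1357900060], [null, 1357900120]]}]')
code, message = check(sample, warn=5, crit=10)
print(code, message)
```

    A real plugin would fetch the JSON over HTTP, print the message, and exit with the status code so Nagios can pick it up.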

  • paperplanes. The Virtues of Monitoring, Redux

    A rather vague and touchy-feely “state of the union” post on monitoring. Good set of links at the end, though; I like the look of Sensu and Tasseo, but am still unconvinced about the value of Boundary’s offering

    (tags: monitoring metrics ops)

  • What happened to KHTML after Apple announced Safari

    ‘There was a huge amount of excitement at the announcement that Safari would be using KHTML. At that time, it was almost a given that the OSS rendering engine was Gecko. KHTML was KDE’s little engine that could. But nobody ever expected it to be picked up by other folks. One of the original parts of the KHTML-to-OS X port was KWQ (pronounced, “quack”) that abstracted out the KDE API portions that were used in KHTML. Folks were pretty ecstatic at first. It seemed very validating. But that changed quickly. As Zack’s post indicates, WebKit became a thing of unmergable code-drops. Even inside of the KDE community there became a split between the KHTML purists and the WebKit faction. They’d previously more or less all been KHTML developers, but post-WebKit there was something of a pragmatists vs. idealists split. Zack fell on the latter side of that (for understandable reasons: there was an existing community project, with its own set of values, and that was hijacked to a large extent by WebKit). A few years later WebKit transformed itself into a more or less valid open source project (see webkit.org), but that didn’t close the rift in the KDE community between the two, at that point rather divergent, rendering engines. There’s still some remaining melancholy that stems from that initial hope and what could have potentially been, but wasn’t.’

    (tags: history safari open-source code-drops over-the-wall webkit khtml kde oss apple)

  • The Justin Masonic Lodge

    whoa. (via Dave O’Riordan)

    (tags: wtf masons names me texas)

  • Dan McKinley :: Whom the Gods Would Destroy, They First Give Real-time Analytics

    ‘It’s important to divorce the concepts of operational metrics and product analytics. […] Funny business with timeframes can coerce most A/B tests into statistical significance.’ ‘The truth is that there are very few product decisions that can be made in real time.’ HN discussion: http://news.ycombinator.com/item?id=5032588

    (tags: real-time analytics statistics a-b-testing)

Links for 2013-01-10

  • Greyhound agrees to change consumer contracts and make refunds – National Consumer Agency

    Take note, switchers: ‘The National Consumer Agency (NCA) has received a commitment from Greyhound that it will amend certain terms in its standard consumer contract, which the NCA thinks are unfair to consumers. This will be done by January 18 2013. Among the terms considered unfair by the NCA are that consumers must forfeit their credit balance and pay a €45 administration fee, if they cancel their contract with Greyhound within 12 months. If you were charged money in these circumstances, Greyhound has agreed to refund you. Greyhound will communicate these changes to all of its consumers by 18 January 2013. If you have any questions about the changes or getting a refund, you should contact Greyhound directly.’

    (tags: greyhound consumer ireland dublin rubbish)

  • Pushover: Simple Mobile Notifications for Android and iOS

    ‘Pushover makes it easy to send real-time notifications to your Android and iOS devices.’ extremely simple HTTPS API; ‘Pushover has no monthly subscription fees and users will always be able to receive unlimited messages for free. Most applications can send messages for free, subject to monthly limits.’ Also supported by ifttt.com

    (tags: ios android iphone push messaging)

Links for 2013-01-09

  • Requests: HTTP for Humans

    ‘an elegant and simple HTTP library for Python, built for human beings.’ ‘Requests is an Apache2 Licensed HTTP library, written in Python, for human beings. Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks. Requests takes all of the work out of Python HTTP/1.1 — making your integration with web services seamless. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests.’

    (tags: python http urllib libraries requests via:mikeste)

  • Surprisingly Good Evidence That Real Name Policies Fail To Improve Comments

    ‘Enough theorizing, there’s actually good evidence to inform the debate. For 4 years, Koreans enacted increasingly stiff real-name commenting laws, first for political websites in 2003, then for all websites receiving more than 300,000 viewers in 2007, and was finally tightened to 100,000 viewers a year later after online slander was cited in the suicide of a national figure. The policy, however, was ditched shortly after a Korean Communications Commission study found that it only decreased malicious comments by 0.9%. Korean sites were also inundated by hackers, presumably after valuable identities. Further analysis by Carnegie Mellon’s Daegon Cho and Alessandro Acquisti, found that the policy actually increased the frequency of expletives in comments for some user demographics. While the policy reduced swearing and “anti-normative” behavior at the aggregate level by as much as 30%, individual users were not dismayed. “Light users”, who posted 1 or 2 comments, were most affected by the law, but “heavy” ones (11-16+ comments) didn’t seem to mind. Given that the Commission estimates that only 13% of comments are malicious, a mere 30% reduction only seems to clean up the muddied waters of comment systems a depressingly negligent amount. The finding isn’t surprising: social science researchers have long known that participants eventually begin to ignore cameras video taping their behavior. In other words, the presence of some phantom judgmental audience doesn’t seem to make us better versions of ourselves.’ (via Ronan Lyons)

    (tags: anonymity identity policy comments privacy politics new-media via:ronanlyons)

Links for 2013-01-08

  • HAT-trie: A Cache-conscious Trie-based Data Structure for Strings [PDF]

    ‘Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other high-performance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most efficient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.’ (via Tony Finch)

    (tags: via:fanf data-structures tries cache-aware trees)

  • The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases [PDF]

    ‘Main memory capacities have grown up to a point where most databases fit into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not efficient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for efficient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very efficient insertions and deletions as well. At the same time, ART is very space efficient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and efficient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and prefix lookup.’ (via Tony Finch)

    (tags: via:fanf data-structures trees indexing cache-aware tries)

  • Efficient In-Memory Indexing with Generalized Prefix Trees [PDF]

    ‘Efficient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used—with minor changes—as primary or secondary indices in almost every DBMS. Typically, tree-based or hash-based structures are used, while structures based on prefix-trees (tries) are neglected in this context. For tree-based and hash-based structures, the major disadvantages are inherently caused by the need for reorganization and key comparisons. In contrast, the major disadvantage of trie-based structures in terms of high memory consumption (created and accessed nodes) could be improved. In this paper, we argue for reconsidering prefix trees as in-memory index structures and we present the generalized trie, which is a prefix tree with variable prefix length for indexing arbitrary data types of fixed or variable length. The variable prefix length enables the adjustment of the trie height and its memory consumption. Further, we introduce concepts for reducing the number of created and accessed trie levels. This trie is order-preserving and has deterministic trie paths for keys, and hence, it does not require any dynamic reorganization or key comparisons. Finally, the generalized trie yields improvements compared to existing in-memory index structures, especially for skewed data. In conclusion, the generalized trie is applicable as general-purpose in-memory index structure in many different OLTP or hybrid (OLTP and OLAP) data management systems that require balanced read/write performance.’ (via Tony Finch)

    (tags: via:fanf prefix-trees tries data-structures)
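
    All three of these papers build on the basic trie, so here is a minimal one for reference. This is deliberately the naive version (one dict per node, one character per level), not a HAT-trie, ART, or generalized trie; the class and method names are mine. It does show the two properties the abstracts keep coming back to, sorted-order traversal and prefix lookup, which a hash table cannot offer:

```python
class Trie:
    def __init__(self):
        self.children = {}     # next character -> child node
        self.terminal = False  # True if a key ends at this node

    def insert(self, key):
        node = self
        for ch in key:
            node = node.children.setdefault(ch, Trie())
        node.terminal = True

    def _walk(self, prefix):
        # Visiting children in sorted order yields keys in sorted order.
        if self.terminal:
            yield prefix
        for ch in sorted(self.children):
            yield from self.children[ch]._walk(prefix + ch)

    def with_prefix(self, prefix):
        """Return all stored keys starting with prefix, in sorted order."""
        node = self
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return list(node._walk(prefix))


trie = Trie()
for word in ["try", "hat", "tree", "art", "trie"]:
    trie.insert(word)

print(trie.with_prefix("tr"))
print(trie.with_prefix(""))
```

    The per-node dict is exactly the kind of memory overhead and pointer chasing the papers set out to reduce, whether by bucketing, adaptive node sizes, or variable prefix lengths.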

  • A Non-Blocking HashTable by Dr. Cliff Click : programming

    Proggit discovers the NonBlockingHashMap. This comment from Boundary’s cscotta is particularly interesting: “The code is intricate and curiously-formatted, but NBHM is quite excellent. The majority of our analytics platform is backed by NBHMs updated rapidly in parallel. Cliff’s a great, friendly, approachable guy; if you have any specific questions about the approaches or implementation, he may be happy to answer.”

    (tags: data-structures algorithms non-blocking concurrency threading multicore cliff-click azul maps java boundary)