Justin's Linklog Posts

Links for 2015-11-17

Links for 2015-11-16

Links for 2015-11-13

  • The impact of Docker containers on the performance of genomic pipelines [PeerJ]

    In this paper, we have assessed the impact of Docker containers technology on the performance of genomic pipelines, showing that container “virtualization” has a negligible overhead on pipeline performance when it is composed of medium/long running tasks, which is the most common scenario in computational genomic pipelines. Interestingly for these tasks the observed standard deviation is smaller when running with Docker. This suggests that the execution with containers is more “homogeneous,” presumably due to the isolation provided by the container environment. The performance degradation is more significant for pipelines where most of the tasks have a fine or very fine granularity (a few seconds or milliseconds). In this case, the container instantiation time, though small, cannot be ignored and produces a perceptible loss of performance.

    (tags: performance docker ops genomics papers)

Links for 2015-11-12

Links for 2015-11-11

  • Dynalite

    Awesome new mock DynamoDB implementation:

    An implementation of Amazon’s DynamoDB, focussed on correctness and performance, and built on LevelDB (well, @rvagg’s awesome LevelUP to be precise). This project aims to match the live DynamoDB instances as closely as possible (and is tested against them in various regions), including all limits and error messages. Why not Amazon’s DynamoDB Local? Because it’s too buggy! And it differs too much from the live instances in a number of key areas.
    We use DynamoDBLocal in our tests — the availability of that tool is one of the key reasons we have adopted Dynamo so heavily, since we can safely test our code properly with it. This looks even better.

    (tags: dynamodb testing unit-tests integration-testing tests ops dynalite aws leveldb)

  • Alarm design: From nuclear power to WebOps

    Imagine you are an operator in a nuclear power control room. An accident has started to unfold. During the first few minutes, more than 100 alarms go off, and there is no system for suppressing the unimportant signals so that you can concentrate on the significant alarms. Information is not presented clearly; for example, although the pressure and temperature within the reactor coolant system are shown, there is no direct indication that the combination of pressure and temperature mean that the cooling water is turning into steam. There are over 50 alarms lit in the control room, and the computer printer registering alarms is running more than 2 hours behind the events. This was the basic scenario facing the control room operators during the Three Mile Island (TMI) partial nuclear meltdown in 1979. The Report of the President’s Commission stated that, “Overall, little attention had been paid to the interaction between human beings and machines under the rapidly changing and confusing circumstances of an accident” (p. 11). The TMI control room operator on the day, Craig Faust, recalled for the Commission his reaction to the incessant alarms: “I would have liked to have thrown away the alarm panel. It wasn’t giving us any useful information”. It was the first major illustration of the alarm problem, and the accident triggered a flurry of human factors/ergonomics (HF/E) activity.
    A familiar topic for this ex-member of the Amazon network monitoring team…

    (tags: ergonomics human-factors ui ux alarms alerts alerting three-mile-island nuclear-power safety outages ops)

  • An Analysis of Reshipping Mule Scams

    We observed that the vast majority of the re-shipped packages end up in the Moscow, Russia area, and that the goods purchased with stolen credit cards span multiple categories, from expensive electronics such as Apple products, to designer clothes, to DSLR cameras and even weapon accessories. Given the amount of goods shipped by the reshipping mule sites that we analysed, the annual revenue generated from such operations can span between 1.8 and 7.3 million US dollars. The overall losses are much higher though: the online merchant loses an expensive item from its inventory and typically has to refund the owner of the stolen credit card. In addition, the rogue goods typically travel labeled as “second hand goods” and therefore custom taxes are also evaded. Once the items purchased with stolen credit cards reach their destination they will be sold on the black market by cybercriminals. […] When applying for the job, people are usually required to send the operator copies of their ID cards and passport. After they are hired, mules are promised to be paid at the end of their first month of employment. However, from our data it is clear that mules are usually never paid. After their first month expires, they are never contacted back by the operator, who just moves on and hires new mules. In other words, the mules become victims of this scam themselves, by never seeing a penny. Moreover, because they sent copies of their documents to the criminals, mules can potentially become victims of identity theft.

    (tags: crime law cybercrime mules shipping-scams identity-theft russia moscow scams papers)

Links for 2015-11-10

  • No Harm, No Fowl: Chicken Farm Inappropriate Choice for Data Disposal

    That’s a lesson that Spruce Manor Special Care Home in Saskatchewan had to learn the hard way (as surprising as that might sound). As a trustee with custody of personal health information, Spruce Manor was required under section 17(2) of the Saskatchewan Health Information Protection Act to dispose of its patient records in a way that protected patient privacy. So, when Spruce Manor chose a chicken farm for the job, it found itself the subject of an investigation by the Saskatchewan Information and Privacy Commissioner.  In what is probably one of the least surprising findings ever, the commissioner wrote in his final report that “I recommend that Spruce Manor […] no longer use [a] chicken farm to destroy records”, and then for good measure added “I find using a chicken farm to destroy records unacceptable.”

    (tags: data law privacy funny chickens farming via:pinboard data-protection health medical-records)

Links for 2015-11-09

  • Caffeine cache adopts Window TinyLfu eviction policy

    ‘Caffeine is a Java 8 rewrite of Guava’s cache. In this version we focused on improving the hit rate by evaluating alternatives to the classic least-recently-used (LRU) eviction policy. In collaboration with researchers at Israel’s Technion, we developed a new algorithm that matches or exceeds the hit rate of the best alternatives (ARC, LIRS). A paper of our work is being prepared for publication.’ Specifically:

    W-TinyLfu uses a small admission LRU that evicts to a large Segmented LRU if accepted by the TinyLfu admission policy. TinyLfu relies on a frequency sketch to probabilistically estimate the historic usage of an entry. The window allows the policy to have a high hit rate when entries exhibit a high temporal / low frequency access pattern which would otherwise be rejected. The configuration enables the cache to estimate the frequency and recency of an entry with low overhead. This implementation uses a 4-bit CountMinSketch, growing at 8 bytes per cache entry to be accurate. Unlike ARC and LIRS, this policy does not retain non-resident keys.

    (tags: tinylfu caches caching cache-eviction java8 guava caffeine lru count-min sketching algorithms)
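
The frequency-sketch idea in the quote above can be sketched in a few lines of Python. This is a rough illustration, not Caffeine's implementation: the class and function names, the table dimensions, and the use of `blake2b` for hashing are all assumptions made for the example. Counters saturate at 15 to mimic 4-bit counters, and the admission rule simply compares the estimated frequency of the incoming entry with that of the eviction victim.

```python
import hashlib


class FrequencySketch:
    """Count-min sketch with small saturating counters (capped at 15, i.e. 4 bits)."""

    def __init__(self, width=1024, depth=4, max_count=15):
        self.width = width
        self.depth = depth
        self.max_count = max_count
        self.rows = [[0] * width for _ in range(depth)]

    def _indexes(self, key):
        # One hash per row, derived by salting the key with the row number.
        for row in range(self.depth):
            digest = hashlib.blake2b(f"{row}:{key}".encode(), digest_size=8).digest()
            yield row, int.from_bytes(digest, "big") % self.width

    def increment(self, key):
        for row, col in self._indexes(key):
            if self.rows[row][col] < self.max_count:
                self.rows[row][col] += 1

    def estimate(self, key):
        # Collisions can only inflate a counter, so the minimum is the best estimate.
        return min(self.rows[row][col] for row, col in self._indexes(key))


def admit(sketch, candidate, victim):
    """TinyLfu-style admission: admit the candidate only if it looks more popular."""
    return sketch.estimate(candidate) > sketch.estimate(victim)


sketch = FrequencySketch()
for _ in range(5):
    sketch.increment("hot")
sketch.increment("cold")
print(admit(sketch, "hot", "cold"))
```

A real implementation would also age the counters periodically so that stale popularity fades; that step is left out here.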

  • What Do WebLogic, WebSphere, JBoss, Jenkins, OpenNMS, and Your Application Have in Common? This Vulnerability.

    The ever-shitty Java serialization creates a security hole

    (tags: java serialization security exploits jenkins)

  • Gallery – Steffen Dam

    Danish glassware artist making wonderful Wunderkammers — cabinets of curiosities — entirely from glass. Seeing as one of his works sold for UKP50,000 last year, I suspect these are a bit out of my league, sadly

    (tags: art glassware steffen-dam wunderkammers museums)

  • London garden bridge users to have mobile phone signals tracked

    If it goes ahead, people’s progress across the structure would be tracked by monitors detecting the Wi-Fi signals from their phones, which show up the device’s Mac address, or unique identifying code. The Garden Bridge Trust says it will not store any of this data and is only tracking phones to count numbers and prevent overcrowding.

    (tags: london surveillance mobile-phones mac-trackers tracking)

  • Red lines and no-go zones – the coming surveillance debate

    The Anderson Report to the House of Lords in the UK on RIPA introduces a concept of a “red line”:

    “Firm limits must also be written into the law: not merely safeguards, but red lines that may not be crossed.” …    “Some might find comfort in a world in which our every interaction and movement could be recorded, viewed in real time and indefinitely retained for possible future use by the authorities. Crime fighting, security, safety or public health justifications are never hard to find.” [13.19]  The Report then gives examples, such as a perpetual video feed from every room in every house, the police undertaking to view the record only on receipt of a complaint; blanket drone-based surveillance; licensed service providers, required as a condition of the licence to retain within the jurisdiction a complete plain-text version of every communication to be made available to the authorities on request; a constant data feed from vehicles, domestic appliances and health-monitoring personal devices; fitting of facial recognition software to every CCTV camera and the insertion of a location-tracking chip under every individual’s skin. It goes on: “The impact of such powers on the innocent could be mitigated by the usual apparatus of safeguards, regulators and Codes of Practice. But a country constructed on such a basis would surely be intolerable to many of its inhabitants. A state that enjoyed all those powers would be truly totalitarian, even if the authorities had the best interests of its people at heart.” [13.20] …   “The crucial objection is that of principle. Such a society would have gone beyond Bentham’s Panopticon (whose inmates did not know they were being watched) into a world where constant surveillance was a certainty and quiescence the inevitable result. There must surely come a point (though it comes at different places for different people) where the escalation of intrusive powers becomes too high a price to pay for a safer and more law abiding environment.” [13.21]

    (tags: panopticon jeremy-bentham law uk dripa ripa surveillance spying police drones facial-recognition future tracking cctv crime)

  • Dublin is a medium-density city

    Comparable to Copenhagen or Amsterdam, albeit without sufficient cycling/public-transport infrastructural investment

    (tags: infrastructure density housing dublin ireland cities travel commuting cycling)

Links for 2015-11-07

  • Ignoring ESR won’t do anymore

    I’m tired of this shit. Full stop tired. It’s 2015 and these turds who grope their way around conferences and the like can make allegations like this, get a hand wave and an, “Oh, that’s just crazy Raymond!” Fuck that. Fuck it from here to hell and back. Here’s a man who really hasn’t done anything all that special, is a totally crazy gun-toting misogynist of the highest order and, yet, he remains mostly unchallenged after the tempest dies down, time after time. […] I’m sure ESR will still be haunting conferences when your daughters reach their professional years unless you get serious about outing the assholes like him and making the community a lot less toxic than it is now.
    Amen to that.

    (tags: esr toxic harassment conferences sexism misogyny culture)

Links for 2015-11-05

Links for 2015-11-04

  • PICO-8:

    PICO-8 is a fantasy console for making, sharing and playing tiny games and other computer programs. When you turn it on, the machine greets you with a shell for typing in Lua commands and provides simple built-in tools for creating your own cartridges.
    So cute! See also Voxatron, something similar for voxel-oriented 3D gaming

    (tags: consoles games gaming lua coding retro 2d pico-8)

  • Why Static Website Generators Are The Next Big Thing

    Now _this_ makes me feel old. Alternative title: “why static website generators have been a good idea since WebMake, 15 years ago”. WebMake does pretty well on the checklist of “key features of the modern static website generator”, which are: 1. Templating (check); 2. Markdown support (well, EtText, which predated Markdown by several years); 3. Metadata (check); and 4. Javascript asset pipeline (didn’t support this one, since complex front-end DHTML JS wasn’t really a thing at the turn of the century. But I would have if it had ;). So I guess I was on the right track!

    (tags: web html history webmake static-sites bake-dont-fry site-generators cms)

  • Food Trucks Are Great Incubators. Why Don’t We Have More?

    So is that kind of thriving food-truck scene something the city should work to encourage? Theresa Hernandez, one of the owners of K Chido Mexico, thinks so. “There’s a whole market there for a new culture,” she says. “There’s no doubt about it, the appetite is there. It’s just a matter for somebody who is innovative enough in Dublin City Council to say: ‘Right, let’s do this.’”
    Amen to that.

    (tags: k-chido food-trucks dublin food ireland dcc)

  • wangle/Codel.h at master · facebook/wangle

    Facebook’s open-source implementation of the CoDel queue management algorithm applied to server request-handling capacity in their C++ service bootstrap library, Wangle.

    (tags: wangle facebook codel services capacity reliability queueing)

Links for 2015-11-02

  • Structural and semantic deficiencies in the systemd architecture for real-world service management, a technical treatise

    Despite its overarching abstractions, it is semantically non-uniform and its complicated transaction and job scheduling heuristics ordered around a dependently networked object system create pathological failure cases with little debugging context that would otherwise not necessarily occur on systems with less layers of indirection. The use of bus APIs complicate communication with the service manager and lead to duplication of the object model for little gain. Further, the unit file options often carry implicit state or are not sufficiently expressive. There is an imbalance with regards to features of an eager service manager and that of a lazy loading service manager, having rusty edge cases of both with non-generic, manager-specific facilities. The approach to logging and the circularly dependent architecture seem to imply that lots of prior art has been ignored or understudied.

    (tags: analysis systemd linux unix ops init critiques software logging)

  • How Facebook avoids failures

    Great paper from Ben Maurer of Facebook in ACM Queue.

    A “move-fast” mentality does not have to be at odds with reliability. To make these philosophies compatible, Facebook’s infrastructure provides safety valves.
    This is full of interesting techniques:
    * Rapidly deployed configuration changes: Make everybody use a common configuration system; Statically validate configuration changes; Run a canary; Hold on to good configurations; Make it easy to revert.
    * Hard dependencies on core services: Cache data from core services; Provide hardened APIs; Run fire drills.
    * Increased latency and resource exhaustion: Controlled Delay (based on the anti-bufferbloat CoDel algorithm, which is really cool); Adaptive LIFO (last-in, first-out) for queue busting; Concurrency Control (essentially a form of circuit breaker).
    * Tools that Help Diagnose Failures: High-Density Dashboards with Cubism (horizon charts); What just changed?
    * Learning from Failure: the DERP (!) methodology.

    (tags: ben-maurer facebook reliability algorithms codel circuit-breakers derp failure ops cubism horizon-charts charts dependencies soa microservices uptime deployment configuration change-management)
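
The Controlled Delay and Adaptive LIFO techniques listed above can be sketched roughly as follows. This is a minimal Python sketch and not Facebook's code: the class name, the explicit `now` argument, and the 100 ms / 5 ms values are assumptions chosen for illustration. The idea is that if the queue has not been empty within the last interval, a standing queue has formed, so new requests get a much shorter timeout and the newest request is served first.

```python
from collections import deque


class ControlledDelayQueue:
    """Request queue with a CoDel-style timeout and adaptive LIFO (illustrative only)."""

    def __init__(self, interval=0.100, short_timeout=0.005):
        self.interval = interval            # normal timeout, and the "recently empty" window
        self.short_timeout = short_timeout  # timeout applied while a standing queue exists
        self.items = deque()                # (request, deadline) pairs, oldest on the left
        self.last_empty_time = 0.0

    def _overloaded(self, now):
        # The queue has not drained within the last interval.
        return self.last_empty_time < now - self.interval

    def enqueue(self, request, now):
        if not self.items:
            self.last_empty_time = now
        timeout = self.short_timeout if self._overloaded(now) else self.interval
        self.items.append((request, now + timeout))

    def dequeue(self, now):
        """Return the next live request, or None if every queued request has timed out."""
        while self.items:
            # Under overload serve the newest request first, otherwise the oldest.
            if self._overloaded(now):
                request, deadline = self.items.pop()
            else:
                request, deadline = self.items.popleft()
            if not self.items:
                self.last_empty_time = now
            if deadline >= now:
                return request
        return None
```

Under normal load this behaves like a plain FIFO queue with a generous timeout. Once a backlog persists, old requests that the client has probably given up on are dropped cheaply instead of being processed.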

Links for 2015-11-01

Links for 2015-10-30

Links for 2015-10-29

  • Google tears Symantec a new one on its CA failure

    Symantec are getting a crash course in how to conduct an incident post-mortem to boot:

    More immediately, we are requesting of Symantec that they further update their public incident report with: A post-mortem analysis that details why they did not detect the additional certificates that we found. Details of each of the failures to uphold the relevant Baseline Requirements and EV Guidelines and what they believe the individual root cause was for each failure. We are also requesting that Symantec provide us with a detailed set of steps they will take to correct and prevent each of the identified failures, as well as a timeline for when they expect to complete such work. Symantec may consider this latter information to be confidential and so we are not requesting that this be made public.

    (tags: google symantec ev ssl certificates ca security postmortems ops)

  • Google is Maven Central’s New Best Friend

    Google is now mirroring Maven Central.

    (tags: google maven maven-central jars hosting java packages build)

  • Apache Kafka, Purgatory, and Hierarchical Timing Wheels

    In the new design, we use Hierarchical Timing Wheels for the timeout timer and DelayQueue of timer buckets to advance the clock on demand. Completed requests are removed from the timer queue immediately with O(1) cost. The buckets remain in the delay queue, however, the number of buckets is bounded. And, in a healthy system, most of the requests are satisfied before timeout, and many of the buckets become empty before pulled out of the delay queue. Thus, the timer should rarely have the buckets of the lower interval. The advantage of this design is that the number of requests in the timer queue is the number of pending requests exactly at any time. This allows us to estimate the number of requests need to be purged. We can avoid unnecessary purge operation of the watcher lists. As the result we achieve a higher scalability in terms of request rate with much better CPU usage.

    (tags: algorithms timers kafka scheduling timing-wheels delayqueue queueing)
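
A simplified hierarchical timing wheel, to make the quoted design a little more concrete. This is a sketch under stated assumptions and not Kafka's implementation: the class names are made up, times are plain integers, the clock is advanced tick by tick instead of via a DelayQueue of buckets, and cancellation of completed requests is left out. Entries too far in the future overflow into a coarser wheel, and drop back into finer wheels as their bucket comes due.

```python
class TimingWheel:
    """One level of the hierarchy; `tick` is the time span covered by one bucket."""

    def __init__(self, tick, size, now):
        self.tick = tick
        self.size = size
        self.interval = tick * size
        self.now = now - now % tick        # clock rounded down to this level's tick
        self.buckets = [[] for _ in range(size)]
        self.overflow = None               # coarser wheel, created on demand

    def add(self, task, expiry):
        """Store the entry; return False if it is already due."""
        if expiry < self.now + self.tick:
            return False
        if expiry < self.now + self.interval:
            self.buckets[(expiry // self.tick) % self.size].append((task, expiry))
            return True
        if self.overflow is None:
            self.overflow = TimingWheel(self.interval, self.size, self.now)
        return self.overflow.add(task, expiry)

    def advance(self, now):
        """Move the clock to `now` and return the entries in buckets that came due."""
        due = []
        if now >= self.now + self.tick:
            self.now = now - now % self.tick
            if self.overflow is not None:
                due.extend(self.overflow.advance(now))
            index = (self.now // self.tick) % self.size
            due.extend(self.buckets[index])
            self.buckets[index] = []
        return due


class Timer:
    def __init__(self, tick=1, size=8, start=0):
        self.tick = tick
        self.now = start
        self.wheel = TimingWheel(tick, size, start)

    def schedule(self, task, delay):
        """Return False if the delay is shorter than one tick (caller should run it now)."""
        return self.wheel.add(task, self.now + delay)

    def advance_to(self, now):
        """Step the clock one tick at a time and return the tasks that fired."""
        fired = []
        while self.now + self.tick <= now:
            self.now += self.tick
            for task, expiry in self.wheel.advance(self.now):
                # Entries from coarse wheels are re-inserted into finer ones;
                # entries that are due cannot be re-inserted, so they fire.
                if not self.wheel.add(task, expiry):
                    fired.append(task)
        return fired
```

With the default `size=8`, a delay of 3 ticks lands in the finest wheel, 20 ticks in the second level, and 70 ticks in the third. Insertion is O(1) at every level.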

Links for 2015-10-28

Links for 2015-10-27

Links for 2015-10-23

Links for 2015-10-22

Links for 2015-10-21

  • How a criminal ring defeated the secure chip-and-PIN credit cards | Ars Technica

    Ingenious —

    The stolen cards were still considered evidence, so the researchers couldn’t do a full tear-down or run any tests that would alter the data on the card, so they used X-ray scans to look at where the chip cards had been tampered with. They also analyzed the way the chips distributed electricity when in use and used read-only programs to see what information the cards sent to a Point of Sale (POS) terminal. According to the paper, the fraudsters were able to perform a man-in-the-middle attack by programming a second hobbyist chip called a FUN card to accept any PIN entry, and soldering that chip onto the card’s original chip. This increased the thickness of the chip from 0.4mm to 0.7mm, “making insertion into a PoS somewhat uneasy but perfectly feasible,” the researchers write. [….] The researchers explain that a typical EMV transaction involves three steps: card authentication, cardholder verification, and then transaction authorization. During a transaction using one of the altered cards, the original chip was allowed to respond with the card authentication as normal. Then, during card holder authentication, the POS system would ask for a user’s PIN, the thief would respond with any PIN, and the FUN card would step in and send the POS the code indicating that it was ok to proceed with the transaction because the PIN checked out. During the final transaction authentication phase, the FUN card would relay the transaction data between the POS and the original chip, sending the issuing bank an authorization request cryptogram which the card issuer uses to tell the POS system whether to accept the transaction or not.

    (tags: security chip-and-pin hacking pos emv transactions credit-cards debit-cards hardware chips pin fun-cards smartcards)

  • How-to: Index Scanned PDFs at Scale Using Fewer Than 50 Lines of Code

    using Spark, Tesseract, HBase, Solr and Leptonica. Actually pretty feasible

    (tags: spark tesseract hbase solr leptonica pdfs scanning cloudera hadoop architecture)

  • Existential Consistency: Measuring and Understanding Consistency at Facebook

    The metric is termed φ(P)-consistency, and is actually very simple. A read for the same data is sent to all replicas in P, and φ(P)-consistency is defined as the frequency with which that read returns the same result from all replicas. φ(G)-consistency applies this metric globally, and φ(R)-consistency applies it within a region (cluster). Facebook have been tracking this metric in production since 2012.

    (tags: facebook eventual-consistency consistency metrics papers cap distributed-computing)
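
As a small illustration of the metric described above: given a batch of sampled reads, each recorded as the list of values returned by every replica in the set, the metric is just the fraction of samples where all replicas agree. The function name and the input shape are assumptions for this sketch, not anything taken from the paper.

```python
def phi_consistency(samples):
    """Fraction of sampled reads for which every replica returned the same value.

    `samples` is an iterable of lists; each inner list holds the results that
    the replicas in the set returned for one read of the same data.
    """
    samples = list(samples)
    if not samples:
        raise ValueError("need at least one sampled read")
    agreeing = sum(1 for results in samples if len(set(results)) == 1)
    return agreeing / len(samples)


reads = [
    ["v1", "v1", "v1"],
    ["v1", "v2", "v1"],  # one replica is lagging behind
    ["v3", "v3", "v3"],
    ["v4", "v4", "v4"],
]
print(phi_consistency(reads))
```

Passing the replicas of a single region gives the regional variant, and passing replicas from every region gives the global one.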

  • Holistic Configuration Management at Facebook

    How FB push config changes from Git (where they are code reviewed, version controlled, and history tracked with strong auth) to Zeus (their ZooKeeper fork) and from there to live production servers.

    (tags: facebook configuration zookeeper git ops architecture)

  • Hyperscan

    a high-performance multiple regex matching library. Hyperscan uses hybrid automata techniques to allow simultaneous matching of large numbers (up to tens of thousands) of regular expressions and for the matching of regular expressions across streams of data.
    Via Tony Finch

    (tags: via:fanf regexps regex dpi hyperscan dfa nfa hybrid-automata text-matching matching text strings streams)

Links for 2015-10-20

  • Hologram

    Hologram exposes an imitation of the EC2 instance metadata service on developer workstations that supports the [IAM Roles] temporary credentials workflow. It is accessible via the same HTTP endpoint to calling SDKs, so your code can use the same process in both development and production. The keys that Hologram provisions are temporary, so EC2 access can be centrally controlled without direct administrative access to developer workstations.

    (tags: iam roles ec2 authorization aws adroll open-source cli osx coding dev)

Links for 2015-10-18

Links for 2015-10-16

  • Your Relative’s DNA Could Turn You Into A Suspect

    Familial DNA searching has massive false positives, but is being used to tag suspects:

    The bewildered Usry soon learned that he was a suspect in the 1996 murder of an Idaho Falls teenager named Angie Dodge. Though a man had been convicted of that crime after giving an iffy confession, his DNA didn’t match what was found at the crime scene. Detectives had focused on Usry after running a familial DNA search, a technique that allows investigators to identify suspects who don’t have DNA in a law enforcement database but whose close relatives have had their genetic profiles cataloged. In Usry’s case the crime scene DNA bore numerous similarities to that of Usry’s father, who years earlier had donated a DNA sample to a genealogy project through his Mormon church in Mississippi. That project’s database was later purchased by Ancestry, which made it publicly searchable—a decision that didn’t take into account the possibility that cops might someday use it to hunt for genetic leads. Usry, whose story was first reported in The New Orleans Advocate, was finally cleared after a nerve-racking 33-day wait — the DNA extracted from his cheek cells didn’t match that of Dodge’s killer, whom detectives still seek. But the fact that he fell under suspicion in the first place is the latest sign that it’s time to set ground rules for familial DNA searching, before misuse of the imperfect technology starts ruining lives.

    (tags: dna familial-dna false-positives law crime idaho murder mormon genealogy ancestry.com databases biometrics privacy genes)

Links for 2015-10-15

  • Cluster benchmark: Scylla vs Cassandra

    ScyllaDB (the C* clone in C++) is now actually looking promising, though I still need more reassurance about its consistency/reliability side

    (tags: scylla databases storage cassandra nosql)

  • _What We Know About Spreadsheet Errors_ [paper]

    As we will see below, there has long been ample evidence that errors in spreadsheets are pandemic. Spreadsheets, even after careful development, contain errors in one percent or more of all formula cells. In large spreadsheets with thousands of formulas, there will be dozens of undetected errors. Even significant errors may go undetected because formal testing in spreadsheet development is rare and because even serious errors may not be apparent.

    (tags: business coding maths excel spreadsheets errors formulas error-rate)

  • Defending Your Time

    great post from Ross Duggan on avoiding developer burnout

    (tags: coding burnout productivity work)

  • How is NSA breaking so much crypto?

    If a client and server are speaking Diffie-Hellman, they first need to agree on a large prime number with a particular form. There seemed to be no reason why everyone couldn’t just use the same prime, and, in fact, many applications tend to use standardized or hard-coded primes. But there was a very important detail that got lost in translation between the mathematicians and the practitioners: an adversary can perform a single enormous computation to “crack” a particular prime, then easily break any individual connection that uses that prime. How enormous a computation, you ask? Possibly a technical feat on a scale (relative to the state of computing at the time) not seen since the Enigma cryptanalysis during World War II. Even estimating the difficulty is tricky, due to the complexity of the algorithm involved, but our paper gives some conservative estimates. For the most common strength of Diffie-Hellman (1024 bits), it would cost a few hundred million dollars to build a machine, based on special purpose hardware, that would be able to crack one Diffie-Hellman prime every year. Would this be worth it for an intelligence agency? Since a handful of primes are so widely reused, the payoff, in terms of connections they could decrypt, would be enormous. Breaking a single, common 1024-bit prime would allow NSA to passively decrypt connections to two-thirds of VPNs and a quarter of all SSH servers globally. Breaking a second 1024-bit prime would allow passive eavesdropping on connections to nearly 20% of the top million HTTPS websites. In other words, a one-time investment in massive computation would make it possible to eavesdrop on trillions of encrypted connections.
    (via Eric)

    (tags: via:eric encryption privacy security nsa crypto)
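
A toy illustration of the key exchange discussed in the quote above. Everything here is an assumption for the sake of the example: the prime 23 and generator 5 are textbook toy values, and the brute-force search stands in for the far more sophisticated precomputation the paper describes. The point is only that the prime is shared by everyone, so work done against it applies to every connection that uses it.

```python
import secrets

# Toy public parameters shared by every client and server (never use sizes like this).
P = 23  # the shared prime
G = 5   # a generator modulo P


def keypair():
    """Pick a random private exponent and derive the matching public value."""
    private = secrets.randbelow(P - 2) + 1  # in the range 1..P-2
    return private, pow(G, private, P)


def shared_secret(their_public, my_private):
    return pow(their_public, my_private, P)


def brute_force_private(public):
    """Recover a private exponent by exhaustive search; only feasible because P is tiny."""
    for exponent in range(1, P):
        if pow(G, exponent, P) == public:
            return exponent
    raise ValueError("no exponent found")


client_private, client_public = keypair()
server_private, server_public = keypair()

# Both sides arrive at the same secret without ever sending it over the wire.
assert shared_secret(server_public, client_private) == shared_secret(client_public, server_private)

# An eavesdropper who can solve discrete logs modulo P recovers the secret
# from the public values alone, for any connection that uses this prime.
recovered = brute_force_private(client_public)
assert shared_secret(server_public, recovered) == shared_secret(server_public, client_private)
```

With a 1024-bit prime the exhaustive search is hopeless, which is why the attack described above relies on a one-off computation per prime that is then reused against individual connections.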

Links for 2015-10-14

Links for 2015-10-13

  • Chromecast Speakers

    Supports Spotify — totally getting one of these

    (tags: spotify speakers music home google gadgets toget)

  • Where do ‘mama’/’papa’ words come from?

    The sounds came first — as experiments in vocalization — and parents adopted them as pet names for themselves. If you open your mouth and make a sound, it will probably be an open vowel like /a/ unless you move your tongue or lips. The easiest consonants are perhaps the bilabials /m/, /p/, and /b/, requiring no movement of the tongue, followed by consonants made by raising the front of the tongue: /d/, /t/, and /n/. Add a dash of reduplication, and you get mama, papa, baba, dada, tata, nana. That such words refer to people (typically parents or other guardians) is something we have imposed on the sounds and incorporated into our languages and cultures; the meanings don’t inhere in the sounds as uttered by babies, which are more likely calls for food or attention.

    (tags: sounds voice speech babies kids phonetics linguist language)

  • remind101/conveyor

    ‘A fast build system for Docker images’, open source, in Go, hooks into GitHub

    (tags: build ci docker github go)

  • England opens up 11TB of LiDAR data covering the entire country as open data

    All 11 terabytes of our LIDAR data (that’s roughly equivalent to 2,750,000 MP3 songs) will eventually be available through our new Open LIDAR portal under an Open Government Licence, allowing it to be used for any purpose. We hope that by giving free access to our data businesses and local communities will develop innovative solutions to benefit the environment, grow our thriving rural economy, and boost our world-leading food and farming industry. The possibilities are endless and we hope that making LIDAR data open will be a catalyst for new ideas and innovation.
    Are you reading, Ordnance Survey Ireland?

    (tags: data maps uk lidar mapping geodata open-data ogl)

Links for 2015-10-12

  • SuperChief: From Apache Storm to In-House Distributed Stream Processing

    Another sorry tale of Storm issues:

    Storm has been successful at Librato, but we experienced many of the limitations cited in the Twitter Heron: Stream Processing at Scale paper and outlined here by Adrian Colyer, including: Inability to isolate, reason about, or debug performance issues due to the worker/executor/task paradigm. This led to building and configuring clusters specifically designed to attempt to mitigate these problems (i.e., separate clusters per topology, only running a worker per server.), which added additional complexity to development and operations and also led to over-provisioning. Ability of tasks to move around led to difficult to trace performance problems. Storm’s work provisioning logic led to some tasks serving more Kafka partitions than others. This in turn created latency and performance issues that were difficult to reason about. The initial solution was to over-provision in an attempt to get a better hashing/balancing of work, but eventually we just replaced the work allocation logic. Due to Storm’s architecture, it was very difficult to get a stack trace or heap dump because the processes that managed workers (Storm supervisor) would often forcefully kill a Java process while it was being investigated in this way. The propensity for unexpected and subsequently unhandled exceptions to take down an entire worker led to additional defensive verbose error handling everywhere. This nasty bug STORM-404 coupled with the aforementioned fact that a single exception can take down a worker led to several cascading failures in production, taking down entire topologies until we upgraded to 0.9.4. Additionally, we found the performance we were getting from Storm for the amount of money we were spending on infrastructure was not in line with our expectations. Much of this is due to the fact that, depending upon how your topology is designed, a single tuple may make multiple hops across JVMs, and this is very expensive. 
    For example, in our time series aggregation topologies a single tuple may be serialized/deserialized and shipped across the wire 3-4 times as it progresses through the processing pipeline.

    (tags: scalability storm kafka librato architecture heron ops)

  • librato/disco-java

    Librato’s service discovery library using ZooKeeper (so strongly consistent, but with the ZK downside that an AZ outage can stall service discovery updates region-wide)

    (tags: zookeeper service-discovery librato java open-source load-balancing)

  • Tech companies like Facebook not above the law, says Max Schrems

    “Big companies didn’t only rely on safe harbour: they also rely on binding corporate rules and standard contractual clauses. But it’s interesting that the court decided the case on fundamental rights grounds: so it doesn’t matter remotely what ground you transfer on, if that process is still illegal under 7 and 8 of charter, it can’t be done.”
    Also:
    “Ireland has no interest in doing its job, and will continue not to, forever. Clearly it’s an investment issue – but overall the policy is: we don’t regulate companies here. The cost of challenging any of this in the courts is prohibitive. And the people don’t seem to care.”
    :(

    (tags: ireland guardian max-schrems privacy surveillance safe-harbor eu us nsa dpc data-protection)

  • After Bara: All your (Data)base are belong to us

    Sounds like the CJEU’s Bara decision may cause problems for the Irish government’s wilful data-sharing:

    Articles 10, 11 and 13 of Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995, on the protection of individuals with regard to the processing of personal data and on the free movement of such data, must be interpreted as precluding national measures, such as those at issue in the main proceedings, which allow a public administrative body of a Member State to transfer personal data to another public administrative body and their subsequent processing, without the data subjects having been informed of that transfer or processing.

    (tags: data databases bara cjeu eu law privacy data-protection)

Links for 2015-10-10

  • Outage postmortem (2015-10-08 UTC) : Stripe: Help & Support

    There was a breakdown in communication between the developer who requested the index migration and the database operator who deleted the old index. Instead of working on the migration together, they communicated in an implicit way through flawed tooling. The dashboard that surfaced the migration request was missing important context: the reason for the requested deletion, the dependency on another index’s creation, and the criticality of the index for API traffic. Indeed, the database operator didn’t have a way to check whether the index had recently been used for a query.
    Good demo of how the Etsy-style chatops deployment approach would have helped avoid this risk.

    (tags: stripe postmortem outages databases indexes deployment chatops deploy ops)
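
    The missing context described in the postmortem (recent use of the index, dependency on a replacement index) is the kind of thing a pre-drop check could surface. A minimal sketch, assuming a hypothetical metadata dict per index; the field names `last_used_at` and `replacement_ready` are invented here and are not Stripe's tooling:

```python
from datetime import datetime, timedelta

def blockers_for_index_drop(index, now, quiet_period=timedelta(days=7)):
    """Return reasons not to drop the index; an empty list means no known blocker.

    `index` is a hypothetical metadata dict, not any real database's API.
    """
    reasons = []
    last_used = index.get("last_used_at")
    if last_used is None:
        reasons.append("no usage data recorded")
    elif now - last_used < quiet_period:
        reasons.append("index served a query recently")
    if not index.get("replacement_ready", False):
        reasons.append("replacement index not yet built")
    return reasons

if __name__ == "__main__":
    now = datetime(2015, 10, 8, 12, 0)
    idx = {"last_used_at": now - timedelta(hours=1), "replacement_ready": False}
    for reason in blockers_for_index_drop(idx, now):
        print("blocked:", reason)
```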

  • net.wars: Unsafe harbor

    Wendy Grossman on where the Safe Harbor decision is leading.

    One clause would require European companies to tell their relevant data protection authorities if they are being compelled to turn over data – even if they have been forbidden to disclose this under US law. Sounds nice, but doesn’t mobilize the rock or soften the hard place, since companies will still have to pick a law to violate. I imagine the internal discussions there revolving around two questions: which violation is less likely to land the CEO in jail and which set of fines can we afford?
    (via Simon McGarr)

    (tags: safe-harbor privacy law us eu surveillance wendy-grossman via:tupp_ed)

  • CHICKEN COOP & RUN

    bookmarking as a potential future addition to the back garden

    (tags: chickens pets food garden ebay)

Links for 2015-10-09

Links for 2015-10-08

  • Fuzzing Raft for Fun and Publication

    Good intro to fuzz-testing a distributed system; I’ve had great results using similar approaches in unit tests

    (tags: fuzzing fuzz-testing testing raft akka tests)
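
    For the unit-test flavour of this, a rough sketch: drive the code under test and a trivially-correct model with the same seeded random operation sequence, and compare results. The `RingBuffer` below is a toy stand-in invented for illustration; it is a single-process data structure, not a distributed system like Raft.

```python
import random
from collections import deque

class RingBuffer:
    """Toy system under test: fixed-capacity FIFO that drops the oldest item when full."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = [None] * capacity
        self.start = 0
        self.size = 0

    def push(self, value):
        end = (self.start + self.size) % self.capacity
        self.items[end] = value
        if self.size == self.capacity:
            # Buffer was full: we just overwrote the oldest item.
            self.start = (self.start + 1) % self.capacity
        else:
            self.size += 1

    def pop(self):
        if self.size == 0:
            return None
        value = self.items[self.start]
        self.start = (self.start + 1) % self.capacity
        self.size -= 1
        return value

def fuzz_once(seed, steps=200):
    """Run one seeded random schedule; return True if the buffer matched the model."""
    rng = random.Random(seed)
    capacity = rng.randint(1, 8)
    buf, model = RingBuffer(capacity), deque(maxlen=capacity)
    for step in range(steps):
        if rng.random() < 0.6:
            buf.push(step)
            model.append(step)  # deque(maxlen=...) also drops the oldest item
        else:
            expected = model.popleft() if model else None
            if buf.pop() != expected:
                return False
    return True

if __name__ == "__main__":
    failing = [seed for seed in range(100) if not fuzz_once(seed)]
    print("failing seeds:", failing)
```

    Because every run is keyed by its seed, a failing seed is a reproducible test case.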

  • EC2 Spot Blocks for Defined-Duration Workloads

    you can now launch Spot instances that will run continuously for a finite duration (1 to 6 hours). Pricing is based on the requested duration and the available capacity, and is typically 30% to 45% less than On-Demand.

    (tags: ec2 aws spot-instances spot pricing time)
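
    The quoted discount range as back-of-envelope arithmetic. A sketch only: the on-demand hourly price passed in is whatever you supply, and the 30% to 45% figures are the "typical" range from the quote, not a guaranteed price.

```python
def spot_block_cost_range(on_demand_hourly, hours, low_discount=0.30, high_discount=0.45):
    """Return (cheapest, dearest) estimated cost for a Spot block.

    Uses the 'typically 30% to 45% less than On-Demand' range from the quote;
    the actual price depends on requested duration and available capacity.
    """
    if not 1 <= hours <= 6:
        raise ValueError("Spot blocks run for 1 to 6 hours")
    full_price = on_demand_hourly * hours
    return full_price * (1 - high_discount), full_price * (1 - low_discount)

if __name__ == "__main__":
    # Hypothetical on-demand price of $1.00/hour, 4-hour block.
    cheapest, dearest = spot_block_cost_range(1.00, 4)
    print(f"estimated cost: ${cheapest:.2f} to ${dearest:.2f} (on-demand: $4.00)")
```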

  • The Surveillance Elephant in the Room…

    Very perceptive post on the next steps for safe harbor, post-Schrems.

    And behind that elephant there are other elephants: if US surveillance and surveillance law is a problem, then what about UK surveillance? Is GCHQ any less intrusive than the NSA? It does not seem so – and this puts even more pressure on the current reviews of UK surveillance law taking place. If, as many predict, the forthcoming Investigatory Powers Bill will be even more intrusive and extensive than current UK surveillance laws this will put the UK in a position that could rapidly become untenable. If the UK decides to leave the EU, will that mean that the UK is not considered a safe place for European data? Right now that seems the only logical conclusion – but the ramifications for UK businesses could be huge. [….] What happens next, therefore, is hard to foresee. What cannot be done, however, is to ignore the elephant in the room. The issue of surveillance has to be taken on. The conflict between that surveillance and fundamental human rights is not a merely semantic one, or one for lawyers and academics, it’s a real one. In the words of historian and philosopher Quentin Skinner “the current situation seems to me untenable in a democratic society.” The conflict over Safe Harbor is in many ways just a symptom of that far bigger problem. The biggest elephant of all.

    (tags: ec cjeu surveillance safe-harbor schrems privacy europe us uk gchq nsa)

  • ECJ ruling on Irish privacy case has huge significance

    The only current way to comply with EU law, the judgment indicates, is to keep EU data within the EU. Whether those data can be safely managed within facilities run by US companies will not be determined until the US rules on an ongoing Microsoft case. Microsoft stands in contempt of court right now for refusing to hand over to US authorities, emails held in its Irish data centre. This case will surely go to the Supreme Court and will be an extremely important determination for the cloud business, and any company or individual using data centre storage. If Microsoft loses, US multinationals will be left scrambling to somehow, legally firewall off their EU-based data centres from US government reach.
    (cough, Amazon)

    (tags: aws hosting eu privacy surveillance gchq nsa microsoft ireland)

Links for 2015-10-07

Links for 2015-10-06

  • Marvin.ie: Order Takeaway Food Online

    new Dublin delivery service takes Bitcoin?!

    (tags: bitcoin food delivery takeaway payment ireland dublin wtf)

  • qp tries: smaller and faster than crit-bit tries

    interesting new data structure from Tony Finch. “Some simple benchmarks say qp tries have about 1/3 less memory overhead and are about 10% faster than crit-bit tries.”

    (tags: crit-bit popcount bits bitmaps tries data-structures via:fanf qp-tries crit-bit-tries hacks memory)
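
    The popcount trick behind this kind of structure, sketched in Python: a branch keeps a 16-bit bitmap of which 4-bit nibble values are present, plus a packed list holding only the children that exist, and a child's position is the number of set bits below its own bit. This is an illustration of the idea, not Tony Finch's C implementation; the class and method names are made up.

```python
def popcount(x):
    """Number of set bits in a non-negative integer."""
    return bin(x).count("1")

class SparseBranch:
    """Up to 16 children, indexed by a 4-bit nibble, stored without empty slots."""
    def __init__(self):
        self.bitmap = 0      # bit n set <=> a child exists for nibble n
        self.children = []   # packed, ordered by nibble value

    def _slot(self, nibble):
        bit = 1 << nibble
        # Children for smaller nibbles sit before this one in the packed list.
        return bit, popcount(self.bitmap & (bit - 1))

    def get(self, nibble):
        bit, slot = self._slot(nibble)
        return self.children[slot] if self.bitmap & bit else None

    def put(self, nibble, child):
        bit, slot = self._slot(nibble)
        if self.bitmap & bit:
            self.children[slot] = child
        else:
            self.children.insert(slot, child)
            self.bitmap |= bit

if __name__ == "__main__":
    branch = SparseBranch()
    for nibble, child in [(9, "x"), (2, "y"), (15, "z")]:
        branch.put(nibble, child)
    print(branch.children, branch.get(9), branch.get(3))
```

    Three children take three slots rather than sixteen, which is where the memory saving comes from.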

  • Schneier on Automatic Face Recognition and Surveillance

    When we talk about surveillance, we tend to concentrate on the problems of data collection: CCTV cameras, tagged photos, purchasing habits, our writings on sites like Facebook and Twitter. We think much less about data analysis. But effective and pervasive surveillance is just as much about analysis. It’s sustained by a combination of cheap and ubiquitous cameras, tagged photo databases, commercial databases of our actions that reveal our habits and personalities, and – most of all – fast and accurate face recognition software. Don’t expect to have access to this technology for yourself anytime soon. This is not facial recognition for all. It’s just for those who can either demand or pay for access to the required technologies – most importantly, the tagged photo databases. And while we can easily imagine how this might be misused in a totalitarian country, there are dangers in free societies as well. Without meaningful regulation, we’re moving into a world where governments and corporations will be able to identify people both in real time and backwards in time, remotely and in secret, without consent or recourse. Despite protests from industry, we need to regulate this budding industry. We need limitations on how our images can be collected without our knowledge or consent, and on how they can be used. The technologies aren’t going away, and we can’t uninvent these capabilities. But we can ensure that they’re used ethically and responsibly, and not just as a mechanism to increase police and corporate power over us.

    (tags: privacy regulation surveillance bruce-schneier faces face-recognition machine-learning ai cctv photos)

Links for 2015-10-05

Links for 2015-10-02

Links for 2015-10-01

Links for 2015-09-30

Links for 2015-09-29

Links for 2015-09-28

Links for 2015-09-24

  • Byteman

    a tool which simplifies tracing and testing of Java programs. Byteman allows you to insert extra Java code into your application, either as it is loaded during JVM startup or even after it has already started running. The injected code is allowed to access any of your data and call any application methods, including where they are private. You can inject code almost anywhere you want and there is no need to prepare the original source code in advance nor do you have to recompile, repackage or redeploy your application. In fact you can remove injected code and reinstall different code while the application continues to execute. The simplest use of Byteman is to install code which traces what your application is doing. This can be used for monitoring or debugging live deployments as well as for instrumenting code under test so that you can be sure it has operated correctly. By injecting code at very specific locations you can avoid the overheads which often arise when you switch on debug or product trace. Also, you decide what to trace when you run your application rather than when you write it so you don’t need 100% hindsight to be able to obtain the information you need.

    (tags: tracing java byteman injection jvm ops debugging testing)
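
    To give a feel for the inject-then-remove workflow described above, here is a rough Python analogue using monkey-patching. It is not Byteman and does not use its rule language; `inject_trace` is a name invented for this sketch.

```python
import functools

def inject_trace(cls, method_name, log):
    """Wrap cls.method_name at runtime so calls and returns are appended to `log`.

    Returns a function that removes the injected code again.
    """
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        log.append(("call", method_name, args))
        result = original(self, *args, **kwargs)
        log.append(("return", method_name, result))
        return result

    setattr(cls, method_name, traced)

    def remove():
        setattr(cls, method_name, original)

    return remove

if __name__ == "__main__":
    class Account:
        def __init__(self):
            self.balance = 0

        def deposit(self, amount):
            self.balance += amount
            return self.balance

    log = []
    remove = inject_trace(Account, "deposit", log)
    acct = Account()
    acct.deposit(5)   # traced
    remove()
    acct.deposit(1)   # no longer traced
    print(log)
```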

  • Henry Robinson on testing and fault discovery in distributed systems

    ‘Let’s talk about finding bugs in distributed systems for a bit. These chaos monkey-style fault testing systems are all well and good, but by being application independent they’re a very blunt instrument. Particularly they make it hard to search the fault space for bugs in a directed manner, because they don’t ‘know’ what the system is doing. Application-aware scripting of faults in a dist. systems seems to be rarely used, but allows you to directly stress problem areas. For example, if a bug manifests itself only when one RPC returns after some timeout, hard to narrow that down with iptables manipulation. But allow a script to hook into RPC invocations (and other trace points, like DTrace’s probes), and you can script very specific faults. That way you can simulate cross-system integration failures, *and* write reproducible tests for the bugs they expose! Anyhow, I’ve been doing this in Impala, and it’s been very helpful. Haven’t seen much evidence elsewhere.’

    (tags: henry-robinson testing fault-discovery rpc dtrace tracing distributed-systems timeouts chaos-monkey impala)
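
    A minimal sketch of the application-aware fault scripting idea: give the RPC layer a hook point, and let a test script raise a timeout on one specific invocation. Everything here (`RpcClient`, `timeout_on_nth_call`) is a toy invented for illustration, not Impala's mechanism.

```python
class RpcTimeout(Exception):
    """Raised by a fault hook to simulate an RPC that timed out."""

class RpcClient:
    """Toy RPC layer with a hook point that tests can script."""
    def __init__(self, handlers):
        self.handlers = handlers   # method name -> callable
        self.fault_hooks = []

    def call(self, method, *args):
        for hook in self.fault_hooks:
            hook(method, args)     # a hook may raise to simulate a fault
        return self.handlers[method](*args)

def timeout_on_nth_call(target_method, n):
    """Build a hook that times out only the n-th call to target_method."""
    seen = {"count": 0}

    def hook(method, args):
        if method == target_method:
            seen["count"] += 1
            if seen["count"] == n:
                raise RpcTimeout(f"injected timeout on call {n} of {target_method}")

    return hook

if __name__ == "__main__":
    client = RpcClient({"get": lambda key: key.upper(), "ping": lambda: "pong"})
    client.fault_hooks.append(timeout_on_nth_call("get", 2))
    print(client.call("get", "a"))
    try:
        client.call("get", "b")
    except RpcTimeout as exc:
        print("caught:", exc)
```

    Unlike a network-level fault, the script picks exactly which call fails, so the resulting test is reproducible.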

  • The Best Bourbon Cocktail You’ve Never Heard Of

    The “Paper Plane”, by Sam Ross of Chicago’s “Violet Hour”:
    .75 oz Bourbon
    .75 oz Aperol
    .75 oz Amaro Nonino
    .75 oz Fresh lemon juice
    ice-filled shaker, shake, strain.

    (tags: bourbon drinks cocktails recipes aperol amaro-nonino lemon)

  • Seastar

    C++ high-performance app framework; ‘currently focused on high-throughput, low-latency I/O intensive applications.’ Scylla (Cassandra-compatible NoSQL store) is written in this.

    (tags: c++ opensource performance framework scylla seastar latency linux shared-nothing multicore)