Justin's Linklog Posts

Links for 2015-01-25

Links for 2015-01-24

  • How to Catch a Terrorist – The New Yorker

    This is spot on —

    By flooding the system with false positives, big-data approaches to counterterrorism might actually make it harder to identify real terrorists before they act. Two years before the Boston Marathon bombing, Tamerlan Tsarnaev, the older of the two brothers alleged to have committed the attack, was assessed by the city’s Joint Terrorism Task Force. They determined that he was not a threat. This was one of about a thousand assessments that the Boston J.T.T.F. conducted that year, a number that had nearly doubled in the previous two years, according to the Boston F.B.I. As of 2013, the Justice Department has trained nearly three hundred thousand law-enforcement officers in how to file “suspicious-activity reports.” In 2010, a central database held about three thousand of these reports; by 2012 it had grown to almost twenty-eight thousand. “The bigger haystack makes it harder to find the needle,” Sensenbrenner told me. Thomas Drake, a former N.S.A. executive and whistle-blower who has become one of the agency’s most vocal critics, told me, “If you target everything, there’s no target.”

    (tags: terrorism false-positives filtering detection jttf nsa fbi surveillance gchq)

  • Politwoops

    ‘All deleted tweets from politicians’. Great idea

    (tags: delete twitter politics politicians ireland social-media news)

  • Zoë Keating on getting a shitty deal from Google’s new Music Key licensing

    The Youtube music service was introduced to me as a win win and they don’t understand why I don’t see it that way. “We are trying to create a new revenue stream on top of the platform that exists today.” A lot of people in the music industry talk about Google as evil. I don’t think they are evil. I think they, like other tech companies, are just idealistic in a way that works best for them. I think this because I used to be one of them. The people who work at Google, Facebook, etc can’t imagine how everything they make is not, like, totally awesome. If it’s not awesome for you it’s because you just don’t understand it yet and you’ll come around. They can’t imagine scenarios outside their reality and that is how they inadvertently unleash things like the algorithmic cruelty of Facebook’s yearly review (which showed me a picture I had posted after a doctor told me my husband had 6-8 weeks to live).

    (tags: google business music youtube zoe-keating music-key licensing tech)

  • Smash the Engine

    Jacobin Magazine on the revolutionary political allegory in “Snowpiercer”: ‘If Snowpiercer had merely told the tale of an oppressed working class rising up to seize power from an evil overlord, it would already have been an improvement over most of the political messages in mainstream cinema. There are all sorts of nice touches in its portrayal of a declining capitalism that can maintain its ideological legitimacy even when it literally has no more bullets in its guns. But the story Bong tells goes beyond that. It’s about the limitations of a revolution which merely takes over the existing social machinery rather than attempting to transcend it. ‘

    (tags: dystopia revolution snowpiercer movies marxism sf politics)

  • Debunking The Dangerous “If You Have Nothing To Hide, You Have Nothing To Fear”

    A great resource bookmark from Falkvinge.

    There are at least four good reasons to reject this argument solidly and uncompromisingly: The rules may change, it’s not you who determine if you’re guilty, laws must be broken for society to progress, and privacy is a basic human need.

    (tags: nsa politics privacy security surveillance gchq rick-falkvinge society)

Links for 2015-01-20

Links for 2015-01-19

  • carbon-c-relay

    A much better carbon-relay, written in C rather than Python. Linking as we’ve been using it in production for quite a while with no problems.

    The main reason to build a replacement is performance and configurability. Carbon is single threaded, and sending metrics to multiple consistent-hash clusters requires chaining of relays. This project provides a multithreaded relay which can address multiple targets and clusters for each and every metric based on pattern matches.

    (tags: graphite carbon c python ops metrics)
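
    To make the "multiple targets and clusters for each and every metric based on pattern matches" idea concrete, here is a minimal sketch. It is not carbon-c-relay's code or config syntax: the cluster names, node addresses, rules and the MD5-based ring are all assumptions made for illustration.

```python
import bisect
import hashlib
import re


class HashRing:
    """Tiny consistent-hash ring (illustrative; not carbon-c-relay's algorithm)."""

    def __init__(self, nodes, replicas=100):
        # Place several virtual points per node on the ring to even out load.
        self._ring = sorted(
            (self._hash(f"{node}:{i}"), node)
            for node in nodes
            for i in range(replicas)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def node_for(self, metric):
        # First ring point clockwise from the metric's hash, wrapping around.
        idx = bisect.bisect(self._keys, self._hash(metric)) % len(self._ring)
        return self._ring[idx][1]


# Hypothetical clusters and nodes.
CLUSTER_NODES = {
    "ops": ["ops-a:2003", "ops-b:2003"],
    "main": ["main-a:2003", "main-b:2003", "main-c:2003"],
    "archive": ["archive-a:2003"],
}
CLUSTERS = {name: HashRing(nodes) for name, nodes in CLUSTER_NODES.items()}

# Hypothetical rules: every matching rule contributes its clusters.
RULES = [
    (re.compile(r"^sys\."), ["ops"]),
    (re.compile(r".*"), ["main", "archive"]),
]


def route(metric):
    """Return (cluster, node) destinations for every rule the metric matches."""
    dests = []
    for pattern, cluster_names in RULES:
        if pattern.search(metric):
            for name in cluster_names:
                dests.append((name, CLUSTERS[name].node_for(metric)))
    return dests
```

    With these made-up rules, `route("sys.cpu.load")` fans out to one node in each of the three clusters in a single pass, which is the job that otherwise needs chained relays.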

  • Surveillance of social media not way to fight terrorism – Minister

    Blanket surveillance of social media is not the solution to combating terrorism and the rights of the individual to privacy must be protected, Data Protection Minister Dara Murphy said on Monday. [He] said Ireland and the European Union must protect the privacy rights of individuals on social media. “Freedom of expression, freedom of movement, and the protection of privacy are core tenets of the European Union, which must be upheld.”

    (tags: dara-murphy data-protection privacy surveillance europe eu ireland social-media)

Links for 2015-01-18

  • Amazing comment from a random sysadmin who’s been targeted by the NSA

    ‘Here’s a story for you. I’m not a party to any of this. I’ve done nothing wrong, I’ve never been suspected of doing anything wrong, and I don’t know anyone who has done anything wrong. I don’t even mean that in the sense of “I pissed off the wrong people but technically haven’t been charged.” I mean that I am a vanilla, average, 9-5 working man of no interest to anybody. My geographical location is an accident of my birth. Even still, I wasn’t accidentally born in a high-conflict area, and my government is not at war. I’m a sysadmin at a legitimate ISP and my job is to keep the internet up and running smoothly. This agency has stalked me in my personal life, undermined my ability to trust my friends attempting to connect with me on LinkedIn, and infected my family’s computer. They did this because they wanted to bypass legal channels and spy on a customer who pays for services from my employer. Wait, no, they wanted the ability to potentially spy on future customers. Actually, that is still not accurate – they wanted to spy on everybody in case there was a potentially bad person interacting with a customer. After seeing their complete disregard for anybody else, their immense resources, and their extremely sophisticated exploits and backdoors – knowing they will stop at nothing, and knowing that I was personally targeted – I’ll be damned if I can ever trust any electronic device I own ever again. You all rationalize this by telling me that it “isn’t surprising”, and that I don’t live in the [USA,UK] and therefore I have no rights. I just have one question. Are you people even human?’

    (tags: nsa via:ioerror privacy spying surveillance linkedin sysadmins gchq security)

  • DRI’s Unchanged Position on Eircode

    ‘Broadly, they are satisfied with what we are doing’ versus: ‘We have deep concerns about the Eircode initiative… We want to state clearly that we are not at all ‘satisfied’ with the postcode that has been designed or the implementation proposals.’

    (tags: dri ireland eircode postcodes privacy data-protection quotes misrepresentation)

Links for 2015-01-17

  • Misogyny in the Valley

    The young women interns [in one story in this post] worked in a very different way. As I explored their notes, I noticed that ideas were expanded upon, not abandoned. Challenges were identified, but the male language so often heard in Silicon Valley conference rooms – “Well, let me tell you what the problem with that idea is….” – was not in the room.  These young women, without men to define the “appropriate business behavior,” used different behaviors and came up with a startling and valuable solution. They showed many of the values that exist outside of dominance-based leadership: strategic thinking, intuition, nurturing and relationship building, values-based decision-making and acceptance of other’s input. Women need space to be themselves at work. Until people who have created their success by worshipping at the temple of male behavior, like Sheryl Sandberg, learn to value alternate behaviors, the working world will remain a foreign and hostile culture to women. And if we do not continuously work to build corporate cultures where there is room for other behaviors, women will be cast from or abandoned in a world not of our making, where we continuously “just do not fit in,” but where we still must go to earn our livings.

    (tags: sexism misogyny silicon-valley tech work sheryl-sandberg business collaboration)

  • Are you better off running your big-data batch system off your laptop?

    Heh, nice trolling.

    Here are two helpful guidelines (for largely disjoint populations): If you are going to use a big data system for yourself, see if it is faster than your laptop. If you are going to build a big data system for others, see that it is faster than my laptop. […] We think everyone should have to do this, because it leads to better systems and better research.

    (tags: graph coding hadoop spark giraph graph-processing hardware scalability big-data batch algorithms pagerank)

  • BBC uses RIPA terrorism laws to catch TV licence fee dodgers in Northern Ireland

    Give them the power, they’ll use that power. ‘A document obtained under Freedom of Information legislation confirms the BBC’s use of RIPA in Northern Ireland. It states: “The BBC may, in certain circumstances, authorise under the Regulation of Investigatory Powers Act 2000 and Regulation of Investigatory Powers (British Broadcasting Corporation) Order 2001 the lawful use of detection equipment to detect unlicensed use of television receivers… the BBC has used detection authorised under this legislation in Northern Ireland.”‘

    (tags: ripa privacy bbc tv license-fee uk northern-ireland law scope-creep)

  • Australia tries to ban crypto research – by ACCIDENT • The Register

    Researchers are warned off [discussing] 512-bits-plus key lengths, systems “designed or modified to perform cryptanalytic functions”, or “designed or modified to use ‘quantum cryptography’”. [….] “an email to a fellow academic could land you a 10 year prison sentence”.

    https://twitter.com/_miw/status/556023024009224192 notes ‘the DSGL 5A002 defines it as >512bit RSA, >512bit DH, >112 bit ECC and >56 bit symmetric ciphers; weak as fuck i say.’

    (tags: law australia crime crypto ecc rsa stupidity fail)

Links for 2015-01-16

  • A Case Study of Toyota Unintended Acceleration and Software Safety

    I drive a Toyota, and this is scary stuff. Critical software systems need to be coded with care, and this isn’t it — they don’t even have a bug tracking system!

    Investigations into potential causes of Unintended Acceleration (UA) for Toyota vehicles have made news several times in the past few years. Some blame has been placed on floor mats and sticky throttle pedals. But, a jury trial verdict was based on expert opinions that defects in Toyota’s Electronic Throttle Control System (ETCS) software and safety architecture caused a fatal mishap.  This talk will outline key events in the still-ongoing Toyota UA litigation process, and pull together the technical issues that were discovered by NASA and other experts. The results paint a picture that should inform future designers of safety critical software in automobiles and other systems.

    (tags: toyota safety realtime coding etcs throttle-control nasa code-review embedded)

Links for 2015-01-15

  • Group warns of postcode project dangers | Irish Examiner

    “We have spoken to the National Consumer Agency, logistics companies and Digital Rights Ireland, with which we have had an indepth conversation to see if there is anything in the proposal that might be considered to have an impact on anyone’s privacy. Broadly, they are satisfied with what we are doing,” [Patricia Cronin, head of the Department of Communications’ postcodes division] told the committee. However in his letter, [DRI’s] O’Lachtnain said the group “want to state clearly that we are not at all ‘satisfied’ with the postcode that has been designed or the implementation proposals”.

    Some nerve!

    (tags: dri nca privacy patricia-cronin goverment postcodes eircode dpc ireland)

Links for 2015-01-14

  • Of Course 23andMe’s Plan Has Been to Sell Your Genetic Data All Along

    Today, 23andMe announced what Forbes reports is only the first of ten deals with big biotech companies: Genentech will pay up to $60 million for access to 23andMe’s data to study Parkinson’s. You think 23andMe was about selling fun DNA spit tests for $99 a pop? Nope, it’s been about selling your data all along.

    (tags: testing ethics dna genentech 23andme parkinsons diseases health privacy)

  • Facette

    Really nice time series dashboarding app. Might consider replacing graphitus with this…

    (tags: time-series data visualisation graphs ops dashboards facette)

  • Getting good cancer care through 3D printing

    This is pretty incredible.

    Balzer downloaded a free software program called InVesalius, developed by a research center in Brazil to convert MRI and CT scan data to 3D images. He used it to create a 3D volume rendering from Scott’s DICOM images, which allowed him to look at the tumor from any angle. Then he uploaded the files to Sketchfab and shared them with neurosurgeons around the country in the hope of finding one who was willing to try a new type of procedure. Perhaps unsurprisingly, he found the doctor he was looking for at UPMC, where Scott had her thyroid removed. A neurosurgeon there agreed to consider a minimally invasive operation in which he would access the tumor through Scott’s left eyelid and remove it using a micro drill. Balzer had adapted the volume renderings for 3D printing and produced a few full-size models of the front section of Scott’s skull on his MakerBot. To help the surgeon vet his micro drilling idea and plan the procedure, Balzer packed up one of the models and shipped it off to Pittsburgh.

    (tags: diy surgery health cancer tumours medicine 3d-printing 3d scanning mri dicom)

Links for 2015-01-13

Links for 2015-01-12

Links for 2015-01-11

Links for 2015-01-10

Links for 2015-01-09

  • A World Transfixed by Screens – The Atlantic

    Excellent “In Focus” this week — ‘The continued massive growth of connected mobile devices is shaping not only how we communicate with each other, but how we look, behave, and experience the world around us. Smartphones and other handheld devices have become indispensable tools, appendages held at arm’s length to record a scene or to snap a selfie. Recent news photos show refugees fleeing war-torn regions holding up their phones as prized possessions to be saved, and relatives of victims lost to a disaster holding up their smartphones to show images of their loved ones to the press. Celebrity selfies, people alone in a crowd with their phones, events obscured by the very devices used to record that event, the brightly lit faces of those bent over their small screens, these are some of the scenes depicted below.’

    (tags: mobile photography in-focus alan-taylor the-atlantic phones selfies pictures)

  • “Incremental Stream Processing using Computational Conflict-free Replicated Data Types” [paper]

    ‘Unlike existing alternatives, such as stream processing, that favor the execution of arbitrary application code, we want to capture much of the processing logic as a set of known operations over specialized Computational CRDTs, with particular semantics and invariants, such as min/max/average/median registers, accumulators, top-N sets, sorted sets/maps, and so on. Keeping state also allows the system to decrease the amount of propagated information. Preliminary results obtained in a single example show that Titan has a higher throughput when compared with state of the art stream processing systems.’

    (tags: crdt distributed stream-processing replication titan papers)
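
    For a feel of what a "computational CRDT" looks like, here is a rough sketch of two of the types the abstract lists, a max register and a top-N set. The class and method names are mine, not Titan's; the only property the sketch tries to show is that `merge` is commutative, associative and idempotent, so replicas converge whatever order updates arrive in.

```python
import heapq


class MaxRegister:
    """Keeps the largest value seen; merging two replicas takes the max."""

    def __init__(self, value=float("-inf")):
        self.value = value

    def update(self, x):
        return MaxRegister(max(self.value, x))

    def merge(self, other):
        return MaxRegister(max(self.value, other.value))


class TopN:
    """Keeps the N largest distinct values; merging is top-N of the union."""

    def __init__(self, n, items=()):
        self.n = n
        # Only N items are retained, so only N items ever need propagating.
        self.items = frozenset(heapq.nlargest(n, set(items)))

    def add(self, x):
        return TopN(self.n, self.items | {x})

    def merge(self, other):
        return TopN(self.n, self.items | other.items)
```

    Because each replica only retains (and ships) its current top N, the state stays small, which matches the abstract's point about reducing propagated information.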

Links for 2015-01-07

Links for 2015-01-06

  • Mantis: Netflix’s Event Stream Processing System

    Rx/reactive in style, autoscaling, support for queue/broker-based strong consistency as well as TCP-based lossy delivery

    (tags: netflix rx reactive autoscaling mantis stream-processing)

  • Bad Kids Jokes

    ‘I now a man with a wooden leg named sea what was the name of the other leg SAND’

    (tags: funny humor kids jokes humour)

  • The Hit Team

    Fergal Crehan’s new gig — good idea!

    The Hit Team helps you fight back against leaked photos and videos, internet targeting and revenge porn.

    (tags: revenge-porn revenge law privacy porn leaks photos videos images selfies)

  • F1: A Distributed SQL Database That Scales

    Beyond the interesting-enough stuff about scalability in a distributed SQL store, there’s this really nifty point about avoiding the horrors of the SQL/ORM impedance mismatch:

    At Google, Protocol Buffers are ubiquitous for data storage and interchange between applications. When we still had a MySQL schema, users often had to write tedious and error-prone transformations between database rows and in-memory data structures. Putting protocol buffers in the schema removes this impedance mismatch and gives users a universal data structure they can use both in the database and in application code…. Protocol Buffer columns are more natural and reduce semantic complexity for users, who can now read and write their logical business objects as atomic units, without having to think about materializing them using joins across several tables.

    This is something that pretty much any store can already adopt. Go protobufs (or Avro, etc.). Also, I find this really neat, and I hope this idea is implemented elsewhere soon: asynchronous schema updates:

    Schema changes are applied asynchronously on multiple F1 servers. Anomalies are prevented by the use of a schema leasing mechanism with support for only current and next schema versions; and by subdividing schema changes into multiple phases where consecutive pairs of changes are mutually compatible and cannot cause anomalies.

    (tags: schema sql f1 google papers orm protobuf)

Links for 2015-01-05

  • Avleen Vig on distributed engineering teams

    This is a really excellent post on the topic, rebutting Paul Graham’s Bay-Area-centric thoughts on the topic very effectively. I’ve worked in both distributed and non-distributed, as well as effective and ineffective teams ;), and Avleen’s thoughts are very much on target.

    I’ve been involved in the New York start up scene since I joined Etsy in 2010. Since that time, I’ve seen more and more companies there embrace having distributed teams. Two companies I know which have risen to the top while doing this have been Etsy and DigitalOcean. Both have exceptional engineering teams working on high profile products used by many, many people around the world. There are certainly others outside New York, including Automattic, GitHub, Chef Inc, Puppet… the list goes on. So how did this happen? And why do people continue to insist that distributed teams lower performance, and are a bad idea? Partly because we’ve done a poor job of showing our industry how to be successful at it, and partly because it’s hard. Having successful distributed teams requires special skills from management, which aren’t easily learned until you have to manage a distributed team. Catch 22.

    (tags: business culture management communication work distributed-teams avleen-vig engineering)

  • Hack workaround to get JVM thread priorities working on Linux

    As used in Cassandra ( http://grokbase.com/t/hbase/dev/13bf9kezes/about-xx-threadprioritypolicy-42 )!

    if you just set the “ThreadPriorityPolicy” to something else than the legal values 0 or 1, […] a slight logic bug in Sun’s JVM code kicks in, and thus sets the policy to be as if running with root – thus you get exactly what one desire. The operating system, Linux, won’t allow priorities to be heightened above “Normal” (negative nice value), and thus just ignores those requests (setting it to normal instead, nice value 0) – but it lets through the requests to set it lower (setting the nice value to some positive value).

    (tags: cassandra thread-priorities threads java jvm linux nice hacks)

Links for 2015-01-04

  • Amiko Alien2 / Enigma Discussion Thread – boards.ie

    Enigma is a Linux based alternative to the default Spark operating system on these boxes. Enigma is a more customisable OS and provides the ability to add plugins which can accomplish many tasks enabling users to have a box which might look and perform like a Sky box, giving a 7 day EPG and an alternative to series link.

    Looks like a pretty solid hacker community…

    (tags: alien2 tv enigma dvr freeview saorview pvr)

  • Hague reassures MPs on Office 365 data storage as Microsoft ordered to hand over email data

    William Hague, the leader of the House of Commons, has responded to concerns raised by an MP about the security of parliamentary data stored on Microsoft’s Cloud-based servers in Europe. “The relevant servers are situated in the Republic of Ireland and the Netherlands, both being territories covered by the EC Data Protection Directive,” William Hague wrote in a letter to John Hemming, MP for Birmingham Yardley. “Any access by US authorities to such data would have to be by way of mutual legal assistance arrangements with those countries.” […] John Hemming MP told Computer Weekly Hague’s reassurances carried little weight in the face of aggressive legal action by the US government.  “The Microsoft case makes it clear that, in the end, the fact that Microsoft is a US company legally trumps the European Data Protection Directive […] and where [the letter says] the US authorities could not exercise a right of search and seizure on an extraterritorial basis, well, they are doing that, in America, today.”

    Sounds like they didn’t think that through…

    (tags: mail privacy parliament office-365 microsoft mlat surveillance)

Links for 2015-01-03

Links for 2015-01-02

  • The open-office trend is destroying the workplace

    Wow, where has this person been for the past 20 years that they haven’t had to encounter this? I can only imagine having a private office, tbh.

    my personal performance at work has hit an all-time low. Each day, my associates and I are seated at a table staring at each other, having an ongoing 12-person conversation from 9 a.m. to 5 p.m.  It’s like being in middle school with a bunch of adults. Those who have worked in private offices for decades have proven to be the most vociferous and rowdy. They haven’t had to consider how their loud habits affect others, so they shout ideas at each other across the table and rehash jokes of yore. As a result, I can only work effectively during times when no one else is around, or if I isolate myself in one of the small, constantly sought-after, glass-windowed meeting rooms around the perimeter.

    (tags: business office productivity work desks open-plan)

Links for 2014-12-28

  • ‘Uncertain: A First-Order Type for Uncertain Data’ [paper, PDF]

    ‘Emerging applications increasingly use estimates such as sensor data (GPS), probabilistic models, machine learning, big data, and human data. Unfortunately, representing this uncertain data with discrete types (floats, integers, and booleans) encourages developers to pretend it is not probabilistic, which causes three types of uncertainty bugs. (1) Using estimates as facts ignores random error in estimates. (2) Computation compounds that error. (3) Boolean questions on probabilistic data induce false positives and negatives. This paper introduces Uncertain, a new programming language abstraction for uncertain data. We implement a Bayesian network semantics for computation and conditionals that improves program correctness. The runtime uses sampling and hypothesis tests to evaluate computation and conditionals lazily and efficiently. We illustrate with sensor and machine learning applications that Uncertain improves expressiveness and accuracy.’ (via Tony Finch)

    (tags: uncertainty estimation types strong-typing coding probability statistics machine-learning sampling via:fanf)
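
    A rough sketch of the idea, to show why this beats comparing a raw float against a threshold. This is not the paper's API: `Uncertain`, `probably` and the fixed sample count are stand-ins of mine, and where the paper's runtime uses proper hypothesis tests and a Bayesian network, this toy version just draws a fixed number of independent samples (so reusing the same variable twice in one expression is not modelled correctly).

```python
import random


class Uncertain:
    """A value represented by a sampling function instead of a single number."""

    def __init__(self, sampler):
        self._sampler = sampler  # zero-argument callable returning one sample

    def sample(self):
        return self._sampler()

    def __add__(self, other):
        # Lazy: builds a new sampler, nothing is evaluated yet.
        # Assumption: the operands are independent of each other.
        if not isinstance(other, Uncertain):
            constant = other
            other = Uncertain(lambda: constant)
        return Uncertain(lambda: self.sample() + other.sample())

    def __gt__(self, threshold):
        # A comparison yields an uncertain boolean, not a bool.
        return Uncertain(lambda: self.sample() > threshold)

    def probably(self, confidence=0.5, n=1000):
        """True if the estimated probability of a truthy sample exceeds confidence.

        Crude fixed-size estimate; a stand-in for a real hypothesis test.
        """
        hits = sum(1 for _ in range(n) if self.sample())
        return hits / n > confidence


# Hypothetical GPS speed estimate: mean 4.0 mph, standard deviation 1.0.
speed = Uncertain(lambda: random.gauss(4.0, 1.0))
```

    The point is that `(speed > 3.0).probably()` asks "is it more likely than not?", while `(speed > 3.0).probably(0.99)` demands strong evidence, instead of both collapsing into a single noisy `if speed > 3.0`.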

Links for 2014-12-27

  • Why Airlines Want to Make You Suffer

    ‘The fee [airline pricing] model comes with systematic costs that are not immediately obvious. Here’s the thing: in order for fees to work, there needs be something worth paying to avoid. That necessitates, at some level, a strategy that can be described as “calculated misery.” Basic service, without fees, must be sufficiently degraded in order to make people want to pay to escape it. And that’s where the suffering begins.’

    (tags: travel airlines pricing fees economy consumer jetblue)

  • A Virtual Machine in Excel

    ‘Ádám was trying his hand at a problem in Excel, but the official rules prohibit the use of Excel macros. In a daze, he came up with one of the most clever uses of Excel: building an assembly interpreter with the most popular spreadsheet program. This is a virtual Harvard architecture machine without writable RAM; the stack is only lots and lots of IFs.’

    (tags: vms excel hacks spreadsheets coding)

Links for 2014-12-22

  • coz

    A causal profiler for C++.

    Causal profiling is a novel technique to measure optimization potential. This measurement matches developers’ assumptions about profilers: that optimizing highly-ranked code will have the greatest impact on performance. Causal profiling measures optimization potential for serial, parallel, and asynchronous programs without instrumentation of special handling for library calls and concurrency primitives. Instead, a causal profiler uses performance experiments to predict the effect of optimizations. This allows the profiler to establish causality: “optimizing function X will have effect Y,” exactly the measurement developers had assumed they were getting all along.

    I can see this being a good technique to stochastically discover race conditions and concurrency bugs, too.

    (tags: optimization c++ performance coding profiling speed causal-profilers)

  • Spark 1.2 released

    This is the version with the superfast petabyte-sort record:

    Spark 1.2 includes several cross-cutting optimizations focused on performance for large scale workloads. Two new features Databricks developed for our world record petabyte sort with Spark are turned on by default in Spark 1.2. The first is a re-architected network transfer subsystem that exploits Netty 4’s zero-copy IO and off heap buffer management. The second is Spark’s sort based shuffle implementation, which we’ve now made the default after significant testing in Spark 1.1. Together, we’ve seen these features give as much as 5X performance improvement for workloads with very large shuffles.

    (tags: spark sorting hadoop map-reduce batch databricks apache netty)

  • The VATMOSS debacle: does the “manual email” loophole work?

    As the 1 January deadline gallops towards the EU, microbusinesses desperate to stay open without breaking the law try to find out, “Can I email stuff out instead?” Well… Yes. – No – It depends – and simultaneously yes AND no, according to Schrödinger’s VAT. So that’s clear, then.

    (tags: vat vatmoss eu tax fiasco email microbusiness sme)

  • One artist closing up their Bandcamp site due to new VATMOSS laws

    Nice work, EU

    (tags: eu law tax vat vatmoss matt-stevens bandcamp music downloads)

Links for 2014-12-21

Links for 2014-12-19

  • AN OFFER TO SONY FROM 2600

    To demonstrate that hackers have no interest in suppressing speech, quashing controversy, or being intimidated by vague threats, we ask that Sony allow the hacker community to distribute “The Interview” for them on the 25th of December. Now, we’re aware that Sony may refer to this distribution method as piracy, but in this particular case, it may well prove to be the salvation of the motion picture industry. By freely offering the film online, millions of people will get to see it and decide for themselves if it has any redeeming qualities whatsoever – as opposed to nobody seeing it and the studios writing it off as a total loss. Theaters would be free from panic as our servers would become the target of any future vague threats (and we believe Hollywood will be most impressed with how resilient peer-to-peer distribution can be in the face of attacks). Most importantly, we would be defying intimidation, something the motion picture industry doesn’t quite have a handle on, which is surprising considering how much they’ve relied upon it in the past.

    (tags: 2600 funny hackers security sony north-korea the-interview movies piracy)

Links for 2014-12-18

Links for 2014-12-17

  • ‘Machine Learning: The High-Interest Credit Card of Technical Debt’ [PDF]

    Oh god yes. This is absolutely spot on, as you would expect from a Google paper — at this stage they probably have accumulated more real-world ML-at-scale experience than anywhere else. ‘Machine learning offers a fantastically powerful toolkit for building complex systems quickly. This paper argues that it is dangerous to think of these quick wins as coming for free. Using the framework of technical debt, we note that it is remarkably easy to incur massive ongoing maintenance costs at the system level when applying machine learning. The goal of this paper is highlight several machine learning specific risk factors and design patterns to be avoided or refactored where possible. These include boundary erosion, entanglement, hidden feedback loops, undeclared consumers, data dependencies, changes in the external world, and a variety of system-level anti-patterns. [….] ‘In this paper, we focus on the system-level interaction between machine learning code and larger systems as an area where hidden technical debt may rapidly accumulate. At a system-level, a machine learning model may subtly erode abstraction boundaries. It may be tempting to re-use input signals in ways that create unintended tight coupling of otherwise disjoint systems. Machine learning packages may often be treated as black boxes, resulting in large masses of “glue code” or calibration layers that can lock in assumptions. Changes in the external world may make models or input signals change behavior in unintended ways, ratcheting up maintenance cost and the burden of any debt. Even monitoring that the system as a whole is operating as intended may be difficult without careful design. Indeed, a remarkable portion of real-world “machine learning” work is devoted to tackling issues of this form. Paying down technical debt may initially appear less glamorous than research results usually reported in academic ML conferences. But it is critical for long-term system health and enables algorithmic advances and other cutting-edge improvements.’

    (tags: machine-learning ml systems ops tech-debt maintainance google papers hidden-costs development)

  • The FBI Used the Web’s Favorite Hacking Tool to Unmask Tor Users | WIRED

    Since Operation Torpedo [use of a Metasploit side project], there’s evidence the FBI’s anti-Tor capabilities have been rapidly advancing. Torpedo was in November 2012. In late July 2013, computer security experts detected a similar attack through Dark Net websites hosted by a shady ISP called Freedom Hosting—court records have since confirmed it was another FBI operation. For this one, the bureau used custom attack code that exploited a relatively fresh Firefox vulnerability—the hacking equivalent of moving from a bow-and-arrow to a 9-mm pistol. In addition to the IP address, which identifies a household, this code collected the MAC address of the particular computer that was infected by the malware. “In the course of nine months they went from off the shelf Flash techniques that simply took advantage of the lack of proxy protection, to custom-built browser exploits,” says Soghoian. “That’s a pretty amazing growth … The arms race is going to get really nasty, really fast.”

    (tags: fbi tor police flash security privacy anonymity darknet wired via:bruces)

Links for 2014-12-16

  • Digital Rights Ireland files Amicus Brief in Microsoft v USA with Liberty and ORG

    Microsoft -v- USA is an important ongoing case, currently listed for hearing in 2015 before the US Federal Court of Appeal of the 2nd Circuit. However, as the case centres around the means by which NY law enforcement are seeking to access data of an email account which resides in Dublin, it is also crucially significant to Ireland and the rest of the EU. For that reason, Digital Rights Ireland instructed us to file an Amicus Brief in the US case, in conjunction with the global law firm of White & Case, who have acted pro bono in their representation. Given the significance of the case for the wider EU, both Liberty and the Open Rights Group in the UK have joined Digital Rights Ireland as amici on this brief. We hope it will be of aid to the US court in assessing the significance of the order being appealed by Microsoft for EU citizens and European states, in the light of the existing US and EU Mutual Legal Assistance Treaty.

    (tags: amicus-briefs law us dri microsoft mlats org liberty eu privacy)

Links for 2014-12-15

Links for 2014-12-13

  • littleBits Synth Kit

    Wow, this looks cool. $159

    littleBits and Korg have demystified a traditional analog synthesizer, making it super easy for novices and experts alike to create music. connects to speakers, computers and headphones. can be used to make your own instruments. fits into the littleBits modular system for infinite combos of audio, visual and sensory experiences

    (tags: diy hardware music littlebits gadgets make analog synths)

Links for 2014-12-12

Links for 2014-12-11

Links for 2014-12-10

  • AWS Key Management Service Cryptographic Details

    “AWS Key Management Service (AWS KMS) provides cryptographic keys and operations scaled for the cloud. AWS KMS keys and functionality are used by other AWS cloud services, and you can use them to protect user data in your applications that use AWS. This white paper provides details on the cryptographic operations that are executed within AWS when you use AWS KMS.”

    (tags: white-papers aws amazon kms key-management crypto pdf)

Links for 2014-12-09

  • Aurora for MySQL is coming

    some good details of Aurora innards

    (tags: mysql databases aurora aws ec2 sql storage transactions replication)

  • If Eventual Consistency Seems Hard, Wait Till You Try MVCC

    ex-Percona MySQL wizard Baron Schwartz, noting that MVCC as implemented in common SQL databases is not all that simple or reliable compared to big bad NoSQL Eventual Consistency:

    Since I am not ready to assert that there’s a distributed system I know to be better and simpler than eventually consistent datastores, and since I certainly know that InnoDB’s MVCC implementation is full of complexities, for right now I am probably in the same position most of my readers are: the two viable choices seem to be single-node MVCC and multi-node eventual consistency. And I don’t think MVCC is the simpler paradigm of the two.

    (tags: nosql concurrency databases mysql riak voldemort eventual-consistency reliability storage baron-schwartz mvcc innodb postgresql)

  • Scaling email transparency

    This is quite interesting/weird — Stripe’s protocol for mass-CCing email as they scale up the company, based around http://en.wikipedia.org/wiki/Civil_inattention

    (tags: communication culture email management stripe cc transparency civil-inattention)

  • Shanley Kane of Model View Culture Challenges a “Corrupt” Silicon Valley | MIT Technology Review

    If their interests were better serving the world, using technology as a force for social justice, and equitably distributing technology wealth to enrich society … sure, they’d be acting against their interests. But the reality is that tech companies centralize power and wealth in a small group of privileged white men. When that’s the goal, then exploiting the labor of marginalized people and denying them access to power and wealth is 100 percent in line with the endgame. A more diverse tech industry would be better for its workers and everyone else, but it would be worse for the privileged white men at the top of it, because it would mean they would have to give up their monopoly on money and power. And they will fight that with everything they’ve got, which is why we see barriers to equality at every level of the industry.

    (tags: culture feminism tech mit-tech-review shanley-kane privilege vcs silicon-valley)

  • Announcing Snappy Ubuntu

    Awesome! I was completely unaware this was coming down the pipeline.

    A new, transactionally updated Ubuntu for the cloud. Ubuntu Core is a new rendition of Ubuntu for the cloud with transactional updates. Ubuntu Core is a minimal server image with the same libraries as today’s Ubuntu, but applications are provided through a simpler mechanism. The snappy approach is faster, more reliable, and lets us provide stronger security guarantees for apps and users — that’s why we call them “snappy” applications. Snappy apps and Ubuntu Core itself can be upgraded atomically and rolled back if needed — a bulletproof approach to systems management that is perfect for container deployments. It’s called “transactional” or “image-based” systems management, and we’re delighted to make it available on every Ubuntu certified cloud.

    (tags: ubuntu linux packaging snappy ubuntu-core transactional-updates apt docker ops)

  • Dan McKinley :: Thoughts on the Technical Track

    Ouch. I think Amazon did a better job of the Technical Track concept than this, at least

    (tags: engineering management technical-track principal-engineer career work)

  • OSTree

    “git for operating system binaries”. OSTree is a tool for managing bootable, immutable, versioned filesystem trees. It is not a package system; nor is it a tool for managing full disk images. Instead, it sits between those levels, offering a blend of the advantages (and disadvantages) of both. You can use any build system you like to place content into it on a build server, then export an OSTree repository via static HTTP. On each client system, “ostree admin upgrade” can incrementally replicate that content, creating a new root for the next reboot. This provides fully atomic upgrades. Any changes made to /etc are propagated forwards, and all local state in /var is shared. A key goal of the project is to complement existing package systems like RPM and Debian packages, and help further their evolution. In particular for example, RPM-OSTree (linked below) has as a goal a hybrid tree/package model, where you replicate a base tree via OSTree, and then add packages on top.

    (tags: os gnome git linux immutable deployment packaging via:fanf)

Links for 2014-12-06

  • State sanctions foreign phone and email tapping

    Well, this stinks.

    Foreign law enforcement agencies will be allowed to tap Irish phone calls and intercept emails under a statutory instrument signed into law by Minister for Justice Frances Fitzgerald. Companies that object or refuse to comply with an intercept order could be brought before a private “in camera” court. The legislation, which took effect on Monday, was signed into law without fanfare on November 26th, the day after documents emerged in a German newspaper indicating the British spy agency Government Communications Headquarters (GCHQ) had directly tapped undersea communications cables between Ireland and Britain for years.

    (tags: ireland law gchq surveillance mlats phone-tapping)

  • “Looks like Chicago PD had a stingray out at the Eric Garner protest last night”

    Your tax dollars at work: Spying on people just because they demand that the government’s agents stop killing black people. […] Anonymous has released a video featuring what appear to be Chicago police radio transmissions revealing police wiretapping of organizers’ phones at the protests last night the day after Thanksgiving, perhaps using a stingray. The transmissions pointing to real-time wiretapping involve the local DHS-funded spy ‘fusion’ center.

    (tags: imsi-catcher stingray surveillance eric-garner protests privacy us-politics anonymous chicago police wiretapping dhs)

  • When data gets creepy: the secrets we don’t realise we’re giving away | Technology | The Guardian

    Very good article around the privacy implications of derived and inferred aggregate metadata from Ben Goldacre.

    We are entering an age – which we should welcome with open arms – when patients will finally have access to their own full medical records online. So suddenly we have a new problem. One day, you log in to your medical records, and there’s a new entry on your file: “Likely to die in the next year.” We spend a lot of time teaching medical students to be skilful around breaking bad news. A box ticked on your medical records is not empathic communication. Would we hide the box? Is that ethical? Or are “derived variables” such as these, on a medical record, something doctors should share like anything else?

    (tags: advertising ethics privacy security law data aggregation metadata ben-goldacre)

  • Stellar/Ripple suffer a failure of their consensus system, resulting in a split-brain failure

    Prof. Mazières’s research indicated some risk that consensus could fail, though we were not certain if the required circumstances for such a failure were realistic. This week, we discovered the first instance of a consensus failure. On Tuesday night, the nodes on the network began to disagree and caused a fork of the ledger. The majority of the network was on ledger chain A. At some point, the network decided to switch to ledger chain B. This caused the roll back of a few hours of transactions that had only been recorded on chain A. We were able to replay most of these rolled back transactions on chain B to minimize the impact. However, in cases where an account had already sent a transaction on chain B the replay wasn’t possible.
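
    The replay rule in that last sentence can be sketched roughly as below. This is only an illustration of the rule as the postmortem describes it, not Stellar's actual code: the `Tx` record, its fields and the `replay_rolled_back` helper are all hypothetical, and "already sent a transaction on chain B" is taken to mean "before the replay started".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tx:
    account: str  # sending account (hypothetical field)
    payload: str  # opaque description of the transaction (hypothetical field)


def replay_rolled_back(chain_a_only, chain_b):
    """Split transactions seen only on chain A into (replayable, stranded).

    A transaction is stranded if its sending account had already sent
    something on chain B before the replay started.
    """
    active_on_b = {tx.account for tx in chain_b}
    replayable, stranded = [], []
    for tx in chain_a_only:
        if tx.account in active_on_b:
            stranded.append(tx)
        else:
            replayable.append(tx)
    return replayable, stranded


# Made-up example data: alice was active on both forks, dave only on chain A.
chain_b = [Tx("alice", "pay carol 5")]
rolled_back = [Tx("alice", "pay bob 10"), Tx("dave", "pay erin 3")]
replayable, stranded = replay_rolled_back(rolled_back, chain_b)
print("replay:", [t.account for t in replayable])
print("cannot replay:", [t.account for t in stranded])
```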

    (tags: consensus distcomp stellar ripple split-brain postmortems outages ledger-fork payment)

  • the “Unknown Pleasures” cover, emulated in Mathematica

    In July 1967, astronomers at the Cavendish Laboratory in Cambridge observed an unidentified radio signal from interstellar space, which flashed periodically every 1.33730 seconds. This object flashed with such regularity that it was accurate enough to be used as a clock and only be off by one part in a hundred million. It was eventually determined that this was the first discovery of a pulsar, CP-1919. This is an object that has about the same mass as the Sun, but is the size of the San Francisco Bay at its widest (~20 kilometers), and is rotating so fast that it’s emitting a beam of light towards Earth like a strobing lighthouse! Pulsars are neutron stars that are formed from the remnants of a massive star when it experiences stellar death. A hand drawn graph plotted in the style of a waterfall plot, in the Cambridge Encyclopedia of Astronomy, later became renowned for its use on the cover of the album “Unknown Pleasures” by 1970s English band Joy Division.
    The entire blog at http://intothecontinuum.tumblr.com/ is pretty great. Lots of nice mathematical animated GIFs, accompanied by Mathematica source and related ponderings.
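
    To put the "one part in a hundred million" figure in perspective, here is some back-of-the-envelope arithmetic using only the two numbers quoted above. The constant and function names are my own, and it assumes the error accumulates linearly over time, which is a simplification.

```python
FRACTIONAL_ERROR = 1e-8   # "one part in a hundred million", from the quote
PERIOD_S = 1.33730        # pulse period in seconds, from the quote
SECONDS_PER_DAY = 86_400
DAYS_PER_YEAR = 365.25    # assumption: Julian year


def drift_seconds(elapsed_s, fractional_error=FRACTIONAL_ERROR):
    """Worst-case clock error after elapsed_s seconds, assuming linear drift."""
    return elapsed_s * fractional_error


per_day = drift_seconds(SECONDS_PER_DAY)
per_year = drift_seconds(SECONDS_PER_DAY * DAYS_PER_YEAR)
pulses_per_day = SECONDS_PER_DAY / PERIOD_S

print(f"drift per day:  {per_day * 1e3:.3f} ms")
print(f"drift per year: {per_year:.3f} s")
print(f"pulses per day: {pulses_per_day:.0f}")
```

    That works out to under a millisecond of error per day, or roughly a third of a second per year.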

    (tags: maths gifs animation art unknown-pleasures mathematica cp-1919 pulsars astronomy joy-division waterfall-plots cambridge blogs)