-
Ofcom has published a report on online piracy, which found that the practice is becoming less common and that pirates tend to spend more on legitimate content than non-pirates. The research, which was not funded by the entertainment industry, was conducted by Kantar Media among 21,474 participants and took place in 2012 across four separate stages. Over that time, the ratio of legal to illegal content fell — confirming a suspected trend as legal streaming options became more available. It also confirmed another suspicion — that a relatively small number of web users are responsible for most piracy. In Ofcom’s data, just two percent of users conducted three quarters of all piracy. Ofcom described piracy as “a minority activity”. Of those surveyed, 58 percent accessed music, movie or TV content online, while 17 percent accessed illegal content sources. Those who admitted pirating content spent on average £26 every three months on legitimate content, set against an average spend of £16 among non-pirates.
Want to back an Irish Microbrewery?
The excellent Trouble Brewing are looking for investors
(tags: trouble-brewing ireland brewing beer business investment crowdfunding microbreweries)
_An Improved Construction For Counting Bloom Filters_
‘A counting Bloom filter (CBF) generalizes a Bloom filter data structure so as to allow membership queries on a set that can be changing dynamically via insertions and deletions. As with a Bloom filter, a CBF obtains space savings by allowing false positives. We provide a simple hashing-based alternative based on d-left hashing called a d-left CBF (dlCBF). The dlCBF offers the same functionality as a CBF, but uses less space, generally saving a factor of two or more. We describe the construction of dlCBFs, provide an analysis, and demonstrate their effectiveness experimentally’
(tags: bloom-filter data-structures algorithms counting cbf storage false-positives d-left-hashing hashing)
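For reference, here is a minimal sketch of the plain counting Bloom filter that the paper takes as its baseline. It is not the d-left construction the authors propose. The class name, the counter and hash counts, and the salted SHA-256 hashing are all my own assumptions for illustration.

```python
import hashlib


class CountingBloomFilter:
    """Plain counting Bloom filter: an array of small counters instead of bits."""

    def __init__(self, num_counters=1024, num_hashes=4):
        self.num_counters = num_counters
        self.num_hashes = num_hashes
        self.counters = [0] * num_counters

    def _positions(self, item):
        # Derive num_hashes counter indexes by salting a single hash function.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_counters

    def add(self, item):
        for pos in self._positions(item):
            self.counters[pos] += 1

    def remove(self, item):
        # Only decrement for items that appear present, so counters never go negative.
        if item not in self:
            return False
        for pos in self._positions(item):
            self.counters[pos] -= 1
        return True

    def __contains__(self, item):
        # May report false positives, never false negatives for added items.
        return all(self.counters[pos] > 0 for pos in self._positions(item))
```

Deletion is what the counters buy you over a bit array; the cost is several bits per slot, which is the space overhead the dlCBF is trying to cut.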
Justin's Linklog Posts
To solve hard problems, you need to use bricolage
In a talk about a neat software component he designed, Bruce Haddon observed that there is no way that the final structure and algorithmic behavior of this component could have been predicted, designed, or otherwise anticipated. Haddon observed that computer science serves as a source of core ideas: it provides the data structures and algorithms that are the building blocks. Meanwhile, he views software engineering as a useful set of methods to help design reliable software without losing your mind. Yet he points out that neither captures the whole experience. That’s because much of the work is what Haddon calls hacking, but what others would call bricolage. Simply put, there is much trial and error: we put ideas together and see where it goes.
This is a great post, and I agree (broadly). IMO, most software engineering requires little CS, but there are occasional moments where a single significant aspect of a project requires a particular algorithm, and would be kludgy, hacky, or over-complex to solve without it.
(tags: bricolage hacking cs computer-science work algorithms)
Getting Real About Distributed System Reliability
I have come around to the view that the real core difficulty of [distributed] systems is operations, not architecture or design. Both are important but good operations can often work around the limitations of bad (or incomplete) software, but good software cannot run reliably with bad operations. This is quite different from the view of unbreakable, self-healing, self-operating systems that I see being pitched by the more enthusiastic NoSQL hypesters. Worse yet, you can’t easily buy good operations in the same way you can buy good software—you might be able to hire good people (if you can find them) but this is more than just people; it is practices, monitoring systems, configuration management, etc.
(tags: reliability nosql distributed-systems jay-kreps ops)
Don’t use Hadoop – your data isn’t that big
see also HN comments: https://news.ycombinator.com/item?id=6398650 , particularly davidmr’s great one:
I suppose all of this is to say that the amount of required parallelization of a problem isn’t necessarily related to the size of the problem set as is mentioned most in the article, but also the inherent CPU and IO characteristics of the problem. Some small problems are great for large-scale map-reduce clusters, some huge problems are horrible for even bigger-scale map-reduce clusters (think fluid dynamics or something that requires each subdivision of the problem space to communicate with its neighbors). I’ve had a quote printed on my door for years: Supercomputers are an expensive tool for turning CPU-bound problems into IO-bound problems.
I love that quote!
(tags: hadoop big-data scaling map-reduce)
-
Gilt ran a stress-test of Riak to replace Voldemort (I think) in a shadow stack, with good results:
Riak’s strong performance suggests that, should we pursue implementation, it will withstand our unique traffic needs and prove reliable. As for the Gilt-Basho team’s strong performance: It was amazing that we were able to accomplish so much in just a week’s time! Thanks again to Seth and Steve for making this possible.
THE LONG DARK, a first-person post-disaster survival sim by Hinterland — Kickstarter
wow this looks great.
The Long Dark is a thoughtful, first-person survival simulation that emphasizes quiet exploration in a stark, yet hauntingly beautiful, post-disaster setting. The breathtakingly picturesque Pacific Northwest frames the backdrop for the drama of The Long Dark.
(tags: games survival via:fp eclaire the-long-dark kickstarter)
The Rational Choices of Crack Addicts – NYTimes.com
“The key factor is the environment, whether you’re talking about humans or rats,” Dr. Hart said. “The rats that keep pressing the lever for cocaine are the ones who are stressed out because they’ve been raised in solitary conditions and have no other options. But when you enrich their environment, and give them access to sweets and let them play with other rats, they stop pressing the lever.”
Inside the mind of NSA chief Gen Keith Alexander | Glenn Greenwald
featuring some mental pics of the “Information Dominance Center”, the Star Trek bridge which NSA chief Keith Alexander built with taxpayer money
(tags: big-brother nsa politics keith-alexander star-trek funny bizarre)
Schneier on Security: Reforming the NSA
Regardless of how we got here, the NSA can’t reform itself. Change cannot come from within; it has to come from above. It’s the job of government: of Congress, of the courts, and of the president. These are the people who have the ability to investigate how things became so bad, rein in the rogue agency, and establish new systems of transparency, oversight, and accountability. Any solution we devise will make the NSA less efficient at its eavesdropping job. That’s a trade-off we should be willing to make, just as we accept reduced police efficiency caused by requiring warrants for searches and warning suspects that they have the right to an attorney before answering police questions. We do this because we realize that a too-powerful police force is itself a danger, and we need to balance our need for public safety with our aversion of a police state.
(tags: nsa politics us-politics surveillance snooping society government police public-safety police-state)
Biometric authentication failing in Mysore
Biometrics was rolled out for food distribution in order to cut down on fraud, but it’s now resulting in a subset of users being unable to authenticate:
The biometric authentication system installed at the PDS outlets fails to establish the identity of many genuine beneficiaries, mostly workers, as their daily grind in the agricultural fields, construction sites or as domestic help have eroded the lines on their thumb resulting in distorted impressions.
(tags: fail risks biometrics authentication mysore security india fingerprinting)
Sketch of the Day – Frugal Streaming
ha, this is very clever! If you have enough volume, this is a nice estimation algorithm to compute stream quantiles in very little RAM
(tags: memory streaming stream-processing clever algorithms hacks streams)
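A rough sketch of the idea as I understand it: keep a single running estimate and nudge it up or down by one step as values arrive, with the nudge probabilities skewed by the target quantile. The function name, parameters and defaults are my own, and this is an approximation of the one-value variant rather than a faithful copy of the published algorithm.

```python
import random


def frugal_quantile(stream, q=0.5, start=0, step=1, rng=None):
    """Estimate the q-quantile of a stream while storing only one value."""
    rng = rng or random.Random()
    estimate = start
    for value in stream:
        r = rng.random()
        if value > estimate and r < q:
            # Values above the estimate pull it up with probability q.
            estimate += step
        elif value < estimate and r < 1 - q:
            # Values below the estimate pull it down with probability 1 - q.
            estimate -= step
    return estimate
```

The estimate only settles once it has seen enough values to walk from `start` to the true quantile, which is why this needs volume to be useful.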
-
Spam Arrest is a company that sells an anti-spam service. They attempted to sue some spammers and, as has been widely reported, lost badly. This case emphasizes three points that litigious antispammers seem not to grasp: Under CAN SPAM, a lot of spam is legal. Judges hate plaintiffs who try to be too clever, and hate sloppy preparation even more. Never, ever, file a spam suit in Seattle.
(tags: anti-spam spam law seattle us can-spam spamarrest sentient-jets)
Benchmarking Redis on AWS ElastiCache
good data points, but could do with latency percentiles
(tags: latency redis measurement benchmarks ec2 elasticache aws storage tests)
Being poor changes your thinking about everything
Very interesting research into poverty and scarcity, in the Washington Post:
The scarcity trap captures this notion we see again and again in many domains. When people have very little, they undertake behaviors that maintain or reinforce their future disadvantage. If you have very little, you often behave in such a way so that you’ll have little in the future. In economics, people talk about the poverty trap. We’re generalizing that, saying this happens a lot, and we’ve experienced it.
(tags: poor poverty society economics scarcity washington-post)
Good SSL for your website is absurdly difficult in practice
Yet again, security software fails on packaging and UI. via Tony Finch
Former NSA and CIA director says terrorists love using Gmail
At one point, Hayden expressed a distaste for online anonymity, saying “The problem I have with the Internet is that it’s anonymous.” But he noted, there is a struggle over that issue even inside government. The issue came to a head during the Arab Spring movement when the State Department was funding technology [presumably Tor?] to protect the anonymity of activists so governments could not track down or repress their voices. “We have a very difficult time with this,” Hayden said. He then asked, “is our vision of the World Wide Web the global digital commons — at this point you should see butterflies flying here and soft background meadow-like music — or a global free fire zone?” Given that Hayden also compared the Internet to the wild west and Somalia, Hayden clearly leans toward the “global free fire zone” vision of the Internet.
well, that’s a good analogy for where we’re going — a global free-fire zone.
(tags: gmail cia nsa surveillance michael-hayden security snooping law tor arab-spring)
Google swaps out MySQL, moves to MariaDB
When we asked Sallner to quantify the scale of the migration he said, “They’re moving it all. Everything they have. All of the MySQL servers are moving to MariaDB, as far as I understand.” By moving to MariaDB, Google can free itself of any dependence on technology dictated by Oracle – a company whose motivations are unclear, and whose track record for working with the wider technology community is dicey, to say the least. Oracle has controlled MySQL since its acquisition of Sun in 2010, and the key InnoDB storage engine since it got ahold of Innobase in 2005. […] We asked Cole why Google would shift from MySQL to MariaDB, and what the key technical differences between the systems were. “From my perspective, they’re more or less equivalent other than if you look at specific features and how they implement them,” Cole said, speaking in a personal capacity and not on behalf of Google. “Ideologically there are lots of differences.”
So — AWS, when will RDS offer MariaDB as an option?
(tags: google mysql mariadb sql open-source licensing databases storage innodb oracle)
FBI Admits It Controlled Tor Servers Behind Mass Malware Attack
The code’s behavior, and the command-and-control server’s Virginia placement, is also consistent with what’s known about the FBI’s “computer and internet protocol address verifier,” or CIPAV, the law enforcement spyware first reported by WIRED in 2007. Court documents and FBI files released under the FOIA have described the CIPAV as software the FBI can deliver through a browser exploit to gather information from the target’s machine and send it to an FBI server in Virginia. The FBI has been using the CIPAV since 2002 against hackers, online sexual predators, extortionists, and others, primarily to identify suspects who are disguising their location using proxy servers or anonymity services, like Tor. Prior to the Freedom Hosting attack, the code had been used sparingly, which kept it from leaking out and being analyzed.
-
lots more detail on the new “Java Mission Control” feature in Hotspot 7u40 JVMs, and how to use it to start and stop profiling in a live, production JVM from a separate “jcmd” command-line client. If the overhead is small, this could be really neat — turn on profiling for 1 minute every hour on a single instance, and collect realtime production profile data on an automated basis for post-facto analysis if required
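A sketch of what the "1 minute every hour" idea could look like when driven from a script. The helper names and defaults are hypothetical. The `JFR.start` option names (`name`, `duration`, `filename`) are as I recall them and should be checked against `jcmd <pid> help JFR.start` on the target JDK, and the target JVM is assumed to have been started with Flight Recorder enabled.

```python
import subprocess


def jfr_start_command(pid, duration_seconds=60, filename="profile.jfr", name="hourly"):
    """Build the jcmd invocation that starts a time-limited flight recording."""
    return [
        "jcmd", str(pid), "JFR.start",
        f"name={name}",
        f"duration={duration_seconds}s",
        f"filename={filename}",
    ]


def start_recording(pid, **kwargs):
    # Runs jcmd against a live JVM; needs a JDK on PATH and a JVM that has
    # Flight Recorder enabled. Raises CalledProcessError if jcmd fails.
    return subprocess.run(jfr_start_command(pid, **kwargs), check=True)
```

Calling `start_recording` from cron once an hour against a single instance would give the periodic production profile described above.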
Necessary and Proportionate — In Which Civil Society is Caught Between a Cop and a Spy
Modern telecommunications technology implied the development of modern telecommunications surveillance, because it moved the scope of action from the physical world (where intelligence, generally seen as part of the military mission, had acted) to the virtual world—including the scope of those actions that could threaten state power. While the public line may have been, as US Secretary of State Henry Stimson said in 1929, “gentlemen do not open each other’s mail”, you can bet that they always did keep a keen eye on the comings and goings of each other’s shipping traffic. The real reason that surveillance in the context of state intelligence was limited until recently was because it was too expensive, and it was too expensive for everyone. The Westphalian compromise demands equality of agency as tied to territory. As soon as one side gains a significant advantage, the structure of sovereignty itself is threatened at a conceptual level — hence Oppenheimer as the death of any hope of international rule of law. Once surveillance became cheap enough, all states were (and will increasingly be) forced to attempt it at scale, as a reaction to this pernicious efficiency. The US may be ahead of the game now, but Moore’s law and productization will work their magic here.
(tags: government telecoms snooping gchq nsa surveillance law politics intelligence spying internet)
-
Bit of detail into Twitter’s TSD metric store.
There are separate online clusters for different data sets: application and operating system metrics, performance critical write-time aggregates, long term archives, and temporal indexes. A typical production instance of the time series database is based on four distinct Cassandra clusters, each responsible for a different dimension (real-time, historical, aggregate, index) due to different performance constraints. These clusters are amongst the largest Cassandra clusters deployed in production today and account for over 500 million individual metric writes per minute. Archival data is stored at a lower resolution for trending and long term analysis, whereas higher resolution data is periodically expired. Aggregation is generally performed at write-time to avoid extra storage operations for metrics that are expected to be immediately consumed. Indexing occurs along several dimensions–service, source, and metric names–to give users some flexibility in finding relevant data.
(tags: twitter monitoring metrics service-metrics tsd time-series storage architecture cassandra)
NSA: Possibly breaking US laws, but still bound by laws of computational complexity
I didn’t clearly explain that there’s an enormous continuum between, on the one hand, a full break of RSA or Diffie-Hellman (which still seems extremely unlikely to me), and on the other, “pure side-channel attacks” involving no new cryptanalytic ideas. Along that continuum, there are many plausible places where the NSA might be. For example, imagine that they had a combination of side-channel attacks, novel algorithmic advances, and sheer computing power that enabled them to factor, let’s say, ten 2048-bit RSA keys every year. In such a case, it would still make perfect sense that they’d want to insert backdoors into software, sneak vulnerabilities into the standards, and do whatever else it took to minimize their need to resort to such expensive attacks. But the possibility of number-theoretic advances well beyond what the open world knows certainly wouldn’t be ruled out. Also, as Schneier has emphasized, the fact that NSA has been aggressively pushing elliptic-curve cryptography in recent years invites the obvious speculation that they know something about ECC that the rest of us don’t.
(tags: ecc rsa crypto security nsa gchq snooping sniffing diffie-hellman pki key-length)
-
Built into the HotSpot JVM [in JDK version 7u40] is something called the Java Flight Recorder. It records a lot of information about/from the JVM runtime, and can be thought of as similar to the Data Flight Recorders you find in modern airplanes. You normally use the Flight Recorder to find out what was happening in your JVM when something went wrong, but it is also a pretty awesome tool for production time profiling. Since Mission Control (using the default templates) normally don’t cause more than a per cent overhead, you can use it on your production server.
I’m intrigued by the idea of always-on profiling in production. This could be cool.
(tags: performance java measurement profiling jvm jdk hotspot mission-control instrumentation telemetry metrics)
How the NSA Spies on Smartphones
One of the US agents’ tools is the use of backup files established by smartphones. According to one NSA document, these files contain the kind of information that is of particular interest to analysts, such as lists of contacts, call logs and drafts of text messages. To sort out such data, the analysts don’t even require access to the iPhone itself, the document indicates. The department merely needs to infiltrate the target’s computer, with which the smartphone is synchronized, in advance. Under the heading “iPhone capability,” the NSA specialists list the kinds of data they can analyze in these cases. The document notes that there are small NSA programs, known as “scripts,” that can perform surveillance on 38 different features of the iPhone 3 and 4 operating systems. They include the mapping feature, voicemail and photos, as well as the Google Earth, Facebook and Yahoo Messenger applications.
and, of course, the alternative means of backup is iCloud…. wonder how secure those backups are.
(tags: nsa surveillance gchq iphone smartphones backups icloud security)
-
Boost ASIO at the front end (!), Kafka 0.8, Storm, and ElasticSearch
(tags: boost scalability loggly logging ingestion cep stream-processing kafka storm architecture elasticsearch)
Schneier on Security: Excess Automobile Deaths as a Result of 9/11
The inconvenience of extra passenger screening and added costs at airports after 9/11 cause many short-haul passengers to drive to their destination instead, and, since airline travel is far safer than car travel, this has led to an increase of 500 U.S. traffic fatalities per year. Using DHS-mandated value of statistical life at $6.5 million, this equates to a loss of $3.2 billion per year, or $32 billion over the period 2002 to 2011 (Blalock et al. 2007).
(tags: risk security death 9-11 politics screening dhs air-travel driving road-safety)
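Checking the arithmetic in that quote, using only the figures it gives (the variable names are mine):

```python
extra_deaths_per_year = 500
value_of_statistical_life = 6.5e6   # USD, the DHS figure quoted above
years = 2011 - 2002 + 1             # 2002 to 2011 inclusive

annual_cost = extra_deaths_per_year * value_of_statistical_life
total_cost = annual_cost * years

print(f"per year: ${annual_cost / 1e9:.2f} billion")
print(f"over {years} years: ${total_cost / 1e9:.1f} billion")
```

That works out to $3.25 billion a year and $32.5 billion over the ten years, so the quoted $3.2 billion and $32 billion look like rounded-down versions of the same calculation.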
-
The debate has been stifled in Britain more successfully than anywhere else in the free world and, astonishingly, this has been with the compliance of a media and public that regard their attachment to liberty to be a matter of genetic inheritance. So maybe it is best for me to accept that the BBC, together with most of the newspapers, has moved with society, leaving me behind with a few old privacy-loving codgers, wondering about the cause of this shift in attitudes. Is it simply the fear of terror and paedophiles? Are we so overwhelmed by the power of the surveillance agencies that we feel we can’t do anything? Or is it that we have forgotten how precious and rare truly free societies are in history?
(tags: privacy uk politics snooping spies gchq society nsa henry-porter)
-
Some great street art from Brighton, via Darach Ennis
(tags: via:darachennis street-art graffiti big-data snooping spies gchq nsa art)
Blocking The Pirate Bay appears to have ‘no lasting net impact’ on illegal downloading
In the fight against the unauthorised sharing of copyright protected material, aka piracy, Dutch Internet Service Providers have been summoned by courts to block their subscribers’ access to The Pirate Bay (TPB) and related sites. This paper studies the effectiveness of this approach towards online copyright enforcement, using both a consumer survey and a newly developed non-infringing technology for BitTorrent monitoring. While a small group of respondents download less from illegal sources or claim to have stopped, and a small but significant effect is found on the distribution of Dutch peers, no lasting net impact is found on the percentage of the Dutch population downloading from illegal sources.
(tags: fail blocking holland pirate-bay tpb papers via:tjmcintyre internet isps)
How Advanced Is the NSA’s Cryptanalysis — And Can We Resist It?
Bruce Schneier’s suggestions:
Assuming the hypothetical NSA breakthroughs don’t totally break public-cryptography — and that’s a very reasonable assumption — it’s pretty easy to stay a few steps ahead of the NSA by using ever-longer keys. We’re already trying to phase out 1024-bit RSA keys in favor of 2048-bit keys. Perhaps we need to jump even further ahead and consider 3072-bit keys. And maybe we should be even more paranoid about elliptic curves and use key lengths above 500 bits. One last blue-sky possibility: a quantum computer. Quantum computers are still toys in the academic world, but have the theoretical ability to quickly break common public-key algorithms — regardless of key length — and to effectively halve the key length of any symmetric algorithm. I think it extraordinarily unlikely that the NSA has built a quantum computer capable of performing the magnitude of calculation necessary to do this, but it’s possible. The defense is easy, if annoying: stick with symmetric cryptography based on shared secrets, and use 256-bit keys.
(tags: bruce-schneier cryptography wired nsa surveillance snooping gchq cryptanalysis crypto future key-lengths)
DevOps Eye for the Coding Guy: Metrics
a pretty good description of the process of adding service metrics to a Django webapp using graphite and statsd. Bookmarking mainly for the great real-time graphing hack at the end…
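For context, the statsd side of this is tiny: a metric is a short text line sent over UDP. Below is a minimal sketch of a client, not the code from the post. The function names and the metric name in the usage note are made up, and the host and port defaults assume a statsd daemon listening locally on its usual port.

```python
import socket


def format_metric(name, value, metric_type):
    """Render one metric in the plain-text statsd line format."""
    return f"{name}:{value}|{metric_type}"


def send_metric(name, value, metric_type="c", host="127.0.0.1", port=8125):
    # statsd listens on UDP, so sending is fire-and-forget: no reply,
    # and no error if nothing is listening.
    payload = format_metric(name, value, metric_type).encode("ascii")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))
```

Something like `send_metric("webapp.views.home.hits", 1)` in a view would bump a counter, and `metric_type="ms"` would record a timing instead.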
Probabilistic Scraping of Plain Text Tables
a nifty hack.
Recently I have been banging my head trying to import a ton of OCR acquired data expressed in tabular form. I think I have come up with a neat approach using probabilistic reasoning combined with mixed integer programming. The method is pretty robust to all sorts of real world issues. In particular, the method leverages topological understanding of tables, encodes it declaratively into a mixed integer/linear program, and integrates weak probabilistic signals to classify the whole table in one go (at sub second speeds). This method can be used for any kind of classification where you have strong logical constraints but noisy data.
(via proggit)
(tags: scraping tables ocr probabilistic linear-programming optimization machine-learning via:proggit)
-
‘Plugin to make highly interactive graphite graph objects (i.e. graphs where you can interactively toggle on/off individual series, inspect datapoints, zoom in realtime, etc) Supports Flot (canvas), Rickshaw (svg) and standard graphite png images (in case you’re nostalgic and don’t like interactivity).’
(tags: graphs graphing graphite dataviz flot rickshaw svg canvas javascript)
modern JVM concurrency primitives are broken if the system clock steps backwards
‘The implementation of the concurrency primitive LockSupport.parkNanos(), the function that controls *every* concurrency primitive on the JVM, is flawed, and any NTP sync, or system time change, can potentially break it with unexpected results across the board when running a 64bit JVM on Linux 64bit.’ Basically, LockSupport.parkNanos() calls pthread_cond_timedwait() using a CLOCK_REALTIME instead of CLOCK_MONOTONIC. ‘tinker step 0’ in ntp.conf may be a viable workaround.
(tags: clocks timing ntp slew sync step pthreads java jvm timers clock_realtime clock_monotonic)
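The general lesson applies outside the JVM too: measure timeouts against a monotonic clock, not wall-clock time. Here is a small sketch in Python rather than Java, just to show the shape of it. The `wait_until` helper and its parameters are my own invention; `time.monotonic()` is the standard library clock that is not affected by system clock changes.

```python
import time


def wait_until(predicate, timeout_seconds, poll_interval=0.01,
               clock=time.monotonic, sleep=time.sleep):
    """Poll predicate until it returns True or timeout_seconds elapse."""
    # The deadline is computed on a monotonic clock, so an NTP step or a
    # manual date change cannot stretch or shrink the wait.
    deadline = clock() + timeout_seconds
    while True:
        if predicate():
            return True
        remaining = deadline - clock()
        if remaining <= 0:
            return False
        sleep(min(poll_interval, remaining))
```

Swapping `clock=time.time` into this would reproduce the class of bug described above: if the wall clock stepped backwards mid-wait, the deadline would suddenly be much further away.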
Schneier on Security: The NSA Is Breaking Most Encryption on the Internet
The new Snowden revelations are explosive. Basically, the NSA is able to decrypt most of the Internet. They’re doing it primarily by cheating, not by mathematics. It’s joint reporting between the Guardian, the New York Times, and ProPublica. I have been working with Glenn Greenwald on the Snowden documents, and I have seen a lot of them. These are my two essays on today’s revelations. Remember this: The math is good, but math has no agency. Code has agency, and the code has been subverted.
(tags: encryption communication government nsa security bruce-schneier crypto politics snooping gchq guardian journalism)
How To Buffer Full YouTube Videos Before Playing
summary – turn off DASH (Dynamic adaptive streaming) using a userscript.
Voldemort on Solid State Drives [paper]
‘This paper and talk was given by the LinkedIn Voldemort Team at the Workshop on Big Data Benchmarking (WBDB May 2012).’
With SSD, we find that garbage collection will become a very significant bottleneck, especially for systems which have little control over the storage layer and rely on Java memory management. Big heapsizes make the cost of garbage collection expensive, especially the single threaded CMS Initial mark. We believe that data systems must revisit their caching strategies with SSDs. In this regard, SSD has provided an efficient solution for handling fragmentation and moving towards predictable multitenancy.
(tags: voldemort storage ssd disk linkedin big-data jvm tuning ops gc)
Streaming MapReduce with Summingbird
Before Summingbird at Twitter, users that wanted to write production streaming aggregations would typically write their logic using a Hadoop DSL like Pig or Scalding. These tools offered nice distributed system abstractions: Pig resembled familiar SQL, while Scalding, like Summingbird, mimics the Scala collections API. By running these jobs on some regular schedule (typically hourly or daily), users could build time series dashboards with very reliable error bounds at the unfortunate cost of high latency. While using Hadoop for these types of loads is effective, Twitter is about real-time and we needed a general system to deliver data in seconds, not hours. Twitter’s release of Storm made it easy to process data with very low latencies by sacrificing Hadoop’s fault tolerant guarantees. However, we soon realized that running a fully real-time system on Storm was quite difficult for two main reasons: Recomputation over months of historical logs must be coordinated with Hadoop or streamed through Storm with a custom log loading mechanism; Storm is focused on message passing and random-write databases are harder to maintain. The types of aggregations one can perform in Storm are very similar to what’s possible in Hadoop, but the system issues are very different. Summingbird began as an investigation into a hybrid system that could run a streaming aggregation in both Hadoop and Storm, as well as merge automatically without special consideration of the job author. The hybrid model allows most data to be processed by Hadoop and served out of a read-only store. Only data that Hadoop hasn’t yet been able to process (data that falls within the latency window) would be served out of a datastore populated in real-time by Storm. But the error of the real-time layer is bounded, as Hadoop will eventually get around to processing the same data and will smooth out any error introduced. 
This hybrid model is appealing because you get well understood, transactional behavior from Hadoop, and up to the second additions from Storm. Despite the appeal, the hybrid approach has the following practical problems: Two sets of aggregation logic have to be kept in sync in two different systems; Keys and values must be serialized consistently between each system and the client. The client is responsible for reading from both datastores, performing a final aggregation and serving the combined results. Summingbird was developed to provide a general solution to these problems.
Very interesting stuff. I’m particularly interested in the design constraints they’ve chosen to impose to achieve this — data formats which require associative merging in particular.
(tags: mapreduce streaming big-data twitter storm summingbird scala pig hadoop aggregation merging)
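To make the associative-merging constraint concrete, here is a toy sketch in Python rather than Scala. This is not Summingbird's API; the hashtag counts and variable names are invented. The point is only that if partial results can be combined with an associative operation, it stops mattering whether a chunk came from the batch layer or the real-time one.

```python
from collections import Counter
from functools import reduce


def merge_counts(left, right):
    """Associative, commutative merge of two partial per-key counts."""
    return left + right


# Hypothetical output of the batch (Hadoop-style) layer.
batch_view = Counter({"#nsa": 120, "#java": 40})

# Hypothetical partial counts from the real-time (Storm-style) layer,
# covering data the batch layer has not processed yet.
realtime_chunks = [Counter({"#nsa": 3}), Counter({"#java": 1, "#riak": 2})]

realtime_view = reduce(merge_counts, realtime_chunks, Counter())
combined = merge_counts(batch_view, realtime_view)
```

Because the merge is associative, the same `merge_counts` can be used inside either layer and again by the client when it reads from both stores, which is the "two sets of aggregation logic" problem from the quote.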
Thoughts on Granby Park, the recent pop-up park off Parnell St
We mentioned above that pop-up spaces have become popular across Europe because they allow developers and city councils to harness urban creativity in order to drive up real estate prices without ceding control of a given site. Those who produce the space through hard work, collaboration and passion move on, making way for property development and speculation. The international research in this area is very clear on this point and it has been documented in places from Lower-East Side Manhattan to Berlin’s Kreuzberg. Most perversely, increased property prices make it even more difficult for creativity to flourish in a given area and end up driving out long-term working class communities, migrants and young people. But what can we do? If every attempt we make to make our city a better place simply ends up being captured in the calculations of real estate players, surely the situation is hopeless? Is it better, then, to do nothing? We don’t think it is better to do nothing and, like Upstart, we still believe we can find a way together through experimentation and collaboration. However, this means questioning, reflecting on and publicly discussing the relationship between our efforts to make a city more after our hearts desire and the process of gentrification. As noted above, this is especially the case with pop-up spaces given their temporary nature. It is really necessary that we think about how to make sure our activities don’t contribute to gentrification in the long term, but instead benefit the city as a whole. We certainly don’t have the solutions, but if we sweep these awkward questions under the carpet we risk contributing to the very forces we want to challenge and alienating those who will perceive us as the ‘front-line’ of gentrification.
(tags: gentrification pop-up parks dublin ireland cities upstart spaces urban-planning)
[#CASSANDRA-5582] Replace CustomHsHaServer with better optimized solution based on LMAX Disruptor
Disruptor: decimating P99s since 2011
(tags: disruptor cassandra java p99 latency speed performance concurrency via:kellabyte)
-
I love these.
Photographic prints are great because they don’t need power to be displayed. They are more or less permanent. Videos are great because they record a sequence of time which shows reality almost like how we experience. Is it possible to combine the two? And not via long exposure photography where often details are lost from motion. So I played around with the tools of digital photography and post processing to give you this series: Time is a dimension. This series of images are mostly landscapes, seascapes and cityscapes, and they are a single composite made from sequences that span 2-4 hours, mostly of sunrises and sunsets. The basic structure of a landscape is present in every piece. But each panel or concentric layer shows a different slice of time, which is related to the adjacent panel/layer. The transition from daytime to night is gradual and noticeable in every piece, but would not be something you expect to see in a still image.
(tags: photography beautiful photos art time dimensions prints via:matthaughey)
-
‘Visualizations that make no sense.’ Some of these are unintentional comedy gold — pie charts feature heavily, of course. (via Des Traynor)
(tags: via:destraynor infographics wtf visualization dataviz data fail funny graphics pie-charts)
Non-blocking transactional atomicity
Peter Bailis with an interesting distributed-storage atomicity algorithm for performing multi-record transactional updates
(tags: algorithms nbta transactions databases storage distcomp distributed atomic coding eventual-consistency crdts)
Interview with the Github Elasticsearch Team
good background on Github’s Elasticsearch scaling efforts. Some rather horrific split-brain problems under load, and crashes due to OpenJDK bugs (sounds like OpenJDK *still* isn’t ready for production). painful
(tags: elasticsearch github search ops scaling split-brain outages openjdk java jdk jvm)
The Irish Times, terminations and Holles Street: The story that wasn’t there.
Summarising a very shoddy tale from our paper of record.
I don’t know what happened here. I don’t know whether there ever was a woman who met the description given by the Irish Times who suffered a medical crisis during pregnancy. I don’t know why a group of men in positions of authority in the Irish Times decided that, if there was such a woman, they had any right to tell the rest of the country about her experiences. I don’t know why, when they discovered that a mistake had been made in the one legal fact used to justify that decision they didn’t immediately apologise. And I don’t know what happened between the 23rd August 2013 and 31st August 2013 to prompt them to print a shoulder shrugging ‘acceptance’ that the case ‘hadn’t happened’ and limit the paper’s apology to an institution, as opposed to its readers. But, from what I’ve seen this week, I do know one thing. Whatever questions readers might have, The Irish Times isn’t interested in giving them any answers.
(tags: irish-times fail shoddy abortion health public-interest journalism pregnancy corrections)
-
Rackspace’s large-scale TSD storage system, built on Cassandra, Java, ASL2
(tags: cassandra tsd storage time-series data open-source java rackspace)
Reversing Sinclair’s amazing 1974 calculator hack – half the ROM of the HP-35
Amazing reverse engineering.
In a hotel room in Texas, Clive Sinclair had a big problem. He wanted to sell a cheap scientific calculator that would grab the market from expensive calculators such as the popular HP-35. Hewlett-Packard had taken two years, 20 engineers, and a million dollars to design the HP-35, which used 5 complex chips and sold for $395. Sinclair’s partnership with calculator manufacturer Bowmar had gone nowhere. Now Texas Instruments offered him an inexpensive calculator chip that could barely do four-function math. Could he use this chip to build a $100 scientific calculator? Texas Instruments’ engineers said this was impossible – their chip only had 3 storage registers, no subroutine calls, and no storage for constants such as π. The ROM storage in the calculator held only 320 instructions, just enough for basic arithmetic. How could they possibly squeeze any scientific functions into this chip? Fortunately Clive Sinclair, head of Sinclair Radionics, had a secret weapon – programming whiz and math PhD Nigel Searle. In a few days in Texas, they came up with new algorithms and wrote the code for the world’s first single-chip scientific calculator, somehow programming sine, cosine, tangent, arcsine, arccos, arctan, log, and exponentiation into the chip. The engineers at Texas Instruments were amazed. How did they do it? Up until now it’s been a mystery. But through reverse engineering, I’ve determined the exact algorithms and implemented a simulator that runs the calculator’s actual code. The reverse-engineered code along with my detailed comments is in the window below.
(tags: reversing reverse-engineering history calculators sinclair ti hp chips silicon hacks)
Microsoft CEO Steve Ballmer retires: A firsthand account of the company’s employee-ranking system
LOL MS. Sadly, this talk of “core competencies” and “visibility” is pretty reminiscent of Amazon’s review season, too:
This illustrated another problem with [stack ranking]: It destroyed trust between individual contributors and management, because the stack rank required that all lower-level managers systematically lie to their reports. Why? Because for years Microsoft did not admit the existence of the stack rank to nonmanagers. Knowledge of the process gradually leaked out, becoming a recurrent complaint on the much-loathed (by Microsoft) Mini-Microsoft blog, where a high-up Microsoft manager bitterly complained about organizational dysfunction and was joined in by a chorus of hundreds of employees. The stack rank finally made it into a Vanity Fair article in 2012, but for many years it was not common knowledge, inside or outside Microsoft. It was presented to the individual contributors as a system of objective assessment of “core competencies,” with each person being judged in isolation. When review time came, and programmers would fill out a short self-assessment talking about their achievements, strengths, and weaknesses, only some of them knew that their ratings had been more or less already foreordained at the stack rank. […] If you did know about the stack rank, you weren’t supposed to admit it. So you went through the pageantry of the performance review anyway, arguing with your manager in the rhetoric of “core competencies.” The managers would respond in kind. Since the managers had little control over the actual score and attendant bonus and raise (if any), their job was to write a review to justify the stack rank in the language of absolute merit. (“Higher visibility” was always a good catch-all: Sure, you may be a great coder and work 80 hours a week, but not enough people have heard of you!)
(tags: amazon stack-ranking employees ranking work microsoft core-competencies)
BBC News – How one man turns annoying cold calls into cash
This is hilarious. Quid pro quo!
Once he had set up the 0871 line, every time a bank, gas or electricity supplier asked him for his details online, he submitted it as his contact number. He added he was “very honest” and the companies did ask why he had a premium number. He told the programme he replied: “Because I’m getting annoyed with PPI phone calls when I’m trying to watch Coronation Street so I’d rather make 10p a minute.” He said almost all of the companies he dealt with were happy to use it and if they refused he asked them to email.
(tags: spam cold-calls phone ads uk funny 0871 premium-rate ppi)
-
This is brilliant. Half of the office now wants prints.
Massive congratulations to Edge magazine. The stellar publication has been around for 20 years! To celebrate, their 258th issue comes in 20 different flavours, and one of those flavours includes the earthly overtones of both Minecraft and Dungeons & Dragons. Junkboy drew it, and I [Owen] worded it a few weeks ago.
(tags: covers images edge minecraft gaming funny dungeons-and-dragons retro dnd)
-
Forecast.io are doing such a great job of applying modern machine-learning to traditional weather data. “Quicksilver” is their neural-net-adjusted global temperature geodata, and here’s how it’s built
(tags: quicksilver forecast forecast.io neural-networks ai machine-learning algorithms weather geodata earth temperature)
_MillWheel: Fault-Tolerant Stream Processing at Internet Scale_ [paper, pdf]
from VLDB 2013:
MillWheel is a framework for building low-latency data-processing applications that is widely used at Google. Users specify a directed computation graph and application code for individual nodes, and the system manages persistent state and the continuous flow of records, all within the envelope of the framework’s fault-tolerance guarantees. This paper describes MillWheel’s programming model as well as its implementation. The case study of a continuous anomaly detector in use at Google serves to motivate how many of MillWheel’s features are used. MillWheel’s programming model provides a notion of logical time, making it simple to write time-based aggregations. MillWheel was designed from the outset with fault tolerance and scalability in mind. In practice, we find that MillWheel’s unique combination of scalability, fault tolerance, and a versatile programming model lends itself to a wide variety of problems at Google.
(tags: millwheel google data-processing cep low-latency fault-tolerance scalability papers event-processing stream-processing)
GCHQ tapping at least 14 EU fiber-optic cables
Süddeutsche Zeitung (SZ) had already revealed in late June that the British had access to the cable TAT-14, which connects Germany with the USA, UK, Denmark, France and the Netherlands. In addition to TAT-14, the other cables that GCHQ has access to include Atlantic Crossing 1, Circe North, Circe South, Flag Atlantic-1, Flag Europa-Asia, SeaMeWe-3 and SeaMeWe-4, Solas, UK France 3, UK Netherlands-14, Ulysses, Yellow and the Pan European Crossing.
(tags: sz germany cables fiber-optic tapping snooping tat-14 eu politics gchq)
In historic vote, New Zealand bans software patents | Ars Technica
This is amazing news. Paying attention, Sean Sherlock?
A major new patent bill, passed in a 117-4 vote by New Zealand’s Parliament after five years of debate, has banned software patents. The relevant clause of the patent bill actually states that a computer program is “not an invention.” Some have suggested that was a way to get around the wording of the TRIPS intellectual property treaty, which requires patents to be “available for any inventions, whether products or processes, in all fields of technology.” […] One Member of Parliament who was deeply involved in the debate, Clare Curran, quoted several heads of software firms complaining about how the patenting process allowed “obvious things” to get patented and that “in general software patents are counter-productive.” Curran quoted one developer as saying, “It’s near impossible for software to be developed without breaching some of the hundreds of thousands of patents granted around the world for obvious work.” “These are the heavyweights of the new economy in software development,” said Curran. “These are the people that needed to be listened to, and thankfully, they were.”
(tags: new-zealand nz patents swpats law trips ip software-patents yay)
-
Docker is to deployment as Git is to development. Developers are able to leverage Git’s performance and flexibility when building applications. Git encourages experiments and doesn’t punish you when things go wrong: start your experiments in a branch, if things fall down, just git rebase or git reset. It’s easy to start a branch and fast to push it. Docker encourages experimentation for operations. Containers start quickly. Building images is a snap. Using another images as a base image is easy. Deploying whole images is fast, and last but not least, it’s not painful to rollback. Fast + flexible = deployments are about to become a lot more enjoyable.
(tags: docker deployment sysadmin ops devops vms vagrant virtualization containers linux git)
-
How LinkedIn solved a tricky graph-database-query latency problem with a set-cover algorithm
(tags: linkedin algorithms coding distributed-systems graph databases querying set-cover set replication)
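For anyone who hasn't met set cover before, here is a minimal sketch of the textbook greedy approximation in Python. The replica names and their contents are made up for illustration, and this is the generic greedy algorithm, not necessarily the variant LinkedIn describes.

```python
def greedy_set_cover(universe, subsets):
    """Pick subsets until every element of `universe` is covered.

    `subsets` maps a (hypothetical) replica name to the set of items it holds.
    Returns the chosen names in the order they were picked.
    """
    uncovered = set(universe)
    chosen = []
    while uncovered:
        # Choose the subset covering the most still-uncovered items;
        # sorting the names first makes tie-breaking deterministic.
        best = max(sorted(subsets), key=lambda name: len(subsets[name] & uncovered))
        gained = subsets[best] & uncovered
        if not gained:
            raise ValueError("universe cannot be covered by the given subsets")
        chosen.append(best)
        uncovered -= gained
    return chosen


# Hypothetical data: which items each replica can answer for.
replicas = {
    "replica-a": {1, 2, 3},
    "replica-b": {3, 4},
    "replica-c": {4, 5, 6},
    "replica-d": {1, 6},
}
cover = greedy_set_cover(range(1, 7), replicas)
```

The greedy choice is not guaranteed to be optimal, but it is simple and cheap, which matters when the point is to cut query latency by contacting fewer machines.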
How might the feds have snooped on Lavabit?
“I have been told that they cannot change your fundamental business practices,” said Callas, who unlike Levison was able to say SilentCircle has received no NSLs or court orders of any kind. “I presume that would mean things like getting SSL keys because that would mean they could impersonate your servers. That would be like setting up a store front that says your business name and putting [government agents] in your company uniforms.” Similarly, he added: “They cannot make changes to existing operating systems. They can’t make you change source code.” To which [Lavabit’s] Levison replied: “That was always my understanding, too. That’s why this is so important. Like [Callas] at SilentCircle said, the assumption has been that the government can’t force us to change our business practices like that and compromise that information. Like I said, I don’t hold those beliefs anymore.”
(tags: ars-technica security privacy nsls ssl silentcircle jon-callas crypto)
Lock-Based vs Lock-Free Concurrent Algorithms
An excellent post from Martin Thompson showing a new JSR166 concurrency primitive, StampedLock, compared against a number of alternatives in a simple microbenchmark. The most interesting thing for me is how much the lock-free, AtomicReference.compareAndSet()-based approach blows away all the lock-based approaches — even in the 1-reader-1-writer case. Its code is extremely simple, too: https://github.com/mjpt777/rw-concurrency/blob/master/src/LockFreeSpaceship.java
(tags: concurrency java threads lock-free locking compare-and-set cas atomic jsr166 microbenchmarks performance)
-
This is super-cool. ‘Network engineering no longer should be mundane tasks like conf, set interfaces fe-0/0/0 unit 0 family inet address 10.1.1.1/24. How does mindless CLI work translate to efficiently spent time? What if you need to change 300 devices? What if you are writing it by hand? An error-prone waste of time. Juniper today announced Puppet support for their 12.2R3,5 JUNOS code. This is compatible with EX4200, EX4550, and QFX3500 switches. These are top end switches, but this start is directly aimed at their DC and enterprise devices. Initially, the manifest interactions offered are interface, layer 2 interface, vlan, port aggregation groups, and device names.’ Based on what I saw in the Network Automation team in Amazon, this is an amazing leap forward; it’d instantly render obsolete a bunch of horrific SSH-CLI automation cruft.
(tags: ssh cli automation networking networks puppet ops juniper cisco)
-
The future of the AWS command line tools is awscli, a single, unified, consistent command line tool that works with almost all of the AWS services. Here is a quick list of the services that awscli currently supports: Auto Scaling, CloudFormation, CloudSearch, CloudWatch, Data Pipeline, Direct Connect, DynamoDB, EC2, ElastiCache, Elastic Beanstalk, Elastic Transcoder, ELB, EMR, Identity and Access Management, Import/Export, OpsWorks, RDS, Redshift, Route 53, S3, SES, SNS, SQS, Storage Gateway, Security Token Service, Support API, SWF, VPC. Support for the following appears to be planned: CloudFront, Glacier, SimpleDB. The awscli software is being actively developed as an open source project on Github, with a lot of support from Amazon. You’ll note that the biggest contributors to awscli are Amazon employees with Mitch Garnaat leading. Mitch is also the author of boto, the amazing Python library for AWS.
-
Absolute genius from The Onion.
Those of us watching on Google Analytics saw the number of homepage visits skyrocket the second we put up that salacious image of Miley Cyrus dancing half nude on the VMA stage. But here’s where it gets great: We don’t just do a top story on the VMA performance and call it a day. No, no. We also throw in a slideshow called “Evolution of Miley,” which, for those of you who don’t know, is just a way for you to mindlessly click through 13 more photos of Miley Cyrus. And if we get 500,000 of you to do that, well, 500,000 multiplied by 13 means we can get 6.5 million page views on that slideshow alone. Throw in another slideshow titled “6 ‘don’t miss’ VMA moments,” and it’s starting to look like a pretty goddamned good Monday, numbers-wise. Also, there are two videos — one of the event and then some bullshit two-minute clip featuring our “entertainment experts” talking about the performance. Side note: Advertisers, along with you idiots, love videos. Another side note: The Miley Cyrus story was in the same top spot we used for our 9/11 coverage.
(tags: humor journalism cnn miley-cyrus vma news funny advertising ads)
Why wireless mesh networks won’t save us from censorship
I’m not saying mesh networks don’t work ever; the people in the wireless mesh community I’ve met are all great people doing fantastic work. What I am saying is that unplanned wireless mesh networks never work at scale. I think it’s a great problem to think about, but in terms of actual allocation of time and resources I think there are other, more fruitful avenues of action to fight Internet censorship.
(via Kragen)
(tags: wireless censorship internet networking mesh mesh-networks organisation scaling wifi)
Information on Google App Engine’s recent US datacenter relocations – Google Groups
or, really, ‘why we had some glitches and outages recently’. A few interesting tidbits about GAE innards though (via Bill De hOra)
(tags: gae google app-engine outages ops paxos eventual-consistency replication storage hrd)
Newest YouTube user to fight a takedown is copyright guru Lawrence Lessig
This is lovely. Here’s hoping it provides a solid precedent.
Illegitimate or simply unnecessary copyright claims are, unfortunately, commonplace in the Internet era. But if there’s one person who’s probably not going to back down from a claim of copyright infringement, it’s Larry Lessig, one of the foremost writers and thinkers on digital-age copyright. [..] If Liberation Music was thinking they’d have an easy go of it when they demanded that YouTube take down a 2010 lecture of Lessig’s entitled “Open,” they were mistaken. Lessig has teamed up with the Electronic Frontier Foundation to sue Liberation, claiming that its overly aggressive takedown violates the DMCA and that it should be made to pay damages.
(tags: liberation-music eff copyright law larry-lessig fair-use)
-
Great account from Cliff Click describing an interesting edge-case risk of using TCP without application-level acking, and how it caused a messy intermittent bug in production.
In all these failures the common theme is that the receiver is very heavily loaded, with many hundreds of short-lived TCP connections being opened/read/closed every second from many other machines. The sender sends a ‘SYN’ packet, requesting a connection. The sender (optimistically) sends 1 data packet; optimistic because the receiver has yet to acknowledge the SYN packet. The receiver, being much overloaded, is very slow. Eventually the receiver returns a ‘SYN-ACK’ packet, acknowledging both the open and the data packet. At this point the receiver’s JVM has not been told about the open connection; this work is all opening at the OS layer alone. The sender, being done, sends a ‘FIN’ which it does NOT wait for acknowledgement (all data has already been acknowledged). The receiver, being heavily overloaded, eventually times-out internally (probably waiting for the JVM to accept the open-call, and the JVM being overloaded is too slow to get around to it) – and sends a RST (reset) packet back…. wiping out the connection and the data. The sender, however, has moved on – it already sent a FIN & closed the socket, so the RST is for a closed connection. Net result: sender sent, but the receiver reset the connection without informing either the JVM process or the sender.
(tags: tcp protocols SO_LINGER FIN RST connections cliff-click ip)
The ultimate SO_LINGER page, or: why is my tcp not reliable
If we look at the HTTP protocol, there data is usually sent with length information included, either at the beginning of an HTTP response, or in the course of transmitting information (so called ‘chunked’ mode). And they do this for a reason. Only in this way can the receiving end be sure it received all information that it was sent. Using the shutdown() technique above really only tells us that the remote closed the connection. It does not actually guarantee that all data was received correctly by program B. The best advice is to send length information, and to have the remote program actively acknowledge that all data was received.
(tags: SO_LINGER sockets tcp ip networking linux protocols shutdown FIN RST)
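The advice in that last quote (send length information, and have the remote end actively acknowledge receipt) can be sketched in a few lines of Python. The 4-byte length prefix and the `OK` reply are assumptions made up for this example, not part of any protocol mentioned in either article.

```python
import socket
import struct
import threading

HEADER = struct.Struct("!I")  # 4-byte big-endian length prefix (assumed framing)
ACK = b"OK"                   # hypothetical application-level acknowledgement


def recv_exact(sock, n):
    """Read exactly n bytes, or raise if the peer closes early."""
    buf = bytearray()
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed before all data arrived")
        buf.extend(chunk)
    return bytes(buf)


def send_message(sock, payload):
    """Send a length-prefixed payload, then block until the receiver acks it."""
    sock.sendall(HEADER.pack(len(payload)) + payload)
    if recv_exact(sock, len(ACK)) != ACK:
        raise ConnectionError("receiver did not acknowledge the message")


def recv_message(sock):
    """Receive one length-prefixed payload, then acknowledge it."""
    (length,) = HEADER.unpack(recv_exact(sock, HEADER.size))
    payload = recv_exact(sock, length)
    sock.sendall(ACK)
    return payload


# Demo over a local socket pair; the receiver runs in its own thread.
sender, receiver = socket.socketpair()
received = []
worker = threading.Thread(target=lambda: received.append(recv_message(receiver)))
worker.start()
send_message(sender, b"hello")  # returns only once the ack has come back
worker.join()
sender.close()
receiver.close()
```

The point is that `send_message` does not return until the receiving program itself, not just its kernel, has confirmed it read the whole payload, so a reset like the one in Cliff Click's account would show up as an error on the sender instead of passing silently.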
NZ police affidavits show use of PRISM for surveillance of Kim “Megaupload” Dotcom
The discovery was made by blogger Keith Ng who wrote on his On Point blog (http://publicaddress.net/onpoint/ich-bin-ein-cyberpunk/) that the Organised and Financial Crime Agency New Zealand (OFCANZ) requested assistance from the Government Communications Security Bureau (GCSB), the country’s signals intelligence unit, which is in charge of surveilling the Pacific region under the Five-Eyes agreement. A list of so-called selectors or search terms were provided to GCSB by the police [PDF, redacted] for the surveillance of emails and other data traffic generated by Dotcom and his Megaupload associates. ‘Selectors’ is the term used for the National Security Agency (NSA) XKEYSCORE categorisation system that Australia and New Zealand contribute to and which was leaked by Edward Snowden as part of his series of PRISM revelations. Some “selectors of interest” have been redacted out, but others such as Kim Dotcom’s email addresses, the mail proxy server used for some of the accounts and websites, remain in the documents.
So to recap: police investigating an entirely non-terrorism-related criminal case in NZ were given access to live surveillance traffic for surveillance of an NZ citizen. Scary stuff.
(tags: surveillance prism nsa new-zealand xkeyscore gcsb kim-dotcom piracy privacy data-retention megaupload filesharing)
“Scalable Eventually Consistent Counters over Unreliable Networks” [paper, pdf]
Counters are an important abstraction in distributed computing, and play a central role in large scale geo-replicated systems, counting events such as web page impressions or social network “likes”. Classic distributed counters, strongly consistent, cannot be made both available and partition-tolerant, due to the CAP Theorem, being unsuitable to large scale scenarios. This paper defines Eventually Consistent Distributed Counters (ECDC) and presents an implementation of the concept, Handoff Counters, that is scalable and works over unreliable networks. By giving up the sequencer aspect of classic distributed counters, ECDC implementations can be made AP in the CAP design space, while retaining the essence of counting. Handoff Counters are the first CRDT (Conflict-free Replicated Data Type) based mechanism that overcomes the identity explosion problem in naive CRDTs, such as G-Counters (where state size is linear in the number of independent actors that ever incremented the counter), by managing identities towards avoiding global propagation, and garbage collecting temporary entries. The approach used in Handoff Counters is not restricted to counters, being more generally applicable to other data types with associative and commutative operations.
(tags: pdf papers eventual-consistency counters distributed-systems distcomp cap-theorem ecdc handoff-counters crdts data-structures g-counters)
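To make the "state size is linear in the number of actors" remark concrete, here is a minimal Python sketch of the naive G-Counter the abstract mentions. This is the baseline CRDT only, not the paper's Handoff Counters, and the class and actor names are my own.

```python
class GCounter:
    """Grow-only counter: one slot per actor, merged by element-wise max."""

    def __init__(self, actor_id):
        self.actor_id = actor_id
        self.counts = {}  # actor id -> increments seen from that actor

    def increment(self, amount=1):
        if amount < 0:
            raise ValueError("a G-Counter can only grow")
        self.counts[self.actor_id] = self.counts.get(self.actor_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Taking the max per actor makes merge commutative, associative and
        # idempotent, so replicas converge whatever the delivery order.
        for actor, count in other.counts.items():
            self.counts[actor] = max(self.counts.get(actor, 0), count)


a = GCounter("node-a")
b = GCounter("node-b")
a.increment()
a.increment()
b.increment(3)

a.merge(b)
b.merge(a)
a.merge(b)  # a repeated merge changes nothing
```

Every actor that ever increments gets a permanent entry in `counts`, which is the identity explosion problem the paper sets out to avoid.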
LMDB response to a LevelDB-comparison blog post
This seems like a good point to note about LMDB in general:
We state quite clearly that LMDB is read-optimized, not write-optimized. I wrote this for the OpenLDAP Project; LDAP workloads are traditionally 80-90% reads. Write performance was not the goal of this design, read performance is. We make no claims that LMDB is a silver bullet, good for every situation. It’s not meant to be – but it is still far better at many things than all of the other DBs out there that *do* claim to be good for everything.
How to avoid crappy ISP caches when viewing YouTube video
Must give this a try when I get home — I frequently have latency problems watching YT on my UPC connection, and I bet they have a crappily-managed, overloaded cache box on their network.
(tags: streaming youtube caching isps caches firewalls iptables hacks video networking)
How to configure ntpd so it will not move time backwards
The “-x” switch will expand the step/slew boundary from 128ms to 600 seconds, ensuring the time is slewed (drifted slowly towards the correct time at a max of 0.5ms per second) rather than “stepped” (a sudden jump, potentially backwards). Since slewing has a max of 0.5ms per second, time can never “jump backwards”, which is important to avoid some major application bugs (particularly in Java timers).
(tags: ntpd time ntp ops sysadmin slew stepping time-synchronization linux unix java bugs)
-
‘a Java port of Twitter’s Snowflake thrift service presented as an HTTP-based Dropwizard service’.
an HTTP-based service for generating unique ID numbers at high scale with some simple guarantees. supports returning ID numbers as: JSON and JSONP; Google’s Protocol Buffers; Plain text. At GE, we were more interested in the uncoordinated aspects of Snowflake than its throughput requirements, so HTTP was fine for our needs. We also exposed the core of Snowflake as an embeddable module so it can be directly integrated into our applications. We don’t have the guarantees that the Snowflake-Zookeeper integration was providing, but that was also acceptable to us. In places where we really needed high throughput, we leveraged the snowizard-core embeddable module directly.
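A rough Python sketch of the uncoordinated ID scheme, assuming the bit layout commonly described for Twitter's Snowflake (41-bit millisecond timestamp, 10-bit worker ID, 12-bit sequence). I have not checked this against Snowizard's source, and the epoch constant and class name are assumptions for illustration.

```python
import threading
import time

EPOCH_MS = 1_288_834_974_657  # custom epoch in ms; an assumption for this sketch
WORKER_BITS = 10
SEQUENCE_BITS = 12
MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1


class IdGenerator:
    """Issues roughly time-ordered 64-bit IDs without any coordination."""

    def __init__(self, worker_id, clock=lambda: int(time.time() * 1000)):
        if not 0 <= worker_id < (1 << WORKER_BITS):
            raise ValueError("worker_id out of range")
        self.worker_id = worker_id
        self.clock = clock  # injectable so the behaviour can be tested
        self.last_ms = -1
        self.sequence = 0
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            now = self.clock()
            if now < self.last_ms:
                raise RuntimeError("clock moved backwards; refusing to issue IDs")
            if now == self.last_ms:
                self.sequence = (self.sequence + 1) & MAX_SEQUENCE
                if self.sequence == 0:
                    # Sequence exhausted for this millisecond: spin until the next.
                    while now <= self.last_ms:
                        now = self.clock()
            else:
                self.sequence = 0
            self.last_ms = now
            return (
                ((now - EPOCH_MS) << (WORKER_BITS + SEQUENCE_BITS))
                | (self.worker_id << SEQUENCE_BITS)
                | self.sequence
            )


generator = IdGenerator(worker_id=7)
first, second = generator.next_id(), generator.next_id()
```

Uniqueness across machines rests entirely on each one being given a distinct `worker_id`, which is the part the Zookeeper integration looked after in the original.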
Odd OSS license, though. BSDish?
Containers and Docker: How Secure Are They?
pretty extensive article. (via Tony Finch)
(tags: via:fanf security containerization docker containers lxc linux ops)
-
I loved doing Groklaw, and I believe we really made a significant contribution. But even that turns out to be less than we thought, or less than I hoped for, anyway. My hope was always to show you that there is beauty and safety in the rule of law, that civilization actually depends on it. How quaint. If you have to stay on the Internet, my research indicates that the short term safety from surveillance, to the degree that is even possible, is to use a service like Kolab for email, which is located in Switzerland, and hence is under different laws than the US, laws which attempt to afford more privacy to citizens. I have now gotten for myself an email there, p.jones at mykolab.com in case anyone wishes to contact me over something really important and feels squeamish about writing to an email address on a server in the US. But both emails still work. It’s your choice. My personal decision is to get off of the Internet to the degree it’s possible. I’m just an ordinary person. But I really know, after all my research and some serious thinking things through, that I can’t stay online personally without losing my humanness, now that I know that ensuring privacy online is impossible. I find myself unable to write. I’ve always been a private person. That’s why I never wanted to be a celebrity and why I fought hard to maintain both my privacy and yours. Oddly, if everyone did that, leap off the Internet, the world’s economy would collapse, I suppose. I can’t really hope for that. But for me, the Internet is over. So this is the last Groklaw article. I won’t turn on comments. Thank you for all you’ve done. I will never forget you and our work together. I hope you’ll remember me too. I’m sorry I can’t overcome these feelings, but I yam what I yam, and I tried, but I can’t.
(tags: nsa surveillance privacy groklaw law us-politics data-protection snooping mail kolab)
Nelson’s Weblog: tech / bad / failure-of-encryption
One of the great failures of the Internet era has been giving up on end-to-end encryption. PGP dates back to 1991, 22 years ago. It gave us the technical means to have truly secure email between two people. But it was very difficult to use. And in 22 years no one has ever meaningfully made email encryption really usable. […] We do have SSL/HTTPS, the only real end-to-end encryption most of us use daily. But the key distribution is hopelessly centralized, authority rooted in 40+ certificates. At least 4 of those certs have been compromised by blackhat hackers in the past few years. How many more have been subverted by government agencies? I believe the SSL Observatory is the only way we’d know.
We do also have SSH. Maybe more services need to adopt that model?
(tags: ssh ssl tls pki crypto end-to-end pgp security surveillance)
-
a new, and interesting, sketching algorithm, with a Java implementation:
Recordinality is unique in that it provides cardinality estimation like HLL, but also offers “distinct value sampling.” This means that Recordinality can allow us to fetch a random sample of distinct elements in a stream, invariant to cardinality. Put more succinctly, given a stream of elements containing 1,000,000 occurrences of ‘A’ and one occurrence each of ‘B’ – ‘Z’, the probability of any letter appearing in our sample is equal. Moreover, we can also efficiently store the number of times elements in our distinct sample have been observed. This can help us to understand the distribution of occurrences of elements in our stream. With it, we can answer questions like “do the elements we’ve sampled present in a power law-like pattern, or is the distribution of occurrences relatively even across the set?”
(tags: sketching coding algorithms recordinality cardinality estimation hll hashing murmurhash java)
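The "distinct value sampling" half of that description can be sketched in Python as follows. This only keeps the k distinct items with the largest hash values, plus their occurrence counts; it leaves out Recordinality's cardinality estimator, uses BLAKE2b from the standard library where the linked implementation would use its own hash, and assumes the stream items are strings.

```python
import hashlib
import heapq


def _hash64(item):
    """Stable 64-bit hash of a string item (BLAKE2b, chosen for portability)."""
    digest = hashlib.blake2b(item.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def distinct_sample(stream, k):
    """Return {item: occurrences} for the k distinct items with the largest hashes."""
    heap = []    # min-heap of (hash, item); heap[0] is the weakest sample member
    counts = {}  # occurrences seen for items currently in the sample
    for item in stream:
        if item in counts:
            counts[item] += 1
            continue
        h = _hash64(item)
        if len(heap) < k:
            heapq.heappush(heap, (h, item))
            counts[item] = 1
        elif h > heap[0][0]:
            # The newcomer beats the weakest member, which is evicted for good:
            # the entry threshold only ever rises, so it can never return.
            _, evicted = heapq.heapreplace(heap, (h, item))
            del counts[evicted]
            counts[item] = 1
    return counts


letters = list("BCDEFGHIJKLMNOPQRSTUVWXYZ")
stream = ["A"] * 1000 + letters
sample = distinct_sample(stream, 5)
```

Whether an item makes the sample depends only on its hash, never on how often it occurs, so `A` has the same chance as any other letter. An item in the sample has also been there since its first occurrence, which is why the stored counts are exact.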
-
A fantastic infographic explaining Australia’s Preferential Voting system, featuring Dennis the Election Koala and Ken the Voting Dingo
(tags: infographics funny pr voting australia images via:fp)
-
The man was unmoved. And so one of the more bizarre moments in the Guardian’s long history occurred – with two GCHQ security experts overseeing the destruction of hard drives in the Guardian’s basement just to make sure there was nothing in the mangled bits of metal which could possibly be of any interest to passing Chinese agents. “We can call off the black helicopters,” joked one as we swept up the remains of a MacBook Pro. Whitehall was satisfied, but it felt like a peculiarly pointless piece of symbolism that understood nothing about the digital age. We will continue to do patient, painstaking reporting on the Snowden documents, we just won’t do it in London. The seizure of Miranda’s laptop, phones, hard drives and camera will similarly have no effect on Greenwald’s work. The state that is building such a formidable apparatus of surveillance will do its best to prevent journalists from reporting on it. Most journalists can see that. But I wonder how many have truly understood the absolute threat to journalism implicit in the idea of total surveillance, when or if it comes – and, increasingly, it looks like “when”. We are not there yet, but it may not be long before it will be impossible for journalists to have confidential sources. Most reporting – indeed, most human life in 2013 – leaves too much of a digital fingerprint. Those colleagues who denigrate Snowden or say reporters should trust the state to know best (many of them in the UK, oddly, on the right) may one day have a cruel awakening. One day it will be their reporting, their cause, under attack. But at least reporters now know to stay away from Heathrow transit lounges.
(tags: nsa gchq surveillance spying snooping guardian reporters journalism uk david-miranda glenn-greenwald edward-snowden)
-
‘Sovereign is a set of Ansible playbooks that you can use to build and maintain’ your own GMail/Google calendar/etc. on a VPS. Some up-to-date hosting tips, basically
New Tweets per second record, and how | Twitter Blog
How Twitter scaled up massively in 3 years — replacing Ruby with the JVM, adopting SOA and custom sharding. Good summary post, looking forward to more techie details soon
(tags: twitter performance scalability jvm ruby soa scaling)
Massive Overblocking Hits Hundreds Of UK Sites | Techdirt
Customers of UK ISPs Virgin Media and Be Broadband found they were unable to access hundreds of sites, including the Radio Times and Zooniverse, due to a secret website-blocking court order from the Premier League. PC Pro believe that 3 other ISPs’ customers were also affected. According to customers’ reverse-engineering, it looks like the court order incorrectly demanded the blocking of “http-redirection-a.dnsmadeeasy.com”, an HTTP redirector operated by the DNS operator DNSMadeEasy.
The fact that the court could issue an order which didn’t see this coming and that the ISPs would act on it without checking that what they were doing was sensible is, in my opinion, extremely worrying.
(tags: overblocking censorship org uk sky be-broadband virgin-media dnsmadeeasy filtering premier-league false-positives isps)
Beating the CAP Theorem Checklist
‘Your ( ) tweet ( ) blog post ( ) marketing material ( ) online comment advocates a way to beat the CAP theorem. Your idea will not work. Here is why it won’t work:’ lovely stuff, via Bill De hOra
(tags: via:dehora funny cap cs distributed-systems distcomp networking partitions state checklists)
‘Sparrow: Scalable Scheduling for Sub-Second Parallel Jobs’ [tech report]
(tags: scheduling sparrow load-balancing algorithms distributed-systems distcomp papers)
From derelict to delightful: Art Tunnel Smithfield
I do like the Art Tunnel. Smithfield is a great demo of reclaiming Dublin’s increasing dereliction and I hope the DCC allow this to continue
(tags: smithfield d7 dublin ireland art art-tunnel reclamation derelict economy dcc)
How A ‘Deviant’ Philosopher Built Palantir, A CIA-Funded Data-Mining Juggernaut – Forbes
Palantir — the free-market state-surveillance data-retention nightmare. At the end of this slightly overenthusiastic puff piece we get to:
Katz-Lacabe wasn’t impressed. Palantir’s software, he points out, has no default time limits — all information remains searchable for as long as it’s stored on the customer’s servers. And its auditing function? “I don’t think it means a damn thing,” he says. “Logs aren’t useful unless someone is looking at them.” […] What if Palantir’s audit logs — its central safeguard against abuse — are simply ignored? Karp responds that the logs are intended to be read by a third party. In the case of government agencies, he suggests an oversight body that reviews all surveillance — an institution that is purely theoretical at the moment. “Something like this will exist,” Karp insists. “Societies will build it, precisely because the alternative is letting terrorism happen or losing all our liberties.” Palantir’s critics, unsurprisingly, aren’t reassured by Karp’s hypothetical court. Electronic Privacy Information Center activist Amie Stepanovich calls Palantir “naive” to expect the government to start policing its own use of technology. The Electronic Frontier Foundation’s Lee Tien derides Karp’s argument that privacy safeguards can be added to surveillance systems after the fact. “You should think about what to do with the toxic waste while you’re building the nuclear power plant,” he argues, “not some day in the future.”
(tags: palantir data-retention privacy surveillance state cia forbes andy-greenberg eff epic snooping)
London orders rubbish bins to stop collecting smartphone data
Good call.
AUTHORITIES IN LONDON’S financial district have ordered a company using high-tech rubbish bins to collect smartphone data from passers-by to cease its activities, and referred the firm to the privacy watchdog. The City of London Corporation, which manages the so-called “Square Mile” around St Paul’s Cathedral, said such data collection “needs to stop” until there could be a public debate about it.
(via Daragh O’Brien)
(tags: via:dobrien privacy phones wifi mac-address data-protection data-retention renew london bins snooping sniffing)
The Irish State wishes to uninvent computers with new FOI Bill
Mark Coughlan noticed this:
The FOI body shall take reasonable steps to search for and extract the records to which the request relates, having due regard to the steps that would be considered reasonable if the records were held in paper format.
In other words, pretend that computerised database technology, extant since the 1960s, does not exist. Genius. (via Simon McGarr)
(tags: funny irish ireland foi open-data freedom computerisation punch-cards paper databases)
Hamlet is Banned in the British Library
Pretty hilarious account of the usual, run-of-the-mill overblocking in the British Library from last weekend:
I asked [the information desk] if they saw the problem, perhaps just the symbolism, of Hamlet being banned in the British Library. They shrugged. The IT department said there was nothing to be done, as it was only the British Library’s wifi service that was blocking Hamlet, and the British Library’s wifi service, they seemed sure, had nothing to do with the British Library. They were merely ships that passed in the night. Children crying to each other from either bank of an uncrossable river.
(tags: censorship filters overblocking hamlet shakespeare literature funny sad british-library blocking)
The algorithm for a perfectly balanced photo gallery – Summit Stories from Crispy Mountain
Nice application of a partitioning exhaustive search algorithm using dynamic programming (via Tom)
(tags: algorithms javascript python dynamic-programming partitioning images gallery)
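For the curious, here is a rough sketch of the kind of dynamic-programming partitioning involved: split a sequence of weights into k contiguous groups so that the heaviest group is as light as possible. For a gallery, the weights would presumably be something like image aspect ratios and k the number of rows, but that mapping is my assumption, and the function name `linear_partition` is just illustrative rather than anything taken from the linked post.

```python
from functools import lru_cache
from itertools import accumulate


def linear_partition(weights, k):
    """Split weights into at most k contiguous groups, minimising the largest group sum."""
    n = len(weights)
    if n == 0:
        return []
    k = min(k, n)  # cannot have more non-empty groups than items
    prefix = [0] + list(accumulate(weights))  # prefix[i] = sum of the first i weights

    @lru_cache(maxsize=None)
    def best(i, groups):
        # Best (cost, split) for the first i items in `groups` groups;
        # `split` is the index at which the last group starts.
        if groups == 1:
            return prefix[i], 0
        best_cost, best_split = None, None
        for j in range(groups - 1, i):
            cost = max(best(j, groups - 1)[0], prefix[i] - prefix[j])
            if best_cost is None or cost < best_cost:
                best_cost, best_split = cost, j
        return best_cost, best_split

    # Walk the split points backwards to recover the actual groups.
    rows, i, g = [], n, k
    while g >= 1:
        _, j = best(i, g)
        rows.append(list(weights[j:i]))
        i, g = j, g - 1
    rows.reverse()
    return rows
```

With `linear_partition([1, 2, 3, 4, 5, 6, 7, 8, 9], 3)` this gives `[[1, 2, 3, 4, 5], [6, 7], [8, 9]]`, where the heaviest group sums to 17.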
-
An amazing Soviet map of the US economy from 1979. Wonderful piece of cold war memorabilia
(tags: cold-war ussr usa mapping maps soviet economy memorabilia)
Randomly Failed! The State of Randomness in Current Java Implementations
This would appear to be the paper which sparked off the drama around BitCoin thefts from wallets generated on Android devices:
The SecureRandom PRNG is the primary source of randomness for Java and is used e.g., by cryptographic operations. This underlines its importance regarding security. Some of fallback solutions of the investigated implementations [are] revealed to be weak and predictable or capable of being influenced. Very alarming are the defects found in Apache Harmony, since it is partly used by Android.
More on the BitCoin drama: https://bitcointalk.org/index.php?topic=271486.40 , http://bitcoin.org/en/alert/2013-08-11-android
(tags: android java prng random security bugs apache-harmony apache crypto bitcoin papers)
The Getty Museum offers a huge chunk of their collection for free use
We’ve launched the Open Content Program to share, freely and without restriction, as many of the Getty’s digital resources as possible. The initial focus of the Open Content Program is to make available all images of public domain artworks in the Getty’s collections. Today we’ve taken a first step toward this goal by making roughly 4,600 high-resolution images of the Museum’s collection free to use, modify, and publish for any purpose. Why open content? Why now? The Getty was founded on the conviction that understanding art makes the world a better place, and sharing our digital resources is the natural extension of that belief. This move is also an educational imperative. Artists, students, teachers, writers, and countless others rely on artwork images to learn, tell stories, exchange ideas, and feed their own creativity. In its discussion of open content, the most recent Horizon Report, Museum Edition stated that “it is now the mark—and social responsibility—of world-class institutions to develop and share free cultural and educational resources.” I agree wholeheartedly.
(tags: getty art via:tupp_ed open-content free images pictures paintings museums)
The NSA Is Commandeering the Internet – Bruce Schneier
You, an executive in one of those companies, can fight. You’ll probably lose, but you need to take the stand. And you might win. It’s time we called the government’s actions what it really is: commandeering. Commandeering is a practice we’re used to in wartime, where commercial ships are taken for military use, or production lines are converted to military production. But now it’s happening in peacetime. Vast swaths of the Internet are being commandeered to support this surveillance state. If this is happening to your company, do what you can to isolate the actions. Do you have employees with security clearances who can’t tell you what they’re doing? Cut off all automatic lines of communication with them, and make sure that only specific, required, authorized acts are being taken on behalf of government. Only then can you look your customers and the public in the face and say that you don’t know what is going on — that your company has been commandeered.
(tags: nsa america politics privacy data-protection data-retention law google microsoft security bruce-schneier)
We are the Operations team at Etsy. Ask us anything! : IAmA
Great AMA from Etsy ops staff (via Nelson)
(tags: etsy reddit devops ops architecture ama via:nelson)
Building a panopticon: The evolution of the NSA’s XKeyscore
This is an amazing behind-the-scenes look at the architecture of XKeyscore, and how it evolved from an earlier large-scale packet interception system, Narus’ Semantic Traffic Analyzer. XKeyscore is a federated, distributed system, with distributed packet-capture agents running on Linux, built with protocol-specific plugins, which write 3 days of raw packet data, and 30 days of intercept metadata, to local buffer stores. Central queries are then ‘distributed across all of the XKeyscore tap sites, and any results are returned and aggregated’. Dunno about you, but this is pretty much how I would have built something like this, IMO….
(tags: panopticon xkeyscore nsa architecture scalability packet-capture narus sniffing snooping interception lawful-interception li tapping)
Police may block recording with Apple patent
Creeptastic, Apple.
Apple has patented a piece of technology which would allow government and police to block transmission of information, including video and photographs, from any public gathering or venue they deem “sensitive”, and “protected from externalities.” In other words, these powers will have control over what can and cannot be documented on wireless devices during any public event. And while the company says the affected sites are to be mostly cinemas, theaters, concert grounds and similar locations, Apple Inc. also says “covert police or government operations may require complete ‘blackout’ conditions.”
(tags: apple iphone via:devore creepy police photos recording remote-control phones blackout)
Ivan Ristić: Defending against the BREACH attack
One interesting response to this HTTPS compression-based MITM attack:
The award for least-intrusive and entirely painless mitigation proposal goes to Paul Querna who, on the httpd-dev mailing list, proposed to use the HTTP chunked encoding to randomize response length. Chunked encoding is a HTTP feature that is typically used when the size of the response body is not known in advance; only the size of the next chunk is known. Because chunks carry some additional information, they affect the size of the response, but not the content. By forcing more chunks than necessary, for example, you can increase the length of the response. To the attacker, who can see only the size of the response body, but not anything else, the chunks are invisible. (Assuming they’re not sent in individual TCP packets or TLS records, of course.) This mitigation technique is very easy to implement at the web server level, which makes it the least expensive option. There is only a question about its effectiveness. No one has done the maths yet, but most seem to agree that response length randomization slows down the attacker, but does not prevent the attack entirely. But, if the attack can be slowed down significantly, perhaps it will be as good as prevented.
(tags: mitm attacks hacking security compression http https protocols tls ssl tcp chunked-encoding apache)
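A quick sketch of the chunked-encoding idea, to make it concrete: encode a body with randomly sized chunks so the on-the-wire length varies between otherwise identical responses, while the decoded content stays the same. The function names and the `max_chunk` parameter are made up for illustration; this is not Paul Querna's proposal as written, nor anything httpd actually does, and as the quote says, nobody has shown how much this really slows an attacker down.

```python
import random

# Assumption: a CSPRNG-backed generator, so chunk sizes are hard to predict.
_rng = random.SystemRandom()


def chunk_randomly(body: bytes, max_chunk: int = 16) -> bytes:
    """Encode body using HTTP/1.1 chunked transfer coding with random chunk sizes."""
    out = bytearray()
    pos = 0
    while pos < len(body):
        size = _rng.randint(1, max_chunk)
        piece = body[pos:pos + size]
        # Each chunk is: hex length, CRLF, data, CRLF.
        out += b"%x\r\n" % len(piece) + piece + b"\r\n"
        pos += len(piece)
    out += b"0\r\n\r\n"  # zero-length chunk terminates the body
    return bytes(out)


def dechunk(encoded: bytes) -> bytes:
    """Decode a chunked body (no chunk extensions or trailers handled)."""
    body = bytearray()
    pos = 0
    while True:
        eol = encoded.index(b"\r\n", pos)
        size = int(encoded[pos:eol], 16)
        if size == 0:
            return bytes(body)
        start = eol + 2
        body += encoded[start:start + size]
        pos = start + size + 2  # skip the data and its trailing CRLF
```

Encoding the same body twice will usually produce two different lengths, since the number of chunks (and so the framing overhead) differs, while `dechunk` returns the original bytes each time.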
Totoro Isn’t All Cute. For Some, He’s the God of Death.
“Everyone, do not worry,” read the Studio Ghibli statement. “There’s absolutely no truth or configuration that Totoro is the God of Death or that Mei is dead in My Neighbor Totoro.”
(tags: totoro studio-ghibli death morbid japan film movies urban-legends alternate plot)
Hogan describes bin charge increases as ‘opportunistic’ – Environmental News | The Irish Times
LOL Greyhound.
Greyhound Recycling last month announced increases of 50 cents a month for customers on a flat monthly charge, 50 cents for each black bin collection for customers who pay by the lift and two cents a kilo for customers who pay by weight only. In a letter to customers, it described the levy as “tax imposed by the Government of Ireland on the people of Ireland”. However, following a complaint to the [National Consumer Agency] that the by-weight increase was 76 per cent more than the [government landfill levy] increase, Greyhound reduced the charge to an additional one cent a kilo.
(tags: greyhound ireland dublin rubbish recycling consumer ripoffs tax)
IrelandOffline broadband availability map
Marking the locations of broadband options in your area, along with VDSL cabinets, local exchanges, and wireless ISP coverage, and the landing sites of submarine cables (presumably from submarinecablemap.com data)
(tags: irelandoffline cables network internet ireland coverage wisps vdsl broadband)
Filters ‘not a silver bullet’ that will stop perverts, warns Interpol chief – Independent.ie
Sunday Independent interview with Interpol assistant director Mick Moran:
Moran spoke out after child welfare organisations here called on the Government to follow the UK’s example by placing anti-pornography filters on Irish home broadband connections. The Irish Society for the Prevention of Cruelty to Children argued that pornography was damaging to young children and should be removed from their line of sight. But Moran warned this would only lull parents into a false sense of security. “If we imagine the access people had to porn in the past – that access is now complete and total. They have access to the most horrific material out there. We now need to focus on parental responsibility about how kids are using the internet.”
(tags: mick-moran cam interpol policing ispcc filtering parenting children broadband)
-
Gil Tene raises an extremely good point about load testing, high-percentile response-time measurement, and behaviour when testing a system under load:
I’ve been harping for a while now about a common measurement technique problem I call “Coordinated Omission” for a while, which can often render percentile data useless. […] I believe that this problem occurs extremely frequently in test results, but it’s usually hard to deduce it’s existence purely from the final data reported. But every once in a while, I see test results where the data provided is enough to demonstrate the huge percentile-misreporting effect of Coordinated Omission based purely on the summary report. I ran into just such a case in Attila’s cool posting about log4j2’s truly amazing performance, so I decided to avoid polluting his thread with an elongated discussion of how to compute 99.9%’ile data, and started this topic here. That thread should really be about how cool log4j2 is, and I’m certain that it really is cool, even after you correct the measurements. […] Basically, I think that the 99.99% observation computation is wrong, and demonstrably (using the data in the graph data posted) exhibits the classic “coordinated omission” measurement problem I’ve been preaching about. This test is not alone in exhibiting this, and there is nothing to be ashamed of when you find yourself making this mistake. I only figured it out after doing it myself many many times, and then I noticed that everyone else seems to also be doing it but most of them haven’t yet figured it out. In fact, I run into this issue so often in percentile reporting and load testing that I’m starting to wonder if coordinated omission is there in 99.9% of latency tests ;-)
(tags: measurement testing latency load-testing gil-tene coordinated-omission validity log4j percentiles)
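To illustrate the effect, here is a small sketch of one way to compensate: whenever a recorded latency is longer than the interval at which the load generator was supposed to be sending requests, back-fill the samples that would have been recorded had it kept sending on schedule. This is similar in spirit to the expected-interval correction in Gil Tene's HdrHistogram, but it is my own simplified version with hypothetical function names, not that library's code.

```python
import math


def correct_for_coordinated_omission(latencies_ms, expected_interval_ms):
    """Back-fill samples for requests that were never sent during a stall."""
    corrected = []
    for latency in latencies_ms:
        corrected.append(latency)
        # Requests that should have been sent during the stall would each
        # have waited one interval less than the one before.
        missing = latency - expected_interval_ms
        while missing >= expected_interval_ms:
            corrected.append(missing)
            missing -= expected_interval_ms
    return corrected


def percentile(samples, pct):
    """Nearest-rank percentile of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(len(ordered) * pct / 100))
    return ordered[rank - 1]


# 99 fast requests, then one 1000 ms stall, with a request due every 10 ms.
raw = [1] * 99 + [1000]
fixed = correct_for_coordinated_omission(raw, expected_interval_ms=10)
```

Here `percentile(raw, 99)` is 1 ms, while `percentile(fixed, 99)` is 990 ms: the single stall hid roughly a second's worth of requests that the naive measurement never attempted.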
Xerox scanners/photocopiers randomly alter numbers in scanned documents · D. Kriesel
Pretty major Xerox fail: photocopied/scanned docs are found to have replaced the digit ‘6’ with ‘8’, due to a poor choice of compression techniques:
Several mails I got suggest that the xerox machines use JBIG2 for compression. This algorithm creates a dictionary of image patches it finds “similar”. Those patches then get reused instead of the original image data, as long as the error generated by them is not “too high”. Makes sense. This also would explain, why the error occurs when scanning letters or numbers in low resolution (still readable, though). In this case, the letter size is close to the patch size of JBIG2, and whole “similar” letters or even letter blocks get replaced by each other.
(tags: jbig2 compression xerox photocopying scanning documents fonts arial image-compression images)
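A toy model of the failure mode described above, to show how "similar enough" patch reuse can silently swap digits. This is emphatically not real JBIG2: the 3x5 bitmaps, the Hamming-distance comparison and the `max_error` threshold are all assumptions made up for the illustration.

```python
def hamming(a, b):
    """Number of differing pixels between two equal-sized patches."""
    return sum(x != y for x, y in zip(a, b))


def compress_patches(patches, max_error):
    """Replace each patch with the index of the first 'similar' dictionary patch."""
    dictionary, indices = [], []
    for patch in patches:
        for i, known in enumerate(dictionary):
            if hamming(patch, known) <= max_error:
                indices.append(i)  # reuse an existing patch, possibly lossily
                break
        else:
            dictionary.append(patch)
            indices.append(len(dictionary) - 1)
    return dictionary, indices


def decompress(dictionary, indices):
    return [dictionary[i] for i in indices]


# 3x5 pixel glyphs, row by row; "6" and "8" differ by a single pixel.
EIGHT = "111" "101" "111" "101" "111"
SIX = "111" "100" "111" "101" "111"

lossy = decompress(*compress_patches([EIGHT, SIX], max_error=1))
lossless = decompress(*compress_patches([EIGHT, SIX], max_error=0))
```

With `max_error=1` the decompressed page reads `[EIGHT, EIGHT]`, so the 6 has become an 8 and nothing looks visibly corrupted; with `max_error=0` the original `[EIGHT, SIX]` comes back intact.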
The 1940s origins of Whataboutery
The exchange is indicative of a rhetorical strategy known as ‘whataboutism’, which occurs when officials implicated in wrongdoing whip out a counter-example of a similar abuse from the accusing country, with the goal of undermining the legitimacy of the criticism itself. (In Latin, this rhetorical defense is called tu quoque, or “you, too.”)
(tags: history language whataboutism whataboutery politics 1940s russia ussr)
-
A highly-available key value store for shared configuration and service discovery. etcd is inspired by zookeeper and doozer, with a focus on: Simple: curl’able user facing API (HTTP+JSON); Secure: optional SSL client cert authentication; Fast: benchmarked 1000s of writes/s per instance; Reliable: Properly distributed using Raft; Etcd is written in go and uses the raft consensus algorithm to manage a highly availably replicated log.
One of the core components of CoreOS — http://coreos.com/ .
(tags: configuration distributed raft ha doozer zookeeper go replication consensus-algorithm etcd coreos)
_In Search of an Understandable Consensus Algorithm_, Diego Ongaro and John Ousterhout, Stanford
Raft is a consensus algorithm for managing a replicated log. It produces a result equivalent to Paxos, and it is as efficient as Paxos, but its structure is different from Paxos; this makes Raft more understandable than Paxos and also provides a better foundation for building practical systems. In order to enhance understandability, Raft separates the key elements of consensus, such as leader election and log replication, and it enforces a stronger degree of coherency to reduce the number of states that must be considered. Raft also includes a new mechanism for changing the cluster membership, which uses overlapping majorities to guarantee safety. Results from a user study demonstrate that Raft is easier for students to learn than Paxos.
(tags: distributed algorithms paxos raft consensus-algorithms distcomp leader-election replication clustering)
Extract from 1973 HM Treasury document concerning post-nuclear-attack responses
‘Extract from 1973 HM Treasury document concerning post-nuclear-attack monetary policy’ includes this amazing snippet:
[Contingency] …(d) a total nuclear attack employing high power missiles which would destroy all but a small percentage of the UK population and almost all physical assets or civilised life. […] As for (d), the money policy would of course be absurdly unrealistic for the few surviving administrators and politicians as they struggled to organise food and shelter for the tiny bands of surviving able-bodied and the probably larger number of sick and dying. Most of the other departments contingency planning might also be irrelevant in such a situation. Within a fairly short time the survivors would evacuate the UK and try to find some sort of life in less-effected countries (southern Ireland?).
Hey, at least they were considering these scenarios. (via Charlie Stross)
(tags: nuclear attack contingency government monetary policy uk ireland history 1960s via:cstross insane fallout)
WhatClinic.com’s zombie recruitment video. We want your brains…
BRAAAAAAINS
(tags: whatclinic braaaaaains zombies funny video recruitment)
-
A very tasty-looking guac recipe, from h2g market veteran Lily Ramirez-Foran — her family’s traditional one. I like the addition of pomegranate seeds
(tags: guacamole avocados pomegranate recipes lily-ramirez-foran food h2g)
RA Forum: Button Factory – August 14th Simonetti (Goblin) Horror Project
LIVE – for the first time ever in Ireland, Claudio Simonetti (Goblin) & band will perform the classics of horror movie scores by seminal Italian progressive rock band Goblin, Simonetti himself and possibly one or two curve-balls! Horror rock maestro Claudio Simonetti will fulfill fans’ dreams and nightmares as the band perform the notably eerie soundtracks from Suspiria, Tenebre, Dawn of the Dead, Creepers, Demons and more! This epic show will also feature an intense A/V screening element featuring the electric scenes from some of these revered classics of horror and giallo.
Python Infrastructure Status – SSL Verification Errors on PyPI
There appears to be a problem affecting a number of users where SSL verification errors will be shown saying “pypi.python.org” does not match “addvocate.com”. As Best we can tell this appears to be related to the ISP. It seems to be affecting folks using O2 or O2 related companies. We’ve also reports of it affecting people using Free. Cause appears to be one of the IP addresses returned in the Geo DNS for Europe returning a certificate for addvocate.com. It’s not clear at this time *why* that IP address is returning a certificate for addvocate.com.
Turned out to be a routing loop in the fast.ly London POP (via Mick Twomey)
(tags: via:micktwomey o2 censorship filtering internet ssl tls pypi python geodns pki)
“Toxic” behaviour in games is largely from “usually good” people
Only 5% of toxic behavior comes from toxic people; 77% of it comes from people who are usually good. That finding has all sorts of implications for how to stop toxic behavior in an online community. It’s not enough to just ban the jerks; good people have bad days too. Instead you have to teach the whole community what the community standards are. And quickly identify people who are having a bad day, intervene before their toxicity infects too many other people.
Great post by Nelson.
(tags: gaming toxic bad-behaviour trolls abuse online games league-of-legends)
-
OpenDNS’s simple DNS-based blocking of dodgy content. Will need to set this up on the home router now that the kids are surfing…
(tags: opendns dns blocking filtering home porn familyshield)
Mail from the (Velvet) Cybercrime Underground
Brian Krebs manages to thwart an attempted framing for possession of Silk Road heroin. Bloody hell.
(tags: silk-road drugs bitcoin ecommerce brian-krebs crime framed cybercrime russia scary law-enforcement)
Clare dolphin attacks fourth swimmer in a month as Dusty protects her patch
Dusty the Dolphin has gone bad!
Locals say the three-metre long mammal has been responsible for injuring a number of people over the past two years, with several of those being hospitalised with significant injuries. She struck a 40-year-old woman in the abdomen earlier this month. In response, lifeguards now fly the red danger flag any time the dolphin enters the area. The Irish Whale and Dolphin Group has also erected warning posters at Doolin pier. IWDG coordinator Dr Simon Berrow said: “It is our policy to discourage people swimming with whales and dolphins in Ireland. “We’ve drafted a poster recommending people do not swim with Dusty, but if they must, then they should respect her as a wild dolphin and not grab, lunge or chase after her. If she shows aggressive behaviour or is boisterous they should leave the water.”
(tags: dusty dolphins wildlife nature fanore county-clare ireland swimming doolin animals)