Justin's Linklog

Universal Music Group adding audible "watermarks"

including on paid-for, losslessly-compressed digital audio music files:
Why isn't UMG's watermark talked about more? Maybe people think the audio quality problems are due to some kind of lossy compression, as I did, and ignore it completely, or blame the streaming service/distributor. The problem here is that the UMG watermark degrades the audio to about the equivalent of a 96 kbit MP3. My guess is that if consumers were informed about what is going on, they would care. Especially those who pay full retail price for digital downloads advertised as lossless audio.

(tags: lame audio drm media music umg universal watermarks noise consumer mp3)
“Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions”

Aphyr's epic RICON talk, exploring distributed-database failure modes through music. And what a lot of fail there is! Bottom line: CRDTs win.

(tags: crdts data-structures storage ricon aphyr failures network partitions puns slides)
Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop

we are proud to announce the first production drop of Impala, which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.
Along with some great benchmark numbers against Hive. Nifty stuff.

(tags: cloudera impala sql querying etl olap hadoop analytics business-intelligence reports)
Alex Feinberg's response to Damien Katz' anti-Dynamoish/pro-Couchbase blog post

Insightful response, worth bookmarking. (The original post is at http://damienkatz.net/2013/05/dynamo_sure_works_hard.html .)
while you are saving on read traffic (online reads only go to the master), you are now decreasing availability (contrary to your stated goal), and increasing system complexity. You also do hurt performance by requiring all writes and reads to be serialized through a single node: unless you plan to have a leader election whenever the node fails to meet a read SLA (which is going to result a disaster -- I am speaking from personal experience), you will have to accept that you're bottlenecked by a single node. With a Dynamo-style quorum (for either reads or writes), a single straggler will not reduce whole-cluster latency. The core point of Dynamo is low latency, availability and handling of all kinds of partitions: whether clean partitions (long term single node failures), transient failures (garbage collection pauses, slow disks, network blips, etc...), or even more complex dependent failures. The reality, of course, is that availability is neither the sole, nor the principal concern of every system. It's perfect fine to trade off availability for other goals -- you just need to be aware of that trade off.
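
A rough Python sketch of the straggler point in that quote. The function names and latency figures are made up for illustration and are not from Alex's post: with all reads serialized through one node, that node's latency is the request latency, whereas a quorum read only waits for the r fastest replicas.

```python
def master_read_latency(latencies_ms):
    """All reads go through one node (index 0 here, by assumption)."""
    return latencies_ms[0]


def quorum_read_latency(latencies_ms, r):
    """A quorum read completes once the r fastest replicas have answered."""
    if not 1 <= r <= len(latencies_ms):
        raise ValueError("r must be between 1 and the number of replicas")
    return sorted(latencies_ms)[r - 1]


# Hypothetical per-replica response times; replica 0 is stuck in a GC pause.
latencies_ms = [250.0, 4.0, 6.0]

single_node = master_read_latency(latencies_ms)       # waits on the straggler
quorum = quorum_read_latency(latencies_ms, r=2)       # straggler is masked
```

This only models one slow replica; if r or more replicas are slow at once, the quorum read is slow too.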

(tags: cap distributed-databases databases quorum availability scalability damien-katz alex-feinberg partitions network dynamo riak voldemort couchbase)
CAP Confusion: Problems with ‘partition tolerance’

Another good clarification about CAP which resurfaced during last week's discussion:
So what causes partitions? Two things, really. The first is obvious – a network failure, for example due to a faulty switch, can cause the network to partition. The other is less obvious, but fits with the definition [...]: machine failures, either hard or soft. In an asynchronous network, i.e. one where processing a message could take unbounded time, it is impossible to distinguish between machine failures and lost messages. Therefore a single machine failure partitions it from the rest of the network. A correlated failure of several machines partitions them all from the network. Not being able to receive a message is the same as the network not delivering it. In the face of sufficiently many machine failures, it is still impossible to maintain availability and consistency, not because two writes may go to separate partitions, but because the failure of an entire ‘quorum’ of servers may render some recent writes unreadable.
(sorry, catching up on old interesting things posted last week...)
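
The last sentence of that quote can be sketched as a toy model in Python. Everything here (the Replica class, the N=3/W=2/R=2 setup, the function names) is an assumption for illustration, not taken from the linked article: a write is acknowledged by a quorum, that whole quorum then fails, and the surviving replica can only offer unavailability or a stale value.

```python
class Unavailable(Exception):
    """Raised when too few replicas respond to satisfy the quorum."""


class Replica:
    def __init__(self):
        self.up = True
        self.version = 0
        self.value = None


def write(replicas, targets, value, version, w):
    """Apply the write to the reachable targets; succeed if at least w ack."""
    acks = 0
    for i in targets:
        node = replicas[i]
        if node.up:
            node.version, node.value = version, value
            acks += 1
    return acks >= w


def read(replicas, r):
    """Return the newest value among live replicas, needing at least r of them."""
    live = [n for n in replicas if n.up]
    if len(live) < r:
        raise Unavailable(f"only {len(live)} of {r} required replicas responded")
    return max(live, key=lambda n: n.version).value


replicas = [Replica() for _ in range(3)]
write(replicas, [0, 1, 2], "v1", version=1, w=2)   # reaches every replica
write(replicas, [0, 1], "v2", version=2, w=2)      # replica 2 misses this one
replicas[0].up = replicas[1].up = False            # the whole write quorum fails
```

At this point read(replicas, r=2) raises Unavailable, and relaxing to read(replicas, r=1) returns the stale "v1": the acknowledged "v2" write is unreadable until one of the failed nodes comes back.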

(tags: failure scalability network partitions cap quorum distributed-databases fault-tolerance)
Big-O Algorithm Complexity Cheat Sheet

nicely done, very readable

(tags: algorithms reference cheat-sheet big-o complexity estimation coding)
Did Conroy’s AFP filter wrongly block 1,200 sites?

Looks like many Aussie network operators were legally required to block 1,200 websites (presumably, one target and 1,199 false positives), in secret. Quoting http://lists.ausnog.net/pipermail/ausnog/2013-April/017993.html : "You get a notice to block. You block or either get fined, go to jail or lose your carrier licence. It is a blunt instrument and it is a condition of being at 'the big boys table' i.e. you're a carrier or a carriage service provider."

(tags: australia law afp filtering internet blocking censorship secret eff)
Making sense out of BDB-JE fast stats

good info on the system metrics recorded by BDB-JE's EnvironmentStats code, especially where cache and cleaner activity are concerned. Particularly useful for Voldemort.

(tags: voldemort caching bdb bdb-je storage tuning ops metrics reference)
Approximate Heavy Hitters - The SpaceSaving Algorithm

nice, readable intro to SpaceSaving (which I've linked to before) -- a simple stream-processing top-K frequency estimation algorithm with bounded error.
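
For a feel of how little there is to it, here is a minimal Python sketch of SpaceSaving as I understand it. This is not code from the linked post, and a real implementation would use a smarter structure than a linear scan to find the minimum counter. It keeps at most k counters; when a new item arrives and the table is full, the item with the smallest count is evicted and the newcomer inherits that count plus one, with the inherited part recorded as its maximum possible overestimate.

```python
def space_saving(stream, k):
    """Approximate heavy hitters using at most k counters.

    Returns (counts, errors): counts[item] is an upper bound on the true
    frequency, and counts[item] - errors[item] is a lower bound.
    """
    counts = {}  # item -> estimated count
    errors = {}  # item -> maximum possible overestimate
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the smallest counter; the newcomer inherits its count.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            errors.pop(victim)
            counts[item] = floor + 1
            errors[item] = floor
    return counts, errors


stream = ["a"] * 5 + ["b"] * 3 + ["c", "d", "a"]
counts, errors = space_saving(stream, k=2)
```

The counters always sum to the stream length, which is where the bounded error comes from: the smallest counter can never exceed len(stream) / k.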

(tags: algorithms coding space-saving cardinality streams stream-processing estimation)
Darach Ennis on CEP, Stream Processing, Messaging, OOP vs Functional Architecture

good interview -- lots of food for thought!

(tags: darach-ennis stream-processing messaging architecture qcon interviews erlang cep realtime rx comet events)

Links for 2013-05-14