-
you really know you’ve made it as an inept Irish politician when Panti Bliss gets dressed up in her most senatorial wig to take the mickey out of you
(tags: funny comedy fidelma-healy-eames politics ireland social-media inept youtube video)
Confusion reigns over three “hijacked” ccTLDs
This kind of silliness is only likely to increase as the number of TLDs increases (and they become more trivial).
What seems to be happening here is that [two companies involved] have had some kind of dispute, and that as a result the registrants and the reputation of three countries’ ccTLDs have been harmed. Very amateurish.
(tags: tlds domains via:fanf amateur-hour dns cctlds registrars adamsnames)
-
interesting details about Riak’s support for secondary indexes. Not quite SQL, but still more powerful than plain old K/V storage (via dehora); a rough sketch of the 2i HTTP API below
(tags: via:dehora riak indexes storage nosql key-value-stores 2i range-queries)
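Not from the linked post, just to make “2i” concrete: a minimal sketch of Riak’s secondary-index HTTP API as I understand it, assuming a local node on port 8098 with a LevelDB backend (the bucket, key, and index names here are invented for illustration):

```python
import requests

RIAK = "http://localhost:8098"  # assumed local Riak node; 2i needs the LevelDB backend

# Store an object, attaching secondary indexes via x-riak-index-* headers
requests.put(
    f"{RIAK}/buckets/users/keys/user1",
    data=b'{"name": "alice"}',
    headers={
        "Content-Type": "application/json",
        "x-riak-index-email_bin": "alice@example.com",  # _bin = binary/string index
        "x-riak-index-age_int": "31",                   # _int = integer index
    },
)

# Exact-match query on the email index
r = requests.get(f"{RIAK}/buckets/users/index/email_bin/alice@example.com")
print(r.json())  # e.g. {"keys": ["user1"]}

# Range query on the integer index -- the part plain K/V storage can't give you
r = requests.get(f"{RIAK}/buckets/users/index/age_int/25/40")
print(r.json())  # keys for all users with 25 <= age <= 40
```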
Metric Collection and Storage with Cassandra | DataStax
DataStax’s documentation on how they store time-series data (TSD) in Cassandra. Pretty generic; a rough CQL sketch of the usual time-bucketed layout below
(tags: datastax nosql metrics analytics cassandra tsd time-series storage)
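A minimal sketch of that generic pattern, assuming the DataStax Python driver and a local node; the keyspace, table, and column names are hypothetical, not taken from the DataStax doc. One partition per metric per day keeps rows bounded, and clustering by timestamp makes a time-range read a single sequential slice:

```python
from datetime import datetime
from cassandra.cluster import Cluster  # DataStax Python driver (pip install cassandra-driver)

# Assumes a local Cassandra node and an existing 'metrics' keyspace -- illustrative only
session = Cluster(["127.0.0.1"]).connect("metrics")

session.execute("""
    CREATE TABLE IF NOT EXISTS metric_samples (
        metric_name text,
        date        text,        -- e.g. '2013-03-12': the time bucket for the partition
        ts          timestamp,   -- clustering column: samples ordered by time
        value       double,
        PRIMARY KEY ((metric_name, date), ts)
    )
""")

# Typical query: all samples for one metric, in one bucket, between two times
rows = session.execute(
    "SELECT ts, value FROM metric_samples "
    "WHERE metric_name = %s AND date = %s AND ts >= %s AND ts < %s",
    ("cpu.load", "2013-03-12", datetime(2013, 3, 12, 0, 0), datetime(2013, 3, 12, 6, 0)),
)
for row in rows:
    print(row.ts, row.value)
```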
Jeff Dean’s list of “Numbers Everyone Should Know”
from a 2007 Google all-hands, the list of typical latency timings, ranging from an L1 cache reference (0.5 nanoseconds) to a CA->NL->CA IP round trip (150 milliseconds); a quick back-of-the-envelope comparison of those two endpoints below
(tags: performance latencies google jeff-dean timing caches speed network zippy disks via:kellabyte)
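To make the span of that list concrete, a bit of arithmetic in plain Python, using only the two numbers quoted above:

```python
l1_cache_ref = 0.5e-9   # 0.5 ns: L1 cache reference
ca_nl_ca_rtt = 150e-3   # 150 ms: CA -> Netherlands -> CA IP round trip

# Roughly 300 million L1 cache hits fit inside a single trans-Atlantic round trip
print(f"{ca_nl_ca_rtt / l1_cache_ref:,.0f} L1 references per round trip")  # 300,000,000
```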
-
‘a columnar storage format that supports nested data’, from Twitter and Cloudera; file metadata is described with Apache Thrift, and nested records are encoded using the record shredding and assembly algorithm from the Dremel paper. Pretty crazy stuff (a small pyarrow sketch follows the quote):
We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem.

Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces.

Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented.

Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult to set up dependencies.
(tags: twitter cloudera storage parquet dremel columns record-shredding hadoop marshalling columnar-storage compression data)
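Not from the announcement: a minimal sketch of what the nested-data claim looks like from user code, using pyarrow (which post-dates this post); the column names and codec choice are invented for illustration.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# A couple of nested records: each row has a scalar field plus a repeated
# (list-of-struct) field -- the shape Dremel-style shredding is designed for
table = pa.table({
    "user": ["alice", "bob"],
    "events": [
        [{"type": "click", "ms": 12}, {"type": "scroll", "ms": 40}],
        [{"type": "click", "ms": 7}],
    ],
})

# Writing shreds the nested 'events' column into flat leaf columns
# (events.type, events.ms) plus repetition/definition levels, so each
# leaf can be compressed and encoded independently
pq.write_table(table, "events.parquet", compression="snappy")

# Reading reassembles the original nested records
print(pq.read_table("events.parquet").to_pydict())
```

pyarrow’s write_table also accepts a dict for the compression argument, which maps onto the per-column codec choice the announcement describes.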