Film4 Presents A Season Of Studio Ghibli Classics
hooray! Plenty of dubs, too, which is handy when you have little kids like mine ;)
(tags: studio-ghibli film4 movies anime animation to-watch tv)
Category: Uncategorized
The first pillar of agile sysadmin: We alert on what we draw
'One of [the] purposes of monitoring systems was to provide data to allow us, as engineers, to detect patterns, and predict issues before they become production impacting. In order to do this, we need to be capturing data and storing it somewhere which allows us to analyse it. If we care about it - if the data could provide the kind of engineering insight which helps us to understand our systems and give early warning - we should be capturing it. ' .... 'There are a couple of weaknesses in [Nagios' design]. Assuming we’ve agreed that if we care about a metric enough to want to alert on it then we should be gathering that data for analysis, and graphing it, then we already have the data upon which to base our check. Furthermore, this data is not on the machine we’re monitoring, so our checks don’t in any way add further stress to that machine.' I would add that if we are alerting on a different set of data from what we collect for graphing, then using the graphs to investigate an alarm may run into problems if they don't sync up.
(tags: devops monitoring deployment production sysadmin ops alerting metrics)
JPL Institutional Coding Standard for the Java Programming Language
From JPL's Laboratory for Reliable Software (LaRS). Great reference; there's some really useful recommendations here, and good explanations of familiar ones like "prefer composition over inheritance". Many are supported by FindBugs, too. Here's the full list:
compile with checks turned on; apply static analysis; document public elements; write unit tests; use the standard naming conventions; do not override field or class names; make imports explicit; do not have cyclic package and class dependencies; obey the contract for equals(); define both equals() and hashCode(); define equals when adding fields; define equals with parameter type Object; do not use finalizers; do not implement the Cloneable interface; do not call nonfinal methods in constructors; select composition over inheritance; make fields private; do not use static mutable fields; declare immutable fields final; initialize fields before use; use assertions; use annotations; restrict method overloading; do not assign to parameters; do not return null arrays or collections; do not call System.exit; have one concept per line; use braces in control structures; do not have empty blocks; use breaks in switch statements; end switch statements with default; terminate if-else-if with else; restrict side effects in expressions; use named constants for non-trivial literals; make operator precedence explicit; do not use reference equality; use only short-circuit logic operators; do not use octal values; do not use floating point equality; use one result type in conditional expressions; do not use string concatenation operator in loops; do not drop exceptions; do not abruptly exit a finally block; use generics; use interfaces as types when available; use primitive types; do not remove literals from collections; restrict numeric conversions; program against data races; program against deadlocks; do not rely on the scheduler for synchronization; wait and notify safely; reduce code complexity
(tags: nasa java reference guidelines coding-standards jpl reliability software coding oo concurrency findbugs bugs)
KDE's brush with git repository corruption: post-mortem
a barely-averted disaster... phew.
while we planned for the case of the server losing a disk or entirely biting the dust, or the total loss of the VM’s filesystem, we didn’t plan for the case of filesystem corruption, and the way the corruption affected our mirroring system triggered some very unforeseen and pathological conditions. [...] the corruption was perfectly mirrored... or rather, due to its nature, imperfectly mirrored. And all data on the anongit [mirrors] was lost.
One risk demonstrated: by trusting in mirroring, rather than a schedule of snapshot backups covering a wide time range, they nearly had a major outage. Silent data corruption, and code bugs, happen -- backups protect against this, but RAID, replication, and mirrors do not. Another risk: they didn't have a rate limit on project-deletion, which resulted in the "anongit" mirrors deleting their (safe) data copies in response to the upstream corruption. Rate limiting to sanity-check automated changes is vital. What they should have had in place was described by the fix: 'If a new projects file is generated and is more than 1% different than the previous file, the previous file is kept intact (at 1500 repositories, that means 15 repositories would have to be created or deleted in the span of three minutes, which is extremely unlikely).'(tags: rate-limiting case-studies post-mortems kde git data-corruption risks mirroring replication raid bugs backups snapshots sanity-checks automation ops)
-
Metrics rule the roost -- I guess there's been a long history of telemetry in space applications.
To make software more visible, you need to know what it is doing, he said, which means creating "metrics on everything you can think of".... Those metrics should cover areas like performance, network utilization, CPU load, and so on. The metrics gathered, whether from testing or real-world use, should be stored as it is "incredibly valuable" to be able to go back through them, he said. For his systems, telemetry data is stored with the program metrics, as is the version of all of the code running so that everything can be reproduced if needed. SpaceX has programs to parse the metrics data and raise an alarm when "something goes bad". It is important to automate that, Rose said, because forcing a human to do it "would suck". The same programs run on the data whether it is generated from a developer's test, from a run on the spacecraft, or from a mission. Any failures should be seen as an opportunity to add new metrics. It takes a while to "get into the rhythm" of doing so, but it is "very useful". He likes to "geek out on error reporting", using tools like libSegFault and ftrace. Automation is important, and continuous integration is "very valuable", Rose said. He suggested building for every platform all of the time, even for "things you don't use any more". SpaceX does that and has found interesting problems when building unused code. Unit tests are run from the continuous integration system any time the code changes. "Everyone here has 100% unit test coverage", he joked, but running whatever tests are available, and creating new ones is useful. When he worked on video games, they had a test to just "warp" the character to random locations in a level and had it look in the four directions, which regularly found problems. "Automate process processes", he said. Things like coding standards, static analysis, spaces vs. tabs, or detecting the use of Emacs should be done automatically. SpaceX has a complicated process where changes cannot be made without tickets, code review, signoffs, and so forth, but all of that is checked automatically. If static analysis is part of the workflow, make it such that the code will not build unless it passes that analysis step. When the build fails, it should "fail loudly" with a "monitor that starts flashing red" and email to everyone on the team. When that happens, you should "respond immediately" to fix the problem. In his team, they have a full-size Justin Bieber cutout that gets placed facing the team member who broke the build. They found that "100% of software engineers don't like Justin Bieber", and will work quickly to fix the build problem.
(tags: spacex dev coding metrics deplyment production space justin-bieber)
-
'the story of ketchup is a story of globalization and centuries of economic domination by a world superpower. But the superpower isn't America, and the century isn't ours. Ketchup's origins in the fermented sauces of China and Southeast Asia mean that those little plastic packets under the seat of your car are a direct result of Chinese and Asian domination of a single global world economy for most of the last millenium.'
(tags: ketchup china nam-pla food etymology condiments history trade)
-
now this is a neat trick -- having been stuck having to flip to spares and do other antics while a long-running heap dump took place, this is a winner.
Dumping a JVM’s heap is an extremely useful tool for debugging problems with a J2EE application. Unfortunately, when a JVM explodes, using the standard jmap tool can take an inordinate amount of time to execute for lots of different reasons. This leads to extended downtime when a heap dump is attempted and even then, jmap regularly fails. This blog post is intended to outline an alternate method using [gdb] to achieve a heap dump that only requires mere seconds of additional downtime allowing the slow jmap process to happen once the application is back in service.
(tags: heap-dump gdb heap jvm java via:peakscale gcore core core-dump debugging)
-
'Edition has a ‘design for life’ philosophy - we think that unique designer-made items can be a part of our everyday lives without costing the earth. We stock affordable, contemporary and functional products (mostly handmade), including jewellery, home-ware, accessories, art and toys. Every item has been carefully selected and are all designed here in Ireland.'
BBC Test Card image (1080p HD version)
via colinwh. The de-facto standard HTPC desktop background
(tags: htpc desktops hd 1080p bbc test-card tv scary-clowns)
-
Neil Fraser visits a school in Vietnam, and investigates their computer science curriculum. They are doing an incredible job, it looks like -- very impressive!
(tags: vietnam programming education cs computer-science schools coding children)
TOSEC: Commodore C64 (2012-04-23) : Free Download & Streaming : Internet Archive
A massive, 6.5GB collection of C64 history.
There are an astounding 134,000+ disk, cassette and documentation items in this Commodore 64 collection, including games, demos, cractros, and compilations.
(tags: commodore c64 history computing software demos archive)
By the numbers: How Google Compute Engine stacks up to Amazon EC2
Scalr's thoughts on Google's EC2 competitor.
with Google Compute Engine, AWS has a formidable new competitor in the public cloud space, and we’ll likely be moving some of Scalr’s production workloads from our hybrid aws-rackspace-softlayer setup to it when it leaves beta. There’s a strong technical case for migrating heavy workloads to GCE, and I’ll be grabbing popcorn to eagerly watch as the battle unfolds between the giants.
-
realtime collaboration API. nifty! but can it collaborate on a per-app shared doc, or does it require that the app user auth to Google and access their own docs?
(tags: collaboration api realtime google javascript)
Percona Playback's tcpdump plugin
Capture MySQL traffic via tcpdump, tee it over the network to replay against a second database. Even supports query execution times and pauses between queries to playback the same load level
(tags: tcpdump production load-testing testing staging tee networking netcat percona replay mysql)
Riak CS is now ASL2 open source
'Organizations and users can now access the source code on Github and download the latest packages from the downloads page. Also, today, we announced that Riak CS Enterprise is now available as commercial licensed software, featuring multi-datacenter replication technology and 24×7 Basho customer support.'
(tags: riak riak-cs nosql storage basho open-source github apache asl2)
Hadoop Operations at LinkedIn [slides]
another good Hadoop-at-scale presentation, from LI this time
Sift Science says it can sniff out cyber fraud — before it gets expensive
Great idea for a startup. This stuff is complex, right in the heart of every company's ordering pipeline, and I can see a lot of customers for this
(tags: sift-science anti-fraud fraud b2b b2c ecommerce startups aws)
What would you do: Part 2, the Island of Surpyc
Amazing. 'Cyprus Bailout Choose Your Own Adventure', basically
(tags: cyoa adventure dice games cyprus politics eu bailouts ecb banking troika)
Running the Largest Hadoop DFS Cluster
Facebook's 1PB Hadoop cluster. features improved NameNode availability work and 4 levels of data aging, with reduced replication and Reed-Solomon RAID encoding for colder data ages
(tags: aging data facebook hadoop hdfs reed-solomon error-correction replication erasure-coding)
The America Invents Act: Fighting Patent Trolls With "Prior Art"
Don Marti makes some suggestions regarding the America Invents Act: record your work's timeline; use the new Post-Grant Challenging process; and use the new "prior user" defence, which lets you rely on your own non-public uses.
many of the best practices for tracking new versions of software and other digital assets can also help protect you against patent trolls. It’s a good time to talk to your lawyer about a defensive strategy, and to connect that strategy to your version control and deployment systems to make sure you’re collecting and retaining all of the information that could help you under this new law.
(tags: swpats patent-trolls patenting us prior-art)
Announcing the Voldemort 1.3 Open Source Release
new release from LinkedIn -- better p90/p99 PUT performance, improvements to the BDB-JE storage layer, massively-improved rebalance performance
(tags: voldemort linkedin open-source bdb nosql)
Data Corruption To Go: The Perils Of sql_mode = NULL « Code as Craft
bloody hell. A load of cases where MySQL will happily accommodate all sorts of malformed and invalid input -- thankfully with fixes
(tags: mysql input corrupt invalid validation coding databases sql)
-
a high-performance C server which is used to expose bloom filters and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.
(via Tony Finch)(tags: via:fanf memcached bloomd open-source bloom-filters)
Thoughts on configuration file complexity
some interesting thoughts on the old "Turing complete configuration language" question
(tags: configuration turing-complete programming ops testing)
From a monolithic Ruby on Rails app to the JVM
How Soundcloud have ditched the monolithic Rails for nimbler, small-scale distributed polyglot services running on the JVM
(tags: soundcloud rails slides jvm scalability ruby scala clojure coding)
Opinion: The Internet is a surveillance state
Bruce Schneier op-ed on CNN.com.
So, we're done. Welcome to a world where Google knows exactly what sort of porn you all like, and more about your interests than your spouse does. Welcome to a world where your cell phone company knows exactly where you are all the time. Welcome to the end of private conversations, because increasingly your conversations are conducted by e-mail, text, or social networking sites. And welcome to a world where all of this, and everything else that you do or is done on a computer, is saved, correlated, studied, passed around from company to company without your knowledge or consent; and where the government accesses it at will without a warrant. Welcome to an Internet without privacy, and we've ended up here with hardly a fight.
(tags: freedom surveillance legal privacy internet bruce-schneier web google facebook)
Single Producer/Consumer lock free Queue step by step
great dissection of Martin "Disruptor" Thompson's lock-free single-producer/single-consumer queue data structure, with benchmark results showing crazy speedups. This is particularly useful since it's a data structure that can be used to provide good lock-free speedups without adopting the entire Disruptor design pattern.
(tags: disruptor coding java jvm martin-thompson lock-free volatile atomic queue data-structures)
Roko's basilisk - RationalWiki
Wacky transhumanists.
Roko's basilisk is notable for being completely banned from discussion on LessWrong, where any mention of it is deleted. Eliezer Yudkowsky, founder of LessWrong, considers the basilisk would not work, but will not explain why because he does not consider open discussion of the notion of acausal trade with possible superintelligences to be provably safe. Silly over-extrapolations of local memes are posted to LessWrong quite a lot; almost all are just downvoted and ignored. But this one, Yudkowsky reacted to hugely, then doubled-down on his reaction. Thanks to the Streisand effect, discussion of the basilisk and the details of the affair soon spread outside of LessWrong. The entire affair is a worked example of spectacular failure at community management and at controlling purportedly dangerous information. Some people familiar with the LessWrong memeplex have suffered serious psychological distress after contemplating basilisk-like ideas — even when they're fairly sure intellectually that it's a silly problem.[5] The notion is taken sufficiently seriously by some LessWrong posters that they try to work out how to erase evidence of themselves so a future AI can't reconstruct a copy of them to torture.[6]
(tags: transhumanism funny insane stupid singularity ai rokos-basilisk via:maciej lesswrong rationalism superintelligences striesand-effect absurd)
How the America Invents Act Will Change Patenting Forever
Bet you didn't think the US software patents situation could get worse? wrong!
“Now it’s really important to be the first to file, and it’s really important to file before somebody else puts a product out, or puts the invention in their product,” says Barr, adding that it will “create a new urgency on the part of everyone to file faster -- and that’s going to be a problem for the small inventor.”
(tags: first-to-file omnishambles uspto swpats patents software-patents law legal)
Distributed Systems Tracing with Zipkin
Twitter's version of the "canary"/"tracer" request concept
(tags: twitter zipkin tracing tracer-requests canary-requests http debugging production live distributed-systems distcomp stack infrastructure ops)
Transitioning from Google Reader to feedly
xpecting for some time: We have been working on a project called Normandy which is a feedly clone of the Google Reader API – running on Google App Engine. When Google Reader shuts down, feedly will seamlessly transition to the Normandy back end.
Excellent stuff -- I've just tried feedly and it's looking good -- in fact it may be a better UI overall anyway.(tags: feedly google-reader transition rss atom feeds web)
Double vision: seeing both sides of Syria’s war
A skirmish is filmed, using HD video cameras, by both sides. Storyful pinpoint the location. War as panopticon
(tags: storyful war syria future tanks battle video youtube hd panopticon)
Using DiffMerge as your Git visual merge and diff tool
A decent 3-way-diff GUI merge tool which works with git on OSX. "git config" command-lines included in this blog post
(tags: git merge osx mac macosx diff mergetool merging cli diffmerge)
-
A bunch of magic command lines to set useful OS X prefs without pointy-clicky. at least some also seem to work on Mountain Lion
-
'bootstrap an OSX development machine with a one-liner'.
Many teams use chef to manage their production machines, but developers often build their development boxes by hand. SoloWizard makes it painless to create a configurable chef solo script to get your development machine humming: mysql, sublime text, .bash_profile tweaks to OS-X settings - it's all there!
(tags: osx chef mac build-out ops macosx deployment developers desktops laptops mysql rabbitmq activemq nginx)
-
'Our results suggest that the Cablevision decision, [which was widely seen as easing certain ambiguities surrounding intellectual property], led to additional incremental investment in U.S. cloud computing firms that ranged from $728 million to approximately $1.3 billion over the two-and-a-half years after the decision. When paired with the findings of the enhanced effects of VC investment relative to corporate investment, this may be the equivalent of $2 to $5 billion in traditional R&D investment.' via Fred Logue.
(tags: via:fplogue law ip copyright policy cablevision funding vc cloud-computing investment legal buffering)
A History Of Ireland In 100 Objects
Now free!
The Royal Irish Academy, the National Museum of Ireland, and The Irish Times are collaborating with the EU Presidency, the Department of Foreign Affairs and Trade and Adobe to bring you a gift of A History of Ireland in 100 objects ‘from the people of Ireland to the people of the world’ for St Patrick’s Day. It is available as an interactive app for Apple iPhone and iPad, for most Android tablets and on the Kindle Fire, from our website, as well as associated app stores. You can also experience the book on your computer, smartphone or eReader by clicking on the 'eBook' button below. The gift is free to download until the end of March.
(tags: free st-patricks-day museum ireland history objects eu apps iphone ipad android books ebooks)
First 5 Minutes Troubleshooting A Server
quite a good checklist of first steps for troubleshooting. Worth bookmarking for "dstat --top-io --top-bio" alone, which is an absolutely excellent tool and new to me
(tags: dstat server io disks hardware performance linux sysadmin ops troubleshooting checklists root-cause)
-
you really know you've made it as an inept Irish politician when Panti Bliss gets dressed up in her most senatorial wig to take the mickey out of you
(tags: funny comedy fidelma-healy-eames politics ireland social-media inept youtube video)
Confusion reigns over three “hijacked” ccTLDs
This kind of silliness is only likely to increase as the number of TLDs increases (and they become more trivial).
What seems to be happening here is that [two companies involved] have had some kind of dispute, and that as a result the registrants and the reputation of three countries’ ccTLDs have been harmed. Very amateurish.
(tags: tlds domains via:fanf amateur-hour dns cctlds registrars adamsnames)
-
interesting details about Riak's support for secondary indexes. Not quite SQL, but still more powerful than plain old K/V storage (via dehora)
(tags: via:dehora riak indexes storage nosql key-value-stores 2i range-queries)
Metric Collection and Storage with Cassandra | DataStax
DataStax' documentation on how they store TSD data in Cass. Pretty generic
(tags: datastax nosql metrics analytics cassandra tsd time-series storage)
Jeff Dean's list of "Numbers Everyone Should Know"
from a 2007 Google all-hands, the list of typical latency timings from ranging from an L1 cache reference (0.5 nanoseconds) to a CA->NL->CA IP round trip (150 milliseconds).
(tags: performance latencies google jeff-dean timing caches speed network zippy disks via:kellabyte)
-
'a columnar storage format that supports nested data', from Twitter and Cloudera, encoded using Apache Thrift in a Dremel-based record shredding and assembly algorithm. Pretty crazy stuff:
We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem. Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces. Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented. Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult to set up dependencies.
(tags: twitter cloudera storage parquet dremel columns record-shredding hadoop marshalling columnar-storage compression data)
Bunnie Huang's "Hacking the Xbox" now available as a free PDF
'No Starch Press and I have decided to release this free ebook version of Hacking the Xbox in honor of Aaron Swartz. As you read this book, I hope that you’ll be reminded of how important freedom is to the hacking community and that you’ll be inclined to support the causes that Aaron believed in. I agreed to release this book for free in part because Aaron’s treatment by MIT is not unfamiliar to me. In this book, you will find the story of when I was an MIT graduate student, extracting security keys from the original Microsoft Xbox. You’ll also read about the crushing disappointment of receiving a letter from MIT legal repudiating any association with my work, effectively leaving me on my own to face Microsoft. The difference was that the faculty of my lab, the AI laboratory, were outraged by this treatment. They openly defied MIT legal and vowed to publish my work as an official “AI Lab Memo,” thereby granting me greater negotiating leverage with Microsoft. Microsoft, mindful of the potential backlash from the court of public opinion over suing a legitimate academic researcher, came to a civil understanding with me over the issue.' This is a classic text on hardware reverse-engineering and the freedom to tinker -- strongly recommended.
(tags: hacking bunnie-huang xbox free hardware drm freedom-to-tinker books reading mit microsoft history)
Daemon Showdown: Upstart vs. Runit vs. Systemd vs. Circus vs. God
strangely, no mention of runit being total shite though
(tags: daemons runit upstart systemd supervisord circus god nannies processes unix crash-only-software linux ops)
-
Clojure-style lazy functional collections (via QCon via Caro)
(tags: via:caro collections java functional lazy-loading lazy-computation lazy clojure)
4 Things Java Programmers Can Learn from Clojure (without learning Clojure)
'1. Use immutable values; 2. Do no work in the constructor; 3. Program to small interfaces; 4. Represent computation, not the world'. Strongly agreed with #1, and the others look interesting too
Tactical Chat: How the U.S. Military Uses IRC to Wage War
Excellent stuff. Lessons to be learned from this: IRC has some key features that mean it can be useful in this case. 1. simple text, everything supports it, no fancy UI clients are necessary; 2. resilient against lossy/transient/low-bandwidth/high-latency networks; 3. standards-compliant and "battle-hardened" (so to speak); 4. open-source/non-proprietary.
Despite the U.S. military’s massive spending each year on advanced communications technology, the use of simple text chat or tactical chat has outpaced other systems to become one of the most popular paths for communicating practical information on the battlefield. Though the use of text chat by the U.S. military first began in the early 1990s, in recent years tactical chat has evolved into a “primary ‘comms’ path, having supplanted voice communications as the primary means of common operational picture (COP) updating in support of situational awareness.” An article from January 2012 in the Air Land Sea Bulletin describes the value of tactical chat as an effective and immediate communications method that is highly effective in distributed, intermittent, low bandwidth environments which is particularly important with “large numbers of distributed warfighters” who must “frequently jump onto and off of a network” and coordinate with other coalition partners. Text chat also provides “persistency in situational understanding between those leaving and those assuming command watch duties” enabling a persistent record of tactical decision making. A 2006 thesis from the Naval Postgraduate School states that internet relay chat (IRC) is one of the most widely used chat protocols for military command and control (C2). Software such as mIRC, a Windows-based chat client, or integrated systems in C2 equipment are used primarily in tactical conditions though efforts are underway to upgrade systems to newer protocols.
(via JK)(tags: via:jk war irc chat mirc us-military tactical-chat distcomp networking)
-
Great neologism from Mick Fealty:
Familiar to anyone who’s followed public debate on Northern Ireland. Some define it as the often multiple blaming and finger pointing that goes on between communities in conflict. Political differences are marked by powerful emotional (often tribal) reactions as opposed to creative conflict over policy and issues. It’s beginning to be known well beyond the bounds of Northern Ireland. [...] Evasion may not be the intention but it is the obvious effect. It occurs when individuals are confronted with a difficult or uncomfortable question. The respondent retrenches his/her position and rejigs the question, being careful to pick open a sore point on the part of questioner’s ‘tribe’. He/she then fires the original query back at the inquirer.
(tags: words etymology whataboutery argument debate northern-ireland mick-fealty slugger-otoole)
-
Give your app its own private Dropbox client and leave the syncing to us.
the real reason Marissa Mayer canned remote Y! employees (apparently)
After spending months frustrated at how empty Yahoo parking lots were, Mayer consulted Yahoo's VPN logs to see if remote employees were checking in enough. Mayer discovered they were not — and her decision was made. we're hearing from people close to Yahoo executives and employees that she made the right decision banning work from home. "The employees at Yahoo are thrilled," says one source close to the company. "There isn't massive uprising. The truth is, they've all been pissed off that people haven't been working."
(tags: yahoo work remote-work teleworking slacking marissa-mayer funny)
Online Schema Change for MySQL
A tool written by Facebook to ease the pain of online MySQL schema-change migrations.
Some ALTER TABLE statements take too long form the perspective of some MySQL users. The fast index create feature for the InnoDB plugin in MySQL 5.1 makes this less of an issue but this can still take minutes to hours for a large table and for some MySQL deployments that is too long. A workaround is to perform the change on a slave first and then promote the slave to be the new master. But this requires a slave located near the master. MySQL 5.0 added support for triggers and some replication systems have been built using triggers to capture row changes. Why not use triggers for this? The openarkkit toolkit did just that with oak-online-alter-table. We have published our version of an online schema change utility (OnlineSchemaChange.php aka OSC).
(tags: facebook mysql sql schema database migrations ops alter-table)
Netflix Queue: Data migration for a high volume web application
There will come a time in the life of most systems serving data, when there is a need to migrate data to [another] data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems [SimpleDB to Cassandra].
(tags: cassandra netflix migrations data schema simpledb storage)
Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans
nice enough, but a lot of moving parts. It would be nice to see a simpler ZK+Graphite setup using the 'mntr' verb
(tags: graphite monitoring ops zookeeper cassandra hadoop jmx jmxtrans graphs)
RFC 6585 - Additional HTTP Status Codes
includes "429 Too Many Requests", for rate limits
Curator Framework: Reducing the Complexity of Building Distributed Systems | Marketing Technology
good +1 for using Netflix' Curator ZK client library
-
a high-level API that greatly simplifies using ZooKeeper. It adds many features that build on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations. Some of the features are: Automatic connection management: There are potential error cases that require ZooKeeper clients to recreate a connection and/or retry operations. Curator automatically and transparently (mostly) handles these cases. Cleaner API: simplifies the raw ZooKeeper methods, events, etc.; provides a modern, fluent interface Recipe implementations (see Recipes): Leader election, Shared lock, Path cache and watcher, Distributed Queue, Distributed Priority Queue
(tags: zookeeper java netflix distcomp libraries oss open-source distributed)
OscarGodson.js | What I Learned At Yammer
some pretty interesting lessons, it turns out: a 'take what you need' vacation policy means nobody takes vacations (unsurprising); Yammer actively work to avoid employee burnout (good idea); Yammer A/B test every feature; and Yammer mgmt try to let their devs work autonomously.
-
Some really cool-looking UNIX command line utils, packaged in Debian (and therefore in Ubuntu too). A few of these I've reimplemented separately, but it's always good to replace a hack with a more widely available "official" tool. Thanks, Joey Hess!
sponge: accept input, wait til EOF, then rewrite a file; chronic: runs a command quietly unless it fails; combine: combine the lines in two files using boolean operations; ifdata: get network interface info without parsing ifconfig output; ifne: run a program if the standard input is not empty; isutf8: check if a file or standard input is utf-8; lckdo: execute a program with a lock held; mispipe: pipe two commands, returning the exit status of the first; parallel: run multiple jobs at once; pee: tee standard input to pipes; sponge: soak up standard input and write to a file; ts: timestamp standard input; vidir: edit a directory in your text editor; vipe: insert a text editor into a pipe; zrun: automatically uncompress arguments to command
(tags: bash shell cli unix scripting via:peakscale joey-hess debian ubuntu tools command-line commands)
Test-Driven Infrastructure with Chef
Interesting idea.
The book introduces “Infrastructure as Code,” test-driven development, Chef, and cucumber-chef, and then proceeds to a simple example using Chef to provision a shared Linux server. The recipes for the server are developed test-first, demonstrating both the technique and the workflow.
(tags: tdd chef server provisioning build deploy linux coding ops sysadmin)
Peek and poke in the age of Linux
Neat demo of using ptrace to inject into a running process, just like the good old days ;)
Some time ago I ran into a production issue where the init process (upstart) stopped behaving properly. Specifically, instead of spawning new processes, it deadlocked in a transitional state. [...] What’s worse, upstart doesn’t allow forcing a state transition and trying to manually create and send DBus events didn’t help either. That meant the sane options we were left with were: restart the host (not desirable at all in that scenario); start the process manually and hope auto-respawn will not be needed. Of course there are also some insane options. Why not cheat like in the old times and just PEEK and POKE the process in the right places? The solution used at the time involved a very ugly script driving gdb which probably summoned satan in some edge cases. But edge cases were not hit and majority of hosts recovered without issues.
(tags: debugging memory linux upstart peek poke ptrace gdb processes hacks)
The World Wide Web is Moving to AOL! | Brian Bailey
brilliant parody of those "we're so happy to be shutting down!" posts.
Don't worry, all of that hard work won't be wasted. The World Wide Web will remain accessible for 30 days, which will give you plenty of time to update your readers and customers. Each of you will also receive a 30-day free trial for AOL. Look for your CD in the mail soon. Even better, we've created an import tool to make it easy to migrate everything you've put on the web to American Online! The address will change, of course, but now it will be available to every AOL member. You may find that you don't need to bother, though. America Online already has groups and pages about almost every topic you can imagine. Take a look around first and you might save yourself a lot of time. There are only so many different ways to say that Citizen Kane was a good movie! We understand that not all of you will become AOL subscribers and not all web sites will move to the new platform. Just to be safe, be sure to print out all of your favorite pages before the end of the month.
(tags: acquihired acquisitions aol www funny parody humour web)
Irish government attacked using 'MiniDuke' PDF malware
although I haven't seen a word of it in the Irish media yet -- wonder if the government have noticed?
Cyber criminals have targeted government officials in more than 20 countries, including Ireland and Romania, in a complex online assault seen rarely since the turn of the millennium. The attack, dubbed "MiniDuke" by researchers, has infected government computers as recently as this week in an attempt to steal geopolitical intelligence, according to security experts.
(tags: ireland malware attacks pdf security espionage romania miniduke)
The MiniDuke Mystery: PDF 0-day Government Spy Assembler 0x29A Micro Backdoor - Securelist
By analysing the logs from the command servers, we have observed 59 unique victims in 23 countries: Belgium, Brazil, Bulgaria, Czech Republic, Georgia, Germany, Hungary, Ireland, Israel, Japan, Latvia, Lebanon, Lithuania, Montenegro, Portugal, Romania, Russian Federation, Slovenia, Spain, Turkey, Ukraine, United Kingdom and United States.
Romania believes rival nation behind MiniDuke cyber attack | Reuters
"It is a cyber attack ... pursued by an entity that has the characteristics of a state actor," [Romanian secret service] SRI spokesman Sorin Sava told Reuters [...]. "Our estimations show the attack is certainly relevant to Romania's national security taking into account the profile of the compromised entities." [...] In this case, computer experts say an attacker from the former Soviet Union could be more likely. "MiniDuke" in some ways resembles a banking fraud Trojan dubbed "TinBa" believed to have been created by Russian criminal hackers.
(tags: ireland malware attacks pdf security espionage romania miniduke)
Compress data more densely with Zopfli - Google Developers Blog
New compressor from Google, gzip/zip-compatible, slower but slightly smaller results
(tags: compression gzip zip deflate google)
Denominator: A Multi-Vendor Interface for DNS
the latest good stuff from Netflix.
Denominator is a portable Java library for manipulating DNS clouds. Denominator has pluggable back-ends, initially including AWS Route53, Neustar Ultra, DynECT, and a mock for testing. We also ship a command line version so it's easy for anyone to try it out. The reason we built Denominator is that we are working on multi-region failover and traffic sharing patterns to provide higher availability for the streaming service during regional outages caused by our own bugs and AWS issues. To do this we need to directly control the DNS configuration that routes users to each region and each zone. When we looked at the features and vendors in this space we found that we were already using AWS Route53, which has a nice API but is missing some advanced features; Neustar UltraDNS, which has a SOAP based API; and DynECT, which has a REST API that uses a quite different pseudo-transactional model. We couldn’t find a Java based API that grouped together common set of capabilities that we are interested in, so we created one. The idea is that any feature that is supported by more than one vendor API is the highest common denominator, and that functionality can be switched between vendors as needed, or in the event of a DNS vendor outage.
(tags: dns netflix java tools ops route53 aws ultradns dynect)
-
Who knew? you can make a runnable JAR file!
There has long been a hack known in some circles, but not widely known, to make jars really executable, in the chmod +x sense. The hack takes advantage of the fact that jar files are zip files, and zip files allow arbitrary cruft to be prepended to the zip file itself (this is how self-extracting zip files work).
(tags: jars via:netflix shell java executable chmod zip hacks command-line cli)
Two surgeons debate the use of cycle helmets
'I am a neurosurgeon and a cyclist, and I am also married to a dedicated cyclist. I wear a cycling helmet and encourage cyclists to wear one. I don’t find that wearing one impedes me in any way. I am under no illusion that it will save me in the event of a high speed collision with a car or lorry (nothing will), but most cycling accidents aren’t of the high-speed variety.' versus: 'I am a consultant Trauma orthopaedic surgeon working in Edinburgh and have many years of experience treating cyclists after serious road traffic, cycle sport and commuting cycle injuries. I believe there is no justification for helmet laws or promotional campaigns that portray cycling as a particularly ‘dangerous’ activity, or that make unfounded claims about the effectiveness of helmets. By reducing cycle use even slightly, helmet laws or promotion campaigns are likely to cause a significant net disbenefit to public health, regardless of the effectiveness or otherwise of helmets.' Generally a lot of sense on either side.
(tags: helmets cycling bicycles health safety surgeons doctors)
Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
Yahoo! are going big with Storm for their next-generation internal cloud platform: 'Yahoo! engineering teams are developing technologies to enable Storm applications and Hadoop applications to be hosted on a single cluster. • We have enhanced Storm to support Hadoop style security mechanism (including Kerberos authentication), and thus enable Storm applications authorized to access Hadoop datasets on HDFS and HBase. • Storm is being integrated into Hadoop YARN for resource management. Storm-on-YARN enables Storm applications to utilize the computation resources in our tens of thousands of Hadoop computation nodes. YARN is used to launch Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).'
(tags: yahoo yarn cloud-computing private-clouds big-data latency storm hadoop elastic-computing hbase)
Trojan paralyses speed cameras in Moscow
what a coincidence! (via Tony Finch)
-
Basically, tweaking a few suboptimal sysctls to optimize for 802.11b/n; requires a Jailbroken IOS device. I'm surprised that Apple defaulted segment size to 512 to be honest, and disabling delayed ACKs sounds like it might be useful (see also http://www.stuartcheshire.org/papers/NagleDelayedAck/).
TCP optimizer modifies a few settings inside iOS, including increasing the TCP receive buffer from 131072 to 292000, disabling TCP delayed ACK’s, allowing a maximum of 16 un-ACK’d packets instead of 8 and set the default packet size to 1460 instead of 512. These changes won’t only speed up your YouTube videos, they’ll also improve your internet connection’s performance overall, including Wi-Fi network connectivity.
(tags: tcp performance tuning ios apple wifi wireless 802.11n sysctl ip)
-
A study published in the Feb. 27 issue of the journal PLoS One links increased consumption of sugar with increased rates of diabetes by examining the data on sugar availability and the rate of diabetes in 175 countries over the past decade. And after accounting for many other factors, the researchers found that increased sugar in a population’s food supply was linked to higher diabetes rates independent of rates of obesity. In other words, according to this study, obesity doesn’t cause diabetes: sugar does. The study demonstrates this with the same level of confidence that linked cigarettes and lung cancer in the 1960s. As Rob Lustig, one of the study’s authors and a pediatric endocrinologist at the University of California, San Francisco, said to me, “You could not enact a real-world study that would be more conclusive than this one.”
(tags: nytimes health food via:fanf sugar eating diabetes papers medicine)
-
Stoneybatter's not-for-profit art space needs contributions
(tags: art stoneybatter dublin d7 ireland fundit fundraising the-joinery)
Are volatile reads really free?
Marc Brooker with some good test data:
It appears as though reads to volatile variables are not free in Java on x86, or at least on the tested setup. It's true that the difference isn't so huge (especially for the read-only case) that it'll make a difference in any but the more performance sensitive case, but that's a different statement from free.
(tags: volatile concurrency jvm performance java marc-brooker)
-
'Watch Netflix USA, Hulu, Pandora, BBC iPlayer, and more in [sic] anywhere you live!' -- seems to use similar techniques to tunlr.net, looks like it works for my Netflix
(tags: netflix dns tv tunnelling drm networking spotify hulu)
Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack
reasonably good whole-stack performance testing and analysis; HBase, Riak, MongoDB, and Cassandra compared. Riak did pretty badly :(
(tags: riak mongodb cassandra hbase performance analytics hadoop hive big-data storage databases nosql)
Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.
Good interview with the Cassandra guys at Netflix, and some top Mongo-bashing in the comments
(tags: cassandra netflix user-stories testimonials nosql storage ec2 mongodb)
-
my favourite art of the moment. Thick, heavy layers of acrylic black and white paint, evoking the stormy Atlantic (brr). Gallery Bode, which showed this in Nuremberg in 2011, wrote the following at http://www.bode-galerie.de/en/exhibitions/schwarz_weiss :
Gallery Bode is pleased to constitute the cooperation with Werner Knaupp with an exhibition of a new workseries. The exhibition showcases artworks out of the series "Westmen Isles". [...] The journeys to Iceland are a background to the development of this new workseries. These paintings are telling of a forbidding nature. The beholder can't take a [safe] position but he is involved into the event which becomes comprehensible in a nearly physical way. These pictures of a overwhelming nature could be traced back to Knaupp's confrontation with the force of nature while his journeys. The experience of this force pushes the limits of human being and evokes primal fear. With the abdication of colours the artworks reach dynamic. This foots on the consistency of colour and on the changing between reality and abstraction. In an art historical view the new black and white paintings detached themselves from traditional landscape painting. Werner Knaupp implements the pure force of nature into pure painting, to visualise the force fields of nature. The beholder experiences with these artworks a nature without human dimension. In Werner Knaupp's Oeuvre the "Westmen Isles" paintings are a new expression of his examination with existential fundamental questions.
(tags: germany art painting werner-knaupp paintings monochrome sea iceland)
Indymedia: It’s time to move on
Our decision to curtail publishing on the Nottingham Indymedia site and call a meeting is an attempt to create a space for new ideas. We are not interested in continuing along the slow but certain path to total irrelevance but want to draw in new people and start off in new directions whilst remaining faithful to the underlying principles of Indymedia.
(tags: indymedia community communication web anonymity publishing left-wing)
How to revert a faulty merge in git
omgwtf, this is pretty horrific.
#AltDevBlogADay » Latency Mitigation Strategies
John Carmack on the low-latency coding techniques used to support head mounted display devices.
Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience. Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached. A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.
(tags: head-mounted-display display ui latency vision coding john-carmack)
Distributed Streams Algorithms for Sliding Windows [PDF]
'Massive data sets often arise as physically distributed, parallel data streams, and it is important to estimate various aggregates and statistics on the union of these streams. This paper presents algorithms for estimating aggregate functions over a “sliding window” of the N most recent data items in one or more streams. [...] Our results are obtained using a novel family of synopsis data structures called waves.'
(tags: waves papers streaming algorithms percentiles histogram distcomp distributed aggregation statistics estimation streams)
good blog post on histogram-estimation stream processing algorithms
After reviewing several dozen papers, a score or so in depth, I identified two data structures that appear to enable us to answer these recency and frequency queries: exponential histograms (from "Maintaining Stream Statistics Over Sliding Windows" by Datar et al.) and waves (from "Distributed Streams Algorithms for Sliding Windows" by Gibbons and Tirthapura). Both of these data structures are used to solve the so-called counting problem, the problem of determining, with a bound on the relative error, the number of 1s in the last N units of time. In other words, the data structures are able to answer the question: how many 1s appeared in the last n units of time within a factor of Error (e.g., 50%). The algorithms are neat, so I'll present them briefly.
(tags: streams streaming stream-processing histograms percentiles estimation waves statistics algorithms)
Timelike 2: everything fails all the time
Fantastic post on large-scale distributed load balancing strategies from @aphyr. Random and least-conns routing comes out on top in his simulation (although he hasn't yet tried Marc Brooker's two-randoms routing strategy)
(tags: via:hn routing distributed least-conns load-balancing round-robin distcomp networking scaling)
Marc Brooker's "two-randoms" load balancing approach
Marc Brooker on this interesting load-balancing algorithm, including simulation results:
Using stale data for load balancing leads to a herd behavior, where requests will herd toward a previously quiet host for much longer than it takes to make that host very busy indeed. The next refresh of the cached load data will put the server high up the load list, and it will become quiet again. Then busy again as the next herd sees that it's quiet. Busy. Quiet. Busy. Quiet. And so on. One possible solution would be to give up on load balancing entirely, and just pick a host at random. Depending on the load factor, that can be a good approach. With many typical loads, though, picking a random host degrades latency and reduces throughput by wasting resources on servers which end up unlucky and quiet. The approach taken by the studies surveyed by Mitzenmacher is to try two hosts, and pick the one with the least load. This can be done directly (by querying the hosts) but also works surprisingly well on cached load data. [...] Best of 2 is good because it combines the best of both worlds: it uses real information about load to pick a host (unlike random), but rejects herd behavior much more strongly than the other two approaches.
Having seen what Marc has worked on, and written, inside Amazon, I'd take this very seriously... cool to see he is blogging externally too.(tags: algorithm load-balancing distcomp distributed two-randoms marc-brooker least-conns)
Can regular expressions parse HTML?
'a summary of the main points: The “regular expressions” used by programmers have very little in common with the original notion of regularity in the context of formal language theory. Regular expressions (at least PCRE) can match all context-free languages. As such they can also match well-formed HTML and pretty much all other programming languages. Regular expressions can match at least some context-sensitive languages. Matching of regular expressions is NP-complete. As such you can solve any other NP problem using regular expressions.'
(tags: compsci regexps regular-expressions programming np-complete chomsky-grammar context-free languages)
How to Create Application Shortcuts in Google Chrome for Mac
a rather hacky script is required. Ugh
(tags: hacks osx google chrome mac application-shortcuts site-specific-browsers)
-
I couldn't remember the name for this design principle, so it's worth a bookmark to remind me in future... 'This refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery. Correctly written components of crash-only software can microreboot to a known-good state without the help of a user. Since failure-handling and normal startup use the same methods, this can increase the chance that bugs in failure-handling code will be noticed.'
(tags: crashing crash-only-software design architecture coding software fault-tolerance erlang let-it-fail microreboot recovery autosave)
Europe Is Warmer Than Canada Because of the Gulf Stream, Right? Not So Fast
The common tale—the one bandied around for more than a hundred years—goes something like this: Warm water flowing to the northeast out of the Gulf of Mexico—the Gulf Stream—cuts across the North Atlantic ocean, bringing extra energy to the Isles and driving up temperatures relative to the comparatively-frigid North Americas. The only problem with this simple explanation, say Stephen Riser and Susan Lozier in Scientific American, is that it doesn’t actually account for the difference.
(tags: gulf-stream myths ireland europe science currents ocean temperature climate)
Dear Prudence: My wife and I came from the same sperm donor
yes, really. Bloody hell
(tags: sperm-donor birth dear-prudence omgwtfbbq via:davewiner reproduction)
-
from Twitter -- 'a cache for your big data. Even though memory is thousand times faster than SSD, network connected SSD-backed memory makes sense, if we design the system in a way that network latencies dominate over the SSD latencies by a large factor. To understand why network connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases. Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.'
(tags: twitter ssd cache caching memcached memcache memory network storage)
Passively Monitoring Network Round-Trip Times - Boundary
'how Boundary uses [TCP timestamps] to calculate round-trip times (RTTs) between any two hosts by passively monitoring TCP traffic flows, i.e., without actively launching ICMP echo requests (pings). The post is primarily an overview of this one aspect of TCP monitoring, it also outlines the mechanism we are using, and demonstrates its correctness.'
(tags: tcp boundary monitoring network ip passive-monitoring rtt timestamping)
drug cartel-controlled mobile comms networks
“The Mexican military has recently broken up several secret telecommunications networks that were built and controlled by drug cartels so they could coordinate drug shipments, monitor their rivals and orchestrate attacks on the security forces. A network that was dismantled just last week provided cartel members with cellphone and radio communications across four northeastern states. The network had coverage along almost 500 miles of the Texas border and extended nearly another 500 miles into Mexico’s interior. Soldiers seized 167 antennas, more than 150 repeaters and thousands of cellphones and radios that operated on the system. Some of the remote antennas and relay stations were powered with solar panels.”
(tags: mexico drugs networks mobile-phones crime)
Heroku finds out that distributed queueing is hard
Stage 3 of the Rap Genius/Heroku blog drama. Summary (as far as I can tell): Heroku gave up on a fully-synchronised load-balancing setup ("intelligent routing"), since it didn't scale, in favour of randomised queue selection; they didn't sufficiently inform their customers, and metrics and docs were not updated to make this change public; the pessimal case became pretty damn pessimal; a customer eventually noticed and complained publicly, creating a public shit-storm. Comments: 1. this is why you monitor real HTTP request latency (scroll down for crazy graphs!). 2. include 90/99 percentiles to catch the "tail" of poorly-performing requests. 3. Load balancers are hard. http://aphyr.com/posts/277-timelike-a-network-simulator has more info on the intricacies of distributed load balancing -- worth a read.
(tags: heroku rap-genius via:hn networking distcomp distributed load-balancing ip queueing percentiles monitoring)
-
10 particularly good -- actually helpful -- tips on using the Graphite metric graphing system
(tags: graphite ops metrics service-metrics graphing ui dataviz)
Literate Jenks Natural Breaks and How The Idea Of Code is Lost
A crazy amount of code archaeology to discover exactly an algorithm -- specifically 'Jenks natural breaks", works, after decades of cargo-cult copying (via Nelson): 'I spent a day reading the original text and decoding as much as possible of the code’s intention, so that I could write a ‘literate’ implementation. My definition of literate is highly descriptive variable names, detailed and narrative comments, and straightforward code with no hijinks. So: yes, this isn’t the first implementation of Jenks in Javascript. And it took me several times longer to do things this way than to just get the code working. But the sad and foreboding state of this algorithm’s existing implementations said that to think critically about this code, its result, and possibilities for improvement, we need at least one version that’s clear about what it’s doing.'
(tags: jenks-natural-breaks algorithms chloropleth javascript reverse-engineering history software copyright via:nelson)
don't order a Raspberry Pi from RS
I've been waiting 24 days for mine so far. Frankly amazing they are so apparently inept, particularly since it seems in breach of EU distance selling regulation if they go beyond 30 days without an update. They've just posted this:
Quick update- we received our delivery of raspberry pi’s last week and as of Friday we had shipped up to order reference 1010239854. We will continue daily to get your orders shipped out as quickly as we possibly can; so that you will all receive your raspberry pi’s shortly. Many thanks everyone for your patience and again apologies for the delay in the dispatch update message on the Pi Store which I know has caused some confusion.
(tags: rs raspberry-pi inept etailers uk e-commerce shopping hardware)
more details on the UK distance selling regulations governing Raspberry Pi RS orders
'my understanding is that according to the Distance Selling Regulations [...], unless you agreed otherwise with RS, then they were obligated to fulfill their side of the contract within thirty days from the day after you ordered, and if they were unable to do so they were also obligated to inform you that they could not and repay you within thirty days;ons (more info here in a nice, easy-to-read format), unless you agreed otherwise with RS, then they were obligated to fulfill their side of the contract within thirty days from the day after you ordered, and if they were unable to do so they were also obligated to inform you that they could not and repay you within thirty days'
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure
includes a nice javascript demo of HLL
(tags: hyperloglog loglog algorithms stream-processing streams estimation demos javascript)
-
'log scale for lists; Decaying lists allow to manage large range of values. A decaying list grows logarithmically with the number of items. It follows that some items are dropped when other are inserted.' (via Tony Finch)
(tags: via:fanf clojure algorithms decay backoff half-life data-structures)
Cycling in Dublin City: the numbers
7.6% of the Dublin commuter population "mainly cycle". some interesting stats here
-
Apache-licensed open source java lib to implement retrying behaviour cleanly.
a general purpose method for retrying arbitrary Java code with specific stop, retry, and exception handling capabilities that are enhanced by Guava's predicate matching. It also includes an exponential backoff WaitStrategy that might be useful for situations where more well-behaved service polling is preferred.
(tags: retries retrying resiliency fault-tolerance java open-source guava)
-
Utilizing an iPhone/Android App known as “Talking Tom Cat”, the tool has been transformed into a new media mouthpiece, addressing very specific particulars of the conflict that are glossed over by international media: alliances between MNLA and Ansar Dine, critiques of hypocrisy of the MUJAO factions, and ousting of corrupt politicians.
(tags: apps wtf politics talking-tom-cat bizarre tuareg africa via:neilmajor)
-
Black hats steal code-signing keys from software whitelisting anti-malware firm. Pretty audacious
(tags: malware security whitelisting av)
How did I do the Starwars Traceroute?
It is accomplished using many vrfs on 2 Cisco 1841s. For those less technical, VRFs are essentially private routing tables similar to a VPN. When a packet destined to 216.81.59.173 (AKA obiwan.scrye.net) hits my main gateway, I forward it onto the first VRF on the "ASIDE" router on 206.214.254.1. That router then has a specific route for 216.81.59.173 to 206.214.254.6, which resides on a different VRF on the "BSIDE" router. It then has a similar set up which points it at 206.214.254.9 which lives in another VPN on "ASIDE" router. All packets are returned using a default route pointing at the global routing table. This was by design so the packets TTL expiration did not have to return fully through the VRF Maze. I am a consultant to Epik Networks who let me use the Reverse DNS for an unused /24, and I used PowerDNS to update all of the entries through mysql. This took about 30 minutes to figure out how to do it, and about 90 minutes to implement.
(tags: vrfs routing networking hacks star-wars traceroute rdns ip)
Real-time Analytics in Scala [slides, PDF]
some good approximation/streaming algorithms and tips on Scala implementation
(tags: streams algorithms approximation coding scala slides)
'E?cient Computation of Frequent and Top-k Elements in Data Streams' [paper, PDF]
The Space-Saving algorithm to compute top-k in a stream. I've been asking a variation of this problem as an interview question for a while now, pretty cool to find such a neat solution. Pity neither myself nor anyone I've interviewed has come up with it ;)
(tags: space-saving approximation streams stream-processing cep papers pdf algorithms)
-
ASL-licensed open source library of stream-processing/approximation algorithms: count-min sketch, space-saving top-k, cardinality estimation, LogLog, HyperLogLog, MurmurHash, lookup3 hash, Bloom filters, q-digest, stochastic top-k
(tags: algorithms coding streams cep stream-processing approximation probabilistic space-saving top-k cardinality estimation bloom-filters q-digest loglog hyperloglog murmurhash lookup3)
'Medians and Beyond: New Aggregation Techniques for Sensor Networks' [paper, PDF]
'We introduce Quantile Digest or q-digest, a novel data structure which provides provable guarantees on approximation error and maximum resource consumption. In more concrete terms, if the values returned by the sensors are integers in the range [1;n], then using q-digest we can answer quantile queries using message size m within an error of O(log(n)/m). We also outline how we can use q-digest to answer other queries such as range queries, most frequent items and histograms. Another notable property of q-digest is that in addition to the theoretical worst case bound error, the structure carries with itself an estimate of error for this particular query.'
(tags: q-digest algorithms streams approximation histograms median percentiles quantiles)
Russia's anti-child-porn internet blocklist allegedly being used for general censorship
Allegedly being used to censor political and anti-corruption journalism, and a Russian wikipedia-like site for hosting an article about suicide
(tags: censorship feature-creep russia politics blocklists)
HyperLogLog++: Google’s Take On Engineering HLL
Google and AggregateKnowledge's improvements to the HyperLogLog cardinality estimation algorithm
(tags: hyperloglog cardinality estimation streaming stream-processing cep)
osx - Remap "Home" and "End" to beginning and end of line
in summary: ~/Library/KeyBindings/DefaultKeyBinding.dict. Thanks, Apple, this is stupid
(tags: mac keyboard bindings it-just-works compatibility ui rebinding)
-
Hadoop, a batch-generated read-only Voldemort cluster, and an intriguing optimal-storage histogram bucketing algorithm:
The optimal histogram is computed using a random-restart hill climbing approximated algorithm. The algorithm has been shown very fast and accurate: we achieved 99% accuracy compared to an exact dynamic algorithm, with a speed increase of one factor. [...] The amount of information to serve in Voldemort for one year of BBVA's credit card transactions on Spain is 270 GB. The whole processing flow would run in 11 hours on a cluster of 24 "m1.large" instances. The whole infrastructure, including the EC2 instances needed to serve the resulting data would cost approximately $3500/month.
(tags: scalability scaling voldemort hadoop batch algorithms histograms statistics bucketing percentiles)
-
'Splout is a scalable, open-source, easy-to-manage SQL big data view. Splout is to Hadoop + SQL what Voldemort or Elephant DB are to Hadoop + Key/Value. Splout serves a read-only, partitioned SQL view which is generated and indexed by Hadoop.' Some FAQs: 'What's the difference between Splout SQL and Dremel-like solutions such as BigQuery, Impala or Apache Drill? Splout SQL is not a "fast analytics" Dremel-like engine. It is more thought to be used for serving datasets under web / mobile high-throughput, many lookups, low-latency applications. Splout SQL is more like a NoSQL database in the sense that it has been thought for answering queries under sub-second latencies. It has been thought for performing queries that impact a very small subset of the data, not queries that analyze the whole dataset at once.'
(tags: splout sql big-data hadoop read-only scaling queries analytics)
Goonwaffe Stories: A Guide For Newbies [PDF]
impressively high-quality newbie's guide from the Goonswarm Federation -- as themittani.com describes it, 'frankly a work of art: a 1950's Pulp Scifi magazine full of internet spaceships and sociopathy.'
(tags: eve-online space goonswarm gaming mmo pdf pulp science-fiction)
Evasi0n Jailbreak's Userland Component
Good writeup of the exploit techniques used in the new iOS jailbreak.
Evasi0n is interesting because it escalates privileges and has full access to the system partition all without any memory corruption. It does this by exploiting the /var/db/timezone vulnerability to gain access to the root user’s launchd socket. It then abuses launchd to load MobileFileIntegrity with an inserted codeless library, which is overriding MISValidateSignature to always return 0.
(tags: jailbreak ios iphone ipad exploits evasi0n via:nelson)
Programming Language Checklist
'You appear to be advocating a new: [ ] functional [ ] imperative [ ] object-oriented [ ] procedural [ ] stack-based [ ] "multi-paradigm" [ ] lazy [ ] eager [ ] statically-typed [ ] dynamically-typed [ ] pure [ ] impure [ ] non-hygienic [ ] visual [ ] beginner-friendly [ ] non-programmer-friendly [ ] completely incomprehensible programming language. Your language will not work. Here is why it will not work.'
(tags: humor programming funny coding languages)
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build. look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% less Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
(tags: jetty java mechanical-sympathy optimization coding tries)
-
craft beer kegs for hire in Dublin, Sligo, Limerick and Galway. Needs more Metalman, of course ;)
A Continuous Packaging Pipeline
presentation describing some nice automation tools for packaging vendor code for deployment
(tags: deployment fosdem presentations slides debian deb fpm apt-get)
-
a new C++ template library from Google which implements an in-memory B-Tree container type, suitable for use as a drop-in replacement for std::map, set, multimap and multiset. Lower memory use, and reportedly faster due to better cache-friendliness
(tags: c++ google data-structures containers b-trees stl map set open-source)
Clairvoyant Squirrel: Large Scale Malicious Domain Classification
Storm-based service to detect malicious DNS domain usage from streaming pcap data in near-real-time. Uses string features in the DNS domain, along with randomness metrics using Markov analysis, combined with a Random Forest classifier, to achieve 98% precision at 10,000 matches/sec
(tags: storm distributed distcomp random-forest classifiers machine-learning anti-spam slides)
"Security Engineering" now online in full
Ross Anderson says: 'I’m delighted to announce that my book Security Engineering – A Guide to Building Dependable Distributed Systems is now available free online in its entirety. You may download any or all of the chapters from the book’s web page.'
(tags: security books reference coding software encryption ross-anderson)
Slide Rule Calculations By Example
Harder than using a calculator, that's for sure
(tags: slide-rule gadgets tech history antiques calculating)
A Continuous Packaging Pipeline
presentation describing some nice automation tools for packaging vendor code for deployment
(tags: deployment fosdem presentations slides debian deb fpm apt-get)
-
a new C++ template library from Google which implements an in-memory B-Tree container type, suitable for use as a drop-in replacement for std::map, set, multimap and multiset. Lower memory use, and reportedly faster due to better cache-friendliness
(tags: c++ google data-structures containers b-trees stl map set open-source)
Clairvoyant Squirrel: Large Scale Malicious Domain Classification
Storm-based service to detect malicious DNS domain usage from streaming pcap data in near-real-time. Uses string features in the DNS domain, along with randomness metrics using Markov analysis, combined with a Random Forest classifier, to achieve 98% precision at 10,000 matches/sec
(tags: storm distributed distcomp random-forest classifiers machine-learning anti-spam slides)
-
'Intel's Intelligent Platform Management Interface (IPMI), which is implemented and added onto by all server vendors, grant system administrators with a means to manage their hardware in an Out of Band (OOB) or Lights Out Management (LOM) fashion. However there are a series of design, utilization, and vendor issues that cause complex, pervasive, and serious security infrastructure problems. The BMC is an embedded computer on the motherboard that implements IPMI; it enjoys an asymmetrical relationship with its host, with the BMC able to gain full control of memory and I/O, while the server is both blind and impotent against the BMC. Compromised servers have full access to the private IPMI network The BMC uses reusable passwords that are infrequently changed, widely shared among servers, and stored in clear text in its storage. The passwords may be disclosed with an attack on the server, over the network network against the BMC, or with a physical attack against the motherboard (including after the server has been decommissioned.) IT's reliance on IPMI to reduce costs, the near-complete lack of research, 3rd party products, or vendor documentation on IPMI and the BMC security, and the permanent nature of the BMC on the motherboard make it currently very difficult to defend, fix or remediate against these issues.' (via Tony Finch)
(tags: via:fanf security ipmi power-management hardware intel passwords bios)
-
Massive Java concurrency fail in recent 1.6 and 1.7 JDK releases -- the java.util.HashMap type now spin-locks on an AtomicLong in its constructor. Here's the response from the author: 'I'll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?' Oh dear. Assumptions of "typical" like this are not how you design a fundamental data structure. fail. For now there is a hacky reflection-based workaround, but this is lame and needs to be fixed as soon as possible. (Via cscotta)
(tags: java hashmap concurrency bugs fail security hashing jdk via:cscotta)
High Scalability - geo-aware traffic load balancing and caching at CNBC.com
Dyn's anycast DNS service, as used by CNBC.com
(tags: anycast dns scalability dyn failover geographical load-balancing)
Using Statsd and Graphite From a Rails App
Reasonable simple, from the looks of it
(tags: rails graphite metrics service-metrics ruby)
The colour of London's commute
Nice visualisation. 'What the map shows is the mix of transport to work of residents living in each part of London*, using ONS data at Middle Super Output Area (MSOA) level. Each MSOA is given an RGB colour determined by the modal share, with red colours representing travel by car, taxi or motorbike, blue travel by public transport and green cycling or walking. The result is a fairly simple pattern, with motor vehicles predominating on London's fringes, public transport in the inner suburbs and cycling and walking in the very centre. Those tendrils of blue reaching out presumably represent major public transport links.'
(tags: data visualisation dataviz london mapping via:ldoody)
Where are the free WiFi spots in Dublin City Centre?
hooray, free wifi! beautiful Invader-style pixel-art mosaics to highlight them, too. nice one Joe
-
some lovely pixel art to advertise the free wifi areas, by Craig Robinson. I see a girl in pyjamas, a Dub hurler, a viking, Molly Malone, Phil Lynott, Oscar Wilde, a Moore St market trader, a busker, and the Spire...
DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
thumbs-up for DNSMadeEasy's Global Traffic Director anycast-based geographically-segmented DNS service, in particular
(tags: dns architecture scalability search duckduckgo geoip anycast)
Announcing Ribbon: Tying the Netflix Mid-Tier Services Together
Netflix' load balancing client-side library, open source
(tags: netflix load-balancing coding libraries client-side open-source)
-
'an expressive toolset for constructing scalable, resilient [service] architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error).' Looks like a pretty neat cluster deployment tool; driven from a single configuration file, using Chef, integrating closely with AWS and providing many useful additional features
(tags: chef deployment clusters knife services aws ec2 ops ironfan demo)
Fox DMCA Takedowns Order Google to Remove Fox DMCA Takedowns
Chilling Effects is setup to stop the ‘chilling effects’ of Internet censorship. Google sees this as a good thing and sends takedown requests it receives to be added to the database. Fox sends takedown requests to Google for pages which the company says contain links to material it holds the copyright to. Those pages include those on Chilling Effects which show which links Fox wants taken down. Google delists the Chilling Effects pages from its search engine, thus completing the circle and defeating the very reason Chilling Effects was set up for in the first place.
(tags: chilling-effects copyright internet legal dmca google law)
-
At Railscamp X it became clear there is a gap in the current HTTP specification. There are many ways for a developer to screw up their implementation, but no code to share the nature of the error with the end user. We humbly suggest the following status codes are included in the HTTP spec in the 7XX range.
Includes such useful status codes as "724 - This line should be unreachable". How Newegg crushed the “shopping cart” patent and saved online retail
Very cool account of Newegg's battle against a ludicrous patent-troll shakedown. Great quote from their Chief Legal Officer, Lee Cheng:
Patent trolling is based upon deficiencies in a critical, but underdeveloped, area of the law. The faster we drive these cases to verdict, and through appeal, and also get legislative reform on track, the faster our economy will be competitive in this critical area. We're competing with other economies that are not burdened with this type of litigation. China doesn't have this, South Korea doesn't have this, Europe doesn't have this. [...] It's actually surprising how quickly people forget what Lemelson did. [referring to Jerome Lemelson, an infamous patent troll who used so-called "submarine patents" to make billions in licensing fees.] This activity is very similar. Trolls right now "submarine" as well. They use timing, like he used timing. Then they pop up and say "Hello, surprise! Give us your money or we will shut you down!" Screw them. Seriously, screw them. You can quote me on that.
(tags: patent-trolls east-texas newegg shopping-cart swpat software-patents patents ecommerce soverain)
Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com
Using new Intel Core i7 instructions to speed up string manipulation.
Fascinating stuff. SSE ftw(tags: sse optimization simd assembly intel i7 intel-core strstr strings string-matching strchr strlen coding)
All polar bears descended from one Irish grizzly
'THE ARCTIC'S DWINDLING POPULATION of polar bears all descend from a single mamma brown bear which lived 20,000 to 50,000 years ago in present-day Ireland, new research suggests. DNA samples from the great white carnivores - taken from across their entire range in Russia, Canada, Greenland, Norway and Alaska - revealed that every individual's lineage could be traced back to this Irish forebear.' More than the average bear, I guess
(tags: animals biology science dna history ireland bears polar-bears grizzly-bears via:ben)
Basho | Alert Logic Relies on Riak to Support Rapid Growth
'The new [Riak-based] analytics infrastructure performs statistical and correlation processing on all data [...] approximately 5 TB/day. All of this data is processed in real-time as it streams in. [...] Alert Logic’s analytics infrastructure, powered by Riak, achieves performance results of up to 35k operations/second across each node in the cluster – performance that eclipses the existing MySQL deployment by a large margin on single node performance. In real business terms, the initial deployment of the combination of Riak and the analytic infrastructure has allowed Alert Logic to process in real-time 7,500 reports, which previously took 12 hours of dedicated processing every night.' Twitter discussion here: https://twitter.com/fisherpk/status/294984960849367040 , which notes 'heavily cached SAN storage, 12 core blades and 90% get to put ops', and '3 riak nodes, 12-cores, 30k get heavy riak ops/sec. 8 nodes driving ops to that cluster'. Apparently the use of SAN storage on all nodes is historic, but certainly seems to have produced good iops numbers as an (expensive) side-effect...
(tags: iops riak basho ops systems alert-logic storage nosql databases)
Turn a Raspberry Pi Into an AirPlay Receiver for Streaming Music in Your Living Room
hooray, a viable domestic Raspberry Pi use case at last ;)
Antigua Government Set to Launch “Pirate” Website To Punish United States
oh the lulz.
The Government of Antigua is planning to launch a website selling movies, music and software, without paying U.S. copyright holders. The Caribbean island is taking the unprecedented step because the United States refuses to lift a trade “blockade” preventing the island from offering Internet gambling services, despite several WTO decisions in Antigua’s favor. The country now hopes to recoup some of the lost income through a WTO approved “warez” site.
(tags: us-politics antigua piracy filesharing pirate gambling wto ip blockades)
-
An article by Nathan "Storm" Marz describing the system architecture he's been talking about for a while; Hadoop-driven batch view, Storm-driven "speed view", and a merging API
(tags: storm systems architecture lambda-architecture design Hadoop)
Network graph viz of Irish politicians and organisations on Twitter
generated by the Clique Research Cluster at UCD and DERI. 'a visualization of the unified graph representation for the users in the data, produced using Gephi and sigma.js. Users are coloured according to their community (i.e. political affiliation). The size of each node is proportional to its in-degree (i.e. number of incoming links).' sigma.js provides a really user-friendly UI to the graphs, although -- as with most current graph visualisations -- it'd be particularly nice if it was possible to 'tease out' and focus on interesting nodes, and get a pasteable URL of the result, in context. Still, the most usable graph viz I've seen in a while...
(tags: graphs dataviz ucd research ireland twitter networks community sigma.js javascript canvas gephi)
-
Incredible blog of book covers and illustrations, much from the 1970s
(tags: illustration art prints 1970s graphics)
Namazu-e: Earthquake catfish prints
'In November 1855, the Great Ansei Earthquake struck the city of Edo (now Tokyo), claiming 7,000 lives and inflicting widespread damage. Within days, a new type of color woodblock print known as namazu-e (lit. "catfish pictures") became popular among the residents of the shaken city. These prints featured depictions of mythical giant catfish (namazu) who, according to popular legend, caused earthquakes by thrashing about in their underground lairs. In addition to providing humor and social commentary, many prints claimed to offer protection from future earthquakes.'
Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm in Storm
Storm demo with a reasonably complex topology. 'how to implement a distributed, real-time trending topics algorithm in Storm. It uses the latest features available in Storm 0.8 (namely tick tuples) and should be a good starting point for anyone trying to implement such an algorithm for their own application. The new code is now available in the official storm-starter repository, so feel free to take a deeper look.'
(tags: storm distcomp distributed tick-tuples demo)
-
'a UNIX init scheme with service supervision' - philosophically similar to daemontools, widely packaged, LSB init.d-script-compliant, BSD-licensed
'The Uni?ed Logging Infrastructure for Data Analytics at Twitter' [PDF]
A picture of how Twitter standardized their internal service event logging formats to allow batch analysis and analytics. They surface service metrics to dashboards from Pig jobs on a daily basis, which frankly doesn't sound too great...
(tags: twitter analytics event-logging events logging metrics)
Ivan Beshoff, Last Survivor Of Mutiny on the Potemkin, founded Beshoffs
wow. there's a factoid! the "Beshoffs" chain of chippers in Dublin were founded by this historic figure, who died in 1987
(tags: factoids beshoffs chips dublin history small-world battleship-potemkin russia)
-
Excellent demo of how use of a block cipher with a known secret key makes an insecure MAC. "In short, CBC-MAC is a Message Authentication Code, not a strong hash function. While MACs can be built out of hash functions (e.g. HMAC), and hash functions can be built out of block ciphers like AES, not all MACs are also hash functions. CBC-MAC in particular is completely unsuitable for use as a hash function, because it only allows two parties with knowledge of a particular secret key to securely transmit messages between each other. Anyone with knowledge of that key can forge the messages in a way that keeps the MAC (“hash value”) the same. All you have to do is run the forged message through CBC-MAC as usual, then use the AES decryption operation on the original hash value to find the last intermediate state. XORing this state with the CBC-MAC for the forged message yields a new block of data which, when appended to the forged message, will cause it to have the original hash value. Because the input is taken backwards, you can either modify the first block of the file, or just run the hash function backwards until you reach the block that you want to modify. You can make a forged file pass the hash check as long as you can modify an arbitrary aligned 16-byte block in it."
Scala 2.8 Collections API -- Performance Characteristics
wow. Every library vending a set of collection types should have a page like this
(tags: collections scala performance reference complexity big-o coding)
So, after just over 3 and a half years, I'm leaving Amazon.
It's been great fun -- I can honestly say, even with my code being used by hundreds of millions of users in SpamAssassin and elsewhere, I hadn't really had to come to grips with the distributed systems problems that an Amazon-scale service involves.
During my time at Amazon, I've had the pleasure of building out a brand-new, groundbreaking innovative internal service, from scratch to its current status where it's deployed in production datacenters worldwide. It's a low-latency service, used to monitor Amazon's internal networks using massive quantities of measurement data and machine learning algorithms. It's really very nifty, and I'm quite proud of what we've achieved. I was lucky to work closely with some very smart people during this, too -- Amazon has some top-notch engineers.
But time to move on! In a week's time, I'll be joining Swrve to work on the server-side architecture of their system. Swrve have a very interesting product, extending the A/B-testing model into gaming, and a great team; and it'll be nice to get back into startup-land once again, for a welcome change. (It's not all roses working for a big company. ;) I'm looking forward to it. Who knows, I may even start blogging here again...
Pity about losing those 12 phone tool icons though!
CES: Worse Products Through Software
'The companies out there that know how to make decent software have been steadily eating their way into and through markets previously dominated by the hardware guys. Apple with music players, TiVo with video recording, even Microsoft with its decade-old Xbox Live service, which continues to embarrass the far weaker offerings from Sony and Nintendo. (And, yes, iOS is embarrassing all three console makers.)' See also Mat Honan's article at http://www.wired.com/gadgetlab/2012/12/internet-tv-sucks/ : 'Smart TVs are just too complicated. They have terrible user interfaces that differ wildly from device to device. It’s not always clear what content is even available — for example, after more than two years on the market, you still can’t watch Hulu Plus on your Google TV. [...] They give us too many options for apps most people will never use, and they do so at the expense of making it simple to find the shows and movies we want to watch, no matter where they are, be it online or on the air. As NPD puts it in the conclusion to its report, “OEMs and retailers need to focus less on new innovation in this space and more on simplification of the user experience and messaging if they want to drive additional, and new, behaviors on the TV.” Which is a more polite way of saying, clean up your horrible interface, Samsung.' (via Craig)
(tags: via:craig design ui tv hardware television sony ces software)
Fast Packed String Matching for Short Patterns [paper, PDF]
'Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like NLP, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. [...] In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns.' Reminds me of http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm , but taking advantage of SIMD extensions, which should make things nice and speedy, at the cost of tying it to specific hardware platforms. (via Tony Finch)
(tags: rabin-karp algorithms strings string-matching papers via:fanf)
Irish EU Council Presidency proposes destruction of right to privacy | EDRI
'For example, based on the current situation in Ireland, the idea is that all companies can do whatever they want with personal data, without fear of sanction. Sanctions, such as fines, “should be optional or at least conditional upon a prior warning or reprimand”. In other words, do what you want, the worst that can happen is that you will receive a warning.' Shame! Daragh O'Brien's comment: 'utter idiocy'. ( at https://twitter.com/daraghobrien/status/292041500873850880 )
(tags: privacy ireland eu fail data-protection data-privacy politics)
Belgium plans artificial island to store wind power
' Belgium is planning to build a doughnut-shaped island in the North Sea that will store wind energy by pumping water out of a hollow in the middle, as it looks for ways to lessen its reliance on nuclear power. One of the biggest problems with electricity is that it is difficult to store and the issue is exaggerated in the case of renewable energy from wind or sun because it is intermittent depending on the weather.' 'The island is still in the planning stages, but will be built out of sand 3 km off the Belgian coast near the town of Wenduine if it gets the final go-ahead. The island, which would also work as an offshore substation to transform the voltage of the electricity generated by wind turbines, could take five or more years to plan and build.'
(tags: power via:daev belgium wind-power hydro sea islands manmade storage)
-
so Reddit uses the Wilson score confidence interval approach, it turns out; more details here (via Toby diPasquale)
(tags: ranking rating algorithms popularity python wilson-score-interval sorting statistics confidence-sort)
The Neurocritic: Fisher-Price Synesthesia
'Synesthesia [jm: sic] is a rare perceptual phenomenon in which the stimulation of one sensory modality, or exposure to one type of stimulus, leads to a sensory (or cognitive) experience in a different, non-stimulated modality. For instance, some synesthetes have colored hearing while others might taste shapes. GRAPHEME-COLOR SYNESTHESIA is the condition in which individual printed letters are perceived in a specific, constant color. This occurs involuntarily and in the absence of colored font. [...] A new study has identified 11 synesthetes whose grapheme-color mappings appear to be based on the Fisher Price plastic letter set made between 1972-1990.' (via Dave Green)
(tags: fisher-price synesthesia synaesthesia colors colours sight neuroscience brain via-dave-green toys)
Extreme Performance with Java - Charlie Hunt [slides, PDF]
presentation slides for Charlie Hunt's 2012 QCon presentation, where he discusses 'what you need to know about a modern JVM in order to be effective at writing a low latency Java application'. The talk video is at http://www.infoq.com/presentations/Extreme-Performance-Java
(tags: low-latency charlie-hunt performance java jvm presentations qcon slides pdf)
-
'Bloomsday Map Of Dublin Based On Ulysses'. Beautiful! 'The Leopold’s Day map is a stunning marriage of typography and cartography plotting all the streets alluded to by Joyce in Ulysses which were in existence on June 16th 1904. It is accompanied by a comprehensive and beautifully typeset directory with over 400 entries noting the landmarks, business and people of Dublin that were referenced in the text. The Leopold’s Day map is an exquisitely detailed, limited edition piece. It has an impressive dimension of 1000mm x 700mm which means it can also fit into a ready made frame. Price: €125.00'
(tags: bloomsday ulysses dublin ireland maps james-joyce art prints)
aaw/hyperloglog-redis - GitHub
'This gem is a pure Ruby implementation of the HyperLogLog algorithm for estimating cardinalities of sets observed via a stream of events. A Redis instance is used for storing the counters.'
(tags: cardinality sets redis algorithms ruby gems hyperloglog)
-
'uses DNS witchcraft to allow you to access US/UK-only audio and video services like Hulu.com, BBC iPlayer, etc. without using a VPN or Web proxy.' According to http://superuser.com/questions/461316/how-does-tunlr-work , it proxies the initial connection setup and geo-auth, then mangles the stream address to stream directly, not via proxy. Sounds pretty useful
(tags: proxy network vpn dns tunnel content video audio iplayer bbc hulu streaming geo-restriction)
OmniTI's Experiences Adopting Chef
A good, in-depth writeup of OmniTI's best practices with respect to build-out of multiple customer deployments, using multi-tenant Chef from a version-controlled repo. Good suggestions, and I am really looking forward to this bit: 'Chef tries to turn your system configuration into code. That means you now inherit all the woes of software engineering: making changes in a coordinated manner and ensuring that changes integrate well are now an even greater concern. In part three of this series, we’ll look at applying software quality assurance and release management practices to Chef cookbooks and roles.'
(tags: chef deployment ops omniti systems vagrant automation)
-
Twitter's Scala style guide. 'While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ “purely functional style”, and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of best practices. Our use of Scala is mainly for creating high volume services that form distributed systems — and our advice is thus biased — but most of the advice herein should translate naturally to other domains.'
Notes on Distributed Systems for Young Bloods -- Something Similar
'Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.' This is a pretty nice list, a little over-stated, but that's the format. I particularly like the following: 'Exploit data-locality'; 'Learn to estimate your capacity'; 'Metrics are the only way to get your job done'; 'Use percentiles, not averages'; 'Extract services'.
-
'a Nagios plugin to poll Graphite'. Necessary, since service metrics are the true source of service health information
(tags: nagios graphite service-metrics ops)
paperplanes. The Virtues of Monitoring, Redux
A rather vague and touchy-feely "state of the union" post on monitoring. Good set of links at the end, though; I like the look of Sensu and Tasseo, but am still unconvinced about the value of Boundary's offering
(tags: monitoring metrics ops)
What happened to KHTML after Apple announced Safari
'There was a huge amount of excitement at the announcement that Safari would be using KHTML. At that time, it was almost a given that the OSS rendering engine was Gecko. KHTML was KDE's little engine that could. But nobody ever expected it to be picked up by other folks. One of the original parts of the KHTML-to-OS X port was KWQ (pronounced, "quack") that abstracted out the KDE API portions that were used in KHTML. Folks were pretty ecstatic at first. It seemed very validating. But that changed quickly. As Zack's post indicates, WebKit became a thing of unmergable code-drops. Even inside of the KDE community there became a split between the KHTML purists and the WebKit faction. They'd previously more or less all been KHTML developers, but post-WebKit there was something of a pragmatists vs. idealists split. Zack fell on the latter side of that (for understandable reasons: there was an existing community project, with its own set of values, and that was hijacked to a large extent by WebKit). A few years later WebKit transformed itself into a more or less valid open source project (see webkit.org), but that didn't close the rift in the KDE community between the two, at that point rather divergent, rendering engines. There's still some remaining melancholy that stems from that initial hope and what could have potentially been, but wasn't.'
(tags: history safari open-source code-drops over-the-wall webkit khtml kde oss apple)
-
whoa. (via Dave O'Riordan)
Dan McKinley :: Whom the Gods Would Destroy, They First Give Real-time Analytics
'It's important to divorce the concepts of operational metrics and product analytics. [..] Funny business with timeframes can coerce most A/B tests into statistical significance.' 'The truth is that there are very few product decisions that can be made in real time.' HN discussion: http://news.ycombinator.com/item?id=5032588
(tags: real-time analytics statistics a-b-testing)
Greyhound agrees to change consumer contracts and make refunds - National Consumer Agency
Take note, switchers: 'The National Consumer Agency (NCA) has received a commitment from Greyhound that it will amend certain terms in its standard consumer contract, which the NCA thinks are unfair to consumers. This will be done by January 18 2013. Among the terms considered unfair by the NCA are that consumers must forfeit their credit balance and pay a €45 administration fee, if they cancel their contract with Greyhound within 12 months. If you were charged money in these circumstances, Greyhound has agreed to refund you. Greyhound will communicate these changes to all of its consumers by 18 January 2013. If you have any questions about the changes or getting a refund, you should contact Greyhound directly.'
Pushover: Simple Mobile Notifications for Android and iOS
'Pushover makes it easy to send real-time notifications to your Android and iOS devices.' extremely simple HTTPS API; 'Pushover has no monthly subscription fees and users will always be able to receive unlimited messages for free. Most applications can send messages for free, subject to monthly limits.' Also supported by ifttt.com
-
'an elegant and simple HTTP library for Python, built for human beings.' 'Requests is an Apache2 Licensed HTTP library, written in Python, for human beings. Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks. Requests takes all of the work out of Python HTTP/1.1 — making your integration with web services seamless. There’s no need to manually add query strings to your URLs, or to form-encode your POST data. Keep-alive and HTTP connection pooling are 100% automatic, powered by urllib3, which is embedded within Requests.'
Surprisingly Good Evidence That Real Name Policies Fail To Improve Comments
'Enough theorizing, there’s actually good evidence to inform the debate. For 4 years, Koreans enacted increasingly stiff real-name commenting laws, first for political websites in 2003, then for all websites receiving more than 300,000 viewers in 2007, and was finally tightened to 100,000 viewers a year later after online slander was cited in the suicide of a national figure. The policy, however, was ditched shortly after a Korean Communications Commission study found that it only decreased malicious comments by 0.9%. Korean sites were also inundated by hackers, presumably after valuable identities. Further analysis by Carnegie Mellon’s Daegon Cho and Alessandro Acquisti, found that the policy actually increased the frequency of expletives in comments for some user demographics. While the policy reduced swearing and “anti-normative” behavior at the aggregate level by as much as 30%, individual users were not dismayed. “Light users”, who posted 1 or 2 comments, were most affected by the law, but “heavy” ones (11-16+ comments) didn’t seem to mind. Given that the Commission estimates that only 13% of comments are malicious, a mere 30% reduction only seems to clean up the muddied waters of comment systems a depressingly negligent amount. The finding isn’t surprising: social science researchers have long known that participants eventually begin to ignore cameras video taping their behavior. In other words, the presence of some phantom judgmental audience doesn’t seem to make us better versions of ourselves.' (via Ronan Lyons)
(tags: anonymity identity policy comments privacy politics new-media via:ronanlyons)
HAT-trie: A Cache-conscious Trie-based Data Structure for Strings [PDF]
'Tries are the fastest tree-based data structures for managing strings in-memory, but are space-intensive. The burst-trie is almost as fast but reduces space by collapsing trie-chains into buckets. This is not however, a cache-conscious approach and can lead to poor performance on current processors. In this paper, we introduce the HAT-trie, a cache-conscious trie-based data structure that is formed by carefully combining existing components. We evaluate performance using several real-world datasets and against other highperformance data structures. We show strong improvements in both time and space; in most cases approaching that of the cache-conscious hash table. Our HAT-trie is shown to be the most e?cient trie-based data structure for managing variable-length strings in-memory while maintaining sort order.' (via Tony Finch)
(tags: via:fanf data-structures tries cache-aware trees)
The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases [PDF]
'Main memory capacities have grown up to a point where most databases ?t into RAM. For main-memory database systems, index structure performance is a critical bottleneck. Traditional in-memory data structures like balanced binary search trees are not ef?cient on modern hardware, because they do not optimally utilize on-CPU caches. Hash tables, also often used for main-memory indexes, are fast but only support point queries. To overcome these shortcomings, we present ART, an adaptive radix tree (trie) for ef?cient indexing in main memory. Its lookup performance surpasses highly tuned, read-only search trees, while supporting very ef?cient insertions and deletions as well. At the same time, ART is very space ef?cient and solves the problem of excessive worst-case space consumption, which plagues most radix trees, by adaptively choosing compact and ef?cient data structures for internal nodes. Even though ART’s performance is comparable to hash tables, it maintains the data in sorted order, which enables additional operations like range scan and pre?x lookup.' (via Tony Finch)
(tags: via:fanf data-structures trees indexing cache-aware tries)
Ef?cient In-Memory Indexing with Generalized Pre?x Trees [PDF]
'Ef?cient data structures for in-memory indexing gain in importance due to (1) the exponentially increasing amount of data, (2) the growing main-memory capacity, and (3) the gap between main-memory and CPU speed. In consequence, there are high performance demands for in-memory data structures. Such index structures are used—with minor changes—as primary or secondary indices in almost every DBMS. Typically, tree-based or hash-based structures are used, while structures based on prefix-trees (tries) are neglected in this context. For tree-based and hash-based structures, the major disadvantages are inherently caused by the need for reorganization and key comparisons. In contrast, the major disadvantage of trie-based structures in terms of high memory consumption (created and accessed nodes) could be improved. In this paper, we argue for reconsidering pre?x trees as in-memory index structures and we present the generalized trie, which is a pre?x tree with variable prefix length for indexing arbitrary data types of fixed or variable length. The variable prefix length enables the adjustment of the trie height and its memory consumption. Further, we introduce concepts for reducing the number of created and accessed trie levels. This trie is order-preserving and has deterministic trie paths for keys, and hence, it does not require any dynamic reorganization or key comparisons. Finally, the generalized trie yields improvements compared to existing in-memory index structures, especially for skewed data. In conclusion, the generalized trie is applicable as general-purpose in-memory index structure in many different OLTP or hybrid (OLTP and OLAP) data management systems that require balanced read/write performance.' (via Tony Finch)
(tags: via:fanf prefix-trees tries data-structures)
A Non-Blocking HashTable by Dr. Cliff Click : programming
Proggit discovers the NonBlockingHashMap. This comment from Boundary's cscotta is particularly interesting: "The code is intricate and curiously-formatted, but NBHM is quite excellent. The majority of our analytics platform is backed by NBHMs updated rapidly in parallel. Cliff's a great, friendly, approachable guy; if you have any specific questions about the approaches or implementation, he may be happy to answer."
(tags: data-structures algorithms non-blocking concurrency threading multicore cliff-click azul maps java boundary)
English Letter Frequency Counts: Mayzner Revisited or ETAOIN SRHLDCU
Amazing how consistent the n-gram counts are between Peter Norvig's analysis (here) against the 20120701 Google Books corpus, and Mark Mayzner's 20,000-word corpus from the early 1960s
(tags: english statistics n-grams words etaoin-shrdlu peter-norvig mark-mayzner)
Solving monitoring state storage problems using Redis
a nice basic Redis-in-practice post
-
'a HTTP client mock library for Python, 100% inspired on ruby's FakeWeb [ https://github.com/chrisk/fakeweb ].' 'HTTPretty monkey patches Python's socket core module, reimplementing the HTTP protocol by mocking requests and responses.'
(tags: mocking testing http python ruby unit-tests tests monkey-patching)
Why did infinite scroll fail at Etsy?
'A/B testing must be done in a modularized fashion. The “fail” case he gave was when Etsy spent months developing and testing infinite scroll to their search listings, only to find that it had a negative impact on engagement.' [...] 'instead of having the goal of “test infinite scroll,” Etsy realized it needed to test each assumption separately, and this going forward is their game plan.'
(tags: usability testing design etsy ab-testing test modularization via:hn)
-
Annotations-based git-like CLI helper for Java
"Matters Computational - Ideas, Algorithms, Source Code"
A hefty tome (in PDF format) containing lots of interesting algorithms and computational tricks; code is GPLv3 licensed
(tags: algorithms computation via:cliffc pdf books coding)
Dan McKinley :: Effective Web Experimentation as a Homo Narrans
Good demo from Etsy's A/B testing, of how the human brain can retrofit a story onto statistically-insignificant results. To fix: 'avoid building tooling that enables fishing expeditions; limit our post-hoc rationalization by explicitly constraining it before the experiment. Whenever we test a feature on Etsy, we begin the process by identifying metrics that we believe will change if we 1) understand what is happening and 2) get the effect we desire.'
(tags: testing etsy statistics a-b-testing fishing ulysses-contract brain experiments)
Lesser known crimes: do you own that copyright?
A very interesting crime on the Irish statute books:
Section 141 of the Copyright and Related Rights Act 2000 provides: A person who, for financial gain, makes a claim to enjoy a right under this Part [ie. copyright] which is, and which he or she knows or has reason to believe is, false, shall be guilty of an offence and shall be liable on conviction on indictment to a fine not exceeding £100,000, or to imprisonment for a term not exceeding 5 years, or both.
(tags: ireland copyright ip false-claims law)
-
Makers of "Feck It Sure It's Grand" merchandise are flogging stuff for 50 cent (+ shipping) in their "out with the old" sale
Patent trolls want $1,000 for using scanners
We are truly living in the future -- a dystopian future, but one nonetheless. A patent troll manages to obtain "gobbledigook" patents on using a scanner to scan to PDF, then attempts to shake down a bunch of small companies before eventually running into resistance, at which point it "forks" into a bunch of algorithmically-named shell companies, spammer-style, sending the same demands. Those demands in turn contain this beauty of Stockholm-syndrome-inducing prose:
'You should know also that we have had a positive response from the business community to our licensing program. As you can imagine, most businesses, upon being informed that they are infringing someone’s patent rights, are interested in operating lawfully and taking a license promptly. Many companies have responded to this licensing program in such a manner. Their doing so has allowed us to determine that a fair price for a license negotiated in good faith and without the need for court action is a payment of $900 per employee. We trust that your organization will agree to conform your behavior to respect our patent rights by negotiating a license rather than continuing to accept the benefits of our patented technology without a license. Assuming this is the case, we are prepared to make this pricing available to you.'
And here's an interesting bottom line:The best strategy for target companies? It may be to ignore the letters, at least for now. “Ignorance, surprisingly, works,” noted Prof. Chien in an e-mail exchange with Ars. Her study of startups targeted by patent trolls found that when confronted with a patent demand, 22 percent ignored it entirely. Compare that with the 35 percent that decided to fight back and 18 percent that folded. Ignoring the demand was the cheapest option ($3,000 on average) versus fighting in court, which was the most expensive ($870,000 on average). Another tactic that clearly has an effect: speaking out, even when done anonymously. It hardly seems a coincidence that the Project Paperless patents were handed off to a web of generic-sounding LLCs, with demand letters signed only by “The Licensing Team,” shortly after the “Stop Project Paperless” website went up. It suggests those behind such low-level licensing campaigns aren’t proud of their behavior. And rightly so.
(tags: patents via:fanf networks printing printers scanning patent-trolls project-paperless adzpro gosnel faslan)
Keep predicting and you’ll be right eventually?
debunking Ken Ring, the kiwi “long term weather prediction” “scientist” who gets trundled out every year around this time
(tags: ken-ring weather predictions ireland rain)
29c3 HashDOS presentation slides (PDF)
Summary: MurmurHash still vulnerable, likewise Cityhash and Python's hash -- use SipHash
(tags: via:fanf cityhash siphash hash dos security hashdos murmurhash)
Scaling Crashlytics: Building Analytics on Redis 2.6
How one analytics/metrics co is using Redis on the backend
(tags: analytics redis presentation metrics)
Systemd, systemd-nspawn, and namespaces for Linux service compartmentalization
"Using ReadOnlyDirectories= andInaccessibleDirectories= you may setup a file system namespace jail for your service. Initially, it will be identical to your host OS' file system namespace. By listing directories in these directives you may then mark certain directories or mount points of the host OS as read-only or even completely inaccessible to the daemon."
(tags: security systemd jails namespaces linux compartmentalisation)
GNUTLS project is leaving/attempting to leave the GNU project/FSF
seems there's trouble around governance and rights of the project's developers. GNU sed and grep's maintainer, too: http://article.gmane.org/gmane.comp.lang.smalltalk.gnu.general/7873
(tags: gnu software free free-software governance copyright management richard-stallman)
Raspberry Pi XBMC Shootout: Raspbmc vs OpenELEC vs XBian
summary: OpenELEC wins! (via Davie Norrie)
(tags: via:dnorrie openelec xbmc raspberry-pi tv htpc linux)
-
the enStratus “solo installer”; what they use for one-box testing, staging, and customer stack deployment, using chef-solo and Vagrant
(tags: chef virtualization vagrant chef-solo deployment enstratus cluster stack)
-
'thin software layers don’t add much value, especially when you have many such layers piled on each other. Each layer has to be pushed onto your mental stack as you dive into the code. Furthermore, the layers of phyllo dough are permeable, allowing the honey to soak through. But software abstractions are best when they don’t leak. When you pile layer on top of layer in software, the layers are bound to leak.'
(tags: code design terminology food antipatterns)
The innards of Evernote's new business analytics data warehouse
replacing a giant MySQL star-schema reporting server with a Hadoop/Hive/ParAccel cluster
(tags: horizontal-scaling scalability bi analytics reporting evernote via:highscalability hive hadoop paraccel)
HBase Real-time Analytics & Rollbacks via Append-based Updates
Interesting concept for scaling up the write rate on massive key-value counter stores:
'Replace update (Get+Put) operations at write time with simple append-only writes and defer processing of updates to periodic jobs or perform aggregations on the fly if user asks for data earlier than individual additions are processed. The idea is simple and not necessarily novel, but given the specific qualities of HBase, namely fast range scans and high write throughput, this approach works very well.'
(tags: counters analytics hbase append sematext aggregation big-data)
Cliff Click in "A JVM Does What?"
interesting YouTubed presentation from Azul's Cliff Click on some java/JVM innards
(tags: presentation concurrency jvm video java youtube cliff-click)
-
An attempt to catalogue some Emergency-era (ie. WWII) ground markings, used to notify US pilots that they were overflying the neutral Republic of Ireland
(tags: ireland eire history wwii the-emergency war geography mapping)
Bunnie Huang is building a once-off custom laptop design
As one commenter says, "it's like watching a Jedi construct his own light-saber.” Quad-core ARM chips, on-board FPGA (!), and lots of other amazing hacker-friendly features; sounds like a one-of-a-kind device
Authentication is machine learning
This may be the most insightful writing about authentication in years:
(via Tony Finch.)From my brief time at Google, my internship at Yahoo!, and conversations with other companies doing web authentication at scale, I’ve observed that as authentication systems develop they gradually merge with other abuse-fighting systems dealing with various forms of spam (email, account creation, link, etc.) and phishing. Authentication eventually loses its binary nature and becomes a fuzzy classification problem.
This is not a new observation. It’s generally accepted for banking authentication and some researchers like Dinei Florêncio and Cormac Herley have made it for web passwords. Still, much of the security research community thinks of password authentication in a binary way [..]. Spam and phishing provide insightful examples: technical solutions (like Hashcash, DKIM signing, or EV certificates), have generally failed but in practice machine learning has greatly reduced these problems. The theory has largely held up that with enough data we can train reasonably effective classifiers to solve seemingly intractable problems.
(tags: passwords authentication big-data machine-learning google abuse antispam dkim via:fanf)
Hotels to pay royalties on music - The Irish Times - Fri, Dec 14, 2012
'The operators of hotels, guesthouses and bed & breakfasts will have to pay royalties for any copyright music played in guest bedrooms [in Ireland]. [...] Under the agreement, the music charges will be set by Phonographic Performance Ireland Ltd (PPI). [...] When it initiated its case in 2010, the PPI said it was seeking payment of about €1 per bedroom per week or about 14 cent a night.' I don't understand this. Most hotels do not play music in the rooms themselves. Does this apply if there is no music playing in the bedroom? Does it apply if the customer brings their own music? Are Dublin Bus to be next?
-
'The trouble with the Lisp-hacker tradition is that it is overly focused on the problem of programming -- compilers, abstraction, editors, and so forth -- rather than the problems outside the programmer's cubicle. I conjecture that the Lisp-school essayists -- Raymond, Graham, and Yegge -- have not “needed mathematics” because they spend their time worrying about how to make code more abstract. This kind of thinking may lead to compact, powerful code bases, but in the language of economics, there is an opportunity cost.'
The Aggregate Magic Algorithms
Obscure, low-level bit-twiddling tricks -- specifically:
Absolute Value of a Float, Alignment of Pointers, Average of Integers, Bit Reversal, Comparison of Float Values, Comparison to Mask Conversion, Divide Rounding, Dual-Linked List with One Pointer Field, GPU Any, GPU SyncBlocks, Gray Code Conversion, Integer Constant Multiply, Integer Minimum or Maximum, Integer Power, Integer Selection, Is Power of 2, Leading Zero Count, Least Significant 1 Bit, Log2 of an Integer, Next Largest Power of 2, Most Significant 1 Bit, Natural Data Type Precision Conversions, Polynomials, Population Count (Ones Count), Shift-and-Add Optimization, Sign Extension, Swap Values Without a Temporary, SIMD Within A Register (SWAR) Operations, Trailing Zero Count.
Many of these would be insane to use in anything other than the hottest of hot-spots, but good to have on file. (via Toby diPasquale)(tags: hot-spots optimisation bit-twiddling algorithms via:codeslinger snippets)
Shell Scripts Are Like Gremlins
Shell Scripts are like Gremlins. You start out with one adorably cute shell script. You commented it and it does one thing really well. It’s easy to read, everyone can use it. It’s awesome! Then you accidentally spill some water on it, or feed it late one night and omgwtf is happening!?
+1. I have to wean myself off the habit of automating with shell scripts where a clean, well-unit-tested piece of code would work better.(tags: shell-scripts scripting coding automation sysadmin devops chef deployment)
-
"there's a nostalgic cost: your old original 386 DX33 system from early 1991 won't be able to boot modern Linux kernels anymore. Sniff." Now *THAT* is backwards compatibility.
(tags: linux backwards-compatibility 386 history linus-torvalds)
-
'The results are startlingly good. This 3D printed skull [see pic] looks almost real. This is the print quality everyone will be able to access when Mcor’s deal with Staples enables 3D printing from copy centers.'
BBC News - The hum that helps to fight crime
'Dr Harrison said: "If we have we can extract [the hum of the mains AC power's 50Hz wave] and compare it with the database, if it is a continuous recording, it will all match up nicely. "If we've got some breaks in the recording, if it's been stopped and started, the profiles won't match or there will be a section missing. Or if it has come from two different recordings looking as if it is one, we'll have two different profiles within that one recording." In the UK, because one national grid supplies the country with electricity, the fluctuations in frequency are the same the country over. So it does not matter if the recording has been made in Aberdeen or Southampton, the comparison will work.'
Two Sides For Salvation « Code as Craft
Etsy's MySQL master-master pair configuration, and how it allows no-downtime schema changes
(tags: database etsy mysql replication schema availability downtime)
GMail partial outage - Dec 10 2012 incident report [PDF]
TL;DR: a bad load balancer change was deployed globally, causing the impact. 21 minute time to detection. Single-location rollout is now on the cards
-
lovely signed and editioned prints by Dublin's best illustrators at good prices. Turns out this was in connection with a show a few days ago, so the best ones are now sold out -- I love the Chris Judge Liberty Hall print -- but there's still a few good ones left. Brian Gallagher's Georgian doorway is a beauty.
(tags: illustration dublin prints art chris-judge)
-
via Come Here To Me -- 'The whole population of the county at the time was under 60,000. Ringsend, Merrion, Monkstown, Bullock and Dalkey on the Southside and Ballybough, Clontarf, Sutton and Hoath/Howth on the Northside are marked. Taken from the book Dublin: through space and time (2001).'
Massive tracts of land were reclaimed since then, clearly -- the North bay comes all the way in to Ballybough!
Back-up Tut and other decoy spatial antiquities
I like this idea -- a complete facsimile of King Tut's burial chamber. Bldgblog comments:
“On the 90th anniversary of the discovery of King Tut’s tomb, an “authorized facsimile of the burial chamber” has been created, complete “with sarcophagus, sarcophagus lid and the missing fragment from the south wall.” The resulting duplicate, created with the help of high-res cameras and lasers, is “an exact facsimile of the burial chamber,” one that is now “being sent to Cairo by The Ministry of Tourism of Egypt.” [...]
'Interestingly, we read that this was "done under a licence to the University of Basel," which implies the very real possibility that unlicensed duplicate rooms might also someday be produced—that is, pirate interiors ripped or printed from the original data set, like building-scale "physibles," a kind of infringed architecture of object torrents taking shape as inhabitable rooms.' [...]
'In their book Anachronic Renaissance, for instance, Alexander Nagel and Christopher Wood write of what they call a long "chain of effective substitutions" or "effective surrogates for lost originals" that nonetheless reached the value and status of an icon in medieval Europe. "[O]ne might know that [these objects] were fabricated in the present or in the recent past," Nagel and Wood write, "but at the same time value them and use them as if they were very old things." They call this seeing in "substitutional terms".'
(tags: via:new-aesthetic bldgblog archaeology facsimiles copying king-tut egypt history 3d-printing physibles)
-
"This project aims at creating a simple efficient building block for "Big Data" libraries, applications and frameworks; thing that can be used as an in-memory, bounded queue with opaque values (sequence of JDK primitive values): insertions at tail, removal from head, single entry peeks), and that has minimal garbage collection overhead. Insertions and removals are as individual entries, which are sub-sequences of the full buffer. GC overhead minimization is achieved by use of direct ByteBuffers (memory allocated outside of GC-prone heap); and bounded nature by only supporting storage of simple primitive value (byte, `long') sequences where size is explicitly known. Conceptually memory buffers are just simple circular buffers (ring buffers) that hold a sequence of primitive values, bit like arrays, but in a way that allows dynamic automatic resizings of the underlying storage. Library supports efficient reusing and sharing of underlying segments for sets of buffers, although for many use cases a single buffer suffices."
(tags: gc java jvm bytebuffer)
The "MIG-in-the-middle" attack
or, a very effective demonstration of a man-in-the-middle interception and replay attack, from a 1980s Namibia-Angola war, via Ross Anderson
Scoop! The inside story of the news website that saved the BBC
The Register's take on the early days of www.bbc.co.uk. Lots of politics, unsurprisingly.
Fifteen years ago this month the BBC launched its News Online website. Developed internally with a skeleton team, the web service rapidly became the face of the BBC on the internet, and its biggest success story – winning four successive BAFTA awards. Remarkably, it operated at a third of the cost of rival commercial online news operations – unheard of in public-sector IT projects. Devised before there were really any content management systems, the technical architecture became a template for all major news systems, and one that’s still in use today. The team endured some furious internal politicking and sabotage to survive.
Irish mobile phone companies: still spammy
'Pro tip: if you're going to spam, try not to spam the DPC's Director of Investigations.' -- lolz
(tags: funny oh-dear three hutchinson ireland mobile spam dpc law)
-
Wikipedia page.
The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1's in the string. In this binary case, it is also called the population count, popcount or sideways sum. It is the digit sum of the binary representation of a given number.
Contains an efficient algorithm to compute this for a given long value, by 'adding counts in a tree pattern.'(tags: algorithms hamming-distance bits hamming weight binary)
Efficient concurrent long set and map
An ordered set and map data structure and algorithm for long keys and values, supporting concurrent reads by multiple threads and updates by a single thread.
Some good stuff in the linked blog posts about Clojure's PersistentHashMap and PersistentVector data structures, too.(tags: arrays java tries data-structures persistent clojure concurrent set map)
James Hamilton - Failures at Scale & How to Ride Through Them - AWS re:Invent 2012 - Cpn208
mostly an update of his classic USENIX paper, but pretty cool to come across a mention of a network monitoring system we've built on page 21 ;)
(tags: amazon james-hamilton reliabilty slides aws)
_The Pauseless GC Algorithm_ [pdf]
Paper from USENIX VEE '05, by Cliff Click, Gil Tene, and Michael Wolf of Azul Systems, describing some details of the Azul secret sauce (via b6n)
Everything I Ever Learned About JVM Performance Tuning @Twitter
presentation by Attila Szegedi of Twitter from last year. Some good tips here, well-presented
The Rise And Fall Of The Obscure Music Download Blog: A Roundtable
One internet music "sharing" trend largely unnoticed by the powers that sue was the niche explosion of obscure music download blogs, lasting roughly from 2004-2008. Using free filesharing services like Rapidshare and Mediafire, and setting up sites on Blogspot and similar providers, these internet hubs stayed hidden in the open by catering to more discerning kleptomaniac audiophiles. Their specialty: parceling out ripped recordings — many of them copyrighted — from the more collectible and unknown corners of music's oddball, anomalous past. While the RIAA was suing dead people for downloading Michael Jackson songs (and Madonna was using Soulseek to curse at teenagers), obscure music blogs racked up millions of hits, ripping and sharing 80s Japanese noise, 70s German prog, 60s San Francisco hippie freak-outs, 50s John Cage bootlegs, 30s gramophone oddities, Norwegian death metal, cold wave cassettes made by kids in their garages, and the like. It was the mid aughts, and the advent of digitization had inadvertently put the value of the music industry's "Top Ten" commercial product in peril. That same process transformed the value of old, collectible music as well. If one smart record collector was able to share the entire contents—music, artwork and all—of one vinyl LP on his blog, for free, and upload another item from his 1,000+ collection the next day, for weeks and years, and others like him did the same, competing with each other about who could upload the rarest and most sought-after record, and anyone who downloaded it could then share it again and again… Suddenly everyone in the world had the coolest record collection in the world; and soon, nobody in the world had the coolest record collection in the world. Obscure music download blogs weren't shut down like Napster or Megaupload were (though they were indirectly affected by that crackdown); they just, mysteriously, seemed to burn out on their own sometime around 2008. While some are still around, their number represents only a fraction of that mid-00s heyday. Was this because obscure music blogs had overshared the underexposed and blown the whole thing into oblivion? Is the fact that a guy in Japan will no longer pay $500 on eBay for a first pressing of the No New York compilation because he can find it for free on the internet good for the world? Was the commodity-lost but the knowledge-gained an even exchange? To explore what was going on then, I assembled this email roundtable discussion between creators of some of the most popular blogs of the time: Eric Lumbleau of Mutant Sounds, Liam Elms of 8 Days in April, Frank of Systems of Romance and Brian Turner, Music Director of WFMU.
(via Loreana Rushe)(tags: music mp3 blogs obscure via-loreana-rushe history 2000s)
Conor’s 2012 Raspberry Pi Christmas Gift Guide
Ah, memories! Wish my kiddies were old enough for one of these...
I really think this Christmas could be a lovely replay of 1982 for a lot of people, like me, who got their first home computer that year. You could have so much fun on Christmas Day messing with the RPi rather than falling asleep in front of the fire. Just don’t fight over who gets the telly when Doctor Who is on. Whilst the bare-bones nature of the Raspberry Pi is wonderful, it is unusable out of the box unless you are a house with smartphones, digital cameras and existing PCs already that you can raid for components. What you want to avoid is a repeat of me that December in 1982 with my brand-new 16K ZX Spectrum which didn’t work on our Nordmende TV until two weeks later when the RTV Rentals guy came and replaced the TV Tuner. Two weeks typing Beep 1,2 to make sure it wasn’t broken.
(tags: raspberry-pi gifts computers kids hacking education gadgets christmas)
Nintendo's work on Miiverse Penis Drawing Detection
'The unique feature of the Miiverse is being able to send drawings, not just text. But since the advent of the internet, there have always been those who have used it for unsavory purposes.'
See also the "time-to-penis" metric in MMO games: http://www.joystiq.com/2009/03/24/overheard-gdc09-ttp-time-to-penis/
'Motoyama: we never had such a problem with our Hatena services. But, when we brought Hatena Flipnote to the West, we were caught off-guard by the amount of penises drawn by people.
Kurisu: So the team and I had to come up with a way to create a system that auto-detects those types of pictures. [...]
'Motoyama: After a week, we made very good progress on the system. Then we tested the system with Nintendo of America and told them to start drawing. It went horribly.
Kurisu: What we learned is that people enjoy drawing penises. Multiple ones. (laughs) The system was not prepared to handle that.'(tags: nintendo image-detection ttp metrics games gaming mmo miiverse drawing)
The trench talk that is now entrenched in the English language
'From cushy to crummy and blind spot to binge drink, a new study reveals the impact the First World War had on the English language and the words it introduced.' Incredible comments, too...
(tags: english etymology history wwi great-war via:sinead-gleeson words language)
Special encoding of small aggregate data types in Redis
Nice performance trick in Redis on hash storage: 'In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like an hash table. But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much (you can configure the limit in redis.conf). This does not work well just from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than an hash table).'
(tags: memory redis performance big-o hash-tables storage coding cache arrays)
HTTP Error 403: The service you requested is restricted - Vodafone Community
Looks like Vodafone Ireland are failing to scale their censorware; clients on their network reporting "HTTP Error 403: The service you requested is restricted". According to a third-party site, this error is produced by the censorship software they use when it's insufficiently scaled for demand:
"When you try to use HTTP Vodafone route a request to their authentication server to see if your account is allow to connect to the site. By default they block a list of adult/premium web sites (this is service you have switched on or off with your account). The problem is at busy times this validation service is overloaded and so their systems get no response as to whether the site is allowed, so assume the site you asked for is restricted and gives the 403 error. Once this happens you seem to have to make new 3G data connection (reset the phone, move cell or let the connection time out) to get it to try again."
Sample: http://pic.twitter.com/N1lAwBjW(tags: scaling ireland vodafone fail censorware scalability customer-service)
Does it run Minecraft? Well, since you ask…
Going by the number of Minecraft fans among my friends' sons and daughters in the 8-12 age group, this is a great idea:
We sent a bunch of [Raspberry Pi] boards out to Notch and the guys at Mojang in Stockholm a little while back, and they’ve produced a port of Minecraft: Pocket Edition which they’re calling Minecraft: Pi Edition. It’ll carry a revised feature set and support for several programming languages, so you can code direct into Minecraft before you start playing. (Or you can just – you know – play.)
(tags: minecraft gaming programming coding raspberry-pi kids learning education)
-
Martin Thompson with a good description of the x86 memory barrier model and how it interacts with Java's JSR-133 memory model
(tags: architecture hardware programming java concurrency volatile jsr-133)
IBM insider: How I caught my wife while bug-hunting on OS/2 • The Register
Wow, working for IBM in the 80's was truly shitty.
'IBM HR came up with a plan that summed up the department's view of tech staff: a dinner dance. In Southsea. For our non-British readers this is not a glamorous location. As a scumbag contractor I wasn’t invited, but since I was dating one of the seven women on the project, I went anyway and was impressed by the way IBM had tried so very hard to make the inside of a municipal leisure centre look like Hawaii. This is so crap that the integrity checks I’ve installed to watch myself for incipient senility keep flagging it as a false memory. The only way I can force myself to believe the idea that the richest corporation on the planet behaved that way is that the girl who took me is now a reassuringly expensive lawyer who was kind enough to marry me and so we have photographic evidence. (I wish to make it clear that I’m not saying IBM had the worst HR of any firm in the world, merely that my 28 years in technology and banking have never exposed a worse one to me.)'
And indeed, so were MS:'We, on the other hand, were regarded as hopelessly bureaucratic. After Microsoft lost the source code for the actual build of OS/2 we shipped, I reported a bug triggered when you double-clicked on Chkdsk twice: the program would fire up twice and both would try to fix the disk at the same time, causing corruption. I noted that this “may not be consistent with the user's goals as he sees them at this time”. This was labelled a user error, and some guy called Ballmer questioned why I had this “obsession” with perfect code.'
(thanks, Conor!)(tags: via:conor-delaney os2 ibm microsoft work 1980s pc uk steve-ballmer)
How Team Obama’s tech efficiency left Romney IT in dust | Ars Technica
The web-app dev and ops best practices used by the Obama campaign's tech team. Some key tools: Puppet, EC2, Asgard, Cacti, Opsview, StatsD, Graphite, Seyren, Route53, Loggly, etc.
Tumblr Architecture - 15 Billion Page Views A Month And Harder To Scale Than Twitter
Buckets of details on Tumblr's innards. fans of Finagle and Kafka, notably
John Carmack's .plan update from 10/14/98
John Carmack presciently defines the benefits of an event sourcing architecture in 1998, as a key part of Quake 3's design:
"The key point: Journaling of time along with other inputs turns a realtime application into a batch process, with all the attendant benefits for quality control and debugging. These problems, and many more, just go away. With a full input trace, you can accurately restart the session and play back to any point (conditional breakpoint on a frame number), or let a session play back at an arbitrarily degraded speed, but cover exactly the same code paths."
(This was the first time I'd heard of the concept, at least.)(tags: john-carmack design software coding event-sourcing events quake-3)
-
Unlike other tools intended to solve the JVM startup problem (e.g. Nailgun, Cake), Drip does not use a persistent JVM. There are many pitfalls to using a persistent JVM, which we discovered while working on the Cake build tool for Clojure. The main problem is that the state of the persistent JVM gets dirty over time, producing strange errors and requiring liberal use of cake kill whenever any error is encountered, just in case dirty state is the cause. Instead of going down this road, Drip uses a different strategy. It keeps a fresh JVM spun up in reserve with the correct classpath and other JVM options so you can quickly connect and use it when needed, then throw it away. Drip hashes the JVM options and stores information about how to connect to the JVM in a directory with the hash value as its name.
(via HN)(tags: java command-line tools startup speed)
Australian VCE exam question accidentally includes photoshopped Battletech mech
File under New Aesthetic:
Exams for the popular History: Revolution subject were original supposed to include the artwork Storming the Winter palace on 25th October 1917 by Nikolai Kochergin, which depicts events during the October Revolution, which was instrumental in the larger Russian Revolution of 1917. When students opened their exam this morning they found an altered version of the work with what appear to be a large "BattleTech Marauder" robot aiding the rising revolutionaries in the background.
(tags: new-aesthetic funny photoshop russia 1917 battletech mechs vcaa)
Building an Impenetrable ZooKeeper (PDF)
great presentation on operational tips for a reliable ZK cluster (via Bill deHora)
(tags: via:bill-dehora zookeeper ops syadmin)
WebTechStacks by martharotter - Kippt
A good set of infrastructure/devops tech blogs, collected by Martha Rotter
(tags: via:martharotter blogs infrastructure devops ops web links)
What can data scientists learn from DevOps?
Interesting. 'Rather than continuing to pretend analysis is a one-time, ad hoc action, automate it. [...] you need to maintain the automation machinery, but a cost-benefit analysis will show that the effort rapidly pays off — particularly for complex actions such as analysis that are nontrivial to get right.' (via @fintanr)
(tags: via:fintanr data-science data automation devops analytics analysis)
AnandTech - The Intel SSD DC S3700: Intel's 3rd Generation Controller Analyzed
Interesting trend; Intel moved from a btree to an array-based data structure for their logical-block address indirection map, in order to reduce worst-case latencies (via Martin Thompson)
(tags: latency intel via:martin-thompson optimization speed p99 data-structures arrays btrees ssd hardware)
Java tip: How to get CPU, system, and user time for benchmarking
a neat MXBean trick to get per-thread CPU usage in a running JVM (via Tatu Saloranta)
-
'I'd really prefer not to fork the language; I'd much rather collectively help carry the banner of Markdown forward into the future, with the blessing of John Gruber and in collaboration with other popular sites that use Markdown. So... who's with me?'
SipHash: a fast short-input PRF
a family of pseudorandom functions optimized for short inputs. Target applications include network traffic authentication and hash-table lookups protected against hash-flooding denials-of-service attacks. SipHash is simpler than MACs based on universal hashing, and faster on short inputs. Compared to dedicated designs for hash-table lookup, SipHash has well-defined security goals and competitive performance. For example, SipHash processes a 16-byte input with a fresh key in 140 cycles on an AMD FX-8150 processor, which is much faster than state-of-the-art MACs.
(tags: hashing siphash djb security algorithms)
#AltDevBlogADay » Functional Programming in C++
John Carmack makes a case for writing C++ in an FP style, with wide use of const and pure functions. something similar can be achieved in pure Java using Guava's Immutable types, to a certain extent. I love his other posts on this site -- he argues persuasively for static code analysis and keeping multiple alternative subsystem implementations, too
(tags: c++ programming functional-programming fp coding john-carmack const immutability)
Fuchsia MacAree — A-Z of Untranslatable Words
Lovely poster by fantastic Irish illustrator Fuchsia MacAree, who's launching her first exhibition of art and drawings at the Bernard Shaw tonight. See also "Learn To Swear With Captain Haddock": http://fuchsiamacaree.bigcartel.com/product/captain-haddock-print
(tags: want art prints fuchsia-macaree words etymology home)
-
Encyclopedic post from John Allspaw (of Etsy) on the topic, with an "Obligatory [List Of] Pithy Characteristics"
How to make a security geek feel very old: #Factorisation, #DKIM and @DrZacharyHarris
“A 384-bit key I can factor on my laptop in 24 hours. The 512-bit keys I can factor in about 72 hours using Amazon Web Services for $75. And I did do a number of those. Then there are the 768-bit keys. Those are not factorable by a normal person like me with my resources alone. But the government of Iran probably could, or a large group with sufficient computing resources could pull it off.” Remember when we thought 512-bit keys would be enough? how time flies! Of course, John Aycock raised this problem back in 2007, although he assumed it'd take a 100,000-host botnet to crack them (in 153 minutes).
(tags: factorisation moores-law cpu speed dkim domain-keys 512-bit cracking security via:alec-muffet)
Data distribution in the cloud with Node.js
Very interesting presentation from ex-IONAian Darach Ennis of Push Technology on eep.js, embedded event processing in Javascript for node.js stream processing. Handles tumbling, monotonic, periodic and sliding windows at 8-40 million events per second; no multi-dimensional, infinite or predicate event-processing windows. (via Sergio Bossa)
(tags: via:sbtourist events event-processing streaming data ex-iona darach-ennis push-technology cep javascript node.js streams)
Raspberry Pi gets open-source video drivers
'As of right now, all of the VideoCore driver code which runs on the ARM is available under a FOSS license (3-Clause BSD to be precise). If you’re not familiar with the status of open source drivers on ARM SoCs this announcement may not seem like such a big deal, but it does actually mean that the BCM2835 used in the Raspberry Pi is the first ARM-based multimedia SoC with fully-functional, vendor-provided (as opposed to partial, reverse engineered) fully open-source drivers, and that Broadcom is the first vendor to open their mobile GPU drivers up in this way.' This is a great result -- congrats to the Raspberry Pi team for getting this to happen.
(tags: raspberry-pi open-source hardware drivers gpu graphics embedded-linux linux broadcom bsd bcm2835)
experimental CPU-cache-aware hash table implementations in Cloudera's Impala
via Todd Lipcon -- https://twitter.com/tlipcon/status/261113382642532352 'another cool piece of cloudera impala source: cpu-cache-aware hash table implementations by @jackowayed'. 'L1-sized hash table that hopes to use cache well. Each bucket is a chunk list of tuples. Each chunk is a cache line.'
(tags: hashing hash-tables data-structures performance c++ l1 cache cpu)
-
"The cost per element in major data structures offered by Java and Guava (r11)]." A very useful reference!
Ever wondered what's the cost of adding each entry to a HashMap? Or one new element in a TreeSet? Here are the answers: the cost per-entry for each well-known structure in Java and Guava. You can use this to estimate the cost of a structure, like this: if the per-entry cost of a structure is 32 bytes, and your structure contains 1024 elements, the structure's footprint will be around 32 kilobytes. Note that non-tree mutable structures are amortized (adding an element might trigger a resize, and be expensive, otherwise it would be cheap), making the measurement of the "average per element cost" measurement hard, but you can expect that the real answers are close to what is reported below.
(tags: java coding guava reference memory cost performance data-structures)
-
'A continuous version of Conway's Life, using floating point values instead of integers'. 'SmoothLifeL supports many interesting phenomena such as gliders that can travel in any direction, rotating pairs of gliders, 'wickstretchers' and the appearance of elastic tension in the 'cords' that join the blobs.' paper: http://arxiv.org/abs/1111.1567 , and slides: http://www.youtube.com/watch?v=iyTIXRhjXII (via jwz)
(tags: life games emergent-behaviour algorithms graphics via:jwz cool eye-candy conways-life floating-point continuous gliders)