Casalattico - Wikipedia, the free encyclopedia
How weird. Many of the well-known chippers in Ireland are run by families from the same comune in Italy.
In the late 19th and early 20th century a significant number of young people left Casalattico to work in Ireland, with many founding chip shops there. Most second, third and fourth generation Irish-Italians can trace their lineage back to the municipality, with names such as Magliocco, Fusco, Marconi, Borza, Macari, Rosato and Forte being the most common. Although the Forte family actually originates from the village of Mortale, renamed Mon Forte due to the achievements of the Forte family. It is believed that up to 8,000 Irish-Italians have ancestors from Casalattico. The village is home to an Irish festival every summer to celebrate the many families that moved from there to Ireland.
(via JK)(tags: rome lazio italy ireland chip-shops chippers history emigration casalattico work irish-italians via:jk)
Videos from the Continuous Delivery track at QCon SF 2012
Think we'll be watching some of these in work soon -- Jez Humble's talk (the last one) in particular looks good:
Amazon, Etsy, Google and Facebook are all primarily software development shops which command enormous amounts of resources. They are, to use Christopher Little’s metaphor, unicorns. How can the rest of us adopt continuous delivery? That’s the subject of my talk, which describes four case studies of organizations that adopted continuous delivery, with varying degrees of success. One of my favourites – partly because it’s embedded software, not a website – is the story of HP’s LaserJet Firmware team, who re-architected their software around the principles of continuous delivery. People always want to know the business case for continuous delivery: the FutureSmart team provide one in the book they wrote that discusses how they did it.
(tags: continuous-integration continuous-delivery build release process dev deployment videos qcon towatch hp)
_Dynamic Histograms: Capturing Evolving Data Sets_ [pdf]
Currently, histograms are static structures: they are created from scratch periodically and their creation is based on looking at the entire data distribution as it exists each time. This creates problems, however, as data stored in DBMSs usually varies with time. If new data arrives at a high rate and old data is likewise deleted, a histogram’s accuracy may deteriorate fast as the histogram becomes older, and the optimizer’s effectiveness may be lost. Hence, how often a histogram is reconstructed becomes very critical, but choosing the right period is a hard problem, as the following trade-off exists: If the period is too long, histograms may become outdated. If the period is too short, updates of the histogram may incur a high overhead. In this paper, we propose what we believe is the most elegant solution to the problem, i.e., maintaining dynamic histograms within given limits of memory space. Dynamic histograms are continuously updateable, closely tracking changes to the actual data. We consider two of the best static histograms proposed in the literature [9], namely V-Optimal and Compressed, and modify them. The new histograms are naturally called Dynamic V-Optimal (DVO) and Dynamic Compressed (DC). In addition, we modified V-Optimal’s partition constraint to create the Static Average-Deviation Optimal (SADO) and Dynamic Average-Deviation Optimal (DADO) histograms.
(via d2fn)(tags: via:d2fn histograms streaming big-data data dvo dc sado dado dynamic-histograms papers toread)
How I decoded the human genome - Salon.com
classic long-read article from John Sundman: 'We are becoming the masters of our own DNA. But does that give us the right to decide that my children should never have been born?' part two at http://www.salon.com/2003/10/22/genome_two/
(tags: human genome genomics eugenics politics life john-sundman disability health dna medicine salon long-reads children)
The “Meme Hustler” hustler: Evgeny Morozov’s Stupid Talk about Tim O’Reilly
great long-read blog post from John Sundman debunking Evgeny Morozov's takedown of Tim O'Reilly
(tags: debunking john-sundman evgeny-morozov tim-oreilly tech technological-solutionism futurism writing silicon-valley utopianism open-source oss)
Strange Passion Presents Chant Chant Chant, Choice & SM Corporation live
'We are delighted to announce, for one night only, 3 legendary Irish Post Punk bands performing live in Dublin after a 30 year hiatus. This follows on from the critically acclaimed release of the Strange Passion Irish Post Punk compilation in 2012. Post punk legends Chant Chant Chant will perform along with electronic music pioneers Choice and SM Corporation. '
(tags: choice music ireland post-punk electronic dublin strange-passion gigs)
'Mythbusting Modern Hardware to gain "Mechanical Sympathy"' [slides]
Martin Thompson's latest talk -- taking a few common concepts about modern hardware performance and debunking/confirming them, mythbusters-style
(tags: mythbusters hardware mechanical-sympathy martin-thompson java performance cpu disks ssd)
High home ownership can seriously damage labor market, new study suggests
Interesting -- a healthy rental market is needed to allow sufficient labour mobility. This matches what I heard and saw from friends and coworkers in the US, anecdotally
Concert Industry Struggles With ‘Bots’ That Siphon Off Tickets - NYTimes.com
Bots now buying more than 60% of tickets, one group requesting up to 200,000 per day; bot writers now charging $14 per 10k captchas (via Shane Naughton)
(tags: ticketmaster scalping tickets via:shane-naughton bots captchas abuse)
Instant artist statement: Arty Bollocks Generator
'My work explores the relationship between the body and vegetarian ethics. With influences as diverse as Munch and Francis Bacon, new synergies are created from both orderly and random narratives. Ever since I was a postgraduate I have been fascinated by the essential unreality of the moment. What starts out as undefined soon becomes corroded into a hegemony of greed, leaving only a sense of failing and the chance of a new order. As temporal replicas become transformed through diligent and undefined practice, the viewer is left with an impression of the darkness of our culture.'
(tags: funny humor art arty bollocks generator hacks via:leroideplywood)
Communication costs in real-world networks
Peter Bailis has generated some good real-world data about network performance and latency, measured using EC2 instances: between EC2 regions, between zones, and between hosts in a single AZ. Very handy (particularly as I was looking for this data in a public source not too long ago).
I wasn’t aware of any datasets describing network behavior both within and across datacenters, so we launched m1.small Amazon EC2 instances in each of the eight geo-distributed “Regions,” across the three us-east “Availability Zones” (three co-located datacenters in Virginia), and within one datacenter (us-east-b). We measured RTTs between hosts for a week at a granularity of one ping per second.
Some of the high-percentile measurements are undoubtedly the impact of host and VM behaviour, but that is still good data for a typical service built in EC2.
(tags: networks performance measurements benchmarks ops ec2 networking internet az latency)
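For anyone wanting to summarise their own ping logs the same way, here is a minimal sketch of a nearest-rank percentile calculation over RTT samples. The sample values and the `percentile` helper are illustrative assumptions, not taken from Peter Bailis's data or tooling.

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample such that at
    least p percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]

# Hypothetical RTTs in milliseconds, with one slow outlier of the kind
# that host or VM hiccups tend to produce.
rtts_ms = [0.4, 0.5, 0.5, 0.6, 0.5, 0.4, 0.7, 0.5, 12.0, 0.6]
summary = {p: percentile(rtts_ms, p) for p in (50, 90, 99)}
```

With these made-up samples the median stays low while the 99th percentile is dominated by the single outlier, which is the effect noted above for the high-percentile measurements.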
Reducing MongoDB traffic by 78% with Redis | Crashlytics Blog
One for @roflscaletips. Crashlytics reduce MongoDB load by hacking some hand-coded caching into their Rails app, instead of just using a front-line HTTP cache to reduce Rails *and* db load. duh. (via Oisin)
(tags: crashlytics fail roflscale rails caching redis ruby via:oisin)
Display Hidden Files in OS X Open and Save Dialog Boxes
yet another laughable UI kludge in OS X. ridiculous
(tags: usability osx apple ui kludges hidden-files dot-files command-shift-option-elbow magic)
-
"Dear Mr Tilman, this is the only way I can help you. saluti, Giorgio Moroder". I love it -- someone call Tufte
(tags: graphics giorgio-moroder history music ilx basslines donna-summer synths)
Hollywood Studios [attempt to censor] Pirate Bay Documentary
Probably not deliberate, but pretty damn inept.
Over the past weeks several movie studios have been trying to suppress the availability of TPB-AFK [the Pirate Bay documentary] by asking Google to remove links to the documentary from its search engine. The links are carefully hidden in standard DMCA takedown notices for popular movies and TV-shows. The silent attacks come from multiple Hollywood sources including Viacom, Paramount, Fox and Lionsgate and are being sent out by multiple anti-piracy outfits. Fox, with help from six-strikes monitoring company Dtecnet, asked Google to remove a link to TPB-AFK on Mechodownload. Paramount did the same with a link on the Warez.ag forums. Viacom sent at least two takedown requests targeting links to the Pirate Bay documentary on Mrworldpremiere and Rapidmoviez. Finally, Lionsgate jumped in by asking Google to remove a copy of TPB-AFK from a popular Pirate Bay proxy.
(tags: funny inept hollywood lionsgate fox viacom paramount dtecnet tpb-afk piratebay piracy copyright movies google)
Flashback: How Yahoo Killed Flickr and Lost the Internet
This is about the best tech journalism I've ever read on Flickr. nice one Mat Honan
(tags: gizmodo flickr acquisition mergers yahoo corporate-culture mat-honan tech journalism)
Resisting the lure of the Freeman movement | Workers Solidarity Movement
An anarchist critique of the Freeman movement from the WSM:
This has been a very brief overview of the Freeman movement that has tried to capture with broad strokes its nature and possible responses. There is room for much more work, including a more in-depth analysis of the various flaws in the approach to the law. The greatest danger however is allowing a movement to develop within anarchist circles that ignores the principle of mutual aid and implicitly promotes private ownership of resources, that by granting absolute right to individuals gives them the ability to ignore their responsibilities to the wider community and ecology that sustains them. In more traditional terms, the movement is one all about negative freedoms, ignoring positive freedom as a concept.
(tags: anarchism freeman-on-the-land politics ireland law wsm)
The Reactionary ‘Freeman-on-the-land’ and a Political Fracture
Another leftie view on the Freeman movement
(tags: freeman-on-the-land politics ireland left-wing anarchism law)
-
Well, apparently tomorrow, but close enough. Happy birthday to bradfitz' greatest creation and its wonderful slab allocator!
(tags: birthdays code via:alex-popescu open-source history malloc memory caching memcached)
Newegg nukes “corporate troll” Alcatel in third patent appeal win this year
I am loving this. Particularly this:
At trial in East Texas Cheng took the stand to tell Newegg's story. Alcatel-Lucent's corporate representative, at the heart of its massive licensing campaign, couldn't even name the technology or the patents it was suing Newegg over. "Successful defendants have their litigation managed by people who care," said Cheng. "For me, it's easy. I believe in Newegg, I care about Newegg. Alcatel Lucent, meanwhile, they drag out some random VP—who happens to be a decorated Navy veteran, who happens to be handsome and has a beautiful wife and kids—but the guy didn't know what patents were being asserted. What a joke." "Shareholders of public companies that engage in patent trolling should ask themselves if they're really well-served by their management teams," Cheng added. "Are they properly monetizing their R&D? Surely there are better ways to make money than to just rely on litigating patents. If I was a shareholder, I would take a hard look as to whether their management was competent."
(tags: patents ip swpats alcatel bell-labs newegg east-texas litigation lucent)
Call me maybe: Carly Rae Jepsen and the perils of network partitions
Kyle "aphyr" Kingsbury expands on his slides demonstrating the real-world failure scenarios that arise during some kinds of partitions (specifically, the TCP-hang, no clear routing failure, network partition scenario). Great set of blog posts clarifying CAP
(tags: distributed network databases cap nosql redis mongodb postgresql riak crdt aphyr)
-
Welcome to the Galapagos of Chinese “open” source. I call it “gongkai” (公开). Gongkai is the transliteration of “open” as applied to “open source”. I feel it deserves a term of its own, as the phenomenon has grown beyond the so-called “shanzhai” (山寨) and is becoming a self-sustaining innovation ecosystem of its own. Just as the Galapagos Islands is a unique biological ecosystem evolved in the absence of continental species, gongkai is a unique innovation ecosystem evolved with little western influence, thanks to political, language, and cultural isolation. Of course, just as the Galapagos was seeded by hardy species that found their way to the islands, gongkai was also seeded by hardy ideas that came from the west. These ideas fell on the fertile minds of the Pearl River delta, took root, and are evolving. Significantly, gongkai isn’t a totally lawless free-for-all. It’s a network of ideas, spread peer-to-peer, with certain rules to enforce sharing and to prevent leeching. It’s very different from Western IP concepts, but I’m trying to have an open mind about it.
(tags: gongkai bunnie-huang china phone mobile hardware devices open-source)
Stability Patterns and Antipatterns [slides]
Michael "Release It!" Nygard's slides from a recent O'Reilly event, discussing large-scale service reliability design patterns
(tags: michael-nygard design-patterns architecture systems networking reliability soa slides pdf)
Deep In The Game: Not The RTE Guide
Good interview with Alan Maguire, the satirist behind the very funny @NotTheRTEGuide on Twitter:
I’ve always been a huge fan of TV Go Home and Charlie Brooker in general and it seemed like Irish TV and culture was a good target for the kind of barbed surrealism that he does. (I’m not claiming I’m in his league or anything but he’s the main influence). I was really surprised that there hadn’t been a parody RTÉ Guide already. TV listings are 140-ish characters already and the RTÉ Guide has a kind of weird place in Irish culture where everybody knows it but nobody our age really has any idea of what’s in it anymore. We associate it with a small-c conservatism, or I did at least and I play that up occasionally with the account.
(tags: nottherteguide rte rte-guide ireland funny satire interviews)
-
'based on my observations while I was a Site Reliability Engineer at Google.' - by Rob Ewaschuk; very good, and matching the similar recommendations and best practices at Amazon for that matter
(tags: monitoring ops devops alerting alerts pager-duty via:jk)
Monitoring the Status of Your EBS Volumes
Page in the AWS docs which describes their derived metrics and how they are computed -- these are visible in the AWS Management Console, and alarmable, but not viewable in the Cloudwatch UI. grr. (page-joshea!)
(tags: ebs aws monitoring metrics ops documentation cloudwatch)
Interpol filter scope creep: ASIC ordering unilateral website blocks
Bloody hell. This is stupidity of the highest order, and a canonical example of "filter creep" by a government -- secret state censorship of 1200 websites due to a single investment scam site.
The Federal Government has confirmed its financial regulator has started requiring Australian Internet service providers to block websites suspected of providing fraudulent financial opportunities, in a move which appears to also open the door for other government agencies to unilaterally block sites they deem questionable in their own portfolios. The instrument through which the ISPs are blocking the Interpol list of sites is Section 313 of the Telecommunications Act. Under the Act, the Australian Federal Police is allowed to issue notices to telcos asking for reasonable assistance in upholding the law. [...] Tonight Senator Conroy’s office revealed that the incident that resulted in Melbourne Free University and more than a thousand other sites being blocked originated from a different source — financial regulator the Australian Securities and Investment Commission. On 22 March this year, ASIC issued a media release warning consumers about the activities of a cold-calling investment scam using the name ‘Global Capital Wealth’, which ASIC said was operating several fraudulent websites — www.globalcapitalwealth.com and www.globalcapitalaustralia.com. In its release on that date, ASIC stated: “ASIC has already blocked access to these websites.”
(tags: scams australia filtering filter-creep false-positives isps asic fraud secrecy)
Obfuscatory pie-chart from Garda penalty-points corruption report
"Twitter / gavinsblog: For sake of clarity here is helpful pie chart of the 95.4% of fixed charge notices not terminated #missingthepoint" Paging Edward Tufte: classic example of an obfuscatory pie-chart, diagramming the wrong thing misleadingly. By presenting it like this, it appears that the 95.4% of cases where fixed charge notices were issued by the guards are relevant to the discussion of the other classes; in reality, that means that 4.6% of cases, 37,000 cases, were terminated, some for good reasons, others not, and it's the difference between those two classes that is relevant. In my opinion, 2 separate pie charts would be better; one to show the dismissed-versus-undismissed count (which IMO could have been omitted entirely), and one to show the good-vs-not-so-good termination reason counts (which is the meat of the issue).
(tags: dataviz visualisation data obfuscation gardai police corruption penalty-points)
Berkeley DB Java Edition Architecture [PDF]
background white paper on the BDB-JE innards and design, from 2006. Still pretty accurate and good info
(tags: bdb-je java berkeley-db bdb design databases pdf white-papers trees)
-
This Court has developed a new awareness and understanding of a category of vexatious litigant. As we shall see, while there is often a lack of homogeneity, and some individuals or groups have no name or special identity, they (by their own admission or by descriptions given by others) often fall into the following descriptions: Detaxers; Freemen or Freemen-on-the-Land; Sovereign Men or Sovereign Citizens; Church of the Ecumenical Redemption International (CERI); Moorish Law; and other labels - there is no closed list. In the absence of a better moniker, I have collectively labelled them as Organized Pseudolegal Commercial Argument litigants [“OPCA litigants”], to functionally define them collectively for what they literally are. These persons employ a collection of techniques and arguments promoted and sold by ‘gurus’ (as hereafter defined) to disrupt court operations and to attempt to frustrate the legal rights of governments, corporations, and individuals. Over a decade of reported cases have proven that the individual concepts advanced by OPCA litigants are invalid. What remains is to categorize these schemes and concepts, identify global defects to simplify future response to variations of identified and invalid OPCA themes, and develop court procedures and sanctions for persons who adopt and advance these vexatious litigation strategies. One participant in this matter [...] appears to be a sophisticated and educated person, but is also an OPCA litigant. One of the purposes of these Reasons is, through this litigant, to uncover, expose, collate, and publish the tactics employed by the OPCA community, as a part of a process to eradicate the growing abuse that these litigants direct towards the justice and legal system we otherwise enjoy in Alberta and across Canada. I will respond on a point-by-point basis to the broad spectrum of OPCA schemes, concepts, and arguments advanced in this action by [him].
Via Ronan Lupton
(tags: via:ronanlupton law canada legal freeman opca court tax judgements)
-
This classic came up in discussions yesterday...
In the Linux Kernel community Rusty Russell came up with an API rating scheme to help us determine if our API is sensible, or not. It's a rating from -10 to 10, where 10 is perfect and -10 is hell. Unfortunately there are too many examples at the wrong end of the scale.
(tags: rusty-russell quality coding kernel linux apis design code-reviews code)
-
hooray! Command-line gmailish goodness returns. And with a signed gem, to boot
Martin Thompson, Luke "Snabb Switch" Gorrie etc. review the C10M presentation from Shmoocon
on the mechanical-sympathy mailing list. Some really interesting discussion on handling insane quantities of TCP connections using low volumes of hardware:
This talk has some good points and I think the subject is really interesting. I would take the suggested approach with serious caution. For starters the Linux kernel is nowhere near as bad as it made out. Last year I worked with a client and we scaled a single server to 1 million concurrent connections with async programming in Java and some sensible kernel tuning. I've heard they have since taken this to over 5 million concurrent connections. BTW Open Onload is an open source implementation. Writing a network stack is a serious undertaking. In a previous life I wrote a network probe and had to reassemble TCP streams and kept getting tripped up by edge cases. It is a great exercise in data structures and lock-free programming. If you need very high-end performance I'd talk to the Solarflare or Mellanox guys before writing my own. There are some errors and omissions in this talk. For example, his range of ephemeral ports is not quite right, and atomic operations are only 15 cycles on Sandy Bridge when hitting local cache. A big issue for me is when he defined C10M he did not mention the TIME_WAIT issue with closing connections. Creating and destroying 1 million connections per second is a major issue. A protocol like HTTP is very broken in that the server closes the socket and therefore has to retain the TCB until the specified timeout occurs to ensure no older packet is delivered to a new socket connection.
(tags: mechanical-sympathy hardware scaling c10m tcp http scalability snabb-switch martin-thompson)
-
This program creates an EBS snapshot for an Amazon EC2 EBS volume. To help ensure consistent data in the snapshot, it tries to flush and freeze the filesystem(s) first as well as flushing and locking the database, if applicable. Filesystems can be frozen during the snapshot. Prior to Linux kernel 2.6.29, XFS must be used for freezing support. While frozen, a filesystem will be consistent on disk and all writes will block. There are a number of timeouts to reduce the risk of interfering with the normal database operation while improving the chances of getting a consistent snapshot. If you have multiple EBS volumes in a RAID configuration, you can specify all of the volume ids on the command line and it will create snapshots for each while the filesystem and database are locked. Note that it is your responsibility to keep track of the resulting snapshot ids and to figure out how to put these back together when you need to restore the RAID setup.
Handy!
(tags: ubuntu ec2 aws linux ebs snapshots ops tools alestic)
Measuring & Optimizing I/O Performance
Another good writeup on iostat and EBS, from Ilya Grigorik
(tags: io optimization sysadmin performance iostat ebs aws ops)
AWS forum post on interpreting iostat output for EBS
Great post from AndrewC@EBS on interpreting iostat output on EBS volumes -- from 2009, but still looks reasonable enough
Operations is Dead, but Please Don’t Replace it with DevOps
This is so damn spot on.
Functional silos (and a standalone DevOps team is a great example of one) decouple actions from responsibility. Functional silos allow people to ignore, or at least feel disconnected from, the consequences of their actions. DevOps is a cultural change that encourages, rewards and exposes people taking responsibility for what they do, and what is expected from them. As Werner Vogels from Amazon Web Services says, “you build it, you run it”. So a “DevOps team” is a risky and ultimately doomed strategy. Sure there are some technical roles, specifically related to the enablement of DevOps as an approach and these roles and tools need to be filled and built. Self service platforms, collaboration and communication systems, tool chains for testing, deployment and operations are all necessary. Sure someone needs to deliver on that stuff. But those are specific technical deliverables and not DevOps. DevOps is about people, communication and collaboration. Organizations ignore that at their peril.
(tags: devops teams work ops silos collaboration organisations)
Universal Music Group adding audible "watermarks"
including on paid-for, losslessly-compressed digital audio music files:
Why isn't UMG's watermark talked about more? Maybe people think the audio quality problems are due to some kind of lossy compression, as I did, and ignore it completely, or blame the streaming service/distributor. The problem here is that the UMG watermark degrades the audio to about the equivalent of a 96 kbit MP3. My guess is that if consumers were informed about what is going on, they would care. Especially those who pay full retail price for digital downloads advertised as lossless audio.
(tags: lame audio drm media music umg universal watermarks noise consumer mp3)
“Call Me Maybe: Carly Rae Jepsen and the Perils of Network Partitions”
Aphyr's epic RICON talk, exploring distributed-database failure modes through music. and what a lot of fail there is! Bottom line: CRDTs win
(tags: crdts data-structures storage ricon aphyr failures network partitions puns slides)
Cloudera Impala 1.0: It’s Here, It’s Real, It’s Already the Standard for SQL on Hadoop
we are proud to announce the first production drop of Impala, which reflects feedback from across the user community based on multiple types of real-world workloads. Just as a refresher, the main design principle behind Impala is complete integration with the Hadoop platform (jointly utilizing a single pool of storage, metadata model, security framework, and set of system resources). This integration allows Impala users to take advantage of the time-tested cost, flexibility, and scale advantages of Hadoop for interactive SQL queries, and makes SQL a first-class Hadoop citizen alongside MapReduce and other frameworks. The net result is that all your data becomes available for interactive analysis simultaneously with all other types of processing, with no ETL delays needed.
Along with some great benchmark numbers against Hive. nifty stuff
(tags: cloudera impala sql querying etl olap hadoop analytics business-intelligence reports)
Alex Feinberg's response to Damien Katz' anti-Dynamoish/pro-Couchbase blog post
Insightful response, worth bookmarking. (the original post is at http://damienkatz.net/2013/05/dynamo_sure_works_hard.html ).
while you are saving on read traffic (online reads only go to the master), you are now decreasing availability (contrary to your stated goal), and increasing system complexity. You also do hurt performance by requiring all writes and reads to be serialized through a single node: unless you plan to have a leader election whenever the node fails to meet a read SLA (which is going to result in a disaster -- I am speaking from personal experience), you will have to accept that you're bottlenecked by a single node. With a Dynamo-style quorum (for either reads or writes), a single straggler will not reduce whole-cluster latency. The core point of Dynamo is low latency, availability and handling of all kinds of partitions: whether clean partitions (long term single node failures), transient failures (garbage collection pauses, slow disks, network blips, etc...), or even more complex dependent failures. The reality, of course, is that availability is neither the sole, nor the principal concern of every system. It's perfectly fine to trade off availability for other goals -- you just need to be aware of that trade off.
(tags: cap distributed-databases databases quorum availability scalability damien-katz alex-feinberg partitions network dynamo riak voldemort couchbase)
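The straggler point in Alex Feinberg's response can be illustrated with a few lines of arithmetic. This is a rough sketch under simplifying assumptions: the function names and the latency figures are hypothetical, and it models only "wait for the fastest R (or W) replies out of N replicas", not any real Dynamo, Riak, or Voldemort client.

```python
def quorums_overlap(n, r, w):
    """True when every read quorum of size r must share at least one
    replica with every write quorum of size w, out of n replicas."""
    return r + w > n

def quorum_latency(replica_latencies_ms, quorum_size):
    """Latency of an operation that returns as soon as the fastest
    `quorum_size` replicas have replied."""
    if not 1 <= quorum_size <= len(replica_latencies_ms):
        raise ValueError("quorum size must be between 1 and N")
    return sorted(replica_latencies_ms)[quorum_size - 1]

# Hypothetical per-replica response times in milliseconds; the third
# replica is a straggler (say, stuck in a garbage collection pause).
latencies_ms = [2.0, 3.0, 250.0]

quorum_read_ms = quorum_latency(latencies_ms, 2)  # waits for 2 of 3 replies
master_only_ms = latencies_ms[2]                  # all reads go via the straggler
```

In this toy model the quorum read completes at the second-fastest reply, so the slow replica is simply ignored, whereas routing every request through a single node inherits that node's latency whenever it happens to be the slow one.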
CAP Confusion: Problems with ‘partition tolerance’
Another good clarification about CAP which resurfaced during last week's discussion:
So what causes partitions? Two things, really. The first is obvious – a network failure, for example due to a faulty switch, can cause the network to partition. The other is less obvious, but fits with the definition [...]: machine failures, either hard or soft. In an asynchronous network, i.e. one where processing a message could take unbounded time, it is impossible to distinguish between machine failures and lost messages. Therefore a single machine failure partitions it from the rest of the network. A correlated failure of several machines partitions them all from the network. Not being able to receive a message is the same as the network not delivering it. In the face of sufficiently many machine failures, it is still impossible to maintain availability and consistency, not because two writes may go to separate partitions, but because the failure of an entire ‘quorum’ of servers may render some recent writes unreadable.
(sorry, catching up on old interesting things posted last week...)
(tags: failure scalability network partitions cap quorum distributed-databases fault-tolerance)
Big-O Algorithm Complexity Cheat Sheet
nicely done, very readable
(tags: algorithms reference cheat-sheet big-o complexity estimation coding)
Did Conroy’s AFP filter wrongly block 1,200 sites?
Looks like many Aussie network operators were legally required to block 1,200 websites (presumably, one target and 1199 false positives), in secret. Quoting http://lists.ausnog.net/pipermail/ausnog/2013-April/017993.html : "You get a notice to block. You block or either get fined, go to jail or lose your carrier licence. It is a blunt instrument and it is a condition of being at 'the big boys table' i.e. you're a carrier or a carriage service provider."
(tags: australia law afp filtering internet blocking censorship secret eff)
Making sense out of BDB-JE fast stats
good info on the system metrics recorded by BDB-JE's EnvironmentStats code, particularly where cache and cleaner activity are concerned. Particularly useful for Voldemort
(tags: voldemort caching bdb bdb-je storage tuning ops metrics reference)
Approximate Heavy Hitters -The SpaceSaving Algorithm
nice, readable intro to SpaceSaving (which I've linked to before) -- a simple stream-processing top-K frequency estimation algorithm with bounded error.
(tags: algorithms coding space-saving cardinality streams stream-processing estimation)
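For a feel of how SpaceSaving works, here is a minimal sketch of the basic counter-replacement idea, written from the general description of the algorithm rather than from the linked article. The function name and the example stream are hypothetical, and for brevity it finds the minimum counter with a linear scan instead of the "stream summary" structure usually used to make that step cheap.

```python
def space_saving(stream, k):
    """Track at most k items. Each estimated count is an upper bound on
    the true count, and (count - error) is a lower bound."""
    counts = {}  # item -> estimated count
    errors = {}  # item -> maximum possible overestimate
    for item in stream:
        if item in counts:
            counts[item] += 1
        elif len(counts) < k:
            counts[item] = 1
            errors[item] = 0
        else:
            # Evict the item with the smallest count; the newcomer
            # inherits that count as its possible overestimate.
            victim = min(counts, key=counts.get)
            floor = counts.pop(victim)
            del errors[victim]
            counts[item] = floor + 1
            errors[item] = floor
    return sorted(
        ((item, count, errors[item]) for item, count in counts.items()),
        key=lambda entry: -entry[1],
    )

# Hypothetical stream in which "a" is the only real heavy hitter.
stream = ["a", "a", "b", "a", "c", "a", "b", "a", "d", "a"]
top = space_saving(stream, k=2)
```

Each result is an `(item, estimated_count, max_error)` tuple. Items that only recently displaced another counter carry a large error, which is how a rare item such as "d" here can be told apart from a genuine heavy hitter.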
Darach Ennis on CEP, Stream Processing, Messaging, OOP vs Functional Architecture
good interview -- lots of food for thought!
(tags: darach-ennis stream-processing messaging architecture qcon interviews erlang cep realtime rx comet events)
One Year Later, the Results of Tor Books UK Going DRM-Free
As it is, we’ve seen no discernible increase in piracy on any of our titles, despite them being DRM-free for nearly a year.
Understanding Elastic Block Store Availability and Performance [slides]
fantastic in-depth presentation on EBS usage; lots of good advice here if you're using EBS volumes with/without PIOPS
(tags: piops ebs performance aws ec2 ops storage amazon presentations)
-
Github get good results using Judy arrays to replace a Ruby hash. However: the whole blog post is a bit dodgy to me. It feels like there are much better ways to fix the problem: 1. the big one: don't do GC-heavy activity in the front-end web servers. Split that language-classification code into a separate service. Write its results to a cache and don't re-query needlessly. 2. why isn't this benchmarked against a C/C++ hash? it's only 36000 entries, loaded once at startup. lookups against that should be blisteringly fast even with the basic data structures, and that would also be outside the Ruby heap so avoid the GC overhead. Feels like the use of a Judy array was a "because I want to" decision. 3. personally, I'd have preferred they spend time fixing their uptime problems.... See also https://news.ycombinator.com/item?id=5639013 for more kvetching.
(tags: ruby github gc judy-arrays linguist hashes data-structures)
-
Mozilla's experience with Kanban. We've had good results in Amazon, too. good intro links in this post -- might start talking about it in Swrve...
(tags: kanban scheduling team agile mozilla)
Secret Bitcoin mining code added to game sparks outrage
Thunberg's admission that [the E-Sports Entertainment Association client software] ran Bitcoin-mining software without explicit user consent is startling. Aside from potentially opening the company up to huge legal liability, the move is likely to engender distrust among some of the company's most loyal fans. The nonchalance of some of Thunberg's comments may only add insult to the betrayal many users are likely to feel. "But for the record, I told jag he shouldn't be lazy and run the miner in a separate process," he wrote in a post, referring to one of his software engineers with the screen name Jaguar, who didn't take steps to conceal the Bitcoin miner. "Rookie move." In the later post he wrote: "100% of the funds are going into the s14 prize pot, so at the very least your melted gpus contributed to a good cause."
Gap's application of Knockout.js and the MVVM model
Interesting; first time I'd heard of the Model-View-ViewModel (MVVM) pattern.
(tags: mvvm architecture javascript web ui knockout-js martin-fowler json)
-
very nice single-purpose site -- figure out who represents any given Irish postal address
Lectures in Advanced Data Structures (6.851)
Good lecture notes on the current state of the art in data structure research.
Data structures play a central role in modern computer science. You interact with data structures even more often than with algorithms (think Google, your mail server, and even your network routers). In addition, data structures are essential building blocks in obtaining efficient algorithms. This course covers major results and current directions of research in data structures: TIME TRAVEL We can remember the past efficiently (a technique called persistence), but in general it's difficult to change the past and see the outcomes on the present (retroactivity). So alas, Back To The Future isn't really possible. GEOMETRY When data has more than one dimension (e.g. maps, database tables). DYNAMIC OPTIMALITY Is there one binary search tree that's as good as all others? We still don't know, but we're close. MEMORY HIERARCHY Real computers have multiple levels of caches. We can optimize the number of cache misses, often without even knowing the size of the cache. HASHING Hashing is the most used data structure in computer science. And it's still an active area of research. INTEGERS Logarithmic time is too easy. By careful analysis of the information you're dealing with, you can often reduce the operation times substantially, sometimes even to constant. We will also cover lower bounds that illustrate when this is not possible. DYNAMIC GRAPHS A network link went down, or you just added or deleted a friend in a social network. We can still maintain essential information about the connectivity as it changes. STRINGS Searching for phrases in giant text (think Google or DNA). SUCCINCT Most “linear size” data structures you know are much larger than they need to be, often by an order of magnitude. Some data structures require almost no space beyond the raw data but are still fast (think heaps, but much cooler).
(via Tim Freeman)(tags: data-structures lectures mit video data algorithms coding csail strings integers hashing sorting bst memory)
Older Is Wiser: Study Shows Software Developers’ Skills Improve Over Time
At least in terms of StackOverflow rep:
For the first part of the study, the researchers compared the age of users with their reputation scores. They found that an individual’s reputation increases with age, at least into a user’s 40s. There wasn’t enough data to draw meaningful conclusions for older programmers. The researchers then looked at the number of different subjects that users asked and answered questions about, which reflects the breadth of their programming interests. The researchers found that there is a sharp decline in the number of subjects users weighed in on between the ages of 15 and 30 – but that the range of subjects increased steadily through the programmers’ 30s and into their early 50s. Finally, the researchers evaluated the knowledge of older programmers (ages 37 and older) compared to younger programmers (younger than 37) in regard to relatively recent technologies – meaning technologies that have been around for less than 10 years. For two smartphone operating systems, iOS and Windows Phone 7, the veteran programmers had a significant edge in knowledge over their younger counterparts. For every other technology, from Django to Silverlight, there was no statistically significant difference between older and younger programmers. “The data doesn’t support the bias against older programmers – if anything, just the opposite,” Murphy-Hill says.
Damn right ;)
(tags: coding age studies software work stack-overflow ncsu knowledge skills life)
-
Test Double is a generic term for any case where you replace a production object for testing purposes. There are various kinds of double that Gerard lists: Dummy objects are passed around but never actually used. Usually they are just used to fill parameter lists. Fake objects actually have working implementations, but usually take some shortcut which makes them not suitable for production (an InMemoryTestDatabase is a good example). Stubs provide canned answers to calls made during the test, usually not responding at all to anything outside what's programmed in for the test. Spies are stubs that also record some information based on how they were called. One form of this might be an email service that records how many messages it was sent. Mocks are pre-programmed with expectations which form a specification of the calls they are expected to receive. They can throw an exception if they receive a call they don't expect and are checked during verification to ensure they got all the calls they were expecting.
(tags: test-doubles naming patterns tdd testing mocking tests martin-fowler)
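The five kinds of double listed above are easier to tell apart side by side. This is a rough Python sketch; `Registration`, `InMemoryStore`, `StubClock`, `SpyMailer` and `MockMailer` are all hypothetical names invented for this example, not anything from the linked article.

```python
class Registration:
    """Hypothetical production code that the doubles below stand in for."""

    def __init__(self, store, mailer, clock, audit_log):
        self.store = store
        self.mailer = mailer
        self.clock = clock
        self.audit_log = audit_log  # used by other methods, not by register()

    def register(self, name):
        self.store.save(name, self.clock.now())
        self.mailer.send(name, "Welcome!")


DUMMY_AUDIT_LOG = object()  # Dummy: fills the parameter list, never called


class InMemoryStore:  # Fake: really works, but takes a shortcut (no database)
    def __init__(self):
        self.rows = {}

    def save(self, name, created_at):
        self.rows[name] = created_at


class StubClock:  # Stub: returns a canned answer
    def now(self):
        return 1234


class SpyMailer:  # Spy: a stub that also records how it was called
    def __init__(self):
        self.sent = []

    def send(self, to, body):
        self.sent.append((to, body))


class MockMailer:  # Mock: pre-programmed with the calls it expects
    def __init__(self, expected_calls):
        self.expected = list(expected_calls)

    def send(self, to, body):
        if not self.expected or self.expected[0] != (to, body):
            raise AssertionError(f"unexpected call: send({to!r}, {body!r})")
        self.expected.pop(0)

    def verify(self):
        if self.expected:
            raise AssertionError(f"calls never received: {self.expected}")


store, spy = InMemoryStore(), SpyMailer()
Registration(store, spy, StubClock(), DUMMY_AUDIT_LOG).register("ada")

mock = MockMailer([("ada", "Welcome!")])
Registration(InMemoryStore(), mock, StubClock(), DUMMY_AUDIT_LOG).register("ada")
mock.verify()  # raises if an expected call never arrived
```

With the spy you assert on `spy.sent` after the fact; with the mock the expectations are stated up front and checked by `verify()`.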
Limerick-Tralee walking/cycling route blocked by farmers
Oh for god's sake. I know a few people who've made a trip to Mayo explicitly because the Greenway was there to visit. This is shocking, backwards stuff:
The success of [Mayo's] Great Western Greenway [trail] has overtaken that of others, such as the Great Southern Trail group, which has been working hard to install a walking and cycling route on sections of the former Limerick-Tralee railway line. On February 2nd, to mark the 50th anniversary of its closure, about 150 members and supporters of the Great Southern Trail set out from the old railway station at Abbeyfeale, Co Limerick, along the most recently developed section to cross the Kerry county boundary. The trailers were greeted by a barricade on the border, manned by more than 30 farmers, including the Listowel Fine Gael town councillor Denis Stack. A stand-off continued for three hours, with the Garda mediating in vain. The farmers were trying to lay claim to the land occupied by the disused railway line, even though Minister for Transport Leo Varadkar had made it clear that CIÉ “is the owner of the property [and] will object to any application by others to register these lands”.
(via Rossa McMahon)(tags: via:rossamcmahon cycling walking hiking trails ireland kerry limerick listowel denis-stack cie)
-
like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text. [it] is written in portable C, and it has zero runtime dependencies. You can download a single binary, scp it to a far away machine, and expect it to work.
Nice tool. Needs to get into the Debian/Ubuntu apt repos pronto ;)
(tags: jq tools cli via:peakscale json coding data sed unix)
"Clickwrap" licensing established as legal in Irish court
"The evidence does establish that there is a practice in the airline and online travel agency sectors of contractually binding web users by click wrapping or browse wrapping, which practice is generally and regularly followed by the operators in those sectors. In reality, it is difficult to see how online trade could be carried on in the absence of those devices. As regards the third question which arises from the MSG decision, in this case it is whether the defendant was aware or is presumed to have been aware of the practice. The evidence before the Court, in my view, clearly demonstrates that the defendant was aware of the practice, it being a practice which is generally and regularly followed when making bookings with online travel agents and with airlines and which, in the words of the Court in the MSG case, may be regarded as being a consolidated practice. Accordingly, in my view, by application of Article 23(1)(c), the defendant is bound by the jurisdiction clause in the Terms of Use on the plaintiff’s website by its use, either through the medium of an automaton or a manual operator or a third party data provider, of the website.”
(via Rossa McMahon)
Functional Reactive Programming in the Netflix API with RxJava
Hmm, this seems nifty as a compositional building block for Java code to enable concurrency without thread-safety and sync problems.
Functional reactive programming offers efficient execution and composition by providing a collection of operators capable of filtering, selecting, transforming, combining and composing Observable's. The Observable data type can be thought of as a "push" equivalent to Iterable which is "pull". With an Iterable, the consumer pulls values from the producer and the thread blocks until those values arrive. By contrast with the Observable type, the producer pushes values to the consumer whenever values are available. This approach is more flexible, because values can arrive synchronously or asynchronously.
(tags: concurrency java jvm threads thread-safety coding rx frp fp functional-programming reactive functional async observable)
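To make the push/pull distinction in that quote concrete, here is a toy sketch in Python rather than Java. The `Observable` class below is a hypothetical few-line stand-in written for this note, not the RxJava API; it only shows the shape of the idea, with the producer calling the consumer's callback instead of the consumer looping over values.

```python
class Observable:
    """Tiny push-based stream; a hypothetical stand-in, not the RxJava class."""

    def __init__(self, producer):
        self._producer = producer  # callable that pushes values into on_next

    @staticmethod
    def from_iterable(values):
        def producer(on_next):
            for value in values:
                on_next(value)
        return Observable(producer)

    def filter(self, predicate):
        def producer(on_next):
            self._producer(lambda v: on_next(v) if predicate(v) else None)
        return Observable(producer)

    def map(self, fn):
        def producer(on_next):
            self._producer(lambda v: on_next(fn(v)))
        return Observable(producer)

    def subscribe(self, on_next):
        self._producer(on_next)


numbers = [1, 2, 3, 4]

# Pull: the consumer drives the loop and asks for each value.
pulled = [n * 10 for n in numbers if n % 2 == 0]

# Push: the producer calls the consumer's callback whenever a value is ready.
pushed = []
(Observable.from_iterable(numbers)
    .filter(lambda n: n % 2 == 0)
    .map(lambda n: n * 10)
    .subscribe(pushed.append))
```

Both routes end with the same values here because the source is a plain list; the push version would work unchanged if the producer delivered values later from another thread or a network callback.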
You probably shouldn’t use a spreadsheet for important work
Daniel Lemire comments on the recent cases of bugs in spreadsheets causing major impact:
There are several critical problems with a tool like Excel that need to be widely known: * Spreadsheets do not support testing. For anything that matters, you should validate and test your code automatically and systematically; * Spreadsheets make code reviews impractical. To visually inspect the code, you need to click on each and every cell. In practice, this means that you cannot reasonably ask someone to read over your formulas to make sure that there is no mistake; * Spreadsheets encourage redundancies. Spreadsheets encourage copy-and-paste. Though copying and pasting is sometimes the right tool, it also creates redundancies. These redundancies make it very difficult to update a spreadsheet: are you absolutely sure that you have changed the formula throughout?
Agreed on all three, particularly on the impossibility of testing. IMO, everyone who may be in a job where automation via spreadsheet is likely needs training in SDE fundamentals: unit testing, the importance of open source and open data for reproducibility, version control, and code review. We are all computer scientists now.
(tags: spreadsheets excel coding errors bugs testability unit-testing testing quality sde sde-fundamentals dry)
Log4j2 Asynchronous Loggers for Low-Latency Logging - Apache Log4j 2
implemented using the LMAX Disruptor library -- very impressive performance figures. I presume in real-world usage, these latencies are dwarfed by hardware costs, though
(tags: disruptor coding java log4j logging async performance)
-
Google Drive and GMail have a built-in scripting engine. I had no idea
(tags: gmail evernote archival scripting coding hacks google-drive)
-
How the Irish media are partly to blame for the catastrophic property bubble, from a paper entitled _The Role Of The Media In Propping Up Ireland’s Housing Bubble_, by Dr Julien Mercille, in the _Social Europe Journal_:
“The overall argument is that the Irish media are part and parcel of the political and corporate establishment, and as such the news they convey tend to reflect those sectors’ interests and views. In particular, the Celtic Tiger years involved the financialisation of the economy and a large property bubble, all of it wrapped in an implicit neoliberal ideology. The media, embedded within this particular political economy and itself a constitutive element of it, thus mostly presented stories sustaining it. In particular, news organisations acquired direct stakes in an inflated real estate market by purchasing property websites and receiving vital advertising revenue from the real estate sector. Moreover, a number of their board members were current or former high officials in the finance industry and government, including banks deeply involved in the bubble’s expansion."
(tags: economics irish-times ireland newspapers media elite insiders bubble property-bubble property celtic-tiger papers news bias)
-
Ugh. low-end ISPs MITM'ing DNS queries:
Some ISP's are now using a technology called 'Transparent DNS proxy'. Using this technology, they will intercept all DNS lookup requests (TCP/UDP port 53) and transparently proxy the results. This effectively forces you to use their DNS service for all DNS lookups. If you have changed your DNS settings to an open DNS service such as Google, Comodo or OpenDNS expecting that your DNS traffic is no longer being sent to your ISP's DNS server, you may be surprised to find out that they are using transparent DNS proxying.
(via Nelson)
BitTorrent’s Secure Dropbox Alternative Goes Public
As kragen says, 'a decentralized way to sync a folder of large files, using BitTorrent instead of an untrustworthy central server'. Windows, OSX, and Linux supported
(tags: bittorrent dropbox cloud storage filesharing sharing sync synchronization)
DataSift Architecture: Realtime Datamining at 120,000 Tweets Per Second
250 million tweets per day, 30-node HBase cluster, 400TB of storage, Kafka and 0mq. This is from 2011, hence this dated line: 'for a distributed application they thought AWS was too limited, especially in the network. AWS doesn’t do well when nodes are connected together and they need to talk to each other. Not low enough latency network. Their customers care about latency.' (Nowadays, it would be damn hard to build a lower-latency network than that attached to a cc2.8xlarge instance.)
(tags: datasift architecture scalability data twitter firehose hbase kafka zeromq)
Breaking the 1000 ms Time to Glass Mobile Barrier [slides]
Great presentation from Google on HTML5 CSS+JS render speed, 3G/4G network latency, etc. (via John G)
(tags: google slides 3g 4g lte networking telcos telecom css js html5 web via:jg)
Lucene 4 - Revisiting Problems For Speed [slides]
a presentation from Simon Willnauer on optimization work performed on Lucene in 2011. The most interesting stuff here is the work done to replace an O(n^2) FuzzyQuery fuzzy-match algorithm with an FSM trie, which is extremely cool -- benchmarked at 214 times faster!
(tags: benchmarks slides lucene search fuzzy-matching text-matching strings algorithms coding fsm tries)
Microsoft Code Digger extension
Miguel de Icaza says it's witchcraft -- I'm inclined to agree:
Code Digger analyzes possible execution paths through your .NET code. The result is a table where each row shows a unique behavior of your code. The table helps you understand the behavior of the code, and it may also uncover hidden bugs. Through the new context menu item "Generate Inputs / Outputs Table" in the Visual Studio editor, you can invoke Code Digger to analyze your code. Code Digger computes and displays input-output pairs. Code Digger systematically hunts for bugs, exceptions, and assertion failures.
(tags: testing constraint-solving solver witchcraft magic dot-net coding tests code-digger microsoft)
Swansea measles outbreak: was an MMR scare in the local press to blame?
Sixteen years ago, journalists had a much easier job assembling "balanced" stories about MMR in south Wales. When I wrote about the measles outbreak last week, I suggested that it was related to Andrew Wakefield's discredited 1998 Lancet research, but the Swansea contagion seems more likely to be the result of a separate scare a year earlier in the South Wales Evening Post. Before 1997, uptake of MMR in the distribution area of the Post was 91%, and 87.2% in the rest of Wales. After the Post's campaign, uptake in the distribution area fell to 77.4% (it was 86.8% in the rest of Wales). That's almost a 14% drop where the Post had influence, compared with less than 3% elsewhere. In the dry wording of the BMJ, "the [South Wales Evening Post] campaign is the most likely explanation". In other words, what we can see in Swansea is the local effect of local reporting -- in all probability, just a taster of what happens when the news irresponsibly creates unfounded terror. [...] The 1997 coverage focused on a group of families who blamed MMR for various ailments in their children, including learning difficulties, digestive problems and autism -- none of which have been found to have any connection with the vaccine. The Post's coverage was at the time deemed a success, and in 1998 it won a prize for investigative reporting in the BT Wales Press Awards. That year, the SWEP ran at least 39 stories related to the alleged dangers of MMR. And yes, it's true that the paper never directly endorsed non-vaccination. What it did do was publicise the idea of "vaccine damage" as a risk, one that parents would then likely weigh up against the risk of contracting measles, mumps or rubella. And this went beyond the reporting of parental anxieties -- it was part of the Post's editorial line. One article is entitled "Young bodies cannot take it". The all-important "journalistic balance" was constantly available, thanks to campaigning parents and their solicitor Richard Barr.
(It was Barr who engaged Wakefield for a lawsuit, leading to the "fishing expedition" research that became the Lancet paper.) They were happy to provide a quote on the dangers of the "triple jab", which health authorities were then obliged to rebut politely. The Post also seemed to downplay the risk of measles, reporting on 6 July 1998 that "not a single child has been hit by the illness -- despite a 13% drop in take-up levels". It's not parents who should feel embarrassed by the Swansea measles outbreak: some may have acted from overt dread at the prospect of harming their child, and some simply from omission, but all were encouraged by a press that focused on non-existent risks and downplayed the genuine horror of the diseases MMR prevents. The shame belongs to journalists: those of the South Wales Evening Post who allowed themselves to be recruited in the service of a speculative lawsuit, and any who let a specious devotion to "balance" overrule a duty to tell the truth.
(tags: south-wales wales mmr health vaccination scares journalism ethics disease measles south-wales-evening-post)
-
mostly a DynamoDB puff-piece from last week's Amazon Cloud Connect, but contains some good real-world figures for a 20-billion-GUID deduping table use-case at end. ($4,150 per month, to cut to the chase)
(tags: dynamodb aws figures costs architecture ec2 dedupe cloud-connect slides)
Excel, untestability, and the reliability of quants
Wow, this is a great software-quality story -- I knew Excel was the most widely used programming environment out there, but this is a factor I'd overlooked:
In his remarks on the final panel, Frank Partnoy mentioned something I missed when it came out a few weeks ago: the role of Microsoft Excel in the “London Whale” trading debacle. [..] To summarize: JPMorgan’s Chief Investment Office needed a new value-at-risk (VaR) model for the synthetic credit portfolio (the one that blew up) and assigned a quantitative whiz [...] to create it. The new model “operated through a series of Excel spreadsheets, which had to be completed manually, by a process of copying and pasting data from one spreadsheet to another.” The internal Model Review Group identified this problem as well as a few others, but approved the model, while saying that it should be automated and another significant flaw should be fixed. After the London Whale trade blew up, the Model Review Group discovered that the model had not been automated and found several other errors. Most spectacularly, “After subtracting the old rate from the new rate, the spreadsheet divided by their sum instead of their average, as the modeler had intended. This error likely had the effect of muting volatility by a factor of two and of lowering the VaR ...” I write periodically about the perils of bad software in the business world in general and the financial industry in particular, by which I usually mean back-end enterprise software that is poorly designed, insufficiently tested, and dangerously error-prone. But this is something different. [...] While Excel the program is reasonably robust, the spreadsheets that people create with Excel are incredibly fragile. There is no way to trace where your data come from, there’s no audit trail (so you can overtype numbers and not know it), and there’s no easy way to test spreadsheets, for starters. The biggest problem is that anyone can create Excel spreadsheets -- badly. 
Because it’s so easy to use, the creation of even important spreadsheets is not restricted to people who understand programming and do it in a methodical, well-documented way. This is why the JPMorgan VaR model is the rule, not the exception: manual data entry, manual copy-and-paste, and formula errors. This is another important reason why you should pause whenever you hear that banks’ quantitative experts are smarter than Einstein, or that sophisticated risk management technology can protect banks from blowing up. At the end of the day, it’s all software. While all software breaks occasionally, Excel spreadsheets break all the time. But they don’t tell you when they break: they just give you the wrong number.
(tags: excel reliability software coding ides jpmorgan value-at-risk finance london-whale quants spreadsheets unit-tests testability testing)
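The "factor of two" in the sum-versus-average bug falls straight out of the arithmetic, since a sum of two numbers is exactly twice their average. A small Python sketch follows; the function names and the two sample rates are made up for illustration and have nothing to do with JPMorgan's actual spreadsheet.

```python
def change_as_intended(old_rate, new_rate):
    """Difference relative to the average of the two rates."""
    return (new_rate - old_rate) / ((old_rate + new_rate) / 2)


def change_as_built(old_rate, new_rate):
    """Difference relative to the sum of the two rates (the reported bug)."""
    return (new_rate - old_rate) / (old_rate + new_rate)


old_rate, new_rate = 100.0, 110.0  # made-up rates, for illustration only
intended = change_as_intended(old_rate, new_rate)
built = change_as_built(old_rate, new_rate)
```

Whatever the inputs, `built` comes out at half of `intended`, so every computed move looks half as large as it should. A one-line unit test comparing the formula against a hand-worked case would flag it, which is the kind of check a copy-and-paste spreadsheet makes hard to run.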
Riak, CAP, and eventual consistency
Good (albeit draft) write-up of the implications of CAP, allow_mult, and last_write_wins conflict-resolution policies in Riak:
As Brewer's CAP theorem established, distributed systems have to make hard choices. Network partition is inevitable. Hardware failure is inevitable. When a partition occurs, a well-behaved system must choose its behavior from a spectrum of options ranging from "stop accepting any writes until the outage is resolved" (thus maintaining absolute consistency) to "allow any writes and worry about consistency later" (to maximize availability). Riak leans toward the availability end of the spectrum, but allows the operator and even the developer to tune read and write requests to better meet the business needs for any given set of data.
(tags: riak cap eventual-consistency distcomp distributed-systems partition last-write-wins voldemort allow_mult)
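A toy illustration of the trade-off between the two conflict-resolution policies, in Python. Nothing here is the Riak client API: `Write` and both resolver functions are hypothetical, and Riak's own causality tracking (vector clocks) is left out, so the two writes are simply assumed to be concurrent.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Write:
    value: frozenset
    timestamp: float


def resolve_last_write_wins(siblings):
    """Keep only the newest write; concurrent updates are silently dropped."""
    return max(siblings, key=lambda w: w.timestamp).value


def resolve_by_merging(siblings):
    """Keep every sibling and let the application merge them (set union here)."""
    merged = frozenset()
    for write in siblings:
        merged |= write.value
    return merged


# Two clients update the same shopping cart on opposite sides of a partition.
concurrent = [
    Write(frozenset({"milk", "eggs"}), timestamp=10.0),
    Write(frozenset({"milk", "beer"}), timestamp=10.5),
]
lww_cart = resolve_last_write_wins(concurrent)
merged_cart = resolve_by_merging(concurrent)
```

Under last-write-wins the eggs vanish without any error; keeping siblings costs the application a merge function but loses nothing. Set union is just one possible merge, and it cannot express deletions on its own.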
How You Can Help Save Upcoming.org, Posterous, and More
Yahoo! sucks. Sites shutting down in days? ArchiveTeam Warrior to the rescue; install the VM!
(tags: archival yahoo shutdowns upcoming waxy archives virtualbox)
The Excel Depression - NYTimes.com
Krugman on the Reinhart-Rogoff Excel-bug fiasco.
What the Reinhart-Rogoff affair shows is the extent to which austerity has been sold on false pretenses. For three years, the turn to austerity has been presented not as a choice but as a necessity. Economic research, austerity advocates insisted, showed that terrible things happen once debt exceeds 90 percent of G.D.P. But “economic research” showed no such thing; a couple of economists made that assertion, while many others disagreed. Policy makers abandoned the unemployed and turned to austerity because they wanted to, not because they had to. So will toppling Reinhart-Rogoff from its pedestal change anything? I’d like to think so. But I predict that the usual suspects will just find another dubious piece of economic analysis to canonize, and the depression will go on and on.
(tags: paul-krugman economics excel coding bugs software austerity debt)
Vaccination 'herd immunity' demonstration
'Stochastic monte-carlo epidemic SIR model to reveal herd immunity'. Fantastic demo of this important medical concept (via Colin Whittaker)
(tags: via:colinwh stochastic herd-immunity random sir epidemics health immunity vaccination measles medicine monte-carlo-simulations simulations)
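A rough sketch of what a stochastic SIR run looks like, in Python. This is not the linked demo's code: the function name, the chain-binomial update and every parameter value are my own assumptions. Each time step, every susceptible person is infected with a probability that grows with the number currently infected, and every infected person recovers with a fixed probability.

```python
import random


def simulate_outbreak(population, vaccinated_fraction, r0,
                      recovery_prob=0.1, seed=0):
    """Return how many people were ever infected in one random run."""
    rng = random.Random(seed)
    vaccinated = round(population * vaccinated_fraction)
    susceptible = max(0, population - vaccinated - 1)
    infected = 1      # a single index case
    recovered = 0     # vaccinated people are immune and tracked separately
    beta = r0 * recovery_prob  # per-step transmission rate

    while infected > 0:
        # Chance that a given susceptible person is infected this step.
        p_infect = 1 - (1 - beta / population) ** infected
        new_infections = sum(rng.random() < p_infect for _ in range(susceptible))
        new_recoveries = sum(rng.random() < recovery_prob for _ in range(infected))
        susceptible -= new_infections
        infected += new_infections - new_recoveries
        recovered += new_recoveries
    return recovered


unprotected = [simulate_outbreak(1000, 0.0, r0=4.0, seed=s) for s in range(10)]
protected = [simulate_outbreak(1000, 0.95, r0=4.0, seed=s) for s in range(10)]
```

With the assumed `r0` of 4 the usual threshold `1 - 1/r0` is 75% coverage, so at 95% vaccinated the outbreaks should fizzle after a handful of cases, while with nobody vaccinated most runs should sweep through nearly everyone.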
Fred's ImageMagick Scripts: SIMILAR
compute an image-similarity metric, to discover mostly-identical-but-slightly-tweaked images:
SIMILAR computes the normalized cross correlation similarity metric between two equal dimensioned images. The normalized cross correlation metric measures how similar two images are, not how different they are. The range of ncc metric values is between 0 (dissimilar) and 1 (similar). If mode=g, then the two images will be converted to grayscale. If mode=rgb, then the two images first will be converted to colorspace=rgb. Next, the ncc similarity metric will be computed for each channel. Finally, they will be combined into an rms value.
(via Dan O'Neill)(tags: image photos pictures similar imagemagick via:dano metrics similarity)
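For the curious, the per-channel metric described above can be sketched in a few lines of pure Python. This is the textbook zero-mean normalized cross correlation, not a port of the SIMILAR script; the function names and the list-of-RGB-tuples image format are assumptions made for this example. Note that the textbook form can go negative for anti-correlated inputs, whereas the script documents a 0 to 1 range.

```python
import math


def ncc(a, b):
    """Zero-mean normalized cross correlation of two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("images must have equal dimensions")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    da = [x - mean_a for x in a]
    db = [y - mean_b for y in b]
    numerator = sum(x * y for x, y in zip(da, db))
    denominator = math.sqrt(sum(x * x for x in da) * sum(y * y for y in db))
    return numerator / denominator if denominator else 0.0


def similarity_rgb(pixels_a, pixels_b):
    """Compute NCC per channel, then combine the three scores as an RMS value."""
    scores = [
        ncc([p[c] for p in pixels_a], [p[c] for p in pixels_b])
        for c in range(3)
    ]
    return math.sqrt(sum(s * s for s in scores) / 3)


image = [(10, 20, 30), (40, 50, 60), (70, 80, 200), (5, 90, 15)]
brighter = [(r + 10, g + 10, b + 10) for r, g, b in image]
score = similarity_rgb(image, brighter)
```

Because the means are subtracted first, a uniform brightness tweak leaves the score at (or within rounding of) 1, which is the "mostly identical but slightly tweaked" case this metric is good at catching.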
-
a first-person game prototype in which players navigate a 3D space while picking up orbs that reduce the speed of light in increments. Custom-built, open-source relativistic graphics code allows the speed of light in the game to approach the player’s own maximum walking speed. Visual effects of special relativity gradually become apparent to the player, increasing the challenge of gameplay. These effects, rendered in realtime to vertex accuracy, include the Doppler effect (red- and blue-shifting of visible light, and the shifting of infrared and ultraviolet light into the visible spectrum); the searchlight effect (increased brightness in the direction of travel); time dilation (differences in the perceived passage of time from the player and the outside world); Lorentz transformation (warping of space at near-light speeds); and the runtime effect (the ability to see objects as they were in the past, due to the travel time of light). Players can choose to share their mastery and experience of the game through Twitter. A Slower Speed of Light combines accessible gameplay and a fantasy setting with theoretical and computational physics research to deliver an engaging and pedagogically rich experience.
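To get a feel for why lowering the speed of light towards walking pace makes these effects visible, here is a small Python sketch of two of the standard special-relativity formulas involved: the Lorentz factor and the head-on Doppler factor. The walking speed and the list of light speeds are assumed figures picked for illustration, not values from the game.

```python
import math


def lorentz_factor(speed, light_speed):
    """Time-dilation / length-contraction factor gamma for a given speed."""
    beta = speed / light_speed
    if not 0 <= beta < 1:
        raise ValueError("speed must be non-negative and below light_speed")
    return 1.0 / math.sqrt(1.0 - beta * beta)


def doppler_factor(speed, light_speed):
    """Observed/emitted frequency ratio when moving directly toward a source."""
    beta = speed / light_speed
    if not 0 <= beta < 1:
        raise ValueError("speed must be non-negative and below light_speed")
    return math.sqrt((1.0 + beta) / (1.0 - beta))


WALKING_SPEED = 1.5  # m/s, an assumed figure
for light_speed in (299_792_458.0, 30.0, 3.0, 1.6):
    gamma = lorentz_factor(WALKING_SPEED, light_speed)
    shift = doppler_factor(WALKING_SPEED, light_speed)
    print(f"c = {light_speed:>13.1f} m/s  gamma = {gamma:.4f}  blueshift x{shift:.4f}")
```

At the real speed of light both factors are indistinguishable from 1 for a walker; they only become large once the light speed drops to within a small multiple of the walking speed.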
Eventual Consistency Today: Limitations, Extensions, and Beyond - ACM Queue
Good overview of the current state of eventually-consistent data store research, covering CALM and CRDTs, from Peter Bailis and Ali Ghodsi
(tags: eventual-consistency data storage horizontal-scaling research distcomp distributed-systems via:martin-thompson crdts calm acid cap)
Latency's Worst Nightmare: Performance Tuning Tips and Tricks [slides]
the basics of running a service stack (web, app servers, data stores) on AWS. some good benchmark figures in the final slides
(tags: benchmarks aws ec2 ebs piops services scaling scalability presentations)
Rob "b3ta" Manuel in Dublin next week
The Bottom Half Of The Internet -- "Racism; typos; filth; spam; ignorance; rage – that's all the bottom half of the internet is good for, right? Rob Manuel wants you to question the internet dictum, most beloved of high-profile columnists, that you should ignore all of the comments all of the time. The 'war on comments', he reckons, might just be an echo of a fourth estate that's having trouble adjusting to the idea of an unwashed public disagreeing with their sacred opinions. Sous les pavés, la plage." On Tuesday, le cool Dublin & Pilcrow present SPIEL. Rob Manuel is the flashy animator behind B3ta and he's joined by Ed Melvin, who wants to educate you on 'The Unreal Engines' of virtual currencies and economies.
(tags: rob-manuel b3ta dublin comments internet meetings talks lecool)
Reality, Reactivity, Relevance and Repeatability in Java Application Profiling
this product from JInspired appears to support runtime profiling of Java apps with < 5% performance impact
(tags: profiling performance java coding measurement)
You Lookin' At Me? Reflections on Google Glass
ex-Nokia product design guru Jan Chipchase on Google Glass
(tags: google privacy technology google-glass pervasive-computing life future)
Not the ‘best in the world’ - The Medical Independent
Debunking this prolife talking point:
'Our maternity services are amongst the best in the world’. This phrase has been much hackneyed since the heartbreaking death of Savita Halappanavar was revealed in mid October. James Reilly and other senior politicians are particularly guilty of citing this inaccurate position. So what is the state of Irish maternity services and how do our figures compare with other comparable countries? Let’s start with the statistics.
The bottom line: Eight deaths per 100,000 is not bad, but it ranks our maternity services far from the best in the world and below countries such as Slovakia and Poland.
(tags: pro-choice ireland savita medicine health maternity morbidity statistics)
How Kaggle Is Changing How We Work - Thomas Goetz - The Atlantic
Founded in 2010, Kaggle is an online platform for data-mining and predictive-modeling competitions. A company arranges with Kaggle to post a dump of data with a proposed problem, and the site's community of computer scientists and mathematicians -- known these days as data scientists -- take on the task, posting proposed solutions. [...] On one level, of course, Kaggle is just another spin on crowdsourcing, tapping the global brain to solve a big problem. That stuff has been around for a decade or more, at least back to Wikipedia (or farther back, Linux, etc). And companies like TaskRabbit and oDesk have thrown jobs to the crowd for several years. But I think Kaggle, and other online labor markets, represent more than that, and I'll offer two arguments. First, Kaggle doesn't incorporate work from all levels of proficiency, professionals to amateurs. Participants are experts, and they aren't working for benevolent reasons alone: they want to win, and they want to get better to improve their chances of winning next time. Second, Kaggle doesn't just create the incidental work product, it creates a new marketplace for work, a deeper disruption in a professional field. Unlike traditional temp labor, these aren't bottom of the totem pole jobs. Kagglers are on top. And that disruption is what will kill Joy's Law. Because here's the thing: the Kaggle ranking has become an essential metric in the world of data science. Employers like American Express and the New York Times have begun listing a Kaggle rank as an essential qualification in their help wanted ads for data scientists. It's not just a merit badge for the coders; it's a more significant, more valuable, indicator of capability than our traditional benchmarks for proficiency or expertise. In other words, your Ivy League diploma and IBM resume don't matter so much as my Kaggle score. 
It's flipping the resume, where your work is measurable and metricized and your value in the marketplace is more valuable than the place you work.
(tags: academia datamining economics data kaggle data-science ranking work competition crowdsourcing contracting)
-
a good reference, with lots of sample output. Not clear if it takes 1.6/1.7 differences into account, though
Austerity policies founded on Excel typo
You've probably heard that countries with a high debt:GDP ratio suffer from slow economic growth. The specific number 90 percent has been invoked frequently. That's all thanks to a study conducted by Carmen Reinhart and Kenneth Rogoff for their book This Time Is Different. But the results have been difficult for other researchers to replicate. Now three scholars at the University of Massachusetts have done so in "Does High Public Debt Consistently Stifle Economic Growth? A Critique of Reinhart and Rogoff" and they find that the Reinhart/Rogoff result is based on opportunistic exclusion of Commonwealth data in the late-1940s, a debatable premise about how to weight the data, and most of all a sloppy Excel coding error. Read Mike Konczal for the whole rundown, but I'll just focus on the spreadsheet part. At one point they set cell L51 equal to AVERAGE(L30:L44) when the correct procedure was AVERAGE(L30:L49). By typing wrong, they accidentally left Denmark, Canada, Belgium, Austria, and Australia out of the average. When you run the math correctly "the average real GDP growth rate for countries carrying a public debt-to-GDP ratio of over 90 percent is actually 2.2 percent, not -0.1 percent."
(tags: austerity politics excel coding errors bugs spreadsheets economics economy)
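The spreadsheet slip is easy to reproduce in miniature. A minimal sketch, using made-up placeholder values (these are NOT the Reinhart/Rogoff figures; the `rows` table and the `average` helper are purely illustrative), of how averaging rows 30 to 44 instead of 30 to 49 silently drops the last five entries:

```python
# Placeholder values only: these are NOT the Reinhart/Rogoff figures.
# Rows 30..49 of a hypothetical column L, one number per country.
rows = {row: float(row - 30) for row in range(30, 50)}  # 0.0, 1.0, ..., 19.0


def average(cells, first_row, last_row):
    """Mimic Excel's AVERAGE(Lfirst:Llast) over an inclusive row range."""
    values = [cells[r] for r in range(first_row, last_row + 1)]
    return sum(values) / len(values)


truncated = average(rows, 30, 44)  # the range as typed
full = average(rows, 30, 49)       # the intended range
dropped = sorted(set(range(30, 50)) - set(range(30, 45)))

print(f"AVERAGE(L30:L44) = {truncated}")
print(f"AVERAGE(L30:L49) = {full}")
print(f"rows silently dropped: {dropped}")
```

Nothing errors out and nothing looks wrong in the cell; the only symptom is a different number, which is what makes this class of bug so hard to spot.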
Is Your MySQL Buffer Pool Warm? Make It Sweat!
How GroupOn are warming up a failover warm MySQL spare, using Percona stuff and a "tee" of the live in-flight queries. (via Dave Doran)
(tags: via:dave-doran mysql databases warm-spares spares failover groupon percona replication)
So now you know who gets some of those excessive Ticketmaster fees….
Interesting evidence; it appears Irish music promoters are getting "rebates" from the massive TicketMaster "booking fee", on each ticket sold. This sounds like a cartel to me, and we need to regulate this. Where is the National Consumer Agency and Competition Authority?
The matter is something which should be of concern to every gig-going music fan, regardless of whether they go to Stradbally or not. For years, many have asked about TicketMaster's quasi-monopoly position in the marketplace and why this is so. We’ve always been told that promoters preferred to deal with one company rather than several and that TM’s systems and nationwide reach yadda yadda yadda was the bees’ knees etc. Other companies have tried to compete but no-one has been able to beat TM at this game. But why would promoters go elsewhere when they’re getting a slice of the TM fees back as rebates? Those past off-the-record attempts by and briefings from promoters blaming TM for those fees can now be seen as hypocritical. They’re sticking with TM because they’re receiving a take of the fees paid by punters who have no other choice in service provider if they want to get their hands on tickets. You wonder what the acts make of this cash-grab – perhaps some whip-smart agent is already making a claim for a percentage of the rebates because there would be no rebates in the first place without the act. Surely this is an issue for the Competition Authority and National Consumers Association too, given the manner in which the rebates are made and TM’s deals with the promoters? While promoters under TM deals are free to sell a certain proportion of their tickets with another provider, it’s usually only a very small percentage of the total and unlikely to trouble TM’s bottom line. Also, given that the rebates are volume-driven, it’s better for the promoters to keep the largest possible chunk of their business with TM. It seems that we have a new suspect in the blame game about why ticket prices are so high.
(tags: regulation ireland cartels competition ticketing tickets ticketmaster music gigs consumer)
Blog shines spotlight on Dublin city’s illegal dumping problem
Hooray, Eoin's activism gets some coverage!
THE SCALE OF Dublin’s dumping problem is laid bare in a blog that has seen contributors send in photos of chairs, fridges and heaps of rubbish strewn on city streets. Eoin Parker, one of the organisers behind DublinLitterBlog.com, spoke to TheJournal.ie about the problem, saying that the blog was set up following the privatisation of waste management by Dublin City Council in 2012.
(tags: dumping dublin litter rubbish blogs dcc d1 activism community)
-
To our knowledge, Ked is the first scripting language to emerge from The People's Republic of Cork. Below is an account of what we know so far about the mysterious Corkonian language. Any suggested updates or contributions are encouraged.
Genius.
Just how bad are RTE’s finances?
A sobering examination by NAMAwinelake into the quagmire of Ireland's publicly-funded national broadcaster:
It seems that RTE has become a disaster zone, with libels and incompetence overseen by incapable management, and this is reflected in that organisation’s financial results. RTE still employs nearly 2,000 people and supports jobs and industry across independent producers and suppliers; it is a major business. But the time has come to call a halt to delusional management that is sinking the organization deeper into a quagmire which will ultimately need to be bailed out by the State. And Noel Curran is fobbing us off with flying a kite about a reduction in 65-year old Pat Kenny’s salary from €630,000 to €570,000?!
(tags: rte namawinelake public funding finances money mismanagement ireland incompetence tv news)
High Scalability - Scaling Pinterest - From 0 to 10s of Billions of Page Views a Month in Two Years
wow, Pinterest have a pretty hardcore architecture. Sharding to the max. This is scary stuff for me:
a [Cassandra-style] Cluster Management Algorithm is a SPOF. If there’s a bug it impacts every node. This took them down 4 times.
yeah, so, eek ;)(tags: clustering sharding architecture aws scalability scaling pinterest via:matt-sergeant redis mysql memcached)
Expert in Savita inquiry confirms Irish women get lower standard of care with chorioamnionitis
Dr. Jen Gunter again:
Dr. Knowles’ testimony confirms for me that the law played a role, because her statements indicate the standard of care for treatment of chorioamnionitis is less aggressive in Ireland. This can only be because of the law as there is no medical evidence to support delaying delivery when chorioamnionitis is diagnosed. Standard of care is not to wait until a woman is sick enough to need a termination, the idea is to treat her, you know, before she gets sick enough. An elevated white count and ruptured membranes at 17 weeks is typically enough to make the diagnosis, so Dr. Knowles needs to testify as to what in Savita’s medical record made it safe to not recommend a delivery. By the way, I also disagree with Dr. Knowles about her interpretation of Savita’s medical record, the chart doesn’t have “subtle indicators” of infection, it screams chorioamnionitis long before Wednesday morning. In North America the standard of care with chorioamnionitis is to recommend delivery as soon as the diagnosis is made, not wait until women enter the antechamber of death in the hopes that we can somehow snatch them back from the brink. If Irish law, or the interpretation thereof, had nothing to do with Savita’s death no expert would be mentioning sick enough at all.
(tags: jen-gunter ob-gyn medicine savita law ireland abortion tragedy galway hospital)
Boundary Product Update: Trends Dashboard Now Available
Boundary implement week-on-week trend display. Pity they use silly "giant number" dashboard boxes showing comparisons of the current datapoint with the previous week's datapoint; there's no indication of smoothing being applied, and "giant number" dashboards are basically useless anyway compared to a time-series graph, for unsmoothed time-series data. Also, no prediction bands. :(
(tags: boundary time-series tsd prediction metrics smoothing dataviz dashboards)
ESB Networks | Power Check | Service Interruptions Map
real-time service outage information on a map, from Ireland's power network
Project Voldemort at Gilt Groupe: When Failure Isn't an Option [slides]
Geir Magnusson explains how Gilt Groupe is using Project Voldemort to scale out their e-commerce transactional system. The initial SQL solution had to be replaced because it could not handle the transactional spikes the site is experiencing daily due to its particular way of selling their inventory: each day at noon. Magnusson explains why they chose Voldemort and talks about the architecture.
via Filippo(tags: via:filippo database architecture nosql data voldemort gilt-groupe ops storage presentations)
The full timeline of Savita Halappanavar's mistreatment
a comment on Dr. Jen Gunter's blog puts it all together
(tags: timeline savita abortion malpractice ireland medicine fail)
-
No holds barred:
Speaking today, spokesman Charles Stanley-Smith said; "This idea is insane. This area has suffered from dumping due to a lack of enforcement - yet the council now propose to effectively withdraw services altogether. As numerous studies such as 'the broken window hypothesis' indicate, where a small problem is left un-tackled it is likely to become far worse rather than better. In other words, rather than increase enforcement to solve the problem, Dublin City Council is going to remove enforcement. How will this deal with the problem? Imagine if that logic were applied to crime; would the removal of police services in an area help resolve criminal behaviour - or increase it? The answer is obvious."
(tags: an-taisce environment cleaning dublin ireland dcc rubbish trash society d1)
-
Written by Google, this library is a flexible, efficient, and powerful Java client library for accessing any resource on the web via HTTP. It features a pluggable HTTP transport abstraction that allows any low-level library to be used, such as java.net.HttpURLConnection, Apache HTTP Client, or URL Fetch on Google App Engine. It also features efficient JSON and XML data models for parsing and serialization of HTTP response and request content. The JSON and XML libraries are also fully pluggable, including support for Jackson and Android's GSON libraries for JSON.
Not quite as simple an API as Python's requests, sadly, but still an improvement on the verbose Apache HttpComponents API. Good support for unit testing via a built-in mock-response class. Still in beta.
(tags: google beta software http libraries json xml transports protocols)
Former IMF chief of mission to Ireland says not burning the bondholders was "a mistake"
Former IMF chief of mission to Ireland, Ashoka Mody, above left with Ajai Chopra in 2010. Melancholy of eye and large of loafer, Ashoka was involved in negotiating Ireland’s EU/IMF bailout. [...] This morning Ashoka gave an interview to Gavin Jennings on Morning Ireland, in which he admitted Ireland’s bailout was riddled with mistakes, namely the non-burning of the senior bondholders and the program of austerity. Jennings: “So, if imposing austerity on Ireland was wrong, or a mistake; if not allowing any burning of bondholders, whether official, sovereign or private was a mistake; you were centrally involved in that program. I know Ajai Chopra was very much the public face of the IMF mission to Ireland. But you were centrally involved in constructing this bailout. How much responsibility do you take for those errors.” Mody: “Yes, so, obviously, I have to take the responsibility in…but I’m in very good company in taking responsibility in this. There were many parties involved. And my role really was to bring such matters to the attention of people who finally made these decisions.”
Great.(tags: bondholders imf ireland economy default ajai-chopra ashoka-mody)
Savita Halappanavar’s inquest: the three questions that must be answered | Dr. Jen Gunter
A professional OB/GYN analyses the horrors coming to light in the Savita inquest. Here's one particular gem:
Fetal survival with ruptured membranes at 17 weeks is 0%, this is from prospective study. [...but] “real and substantial risk” to the woman’s life is what is required by the Irish constitution to terminate a pregnancy, *whether or not the foetus is viable*.
So the foetus had 0% chance of survival -- but still termination was not considered an option. Bloody hell.(tags: religion ireland savita horrors malpractice galway guh hospitals hse health inquest abortion pro-choice pregnancy)
Minister Rabbitte welcomes EU agreement on re-use of Public Sector Information
Lots of talk about "charging regimes", "income-generating public sector bodies" etc., but not a single mention of open data or free access. Terrible stuff. :( (via conoro)
(tags: via:conoro open-access government public-sector ireland eu open-data public free)
Compression in Kafka: GZIP or Snappy ?
With Ack: in this mode, as far as compression is concerned, the data gets compressed at the producer, decompressed and compressed on the broker before it sends the ack to the producer. The producer throughput with Snappy compression was roughly 22.3MB/s as compared to 8.9MB/s of the GZIP producer. Producer throughput is 150% higher with Snappy as compared to GZIP. No ack, similar to Kafka 0.7 behavior: In this mode, the data gets compressed at the producer and it doesn’t wait for the ack from the broker. The producer throughput with Snappy compression was roughly 60.8MB/s as compared to 18.5MB/s of the GZIP producer. Producer throughput is 228% higher with Snappy as compared to GZIP. The higher compression savings in this test are due to the fact that the producer does not wait for the leader to re-compress and append the data; it simply compresses messages and fires away. Since Snappy has very high compression speed and low CPU usage, a single producer is able to compress the same amount of messages much faster as compared to GZIP.
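A quick sanity check of the quoted percentages, as a minimal sketch: the throughput numbers are the ones in the excerpt, while the `percent_higher` helper is just an illustrative name.

```python
def percent_higher(new, baseline):
    """Relative improvement of `new` over `baseline`, in percent."""
    return (new - baseline) / baseline * 100.0


# Producer throughput figures (MB/s) as quoted in the excerpt.
with_ack = percent_higher(22.3, 8.9)
no_ack = percent_higher(60.8, 18.5)

print(f"with ack: Snappy is {with_ack:.1f}% higher than GZIP")
print(f"no ack:   Snappy is {no_ack:.1f}% higher than GZIP")
```

Both results land between the quoted whole number and the next one up, so the 150% and 228% in the post appear to be truncated rather than rounded.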
The Bw-Tree: A B-tree for New Hardware - Microsoft Research
The emergence of new hardware and platforms has led to reconsideration of how data management systems are designed. However, certain basic functions such as key indexed access to records remain essential. While we exploit the common architectural layering of prior systems, we make radically new design decisions about each layer. Our new form of B tree, called the Bw-tree achieves its very high performance via a latch-free approach that effectively exploits the processor caches of modern multi-core chips. Our storage manager uses a unique form of log structuring that blurs the distinction between a page and a record store and works well with flash storage. This paper describes the architecture and algorithms for the Bw-tree, focusing on the main memory aspects. The paper includes results of our experiments that demonstrate that this fresh approach produces outstanding performance.
(tags: bw-trees database paper toread research algorithms microsoft sql sql-server b-trees data-structures storage cache-friendly mechanical-sympathy)
Boundary Techtalk - Large-scale OLAP with Kobayashi
Boundary on their TSD-on-Riak store.
Dietrich Featherston, Engineer at Boundary, walks through the process of designing Kobayashi, the time-series analytics database behind our network metrics. He goes through the false-starts and lessons learned in effectively using Riak as the storage layer for a large-scale OLAP database. The system is ultimately capable of answering complex, ad-hoc queries at interactive latencies.
(tags: video boundary tsd riak eventual-consistency storage kobayashi olap time-series)
-
A few days old, but already an instant Streisand-Effect classic:
Sometimes people borrow [Colin Purrington's free guide about making scientific posters] without giving him credit. This happens fairly regularly, and when he finds out about it, he sends an e-mail asking them to take it down. Usually they do. But when he sent an e-mail to the Consortium for Plant Biotechnology Research, asking that a roughly 1,200-word, near-verbatim, uncredited chunk from his guide be removed from the consortium’s materials, the response was unexpected. Rather than apologise, a lawyer sent him a cease-and-desist letter accusing him of plagiarizing the consortium’s materials and demanding that he take down his guide or face a lawsuit seeking damages up to $150,000.
(tags: streisand-effect lawsuits law infringement copyright cpbr bullying science posters)
Kafka 0.8 Producer Performance
Great benchmarking from Piotr Kozikowski at the LiveRamp team, into performance of the upcoming Kafka 0.8 release
(tags: performance kafka apache benchmarks ops queueing)
Running a Multi-Broker Apache Kafka 0.8 Cluster on a Single Node
an excellent writeup on Kafka 0.8's use and operation, including details of the new replication features
(tags: kafka replication queueing distributed ops)
-
'A €10 silver coin being offered for sale to the public in honour of James Joyce by the Central Bank tomorrow contains a misquote from the author. The line used on the coin from Chapter 3 of Ulysses includes a superfluous conjunction – a rogue ‘that’.' [..] The coin reads:
“Ineluctable modality of the visible: at least that if no more, thought through my eyes. Signatures of all things *that* I am here to read.”
(Incorrect 'that' emphasised)(tags: for:robotwisdom james-joyce typos funny fail central-bank ireland coins minting errors ulysses)
Netflix ISP Speed Index for Ireland
Via Mulley. Magnet doing well, with UPC coming second; UPC have dropped a fair bit in the past month. Would love to see it broken down by region...
(tags: upc ireland isps speed bandwidth netflix broadband magnet eircom)
Why I'm Walking Away From CouchDB
In practice there are two gotchas that are so painful I am looking for a replacement with a different featureset than couchdb provides. The location tracking project icecondor.com uses couchdb to store 20,000 new records per day. It has more write traffic than read traffic and runs on modest hardware. Those two gotchas are:
1. View Index updates. While I have a vague understanding of why view index updates are slow and bulky and important, in practice it is unworkable. Every write sets up a trap for the first reader to come along after the write. The more writes there are, the bigger the trap for the first reader which has to wait on the couchdb process that refreshes the view index on an as-needed basis. I believe this trade-off was made to keep writes fast. No need to update the view index until all writes are actually complete, right? Write traffic is heavier than read traffic and the time needed for that index refresh causes the webapp to crash because it's not set up to handle timeouts from a database query. The workaround is as hackish as one can imagine - cron jobs to hit every map/reduce query to keep indexes fresh.
2. Append-only database file. Append-only is in theory a great way to ensure on-disk reliability. A system crash during an append should only affect that append. It's a crash during an update to existing parts of the file that risks the integrity of more than what's being updated. With so many layers of caching and optimizations in the kernel and the filesystem and now in the workings of SSD drives, I'm not sure append-only gives extra protection anymore. What it does do is create a huge operational headache. The on-disk file can never grow beyond half the available storage space. Record deletion uses new disk space and if the half-full mark approaches, vacuuming must be done. The entire database is rewritten to the filesystem, leaving out no longer needed records.
If the data file should happen to grow beyond half the partition, the system has essentially crashed because there is no way to compact the file and soon the partition will be full. This is a likely scenario when there is a lot of record deletion activity. The system in question does a lot of writes of temporary data that is followed up by deletes a few days later. There is also a lot of permanent storage that hardly gets used. Rewriting every byte of the records that are long-lived due to compaction is an enormous amount of wasted I/O - doubly so given SSD drives have a short write-cycle lifespan.
(tags: nosql couchdb consistency checkpointing databases data-stores indexing)
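The "half the partition" rule in the excerpt boils down to simple arithmetic. A minimal sketch, assuming the worst case where every record is still live so the compacted copy is as large as the original; `can_compact` and its parameters are hypothetical names, not anything from CouchDB itself:

```python
GIB = 1024 ** 3


def can_compact(db_file_bytes, partition_bytes, other_used_bytes=0):
    """Worst-case check: compaction writes a fresh copy of the file, so the
    partition needs free space at least equal to the current file size."""
    free = partition_bytes - other_used_bytes - db_file_bytes
    return free >= db_file_bytes


# A 100 GiB partition holding nothing but the database file.
ok_at_40 = can_compact(40 * GIB, 100 * GIB)    # 60 GiB free, 40 GiB needed
stuck_at_60 = can_compact(60 * GIB, 100 * GIB)  # 40 GiB free, 60 GiB needed

print("40 GiB file can be compacted:", ok_at_40)
print("60 GiB file can be compacted:", stuck_at_60)
```

In practice compaction drops dead records, so the real threshold is somewhat more forgiving, but an alert at the half-full mark is the conservative choice.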
CouchDB: not drinking the kool-aid
Jonathan Ellis on some CouchDB negatives:
Here are some reasons you should think twice and do careful testing before using CouchDB in a non-toy project: Writes are serialized. Not serialized as in the isolation level, serialized as in there can only be one write active at a time. Want to spread writes across multiple disks? Sorry. CouchDB uses a MVCC model, which means that updates and deletes need to be compacted for the space to be made available to new writes. Just like PostgreSQL, only without the man-years of effort to make vacuum hurt less. CouchDB is simple. Gloriously simple. Why is that a negative? It's competing with systems (in the popular imagination, if not in its author's mind) that have been maturing for years. The reason PostgreSQL et al have those features is because people want them. And if you don't, you should at least ask a DBA with a few years of non-MySQL experience what you'll be missing. The majority of CouchDB fans don't appear to really understand what a good relational database gives them, just as a lot of PHP programmers don't get what the big deal is with namespaces. A special case of simplicity deserves mention: nontrivial queries must be created as a view with mapreduce. MapReduce is a great approach to trivially parallelizing certain classes of problem. The problem is, it's tedious and error-prone to write raw MapReduce code. This is why Google and Yahoo have both created high-level languages on top of it (Sawzall and Pig, respectively). Poor SQL; even with DSLs being the new hotness, people forget that SQL is one of the original domain-specific languages. It's a little verbose, and you might be bored with it, but it's much better than writing low-level mapreduce code.
(tags: cassandra couch nosql storage distributed databases consistency)
What is the CouchDB replication protocol? Is it like Git? - Stack Overflow
Good write up of CouchDB replication
(tags: protocols couchdb sync replication git mvcc databases merging timelines)
TouchDB's reverse-engineered write-up of the Couch replication protocol
There really isn’t a separate “protocol” per se for replication. Instead, replication uses CouchDB’s REST API and data model. It’s therefore a bit difficult to talk about replication independently of the rest of CouchDB. In this document I’ll focus on the algorithm used, and link to documentation of the APIs it invokes. The “protocol” is simply the set of those APIs operating over HTTP.
(tags: couchdb protocols touchdb nosql replication sync mvcc revisions rest)
-
A good writeup of how to detect cases of copyright infringement for photography, art and other visual media.
Von Glitschka, Modern Dog and myriad others make clear that the support of the creative community is absolutely vital in raising awareness of copyright infringements. Sites like www.youthoughtwewouldntnotice.com name and shame clear breaches of copyright, while the Modern Dog case shows that there is no better IP tracing system than the eyes and ears of the design community itself. “It’s the industry at large that has kept me aware of infringements,” states Von. “Without that I would miss most of them because I don’t go looking – they find me via the eyes of others.”
(tags: photography art visual-media copyright infringement piracy ripping)
FastBit: An Efficient Compressed Bitmap Index Technology
an [LGPL] open-source data processing library following the spirit of NoSQL movement. It offers a set of searching functions supported by compressed bitmap indexes. It treats user data in the column-oriented manner similar to well-known database management systems such as Sybase IQ, MonetDB, and Vertica. It is designed to accelerate user's data selection tasks without imposing undue requirements. In particular, the user data is NOT required to be under the control of FastBit software, which allows the user to continue to use their existing data analysis tools. The key technology underlying the FastBit software is a set of compressed bitmap indexes. In database systems, an index is a data structure to accelerate data accesses and reduce the query response time. Most of the commonly used indexes are variants of the B-tree, such as B+-tree and B*-tree. FastBit implements a set of alternative indexes called compressed bitmap indexes. Compared with B-tree variants, these indexes provide very efficient searching and retrieval operations, but are somewhat slower to update after a modification of an individual record. A key innovation in FastBit is the Word-Aligned Hybrid compression (WAH) for the bitmaps.[...] Another innovation in FastBit is the multi-level bitmap encoding methods.
(tags: fastbit nosql algorithms indexing search compressed-bitmaps indexes wah bitmaps compression)
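For anyone who hasn't met bitmap indexes before, here is a minimal sketch of the basic idea in plain Python. It is uncompressed and has nothing to do with FastBit's actual API; the column data and helper names are made up for illustration. Each distinct value gets one bitmap, and multi-condition queries become bitwise operations:

```python
from collections import defaultdict


def build_bitmap_index(column):
    """Map each distinct value to an int whose bit i is set when row i holds it."""
    index = defaultdict(int)
    for row, value in enumerate(column):
        index[value] |= 1 << row
    return index


def rows_of(bitmap):
    """Decode a bitmap back into a sorted list of row numbers."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical columns, one entry per row.
colour = ["red", "blue", "red", "green", "blue", "red"]
size = ["S", "M", "L", "S", "M", "S"]

colour_idx = build_bitmap_index(colour)
size_idx = build_bitmap_index(size)

# WHERE colour = 'red' AND size = 'S'  ->  bitwise AND of two bitmaps
red_and_small = colour_idx["red"] & size_idx["S"]
# WHERE colour = 'blue' OR colour = 'green'  ->  bitwise OR
blue_or_green = colour_idx["blue"] | colour_idx["green"]

print(rows_of(red_and_small))
print(rows_of(blue_or_green))
```

The update cost mentioned in the excerpt shows up here too: changing one row's value means touching two bitmaps, which gets more expensive once the bitmaps are compressed.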
-
The bit array data structure is implemented in Java as the BitSet class. Unfortunately, this fails to scale without compression. JavaEWAH is a word-aligned compressed variant of the Java bitset class. It uses a 64-bit run-length encoding (RLE) compression scheme. We trade-off some compression for better processing speed. We also have a 32-bit version which compresses better, but is not as fast. In general, the goal of word-aligned compression is not to achieve the best compression, but rather to improve query processing time. Hence, we try to save CPU cycles, maybe at the expense of storage. However, the EWAH scheme we implemented is always more efficient storage-wise than an uncompressed bitmap (as implemented in the BitSet class). Unlike some alternatives, javaewah does not rely on a patented scheme.
(tags: javaewah wah rle compression bitmaps bitmap-indexes bitset algorithms data-structures)
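A rough sketch of what word-aligned run-length encoding means, in Python. This is a deliberately simplified token format, not the real EWAH or WAH layout: runs of all-zero or all-one 64-bit words collapse into a single "fill" token, and anything else is stored verbatim as a "literal".

```python
WORD_BITS = 64
ALL_ONES = (1 << WORD_BITS) - 1


def compress(words):
    """Collapse runs of all-0 / all-1 words into ('fill', bit, count) tokens."""
    out = []
    for w in words:
        if w in (0, ALL_ONES):
            bit = 1 if w else 0
            if out and out[-1][0] == "fill" and out[-1][1] == bit:
                out[-1] = ("fill", bit, out[-1][2] + 1)  # extend current run
            else:
                out.append(("fill", bit, 1))
        else:
            out.append(("literal", w))  # mixed word: keep as-is
    return out


def decompress(tokens):
    """Expand tokens back into the original list of 64-bit words."""
    words = []
    for tok in tokens:
        if tok[0] == "fill":
            words.extend([ALL_ONES if tok[1] else 0] * tok[2])
        else:
            words.append(tok[1])
    return words


# A sparse bitmap: 1504 words, almost all empty.
sparse = [0] * 1000 + [0b1011] + [ALL_ONES] * 3 + [0] * 500
packed = compress(sparse)

print(len(sparse), "words ->", len(packed), "tokens")
```

Because everything stays aligned to whole words, ANDing or ORing two compressed bitmaps can skip over fills without unpacking them, which is the CPU-over-storage trade-off the excerpt describes.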
Measure Anything, Measure Everything « Code as Craft
the classic Etsy pro-metrics "measure everything" post. Some good basic rules and mindset
Testing Your Automation [slides]
Test-driven infrastructure, using Chef -- slides from Big Ruby 2013. Tools used: foodcritic (lol), Chefspec, minitest-chef-handler, fauxhai, cucumber chef. This is really good to see -- TDD applied to ops. Video at: http://confreaks.com/videos/2309-bigruby2013-testing-your-automation-ttd-for-chef-cookbooks
(tags: devops ops chef automation testing tdd infrastructure provisioning deployment)
Meet the nice-guy lawyers who want $1,000 per worker for using scanners | Ars Technica
Great investigative journalism, interviewing the legal team behind the current big patent-troll shakedown: the one targeting scanning documents with a button press, using a scanner attached to a network. They express whole-hearted belief in the legality of their actions, unsurprisingly -- they're exactly what you think they'd be like (via Nelson)
(tags: via:nelson ethics business legal patents swpats patent-trolls texas shakedown)
[#HADOOP-9448] Reimplement things - ASF JIRA
Pretty good April Fools from this year -- a patch to delete the entirety of Hadoop's codebase:
To avoid any bias to the existing code and make the same mistakes we should just delete trunk completely. Attached it is a script that deletes everything.
(tags: hadoop april-fools asf patches open-source oss)
Lucas Nussbaum’s Blog » Blog Archive » RVM: seriously?
+1. RVM is atrocious code -- some of the worst bash script I've seen. And it's not just installing as a command, it requires that it be sourced and hooks into your login shell. If you then use "set -e", it crashes; "set -u", it crashes; reset $HOME, crash. It's dire.
-
Next April 11th, at the IIEA in North Gt Georges St:
Rick Falkvinge, founder of the Swedish Pirate Party, will examine the case for reform of copyright and patent law in the EU. Legalised file sharing, free sampling and shortened copyright protection times are the main elements of a proposal co-authored by Mr. Falkvinge which was submitted to the European Parliament in 2012. He will question whether, in the context of ever-increasing online activity, existing legal frameworks pose a threat to users’ civil liberties.
(tags: rick-falkvinge pirate-party ireland iiea dublin copyright patents filesharing)
High Performance MongoDB Clusters with Amazon EBS Provisioned IOPS
yeah yeah, Mongo. bookmarking for the good data on EBS+PIOPS
(tags: ebs piops aws performance tips ops ec2 mongodb presentations)
-
These notes are intended to help users and system administrators maximize TCP/IP performance on their computer systems. They summarize all of the end-system (computer system) network tuning issues including a tutorial on TCP tuning, easy configuration checks for non-experts, and a repository of operating system specific instructions for getting the best possible network performance on these platforms.
Some tips for maximizing HPC network performance for the intra-DC case; recommended by the LinkedIn Kafka operations page.(tags: tuning network tcp sysadmin performance ops kafka ec2)
Increasing EBS Performance - Amazon Elastic Compute Cloud
good docs from EC2
(tags: ec2 ebs performance piops docs)
-
an open source virtualized Ethernet networking stack. I am developing Snabb Switch in response to several exciting trends: x86 has risen to be a powerful networking platform. Virtualization and SDN are pulling more networking into servers. Optimized user-space software is out-performing kernel-space software. Snabb Switch's simple and fast software-only data plane makes developing networking software easier than ever before.
Written in LuaJIT but aiming to be very fast. cool stuff, worth watching(tags: sdn software networking emulation snabb-switch luajit lua virtualization)
Abusing hash kernels for wildly unprincipled machine learning
what, is this the first time our spam filtering approach of hashing a giant feature space is hitting mainstream machine learning? that can't be right!
(tags: ai machine-learning python data hashing features feature-selection anti-spam spamassassin)
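For reference, the general "hashing trick" looks something like the sketch below. This is a generic illustration, not SpamAssassin's code nor the code from the linked post; the bucket count and function name are arbitrary. Every token is hashed straight into a fixed-size vector, so no vocabulary dictionary is ever built:

```python
import hashlib


def hashed_features(tokens, n_buckets=16):
    """Project an arbitrary bag of tokens into a fixed-length vector."""
    vec = [0.0] * n_buckets
    for tok in tokens:
        digest = hashlib.md5(tok.encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % n_buckets
        # A second, independent byte picks the sign, so that colliding
        # tokens tend to cancel out instead of always piling up.
        sign = 1.0 if digest[8] & 1 else -1.0
        vec[bucket] += sign
    return vec


vec = hashed_features("buy cheap pills now buy now".split())
print(vec)
```

Collisions are simply tolerated: with enough buckets they act as a bit of noise, and the memory footprint stays constant no matter how many distinct tokens turn up.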
-
Joel On Software weighs in (via Tony Finch):
The fastest growing industry in the US right now, even during this time of slow economic growth, is probably the patent troll protection racket industry.
(tags: joel-on-software patents swpats shakedown extortion us-politics patent-trolls via:fanf)
-
Cap’n Proto is an insanely fast data interchange format and capability-based RPC system. Think JSON, except binary. Or think Protocol Buffers, except faster. In fact, in benchmarks, Cap’n Proto is INFINITY TIMES faster than Protocol Buffers.
Basically, marshalling like writing an aligned C struct to the wire, QNX messaging protocol-style. Wasteful on space, but responds to this by suggesting compression (which is a fair point tbh). C++-only for now. I'm not seeing the same kind of support for optional data that protobufs has though. Overall I'm worried there are some useful features being omitted here...
(tags: serialization formats protobufs capn-proto protocols coding c++ rpc qnx messaging compression compatibility interoperability i14y)
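To illustrate the "aligned C struct on the wire" style in general terms, here is a minimal sketch using Python's `struct` module. The record layout is hypothetical and this is NOT Cap'n Proto's actual wire format; it only shows why fixed-layout encoding is fast (no per-field parsing) and why it wastes space (small values still occupy full-width slots):

```python
import struct

# Hypothetical record: uint32 id, uint32 flags, float64 score.
# "<" means little-endian with no implicit padding; the fields are ordered
# so each one falls on its natural alignment anyway (offsets 0, 4, 8).
RECORD = struct.Struct("<IId")


def encode(rec_id, flags, score):
    """Write the fields at fixed offsets, exactly as they'd sit in memory."""
    return RECORD.pack(rec_id, flags, score)


def decode(buf):
    """Read the fields back from their fixed offsets."""
    return RECORD.unpack(buf)


wire = encode(42, 0b101, 3.5)
print(len(wire), "bytes:", wire.hex())
print(decode(wire))
```

The id 42 would fit in one byte under a varint scheme like protobufs', but takes four here; the resulting runs of zero bytes are exactly what a follow-up compression pass would squeeze out.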
CRDTs - Commutative Replicated Data Types [pdf]
Shared read-only data is easy to scale by using well-understood replication techniques. However, sharing mutable data at a large scale is a difficult problem, because of the CAP impossibility result [5]. Two approaches dominate in practice. One ensures scalability by giving up consistency guarantees, for instance using the Last-Writer-Wins (LWW) approach [7]. The alternative guarantees consistency by serialising all updates, which does not scale beyond a small cluster [12]. Optimistic replication allows replicas to diverge, eventually resolving conflicts either by LWW-like methods or by serialisation [11]. In some (limited) cases, a radical simplification is possible. If concurrent updates to some datum commute, and all of its replicas execute all updates in causal order, then the replicas converge. We call this a Commutative Replicated Data Type (CRDT). The CRDT approach ensures that there are no conflicts, hence, no need for consensus-based concurrency control. CRDTs are not a universal solution, but, perhaps surprisingly, we were able to design highly useful CRDTs. This new research direction is promising as it ensures consistency in the large scale at a low cost, at least for some applications.
(tags: consistency algorithms concurrency crdts distcomp data)
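The commutativity claim in the abstract is easy to see with a toy example. A minimal sketch (the class and variable names are illustrative, not from the paper): three concurrent counter updates are delivered to replicas in every possible order, and compared against a plain assignment register whose updates do not commute.

```python
from itertools import permutations


class Counter:
    """Increments and decrements commute, so delivery order is irrelevant."""

    def __init__(self):
        self.value = 0

    def apply(self, op):
        kind, amount = op
        self.value += amount if kind == "inc" else -amount


class Register:
    """Plain assignment does not commute: the last update delivered wins."""

    def __init__(self):
        self.value = None

    def apply(self, op):
        self.value = op[1]


def final_states(replica_cls, ops):
    """Deliver `ops` in every possible order; collect the resulting states."""
    states = set()
    for order in permutations(ops):
        replica = replica_cls()
        for op in order:
            replica.apply(op)
        states.add(replica.value)
    return states


counter_finals = final_states(Counter, [("inc", 5), ("dec", 2), ("inc", 10)])
register_finals = final_states(Register, [("set", 1), ("set", 2)])

print("counter replicas end at:", counter_finals)
print("register replicas end at:", register_finals)
```

The counter replicas all agree without any coordination, while the register replicas diverge and would need LWW timestamps or serialisation to reconcile, which is the distinction the abstract is drawing.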
-
'The CRDT toolbox provides a collection of basic Conflict-free replicated data types as well as a common interface for defining your own CRDTs'. - in Eric Moritz' github. Also includes some more links to CRDT background reading.
(tags: crdt github eric-moritz python algorithms)
Eventually-Consistent Data Structures [slides]
implementing CRDTs in Riak and Voldemort
(tags: crdt algorithms distcomp riak voldemort distributed)
-
What do you get if you take one accountant with "a fondness for spreadsheets, finance and business" and mix with "a life-long passion for video games"? Well it's obvious isn't it? A turn-based RPG made and played entirely in Microsoft Excel.
(via Paul Moloney)(tags: via:oceanclub arena.xlsm excel spreadsheets games gaming rpg)
serverspec - unit tests for servers
With serverspec, you can write RSpec tests for checking your servers are provisioned correctly. Serverspec tests your servers' actual state through SSH access, so you don't need to install any agent softwares on your servers and can use any provisioning tools, Puppet, Chef, CFEngine and so on.
(via Dave Doran)(tags: via:dave-doran puppet testing chef cfengine unit-testing ops provisioning serverspec rspec ruby)
joshua's blog: overclocking the lecture
Joshua's old tip on watching videos at 2x speed using Perian
(tags: quicktime video hacks mac speed lectures presentations learning)
-
This seems pretty significant. Is the tide turning in the Texas Eastern District against patent trolls, at last? And does it establish sufficient precedent?
A federal judge has thrown out a patent claim against Rackspace, ruling that mathematical algorithms can’t be patented. The ruling in the Eastern District stemmed from a 2012 complaint filed by Uniloc USA asserting that processing of floating point numbers by the Linux operating system was a patent violation. Chief Judge Leonard Davis based the ruling on U.S. Supreme Court case law that prohibits the patenting of mathematical algorithms. According to Rackspace, this is the first reported instance in which the Eastern District of Texas has granted an early motion to dismiss finding a patent invalid because it claimed unpatentable subject matter. Red Hat, which supplies Linux to Rackspace, provided Rackspace’s defense. Red Hat has a policy of standing behind customers through its Open Source Assurance program.
See https://news.ycombinator.com/item?id=5455869 for more discussion.
(tags: east-texas patents swpats maths patenting law judges rackspace linux red-hat uniloc-usa floating-point)
Introducing Chronos: A Replacement for Cron
A distributed, fault-tolerant "cron" is something which comes up frequently -- it makes for a great fault-tolerance building block. This one sounds like it's too closely tied into Mesos, though (IMO).
Chronos is our replacement for cron. It is a distributed and fault-tolerant scheduler which runs on top of Mesos. It's a framework and supports custom mesos executors as well as the default command executor. Thus by default, Chronos executes SH (on most systems BASH) scripts. Chronos can be used to interact with systems such as Hadoop (incl. EMR), even if the mesos slaves on which execution happens do not have Hadoop installed. Included wrapper scripts allow transferring files and executing them on a remote machine in the background and using asynchronous callbacks to notify Chronos of job completion or failures.
(tags: cron scheduling mesos stacks design airbnb chronos fault-tolerance distcomp distributed-computing scripts jobs)
One of CloudFlare's upstream providers on the "death of the internet" scare-mongering
Having a bad day on the Internet is nothing new. These are the types of events we deal with on a regular basis, and most large network operators are very good at responding quickly to deal with situations like this. In our case, we worked with Cloudflare to quickly identify the attack profile, rolled out global filters on our network to limit the attack traffic without adversely impacting legitimate users, and worked with our other partner networks (like NTT) to do the same. If the attacks had stopped here, nobody in the "mainstream media" would have noticed, and it would have been just another fun day for a few geeks on the Internet. The next part is where things got interesting, and is the part that nobody outside of extremely technical circles has actually bothered to try and understand yet. After attacking Cloudflare and their upstream Internet providers directly stopped having the desired effect, the attackers turned to any other interconnection point they could find, and stumbled upon Internet Exchange Points like LINX (in London), AMS-IX (in Amsterdam), and DE-CIX (in Frankfurt), three of the largest IXPs in the world. An IXP is an "interconnection fabric", or essentially just a large switched LAN, which acts as a common meeting point for different networks to connect and exchange traffic with each other. One downside to the way this architecture works is that there is a single big IP block used at each of these IXPs, where every network who interconnects is given 1 IP address, and this IP block CAN be globally routable. When the attackers stumbled upon this, probably by accident, it resulted in a lot of bogus traffic being injected into the IXP fabrics in an unusual way, until the IXP operators were able to work with everyone to make certain the IXP IP blocks weren't being globally re-advertised. Note that the vast majority of global Internet traffic does NOT travel over IXPs, but rather goes via direct private interconnections between specific networks.
The IXP traffic represents more of the "long tail" of Internet traffic exchange, a larger number of smaller networks, which collectively still adds up to be a pretty big chunk of traffic. So, what you actually saw in this attack was a larger number of smaller networks being affected by something which was a completely unrelated and unintended side-effect of the actual attacks, and thus *poof* you have the recipe for a lot of people talking about it. :) Hopefully that clears up a bit of the situation.
(tags: bandwidth internet gizmodo traffic cloudflare ddos hacking)
21 graphs that show America’s health-care prices are ludicrous
Excellent data, this. I'd heard a few of these prices, but these graphs really hit home. $26k for a caesarean section at the 95th percentile!? Talk about out-of-control price gouging.
(tags: healthcare costs economics us-politics world comparison graphs charts data via:hn america)
Design for developers [presentation]
A nice set of practical web/UI/typography design guidelines, naming specific sources (via Rob C)
-
'13 Security Gotchas You Should Know About'
Film4 Presents A Season Of Studio Ghibli Classics
hooray! Plenty of dubs, too, which is handy when you have little kids like mine ;)
(tags: studio-ghibli film4 movies anime animation to-watch tv)
The first pillar of agile sysadmin: We alert on what we draw
'One of [the] purposes of monitoring systems was to provide data to allow us, as engineers, to detect patterns, and predict issues before they become production impacting. In order to do this, we need to be capturing data and storing it somewhere which allows us to analyse it. If we care about it - if the data could provide the kind of engineering insight which helps us to understand our systems and give early warning - we should be capturing it. ' .... 'There are a couple of weaknesses in [Nagios' design]. Assuming we’ve agreed that if we care about a metric enough to want to alert on it then we should be gathering that data for analysis, and graphing it, then we already have the data upon which to base our check. Furthermore, this data is not on the machine we’re monitoring, so our checks don’t in any way add further stress to that machine.' I would add that if we are alerting on a different set of data from what we collect for graphing, then using the graphs to investigate an alarm may run into problems if they don't sync up.
(tags: devops monitoring deployment production sysadmin ops alerting metrics)
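To make the "alert on what we draw" idea concrete, here is a small Python sketch in which the alert check reads the same stored series that the graph would be drawn from, rather than probing the machine again. The function name `breaches`, the `(timestamp, value)` layout and the sample numbers are assumptions for illustration only; the article does not prescribe any particular rule.

```python
def breaches(series, threshold, window):
    """Return True if the most recent `window` samples all exceed `threshold`.

    `series` is a list of (timestamp_seconds, value) pairs, oldest first,
    as it might be read back from whatever store feeds the graphs.
    """
    if window <= 0 or len(series) < window:
        return False
    recent = series[-window:]
    return all(value > threshold for _, value in recent)


# Made-up load-average samples, one per minute.
load_series = [(0, 0.7), (60, 0.9), (120, 4.2), (180, 4.8), (240, 5.1)]

# Alert only when the last three graphed points are all above 4.0.
alert = breaches(load_series, threshold=4.0, window=3)
print(alert)
```

Since the check and the graph share one data source, whoever investigates the alarm sees exactly the points that triggered it.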
JPL Institutional Coding Standard for the Java Programming Language
From JPL's Laboratory for Reliable Software (LaRS). Great reference; there are some really useful recommendations here, and good explanations of familiar ones like "prefer composition over inheritance". Many are supported by FindBugs, too. Here's the full list:
compile with checks turned on; apply static analysis; document public elements; write unit tests; use the standard naming conventions; do not override field or class names; make imports explicit; do not have cyclic package and class dependencies; obey the contract for equals(); define both equals() and hashCode(); define equals when adding fields; define equals with parameter type Object; do not use finalizers; do not implement the Cloneable interface; do not call nonfinal methods in constructors; select composition over inheritance; make fields private; do not use static mutable fields; declare immutable fields final; initialize fields before use; use assertions; use annotations; restrict method overloading; do not assign to parameters; do not return null arrays or collections; do not call System.exit; have one concept per line; use braces in control structures; do not have empty blocks; use breaks in switch statements; end switch statements with default; terminate if-else-if with else; restrict side effects in expressions; use named constants for non-trivial literals; make operator precedence explicit; do not use reference equality; use only short-circuit logic operators; do not use octal values; do not use floating point equality; use one result type in conditional expressions; do not use string concatenation operator in loops; do not drop exceptions; do not abruptly exit a finally block; use generics; use interfaces as types when available; use primitive types; do not remove literals from collections; restrict numeric conversions; program against data races; program against deadlocks; do not rely on the scheduler for synchronization; wait and notify safely; reduce code complexity
(tags: nasa java reference guidelines coding-standards jpl reliability software coding oo concurrency findbugs bugs)
KDE's brush with git repository corruption: post-mortem
a barely-averted disaster... phew.
while we planned for the case of the server losing a disk or entirely biting the dust, or the total loss of the VM’s filesystem, we didn’t plan for the case of filesystem corruption, and the way the corruption affected our mirroring system triggered some very unforeseen and pathological conditions. [...] the corruption was perfectly mirrored... or rather, due to its nature, imperfectly mirrored. And all data on the anongit [mirrors] was lost.
One risk demonstrated: by trusting in mirroring, rather than a schedule of snapshot backups covering a wide time range, they nearly had a major outage. Silent data corruption, and code bugs, happen -- backups protect against this, but RAID, replication, and mirrors do not. Another risk: they didn't have a rate limit on project-deletion, which resulted in the "anongit" mirrors deleting their (safe) data copies in response to the upstream corruption. Rate limiting to sanity-check automated changes is vital. What they should have had in place was described by the fix: 'If a new projects file is generated and is more than 1% different than the previous file, the previous file is kept intact (at 1500 repositories, that means 15 repositories would have to be created or deleted in the span of three minutes, which is extremely unlikely).'
(tags: rate-limiting case-studies post-mortems kde git data-corruption risks mirroring replication raid bugs backups snapshots sanity-checks automation ops)
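The 1% sanity check quoted in the KDE fix can be sketched in a few lines of Python. This is only an approximation under stated assumptions: the projects file is modelled as a set of repository names, "different" is taken to mean repositories added or removed, and the function name `should_replace` is made up for the example.

```python
def should_replace(previous, new, max_change=0.01):
    """Decide whether a newly generated project list may replace the previous one.

    `previous` and `new` are sets of repository names. The new list is
    rejected when the number of added or removed repositories exceeds
    `max_change` as a fraction of the previous list's size.
    """
    if not previous:
        # Nothing to protect yet, so accept the first list.
        return True
    changed = len(previous ^ new)  # symmetric difference: added + removed
    return changed / len(previous) <= max_change


previous = {f"repo{i}" for i in range(1500)}

# A handful of deletions is plausible and goes through.
small_change = previous - {f"repo{i}" for i in range(10)}
print(should_replace(previous, small_change))

# A corrupted, empty list would wipe every mirror and is refused.
print(should_replace(previous, set()))
```

When the check refuses the new list, the caller keeps serving the previous file, which is the behaviour the post-mortem describes.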
-
Metrics rule the roost -- I guess there's been a long history of telemetry in space applications.
To make software more visible, you need to know what it is doing, he said, which means creating "metrics on everything you can think of".... Those metrics should cover areas like performance, network utilization, CPU load, and so on. The metrics gathered, whether from testing or real-world use, should be stored as it is "incredibly valuable" to be able to go back through them, he said. For his systems, telemetry data is stored with the program metrics, as is the version of all of the code running so that everything can be reproduced if needed. SpaceX has programs to parse the metrics data and raise an alarm when "something goes bad". It is important to automate that, Rose said, because forcing a human to do it "would suck". The same programs run on the data whether it is generated from a developer's test, from a run on the spacecraft, or from a mission. Any failures should be seen as an opportunity to add new metrics. It takes a while to "get into the rhythm" of doing so, but it is "very useful". He likes to "geek out on error reporting", using tools like libSegFault and ftrace. Automation is important, and continuous integration is "very valuable", Rose said. He suggested building for every platform all of the time, even for "things you don't use any more". SpaceX does that and has found interesting problems when building unused code. Unit tests are run from the continuous integration system any time the code changes. "Everyone here has 100% unit test coverage", he joked, but running whatever tests are available, and creating new ones is useful. When he worked on video games, they had a test to just "warp" the character to random locations in a level and had it look in the four directions, which regularly found problems. "Automate process processes", he said. Things like coding standards, static analysis, spaces vs. tabs, or detecting the use of Emacs should be done automatically. 
SpaceX has a complicated process where changes cannot be made without tickets, code review, signoffs, and so forth, but all of that is checked automatically. If static analysis is part of the workflow, make it such that the code will not build unless it passes that analysis step. When the build fails, it should "fail loudly" with a "monitor that starts flashing red" and email to everyone on the team. When that happens, you should "respond immediately" to fix the problem. In his team, they have a full-size Justin Bieber cutout that gets placed facing the team member who broke the build. They found that "100% of software engineers don't like Justin Bieber", and will work quickly to fix the build problem.
(tags: spacex dev coding metrics deplyment production space justin-bieber)
-
'the story of ketchup is a story of globalization and centuries of economic domination by a world superpower. But the superpower isn't America, and the century isn't ours. Ketchup's origins in the fermented sauces of China and Southeast Asia mean that those little plastic packets under the seat of your car are a direct result of Chinese and Asian domination of a single global world economy for most of the last millennium.'
(tags: ketchup china nam-pla food etymology condiments history trade)
-
now this is a neat trick -- having been stuck having to flip to spares and do other antics while a long-running heap dump took place, this is a winner.
Dumping a JVM’s heap is an extremely useful tool for debugging problems with a J2EE application. Unfortunately, when a JVM explodes, using the standard jmap tool can take an inordinate amount of time to execute for lots of different reasons. This leads to extended downtime when a heap dump is attempted and even then, jmap regularly fails. This blog post is intended to outline an alternate method using [gdb] to achieve a heap dump that only requires mere seconds of additional downtime allowing the slow jmap process to happen once the application is back in service.
(tags: heap-dump gdb heap jvm java via:peakscale gcore core core-dump debugging)
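The two-phase approach described above (a quick core dump while the JVM is paused, then a slow heap extraction once the application is back in service) might look roughly like the following Python sketch. It only builds the command lines and does not run them. It assumes gdb's `gcore` wrapper script with its `-o` prefix option and an older JDK whose `jmap` accepts an executable plus a core file; the paths and the helper name `heap_dump_commands` are hypothetical, and the blog post's exact commands may differ.

```python
def heap_dump_commands(pid, java_binary="/usr/bin/java", workdir="/tmp"):
    """Return (fast_step, slow_step) argument lists for a core-based heap dump.

    fast_step: snapshot the process memory with gcore (the JVM is paused only for this).
    slow_step: extract an hprof heap dump from the core file afterwards.
    """
    core_prefix = f"{workdir}/jvm-core"
    core_file = f"{core_prefix}.{pid}"  # gcore appends the pid to the prefix
    heap_file = f"{workdir}/jvm-{pid}.hprof"
    fast_step = ["gcore", "-o", core_prefix, str(pid)]
    slow_step = ["jmap", f"-dump:format=b,file={heap_file}", java_binary, core_file]
    return fast_step, slow_step


fast_step, slow_step = heap_dump_commands(12345)
print(" ".join(fast_step))
print(" ".join(slow_step))
```

The point of the split is that only the first command needs the broken JVM to stay around; the second can run at leisure against the core file.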
-
'Edition has a ‘design for life’ philosophy - we think that unique designer-made items can be a part of our everyday lives without costing the earth. We stock affordable, contemporary and functional products (mostly handmade), including jewellery, home-ware, accessories, art and toys. Every item has been carefully selected and are all designed here in Ireland.'
BBC Test Card image (1080p HD version)
via colinwh. The de-facto standard HTPC desktop background
(tags: htpc desktops hd 1080p bbc test-card tv scary-clowns)
-
Neil Fraser visits a school in Vietnam, and investigates their computer science curriculum. They are doing an incredible job, it looks like -- very impressive!
(tags: vietnam programming education cs computer-science schools coding children)
TOSEC: Commodore C64 (2012-04-23) : Free Download & Streaming : Internet Archive
A massive, 6.5GB collection of C64 history.
There are an astounding 134,000+ disk, cassette and documentation items in this Commodore 64 collection, including games, demos, cractros, and compilations.
(tags: commodore c64 history computing software demos archive)
By the numbers: How Google Compute Engine stacks up to Amazon EC2
Scalr's thoughts on Google's EC2 competitor.
with Google Compute Engine, AWS has a formidable new competitor in the public cloud space, and we’ll likely be moving some of Scalr’s production workloads from our hybrid aws-rackspace-softlayer setup to it when it leaves beta. There’s a strong technical case for migrating heavy workloads to GCE, and I’ll be grabbing popcorn to eagerly watch as the battle unfolds between the giants.
-
realtime collaboration API. nifty! but can it collaborate on a per-app shared doc, or does it require that the app user auth to Google and access their own docs?
(tags: collaboration api realtime google javascript)
Percona Playback's tcpdump plugin
Capture MySQL traffic via tcpdump, tee it over the network to replay against a second database. Even supports query execution times and pauses between queries to play back the same load level.
(tags: tcpdump production load-testing testing staging tee networking netcat percona replay mysql)
Riak CS is now ASL2 open source
'Organizations and users can now access the source code on Github and download the latest packages from the downloads page. Also, today, we announced that Riak CS Enterprise is now available as commercial licensed software, featuring multi-datacenter replication technology and 24×7 Basho customer support.'
(tags: riak riak-cs nosql storage basho open-source github apache asl2)
Hadoop Operations at LinkedIn [slides]
another good Hadoop-at-scale presentation, from LI this time
Sift Science says it can sniff out cyber fraud — before it gets expensive
Great idea for a startup. This stuff is complex, right in the heart of every company's ordering pipeline, and I can see a lot of customers for this
(tags: sift-science anti-fraud fraud b2b b2c ecommerce startups aws)
What would you do: Part 2, the Island of Surpyc
Amazing. 'Cyprus Bailout Choose Your Own Adventure', basically
(tags: cyoa adventure dice games cyprus politics eu bailouts ecb banking troika)
Running the Largest Hadoop DFS Cluster
Facebook's 1PB Hadoop cluster. features improved NameNode availability work and 4 levels of data aging, with reduced replication and Reed-Solomon RAID encoding for colder data ages
(tags: aging data facebook hadoop hdfs reed-solomon error-correction replication erasure-coding)
The America Invents Act: Fighting Patent Trolls With "Prior Art"
Don Marti makes some suggestions regarding the America Invents Act: record your work's timeline; use the new Post-Grant Challenging process; and use the new "prior user" defence, which lets you rely on your own non-public uses.
many of the best practices for tracking new versions of software and other digital assets can also help protect you against patent trolls. It’s a good time to talk to your lawyer about a defensive strategy, and to connect that strategy to your version control and deployment systems to make sure you’re collecting and retaining all of the information that could help you under this new law.
(tags: swpats patent-trolls patenting us prior-art)
Announcing the Voldemort 1.3 Open Source Release
new release from LinkedIn -- better p90/p99 PUT performance, improvements to the BDB-JE storage layer, massively-improved rebalance performance
(tags: voldemort linkedin open-source bdb nosql)
Data Corruption To Go: The Perils Of sql_mode = NULL « Code as Craft
bloody hell. A load of cases where MySQL will happily accommodate all sorts of malformed and invalid input -- thankfully with fixes
(tags: mysql input corrupt invalid validation coding databases sql)
-
a high-performance C server which is used to expose bloom filters and operations over them to networked clients. It uses a simple ASCII protocol which is human readable, and similar to memcached.
(via Tony Finch)(tags: via:fanf memcached bloomd open-source bloom-filters)
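For readers who have not met the data structure that bloomd serves over the network, here is a minimal in-process Bloom filter in Python. It is a generic sketch and not bloomd's implementation or protocol; the class name, the default sizes and the SHA-256 double-hashing scheme are choices made for this example.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: no false negatives, occasional false positives."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key):
        # Derive several bit positions from one digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # keep the step odd
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def __contains__(self, key):
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(key))


seen = BloomFilter()
seen.add("example.com")
print("example.com" in seen)       # an added key is always reported present
print("anything" in BloomFilter())  # an empty filter reports nothing present
```

A membership test that answers "no" is definitive, while a "yes" may be a false positive, which is the trade-off that makes the structure so compact.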
Thoughts on configuration file complexity
some interesting thoughts on the old "Turing complete configuration language" question
(tags: configuration turing-complete programming ops testing)
From a monolithic Ruby on Rails app to the JVM
How Soundcloud have ditched the monolithic Rails for nimbler, small-scale distributed polyglot services running on the JVM
(tags: soundcloud rails slides jvm scalability ruby scala clojure coding)
Opinion: The Internet is a surveillance state
Bruce Schneier op-ed on CNN.com.
So, we're done. Welcome to a world where Google knows exactly what sort of porn you all like, and more about your interests than your spouse does. Welcome to a world where your cell phone company knows exactly where you are all the time. Welcome to the end of private conversations, because increasingly your conversations are conducted by e-mail, text, or social networking sites. And welcome to a world where all of this, and everything else that you do or is done on a computer, is saved, correlated, studied, passed around from company to company without your knowledge or consent; and where the government accesses it at will without a warrant. Welcome to an Internet without privacy, and we've ended up here with hardly a fight.
(tags: freedom surveillance legal privacy internet bruce-schneier web google facebook)
Single Producer/Consumer lock free Queue step by step
great dissection of Martin "Disruptor" Thompson's lock-free single-producer/single-consumer queue data structure, with benchmark results showing crazy speedups. This is particularly useful since it's a data structure that can be used to provide good lock-free speedups without adopting the entire Disruptor design pattern.
(tags: disruptor coding java jvm martin-thompson lock-free volatile atomic queue data-structures)
Roko's basilisk - RationalWiki
Wacky transhumanists.
Roko's basilisk is notable for being completely banned from discussion on LessWrong, where any mention of it is deleted. Eliezer Yudkowsky, founder of LessWrong, considers the basilisk would not work, but will not explain why because he does not consider open discussion of the notion of acausal trade with possible superintelligences to be provably safe. Silly over-extrapolations of local memes are posted to LessWrong quite a lot; almost all are just downvoted and ignored. But this one, Yudkowsky reacted to hugely, then doubled-down on his reaction. Thanks to the Streisand effect, discussion of the basilisk and the details of the affair soon spread outside of LessWrong. The entire affair is a worked example of spectacular failure at community management and at controlling purportedly dangerous information. Some people familiar with the LessWrong memeplex have suffered serious psychological distress after contemplating basilisk-like ideas — even when they're fairly sure intellectually that it's a silly problem.[5] The notion is taken sufficiently seriously by some LessWrong posters that they try to work out how to erase evidence of themselves so a future AI can't reconstruct a copy of them to torture.[6]
(tags: transhumanism funny insane stupid singularity ai rokos-basilisk via:maciej lesswrong rationalism superintelligences striesand-effect absurd)
How the America Invents Act Will Change Patenting Forever
Bet you didn't think the US software patents situation could get worse? wrong!
“Now it’s really important to be the first to file, and it’s really important to file before somebody else puts a product out, or puts the invention in their product,” says Barr, adding that it will “create a new urgency on the part of everyone to file faster -- and that’s going to be a problem for the small inventor.”
(tags: first-to-file omnishambles uspto swpats patents software-patents law legal)
Distributed Systems Tracing with Zipkin
Twitter's version of the "canary"/"tracer" request concept
(tags: twitter zipkin tracing tracer-requests canary-requests http debugging production live distributed-systems distcomp stack infrastructure ops)
Transitioning from Google Reader to feedly
[...] expecting for some time: We have been working on a project called Normandy which is a feedly clone of the Google Reader API – running on Google App Engine. When Google Reader shuts down, feedly will seamlessly transition to the Normandy back end.
Excellent stuff -- I've just tried feedly and it's looking good -- in fact it may be a better UI overall anyway.
(tags: feedly google-reader transition rss atom feeds web)
Double vision: seeing both sides of Syria’s war
A skirmish is filmed, using HD video cameras, by both sides. Storyful pinpoint the location. War as panopticon
(tags: storyful war syria future tanks battle video youtube hd panopticon)
Using DiffMerge as your Git visual merge and diff tool
A decent 3-way-diff GUI merge tool which works with git on OSX. "git config" command-lines included in this blog post
(tags: git merge osx mac macosx diff mergetool merging cli diffmerge)
-
A bunch of magic command lines to set useful OS X prefs without pointy-clicky. at least some also seem to work on Mountain Lion
-
'bootstrap an OSX development machine with a one-liner'.
Many teams use chef to manage their production machines, but developers often build their development boxes by hand. SoloWizard makes it painless to create a configurable chef solo script to get your development machine humming: mysql, sublime text, .bash_profile tweaks to OS-X settings - it's all there!
(tags: osx chef mac build-out ops macosx deployment developers desktops laptops mysql rabbitmq activemq nginx)
-
'Our results suggest that the Cablevision decision, [which was widely seen as easing certain ambiguities surrounding intellectual property], led to additional incremental investment in U.S. cloud computing firms that ranged from $728 million to approximately $1.3 billion over the two-and-a-half years after the decision. When paired with the findings of the enhanced effects of VC investment relative to corporate investment, this may be the equivalent of $2 to $5 billion in traditional R&D investment.' via Fred Logue.
(tags: via:fplogue law ip copyright policy cablevision funding vc cloud-computing investment legal buffering)
A History Of Ireland In 100 Objects
Now free!
The Royal Irish Academy, the National Museum of Ireland, and The Irish Times are collaborating with the EU Presidency, the Department of Foreign Affairs and Trade and Adobe to bring you a gift of A History of Ireland in 100 objects ‘from the people of Ireland to the people of the world’ for St Patrick’s Day. It is available as an interactive app for Apple iPhone and iPad, for most Android tablets and on the Kindle Fire, from our website, as well as associated app stores. You can also experience the book on your computer, smartphone or eReader by clicking on the 'eBook' button below. The gift is free to download until the end of March.
(tags: free st-patricks-day museum ireland history objects eu apps iphone ipad android books ebooks)
First 5 Minutes Troubleshooting A Server
quite a good checklist of first steps for troubleshooting. Worth bookmarking for "dstat --top-io --top-bio" alone, which is an absolutely excellent tool and new to me
(tags: dstat server io disks hardware performance linux sysadmin ops troubleshooting checklists root-cause)
-
you really know you've made it as an inept Irish politician when Panti Bliss gets dressed up in her most senatorial wig to take the mickey out of you
(tags: funny comedy fidelma-healy-eames politics ireland social-media inept youtube video)
Confusion reigns over three “hijacked” ccTLDs
This kind of silliness is only likely to increase as the number of TLDs increases (and they become more trivial).
What seems to be happening here is that [two companies involved] have had some kind of dispute, and that as a result the registrants and the reputation of three countries’ ccTLDs have been harmed. Very amateurish.
(tags: tlds domains via:fanf amateur-hour dns cctlds registrars adamsnames)
-
interesting details about Riak's support for secondary indexes. Not quite SQL, but still more powerful than plain old K/V storage (via dehora)
(tags: via:dehora riak indexes storage nosql key-value-stores 2i range-queries)
Metric Collection and Storage with Cassandra | DataStax
DataStax' documentation on how they store TSD data in Cass. Pretty generic
(tags: datastax nosql metrics analytics cassandra tsd time-series storage)
Jeff Dean's list of "Numbers Everyone Should Know"
from a 2007 Google all-hands, the list of typical latency timings ranging from an L1 cache reference (0.5 nanoseconds) to a CA->NL->CA IP round trip (150 milliseconds).
(tags: performance latencies google jeff-dean timing caches speed network zippy disks via:kellabyte)
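To get a feel for the gap between the two ends of that list, the short Python calculation below divides the round-trip time by the L1 cache reference time, using only the two figures quoted above. The variable names are made up for the example, and `Fraction` is used so the arithmetic stays exact.

```python
from fractions import Fraction

# Figures quoted in the entry above, both expressed in nanoseconds.
L1_CACHE_REFERENCE_NS = Fraction(1, 2)       # 0.5 ns
ROUND_TRIP_NS = Fraction(150) * 1_000_000    # 150 ms = 150,000,000 ns

# How many L1 cache references fit into one CA->NL->CA round trip?
l1_refs_per_round_trip = ROUND_TRIP_NS / L1_CACHE_REFERENCE_NS
print(int(l1_refs_per_round_trip))  # 300000000
```

In other words, one intercontinental round trip costs about as much time as 300 million L1 cache references.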
-
'a columnar storage format that supports nested data', from Twitter and Cloudera, encoded using Apache Thrift in a Dremel-based record shredding and assembly algorithm. Pretty crazy stuff:
We created Parquet to make the advantages of compressed, efficient columnar data representation available to any project in the Hadoop ecosystem. Parquet is built from the ground up with complex nested data structures in mind, and uses the record shredding and assembly algorithm described in the Dremel paper. We believe this approach is superior to simple flattening of nested name spaces. Parquet is built to support very efficient compression and encoding schemes. Multiple projects have demonstrated the performance impact of applying the right compression and encoding scheme to the data. Parquet allows compression schemes to be specified on a per-column level, and is future-proofed to allow adding more encodings as they are invented and implemented. Parquet is built to be used by anyone. The Hadoop ecosystem is rich with data processing frameworks, and we are not interested in playing favorites. We believe that an efficient, well-implemented columnar storage substrate should be useful to all frameworks without the cost of extensive and difficult to set up dependencies.
(tags: twitter cloudera storage parquet dremel columns record-shredding hadoop marshalling columnar-storage compression data)
Bunnie Huang's "Hacking the Xbox" now available as a free PDF
'No Starch Press and I have decided to release this free ebook version of Hacking the Xbox in honor of Aaron Swartz. As you read this book, I hope that you’ll be reminded of how important freedom is to the hacking community and that you’ll be inclined to support the causes that Aaron believed in. I agreed to release this book for free in part because Aaron’s treatment by MIT is not unfamiliar to me. In this book, you will find the story of when I was an MIT graduate student, extracting security keys from the original Microsoft Xbox. You’ll also read about the crushing disappointment of receiving a letter from MIT legal repudiating any association with my work, effectively leaving me on my own to face Microsoft. The difference was that the faculty of my lab, the AI laboratory, were outraged by this treatment. They openly defied MIT legal and vowed to publish my work as an official “AI Lab Memo,” thereby granting me greater negotiating leverage with Microsoft. Microsoft, mindful of the potential backlash from the court of public opinion over suing a legitimate academic researcher, came to a civil understanding with me over the issue.' This is a classic text on hardware reverse-engineering and the freedom to tinker -- strongly recommended.
(tags: hacking bunnie-huang xbox free hardware drm freedom-to-tinker books reading mit microsoft history)
Daemon Showdown: Upstart vs. Runit vs. Systemd vs. Circus vs. God
strangely, no mention of runit being total shite though
(tags: daemons runit upstart systemd supervisord circus god nannies processes unix crash-only-software linux ops)
-
Clojure-style lazy functional collections (via QCon via Caro)
(tags: via:caro collections java functional lazy-loading lazy-computation lazy clojure)
4 Things Java Programmers Can Learn from Clojure (without learning Clojure)
'1. Use immutable values; 2. Do no work in the constructor; 3. Program to small interfaces; 4. Represent computation, not the world'. Strongly agreed with #1, and the others look interesting too
Tactical Chat: How the U.S. Military Uses IRC to Wage War
Excellent stuff. Lessons to be learned from this: IRC has some key features that mean it can be useful in this case. 1. simple text, everything supports it, no fancy UI clients are necessary; 2. resilient against lossy/transient/low-bandwidth/high-latency networks; 3. standards-compliant and "battle-hardened" (so to speak); 4. open-source/non-proprietary.
Despite the U.S. military’s massive spending each year on advanced communications technology, the use of simple text chat or tactical chat has outpaced other systems to become one of the most popular paths for communicating practical information on the battlefield. Though the use of text chat by the U.S. military first began in the early 1990s, in recent years tactical chat has evolved into a “primary ‘comms’ path, having supplanted voice communications as the primary means of common operational picture (COP) updating in support of situational awareness.” An article from January 2012 in the Air Land Sea Bulletin describes the value of tactical chat as an effective and immediate communications method that is highly effective in distributed, intermittent, low bandwidth environments which is particularly important with “large numbers of distributed warfighters” who must “frequently jump onto and off of a network” and coordinate with other coalition partners. Text chat also provides “persistency in situational understanding between those leaving and those assuming command watch duties” enabling a persistent record of tactical decision making. A 2006 thesis from the Naval Postgraduate School states that internet relay chat (IRC) is one of the most widely used chat protocols for military command and control (C2). Software such as mIRC, a Windows-based chat client, or integrated systems in C2 equipment are used primarily in tactical conditions though efforts are underway to upgrade systems to newer protocols.
(via JK)(tags: via:jk war irc chat mirc us-military tactical-chat distcomp networking)
-
Great neologism from Mick Fealty:
Familiar to anyone who’s followed public debate on Northern Ireland. Some define it as the often multiple blaming and finger pointing that goes on between communities in conflict. Political differences are marked by powerful emotional (often tribal) reactions as opposed to creative conflict over policy and issues. It’s beginning to be known well beyond the bounds of Northern Ireland. [...] Evasion may not be the intention but it is the obvious effect. It occurs when individuals are confronted with a difficult or uncomfortable question. The respondent retrenches his/her position and rejigs the question, being careful to pick open a sore point on the part of questioner’s ‘tribe’. He/she then fires the original query back at the inquirer.
(tags: words etymology whataboutery argument debate northern-ireland mick-fealty slugger-otoole)
-
Give your app its own private Dropbox client and leave the syncing to us.
the real reason Marissa Mayer canned remote Y! employees (apparently)
After spending months frustrated at how empty Yahoo parking lots were, Mayer consulted Yahoo's VPN logs to see if remote employees were checking in enough. Mayer discovered they were not — and her decision was made. we're hearing from people close to Yahoo executives and employees that she made the right decision banning work from home. "The employees at Yahoo are thrilled," says one source close to the company. "There isn't massive uprising. The truth is, they've all been pissed off that people haven't been working."
(tags: yahoo work remote-work teleworking slacking marissa-mayer funny)
Online Schema Change for MySQL
A tool written by Facebook to ease the pain of online MySQL schema-change migrations.
Some ALTER TABLE statements take too long from the perspective of some MySQL users. The fast index create feature for the InnoDB plugin in MySQL 5.1 makes this less of an issue but this can still take minutes to hours for a large table and for some MySQL deployments that is too long. A workaround is to perform the change on a slave first and then promote the slave to be the new master. But this requires a slave located near the master. MySQL 5.0 added support for triggers and some replication systems have been built using triggers to capture row changes. Why not use triggers for this? The openarkkit toolkit did just that with oak-online-alter-table. We have published our version of an online schema change utility (OnlineSchemaChange.php aka OSC).
(tags: facebook mysql sql schema database migrations ops alter-table)
Netflix Queue: Data migration for a high volume web application
There will come a time in the life of most systems serving data, when there is a need to migrate data to [another] data store while maintaining or improving data consistency, latency and efficiency. This document explains the data migration technique we used at Netflix to migrate the user’s queue data between two different distributed NoSQL storage systems [SimpleDB to Cassandra].
(tags: cassandra netflix migrations data schema simpledb storage)
Monitoring Apache Hadoop, Cassandra and Zookeeper using Graphite and JMXTrans
nice enough, but a lot of moving parts. It would be nice to see a simpler ZK+Graphite setup using the 'mntr' verb
(tags: graphite monitoring ops zookeeper cassandra hadoop jmx jmxtrans graphs)
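To make the 'mntr' idea concrete, here's a minimal sketch of the simpler setup. It assumes the usual tab-separated key/value output of ZooKeeper's 'mntr' four-letter command and Graphite's plaintext "path value timestamp" line format. The function name, the metric prefix and the sample output are made up for illustration.

```python
import time


def mntr_to_graphite(mntr_output, prefix, timestamp=None):
    """Turn ZooKeeper 'mntr' output into Graphite plaintext-protocol lines."""
    if timestamp is None:
        timestamp = int(time.time())
    lines = []
    for row in mntr_output.splitlines():
        key, sep, value = row.partition("\t")
        if not sep:
            continue  # not a key<TAB>value row
        value = value.strip()
        try:
            float(value)
        except ValueError:
            continue  # skip non-numeric fields such as zk_version
        lines.append(f"{prefix}.{key} {value} {timestamp}")
    return lines


# Hypothetical sample of what a server might return for 'mntr'.
SAMPLE = (
    "zk_version\t3.4.5-1392090, built on 09/30/2012 17:52 GMT\n"
    "zk_avg_latency\t0\n"
    "zk_max_latency\t12\n"
    "zk_outstanding_requests\t0\n"
    "zk_server_state\tleader\n"
    "zk_znode_count\t4\n"
)

graphite_lines = mntr_to_graphite(SAMPLE, "zk.host1", timestamp=1000)
```

In a real setup the input would come from writing "mntr" to the ZooKeeper client port, and the resulting lines would be written to the carbon plaintext listener, each terminated by a newline.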
RFC 6585 - Additional HTTP Status Codes
includes "429 Too Many Requests", for rate limits
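As a rough illustration of where a 429 fits, here's a sketch of a fixed-window rate limiter that answers with 429 plus a Retry-After hint once a client goes over its quota. The class, its parameters and the fixed-window policy are my own assumptions; the RFC only defines the status code and says a response may include Retry-After.

```python
import math


class FixedWindowLimiter:
    """Allow at most `limit` requests per client in each `window_seconds` window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        self.counts = {}  # client id -> (window start time, requests seen)

    def check(self, client, now):
        """Return (status code, headers) for a request arriving at time `now`."""
        start, count = self.counts.get(client, (now, 0))
        if now - start >= self.window:
            start, count = now, 0  # the old window has ended, start a new one
        if count >= self.limit:
            # Tell the client how many whole seconds remain in this window.
            retry_after = math.ceil(start + self.window - now)
            return 429, {"Retry-After": str(retry_after)}
        self.counts[client] = (start, count + 1)
        return 200, {}


limiter = FixedWindowLimiter(limit=2, window_seconds=10)
responses = [limiter.check("client-a", now) for now in (0, 1, 2, 10)]
```

A sliding window or token bucket would smooth out bursts at window boundaries; the fixed window is just the shortest thing that shows the status code in use.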
Curator Framework: Reducing the Complexity of Building Distributed Systems | Marketing Technology
good +1 for using Netflix' Curator ZK client library
-
a high-level API that greatly simplifies using ZooKeeper. It adds many features that build on ZooKeeper and handles the complexity of managing connections to the ZooKeeper cluster and retrying operations. Some of the features are: Automatic connection management: there are potential error cases that require ZooKeeper clients to recreate a connection and/or retry operations; Curator automatically and transparently (mostly) handles these cases. Cleaner API: simplifies the raw ZooKeeper methods, events, etc., and provides a modern, fluent interface. Recipe implementations (see Recipes): Leader election, Shared lock, Path cache and watcher, Distributed Queue, Distributed Priority Queue.
(tags: zookeeper java netflix distcomp libraries oss open-source distributed)
OscarGodson.js | What I Learned At Yammer
some pretty interesting lessons, it turns out: a 'take what you need' vacation policy means nobody takes vacations (unsurprising); Yammer actively work to avoid employee burnout (good idea); Yammer A/B test every feature; and Yammer mgmt try to let their devs work autonomously.
-
Some really cool-looking UNIX command line utils, packaged in Debian (and therefore in Ubuntu too). A few of these I've reimplemented separately, but it's always good to replace a hack with a more widely available "official" tool. Thanks, Joey Hess!
sponge: accept input, wait til EOF, then rewrite a file; chronic: runs a command quietly unless it fails; combine: combine the lines in two files using boolean operations; ifdata: get network interface info without parsing ifconfig output; ifne: run a program if the standard input is not empty; isutf8: check if a file or standard input is utf-8; lckdo: execute a program with a lock held; mispipe: pipe two commands, returning the exit status of the first; parallel: run multiple jobs at once; pee: tee standard input to pipes; sponge: soak up standard input and write to a file; ts: timestamp standard input; vidir: edit a directory in your text editor; vipe: insert a text editor into a pipe; zrun: automatically uncompress arguments to command
(tags: bash shell cli unix scripting via:peakscale joey-hess debian ubuntu tools command-line commands)
Test-Driven Infrastructure with Chef
Interesting idea.
The book introduces “Infrastructure as Code,” test-driven development, Chef, and cucumber-chef, and then proceeds to a simple example using Chef to provision a shared Linux server. The recipes for the server are developed test-first, demonstrating both the technique and the workflow.
(tags: tdd chef server provisioning build deploy linux coding ops sysadmin)
Peek and poke in the age of Linux
Neat demo of using ptrace to inject into a running process, just like the good old days ;)
Some time ago I ran into a production issue where the init process (upstart) stopped behaving properly. Specifically, instead of spawning new processes, it deadlocked in a transitional state. [...] What’s worse, upstart doesn’t allow forcing a state transition and trying to manually create and send DBus events didn’t help either. That meant the sane options we were left with were: restart the host (not desirable at all in that scenario); start the process manually and hope auto-respawn will not be needed. Of course there are also some insane options. Why not cheat like in the old times and just PEEK and POKE the process in the right places? The solution used at the time involved a very ugly script driving gdb which probably summoned satan in some edge cases. But edge cases were not hit and majority of hosts recovered without issues.
(tags: debugging memory linux upstart peek poke ptrace gdb processes hacks)
The World Wide Web is Moving to AOL! | Brian Bailey
brilliant parody of those "we're so happy to be shutting down!" posts.
Don't worry, all of that hard work won't be wasted. The World Wide Web will remain accessible for 30 days, which will give you plenty of time to update your readers and customers. Each of you will also receive a 30-day free trial for AOL. Look for your CD in the mail soon. Even better, we've created an import tool to make it easy to migrate everything you've put on the web to American Online! The address will change, of course, but now it will be available to every AOL member. You may find that you don't need to bother, though. America Online already has groups and pages about almost every topic you can imagine. Take a look around first and you might save yourself a lot of time. There are only so many different ways to say that Citizen Kane was a good movie! We understand that not all of you will become AOL subscribers and not all web sites will move to the new platform. Just to be safe, be sure to print out all of your favorite pages before the end of the month.
(tags: acquihired acquisitions aol www funny parody humour web)
Irish government attacked using 'MiniDuke' PDF malware
although I haven't seen a word of it in the Irish media yet -- wonder if the government have noticed?
Cyber criminals have targeted government officials in more than 20 countries, including Ireland and Romania, in a complex online assault seen rarely since the turn of the millennium. The attack, dubbed "MiniDuke" by researchers, has infected government computers as recently as this week in an attempt to steal geopolitical intelligence, according to security experts.
(tags: ireland malware attacks pdf security espionage romania miniduke)
The MiniDuke Mystery: PDF 0-day Government Spy Assembler 0x29A Micro Backdoor - Securelist
By analysing the logs from the command servers, we have observed 59 unique victims in 23 countries: Belgium, Brazil, Bulgaria, Czech Republic, Georgia, Germany, Hungary, Ireland, Israel, Japan, Latvia, Lebanon, Lithuania, Montenegro, Portugal, Romania, Russian Federation, Slovenia, Spain, Turkey, Ukraine, United Kingdom and United States.
Romania believes rival nation behind MiniDuke cyber attack | Reuters
"It is a cyber attack ... pursued by an entity that has the characteristics of a state actor," [Romanian secret service] SRI spokesman Sorin Sava told Reuters [...]. "Our estimations show the attack is certainly relevant to Romania's national security taking into account the profile of the compromised entities." [...] In this case, computer experts say an attacker from the former Soviet Union could be more likely. "MiniDuke" in some ways resembles a banking fraud Trojan dubbed "TinBa" believed to have been created by Russian criminal hackers.
(tags: ireland malware attacks pdf security espionage romania miniduke)
Compress data more densely with Zopfli - Google Developers Blog
New compressor from Google, gzip/zip-compatible, slower but slightly smaller results
(tags: compression gzip zip deflate google)
Denominator: A Multi-Vendor Interface for DNS
the latest good stuff from Netflix.
Denominator is a portable Java library for manipulating DNS clouds. Denominator has pluggable back-ends, initially including AWS Route53, Neustar Ultra, DynECT, and a mock for testing. We also ship a command line version so it's easy for anyone to try it out. The reason we built Denominator is that we are working on multi-region failover and traffic sharing patterns to provide higher availability for the streaming service during regional outages caused by our own bugs and AWS issues. To do this we need to directly control the DNS configuration that routes users to each region and each zone. When we looked at the features and vendors in this space we found that we were already using AWS Route53, which has a nice API but is missing some advanced features; Neustar UltraDNS, which has a SOAP based API; and DynECT, which has a REST API that uses a quite different pseudo-transactional model. We couldn’t find a Java based API that grouped together common set of capabilities that we are interested in, so we created one. The idea is that any feature that is supported by more than one vendor API is the highest common denominator, and that functionality can be switched between vendors as needed, or in the event of a DNS vendor outage.
(tags: dns netflix java tools ops route53 aws ultradns dynect)
-
Who knew? you can make a runnable JAR file!
There has long been a hack known in some circles, but not widely known, to make jars really executable, in the chmod +x sense. The hack takes advantage of the fact that jar files are zip files, and zip files allow arbitrary cruft to be prepended to the zip file itself (this is how self-extracting zip files work).
(tags: jars via:netflix shell java executable chmod zip hacks command-line cli)
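The "arbitrary cruft can be prepended" property is easy to check without a JVM. The sketch below builds a small zip in memory, sticks a shell stub in front of it, and reads it back with Python's zipfile module. The stub text and the helper name are my own illustration, not taken from the linked post.

```python
import io
import zipfile


def make_prefixed_zip(stub, files):
    """Build a zip archive in memory and prepend arbitrary bytes to it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name, data in files.items():
            zf.writestr(name, data)
    return stub + buf.getvalue()


# Hypothetical launcher stub: re-run this very file through `java -jar`.
STUB = b'#!/bin/sh\nexec java -jar "$0" "$@"\n'

blob = make_prefixed_zip(STUB, {"META-INF/MANIFEST.MF": "Manifest-Version: 1.0\n"})

# A zip reader locates the central directory from the END of the file,
# so the leading stub does not stop the archive from being read.
with zipfile.ZipFile(io.BytesIO(blob)) as zf:
    names = zf.namelist()
    manifest = zf.read("META-INF/MANIFEST.MF")
```

Written to disk and marked executable, a file laid out like this is what the hack in the post relies on: the shell runs the stub, and the JVM reads the same file as a jar.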
Two surgeons debate the use of cycle helmets
'I am a neurosurgeon and a cyclist, and I am also married to a dedicated cyclist. I wear a cycling helmet and encourage cyclists to wear one. I don’t find that wearing one impedes me in any way. I am under no illusion that it will save me in the event of a high speed collision with a car or lorry (nothing will), but most cycling accidents aren’t of the high-speed variety.' versus: 'I am a consultant Trauma orthopaedic surgeon working in Edinburgh and have many years of experience treating cyclists after serious road traffic, cycle sport and commuting cycle injuries. I believe there is no justification for helmet laws or promotional campaigns that portray cycling as a particularly ‘dangerous’ activity, or that make unfounded claims about the effectiveness of helmets. By reducing cycle use even slightly, helmet laws or promotion campaigns are likely to cause a significant net disbenefit to public health, regardless of the effectiveness or otherwise of helmets.' Generally a lot of sense on either side.
(tags: helmets cycling bicycles health safety surgeons doctors)
Storm and Hadoop: Convergence of Big-Data and Low-Latency Processing
Yahoo! are going big with Storm for their next-generation internal cloud platform: 'Yahoo! engineering teams are developing technologies to enable Storm applications and Hadoop applications to be hosted on a single cluster. • We have enhanced Storm to support Hadoop style security mechanism (including Kerberos authentication), and thus enable Storm applications authorized to access Hadoop datasets on HDFS and HBase. • Storm is being integrated into Hadoop YARN for resource management. Storm-on-YARN enables Storm applications to utilize the computation resources in our tens of thousands of Hadoop computation nodes. YARN is used to launch Storm application master (Nimbus) on demand, and enables Nimbus to request resources for Storm application slaves (Supervisors).'
(tags: yahoo yarn cloud-computing private-clouds big-data latency storm hadoop elastic-computing hbase)
Trojan paralyses speed cameras in Moscow
what a coincidence! (via Tony Finch)
-
Basically, tweaking a few suboptimal sysctls to optimize for 802.11b/n; requires a jailbroken iOS device. I'm surprised that Apple defaulted segment size to 512, to be honest, and disabling delayed ACKs sounds like it might be useful (see also http://www.stuartcheshire.org/papers/NagleDelayedAck/).
TCP optimizer modifies a few settings inside iOS, including increasing the TCP receive buffer from 131072 to 292000, disabling TCP delayed ACK’s, allowing a maximum of 16 un-ACK’d packets instead of 8 and set the default packet size to 1460 instead of 512. These changes won’t only speed up your YouTube videos, they’ll also improve your internet connection’s performance overall, including Wi-Fi network connectivity.
(tags: tcp performance tuning ios apple wifi wireless 802.11n sysctl ip)
-
A study published in the Feb. 27 issue of the journal PLoS One links increased consumption of sugar with increased rates of diabetes by examining the data on sugar availability and the rate of diabetes in 175 countries over the past decade. And after accounting for many other factors, the researchers found that increased sugar in a population’s food supply was linked to higher diabetes rates independent of rates of obesity. In other words, according to this study, obesity doesn’t cause diabetes: sugar does. The study demonstrates this with the same level of confidence that linked cigarettes and lung cancer in the 1960s. As Rob Lustig, one of the study’s authors and a pediatric endocrinologist at the University of California, San Francisco, said to me, “You could not enact a real-world study that would be more conclusive than this one.”
(tags: nytimes health food via:fanf sugar eating diabetes papers medicine)
-
Stoneybatter's not-for-profit art space needs contributions
(tags: art stoneybatter dublin d7 ireland fundit fundraising the-joinery)
Are volatile reads really free?
Marc Brooker with some good test data:
It appears as though reads to volatile variables are not free in Java on x86, or at least on the tested setup. It's true that the difference isn't so huge (especially for the read-only case) that it'll make a difference in any but the more performance sensitive case, but that's a different statement from free.
(tags: volatile concurrency jvm performance java marc-brooker)
-
'Watch Netflix USA, Hulu, Pandora, BBC iPlayer, and more in [sic] anywhere you live!' -- seems to use similar techniques to tunlr.net, looks like it works for my Netflix
(tags: netflix dns tv tunnelling drm networking spotify hulu)
Cassandra, Hive, and Hadoop: How We Picked Our Analytics Stack
reasonably good whole-stack performance testing and analysis; HBase, Riak, MongoDB, and Cassandra compared. Riak did pretty badly :(
(tags: riak mongodb cassandra hbase performance analytics hadoop hive big-data storage databases nosql)
Big Data Analytics at Netflix. Interview with Christos Kalantzis and Jason Brown.
Good interview with the Cassandra guys at Netflix, and some top Mongo-bashing in the comments
(tags: cassandra netflix user-stories testimonials nosql storage ec2 mongodb)
-
my favourite art of the moment. Thick, heavy layers of acrylic black and white paint, evoking the stormy Atlantic (brr). Gallery Bode, which showed this in Nuremberg in 2011, wrote the following at http://www.bode-galerie.de/en/exhibitions/schwarz_weiss :
Gallery Bode is pleased to constitute the cooperation with Werner Knaupp with an exhibition of a new workseries. The exhibition showcases artworks out of the series "Westmen Isles". [...] The journeys to Iceland are a background to the development of this new workseries. These paintings are telling of a forbidding nature. The beholder can't take a [safe] position but he is involved into the event which becomes comprehensible in a nearly physical way. These pictures of a overwhelming nature could be traced back to Knaupp's confrontation with the force of nature while his journeys. The experience of this force pushes the limits of human being and evokes primal fear. With the abdication of colours the artworks reach dynamic. This foots on the consistency of colour and on the changing between reality and abstraction. In an art historical view the new black and white paintings detached themselves from traditional landscape painting. Werner Knaupp implements the pure force of nature into pure painting, to visualise the force fields of nature. The beholder experiences with these artworks a nature without human dimension. In Werner Knaupp's Oeuvre the "Westmen Isles" paintings are a new expression of his examination with existential fundamental questions.
(tags: germany art painting werner-knaupp paintings monochrome sea iceland)
Indymedia: It’s time to move on
Our decision to curtail publishing on the Nottingham Indymedia site and call a meeting is an attempt to create a space for new ideas. We are not interested in continuing along the slow but certain path to total irrelevance but want to draw in new people and start off in new directions whilst remaining faithful to the underlying principles of Indymedia.
(tags: indymedia community communication web anonymity publishing left-wing)
How to revert a faulty merge in git
omgwtf, this is pretty horrific.
#AltDevBlogADay » Latency Mitigation Strategies
John Carmack on the low-latency coding techniques used to support head mounted display devices.
Virtual reality (VR) is one of the most demanding human-in-the-loop applications from a latency standpoint. The latency between the physical movement of a user’s head and updated photons from a head mounted display reaching their eyes is one of the most critical factors in providing a high quality experience. Human sensory systems can detect very small relative delays in parts of the visual or, especially, audio fields, but when absolute delays are below approximately 20 milliseconds they are generally imperceptible. Interactive 3D systems today typically have latencies that are several times that figure, but alternate configurations of the same hardware components can allow that target to be reached. A discussion of the sources of latency throughout a system follows, along with techniques for reducing the latency in the processing done on the host system.
(tags: head-mounted-display display ui latency vision coding john-carmack)
Distributed Streams Algorithms for Sliding Windows [PDF]
'Massive data sets often arise as physically distributed, parallel data streams, and it is important to estimate various aggregates and statistics on the union of these streams. This paper presents algorithms for estimating aggregate functions over a “sliding window” of the N most recent data items in one or more streams. [...] Our results are obtained using a novel family of synopsis data structures called waves.'
(tags: waves papers streaming algorithms percentiles histogram distcomp distributed aggregation statistics estimation streams)
good blog post on histogram-estimation stream processing algorithms
After reviewing several dozen papers, a score or so in depth, I identified two data structures that appear to enable us to answer these recency and frequency queries: exponential histograms (from "Maintaining Stream Statistics Over Sliding Windows" by Datar et al.) and waves (from "Distributed Streams Algorithms for Sliding Windows" by Gibbons and Tirthapura). Both of these data structures are used to solve the so-called counting problem, the problem of determining, with a bound on the relative error, the number of 1s in the last N units of time. In other words, the data structures are able to answer the question: how many 1s appeared in the last n units of time within a factor of Error (e.g., 50%). The algorithms are neat, so I'll present them briefly.
(tags: streams streaming stream-processing histograms percentiles estimation waves statistics algorithms)
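For the counting problem described above, here's a minimal sketch of the exponential histogram idea as I understand it from Datar et al. Class and parameter names are mine, and this is the simplest variant: at most two buckets of each power-of-two size, which gives the 50% relative error mentioned in the quote. Other values of max_per_size are not covered by that bound.

```python
class ExponentialHistogram:
    """Approximate count of 1s among the last `window` stream items."""

    def __init__(self, window, max_per_size=2):
        self.window = window
        self.max_per_size = max_per_size
        self.time = 0
        # Each bucket is (timestamp of its newest 1, number of 1s it holds).
        # Newest bucket first; sizes are powers of two, non-decreasing with age.
        self.buckets = []

    def add(self, bit):
        self.time += 1
        # Drop the oldest bucket once even its newest 1 has left the window.
        while self.buckets and self.buckets[-1][0] <= self.time - self.window:
            self.buckets.pop()
        if bit:
            self.buckets.insert(0, (self.time, 1))
            self._merge()

    def _merge(self):
        b = self.buckets
        i = 0
        while i < len(b):
            size = b[i][1]
            j = i
            while j < len(b) and b[j][1] == size:
                j += 1
            if j - i > self.max_per_size:
                # Too many buckets of this size: merge the two oldest into one
                # bucket of twice the size, keeping the newer timestamp.
                b[j - 2:j] = [(b[j - 2][0], size * 2)]
                i = j - 2  # the merged bucket may overflow the next size up
            else:
                i = j

    def estimate(self):
        if not self.buckets:
            return 0
        total = sum(size for _, size in self.buckets)
        # Only the oldest bucket can straddle the window edge; count half of it.
        return total - self.buckets[-1][1] // 2
```

The point of the structure is that it keeps a number of buckets logarithmic in the window size rather than remembering every item, at the cost of not knowing exactly how much of the oldest bucket is still inside the window.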
Timelike 2: everything fails all the time
Fantastic post on large-scale distributed load balancing strategies from @aphyr. Random and least-conns routing comes out on top in his simulation (although he hasn't yet tried Marc Brooker's two-randoms routing strategy)
(tags: via:hn routing distributed least-conns load-balancing round-robin distcomp networking scaling)
Marc Brooker's "two-randoms" load balancing approach
Marc Brooker on this interesting load-balancing algorithm, including simulation results:
Using stale data for load balancing leads to a herd behavior, where requests will herd toward a previously quiet host for much longer than it takes to make that host very busy indeed. The next refresh of the cached load data will put the server high up the load list, and it will become quiet again. Then busy again as the next herd sees that it's quiet. Busy. Quiet. Busy. Quiet. And so on. One possible solution would be to give up on load balancing entirely, and just pick a host at random. Depending on the load factor, that can be a good approach. With many typical loads, though, picking a random host degrades latency and reduces throughput by wasting resources on servers which end up unlucky and quiet. The approach taken by the studies surveyed by Mitzenmacher is to try two hosts, and pick the one with the least load. This can be done directly (by querying the hosts) but also works surprisingly well on cached load data. [...] Best of 2 is good because it combines the best of both worlds: it uses real information about load to pick a host (unlike random), but rejects herd behavior much more strongly than the other two approaches.
Having seen what Marc has worked on, and written, inside Amazon, I'd take this very seriously... cool to see he is blogging externally too.
(tags: algorithm load-balancing distcomp distributed two-randoms marc-brooker least-conns)
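A quick way to get a feel for "best of 2" is the classic balls-into-bins version of the problem. The sketch below is my own simplification, not Marc's simulation: it uses live load counts rather than cached or stale data, requests never finish, and the host and request counts are arbitrary.

```python
import random


def simulate(num_hosts, num_requests, choices, rng):
    """Assign requests to hosts and return the load on the busiest host.

    For each request, sample `choices` hosts uniformly at random and send
    the request to whichever of those currently has the least load.
    choices=1 is plain random routing; choices=2 is "two-randoms".
    """
    load = [0] * num_hosts
    for _ in range(num_requests):
        candidates = [rng.randrange(num_hosts) for _ in range(choices)]
        target = min(candidates, key=lambda host: load[host])
        load[target] += 1
    return max(load)


rng = random.Random(42)
worst_random = simulate(10_000, 10_000, 1, rng)
worst_best_of_two = simulate(10_000, 10_000, 2, rng)
print("busiest host, random routing:", worst_random)
print("busiest host, best of two:  ", worst_best_of_two)
```

The busiest host under best-of-two should end up noticeably less loaded than under plain random routing, which is the effect the surveyed studies describe. The herd behaviour with stale data that the post discusses would need a more detailed model than this.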
Can regular expressions parse HTML?
'a summary of the main points: The “regular expressions” used by programmers have very little in common with the original notion of regularity in the context of formal language theory. Regular expressions (at least PCRE) can match all context-free languages. As such they can also match well-formed HTML and pretty much all other programming languages. Regular expressions can match at least some context-sensitive languages. Matching of regular expressions is NP-complete. As such you can solve any other NP problem using regular expressions.'
(tags: compsci regexps regular-expressions programming np-complete chomsky-grammar context-free languages)
How to Create Application Shortcuts in Google Chrome for Mac
a rather hacky script is required. Ugh
(tags: hacks osx google chrome mac application-shortcuts site-specific-browsers)
-
I couldn't remember the name for this design principle, so it's worth a bookmark to remind me in future... 'This refers to computer programs that handle failures by simply restarting, without attempting any sophisticated recovery. Correctly written components of crash-only software can microreboot to a known-good state without the help of a user. Since failure-handling and normal startup use the same methods, this can increase the chance that bugs in failure-handling code will be noticed.'
(tags: crashing crash-only-software design architecture coding software fault-tolerance erlang let-it-fail microreboot recovery autosave)
Europe Is Warmer Than Canada Because of the Gulf Stream, Right? Not So Fast
The common tale—the one bandied around for more than a hundred years—goes something like this: Warm water flowing to the northeast out of the Gulf of Mexico—the Gulf Stream—cuts across the North Atlantic ocean, bringing extra energy to the Isles and driving up temperatures relative to the comparatively-frigid North Americas. The only problem with this simple explanation, say Stephen Riser and Susan Lozier in Scientific American, is that it doesn’t actually account for the difference.
(tags: gulf-stream myths ireland europe science currents ocean temperature climate)
Dear Prudence: My wife and I came from the same sperm donor
yes, really. Bloody hell
(tags: sperm-donor birth dear-prudence omgwtfbbq via:davewiner reproduction)
-
from Twitter -- 'a cache for your big data. Even though memory is thousand times faster than SSD, network connected SSD-backed memory makes sense, if we design the system in a way that network latencies dominate over the SSD latencies by a large factor. To understand why network connected SSD makes sense, it is important to understand the role distributed memory plays in large-scale web architecture. In recent years, terabyte-scale, distributed, in-memory caches have become a fundamental building block of any web architecture. In-memory indexes, hash tables, key-value stores and caches are increasingly incorporated for scaling throughput and reducing latency of persistent storage systems. However, power consumption, operational complexity and single node DRAM cost make horizontally scaling this architecture challenging. The current cost of DRAM per server increases dramatically beyond approximately 150 GB, and power cost scales similarly as DRAM density increases. Fatcache extends a volatile, in-memory cache by incorporating SSD-backed storage.'
(tags: twitter ssd cache caching memcached memcache memory network storage)
Passively Monitoring Network Round-Trip Times - Boundary
'how Boundary uses [TCP timestamps] to calculate round-trip times (RTTs) between any two hosts by passively monitoring TCP traffic flows, i.e., without actively launching ICMP echo requests (pings). The post is primarily an overview of this one aspect of TCP monitoring, it also outlines the mechanism we are using, and demonstrates its correctness.'
(tags: tcp boundary monitoring network ip passive-monitoring rtt timestamping)
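The general technique can be sketched on synthetic data. I'm not claiming this is Boundary's exact mechanism: the tuple layout, the function name and the "first echo wins" rule are assumptions for illustration. The idea is that a monitoring point notes when it first sees a TSval travelling in one direction, then waits for the same value to come back as a TSecr in the other direction.

```python
def passive_rtts(packets):
    """Estimate RTTs from TCP timestamp options seen at one monitoring point.

    `packets` is an iterable of (time_ms, src, dst, tsval, tsecr) tuples in
    the order they were observed. Returns (sender, echoer, rtt_ms) samples:
    the time from the monitor seeing `sender`'s TSval to seeing `echoer`
    echo it back.
    """
    first_seen = {}  # (src, dst, tsval) -> time the value was first observed
    samples = []
    for time_ms, src, dst, tsval, tsecr in packets:
        first_seen.setdefault((src, dst, tsval), time_ms)
        # Does this packet echo a timestamp we saw going the other way?
        sent_at = first_seen.pop((dst, src, tsecr), None)
        if sent_at is not None:
            samples.append((dst, src, time_ms - sent_at))
    return samples


# Synthetic capture taken close to host A (times in milliseconds).
capture = [
    (0, "A", "B", 100, 0),
    (30, "B", "A", 500, 100),  # B echoes A's TSval 100
    (40, "A", "B", 101, 500),  # A echoes B's TSval 500
]
rtt_samples = passive_rtts(capture)
```

With a monitor sitting between the two hosts, the two samples cover the monitor-to-B and monitor-to-A legs respectively, so adding them approximates the end-to-end round trip. Coarse timestamp clocks, where several segments share one TSval, and delayed ACKs are ignored here.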
drug cartel-controlled mobile comms networks
“The Mexican military has recently broken up several secret telecommunications networks that were built and controlled by drug cartels so they could coordinate drug shipments, monitor their rivals and orchestrate attacks on the security forces. A network that was dismantled just last week provided cartel members with cellphone and radio communications across four northeastern states. The network had coverage along almost 500 miles of the Texas border and extended nearly another 500 miles into Mexico’s interior. Soldiers seized 167 antennas, more than 150 repeaters and thousands of cellphones and radios that operated on the system. Some of the remote antennas and relay stations were powered with solar panels.”
(tags: mexico drugs networks mobile-phones crime)
Heroku finds out that distributed queueing is hard
Stage 3 of the Rap Genius/Heroku blog drama. Summary (as far as I can tell): Heroku gave up on a fully-synchronised load-balancing setup ("intelligent routing"), since it didn't scale, in favour of randomised queue selection; they didn't sufficiently inform their customers, and metrics and docs were not updated to make this change public; the pessimal case became pretty damn pessimal; a customer eventually noticed and complained publicly, creating a public shit-storm. Comments: 1. this is why you monitor real HTTP request latency (scroll down for crazy graphs!). 2. include 90/99 percentiles to catch the "tail" of poorly-performing requests. 3. Load balancers are hard. http://aphyr.com/posts/277-timelike-a-network-simulator has more info on the intricacies of distributed load balancing -- worth a read.
(tags: heroku rap-genius via:hn networking distcomp distributed load-balancing ip queueing percentiles monitoring)
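On comment 2, a small sketch of why the tail needs its own numbers. The latencies are synthetic (95 fast requests and 5 that got stuck in a queue), and the nearest-rank percentile definition is just one common choice.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile: the smallest sample with at least p% of
    all samples at or below it."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic data: most requests are quick, a few wait behind a slow one.
latencies_ms = [20] * 95 + [2000] * 5

mean_ms = sum(latencies_ms) / len(latencies_ms)
p50_ms = percentile(latencies_ms, 50)
p90_ms = percentile(latencies_ms, 90)
p99_ms = percentile(latencies_ms, 99)
```

With this data the median and even the 90th percentile look perfectly healthy, the mean is only mildly inflated, and it takes the 99th percentile to show that some requests are taking two seconds.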
-
10 particularly good -- actually helpful -- tips on using the Graphite metric graphing system
(tags: graphite ops metrics service-metrics graphing ui dataviz)
Literate Jenks Natural Breaks and How The Idea Of Code is Lost
A crazy amount of code archaeology to discover exactly how an algorithm -- specifically 'Jenks natural breaks' -- works, after decades of cargo-cult copying (via Nelson): 'I spent a day reading the original text and decoding as much as possible of the code’s intention, so that I could write a ‘literate’ implementation. My definition of literate is highly descriptive variable names, detailed and narrative comments, and straightforward code with no hijinks. So: yes, this isn’t the first implementation of Jenks in Javascript. And it took me several times longer to do things this way than to just get the code working. But the sad and foreboding state of this algorithm’s existing implementations said that to think critically about this code, its result, and possibilities for improvement, we need at least one version that’s clear about what it’s doing.'
(tags: jenks-natural-breaks algorithms chloropleth javascript reverse-engineering history software copyright via:nelson)
don't order a Raspberry Pi from RS
I've been waiting 24 days for mine so far. Frankly amazing they are so apparently inept, particularly since it seems in breach of EU distance selling regulation if they go beyond 30 days without an update. They've just posted this:
Quick update- we received our delivery of raspberry pi’s last week and as of Friday we had shipped up to order reference 1010239854. We will continue daily to get your orders shipped out as quickly as we possibly can; so that you will all receive your raspberry pi’s shortly. Many thanks everyone for your patience and again apologies for the delay in the dispatch update message on the Pi Store which I know has caused some confusion.
(tags: rs raspberry-pi inept etailers uk e-commerce shopping hardware)
more details on the UK distance selling regulations governing Raspberry Pi RS orders
'my understanding is that according to the Distance Selling Regulations [...], unless you agreed otherwise with RS, then they were obligated to fulfill their side of the contract within thirty days from the day after you ordered, and if they were unable to do so they were also obligated to inform you that they could not and repay you within thirty days'
Sketch of the Day: HyperLogLog — Cornerstone of a Big Data Infrastructure
includes a nice javascript demo of HLL
(tags: hyperloglog loglog algorithms stream-processing streams estimation demos javascript)
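For reference, a rough Python sketch of the basic HyperLogLog estimator. This is not the code from the linked demo: the class name, the use of SHA-1 as the hash, and the default of 4096 registers are my own assumptions, and it only includes the small-range (linear counting) correction:

```python
import hashlib
import math

class HyperLogLog:
    """Basic HLL with 2**p registers (p >= 7 so the alpha formula below applies)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        idx = h >> (64 - self.p)                     # first p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)        # remaining 64-p bits
        # rank = 1-based position of the leftmost 1-bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m * self.m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Few items seen: linear counting on the empty registers is more accurate.
            return self.m * math.log(self.m / zeros)
        return raw

hll = HyperLogLog()
for i in range(50000):
    hll.add("user-%d" % i)
    hll.add("user-%d" % i)   # duplicates do not change the estimate
print("estimated distinct items: %.0f" % hll.count())
```

With 4096 registers the expected relative error is around 1.04/sqrt(4096), i.e. under 2%, for a few KB of state.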
-
'log scale for lists; Decaying lists allow to manage large range of values. A decaying list grows logarithmically with the number of items. It follows that some items are dropped when other are inserted.' (via Tony Finch)
(tags: via:fanf clojure algorithms decay backoff half-life data-structures)
Cycling in Dublin City: the numbers
7.6% of the Dublin commuter population "mainly cycle". some interesting stats here
-
Apache-licensed open source java lib to implement retrying behaviour cleanly.
a general purpose method for retrying arbitrary Java code with specific stop, retry, and exception handling capabilities that are enhanced by Guava's predicate matching. It also includes an exponential backoff WaitStrategy that might be useful for situations where more well-behaved service polling is preferred.
(tags: retries retrying resiliency fault-tolerance java open-source guava)
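The general idea, as a minimal Python sketch. To be clear, this is not the library's Java API; the function and parameter names here are made up for illustration, and it only covers retry-on-exception with a capped exponential backoff:

```python
import time

def retry(func, max_attempts=5, base_delay=0.1, max_delay=5.0,
          retry_on=(Exception,), sleep=time.sleep):
    """Call func() until it succeeds or max_attempts is reached.

    Waits base_delay, 2*base_delay, 4*base_delay, ... (capped at max_delay)
    between attempts, and re-raises the last exception when attempts run out.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == max_attempts:
                raise
            sleep(min(max_delay, base_delay * 2 ** (attempt - 1)))

# Hypothetical flaky call: fails twice, then succeeds.
calls = {"n": 0}

def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise IOError("service not ready")
    return "ok"

waits = []
result = retry(flaky, sleep=waits.append)   # record the delays instead of sleeping
print(result, waits)
```

Passing the sleep function in makes the backoff schedule easy to test without actually waiting.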
-
Utilizing an iPhone/Android App known as “Talking Tom Cat”, the tool has been transformed into a new media mouthpiece, addressing very specific particulars of the conflict that are glossed over by international media: alliances between MNLA and Ansar Dine, critiques of hypocrisy of the MUJAO factions, and ousting of corrupt politicians.
(tags: apps wtf politics talking-tom-cat bizarre tuareg africa via:neilmajor)
-
Black hats steal code-signing keys from software whitelisting anti-malware firm. Pretty audacious
(tags: malware security whitelisting av)
How did I do the Starwars Traceroute?
It is accomplished using many vrfs on 2 Cisco 1841s. For those less technical, VRFs are essentially private routing tables similar to a VPN. When a packet destined to 216.81.59.173 (AKA obiwan.scrye.net) hits my main gateway, I forward it onto the first VRF on the "ASIDE" router on 206.214.254.1. That router then has a specific route for 216.81.59.173 to 206.214.254.6, which resides on a different VRF on the "BSIDE" router. It then has a similar set up which points it at 206.214.254.9 which lives in another VPN on "ASIDE" router. All packets are returned using a default route pointing at the global routing table. This was by design so the packets TTL expiration did not have to return fully through the VRF Maze. I am a consultant to Epik Networks who let me use the Reverse DNS for an unused /24, and I used PowerDNS to update all of the entries through mysql. This took about 30 minutes to figure out how to do it, and about 90 minutes to implement.
(tags: vrfs routing networking hacks star-wars traceroute rdns ip)
Real-time Analytics in Scala [slides, PDF]
some good approximation/streaming algorithms and tips on Scala implementation
(tags: streams algorithms approximation coding scala slides)
'Efficient Computation of Frequent and Top-k Elements in Data Streams' [paper, PDF]
The Space-Saving algorithm to compute top-k in a stream. I've been asking a variation of this problem as an interview question for a while now, pretty cool to find such a neat solution. Pity neither myself nor anyone I've interviewed has come up with it ;)
(tags: space-saving approximation streams stream-processing cep papers pdf algorithms)
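Here's roughly how it goes, as a simplified Python sketch. Assumption to note: the paper keeps its counters in a "Stream-Summary" structure so the minimum can be found in constant time, whereas this version just scans a dict for the smallest counter, which is fine for a small capacity:

```python
def space_saving(stream, capacity):
    """Approximate top-k with at most `capacity` counters.

    Returns (item, estimated_count, max_overestimate) tuples, largest first.
    Estimates never undercount an item that is still being tracked.
    """
    counters = {}   # item -> [estimated count, possible overestimate]
    for item in stream:
        if item in counters:
            counters[item][0] += 1
        elif len(counters) < capacity:
            counters[item] = [1, 0]
        else:
            # Evict the smallest counter; the newcomer inherits its count + 1.
            victim = min(counters, key=lambda k: counters[k][0])
            floor = counters.pop(victim)[0]
            counters[item] = [floor + 1, floor]
    return sorted(((k, c, e) for k, (c, e) in counters.items()),
                  key=lambda t: -t[1])

# Two heavy hitters buried in a stream of one-off noise items.
stream = []
for i in range(100):
    stream.append("a")
    if i % 2 == 0:
        stream.append("b")
    stream.append("noise-%d" % i)

top = space_saving(stream, capacity=10)
print(top[:2])
```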
-
ASL-licensed open source library of stream-processing/approximation algorithms: count-min sketch, space-saving top-k, cardinality estimation, LogLog, HyperLogLog, MurmurHash, lookup3 hash, Bloom filters, q-digest, stochastic top-k
(tags: algorithms coding streams cep stream-processing approximation probabilistic space-saving top-k cardinality estimation bloom-filters q-digest loglog hyperloglog murmurhash lookup3)
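As a taste of what's in there, a toy count-min sketch in Python. This is not the library's code or API; the class name, the default sizes, and the salted-SHA-256 hashing are my own choices for illustration:

```python
import hashlib

class CountMinSketch:
    """Fixed-size frequency table: estimates can overcount, but never undercount."""

    def __init__(self, width=1024, depth=4):
        self.width = width
        self.depth = depth
        self.rows = [[0] * width for _ in range(depth)]

    def _cells(self, item):
        data = str(item).encode("utf-8")
        for row in range(self.depth):
            # One hash function per row, derived by salting with the row number.
            digest = hashlib.sha256(bytes([row]) + data).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, n=1):
        for row, col in self._cells(item):
            self.rows[row][col] += n

    def estimate(self, item):
        # Collisions only ever inflate a cell, so the minimum is the best guess.
        return min(self.rows[row][col] for row, col in self._cells(item))

cms = CountMinSketch()
for _ in range(100):
    cms.add("a")
for _ in range(5):
    cms.add("b")
print(cms.estimate("a"), cms.estimate("b"), cms.estimate("never-seen"))
```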
'Medians and Beyond: New Aggregation Techniques for Sensor Networks' [paper, PDF]
'We introduce Quantile Digest or q-digest, a novel data structure which provides provable guarantees on approximation error and maximum resource consumption. In more concrete terms, if the values returned by the sensors are integers in the range [1;n], then using q-digest we can answer quantile queries using message size m within an error of O(log(n)/m). We also outline how we can use q-digest to answer other queries such as range queries, most frequent items and histograms. Another notable property of q-digest is that in addition to the theoretical worst case bound error, the structure carries with itself an estimate of error for this particular query.'
(tags: q-digest algorithms streams approximation histograms median percentiles quantiles)
Russia's anti-child-porn internet blocklist allegedly being used for general censorship
Allegedly being used to censor political and anti-corruption journalism, and a Russian wikipedia-like site for hosting an article about suicide
(tags: censorship feature-creep russia politics blocklists)
HyperLogLog++: Google’s Take On Engineering HLL
Google and AggregateKnowledge's improvements to the HyperLogLog cardinality estimation algorithm
(tags: hyperloglog cardinality estimation streaming stream-processing cep)
osx - Remap "Home" and "End" to beginning and end of line
in summary: ~/Library/KeyBindings/DefaultKeyBinding.dict. Thanks, Apple, this is stupid
(tags: mac keyboard bindings it-just-works compatibility ui rebinding)
-
Hadoop, a batch-generated read-only Voldemort cluster, and an intriguing optimal-storage histogram bucketing algorithm:
The optimal histogram is computed using a random-restart hill climbing approximated algorithm. The algorithm has been shown very fast and accurate: we achieved 99% accuracy compared to an exact dynamic algorithm, with a speed increase of one factor. [...] The amount of information to serve in Voldemort for one year of BBVA's credit card transactions on Spain is 270 GB. The whole processing flow would run in 11 hours on a cluster of 24 "m1.large" instances. The whole infrastructure, including the EC2 instances needed to serve the resulting data would cost approximately $3500/month.
(tags: scalability scaling voldemort hadoop batch algorithms histograms statistics bucketing percentiles)
-
'Splout is a scalable, open-source, easy-to-manage SQL big data view. Splout is to Hadoop + SQL what Voldemort or Elephant DB are to Hadoop + Key/Value. Splout serves a read-only, partitioned SQL view which is generated and indexed by Hadoop.' Some FAQs: 'What's the difference between Splout SQL and Dremel-like solutions such as BigQuery, Impala or Apache Drill? Splout SQL is not a "fast analytics" Dremel-like engine. It is more thought to be used for serving datasets under web / mobile high-throughput, many lookups, low-latency applications. Splout SQL is more like a NoSQL database in the sense that it has been thought for answering queries under sub-second latencies. It has been thought for performing queries that impact a very small subset of the data, not queries that analyze the whole dataset at once.'
(tags: splout sql big-data hadoop read-only scaling queries analytics)
Goonwaffe Stories: A Guide For Newbies [PDF]
impressively high-quality newbie's guide from the Goonswarm Federation -- as themittani.com describes it, 'frankly a work of art: a 1950's Pulp Scifi magazine full of internet spaceships and sociopathy.'
(tags: eve-online space goonswarm gaming mmo pdf pulp science-fiction)
Evasi0n Jailbreak's Userland Component
Good writeup of the exploit techniques used in the new iOS jailbreak.
Evasi0n is interesting because it escalates privileges and has full access to the system partition all without any memory corruption. It does this by exploiting the /var/db/timezone vulnerability to gain access to the root user’s launchd socket. It then abuses launchd to load MobileFileIntegrity with an inserted codeless library, which is overriding MISValidateSignature to always return 0.
(tags: jailbreak ios iphone ipad exploits evasi0n via:nelson)
Programming Language Checklist
'You appear to be advocating a new: [ ] functional [ ] imperative [ ] object-oriented [ ] procedural [ ] stack-based [ ] "multi-paradigm" [ ] lazy [ ] eager [ ] statically-typed [ ] dynamically-typed [ ] pure [ ] impure [ ] non-hygienic [ ] visual [ ] beginner-friendly [ ] non-programmer-friendly [ ] completely incomprehensible programming language. Your language will not work. Here is why it will not work.'
(tags: humor programming funny coding languages)
Jetty-9 goes fast with Mechanical Sympathy
This is very cool! Applying Mechanical Sympathy optimization techniques to Jetty, specifically: "False sharing" on the BlockingArrayQueue data structure resolved; a new ArrayTernaryTrie data structure to improve header field storage, making it faster to build and look up, efficient on RAM, cheap to GC, and more cache-friendly than a traditional trie; and a branchless hex-to-byte conversion statement. The results are a 30%-faster microbenchmark on amd64, with 50% fewer Young Gen garbage collections. Lovely to see low-level infrastructure libs like Jetty getting this kind of optimization.
(tags: jetty java mechanical-sympathy optimization coding tries)
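To show what "branchless hex conversion" means, here's one well-known formulation of the trick, sketched in Python. I'm not claiming this is character-for-character what Jetty does, and Python won't show the speedup; it just demonstrates that a hex digit can be decoded with arithmetic alone, no if/else or lookup table:

```python
def hex_digit_value(ch):
    """Value of one hex digit (0-9, a-f, A-F) with no branches.

    The low 5 bits give 0x10..0x19 for '0'..'9' and 1..6 for letters;
    bit 6 is set only for letters, which adds 0x19 to shift them to 10..15.
    Input must already be a valid hex digit; nothing is validated here.
    """
    c = ord(ch)
    return (c & 0x1f) + ((c >> 6) * 0x19) - 0x10

def hex_to_bytes(text):
    """Decode an even-length hex string two digits at a time."""
    return bytes(
        (hex_digit_value(text[i]) << 4) | hex_digit_value(text[i + 1])
        for i in range(0, len(text), 2)
    )

print(hex_to_bytes("CAFEbabe"))
```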
-
craft beer kegs for hire in Dublin, Sligo, Limerick and Galway. Needs more Metalman, of course ;)
A Continuous Packaging Pipeline
presentation describing some nice automation tools for packaging vendor code for deployment
(tags: deployment fosdem presentations slides debian deb fpm apt-get)
-
a new C++ template library from Google which implements an in-memory B-Tree container type, suitable for use as a drop-in replacement for std::map, set, multimap and multiset. Lower memory use, and reportedly faster due to better cache-friendliness
(tags: c++ google data-structures containers b-trees stl map set open-source)
Clairvoyant Squirrel: Large Scale Malicious Domain Classification
Storm-based service to detect malicious DNS domain usage from streaming pcap data in near-real-time. Uses string features in the DNS domain, along with randomness metrics using Markov analysis, combined with a Random Forest classifier, to achieve 98% precision at 10,000 matches/sec
(tags: storm distributed distcomp random-forest classifiers machine-learning anti-spam slides)
"Security Engineering" now online in full
Ross Anderson says: 'I’m delighted to announce that my book Security Engineering – A Guide to Building Dependable Distributed Systems is now available free online in its entirety. You may download any or all of the chapters from the book’s web page.'
(tags: security books reference coding software encryption ross-anderson)
Slide Rule Calculations By Example
Harder than using a calculator, that's for sure
(tags: slide-rule gadgets tech history antiques calculating)
-
'Intel's Intelligent Platform Management Interface (IPMI), which is implemented and added onto by all server vendors, grant system administrators with a means to manage their hardware in an Out of Band (OOB) or Lights Out Management (LOM) fashion. However there are a series of design, utilization, and vendor issues that cause complex, pervasive, and serious security infrastructure problems. The BMC is an embedded computer on the motherboard that implements IPMI; it enjoys an asymmetrical relationship with its host, with the BMC able to gain full control of memory and I/O, while the server is both blind and impotent against the BMC. Compromised servers have full access to the private IPMI network. The BMC uses reusable passwords that are infrequently changed, widely shared among servers, and stored in clear text in its storage. The passwords may be disclosed with an attack on the server, over the network against the BMC, or with a physical attack against the motherboard (including after the server has been decommissioned.) IT's reliance on IPMI to reduce costs, the near-complete lack of research, 3rd party products, or vendor documentation on IPMI and the BMC security, and the permanent nature of the BMC on the motherboard make it currently very difficult to defend, fix or remediate against these issues.' (via Tony Finch)
(tags: via:fanf security ipmi power-management hardware intel passwords bios)
-
Massive Java concurrency fail in recent 1.6 and 1.7 JDK releases -- the java.util.HashMap type now spin-locks on an AtomicLong in its constructor. Here's the response from the author: 'I'll acknowledge right up front that the initialization of hashSeed is a bottleneck but it is not one we expected to be a problem since it only happens once per Hash Map instance. For this code to be a bottleneck you would have to be creating hundreds or thousands of hash maps per second. This is certainly not typical. Is there really a valid reason for your application to be doing this? How long do these hash maps live?' Oh dear. Assumptions of "typical" like this are not how you design a fundamental data structure. fail. For now there is a hacky reflection-based workaround, but this is lame and needs to be fixed as soon as possible. (Via cscotta)
(tags: java hashmap concurrency bugs fail security hashing jdk via:cscotta)
High Scalability - geo-aware traffic load balancing and caching at CNBC.com
Dyn's anycast DNS service, as used by CNBC.com
(tags: anycast dns scalability dyn failover geographical load-balancing)
Using Statsd and Graphite From a Rails App
Reasonably simple, from the looks of it
(tags: rails graphite metrics service-metrics ruby)
The colour of London's commute
Nice visualisation. 'What the map shows is the mix of transport to work of residents living in each part of London*, using ONS data at Middle Super Output Area (MSOA) level. Each MSOA is given an RGB colour determined by the modal share, with red colours representing travel by car, taxi or motorbike, blue travel by public transport and green cycling or walking. The result is a fairly simple pattern, with motor vehicles predominating on London's fringes, public transport in the inner suburbs and cycling and walking in the very centre. Those tendrils of blue reaching out presumably represent major public transport links.'
(tags: data visualisation dataviz london mapping via:ldoody)
Where are the free WiFi spots in Dublin City Centre?
hooray, free wifi! beautiful Invader-style pixel-art mosaics to highlight them, too. nice one Joe
-
some lovely pixel art to advertise the free wifi areas, by Craig Robinson. I see a girl in pyjamas, a Dub hurler, a viking, Molly Malone, Phil Lynott, Oscar Wilde, a Moore St market trader, a busker, and the Spire...
DuckDuckGo Architecture - 1 Million Deep Searches a Day and Growing
thumbs-up for DNSMadeEasy's Global Traffic Director anycast-based geographically-segmented DNS service, in particular
(tags: dns architecture scalability search duckduckgo geoip anycast)
Announcing Ribbon: Tying the Netflix Mid-Tier Services Together
Netflix' load balancing client-side library, open source
(tags: netflix load-balancing coding libraries client-side open-source)
-
'an expressive toolset for constructing scalable, resilient [service] architectures. It works in the cloud, in the data center, and on your laptop, and it makes your system diagram visible and inevitable. Inevitable systems coordinate automatically to interconnect, removing the hassle of manual configuration of connection points (and the associated danger of human error).' Looks like a pretty neat cluster deployment tool; driven from a single configuration file, using Chef, integrating closely with AWS and providing many useful additional features
(tags: chef deployment clusters knife services aws ec2 ops ironfan demo)
Fox DMCA Takedowns Order Google to Remove Fox DMCA Takedowns
Chilling Effects is setup to stop the ‘chilling effects’ of Internet censorship. Google sees this as a good thing and sends takedown requests it receives to be added to the database. Fox sends takedown requests to Google for pages which the company says contain links to material it holds the copyright to. Those pages include those on Chilling Effects which show which links Fox wants taken down. Google delists the Chilling Effects pages from its search engine, thus completing the circle and defeating the very reason Chilling Effects was set up for in the first place.
(tags: chilling-effects copyright internet legal dmca google law)
-
At Railscamp X it became clear there is a gap in the current HTTP specification. There are many ways for a developer to screw up their implementation, but no code to share the nature of the error with the end user. We humbly suggest the following status codes are included in the HTTP spec in the 7XX range.
Includes such useful status codes as "724 - This line should be unreachable".
How Newegg crushed the “shopping cart” patent and saved online retail
Very cool account of Newegg's battle against a ludicrous patent-troll shakedown. Great quote from their Chief Legal Officer, Lee Cheng:
Patent trolling is based upon deficiencies in a critical, but underdeveloped, area of the law. The faster we drive these cases to verdict, and through appeal, and also get legislative reform on track, the faster our economy will be competitive in this critical area. We're competing with other economies that are not burdened with this type of litigation. China doesn't have this, South Korea doesn't have this, Europe doesn't have this. [...] It's actually surprising how quickly people forget what Lemelson did. [referring to Jerome Lemelson, an infamous patent troll who used so-called "submarine patents" to make billions in licensing fees.] This activity is very similar. Trolls right now "submarine" as well. They use timing, like he used timing. Then they pop up and say "Hello, surprise! Give us your money or we will shut you down!" Screw them. Seriously, screw them. You can quote me on that.
(tags: patent-trolls east-texas newegg shopping-cart swpat software-patents patents ecommerce soverain)
Implementing strcmp, strlen, and strstr using SSE 4.2 instructions - strchr.com
Using new Intel Core i7 instructions to speed up string manipulation.
Fascinating stuff. SSE ftw
(tags: sse optimization simd assembly intel i7 intel-core strstr strings string-matching strchr strlen coding)
All polar bears descended from one Irish grizzly
'THE ARCTIC'S DWINDLING POPULATION of polar bears all descend from a single mamma brown bear which lived 20,000 to 50,000 years ago in present-day Ireland, new research suggests. DNA samples from the great white carnivores - taken from across their entire range in Russia, Canada, Greenland, Norway and Alaska - revealed that every individual's lineage could be traced back to this Irish forebear.' More than the average bear, I guess
(tags: animals biology science dna history ireland bears polar-bears grizzly-bears via:ben)
Basho | Alert Logic Relies on Riak to Support Rapid Growth
'The new [Riak-based] analytics infrastructure performs statistical and correlation processing on all data [...] approximately 5 TB/day. All of this data is processed in real-time as it streams in. [...] Alert Logic’s analytics infrastructure, powered by Riak, achieves performance results of up to 35k operations/second across each node in the cluster – performance that eclipses the existing MySQL deployment by a large margin on single node performance. In real business terms, the initial deployment of the combination of Riak and the analytic infrastructure has allowed Alert Logic to process in real-time 7,500 reports, which previously took 12 hours of dedicated processing every night.' Twitter discussion here: https://twitter.com/fisherpk/status/294984960849367040 , which notes 'heavily cached SAN storage, 12 core blades and 90% get to put ops', and '3 riak nodes, 12-cores, 30k get heavy riak ops/sec. 8 nodes driving ops to that cluster'. Apparently the use of SAN storage on all nodes is historic, but certainly seems to have produced good iops numbers as an (expensive) side-effect...
(tags: iops riak basho ops systems alert-logic storage nosql databases)
Turn a Raspberry Pi Into an AirPlay Receiver for Streaming Music in Your Living Room
hooray, a viable domestic Raspberry Pi use case at last ;)
Antigua Government Set to Launch “Pirate” Website To Punish United States
oh the lulz.
The Government of Antigua is planning to launch a website selling movies, music and software, without paying U.S. copyright holders. The Caribbean island is taking the unprecedented step because the United States refuses to lift a trade “blockade” preventing the island from offering Internet gambling services, despite several WTO decisions in Antigua’s favor. The country now hopes to recoup some of the lost income through a WTO approved “warez” site.
(tags: us-politics antigua piracy filesharing pirate gambling wto ip blockades)
-
An article by Nathan "Storm" Marz describing the system architecture he's been talking about for a while; Hadoop-driven batch view, Storm-driven "speed view", and a merging API
(tags: storm systems architecture lambda-architecture design Hadoop)
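My reading of the three pieces, as a toy in-process Python sketch. Everything here is hypothetical naming on my part (there's no Hadoop or Storm involved, obviously): an append-only master log, a batch view recomputed from scratch, a speed view covering only what arrived since the last batch run, and a query that merges the two:

```python
class LambdaCounts:
    """Toy event counter split into batch view, speed view and a merging query."""

    def __init__(self):
        self.master_log = []    # append-only record of every event
        self.batch_view = {}    # rebuilt from the whole log by run_batch()
        self.speed_view = {}    # incremental counts since the last batch run

    def record(self, key):
        self.master_log.append(key)
        self.speed_view[key] = self.speed_view.get(key, 0) + 1

    def run_batch(self):
        # Slow but simple: recompute everything, so bugs and drift in the
        # incremental path get wiped out on each run.
        view = {}
        for key in self.master_log:
            view[key] = view.get(key, 0) + 1
        self.batch_view = view
        self.speed_view = {}    # the batch view now covers these events

    def query(self, key):
        return self.batch_view.get(key, 0) + self.speed_view.get(key, 0)

counts = LambdaCounts()
for key in ["a", "a", "b"]:
    counts.record(key)
counts.run_batch()
counts.record("a")   # arrives after the batch run; served from the speed view
print(counts.query("a"), counts.query("b"))
```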
Network graph viz of Irish politicians and organisations on Twitter
generated by the Clique Research Cluster at UCD and DERI. 'a visualization of the unified graph representation for the users in the data, produced using Gephi and sigma.js. Users are coloured according to their community (i.e. political affiliation). The size of each node is proportional to its in-degree (i.e. number of incoming links).' sigma.js provides a really user-friendly UI to the graphs, although -- as with most current graph visualisations -- it'd be particularly nice if it was possible to 'tease out' and focus on interesting nodes, and get a pasteable URL of the result, in context. Still, the most usable graph viz I've seen in a while...
(tags: graphs dataviz ucd research ireland twitter networks community sigma.js javascript canvas gephi)
-
Incredible blog of book covers and illustrations, much from the 1970s
(tags: illustration art prints 1970s graphics)
Namazu-e: Earthquake catfish prints
'In November 1855, the Great Ansei Earthquake struck the city of Edo (now Tokyo), claiming 7,000 lives and inflicting widespread damage. Within days, a new type of color woodblock print known as namazu-e (lit. "catfish pictures") became popular among the residents of the shaken city. These prints featured depictions of mythical giant catfish (namazu) who, according to popular legend, caused earthquakes by thrashing about in their underground lairs. In addition to providing humor and social commentary, many prints claimed to offer protection from future earthquakes.'
Implementing Real-Time Trending Topics With a Distributed Rolling Count Algorithm in Storm
Storm demo with a reasonably complex topology. 'how to implement a distributed, real-time trending topics algorithm in Storm. It uses the latest features available in Storm 0.8 (namely tick tuples) and should be a good starting point for anyone trying to implement such an algorithm for their own application. The new code is now available in the official storm-starter repository, so feel free to take a deeper look.'
(tags: storm distcomp distributed tick-tuples demo)
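The core rolling-count idea, stripped of the Storm topology, can be sketched in a few lines of Python. This is a single-process illustration under my own naming, not the storm-starter code: counts go into the newest slot, and a periodic tick (standing in for a tick tuple) opens a new slot and drops the oldest:

```python
from collections import Counter, deque

class RollingCounter:
    """Sliding-window counts kept as a fixed number of time slots."""

    def __init__(self, num_slots):
        # deque(maxlen=...) silently discards the oldest slot when full.
        self.slots = deque([Counter()], maxlen=num_slots)

    def incr(self, key, n=1):
        self.slots[-1][key] += n

    def tick(self):
        """Advance the window by one slot; call this on a timer."""
        self.slots.append(Counter())

    def totals(self):
        total = Counter()
        for slot in self.slots:
            total.update(slot)
        return total

rc = RollingCounter(num_slots=3)
rc.incr("storm"); rc.incr("storm")
rc.tick()
rc.incr("hadoop")
rc.tick()
rc.incr("storm")
before = dict(rc.totals())
rc.tick()                      # the first slot (2 x "storm") falls out of the window
after = dict(rc.totals())
print(before, after)
```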
-
'a UNIX init scheme with service supervision' - philosophically similar to daemontools, widely packaged, LSB init.d-script-compliant, BSD-licensed
'The Unified Logging Infrastructure for Data Analytics at Twitter' [PDF]
A picture of how Twitter standardized their internal service event logging formats to allow batch analysis and analytics. They surface service metrics to dashboards from Pig jobs on a daily basis, which frankly doesn't sound too great...
(tags: twitter analytics event-logging events logging metrics)
Ivan Beshoff, Last Survivor Of Mutiny on the Potemkin, founded Beshoffs
wow. there's a factoid! the "Beshoffs" chain of chippers in Dublin were founded by this historic figure, who died in 1987
(tags: factoids beshoffs chips dublin history small-world battleship-potemkin russia)
-
Excellent demo of how use of a block cipher with a known secret key makes an insecure MAC. "In short, CBC-MAC is a Message Authentication Code, not a strong hash function. While MACs can be built out of hash functions (e.g. HMAC), and hash functions can be built out of block ciphers like AES, not all MACs are also hash functions. CBC-MAC in particular is completely unsuitable for use as a hash function, because it only allows two parties with knowledge of a particular secret key to securely transmit messages between each other. Anyone with knowledge of that key can forge the messages in a way that keeps the MAC (“hash value”) the same. All you have to do is run the forged message through CBC-MAC as usual, then use the AES decryption operation on the original hash value to find the last intermediate state. XORing this state with the CBC-MAC for the forged message yields a new block of data which, when appended to the forged message, will cause it to have the original hash value. Because the input is taken backwards, you can either modify the first block of the file, or just run the hash function backwards until you reach the block that you want to modify. You can make a forged file pass the hash check as long as you can modify an arbitrary aligned 16-byte block in it."
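The append-a-block forgery described in the quote can be sketched in Python. Big assumption, flagged loudly: Python's standard library has no AES, so a toy 4-round Feistel cipher built on SHA-256 stands in for it here. The attack only needs a block cipher that anyone holding the key can run backwards, so the steps are the same:

```python
import hashlib

BLOCK = 16

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def _round(key, i, half):
    # Toy round function; NOT AES, just an invertible stand-in.
    return hashlib.sha256(key + bytes([i]) + half).digest()[:BLOCK // 2]

def encrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in range(4):
        left, right = right, xor(left, _round(key, i, right))
    return left + right

def decrypt_block(key, block):
    left, right = block[:8], block[8:]
    for i in reversed(range(4)):
        left, right = xor(right, _round(key, i, left)), left
    return left + right

def cbc_mac(key, message):
    """CBC-MAC with a zero IV over a block-aligned message."""
    assert len(message) % BLOCK == 0
    state = bytes(BLOCK)
    for off in range(0, len(message), BLOCK):
        state = encrypt_block(key, xor(state, message[off:off + BLOCK]))
    return state

def forge(key, original_tag, forged_message):
    """Append one block so that forged_message hashes to original_tag."""
    forged_tag = cbc_mac(key, forged_message)
    last_state = decrypt_block(key, original_tag)   # input to the final encryption
    return forged_message + xor(last_state, forged_tag)

key = b"known-to-everyone"
original = b"pay alice 10".ljust(BLOCK)
evil = b"pay mallory 9999".ljust(BLOCK)
forged = forge(key, cbc_mac(key, original), evil)
print(cbc_mac(key, forged) == cbc_mac(key, original))
```

The forged message is one block longer than the one the attacker wanted to send, which is why the quote mentions needing an arbitrary 16-byte block you can modify.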
Scala 2.8 Collections API -- Performance Characteristics
wow. Every library vending a set of collection types should have a page like this
(tags: collections scala performance reference complexity big-o coding)
So, after just over 3 and a half years, I'm leaving Amazon.
It's been great fun -- I can honestly say, even with my code being used by hundreds of millions of users in SpamAssassin and elsewhere, I hadn't really had to come to grips with the distributed systems problems that an Amazon-scale service involves.
During my time at Amazon, I've had the pleasure of building out a brand-new, groundbreaking innovative internal service, from scratch to its current status where it's deployed in production datacenters worldwide. It's a low-latency service, used to monitor Amazon's internal networks using massive quantities of measurement data and machine learning algorithms. It's really very nifty, and I'm quite proud of what we've achieved. I was lucky to work closely with some very smart people during this, too -- Amazon has some top-notch engineers.
But time to move on! In a week's time, I'll be joining Swrve to work on the server-side architecture of their system. Swrve have a very interesting product, extending the A/B-testing model into gaming, and a great team; and it'll be nice to get back into startup-land once again, for a welcome change. (It's not all roses working for a big company. ;) I'm looking forward to it. Who knows, I may even start blogging here again...
Pity about losing those 12 phone tool icons though!
CES: Worse Products Through Software
'The companies out there that know how to make decent software have been steadily eating their way into and through markets previously dominated by the hardware guys. Apple with music players, TiVo with video recording, even Microsoft with its decade-old Xbox Live service, which continues to embarrass the far weaker offerings from Sony and Nintendo. (And, yes, iOS is embarrassing all three console makers.)' See also Mat Honan's article at http://www.wired.com/gadgetlab/2012/12/internet-tv-sucks/ : 'Smart TVs are just too complicated. They have terrible user interfaces that differ wildly from device to device. It’s not always clear what content is even available — for example, after more than two years on the market, you still can’t watch Hulu Plus on your Google TV. [...] They give us too many options for apps most people will never use, and they do so at the expense of making it simple to find the shows and movies we want to watch, no matter where they are, be it online or on the air. As NPD puts it in the conclusion to its report, “OEMs and retailers need to focus less on new innovation in this space and more on simplification of the user experience and messaging if they want to drive additional, and new, behaviors on the TV.” Which is a more polite way of saying, clean up your horrible interface, Samsung.' (via Craig)
(tags: via:craig design ui tv hardware television sony ces software)
Fast Packed String Matching for Short Patterns [paper, PDF]
'Searching for all occurrences of a pattern in a text is a fundamental problem in computer science with applications in many other fields, like NLP, information retrieval and computational biology. In the last two decades a general trend has appeared trying to exploit the power of the word RAM model to speed-up the performances of classical string matching algorithms. [...] In this paper we use specialized word-size packed string matching instructions, based on the Intel streaming SIMD extensions (SSE) technology, to design very fast string matching algorithms in the case of short patterns.' Reminds me of http://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm , but taking advantage of SIMD extensions, which should make things nice and speedy, at the cost of tying it to specific hardware platforms. (via Tony Finch)
(tags: rabin-karp algorithms strings string-matching papers via:fanf)
Irish EU Council Presidency proposes destruction of right to privacy | EDRI
'For example, based on the current situation in Ireland, the idea is that all companies can do whatever they want with personal data, without fear of sanction. Sanctions, such as fines, “should be optional or at least conditional upon a prior warning or reprimand”. In other words, do what you want, the worst that can happen is that you will receive a warning.' Shame! Daragh O'Brien's comment: 'utter idiocy' (at https://twitter.com/daraghobrien/status/292041500873850880 ).
(tags: privacy ireland eu fail data-protection data-privacy politics)
Belgium plans artificial island to store wind power
'Belgium is planning to build a doughnut-shaped island in the North Sea that will store wind energy by pumping water out of a hollow in the middle, as it looks for ways to lessen its reliance on nuclear power. One of the biggest problems with electricity is that it is difficult to store and the issue is exaggerated in the case of renewable energy from wind or sun because it is intermittent depending on the weather.' 'The island is still in the planning stages, but will be built out of sand 3 km off the Belgian coast near the town of Wenduine if it gets the final go-ahead. The island, which would also work as an offshore substation to transform the voltage of the electricity generated by wind turbines, could take five or more years to plan and build.'
(tags: power via:daev belgium wind-power hydro sea islands manmade storage)
-
It turns out that Reddit uses the Wilson score confidence interval approach for ranking. (via Toby diPasquale)
(tags: ranking rating algorithms popularity python wilson-score-interval sorting statistics confidence-sort)
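For reference, a minimal sketch of the lower bound of the Wilson score interval, which is the quantity usually sorted on in this approach. The function name, the `z=1.96` default (roughly 95% confidence) and the vote counts are my own assumptions for illustration; this is not Reddit's code.

```python
import math


def wilson_lower_bound(upvotes, total, z=1.96):
    """Lower bound of the Wilson score interval for the proportion upvotes/total.

    z is the normal quantile for the chosen confidence level
    (1.96 is roughly 95%; an assumption, not a value from the linked post).
    """
    if total == 0:
        return 0.0
    p = upvotes / total
    z2 = z * z
    centre = p + z2 / (2 * total)
    margin = z * math.sqrt((p * (1 - p) + z2 / (4 * total)) / total)
    return (centre - margin) / (1 + z2 / total)


# Hypothetical items: (upvotes, total votes)
items = {"one lucky vote": (1, 1), "well tested": (90, 100)}
ranked = sorted(items, key=lambda k: wilson_lower_bound(*items[k]), reverse=True)
```

The point of sorting on the lower bound rather than the raw ratio is that an item with a single upvote (100% positive) no longer outranks one with 90 upvotes out of 100.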
The Neurocritic: Fisher-Price Synesthesia
'Synesthesia [jm: sic] is a rare perceptual phenomenon in which the stimulation of one sensory modality, or exposure to one type of stimulus, leads to a sensory (or cognitive) experience in a different, non-stimulated modality. For instance, some synesthetes have colored hearing while others might taste shapes. GRAPHEME-COLOR SYNESTHESIA is the condition in which individual printed letters are perceived in a specific, constant color. This occurs involuntarily and in the absence of colored font. [...] A new study has identified 11 synesthetes whose grapheme-color mappings appear to be based on the Fisher Price plastic letter set made between 1972-1990.' (via Dave Green)
(tags: fisher-price synesthesia synaesthesia colors colours sight neuroscience brain via-dave-green toys)
Extreme Performance with Java - Charlie Hunt [slides, PDF]
Slides from Charlie Hunt's 2012 QCon talk, in which he discusses 'what you need to know about a modern JVM in order to be effective at writing a low latency Java application'. The talk video is at http://www.infoq.com/presentations/Extreme-Performance-Java
(tags: low-latency charlie-hunt performance java jvm presentations qcon slides pdf)
-
'Bloomsday Map Of Dublin Based On Ulysses'. Beautiful! 'The Leopold’s Day map is a stunning marriage of typography and cartography plotting all the streets alluded to by Joyce in Ulysses which were in existence on June 16th 1904. It is accompanied by a comprehensive and beautifully typeset directory with over 400 entries noting the landmarks, business and people of Dublin that were referenced in the text. The Leopold’s Day map is an exquisitely detailed, limited edition piece. It has an impressive dimension of 1000mm x 700mm which means it can also fit into a ready made frame. Price: €125.00'
(tags: bloomsday ulysses dublin ireland maps james-joyce art prints)
aaw/hyperloglog-redis - GitHub
'This gem is a pure Ruby implementation of the HyperLogLog algorithm for estimating cardinalities of sets observed via a stream of events. A Redis instance is used for storing the counters.'
(tags: cardinality sets redis algorithms ruby gems hyperloglog)
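To get a feel for the algorithm, here is a rough in-memory sketch of HyperLogLog in Python. It is not the gem's code and does not touch Redis; the class name, the choice of SHA-1 as the hash and the `b=10` register-index width are all assumptions made for illustration.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog: 2**b registers, each holding the maximum 'rank' seen."""

    def __init__(self, b=10):
        self.b = b
        self.m = 1 << b
        self.registers = [0] * self.m
        # Bias-correction constant; this closed form is intended for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")        # 64-bit hash value
        index = h >> (64 - self.b)                   # first b bits pick a register
        rest = h & ((1 << (64 - self.b)) - 1)        # remaining 64 - b bits
        # Rank = position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.b) - rest.bit_length() + 1
        self.registers[index] = max(self.registers[index], rank)

    def count(self):
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            estimate = self.m * math.log(self.m / zeros)
        return estimate


hll = HyperLogLog()
for i in range(10000):
    hll.add("user-%d" % i)
approx = hll.count()
```

With 1024 registers the expected relative error is around 3%, using only a few kilobytes of state however many distinct items are seen.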
-
'uses DNS witchcraft to allow you to access US/UK-only audio and video services like Hulu.com, BBC iPlayer, etc. without using a VPN or Web proxy.' According to http://superuser.com/questions/461316/how-does-tunlr-work , it proxies the initial connection setup and geo-auth, then mangles the stream address so that the stream is fetched directly, not via the proxy. Sounds pretty useful.
(tags: proxy network vpn dns tunnel content video audio iplayer bbc hulu streaming geo-restriction)
OmniTI's Experiences Adopting Chef
A good, in-depth writeup of OmniTI's best practices with respect to build-out of multiple customer deployments, using multi-tenant Chef from a version-controlled repo. Good suggestions, and I am really looking forward to this bit: 'Chef tries to turn your system configuration into code. That means you now inherit all the woes of software engineering: making changes in a coordinated manner and ensuring that changes integrate well are now an even greater concern. In part three of this series, we’ll look at applying software quality assurance and release management practices to Chef cookbooks and roles.'
(tags: chef deployment ops omniti systems vagrant automation)
-
Twitter's Scala style guide. 'While highly effective, Scala is also a large language, and our experiences have taught us to practice great care in its application. What are its pitfalls? Which features do we embrace, which do we eschew? When do we employ “purely functional style”, and when do we avoid it? In other words: what have we found to be an effective use of the language? This guide attempts to distill our experience into short essays, providing a set of best practices. Our use of Scala is mainly for creating high volume services that form distributed systems — and our advice is thus biased — but most of the advice herein should translate naturally to other domains.'
Notes on Distributed Systems for Young Bloods -- Something Similar
'Below is a list of some lessons I’ve learned as a distributed systems engineer that are worth being told to a new engineer. Some are subtle, and some are surprising, but none are controversial. This list is for the new distributed systems engineer to guide their thinking about the field they are taking on. It’s not comprehensive, but it’s a good beginning.' This is a pretty nice list; a little overstated, but that's the format. I particularly like the following: 'Exploit data-locality'; 'Learn to estimate your capacity'; 'Metrics are the only way to get your job done'; 'Use percentiles, not averages'; 'Extract services'.
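A quick illustration of the 'Use percentiles, not averages' point, using made-up latency numbers (98 fast requests and 2 very slow ones; the data and the `summarize` helper are hypothetical):

```python
import statistics


def summarize(latencies_ms):
    """Return the mean alongside the median (p50) and the 99th percentile."""
    # n=100 yields 99 cut points; cuts[k - 1] is the k-th percentile.
    cuts = statistics.quantiles(latencies_ms, n=100, method="inclusive")
    return {
        "mean": statistics.mean(latencies_ms),
        "p50": cuts[49],
        "p99": cuts[98],
    }


# Hypothetical sample: most requests take 10 ms, a couple take 2 seconds.
sample = [10] * 98 + [2000] * 2
summary = summarize(sample)
```

The mean lands somewhere that describes no real request: far above what the typical user sees, and far below what the unlucky ones see. The p50 and p99 report those two experiences separately.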
-
'a Nagios plugin to poll Graphite'. Necessary, since service metrics are the true source of service health information.
(tags: nagios graphite service-metrics ops)
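The core of such a check is small. Below is a sketch (not the linked plugin's code) that takes the JSON body Graphite's render endpoint returns with `format=json` and maps the latest non-null datapoint onto the standard Nagios plugin exit codes. The function name, thresholds and sample metric are hypothetical, and fetching the JSON over HTTP is left out.

```python
import json

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_series(render_json, warn, crit):
    """Map the latest non-null Graphite datapoint to a Nagios status.

    render_json: response body in the shape
        [{"target": "...", "datapoints": [[value, timestamp], ...]}]
    warn, crit: hypothetical thresholds; higher values are treated as worse.
    """
    series = json.loads(render_json)
    values = [v for s in series for v, _ts in s["datapoints"] if v is not None]
    if not values:
        return UNKNOWN, "UNKNOWN - no datapoints returned"
    latest = values[-1]
    if latest >= crit:
        return CRITICAL, "CRITICAL - value is %s" % latest
    if latest >= warn:
        return WARNING, "WARNING - value is %s" % latest
    return OK, "OK - value is %s" % latest


# Made-up response for a made-up metric name.
body = '[{"target": "app.errors.rate", "datapoints": [[1.0, 100], [null, 160], [7.5, 220]]}]'
code, message = check_series(body, warn=5, crit=10)
```

A real plugin would print `message` and call `sys.exit(code)`, which is all Nagios needs in order to raise the alert.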
paperplanes. The Virtues of Monitoring, Redux
A rather vague and touchy-feely "state of the union" post on monitoring. Good set of links at the end, though; I like the look of Sensu and Tasseo, but am still unconvinced about the value of Boundary's offering.
(tags: monitoring metrics ops)
What happened to KHTML after Apple announced Safari
'There was a huge amount of excitement at the announcement that Safari would be using KHTML. At that time, it was almost a given that the OSS rendering engine was Gecko. KHTML was KDE's little engine that could. But nobody ever expected it to be picked up by other folks. One of the original parts of the KHTML-to-OS X port was KWQ (pronounced, "quack") that abstracted out the KDE API portions that were used in KHTML. Folks were pretty ecstatic at first. It seemed very validating. But that changed quickly. As Zack's post indicates, WebKit became a thing of unmergable code-drops. Even inside of the KDE community there became a split between the KHTML purists and the WebKit faction. They'd previously more or less all been KHTML developers, but post-WebKit there was something of a pragmatists vs. idealists split. Zack fell on the latter side of that (for understandable reasons: there was an existing community project, with its own set of values, and that was hijacked to a large extent by WebKit). A few years later WebKit transformed itself into a more or less valid open source project (see webkit.org), but that didn't close the rift in the KDE community between the two, at that point rather divergent, rendering engines. There's still some remaining melancholy that stems from that initial hope and what could have potentially been, but wasn't.'
(tags: history safari open-source code-drops over-the-wall webkit khtml kde oss apple)
-
whoa. (via Dave O'Riordan)
Dan McKinley :: Whom the Gods Would Destroy, They First Give Real-time Analytics
'It's important to divorce the concepts of operational metrics and product analytics. [...] Funny business with timeframes can coerce most A/B tests into statistical significance.' 'The truth is that there are very few product decisions that can be made in real time.' HN discussion: http://news.ycombinator.com/item?id=5032588
(tags: real-time analytics statistics a-b-testing)
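One common form of 'funny business with timeframes' is checking a running test repeatedly and stopping the first time it looks significant. The rough simulation below is my own illustration, not taken from the post. It runs A/A tests (both arms have the same true conversion rate) and compares how often a two-proportion z-test crosses the threshold at any peek with how often it does so at the planned end. All parameters are assumptions.

```python
import math
import random


def z_stat(conv_a, conv_b, n):
    """Two-proportion z statistic for arms with n visitors each."""
    pooled = (conv_a + conv_b) / (2 * n)
    se = math.sqrt(2 * pooled * (1 - pooled) / n)
    return 0.0 if se == 0 else (conv_a / n - conv_b / n) / se


def simulate(trials=500, n=1000, rate=0.1, peek_every=50, z_crit=1.96, seed=1):
    """Return (false-positive rate with peeking, false-positive rate at the end)."""
    rng = random.Random(seed)
    peeking_hits = final_hits = 0
    for _ in range(trials):
        a = b = 0
        crossed = False
        for i in range(1, n + 1):
            a += rng.random() < rate   # both arms share the same true rate
            b += rng.random() < rate
            if i % peek_every == 0 and abs(z_stat(a, b, i)) > z_crit:
                crossed = True         # a peeker would have stopped here
        peeking_hits += crossed
        final_hits += abs(z_stat(a, b, n)) > z_crit
    return peeking_hits / trials, final_hits / trials


peeking_rate, final_rate = simulate()
```

Since the final look is one of the peeks, the peeking rate can never be lower than the fixed-horizon rate, and in practice it tends to be several times higher.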
Greyhound agrees to change consumer contracts and make refunds - National Consumer Agency
Take note, switchers: 'The National Consumer Agency (NCA) has received a commitment from Greyhound that it will amend certain terms in its standard consumer contract, which the NCA thinks are unfair to consumers. This will be done by January 18 2013. Among the terms considered unfair by the NCA are that consumers must forfeit their credit balance and pay a €45 administration fee, if they cancel their contract with Greyhound within 12 months. If you were charged money in these circumstances, Greyhound has agreed to refund you. Greyhound will communicate these changes to all of its consumers by 18 January 2013. If you have any questions about the changes or getting a refund, you should contact Greyhound directly.'
Pushover: Simple Mobile Notifications for Android and iOS
'Pushover makes it easy to send real-time notifications to your Android and iOS devices.' It has an extremely simple HTTPS API. 'Pushover has no monthly subscription fees and users will always be able to receive unlimited messages for free. Most applications can send messages for free, subject to monthly limits.' Also supported by ifttt.com.