
Justin's Linklog Posts

Links for 2013-01-07

Links for 2013-01-04

  • Dan McKinley :: Effective Web Experimentation as a Homo Narrans

    Good demo from Etsy’s A/B testing, of how the human brain can retrofit a story onto statistically-insignificant results. To fix: ‘avoid building tooling that enables fishing expeditions; limit our post-hoc rationalization by explicitly constraining it before the experiment. Whenever we test a feature on Etsy, we begin the process by identifying metrics that we believe will change if we 1) understand what is happening and 2) get the effect we desire.’

    (tags: testing etsy statistics a-b-testing fishing ulysses-contract brain experiments)
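
    The cost of a fishing expedition is easy to put a number on. A rough sketch, assuming the metrics are independent and using the conventional 5% significance threshold (both are assumptions for illustration, not figures from the Etsy post):

```python
def chance_of_false_positive(num_metrics: int, alpha: float = 0.05) -> float:
    """Probability that at least one of `num_metrics` independent metrics
    looks 'significant' purely by chance when there is no real effect."""
    return 1.0 - (1.0 - alpha) ** num_metrics


for n in (1, 5, 20):
    # One pre-chosen metric: about 5%. Twenty metrics inspected after the
    # fact: roughly 64% odds of finding something to tell a story about.
    print(n, round(chance_of_false_positive(n), 2))
```

    Picking the metrics before the experiment keeps `num_metrics` small, which is the constraint the quote argues for.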

  • Lesser known crimes: do you own that copyright?

    A very interesting crime on the Irish statute books:

    Section 141 of the Copyright and Related Rights Act 2000 provides: A person who, for financial gain, makes a claim to enjoy a right under this Part [ie. copyright] which is, and which he or she knows or has reason to believe is, false, shall be guilty of an offence and shall be liable on conviction on indictment to a fine not exceeding £100,000, or to imprisonment for a term not exceeding 5 years, or both.

    (tags: ireland copyright ip false-claims law)

Links for 2013-01-02

  • Patent trolls want $1,000 for using scanners

    We are truly living in the future — a dystopian future, but one nonetheless. A patent troll manages to obtain “gobbledigook” patents on using a scanner to scan to PDF, then attempts to shake down a bunch of small companies before eventually running into resistance, at which point it “forks” into a bunch of algorithmically-named shell companies, spammer-style, sending the same demands. Those demands in turn contain this beauty of Stockholm-syndrome-inducing prose:

    ‘You should know also that we have had a positive response from the business community to our licensing program. As you can imagine, most businesses, upon being informed that they are infringing someone’s patent rights, are interested in operating lawfully and taking a license promptly. Many companies have responded to this licensing program in such a manner. Their doing so has allowed us to determine that a fair price for a license negotiated in good faith and without the need for court action is a payment of $900 per employee. We trust that your organization will agree to conform your behavior to respect our patent rights by negotiating a license rather than continuing to accept the benefits of our patented technology without a license. Assuming this is the case, we are prepared to make this pricing available to you.’
    And here’s an interesting bottom line:
    The best strategy for target companies? It may be to ignore the letters, at least for now. “Ignorance, surprisingly, works,” noted Prof. Chien in an e-mail exchange with Ars. Her study of startups targeted by patent trolls found that when confronted with a patent demand, 22 percent ignored it entirely. Compare that with the 35 percent that decided to fight back and 18 percent that folded. Ignoring the demand was the cheapest option ($3,000 on average) versus fighting in court, which was the most expensive ($870,000 on average). Another tactic that clearly has an effect: speaking out, even when done anonymously. It hardly seems a coincidence that the Project Paperless patents were handed off to a web of generic-sounding LLCs, with demand letters signed only by “The Licensing Team,” shortly after the “Stop Project Paperless” website went up. It suggests those behind such low-level licensing campaigns aren’t proud of their behavior. And rightly so.

    (tags: patents via:fanf networks printing printers scanning patent-trolls project-paperless adzpro gosnel faslan)

  • Keep predicting and you’ll be right eventually?

    Debunking Ken Ring, the Kiwi “long-term weather prediction” “scientist” who gets trundled out every year around this time.

    (tags: ken-ring weather predictions ireland rain)

Links for 2013-01-01

Links for 2012-12-18

  • Baklava code

    ‘thin software layers don’t add much value, especially when you have many such layers piled on each other. Each layer has to be pushed onto your mental stack as you dive into the code. Furthermore, the layers of phyllo dough are permeable, allowing the honey to soak through. But software abstractions are best when they don’t leak. When you pile layer on top of layer in software, the layers are bound to leak.’

    (tags: code design terminology food antipatterns)

Links for 2012-12-17

Links for 2012-12-16

Links for 2012-12-14

  • Authentication is machine learning

    This may be the most insightful writing about authentication in years:

    From my brief time at Google, my internship at Yahoo!, and conversations with other companies doing web authentication at scale, I’ve observed that as authentication systems develop they gradually merge with other abuse-fighting systems dealing with various forms of spam (email, account creation, link, etc.) and phishing. Authentication eventually loses its binary nature and becomes a fuzzy classification problem.

    This is not a new observation. It’s generally accepted for banking authentication and some researchers like Dinei Florêncio and Cormac Herley have made it for web passwords. Still, much of the security research community thinks of password authentication in a binary way [..]. Spam and phishing provide insightful examples: technical solutions (like Hashcash, DKIM signing, or EV certificates), have generally failed but in practice machine learning has greatly reduced these problems. The theory has largely held up that with enough data we can train reasonably effective classifiers to solve seemingly intractable problems.

    (via Tony Finch.)

    (tags: passwords authentication big-data machine-learning google abuse antispam dkim via:fanf)

  • Hotels to pay royalties on music – The Irish Times – Fri, Dec 14, 2012

    ‘The operators of hotels, guesthouses and bed & breakfasts will have to pay royalties for any copyright music played in guest bedrooms [in Ireland]. […] Under the agreement, the music charges will be set by Phonographic Performance Ireland Ltd (PPI). […] When it initiated its case in 2010, the PPI said it was seeking payment of about €1 per bedroom per week or about 14 cent a night.’ I don’t understand this. Most hotels do not play music in the rooms themselves. Does this apply if there is no music playing in the bedroom? Does it apply if the customer brings their own music? Are Dublin Bus to be next?

    (tags: hotels ppi ireland music money royalties)

  • The Mathematical Hacker

    ‘The trouble with the Lisp-hacker tradition is that it is overly focused on the problem of programming — compilers, abstraction, editors, and so forth — rather than the problems outside the programmer’s cubicle. I conjecture that the Lisp-school essayists — Raymond, Graham, and Yegge — have not “needed mathematics” because they spend their time worrying about how to make code more abstract. This kind of thinking may lead to compact, powerful code bases, but in the language of economics, there is an opportunity cost.’

    (tags: mathematics coding maths essay hackers lisp fortran)

  • The Aggregate Magic Algorithms

    Obscure, low-level bit-twiddling tricks — specifically:

    Absolute Value of a Float, Alignment of Pointers, Average of Integers, Bit Reversal, Comparison of Float Values, Comparison to Mask Conversion, Divide Rounding, Dual-Linked List with One Pointer Field, GPU Any, GPU SyncBlocks, Gray Code Conversion, Integer Constant Multiply, Integer Minimum or Maximum, Integer Power, Integer Selection, Is Power of 2, Leading Zero Count, Least Significant 1 Bit, Log2 of an Integer, Next Largest Power of 2, Most Significant 1 Bit, Natural Data Type Precision Conversions, Polynomials, Population Count (Ones Count), Shift-and-Add Optimization, Sign Extension, Swap Values Without a Temporary, SIMD Within A Register (SWAR) Operations, Trailing Zero Count.
    Many of these would be insane to use in anything other than the hottest of hot-spots, but good to have on file. (via Toby diPasquale)

    (tags: hot-spots optimisation bit-twiddling algorithms via:codeslinger snippets)
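
    To give a flavour of three of the listed tricks (Is Power of 2, Least Significant 1 Bit, Next Largest Power of 2), here is a minimal Python sketch. The function names are mine, and the last one assumes a 32-bit input in the range 1 to 2**31; the linked page has the authoritative C versions.

```python
def is_power_of_2(x: int) -> bool:
    # A power of two has exactly one bit set, so clearing the lowest
    # set bit with x & (x - 1) must leave zero.
    return x > 0 and (x & (x - 1)) == 0


def least_significant_1_bit(x: int) -> int:
    # In two's complement, -x flips every bit above the lowest set bit,
    # so the AND keeps only that bit.
    return x & -x


def next_largest_power_of_2(x: int) -> int:
    """Smallest power of two >= x, assuming 1 <= x <= 2**31."""
    x -= 1
    # Smear the highest set bit into every lower position...
    x |= x >> 1
    x |= x >> 2
    x |= x >> 4
    x |= x >> 8
    x |= x >> 16
    # ...then add one to carry into the next power of two.
    return (x + 1) & 0xFFFFFFFF
```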

  • Shell Scripts Are Like Gremlins

    Shell Scripts are like Gremlins. You start out with one adorably cute shell script. You commented it and it does one thing really well. It’s easy to read, everyone can use it. It’s awesome! Then you accidentally spill some water on it, or feed it late one night and omgwtf is happening!?
    +1. I have to wean myself off the habit of automating with shell scripts where a clean, well-unit-tested piece of code would work better.

    (tags: shell-scripts scripting coding automation sysadmin devops chef deployment)

Links for 2012-12-13

Links for 2012-12-12

Links for 2012-12-11

  • Damn Fine Print

    lovely signed and editioned prints by Dublin’s best illustrators at good prices. Turns out this was in connection with a show a few days ago, so the best ones are now sold out — I love the Chris Judge Liberty Hall print — but there’s still a few good ones left. Brian Gallagher’s Georgian doorway is a beauty.

    (tags: illustration dublin prints art chris-judge)

Links for 2012-12-10

  • A map of Dublin from 1686

    via Come Here To Me — ‘The whole population of the county at the time was under 60,000. Ringsend, Merrion, Monkstown, Bullock and Dalkey on the Southside and Ballybough, Clontarf, Sutton and Hoath/Howth on the Northside are marked. Taken from the book Dublin: through space and time (2001).’

    Massive tracts of land were reclaimed since then, clearly — the North bay comes all the way in to Ballybough!

    (tags: via:chtm maps dublin ireland history)

  • Back-up Tut and other decoy spatial antiquities

    I like this idea — a complete facsimile of King Tut’s burial chamber. Bldgblog comments:

    “On the 90th anniversary of the discovery of King Tut’s tomb, an “authorized facsimile of the burial chamber” has been created, complete “with sarcophagus, sarcophagus lid and the missing fragment from the south wall.” The resulting duplicate, created with the help of high-res cameras and lasers, is “an exact facsimile of the burial chamber,” one that is now “being sent to Cairo by The Ministry of Tourism of Egypt.” […]

    ‘Interestingly, we read that this was “done under a licence to the University of Basel,” which implies the very real possibility that unlicensed duplicate rooms might also someday be produced—that is, pirate interiors ripped or printed from the original data set, like building-scale “physibles,” a kind of infringed architecture of object torrents taking shape as inhabitable rooms.’ […]

    ‘In their book Anachronic Renaissance, for instance, Alexander Nagel and Christopher Wood write of what they call a long “chain of effective substitutions” or “effective surrogates for lost originals” that nonetheless reached the value and status of an icon in medieval Europe. “[O]ne might know that [these objects] were fabricated in the present or in the recent past,” Nagel and Wood write, “but at the same time value them and use them as if they were very old things.” They call this seeing in “substitutional terms”.’

    (tags: via:new-aesthetic bldgblog archaeology facsimiles copying king-tut egypt history 3d-printing physibles)

Links for 2012-12-06

  • low-gc-membuffers

    “This project aims at creating a simple efficient building block for “Big Data” libraries, applications and frameworks; thing that can be used as an in-memory, bounded queue with opaque values (sequence of JDK primitive values): insertions at tail, removal from head, single entry peeks), and that has minimal garbage collection overhead. Insertions and removals are as individual entries, which are sub-sequences of the full buffer. GC overhead minimization is achieved by use of direct ByteBuffers (memory allocated outside of GC-prone heap); and bounded nature by only supporting storage of simple primitive value (byte, `long’) sequences where size is explicitly known. Conceptually memory buffers are just simple circular buffers (ring buffers) that hold a sequence of primitive values, bit like arrays, but in a way that allows dynamic automatic resizings of the underlying storage. Library supports efficient reusing and sharing of underlying segments for sets of buffers, although for many use cases a single buffer suffices.”

    (tags: gc java jvm bytebuffer)
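
    The circular-buffer idea in the description can be sketched in a few lines. This is a Python illustration of the concept only, not the library's API: the class name, the 4-byte length prefix and the fixed capacity are all assumptions, and it leaves out the segment sharing and automatic resizing the project describes.

```python
class ByteRingQueue:
    """Bounded FIFO of byte strings kept in one preallocated bytearray."""

    HEADER = 4  # each entry is prefixed with its length as 4 big-endian bytes

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)
        self._capacity = capacity
        self._head = 0  # index of the oldest stored byte
        self._used = 0  # number of bytes currently occupied

    def _write(self, data: bytes) -> None:
        # Copy in at the tail, wrapping around the end of the array.
        tail = (self._head + self._used) % self._capacity
        first = min(len(data), self._capacity - tail)
        self._buf[tail:tail + first] = data[:first]
        self._buf[0:len(data) - first] = data[first:]
        self._used += len(data)

    def _read(self, offset: int, length: int) -> bytes:
        # Read `length` bytes starting `offset` bytes past the head.
        start = (self._head + offset) % self._capacity
        first = min(length, self._capacity - start)
        return bytes(self._buf[start:start + first]) + bytes(self._buf[0:length - first])

    def append(self, entry: bytes) -> bool:
        """Insert at the tail; returns False if the entry does not fit."""
        needed = self.HEADER + len(entry)
        if needed > self._capacity - self._used:
            return False
        self._write(len(entry).to_bytes(self.HEADER, "big"))
        self._write(entry)
        return True

    def peek(self):
        """Return the oldest entry without removing it, or None if empty."""
        if self._used == 0:
            return None
        length = int.from_bytes(self._read(0, self.HEADER), "big")
        return self._read(self.HEADER, length)

    def pop(self):
        """Remove and return the oldest entry, or None if empty."""
        entry = self.peek()
        if entry is not None:
            consumed = self.HEADER + len(entry)
            self._head = (self._head + consumed) % self._capacity
            self._used -= consumed
        return entry
```

    Because entries are stored as raw bytes inside one long-lived array, appending and popping never create per-entry container objects. That is the same property the library gets from direct ByteBuffers on the JVM.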

Links for 2012-12-03

  • Scoop! The inside story of the news website that saved the BBC

    The Register’s take on the early days of www.bbc.co.uk. Lots of politics, unsurprisingly.

    Fifteen years ago this month the BBC launched its News Online website. Developed internally with a skeleton team, the web service rapidly became the face of the BBC on the internet, and its biggest success story – winning four successive BAFTA awards. Remarkably, it operated at a third of the cost of rival commercial online news operations – unheard of in public-sector IT projects. Devised before there were really any content management systems, the technical architecture became a template for all major news systems, and one that’s still in use today. The team endured some furious internal politicking and sabotage to survive.

    (tags: bbc news history web uk the-register)

  • Irish mobile phone companies: still spammy

    ‘Pro tip: if you’re going to spam, try not to spam the DPC’s Director of Investigations.’ — lolz

    (tags: funny oh-dear three hutchinson ireland mobile spam dpc law)

  • Hamming weight

    Wikipedia page.

    The Hamming weight of a string is the number of symbols that are different from the zero-symbol of the alphabet used. It is thus equivalent to the Hamming distance from the all-zero string of the same length. For the most typical case, a string of bits, this is the number of 1’s in the string. In this binary case, it is also called the population count, popcount or sideways sum. It is the digit sum of the binary representation of a given number.
    Contains an efficient algorithm to compute this for a given long value, by ‘adding counts in a tree pattern.’

    (tags: algorithms hamming-distance bits hamming weight binary)
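
    The 'adding counts in a tree pattern' approach looks roughly like this for a 64-bit value. This is a Python sketch of the well-known technique rather than a copy of the Wikipedia listing; the function name is mine.

```python
def popcount64(x: int) -> int:
    """Count the 1 bits in the low 64 bits of x by summing in a tree pattern."""
    x &= 0xFFFFFFFFFFFFFFFF
    # Each step adds neighbouring fields together, doubling the field
    # width: 1-bit counts -> 2-bit -> 4-bit -> ... -> one 64-bit count.
    x = (x & 0x5555555555555555) + ((x >> 1) & 0x5555555555555555)
    x = (x & 0x3333333333333333) + ((x >> 2) & 0x3333333333333333)
    x = (x & 0x0F0F0F0F0F0F0F0F) + ((x >> 4) & 0x0F0F0F0F0F0F0F0F)
    x = (x & 0x00FF00FF00FF00FF) + ((x >> 8) & 0x00FF00FF00FF00FF)
    x = (x & 0x0000FFFF0000FFFF) + ((x >> 16) & 0x0000FFFF0000FFFF)
    x = (x & 0x00000000FFFFFFFF) + ((x >> 32) & 0x00000000FFFFFFFF)
    return x
```

    Six masked additions replace a 64-iteration loop, which is why the trick appeals in hot paths. In ordinary Python code `bin(x).count("1")` is the simpler choice.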

  • Efficient concurrent long set and map

    An ordered set and map data structure and algorithm for long keys and values, supporting concurrent reads by multiple threads and updates by a single thread.
    Some good stuff in the linked blog posts about Clojure’s PersistentHashMap and PersistentVector data structures, too.

    (tags: arrays java tries data-structures persistent clojure concurrent set map)

Links for 2012-11-28

  • The Rise And Fall Of The Obscure Music Download Blog: A Roundtable

    One internet music “sharing” trend largely unnoticed by the powers that sue was the niche explosion of obscure music download blogs, lasting roughly from 2004-2008. Using free filesharing services like Rapidshare and Mediafire, and setting up sites on Blogspot and similar providers, these internet hubs stayed hidden in the open by catering to more discerning kleptomaniac audiophiles. Their specialty: parceling out ripped recordings — many of them copyrighted — from the more collectible and unknown corners of music’s oddball, anomalous past. While the RIAA was suing dead people for downloading Michael Jackson songs (and Madonna was using Soulseek to curse at teenagers), obscure music blogs racked up millions of hits, ripping and sharing 80s Japanese noise, 70s German prog, 60s San Francisco hippie freak-outs, 50s John Cage bootlegs, 30s gramophone oddities, Norwegian death metal, cold wave cassettes made by kids in their garages, and the like. It was the mid aughts, and the advent of digitization had inadvertently put the value of the music industry’s “Top Ten” commercial product in peril. That same process transformed the value of old, collectible music as well. If one smart record collector was able to share the entire contents—music, artwork and all—of one vinyl LP on his blog, for free, and upload another item from his 1,000+ collection the next day, for weeks and years, and others like him did the same, competing with each other about who could upload the rarest and most sought-after record, and anyone who downloaded it could then share it again and again… Suddenly everyone in the world had the coolest record collection in the world; and soon, nobody in the world had the coolest record collection in the world. Obscure music download blogs weren’t shut down like Napster or Megaupload were (though they were indirectly affected by that crackdown); they just, mysteriously, seemed to burn out on their own sometime around 2008. 
    While some are still around, their number represents only a fraction of that mid-00s heyday. Was this because obscure music blogs had overshared the underexposed and blown the whole thing into oblivion? Is the fact that a guy in Japan will no longer pay $500 on eBay for a first pressing of the No New York compilation because he can find it for free on the internet good for the world? Was the commodity-lost but the knowledge-gained an even exchange? To explore what was going on then, I assembled this email roundtable discussion between creators of some of the most popular blogs of the time: Eric Lumbleau of Mutant Sounds, Liam Elms of 8 Days in April, Frank of Systems of Romance and Brian Turner, Music Director of WFMU.
    (via Loreana Rushe)

    (tags: music mp3 blogs obscure via-loreana-rushe history 2000s)

Links for 2012-11-26

  • Conor’s 2012 Raspberry Pi Christmas Gift Guide

    Ah, memories! Wish my kiddies were old enough for one of these…

    I really think this Christmas could be a lovely replay of 1982 for a lot of people, like me, who got their first home computer that year. You could have so much fun on Christmas Day messing with the RPi rather than falling asleep in front of the fire. Just don’t fight over who gets the telly when Doctor Who is on. Whilst the bare-bones nature of the Raspberry Pi is wonderful, it is unusable out of the box unless you are a house with smartphones, digital cameras and existing PCs already that you can raid for components. What you want to avoid is a repeat of me that December in 1982 with my brand-new 16K ZX Spectrum which didn’t work on our Nordmende TV until two weeks later when the RTV Rentals guy came and replaced the TV Tuner. Two weeks typing Beep 1,2 to make sure it wasn’t broken.

    (tags: raspberry-pi gifts computers kids hacking education gadgets christmas)

  • Nintendo’s work on Miiverse Penis Drawing Detection

    ‘The unique feature of the Miiverse is being able to send drawings, not just text. But since the advent of the internet, there have always been those who have used it for unsavory purposes.’
    ‘Motoyama: we never had such a problem with our Hatena services. But, when we brought Hatena Flipnote to the West, we were caught off-guard by the amount of penises drawn by people.
    Kurisu: So the team and I had to come up with a way to create a system that auto-detects those types of pictures. […]
    ‘Motoyama: After a week, we made very good progress on the system. Then we tested the system with Nintendo of America and told them to start drawing. It went horribly.
    Kurisu: What we learned is that people enjoy drawing penises. Multiple ones. (laughs) The system was not prepared to handle that.’
    See also the “time-to-penis” metric in MMO games: http://www.joystiq.com/2009/03/24/overheard-gdc09-ttp-time-to-penis/

    (tags: nintendo image-detection ttp metrics games gaming mmo miiverse drawing)

  • The trench talk that is now entrenched in the English language

    ‘From cushy to crummy and blind spot to binge drink, a new study reveals the impact the First World War had on the English language and the words it introduced.’ Incredible comments, too…

    (tags: english etymology history wwi great-war via:sinead-gleeson words language)

  • Special encoding of small aggregate data types in Redis

    Nice performance trick in Redis on hash storage: ‘In theory in order to guarantee that we perform lookups in constant time (also known as O(1) in big O notation) there is the need to use a data structure with a constant time complexity in the average case, like an hash table. But many times hashes contain just a few fields. When hashes are small we can instead just encode them in an O(N) data structure, like a linear array with length-prefixed key value pairs. Since we do this only when N is small, the amortized time for HGET and HSET commands is still O(1): the hash will be converted into a real hash table as soon as the number of elements it contains will grow too much (you can configure the limit in redis.conf). This does not work well just from the point of view of time complexity, but also from the point of view of constant times, since a linear array of key value pairs happens to play very well with the CPU cache (it has a better cache locality than an hash table).’

    (tags: memory redis performance big-o hash-tables storage coding cache arrays)
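
    The trick is easy to mimic. Below is a minimal Python sketch of the idea, not Redis's actual implementation: the class name, method names and the `max_compact_entries` threshold are hypothetical stand-ins for the real encoding and its redis.conf limit.

```python
class SmallHash:
    """Hash that stays a flat list of pairs until it grows past a threshold."""

    def __init__(self, max_compact_entries: int = 8) -> None:
        self._max = max_compact_entries
        self._pairs = []    # compact form: linear list of (field, value)
        self._table = None  # real hash table, created only when needed

    def hset(self, field, value) -> None:
        if self._table is not None:
            self._table[field] = value
            return
        # O(N) scan, but N is capped at the threshold.
        for i, (existing, _) in enumerate(self._pairs):
            if existing == field:
                self._pairs[i] = (field, value)
                return
        self._pairs.append((field, value))
        if len(self._pairs) > self._max:
            # Too big for the compact form: convert once, then stay a dict.
            self._table = dict(self._pairs)
            self._pairs = []

    def hget(self, field):
        if self._table is not None:
            return self._table.get(field)
        for existing, value in self._pairs:
            if existing == field:
                return value
        return None

    def is_compact(self) -> bool:
        return self._table is None
```

    In Python the flat list brings no cache-locality benefit, so this only shows the control flow; the memory and cache win described in the quote comes from Redis packing the pairs into one contiguous allocation.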

  • HTTP Error 403: The service you requested is restricted – Vodafone Community

    Looks like Vodafone Ireland are failing to scale their censorware; clients on their network are reporting “HTTP Error 403: The service you requested is restricted”. According to a third-party site, this error is produced by the censorship software they use when it’s insufficiently scaled for demand:

    “When you try to use HTTP Vodafone route a request to their authentication server to see if your account is allow to connect to the site. By default they block a list of adult/premium web sites (this is service you have switched on or off with your account). The problem is at busy times this validation service is overloaded and so their systems get no response as to whether the site is allowed, so assume the site you asked for is restricted and gives the 403 error. Once this happens you seem to have to make new 3G data connection (reset the phone, move cell or let the connection time out) to get it to try again.”
    Sample: http://pic.twitter.com/N1lAwBjW

    (tags: scaling ireland vodafone fail censorware scalability customer-service)

Links for 2012-11-24

Links for 2012-11-23

  • IBM insider: How I caught my wife while bug-hunting on OS/2 • The Register

    Wow, working for IBM in the 80’s was truly shitty.

    ‘IBM HR came up with a plan that summed up the department’s view of tech staff: a dinner dance. In Southsea. For our non-British readers this is not a glamorous location. As a scumbag contractor I wasn’t invited, but since I was dating one of the seven women on the project, I went anyway and was impressed by the way IBM had tried so very hard to make the inside of a municipal leisure centre look like Hawaii. This is so crap that the integrity checks I’ve installed to watch myself for incipient senility keep flagging it as a false memory. The only way I can force myself to believe the idea that the richest corporation on the planet behaved that way is that the girl who took me is now a reassuringly expensive lawyer who was kind enough to marry me and so we have photographic evidence. (I wish to make it clear that I’m not saying IBM had the worst HR of any firm in the world, merely that my 28 years in technology and banking have never exposed a worse one to me.)’
    And indeed, so were MS:
    ‘We, on the other hand, were regarded as hopelessly bureaucratic. After Microsoft lost the source code for the actual build of OS/2 we shipped, I reported a bug triggered when you double-clicked on Chkdsk twice: the program would fire up twice and both would try to fix the disk at the same time, causing corruption. I noted that this “may not be consistent with the user’s goals as he sees them at this time”. This was labelled a user error, and some guy called Ballmer questioned why I had this “obsession” with perfect code.’
    (thanks, Conor!)

    (tags: via:conor-delaney os2 ibm microsoft work 1980s pc uk steve-ballmer)

Links for 2012-11-21

Links for 2012-11-19

  • drip

    Unlike other tools intended to solve the JVM startup problem (e.g. Nailgun, Cake), Drip does not use a persistent JVM. There are many pitfalls to using a persistent JVM, which we discovered while working on the Cake build tool for Clojure. The main problem is that the state of the persistent JVM gets dirty over time, producing strange errors and requiring liberal use of cake kill whenever any error is encountered, just in case dirty state is the cause. Instead of going down this road, Drip uses a different strategy. It keeps a fresh JVM spun up in reserve with the correct classpath and other JVM options so you can quickly connect and use it when needed, then throw it away. Drip hashes the JVM options and stores information about how to connect to the JVM in a directory with the hash value as its name.
    (via HN)

    (tags: java command-line tools startup speed)
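
    The last step, keying the spare JVM's connection details on a hash of its options, might look something like this. It is purely an illustration: Drip itself is not written in Python, and the function name, the SHA-1 digest, the NUL separator and the base directory are all assumptions of mine.

```python
import hashlib
from pathlib import Path
from typing import Sequence


def spare_jvm_dir(base_dir: str, jvm_args: Sequence[str]) -> Path:
    """Directory holding connection info for a JVM started with `jvm_args`.

    Hypothetical layout: <base_dir>/<hex digest of the options>.
    """
    # Join with NUL so ["-cp", "a b"] and ["-cp", "a", "b"] hash differently.
    joined = "\0".join(jvm_args).encode("utf-8")
    return Path(base_dir) / hashlib.sha1(joined).hexdigest()


print(spare_jvm_dir("/tmp/drip-sketch", ["-cp", "app.jar", "-Xmx512m"]))
```

    Identical options always map to the same directory, so a launcher can find a matching warm JVM, while any change to the classpath or flags lands in a different directory and gets a fresh one.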

Links for 2012-11-14

Links for 2012-11-08

Links for 2012-10-31

  • The Future of Markdown

    ‘I’d really prefer not to fork the language; I’d much rather collectively help carry the banner of Markdown forward into the future, with the blessing of John Gruber and in collaboration with other popular sites that use Markdown. So… who’s with me?’

    (tags: markdown markup html web standards)

Links for 2012-10-28

  • SipHash: a fast short-input PRF

    a family of pseudorandom functions optimized for short inputs. Target applications include network traffic authentication and hash-table lookups protected against hash-flooding denials-of-service attacks. SipHash is simpler than MACs based on universal hashing, and faster on short inputs. Compared to dedicated designs for hash-table lookup, SipHash has well-defined security goals and competitive performance. For example, SipHash processes a 16-byte input with a fresh key in 140 cycles on an AMD FX-8150 processor, which is much faster than state-of-the-art MACs.

    (tags: hashing siphash djb security algorithms)
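
    For the curious, SipHash-2-4 (two compression rounds per 8-byte word, four finalization rounds) is short enough to sketch in Python. This is a readable illustration based on the published algorithm, not a tuned or audited implementation; the function names are mine, and production code should use a vetted library.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF


def _rotl(x: int, b: int) -> int:
    # Rotate a 64-bit value left by b bits.
    return ((x << b) | (x >> (64 - b))) & MASK64


def _sipround(v0, v1, v2, v3):
    v0 = (v0 + v1) & MASK64; v1 = _rotl(v1, 13); v1 ^= v0; v0 = _rotl(v0, 32)
    v2 = (v2 + v3) & MASK64; v3 = _rotl(v3, 16); v3 ^= v2
    v0 = (v0 + v3) & MASK64; v3 = _rotl(v3, 21); v3 ^= v0
    v2 = (v2 + v1) & MASK64; v1 = _rotl(v1, 17); v1 ^= v2; v2 = _rotl(v2, 32)
    return v0, v1, v2, v3


def siphash24(key: bytes, data: bytes) -> int:
    """Return the 64-bit SipHash-2-4 of `data` under a 16-byte `key`."""
    if len(key) != 16:
        raise ValueError("key must be exactly 16 bytes")
    k0 = int.from_bytes(key[:8], "little")
    k1 = int.from_bytes(key[8:], "little")
    # Initial state: key halves XORed with fixed ASCII constants.
    v0 = k0 ^ 0x736F6D6570736575
    v1 = k1 ^ 0x646F72616E646F6D
    v2 = k0 ^ 0x6C7967656E657261
    v3 = k1 ^ 0x7465646279746573

    # Split into little-endian 64-bit words; the final word carries the
    # leftover bytes plus the message length (mod 256) in its top byte.
    full = len(data) - len(data) % 8
    words = [int.from_bytes(data[i:i + 8], "little") for i in range(0, full, 8)]
    words.append(int.from_bytes(data[full:], "little") | ((len(data) & 0xFF) << 56))

    for m in words:
        v3 ^= m
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
        v0 ^= m

    v2 ^= 0xFF
    for _ in range(4):
        v0, v1, v2, v3 = _sipround(v0, v1, v2, v3)
    return v0 ^ v1 ^ v2 ^ v3
```

    The protection against hash-flooding comes from the secret key: without it an attacker cannot predict which inputs collide in the table, so the key should be random per process or per table.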

Links for 2012-10-27

Links for 2012-10-26

Flood of posts

Sorry for the flood of recent posts — turns out my cron job to gateway from Pinboard had stopped running due to cron fail. (I should really set up some monitoring someday ;)

Links for 2012-10-25

Links for 2012-10-24

Links for 2012-10-12

  • ElementCostInDataStructures

    “The cost per element in major data structures offered by Java and Guava (r11).” A very useful reference!

    Ever wondered what’s the cost of adding each entry to a HashMap? Or one new element in a TreeSet? Here are the answers: the cost per-entry for each well-known structure in Java and Guava. You can use this to estimate the cost of a structure, like this: if the per-entry cost of a structure is 32 bytes, and your structure contains 1024 elements, the structure’s footprint will be around 32 kilobytes. Note that non-tree mutable structures are amortized (adding an element might trigger a resize, and be expensive, otherwise it would be cheap), making the measurement of the “average per element cost” measurement hard, but you can expect that the real answers are close to what is reported below.

    (tags: java coding guava reference memory cost performance data-structures)
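
    The estimate described above is a single multiplication. A trivial sketch (the function name is mine; 32 bytes per entry is the example figure from the quote, not a measured value for any particular structure):

```python
def estimated_footprint_bytes(per_entry_bytes: int, num_entries: int) -> int:
    """Rough footprint of a collection: per-entry cost times entry count."""
    return per_entry_bytes * num_entries


total = estimated_footprint_bytes(32, 1024)
print(total, "bytes =", total // 1024, "KiB")  # 32768 bytes = 32 KiB
```

    For the amortized structures the quote mentions, treat the result as an average: just after a resize the backing array can be noticeably larger than this figure suggests.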