Justin's Linklog Posts

Links for 2012-09-05

  • Estonia introduces coding classes to 8-year-olds

    ‘ProgreTiiger education will start with students in the first grade, which starts around the age of 7 or 8 for Estonians. The compsci education will continue through a student’s final years of public school, around age 16. Teachers are being trained on the new skills, and private sector IT companies are also getting involved, which makes sense, given that these entities will likely end up being the long-term beneficiaries of a technologically literate populace. The ProgreTiiger program is launching at a few pilot schools and will soon be rolling out to all general education schools in Estonia.’

    (tags: estonia education coding programming kids children students learning school)

  • Avoiding Hash Lookups in a Ruby Implementation

    ‘If I were to sum up the past 6 years I’ve spent optimizing JRuby it would be with the following phrase: Get Rid Of Hash Lookups.’ This has been a particular theme of some recent optimization hacks I’ve been working on. Hashes may be O(1) to read, on average, but that doesn’t necessarily mean they’re the right tool for performance… (via Declan McGrath)

    (tags: via:declanmcgrath hash optimization ruby performance jruby hashing data-structures big-o optimisation)

Links for 2012-09-01

  • Striped (Guava: Google Core Libraries for Java 13.0.1 API)

    Nice piece of Guava concurrency infrastructure in the latest release:

    A striped Lock/Semaphore/ReadWriteLock. This offers the underlying lock striping similar to that of ConcurrentHashMap in a reusable form, and extends it for semaphores and read-write locks. Conceptually, lock striping is the technique of dividing a lock into many stripes, increasing the granularity of a single lock and allowing independent operations to lock different stripes and proceed concurrently, instead of creating contention for a single lock.
    The guarantee provided by this class is that equal keys lead to the same lock (or semaphore), i.e. if (key1.equals(key2)) then striped.get(key1) == striped.get(key2) (assuming Object.hashCode() is correctly implemented for the keys). Note that if key1 is not equal to key2, it is not guaranteed that striped.get(key1) != striped.get(key2); the elements might nevertheless be mapped to the same lock. The lower the number of stripes, the higher the probability of this happening.
    Prior to this class, one might be tempted to use Map<K, Lock>, where K represents the task. This maximizes concurrency by having each unique key mapped to a unique lock, but also maximizes memory footprint. On the other extreme, one could use a single lock for all tasks, which minimizes memory footprint but also minimizes concurrency. Instead of choosing either of these extremes, Striped allows the user to trade between required concurrency and memory footprint. For example, if a set of tasks are CPU-bound, one could easily create a very compact Striped<Lock> of availableProcessors() * 4 stripes, instead of possibly thousands of locks which could be created in a Map<K, Lock> structure.

    (tags: locking concurrency java guava semaphores coding via:twitter)
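
    The lock-striping idea quoted above can be sketched in a few lines. This is a minimal illustration in Python, not Guava's implementation: StripedLock, run_demo and their parameters are names invented here, and the key-to-stripe mapping is a plain hash modulo the stripe count.

```python
import threading


class StripedLock:
    """A fixed pool of locks; each key is mapped to one stripe by its hash."""

    def __init__(self, stripes):
        if stripes <= 0:
            raise ValueError("stripes must be positive")
        self._locks = [threading.Lock() for _ in range(stripes)]

    def get(self, key):
        # Equal keys have equal hashes, so they always get the same lock.
        # Unequal keys may share a stripe: that costs concurrency, not safety.
        return self._locks[hash(key) % len(self._locks)]


def run_demo(keys, stripes=4, workers_per_key=4, increments=1000):
    """Increment one counter per key from several threads (hypothetical demo)."""
    striped = StripedLock(stripes)
    # One mutable cell per key; the dict itself is never modified afterwards.
    cells = {key: [0] for key in keys}

    def worker(key):
        for _ in range(increments):
            with striped.get(key):
                cells[key][0] += 1

    threads = [threading.Thread(target=worker, args=(key,))
               for key in keys for _ in range(workers_per_key)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return {key: cell[0] for key, cell in cells.items()}
```

    The memory cost is fixed by the number of stripes rather than the number of keys, which is the trade-off the Javadoc describes.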

  • HotSpot JVM garbage collection options cheat sheet (v2)

    ‘In this article I have collected a list of options related to GC tuning in JVM. This is not a comprehensive list, I have only collected options which I use in practice (or at least understand why I may want to use them). Compared to previous version a few useful diagnostic options was added. Additionally section for G1 specific options was introduced.’

    (tags: hotspot jvm coding gc java performance)

  • Martin “Disruptor” Thompson’s Single Writer Principle

    Contains these millisecond estimates for highly-contended inter-thread signalling when incrementing a 64-bit counter in Java:

    Method                              Time (ms)
    One Thread                                300
    One Thread with Memory Barrier          4,700
    One Thread with CAS                     5,700
    Two Threads with CAS                   18,000
    One Thread with Lock                   10,000
    Two Threads with Lock                 118,000

    Undoubtedly not realistic for a lot of cases, but it’s still useful for order-of-magnitude estimates of locking cost. Bottom line: avoid locks where you can, and note that even ‘volatile’ or AtomicFoo types cost an order of magnitude more than a plain single-threaded increment.

    (tags: java jvm performance coding concurrency threading cas locking)
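
    A rough harness for this kind of comparison might look like the sketch below. It is in Python rather than Java, so the absolute numbers are not comparable with the table above (CPython's global interpreter lock changes the picture entirely). It only shows the shape of the measurement, and all function names are invented for illustration.

```python
import threading
import time


def plain_increment(n):
    """Single thread, no synchronisation at all."""
    counter = 0
    for _ in range(n):
        counter += 1
    return counter


def locked_increment(n):
    """Single thread, taking an uncontended lock around every increment."""
    lock = threading.Lock()
    counter = 0
    for _ in range(n):
        with lock:
            counter += 1
    return counter


def contended_increment(n, threads=2):
    """Several threads incrementing one shared counter under one lock."""
    lock = threading.Lock()
    total = [0]

    def worker():
        for _ in range(n):
            with lock:
                total[0] += 1

    workers = [threading.Thread(target=worker) for _ in range(threads)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return total[0]


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

    Calling timed(plain_increment, n), timed(locked_increment, n) and timed(contended_increment, n) with the same n gives three elapsed times to compare.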

  • Locks & Condition Variables – Latency Impact

    Firstly, this is 3 orders of magnitude greater latency than what I illustrated in the previous article using just memory barriers to signal between threads. This cost comes about because the kernel needs to get involved to arbitrate between the threads for the lock, and then manage the scheduling for the threads to awaken when the condition is signalled. The one-way latency to signal a change is pretty much the same as what is considered current state of the art for network hops between nodes via a switch. It is possible to get ~1µs latency with InfiniBand and less than 5µs with 10GigE and user-space IP stacks. Secondly, the impact is clear when letting the OS choose what CPUs the threads get scheduled on rather than pinning them manually. I’ve observed this same issue across many use cases whereby Linux, in default configuration for its scheduler, will greatly impact the performance of a low-latency system by scheduling threads on different cores resulting in cache pollution. Windows by default seems to make a better job of this.

    (tags: locking concurrency java jvm signalling locks linux threading)

  • Evolution of SoundCloud’s Architecture

    nice write-up. nginx, Rails, RabbitMQ, MySQL, Cassandra, Elastic Search, HAProxy

    (tags: soundcloud webdev architecture scaling scalability)

Links for 2012-08-31

  • What Happens to Stolen Bicycles?

    ‘Bike thievery is essentially a risk-free crime. If you were a criminal, that might just strike your fancy. If Goldman Sachs didn’t have more profitable market inefficiencies to exploit, they might be out there arbitraging stolen bikes.’ Good summary, and I suspect a lot applies in Dublin too — flea markets and vanloads of stolen bikes being sent to other cities for reselling.

    (tags: via:hn economics crime bikes theft goldman-sachs)

Links for 2012-08-19

  • 1024cores

    Some good algorithms and notes by Dmitry Vyukov on ‘lockfree, waitfree, obstruction-free synchronization algorithms and data structures, scalability-oriented architecture, multicore/multiprocessor design patterns, high-performance computing, threading technologies and libraries (OpenMP, TBB, PPL), message-passing systems and related topics.’ The catalog of lock-free queue implementations is particularly extensive (via Sergio Bossa)

    (tags: algorithms concurrency articles dmitry-vyukov go c++ coding via:sergio-bossa)

Links for 2012-08-12

  • Sting op exposes Andrews over FF Twitter rants – National News – Independent.ie

    Incredible sting op uncovers the real identity of an anonymous Twitter account posting Fianna Fail gossip:

    He discovered that each tweet had originated from the Twitter web interface, meaning it had been posted from a web browser on a computer, rather than sent from a mobile phone or other portable device. Based on the times that tweets were posted by @brianformerff, he deduced that the Tweets were being posted while the user was on a work break, using a company computer or an internet cafe. The next stage in the hunt was uncovering the IP address of the computer where the tweets originated. “I created my own web redirection service which would allow me to take links to articles of interest, for example in the Irish Times, and then transform them into short links that would pass through a redirection server I controlled. In this way, if someone read the tweets and clicked on the link, I would be able to establish the IP address of the computer that was being used at the time.” The author created a new twitter account, @john_cant_type, based on the persona of a politics student based in Kildare. He started sending several messages and tweets to “brian” and other users to establish himself as a genuine twitter user. Eventually @brianformerff responded to a post from @john_cant_type to a link to an article at Silicon Republic. The bait was taken and the IP address was tracked to an internet cafe, Amazon cyber/net Rathmines which offers web access “at the very reasonable rate of €1/hour”. What happened next descended almost into the realms of farce. The author waited for tweets from @brianformerff and then rushed to the internet cafe to try and catch Chris Andrews. Eventually the plan worked and the author used photography and video surveillance, even taking covert photographs of tweets as they were being posted in the internet cafe by Chris Andrews and analysing if the word count and structure matched the tweets appearing in cyberspace under the tag @brianformerff.

    (tags: chris-andrews twitter surveillance privacy anonymity politics ireland fianna-fail)

  • Rootbeer

    The Rootbeer GPU Compiler makes it easy to use Graphics Processing Units from within Java. Rootbeer is more advanced than CUDA or OpenCL Java Language Bindings. With bindings the developer must serialize complex graphs of objects into arrays of primitive types. With Rootbeer this is done automatically. Also with language bindings, the developer must write the GPU kernel in CUDA or OpenCL. With Rootbeer a static analysis of the Java Bytecode is done (using Soot) and CUDA code is automatically generated. […] All of the familiar Java code you have been writing can be executed on the GPU.

    (tags: gpu java coding cuda compiler)

Links for 2012-08-09

  • “In Which The Irish Invent Twitter in 1984”

    A fascinating story of 1980s tech history — ‘The initial Text Tell PX-1000 was developed by Text Lite Ltd. in Ireland in the early 1980s, probably in 1983. It allowed people to create simple text messages and send them by phone anywhere in the world. It had a built-in memory that could hold up to 7400 characters. The firmware inside the PX-1000 was written by West-Tec Ltd. in Ireland, who were probably also the hardware manufacturers. [… A later version was] the Philips version of the PX-1000Cr, as it features advanced cryptographic capabilities. It was intended for small companies and journalists, and was also used by the Dutch Government. […] it played an important role in the fight for Nelson Mandela’s release from prison.’

    (tags: nelson-mandela ireland history crypto texting text-lite 1980s philips)

Links for 2012-08-06

  • French illegal downloads agency Hadopi may be abolished

    According to recent statistics, Hadopi has sent 1 million warning emails, 99,000 “strike two” letters and identified 314 people for referral to the courts for possible disconnection. No one has actually been disconnected. According to Aurélie Filippetti, culture minister in the new French Government, Hadopi has been nothing but a waste of money. “€12 million per year and 60 officials; that’s an expensive way to send 1 million emails,” Filippetti said. “Hadopi has not fulfilled its mission of developing legal downloads. I prefer to reduce the funding of things that have not been proven to be useful.”
    0 disconnections. Not one.

    (tags: hadopi privacy law three-strikes france money)

  • NASA’s Mars Rover Crashed Into a DMCA Takedown

    An hour or so after Curiosity’s 1.31 a.m. EST landing in Gale Crater, I noticed that the space agency’s main YouTube channel had posted a 13-minute excerpt of the stream. Its title was in an uncharacteristic but completely justified all caps: “NASA LANDS CAR-SIZE ROVER BESIDE MARTIAN MOUNTAIN.” When I returned to the page ten minutes later, […] the video was gone, replaced with an alien message: “This video contains content from Scripps Local News, who has blocked it on copyright grounds. Sorry about that.” That is to say, a NASA-made public domain video posted on NASA’s official YouTube channel, documenting the landing of a $2.5 billion Mars rover mission paid for with public taxpayer money, was blocked by YouTube because of a copyright claim by a private news service.

    (tags: dmca google fail nasa copyright false-positives scripps youtube video mars)

Links for 2012-08-03

  • High-frequency trading: The fast and the furious | The Economist

    “The NYMEX panel found that Infinium had finished writing the algorithm only the day before it introduced it to the market, and had tested it for only a couple of hours in a simulated trading environment to see how it would perform. The firm’s normal testing processes take six to eight weeks. When the algorithm started its frenetic buying spree, the measures designed to shut it down automatically did not work. One was supposed to turn the system off if a maximum order size was breached, but because the machine was placing lots of small orders rather than a single big one the shutdown was not triggered. The other measure was meant to prevent Infinium from selling or buying more than a certain number of contracts, but because of an error in the way the rogue algorithm had been written, this, too, failed to spot a problem.”

    (tags: hft automation trading markets stocks nymex bugs software)
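
    The first failure described above, a per-order size limit that never fires because the orders are individually small, is easy to reproduce in miniature. The sketch below is purely illustrative: OrderGuard and its limits are invented here and say nothing about how Infinium's or NYMEX's systems were actually built.

```python
class OrderGuard:
    """Two independent safety checks on a stream of orders (hypothetical)."""

    def __init__(self, max_order_size, max_position):
        self.max_order_size = max_order_size
        self.max_position = max_position
        self.position = 0  # net contracts held so far

    def order_size_ok(self, qty):
        # Looks at one order in isolation, so many small orders all pass.
        return abs(qty) <= self.max_order_size

    def position_ok(self, qty):
        # Looks at the cumulative effect, so it catches a flood of small orders.
        return abs(self.position + qty) <= self.max_position

    def submit(self, qty):
        """Accept the order only if both checks pass; return True if accepted."""
        if not (self.order_size_ok(qty) and self.position_ok(qty)):
            return False
        self.position += qty
        return True


guard = OrderGuard(max_order_size=100, max_position=50)
# A thousand one-lot orders: none breaches the per-order limit...
passes_size_check = all(guard.order_size_ok(1) for _ in range(1000))
# ...but the cumulative limit stops the run once 50 contracts are held.
accepted = sum(guard.submit(1) for _ in range(1000))
```

    In the incident as reported, the second kind of check existed too but was itself buggy, which is why both layers need testing against exactly this order pattern.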

Links for 2012-07-30

  • Lessons in website security anti-patterns by Tesco : Troy Hunt, an Aussie software architect working on a .Net security product called ASafaWeb, does a great job extensively deconstructing Tesco’s appalling website security on their shopping site. In the process, he gets this wonderful tweet from their customer-care account: “@troyhunt Let me assure you that all customer passwords are stored securely & in line with industry standards across online retailers.” As he says, this is a clear demonstration that Tesco is in the first stage of the four stages of competence — “unconscious incompetence”: “The individual does not understand or know how to do something and does not necessarily recognise the deficit.” ( http://en.wikipedia.org/wiki/Four_stages_of_competence )
    (tags: tesco security passwords web http https ssl funny dot-net shopping uk customer-care)

  • Accident: Ryanair B738 and American B763 at Barcelona on Apr 14th 2011 : An accident report concerning a Ryanair flight.

    An American Airlines Boeing 767-300, registration N366AA performing flight AA-67 from Barcelona,SP (Spain) to New York JFK, NY (USA), had taxied to the holding point runway 25L and was holding short of the runway. A Ryanair Boeing 737-800, registration EI-EKB performing flight FR-8136 from Barcelona,SP (Spain) to Ibiza,SP (Spain) with 169 passengers and 6 crew, was taxiing along Barcelona’s taxiway K for departure from runway 25L and was manoeuvring to pass behind the Boeing 767-300. A number of passengers on board of the Boeing 737-800 observed the right hand wing of the aircraft contact the tailplane of the Boeing 767-300 and rose out of their seats attracting the attention of a flight attendant. A passenger told the flight attendant, that their aircraft had hit the aircraft besides them. The flight attendant contacted the purser, who instructed her to contact the flight deck, she contacted the flight deck and informed the captain that passengers had seen their aircraft had hit another aircraft. The captain responded however everything was fine and she continued with the takeoff about 2 minutes after the Boeing 767. Immediately after departure the passengers insisted the flight was not safe and they had collided with another aircraft, one of the passengers identified himself as an engineer. The flight attendant told the engineer that the captain had been informed and had told everything was fine. No further information was forwarded to the flight deck. After landing in Ibiza, while disembarking, the passengers again spoke up claiming the flight had been unsafe. During the turnaround the flight attendant informed the purser that one of the passengers observing the collision was an engineer. Neither approached the flight crew however. Following the return flight FR-8137 the purser talked to the captain and informed her that one of the passengers observing the collision was an engineer.
    In the following it was identified that the right hand winglet of the Boeing 737-800 had received damage, the Boeing 767-300 was found with damage to the left hand stabilizer following landing in New York.
    According to the story, it appears the AA flight crew were not informed of the potential damage to their plane before or during their transatlantic flight to JFK. (via Juan Flynn)
    (tags: via:juanflynn flight travel safety ryanair collisions)
  • CIAIAC report : The official report on that Ryanair/AA collision in Barcelona in April 2011, on pages 211-255.
    (tags: collisions safety travel air ryanair)

  • Practical machine learning tricks from the KDD 2011 best industry paper : Wow, this is a fantastic paper. It’s a Google paper on detecting scam/spam ads using machine learning — but not just that, it’s how to build out such a classifier to production scale, and make it operationally resilient, and, indeed, operable. I’ve come across a few of these ideas before, and I’m happy to say I might have reinvented a few (particularly around the feature space), but all of them together make extremely good sense. If I wind up working on large-scale classification again, this is the first paper I’ll go back to. Great info! (via Toby diPasquale.)
    (tags: classification via:codeslinger training machine-learning google ops kdd best-practices anti-spam classifiers ensemble map-reduce)

Links for 2012-07-29

  • The world’s first 3D-printed gun : I wasn’t expecting to see this for a few years. The future is ahead of schedule!

    A .22-caliber pistol, formed from a 3D-printed AR-15 (M16) lower receiver, and a normal, commercial upper. In other words, the main body of the gun is plastic, while the chamber — where the bullets are actually struck — is solid metal. […] While this pistol obviously wasn’t created from scratch using a 3D printer, the interesting thing is that the lower receiver — in a legal sense at least — is what actually constitutes a firearm. Without a lower receiver, the gun would not work; thus, the receiver is the actual legally-controlled part. In short, this means that people without gun licenses — or people who have had their licenses revoked — could print their own lower receiver and build a complete, off-the-books gun. What a chilling thought.

    (tags: via:peakscale guns scary future grim-meathook-future 3d-printing thingiverse weapons)

Links for 2012-07-28

Links for 2012-07-27

  • This park’s life – The Irish Times – Thu, Jul 26, 2012 : Great article about Dublin’s Phoenix Park, Europe’s largest enclosed urban park (more than twice the size of New York’s Central Park, in fact). Now that I have two little kids, I’ve been spending a good portion of my weekends there — it’s a wonderful thing to have on our doorstep. Also:

    The park even breeds celebrities. “The lion that roars at the start of the MGM movies. He’s a Dub. He was born in Dublin Zoo.”

    (tags: phoenix-park dublin history parks deer lion kids)

Links for 2012-07-26

  • Universal properties of mythological networks – Abstract – EPL (Europhysics Letters) – IOPscience : Abstract:

    As in statistical physics, the concept of universality plays an important, albeit qualitative, role in the field of comparative mythology. Here we apply statistical mechanical tools to analyse the networks underlying three iconic mythological narratives with a view to identifying common and distinguishing quantitative features. Of the three narratives, an Anglo-Saxon and a Greek text are mostly believed by antiquarians to be partly historically based while the third, an Irish epic [jm: “An Táin Bó Cúailnge”, The Tain, to be specific], is often considered to be fictional. Here we use network analysis in an attempt to discriminate real from imaginary social networks and place mythological narratives on the spectrum between them. This suggests that the perceived artificiality of the Irish narrative can be traced back to anomalous features associated with six characters. Speculating that these are amalgams of several entities or proxies, renders the plausibility of the Irish text comparable to the others from a network-theoretic point of view.
    Here’s what the Irish Times said:
    The society in the 1st century story of the Táin Bó Cúailnge looked artificial at first analysis of the networks between 404 characters in the story. However, the researchers found the society reflected real rather than fictional networks when the weakest links to six of the characters are removed. These six characters included Medb, Queen of Connacht; Conchobor, King of Ulster and Cúchulainn. They were “similar to superheroes of the Marvel universe” and are “too superhuman” or too well-connected to be real, researchers said. The researchers suggest that each of these superhuman characters may be an amalgam of many which became fused and exaggerated as the story was passed down orally through generations.

    (tags: networks society the-tain epics history mythology ireland statistics network-analysis papers)
  • Irish campsite recommendations : the conclusion of a Twitter/Facebook recommendations-gathering exercise; winners seem to be Lough Key Forest Park, Renvyle Beach, Fintra, Eagle Point, and Hidden Valley
    (tags: camping ireland tips recommendations caravan holidays vacation)

Links for 2012-07-23

  • CloudBurst : ‘Highly Sensitive Short Read Mapping with MapReduce’. current state of the art in DNA sequence read-mapping algorithms.

    CloudBurst uses well-known seed-and-extend algorithms to map reads to a reference genome. It can map reads with any number of differences or mismatches. [..] Given an exact seed, CloudBurst attempts to extend the alignment into an end-to-end alignment with at most k mismatches or differences by either counting mismatches of the two sequences, or with a dynamic programming algorithm to allow for gaps. CloudBurst uses [Hadoop] to catalog and extend the seeds. In the map phase, the map function emits all length-s k-mers from the reference sequences, and all non-overlapping length-s kmers from the reads. In the shuffle phase, read and reference kmers are brought together. In the reduce phase, the seeds are extended into end-to-end alignments. The power of MapReduce and CloudBurst is the map and reduce functions run in parallel over dozens or hundreds of processors.
    JM_SOUGHT — the next generation ;)
    (tags: bioinformatics mapreduce hadoop read-alignment dna sequencing sought antispam algorithms)
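
    The map/shuffle/reduce steps quoted above can be mimicked in a single process. The following is a simplified sketch, not CloudBurst's code: there is no Hadoop, extension only counts mismatches (no gaps), and the function names are invented for illustration.

```python
from collections import defaultdict


def map_reference(ref, s):
    """Emit every length-s k-mer of the reference with its offset."""
    for i in range(len(ref) - s + 1):
        yield ref[i:i + s], i


def map_read(read, s):
    """Emit the non-overlapping length-s k-mers of the read with their offsets."""
    for j in range(0, len(read) - s + 1, s):
        yield read[j:j + s], j


def align(ref, read, s, k):
    """Return sorted (start, mismatches) pairs with at most k mismatches."""
    # "Shuffle": bring reference offsets together under their seed.
    index = defaultdict(list)
    for seed, i in map_reference(ref, s):
        index[seed].append(i)

    hits = set()
    # "Reduce": extend each shared exact seed into an end-to-end alignment.
    for seed, j in map_read(read, s):
        for i in index.get(seed, ()):
            start = i - j
            if start < 0 or start + len(read) > len(ref):
                continue  # the read would hang off the reference
            window = ref[start:start + len(read)]
            mismatches = sum(a != b for a, b in zip(window, read))
            if mismatches <= k:
                hits.add((start, mismatches))
    return sorted(hits)


reference = "AAACCCGGGTTTACGT"
read = "CCGGGATT"  # differs from reference[4:12] in one position
hits = align(reference, read, s=4, k=1)
```

    Choosing s so that the read splits into k + 1 non-overlapping seeds guarantees, by the pigeonhole principle, that any alignment with at most k mismatches contains at least one exact seed.
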
  • Expensive lessons in Python performance tuning : some good advice for large-scale Python performance: prun and guppy for profiling, namedtuples for memory efficiency, and picloud for trivial EC2-based scale-out. (via Nelson)
    (tags: picloud prun guppy namedtuples python optimization performance tuning profiling)
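
    As a quick illustration of the namedtuple point, the sketch below compares the shallow size of one record stored as a namedtuple and as a dict. The Reading type and its fields are made up for the example, and sys.getsizeof only measures the container itself, not the values it refers to.

```python
import sys
from collections import namedtuple

# Hypothetical record type with three fields.
Reading = namedtuple("Reading", ["sensor", "timestamp", "value"])

as_tuple = Reading("s1", 1343000000, 21.5)
as_dict = {"sensor": "s1", "timestamp": 1343000000, "value": 21.5}

# Shallow container sizes in bytes; exact figures vary by Python version.
tuple_bytes = sys.getsizeof(as_tuple)
dict_bytes = sys.getsizeof(as_dict)

print(f"namedtuple: {tuple_bytes} bytes, dict: {dict_bytes} bytes")
```

    Fields stay readable by name (as_tuple.value) while each record avoids carrying its own hash table, which adds up over millions of records.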

  • On Patents : Notch comes up with a perfect analogy for software patents.

    I am mostly fine with the concept of “selling stuff you made”, so I’m also against copyright infringement. I don’t think it’s quite as bad as theft, and I’m not sure it’s good for society that some professions can get paid over and over long after they did the work (say, in the case of a game developer), whereas others need to perform the job over and over to get paid (say, in the case of a hairdresser or a lawyer). But yeah, “selling stuff you made” is good. But there is no way in hell you can convince me that it’s beneficial for society to not share ideas. Ideas are free. They improve on old things, make them better, and this results in all of society being better. Sharing ideas is how we improve. A common argument for patents is that inventors won’t invent unless they can protect their ideas. The problem with this argument is that patents apply even if the infringer came up with the idea independently. If the idea is that easy to think of, why do we need to reward the person who happened to be first?
    Of course, in reality it’s even worse, since you don’t actually have to be first to invent — just first to file without sufficient people noticing, and people are actively dissuaded from noticing (since it makes their lives riskier if they know about the existence of patents)…
    (tags: business legal ip copyright patents notch minecraft patent-trolls)
  • Marsh’s Library : Dublin museum of antiquarian books, open to the public — well worth a visit, apparently (I will definitely be making my way there soon I suspect), to check out their new “Marvels of Science” exhibit. Not only that though, but they have a beautiful website with some great photos — exemplary
    (tags: museum dublin ireland libraries books science)

  • ‘Poisoning Attacks against Support Vector Machines’, Battista Biggio, Blaine Nelson, Pavel Laskov : The perils of auto-training SVMs on unvetted input.

    We investigate a family of poisoning attacks against Support Vector Machines (SVM). Such attacks inject specially crafted training data that increases the SVM’s test error. Central to the motivation for these attacks is the fact that most learning algorithms assume that their training data comes from a natural or well-behaved distribution. However, this assumption does not generally hold in security-sensitive settings. As we demonstrate, an intelligent adversary can, to some extent, predict the change of the SVM’s decision function due to malicious input and use this ability to construct malicious data. The proposed attack uses a gradient ascent strategy in which the gradient is computed based on properties of the SVM’s optimal solution. This method can be kernelized and enables the attack to be constructed in the input space even for non-linear kernels. We experimentally demonstrate that our gradient ascent procedure reliably identifies good local maxima of the non-convex validation error surface, which significantly increases the classifier’s test error.
    Via Alexandre Dulaunoy
    (tags: papers svm machine-learning poisoning auto-learning security via:adulau)

Links for 2012-07-21

Links for 2012-07-16

  • Science funding doesn’t add up – The Irish Times : ‘[Science Foundation Ireland] said it was continuing to support basic research, but there are a number of leading scientists here who were refused funding despite having qualified for it in the past. Dr Mike Peardon of the School of Mathematics has recently been turned down, having been “administratively withdrawn”. This means the application for funding was rejected at the first post during initial consideration and before it had a chance to be assessed by external experts. Several others in his department suffered a similar fate. “The school of mathematics at Trinity is ranked the 15th best maths department in the world and now we are not fundable by Science Foundation Ireland,” he said. “The cases I heard of have all been in pure maths,” said Prof Lorraine Hanlon in UCD’s school of physics. “All reported that the people in pure maths were returned unreviewed.” She believes other areas may also come under pressure. “Pure maths is the thin end of the wedge. The Government says mathematics is fundamental, but on the other side says we don’t really care enough to support it. That is a schizophrenic approach,” she said.’
    (tags: mathematics ireland science research academia funding tcd ucd sfi)

  • Microsoft’s ill-chosen magic constants : ‘Paolo Bonzini noticed something a little awkward in the Linux kernel support code for Microsoft’s HyperV virtualisation environment – specifically, that the magic constant passed through to the hypervisor was “0xB16B00B5”, or, in English, “BIG BOOBS”. It turns out that this isn’t an exception – when the code was originally submitted it also contained “0x0B00B135”.’ me, I prefer my magic constants less offensive and more Subgenius-oriented: “0xB0BD0BB5”
    (tags: constants via:kevin-lyda oh-dear microsoft fail magic-numbers boobs linux kernel)

Links for 2012-07-14

  • Scaling lessons learned at Dropbox : website-scaling tips and suggestions, “particularly for a resource-constrained, fast-growing environment that can’t always afford to do things ‘the right way’ (i.e., any real-world engineering project)”. I really like the “run with fake load” trick: add additional queries/load which you can quickly turn off if the service starts browning out, giving you a few days’ breathing room to find a real fix before customers start being affected. Neat.
    (tags: dropbox scalability webdev load scaling-up)
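
    The “run with fake load” trick might be wired up roughly as follows. This is only a sketch of the idea as summarised above, not Dropbox's implementation: the Service class, its flag and the run_query callable are all hypothetical.

```python
class Service:
    """Wraps a query function and optionally issues extra throwaway queries."""

    def __init__(self, run_query, fake_queries_per_request=2):
        self.run_query = run_query
        self.fake_queries_per_request = fake_queries_per_request
        self.fake_load_enabled = True  # the switch to flip during a brownout

    def handle(self, request):
        result = self.run_query(request)
        if self.fake_load_enabled:
            # Artificial headroom: repeat the work and discard the results.
            for _ in range(self.fake_queries_per_request):
                self.run_query(request)
        return result


calls = []


def counting_query(request):
    """Stand-in for a real backend query; records each call."""
    calls.append(request)
    return request.upper()


service = Service(counting_query, fake_queries_per_request=2)
first = service.handle("a")         # 1 real + 2 fake queries
service.fake_load_enabled = False   # capacity reclaimed instantly
second = service.handle("b")        # 1 real query only
```

    Because the backend normally runs with the extra load, turning the flag off frees up that capacity at once when real traffic starts to hurt.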

Links for 2012-07-11

  • Don’t waste your time in crappy startup jobs : 7 reasons why working for a startup sucks. Been there, done that — I wish I’d read this years ago. It should be permalinked at the top of Hacker News. “In 1995, a lot of talented young people went into large corporations because they saw no other option in the private sector– when, in fact, there were credible alternatives, startups being a great option. In 2012, a lot of young talent is going into startups for the same reason: a belief that it’s the only legitimate work opportunity for top talent, and that their careers are likely to stagnate if they work in more established businesses. They’re wrong, I think, and this mistaken belief allows them to be taken advantage of. The typical equity offer for a software engineer is dismally short of what he’s giving up in terms of reduced salary, and the career path offered by startups is not always what it’s made out to be. For all this, I don’t intend to argue that people shouldn’t join startups. If the offer’s good, and the job looks interesting, it’s worth trying out. I just don’t think that the current, unconditional “startups are awesome!” mentality serves us well. It’s not good for any of us, because there’s no tyrant worse than a peer selling himself short, and right now there are a lot of great people selling themselves very short for a shot at the “startup experience” — whatever that is.”
    (tags: startups work job life career tech vc companies pay stock share-options)

Links for 2012-07-10

Links for 2012-07-03

Links for 2012-06-29

  • Facts still sacred despite Ireland’s spectrum of conflicting views on abortion – The Irish Times – Fri, Jun 29, 2012 : Very good data-driven analysis. ‘“Pro-life” groups claim abortion is a serious mental health risk for women. Youth Defence claims women who opt for an abortion rather than carrying to term or giving the baby up for adoption suffer mental maladies such as depression, suicide and other problems. But this is at heart a scientific claim, and can thus be tested. […] Psychologist Dr Brenda Majors studied this in depth and found no evidence that [“post-abortion syndrome”] exists. As long as a woman was not depressive before an abortion, “elective abortion of an unintended pregnancy does not pose a risk to mental health”. The same results were found in several other studies […] Essentially these studies found there was no difference in mental health between those who opted for abortion and those who carried to term. Curiously, there was a markedly increased risk to mental health for women who gave a child up for adoption. A corollary of the research was that while women did not suffer long-term mental health effects due to abortion, short-term guilt and sadness was far more likely if the women had a background where abortion was viewed negatively or their decisions were decried — the kind of attitude fostered by “pro-life” activists.’
    (tags: pro-choice pro-life abortion data facts via:irish-times research science pregnancy depression pas)

Links for 2012-06-28

  • “Machine Learning That Matters” [paper, PDF] : Great paper. This point particularly resonates: “It is easy to sit in your office and run a Weka algorithm on a data set you downloaded from the web. It is very hard to identify a problem for which machine learning may offer a solution, determine what data should be collected, select or extract relevant features, choose an appropriate learning method, select an evaluation method, interpret the results, involve domain experts, publicize the results to the relevant scientific community, persuade users to adopt the technique, and (only then) to truly have made a difference (see Figure 1). An ML researcher might well feel fatigued or daunted just contemplating this list of activities. However, each one is a necessary component of any research program that seeks to have a real impact on the world outside of machine learning.”
    (tags: machine-learning ml software data real-world algorithms)

  • Massive identity-theft breach in South Korea results in calls for national ID system to be abandoned : In South Korea, web users are required to provide their national ID number for “virtually every type of Internet activity, not only for encrypted communications like e-commerce, online banking and e-government services but also casual tasks like e-mail and blogging”, apparently in an attempt to “curb cyber-bullying”. The result is obvious — those ID numbers being collected in giant databases at companies like “SK Communications, which runs top social networking service Cyworld and search site Nate”, and those giant databases being tasty targets for black-hats. Now: “In Korea’s biggest-ever case of data theft the recent hacking attack at SK Communications, which runs top social networking service Cyworld and search site Nate, breached 35 million accounts, a mind-boggling total for a country that has about 50 million people and an economically-active population of 25 million. The compromised information includes names, passwords, phone numbers, e-mail addresses, and most alarmingly, resident registration numbers, the country’s equivalent to social security numbers.” This is an identity-fraudster’s dream: “In the hands of criminals, resident registration numbers could become master keys that open every door, allowing them to construct an entire identity based on the quality and breadth of data involved.”
    (tags: south-korea identity fraud identity-theft web bullying authentication hacking)

Links for 2012-06-27

  • WeatherSpark : Beautiful dataviz of weather data from met.no, NOAA.gov, World Weather Online and Weather Central. The main graph includes: mean and percentiles of historical temperature data for time of year, the temperature and precipitation forecast over the chosen period, wind direction and speed, with hourly data. Very nicely done! (via Una Mullally)
    (tags: via:unamullally dataviz temperature forecasts weather graphing percentiles wind rain)

  • the recruiter honeypot : wow, I thought it was hard hiring in Dublin. Sounds like Silicon Valley is insane. “Unfortunately, it’s not all about the numbers. Though external recruiters perform well for start-ups, there’s another side to this story. It pains me to write this but I think it’s important to share. Meebo employed lots of external recruiters when we were getting off the ground. We had standard 18-month no-poach restrictions with all of our contractors that specified that those recruiters were not allowed to contact Meebo employees within 18 months of our contract expiring. Most of those contracts expired in 2008-2009. However, every recruiter and firm we’d worked with who was still in the recruiting business tried to poach [the ‘honeypot’ employee] Pete London.” (Another lesson: don’t build a product in javascript, since it’s impossible to hire engineers ;)
    (tags: honeypots hiring silicon-valley recruiting coding experts meebo)

Links for 2012-06-26

  • CEO Of Internet Provider Sonic.net: We Delete User Logs After Two Weeks. Your Internet Provider Should, Too. – Forbes : “what we saw was a shift towards customers being made part of a business model that involved–I don’t know if extortion is the right word–but embarrassment for gain. An individual would download a movie, using bittorrent, and infringe copyright. And that might be our customer, like Bob Smith who owns a Sonic.net account, or it might be their spouse, or it might be their child. Or it might be one of his three roommates in a loft in San Francisco, who Bob is not responsible for, and who rent out their loft on AirBnB and have couch surfers and buddies from college and so on and open Wifi. When lawyers asked us for these users’ information, some of our customers I spoke with said “Oh yeah, crap, they caught me,” and were willing to admit they engaged in piracy and pay a settlement. But in other cases, it turned out the roommate did it, or no one would admit to doing it. But they would pay the settlement anyway. Because no one wants to be named in the public record in a case from So-And-So Productions vs. 1,600 names including Bob Smith for downloading a film called “Don’t Tell My Wife I B—F—— The Babysitter.” AG: Is that a real title? DJ: Yes. I’ve read about cases where a lawyer was doing this for the movie “The Expendables,” and 5% of people settled. So then he switched to representing someone with an embarrassing porn title, and like 30% of people paid. It seemed like half the time, the customer wasn’t the one right one, but they rolled over because it would be very embarrassing. And I think that’s an abuse of process. I was unwilling to become part of that business model. In many cases the lawyers never pursued the case, and it was all bluster. But under that threat, you pay.”
    (tags: interview isps freedom copyright internet shakedown lawyers sonic.net data-retention via:oisin)

  • an ex-RBSG engineer on the NatWest/RBS/UlsterBank IT fiasco : ‘Turning over your systems support staff in a wave of redundancies is not the best way to manage the transfer of knowledge. Not everyone who worked the batch at [Royal Bank of Scotland Group] even knew what it is they knew; how, then, could they explain it to people who didn’t know there was knowledge to acquire? Outsourcing the work from Edinburgh to Aberdeen and sacking the staff would have exposed them to the same risks. […] I Y2K tested one of the batch feeder systems at RBS from 1997 – 1998, and managed acceptance testing in payments processing systems from 1999 – 2001. I was one of the people who watched over the first batch of the millennium instead of going to a party. I was part of the project that moved the National Westminster batch onto the RBS software without a single failure. I haven’t worked for the bank for five years, and I am surprised at how personally affronted I am that they let that batch fail. But I shouldn’t be. Protectiveness of the batch was the defining characteristic of our community. We were proud of how well that complex structure of disparate components hummed along. It was a thing of beauty, of art and craft, and they dropped it all over the floor.’
    (tags: systems ops support maintainance legacy ca-7 banking rbs natwest ulster-bank fail outsourcing)

  • Some Facts & Insights Into The Whole Discussion Of ‘Ethics’ And Music Business Models | Techdirt : David “Camper Van Beethoven” Lowery’s blogpost about music sales, ethics, piracy etc. looks like it was pretty much riddled with errors regarding the viability of the music business, then and now. Empirical figures from Jeff Price from Tunecore, and others, to debunk it: “‘Well here’s some truth about the old industry that David somehow misses. Previously, artists were not rolling in money. Most were not allowed into the system by the gatekeepers. Of those that were allowed on the major labels, over 98% of them failed. Yes, 98%?. Of the 2% that succeeded, less than a half percent of those ever got paid a band royalty from the sale of recorded music. How in the world is an artist making at least something, no matter how small, worse than 99% of the world’s unsigned artists making nothing and of the 1% signed, less than a half a percent of them ever making a single band royalty ever?'” […] “Another example of Lowery being wrong that Price responds to is the claim that recorded music revenue to artists has been going down. Price has data: ‘This is empirically false. Revenue to labels has collapsed. Revenue to artists has gone up with more artists making more money now than at any time in history, off of the sale of pre-recorded music. Taken a step further, a $17.98 list price CD earned a band $1.40 as a band royalty that they only got if they were recouped (over 99% of bands never recouped). If an artist sells just two songs for $0.99 on iTunes via TuneCore, they gross $1.40. If they sell an album for $9.99 on iTunes via TuneCore, they gross $7.00. This is an INCREASE of over 700% in revenue to artists for recorded music sales.'”
    (tags: music mp3 music-business piracy techdirt david-lowery tunecore)

Links for 2012-06-25

  • Eight Real Tales of Learning Computer Science as a High School Girl : ‘All students at Stuyvesant High School are required to take a year of computer science. As it turns out, the advanced computer science classes skew mostly male anyway. But for a year, boys and girls get exposed to computer programming together. We asked Mike Zamansky, the head of the computer science program, to share some stories from his female students. They did us one better. Eight students sent in first-hand accounts of what it’s like to learn computer programming as a teenage girl.’ Some interesting comments here. This topic is weighing on my mind now that I have two girls…
    (tags: schools learning education computer-science technology nyc girls teenage)

  • RBS collapse details revealed – The Register : as noted in the gossip last week. ‘The main batch scheduling software used by RBS is CA-7, said one source, a former RBS employee who left the company recently.’ ‘RBS do use CA-7 and do update all accounts overnight on a mainframe via thousands of batch jobs scheduled by CA-7 … Backing out of a failed update to CA-7 really ought to have been a trivial matter for experienced operations and systems programming staff, especially if they knew that an update had been made. That this was not the case tends to imply that the criticisms of the policy to “offshore” also hold some water.’
    (tags: outsourcing failure software rbs natwest ulster-bank ulster-blank offshoring downsizing ca-7 upgrades)

Links for 2012-06-23

  • Natwest, RBS: When will bank glitch be fixed? Probably not today • The Register Forums : Some amazing insider-info posts on the Reg forum for the gigantic RBS/NatWest/Ulster Bank multi-day outage. Fingers pointing at their outsourcing/downsizing practices — in a word, they’ve sacked the experienced staff, replaced them with noobs thousands of miles away, and not paid down any technical debt on the legacy code they’re maintaining. Classic legacy IT fail. “I worked for RBS during and after the merger with Natwest, I left their Global Financial Markets Department in 2004 after a 5 year stint. They had already moved some IT functions to India at that point and have continued to do so year on year since. The numbers some people are quoting 1600/800 are possibly the more recent figures, the total is way way beyond this. The comments on documentation are comical, as if a document is the thing you turn to at a time of crisis. The fact is, when you work closely with systems and the business users, you understand not only the quirks of the systems, but the risks and consequences of failure. You work with those users on the work around solutions that will get the banking day complete. They haven’t just outsourced the IT staff, but the very experienced and valuable back office / operations staff that would work with IT staff to solve the serious issues. I believe these guys are mostly posted out in Singapore, who probably have never met the IT staff in India. The unseen cost of outsourcing is a compounding loss of shared experience and commitment, which becomes acutely apparent when the sh!t hits the … cash machines The chaps I trained out in India were nice enough, but they simply lacked the knowledge and experience of Financial Markets trading, trade and settlement processing, Swift messaging blah blah and the risks involved. I’ll be drinking with a bunch of ex RBS/Natwesties soon enough, where we’ll all be saying….. “WE TOLD YOU SO!!!!!!!” Another poster says: “I understand that your description of the RBS Mainframe based batch update process is fairly accurate. The source of the problem was a software update to Batch scheduling suite CA7. The upgrade went so well that now there is no schedule to run all of those thousands of batch jobs to receive and make BACS payments, update balance, schedule printouts, etc. I am sure the problem with the CA7 upgrade and the unfortunate misplacing of the Batch schedule has absolutely nothing to do with the last UK based technicians leaving recently. The guys in India of course are perfectly able to cope and fix their mistake. I’m sure they understand how the thousands of jobs in the schedule need to be ordered to make sure there is data corruption or loss. After all the problem happened on Tuesday and it’s only Friday. I wonder how many ex-RBS staff have received very lucrative short term contracts in the last few days……”
    (tags: natwest it rbs the-register outsourcing fail organisations ulster-bank ulster-blank)

Links for 2012-06-22

Links for 2012-06-21

Links for 2012-06-20

  • The Hydra Bay : “How to set up a Pirate Bay proxy”. Step-by-step instructions for MacOS and Linux on how to run a fully-functional reverse proxy for The Pirate Bay — in other words, provide a duplicate URL for users to circumvent ISP blocks of TPB. http://about.piratereverse.info/proxy/list.html contains about a hundred others. See also http://unblockedpiratebay.com/ for a standalone PHP script which does the same (albeit a little less efficiently). A good demonstration of how futile filtering techniques like IP or domain name blocks are, when applied to a popular website like TPB.
    (tags: piratebay filtering censorship copyright php proxies reverse-proxies ip-blocking dns-blocking)

  • how to restore from iCloud backup : the trick: don’t try and do it through iTunes, it won’t give you the option, apparently. I have a carrier unlock, and apparently need to wipe the phone for it to take place; this scares the crap out of me
    (tags: backup iphone restore sysadmin phones icloud apple howto)

Links for 2012-06-19

Links for 2012-06-15

  • PGP founder, Navy SEALs uncloak encrypted comms biz • The Register : ‘The company, called Silent Circle, will launch later this year, when $20 a month will buy you encrypted email, text messages, phone calls, and videoconferencing in a package that looks to be strong enough to have the NSA seriously worried. Zimmermann says that surveillance by the state and others has increased vastly over the last few years, and privacy improvements are again needed. “At the very least I want people, as part of their right in a free society to be able to communicate securely,” he said in a promotional video. “I should be able to whisper in your ear, even if your ear is a thousand miles away.” […] While software can handle most of the work, there still needs to be a small backend of servers to handle traffic. The company surveyed the state of privacy laws around the world and found that the top three choices were Switzerland, Iceland, and Canada, so they went for the one within driving distance.’
    (tags: pgp phil-zimmermann privacy crypto silent-circle apps vc security)

Links for 2012-06-13

  • The Silencing of Maya : software patent shakedown threatens to remove a 4-year-old’s only means of verbal expression: ‘Maya can speak to us, clearly, for the first time in her life. We are hanging on her every word. We’ve learned that she loves talking about the days of the week, is weirdly interested in the weather, and likes to pretend that her toy princesses are driving the bus to school (sometimes) and to work (other times). This app has not only allowed her to communicate her needs, but her thoughts as well. It’s given us the gift of getting to know our child on a totally different level. I’ve been so busy embracing this new reality and celebrating, that I kind of forgot that there was an ongoing lawsuit, until last Monday. When Speak for Yourself was removed from the iTunes store.’
    (tags: speak-for-yourself children law swpats patenting stories ipad apps)

  • _Building High-level Features Using Large Scale Unsupervised Learning_ [paper, PDF] : “We consider the problem of building high-level, class-specific feature detectors from only unlabeled data. For example, is it possible to learn a face detector using only unlabeled images? To answer this, we train a 9-layered locally connected sparse autoencoder with pooling and local contrast normalization on a large dataset of images (the model has 1 billion connections, the dataset has 10 million 200×200 pixel images downloaded from the Internet). We train this network using model parallelism and asynchronous SGD on a cluster with 1,000 machines (16,000 cores) for three days. Contrary to what appears to be a widely-held intuition, our experimental results reveal that it is possible to train a face detector without having to label images as containing a face or not. Control experiments show that this feature detector is robust not only to translation but also to scaling and out-of-plane rotation. We also find that the same network is sensitive to other high-level concepts such as cat faces and human bodies. Starting with these learned features, we trained our network to obtain 15.8% accuracy in recognizing 20,000 object categories from ImageNet, a leap of 70% relative improvement over the previous state-of-the-art.”
    (tags: algorithms machine-learning neural-networks sgd labelling training unlabelled-learning google research papers pdf)

Links for 2012-06-12

Links for 2012-06-11