Category: Uncategorized

Links for 2014-07-18

Links for 2014-07-16

Links for 2014-07-15

Links for 2014-07-14

Links for 2014-07-11

  • Netflix/ribbon

    a client side IPC library that is battle-tested in cloud. It provides the following features: Load balancing; Fault tolerance; Multiple protocol (HTTP, TCP, UDP) support in an asynchronous and reactive model; Caching and batching.
    I like the integration of Eureka and Hystrix in particular, although I would really like to read more about Eureka's approach to availability during network partitions and CAP. https://groups.google.com/d/msg/eureka_netflix/LXKWoD14RFY/-5nElGl1OQ0J has some interesting discussion on the topic. It actually sounds like the Eureka approach is more correct than using ZK: 'Eureka is available. ZooKeeper, while tolerant against single node failures, doesn't react well to long partitioning events. For us, it's vastly more important that we maintain an available registry than a necessary consistent registry. If us-east-1d sees 23 nodes, and us-east-1c sees 22 nodes for a little bit, that's OK with us.' See also http://ispyker.blogspot.ie/2013/12/zookeeper-as-cloud-native-service.html which corroborates this:
    I went into one of the instances and quickly did an iptables DROP on all packets coming from the other two instances. This would simulate an availability zone continuing to function, but that zone losing network connectivity to the other availability zones. What I saw was that the two other instances noticed that the first server “going away”, but they continued to function as they still saw a majority (66%). More interestingly the first instance noticed the other two servers “going away” dropping the ensemble availability to 33%. This caused the first server to stop serving requests to clients (not only writes, but also reads). [...] To me this seems like a concern, as network partitions should be considered an event that should be survived. In this case (with this specific configuration of zookeeper) no new clients in that availability zone would be able to register themselves with consumers within the same availability zone. Adding more zookeeper instances to the ensemble wouldn’t help considering a balanced deployment as in this case the availability would always be majority (66%) and non-majority (33%).
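
    The majority arithmetic behind that observation is easy to sketch. This is a simplified model of a strict-majority quorum, not ZooKeeper's actual code, and the function names are made up for illustration:

```python
def quorum_size(ensemble_size):
    """Smallest number of servers that forms a strict majority."""
    return ensemble_size // 2 + 1


def can_serve(visible_servers, ensemble_size):
    """True if a partition seeing `visible_servers` members (itself included) holds a majority."""
    return visible_servers >= quorum_size(ensemble_size)


# Three servers, one per availability zone, with one zone cut off:
print(can_serve(2, 3))  # True: the two connected servers still form a majority
print(can_serve(1, 3))  # False: the isolated server cannot serve

# Doubling up to six servers (two per zone) does not help the isolated zone:
print(can_serve(2, 6))  # False
```

    In a balanced three-zone deployment the cut-off zone only ever sees a third of the ensemble, so it never reaches a majority, whatever the ensemble size.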

    (tags: netflix ribbon availability libraries java hystrix eureka aws ec2 load-balancing networking http tcp architecture clients ipc)

  • The Myth of Schema-less [NoSQL]

    We don't seem to gain much in terms of database flexibility. Is our application more flexible? I don't think so. Even without our schema explicitly defined in our database, it's there... somewhere. You simply have to search through hundreds of thousands of lines to find all the little bits of it. It has the potential to be in several places, making it harder to properly identify. The reality of these codebases is that they are error prone and often lack the necessary documentation. This problem is magnified when there are multiple codebases talking to the same database. This is not an uncommon practice for reporting or analytical purposes. Finally, all this "flexibility" rears its head in the same way that PHP and Javascript's "neat" weak typing stabs you right in the face. There are some things you can be cavalier about, and some things you should be strict about. Your data model is one you absolutely need to be strict on. If a field should store an int, it should store nothing else. Not a string, not a picture of a horse, but an integer. It's nice to know that I have my database doing type checking for me and I can expect a field to be the same type across all records. All this leads us to an undeniable fact: There is always a schema. Wearing "I don't do schema" as a badge of honor is a complete joke and encourages a terrible development practice.

    (tags: nosql databases storage schema strong-typing)

  • Latest EBS tuning tips

    from yesterday's AWS Summit in NYC:

    * Cheat sheet of EBS-optimized instances. http://t.co/vmTlhUtpWk
    * Optimize your queue depth to achieve lower latency & highest IOPS. http://t.co/EO48oa0D6X
    * When configuring your RAID, use a stripe size of 128KB or 256KB. http://t.co/N0ldtFJ4t6
    * Use larger block size to speed up the pre-warming process. http://t.co/8UoIeWE2px

    (tags: ebs aws amazon iops raid ops tuning)

Links for 2014-07-10

Links for 2014-07-09

  • Google's Influential Papers for 2013

    Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google. Below are some of the especially influential papers co-authored by Googlers in 2013.

    (tags: google papers toread reading 2013 scalability machine-learning algorithms)

Links for 2014-07-08

  • #BPjMleak

    'Leak of the secret German Internet Censorship URL blacklist BPjM-Modul'. Turns out there's a blocklist of adult-only or prohibited domains issued by a German government department, The Federal Department for Media Harmful to Young Persons (German: "Bundesprüfstelle für jugendgefährdende Medien" or BPjM), issued in the form of a list of hashes of those domains. These were extracted from an AVM router, then the hashes were brute forced using several other plaintext URL blocklists and domain lists. Needless to say, there's an assortment of silly false positives, such as the listing of the website for the 1997 3D Realms game "Shadow Warrior": http://en.wikipedia.org/wiki/Shadow_Warrior
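
    The reversal step is just a dictionary lookup: hash every candidate from the plaintext lists and check whether the digest appears in the leaked set. A minimal sketch, assuming hex MD5 digests of bare domain names (the hash function, the encoding and the sample domains are my assumptions for illustration, not details taken from the leak write-up):

```python
import hashlib


def recover_entries(hashed_blocklist, candidates):
    """Map each leaked digest to the candidate string that produces it, where one exists."""
    recovered = {}
    for candidate in candidates:
        digest = hashlib.md5(candidate.encode("utf-8")).hexdigest()
        if digest in hashed_blocklist:
            recovered[digest] = candidate
    return recovered


# Hypothetical data: a two-entry hashed list and a small candidate dictionary.
leaked = {hashlib.md5(d.encode("utf-8")).hexdigest() for d in ("example.org", "example.net")}
found = recover_entries(leaked, ["example.com", "example.org", "example.net"])
print(sorted(found.values()))  # ['example.net', 'example.org']
```

    Anything not present in the candidate lists stays unrecovered, which is why the coverage depends on how good the plaintext lists are.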

    (tags: hashes reversing reverse-engineering germany german bpjm filtering blocklists blacklists avm domains censorship fps)

  • Brave Men Take Paternity Leave - Gretchen Gavett - Harvard Business Review

    The use of paternity leave has a "snowball effect":

    In the end, Dahl says, “coworkers and brothers who were linked to a father who had his child immediately after the [Norwegian paid paternity leave] reform — versus immediately before the reform — were 3.5% and 4.7% more likely, respectively, to take parental leave.” But when a coworker actually takes parental leave, “the next coworker to have a child at his workplace is 11% more likely to take paternity leave.” Slightly more pronounced, the next brother to have a child is 15% more likely to take time off. And while any male coworker taking leave can reduce stigma, the effect of a manager doing so is more profound. Specifically, “the estimated peer effect is over two and a half times larger if the peer father is predicted to be a manager in the firm as opposed to a regular coworker.”

    (tags: paternity-leave parenting leave work norway research)

  • "The Tail at Scale"

    by Jeffrey Dean and Luiz Andre Barroso, Google. A selection of Google's architectural mechanisms used to defeat 99th-percentile latency spikes: hedged requests, tied requests, micro-partitioning, selective replication, latency-induced probation, canary requests.
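
    Of those, hedged requests are the easiest to sketch: send the request to one replica and, if it hasn't answered within a short delay, send the same request to a second replica and take whichever reply arrives first. A minimal asyncio sketch; `send`, `fake_send` and the replica names are hypothetical stand-ins, and this is not Google's implementation:

```python
import asyncio


async def hedged_request(send, replicas, hedge_delay):
    """Try replicas[0]; if it is still outstanding after hedge_delay seconds, also try replicas[1].

    Returns the first result to arrive and cancels the request that lost the race.
    """
    primary = asyncio.ensure_future(send(replicas[0]))
    done, _ = await asyncio.wait({primary}, timeout=hedge_delay)
    if done:
        return primary.result()
    backup = asyncio.ensure_future(send(replicas[1]))
    done, pending = await asyncio.wait(
        {primary, backup}, return_when=asyncio.FIRST_COMPLETED
    )
    for task in pending:
        task.cancel()
    return done.pop().result()


async def fake_send(replica):
    # Hypothetical backend: the replica named "slow" takes 0.5 s, any other takes 0.01 s.
    await asyncio.sleep(0.5 if replica == "slow" else 0.01)
    return replica


print(asyncio.run(hedged_request(fake_send, ["slow", "fast"], hedge_delay=0.05)))  # fast
```

    The choice of `hedge_delay` is the interesting knob: too short and you double your load, too long and the hedge never helps the tail.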

    (tags: google architecture distcomp soa http partitioning replication latency 99th-percentile canary-requests hedged-requests)

Links for 2014-07-07

Links for 2014-07-06

  • Layered Glass Table Concept Creates a Cross-Section of the Ocean

    beautiful stuff -- and a snip at only UKP 5,800 ex VAT. it'd make a good DIY project though ;)

    (tags: art tables glass layering 3d cross-sections water ocean sea mapping cartography layers this-is-colossal design furniture)

  • Two traps in iostat: %util and svctm

    Marc Brooker:

    As a measure of general IO busyness %util is fairly handy, but as an indication of how much the system is doing compared to what it can do, it's terrible. Iostat's svctm has even fewer redeeming strengths. It's just extremely misleading for most modern storage systems and workloads. Both of these fields are likely to mislead more than inform on modern SSD-based storage systems, and their use should be treated with extreme care.

    (tags: ioutil iostat svctm ops ssd disks hardware metrics stats linux)

  • New Amazon Web Services region: eu-central-1 (soon)

    Iiiinteresting. Sounds like new anti-NSA-snooping privacy laws will be driving a lot of new mini-regions in AWS. Hope Amazon have their new-region-standup process a little more streamlined by now than when I was there ;)

    (tags: aws germany privacy ec2 eu-central-1 nsa snooping)

  • How A Spam Newsletter Caused a Bank Run in Bulgaria

    According to the Bulgarian National Security Agency (see here, for a reporting in English), an investment company that “built a network of associated companies for marketing services” that was used to diffuse panic by means of an alert, uncomfortably titled “Information Bulletin of on the Risk of Deposits in Bulgarian Banks”. The “bulletin” claimed – Bloomberg reports – KTB was undergoing a liquidity shortage. The message apparently also said that the government deposit guarantee fund was under-capitalised to meet possible repayments, that banks could go bankrupt and that the peg of the currency with the euro could be broken. Allegedly, the alert was diffused by text, email and even Facebook messages, thus ensuring a very widespread outreach. In a country that in 1997 underwent a very serious banking crisis featuring all these characteristics – whose memory is still fresh – this was enough to spur panic.

    (tags: spam banking bulgaria banks euro panic facebook social-media)

  • New Russian Law To Forbid Storing Russians' Data Outside the Country - Slashdot

    On Friday Russia's parliament passed a law "which bans online businesses from storing personal data of Russian citizens on servers located abroad[.] ... According to ITAR-TASS, the changes to existing legislation will come into effect in September 2016, and apply to email services, social networks and search engines, including the likes of Facebook and Google. Domain names or net addresses not complying with regulations will be put on a blacklist maintained by Roskomnadzor (the Federal Supervision Agency for Information Technologies and Communications), the organisation which already has the powers to take down websites suspected of copyright infringement without a court order. In the case of non-compliance, Roskomnadzor will be able to impose 'sanctions,' and even instruct local Internet Service Providers (ISPs) to cut off access to the offending resource."

    (tags: russia privacy nsa censorship protectionism internet web)

Links for 2014-07-04

  • Irish parliament pressing ahead with increased access to retained telecoms data

    While much of the new bill is concerned with the dissolution of the Competition Authority and the National Consumer Agency and the formation of a new merged Competition and Consumer Protection Commission (CCPC) the new bill also proposed to extend the powers of the new CCPC to help it investigate serious anticompetitive behaviour. Strikingly the new bill proposes to give members of the CCPC the power to access data retained under the Communications (Retention of Data) Act 2011. As readers will recall this act implements Directive 2006/24/EC which obliges telecommunications companies to archive traffic and location data for a period of up to two years to facilitate the investigation of serious crime. Ireland chose to implement the maximum two year retention period and provided access to An Garda Siochana, The Defence Forces and the Revenue Commissioners. The current reform of Irish competition law now proposes to extend data access powers to the members of the CCPC for the purposes of investigating cartel offences.

    (tags: data-retention privacy surveillance competition ccpc ireland law dri)

  • NSA: Linux Journal is an "extremist forum" and its readers get flagged for extra surveillance

    DasErste.de has published the relevant XKEYSCORE source code, and if you look closely at the rule definitions, you will see linuxjournal.com/content/linux* listed alongside Tails and Tor. According to an article on DasErste.de, the NSA considers Linux Journal an "extremist forum". This means that merely looking for any Linux content on Linux Journal, not just content about anonymizing software or encryption, is considered suspicious and means your Internet traffic may be stored indefinitely.
    This is, sadly, entirely predictable -- that's what happens when you optimize the system for over-sampling, with poor oversight.

    (tags: false-positives linuxjournal linux terrorism tor tails nsa surveillance snooping xkeyscore selectors oversight)

  • stout

    a C++ library adding some modern language features like Option, Try, Stopwatch, and other Guava-ish things (via @cscotta)

    (tags: c++ library stout option try guava coding)

Links for 2014-07-03

Links for 2014-07-01

Links for 2014-06-30

  • Facebook Doesn't Understand The Fuss About Its Emotion Manipulation Study

    This is quite unethical, and I'm amazed it was published at all. Kashmir Hill at Forbes nails it:

    While many users may already expect and be willing to have their behavior studied — and while that may be warranted with “research” being one of the 9,045 words in the data use policy — they don’t expect that Facebook will actively manipulate their environment in order to see how they react. That’s a new level of experimentation, turning Facebook from a fishbowl into a petri dish, and it’s why people are flipping out about this.
    Shocking stuff. We need a new social publishing platform, built on ethical, open systems.

    (tags: ethics facebook privacy academia depression feelings emotion social-publishing social experimentation papers)

  • Building a Smarter Application Stack - DevOps Ireland

    This sounds like a very interesting Dublin meetup -- Engine Yard on Thursday night:

    This month, we'll have Tomas Doran from Yelp talking about Docker, service discovery, and deployments. 'There are many advantages to a container based, microservices architecture - however, as always, there is no silver bullet. Any serious deployment will involve multiple host machines, and will have a pressing need to migrate containers between hosts at some point. In such a dynamic world hard coding IP addresses, or even host names is not a viable solution. This talk will take a journey through how Yelp has solved the discovery problems using Airbnb’s SmartStack to dynamically discover service dependencies, and how this is helping unify our architecture, from traditional metal to EC2 ‘immutable’ SOA images, to Docker containers.'

    (tags: meetups talks dublin deployment smartstack ec2 docker yelp service-discovery)

  • Smart Integration Testing with Dropwizard, Flyway and Retrofit

    Retrofit in particular looks neat. Mind you, having worked with in-memory SQL databases for integration testing before, I'd never do that again -- too many interop glitches compared to "real world" MySQL/Postgres.

    (tags: testing integration-testing retrofit flyway dropwizard logentries)

  • Twitter's TSAR

    TSAR = "Time Series AggregatoR". Twitter's new event processor-style architecture for internal metrics. It's notable that Twitter and Google both now appear to be moving towards a model where the same code is designed to run equally well in realtime streaming and batch modes (Summingbird, Millwheel, Flume).

    (tags: analytics architecture twitter tsar aggregation event-processing metrics streaming hadoop batch)

  • 'Robust De-anonymization of Large Sparse Datasets' [pdf]

    paper by Arvind Narayanan and Vitaly Shmatikov, 2008. 'We present a new class of statistical de-anonymization attacks against high-dimensional micro-data, such as individual preferences, recommendations, transaction records and so on. Our techniques are robust to perturbation in the data and tolerate some mistakes in the adversary's background knowledge. We apply our de-anonymization methodology to the Netflix Prize dataset, which contains anonymous movie ratings of 500,000 subscribers of Netflix, the world's largest online movie rental service. We demonstrate that an adversary who knows only a little bit about an individual subscriber can easily identify this subscriber's record in the dataset. Using the Internet Movie Database as the source of background knowledge, we successfully identified the Netflix records of known users, uncovering their apparent political preferences and other potentially sensitive information.'

    (tags: anonymisation anonymization sanitisation databases data-dumps privacy security papers)

  • HSE data releases may be de-anonymisable

    Although the data has been kept anonymous, the increasing sophistication of computer-driven data-mining techniques has led to fears patients could be identified. A HSE spokesman confirmed yesterday that the office responded to requests for data from a variety of sources, including researchers, the universities, GPs, the media, health insurers and pharmaceutical companies. An average of about two requests a week was received. [...] The information provided by the HPO has significant patient identifiers removed, such as name and date of birth. According to the HSE spokesman, individual patient information is not provided and, where information is sought for a small group of patients, this is not provided where the number involved is under five. “In such circumstances, it is highly unlikely that anyone could be identified. Nevertheless, we will have another look at data releases from the office,” he said.
    I'd say this could be readily reversible, from the sounds of it.

    (tags: anonymisation sanitisation data-dumps hse health privacy via:tjmcintyre)

  • Beautiful algorithm visualisations from Mike Bostock

    This is a few days old, but unmissable. I swear, the 'Wilson's algorithm transformed into a tidy tree layout' viz brought tears to my eyes ;)

    (tags: dataviz algorithms visualization visualisation mazes trees sorting animation mike-bostock)

  • ByteArrayOutputStream is really, really slow sometimes in JDK6

    This leads us to the bug. The size of the array is determined by Math.max(buf.length << 1, newcount). Ordinarily, buf.length << 1 returns double buf.length, which would always be much larger than newcount for a 2 byte write. Why was it not? The problem is that for all integers larger than Integer.MAX_INTEGER / 2, shifting left by one place causes overflow, setting the sign bit. The result is a negative integer, which is always less than newcount. So for all byte arrays larger than 1073741824 bytes (i.e. one GB), any write will cause the array to resize, and only to exactly the size required.
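
    The overflow is easy to reproduce outside the JVM. Here's a sketch in Python that emulates Java's 32-bit signed `int` wraparound; the helper names are mine and this only mimics the arithmetic, it is not the JDK source. (Note that the actual Java constant is `Integer.MAX_VALUE`.)

```python
def to_int32(value):
    """Wrap an integer into Java's signed 32-bit int range."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value >= 0x80000000 else value


def new_capacity(buf_length, newcount):
    """Mimic Math.max(buf.length << 1, newcount) using 32-bit int arithmetic."""
    return max(to_int32(buf_length << 1), newcount)


print(to_int32((1 << 30) << 1))              # -2147483648: the shift sets the sign bit
print(new_capacity(1024, 1026))              # 2048: normal doubling
print(new_capacity(1 << 30, (1 << 30) + 2))  # 1073741826: grows by exactly the 2 bytes written
```

    Once the doubling goes negative, every write reallocates and copies the whole multi-gigabyte array, hence the slowness.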
    Ouch.

    (tags: bugs java jdk6 bytearrayoutputstream impala performance overflow)

  • Cory Doctorow on Thomas Piketty's 'Capital in the 21st Century'

    quite a leftie analysis

    (tags: history capitalism economics piketty capital finance taxation growth money cory-doctorow thomas-piketty)

  • ThreadSanitizer

    Google's purify/valgrind-like concurrency checking tool: 'As a bonus, ThreadSanitizer finds some other types of bugs: thread leaks, deadlocks, incorrect uses of mutexes, malloc calls in signal handlers, and more. It also natively understands atomic operations and thus can find bugs in lock-free algorithms. [...] The tool is supported by both Clang and GCC compilers (only on Linux/Intel64). Using it is very simple: you just need to add a -fsanitize=thread flag during compilation and linking. For Go programs, you simply need to add a -race flag to the go tool (supported on Linux, Mac and Windows).'

    (tags: concurrency bugs valgrind threadsanitizer threading deadlocks mutexes locking synchronization coding testing)

Links for 2014-06-27

  • Sandymount Repair Cafe

    'A repair café brings together people with things that need fixin' with people who have the skills to fix them in a social cafe style environment. It is an effort to move away from the throwaway culture that prevailed at the end of the twentieth century and move towards a more sustainable and enlightened approach to our relationship with consumer goods. Repair cafes are self organising events at a community level run by local volunteers with the support of local community groups, local agencies and other interested organisations. They are not-for-profit but not anti-profit and an important part of their goal is to promote local repair businesses and initiatives. www.repaircafe.ie is the online hub of a network of repair cafés across Ireland.' Sounds interesting: https://twitter.com/DubCityCouncil/status/481777655445204992 says they'll be doing it tomorrow from 2-5pm in Sandymount in Dublin.

    (tags: dublin sandymount repair fixing diy frugality repaircafe hardware)

  • Chef Vault

    A way to securely store secrets (auth details, API keys, etc.) in Chef

    (tags: chef storage knife authorisation api-keys security encryption)

  • Amazon EC2 Service Limits Report Now Available

    'designed to make it easier for you to view and manage your limits for Amazon EC2 by providing the latest information on service limits and links to quickly request limit increases. EC2 Service Limits Report displays all your service limit information in one place to help you avoid encountering limits on future EC2, EBS, Auto Scaling, and VPC usage.'

    (tags: aws ec2 vpc ebs autoscaling limits ops)

  • Delivery Notifications for Simple Email Service

    Today we are enhancing SES with the addition of delivery notifications. You can now elect to receive an Amazon SNS notification each time SES successfully delivers a message to a recipient's email server. These notifications give you increased visibility into the mail delivery process. With today's release, you can now track deliveries, bounces, and complaints, all via notification to the SNS topic or topics of your choice.

    (tags: delivery email smtp ses aws sns notifications ops)

  • How Emoji Get Lost In Translation

    I recently texted a friend to say how I was excited to meet her new boyfriend, and, because "excited" doesn't look so exciting on an iPhone screen, I editorialized with what seemed then like an innocent "[dancer]". (Translation: Can't wait for the fun night out!) On an Android phone, I realized later, that panache would have been a put-down: The dancers become "[playboy bunny]." (Translation: You’re a Playboy bunny who gets around!)

    (tags: emoji icons graphics text speech phones)

Links for 2014-06-26

Links for 2014-06-25

Links for 2014-06-24

Links for 2014-06-23

  • Startup equity gotcha

    'Two months ago, an early Uber employee thought that he had found a buyer for his vested stock, at $200 per share. But when his agent tried to seal the deal, Uber refused to sign off on the transfer. Instead, it offered to buy back the shares for around $135 a piece, which is within the same price range that Google Ventures and TPG Capital had paid to invest in Uber the previous July. Take it or hold it.' As rbranson on Twitter put it: 'reminder that startup equity is basically worthless unless you're a founder or investor, OR the company goes public.'

    (tags: startups uber stock stock-options shares share-option equity via:rbranson work)

Links for 2014-06-20

Links for 2014-06-19

Links for 2014-06-18

Links for 2014-06-17

  • FlatBuffers: Main Page

    A new serialization format from Google's Android gaming team, supporting C++ and Java, open source under the ASL v2. Reasons to use it:

    * Access to serialized data without parsing/unpacking - What sets FlatBuffers apart is that it represents hierarchical data in a flat binary buffer in such a way that it can still be accessed directly without parsing/unpacking, while also still supporting data structure evolution (forwards/backwards compatibility).
    * Memory efficiency and speed - The only memory needed to access your data is that of the buffer. It requires 0 additional allocations. FlatBuffers is also very suitable for use with mmap (or streaming), requiring only part of the buffer to be in memory. Access is close to the speed of raw struct access with only one extra indirection (a kind of vtable) to allow for format evolution and optional fields. It is aimed at projects where spending time and space (many memory allocations) to be able to access or construct serialized data is undesirable, such as in games or any other performance sensitive applications. See the benchmarks for details.
    * Flexible - Optional fields means not only do you get great forwards and backwards compatibility (increasingly important for long-lived games: don't have to update all data with each new version!). It also means you have a lot of choice in what data you write and what data you don't, and how you design data structures.
    * Tiny code footprint - Small amounts of generated code, and just a single small header as the minimum dependency, which is very easy to integrate. Again, see the benchmark section for details.
    * Strongly typed - Errors happen at compile time rather than manually having to write repetitive and error prone run-time checks. Useful code can be generated for you.
    * Convenient to use - Generated C++ code allows for terse access & construction code. Then there's optional functionality for parsing schemas and JSON-like text representations at runtime efficiently if needed (faster and more memory efficient than other JSON parsers).
    Looks nice, but it lacks the language coverage of protobuf. Definitely more practical than capnproto.

    (tags: c++ google java serialization json formats protobuf capnproto storage flatbuffers)

  • AWS SDK for Java Client Configuration

    turns out the AWS SDK has lots of tuning knobs: region selection, socket buffer sizes, and debug logging (including wire logging).

    (tags: aws sdk java logging ec2 s3 dynamodb sockets tuning)

  • Behind the loom band

    The simple woven multicoloured bracelet has made Cheong Choon Ng, a Malaysian immigrant to the US, a dollar millionaire. He invented the "Rainbow Loom" after watching his daughters making bracelets with rubber bands.
    So, really, it's his daughters that invented it. ;) My kids are massive fans. This is a 100% legit, Rubik's-Cube-style craze. (via Conor O'Neill)

    (tags: via:conoro loom-bands rubber-bands toys crazes)

  • lookout/ngx_borderpatrol

    BorderPatrol is an nginx module to perform authentication and session management at the border of your network. BorderPatrol makes the assumption that you have some set of services that require authentication and a service that hands out tokens to clients to access that service. You may not want those tokens to be sent across the internet, even over SSL, for a variety of reasons. To this end, BorderPatrol maintains a lookup table of session-id to auth token in memcached.

    (tags: borderpatrol nginx modules authentication session-management web-services http web authorization)

  • Use of Formal Methods at Amazon Web Services

    Chris Newcombe, Marc Brooker, et al. writing about their experience using formal specification and model-checking languages (TLA+) in production in AWS:

    The success with DynamoDB gave us enough evidence to present TLA+ to the broader engineering community at Amazon. This raised a challenge; how to convey the purpose and benefits of formal methods to an audience of software engineers? Engineers think in terms of debugging rather than ‘verification’, so we called the presentation “Debugging Designs”. Continuing that metaphor, we have found that software engineers more readily grasp the concept and practical value of TLA+ if we dub it 'Exhaustively-testable pseudo-code'. We initially avoid the words ‘formal’, ‘verification’, and ‘proof’, due to the widespread view that formal methods are impractical. We also initially avoid mentioning what the acronym ‘TLA’ stands for, as doing so would give an incorrect impression of complexity.
    More slides at http://tla2012.loria.fr/contributed/newcombe-slides.pdf ; proggit discussion at http://www.reddit.com/r/programming/comments/277fbh/use_of_formal_methods_at_amazon_web_services/

    (tags: formal-methods model-checking tla tla+ programming distsys distcomp ebs s3 dynamodb aws ec2 marc-brooker chris-newcombe)

  • Call me maybe: RabbitMQ

    We used Knossos and Jepsen to prove the obvious: RabbitMQ is not a lock service. That investigation led to a discovery hinted at by the documentation: in the presence of partitions, RabbitMQ clustering will not only deliver duplicate messages, but will also drop huge volumes of acknowledged messages on the floor. This is not a new result, but it may be surprising if you haven’t read the docs closely–especially if you interpreted the phrase “chooses Consistency and Partition Tolerance” to mean, well, either of those things.

    (tags: rabbitmq network partitions failure cap-theorem consistency ops reliability distcomp jepsen)

  • Jump Consistent Hash: A Fast, Minimal Memory, Consistent Hash Algorithm

    'a fast, minimal memory, consistent hash algorithm that can be expressed in about 5 lines of code. In comparison to the algorithm of Karger et al., jump consistent hash requires no storage, is faster, and does a better job of evenly dividing the key space among the buckets and of evenly dividing the workload when the number of buckets changes. Its main limitation is that the buckets must be numbered sequentially, which makes it more suitable for data storage applications than for distributed web caching.' Implemented in Guava. This is also noteworthy: 'Google has not applied for patent protection for this algorithm, and, as of this writing, has no plans to. Rather, it wishes to contribute this algorithm to the community.'
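
    For the curious, here's a Python port of the short C++ function from the paper. Treat it as a sketch: the masking to 64 bits stands in for C++ unsigned overflow, and `key` is assumed to be a non-negative 64-bit integer.

```python
def jump_consistent_hash(key, num_buckets):
    """Map a 64-bit integer key to a bucket in range(num_buckets)."""
    b, j = -1, 0
    while j < num_buckets:
        b = j
        # 64-bit linear congruential step; the mask emulates uint64 wraparound.
        key = (key * 2862933555777941757 + 1) & 0xFFFFFFFFFFFFFFFF
        j = int((b + 1) * (float(1 << 31) / float((key >> 33) + 1)))
    return b


# Growing from 10 to 11 buckets: a key either stays put or moves to the new bucket 10.
moved = sum(
    1 for k in range(10000)
    if jump_consistent_hash(k, 10) != jump_consistent_hash(k, 11)
)
print(moved)  # roughly 1/11 of the keys
```

    The sequence of jumps depends only on the key, which is why no per-bucket state needs to be stored, and also why buckets have to be numbered sequentially.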

    (tags: hashing consistent-hashing google guava memory algorithms sharding)

  • Bike Wheel Spoke ABS Safety Reflective Tube Reflector

    Available in blue, orange, and grey for $2.84 from the insanely-cheap China-based DealExtreme.com. Also available: rim-based reflective stickers

    (tags: bikes cycling reflective safety dealextreme tat)

Links for 2014-06-16

Links for 2014-05-29

  • Tracedump

    a single application IP packet sniffer that captures all TCP and UDP packets of a single Linux process. It consists of the following elements:

    * ptrace monitor - tracks bind(), connect() and sendto() syscalls and extracts local port numbers that the traced application uses;
    * pcap sniffer - using information from the previous module, it captures IP packets on an AF_PACKET socket (with an appropriate BPF filter attached);
    * garbage collector - periodically reads /proc/net/{tcp,udp} files in order to detect the sockets that the application no longer uses.

    As the output, tracedump generates a PCAP file with SLL-encapsulated IP packets - readable by eg. Wireshark. This file can be later used for detailed analysis of the networking operations made by the application. For instance, it might be useful for IP traffic classification systems.
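
    The garbage-collector step relies on the fact that /proc/net/tcp and /proc/net/udp list each socket's local address as hex IP and hex port. A rough sketch of pulling the local ports out of that format; this is my own illustration with made-up sample data, not tracedump's code (which is written in C):

```python
def local_ports(proc_net_text):
    """Return the set of local port numbers listed in /proc/net/tcp-style text."""
    ports = set()
    for line in proc_net_text.splitlines()[1:]:  # the first line is the column header
        fields = line.split()
        if len(fields) < 2:
            continue
        _, hex_port = fields[1].rsplit(":", 1)  # local_address is HEXIP:HEXPORT
        ports.add(int(hex_port, 16))
    return ports


# Hypothetical sample in the /proc/net/tcp layout: two listening sockets on 127.0.0.1.
SAMPLE = """\
  sl  local_address rem_address   st tx_queue rx_queue tr tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 00000000:0000 0A 00000000:00000000 00:00000000 00000000  1000        0 12345
   1: 0100007F:0016 00000000:0000 0A 00000000:00000000 00:00000000 00000000     0        0 12346
"""
print(sorted(local_ports(SAMPLE)))  # [22, 8080]
```

    Comparing successive snapshots of that set against the ports learned via ptrace would show which sockets have gone away.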

    (tags: debugging networking linux strace ptrace tracedump tracing tcp udp sniffer ip tcpdump)

  • You Are Not a Digital Native: Privacy in the Age of the Internet

    an open letter from Cory Doctorow to teen readers re privacy. 'The problem with being a “digital native” is that it transforms all of your screw-ups into revealed deep truths about how humans are supposed to use the Internet. So if you make mistakes with your Internet privacy, not only do the companies who set the stage for those mistakes (and profited from them) get off Scot-free, but everyone else who raises privacy concerns is dismissed out of hand. After all, if the “digital natives” supposedly don’t care about their privacy, then anyone who does is a laughable, dinosauric idiot, who isn’t Down With the Kids.'

    (tags: children privacy kids teens digital-natives surveillance cory-doctorow danah-boyd)

  • Shutterbits replacing hardware load balancers with local BGP daemons and anycast

    Interesting approach. Potentially risky, though -- heavy use of anycast on a large-scale datacenter network could increase the scale of the OSPF graph, which scales exponentially. This can have major side effects on OSPF reconvergence time, which creates an interesting class of network outage in the event of OSPF flapping. Having said that, an active/passive failover LB pair will already announce a single anycast virtual IP anyway, so, assuming there are a similar number of anycast IPs in the end, it may not have any negative side effects. There's also the inherent limitation noted in the second-to-last paragraph; 'It comes down to what your hardware router can handle for ECMP. I know a Juniper MX240 can handle 16 next-hops, and have heard rumors that a software update will bump this to 64, but again this is something to keep in mind'. Taking a leaf from the LB design, and using BGP to load-balance across a smaller set of haproxy instances, would seem like a good approach to scale up.

    (tags: scalability networking performance load-balancing bgp exabgp ospf anycast routing datacenters scaling vips juniper haproxy shutterstock)

  • Tron: Legacy Encom Boardroom Visualization

    this is great. lovely, silly, HTML5 dataviz, with lots of spinning globes and wobbling sines on a black background

    (tags: demo github wikipedia dataviz visualisation mapping globes rob-scanlan graphics html5 animation tron-legacy tron movies)

  • CockroachDB

    a distributed key/value datastore which supports ACID transactional semantics and versioned values as first-class features. The primary design goal is global consistency and survivability, hence the name. Cockroach aims to tolerate disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention. Cockroach nodes are symmetric; a design goal is one binary with minimal configuration and no required auxiliary services. Cockroach implements a single, monolithic sorted map from key to value where both keys and values are byte strings (not unicode). Cockroach scales linearly (theoretically up to 4 exabytes (4E) of logical data). The map is composed of one or more ranges and each range is backed by data stored in RocksDB (a variant of LevelDB), and is replicated to a total of three or more cockroach servers. Ranges are defined by start and end keys. Ranges are merged and split to maintain total byte size within a globally configurable min/max size interval. Range sizes default to target 64M in order to facilitate quick splits and merges and to distribute load at hotspots within a key range. Range replicas are intended to be located in disparate datacenters for survivability (e.g. { US-East, US-West, Japan }, { Ireland, US-East, US-West}, { Ireland, US-East, US-West, Japan, Australia }). Single mutations to ranges are mediated via an instance of a distributed consensus algorithm to ensure consistency. We’ve chosen to use the Raft consensus algorithm. All consensus state is stored in RocksDB. A single logical mutation may affect multiple key/value pairs. Logical mutations have ACID transactional semantics. If all keys affected by a logical mutation fall within the same range, atomicity and consistency are guaranteed by Raft; this is the fast commit path. Otherwise, a non-locking distributed commit protocol is employed between affected ranges. 
    Cockroach provides snapshot isolation (SI) and serializable snapshot isolation (SSI) semantics, allowing externally consistent, lock-free reads and writes--both from an historical snapshot timestamp and from the current wall clock time. SI provides lock-free reads and writes but still allows write skew. SSI eliminates write skew, but introduces a performance hit in the case of a contentious system. SSI is the default isolation; clients must consciously decide to trade correctness for performance. Cockroach implements a limited form of linearizability, providing ordering for any observer or chain of observers.
    This looks nifty. One to watch.

    (tags: cockroachdb databases storage georeplication raft consensus acid go key-value-stores rocksdb)
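
    The "sorted map composed of ranges" idea in the description above can be sketched in a few lines. This is a toy, single-process illustration under loud assumptions: `RangeMap` is a hypothetical name, ranges split on key count rather than byte size, and replication, Raft and merging are left out entirely.

```python
import bisect


class RangeMap:
    """Toy sorted map split into ranges, each identified by its start key."""

    def __init__(self, max_keys=4):
        self.starts = [b""]   # start key of each range, kept sorted
        self.ranges = [{}]    # one dict of key -> value per range
        self.max_keys = max_keys

    def _index(self, key):
        # The owning range is the last one whose start key is <= key.
        return bisect.bisect_right(self.starts, key) - 1

    def get(self, key):
        return self.ranges[self._index(key)].get(key)

    def put(self, key, value):
        i = self._index(key)
        self.ranges[i][key] = value
        if len(self.ranges[i]) > self.max_keys:
            self._split(i)

    def _split(self, i):
        # Split at the median key so both halves stay under the limit.
        keys = sorted(self.ranges[i])
        mid = keys[len(keys) // 2]
        old = self.ranges[i]
        self.ranges[i] = {k: v for k, v in old.items() if k < mid}
        self.starts.insert(i + 1, mid)
        self.ranges.insert(i + 1, {k: v for k, v in old.items() if k >= mid})
```

    Lookups stay logarithmic in the number of ranges because only the sorted list of start keys is searched; the range contents are never scanned to find the owner.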

  • Tuning LevelDB

    good docs from Riak

    (tags: leveldb tuning performance ops riak)

  • Proof of burn - Bitcoin

    method for bootstrapping one cryptocurrency off of another. The idea is that miners should show proof that they burned some coins - that is, sent them to a verifiably unspendable address. This is expensive from their individual point of view, just like proof of work; but it consumes no resources other than the burned underlying asset. To date, all proof of burn cryptocurrencies work by burning proof-of-work-mined cryptocurrencies, so the ultimate source of scarcity remains the proof-of-work-mined "fuel".

    (tags: bitcoin proof money mining cryptocurrency)

  • The programming error that cost Mt Gox 2609 bitcoins

    Digging into broken Bitcoin scripts in the blockchain. Fascinating:

    While analyzing coinbase transactions, I came across another interesting bug that lost bitcoins. Some transactions have the meaningless and unredeemable script: OP_IFDUP OP_IF OP_2SWAP OP_VERIFY OP_2OVER OP_DEPTH That script turns out to be the ASCII text script. Instead of putting the redemption script into the transaction, the P2Pool miners accidentally put in the literal word "script". The associated bitcoins are lost forever due to this error.
    (via Nelson)

    (tags: programming script coding bitcoin mtgox via:nelson scripting dsls)
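
    The claim in the quote is easy to check by hand: each byte of the ASCII word "script" is also a valid Bitcoin script opcode. A minimal sketch, with an opcode table limited to the six values needed here (`disassemble` is a hypothetical helper, not part of any Bitcoin library):

```python
# Subset of Bitcoin script opcodes, keyed by byte value.
OPCODES = {
    0x63: "OP_IF",
    0x69: "OP_VERIFY",
    0x70: "OP_2OVER",
    0x72: "OP_2SWAP",
    0x73: "OP_IFDUP",
    0x74: "OP_DEPTH",
}


def disassemble(script_bytes):
    """Render each byte as its opcode name, or a placeholder if unknown."""
    return " ".join(OPCODES.get(b, "UNKNOWN_0x%02x" % b) for b in script_bytes)


print(disassemble(b"script"))
```

    Running this prints the same six-opcode sequence quoted above, in the same order.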

  • Moquette MQTT

    a Java implementation of an MQTT 3.1 broker. Its code base is small. At its core, Moquette is an events processor; this lets the code base be simple, avoiding thread sharing issues. The Moquette broker is lightweight and easy to understand so it could be embedded in other projects.

    (tags: mqtt moquette netty messaging queueing push-notifications iot internet push eclipse)

  • "Taking the hotdog"

    aka. lock acquisition. ex-Amazon-Dublin lingo, observed in the wild ;)

    (tags: language hotdog archie-mcphee amazon dublin intercom coding locks synchronization)

Links for 2014-05-27

Links for 2014-05-26

Links for 2014-05-23

  • BPF - the forgotten bytecode

    'In essence Tcpdump asks the kernel to execute a BPF program within the kernel context. This might sound risky, but actually isn't. Before executing the BPF bytecode kernel ensures that it's safe: * All the jumps are only forward, which guarantees that there aren't any loops in the BPF program. Therefore it must terminate. * All instructions, especially memory reads are valid and within range. * The single BPF program has less than 4096 instructions. All this guarantees that the BPF programs executed within kernel context will run fast and will never infinitely loop. That means the BPF programs are not Turing complete, but in practice they are expressive enough for the job and deal with packet filtering very well.' Good example of a carefully-designed DSL allowing safe "programs" to be written and executed in a privileged context without security risk, or risk of running out of control.

    (tags: coding dsl security via:oisin linux tcpdump bpf bsd kernel turing-complete configuration languages)
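
    The termination argument in the quote can be illustrated with a toy verifier. This is a sketch over a made-up instruction model (a list of `(op, arg)` tuples), not the kernel's real checker; the instruction limit follows the wording of the quote, and the "must end in a return" rule is an extra assumption of this sketch.

```python
MAX_INSNS = 4096  # limit as stated in the quote above


def verify(program):
    """Accept a program only if it is bounded and every jump goes forward.

    program: list of (op, arg) tuples. For "jmp", arg is an offset relative
    to the next instruction. This instruction model is hypothetical.
    """
    n = len(program)
    if n == 0 or n >= MAX_INSNS:
        return False
    for pc, (op, arg) in enumerate(program):
        if op == "jmp":
            target = pc + 1 + arg
            # Backward jumps would allow loops; out-of-range jumps are invalid.
            if arg < 0 or target >= n:
                return False
    # Assumption of this sketch: the last instruction must be a return.
    return program[-1][0] == "ret"
```

    Because every jump strictly increases the program counter and the program is finite, execution visits each instruction at most once, which is the termination guarantee the quote describes.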

  • Handmade Kitchen Goods from Makers & Brothers - Cool Hunting

    lovely kitchen-gear design from local-boys-made-good Makers & Brothers

    (tags: makers-and-brothers design crafts kitchen nyc terrazo chopping-boards)

Links for 2014-05-22

  • 'Monitoring and detecting causes of failures of network paths', US patent 8,661,295 (B1)

    The first software patent in my name -- couldn't avoid it forever :(

    Systems and methods are provided for monitoring and detecting causes of failures of network paths. The system collects performance information from a plurality of nodes and links in a network, aggregates the collected performance information across paths in the network, processes the aggregated performance information for detecting failures on the paths, analyzes each of the detected failures to determine at least one root cause, and initiates a remedial workflow for the at least one root cause determined. In some aspects, processing the aggregated information may include performing a statistical regression analysis or otherwise solving a set of equations for the performance indications on each of a plurality of paths. In another aspect, the system may also include an interface which makes available for display one or more of the network topology, the collected and aggregated performance information, and indications of the detected failures in the topology.
    The patent describes an early version of Pimms, the network failure detection and remediation system we built for Amazon.

    (tags: amazon pimms swpats patents networking ospf autoremediation outage-detection)

Links for 2014-05-16

Links for 2014-05-14

Links for 2014-05-13

Links for 2014-05-12

Links for 2014-05-09

Links for 2014-05-08

Links for 2014-05-07

Links for 2014-05-06

  • Minimum Viable Block Chain

    Ilya Grigorik describes the design of the Bitcoin/altcoin block chain algorithm. Illuminating writeup

    (tags: algorithms bitcoin security crypto blockchain ilya-grigorik)

  • Docker Plugin for Jenkins

    The aim of the docker plugin is to be able to use a docker host to dynamically provision a slave, run a single build, then tear-down that slave. Optionally, the container can be committed, so that (for example) manual QA could be performed by the container being imported into a local docker provider, and run from there.
    The holy grail of Jenkins/Docker integration. How cool is that...

    (tags: jenkins docker ops testing ec2 hosting scaling elastic-scaling system-testing)

  • Simple Binary Encoding

    an OSI layer 6 presentation for encoding/decoding messages in binary format to support low-latency applications. [...] SBE follows a number of design principles to achieve this goal. Adhering to these design principles sometimes means features available in other codecs will not be offered. For example, many codecs allow strings to be encoded at any field position in a message; SBE only allows variable length fields, such as strings, as fields grouped at the end of a message. The SBE reference implementation consists of a compiler that takes a message schema as input and then generates language specific stubs. The stubs are used to directly encode and decode messages from buffers. The SBE tool can also generate a binary representation of the schema that can be used for the on-the-fly decoding of messages in a dynamic environment, such as for a log viewer or network sniffer. The design principles drive the implementation of a codec that ensures messages are streamed through memory without backtracking, copying, or unnecessary allocation. Memory access patterns should not be underestimated in the design of a high-performance application. Low-latency systems in any language especially need to consider all allocation to avoid the resulting issues in reclamation. This applies for both managed runtime and native languages. SBE is totally allocation free in all three language implementations. The end result of applying these design principles is a codec that has ~25X greater throughput than Google Protocol Buffers (GPB) with very low and predictable latency. This has been observed in micro-benchmarks and real-world application use. A typical market data message can be encoded, or decoded, in ~25ns compared to ~1000ns for the same message with GPB on the same hardware. XML and FIX tag value messages are orders of magnitude slower again.
    The sweet spot for SBE is as a codec for structured data that is mostly fixed size fields which are numbers, bitsets, enums, and arrays. While it does work for strings and blobs, many may find some of the restrictions a usability issue. These users would be better off with another codec more suited to string encoding.

    (tags: sbe encoding protobuf protocol-buffers json messages messaging binary formats low-latency martin-thompson xml)
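
    The "fixed-size fields written straight into a buffer" principle can be sketched with Python's `struct` module. This is not the SBE wire format or its generated stubs: the `TRADE` layout and the function names are invented for illustration, and Python itself still allocates objects when decoding, so only the layout idea carries over.

```python
import struct

# Hypothetical message: uint64 timestamp, int64 price, uint32 quantity,
# little-endian with no padding, so every field sits at a fixed offset.
TRADE = struct.Struct("<QqI")


def encode_trade(buf, offset, timestamp, price, quantity):
    """Write one message into a preallocated buffer; return the next offset."""
    TRADE.pack_into(buf, offset, timestamp, price, quantity)
    return offset + TRADE.size


def decode_trade(buf, offset):
    """Read the message at offset directly out of the buffer."""
    return TRADE.unpack_from(buf, offset)


buf = bytearray(64)  # reused for every message instead of allocating per call
end = encode_trade(buf, 0, 1399334400, 1234500, 100)
```

    Because every field has a known offset, encoding and decoding move forward through the buffer once, which is the "no backtracking" property the excerpt emphasises.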

  • Observations of an Internet Middleman

    That leaves the remaining six [consumer ISPs peering with Level3] with congestion on almost all of the interconnect ports between us. Congestion that is permanent, has been in place for well over a year and where our peer refuses to augment capacity. They are deliberately harming the service they deliver to their paying customers. They are not allowing us to fulfil the requests their customers make for content. Five of those congested peers are in the United States and one is in Europe. There are none in any other part of the world. All six are large Broadband consumer networks with a dominant or exclusive market share in their local market. In countries or markets where consumers have multiple Broadband choices (like the UK) there are no congested peers.
    Amazing that L3 are happy to publish this -- that's where big monopoly ISPs have led their industry.

    (tags: net-neutrality networking internet level3 congestion isps us-politics)

  • interview with Google VP of SRE Ben Treynor

    interviewed by Niall Murphy, no less ;). Some good info on what Google deems important from an ops/SRE perspective

    (tags: sre ops devops google monitoring interviews ben-treynor)

Links for 2014-05-02

  • Faster BAM Sorting with SAMtools and RocksDB

    Now this is really really clever. Heap-merging a heavyweight genomics format, using RocksDB to speed it up.

    There’s a problem with the single-pass merge described above when the number of intermediate files, N/R, is large. Merging the sorted intermediate files in limited memory requires constantly reading little bits from all those files, incurring a lot of disk seeks on rotating drives. In fact, at some point, samtools sort performance becomes effectively bound to disk seeking. [...] In this scenario, samtools rocksort can sort the same data in much less time, using no more memory, by invoking RocksDB’s background compaction capabilities. With a few extra lines of code we configure RocksDB so that, while we’re still in the process of loading the BAM data, it runs additional background threads to merge batches of existing sorted temporary files into fewer, larger, sorted files. Just like the final merge, each background compaction requires only a modest amount of working memory.
    (via the RocksDB facebook group)

    (tags: rocksdb algorithms sorting leveldb bam samtools merging heaps compaction)
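
    The merge strategy described in the excerpt can be sketched with `heapq.merge` from the standard library. This is an in-memory toy using lists in place of on-disk files, with hypothetical function names; it has nothing to do with the actual samtools or RocksDB code.

```python
import heapq


def sorted_runs(records, run_size):
    """Sort phase: split the input into sorted runs of at most run_size items."""
    return [sorted(records[i:i + run_size])
            for i in range(0, len(records), run_size)]


def merge_runs(runs):
    """Heap-merge any number of sorted runs into one sorted list."""
    return list(heapq.merge(*runs))


def compact(runs, fan_in):
    """Merge groups of fan_in runs into fewer, larger runs.

    A rough analogue of background compaction: the final merge then has
    fewer inputs to read from at once.
    """
    return [merge_runs(runs[i:i + fan_in])
            for i in range(0, len(runs), fan_in)]


records = [9, 3, 7, 1, 8, 2, 6, 4, 5, 0]
runs = sorted_runs(records, 3)   # four small sorted runs
fewer = compact(runs, 2)         # two larger sorted runs
result = merge_runs(fewer)
```

    With real files, reducing the number of runs before the final merge is what cuts down the seeking between many small inputs.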

  • Coding For Life (Battery Life, That Is)

    great presentation on Android mobile battery life, and what to avoid

    (tags: presentations via:sergio android mobile battery battery-life 3g wifi gprs hardware)

  • Oisin's mobile app release checklist

    'This form is to document the testing that has been done on each app version before submitting to the App Store. For each item, indicate Yes if the testing has been done, Not Applicable if the testing does not apply (eg testing audio for an app that doesn’t play any), or No if the testing has not been done for another reason.'

    (tags: apps checklists release coding ios android mobile ohurley)

  • "A New Data Structure For Cumulative Frequency Tables"

    paper by Peter M Fenwick, 1993. 'A new method (the ‘binary indexed tree’) is presented for maintaining the cumulative frequencies which are needed to support dynamic arithmetic data compression. It is based on a decomposition of the cumulative frequencies into portions which parallel the binary representation of the index of the table element (or symbol). The operations to traverse the data structure are based on the binary coding of the index. In comparison with previous methods, the binary indexed tree is faster, using more compact data and simpler code. The access time for all operations is either constant or proportional to the logarithm of the table size. In conjunction with the compact data structure, this makes the new method particularly suitable for large symbol alphabets.' via Jakob Buchgraber, who's implementing it right now in Netty ;)

    (tags: netty frequency-tables data-structures algorithms coding binary-tree indexing compression symbol-alphabets)
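
    A minimal sketch of the binary indexed tree from the abstract, using 0-based symbol indices on the outside and the usual 1-based array internally. The class and method names are my own, not taken from the paper or from Netty.

```python
class FenwickTree:
    """Cumulative frequency table with O(log n) update and prefix query."""

    def __init__(self, size):
        self.tree = [0] * (size + 1)  # slot 0 is unused

    def add(self, index, delta):
        """Add delta to the frequency of symbol index."""
        i = index + 1
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i  # move to the next node that covers this index

    def prefix_sum(self, index):
        """Return the total frequency of symbols 0..index inclusive."""
        i = index + 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total


table = FenwickTree(4)
for symbol, freq in enumerate([3, 0, 2, 5]):
    table.add(symbol, freq)
```

    The `i & -i` expression isolates the lowest set bit of the index, which is the "binary representation of the index" decomposition the abstract refers to.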

Links for 2014-05-01

Links for 2014-04-30

Links for 2014-04-29

  • Pickles & Spores: Improving Support for Distributed Programming in Scala

    'Spores are "small units of possibly mobile functional behavior". They're a closure-like abstraction meant for use in distributed or concurrent environments. Spores provide a guarantee that the environment is effectively immutable, and safe to ship over the wire. Spores aim to give library authors some confidence in exposing functions (or, rather, spores) in public APIs for safe consumption in a distributed or concurrent environment. The first part of the talk covers a simpler variant of spores as they are proposed for inclusion in Scala 2.11. The second part of the talk briefly introduces a current research project ongoing at EPFL which leverages Scala's type system to provide type constraints that give authors finer-grained control over spore capturing semantics. What's more, these type constraints can be composed during spore composition, so library authors are effectively able to propagate expert knowledge via these composable constraints. The last part of the talk briefly covers Scala/Pickling, a fast new, open serialization framework.'

    (tags: pickling scala presentations spores closures fp immutability coding distributed distcomp serialization formats network)

  • BBC News - Microsoft 'must release' data held on Dublin server

    Messy. I can't see this lasting beyond an appeal.

    Law enforcement efforts would be seriously impeded and the burden on the government would be substantial if they had to co-ordinate with foreign governments to obtain this sort of information from internet service providers such as Microsoft and Google, Judge Francis said. In a blog post, Microsoft's deputy general counsel, David Howard, said: "A US prosecutor cannot obtain a US warrant to search someone's home located in another country, just as another country's prosecutor cannot obtain a court order in her home country to conduct a search in the United States. "We think the same rules should apply in the online world, but the government disagrees."

    (tags: microsoft regions law us-law privacy google cloud international-law surveillance)

  • Russia passes bill requiring bloggers to register with government

    A bill passed by the Russian parliament on Tuesday says that any blogger read by at least 3,000 people a day has to register with the government telecom watchdog and follow the same rules as those imposed by Russian law on mass media. These include privacy safeguards, the obligation to check all facts, silent days before elections and loose but threatening injunctions against "abetting terrorism" and "extremism."
    Russian blogging platforms have responded by changing view-counter tickers to display "2500+" as a max.

    (tags: russia blogs blogging terrorism extremism internet regulation chilling-effects censorship)

Links for 2014-04-28

Links for 2014-04-25

Links for 2014-04-24

  • Sirius by Comcast

    At Comcast, our applications need convenient, low-latency access to important reference datasets. For example, our XfinityTV websites and apps need to use entertainment-related data to serve almost every API or web request to our datacenters: information like what year Casablanca was released, or how many episodes were in Season 7 of Seinfeld, or when the next episode of the Voice will be airing (and on which channel!). We traditionally managed this information with a combination of relational databases and RESTful web services but yearned for something simpler than the ORM, HTTP client, and cache management code our developers dealt with on a daily basis. As main memory sizes on commodity servers continued to grow, however, we asked ourselves: How can we keep this reference data entirely in RAM, while ensuring it gets updated as needed and is easily accessible to application developers? The Sirius distributed system library is our answer to that question, and we're happy to announce that we've made it available as an open source project. Sirius is written in Scala and uses the Akka actor system under the covers, but is easily usable by any JVM-based language.
    Also includes a Paxos implementation with "fast follower" read-only slave replication. ASL2-licensed open source. The only thing I can spot to be worried about is speed of startup; they note that apps need to replay a log at startup to rebuild state, which can be slow if unoptimized in my experience. Update: in a twitter conversation at https://twitter.com/jon_moore/status/459363751893139456 , Jon Moore indicated they haven't had problems with this even with 'datasets consuming 10-20GB of heap', and have 'benchmarked a 5-node Sirius ingest cluster up to 1k updates/sec write throughput.' That's pretty solid!

    (tags: open-source comcast paxos replication read-only datastores storage memory memcached redis sirius scala akka jvm libraries)

  • AWS Elastic Beanstalk for Docker

    This is pretty amazing. Nice work, Beanstalk team. Not sure how well it integrates with the rest of AWS, though.

    (tags: aws amazon docker ec2 beanstalk ops containers linux)

  • TDD is dead. Long live testing

    Oh god. I agree with DHH. shoot me now.

    Test-first units leads to an overly complex web of intermediary objects and indirection in order to avoid doing anything that's "slow". Like hitting the database. Or file IO. Or going through the browser to test the whole system. It's given birth to some truly horrendous monstrosities of architecture. A dense jungle of service objects, command patterns, and worse. I rarely unit test in the traditional sense of the word, where all dependencies are mocked out, and thousands of tests can close in seconds. It just hasn't been a useful way of dealing with the testing of Rails applications. I test active record models directly, letting them hit the database, and through the use of fixtures. Then layered on top is currently a set of controller tests, but I'd much rather replace those with even higher level system tests through Capybara or similar. I think that's the direction we're heading. Less emphasis on unit tests, because we're no longer doing test-first as a design practice, and more emphasis on, yes, slow, system tests.

    (tags: tdd rails testing unit-tests system-tests integration-testing ruby dhh mocks)

  • All at sea: global shipping fleet exposed to hacking threat | Reuters

    Hackers recently shut down a floating oil rig by tilting it, while another rig was so riddled with computer malware that it took 19 days to make it seaworthy again; Somali pirates help choose their targets by viewing navigational data online, prompting ships to either turn off their navigational devices, or fake the data so it looks like they're somewhere else; and hackers infiltrated computers connected to the Belgian port of Antwerp, located specific containers, made off with their smuggled drugs and deleted the records.
    (via Mikko Hypponen)

    (tags: via:mikko security hacking oilrigs shipping ships maritime antwerp piracy malware)

  • Search Results - (Author:Thomas H Mason)

    Photographs taken by my great-grandfather, Thomas H. Mason, in the National Library of Ireland's newly-digitized online collection

    (tags: family thomas-h-mason history ireland photography archive nli)

  • Syria's lethal Facebook checkpoints

    An anonymous tip from a highly reliable source: "There are checkpoints in Syria where your Facebook is checked for affiliation with the rebellious groups or individuals aligned with the rebellion. People are then disappeared or killed if they are found to be connected. Drivers are literally forced to load their Facebook/Twitter accounts and then they are riffled through. It's happening daily, and has been for a year at least."

    (tags: boing-boing war facebook social-media twitter internet checkpoints syria)

Links for 2014-04-22

Links for 2014-04-18

  • Consul

    Nice-looking new tool from Hashicorp; service discovery and configuration service, built on Raft for leader election, Serf for gossip-based messaging, and Go. Some features: * Gossip is performed over both TCP and UDP; * gossip messages are encrypted symmetrically and therefore secure from eavesdropping, tampering, spoofing and packet corruption (like the incident which brought down S3 for days: http://status.aws.amazon.com/s3-20080720.html ); * exposes both a HTTP interface and (even better) DNS; * includes explicit support for long-distance WAN operation as well as on LANs. It all looks very practical and usable. MPL-licensed. The only potential risk I can see is that expecting to receive config updates from a blocking poll of the HTTP interface needs some good "best practice" docs, to ensure that people don't mishandle the scenario where there is a network partition between your calling code and the Consul server/agent. Without any heartbeating protocol behind the scenes, HTTP is vulnerable to "hung connections" which would result in a config change being silently missed by the client until the connection eventually is timed out, either by the calling code or the client-side kernel. This could potentially take minutes to occur, which in some usage scenarios could be a big, unforeseen problem.

    (tags: configuration service-discovery distcomp raft consensus-algorithms go mpl open-source dns http gossip-protocol hashicorp)
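
    One way to guard against the hung-connection risk noted above is to give every blocking poll a client-side socket timeout slightly longer than the server-side wait. A hedged sketch: the `/v1/kv/` path, the `index` and `wait` parameters and the `X-Consul-Index` header are taken from Consul's HTTP API documentation as I understand it; the function names, the 10-second margin and the missing error handling are assumptions of this sketch.

```python
import urllib.parse
import urllib.request


def blocking_query_url(base_url, key, index, wait_seconds):
    """Build a blocking-query URL for a KV key (parameter names assumed)."""
    query = urllib.parse.urlencode({"index": index, "wait": "%ds" % wait_seconds})
    return "%s/v1/kv/%s?%s" % (base_url.rstrip("/"), urllib.parse.quote(key), query)


def client_timeout(wait_seconds, margin_seconds=10):
    """Socket timeout: the server-side wait plus a safety margin."""
    return wait_seconds + margin_seconds


def poll_once(base_url, key, index, wait_seconds=30):
    """Issue one blocking poll; raises on timeout instead of hanging forever."""
    url = blocking_query_url(base_url, key, index, wait_seconds)
    with urllib.request.urlopen(url, timeout=client_timeout(wait_seconds)) as resp:
        return int(resp.headers["X-Consul-Index"]), resp.read()
```

    If the timeout fires, the caller can simply retry with the same index, so a silent network partition costs at most one timeout period rather than minutes of missed updates.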

Links for 2014-04-17

  • Druid | How We Scaled HyperLogLog: Three Real-World Optimizations

    3 optimizations Druid.io have made to the HLL algorithm to scale it up for production use in Metamarkets: compacting registers (fixes a bug with unions of multiple HLLs); a sparse storage format (to optimize space); faster lookups using a lookup table.

    (tags: druid.io metamarkets scaling hyperloglog hll algorithms performance optimization counting estimation)

  • HyperLogLog - Intersection Arithmetic

    'In general HLL intersection in StreamLib works.  |A INTERSECT B| = |A| + |B| - |A UNION B|.  Timon's article on intersection is important to read though.  The usefulness of HLL intersection depends on the features of the HLLs you are intersecting.'

    (tags: hyperloglog hll hyperloglogplus streamlib intersections sets estimation algorithms)
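
    The inclusion-exclusion formula in the quote, and the reason its usefulness "depends on the features of the HLLs", can be sketched with plain arithmetic. No HLL library is involved; the function names and the uniform error model are assumptions for illustration only.

```python
def intersection_estimate(card_a, card_b, card_union):
    """|A INTERSECT B| = |A| + |B| - |A UNION B|, clamped at zero."""
    return max(0.0, card_a + card_b - card_union)


def worst_case_relative_error(true_a, true_b, true_intersection, err):
    """Relative error of the intersection if |A| and |B| are overestimated
    and the union is underestimated, each by the fraction err."""
    true_union = true_a + true_b - true_intersection
    estimate = intersection_estimate(
        true_a * (1 + err), true_b * (1 + err), true_union * (1 - err))
    return abs(estimate - true_intersection) / true_intersection


# Two sets of a million items sharing only ten thousand, with 1% error
# on each cardinality estimate.
error = worst_case_relative_error(1_000_000, 1_000_000, 10_000, 0.01)
```

    In this example a 1% error on each input becomes an error several times larger than the intersection itself, so small overlaps between large sets are where the estimate is least trustworthy.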

  • Structural Integrity | 99% Invisible

    'The student (who has since been lost to history) was studying Citicorp Center as part of his thesis and had found that the building was particularly vulnerable to quartering winds (winds that strike the building at its corners). Normally, buildings are strongest at their corners, and it’s the perpendicular winds (winds that strike the building at its face) that cause the greatest strain. But this was not a normal building. LeMessurier had accounted for the perpendicular winds, but not the quartering winds. He checked the math, and found that the student was right. He compared what velocity winds the building could withstand with weather data, and found that a storm strong enough to topple Citicorp Center hits New York City every 55 years. But that’s only if the tuned mass damper, which keeps the building stable, is running. LeMessurier realized that a major storm could cause a blackout and render the tuned mass damper inoperable. Without the tuned mass damper, LeMessurier calculated that a storm powerful enough to take out the building hits New York every sixteen years.'

    (tags: william-lemessurier architecture danger risk buildings nyc citicorp-center wind mass-dampers physics)

  • Linode announces new instance specs

    'TL;DR: SSDs + Insane network + Faster processors + Double the RAM + Hourly Billing'

    (tags: hosting linode ssd performance linux ops datacenters)

  • fcron

    Fcron is a scheduler. It aims at replacing Vixie Cron, so it implements most of its functionalities. But contrary to Vixie Cron, fcron does not need your system to be up 7 days a week, 24 hours a day: it also works well with systems which are running only occasionally (contrary to anacrontab). In other words, fcron does both the job of Vixie Cron and anacron, but does even more and better :)) ...
    Thanks Craig!

    (tags: via:chughes cron fcron unix linux ops scheduler automation scripts)

  • Ryanair drops out of top Google flight search results after website overhaul | Business | theguardian.com

    They've done the classic website-redesign screwup -- omitted redirects from the old URLs.

    Sam Silverwood-Cope, director of Intelligent Positioning, said: "They've ignored the legacy of the old Ryanair.com. It's quite startling. They are doing it just before their busiest time of the year." A change in [URLs] without proper redirects means many results found by Google now simply return error pages, he added. "Unless redirects get put in pretty soon, the position is going to get worse and worse."

    (tags: ryanair inept fail funny via:christinebohan web google search redirects)

  • Scarfolk Council

    Scarfolk is a town in North West England that did not progress beyond 1979. Instead, the entire decade of the 1970s loops ad infinitum. Here in Scarfolk, pagan rituals blend seamlessly with science; hauntology is a compulsory subject at school, and everyone must be in bed by 8pm because they are perpetually running a slight fever. "Visit Scarfolk today. Our number one priority is keeping rabies at bay." For more information please reread.

    (tags: scarfolk 1970s england history funny humour public-information pagan morbid)

  • OpenSSL Valhalla Rampage

    OpenBSD are going wild ripping out "arcane VMS hacks" in an attempt to render OpenSSL's source code comprehensible, and finding amazing horrors like this: 'Well, even if time() isn't random, your RSA private key is probably pretty random. Do not feed RSA private key information to the random subsystem as entropy. It might be fed to a pluggable random subsystem…. What were they thinking?!'

    (tags: random security openssl openbsd coding horror rsa private-keys entropy)

Links for 2014-04-16

  • "H" in cron syntax

    This is something Jenkins have come up with to randomize and distribute load, in order to avoid the "thundering-herd" bug. Good call

    (tags: jenkins randomization load-balancing load thundering-herd ops capacity sleep)
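
    The general idea can be sketched as hashing a stable identifier, such as the job name, to pick a fixed slot in the hour. This illustrates the technique only: the use of MD5, the function names and the output format are my assumptions, not Jenkins' actual implementation.

```python
import hashlib


def hashed_minute(job_name):
    """Map a job name to a stable minute in the range 0..59."""
    digest = hashlib.md5(job_name.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big") % 60


def cron_line(job_name, hour_field="*"):
    """Expand an hourly 'H'-style schedule into a concrete cron expression."""
    return "%d %s * * *" % (hashed_minute(job_name), hour_field)
```

    Each job keeps the same minute from run to run, but different jobs land on different minutes, so they no longer all fire at the top of the hour.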

  • Shared Space and other bad junction designs lead to crashes and injuries

    Just because something is "Dutch", that doesn't mean it's good. The Netherlands has many excellent examples, but you have to be very selective about what serves as a model. Cyclists fare best where their interactions with motor vehicles are limited and controlled. They fare best where infrastructure ensures that minor mistakes do not result in injuries. Anywhere that we rely upon everyone behaving perfectly but where we do not protect the most vulnerable, there will be injuries. Good design takes human nature into account and removes the causes of danger from those who are most vulnerable.
    via Tony Finch

    (tags: cycling design junctions shared-space dutch holland roads safety crashes)

  • Beefcake

    A sane Google Protocol Buffers library for Ruby. It's all about being Buf; ProtoBuf.

    (tags: protobuf google protocol-buffers ruby coding libraries gems open-source)

  • Dan Kaminsky on Heartbleed

    When I said that we expected better of OpenSSL, it’s not merely that there’s some sense that security-driven code should be of higher quality.  (OpenSSL is legendary for being considered a mess, internally.)  It’s that the number of systems that depend on it, and then expose that dependency to the outside world, are considerable.  This is security’s largest contributed dependency, but it’s not necessarily the software ecosystem’s largest dependency.  Many, maybe even more systems depend on web servers like Apache, nginx, and IIS.  We fear vulnerabilities significantly more in libz than libbz2 than libxz, because more servers will decompress untrusted gzip over bzip2 over xz.  Vulnerabilities are not always in obvious places – people underestimate just how exposed things like libxml and libcurl and libjpeg are.  And as HD Moore showed me some time ago, the embedded space is its own universe of pain, with 90’s bugs covering entire countries. If we accept that a software dependency becomes Critical Infrastructure at some level of economic dependency, the game becomes identifying those dependencies, and delivering direct technical and even financial support.  What are the one million most important lines of code that are reachable by attackers, and least covered by defenders?  (The browsers, for example, are very reachable by attackers but actually defended pretty zealously – FFMPEG public is not FFMPEG in Chrome.) Note that not all code, even in the same project, is equally exposed.    It’s tempting to say it’s a needle in a haystack.  But I promise you this:  Anybody patches Linux/net/ipv4/tcp_input.c (which handles inbound network for Linux), a hundred alerts are fired and many of them are not to individuals anyone would call friendly.  One guy, one night, patched OpenSSL.  Not enough defenders noticed, and it took Neel Mehta to do something.

    (tags: development openssl heartbleed ssl security dan-kaminsky infrastructure libraries open-source dependencies)

  • s3funnel

    'a command line tool for Amazon's Simple Storage Service (S3). Written in Python, easy_install the package to install as an egg. Supports multithreaded operations for large volumes. Put, get, or delete many items concurrently, using a fixed-size pool of threads. Built on workerpool for multithreading and boto for access to the Amazon S3 API. Unix-friendly input and output. Pipe things in, out, and all around.' MIT-licensed open source. (via Paul Dolan)

    (tags: via:pdolan s3 s3funnel tools ops aws python mit open-source)
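
    The "fixed-size pool of threads" idea can be sketched roughly as below. This is not s3funnel's code (which is built on workerpool and boto); `process_key` is a hypothetical stand-in for a single S3 put/get/delete, and the standard library's `ThreadPoolExecutor` is swapped in for workerpool.

```python
from concurrent.futures import ThreadPoolExecutor


def process_key(key):
    """Hypothetical stand-in for one S3 operation (put, get or delete)."""
    return "ok " + key


def run_pool(keys, workers=4):
    """Apply process_key to every key using a fixed-size pool of threads."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order even though work runs concurrently.
        return list(pool.map(process_key, keys))


def keys_from_lines(lines):
    """Unix-friendly input: one key per line, blank lines ignored."""
    return [line.strip() for line in lines if line.strip()]
```

    Feeding it `keys_from_lines(sys.stdin)` and printing one result per line would give the pipe-in, pipe-out behaviour the description mentions.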

Links for 2014-04-15

  • Hydra Takes On Hadoop

    The intuition behind Hydra is something like this, "I have a lot of data, and there are a lot of things I could try to learn about it -- so many that I'm not even sure what I want to know.” It's about the curse of dimensionality -- more dimensions means exponentially more cost for exhaustive analysis. Hydra tries to make it easy to reduce the number of dimensions, or the cost of watching them (via probabilistic data structures), to just the right point where everything runs quickly but can still answer almost any question you think you might care about.
    Code: https://github.com/addthis/hydra Getting Started blog post: https://www.addthis.com/blog/2014/02/18/getting-started-with-hydra/

    (tags: hydra hadoop data-processing big-data trees clusters analysis)

  • Stalled SCP and Hanging TCP Connections

    a Cisco fail.

    It looks like there’s a firewall in the middle that’s doing additional TCP sequence randomisation which was a good thing, but has been fixed in all current operating systems. Unfortunately, it seems that firewall doesn’t understand TCP SACK, which when coupled with a small amount of packet loss and a stateful host firewall that blocks invalid packets results in TCP connections that stall randomly. A little digging revealed that firewall to be the Cisco Firewall Services Module on our Canterbury network border.
    (via Tony Finch)

    (tags: via:fanf cisco networking firewalls scp tcp hangs sack tcpdump)

  • Akamai's "Secure Heap" patch wasn't good enough

    'Having the private keys inaccessible is a good defense in depth move. For this patch to work you have to make sure all sensitive values are stored in the secure area, not just check that the area looks inaccessible. You can't do that by keeping the private key in the same process. A review by a security engineer would have prevented a false sense of security. A version where the private key and the calculations are in a separate process would be more secure. If you decide to write that version, I'll gladly see if I can break that too.' Akamai's response: https://blogs.akamai.com/2014/04/heartbleed-update-v3.html -- to their credit, they recognise that they need to take further action. (via Tony Finch)

    (tags: via:fanf cryptography openssl heartbleed akamai security ssl tls)

  • Shuffle Sharding

    Colm MacCarthaigh writes about a simple sharding/load-balancing algorithm which uses randomized instance selection and optional additional compartmentalization. See also: consistent hashing, and http://aphyr.com/posts/278-timelike-2-everything-fails-all-the-time

    (tags: hashing load-balancing sharding partitions dist-sys distcomp architecture coding)
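
    A minimal sketch of the randomized-selection part, assuming each customer gets a stable pseudo-random subset of instances derived from their id. The function name, the SHA-256 seeding and the parameters are my own illustration, not taken from Colm's post.

```python
import hashlib
import random


def shuffle_shard(customer_id, instances, shard_size):
    """Pick a per-customer subset of instances, stable across calls."""
    # Seed a private RNG from the customer id so the same customer
    # always lands on the same shard.
    digest = hashlib.sha256(customer_id.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return sorted(rng.sample(list(instances), shard_size))
```

    With 8 instances and a shard size of 2 there are 28 possible shards, so two customers picked at random are unlikely to share both of their instances, which is what limits the blast radius of one misbehaving customer.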

  • Open Crypto Audit Project: TrueCrypt

    phase I, a source code audit by iSEC Partners, is now complete. Bruce Schneier says: "I'm still using it".

    (tags: encryption security crypto truecrypt audits source-code isec matthew-green)

  • The science of 'hangry'

    In the PNAS paper, Brad Bushman and colleagues looked at 107 couples over 21 days and found that people experiencing uncharacteristically low blood sugar were more likely to display anger toward their spouse. (The researchers measured this by having subjects stick needles into voodoo dolls representing their significant others.)

    (tags: hangry hunger food eating science health blood-sugar voodoo-dolls glucose)

  • insane ESB health and safety policy

    Where it is not possible to avoid reversing, it is ESB policy that staff driving on behalf of the company or anybody on company premises should reverse into car spaces/bays, allowing them to drive out subsequently.
    BUT WHYYYYYYYYYY

    (tags: esb health-n-safety policies crazy funny driving reversing lol safety)

Links for 2014-04-14

  • Cloudflare demonstrate Heartbleed key extraction

    from nginx. 'Based on the findings, we recommend everyone reissue + revoke their private keys.'

    (tags: security nginx heartbleed ssl tls exploits private-keys)

  • When two-factor authentication is not enough

    Fastmail.FM nearly had their domain stolen through an attack exploiting missing 2FA protection at Gandi.

    An important lesson learned is that just because a provider has a checkbox labelled “2 factor authentication” in their feature list, the two factors may not be protecting everything – and they may not even realise that fact themselves. Security risks always come on the unexpected paths – the “off label” uses that you didn’t think about, and the subtle interaction of multiple features which are useful and correct in isolation.

    (tags: gandi 2fa fastmail authentication security mfa two-factor-authentication mail)

  • Of Money, Responsibility, and Pride

    Steve Marquess of the OpenSSL Foundation on their funding, and lack thereof:

    I stand in awe of their talent and dedication, that of Stephen Henson in particular. It takes nerves of steel to work for many years on hundreds of thousands of lines of very complex code, with every line of code you touch visible to the world, knowing that code is used by banks, firewalls, weapons systems, web sites, smart phones, industry, government, everywhere. Knowing that you’ll be ignored and unappreciated until something goes wrong. The combination of the personality to handle that kind of pressure with the relevant technical skills and experience to effectively work on such software is a rare commodity, and those who have it are likely to already be a valued, well-rewarded, and jealously guarded resource of some company or worthy cause. For those reasons OpenSSL will always be undermanned, but the present situation can and should be improved. There should be at least a half dozen full time OpenSSL team members, not just one, able to concentrate on the care and feeding of OpenSSL without having to hustle commercial work. If you’re a corporate or government decision maker in a position to do something about it, give it some thought. Please. I’m getting old and weary and I’d like to retire someday.

    (tags: funding open-source openssl heartbleed internet security money)

  • Huginn

    a system for building agents that perform automated tasks for you online. They can read the web, watch for events, and take actions on your behalf. Huginn's Agents create and consume events, propagating them along a directed event flow graph. Think of it as Yahoo! Pipes plus IFTTT on your own server. You always know who has your data. You do.
    MIT-licensed open source, built on Rails.

    (tags: ifttt automation huginn ruby rails open-source agents)

Links for 2014-04-13

Links for 2014-04-11

  • Basho LevelDB supports tiered storage

    Tiered storage is turning out to be a pretty practical trick to take advantage of SSDs:

    The justification for two types/speeds of storage arrays is simple. leveldb is extremely write intensive in its lower levels. The write intensity drops off as the level number increases. Similarly, current and frequently updated data tends to be in lower levels while archival data tends to be in higher levels. These leveldb characteristics create a desire to have faster, more expensive storage arrays for the high intensity lower levels. This branch allows the high intensity lower levels to be on expensive storage arrays while slower, less expensive storage arrays to hold the higher level data to reduce costs.

    (tags: caching tiered-storage storage ssds ebs leveldb basho patches riak iops)
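
    The placement rule described in the quote boils down to something like this sketch. The function and parameter names are hypothetical, chosen for illustration; check the Basho docs for the real configuration options.

```python
def path_for_level(level, slow_level, fast_prefix, slow_prefix):
    """Return the directory for a given leveldb level.

    Levels below slow_level are write-intensive and go on the fast array;
    slow_level and above hold archival data and go on the cheaper array.
    """
    prefix = fast_prefix if level < slow_level else slow_prefix
    return f"{prefix}/level_{level}"
```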

  • Forbes on the skeleton crew nature of OpenSSL

    This is a great point:

    Obviously, those tending to the security protocols that support the rest of the Web need better infrastructure and more funding. “Large portions of the software infrastructure of the Internet are built and maintained by volunteers, who get little reward when their code works well but are blamed, and sometimes savagely derided, when it fails,” writes Foster in the New Yorker. [...] "money and support still tend to flow to the newest and sexiest projects, while boring but essential elements like OpenSSL limp along as volunteer efforts,” he writes. “It’s easy to take open-source software for granted, and to forget that the Internet we use every day depends in part on the freely donated work of thousands of programmers.” We need to find ways to pay for work that is currently essentially donated freely. One promising project is Bithub, from Whisper Systems, where people who make valuable contributions to open source projects are rewarded (with Bitcoin of course). But the pool of Bitcoin is still donation based. The Internet has helped create a culture of free, but what we may need to recognize is that we get what we pay for. Well-funded companies pulling critical code from open source projects for their sites should have formal fee arrangements, rather than the volunteer group simply hoping these users will pony up some Benjamins for “prominent logo placement” on a website most people had never heard of before Heartbleed.

    (tags: open-source openssl free sponsorship forbes via:karl-whelan)

Links for 2014-04-10

Links for 2014-04-09

  • MICA: A Holistic Approach To Fast In-Memory Key-Value Storage [paper]

    Very interesting new approach to building a scalable in-memory K/V store. As Rajiv Kurian notes on the mechanical-sympathy list: 'The basic idea is that each core is responsible for a portion of the key-space and requests are forwarded to the right core, avoiding multiple-writer scenarios. This is opposed to designs like memcache which uses locks and shared memory. Some of the things I found interesting: The single writer design is taken to an extreme. Clients assist the partitioning of requests, by calculating hashes before submitting GET requests. It uses Intel DPDK instead of sockets to forward packets to the right core, without processing the packet on any core. Each core is paired with a dedicated RX/TX queue. The design for a lossy cache is simple but interesting. It does things like replacing a hash slot (instead of chaining) etc. to take advantage of the lossy nature of caches. There is a lossless design too. A bunch of tricks to optimize for memory performance. This includes pre-allocation, design of the hash indexes, prefetching tricks etc. There are some other concurrency tricks that were interesting. Handling dangling pointers was one of them.' Source code here: https://github.com/efficient/mica

    (tags: mica in-memory memory ram key-value-stores storage smp dpdk multicore memcached concurrency)
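
    Two of the ideas in that summary, client-computed hashes choosing the owning core and a lossy table that overwrites a slot instead of chaining, can be sketched as below. This is only an illustration under my own assumptions (CRC32 as the hash, these class and function names); it is not MICA's data layout and has none of the DPDK plumbing.

```python
import zlib


def key_hash(key):
    """Hash computed by the client before it submits the request."""
    return zlib.crc32(key.encode("utf-8"))


def partition_for(h, num_partitions):
    """Choose the owning partition (core) from the upper hash bits."""
    return (h >> 16) % num_partitions


class LossyPartition:
    """One core's share of the key space: a fixed slot array, no chaining."""

    def __init__(self, num_slots):
        self.slots = [None] * num_slots

    def put(self, h, key, value):
        # On a collision the previous entry is simply overwritten (lossy).
        self.slots[h % len(self.slots)] = (key, value)

    def get(self, h, key):
        entry = self.slots[h % len(self.slots)]
        if entry is not None and entry[0] == key:
            return entry[1]
        return None  # never stored, or evicted by a colliding key
```

    Because only one core ever writes to a given partition, no locks are needed around `put`; that is the single-writer point Rajiv highlights.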

  • Google's Open Bidder stack moving from Jetty to Netty

    Open Bidder traditionally used Jetty as an embedded webserver, for the critical tasks of accepting connections, processing HTTP requests, managing service threads, etc. Jetty is a robust, but traditional stack that carries the weight and tradeoffs of Servlet’s 15 years old design. For a maximum performance RTB agent that must combine very large request concurrency with very low latencies, and often benefit also from low-level control over the transport, memory management and other issue, a different webserver stack was required. Open Bidder now supports Netty, an asynchronous, event-driven, high-performance webserver stack. For existing code, the most important impact is that Netty is not compatible with the Servlet API. Its own internal APIs are often too low-level, not to mention proprietary to Netty; so Open Bidder v0.5 introduces some new, stack-neutral APIs for things like HTTP requests and responses, cookies, request handlers, and even simple HTML templating based on Mustache. These APIs will work with both Netty and Jetty. This means you don’t need to change any code to switch between Jetty and Netty; on the other hand, it also means that existing code written for Open Bidder 0.4 may need some changes even if you plan to keep using Jetty. [....] Netty's superior efficiency is very significant; it supports 50% more traffic in the same hardware, and it maintains a perfect latency distribution even at the peak of its supported load.
    This doc is noteworthy on a couple of grounds: 1. the use of Netty in a public API/library, and the additional layer in place to add a friendlier API on top of that. I hope they might consider releasing that part as OSS at some point. 2. I also find it interesting that their API uses protobufs to marshal the message, and they plan in a future release to serialize those to JSON documents -- that makes a lot of sense.

    (tags: apis google protobufs json documents interoperability netty jetty servlets performance java)

  • The University Times: TCD Provost Under Pressure To “Re-think” Identity Initiative

    Students, staff and alumni put pressure on Provost to reconsider changes to Trinity College Dublin's name and coat of arms.

    alumni scholars from 2004 and 1994 who had been invited back for the dinner shouted ‘Dublin’ after the Provost welcomed them back to “Trinity College”.

    (tags: tcd tcuod rebranding fail identity dublin)

  • Daring Fireball: Rethinking What We Mean by 'Mobile Web'

    We shouldn’t think of “the web” as only what renders in web browsers. We should think of the web as anything transmitted using HTTP and HTTPS. Apps and websites are peers, not competitors. They’re all just clients to the same services.
    +1. Finally, a Daring Fireball post I agree with! ;)

    (tags: daring-fireball apps web http https mobile apple android browsers)

Links for 2014-04-08

Links for 2014-04-07

Links for 2014-04-05

Links for 2014-04-03

Links for 2014-04-02

Links for 2014-04-01

Links for 2014-03-31

Links for 2014-03-28

  • "They Know Everything We Do"

    [via Boing Boing:] A new, exhaustive report from Human Rights Watch details the way the young state of modern Ethiopia has become a kind of pilot program for the abuse of "off-the-shelf" surveillance, availing itself of commercial products from the US, the UK, France, Italy and China in order to establish an abusive surveillance regime that violates human rights and suppresses legitimate political opposition under the guise of a anti-terrorism law that's so broadly interpreted as to be meaningless. The 137 page report [from Human Rights Watch] details the technologies the Ethiopian government has acquired from several countries and uses to facilitate surveillance of perceived political opponents inside the country and among the diaspora. The government’s surveillance practices violate the rights to freedom of expression, association, and access to information. The government’s monopoly over all mobile and Internet services through its sole, state-owned telecom operator, Ethio Telecom, facilitates abuse of surveillance powers.

    (tags: human-rights surveillance ethiopia spying off-the-shelf spyware big-brother hrw human-rights-watch)

Links for 2014-03-26

  • Chinese cops cuff 1,500 in fake base station spam raid

    The street finds its own uses for things, in this case Stingray/IMSI-catcher-type fake mobile-phone base stations:

    Fake base stations are becoming a particularly popular modus operandi. Often concealed in a van or car, they are driven through city streets to spread their messages. The professional spammer in question charged 1,000 yuan (£100) to spam thousands of users in a radius of a few hundred metres. The pseudo-base station used could send out around 6,000 messages in just half an hour, the report said. Often such spammers are hired by local businessmen to promote their wares.
    (via Bernard Tyers)

    (tags: stingers imsi-catcher mobile-phones mobile cellphones china spam via:bernard-tyers)

  • TJ McIntyre on the incredible surveillance of telephone traffic at various Garda stations around the country

    The most grave issue is that each recording likely amounted to a serious criminal offence. Under Irish law, the recording of a telephone conversation on a public network without the consent of at least one party to the call amounts to an "interception", a criminal offence carrying a possible term of imprisonment of up to five years. [...] Consequently, unless gardai were notified that their calls might be recorded then a large number of criminal offences are likely to have been committed by and within the Garda Siochana itself.

    (tags: gubu surveillance gardai ags tjmcintyre bugging tapping phones ireland politics)

  • rr

    A cool-looking new debugging tool for C/C++ from Mozilla.

    Many, many people have noticed that if we had a way to reliably record program execution and replay it later, with the ability to debug the replay, we could largely tame the nondeterminism problem. This would also allow us to deliberately introduce nondeterminism so tests can explore more of the possible execution space, without impacting debuggability. Many record and replay systems have been built in pursuit of this vision. (I built one myself.) For various reasons these systems have not seen wide adoption. So, a few years ago we at Mozilla started a project to create a new record-and-replay tool that would overcome the obstacles blocking adoption. We call this tool rr.
    Low runtime overhead; easy deployability; targeted at 32-bit (?!) Linux; OSS. (via Bryan O'Sullivan)

    (tags: via:bos mozilla debugging coding firefox rr record replay gdb c++ linux)

  • Ask AIB - Boards.ie

    AIB now have a dedicated customer-support forum on Boards.ie. That is a *great* idea.

    (tags: aib banking support forums boards.ie banks)

Links for 2014-03-25

  • Microservices and nanoservices

    A great reaction to Martin Fowler's "microservices" coinage, from Arnon Rotem-Gal-Oz: 'I guess it is easier to use a new name (Microservices) rather than say that this is what SOA actually meant'; 'these are the very principles of SOA before vendors pushed the [ESB] in the middle.' Others have also chosen to define microservices slightly differently, as a service written in 10-100 LOC. Arnon's reaction: “Nanoservice is an antipattern where a service is too fine-grained. A nanoservice is a service whose overhead (communications, maintenance, and so on) outweighs its utility.” Having dealt with maintaining an over-fine-grained SOA stack in Amazon, I can only agree with this definition; it's easy to make things too fine-grained and create a raft of distributed-computing bugs and deployment/management complexity where there is no need to do so.

    (tags: architecture antipatterns nanoservices microservices soa services design esb)

  • Accidentally Turing-Complete

    slightly ruined by the inclusion of some "deliberately Turing-complete" systems

    (tags: turing computation software via:jwz turing-complete accidents automatons)

Links for 2014-03-24

Links for 2014-03-21

  • Microsoft "Scroogles" Itself

    'Microsoft went through a blogger’s private Hotmail account in order to trace the identity of a source who allegedly leaked trade secrets.' Bear in mind that the violation which MS claims entitles it to read the blogger's email was a breach of the terms of service, which also prohibit distribution of content which 'incites, advocates, or expresses pornography, obscenity, vulgarity, [or] profanity'. So no dirty jokes on Hotmail!

    (tags: hotmail fail scroogled microsoft stupid tos law privacy data-protection trade-secrets ip)

  • Theresa May warns Yahoo that its move to Dublin is a security worry

    Y! is moving to Dublin to evade GCHQ spying on its users. And what is the UK response?

    "There are concerns in the Home Office about how Ripa will apply to Yahoo once it has moved its headquarters to Dublin," said a Whitehall source. "The home secretary asked to see officials from Yahoo because in Dublin they don't have equivalent laws to Ripa. This could particularly affect investigations led by Scotland Yard and the national crime agency. They regard this as a very serious issue."
    There's priorities for you!

    (tags: ripa gchq guardian uk privacy data-protection ireland dublin london spying surveillance yahoo)

  • A Look At Airbnb’s Irish Pub-Inspired Office In Dublin - DesignTAXI.com

    Very nice, Airbnb!

    (tags: airbnb design offices work pubs ireland dublin)

  • Internet Tolls And The Case For Strong Net Neutrality

    Netflix CEO Reed Hastings blogs about the need for Net Neutrality:

    Interestingly, there is one special case where no-fee interconnection is embraced by the big ISPs -- when they are connecting among themselves. They argue this is because roughly the same amount of data comes and goes between their networks. But when we ask them if we too would qualify for no-fee interconnect if we changed our service to upload as much data as we download** -- thus filling their upstream networks and nearly doubling our total traffic -- there is an uncomfortable silence. That's because the ISP argument isn't sensible. Big ISPs aren't paying money to services like online backup that generate more upstream than downstream traffic. Data direction, in other words, has nothing to do with costs. ISPs around the world are investing in high-speed Internet and most already practice strong net neutrality. With strong net neutrality, new services requiring high-speed Internet can emerge and become popular, spurring even more demand for the lucrative high-speed packages ISPs offer. With strong net neutrality, everyone avoids the kind of brinkmanship over blackouts that plague the cable industry and harms consumers. As the Wall Street Journal chart shows, we're already getting to the brownout stage. Consumers deserve better.

    (tags: consumer net-neutrality comcast netflix protectionism cartels isps us congestion capacity)

  • Micro jitter, busy waiting and binding CPUs

    pinning threads to CPUs to reduce jitter and latency. Lots of graphs and measurements from Peter Lawrey

    (tags: pinning threads performance latency jitter tuning)
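
    For a flavour of what pinning looks like in code, here is a minimal sketch using Python's `os.sched_setaffinity`, which is only available on some Unix platforms (notably Linux). Peter's measurements are of Java threads, so this is not his code; `pin_to_cpu` is a name I made up.

```python
import os


def pin_to_cpu(cpu):
    """Restrict the caller to one CPU and return the resulting affinity set."""
    # pid 0 means "the caller"; the set lists the CPUs it may run on.
    os.sched_setaffinity(0, {cpu})
    return os.sched_getaffinity(0)
```

    Pinning alone does not stop other work being scheduled onto that CPU; it only stops the pinned thread migrating away from it.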

  • The Day Today - Pool Supervisor - YouTube

    "in 1979, no-one died. in 1980, some one died. in 1981, no-one died. in 1982, no-one died. ... I could go on"

    (tags: the-day-today no-one-died safety pool supervisor tricky-word-puzzles funny humour classic video)

  • The colossal arrogance of Newsweek’s Bitcoin “scoop” | Ars Technica

    Many aspects of the story already look like a caricature of journalism gone awry. The man Goodman fingered as being worth $400 million or more is just as modest as his house suggests. He’s had a stroke and struggles with other health issues. Unemployed since 2001, he strives to take care of basic needs for himself and his 93-year-old mother, according to a reddit post by his brother Arthur Nakamoto (whom Goodman quoted as calling his brother an “asshole”). If Goodman has mystery evidence supporting the Dorian Nakamoto theory, it should have been revealed days ago. Otherwise, Newsweek and Goodman are delaying an inevitable comeuppance and doubling down on past mistakes. Nakamoto’s multiple denials on the record have changed the dynamic of the story. Standing by the story, at this point, is an attack on him and his credibility. The Dorian Nakamoto story is a “Dewey beats Truman” moment for the Internet age, with all of the hubris and none of the humor. It shouldn’t be allowed to end in the mists of “he said, she said.” Whether or not a lawsuit gets filed, Nakamoto v. Newsweek faces an imminent verdict in the court of public opinion: either the man is lying or the magazine is wrong.

    (tags: dorian-nakamoto newsweek journalism bitcoin privacy satoshi-nakamoto)

  • Papa's Maze | spoon & tamago

    While going through her papa's old belongings, a young girl discovered something incredible - a mind-bogglingly intricate maze that her father had drawn by hand 30 years ago. While working as a school janitor it had taken him 7 years to produce the piece, only for it to be forgotten about... until now.
    34" x 24" print, $40

    (tags: mazes art prints weird papas-maze japan)

  • Continuous Delivery with ETL Systems

    Lonely Planet and Dr Foster Intelligence both make heavy use of ETL in their products, and both organisations have applied the principles of Continuous Delivery to their delivery process. Some of the Continuous Delivery norms need to be adapted in the context of ETL, and some interesting patterns emerge, such as running Continuous Integration against data, as well as code.

    (tags: etl video presentations lonely-planet dr-foster-intelligence continuous-delivery deployment pipelines)

  • The MtGox 500

    'On March 9th a group posted a data leak, which included the trading history of all MtGox users from April 2011 to November 2013. The graphs below explore the trade behaviors of the 500 highest volume MtGox users from the leaked data set. These are the Bitcoin barons, wealthy speculators, dueling algorithms, greater fools, and many more who took bitcoin to the moon.'

    (tags: dataviz stamen bitcoin data leaks mtgox greater-fools)

  • What We Know 2/5/14: The Mt. Chiliad Mystery

    hats off to Rockstar -- GTA V has a great mystery mural with clues dotted throughout the game, and it's as-yet unsolved

    (tags: mysteries gaming via:hilary_w games gta gta-v rockstar mount-chiliad ufos)

  • Make Your Own 3-D Printer Filament From Old Milk Jugs

    Creating your own 3-D printer filament from old used milk jugs is exponentially cheaper, and uses considerably less energy, than buying new filament, according to new research from Michigan Technological University. [...] The savings are really quite impressive — 99 cents on the dollar, in addition to the reduced use of energy. Interestingly (but again not surprisingly), the amount of energy used to ‘recycle’ the old milk jugs yourself is considerably less than that used in recycling such jugs conventionally.

    (tags: recycling 3d-printers printing tech plastic milk)

Links for 2014-03-20

Links for 2014-03-19

  • No, Nate, brogrammers may not be macho, but that’s not all there is to it

    Great essay on sexism in tech, "brogrammer" culture, "clubhouse chemistry", outsiders, weird nerds and exclusion:

    Every group, including the excluded and disadvantaged, create cultural capital and behave in ways that simultaneously create a sense of belonging for them in their existing social circle while also potentially denying them entry into another one, often at the expense of economic capital. It’s easy to see that wearing baggy, sagging pants to a job interview, or having large and visible tattoos in a corporate setting, might limit someone’s access. These are some of the markers of belonging used in social groups that are often denied opportunities. By embracing these markers, members of the group create real barriers to acceptance outside their circle even as they deepen their peer relationships. The group chooses to adopt values that are rejected by the society that’s rejecting them. And that’s what happens to “weird nerd” men as well—they create ways of being that allow for internal bonding against a largely exclusionary backdrop.
    (via Bryan O'Sullivan)

    (tags: nerds outsiders exclusion society nate-silver brogrammers sexism racism tech culture silicon-valley essays via:bos31337)

  • Impact of large primitive arrays (BLOBS) on JVM Garbage Collection

    some nice graphs and data on CMS performance, with/without -XX:ParGCCardsPerStrideChunk

    (tags: cms java jvm performance optimization tuning off-heap-storage memory)

  • Anatomical Collages by Travis Bedel

    these are fantastic

    (tags: collage anatomy art prints)

  • htcat/htcat

    a utility to perform parallel, pipelined execution of a single HTTP GET. htcat is intended for the purpose of incantations like: htcat https://host.net/file.tar.gz | tar -zx It is tuned (and only really useful) for faster interconnects: [....] 109MB/s on a gigabit network, between an AWS EC2 instance and S3. This represents 91% use of the theoretical maximum of gigabit (119.2 MiB/s).

    (tags: go cli http file-transfer ops tools)
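
    The core trick, splitting one GET into several ranged requests that can be fetched in parallel, might look like this sketch. htcat itself is written in Go; the helper names and the part size here are illustrative assumptions, and nothing below touches the network.

```python
def split_ranges(total_size, part_size):
    """Return inclusive (start, end) byte ranges covering total_size bytes."""
    return [(start, min(start + part_size, total_size) - 1)
            for start in range(0, total_size, part_size)]


def range_header(start, end):
    """Build the HTTP Range header for one part (byte offsets are inclusive)."""
    return {"Range": f"bytes={start}-{end}"}
```

    Each range would be fetched by its own worker and the parts written to stdout in order, which is what keeps the `| tar -zx` pipeline working.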

Links for 2014-03-18

  • Analyzing Citibike Usage

    Abe Stanway crunches the stats on Citibike usage in NYC, compared to the weather data from Wunderground.

    (tags: data correlation statistics citibike cycling nyc data-science weather)

  • NSA surveillance recording every single voice call in at least 1 country

    Storing them in a 30-day rolling buffer, allowing retrospective targeting weeks after the call. 100% of voice calls in that country, although it's unclear which country that is.

    (tags: nsa surveillance gchq telephones phone bugging)

  • S3QL

    a file system that stores all its data online using storage services like Google Storage, Amazon S3, or OpenStack. S3QL effectively provides a hard disk of dynamic, infinite capacity that can be accessed from any computer with internet access running Linux, FreeBSD or OS-X. S3QL is a standard conforming, full featured UNIX file system that is conceptually indistinguishable from any local file system. Furthermore, S3QL has additional features like compression, encryption, data de-duplication, immutable trees and snapshotting which make it especially suitable for online backup and archival.

    (tags: s3 s3ql backup aws filesystems linux freebsd osx ops)

  • What's New in Java 8

    good explanation of all the new features -- I'm really looking forward to fixing up all the crappy over-verbose interface-as-lambdas we have scattered throughout our code

    (tags: java java8 lambdas fp functional-programming currying joda-time)

  • FM-index

    a compressed full-text substring index based on the Burrows-Wheeler transform, with some similarities to the suffix array. It was created by Paolo Ferragina and Giovanni Manzini,[1] who describe it as an opportunistic data structure as it allows compression of the input text while still permitting fast substring queries. The name stands for 'Full-text index in Minute space'. It can be used to efficiently find the number of occurrences of a pattern within the compressed text, as well as locate the position of each occurrence. Both the query time and storage space requirements are sublinear with respect to the size of the input data.
    kragen notes 'gene sequencing is using [them] in production'.

    (tags: sequencing bioinformatics algorithms bowtie fm-index indexing compression search burrows-wheeler bwt full-text-search)
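
    A toy version of the counting query, to show how backward search over the Burrows-Wheeler transform works. This is a naive sketch: it builds the BWT by sorting rotations and recounts characters on every step, so it has none of the compression or sublinear behaviour of a real FM-index. It assumes the text contains no '$' and no character that sorts before '$'.

```python
def bwt(text):
    """Burrows-Wheeler transform via sorted rotations; '$' marks the end."""
    s = text + "$"
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def count_occurrences(bwt_text, pattern):
    """Count occurrences of pattern in the original text by backward search."""
    # first[c] = number of characters in the text that sort before c.
    first = {}
    total = 0
    for c in sorted(set(bwt_text)):
        first[c] = total
        total += bwt_text.count(c)

    lo, hi = 0, len(bwt_text)  # half-open range of matching sorted rows
    for c in reversed(pattern):
        if c not in first:
            return 0
        # Occurrences of c in bwt_text[:i], computed naively here.
        lo = first[c] + bwt_text[:lo].count(c)
        hi = first[c] + bwt_text[:hi].count(c)
        if lo >= hi:
            return 0
    return hi - lo
```

    For "banana" the transform is "annb$aa", and searching for "ana" narrows the row range to two entries, one per occurrence.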

Links for 2014-03-14

  • Health privacy: formal complaint to ICO

    'Light Blue Touchpaper' notes:

    Three NGOs have lodged a formal complaint to the Information Commissioner about the fact that PA Consulting uploaded over a decade of UK hospital records to a US-based cloud service. This appears to have involved serious breaches of the UK Data Protection Act 1998 and of multiple NHS regulations about the security of personal health information.
    Let's see if ICO can ever do anything useful.... not holding my breath

    (tags: ico privacy data-protection dpa nhs health data ross-anderson)

  • Why Google Flu Trends Can't Track the Flu (Yet)

    It's admittedly hard for outsiders to analyze Google Flu Trends, because the company doesn't make public the specific search terms it uses as raw data, or the particular algorithm it uses to convert the frequency of these terms into flu assessments. But the researchers did their best to infer the terms by using Google Correlate, a service that allows you to look at the rates of particular search terms over time. When the researchers did this for a variety of flu-related queries over the past few years, they found that a couple key searches (those for flu treatments, and those asking how to differentiate the flu from the cold) tracked more closely with Google Flu Trends' estimates than with actual flu rates, especially when Google overestimated the prevalence of the ailment. These particular searches, it seems, could be a huge part of the inaccuracy problem. There's another good reason to suspect this might be the case. In 2011, as part of one of its regular search algorithm tweaks, Google began recommending related search terms for many queries (including listing a search for flu treatments after someone Googled many flu-related terms) and in 2012, the company began providing potential diagnoses in response to symptoms in searches (including listing both "flu" and "cold" after a search that included the phrase "sore throat," for instance, perhaps prompting a user to search for how to distinguish between the two). These tweaks, the researchers argue, likely artificially drove up the rates of the searches they identified as responsible for Google's overestimates.
    via Boing Boing

    (tags: google flu trends feedback side-effects colds health google-flu-trends)

Links for 2014-03-13

Links for 2014-03-12

  • Sacked Google worker says staff ratings fixed to fit template

    Allegations of fixing to fit the stack-ranking curve: 'someone at Google always had to get a low score “of 2.9”, so the unit could match the bell curve. She said senior staff “calibrated” the ratings supplied by line managers to ensure conformity with the template and these calibrations could reduce a line manager’s assessment of an employee, in effect giving them the poisoned score of less than three.'

    (tags: stack-ranking google ireland employment work bell-curve statistics eric-schmidt)

  • Corporate Tax 2014: Irish Government's "flawed premise" on Apple's avoidance

    According to our calculation about €40bn or over 40% of Irish services exports of €90bn in 2012 and related national output, resulted from global tax avoidance schemes. It is true that Ireland gains little from tax cheating but at some point, the US tax system will be reformed and a territorial system where companies are only liable in the US on US profits, would only be viable if there was a disincentive to shift profits to non-tax or low tax countries. The risk for Ireland is that a minimum foreign tax would be introduced that would be greater than the Irish headline rate of 12.5%. It's also likely that US investment in Ireland would not have been jeopardized if Irish politicians had not been so eager as supplicants to doff the cap. Nevertheless today it would be taboo to admit the reality of participation in massive tax avoidance and the Captain Renaults of Merrion Street will continue with their version of the Dance of the Seven Veils.

    (tags: apple tax double-irish tax-avoidance google investment itax tax-evasion ireland)

  • An online Magna Carta: Berners-Lee calls for bill of rights for web

    TimBL backing the "web we want" campaign -- https://webwewant.org/

    (tags: freedom gchq nsa censorship internet privacy web-we-want human-rights timbl tim-berners-lee)

  • How the search for flight AF447 used Bayesian inference

    Via jgc, the search for the downed Air France flight was optimized using this technique: 'Metron’s approach to this search planning problem is rooted in classical Bayesian inference, which allows organization of available data with associated uncertainties and computation of the Probability Distribution Function (PDF) for target location given these data. In following this approach, the first step was to gather the available information about the location of the impact site of the aircraft. This information was sometimes contradictory and filled with ambiguities and uncertainties. Using a Bayesian approach we organized this material into consistent scenarios, quantified the uncertainties with probability distributions, weighted the relative likelihood of each scenario, and performed a simulation to produce a prior PDF for the location of the wreck.'

    (tags: metron bayes bayesian-inference machine-learning statistics via:jgc air-france disasters probability inference searching)
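
    The two steps the AF447 quote describes (mixing weighted scenarios into a prior, then updating it as searches come up empty) can be sketched in a few lines. This is a minimal illustration of textbook Bayesian search theory on a discrete grid of cells, not Metron's actual code; the function names, the cell layout and the detection probabilities are all assumptions made up for the example.

```python
from typing import List, Sequence


def mix_scenarios(scenarios: Sequence[Sequence[float]],
                  weights: Sequence[float]) -> List[float]:
    """Combine per-scenario location distributions into one prior.

    Each scenario is a probability distribution over the same grid cells;
    weights express the relative likelihood of each scenario (hypothetical values).
    """
    total = float(sum(weights))
    n_cells = len(scenarios[0])
    prior = [0.0] * n_cells
    for scenario, weight in zip(scenarios, weights):
        for cell, p in enumerate(scenario):
            prior[cell] += (weight / total) * p
    return prior


def update_after_failed_search(prior: Sequence[float],
                               detect_prob: Sequence[float],
                               searched: int) -> List[float]:
    """Bayes update after searching one cell and finding nothing.

    detect_prob[i] is the assumed chance of spotting the wreck in cell i
    if it really is there. The searched cell loses mass; every other cell
    gains mass after renormalisation.
    """
    p = prior[searched]
    q = detect_prob[searched]
    p_miss = 1.0 - p * q  # overall probability that this search finds nothing
    posterior = []
    for cell, p_cell in enumerate(prior):
        if cell == searched:
            posterior.append(p_cell * (1.0 - q) / p_miss)
        else:
            posterior.append(p_cell / p_miss)
    return posterior


if __name__ == "__main__":
    prior = mix_scenarios([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]], weights=[2, 1])
    posterior = update_after_failed_search(prior, [0.8, 0.8, 0.8], searched=0)
    print(prior, posterior)
```

    The point of the update is that an unsuccessful search is still evidence: it lowers the probability of the searched cell without zeroing it, since the sensor may simply have missed the wreck.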

  • How the NSA Plans to Infect 'Millions' of Computers with Malware - The Intercept

    The implants being deployed were once reserved for a few hundred hard-to-reach targets, whose communications could not be monitored through traditional wiretaps. But the documents analyzed by The Intercept show how the NSA has aggressively accelerated its hacking initiatives in the past decade by computerizing some processes previously handled by humans. The automated system – codenamed TURBINE – is designed to “allow the current implant network to scale to large size (millions of implants) by creating a system that does automated control implants by groups instead of individually.” In a top-secret presentation, dated August 2009, the NSA describes a pre-programmed part of the covert infrastructure called the “Expert System,” which is designed to operate “like the brain.”
    Great. Automated malware deployment to millions of random victims. See also the "I hunt sysadmins" section further down...

    (tags: malware gchq nsa oversight infection expert-systems turbine false-positives the-intercept surveillance)

Links for 2014-03-11

Links for 2014-03-10

Links for 2014-03-06

Links for 2014-03-05

  • A cautionary tale about building large-scale polyglot systems

    'a fucking nightmare':

    Cascading requires a compilation step, yet since you're writing Ruby code, you get none of the benefits of static type checking. It was standard to discover a type issue only after kicking off a job on, oh, 10 EC2 machines, only to have it fail because of a type mismatch. And user code embedded in strings would regularly fail to compile – which you again wouldn't discover until after your job was running. Each of these were bad individually, together, they were a fucking nightmare. The interaction between the code in strings and the type system was the worst of all possible worlds. No type checking, yet incredibly brittle, finicky and incomprehensible type errors at run time. I will never forget when one of my friends at Etsy was learning Cascading.JRuby and he couldn't get a type cast to work. I happened to know what would work: a triple cast. You had to cast the value to the type you wanted, not once, not twice, but THREE times.

    (tags: etsy scalding cascading adtuitive war-stories languages polyglot ruby java strong-typing jruby types hadoop)

  • It’s So Easy

    Attempting to cash out of Bitcoins turns out to be absurdly difficult:

    Trying to sell the coins in person, and basically saying he either wants Cash, or a Cashiers check (since it can be handed over right then and there), has apparently been a hilarious clusterfuck. Today he met some guy in front of his bank, and apparently as soon as he mentioned that he needs to get the cash checked to make sure it is not counterfeit, the guy freaked out and basically walked away. Stuff like this has been happening all week, and he apparently so far has only sold a single coin of several hundred.

    (tags: bitcoin fail funny mtgox fraud cash fiat-currency via:rsynnott buttcoin)

  • Florida cops used IMSI catchers over 200 times without a warrant

    Harris is the leading maker of [IMSI catchers aka "stingrays"] in the U.S., and the ACLU has long suspected that the company has been loaning the devices to police departments throughout the state for product testing and promotional purposes. As the court document notes in the 2008 case, “the Tallahassee Police Department is not the owner of the equipment.” The ACLU now suspects these police departments may have all signed non-disclosure agreements with the vendor and used the agreement to avoid disclosing their use of the equipment to courts. “The police seem to have interpreted the agreement to bar them even from revealing their use of Stingrays to judges, who we usually rely on to provide oversight of police investigations,” the ACLU writes.

    (tags: aclu police stingrays imsi-catchers privacy cellphones mobile-phones security wired)

Links for 2014-03-04

Links for 2014-03-02

  • Answer to How many topics (queues) can be created in Apache Kafka? - Quora

    Good to know:

    'As far as I understand (this was true as of 2013, when I last looked into this issue) there's at least one Apache ZooKeeper znode per topic in Kafka. While there is no hard limitation in Kafka itself (Kafka is linearly scalable), it does mean that the maximum number of znodes comfortably supported by ZooKeeper (on the order of about ten thousand) is the upper limit of Kafka's scalability as far as the number of topics goes.'

    (tags: kafka queues zookeeper znodes architecture)

Links for 2014-03-01

  • Care.data is in chaos. It breaks my heart | Ben Goldacre

    There are people in my profession who think they can ignore this problem. Some are murmuring that this mess is like MMR, a public misunderstanding to be corrected with better PR. They are wrong: it's like nuclear power. Medical data, rarefied and condensed, presents huge power to do good, but it also presents huge risks. When leaked, it cannot be unleaked; when lost, public trust will take decades to regain. This breaks my heart. I love big medical datasets, I work on them in my day job, and I can think of a hundred life-saving uses for better ones. But patients' medical records contain secrets, and we owe them our highest protection. Where we use them – and we have used them, as researchers, for decades without a leak – this must be done safely, accountably, and transparently. New primary legislation, governing who has access to what, must be written: but that's not enough. We also need vicious penalties for anyone leaking medical records; and HSCIC needs to regain trust, by releasing all documentation on all past releases, urgently. Care.data needs to work: in medicine, data saves lives.

    (tags: hscic nhs care.data data privacy data-protection medicine hospitals pr)

Links for 2014-02-27

Links for 2014-02-23

Links for 2014-02-21

Links for 2014-02-20

Links for 2014-02-19

  • Belkin managed to put their firmware update private key in the distribution

    'The firmware updates are encrypted using GPG, which is intended to prevent this issue. Unfortunately, Belkin misuses the GPG asymmetric encryption functionality, forcing it to distribute the firmware-signing key within the WeMo firmware image. Most likely, Belkin intended to use the symmetric encryption with a signature and a shared public key ring. Attackers could leverage the current implementation to easily sign firmware images.' Using GPG to sign your firmware updates: yay. Accidentally leaving the private key in the distribution: sad trombone.

    (tags: fail wemo belkin firmware embedded-systems security updates distribution gpg crypto public-key pki home-automation ioactive)

  • Video Processing at Dropbox

    On-the-fly video transcoding during live streaming. They've done a great job of this!

    At the beginning of the development of this feature, we entertained the idea to simply pre-transcode all the videos in Dropbox to all possible target devices. Soon enough we realized that this simple approach would be too expensive at our scale, so we decided to build a system that allows us to trigger a transcoding process only upon user request and cache the results for subsequent fetches. This on-demand approach: adapts to heterogeneous devices and network conditions, is relatively cheap (everything is relative at our scale), guarantees low latency startup time.

    (tags: ffmpeg dropbox streaming video cdn ec2 hls http mp4 nginx haproxy aws h264)
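
    The on-demand approach in the Dropbox quote (transcode only when a user asks, then cache the result for later fetches) is roughly the following pattern. This is a rough sketch under stated assumptions, not Dropbox's implementation: `OnDemandTranscoder`, the `(video_id, profile)` cache key and the injected `transcode` callable are hypothetical names, and a real system would use a shared cache rather than an in-process dict.

```python
from typing import Callable, Dict, Tuple


class OnDemandTranscoder:
    """Transcode lazily on first request and cache the output (illustrative only)."""

    def __init__(self, transcode: Callable[[str, str], bytes]) -> None:
        # transcode(video_id, profile) stands in for the real, expensive job.
        self._transcode = transcode
        self._cache: Dict[Tuple[str, str], bytes] = {}
        self.transcode_calls = 0  # counts how often the expensive path runs

    def fetch(self, video_id: str, profile: str) -> bytes:
        """Return the video in the requested profile, transcoding only on a cache miss."""
        key = (video_id, profile)
        if key not in self._cache:
            self.transcode_calls += 1
            self._cache[key] = self._transcode(video_id, profile)
        return self._cache[key]


def fake_transcode(video_id: str, profile: str) -> bytes:
    """Placeholder for a real transcoder; it just labels the output."""
    return f"{video_id}@{profile}".encode("utf-8")


if __name__ == "__main__":
    service = OnDemandTranscoder(fake_transcode)
    service.fetch("holiday.mov", "mobile-low")
    service.fetch("holiday.mov", "mobile-low")  # served from cache
    service.fetch("holiday.mov", "desktop-hd")  # new profile, new transcode
    print(service.transcode_calls)
```

    Keying the cache on both the video and the target profile is what lets the same source file serve heterogeneous devices without pre-transcoding every combination up front.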

Links for 2014-02-18

  • GPLv2 being tested in US court

    The case is still ongoing, so one to watch.

    Plaintiff wrote an XML parser and made it available as open source software under the GPLv2. Defendant acquired from another vendor software that included the code, and allegedly distributed that software to parties outside the organization. According to plaintiff, defendant did not comply with the conditions of the GPL, so plaintiff sued for copyright infringement. Defendants moved to dismiss for failure to state a claim. The court denied the motion.

    (tags: gpl open-source licensing software law legal via:fplogue)

  • Latest Snowden leak: GCHQ spying on Wikileaks users

    “How could targeting an entire website’s user base be necessary or proportionate?” says Gus Hosein, executive director of the London-based human rights group Privacy International. “These are innocent people who are turned into suspects based on their reading habits. Surely becoming a target of a state’s intelligence and security apparatus should require more than a mere click on a link.” The agency’s covert targeting of WikiLeaks, Hosein adds, call into question the entire legal rationale underpinning the state’s system of surveillance. “We may be tempted to see GCHQ as a rogue agency, ungoverned in its use of unprecedented powers generated by new technologies,” he says. “But GCHQ’s actions are authorized by [government] ministers. The fact that ministers are ordering the monitoring of political interests of Internet users shows a systemic failure in the rule of law.”

    (tags: gchq wikileaks snowden privacy spying surveillance politics)

  • "Hackers" unsubscribed a former Mayor from concerned citizen's emails

    "The dog ate my homework, er, I mean, hackers hacked my account."

    Former Mayor of Kildare, Cllr. Michael Nolan, has denied a claim he asked a local campaigner to stop e-mailing him. Cllr. Michael Nolan from Newbridge said his site was hacked and wrong e-mails were sent out to a number of people, including Leixlip based campaigner, John Weigel. Mr. Weigel has been campaigning, along with others, about the danger of electromagnetic radiation to humans and the proximity of communications masts to homes and, in particular schools. He regularly updates local politicians on news items relating to the issue. Recently, he said that he had received an e-mail from Cllr. Nolan asking to be removed from Mr. Weigel’s e-mail list. The Leader asked Cllr. Nolan why he had done this. But the Fine Gael councillor said that “his e-mail account was hacked and on one particular day a number of mails were sent from my account pertaining to be from me.”

    (tags: dog-ate-my-homework hackers funny kildare newbridge fine-gael michael-nolan email politics ireland excuses)

  • Making Remote Work Work

    very good, workable tips on how to remote-work effectively (both in the comments of this thread and the original article)

    (tags: tips productivity collaboration hn via:lhl remote-working telecommuting work)

  • Disgraced Scientist Granted U.S. Patent for Work Found to be Fraudulent - NYTimes.com

    Korean researcher Hwang Woo-suk electrified the science world 10 years ago with his claim that he had created the world’s first cloned human embryos and had extracted stem cells from them. But the work was later found to be fraudulent, and Dr. Hwang was fired from his university and convicted of crimes. Despite all that, Dr. Hwang has just been awarded an American patent covering the disputed work, leaving some scientists dumbfounded and providing fodder to critics who say the Patent Office is too lax. “Shocked, that’s all I can say,” said Shoukhrat Mitalipov, a professor at Oregon Health and Science University who appears to have actually accomplished what Dr. Hwang claims to have done. “I thought somebody was kidding, but I guess they were not.” Jeanne F. Loring, a stem cell scientist at the Scripps Research Institute in San Diego, said her first reaction was “You can’t patent something that doesn’t exist.” But, she said, she later realized that “you can.”

    (tags: patents absurd hwang-woo-suk cloning stem-cells science biology uspto)