Justin's Linklog – Page 49 – (Things I found interesting recently.)

Links for 2015-06-23

Published June 23, 2015

Google Cloud Platform announces new Container Registry

Yay. Sensible Docker registry pricing at last. Given the high prices, rough edges and slow performance of the other registry offerings, I’m quite happy to see this.
Google Container Registry helps make it easy for you to store your container images in a private and encrypted registry, built on Cloud Platform. Pricing for storing images in Container Registry is simple: you only pay Google Cloud Storage costs. Pushing images is free, and pulling Docker images within a Google Cloud Platform region is free (Cloud Storage egress cost when outside of a region). Container Registry is now ready for production use:
* Encrypted and Authenticated – Your container images are encrypted at rest, and access is authenticated using Cloud Platform OAuth and transmitted over SSL.
* Fast – Container Registry is fast and can handle the demands of your application, because it is built on Cloud Storage and Cloud Networking.
* Simple – If you’re using Docker, just tag your image with a gcr.io tag and push it to the registry to get started. Manage your images in the Google Developers Console.
* Local – If your cluster runs in Asia or Europe, you can now store your images in ASIA or EU specific repositories using asia.gcr.io and eu.gcr.io tags.

(tags: docker registry google gcp containers cloud-storage ops deployment)
Docker at Shopify: From This-Looks-Fun to Production

Pragmatic evolution story, adding Docker as a packaging/deploy format for an existing production Capistrano/Rails fleet

(tags: docker ops deployment packaging shopify slides)
Semian

Hystrix-style Circuit Breakers and Bulkheads for Ruby/Rails, from Shopify
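
For anyone who hasn't met the pattern: a rough Python sketch of a circuit breaker, to show the fail-fast idea. This is not Semian's API (Semian is Ruby); the class, parameter names and thresholds below are all made up for illustration.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the breaker is open and calls are rejected without being tried."""


class CircuitBreaker:
    """Hypothetical minimal breaker: opens after N consecutive failures."""

    def __init__(self, error_threshold=3, reset_timeout=10.0, clock=time.monotonic):
        self.error_threshold = error_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def call(self, func, *args, **kwargs):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.reset_timeout:
                # Fail fast instead of tying up a worker on a dead dependency.
                raise CircuitOpenError("circuit open; failing fast")
            # Timeout elapsed: "half-open", so let one trial call through.
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.opened_at is not None or self.failures >= self.error_threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        # Success closes the circuit and clears the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Wrap each call to a flaky dependency in `breaker.call(...)` and serve a fallback when `CircuitOpenError` comes back.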

(tags: circuit-breaker bulkhead patterns architecture microservices shopify rails ruby networking reliability fallback fail-fast)
Brubeck, a statsd-compatible metrics aggregator – GitHub Engineering

GitHub’s statsd replacement in C

(tags: github monitoring statsd c rewrites ops metrics)

Links for 2015-06-22

Published June 22, 2015

Patrick Shuff – Building A Billion User Load Balancer – SCALE 13x – YouTube

‘Want to learn how Facebook scales their load balancing infrastructure to support more than 1.3 billion users? We will be revealing the technologies and methods we use to route and balance Facebook’s traffic. The Traffic team at Facebook has built several systems for managing and balancing our site traffic, including both a DNS load balancer and a software load balancer capable of handling several protocols. This talk will focus on these technologies and how they have helped improve user performance, manage capacity, and increase reliability.’ Can’t find the standalone slides, unfortunately.

(tags: facebook video talks lbs load-balancing http https scalability scale linux)
Codeface

a good collection of coding fonts (via Tony Finch)

(tags: via:fanf fonts coding ui)
Facebook’s Folly Futures

Finagle Futures ported to C++11

(tags: futures async c++ c++11 facebook coding callbacks threading)

Links for 2015-06-21

Published June 21, 2015

jwz on Inceptionism

“Shoggoth ovipositors”:
So then they reach inside to one of the layers and spin the knob randomly to fuck it up. Lower layers are edges and curves. Higher layers are faces, eyes and shoggoth ovipositors. [….] But the best part is not when they just glitch an image — which is a fun kind of embossing at one end, and the “extra eyes” filter at the other — but is when they take a net trained on some particular set of objects and feed it static, then zoom in, and feed the output back in repeatedly. That’s when you converge upon the platonic ideal of those objects, which — it turns out — tend to be Giger nightmare landscapes. Who knew. (I knew.)
This stuff is still boggling my mind. All those doggy faces! That is one dog-obsessed ANN.
(tags: neural-networks ai jwz funny shoggoths image-recognition hr-giger art inceptionism)

Links for 2015-06-20

Published June 20, 2015

Levenshtein automata can be simple and fast

Nice algorithm for fuzzy text search with a limited Levenshtein edit distance using a DFA
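
A minimal sketch of the idea in Python, assuming the usual formulation where a state is one row of the edit-distance table, clamped at max_edits + 1 so that the number of distinct states stays finite (which is what makes it a DFA). Class and method names are my own choice, not necessarily those used in the linked post.

```python
class LevenshteinAutomaton:
    """Accepts every string within max_edits of `pattern`."""

    def __init__(self, pattern, max_edits):
        self.pattern = pattern
        self.max_edits = max_edits

    def start(self):
        # Row 0 of the edit-distance table: distance from "" to each prefix.
        cap = self.max_edits + 1
        return tuple(min(i, cap) for i in range(len(self.pattern) + 1))

    def step(self, state, char):
        # Compute the next table row after consuming one input character.
        row = [state[0] + 1]
        for i in range(len(state) - 1):
            cost = 0 if self.pattern[i] == char else 1
            row.append(min(row[i] + 1, state[i] + cost, state[i + 1] + 1))
        cap = self.max_edits + 1
        return tuple(min(x, cap) for x in row)

    def is_match(self, state):
        return state[-1] <= self.max_edits

    def can_match(self, state):
        # False means no extension of the input can ever match: prune here.
        return min(state) <= self.max_edits

    def matches(self, word):
        state = self.start()
        for char in word:
            state = self.step(state, char)
            if not self.can_match(state):
                return False
        return self.is_match(state)
```

The `can_match` check is the useful bit for search: when walking a trie or sorted index, a whole subtree can be skipped as soon as it returns False.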

(tags: dfa algorithms levenshtein text edit-distance fuzzy-search search python)

Links for 2015-06-19

Published June 19, 2015

Discretized Streams: Fault Tolerant Stream Computing at Scale

The paper describing the innards of Spark Streaming and its RDD-based recomputation algorithm:
we use a data structure called Resilient Distributed Datasets (RDDs), which keeps data in memory and can recover it without replication by tracking the lineage graph of operations that were used to build it. With RDDs, we show that we can attain sub-second end-to-end latencies. We believe that this is sufficient for many real-world big data applications, where the timescale of the events tracked (e.g., trends in social media) is much higher.

(tags: rdd spark streaming fault-tolerance batch distcomp papers big-data scalability)
Improving testing by using real traffic from production

Gor, a very nice-looking tool to log and replay HTTP traffic, specifically designed to “tee” live traffic from production to staging for pre-release testing

(tags: gor performance testing http tcp packet-capture tests staging tee)
Git team workflows: merge or rebase?

Well-written description of the pros and cons. I’m a rebaser, fwiw. (via Darrell)

(tags: via:darrell git merging rebasing history git-log coding workflow dev teams collaboration github)
How to receive a million packets per second on Linux

To sum up, if you want a perfect performance you need to: Ensure traffic is distributed evenly across many RX queues and SO_REUSEPORT processes. In practice, the load usually is well distributed as long as there are a large number of connections (or flows). You need to have enough spare CPU capacity to actually pick up the packets from the kernel. To make the things harder, both RX queues and receiver processes should be on a single NUMA node.
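
The SO_REUSEPORT part, sketched in Python: each receiver process opens its own UDP socket on the same port and the kernel spreads incoming flows across them. This assumes Linux 3.9 or later (where the option exists); the helper name is hypothetical, and RX queue and NUMA pinning are out of scope here.

```python
import socket


def open_reuseport_udp_socket(host, port):
    """Open a UDP socket that shares its port with other SO_REUSEPORT sockets."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # Must be set before bind(), and on every socket sharing the port.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
    sock.bind((host, port))
    return sock
```

Run one such socket per worker process, ideally one worker per RX queue.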

(tags: linux networking performance cloudflare packets numa so_reuseport sockets udp)

Links for 2015-06-18

Published June 18, 2015

Inceptionism: Going Deeper into Neural Networks

This is amazing, and a little scary.
If we choose higher-level layers, which identify more sophisticated features in images, complex features or even whole objects tend to emerge. Again, we just start with an existing image and give it to our neural net. We ask the network: “Whatever you see there, I want more of it!” This creates a feedback loop: if a cloud looks a little bit like a bird, the network will make it look more like a bird. This in turn will make the network recognize the bird even more strongly on the next pass and so forth, until a highly detailed bird appears, seemingly out of nowhere.
An enlightening comment from the G+ thread:
This is the most fun we’ve had in the office in a while. We’ve even made some of those ‘Inceptionistic’ art pieces into giant posters. Beyond the eye candy, there is actually something deeply interesting in this line of work: neural networks have a bad reputation for being strange black boxes that are opaque to inspection. I have never understood those charges: any other model (GMM, SVM, Random Forests) of any sufficient complexity for a real task is completely opaque for very fundamental reasons: their non-linear structure makes it hard to project back the function they represent into their input space and make sense of it. Not so with backprop, as this blog post shows eloquently: you can query the model and ask what it believes it is seeing or ‘wants’ to see simply by following gradients. This ‘guided hallucination’ technique is very powerful and the gorgeous visualizations it generates are very evocative of what’s really going on in the network.

(tags: art machine-learning algorithm inceptionism research google neural-networks learning dreams feedback graphics)

Links for 2015-06-17

Published June 17, 2015

Apple to switch APNS protocol to HTTP/2

This is great news — the current protocol is a binary, proprietary horrorshow, particularly around error reporting. Available “later this year” in production, and Pushy plan to support it.

(tags: http2 apns pushy apple push-notifications protocols http)
Comparing the Defect Reduction Benefits of Code Inspection and Test-Driven Development

tl;dr: Code review trumps TDD alone for finding bugs. (Via Mark Dennehy)

(tags: via:markdennehy code-review coding tdd unit-tests testing papers bugs)
Evidence-Based Software Engineering

Objective: Our objective is to describe how software engineering might benefit from an evidence-based approach and to identify the potential difficulties associated with the approach. Method: We compared the organisation and technical infrastructure supporting evidence-based medicine (EBM) with the situation in software engineering. We considered the impact that factors peculiar to software engineering (i.e. the skill factor and the lifecycle factor) would have on our ability to practice evidence-based software engineering (EBSE). Results: EBSE promises a number of benefits by encouraging integration of research results with a view to supporting the needs of many different stakeholder groups. However, we do not currently have the infrastructure needed for widespread adoption of EBSE. The skill factor means software engineering experiments are vulnerable to subject and experimenter bias. The lifecycle factor means it is difficult to determine how technologies will behave once deployed. Conclusions: Software engineering would benefit from adopting what it can of the evidence approach provided that it deals with the specific problems that arise from the nature of software engineering.
(via Mark Dennehy)
(tags: papers toread via:markdennehy software coding ebse evidence-based-medicine medicine research)
Amazon offer a WhatsMyIp service as part of AWS

curl -s http://checkip.amazonaws.com/

(tags: checkip networking internet whats-my-ip ops)
Huge Loss For Free Speech In Europe: Human Rights Court Says Sites Liable For User Comments | Techdirt

The ruling is terrible through and through. First off, it insists that the comments on the news story were clearly “hate speech” and that, as such, “did not require any linguistic or legal analysis since the remarks were on their face manifestly unlawful.” To the court, this means that it’s obvious such comments should have been censored straight out. That’s troubling for a whole host of reasons at the outset, and highlights the problematic views of expressive freedom in Europe. Even worse, however, the Court then notes that freedom of expression is “interfered with” by this ruling, but it doesn’t seem to care — saying that it is deemed “necessary in a democratic society.”
This is going to have massive chilling effects. Terrible ruling from the ECHR.
(tags: echr freedom via:tjmcintyre law europe eu comments free-speech censorship hate-speech)
Shock European court decision: Websites are liable for users’ comments | Ars Technica

In the wake of this judgment, the legal situation is complicated. In an e-mail to Ars, T J McIntyre, who is a lecturer in law and Chairman of Digital Rights Ireland, the lead organization that won an important victory against EU data retention in the Court of Justice of the European Union last year, explained where things now stand. “Today’s decision doesn’t have any direct legal effect. It simply finds that Estonia’s laws on site liability aren’t incompatible with the ECHR. It doesn’t directly require any change in national or EU law. Indirectly, however, it may be influential in further development of the law in a way which undermines freedom of expression. As a decision of the Grand Chamber of the ECHR it will be given weight by other courts and by legislative bodies.”

(tags: ars-technica delfi free-speech eu echr tj-mcintyre law europe estonia)
Google Cloud Platform Blog: A look inside Google’s Data Center Networks

We used three key principles in designing our datacenter networks: We arrange our network around a Clos topology, a network configuration where a collection of smaller (cheaper) switches are arranged to provide the properties of a much larger logical switch. We use a centralized software control stack to manage thousands of switches within the data center, making them effectively act as one large fabric. We build our own software and hardware using silicon from vendors, relying less on standard Internet protocols and more on custom protocols tailored to the data center.

(tags: clos-networks google data-centers networking sdn gcp ops)
Automated Nginx Reverse Proxy for Docker

Nice hack. An automated nginx reverse proxy which regenerates as the Docker containers update

(tags: nginx reverse-proxy proxies web http ops docker)
6 Reasons Modern Movie CGI Looks Surprisingly Crappy

Spot on

(tags: color-grading teal-and-orange cgi movies film sfx jurassic-world)

Links for 2015-06-16

Published June 16, 2015

Cover Story: “Playdate” – The New Yorker

the story behind Chris Ware’s lovely Minecraft New Yorker cover

(tags: minecraft chris-ware art kids play gaming games)

Links for 2015-06-15

Published June 15, 2015

How We Moved Our API From Ruby to Go and Saved Our Sanity

Parse on their ditching-Rails story. I haven’t heard a nice thing about Ruby or Rails as an operational, production-quality platform in a long time :(

(tags: go ruby rails ops parse languages platforms)
VPC Flow Logs

we are introducing Flow Logs for the Amazon Virtual Private Cloud. Once enabled for a particular VPC, VPC subnet, or Elastic Network Interface (ENI), relevant network traffic will be logged to CloudWatch Logs for storage and analysis by your own applications or third-party tools. You can create alarms that will fire if certain types of traffic are detected; you can also create metrics to help you to identify trends and patterns. The information captured includes information about allowed and denied traffic (based on security group and network ACL rules). It also includes source and destination IP addresses, ports, the IANA protocol number, packet and byte counts, a time interval during which the flow was observed, and an action (ACCEPT or REJECT).
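
A quick Python sketch of picking those fields apart, assuming the default space-separated record layout from the AWS documentation (version, account, interface, addresses, ports, protocol, counts, time window, action, log status). The function name and the sample values in the test are illustrative only.

```python
from collections import namedtuple

# Field order assumed from the documented default flow log record format.
FIELDS = (
    "version account_id interface_id srcaddr dstaddr srcport dstport "
    "protocol packets bytes start end action log_status"
).split()
NUMERIC = ("srcport", "dstport", "protocol", "packets", "bytes", "start", "end")

FlowRecord = namedtuple("FlowRecord", FIELDS)


def _number(value):
    # Records with no traffic in the window carry "-" in place of a value.
    return None if value == "-" else int(value)


def parse_flow_record(line):
    """Parse one flow log line into a FlowRecord with numeric fields converted."""
    parts = line.split()
    if len(parts) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(parts)}")
    record = FlowRecord(*parts)
    return record._replace(**{name: _number(getattr(record, name)) for name in NUMERIC})
```

From there it is a one-liner to filter for `action == "REJECT"` on a given port.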

(tags: ec2 aws vpc logging tracing ops flow-logs network tcpdump packets packet-capture)
Tim Hunt “jokes” about women scientists. Or not. (with image, tweets) · deborahblum · Storify

‘[Tim Hunt] said that while he meant to be ironic, he did think it was hard to collaborate with women because they are too emotional – that he was trying to be honest about the problems.’ So much for the “nasty twitter took my jokes seriously” claims then.

(tags: twitter science misogyny women tim-hunt deborah-blum journalism)
Why I dislike systemd

Good post, and hard to disagree.
One of the “features” of systemd is that it allows you to boot a system without needing a shell at all. This seems like such a senseless manoeuvre that I can’t help but think of it as a knee-jerk reaction to the perception of Too Much Shell in sysv init scripts. In exactly which universe is it reasonable to assume that you have a running D-Bus service (or kdbus) and a filesystem containing unit files, all the binaries they refer to, all the libraries they link against, and all the configuration files any of them reference, but that you lack that most ubiquitous of UNIX binaries, /bin/sh?

(tags: history linux unix systemd bsd system-v init ops dbus)
Adrian Colyer reviews the Twitter Heron paper

ouch, really sounds like Storm didn’t cut the mustard. ‘It’s hard to imagine something more damaging to Apache Storm than this. Having read it through, I’m left with the impression that the paper might as well have been titled “Why Storm Sucks”, which coming from Twitter themselves is quite a statement.’ If I was to summarise the lessons learned, it sounds like: backpressure is required; and multi-tenant architectures suck.

(tags: storm twitter heron big-data streaming realtime backpressure)

Links for 2015-06-14

Published June 14, 2015

Security theatre at Allied Irish Banks

Allied Irish Banks’s web and mobile banking portals are ludicrously insecure. Vast numbers of accounts have easily-guessable registration numbers and are thus ‘protected’ by a level of security that is twice as easy to crack as would be provided by a single password containing only two lowercase letters. A person of malicious intent could easily gain access to hundreds, possibly thousands, of accounts as well as completely overwhelm the branch network by locking an estimated several 100,000s of people out of their online banking. Both AIB and the Irish Financial Services Ombudsman have refused to respond meaningfully to multiple communications each in which these concerns were raised privately.

(tags: aib banking security ireland hacking ifso online-banking)
Leveraging AWS to Build a Scalable Data Pipeline

Nice detailed description of an auto-scaled SQS worker pool

(tags: sqs aws ec2 auto-scaling asg worker-pools architecture scalability)

Links for 2015-06-13

Published June 13, 2015

China’s Spies Hit the Blackmail Jackpot With Data on 4 Million Federal Workers

The Daily Beast is scathing re the OPM hack:
Here’s where things start to get scary. Whoever has OPM’s records knows an astonishing amount about millions of federal workers, members of the military, and security clearance holders. They can now target those Americans for recruitment or influence. After all, they know their vices, every last one—the gambling habit, the inability to pay bills on time, the spats with former spouses, the taste for something sexual on the side—since all that is recorded in security clearance paperwork. (To get an idea of how detailed this gets, you can see the form, called an SF86, here.) Speaking as a former counterintelligence officer, it really doesn’t get much worse than this.

(tags: daily-beast sf86 clearance us-government america china cyberwar hacking opm privacy)

Links for 2015-06-12

Published June 12, 2015

For a Good Strftime

‘Easy Skeezy Ruby Date/Time Formatting’ — or indeed anywhere else strftime() is supported
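
For instance, the same directives work from Python's datetime. A small example sticking to numeric directives so the output doesn't depend on locale; the timestamp is arbitrary.

```python
from datetime import datetime

published = datetime(2015, 6, 12, 9, 30, 0)

iso_date = published.strftime("%Y-%m-%d")   # year-month-day
clock = published.strftime("%H:%M:%S")      # 24-hour time
day_of_year = published.strftime("%j")      # zero-padded day of the year

print(iso_date, clock, day_of_year)
```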

(tags: strftime time date formatting coding ruby via:oisin)
etcd Clustering in AWS

‘a fully-automated solution to build auto-scaling etcd clusters in AWS’

(tags: aws cluster docker etcd asg autoscaling ops)

Links for 2015-06-11

Published June 11, 2015

Facebook Infer

New static analysis goodnews, freshly open-sourced by Facebook:
Facebook Infer uses logic to do reasoning about a program’s execution, but reasoning at this scale — for large applications built from millions of lines of source code — is hard. Theoretically, the number of possibilities that need to be checked is more than the number of estimated atoms in the observable universe. Furthermore, at Facebook our code is not a fixed artifact but an evolving system, updated frequently and concurrently by many developers. It is not unusual to see more than a thousand modifications to our mobile code submitted for review in a given day. The requirements on the program analyzer then become even more challenging because we expect a tool to report quickly on these code modifications — in the region of 10 minutes — to fit in with developers’ workflow. Coping with this scale and velocity requires advanced mathematical techniques. Facebook Infer uses two such techniques: separation logic and bi-abduction. Separation logic is a theory that allows Facebook Infer’s analysis to reason about small, independent parts of the application storage, rather than having to consider the entirety of the memory potentially at every step. That would be a daunting task on modern processors with their large addressable virtual memories. Bi-abduction is a logical inference technique that allows Facebook Infer to discover properties about the behavior of independent parts of the application code. By storing these properties between runs, Facebook Infer needs to analyze only the parts of the software that have changed, reusing the results of its previous analysis where it can. By combining these approaches, our analyzer is able to find complex problems in modifications to an application built from millions of lines of code, in minutes.
(via Bryan O’Sullivan)
(tags: via:bos infer facebook static-analysis lint code java ios android coding bugs)
The Tamborzão Goes to Thailand

This is great: the story of how cheesy funk carioca tune “A Minha Amiga Fran” turned into “Kawo Kawo” and became a massive hit in Thailand

(tags: thai brazil carioca music dance-music kawo-kawo)

Links for 2015-06-10

Published June 10, 2015

AV vendors still relying on MD5 to identify malware

oh dear. I can see how this happened — in many cases they may not still have samples to derive new sums from :(

(tags: md5 hashing antivirus malware security via:fanf bugs)
Google Photos – Can I get out?

what’s the export policy for Google’s new Photos service? pretty good, it turns out

(tags: google export data google-photos photos archive history storage)
A higher order estimate of the optimum checkpoint interval for restart dumps

tl;dr:
the bottom line is as follows: If the time it takes to create a dump, δ < M/2 then use τopt = √(2δM) – δ. Otherwise (it takes longer than M/2 to create a dump), just use τopt = M.
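
That rule as a few lines of Python: square root of twice the dump time times M, minus the dump time, when the dump takes less than M/2; otherwise just M. I'm reading M as the mean time between failures, which is an assumption since the excerpt doesn't define it, and the function name is made up. Both arguments must be in the same time unit.

```python
import math


def optimal_checkpoint_interval(dump_time, m):
    """Interval between restart dumps, per the first-order rule quoted above.

    dump_time: time taken to write one dump
    m: assumed here to be the mean time between failures
    """
    if dump_time <= 0 or m <= 0:
        raise ValueError("dump_time and m must be positive")
    if dump_time < m / 2:
        return math.sqrt(2 * dump_time * m) - dump_time
    # Dumps are so slow relative to failures that the rule degenerates to M.
    return m
```

So with a 2-minute dump and M of 100 minutes, you would checkpoint roughly every 18 minutes.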

(tags: dumping periodic-tasks scheduling frequency maths optimal interval checkpointing)

Links for 2015-06-09

Published June 9, 2015

Dogestry

Simple CLI app for storing Docker images on Amazon S3.

(tags: dogestry registry docker s3 github)

Links for 2015-06-08

Published June 8, 2015

Testing@LMAX – Aliases

Creating a user with our DSL looks like: registrationAPI.createUser("user"); You might expect this to create a user with the username ‘user’, but then we’d get conflicts between every test that wanted to call their user ‘user’ which would prevent tests from running safely against the same deployment of the exchange. Instead, ‘user’ is just an alias that is only meaningful while this one test is running. The DSL creates a unique username that it uses when talking to the actual system. Typically this is done by adding a postfix so the real username is still reasonably understandable e.g. user-fhoai42lfkf.
Nice approach — makes sense.
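
Roughly how that could look, sketched in Python rather than LMAX's Java DSL. Everything here (AliasRegistry, RegistrationAPI, the set standing in for the system under test) is hypothetical; only the alias-plus-random-postfix idea comes from the post.

```python
import uuid


class AliasRegistry:
    """Maps per-test aliases to unique real names; create one per test."""

    def __init__(self):
        self._real_names = {}

    def resolve(self, alias):
        if alias not in self._real_names:
            # Keep the alias as a prefix so real names stay readable in logs.
            self._real_names[alias] = f"{alias}-{uuid.uuid4().hex[:11]}"
        return self._real_names[alias]


class RegistrationAPI:
    """Hypothetical test DSL wrapper; `backend` stands in for the real system."""

    def __init__(self, aliases, backend):
        self.aliases = aliases
        self.backend = backend

    def create_user(self, alias):
        real_name = self.aliases.resolve(alias)
        self.backend.add(real_name)
        return real_name
```

Two tests that both call `create_user("user")` against the same deployment then end up with different real usernames.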
(tags: testing lmax system-tests naming coding)
Orbit Async

Orbit Async implements async-await methods in the JVM. It allows programmers to write asynchronous code in a sequential fashion. It was developed by BioWare, a division of Electronic Arts.
Open source, BSD-licensed.
(tags: async await java jvm bioware coding threading)
Who wrote this amazing, mysterious book satirizing tech startup culture?

very cool

(tags: books reading startups silicon-valley mysteries pranks san-francisco)
1172401 – Add Amazon root certificates

Well, well — looks like AWS is about to disrupt PKI, and about time too. If they come up with a Plex-style “provision a cert” API, it’ll be revolutionary

(tags: pki ssl tls amazon aws apis web-services ops)

Links for 2015-06-07

Published June 7, 2015

Vintage Illustrations for Tolkien’s The Hobbit from Around the World | Brain Pickings

including a lovely set from Tove Jansson

(tags: tove-jansson art illustration tolkien the-hobbit books via:ianmoore)
How Plex is doing HTTPS for all its users

large-scale automated TLS certificate deployment. very impressive and not easy to reproduce, good work Plex! (via Nelson)

(tags: via:nelson https ssl tls certificates pki digicert security plex)

Links for 2015-06-06

Published June 6, 2015

Tuning Java Garbage Collection for Spark Applications

So much for G1GC being fire-and-forget

(tags: g1gc gc java jvm performance spark ops tuning)

Links for 2015-06-05

Published June 5, 2015

Airflow

Airbnb’s workflow management system; works off a DAG defined in Python code (ugh). Nice UI though, but I think Pinboard’s take is neater

(tags: airbnb open-source python workflow jobs cron scheduling batch)
A Complete Taxonomy of Internet Chum – The Awl

Introducing the chumbox

(tags: chum chumbox spam ads web content)
Buck

A high-performance Java build tool, from Facebook. Make-like

(tags: android build java make coding facebook)

Links for 2015-06-04

Published June 4, 2015

Twitter ditches Storm

in favour of a proprietary ground-up rewrite called Heron. Reading between the lines it sounds like Storm had problems with latency, reliability, data loss, and supporting back pressure.

(tags: analytics architecture twitter storm heron backpressure streaming realtime queueing)
Hybrid Logical Clocks

neat substitute for physical-time clocks in synchronization and ordering in a distributed system, based on Lamport’s Logical Clocks and Google’s TrueTime. ‘HLC captures the causality relationship like LC, and enables easy identification of consistent snapshots in distributed systems. Dually, HLC can be used in lieu of PT clocks since it maintains its logical clock to be always close to the PT clock.’
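
The update rules are short enough to sketch in Python. This follows my reading of the paper's algorithm: l tracks the largest physical time seen so far, c is a counter that breaks ties while l stands still. Class and method names are my own, and the millisecond wall clock default is an assumption.

```python
import time


class HybridLogicalClock:
    def __init__(self, physical_clock=lambda: int(time.time() * 1000)):
        self.physical_clock = physical_clock
        self.l = 0  # max physical time heard of so far
        self.c = 0  # logical counter for events sharing the same l

    def now(self):
        """Timestamp a local or send event."""
        pt = self.physical_clock()
        prev_l = self.l
        self.l = max(prev_l, pt)
        self.c = self.c + 1 if self.l == prev_l else 0
        return (self.l, self.c)

    def update(self, msg_l, msg_c):
        """Timestamp a receive event, merging the sender's (l, c)."""
        pt = self.physical_clock()
        prev_l = self.l
        self.l = max(prev_l, msg_l, pt)
        if self.l == prev_l and self.l == msg_l:
            self.c = max(self.c, msg_c) + 1
        elif self.l == prev_l:
            self.c += 1
        elif self.l == msg_l:
            self.c = msg_c + 1
        else:
            self.c = 0
        return (self.l, self.c)
```

Timestamps compare as plain (l, c) tuples, so a receive event always sorts after the send that caused it, even when the receiver's wall clock is behind.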

(tags: hlc clocks logical-clocks time synchronization ordering events logs papers algorithms truetime distcomp)
Me vs An Post

Increasingly bizarre postal address obfuscation with An Post, the Irish postal service. Example:
I have decided to see what you can post [….] My first experiment was a dice [sic] with one line of the address on each side. An Post delivered two days later. They win this round
Via JG
(tags: fun an-post post games funny tumblr via:johngilbert)
Netty’s async DNS resolver

‘Can do ~1M queries to ~3K public DNS servers within ~3 minutes with just a few threads.’ via Trustin Lee. Netty is the business

(tags: netty dns async crawlers resolver benchmarks scanning)

Links for 2015-06-03

Published June 3, 2015

Performance Testing at LMAX

Good series of blog posts on the LMAX trading platform’s performance testing strategy — they capture live traffic off the wire, then build statistical models simulating its features. See also http://epickrram.blogspot.co.uk/2014/07/performance-testing-at-lmax-part-two.html and http://epickrram.blogspot.co.uk/2014/08/performance-testing-at-lmax-part-three.html .

(tags: performance testing tests simulation latency lmax trading sniffing packet-capture)
The Violence of Algorithms: Why Big Data Is Only as Smart as Those Who Generate It

The modern state system is built on a bargain between governments and citizens. States provide collective social goods, and in turn, via a system of norms, institutions, regulations, and ethics to hold this power accountable, citizens give states legitimacy. This bargain created order and stability out of what was an increasingly chaotic global system. If algorithms represent a new ungoverned space, a hidden and potentially ever-evolving unknowable public good, then they are an affront to our democratic system, one that requires transparency and accountability in order to function. A node of power that exists outside of these bounds is a threat to the notion of collective governance itself. This, at its core, is a profoundly undemocratic notion—one that states will have to engage with seriously if they are going to remain relevant and legitimate to their digital citizenry who give them their power.

(tags: palantir algorithms big-data government democracy transparency accountability analytics surveillance war privacy protest rights)

Links for 2015-06-02

Published June 2, 2015

Dong detection in LEGO Universe

great example of how Minecraft solved the problem the easy way — by simply not making an MMO, the whole problem effectively goes away

(tags: penis funny games lego lego-universe minecraft gaming mmo ugc)
HTTP/2 is here, let’s optimize! – Velocity SC 2015 – Google Slides

Changes which server-side developers will need to start considering as HTTP/2 rolls out. Remove domain sharding; stop concatenating resources; stop inlining resources; use server push.

(tags: http2 http protocols streaming internet web dns performance)
Five different ways to handle leap seconds with NTP

Without switching to chronyd, ntpd -x sounds not too suboptimal:
With ntpd, the kernel backward step is used by default. With ntpd versions before 4.2.6, or 4.2.6 and later patched for this bug, the -x option (added to /etc/sysconfig/ntpd) can be used to disable the kernel leap second correction and ignore the leap second as far as the local clock is concerned. The one-second error gained after the leap second will be measured and corrected later by slewing in normal operation using NTP servers which already corrected their local clocks.
It’s all pretty messy though :(
(tags: ntpd ntp chronyd clocks time synchronization via:fanf linux leap-seconds)
The Agency – NYTimes.com

Russia’s troll farms. Ladies and gentlemen — the future

(tags: future abuse trolls russia trolling politics social-media twitter facebook)

Links for 2015-05-29

Published May 29, 2015

Ireland’s media silenced over MP’s speech about Denis O’Brien

this is appalling. And of course we can only find out about it from overseas media because our own media is quaking in their boots :(

(tags: media ireland he-who-cannot-be-named censorship omgwtfbbq law libel injunctions high-court)
How Ireland’s same-sex marriage referendum played out on Twitter

nice clear data there

(tags: ireland ssm marref history twitter hashtags yesequality)
murbul comments on The security issue of Blockchain.info’s Android Wallet is not about system’s entropy. It’s their own BUGs on PRNG again!

I was in the middle of writing a breakdown of what went wrong, but you’ve beat me to it. Basically, they have a LinuxSecureRandom class that’s supposed to override the standard SecureRandom. This class reads from /dev/urandom and should provide cryptographically secure random values. They also seed the generator using SecureRandom#setSeed with data pulled from random.org. With their custom SecureRandom, this is safe because it mixes the entropy using XOR, so even if the random.org data is dodgy it won’t reduce security. It’s just an added bonus. BUT! On some devices under some circumstances, the LinuxSecureRandom class doesn’t get registered. This is likely because /dev/urandom doesn’t exist or can’t be accessed for some reason. Instead of screaming bloody murder like any sensible implementation would, they just ignore that and fall back to using the standard SecureRandom. If the above happens, there’s a problem because the default implementation of SecureRandom#setSeed doesn’t mix. If you set the seed, it replaces the entropy entirely. So now the entropy is coming solely from random.org. And the final mistake: They were using HTTP instead of HTTPS to make the webservice call to random.org. On Jan 4, random.org started enforcing HTTPS and returning a 301 Permanently Moved error for HTTP – see https://www.random.org/news/. So since that date, the entropy has actually been the error message (turned into bytes) instead of the expected 256-bit number. Using that seed, SecureRandom will generate the private key for address 1Bn9ReEocMG1WEW1qYjuDrdFzEFFDCq43F 100% of the time. Ouch. This is around the time that address first appears, so the timeline matches. I haven’t had a thorough look at what they’ve replaced it with in the latest version, but initial impressions are that it’s not ideal. Not disastrous, but not good.
Always check return values; always check HTTP status codes.
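A tiny Python illustration of the missing check: refuse to treat anything other than a 200 with a well-formed body as seed material. The function name and the hex-encoded 256-bit body format are assumptions for the sketch, not what random.org or the wallet app actually use.

```python
def seed_from_response(status_code, body, expected_bytes=32):
    """Return seed bytes from an HTTP response, or raise instead of guessing."""
    if status_code != 200:
        # A redirect or error page must never end up as entropy.
        raise RuntimeError(f"seed request failed with HTTP status {status_code}")
    try:
        seed = bytes.fromhex(body.strip())
    except ValueError as exc:
        raise RuntimeError("seed response was not valid hex") from exc
    if len(seed) != expected_bytes:
        raise RuntimeError(f"expected {expected_bytes} bytes, got {len(seed)}")
    return seed
```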
(tags: bugs android fail securerandom random prng blockchain.info bitcoin http randomness entropy error-checking)
CommonMark

A strongly specified, highly compatible implementation of Markdown

(tags: reference markdown commonmark specs formatting text compatibility)
GitTorrent

‘A Decentralized GitHub’. nifty

(tags: distributed git github bittorrent bitcoin gittorrent dvcs)

Links for 2015-05-28

Published May 28, 2015

I Fooled Millions Into Thinking Chocolate Helps Weight Loss

“Slim by Chocolate!” the headlines blared. A team of German researchers had found that people on a low-carb diet lost weight 10 percent faster if they ate a chocolate bar every day. It made the front page of Bild, Europe’s largest daily newspaper, just beneath their update about the Germanwings crash. From there, it ricocheted around the internet and beyond, making news in more than 20 countries and half a dozen languages. It was discussed on television news shows. It appeared in glossy print, most recently in the June issue of Shape magazine (“Why You Must Eat Chocolate Daily”, page 128). Not only does chocolate accelerate weight loss, the study found, but it leads to healthier cholesterol levels and overall increased well-being. The Bild story quotes the study’s lead author, Johannes Bohannon, Ph.D., research director of the Institute of Diet and Health: “The best part is you can buy chocolate everywhere.” I am Johannes Bohannon, Ph.D. Well, actually my name is John, and I’m a journalist. I do have a Ph.D., but it’s in the molecular biology of bacteria, not humans. The Institute of Diet and Health? That’s nothing more than a website. Other than those fibs, the study was 100 percent authentic. My colleagues and I recruited actual human subjects in Germany. We ran an actual clinical trial, with subjects randomly assigned to different diet regimes. And the statistically significant benefits of chocolate that we reported are based on the actual data. It was, in fact, a fairly typical study for the field of diet research. Which is to say: It was terrible science. The results are meaningless, and the health claims that the media blasted out to millions of people around the world are utterly unfounded.
Interesting bit: the online commenters commenting on the published stories quickly saw through the bullshit. Why can’t the churnalising journos do that?
(tags: chocolate journalism science diet food churnalism pr bild health clinical-trials papers peer-review research)
Snake-Oil Superfoods

mainly interesting for the dataviz and the Google-Doc-driven backend. wish they published the script though

(tags: google snake-oil superfoods food dataviz bubble-race-chart graphics infographics google-docs spreadsheets)

Links for 2015-05-27

Published May 27, 2015

Three Questions to Answer When Reporting an Error

Very long, but tl;dr:
the trick to creating an effective error message is to answer the 3 Questions within your message: What is the error? What was the probable cause of the error? What is the probable remedy?
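
A rough sketch of what answering the three questions could look like in code. The `ReportedError` class and `load_config` helper are made-up names for illustration, not anything from the linked article:

```python
class ReportedError(Exception):
    """Error whose message states what happened, the probable cause, and the probable remedy."""

    def __init__(self, what: str, probable_cause: str, probable_remedy: str):
        self.what = what
        self.probable_cause = probable_cause
        self.probable_remedy = probable_remedy
        super().__init__(
            f"{what} Probable cause: {probable_cause} Probable remedy: {probable_remedy}"
        )


def load_config(path: str) -> str:
    """Read a config file, turning a low-level failure into a three-part report."""
    try:
        with open(path, encoding="utf-8") as fh:
            return fh.read()
    except FileNotFoundError as exc:
        # Keep the original exception chained for debugging.
        raise ReportedError(
            what=f"Could not read config file '{path}'.",
            probable_cause="The file does not exist or the path is misspelled.",
            probable_remedy="Create the file or pass the correct path.",
        ) from exc
```

Keeping the three parts as separate attributes means a log formatter or UI can present them differently without parsing the message string.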

(tags: errors ui ux reporting logging coding)
Volvo says horrible ‘self-parking car accident’ happened because driver didn’t have ‘pedestrian detection’

Grim meathook future, courtesy of Volvo:
“The Volvo XC60 comes with City Safety as a standard feature however this does not include the Pedestrian detection functionality […] The pedestrian detection feature […] costs approximately $3,000.”
However, there’s another lesson here, in crappy car UX and the risks thereof:
But even if it did have the feature, Larsson says the driver would have interfered with it by the way they were driving and “accelerating heavily towards the people in the video.” “The pedestrian detection would likely have been inactivated due to the driver inactivating it by intentionally and actively accelerating,” said Larsson. “Hence, the auto braking function is overrided by the driver and deactivated.” Meanwhile, the people in the video seem to ignore their instincts and trust that the car assumed to be endowed with artificial intelligence knows not to hurt them. It is a sign of our incredible faith in the power of technology, but also, it’s a reminder that companies making AI-assisted vehicles need to make safety features standard and communicate clearly when they aren’t.

(tags: self-driving-cars cars ai pedestrian computer-vision volvo fail accidents grim-meathook-future)
iPhone UTF-8 text vulnerability

‘Due to how the banner notifications process the Unicode text. The banner briefly attempts to present the incoming text and then “gives up” thus the crash’. Apparently the entire Springboard launcher crashes.

(tags: apple vulnerability iphone utf-8 unicode fail bugs springboard ios via:abetson)

Links for 2015-05-26

Published May 26, 2015

Schedule Recurring AWS Lambda Invocations With The Unreliable Town Clock (UTC)

The Unreliable Town Clock (UTC) is a new, free, public SNS Topic (Amazon Simple Notification Service) that broadcasts a “chime” message every quarter hour to all subscribers. It can send the chimes to AWS Lambda functions, SQS queues, and email addresses. You can use the chime attributes to run your code every fifteen minutes, or only run your code once an hour (e.g., when minute == “00”) or once a day (e.g., when hour == “00” and minute == “00”) or any other series of intervals. You can even subscribe a function you only want to run only once at a specific time in the future: Have the function ignore all invocations until it’s after the time it wants. When it is time, it can perform its job, then unsubscribe itself from the SNS Topic.
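
A minimal sketch of the "only run once an hour" filtering described above, as a Python Lambda handler. It assumes the chime arrives as a JSON document in the SNS message body with zero-padded string `hour` and `minute` fields (as the `minute == "00"` examples suggest); `chime_matches` and `do_hourly_work` are hypothetical names:

```python
import json
from typing import Optional


def chime_matches(chime: dict, minute: Optional[str] = None, hour: Optional[str] = None) -> bool:
    """Return True when the chime's hour/minute strings match the requested ones."""
    if minute is not None and chime.get("minute") != minute:
        return False
    if hour is not None and chime.get("hour") != hour:
        return False
    return True


def do_hourly_work() -> str:
    # Hypothetical placeholder for the real job.
    return "ran hourly job"


def handler(event, context):
    """Lambda entry point: SNS delivers each chime under Records[].Sns.Message."""
    results = []
    for record in event["Records"]:
        chime = json.loads(record["Sns"]["Message"])  # assumed to be JSON
        if chime_matches(chime, minute="00"):         # top of the hour only
            results.append(do_hourly_work())
    return results
```

Passing both `hour="00"` and `minute="00"` would give the once-a-day variant; every other quarter-hour chime is simply ignored.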

(tags: alestic aws lambda cron time clock periodic-tasks recurrence hacks)

Links for 2015-05-25

Published May 25, 2015

Soylent, Neoliberalism and the Politics of Life Hacking – CounterPunch: Tells the Facts, Names the Names

Soylent’s not purchased by the Mark Zuckerbergs or the Larry Pages or the other tech aristocrats […] Rather, it’s been taken up by white-collar workers and students destined for perpetual toil in the digital mills. Their embrace of life hacking represents the internalisation of management practices by the managed themselves.

(tags: life-hacks soylent food politics taylorism efficiency capitalism work life)
Working with Apache Spark: Or, How I Learned to Stop Worrying and Love the Shuffle | Cloudera Engineering Blog

some good Spark optimization tips

(tags: spark performance optimization rdd emr big-data cloudera tips akka)
Elements of Scale: Composing and Scaling Data Platforms

Great, encyclopedic blog post rounding up common architectural and algorithmic patterns used in scalable data platforms. Cut out and keep!

(tags: architecture storage databases data big-data scaling scalability ben-stopford cqrs druid parquet columnar-stores lambda-architecture)
ISIS vs. 3D Printing | Motherboard

Morehshin Allahyari, an Iranian born artist, educator, and activist [….] is working on digitally fabricating [the] sculptures [ISIS destroyed] for a series called “Material Speculation” as part of a residency in Autodesk’s Pier 9 program. The first in the series is “Material Speculation: ISIS,” which, through intense research, is modeling and reproducing statues destroyed by ISIS in 2015. Allahyari isn’t just interested in replicating lost objects but making it possible for anyone to do the same: Embedded within each semi-translucent copy is a flash drive with Allahyari’s research about the artifacts, and an online version is coming. In this way, “Material Speculation: ISIS,” is not purely a metaphorical affront to ISIS, but a practical one as well. Allahyari’s work is similar to conservation efforts, including web-based Project Mosul, a small team and group of volunteers that are three-dimensionally modeling ISIS-destroyed artifacts based on crowd-sourced photographs. “Thinking about 3D printers as poetic and practical tools for digital and physical archiving and documenting has been a concept that I’ve been interested in for the last three years,” Allahyari says. Once she began exploring the works, she discovered a thorough lack of documentation. Her research snowballed. “It became extremely important for me to think about ways to gather this information and save them for both current and future civilizations.”

(tags: 3d-printing fabrication scanning isis niniveh iraq morehshin-allahyari history preservation archives archival)

Links for 2015-05-24

Published May 24, 2015

Kubernetes for developers

great intro

(tags: kubernetes ops docker containers rocket deployment packaging)
A Piece of Apple II History Cracks Open — May 24, 2015

Lovely description of cracking (ie. copy-protection removal) in the Apple-II era. Very reminiscent of the equivalent in the C=64 scene, from my experience. ;)

(tags: history c=64 apple-ii personal-computers archive cracks copy-protection hacking)

Links for 2015-05-19

Published May 19, 2015

Deploying Elastic Beanstalk Applications from Docker Containers – Elastic Beanstalk

oh wow, this actually sounds pretty cool

(tags: docker aws ec2 beanstalk deployment ops containers)

Links for 2015-05-18

Published May 18, 2015

TIL we have more gravity than Canada

‘Early gravity mapping efforts in the 1960s revealed that the Hudson Bay area in particular exerts a weaker gravitational force. Since less mass equals less gravity, there must be less mass underneath these areas.’ informed!

(tags: gravity canada geode earth science hudson-bay mass)
SolarCapture Packet Capture Software

Interesting product line — I didn’t know this existed, but it makes good sense as a “network flight recorder”. Big in finance.
SolarCapture is powerful packet capture product family that can transform every server into a precision network monitoring device, increasing network visibility, network instrumentation, and performance analysis. SolarCapture products optimize network monitoring and security, while eliminating the need for specialized appliances, expensive adapters relying on exotic protocols, proprietary hardware, and dedicated networking equipment.
See also Corvil (based in Dublin!): ‘I’m using a Corvil at the moment and it’s awesome- nanosecond precision latency measurements on the wire.’ (via mechanical sympathy list)
(tags: corvil timing metrics measurement latency network solarcapture packet-capture financial performance security network-monitoring)
Top 10 data mining algorithms in plain English

This is a phenomenally useful ML/data-mining resource post — ‘the top 10 most influential data mining algorithms as voted on by 3 separate panels in [ICDM ’06’s] survey paper’, but with a nice clear intro and description for each one. Here are the algorithms covered:
1. C4.5 2. k-means 3. Support vector machines 4. Apriori 5. EM 6. PageRank 7. AdaBoost 8. kNN 9. Naive Bayes 10. CART

(tags: svm k-means c4.5 apriori em pagerank adaboost knn naive-bayes cart ml data-mining machine-learning papers algorithms unsupervised supervised)
Developer believes he can turn digital game into global hit

g’wan the Colm!

(tags: colm-larkin guild-of-dungeoneering games press)
Trend Micro Locality Sensitive Hash

a fuzzy matching library. Given a byte stream with a minimum length of 512 bytes, TLSH generates a hash value which can be used for similarity comparisons. Similar objects will have similar hash values which allows for the detection of similar objects by comparing their hash values. Note that the byte stream should have a sufficient amount of complexity. For example, a byte stream of identical bytes will not generate a hash value.
Paper here: https://drive.google.com/file/d/0B6FS3SVQ1i0GTXk5eDl3Y29QWlk/edit via adulau
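
To make the "similar objects get similar hash values" idea concrete, here is a toy similarity digest. To be clear, this is not the TLSH algorithm and not its API: the bucket count, the CRC-based bucketing and the `toy_fuzzy_hash`/`distance` names are all invented for illustration. It only mirrors the two constraints quoted above (a 512-byte minimum and a refusal to hash low-complexity input):

```python
import zlib
from typing import Optional

MIN_LENGTH = 512   # mirrors the minimum length quoted above
NUM_BUCKETS = 64   # toy choice, not TLSH's real parameter


def toy_fuzzy_hash(data: bytes) -> Optional[int]:
    """Return a 64-bit similarity digest, or None if the input is too short or too uniform."""
    if len(data) < MIN_LENGTH:
        return None
    counts = [0] * NUM_BUCKETS
    # Count every sliding 3-byte window into a bucket.
    for i in range(len(data) - 2):
        counts[zlib.crc32(data[i:i + 3]) % NUM_BUCKETS] += 1
    # Refuse inputs that populate too few buckets ("insufficient complexity").
    if sum(1 for c in counts if c) < NUM_BUCKETS // 2:
        return None
    median = sorted(counts)[NUM_BUCKETS // 2]
    digest = 0
    for c in counts:
        # One bit per bucket: is this bucket fuller than the median bucket?
        digest = (digest << 1) | (1 if c > median else 0)
    return digest


def distance(a: int, b: int) -> int:
    """Hamming distance between two digests; smaller means more similar."""
    return bin(a ^ b).count("1")
```

Because a small edit only disturbs a handful of windows, most bucket counts (and so most digest bits) should stay put, which is why comparing digests can stand in for comparing the inputs.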
(tags: nilsimsa sdhash ssdeep locality-sensitive hashing algorithm hashes trend-micro tlsh hash fuzzy-matching via:adulau)
Eric Brewer interview on Kubernetes

What is the relationship between Kubernetes, Borg and Omega (the two internal resource-orchestration systems Google has built)? I would say, kind of by definition, there’s no shared code but there are shared people. You can think of Kubernetes — especially some of the elements around pods and labels — as being lessons learned from Borg and Omega that are, frankly, significantly better in Kubernetes. There are things that are going to end up being the same as Borg — like the way we use IP addresses is very similar — but other things, like labels, are actually much better than what we did internally. I would say that’s a lesson we learned the hard way.

(tags: google architecture kubernetes docker containers borg omega deployment ops)

Links for 2015-05-17

Published May 17, 2015

‘Can People Distinguish Pâté from Dog Food?’

Ugh.
Considering the similarity of its ingredients, canned dog food could be a suitable and inexpensive substitute for pâté or processed blended meat products such as Spam or liverwurst. However, the social stigma associated with the human consumption of pet food makes an unbiased comparison challenging. To prevent bias, Newman’s Own dog food was prepared with a food processor to have the texture and appearance of a liver mousse. In a double-blind test, subjects were presented with five unlabeled blended meat products, one of which was the prepared dog food. After ranking the samples on the basis of taste, subjects were challenged to identify which of the five was dog food. Although 72% of subjects ranked the dog food as the worst of the five samples in terms of taste (Newell and MacFarlane multiple comparison, P<0.05), subjects were not better than random at correctly identifying the dog food.

(tags: pate food omgwtf science research dog-food meat economics taste flavour)
Redditor runs the secret Python code in Ex Machina

and finds:
when you run with python2.7 you get the following: ISBN = 9780199226559 Which is Embodiment and the inner life: Cognition and Consciousness in the Space of Possible Minds. and so now I have a lot more respect for the Director.

(tags: python movies ex-machina cool books easter-eggs)
Metalwoman beer recipe

via the Dublin Ladies Beer Society ;)

(tags: metalman metalwoman recipes beer brewing hops dlbs)

Links for 2015-05-15

Published May 15, 2015

Linux futex_wait() bug

major bug in kernel versions 3.14 – 3.18 on Haswell hardware

(tags: haswell linux futex_wait futexes kernel bugs hang)

Links for 2015-05-14

Published May 14, 2015

repo

‘The multiple repository tool’. How Google kludged around the split-repo problem when you don’t have a monorepo.

(tags: kludges git monorepo monorepi google android aosp repo coding version-control dvcs)
Declaratively Provision Docker Images Using Nix

I really wish Docker/CoreOS would look at copying some of the deterministic-build ideas from Nix; see also http://gregoryszorc.com/blog/2014/10/13/deterministic-and-minimal-docker-images/

(tags: build packaging docker nix nix-docker deterministic-builds nixos apollo brazil)
Please stop calling databases CP or AP

In his excellent blog post […] Jeff Hodges recommends that you use the CAP theorem to critique systems. A lot of people have taken that advice to heart, describing their systems as “CP” (consistent but not available under network partitions), “AP” (available but not consistent under network partitions), or sometimes “CA” (meaning “I still haven’t read Coda’s post from almost 5 years ago”). I agree with all of Jeff’s other points, but with regard to the CAP theorem, I must disagree. The CAP theorem is too simplistic and too widely misunderstood to be of much use for characterizing systems. Therefore I ask that we retire all references to the CAP theorem, stop talking about the CAP theorem, and put the poor thing to rest. Instead, we should use more precise terminology to reason about our trade-offs.

(tags: cap databases storage distcomp ca ap cp zookeeper consistency reliability networking)

Links for 2015-05-12

Published May 12, 2015

Input: Fonts for Code

Non-monospaced coding fonts! I’m all in favour…
As writing and managing code becomes more complex, today’s sophisticated coding environments are evolving to include everything from breakpoint markers to code folding and syntax highlighting. The typography of code should evolve as well, to explore possibilities beyond one font style, one size, and one character width.

(tags: input fonts via:its typography code coding font text ide monospace)
Apache HTrace

a Zipkin-compatible distributed-system tracing framework in Java, in the Apache Incubator

(tags: zipkin tracing trace apache incubator java debugging)
Intel speeds up etcd throughput using ADR Xeon-only hardware feature

To reduce the latency impact of storing to disk, Weaver’s team looked to buffering as a means to absorb the writes and sync them to disk periodically, rather than for each entry. Tradeoffs? They knew memory buffers would help, but there would be potential difficulties with smaller clusters if they violated the stable storage requirement. Instead, they turned to Intel’s silicon architects about features available in the Xeon line. After describing the core problem, they found out this had been solved in other areas with ADR. After some work to prove out a Linux OS supported use for this, they were confident they had a best-of-both-worlds angle. And it worked. As Weaver detailed in his CoreOS Fest discussion, the response time proved stable. ADR can grab a section of memory, persist it to disk and power it back. It can return entries back to disk and restore back to the buffer. ADR provides the ability to make small (<100MB) segments of memory “stable” enough for Raft log entries. It means it does not need battery-backed memory. It can be orchestrated using Linux or Windows OS libraries. ADR allows the capability to define target memory and determine where to recover. It can also be exposed directly into libs for runtimes like Golang. And it uses silicon features that are accessible on current Intel servers.
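
For reference, the plain-software version of "absorb the writes and sync them to disk periodically" looks roughly like the sketch below. This is ordinary in-memory buffering, not ADR: anything still sitting in `_pending` is lost on a crash, which is exactly the stable-storage risk the excerpt mentions. The `BufferedLog` class and its batch size are assumptions for illustration:

```python
import os


class BufferedLog:
    """Append-only log that fsyncs once per batch instead of once per entry."""

    def __init__(self, path: str, batch_size: int = 64):
        self._fh = open(path, "ab")
        self._batch_size = batch_size
        self._pending = []      # entries accepted but not yet durable
        self.sync_count = 0     # number of fsync calls, kept for illustration

    def append(self, entry: bytes) -> None:
        self._pending.append(entry)
        if len(self._pending) >= self._batch_size:
            self.flush()

    def flush(self) -> None:
        if not self._pending:
            return
        for entry in self._pending:
            # Length-prefix each record so a reader can split them again.
            self._fh.write(len(entry).to_bytes(4, "big") + entry)
        self._fh.flush()
        os.fsync(self._fh.fileno())   # the expensive call, now paid once per batch
        self.sync_count += 1
        self._pending.clear()

    def close(self) -> None:
        self.flush()
        self._fh.close()
```

With a batch size of 64, 130 appends cost two fsyncs plus one more on close, rather than 130.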

(tags: kubernetes coreos adr performance intel raft etcd hardware linux persistence disk storage xeon)

Links for 2015-05-11

Published May 11, 2015

streamtools: a graphical tool for working with streams of data | nytlabs

Visual programming, Yahoo! Pipes style, back again:
we have created streamtools – a new, open source project by The New York Times R&D Lab which provides a general purpose, graphical tool for dealing with streams of data. It provides a vocabulary of operations that can be connected together to create live data processing systems without the need for programming or complicated infrastructure. These systems are assembled using a visual interface that affords both immediate understanding and live manipulation of the system.
via Aman
(tags: via:akohli streaming data nytimes visual-programming coding)
MappedBus

a Java based low latency, high throughput message bus, built on top of a memory mapped file; inspired by Java Chronicle with the main difference that it’s designed to efficiently support multiple writers – enabling use cases where the order of messages produced by multiple processes are important. MappedBus can be also described as an efficient IPC mechanism which enable several Java programs to communicate by exchanging messages.
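
The general shape of a message log on top of a memory-mapped file can be sketched in a few lines of Python. This is a toy and not MappedBus's layout or API: it supports a single writer only, whereas MappedBus's whole point is coordinating multiple writers, and the `ToyMappedLog` class and its header format are invented here:

```python
import mmap
import os
import struct

HEADER_SIZE = 8  # first 8 bytes store the offset of the next free byte


class ToyMappedLog:
    """Length-prefixed messages in a memory-mapped file (single writer only)."""

    def __init__(self, path: str, size: int = 4096):
        mode = "r+b" if os.path.exists(path) else "w+b"
        self._fh = open(path, mode)
        self._fh.truncate(size)
        self._map = mmap.mmap(self._fh.fileno(), size)
        self._size = size
        if self._limit() == 0:          # fresh file: nothing written yet
            self._set_limit(HEADER_SIZE)
        self._read_pos = HEADER_SIZE    # each reader keeps its own cursor

    def _limit(self) -> int:
        return struct.unpack_from(">Q", self._map, 0)[0]

    def _set_limit(self, value: int) -> None:
        struct.pack_into(">Q", self._map, 0, value)

    def write(self, message: bytes) -> None:
        start = self._limit()
        end = start + 4 + len(message)
        if end > self._size:
            raise ValueError("log is full")
        struct.pack_into(">I", self._map, start, len(message))
        self._map[start + 4:end] = message
        # Update the limit last, after the record bytes are in place.
        self._set_limit(end)

    def read(self):
        """Return the next unread message, or None if this reader has caught up."""
        if self._read_pos >= self._limit():
            return None
        (length,) = struct.unpack_from(">I", self._map, self._read_pos)
        start = self._read_pos + 4
        self._read_pos = start + length
        return self._map[start:start + length]

    def close(self) -> None:
        self._map.close()
        self._fh.close()
```

Two instances opened on the same path share the mapped bytes, so one can write while the other reads; supporting several concurrent writers would need an atomic way to claim space, which this sketch does not attempt.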

(tags: ipc java jvm mappedbus low-latency mmap message-bus data-structures queue message-passing)

Links for 2015-05-10

Published May 10, 2015

Amazon’s Drone Delivery Patent Just Feels Like Trolling At This Point

Oh dear, Amazon.
These aren’t actual technologies yet. […] All of which underscores that Amazon might never ever ever ever actually implement delivery drones. The patent paperwork was filed nearly a year after Amazon’s splashy drone program reveal on 60 Minutes. At the time we called it revolutionary marketing because, you know, delivery drones are technical and logistical madness, not to mention that commercial drone use is illegal right now. Although, in fairness the FAA did just relax some rules so that Amazon could test drones. At this point it feels like Amazon is just trolling. It’s trolling us with public relations BS about its future drones, and it’s trolling future competitors — Google is also apparently working on this — so that if somebody ever somehow does anything relating to drone delivery, Amazon can sue them. If I’m wrong, I’ll deliver my apology via Airmail.

(tags: amazon trolling patents uspto delivery drones uavs competition faa)
Red Hat on rkt vs Docker

This is like watching a train-wreck in slow motion on Groundhog Day. We, in the broader Linux and open source community, have been down this path multiple times over the past fifteen years, specifically with package formats. While there needs to be room for experimentation, having two incompatible specs driven by two startups trying to differentiate and in direct competition is *not* a good thing. It would be better for the community and for everyone who depends on our collective efforts if CoreOS and Docker collaborated on a standardized common spec, image format, and distribution protocol. To this end, we at Red Hat will continue to contribute to both initiatives with the goal of driving convergence.

(tags: rkt docker appc coreos red-hat dpkg rpm linux packaging collaboration open-source)

Links for 2015-05-09

Published May 9, 2015

Migration to, Expectations, and Advanced Tuning of G1GC

Bookmarking for future reference. Recommended by one of the GC experts, I can’t recall exactly who ;)

(tags: gc g1gc jvm java tuning performance ops migration)
Deploy a registry – Docker Documentation

Looks like it’s pretty feasible to run a private Docker registry on every host, backed by S3 (according to the ECS team’s AMA). SPOF-free — handy

(tags: docker registry ops deployment s3)
How to change Gradle cache location

$GRADLE_USER_HOME, basically — it may also be possible to set it from the Gradle script itself
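
A small sketch of setting the variable for a single build from a wrapper script. The `gradle_env` and `run_gradle` helpers are hypothetical, and the second one assumes a `gradle` binary on the PATH:

```python
import os
import subprocess


def gradle_env(cache_dir: str, base_env=None) -> dict:
    """Copy the environment and point GRADLE_USER_HOME at cache_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["GRADLE_USER_HOME"] = cache_dir
    return env


def run_gradle(task: str, cache_dir: str) -> int:
    """Run `gradle <task>` with the relocated cache; returns the exit code."""
    return subprocess.call(["gradle", task], env=gradle_env(cache_dir))
```

The same effect is available per invocation via Gradle's `-g` / `--gradle-user-home` command-line option.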

(tags: gradle build caching environment unix cache)
Internet of 404’s

“An archive of the former Internet of Things”

(tags: archive iot things internet nabaztag startups acquisitions tumblr gadgets history)
Memory Layouts for Binary Search

Key takeaway:
Nearly universally, B-trees win when the data gets big enough.

(tags: caches cpu performance optimization memory binary-search b-trees algorithms search memory-layout)
Understanding the Docker Cache for Faster Builds

good advice. see also the Best Practices official doc at https://docs.docker.com/articles/dockerfile_best-practices/

(tags: docker build packaging cache best-practices tips)

Links for 2015-05-08

Published May 8, 2015

Your Google Algorithm Cheat Sheet: Panda, Penguin, and Hummingbird

Interesting that GOOG are still doing these big-bang releases — I guess crunching the data to come up with new weights/rules is a heavyweight, time-consuming process

(tags: google search ranking releases panda penguin hummingbird weighting)
Dublin Bike Theft Survey Results

Dublin Cycling Campaign’s survey results: estimated 20,000 bikes stolen per year in Dublin; only 1% of thefts result in a conviction

(tags: dublin bikes cycling theft crime statistics infographics dcc)
DRUG PUMP’S SECURITY FLAW LETS HACKERS RAISE DOSE LIMITS

The Hospira drug pump vulnerabilities described here sound pretty horrific

(tags: drugs drug-pumps hospira exploits vulnerabilities security root dosage limits)
Making End-to-End Tests Work

+1 to ALL of this. We are doing exactly the same in Swrve and it has radically improved our release quality

(tags: end-to-end testing acceptance-tests tests system-tests lmax)
How to do named entity recognition: machine learning oversimplified

Good explanation of this NLP tokenization/feature-extraction technique. Example result: “Jimi/B-PER Hendrix/I-PER played/O at/O Woodstock/B-LOC ./O”
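
The B-/I-/O tags in that example mark the beginning of an entity, its continuation, and everything outside one. A short sketch of turning such tagged output back into entity spans (`parse_bio` is a made-up helper name, and this covers only the decoding step, not the learning):

```python
def parse_bio(tagged: str):
    """Turn 'token/TAG' text into (entity_type, text) spans using B-/I-/O tags."""
    entities = []
    current_type, current_tokens = None, []

    def close():
        # Finish the entity being built, if any.
        nonlocal current_type, current_tokens
        if current_tokens:
            entities.append((current_type, " ".join(current_tokens)))
        current_type, current_tokens = None, []

    for item in tagged.split():
        token, _, tag = item.rpartition("/")
        if tag.startswith("B-"):
            close()
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            close()   # an O tag (or a stray I- tag) ends the current entity
    close()
    return entities


print(parse_bio("Jimi/B-PER Hendrix/I-PER played/O at/O Woodstock/B-LOC ./O"))
# → [('PER', 'Jimi Hendrix'), ('LOC', 'Woodstock')]
```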

(tags: named-entities feature-extraction tokenization nlp ml algorithms machine-learning)
The Discovery of Apache ZooKeeper’s Poison Packet – PagerDuty

Excellent deep dive into a production issue. Root causes: crappy error handling code in ZooKeeper; lack of bounds checking in ZK; and a nasty kernel bug.

(tags: zookeeper bugs error-handling bounds-checking oom poison-packets pagerduty packets tcpdump xen aes linux kernel)
The Injector: A new Executor for Java

This honestly fits a narrow niche, but one that is gaining in popularity. If your messages take > 100µs to process, or your worker threads are consistently saturated, the standard ThreadPoolExecutor is likely perfectly adequate for your needs. If, on the other hand, you’re able to engineer your system to operate with one application thread per physical core you are probably better off looking at an approach like the LMAX Disruptor. However, if you fall in the crack in between these two scenarios, or are seeing a significant portion of time spent in futex calls and need a drop in ExecutorService to take the edge off, the injector may well be worth a look.

(tags: performance java executor concurrency disruptor algorithms coding threads threadpool injector)

Links for 2015-05-07

Published May 7, 2015

KillBiller

Excellent mobile-phone plan comparison site for the Irish market, using apps which you install and which analyse your call history, data usage, etc. over the past month to compute the optimal plan based on your usage. Pretty amazing results in my case! The only downside is the privacy policy, which allows the company to resell your usage data (anonymised, and in aggregate) — I’d really prefer if this wasn’t the case :(

(tags: mobile-phones shopping tesco emobile 3g 4g ireland plans comparison-shopping killbiller via:its)
Family in No poster Says YES to Marriage Equality | Amnesty International

Beyond the politics, the risks of stock photo usage are pretty evident too:
“In 2014, as a young family, we did a photo shoot with a photographer friend to get some nice shots for the family album. No money was exchanged – we got nice photos for free, they got nice images for their portfolio. As part of this agreement, we agreed to let them upload them to a stock photo album. We knew that these were available for purchase and we gave permission. Perhaps, naïvely, we imagined that on the off chance that any was ever selected, it might be for a small magazine or website. To confirm, we have not received any money for the photo – then or now, and nor do we expect any. We were surprised and upset to see that the photo was being used as part of a campaign with which we do not agree. We completely support same-sex marriage, and we believe that same-sex couples’ should of course be able to adopt, as we believe that they are equally able to provide children with much-needed love and care. To suggest otherwise is offensive to us, and to many others.”

(tags: ssm ireland politics amnesty stock-photos ip rights photos campaigns ads)
Lambda: Bees with Frickin’ Laser Beams

an HTTP testing tool in AWS Lambda. nice enough, but still a toy…

(tags: lambda aws node javascript hacks http load-testing)

Links for 2015-05-06

Published May 6, 2015

Why Loggly loves Apache Kafka

Some good factoids about Loggly’s Kafka usage and scales

(tags: scalability logging loggly kafka queueing ops reliabilty)
Patterns for building a resilient and scalable microservices platform on AWS

Some good details from Boyan Dimitrov at Hailo, on the orchestration, deployment, and provisioning infra they’ve built

(tags: deployment ops devops hailo microservices platform patterns slides)
hyperlogsandwich

A probabilistic data structure for frequency/k-occurrence cardinality estimation of multisets. Sample implementation
(via Patrick McFadin)
(tags: via:patrickmcfadin hyperloglog cardinality data-structures algorithms hyperlogsandwich counting estimation lossy multisets)
“Trash Day: Coordinating Garbage Collection in Distributed Systems”

Another GC-coordination strategy, similar to Blade (qv), with some real-world examples using Cassandra

(tags: blade via:adriancolyer papers gc distsys algorithms distributed java jvm latency spark cassandra)
Five Takeaways on the State of Natural Language Processing

Good overview of the state of the art in NLP nowadays. I find word2vec particularly interesting:
Embedding words as real-numbered vectors using a skip-gram, negative-sampling model (word2vec code) was mentioned in nearly every talk I attended. Either companies are using various word2vec implementations directly or they are building diffs off of the basic framework. Trained on large corpora, the vector representations encode concepts in a large dimensional space (usually 200-300 dim).
Quite similar to some tokenization approaches we experimented with in SpamAssassin, so I don’t find this too surprising….
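
For a feel of the "skip-gram" part, this sketch generates the (target, context) training pairs that a skip-gram model learns from. It covers only pair generation; the negative sampling and the actual vector training are left out, and `skipgram_pairs` and its `window` parameter are illustrative names rather than word2vec's own:

```python
def skipgram_pairs(tokens, window=2):
    """Pair each token with every neighbour at most `window` positions away."""
    pairs = []
    for i, target in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((target, tokens[j]))
    return pairs


print(skipgram_pairs(["a", "b", "c"], window=1))
# → [('a', 'b'), ('b', 'a'), ('b', 'c'), ('c', 'b')]
```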
(tags: word2vec nlp tokenization machine-learning language parsing doc2vec skip-grams data-structures feature-extraction via:lemonodor)
