Justin's Linklog

How to do distributed locking

Martin Kleppmann's critique of the "Redlock" distributed locking algorithm from Redis. antirez responds here: http://antirez.com/news/101

(tags: distributed locking redis algorithms coding distcomp redlock martin-kleppman zookeeper)

Submitting User Applications with spark-submit - AWS Big Data Blog

looks reasonably usable, although EMR's crappy UI is still an issue

(tags: emr big-data spark hadoop yarn map-reduce batch)

The Nyquist theorem and limitations of sampling profilers today, with glimpses of tracing tools from the future

Awesome post from Dan Luu with data from Google:
The cause [of some mystery widespread 250ms hangs] was kernel throttling of the CPU for processes that went beyond their usage quota. To enforce the quota, the kernel puts all of the relevant threads to sleep until the next multiple of a quarter second. When the quarter-second hand of the clock rolls around, it wakes up all the threads, and if those threads are still using too much CPU, the threads get put back to sleep for another quarter second. The phase change out of this mode happens when, by happenstance, there aren’t too many requests in a quarter second interval and the kernel stops throttling the threads. After finding the cause, an engineer found that this was happening on 25% of disk servers at Google, for an average of half an hour a day, with periods of high latency as long as 23 hours. This had been happening for three years. Dick Sites says that fixing this bug paid for his salary for a decade. This is another bug where traditional sampling profilers would have had a hard time. The key insight was that the slowdowns were correlated and machine wide, which isn’t something you can see in a profile.
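To make the mechanism in that quote concrete, here's a rough toy model in Python. It is only a sketch under my own assumptions, not the kernel's actual code: the 250ms period comes from the quote, but the 200ms quota, the sample workload, and the names next_wake_ms and simulate are all made up for illustration.

```python
PERIOD_MS = 250  # quarter-second enforcement interval, as described in the quote


def next_wake_ms(now_ms: int, period_ms: int = PERIOD_MS) -> int:
    """Return the next multiple of period_ms strictly after now_ms.

    Assumption: a thread throttled exactly on a boundary waits a full period.
    """
    return (now_ms // period_ms + 1) * period_ms


def simulate(arrivals_ms, quota_ms):
    """Toy model of quota throttling, one entry per quarter-second window.

    arrivals_ms[i] is the CPU time (in ms) newly requested during window i.
    At most quota_ms of CPU time may be used per window; anything left over
    is carried into the next window, which is what keeps the process stuck
    in throttled mode. Returns one boolean per window: True if the process
    hit its quota and was put to sleep before finishing its work.
    """
    backlog = 0
    throttled = []
    for work in arrivals_ms:
        pending = backlog + work
        done = min(pending, quota_ms)
        backlog = pending - done
        throttled.append(backlog > 0)
    return throttled


if __name__ == "__main__":
    # A thread throttled at t=610ms sleeps until the 750ms boundary.
    print(next_wake_ms(610))
    # Hypothetical workload: a burst in windows 1-2, then a lull.
    print(simulate([100, 300, 300, 50, 50], quota_ms=200))
```

In this model the process stays throttled for a window after the burst ends, and only recovers once a quiet window lets it drain the backlog. That is the "phase change" the quote describes. A sampling profiler only sees where CPU time is spent while threads are running, so the time spent asleep waiting for the next boundary doesn't show up in it.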

(tags: debugging performance visualization instrumentation metrics dan-luu latency google dick-sites linux scheduler throttling kernel hangs)

Links for 2016-02-09