James Hamilton – Failures at Scale & How to Ride Through Them – AWS re:Invent 2012 – CPN208
mostly an update of his classic USENIX paper, but pretty cool to come across a mention, on page 21, of a network monitoring system we’ve built ;)
(tags: amazon james-hamilton reliability slides aws)