James Hamilton - Failures at Scale & How to Ride Through Them - AWS re:Invent 2012 - Cpn208
mostly an update of his classic USENIX paper, but pretty cool to come across a mention of a network monitoring system we've built on page 21 ;)
(tags: amazon james-hamilton reliability slides aws)