Outages, PostMortems, and Human Error 101
Good introductory presentation from John Allspaw, covering the basics of tier-one tech incident response: defining the 5 severity levels; root cause analysis techniques (to Five-Whys or not); and the importance of service metrics
(tags: devops monitoring ops five-whys allspaw slides etsy codeascraft incident-response incidents severity root-cause postmortems outages reliability techops tier-one-support)