Today is when Amazon brain drain finally caught up with AWS • The Register
Corey "Last Week In AWS" Quinn really getting the boot in on AWS after yesterday's gigantic us-east-1 outage:
AWS has given increasing levels of detail, as is their tradition, when outages strike, and as new information comes to light. Reading through it, one really gets the sense that it took them 75 minutes to go from "things are breaking" to "we've narrowed it down to a single service endpoint, but are still researching," which is something of a bitter pill to swallow. To be clear: I've seen zero signs that this stems from a lack of transparency, and every indication that they legitimately did not know what was breaking for a patently absurd length of time. [...]
At the end of 2023, Justin Garrison left AWS and roasted them on his way out the door. He stated that AWS had seen an increase in Large Scale Events (or LSEs), and predicted significant outages in 2024. It would seem that he discounted the power of inertia, but the pace of senior AWS departures certainly hasn't slowed — and now, with an outage like this, one is forced to wonder whether those departures are themselves a contributing factor.
You can hire a bunch of very smart people who will explain how DNS works at a deep technical level (or you can hire me, who will incorrect you by explaining that it's a database), but the one thing you can't hire for is the person who remembers that when DNS starts getting wonky, check that seemingly unrelated system in the corner, because it has historically played a contributing role to some outages of yesteryear.
When that tribal knowledge departs, you're left having to reinvent an awful lot of in-house expertise that didn't want to participate in your RTO games, or play Layoff Roulette yet again this cycle. This doesn't impact your service reliability — until one day it very much does, in spectacular fashion. I suspect that day is today.
Ouch. This is a very painful read and I'd say AWS are not happy to see it....
Tags: aws amazon layoffs tech how-we-work lses outages us-east-1 rto brain-drain work