Comment by rezonant
8 months ago
Hard agree. They clearly were more interested in making clear that there's not a systemic problem in how GCP's operators manage the platform, which read strongly and alarmingly that there is a systemic problem in how GCP's operators manage the platform. The lack of the common sense measures you outline in their postmortem just tells me that they aren't doing anything to fix it.
“There’s no systemic problem.”
Meanwhile, the operators were allowed to leave a parameter blank and the default was to set a deletion time bomb.
Not systemic my butt! That’s a process failure, and every process failure like this is a systemic problem because the system shouldn’t allow a stupid error like this.
If you're arguing that that was the systemic problem, then it's been fully fixed, as the manual operation was removed and so validation can no longer be bypassed.
I think you glossed over the importance of the term process failure.
The idea is that this one particular form missing the appropriate care is indicative of a wider lack of discipline amongst the engineers building it.
Definitionally, you cannot solve a process problem by fixing a specific bug.
2 replies →