Comment by motorest

2 days ago

> Incidents like this repeatedly happening reveal that’s mostly a myth. They aren’t much smarter, their standards are somewhat wishful thinking, their accomplishments are mostly rooted in the problems they needed to solve just like any other company.

I think you're only seeing what you want to see, because somehow bringing FANG engineers down a peg makes you feel better?

A broken deployment caused by a once-in-a-lifetime configuration change, in a project that was never allocated the engineering effort needed to support more robust and resilient deployment modes, doesn't turn any engineer into an incompetent fool. Sometimes you need to flip a switch, and you can't spare a team to spend a year refactoring the whole thing.

> Sometimes you need to flip a switch

If anyone needs to flip a global switch and can't convince their leadership to allocate the resources to do it safely, engineering culture is dead, at least locally to that system.

But I'm not convinced lack of headcount was the problem here, the incident report makes it sound like there's an established pattern for feature flagging even for global changes like this.

Putting aside the fact that this team seems unperturbed by global deployments and all the other scary things, high-impact changes should use every mechanism available to shrink fault containers. It would be inexcusable to roll this change out without the feature flag mechanism even if this were a regional rollout.

Skipping the feature flag when the change is global is simply incomprehensible. It goes beyond headcount; it should never have been considered in the first place.

> Sometimes you need to flip a switch, and you can't spare a team working one year to refactor the whole thing.

This seems to imply that the person in charge at G was right to cause this outage... and that Google is very short-staffed and too poor to afford to do proper engineering work?

Somehow that doesn't inspire confidence in their engineering prowess. It sure seems to me that bad engineering leadership decisions are equivalent to bad engineering.

  • > Google is very short-staffed

    This has been the case for many teams since January 2023.