
Comment by marcinzm

2 days ago

As an outsider my quick guess is that at some point after enough layoffs and the CEO accusing everyone of being lazy, people focus on speed/perceived output over quality. After a while the culture shifts so if you block such things then you're the problem and will be ostracized.

As an outsider, what I perceive is quite different:

HN likes to pretend that FAANG is the pinnacle of existence. The best engineers, the best standards, the most “that wouldn’t have happened here,” the yardstick by which all companies should be measured for engineering prowess.

Incidents like this repeatedly happening reveal that’s mostly a myth. They aren’t much smarter, their standards are somewhat wishful thinking, their accomplishments are mostly rooted in the problems they needed to solve just like any other company.

  • That’s just PR that serves these companies. I’ve never seen them that way. The stupid avoidable bugs and terrible UX in a lot of their products tell you enough at the surface level. What’s true is these companies do hire some amazing specialists, but that doesn’t make them the pinnacle of engineering overall.

  • You might be right to some extent, but not entirely. For example, there have been almost no incidents in AWS where one customer was able to access another customer's data because of a fault on AWS's side. The cases so far, like Superglue etc., were very limited, and IMHO AWS security is quite solid.

    So I would say there is a difference between AWS architects and engineers (although I know firsthand that certain things are suboptimal, but...) and those of several other companies that have fewer customers but have experienced successful attacks (or data loss). Even if you take Microsoft, there is a huge difference in security posture between AWS and Azure (and I say this as a big fan of the so-called "private cloud", previously known as just your own infra).

  • > Incidents like this repeatedly happening reveal that’s mostly a myth. They aren’t much smarter, their standards are somewhat wishful thinking, their accomplishments are mostly rooted in the problems they needed to solve just like any other company.

    I think you're only seeing what you want to see, because somehow bringing FANG engineers down a peg makes you feel better?

    A broken deployment due to a once-in-a-lifetime configuration change in a project that wasn't allocated engineering effort to allow more robust and resilient deployment modes doesn't turn any engineer into an incompetent fool. Sometimes you need to flip a switch, and you can't spare a team working one year to refactor the whole thing.

    • > Sometimes you need to flip a switch

      If anyone needs to flip a global switch, and can't convince their leadership to allocate the resources to do it safely, engineering culture is dead, at least locally to that system.

      But I'm not convinced lack of headcount was the problem here; the incident report makes it sound like there's an established pattern for feature flagging, even for global changes like this.

      Putting aside the fact that this team seems unperturbed by global deployments and all the other scary things, high-impact changes should use every mechanism available to shrink fault containers. It would be inexcusable to roll this change out without the feature flag mechanism even if this were a regional rollout.

      Skipping the feature flag when this is global is simply incomprehensible. It goes beyond headcount; it should never have been considered in the first place.
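To make "shrinking the fault container" concrete, here is a minimal sketch of a region-scoped, percentage-based feature flag. It is only an illustration of the general idea: the names `StagedFlag`, `bucket`, and `handle`, the region strings, and the flag name are all hypothetical, and nothing here is taken from the incident report or from any real flagging system.

```python
import hashlib


def bucket(key: str) -> int:
    """Map a key to a stable bucket in the range [0, 100)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


class StagedFlag:
    """Hypothetical flag: on only in listed regions, for a share of requests."""

    def __init__(self, name, enabled_regions=(), percent=0):
        self.name = name
        self.enabled_regions = set(enabled_regions)
        self.percent = percent  # 0 disables the flag, 100 enables it fully

    def is_enabled(self, region: str, request_id: str) -> bool:
        # Regions not yet in the rollout never see the new code path.
        if region not in self.enabled_regions:
            return False
        # Within an enabled region, only a stable slice of requests opts in.
        return bucket(f"{self.name}:{request_id}") < self.percent


def handle(flag: StagedFlag, region: str, request_id: str) -> str:
    """Pick the code path; the old path stays the default everywhere."""
    return "new_path" if flag.is_enabled(region, request_id) else "old_path"


# Assumed rollout state: one region, 5% of its requests.
flag = StagedFlag("new-quota-check", enabled_regions={"region-a"}, percent=5)
print(handle(flag, "region-b", "req-123"))  # region-b is not in the rollout
```

With a gate like this, a bad change is confined to one region and a small share of its traffic, and setting `percent` to 0 turns it off without a new deployment.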

    • > Sometimes you need to flip a switch, and you can't spare a team working one year to refactor the whole thing.

      This seems to imply that the person in charge at G was right to cause this outage... and that Google is very short-staffed and too poor to afford to do proper engineering work?

      Somehow that doesn't inspire confidence in their engineering prowess. Sure seems to me that bad engineering leadership decisions are equivalent to bad engineering.
