Comment by mumblemumble

5 years ago

> Our systems are designed to audit commands like these to prevent mistakes like this, but a bug in that audit tool didn’t properly stop the command.

I'm so glad to see that they framed this in terms of a bug in a tool designed to prevent human error, rather than simply blaming it on human error.

> I'm so glad to see that they framed this in terms of a bug in a tool designed to prevent human error, rather than simply blaming it on human error.

Wouldn't human error reflect extremely poorly on the company, though? I mean, if human error were the root cause of this mega-outage, that would imply that the company's infrastructure, operational and security practices were so ineffective that a single person screwing up could inadvertently bring the whole company down.

A freak accident that requires all the stars to align before it's even possible, on the other hand, doesn't raise many concerns.

  • Organisations have a bad habit of using "human error" to pin systemic problems, whose true root cause is inadequate leadership, on individual low-level employees. So we're glad to see Facebook didn't resort to this shitty practice.

    For a modern example, look up Symantec's "A tough day as leaders" post, in which they tried to blame an incident that was clearly the result of, at the very least, incompetence by senior management on a single person they had just fired. This was part of the sequence of events that led to Symantec no longer being a trusted root CA. You won't find the actual post from Symantec because (of course) once they realised it wasn't doing what they wanted, they deleted it, but you can still find copies of and references to it.

    For much older examples, look at the early history of the railway in most of the world. A train crashes; the company blames the train driver (often killed in the crash and thus unable to defend themselves) and hints that they may have been drunk and were certainly incompetent. The owners carry on profiting from an unsafe railway and needn't spend any money making it safer.

  • I mean, sure? mumblemumble is still right, though. If you're looking for a cynical reason for everything FB-related, then sure, it's true that human error looks bad.

Human error is a cop-out excuse anyway, since it's not something you can fix going forward. Humans err, and if a mistake was made once, it could easily be made again.

The buggy audit tool was probably made by a human too, though.

I wouldn't be surprised if that tool were a shell script with a mistyped conditional somewhere; I really dislike shell scripting.
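The quoted post doesn't say what the bug in Facebook's audit tool actually was, so the following is purely hypothetical: a minimal Python sketch of how a single mistyped comparison can make a command-audit gate fail open. The rule patterns and the names `count_violations`, `audit_buggy` and `audit_fixed` are all invented for illustration.

```python
import re

# Hypothetical audit rules: each pattern flags a command that could affect
# many machines at once. These patterns are illustrative only.
RULES = [
    re.compile(r"\bdrain\b.*\ball\b"),
    re.compile(r"\bshutdown\b.*--global\b"),
]


def count_violations(command: str) -> int:
    """Return how many of the hypothetical rules the command trips."""
    return sum(1 for rule in RULES if rule.search(command))


def audit_buggy(command: str) -> bool:
    """Return True if the command may run.

    Mistyped conditional: '<= 1' was written where '< 1' was meant, so a
    command that trips exactly one rule is still allowed through.
    """
    return count_violations(command) <= 1


def audit_fixed(command: str) -> bool:
    """Return True only if the command trips no rules at all."""
    return count_violations(command) == 0


if __name__ == "__main__":
    cmd = "drain backbone all"
    print(audit_buggy(cmd))  # True: the buggy gate lets it run
    print(audit_fixed(cmd))  # False: the corrected gate blocks it
```

The same slip is just as easy in a shell `[ ]` test (`-le` where `-lt` was meant) or in any other language. Part of what makes this kind of bug nasty is that the gate still blocks commands that trip several rules at once, so a quick manual test may not reveal it.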

  • As opposed to what? Sixteen pages of boilerplate Java/Python?

    • I wouldn't conflate Java and Python in the boilerplate camp. Python can be very boilerplate-y, but it tends to only happen in the hands of Java developers.

      That said, even clean, idiomatic Python isn't as terse as sh. It also isn't as terse as Perl. Many would argue that's a good thing. The optimum point for readability isn't found at either extreme, not entirely unlike how the most readable way of writing English is neither shorthand nor blackletter.