Comment by tantalor

5 years ago

But the FB outage was not a configuration change.

> a command was issued with the intention to assess the availability of global backbone capacity, which unintentionally took down all the connections in our backbone network

From yesterday's post:

"Our engineering teams have learned that configuration changes on the backbone routers that coordinate network traffic between our data centers caused issues that interrupted this communication.

...

Our services are now back online and we’re actively working to fully return them to regular operations. We want to make clear that there was no malicious activity behind this outage — its root cause was a faulty configuration change on our end."

Ultimately, that faulty command changed router configuration globally.

The Google outage was triggered by a configuration change due to an automation system gone rogue. But hey, it too was triggered by a human issuing a command at some point.

  • I'm inclined to believe the later post, since they've had more time to assess the details. I think the point of the earlier post was really to say "we weren't hacked!" without using exactly that language.

This is kind of like Chernobyl, where they were testing how hot they could run the reactor to see how much power it could generate. Then things went sideways.

  • The Chernobyl test was not meant to drive the reactor to its limits. It was actually meant to verify that the inertia of the main turbines was big enough to drive the coolant pumps for X amount of time in the event of a grid failure.

  • As already said, the test was about something entirely different. And the dangerous part was not the test itself, but the way they delayed it and then went ahead anyway, even though the reactor was in a problematic state and the night shift on duty had not been trained for this test. The main problem was that they ran the reactor at reduced power long enough to build up significant xenon poisoning, and then pushed the reactor to the brink when they tried to actually run the test under these unsafe conditions.

    • I'd say the failure at Chernobyl was that anyone who asked questions got sent to a labor camp and the people making the decisions really had no clue about the work being done. Everything else just stems from that. The safest reactor in the world would blow up under the same leadership.

  • At first I thought it was inappropriate hyperbole to compare Facebook to Chernobyl, but then I realized that Facebook (along with Twitter and other "web 2.0" graduates) has spread toxic waste across a far larger area than Chernobyl did. But I would still say that it's not the _outage_ that is comparable to Chernobyl, but the steady-state operations.