
Comment by cnst

5 years ago

Note that, contrary to popular reports, DNS was NOT to blame for this outage; for once, DNS worked exactly as per the spec, design and configuration:

> To ensure reliable operation, our DNS servers disable those BGP advertisements if they themselves can not speak to our data centers, since this is an indication of an unhealthy network connection.
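A toy model helps show why a sensible per-site rule can still produce a total outage. This is a hypothetical sketch: the `DnsSite` class, the site names, and the single boolean probe are illustrative assumptions, not Facebook's actual implementation. Each site withdraws its route when its own probe fails. That is fine when one site has a local fault, but when the backbone itself is down, every site fails the same probe at once and the prefix vanishes everywhere.

```python
from dataclasses import dataclass

# Hypothetical model of the quoted rule; names are illustrative only.


@dataclass
class DnsSite:
    name: str
    can_reach_datacenters: bool  # assumed result of a health probe toward the backbone

    def should_advertise(self) -> bool:
        # Withdraw the BGP route for the DNS prefix when this site
        # cannot reach the data centers.
        return self.can_reach_datacenters


def advertising_sites(sites: list[DnsSite]) -> list[str]:
    """Return the names of sites that keep announcing the anycast DNS prefix."""
    return [s.name for s in sites if s.should_advertise()]


# Normal day: one site has a local fault and drops out; the rest keep serving.
normal = [DnsSite("edge-a", True), DnsSite("edge-b", False), DnsSite("edge-c", True)]
print(advertising_sites(normal))  # ['edge-a', 'edge-c']

# Backbone down: every site fails the same probe simultaneously,
# so all of them withdraw and nothing announces the DNS prefix.
outage = [DnsSite(n, False) for n in ("edge-a", "edge-b", "edge-c")]
print(advertising_sites(outage))  # []
```

The rule is correct for each site in isolation; the failure comes from all sites sharing one dependency.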

Not the first cause, but involved. Before reading, I expected to see some combination of (1) automation, (2) DNS, and (3) BGP. I didn't expect to see all three, or that the automation would disconnect the backbone from the internet with no other way for senior tech staff to get to the backbone, not even a secure dial-up console.

I think the general lesson here is: for each thing you automate, assume that it can act in error, and keep another, manual way to do whatever the automated action prevents.
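One way to express that lesson in code is an operator override that takes precedence over the automated check. This is a hypothetical sketch: `Override` and `should_advertise` are illustrative names, not any real system's API. It also assumes the override can be set over a path that does not depend on the network being repaired (for example, the kind of out-of-band console mentioned above). An override that can only be reached through the failed backbone doesn't help.

```python
from enum import Enum
from typing import Optional

# Hypothetical manual override for an automated advertise/withdraw decision.


class Override(Enum):
    FORCE_ADVERTISE = "force_advertise"
    FORCE_WITHDRAW = "force_withdraw"


def should_advertise(health_ok: bool, override: Optional[Override] = None) -> bool:
    # An operator decision, set out of band, wins over the automated check.
    if override is Override.FORCE_ADVERTISE:
        return True
    if override is Override.FORCE_WITHDRAW:
        return False
    # No override set: fall back to the automated health check.
    return health_ok


print(should_advertise(False))                            # False: automation withdraws
print(should_advertise(False, Override.FORCE_ADVERTISE))  # True: operator keeps the route up
```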