Comment by Animats
5 years ago
You can see the security/reliability tradeoff problem here.
You need a control plane. But what does it run over? Your regular data links? That's a problem if the control plane also controls their configuration. Something outside your own infrastructure, like a modest connection to the local ISP as a backup? That's an attack vector.
One popular solution is to keep both the current and previous generation of the control plane up. Both Google and AT&T seem to have done that. AT&T kept Signalling System 5 up for years after SS7 was doing all the work. Having two totally different technologies with somewhat different paths is helpful.
The post mentioned that the out-of-band network was also down, and I’m curious what that entails and how it came to be impacted too. They must not have had external DNS or static IPs for reaching their recovery systems. I’m sure they won’t share more than this now, but I’d love to hear more about the OOB access.