Comment by jakejarvis
6 years ago
Always appreciate the transparency from you and Cloudflare. :)
My main worry during this outage wasn't the outage itself, but the fact that I couldn't log into the dashboard and simply click the orange cloud to bypass Cloudflare in the meantime. I'm assuming that this is now covered by this mitigation:
>> 6. Putting in place an emergency ability to take the Cloudflare Dashboard and API off Cloudflare's edge.
If so, and if this would have prevented the dashboard outage even during the WAF fiasco, this is a huge comfort to me. Just curious, though: how far can you really go in separating Cloudflare "the interface" from Cloudflare "the network"?
And in general, what does everyone on HN think about mission-critical companies using their own infrastructure and being their own customer? Especially when the alternative is using a competitor?
I've always been a proponent of separating the monitoring from the infra. Otherwise your insight is binary: the service is either up or down. You don't have any context as to why.
Edit: Additionally, from a competitive standpoint, I don't see a problem with using a third-party platform for a monitoring service.
Yes, this is always one of the primary questions we ask when deciding when and how to dogfood our own services: will we create a circular dependency, where our ability to fix an issue on one service is hindered by a chain of dependencies between the service with the issue and the service used to fix it? We always avoid those, or at least keep easy alternatives available.
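To make that check concrete, here is a rough sketch in Python of how one might test for such a chain. The service names and the DEPENDS_ON map are made up for illustration and do not describe any real company's architecture; the idea is only that a tool is safe to use for a fix if it cannot reach the broken service through any chain of dependencies.

```python
from collections import deque

# Hypothetical dependency map (an assumption for illustration only):
# each service lists the services it needs in order to work.
DEPENDS_ON = {
    "dashboard": ["edge"],
    "api": ["edge"],
    "edge": ["waf"],
    "waf": [],
    "external_monitor": [],
}


def depends_on(service, target, graph):
    """Return True if `service` reaches `target` through any chain of dependencies."""
    seen = set()
    queue = deque([service])
    while queue:
        current = queue.popleft()
        for dep in graph.get(current, []):
            if dep == target:
                return True
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return False


def can_fix(tool, broken, graph):
    """A tool is usable for a fix only if it is not the broken service
    and does not depend on it, directly or transitively."""
    return tool != broken and not depends_on(tool, broken, graph)
```

With this made-up map, can_fix("dashboard", "waf", DEPENDS_ON) is False because the dashboard sits behind the edge, which in turn needs the WAF, while can_fix("external_monitor", "waf", DEPENDS_ON) is True. An "easy alternative" in this picture would be a second path to the dashboard that does not go through the edge.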
Absolutely agree about an external monitoring service being a necessity. I was more referring to cloudflare.com (and specifically dash.cloudflare.com) being entirely served through Cloudflare itself, or the AWS console being hosted on AWS, etc.
>> 6. Putting in place an emergency ability to take the Cloudflare Dashboard and API off Cloudflare's edge.
> If so, and if this would have prevented the dashboard outage even during the WAF fiasco
It wouldn't prevent the initial dashboard outage. However, in a similar situation where the main issue can't be resolved quickly, it would allow them to restore dashboard access.