Comment by voytec

3 days ago

> outage of a 3rd party service that is a key dependency.

Good to know that Cloudflare has services seemingly based on GCP with no redundancy.

Probably unintentional. "We just read this config from this URL at startup" can easily snowball into "if that URL is unavailable, this service will go down globally, and all running instances will fail to restart when the devops team try to do a pre-emptive rollback"

After reading about cloudflare infra in post mortems it has always been surprising how immature their stack is. Like they used to run their entire global control plane in a single failure domain.

Im not sure who is running the show there, but the whole thing seems kinda shoddy given cloudflares position as the backbone of a large portion of the internet.

I personally work at a place with less market cap than cloudflare and we were hit by the exact same instances (datacenter power went out) and had almost no downtime, whereas the entire cloudflare api was down for nearly a day.

What's the alternative here? Do you want them to replicate their infrastructure across different cloud providers with automatic fail-over? That sounds -- heck -- I don't know if modern devops is really up to that. It would probably cause more problems than it would solve...

  • They're a company that has to run their own datacenters, you'd expect them to not fall over when a public cloud does.

    • I was really surprised. The dependence on another enterprise’s cloud services in-general I think is risky, but pretty much everyone does it these days, but I didn’t expect them to be.

      2 replies →

  • > What's the alternative here? Do you want them to replicate their infrastructure

    Cloudflare adverises themselves as _the_ redundancy / CDN provider. Don't ask me for an "alternative" but tell them to get their backend infra shit in order.

  • There are roughly 20-25 major IaaS providers in the world that should have close to dependency on each other. I'm almost certain that cloud flare believe that was their posture, and that the action items coming out of this post mortem will be to make sure that this is the case.

Google is an advertising company not a tech company. Do not rely on them performing anything critical that doesn't depend on ad revenue.