← Back to context

Comment by terom

3 days ago

From the Cloudflare incident:

> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency. As a result, certain Cloudflare products that rely on KV service to store and disseminate information are unavailable [...]

Surprising, but not entirely unplausible for a GCP outage to spread to CF.

> outage of a 3rd party service that is a key dependency.

Good to know that Cloudflare has services seemingly based on GCP with no redundancy.

  • Probably unintentional. "We just read this config from this URL at startup" can easily snowball into "if that URL is unavailable, this service will go down globally, and all running instances will fail to restart when the devops team try to do a pre-emptive rollback"

  • After reading about cloudflare infra in post mortems it has always been surprising how immature their stack is. Like they used to run their entire global control plane in a single failure domain.

    Im not sure who is running the show there, but the whole thing seems kinda shoddy given cloudflares position as the backbone of a large portion of the internet.

    I personally work at a place with less market cap than cloudflare and we were hit by the exact same instances (datacenter power went out) and had almost no downtime, whereas the entire cloudflare api was down for nearly a day.

  • What's the alternative here? Do you want them to replicate their infrastructure across different cloud providers with automatic fail-over? That sounds -- heck -- I don't know if modern devops is really up to that. It would probably cause more problems than it would solve...

    • > What's the alternative here? Do you want them to replicate their infrastructure

      Cloudflare adverises themselves as _the_ redundancy / CDN provider. Don't ask me for an "alternative" but tell them to get their backend infra shit in order.

    • There are roughly 20-25 major IaaS providers in the world that should have close to dependency on each other. I'm almost certain that cloud flare believe that was their posture, and that the action items coming out of this post mortem will be to make sure that this is the case.