Cloudflare was down

2 days ago (cloudflarestatus.com)

https://blog.cloudflare.com/cloudflare-service-outage-june-1...

> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.

So they depend on GCP for (some of) their services

https://downdetector.com/ is showing outages at many major companies including Google, Cloudflare, AWS and more.

Word on the street is that there are large BGP routing issues behind all of this.

Yeah this is going to be a problem. I haven't seen an issue this widespread across so many services in a while.

  • Seems to be semi regular now that everyone puts all their eggs in only a few baskets.

    • I gotta say, it's kinda nice when that happens... work just kinda pauses for everyone, from providers to customers. It kinda feels like a national holiday, and everyone downstream from the affected cloud can just kinda sit back and relax cuz there's nothing they can do anyway except wait.

      When it's your own outage, it's all-hands-on-deck panic mode. When it's half the internet down, it's no longer your problem, lol

Seems to be affecting functionality of their "Verify you are human" dialogs as well as Workers.

  • It does. Another question is why do we get these dialogues always from Cloudflare and never from Akamai in the first place?

    • Downvoting this comment and flagging the submission does not address the serious issue. These verification dialogues make the Internet unusable.

Is it coincidence that there's a Scheduled Maintenance in Tokyo for 18:00 UTC in progress, and the problems started at 18:19 UTC?

Can’t wait to read this post-mortem. Seems odd that a Google Cloud outage would bring down Cloudflare services.

So both Cloudflare authentication and Google's identity systems suffered major downtime yesterday. Are there technical dependencies between these?

  • Cloudflare doesn't say this directly, but in their blog they've written:

    > The cause of this outage was due to a failure in the underlying storage infrastructure used by our Workers KV service, which is a critical dependency for many Cloudflare products and relied upon for configuration, authentication and asset delivery across the affected services. Part of this infrastructure is backed by a third-party cloud provider, which experienced an outage today and directly impacted availability of our KV service.

Our Workers apps are up again

edit:

It works in the US but EU customers are still reporting our services as down.

edit:

EU customers are reporting ok

Distributed systems break; that's the whole point. What actually matters is how fast they localize damage and how invisible that feels to the end user. If KV failing takes down auth, UI, and Workers, then failure isolation is missing. Recovery is fine, but if your fix needs global coordination to unbreak local flows, that's a design smell. I'm not asking for perfect uptime, but the post-outage UX should feel smoother, not shakier. Right now it feels like the system survived but the interface didn't.
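
To make the failure-isolation point concrete, here is a minimal sketch in Python of one common pattern: keep the last good value from a key-value store and serve it, marked as stale, when the store is unreachable. Everything named here (`StaleCacheFallback`, the `fetch` callable, the `"live"`/`"stale"`/`"unavailable"` labels) is hypothetical and for illustration only. It is not how Cloudflare or Workers KV actually works.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class StaleCacheFallback:
    """Wrap a remote key-value lookup and serve the last good value on failure.

    `fetch` is a hypothetical stand-in for any remote KV read; it is assumed
    to raise an exception when the backing store is unavailable.
    """

    def __init__(self, fetch: Callable[[str], str],
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._clock = clock
        # key -> (last good value, time it was fetched)
        self._cache: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Tuple[Optional[str], str]:
        """Return (value, source); source is 'live', 'stale', or 'unavailable'."""
        try:
            value = self._fetch(key)
        except Exception:
            cached = self._cache.get(key)
            if cached is None:
                # Nothing cached: the failure is contained to this one key.
                return None, "unavailable"
            # Degrade to the last known good value instead of failing outright.
            return cached[0], "stale"
        self._cache[key] = (value, self._clock())
        return value, "live"
```

The design choice is that the caller always learns where the value came from, so an auth or config path can decide for itself whether a stale answer is acceptable. Whether that is safe depends on the data: stale config is usually fine, stale revocation lists may not be.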

Workers KV has been down for 30+ minutes. This is seriously impacting us.

Their API is down too.

Amazing that something can impact their whole infrastructure like this given how much redundancy they have.

  • From their incident page (https://www.cloudflarestatus.com/incidents/25r9t0vz99rp):

    > Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.

    I bet that 3rd party service is GCP.

    I would be pretty pissed if I were a CF customer that used Workers KV for redundancy because it was heavily marketed as running on CF data centers.

  • >can impact their whole infrastructure

    CDN and WAF seem to be working fine. I think CF rushed a lot of newer services out without the reliability some of their older/core services enjoy

The proxy seems available in general. It must just be local to Workers, because only one of my sites going through a ZT tunnel with identity access rules is affected.