> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.
So they depend on GCP for (some of) their services
If that is true, and there are no other BGP shenanigans, then I suspect this dependency will not be around for long
From the article:
> Workers KV is in the process of being transitioned to significantly more resilient infrastructure for its central store: regrettably, we had a gap in coverage which was exposed during this incident.
My WAG is it comprises 95% of the company infrastructure
3 replies →
ceo just said not for long
Sub-processor pages are an easy way to verify that sort of thing.
https://www.cloudflare.com/gdpr/subprocessors/cloudflare-ser...
wrote a similar comment - good to know for the future.
> So they depend on GCP for (some of) their services
Google denies they had any outages.
https://x.com/Google/status/1933246051512644069
https://nitter.net/Google/status/1933246051512644069
https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...
They can say that, but any one of their customers knows it's not true.
1 reply →
Come on, linking four-hour-old tweets instead of their actual service dashboard, where they clearly state there was an outage.
Weaseling out of SLA/SLO payments.
https://downdetector.com/ is showing outages at many major companies including Google, CloudFlare, AWS and more.
Word on the street is that there are large BGP routing issues behind all of this.
Would make sense. I think the last time I saw this sort of thing, it was BGP causing a bunch of traffic to route through Iran or China, IIRC.
There was also an older instance with China https://www.cyberdefensemagazine.com/experts-detailed-how-ch...
I vaguely recall that incident. But it did not feel like it affected this many services.
At the same time I have not noticed anything being down firsthand. I am in Europe.
1 reply →
so this is related to Israel's escalation that everyone is expecting?
1 reply →
Internet Health Report is reporting "No data to show".
[1] https://www.ihr.live/
Anthropic down/degraded as well. Time to go for a walk.
GCP is also down https://news.ycombinator.com/item?id=44260810
When being down scales. :D
Odd coincidence. Wonder if Cloudflare uses GCP?
It's likely their auth infra based on what the Google outage is
14 replies →
Yeah this is going to be a problem. I haven't seen an issue this widespread across so many services in a while.
Seems to be semi-regular now that everyone puts all their eggs in only a few baskets.
I gotta say, it's kinda nice when that happens... work just kinda pauses for everyone, from providers to customers. It kinda feels like a national holiday, and everyone downstream from the affected cloud can just kinda sit back and relax cuz there's nothing they can do anyway except wait.
When it's your own outage, it's all-hands-on-deck panic mode. When it's half the internet down, it's no longer your problem, lol
1 reply →
Let me guess, someone pushed out a bad BGP config?
For an outage this large and widespread that would have to be the main culprit.
Big blog post about how they saved the internet upcoming. ;)
Currently down, but reference: https://blog.cloudflare.com/the-ddos-that-almost-broke-the-i...
Seems to be affecting functionality of their "Verify you are human" dialogs as well as Workers.
Yep, KV is broken too. Any worker that depends on KV is throwing exceptions. I was able to get into the dash, but it's very slow. Error rates started to go up significantly around 18:00 UTC.
Edit: The CF status page has acknowledged it's a broad outage across many services: https://www.cloudflarestatus.com/incidents/25r9t0vz99rp
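For anyone whose Workers were throwing on every KV read during this: a minimal defensive sketch, assuming a KV-like binding with a `get()` method. The `KVLike` interface, `kvGetWithFallback` name, and in-memory cache are all illustrative, not Cloudflare's API — the point is just that a try/catch with a last-known-good fallback degrades to stale reads instead of exceptions.

```typescript
// Hypothetical KV-like interface; Cloudflare's binding exposes a similar get().
interface KVLike {
  get(key: string): Promise<string | null>;
}

// Last-known-good cache so a KV outage degrades to stale reads
// instead of an exception on every request (illustrative sketch).
const lastKnownGood = new Map<string, string>();

async function kvGetWithFallback(kv: KVLike, key: string): Promise<string | null> {
  try {
    const value = await kv.get(key);
    if (value !== null) lastKnownGood.set(key, value);
    return value;
  } catch {
    // KV is down (as in this incident): serve the stale copy if we have one.
    return lastKnownGood.get(key) ?? null;
  }
}
```

Obviously an in-memory map only helps on warm isolates, but even that beats 500s for config that changes rarely.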
After many tries I also got into the dashboard, but it's barely usable, with constant error pop-ups.
It does. Another question is why do we get these dialogues always from Cloudflare and never from Akamai in the first place?
Downvoting this comment and flagging the submission does not address the serious issue. These verification dialogues make the Internet unusable.
3 replies →
They've changed the title to "Broad Cloudflare service outages"
Is it coincidence that there's a Scheduled Maintenance in Tokyo for 18:00 UTC in progress, and the problems started at 18:19 UTC?
Unrelated: they have a few services that rely on GCP, which is down. Still, I imagine the people working on the Tokyo maintenance turned white during that job, worried it was caused by them...
Guess we'll find out from the postmortem. Always the silver lining with these, get to learn from and enjoy a good writeup.
Do these get posted publicly?
2 replies →
There is always scheduled maintenance on that page, so that's not much of a signal in my experience
Probably
Cloudflare's lava lamps are dimming.
Can’t wait to read this post-mortem. Seems odd that a Google Cloud outage would bring down Cloudflare services.
So both Cloudflare authentication and Google's identity systems suffered major downtime yesterday. Are there technical dependencies between these?
Cloudflare doesn't say this directly, but in their blog they've written:
> The cause of this outage was due to a failure in the underlying storage infrastructure used by our Workers KV service, which is a critical dependency for many Cloudflare products and relied upon for configuration, authentication and asset delivery across the affected services. Part of this infrastructure is backed by a third-party cloud provider, which experienced an outage today and directly impacted availability of our KV service.
They updated the incident noting that it's not just authentication affected.
Our Workers apps are up again
edit:
It works in the US but EU customers are still reporting our services as down.
edit:
EU customers are reporting ok
distributed systems break, that’s the whole point
what actually matters is how fast they localize damage and how invisible that feels to the end user
if kv failing takes down auth, ui, and workers, then failure isolation’s missing
recovery is fine, but if your fix needs global coordination to unbreak local flows, that’s a design smell
not saying perfect uptime, but the post-outage ux should feel smoother, not shakier
right now it feels like the system survived but the interface didn’t
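The failure-isolation point can be sketched as a tiny circuit breaker: after N consecutive failures, callers short-circuit the dependency locally and serve a fallback, with no global coordination needed to unbreak local flows. This `CircuitBreaker` is a generic illustration, not anything from Cloudflare's stack.

```typescript
// Minimal circuit breaker: isolates a flaky dependency locally, so callers
// fail fast to a fallback instead of piling up on a dead service.
class CircuitBreaker {
  private failures = 0;
  constructor(private readonly threshold: number) {}

  get open(): boolean {
    return this.failures >= this.threshold;
  }

  async call<T>(fn: () => Promise<T>, fallback: T): Promise<T> {
    if (this.open) return fallback; // short-circuit: no call, no cascade
    try {
      const result = await fn();
      this.failures = 0; // success resets the counter
      return result;
    } catch {
      this.failures += 1;
      return fallback;
    }
  }
}
```

A production version would also need a half-open state with a retry timer so the breaker can close again once the dependency recovers, but the isolation idea is the same.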
Workers KV has been down for 30+ minutes now. This is impacting us seriously.
Their API is down too.
Amazing that something can impact their whole infrastructure like this, given how much redundancy they have.
From their incident page (https://www.cloudflarestatus.com/incidents/25r9t0vz99rp):
> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency.
I bet that 3rd party service is GCP.
I would be pretty pissed if I were a CF customer that used Workers KV for redundancy because it was heavily marketed as running on CF data centers.
>can impact their whole infrastructure
CDN and WAF seem to be working fine. I think CF rushed a lot of newer services out without the reliability some of their older/core services enjoy
The same is true for Google.
proxy seems available in general, must just be local to workers because only one of my sites going thru ZT tunnel with identity access rules is affected
solar flare?
No, Cloudflare.
Hopefully they also publish the prompt that did this.
They should make the AI lead the postmortem.
i was thinking about this too
They're just moving fast and breaking things 100x faster. Who cares what code does just vibe it all away /s
lmao