GCP Outage

6 months ago (status.cloud.google.com)

It looks like a central service at Google called Chemist is down.

"Chemist checks the project status, activation status, abuse status, billing status, service status, location restrictions, VPC Service Controls, SuperQuota, and other policies."

-> This would totally explain the error messages "visibility check (of the API) failed" and "cannot load policy" and the wide range of services affected.

cf. https://cloud.google.com/service-infrastructure/docs/service...

EDIT: Google says "(Google Cloud) is down due to Identity and Access Management Service Issue"

Getting a lot of errors for Claude Sonnet 4 (Cursor) and Gemini Pro.

Nooooo I'm going to have to use my brain again and write 100% of my code like a caveman from December 2024.

Cloudflare is down too. From https://www.cloudflarestatus.com:

Update - We are seeing a number of services suffer intermittent failures. We are continuing to investigate this and we will update this list as we assess the impact on a per-service level.

Impacted services: Access, WARP, Durable Objects (SQLite backed Durable Objects only), Workers KV, Realtime, Workers AI, Stream, Parts of the Cloudflare dashboard
Jun 12, 2025 - 18:48 UTC

Edit: https://news.ycombinator.com/item?id=44261064

  • Seems like a major wtf if Cloudflare is using GCP as a key dependency.

    • Some day Cloudflare will depend on GCP and GCP will depend on Cloudflare and AWS will rely on one of the two being online and Cloudflare will also depend on AWS and the internet will go down and no one will know how to restart it

Everything appears to be down as of 18:43 UTC... https://downdetector.com/

The status page is green, but there are outages reported: https://downdetector.com/status/google-cloud/

What's crazy is that RCS messaging is down as a result of this outage. It shows how poorly the technology or infrastructure was designed.

The Cloudflare outage page was also just updated:

> Cloudflare’s critical Workers KV service went offline due to an outage of a 3rd party service that is a key dependency. As a result, certain Cloudflare products that rely on KV service to store and disseminate information

Does anyone know of a good dashboard to check for such BGP routing anomalies as (apparently) this one? I am currently digging around https://radar.cloudflare.com/routing but it doesn't show which routes were actually leaked.

I would love it if anyone has any good tool recommendations!

thank god hn is hosted on a single bare metal server, free of all this bloat.

Smells like BGP since there are services people claim have nothing to do with GCP being affected. OpenRouter is down, Lovable is down, etc.

If Google Chat is down per https://www.google.com/appsstatus/dashboard/, the ability of Google engineers to communicate among themselves is impaired, despite SREs having IRC as a backup.

> No major incidents

… Proceeds to show worldwide degraded service level alerts.

  • Yep. Self-reporting status pages are pretty near worthless. At my former large company (not FAANG), we weren't allowed to update the status page until we got VP approval, which also required approval from both PR and Legal. It would take a lot more time and effort to get those approvals than to just fix the problem and move on.

    • SLA contracts, clawbacks, and performance obligations make these pages a bit of a minefield for CSPs. When I was at a top-tier CSP, we had the status page that was public, one that was for a trusted tier of customers, one built for a customer-by-customer basis, and one for internal engineering.

Sorry, after decades of being hard wired, I just installed a PCIe Wifi6 card on my desktop. Internet took a dive the second I got it connected. Must have done something wrong.

Status pages at cloud providers aren't usually based in reality -- it usually requires VP-level political games to actually get them changed, especially for serious outages.

Would be comedy if one of the progenitors of this took Sundar’s buyout offer yesterday and let the world burn today.

Kinda funny that the top post on HN titled "GCP Outage" links to the Google Cloud status page which shows...no outage.

Does anyone know if it's region-specific? We're experiencing it and are in us-west-1.

https://www.cloudflarestatus.com/ is showing an outage, which caused the Google GCP outage, the Claude outage, and the Firebase outage: https://status.firebase.google.com/

  • How would Cloudflare's outage cause a GCP outage?

    I'm sure it's not entirely impossible, but sounds backwards to me. Sure - a lot of the internet relies on Cloudflare, but I'd be very surprised if GCP had a direct dependency on Cloudflare, for a lot of reasons. Maybe I misunderstood your comment?

This appears to be continuing to cascade over an hour later... wow... more and more services mentioned as completely down on the outage page.

Kind of nice to not be glued to AI chat prompts for a while to be honest.

https://status.cloud.google.com/incidents/ow5i3PPK96RduMcb1S...

> Multiple GCP products are experiencing impact due to Identity and Access Management Service Issue

IAM issue huh. The post-mortem should be interesting at least.

  • Ha. With all this Soviet-style euphemism, I'd rather read The Onion instead.

    • It’s not a euphemism - every outage, including the 99.9% that don’t end up on HN, gets a postmortem document written about it, which is almost always a fascinating discussion of the technical, cultural and organisational situation that led to an unexpected bad thing happening.

      Even a few years ago senior management knew to stay the fuck out except for asking for more info.

Google Maps not loading, thought it was my 4g, go to see if my connection works by loading Hacker News, GCP Outage XD

Let's say a typical base service (network attached RAM or whatever) has 99.99% reliability. If you have a dependency on 100 of those, you're suddenly closer to 99% reliability. So you switch to higher-level dependencies, and only have 10 dependencies, for a 99.9% reliability. But! It turns out, those dependencies each have dependencies, so they're really already more like 99.9% at best, and you're back at 99% reliability.

"good enough" is, indeed, just good enough to make it not worthwhile to rip out all the upstreams and roll your own everything from scratch, because the cost of the occasional outages is much lower than the cost of reinventing every single wheel, nut, bolt, axle, bearing, and grease formulation.
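
To make the arithmetic above concrete, here is a minimal Python sketch. It assumes every dependency fails independently and that the service needs all of them up at once, which is a simplification; the function name `compound_availability` is made up for illustration.

```python
def compound_availability(per_dependency: float, count: int) -> float:
    """Availability of a service that needs all `count` dependencies up.

    Assumes independent failures (a simplification): the service is up
    only when every dependency is up, so the availabilities multiply.
    """
    return per_dependency ** count


if __name__ == "__main__":
    scenarios = [
        ("100 base services at 99.99%", 0.9999, 100),
        ("10 higher-level services at 99.99%", 0.9999, 10),
        ("10 higher-level services at 99.9%", 0.999, 10),
    ]
    for label, availability, count in scenarios:
        combined = compound_availability(availability, count)
        print(f"{label}: {combined:.2%}")
```

Under those assumptions the three scenarios come out at roughly 99%, 99.9%, and 99%, matching the figures in the comment. Real dependencies often fail together, so treat this as a rough estimate rather than a prediction.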

Looks like I'm about to start learning which of my time-killing websites are hosted on GCP - The Ringer is down, and since Spotify owns them and is a major GCP customer, it looks like they've been hit by this. CRAZY that the GCP status page is still green.

One of these days in which the young engineers learn the concept of 'counterparty risk'.

I wonder what the damage ($) is for having a good portion of the internet down for an hour or two ;)

Just our bi-yearly reminder of our overreliance on cloud providers for literally everything. Can't say there's an answer beyond trying to build more independent tech but we know how that goes.

  • Yet migration to the cloud continues, driven by people arguing that doing it yourself is too complicated and expensive. Let’s see how long until one outage takes down the global economy for multiple days or weeks.

    • "The cloud" isn't the problem nor is migration to it. The problem is single points of failure in a vast, super-connected, global network.

      To demonstrate my point, say someone like Cloudflare opted to get "off cloud" and run their own datacenters. Half the web would still go down if they had some issue in their datacenters. If anything, the economy of scale and resiliency of a huge cloud network is far beyond what any single operator can ensure for their own service. If this wasn't the case, cloud services couldn't be as profitable as they are.

      It isn't the panacea people seem to think it is. One critical service going down, regardless of whether it's cloud hosted or not, has ripple effects in the broader network.

  • Hilariously, I did not know about any outages today during the workday because we discourage cloud service usage and nobody complained about anything breaking. :)

Google denies the outage. https://x.com/Google/status/1933246051512644069

Was just about to do a demo, but Google Meet was down. Tried to use Jitsi as a fallback, but couldn't log in because Firebase was down too. Ended up using a Slack Huddle, lol.

For us Cloud SQL instances are toast but App Engine Standard instances are still serving requests. Google Cloud console is borked too, mostly just erroring out.

some core GCP cloud services are down. might be a good time for GCP dependent people to go for a walk, do some stretches, and check back in a couple hours.

Seems like a wider issue at Google than just GCP, the Sheets and Chat APIs are also returning similar "Visibility check was unavailable" errors.

Haha, I don't ordinarily spend a lot of time in the Google Cloud Console but just now I was debugging a squirrely OAuth issue with reCAPTCHA failing to refresh several days running. I'm getting this weird page error, and I think, "Is this an issue with my organization? [futz futz futz] Hey wait is GCP actually down?" And it turns out to be the top discussion on HN. XD

The last few times this happened I wouldn't have thought "So this is the day AI takes over".

But this time...

Spotify was not loading, thought my 5G was bad, used YouTube Music instead without issues. Hmmm...

Cloudbuild completely down for us. Getting "Visibility check was unavailable" errors.

Any chance this is the root cause, given that so many different services are affected? https://github.com/kubernetes/kops/issues/17433

Twitch was broken too: https://status.twitch.com/incidents/b79nyp1yhxql

EDIT: Updated link to point to the specific incident.

When Google said GCP is "down", did it affect entire availability zones within a region? For people who designed redundant infrastructure, did your backup AZs/regions keep your systems online?

  • The outage was global. For my team specifically, a global Identity and Access Management outage meant that our internal service accounts could not refresh their short-lived access tokens and so different parts of our infrastructure began to fail over the course of an hour or so, regardless of what region or zone they were in. Services were up, but they could not access critical GCP services because of auth-related issues which resulted in internal service errors for us.

    To give an example, our web servers connect to our GCP CloudSQL database via a Cloud SQL Auth Proxy (there's also a connection pooler in between but that also stayed up). The connection to the proxy was always available, but the proxy wasn't able to renew auth tokens it uses to tunnel to the database, regardless of where the webserver or database happened to be located. To mitigate this in the future we're planning to stop using the auth proxy and connect directly via mutual TLS but now it means we have to manage TLS certificates.
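
The staggered failures described above follow from token expiry: each service keeps working until the token it already holds runs out. Here is a rough Python sketch, assuming a one-hour token lifetime and that no refresh succeeds for the whole outage. The service names and issue times are hypothetical, not taken from the setup described above.

```python
TOKEN_LIFETIME_S = 3600  # assumption: one-hour access tokens


def seconds_until_failure(issued_at_s, outage_start_s, lifetime_s=TOKEN_LIFETIME_S):
    """Map each service to how long after the outage starts its token expires.

    `issued_at_s` maps a service name to the time its current token was issued.
    Assumes no refresh succeeds once the outage has started.
    """
    return {
        name: max(0, issued + lifetime_s - outage_start_s)
        for name, issued in issued_at_s.items()
    }


if __name__ == "__main__":
    # Hypothetical services; times are seconds relative to the outage start (0).
    issued = {"web": -3000, "worker": -1800, "sql-proxy": -600}
    timeline = sorted(seconds_until_failure(issued, 0).items(), key=lambda kv: kv[1])
    for name, remaining in timeline:
        print(f"{name} loses auth {remaining // 60} min into the outage")
```

Because tokens are issued at different moments, the expiries spread across the token lifetime, which would line up with failures appearing "over the course of an hour or so" regardless of region or zone.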

Our GCP workloads are unavailable across several US regions. The GCP console is intermittently unavailable for most pages.

Crossing my fingers for a quick resolution.

My firebase hosting and firestore db are back online, but GCP console and Google SQL instances are still having serious issues as of 7:00pm UTC.

Ahhh, explains why some of my apps are going crazy... Couldn't read a message from my kids' preschool

Thankfully we use AWS at work for everything critical

If all services are down at once, is no one thinking about or mentioning a potential attack on US cloud providers (China or Russia), maybe?

The GCP status page now reflects the issues; looks like Google Cloud Dataproc, Google Cloud Storage and Identity & Access Management

Same here. Even the page to submit support requests is down.

Cloud console does nothing.

They should host their support services on AWS and vice-versa.

  • I just logged into several of my GCP accts, everything popped up, multiple home regions.. I wonder what % of folks are feeling this right now.

We're in us-west-1 and seeing issues across Cloud Run, Cloud SQL, Cloud Storage and Compute Engine.

I'm able to login to the GCP dashboard, but it isn't able to find any of my projects.

Claude Code is down :( too lazy to do manual conversion from Cocoapods dependency to SwiftPM

reCAPTCHA affected? I couldn't log into my local utilities website due to a reCAPTCHA error. Downdetector agrees, but I interpret that site as dubious.

Not just GCP, most of Google's services are out of action

  • I'm on a meet, in cal, editing a dozen docs, in GCP, pushing commits and launching containers; it's not clear yet what exactly is going on but it's certainly intermittent and sparse, at least so far

    • stop it. you're overloading their system by doing three things at once. let the rest of us have a turn.

If everything is down at the same time, is no one mentioning an attack on US cloud services (China or Russia), maybe?

mapbox maps seemed to be down for a few minutes about an hour ago. I wonder if it is related.

"All locations except us-central1 have fully recovered. us-central1 is mostly recovered. We do not have an ETA for full recovery in us-central1."

  • An hour later and everything is a mess in central-1. They seemed to jump the gun on that one. Doesn't matter if some dinky service like "AutoML Vision" is working, if GCS isn't, then they shouldn't post an optimistic message.

It's completely nuts that Firebase has this: https://status.firebase.google.com/incidents/ZcF1YDUvpdixZ2e...

"Firebase Data Connect unavailable due to a known Google Cloud global outage"

While the Google Cloud status page https://status.cloud.google.com/ says "No major incidents" and everything is green. So Google Cloud know there is an outage but just deem it not major enough to show it.

Edit to add: within 10 minutes of this post Google updated their status page. More curiously the Firebase page I linked to has been edited to remove mention of Google Cloud in the status and now says "Firebase Data Connect is currently experiencing a service disruption. Please check back for status."

  • IIRC status pages drive customer compensation for downtime. Updating it is basically signing the check for their biggest customers; in most similar companies you need a very senior executive to approve the update.

    On the other side of this, Firebase probably doesn't have money at stake making the update

    • Nah, it's just some client side caching / JS stuff. Clicking the big refresh button fixed it for me, 15 minutes before OP noted it.

      (n.b. as much as Google in aggregate is evil, they're smart evil. You can't avoid execs approving every outage because checks without some paper trail, and execs don't want to approve every outage, you'd have to rely on too many engineers and sales people, even as ex-employees, to keep it a secret. disclaimer: xoogler)

      (EDIT: for posterity, we're discussing a "overall status" thing with a huge refresh button, right above a huge table chockful of orange triangles that indicate "One or more regions affected" - even when the "overall status" was green, the table was still full of orange and visible immediately underneath. My point being, you gotta suppose a wholeeee bunch of stuff to get to the point there was ever info suppressed, much less suppressed intentionally to avoid cutting checks)

  • Something must be preventing them from updating the status page at this point. Of course they could still deem it not enough, but just from my limited tests, docker, buf, etc. are outright down (it may not be GCP that is down, but it is quite the coincidence). I'd wager that this is much more widespread.

    • I'm actually on a bridge call with Google Cloud, we're a large customer -- I just learned today that their status page is not automated, instead someone actually manually updates it!

  • It's extra funny that the GCP status page even includes a “last updated” time, which is built exactly to convey a possible failure to update in cases like this

    No major incident as of “Last updated time: 12 Jun 2025, 11:48 PDT”

  • Maybe the outage is preventing them from updating that specific page? Hmm

    EDIT: Looks like it has been updated now (6:49 PM UTC)

  • More likely they are unable to update their own status page, but in either case not covering themselves in glory over at GCP right now.

  • AWS has this all the time. If you need to know if a service is down in a region, check for other engineers talking about it on X.

@dang could you merge this and https://news.ycombinator.com/item?id=44260669?

  • No notifications for mentions, have to email the mods at the hn@ email address.

    • I think I was a bit optimistic in the response time from mods. This thread won the popularity contest quite well...

      Thanks for letting me know about emailing the mods, refreshingly explicit to send email.

Borg and K8s were fighting for resources, so Gemini decided to take out DNS. Now a sysAdmin has to step in.

* just trying to add a little humour. pretty stressful outage. grarr!!

The cloud enables you to scale. It allows us to distribute systems across multiple regions and data centers. Seems that this is true for outages as well.

The PHP application I wrote as a student running on a single self-hosted server had a higher uptime than any of the cloud providers or redundant systems I have seen so far. If you don’t need the cloud for scalability, do it yourself and save yourself the trouble and money. Most companies would be better off investing in some IT staff instead of putting their systems in the hands of some proprietary and insanely complex cloud environment. You are becoming dependent on someone you don’t know, have no control over and can’t talk with directly. Also the single point of failure is just shifting: from your system to whatever system is managing the cloud. Guess one advantage is that you can shift the blame to someone else…