Comment by farhadhf

7 hours ago

Pretty much everything is down (checking from the Netherlands). The Cloudflare dashboard itself is experiencing an outage as well.

Not-so-funny thing is that the Betterstack dashboard is down but our status page hosted by Betterstack is up, and we can't access the dashboard to create an incident and let our customers know what's going on.

Edit: wording.

Yep that's also my experience. Except HN because it does not use *** Cloudflare because it knows it is not necessary. I just wrote a blog titled "Do Not Put Your Site Behind Cloudflare if You Don't Need To" [1].

[1]: https://huijzer.xyz/posts/123/

  • Sadly, AI bots and crawlers have made CF the only affordable way to actually keep my sites up without incurring excessive image serving costs.

    Those TikTok AI crawlers were destroying some of my sites.

    Millions of images served to ByteSpider bots, over and over again. They wouldn't stop. It was relentless abuse. :-(

    Now I've just blocked them all with CF.

    • I don't understand. What exactly are they doing, what are their goals? I'm not trying to argue, I genuinely don't get it.

      edit: I guess I understand "AI bots scraping sites for data to feed LLM training" but what about the image serving?

    • > Now I've just blocked them all with CF.

      You realize it was possible to block bad actors before Cloudflare, right? They just made it easier, not possible in the first place.


  • Yes, I never understood this obsession with centralized services like Cloudflare. To be fair though, if our tiny blogs only get a hundred or so visitors a month anyway, does it matter if they have an outage for a day?

    • I think not having to worry about certs is part of the reason to hide behind the proxy. Also, it helps hide your IP address, I guess.

      Of course, on the other hand, I know that relying on Cloudflare's certs is basically inviting a MITM attack.


  • Last time I tried this I got DDoS'd, so I don't see a reason to step away from CF. That said, this is the price I pay.

  • ~~two~~ three comments on that:

    1. DDOS protection is not the only thing anymore, I use cloudflare because of vast amounts of AI bots from thousands of ASNs around the world crawling my CI servers (bloated Java VMs on very undersized hosts) and bringing them down (granted, I threw cloudflare onto my static sites as well which was not really necessary, I just liked their analytics UX)

    2. the XKCD comic is misinterpreted there, that little block is small because it's a "small open source project run by one person", cloudflare is the opposite of that

    3. edit: also cloudflare is awesome if you are migrating hosts, did a migration this past month, you point cloudflare to the new servers and it's instant DNS propagation (since you didn't propagate anything :) )
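
On the point raised above that bad crawlers could be blocked before Cloudflare existed: a minimal sketch of user-agent filtering at the application layer, in Python. The blocklist contents and the function names are assumptions made for illustration, not anything the commenters described using. User agents are trivially spoofed, so this only stops crawlers that identify themselves.

```python
from typing import Mapping, Optional

# Hypothetical blocklist: substrings matched case-insensitively against the
# User-Agent header. "bytespider" is the crawler named in this thread; extend
# the tuple to suit your own traffic.
BLOCKED_AGENT_SUBSTRINGS = ("bytespider",)


def is_blocked_agent(user_agent: Optional[str]) -> bool:
    """Return True when the User-Agent contains a blocklisted substring."""
    if not user_agent:
        return False  # Missing header: leave the decision to other rules.
    lowered = user_agent.lower()
    return any(token in lowered for token in BLOCKED_AGENT_SUBSTRINGS)


def status_for_request(headers: Mapping[str, str]) -> int:
    """Pick an HTTP status: 403 for blocklisted agents, 200 otherwise."""
    # Header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in headers.items()}
    return 403 if is_blocked_agent(normalised.get("user-agent")) else 200
```

The same check is usually cheaper in the web server or load balancer config than in application code; the Python version is only meant to show the logic.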

It’s that time of the year again where we all realize that relying on AWS and Cloudflare to this degree is pretty dangerous but then again it’s difficult to switch at this point.

If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.

  • Unless you’re, say, at the airport trying to file a luggage claim … or at the pharmacy trying to get your prescription. I think as a community we have a responsibility to do better than this.

    • > I think as a community we have a responsibility to do better than this.

      I have always felt so, but my opinion is definitely in the minority.

      In fact, I find that folks have extremely negative responses to any discussion of improving software Quality.


    • You aren’t cloudflare’s customer in these examples. It depends on the companies that are actually paying for and using the service to complain. Odds are that they won’t care on your behalf due to how our society is structured.

      Not really sure how our community is supposed to deal with this.


  • > If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.

    Which only shows that chasing five 9s is worthless for almost all web products. The idea is that by relying on AWS or Cloudflare you can push your uptime numbers up to that standard, but these companies are having such frequent outages themselves that customers don't expect that kind of reliability from web products.

  • If I choose AWS/cloudflare and we're down with half of the internet, then I don't even need to explain it to my boss' bosses, because there will be an article in the mainstream media.

    If I choose something else, we're down, and our competitors aren't, then my overlords will start asking a lot of questions.

    • Yup. AWS went down at a previous job and everyone basically took the day off and the company collectively chuckled. Cloudflare is interesting because most execs don’t know about it so I’d imagine they’d be less forgiving. “So what does cloudflare do for us exactly? Don’t we already have aws?”

    • In reality it is not half of the internet. That is just marketing. I've personally noticed only one news site down while the others were working. And I guess sites like that will get the blame.

  • Happy to hear anyone's suggestions about where else to go or what else to do to protect against large-scale volumetric DDoS attacks. Pretty much every CDN provider nowadays has stacked up enough capacity to tank these kinds of attacks. Good luck trying to combat them yourself these days.

  • Oh no, we had 30 minutes of downtime this year :(

    • 5 9's is like 5 minutes a year. They are breaking SLAs and impacting services people depend on.

      Tbh though this is sort of all the other companies' fault, "everyone" uses aws and cf and so others follow. Now not only are all your eggs in one basket, so are everyone else's. When the basket inevitably falls into a lake....

      Providers need to be more aware of their global impact in outages, and customers need to be more diverse in their spread.


    • I do think this is tenable as long as these services are reliable. Even though there have been some outages I would argue that they’re incredibly reliable at this point. If this ever changes, though, moving to a competitor won’t be as simple as pushing a repository elsewhere, especially for AWS. I think that’s where some of the potential danger lies.
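
The "five 9s" figure discussed in this thread comes from simple arithmetic. Here is a small Python sketch, assuming a 365-day year and treating any unavailability as downtime. The function names are made up for illustration.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes; leap years ignored


def downtime_budget_minutes(nines: int) -> float:
    """Yearly downtime allowed by an availability of N nines.

    Five nines means 99.999% availability, i.e. an unavailable
    fraction of 10**-5.
    """
    unavailable_fraction = 10.0 ** -nines
    return MINUTES_PER_YEAR * unavailable_fraction


def availability_percent(downtime_minutes: float) -> float:
    """Availability over one year, in percent, for a given total downtime."""
    return 100.0 * (1.0 - downtime_minutes / MINUTES_PER_YEAR)
```

Under these assumptions five nines allows about 5.26 minutes of downtime a year, and the 30 minutes mentioned above works out to roughly 99.994% availability, which sits between four and five nines.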


Cloudflare dashboard is down-ish, not totally down. If you're persistent you can turn off the turnstile and proxy.

It took a few minutes but I got https://hcker.news off of it.

  • I can't sign in since Turnstile is down so I can't complete the captcha to log in.

    I also can't log in via Google SSO since Cloudflare's SSO service is down.

  • I'm already logged in on the cloudflare dashboard and trying to disable the CF proxy, but getting "404 | Either this page does not exist, or you do not have permission to access it" when trying to access the DNS configuration page.

  • Not saying not to do this to get through, but just as an observation, it’s also the sort of thing that can make these issues a nightmare to remediate, since the outage can actually draw more traffic just as things are warming up, from customers desperate to get through.

    But then, that’s what Cloudflare signed up to be.

I think there is a big business opportunity here. Make a site that lets companies put their status updates on a local VPS for $100.

Could always just use a status page that updates itself. For my side project Total Real Returns [1], if you scroll down and look at the page footer, I have a live status/uptime widget [2] (just an <img> tag, no JS) which links to an externally-hosted status page [3]. Obviously not critical for a side project, but kind of neat, and was fun to build. :)

[1] https://totalrealreturns.com/

[2] https://status.heyoncall.com/svg/uptime/zCFGfCmjJN6XBX0pACYY...

[3] https://status.heyoncall.com/o/zCFGfCmjJN6XBX0pACYY

  • This is unrelated to the cloudflare incident but thanks a lot for making that page. I keep checking it from time to time and it's basically the main data source for my long term investing.

Same here. We’re using OhDear. The status page is available but I can’t post an incident because their service is also behind Cloudflare.

  • Co-founder here, we'll be working on better ways to handle this over the coming days.

    Update: our app is available again without Cloudflare, you'll be able to post updates to status pages smoothly again.

All my stuff is working. Things on GCP. Things on Fly.io. Tooling I use.

"Only" 10% of the internet is behind Cloudflare so far ;)

  • Happy for you :)

    I am curious about these two things:

    1- Has GCP also had any outages recently similar to AWS, Azure or CF? If a similar size (14 TB?) DDoS were to hit GCP, would it stand or would it fail?

    2- If this DDoS was targeting Fly.io, would it stand? :)

    • I actually spoke too soon, and accept I have egg on my face!

      Apparently prisma's `npm exec prisma generate` command tries to download "engine binaries" from https://binaries.prisma.sh, which is behind... guess what...

      So now my CI/CD is broken, while my production env is down, and I can't fix it.

      Amazing lol

    • For GCP network that would be a rounding error. Of course GCP sometimes has outages too, all providers do.

BetterStack did report issues with some of their services, but they were not very informative.

When it's back up, do yourself a favour and rent a $5/mo VPS in another country from a provider like OVH or Hetzner and stick your status page on that.

"Yes but what if they go down" - it doesn't matter, having it hosted by someone who can be down for the same reason as your main product/service is a recipe for disaster.

  • Definitely. Tangentially, I encountered 504 Gateway Timeout errors on cloudflarestatus.com about an hour ago. The error page also disclosed the fact that it's powered by CloudFront (Amazon's CDN).

  • https://cachethq.io/ is great for this

    • Been using Cachet for quite a while before inevitably migrating to Atlassian's Statuspage.io. I'm a huge fan of self-hosting and self-managing every single thing in existence but Cachet was just such a PITA to maintain and there was just no other good alternative to Cachet that was also open source.
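
One way to act on the advice in this thread is to check whether your status page and your main site sit behind the same provider. The Python sketch below uses a heuristic: Cloudflare-proxied responses normally carry a `CF-RAY` header and `Server: cloudflare`. The function names are hypothetical, and the check says nothing about dependencies that are not visible in response headers (DNS, login, third-party assets).

```python
from typing import Mapping


def looks_cloudflare_fronted(headers: Mapping[str, str]) -> bool:
    """Heuristic: does this set of response headers look Cloudflare-proxied?"""
    # Header names are case-insensitive, so compare in lowercase.
    normalised = {name.lower(): value for name, value in headers.items()}
    server = normalised.get("server", "").strip().lower()
    return server == "cloudflare" or "cf-ray" in normalised


def shares_cloudflare_dependency(
    main_site_headers: Mapping[str, str],
    status_page_headers: Mapping[str, str],
) -> bool:
    """True when both the main site and the status page look Cloudflare-fronted."""
    return looks_cloudflare_fronted(main_site_headers) and looks_cloudflare_fronted(
        status_page_headers
    )
```

Feed it the response headers from whatever HTTP client you already use. A `True` result suggests the status page could go down for the same reason as the product it reports on.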

Seems like workers are less affected and maybe betterstack has decided to bypass cloudflare "stuff" for the status pages? (maybe to cut down costs). My site is still up though some GitHub runners did show it failed at certain points.

  • I have a workers + kv app that seems fine right now.

    • Pretty sure they went down for a while because I have 4xx errors they returned, but apparently it was short-lived. I wonder if their workers infra failed for a moment and that led to a total collapse of all of their products?

I don't get why you need such a service for a status page with 99.whatever% uptime. I mean, your status page only has to be up if everything else is down, so maybe 1% uptime is fine.

/s