Comment by pell

3 months ago

It’s that time of the year again where we all realize that relying on AWS and Cloudflare to this degree is pretty dangerous but then again it’s difficult to switch at this point.

If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.

57 comments


isodev 3 months ago

Unless you’re, say, at the airport trying to file a luggage claim … or at the pharmacy trying to get your prescription. I think as a community we have a responsibility to do better than this.

ChrisMarshallNY 3 months ago
> I think as a community we have a responsibility to do better than this.
I have always felt so, but my opinion is definitely in the minority.
In fact, I find that folks have extremely negative responses to any discussion of improving software Quality.
- mosura 3 months ago
  
  Merely reducing external dependencies causes people to come out in rashes.
  A large proportion of “developers” enjoy build vs buy arguments far too much.
- abustamam 3 months ago
  
  I always see such negative responses when HN brings up software bloat ("why is your static site measured in megabytes").
  Now that we have an abundance of compute and most people run devices more powerful than the devices that put man on the moon, it's easier than ever to make app bloat, especially when using a framework like Electron or React Native.
  People take it personally when you say they write poor quality software, but it's not a personal attack, it's an observation of modern software practices.
  And I'm guilty of this, mainly because I work for companies that prioritize speed of development over quality of software, and I suspect most developers are in this trap.
  
sigilis 3 months ago
You aren’t Cloudflare’s customer in these examples. It is up to the companies that are actually paying for and using the service to complain. Odds are that they won’t care on your behalf due to how our society is structured.
Not really sure how our community is supposed to deal with this.
- isodev 3 months ago
  
  “We” are the ones making the architecture and the technical specs of these services. Making sure they still work when your favourite FAANGMC is down seems like something we can help with.

dlisboa 3 months ago

> If there is a slight positive note to all this, then it is that these outages are so large that customers usually seem to be quite understanding.

Which only shows that chasing five 9s is worthless for almost all web products. The idea is that by relying on AWS or Cloudflare you can push your uptime numbers up to that standard, but these companies are themselves having such frequent outages that customers don't expect that kind of reliability from web products.

tommica 3 months ago

> It’s that time of the year again

It's monthly by now

lbreakjai 3 months ago

If I choose AWS/cloudflare and we're down with half of the internet, then I don't even need to explain it to my boss' bosses, because there will be an article in the mainstream media.

If I choose something else, we're down, and our competitors aren't, then my overlords will start asking a lot of questions.

stevepotter 3 months ago

Yup. AWS went down at a previous job and everyone basically took the day off and the company collectively chuckled. Cloudflare is interesting because most execs don’t know about it so I’d imagine they’d be less forgiving. “So what does cloudflare do for us exactly? Don’t we already have aws?”
jfengel 3 months ago
And if everyone else is down, and you are not, you will get no credit.
- lbreakjai 3 months ago
  
  Or _you_ aren't down, but a third-party you depend on is (auth0, payment gateway, what have you), and you invested a lot of time and effort into being reliable, but it was all for less than nothing, because your website loads but customers can't purchase, and they associate the problem with you, not with the AWS outage.
  
- trollbridge 3 months ago
  
  Right. Whereas if we get whacked with a random DDoS, that's my fault.
timeon 3 months ago

In reality it is not half of the internet. That is just marketing. I've personally noticed only one news site down while others were working. And I guess sites like that will get the blame.

fusl 3 months ago

Happy to hear anyone's suggestions about where else to go or what else to do to protect against large-scale volumetric DDoS attacks. Pretty much every CDN provider nowadays has stacked up enough capacity to tank these kinds of attacks; good luck trying to combat them yourself these days.

trollbridge 3 months ago

Somehow KiwiFarms figured it out with their own "KiwiFlare" DDOS mitigation. Unfortunately, all of the other Cloudflare-like services seem exceptionally shady, will be less reliable than Cloudflare, and probably share data with foreign intelligence services I have even less trust for than the ones Cloudflare possibly shares them with.
bandrami 3 months ago

Is a DDOS more frequent and/or worse than stochastic CDN outages?
isodev 3 months ago
Anubis and/or Bunny are good alternatives/combination depending on your exact needs
- https://anubis.techaro.lol/
- https://bunny.net/
- fusl 3 months ago
  
  Unfortunately Anubis doesn't help where my pipe to the internet isn't fat enough to just eat up all the bandwidth that the attacker has available. Renting tens of terabits of capacity isn't cheap and DDoS attacks nowadays are in the scale of that. BunnyCDN's DDoS protection is unfortunately too basic to filter out anything that's ever so slightly more sophisticated. Cloudflare's flexibility in terms of custom rulesets and their global pre-trained rulesets (based on attacks they've seen in the past) is imo just unbeatable at this time.
  
- RKFADU_UOFCCLEL 3 months ago
  
  Why do people on a technical website suggest this? It's literally the same snake oil as Cloudflare. Both have an endgame of total web DRM; they want to make sure users "aren't bots". Each time the DRM is cracked, they will increase the complexity of the "verifier". You will be running arbitrary code in your big 4 browser to ensure you're running a certified big 4 browser, with 10 trillion man hours of development, on a certified OS.
  
- Doman 3 months ago
  
  bunny.net is not reachable for me either... really funny
  https://imgur.com/a/8gh3hOb
  
q3k 3 months ago
Just accept that a DDoS might happen and that there's nothing you can do about it. It's fine, it's just how the Internet works.
- herbst 3 months ago
  
  That was possible when a DDoS was usually still an occasional attack by a bad actor.
  Most times I get DDoSed now, it's either Facebook directly, something something Azure, or some random AI.
  
- peanut-walrus 3 months ago
  
  So accept that your customers won't be able to use your services whenever some russian teenager is bored? Yeah, good luck with justifying that choice.
  

weird-eye-issue 3 months ago

Oh no, we had 30 minutes of downtime this year :(

CableNinja 3 months ago
5 9's is like 7 minutes a year. They are breaking SLAs and impacting services people depend on
Tbh though this is sort of all the other companies' fault: "everyone" uses AWS and CF, and so others follow. Now not only are all your chicks in one basket, so are everyone else's. When the basket inevitably falls into a lake....
Providers need to be more aware of their global impact in outages, and customers need to be more diverse in their spread.
- world2vec 3 months ago
  
  99.999% availability is around 5 minutes or so of downtime per year.
- weird-eye-issue 3 months ago
  
  > Providers need to be more aware of their global impact in outages
  So you think the problem is they aren't "aware"?
  
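For reference, the downtime figures traded in this subthread can be checked with a few lines of arithmetic. This is only a rough sketch: the function names are made up for illustration, and it assumes an average year of 365.25 days and that availability is measured over a full calendar year, which real SLAs often do not do.

```python
# Assumption: an average year of 365.25 days (525,960 minutes).
MINUTES_PER_YEAR = 365.25 * 24 * 60


def downtime_minutes_per_year(availability_percent: float) -> float:
    """Yearly downtime budget, in minutes, for a given availability percentage."""
    return MINUTES_PER_YEAR * (1 - availability_percent / 100)


def availability_from_downtime(downtime_minutes: float) -> float:
    """Availability percentage implied by a given amount of yearly downtime."""
    return 100 * (1 - downtime_minutes / MINUTES_PER_YEAR)


if __name__ == "__main__":
    # Downtime budget for two, three, four and five nines.
    for target in (99.0, 99.9, 99.99, 99.999):
        budget = downtime_minutes_per_year(target)
        print(f"{target}% -> {budget:.2f} minutes of downtime per year")

    # Availability implied by 30 and 60 minutes of downtime in a year.
    for minutes in (30, 60):
        implied = availability_from_downtime(minutes)
        print(f"{minutes} min/year -> {implied:.4f}% availability")
```

Under these assumptions, five nines works out to a little over 5 minutes a year, and 30 minutes of downtime in a year lands between four and five nines.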
pell 3 months ago
I do think this is tenable as long as these services are reliable. Even though there have been some outages, I would argue that they’re incredibly reliable at this point. If this ever changes, though, moving to a competitor won’t be as simple as pushing a repository elsewhere, especially for AWS. I think that’s where some of the potential danger lies.
- swyx 3 months ago
  
  > 30 minutes of downtime
  > this is tenable as long as these services are reliable
  do you hear yourself, this is supposed to be a distributed CDN. imagine if HTTP had 30 minutes of downtime a year.
  and judging by the HN post age, we're now past minute 60 of this incident.
  
- weird-eye-issue 3 months ago
  
  > especially for AWS
  CF can be just as difficult to migrate off of, if not more so, especially when using things like Durable Objects