Comment by vlovich123
1 hour ago
That is also true at Cloudflare, for what it’s worth. However, the company is so big that there’s so many different products all shipping at the same time it can be hard to correlate it to your release, especially since there’s a 5-minute lag (if I recall correctly) before the monitoring dashboards have all the telemetry from thousands of servers worldwide.
Comparing the difficulty of running the world’s internet traffic with hundreds of customer products with your fintech experience is like saying “I can lift 10 pounds. I don’t know why these guys are struggling to lift 500 pounds”.
> However, the company is so big that there’s so many different products all shipping at the same time it can be hard to correlate it to your release
This kind of thing would be more understandable for a company without hundreds of billions of dollars, and for one that hasn't centralized so much of the internet. If a company has grown too large and complex to be well managed and effective, and it's starting to look like a liability for large numbers of people, there are obvious solutions for that.
Genuinely curious: how would you actually implement a detection system for large-scale global infrastructure that works within a < 1 minute SLO, given that cost is no constraint?
Can you name a major cloud provider that doesn’t have major outages?
If this were purely a money problem it would have been solved ages ago. It’s a difficult problem to solve. Also, they’re the youngest of the major cloud providers and have a fraction of the resources that Google, Amazon, and Microsoft have.
> Can you name a major cloud provider that doesn’t have major outages?
The fact that no major cloud provider is actually good is not an argument that Cloudflare isn't bad, or even that they couldn't/shouldn't do better than they are. They have fewer resources than Google or Microsoft, but they're also in a unique position that makes us differently vulnerable when they fuck up. It's not all their fault, since it was a mistake to centralize the internet to the extent that we have in the first place, but now that they are responsible for so much, they have to expect that people will be upset when they fail.