Comment by iLoveOncall
3 hours ago
> I truly believe they're really going to make resilience their #1 priority now
I hope that was their #1 priority from the very start given the services they sell...
Anyway, people always tend to overthink those black-swan events. Yes, two happened in quick succession, but what is the average frequency overall? Insignificant.
This is Cloudflare. They've repeatedly broken DNS for years.
Taken together, the errors point to some underlying practices: a lack of systems metaphors, modularity, and testability, and a reliance on super-generic configuration instead of software with enforced semantics.
I think they have to strike a balance between being extremely fast (reacting to vulnerabilities and DDoS attacks) and still being resilient. I don't think it's an easy situation.