← Back to context

Comment by k8sToGo

5 hours ago

It's not about outages. It's about the why. Hardware can fail. Bugs can happen. But to continue a roll out despite warning sings and without understanding the cause and impact is on another level. Especially if it is related to the same problem as last time.

And yet, it's always clownflare breaking everything. Failures are inevitable, which is widely known, therefore we build resilience systems to overcome the inevitable

  • It is healthy for tech companies to have outages, as they will build experience in resolving them. Success breeds complacency.

    • You don't need outages to build experience in resolving them, if you identify conditions that increase the risk of outages. Airlines can develop a lot of experience resolving issues that would lead to plane crashes, without actually crashing any planes.