Comment by scott_w
14 hours ago
Semi-related: this type of thing is actually covered in Google's Site Reliability Engineering book. They describe a system (global Chubby) that consistently outperformed its SLO, so other teams came to depend on it having 100% uptime. They "fixed" this by deliberately taking it down (planned outages) to bring its availability closer to the SLO, forcing downstream engineers to deal with the fact that the services they depended on would sometimes fail for no apparent reason.
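To make the mechanism concrete, here is a rough Python sketch. The book describes whole planned outages, not per-request failures, so this swaps in per-request fault injection purely as an illustration, and every name in it (`injection_rate`, `call_with_injection`, `InjectedFailure`) is made up. It assumes injected failures are independent of real ones: if a service naturally succeeds at rate a, failing a further fraction p of requests gives overall availability a(1 - p), so p = 1 - SLO/a lands it on the SLO.

```python
import random
from typing import Callable, TypeVar

T = TypeVar("T")


class InjectedFailure(Exception):
    """Raised when a request is failed on purpose rather than by a real fault."""


def injection_rate(observed_availability: float, slo: float) -> float:
    """Probability of failing an otherwise-successful request so that overall
    availability settles at the SLO instead of above it.

    Assumes injected failures are independent of real ones:
        overall = observed * (1 - p)  =>  p = 1 - slo / observed
    """
    if observed_availability <= slo:
        return 0.0  # already at or below target: inject nothing
    return 1.0 - slo / observed_availability


def call_with_injection(
    call: Callable[[], T],
    rate: float,
    rng: Callable[[], float] = random.random,
) -> T:
    """Run `call()`, but first raise InjectedFailure with probability `rate`."""
    if rng() < rate:
        raise InjectedFailure("deliberate failure to keep availability near the SLO")
    return call()
```

For example, a service measured at 99.999% against a 99.9% SLO would fail roughly 0.1% of its requests on purpose, which is enough that callers can't quietly assume it never fails.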
I know it's easier said than done everywhere; I just found it an interesting parallel.