Comment by dqpb
3 years ago
> Once you have done that you need at least two failures (underlying issue + safety, hot+cold, or two interacting systems).
Ah, I missed the part where he said "except for distributed systems." The thing is, effectively all systems are distributed systems with two or more interacting subsystems.
And no, I'm not talking about immature systems or ones where failure is acceptable. Queuing issues, for example, are well known to cause cascading effects, and are not trivial to identify or solve.
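To make the queuing point concrete, here is a rough sketch of one mechanism that can feed such cascades: a short stall in front of a nearly saturated queue takes far longer to drain than the stall itself lasted. The function name, the tick-based model, and all the numbers are assumptions made up for illustration, not measurements from any real system.

```python
def recovery_ticks(arrival, capacity, stall_ticks, max_ticks=10_000):
    """Ticks needed after a stall ends for the backlog to drain to zero.

    Toy model (an assumption, not a real system): `arrival` requests show up
    every tick, the server handles up to `capacity` per tick, and it handles
    nothing at all during the first `stall_ticks` ticks.
    """
    backlog = 0
    for tick in range(max_ticks):
        backlog += arrival
        if tick >= stall_ticks:
            # Server is healthy again: drain as much as capacity allows.
            backlog -= min(backlog, capacity)
            if backlog == 0:
                return tick - stall_ticks + 1
    return None  # backlog never drained within max_ticks


if __name__ == "__main__":
    for arrival in (5, 9, 10):
        print(arrival, recovery_ticks(arrival, capacity=10, stall_ticks=5))
```

At 50% utilization the 5-tick stall is paid back in 5 ticks, at 90% it takes 45, and at 100% the backlog never clears. Anything upstream with a timeout shorter than that recovery window starts failing too, which is where the second "failure" comes from without any second fault.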
Even basic correctness issues can be very difficult to identify if you have a large permutation space and no model checking, and will also cascade.
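As a toy illustration of the permutation-space point, the sketch below brute-forces every interleaving of a few workers that each do a non-atomic read-then-write increment on a shared counter. It is a hand-rolled exhaustive check under made-up assumptions (the function names and the two-step worker are hypothetical), not a real model checker, but it shows how quickly the bad schedules outnumber the good ones.

```python
from itertools import permutations


def interleavings(n_threads, steps_per_thread=2):
    """All distinct schedules, each a tuple of thread ids.

    A thread's k-th appearance in a schedule is its k-th step, so per-thread
    order is preserved by construction.
    """
    slots = [t for t in range(n_threads) for _ in range(steps_per_thread)]
    return sorted(set(permutations(slots)))


def run(schedule):
    """Execute one schedule: step 1 reads the counter, step 2 writes read + 1."""
    counter = 0
    local = {}  # value each thread read during its first step
    for t in schedule:
        if t not in local:
            local[t] = counter        # step 1: read
        else:
            counter = local[t] + 1    # step 2: write back the stale read + 1
    return counter


def check(n_threads):
    """Return (total schedules, schedules where an increment was lost)."""
    schedules = interleavings(n_threads)
    bad = [s for s in schedules if run(s) != n_threads]
    return len(schedules), len(bad)


if __name__ == "__main__":
    print(check(2))
    print(check(3))
```

With two workers, 4 of the 6 schedules lose an update. With three, it is 84 of 90, and only the fully serial schedules are correct. Random testing tends to hit the serial ones, which is why this class of bug survives without some form of exhaustive exploration.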