Comment by nine_k

8 hours ago

Given a list of estimates of failure probabilities, finding the right mix of redundancy becomes a very tractable problem, maybe even freshman-level.

Getting the probabilities could be very difficult though, especially for issues that never occurred before.
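To make the "tractable" part concrete, here is a minimal sketch of sizing redundancy once the estimates are in hand. It assumes identical units that fail independently with probability p, and a system that needs at least k of its n units working; the independence assumption in particular is an illustration only and would not hold for correlated upsets. The function names are hypothetical.

```python
from math import comb


def system_failure_probability(n: int, k: int, p: float) -> float:
    """Probability that fewer than k of n units survive.

    Assumes each unit fails independently with probability p
    (an illustrative assumption, not a claim about real hardware).
    """
    # The system fails when more than n - k units have failed.
    return sum(
        comb(n, failed) * p**failed * (1 - p) ** (n - failed)
        for failed in range(n - k + 1, n + 1)
    )


def smallest_n(k: int, p: float, target: float, max_n: int = 20):
    """Smallest unit count n >= k whose failure probability is <= target.

    Returns None if no n up to max_n meets the target.
    """
    for n in range(k, max_n + 1):
        if system_failure_probability(n, k, p) <= target:
            return n
    return None


if __name__ == "__main__":
    # Two working units required, each unit fails 10% of the time.
    print(system_failure_probability(3, 2, 0.1))
    print(smallest_n(k=2, p=0.1, target=0.01))
```

The arithmetic really is simple; the result is only as good as p and the independence assumption behind it.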

  • The fault tolerance is mostly focused on background radiation flipping bits. We've got half a century of data on the frequency of those upsets and on how strongly they're correlated under different space conditions, not to mention the ability to irradiate prototypes of the flight computer, with representative amounts of shielding, in ground-based facilities...

  • For issues that have never occurred before, probabilities are the wrong tool. The right thing to do is list all the behaviour the vehicle must never exhibit and think of ways it still might, despite all redundancies -- maybe even despite every single component working as intended.

    Lots of mission failures in history were caused by unexpected interactions between fully functional components. Probabilities of failures don't help with that.

    • And that's why you test to failure (ideally under real or similar conditions): to surface the failures that have never occurred before, and to start collecting data on them.