Comment by ranger207
7 hours ago
> The self-checking pairs ensure that if a CPU performs an erroneous calculation due to a radiation event, the error is detected immediately and the system responds.
How does a pair determine which of the pair did the calculation correctly?
It doesn't have to. It raises an error that the system can detect and take action on. Usually that'll be some combination of interrupt/reset and an external pin to let the rest of the system know what's happened.
What raises the error, and how does the system know that an error has happened? Like, if you have two processors calculating 2+2, and one comes out to 4 and the other to 5, how does the system know which one is correct? Actually, typing it out I think I get it now. It doesn't need to know which one is correct; it just has to redo the calculation if there's ever a disagreement. Then if somehow both processors calculate 2+2=5 simultaneously, the next computer over will disagree and everyone will repeat the calculation. That's why they have 3 levels of paired redundancy, and the chance of 8 simultaneous single-event upsets is low enough for their risk tolerance. OK, now I get it.
In simple terms, this works by XORing the two outputs: any nonzero result means they disagree, and that triggers fault recovery.
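To make the XOR comparison concrete, here's a rough software sketch. Real lockstep pairs do this in hardware on every cycle; the function names `mismatch_mask` and `outputs_agree` are made up for illustration, and the bit flip is simulated by hand.

```python
def mismatch_mask(out_a: int, out_b: int) -> int:
    """XOR the two outputs; every 1 bit marks a position where they differ."""
    return out_a ^ out_b


def outputs_agree(out_a: int, out_b: int) -> bool:
    """The pair agrees only if the XOR is all zeros."""
    return mismatch_mask(out_a, out_b) == 0


good = 2 + 2               # what a healthy CPU computes
upset = good ^ 0b001       # simulate a radiation-induced flip of bit 0

if not outputs_agree(good, upset):
    # In hardware this would be an interrupt/reset or an external fault pin.
    print(f"fault detected, differing bits: {mismatch_mask(good, upset):#05b}")
```

Note that the comparison only says that the outputs differ and where. It says nothing about which side is wrong, which is all the pair needs to know.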
There are also space systems that use 3 processors and a majority vote for the correct output, but that's different.
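For comparison, the 3-processor approach can be sketched as a bitwise majority vote. This is a toy illustration, not how any particular flight computer implements it, and `majority_vote` is a hypothetical name.

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 vote: each output bit takes the value at least two inputs share."""
    return (a & b) | (a & c) | (b & c)


good = 2 + 2
upset = good ^ 0b001  # one processor suffers a single bit flip

# The two healthy processors outvote the upset one, whichever slot it is in.
voted = majority_vote(good, upset, good)
```

Unlike the paired scheme, the voter masks a single bad output without rerunning anything, at the cost of a third processor.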
You just run the calculation again until both agree.
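A minimal sketch of that retry loop, assuming each CPU can be modeled as a callable that returns its result. `run_until_agree`, `make_glitchy_adder`, and the attempt limit are all invented for the example; a real system would bound retries and escalate in its own way.

```python
from typing import Callable


def run_until_agree(cpu_a: Callable[[], int],
                    cpu_b: Callable[[], int],
                    max_attempts: int = 3) -> int:
    """Rerun the calculation on both CPUs until their outputs match."""
    for _ in range(max_attempts):
        a, b = cpu_a(), cpu_b()
        if a == b:
            return a
    # Persistent disagreement: hand the problem to the rest of the system.
    raise RuntimeError("outputs still disagree after retries")


def make_glitchy_adder(glitches: int) -> Callable[[], int]:
    """Simulated CPU computing 2+2 that returns 5 for its first `glitches` calls."""
    remaining = glitches

    def add() -> int:
        nonlocal remaining
        if remaining > 0:
            remaining -= 1
            return 5  # simulated single-event upset
        return 2 + 2

    return add


# One CPU glitches once; the second attempt agrees.
result = run_until_agree(make_glitchy_adder(1), make_glitchy_adder(0))
```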