Comment by somat

5 months ago

I worry about this sometimes. There is this long tail of "reliability" you can chase: redundant systems, processes, voting, failover, "shoot the other node in the head" scripts, etc. But everything adds complexity. Now there are more moving parts and more things that can go wrong in weird ways. I wonder if the system would be more reliable if it were a lot simpler and stupider: a single box that can be rebooted if needed.
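A rough way to put numbers on that worry is a textbook reliability model. This is a sketch under loose assumptions, not a claim about any real system: box failures are independent, and the failover machinery (voting, fencing scripts, etc.) is itself a single element that takes the whole system down when it fails. The function names and the 0.99 figures are hypothetical, picked only for illustration.

```python
def single(r: float) -> float:
    """Reliability of one simple box that works with probability r."""
    return r


def redundant_pair(r: float, r_failover: float) -> float:
    """Two independent boxes behind a failover mechanism.

    Assumption: the failover logic is in series with the pair, so if it
    fails the whole system fails regardless of the boxes.
    """
    both_fail = (1 - r) ** 2
    return r_failover * (1 - both_fail)


def break_even_failover(r: float) -> float:
    """Failover reliability at which the pair only matches a single box."""
    return r / (1 - (1 - r) ** 2)


if __name__ == "__main__":
    r = 0.99  # hypothetical per-box reliability
    print(f"single box:           {single(r):.6f}")
    for rf in (0.999, 0.995, 0.99):
        print(f"pair, failover={rf}: {redundant_pair(r, rf):.6f}")
    print(f"break-even failover:  {break_even_failover(r):.6f}")
```

Under these assumptions, with r = 0.99 the pair only beats the single box if the failover mechanism is more than about 99.01% reliable. Extra moving parts have to be very good before they pay for themselves.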

It reminds me of the lesson of the Apollo computers. The AGC was the more famous computer, probably rightfully so, but there were actually two computers. The other was the LVDC, made by IBM to control the Saturn V during launch. It was a proper aerospace computer: redundant everything, a cannot-fail architecture, etc. In contrast, the AGC was a toy. However, this let the AGC be much faster and smaller. Instead of reliability, they made it reboot well, and instead of automatic redundancy, they just put two of them in.

https://en.wikipedia.org/wiki/Launch_Vehicle_Digital_Compute...

There is something to be learned here; I am not exactly sure what it is. Worse is better?
