Comment by regularfry

5 days ago

Similarly, one of our biggest causes of power outages when I worked with a DC was the UPSes. And the biggest cause of data loss was the hardware RAID controllers. Feels like there's a fundamental law lurking under this stuff.

As the complexity of a system increases, the number of single points of failure also tends to increase. Sometimes you can arrange things so that several subsystems have to fail before the whole system does. Often, the best you can do is swap one SPoF (e.g. an unreliable power grid) for another, hopefully more robust SPoF (a UPS that can still fail).
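
To put rough numbers on that, here's a back-of-the-envelope sketch in Python. The availability figures are made up for illustration, failures are assumed independent (which they often aren't in practice), and the function names are my own. It compares a chain where every part is a SPoF against a setup where all the redundant parts have to fail together.

```python
from math import prod

def series_availability(parts):
    """Every part is a single point of failure: the system is up only if all parts are up."""
    return prod(parts)

def parallel_availability(parts):
    """Redundant parts: the system is down only if every part is down at once.
    Assumes failures are independent, which is optimistic."""
    return 1 - prod(1 - a for a in parts)

# Hypothetical availabilities, not measurements.
grid = 0.99   # flaky utility power
ups = 0.999   # better, but still not perfect

# Load wired straight to the grid: the grid is the SPoF.
grid_only = series_availability([grid])

# Load hangs entirely off the UPS: you have swapped one SPoF for another.
# (Simplification: the UPS is treated as fully masking grid outages.)
ups_as_spof = series_availability([ups])

# Load survives if either feed is up: both must fail to take it down.
either_feed = parallel_availability([grid, ups])

# Adding more must-work components in a row only makes things worse.
long_chain = series_availability([ups, 0.999, 0.999])

print(f"grid only:    {grid_only:.5f}")
print(f"UPS as SPoF:  {ups_as_spof:.5f}")
print(f"either feed:  {either_feed:.5f}")
print(f"longer chain: {long_chain:.5f}")
```

The series case is the "more complexity, more SPoFs" effect: each extra component that must work drags the product down. The parallel case is the "several subsystems need to fail" situation, and it only looks that good while the independence assumption holds.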