Comment by Coffeewine
12 hours ago
It would be interesting to see if the failure rate across time holds true after a rocket launch and time spent in space. My guess is that it wouldn’t, but that’s just a guess.
12 hours ago
It would be interesting to see if the failure rate across time holds true after a rocket launch and time spent in space. My guess is that it wouldn’t, but that’s just a guess.
I think it's likely the overall rate would be higher, and you might find you need more aggressive burn-in, but even then you'd need an extremely high failure rate before it's more efficient to replace components than writing them off.
The bathtub curve isn’t the same for all components of a server though. Writing off the entire server because a single ram chip or ssd or network card failed would limit the entire server to the lifetime of the weakest part. I think you would want redundant hot spares of certain components with lower mean time between failures.
We do often write off an entire server because a single component fails because the lifetime of the shortest-lifetime components is usually long enough that even on-earth with easy access it's often not worth the cost to try to repair. In an easy-to-access data centre, the component most likely to get replaced would be hot-swappable drives or power supplies, but it's been about 2 decades since the last time I worked anywhere where anyone bothered to check for failed RAM or failed CPUs to salvage a server. And lot of servers don't have network devices you can replace without soldering, and haven't for a long time outside of really high end networking.
And at sufficient scale, once you plan for that it means you can massively simplify the servers. The amount of waste a sever case suitable for hot-swapping drives adds if you're not actually going to use the capability is massive.