Comment by lumost
6 hours ago
I used to build and operate data center infrastructure. There is very little reason to do anything more than a warranty replacement on a GPU. With a high-quality hardware vendor that properly engineers the physical machine, failure rates can be held below 0.5% per year, particularly if the network has redundancy to avoid critical-mass failures.
In this case, I see no reason to perform replacements of any kind. Proper networked serial port and power controls would allow remote maintenance for firmware/software issues.
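To put the failure-rate figure in perspective, here is a rough back-of-the-envelope sketch. It takes the 0.5% per year upper bound from above and assumes failures are independent and the rate stays constant over time, which real hardware may not follow. The fleet size of 10,000 GPUs and the 1, 3, and 5 year horizons are made up for illustration.

```python
def surviving_fraction(annual_failure_rate: float, years: int) -> float:
    """Fraction of units still working after `years`.

    Assumes independent failures at a constant annual rate (a simplification).
    """
    return (1.0 - annual_failure_rate) ** years


def expected_failures(fleet_size: int, annual_failure_rate: float, years: int) -> float:
    """Expected number of failed units if nothing is ever replaced."""
    return fleet_size * (1.0 - surviving_fraction(annual_failure_rate, years))


if __name__ == "__main__":
    FLEET_SIZE = 10_000   # hypothetical fleet size, not from the comment
    ANNUAL_RATE = 0.005   # the 0.5% per year upper bound quoted above
    for years in (1, 3, 5):
        lost = expected_failures(FLEET_SIZE, ANNUAL_RATE, years)
        print(f"{years} yr: about {lost:.0f} of {FLEET_SIZE} GPUs expected to fail")
```

Under those assumptions, roughly 2.5% of the fleet is lost after five years with no replacements at all, which is the kind of loss that spare capacity and network redundancy can absorb.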