← Back to context

Comment by lumost

6 hours ago

I used to build and operate data center infrastructure. There is very limited reason to do anything more than a warranty replacement on a GPU. With a high quality hardware vendor that properly engineers the physical machine, failure rates can be contained to less than .5% per year. Particularly if the network has redundancy to avoid critical mass failures.

In this case, I see no reason to perform any replacements of any kind. Proper networked serial port and power controls would allow maintenance for firmware/software issues.