Comment by davidmr

9 years ago

In my limited experience, their root cause analyses are really impressive as well, with lots of internal attention and resources behind them. I'm not allowed to talk about any Intel issues, but we reported a very strange issue to Nvidia, sent a couple of dozen cards back, and six months later got a truly fascinating report back, with hundreds of pages of compute test result tables, electron microscope images, and chemistry lab reports. Anything that hints at a manufacturing problem is taken incredibly seriously.

I worked for a large company that used thousands of Intel CPUs every year, and when we suspected a CPU bug we were mostly brushed off. We had a very persistent person on the team who kept tracking the issue to find correlations, and some very good kernel developers who went on to nearly pinpoint it. Only then did Intel pay attention, and it still took them several months to acknowledge the issue, give a brief report on it, and confirm that our proposed workaround would indeed work.

I've never seen Intel do a very good job at failure analysis or at following up on failures unless prodded very hard.

  • Intel likely has very thorough data on the issue, but you'll never see it unless you are one of their tier 1 customers* and have an NDA with them. In my experience working for a large hardware manufacturer, they are very skittish about releasing detailed failure analysis data to outside companies.

    * For Intel that would be companies like Dell, Apple, HP, and maybe a couple of others.

  • To be fair, I don't even want to think about the number of spurious bug reports they are probably receiving.

This is because once upon a time people were put in jail for not doing that (when the customer was the DoD).