Comment by lorenzhs

9 years ago

So it's okay for software to have bugs that get fixed (I think everybody here acknowledges that software will always have bugs), but Intel isn't allowed to have issues in their processors, even if they can fix them with a software (microcode) update?

In general, people expect hardware to be far more bug-free than all the software which depends on it. I'm disgusted enough at the current trend of continuously releasing half-baked software and "updating" it to fix some of the bugs (while introducing new bugs in the process); and yet it seems to be spreading to hardware too. From the perspective of formal verification, buggy hardware is like an incorrect axiom.

but Intel isn't allowed to have issues in their processors, even if they can fix them with a software (microcode) update?

They could've caught and fixed this one with some more testing, before actually releasing. Formal verification isn't necessarily going to help, if it's a statistical type of defect --- thus the concept of wafer yield, and why dies manufactured from the exact same masks can behave completely differently with respect to the clock speeds attainable and voltages required.

  • Formal verification cannot be used to fully verify an entire Intel CPU, simply because the computational resources required to do so are astronomical. Heck, the cache coherence unit alone is basically impossible to verify.

    Therefore, CPU designers verify the important units (e.g., the ALU) independently of each other, and then try to verify their interaction. But a system level design verification check is simply not possible.

    Obviously, faults that result from fabrication can still take place, so they are tested for using BIST and JTAG. Test coverage can be pretty high, but obviously not 100%.

    As you can see, there are still a ton of places where hardware errors can seep through.

  • Expectations of hardware are indeed higher than for software, but isn't it a valid trade-off for Intel to make as long as most of the issues can be fixed with a microcode update? On Windows, it's updated via Windows Update, and on most Linux distributions, a package exists for CPU microcode. Most users have an easy—if not automatic—way to obtain the updates.

    I believe the main issue with formally verifying everything is that reasoning about parallel code is extremely hard and might well make the entire endeavour unattainable. It would be great to have formally verified CPUs, though.

    > They could've caught and fixed this one with some more testing, before actually releasing.

    Isn't this true of any bug?

If this "fix" ends up disabling HT, how can I get a refund not just for the CPU but for the $3k laptop I spec'd around it? Without needing to sue?

Well, if a software bug leads to data corruption it is not really "allowed" as well. It is quite amazing that there are not more bugs like this. Intel spends an insane amount of money on very large teams of very smart people to work on these insanely complicated systems.

I don't think this is sustainable, and I think that massive arrays of relatively simple processors. First, we will need a culture shift that learns programmers to program concurrent programs from the start, and this will take a lot of time (because technology moves a lot faster than culture).

It would be nice if they allowed users to update rather than vendors. I have a mobile Xeon. I don't know if it's affected or not.

There are hardware bugs that are impossible to fix in microcode. What then?

It's funny, since many of today's software bugs are due to the legacy of C, faults of which can often times be attributed to mimicking the behaviour CPU hardware too closely.

EDIT: My wording was poor, but what I meant was that it's not like hardware vendors do not make bugs happen due to deficiencies in their design, which are not only tolerated, but often reflected in software, which runs counter to OP saying that apparently hardware vendors are not allowed to have bugs, but the software ones are:

> So it's okay for software to have bugs that get fixed (I think everybody here acknowledges that software will always have bugs), but Intel isn't allowed to have issues in their processors

  • Go back to rust-land now :D

    • It was meant as a response to OP basically saying that hardware vendors make way fewer bugs that the software ones, which is true, but many bugs of software can arguably be attributed to hardware design, so it's not like they're without fault. Rust has nothing to do with it, not sure why even bring it up.

      1 reply →