
Comment by Joker_vD

2 days ago

> you already have buggy compilers and buggy OSes and buggy libraries.

Which run, I must add, on effectively infallible hardware. Most software straight up assumes that the CPU and RAM will function perfectly and doesn't even bother trying to detect failures (unless a failure manifests itself catastrophically, the show simply goes on).

So in effect, we also can, and do, build less reliable systems out of more reliable components, and that's how software is different.

>Which run, I must add, on effectively infallible hardware.

Keyword: effectively. Hardware is also built on top of components with error tolerances. What do you think ECC memory is for? Or why chips have "yields" and ship with some parts "turned off"? Or how thermal throttling happens? Or why CPU clocks have jitter that needs correcting? There are tons of other examples, all the way down to transistors and capacitors.

And let's not even get into HDDs and SSDs.
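Since ECC memory came up, here is a minimal sketch of the idea behind it, using the textbook Hamming(7,4) code. This is an illustration only: real ECC DIMMs commonly use a wider SECDED code over 64-bit words, and the function names `encode`/`decode` are hypothetical. Three parity bits guard four data bits, and the pattern of failed parity checks (the syndrome) points directly at a single flipped bit.

```python
# Textbook Hamming(7,4): positions 1..7, parity bits at positions 1, 2 and 4.
# Corrects any single flipped bit. (Illustrative; not the exact code ECC RAM uses.)

def encode(d: list[int]) -> list[int]:
    """Encode 4 data bits [d1, d2, d3, d4] into a 7-bit codeword."""
    d1, d2, d3, d4 = d
    p1 = d1 ^ d2 ^ d4  # parity over positions 1, 3, 5, 7
    p2 = d1 ^ d3 ^ d4  # parity over positions 2, 3, 6, 7
    p4 = d2 ^ d3 ^ d4  # parity over positions 4, 5, 6, 7
    return [p1, p2, d1, p4, d2, d3, d4]

def decode(c: list[int]) -> tuple[list[int], int]:
    """Return (corrected data bits, 1-based position of the fixed bit, or 0)."""
    c = list(c)  # don't mutate the caller's codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4  # equals the position of the flipped bit
    if syndrome:
        c[syndrome - 1] ^= 1
    return [c[2], c[4], c[5], c[6]], syndrome

word = encode([1, 0, 1, 1])
word[5] ^= 1  # simulate a bit flip at position 6
print(decode(word))  # ([1, 0, 1, 1], 6)
```

The hardware does this on every read without the software ever knowing, which is exactly the "effectively infallible" layer being discussed.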

  • My point is, when you go from hardware to software, there is a noticeable change in attitude and a corresponding drop in reliability. Inside hardware, a lot of effort goes into detecting and correcting glitches and providing reliably correct behaviour; in software, the most common assumption is that the hardware will just work as intended. How many filesystems store checksums? Even back in the days when HDDs were way less reliable? And ECC exists in no small part because writing software that stays reliable in the face of memory corruption (or at the very least detects it), while possible, is hard and costly, and (obviously) produces less performant software.

    Anyway, the larger point of my comment is that when we go from traditional software to AI, a) there seems to be an even bigger drop in reliability, b) attitudes are even more cavalier, and c) software people largely lack the discipline/knowledge required to build more reliable systems out of less reliable ones, because they have spent most of their careers doing exactly the opposite: building software that they know will work incorrectly even on perfect hardware, and being fine with it. "You just tend to accept the risk", my ass. Yes, people do that with software, but hardware is a different story: remember when Intel messed up the division algorithm inside their chips?

    • > remember when Intel messed up the division algorithm inside their chips?

      Yeah, and what do you remember about that? Pretty much no normal person noticed until someone constructed a specific test case.

      You know what I also remember? Meltdown and Spectre, which were worked around with software changes in OSes and compilers. There have been dozens of other microcode patches, etc. You are being too lenient in your assessment of the crap that hardware engineers ship. There's just a lot more software out there, with errors that are in your face, so you tend to notice them immediately. I'll give you one thing, though: precisely because a software defect is cheaper to remediate than a hardware defect once shipped, people are less worried about validation in most use cases. That's not inherently a mistake, though, just an intuitive cost-benefit assessment.

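On the "how many filesystems store checksums?" point: a minimal sketch of what per-block checksumming buys you, assuming a toy in-memory "block" that is just a `(data, crc)` tuple. Checksumming filesystems such as ZFS and Btrfs do this for real, typically storing the checksum away from the data and optionally using stronger hashes; the names `write_block`, `verify`, `read_block` and `ChecksumError` here are hypothetical. CRC-32 is guaranteed to catch any single flipped bit.

```python
import zlib

class ChecksumError(Exception):
    """Raised when a block's stored checksum does not match its contents."""

def write_block(data: bytes) -> tuple[bytes, int]:
    # Store the data together with its CRC-32, computed at write time.
    return data, zlib.crc32(data)

def verify(stored: tuple[bytes, int]) -> bool:
    # Recompute the CRC-32 on read and compare it with the stored value.
    data, checksum = stored
    return zlib.crc32(data) == checksum

def read_block(stored: tuple[bytes, int]) -> bytes:
    # Refuse to hand back data that no longer matches its checksum.
    if not verify(stored):
        raise ChecksumError("checksum mismatch: block is corrupted")
    return stored[0]

block = write_block(b"hello, disk" * 100)
data, crc = block
# Simulate a single flipped bit in the stored bytes (bad RAM, bad sector, ...).
corrupted = (bytes([data[0] ^ 0x01]) + data[1:], crc)
try:
    read_block(corrupted)
except ChecksumError as e:
    print(e)  # checksum mismatch: block is corrupted
```

Detection alone doesn't fix anything, but it turns silent corruption into a loud error, which is the minimum the software side usually skips.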

I am not sure I correctly understood your point. In one sense, you are basically offering another anecdote that proves my point: hardware failure (specifically, bit flips in non-ECC memory) is pretty much guaranteed to happen at scale, but people are mostly okay with absorbing that risk. I feel you are overselling the hardware reliability story. Sure, we can build less reliable systems out of reliable components. That goes without saying, and no, that's definitely not software-specific. Almost by default, a composite system is less reliable than its primitives (a simple example would be nailing two pieces of wood together) unless specific care is taken to build in guardrails or redundancy. The point, however, is that building reliable systems out of less reliable components is possible, and there is vast precedent for it.
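The "composites are less reliable unless you add redundancy" argument can be made concrete with the standard reliability block formulas. This is a sketch under the usual simplifying assumption that component failures are independent (real failures often are not), and the triple modular redundancy (TMR) case assumes a perfect majority voter; the function names are hypothetical.

```python
from math import prod

def series(rs: list[float]) -> float:
    # Every component must work (both boards AND the nail): R = R1 * R2 * ... * Rn
    return prod(rs)

def parallel(rs: list[float]) -> float:
    # Any one redundant copy suffices: R = 1 - (1 - R1) * (1 - R2) * ... * (1 - Rn)
    return 1 - prod(1 - r for r in rs)

def tmr(r: float) -> float:
    # Triple modular redundancy, perfect voter: at least 2 of 3 copies must work.
    # P = r^3 + 3 r^2 (1 - r) = 3r^2 - 2r^3
    return 3 * r**2 - 2 * r**3

print(round(series([0.99] * 10), 3))  # 0.904: ten 99% parts in series
print(round(parallel([0.9] * 3), 3))  # 0.999: three 90% parts in parallel
print(round(tmr(0.9), 3))             # 0.972
print(tmr(0.4) < 0.4)                 # True: voting only helps when r > 0.5
```

The last line is the catch: redundancy only buys reliability when each component is already more likely right than wrong.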

You should talk to an electrical engineer or materials scientist about how reliable transistors emerge from noisy voltages in wires.
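A toy model of the "reliable logic from noisy voltages" point, assuming an ideal decision threshold and zero-mean Gaussian noise on the wire (real logic families specify guaranteed input/output voltage levels instead, and real noise isn't purely Gaussian). The function name and the 0.1 V noise figure are hypothetical; the formula is just the Gaussian tail probability of the noise exceeding the margin.

```python
from math import erfc, sqrt

def bit_error_probability(margin: float, sigma: float) -> float:
    # A logic level sits `margin` volts away from the decision threshold and
    # the wire adds zero-mean Gaussian noise with standard deviation `sigma`.
    # The bit is misread only if the noise crosses the threshold:
    # P = 0.5 * erfc(margin / (sigma * sqrt(2)))
    return 0.5 * erfc(margin / (sigma * sqrt(2)))

for k in (1, 3, 6):
    p = bit_error_probability(margin=k * 0.1, sigma=0.1)
    print(f"margin = {k} sigma -> P(misread) = {p:.1e}")
# margin = 1 sigma -> P(misread) = 1.6e-01
# margin = 3 sigma -> P(misread) = 1.3e-03
# margin = 6 sigma -> P(misread) = 9.9e-10
```

Because each gate re-thresholds and drives a clean level at its output, noise does not accumulate from stage to stage, and widening the margin by a few sigma buys many orders of magnitude in error rate.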