Comment by BadBadJellyBean
7 hours ago
I always wonder how many system crashes that we blame on the software or the OS are actually just caused by suboptimal components. Computers are so complex and so fast that just a little bit of instability can probably lead to data corruption.
The optimist in me hopes that the bullwhip effect will lead to cheap RAM in a few years, and that the glut allows for the wider adoption and support of ECC memory.
I’d just like to see a repeat of the glut of HBM-backed processors like the Xeon Max 9480 that dipped as low as 900/cpu, about 2000ish all in, and with bandwidth that compares favorably to a 3090.
I've been building computers for my friends and me for 25 years, and the two worst "random stability" issues I had were caused by high-quality but aging PSUs.
Yup. When building "upcycled" PCs out of used second-to-last-gen components, I learned very quickly to only ever use brand-new, high-quality PSUs ... the alternative is insanity
Not just wholesale crashes, but all sorts of misbehavior. For example, cheap WiFi/BT/ethernet can wreak havoc on your connectivity, and out-of-spec USB peripherals can cause all sorts of problems. Both can cause sleep and power-saving problems.
Most people using computers aren't technical enough to be able to discern these things, however, and many buy the cheapest thing on the shelf and so these subpar components persist.
It has been 25 years, but back in college I had a job refurbishing and repairing PCs. Most problems were caused by cheap no-name hardware. The quality hardware rarely had problems.
Maybe when quality hardware has problems, the owner knows how to deal with it, but when no-name hardware has problems, the owner has no clue how to build a computer
Maybe. But then again, as someone who dual boots, I see one OS crashing and giving an all-round worse experience than the other on the exact same hardware, while the other just chugs along.
Now, I'm not someone good at maths or physics, so maybe, somehow, it's actually more likely than not that the worse OS gets to run when there's worse solar activity going on or whatever else has an effect on my hardware, which also doesn't seem to affect memtest for some reason.
But the likelihood can't be that high. Can it?
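For a rough back-of-the-envelope answer: if crashes were purely hardware-driven and had nothing to do with the OS, each crash would land on whichever OS happened to be running. The sketch below assumes independent crashes and a known share of uptime per OS (both are assumptions, not something stated above), and the function name is made up for illustration.

```python
from math import comb

def prob_at_least_k_on_one_os(n_crashes, k, uptime_share=0.5):
    """Chance that at least k of n_crashes land on one particular OS,
    assuming crashes are independent and OS-agnostic (hypothetical model).
    uptime_share is the fraction of total uptime spent in that OS."""
    return sum(
        comb(n_crashes, i)
        * uptime_share**i
        * (1 - uptime_share) ** (n_crashes - i)
        for i in range(k, n_crashes + 1)
    )

# Example: 10 crashes, all 10 on the same OS, with a 50/50 uptime split.
print(prob_at_least_k_on_one_os(10, 10))
```

With a 50/50 split, ten out of ten crashes on one particular OS comes out at 0.5**10, or about 0.1%. So under this simple model, pure bad luck looks unlikely, and something OS-specific (such as a driver) is the more plausible explanation.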
It could easily be flaky hardware and different drivers. Not necessarily better or worse, but one driver may cause the hardware to occasionally fail in exciting ways, like DMA to the wrong address if just the right access patterns happen.
If you've got an IO-MMU and everything aligns properly, devices can't DMA to the wrong place anymore, which might make it easier to track things down.
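On Linux, one way to see whether an IOMMU is actually in use is to look at `/sys/kernel/iommu_groups`. If that directory is missing or empty, the IOMMU is probably disabled or unsupported. A minimal sketch, assuming Linux sysfs; the function name `iommu_groups` is made up for illustration.

```python
from pathlib import Path

def iommu_groups(root="/sys/kernel/iommu_groups"):
    """Return {group_number: [device addresses]} from sysfs.
    An empty dict suggests no active IOMMU (or a non-Linux system)."""
    base = Path(root)
    if not base.is_dir():
        return {}
    groups = {}
    for group in base.iterdir():
        devices_dir = group / "devices"
        if devices_dir.is_dir():
            # Each entry is named after the device address, e.g. 0000:00:1f.0
            groups[group.name] = sorted(d.name for d in devices_dir.iterdir())
    return groups

if __name__ == "__main__":
    found = iommu_groups()
    if not found:
        print("No IOMMU groups found; IOMMU is likely off or unsupported.")
    for number, devices in sorted(found.items(), key=lambda kv: int(kv[0])):
        print(f"group {number}: {', '.join(devices)}")
```

A device sharing a group with many others is less isolated, which is worth knowing before trusting the IOMMU to catch a stray DMA.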
Sometimes it's both. I had some crazy data corruption problems that turned out to be a one-two punch of a buggy anti-cheat driver from a game I was playing and a defective M.2 SSD slot on my motherboard. Without the combination of both factors everything was fine, but when I played the game with that slot populated, the disk in that slot started getting corrupted and failing to respond to requests from the OS (eventually hanging the system).
Wild troubleshooting adventure.
My sympathies. I've had to track some of those sorts of things down and sometimes you wonder about your sanity.