Comment by schemescape
5 years ago
Obvious in retrospect, but very surprising to my inexperienced past self:
I'd been working on some C code for an hour or two. It wasn't behaving how I expected it to (and at the time I knew nothing of debuggers), so I added a print statement and recompiled. I got a compilation error: something like "Syntax error on line 123: #incl5de <stdio.h>". Shocked, I scrolled to that line in my text editor to fix the typo, but it wasn't there. I compiled the same code again and there were no errors.
Turns out there wasn't a bug in my code! I immediately shut down my computer because my RAM was going bad. To this day, what surprises me most is that my computer was able to successfully boot and behave normally for an hour or two, even though random bits were apparently being flipped.
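For what it's worth, the typo in that (approximately remembered) error message is exactly what a single flipped bit looks like: in ASCII, 'u' is 0x75 and '5' is 0x35, which differ only in bit 6. A quick sketch to check this; the helper name `differing_bits` is made up for illustration:

```python
def differing_bits(a: str, b: str) -> list[int]:
    """Return the bit positions (0 = least significant) where two characters differ."""
    diff = ord(a) ^ ord(b)  # XOR leaves a 1 wherever the two code points disagree
    return [bit for bit in range(diff.bit_length()) if (diff >> bit) & 1]


# 'u' (0x75) vs '5' (0x35), as in "#include" vs "#incl5de"
print(differing_bits("u", "5"))  # prints [6]
```

A result with one entry means the two characters are one bit flip apart.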
Reminds me of a rogue NIC, flipping a single bit every now and then, that took down S3: https://news.ycombinator.com/item?id=13859733
I recall a talk about someone registering domains similar to google.com and, due to occasional bit flips, people landing on those domains.
An article on something like this was linked on HN a few years ago: https://news.ycombinator.com/item?id=2944445
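A rough sketch of the idea in the two comments above: list every name that is one bit flip away from a given domain label and still looks like a hostname. This is only an illustration under simplifying assumptions, not what the talk or article actually did; `bit_flip_variants` and `VALID` are hypothetical names, and the validity check ignores rules such as labels not starting with a hyphen.

```python
import string

# Characters allowed in a hostname label (simplified assumption).
VALID = set(string.ascii_lowercase + string.digits + "-")


def bit_flip_variants(label: str) -> list[str]:
    """Return labels that differ from `label` by one flipped bit in one character."""
    variants = set()
    for i, ch in enumerate(label):
        for bit in range(8):
            # DNS names are case-insensitive, so fold case before comparing.
            flipped = chr(ord(ch) ^ (1 << bit)).lower()
            if flipped in VALID and flipped != ch:
                variants.add(label[:i] + flipped + label[i + 1:])
    return sorted(variants)


print(bit_flip_variants("google"))
```

For "google" the list includes names like "foogle" ('g' 0x67 to 'f' 0x66) and "gnogle" ('o' 0x6f to 'n' 0x6e).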
The RAM going bad in my PC was one of the most annoying issues I've had to troubleshoot. It started with Firefox pages randomly crashing, first occasionally and then several times a day.
I then started getting errors when playing games, with obscure error codes that turned up nothing when I searched for them.
I eventually found a comment in a thread about the crashing game suggesting that the RAM could be bad. I ran some diagnostic tests, and with the number of errors that came up I was surprised my computer worked at all.
> To this day, what surprises me most is that my computer was able to successfully boot and behave normally for an hour or two, even though random bits were apparently being flipped.
With my current computer I overclocked the RAM to the fastest config that memtest would run overnight without errors. The RAM also has ECC, and no problems were reported during normal operation, even when re-compiling (most) packages. But when I got to compiling LLVM, the system would log ECC errors and crash shortly afterwards. Backing off the overclock a bit fixed that. So the rate of memory errors can definitely depend on what you are running.
> To this day, what surprises me most is that my computer was able to successfully boot and behave normally for an hour or two, even though random bits were apparently being flipped.
Yup. I once had a machine that would freeze up when I ran package updates. I thought (of course) that the package manager had a bug. Turned out, running the upgrade was the only thing memory-intensive enough to touch the faulty memory that I'd installed. After all, on a light system you can totally boot and run in <1GB of RAM...