Comment by kevingadd
13 years ago
Guild Wars did this in user mode in the background while playing - it did background testing of the user's CPU, RAM, and GPU. Machines that failed the tests were flagged so that their crash reports got bucketed separately (saving us the time of trying to understand impossible crashes), and it popped up a message telling the user their computer was broken.
So even if the OS should be doing this for you, for long-running processes you could do it yourself in user mode. (I don't know if it's worth the effort, though.)
That's exactly my plan with Redis, and it is awesome to discover that it was used with success in the past! But I have a problem: since I can't access memory at a lower level, when should I test, and what?
I've the following possibilities basically:
1) Test on malloc(), with a given probability, and perhaps only up to N bytes of the allocation, for latency concerns.
2) Do the same also at free() time.
3) From time to time, just allocate a new piece of memory of a fixed size with malloc(), test it, and free it again.
"3" is the one with the minimum overhead, but 1+2 probably have a better chance of eventually hitting all the pages...
I don't have a broken memory module to test different strategies with. I wonder if there is a kernel module that simulates broken memory.
Note that Redis can already test the computer memory with 'redis-server --test-memory' but of course this requires user intervention.
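To make the three options concrete, here is a rough sketch in C of what such a sampled test could look like. This is not Redis code and not how 'redis-server --test-memory' is implemented. The names memtest_region, memtest_should_sample and memtest_scratch are made up for illustration, and the bit patterns and the use of rand() are my own assumptions. The test is destructive, so it only makes sense on a block that was just allocated or is about to be freed. Also keep in mind that a small write followed by a read may be served entirely from the CPU cache and never touch the DRAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Fill `words` 64-bit words with `pattern`, read them back, and return
 * how many words did not hold the value. Overwrites the buffer. */
static size_t memtest_pattern(volatile uint64_t *buf, size_t words,
                              uint64_t pattern) {
    size_t errors = 0;
    for (size_t i = 0; i < words; i++) buf[i] = pattern;
    for (size_t i = 0; i < words; i++)
        if (buf[i] != pattern) errors++;
    return errors;
}

/* Options 1 and 2: test at most `max_bytes` of a block of `len` bytes,
 * to bound the latency. Returns the number of mismatching words. */
size_t memtest_region(void *ptr, size_t len, size_t max_bytes) {
    static const uint64_t patterns[] = {
        0x0000000000000000ULL, 0xFFFFFFFFFFFFFFFFULL,
        0xAAAAAAAAAAAAAAAAULL, 0x5555555555555555ULL
    };
    /* malloc() results are suitably aligned; other pointers may not be. */
    assert(ptr != NULL && (uintptr_t)ptr % sizeof(uint64_t) == 0);
    if (len > max_bytes) len = max_bytes;
    size_t words = len / sizeof(uint64_t);
    size_t errors = 0;
    for (size_t p = 0; p < sizeof(patterns) / sizeof(patterns[0]); p++)
        errors += memtest_pattern(ptr, words, patterns[p]);
    return errors;
}

/* "With a given probability": true roughly once every `one_in` calls. */
int memtest_should_sample(unsigned one_in) {
    return one_in != 0 && (unsigned)rand() % one_in == 0;
}

/* Option 3: allocate a scratch block, test it, free it again.
 * Returns the error count, or (size_t)-1 if the allocation failed. */
size_t memtest_scratch(size_t bytes) {
    if (bytes == 0) return 0;
    void *buf = malloc(bytes);
    if (buf == NULL) return (size_t)-1;
    size_t errors = memtest_region(buf, bytes, bytes);
    free(buf);
    return errors;
}
```

An allocator wrapper could call memtest_region(ptr, size, N) right after malloc() and right before free() whenever memtest_should_sample() says so, while a periodic timer could call memtest_scratch() with a fixed size. A nonzero result (other than the allocation failure value) would be the signal to flag the machine.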
That's really smart.