
Comment by paulsutter

9 years ago

You need ECC /and/ pervasive checksumming. There are too many stages of processing where errors can occur - disk controllers, networks, and so on. The TCP checksum is a bit of a joke at 16 bits (a random corruption slips past it roughly 1 time in 65,536), and even the Ethernet CRC can fail - you need end-to-end checksums.

http://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html
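A quick sketch (mine, not from the linked article) of one reason the 16-bit TCP checksum is weak: it is an order-independent ones'-complement sum of 16-bit words (RFC 1071), so any error that reorders words is completely invisible to it.

```python
def internet_checksum(data: bytes) -> int:
    """RFC 1071 ones'-complement sum of 16-bit words (TCP/IP checksum)."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF

a = b"\xDE\xAD\xBE\xEF"
b = b"\xBE\xEF\xDE\xAD"  # same 16-bit words, swapped order
assert a != b
assert internet_checksum(a) == internet_checksum(b)  # undetectable corruption
```

Addition is commutative, so swapping, duplicating-and-dropping, or otherwise permuting aligned words never changes the sum - one of several failure classes a real CRC would catch.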

I did a bunch of protocol-level design in the 90's, and one of the handful of things that taught me was: _ALWAYS_ use at least a CRC with a standard polynomial. Skipping it is just not worth it. In the 2000's I relearned the same lesson for data at rest (on disk, etc.). If nothing else, both of those turn corruption into "bugs" you catch immediately, rather than silent corruption that leads to mysteries long after the original data was mangled.

I just had this discussion (about why TCP's checksum was a huge mistake) a couple days ago. That link is going to be useful next time it comes up.

Too many stages... for what? You haven't stated what the criteria for 'recovery' (for lack of a better word) are. What is the (intrinsic) value of the data?

Personally, I'm a bit of a hoarder of data, but honestly, if X-proportion of that data were to be lost... it probably wouldn't actually affect my life substantially even though I feel like it would be devastating.