
Comment by psi-squared

10 years ago

AIUI, ZFS was explicitly designed to deal with this sort of data corruption - one of the descriptions of the design I've heard is "read() will return either the contents of a previous successful write() or an error". That would (in principle) prevent the file from containing "a boo" or "a far" at any point.

It looks like one of the authors cited in this article has written a paper analysing ZFS - though they admittedly don't test its behaviour on crashes. Citation here, in PDF form:

http://pages.cs.wisc.edu/~kadav/zfs/zfsrel.pdf

(edited to add: This only deals with the second part of this article. The first part would still be important even on ZFS)
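As a toy illustration of what that "previous successful write or an error" contract buys you - plain Python, nothing to do with actual ZFS internals - compare an in-place overwrite interrupted partway through with an all-or-nothing update:

    # Overwriting "a foo" with "a bar" byte-by-byte can be interrupted partway,
    # leaving a mix of old and new bytes on disk.
    def overwrite_in_place(old, new, crash_after):
        data = list(old)
        for i, ch in enumerate(new):
            if i == crash_after:
                break            # simulated crash mid-write
            data[i] = ch
        return "".join(data)

    # The ZFS-style contract: readers see the previous successful write or the
    # new one in full, never a torn mixture.
    def overwrite_all_or_nothing(old, new, crashed):
        return old if crashed else new

    print(overwrite_in_place("a foo", "a bar", crash_after=3))       # "a boo" (torn)
    print(overwrite_all_or_nothing("a foo", "a bar", crashed=True))  # "a foo" (old, intact)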

Right, copy-on-write filesystems (ZFS, Btrfs) are explicitly designed to prevent that kind of corruption: they never edit blocks in place, but instead copy the contents to a new block and use a journaled metadata update to point the file at its new block.
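A rough sketch of the copy-on-write idea - the class and field names here are invented for illustration, not real ZFS data structures:

    class CowStore:
        def __init__(self):
            self.blocks = {}      # block_id -> bytes
            self.next_id = 0
            self.file_ptr = None  # metadata: which block the file currently points at

        def _alloc(self, data):
            bid = self.next_id
            self.next_id += 1
            self.blocks[bid] = data
            return bid

        def write(self, data):
            # 1. Write the new contents to a fresh block; the old block is untouched.
            new_block = self._alloc(data)
            # 2. Atomically repoint the file metadata at the new block (in ZFS this
            #    happens via the transactional/journaled metadata update). A crash
            #    before this step leaves the old pointer, and the old data, intact.
            self.file_ptr = new_block

        def read(self):
            return self.blocks[self.file_ptr]

    store = CowStore()
    store.write(b"a foo")
    store.write(b"a bar")   # the "a foo" block is never modified in place
    print(store.read())     # b'a bar'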

ZFS also checksums data and metadata blocks, so "silent" write errors become loud the next time the data is accessed and the checksums don't match. This can't prevent all errors, but it has some very nice data integrity properties - combined with its RAID support (RAID-Z), you can likely recover from most failures, and with RAID-Z2 you can recover from scattered failures across the remaining drives even after one drive has died completely. That matters in practice: modern drives are very large, and spinning rust is more susceptible to 'cosmic rays' than one might think.
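A minimal sketch of how per-block checksums turn silent corruption into a loud error - again the names are made up, and real ZFS stores each block's checksum in its parent block pointer rather than next to the data:

    import zlib

    class ChecksummedBlock:
        def __init__(self, data):
            self.data = data
            self.checksum = zlib.crc32(data)   # recorded when the block is written

        def read(self):
            if zlib.crc32(self.data) != self.checksum:
                # With RAID-Z redundancy, this is the point where ZFS could
                # reconstruct the block from parity instead of failing.
                raise IOError("checksum mismatch: silent corruption detected")
            return self.data

    blk = ChecksummedBlock(b"a bar")
    blk.data = b"a bor"                 # simulate a bit flip on the platter
    try:
        blk.read()
    except IOError as err:
        print(err)                      # loud error instead of silently bad data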