
Comment by maximente

6 years ago

i wouldn't use ZFS either. my guess is 90% of ZFS users have never run failure scenarios and grappled with potential failure modes of ZFS, nor even know that you really need ECC RAM to run ZFS without fear of existential data corruption due to bit flips.

furthermore, the allure of ZFS means people aren't testing their disaster plans until it's too late, bc ZFS is "resilient".

lastly, data recovery is expensive as all hell, if it's even possible. i am talking on the order of four figures for 100s of GBs, with sketchy odds of success.

ZFS is the ultimate "pet" in the pets vs. cattle continuum. in a world where shoddy engineering and "break things fast" is the zeitgeist, i'm happy using a classic dumb FS like ext4, pathologically backing it up and testing said backups.
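
For what "testing said backups" might look like in practice, here is a minimal sketch, assuming the backup is a plain directory tree mirroring the source. The function names `file_digest` and `verify_backup` are made up for illustration; this is not any particular backup tool's method.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()


def verify_backup(source: Path, backup: Path) -> list[str]:
    """Return relative paths that are missing from, or differ in, the backup."""
    problems = []
    for src in sorted(p for p in source.rglob("*") if p.is_file()):
        rel = src.relative_to(source)
        dst = backup / rel
        if not dst.is_file() or file_digest(src) != file_digest(dst):
            problems.append(rel.as_posix())
    return problems
```

An empty list means every source file has a byte-identical copy in the backup. It only proves the two trees match right now; it says nothing about whether the source itself was already corrupted.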

i would not risk any of my personal treasured data to ZFS due to inherent existential threats. i would implore ZFS users to evaluate and test their setups, and especially use ECC RAM - like, starting now - to protect their assets.

> you really need ECC RAM to run ZFS

This is FUD. ZFS does as well as, if not better than, the average file system, given its focus on integrity, online scrubs, etc. On the other hand, "use ECC RAM" is standard best practice for any mission-critical data; no file system magic is going to fix computer RAM lying to you 100% of the time. It's the standard recommendation for ZFS because it's rarely deployed in environments that can tolerate data corruption.

> pathologically backing it up and testing said backups.

ZFS doesn't remove the need for backups, and no one seriously makes that argument. Though snapshots + send/receive make them very easy to do in ZFS.
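
A rough sketch of the snapshot + send/receive flow, driven from Python. It uses the real `zfs snapshot`, `zfs send -i` and `zfs receive` subcommands, but the dataset names, snapshot names and helper functions are hypothetical, and `replicate` assumes both pools are on the same host and that the previous snapshot already exists on the target.

```python
import subprocess


def snapshot_cmd(dataset: str, name: str) -> list[str]:
    """Command that creates the snapshot dataset@name."""
    return ["zfs", "snapshot", f"{dataset}@{name}"]


def incremental_send_cmd(dataset: str, prev: str, curr: str) -> list[str]:
    """Command that streams only the changes between two snapshots."""
    return ["zfs", "send", "-i", f"{dataset}@{prev}", f"{dataset}@{curr}"]


def receive_cmd(target: str) -> list[str]:
    """Command that applies a send stream read from stdin to the target dataset."""
    return ["zfs", "receive", target]


def replicate(dataset: str, prev: str, curr: str, target: str) -> None:
    """Take a new snapshot and pipe the incremental stream into the target.

    Assumption: local-to-local replication; a remote target would wrap
    the receive command in ssh instead.
    """
    subprocess.run(snapshot_cmd(dataset, curr), check=True)
    send = subprocess.Popen(
        incremental_send_cmd(dataset, prev, curr), stdout=subprocess.PIPE
    )
    subprocess.run(receive_cmd(target), stdin=send.stdout, check=True)
    send.stdout.close()
    if send.wait() != 0:
        raise RuntimeError("zfs send failed")
```

Something like `replicate("tank/data", "monday", "tuesday", "backup/data")` would then ship only the blocks that changed since the `monday` snapshot.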

  • I've detected broken memory chips thanks to BTRFS checksumming finding errors, luckily before it had a chance to corrupt any written data. So if anything, a properly checksummed filesystem makes non-ECC RAM less dangerous.

> ZFS is the ultimate "pet" in the pets vs. cattle continuum. in a world where shoddy engineering and "break things fast" is the zeitgeist,

Live storage is never 'cattle'; that is idiotic. Your running filesystem IS actually a pet. Hard drives are 'cattle', and that's exactly what ZFS treats as 'cattle'.

ZFS was born out of long frustrations with file systems and was systematically designed to protect against data corruption and bad hardware. It is literally the exact opposite of 'move fast and break things'.

Go and actually watch the videos where the designers show it for the first time. They speak very clearly about how and why they designed it.

> i would not risk any of my personal treasured data to ZFS due to inherent existential threats. i would implore ZFS users to evaluate and test their setups, and especially use ECC RAM - like, starting now - to protect their assets.

ZFS has always recommended ECC to its users. No filesystem can protect you from not having it.

> you really need ECC RAM to run ZFS without fear of existential data corruption due to bit flips

https://arstechnica.com/civis/viewtopic.php?f=2&t=1235679&p=...

http://www.open-zfs.org/wiki/User:Mahrens

  • The problem is that ZFS doesn't have an offline repair tool. A (granted unlikely) bit flip in an important data structure that gets written to disk makes the whole fs unmountable, and that's it (idk if it has a tool to rescue file data from an unmountable pool? Maybe we should ask Gandi...).

    With e.g. ext4 you can get back to a mountable state, pretty much guaranteed, with e2fsck. You might lose a few files, or find them in lost+found, etc., but at least you have something.

    The reason ZFS doesn't have an offline repair tool is pretty convincing. Once you have zettabytes (that's the marketing) of data, running that repair tool will take too long, so you'd have to do everything to prevent that in the first place anyway. Including checksumming everything, storing everything redundantly and using ECC RAM.

    • AFAIK it stores multiple copies of those important data structures though, so it should take more than a single bit flip.
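
To illustrate the "multiple copies" point, a toy sketch of a block stored redundantly with one checksum: a read falls back to any copy that still verifies and rewrites the bad ones. `MirroredBlock` is a made-up class for illustration only; it is not how ZFS lays out its metadata on disk.

```python
import hashlib


class MirroredBlock:
    """A block stored as several copies plus one checksum of the original."""

    def __init__(self, data: bytes, copies: int = 2):
        self.checksum = hashlib.sha256(data).digest()
        self.copies = [bytearray(data) for _ in range(copies)]

    def _is_good(self, copy: bytearray) -> bool:
        return hashlib.sha256(copy).digest() == self.checksum

    def read(self) -> bytes:
        """Return the first copy that verifies, repairing any that do not."""
        good = next((bytes(c) for c in self.copies if self._is_good(c)), None)
        if good is None:
            raise IOError("all copies fail their checksum")
        for i, copy in enumerate(self.copies):
            if not self._is_good(copy):
                self.copies[i] = bytearray(good)  # rewrite from the good copy
        return good
```

With two copies, one flipped bit is repaired on the next read; the block is lost only if every copy is damaged.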

So better to use ext4 and let it silently corrupt your data?

ZFS does indeed catch memory errors. If you are running without ECC, most filesystems will happily write that corrupt data to disk. Unless the corruption is in the metadata, you will be none the wiser.
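
A toy comparison of the two behaviours, as a sketch only: `PlainStore`, `ChecksummedStore` and `flip_bit` are invented names, and the model only covers a flip that happens after the checksum was computed. A flip in RAM before that point gets checksummed along with the rest and passes verification, which is where ECC still matters.

```python
import hashlib


class PlainStore:
    """Stores blocks with no integrity check, like a non-checksumming FS."""

    def __init__(self):
        self.blocks = {}

    def write(self, key: str, data: bytes) -> None:
        self.blocks[key] = bytearray(data)

    def read(self, key: str) -> bytes:
        return bytes(self.blocks[key])


class ChecksummedStore(PlainStore):
    """Keeps a checksum apart from the data and verifies it on every read."""

    def __init__(self):
        super().__init__()
        self.sums = {}

    def write(self, key: str, data: bytes) -> None:
        self.sums[key] = hashlib.sha256(data).digest()
        super().write(key, data)

    def read(self, key: str) -> bytes:
        data = super().read(key)
        if hashlib.sha256(data).digest() != self.sums[key]:
            raise IOError(f"checksum mismatch on block {key!r}")
        return data


def flip_bit(store: PlainStore, key: str, bit: int) -> None:
    """Simulate a single bit flip in a stored block."""
    store.blocks[key][bit // 8] ^= 1 << (bit % 8)
```

After `flip_bit`, the plain store hands back the altered bytes without complaint, while the checksummed store raises instead of returning them.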

ZFS has seen me through 6 disk failures since I started using it on Nexenta about 10 years ago; zero data loss.

It's not a backup by itself, but it makes a fine backup target if it's located somewhere else, since it's both redundant (hard to lose data by accident) and snapshotted (hard to lose data by mistake) - it was my local CrashPlan target (alongside cloud) back when CrashPlan supported home users.