
Comment by craftkiller

5 days ago

No need to keep it to yourself. As you've mentioned, all of these requirements are misinformation, so you can ignore people who repeat them (or, even better, tell them to stop spreading misinformation).

For those not in the know:

You don't need to use enterprise-quality disks. There is nothing in the ZFS design that requires enterprise-quality disks any more than any other file system does. In fact, ZFS has saved my data through multiple consumer-grade HDD failures over the years, thanks to raidz.

The 1 gig of RAM per TB figure ONLY applies when using the ZFS dedup feature, which is widely regarded as a bad idea except in VERY specific use cases. 99.9% of ZFS users should not and will not use dedup, and therefore they do not need ridiculous piles of RAM.

There is nothing in the design of ZFS that makes it any more dangerous to run without ECC than any other filesystem. ECC is a good idea regardless of filesystem, but it's certainly not a requirement.

And you don't need 5x disks of redundancy. It runs great and has benefits even on single-disk systems like laptops. Naturally, having parity drives is better in case a drive fails, but on single-disk systems you still benefit from the checksumming, snapshotting, boot environments, transparent compression, incremental zfs send/recv, and cross-platform native encryption.

One reason it might be a good idea to use higher-quality drives with ZFS is that, in some scenarios, ZFS can end up doing more writes to the drive than other file systems do. This can be a problem for some QLC and TLC drives that have low endurance.

I'm in the process of setting up a server at home and was testing a few different file systems. I was doing a test where I had a program continuously writing just a single byte, synchronously, every second (like might happen for some programs that are writing logs fairly continuously). For most of my tests I was just using the default settings for each file system. When using ext4 this resulted in 28 KB/s of actual writes being done to the drive, which seems reasonable given 4 KB blocks needing to be written, journaling, metadata, etc. BTRFS generated 68 KB/s of actual writes, which still isn't too bad. With ZFS, the best I could get after trying various volblocksize, ashift, logbias, atime, and compression settings was still 312 KB/s of actual writes to the drive, which I was not pleased with. At the rate ZFS was writing data, over a 10 year span that same program running continuously would result in about 100 TB of writes being done to the drive which is about a quarter of what my SSD is rated for.
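
For anyone who wants to reproduce that kind of test, the writer side is roughly this sketch (a minimal sketch in Python; the path is a placeholder, and the device-level write rate has to be measured separately, e.g. with iostat or the drive's SMART counters):

    import os
    import time

    # Minimal sketch of the test described above: synchronously append one byte
    # per second and see how much actually gets written at the device level.
    # The path is a placeholder for wherever the filesystem under test is mounted.
    fd = os.open("/mnt/fs-under-test/one-byte.log",
                 os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        while True:
            os.write(fd, b"x")  # one byte...
            os.fsync(fd)        # ...flushed synchronously, like an app writing a log
            time.sleep(1)
    finally:
        os.close(fd)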

  • One knob you could change that should radically alter that is zfs_txg_timeout, which is how many seconds ZFS will accumulate writes before flushing them out to disk. The default is 5 seconds, but I usually increase mine to 20. When you're writing a lot of data it gets flushed to disk more often anyway (based on how much dirty data has built up), so this timer only matters when you're writing small amounts of data like the test you just described. (There's a quick sketch of checking and raising it at the end of this comment.)

    > like might happen for some programs that are writing logs fairly continuously

    On Linux, I think journald would be aggregating your logs from multiple services so at least you wouldn't be incurring that cost on a per-program basis. On FreeBSD with syslog we're doomed to separate log files.

    > over a 10 year span that same program running continuously would result in about 100 TB of writes being done to the drive which is about a quarter of what my SSD is rated for

    I sure hope I've upgraded SSDs by the year 2065.
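
    A minimal sketch of checking and raising that knob, assuming Linux/OpenZFS where it's exposed as a module parameter (on FreeBSD the equivalent is the vfs.zfs.txg.timeout sysctl). It needs root and resets at reboot unless you also make it persistent via /etc/modprobe.d:

        # Read and raise zfs_txg_timeout on Linux/OpenZFS.
        from pathlib import Path

        param = Path("/sys/module/zfs/parameters/zfs_txg_timeout")
        print("current txg timeout (seconds):", param.read_text().strip())  # default: 5
        param.write_text("20\n")  # accumulate up to 20 seconds of writes before flushing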

    • > One knob you could change that should radically alter that is zfs_txg_timeout, which is how many seconds ZFS will accumulate writes before flushing them out to disk.

      I don't believe that zfs_txg_timeout setting would make much of a difference for the test I described, since the writes were synchronous: each fsync gets committed to disk (via the ZIL) right away instead of waiting for the next transaction group.

      > On Linux, I think journald would be aggregating your logs from multiple services so at least you wouldn't be incurring that cost on a per-program basis.

      The server I'm setting up will be hosting several VMs running a mix of OSes and distros and running many types of services and apps. Some of the logging could be aggregated, but there will be multiple types of I/O (various types of databases, app updates, file serving, etc.), and I wanted to get an idea of how much file system overhead there might be in a worst-case kind of scenario.

      > I sure hope I've upgraded SSDs by the year 2065.

      Since I'll be running a lot of stuff on the server, I'll probably have quite a bit more writing going on than in the test I described, so if I used ZFS I believe the SSD could reach its rated endurance in just several years.

    • >I sure hope I've upgraded SSDs by the year 2065.

      My mind jumped at that too when I first read parent's comment. But presumably he's writing other files to disk too. Not just that one file. :)


> The 1 gig of RAM per TB figure ONLY applies when using the ZFS dedup feature, which is widely regarded as a bad idea except in VERY specific use cases. 99.9% of ZFS users should not and will not use dedup, and therefore they do not need ridiculous piles of RAM.

You also really don't need 1 GB of RAM per TB unless you have a very high write volume. YMMV, but my experience is that it's closer to 1 GB per 10 TB.
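
If you want to see what your system actually uses rather than going by a rule of thumb, here's a minimal sketch, assuming Linux/OpenZFS (which exposes ARC statistics under /proc/spl/kstat/zfs/arcstats):

    # Report how much RAM the ARC is currently using and its configured ceiling.
    with open("/proc/spl/kstat/zfs/arcstats") as f:
        stats = {}
        for line in f.readlines()[2:]:  # skip the two kstat header lines
            name, _kind, value = line.split()
            stats[name] = int(value)

    print(f"ARC size:  {stats['size'] / 2**30:.1f} GiB")
    print(f"ARC limit: {stats['c_max'] / 2**30:.1f} GiB")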