Comment by rleigh

6 years ago

Not sure where that belief comes from, but it might be that many benchmarks are naive and compare it against other filesystems in single-disc setups with zero tuning. Since its metadata overheads are higher, it's definitely slower in that scenario. However, put a pool onto an array of discs and tune it a little, and the performance scales up and up, leaving all the Linux-native filesystems, and LVM/dm/mdraid, well behind. It's a shame that Linux has nothing compelling of its own that does better.

Last time I used ZFS, write performance was terrible compared to an ordinary RAID5. IIRC, writes in a raidz are always limited to a single disk's performance. The only way to get better write speed is to combine multiple raidzs, which means you need a boatload of disks.
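
For what it's worth, "combining multiple raidzs" just means giving the pool more than one raidz vdev, so writes stripe across them. A rough sketch with made-up device names:

    # a pool that starts as a single 4-disk raidz...
    zpool create tank raidz sda sdb sdc sdd
    # ...and gains a second raidz vdev; new writes then stripe across both
    zpool add tank raidz sde sdf sdg sdh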

  • We had a bunch of Thumpers (Sun Fire X4500) with 48 disks at work, running ZFS on Solaris. It was dog slow and awful; tuning performance was complicated and took ages. One had to use just the right disks in just the right order in raidzs, with striping over them. Swap in a hotspare and things slowed to a crawl (i.e. not even Gbit/s).

    After EoL a colleague installed Linux with dmraid, LVM and xfs on the same hardware: much faster, more robust. Sorry, don't have numbers around anymore, stuff has been trashed since.

    Oh, and btw, snapshots and larger numbers of filesystems (which Sun recommended instead of the missing quota support) also slowed things down to a crawl. ZFS is nice on paper and maybe nice to play with. It's definitely simpler to use than anything else. But performance-wise it sucked big time, at least on Solaris.

    • ZFS, on Solaris, not robust?

      ZFS for “play”?!

      This... is just plain uninformed.

      Not just me and my employer, but many (many) others rely on ZFS for critical production storage, and have done so for many years.

      It's actually very robust on Linux as well - the fact that FreeBSD has started to use the ZoL code base is quite telling.

      Would FreeBSD be in the “play” and “not robust” category as well, hanging out together with Solaris?

      Will it outperform everything else in terms of writes/s? Most likely not - although by staying away from de-dup, giving it enough RAM, and adhering to the pretty much general recommendation to use mirror vdevs only in your pools, it can be competitive.
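
      Mirror pools are also about the simplest layout to create - a minimal sketch with made-up device names, where each pair is a mirror vdev and writes stripe across the pairs:

          # two 2-way mirror vdevs; ZFS stripes writes across the two mirrors
          zpool create tank mirror sda sdb mirror sdc sdd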

      Something solid with data integrity guarantees? You can’t beat ZFS, imo.

    • Sounds like you turned on dedupe, or had an absurdly wide stripe size. You do need to match your array structure to your needs as well as tune ZFS.

      Our backup servers (45 disks, 6-wide Z2 stripes) easily handle wire-speed 10G with a 32G ARC.

      And you're just wrong about snapshots and filesystem counts.

      ZFS is no speed demon, but it performs just fine if you set it up correctly and tune it.
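
      If you want to check an existing pool for exactly those two problems, something like this should do it (pool name made up):

          zpool status tank    # shows the vdev layout, i.e. how wide each raidz stripe is
          zpool list tank      # DEDUP column stays at 1.00x if dedup was never enabled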

    • ZFS performs quite well if you give it boatloads of RAM. It uses its own cache layer (the ARC) and will happily eat most of your RAM. XFS, OTOH, is as fast as the hardware can go with any amount of RAM.
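
      On ZFS on Linux, at least, you can watch the ARC and cap it - a rough sketch (the 8 GiB cap is just an example, and you need root):

          # current ARC size and ceiling, in bytes
          grep -E '^(size|c_max)' /proc/spl/kstat/zfs/arcstats

          # cap the ARC at 8 GiB via a module option (applies at module load / next boot)
          echo 'options zfs zfs_arc_max=8589934592' > /etc/modprobe.d/zfs.conf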

    • "After EoL a colleague installed Linux with dmraid, LVM and xfs on the same hardware: much faster, more robust."

      Please let me know which company this is, so I can ensure that I never end up working there by accident. Much obliged in advance, thank you kindly.

    • dmraid raid5/6 lose data, sometimes catastrophically, in normal failure scenarios that the ZFS equivalent handles just fine. If a sector goes bad between the time when you last scrubbed and the time when you get a disk failure (which is pretty much inevitable with modern disk sizes), you're screwed.
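
      Which is also an argument for scrubbing on a regular schedule, whatever the RAID. A rough sketch as a weekly root crontab entry, with a made-up pool name and a path that may differ on your distro:

          # scrub the pool every Sunday at 03:00; zpool status then shows anything it repaired
          0 3 * * 0 /usr/sbin/zpool scrub tank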

  • > Writes in a raidz are always limited to a single disk’s performance

    what? no. why would that be the case? You lose a single disk's performance due to the checksumming.

    Just from my personal NAS, I can tell you that I can do transfers from my scratch drive (an NVMe SSD) to the storage array at more than twice the speed of any individual drive in the array... and that's with rsync, which is notably slower than a "native" mv or cp.

    The one thing I will say is that it does struggle to keep up with NVMe SSDs; otherwise I've always seen it run at drive speed on anything spinning, no matter how many drives.

    • > what? no. why would that be the case? You lose a single disk's performance due to the checksumming.

      I think they are probably referring to the write performance of a RAIDZ VDEV being constrained by the performance of the slowest disc within the VDEV.
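
      To put rough numbers on it: every record in a raidz is spread across the whole vdev, so a 6-wide raidz2 of drives that each do ~150 MB/s and ~100 IOPS can stream sequential writes at roughly 4 × 150 = 600 MB/s, but small random writes stay near the ~100 IOPS of a single drive - which is probably the "single disk's performance" being referred to.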

Have you got any info on how to do the required tuning that's geared towards a home NAS?

  • Group your disks in bunches of 4 or 5 per raidz, no more, and have each bunch on the same controller or SAS expander. Use striping over the bunches. Don't use hotspares; for performance, maybe avoid double-parity raidz2. Try things out and benchmark a lot. Get more RAM, lots more RAM.
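
    To make that concrete, a rough sketch for a hypothetical 8-bay box - device names, dataset name and property values are just examples, so benchmark your own workload:

        # two 4-disk raidz vdevs striped, with 4K-sector alignment
        zpool create -o ashift=12 tank raidz sda sdb sdc sdd raidz sde sdf sdg sdh

        # cheap, commonly recommended dataset tweaks for a home NAS
        zfs set compression=lz4 tank
        zfs set atime=off tank
        zfs create -o recordsize=1M tank/media   # e.g. for large sequential files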