
Comment by HorstG

6 years ago

We had a bunch of Thumpers (SunFire X4200) with 48 disks at work, running ZFS on Solaris. It was dog slow and awful, tuning performance was complicated and took ages. One had to use just the right disks in just the right order in RaidZs with striping over them. Swap in a hotspare: things slow to a crawl (i.e. not even Gbit/s).

After EoL a colleague installed Linux with dmraid, LVM and xfs on the same hardware: much faster, more robust. Sorry, don't have numbers around anymore, stuff has been trashed since.

Oh, and btw., snapshots and larger numbers of filesystems (which Sun recommended instead of the missing Quota support) also slow things down to a crawl. ZFS is nice on paper and maybe nice to play with. Definitely simpler to use than anything else. But performance-wise it sucked big time, at least on Solaris.
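
For anyone who hasn't seen that pattern: it meant one filesystem per user or project, each carrying its own quota property. Roughly like this (pool and user names are made up; later ZFS versions grew a userquota property that makes the whole dance unnecessary):

    # one filesystem per user, each with its own quota (names are placeholders)
    zfs create -o quota=10G tank/home/alice
    zfs create -o quota=10G tank/home/bob

    # what later ZFS versions allow instead: per-user quotas on one filesystem
    zfs set userquota@alice=10G tank/home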

ZFS, on Solaris, not robust?

ZFS for “play”?!

This... is just plain uninformed.

Not just me and my employer, but many (many) others rely on ZFS for critical production storage, and have done so for many years.

It’s actually very robust on Linux as well - the fact that FreeBSD has started using the ZoL code base is quite telling.

Would FreeBSD also be in the “play” and “not robust” category, hanging out together with Solaris?

Will it outperform everything else in terms of writes/s? Most likely not - although by staying away from dedup, giving it enough RAM, and adhering to the pretty much universal recommendation to use only mirror vdevs in your pools, it can be competitive.
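
For illustration, that recommendation boils down to something like this (pool and device names are made up, adjust to taste):

    # pool built purely from mirror vdevs, no RAID-Z, dedup left off
    zpool create tank \
      mirror /dev/sdb /dev/sdc \
      mirror /dev/sdd /dev/sde
    zfs set dedup=off tank   # the default, but worth stating: don't turn it on casually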

Something solid with data integrity guarantees? You can’t beat ZFS, imo.

  • > Something solid with data integrity guarantees? You can’t beat ZFS, imo.

    This reminds me. We had one file server used mostly for package installs that used ZFS for storage. One day our java package stops installing. The package had become corrupt. So I force a manual ZFS scrub. No dice. Ok fine I’ll just replace the package. It seems to work but the next day it’s corrupt again. Weird. Ok I’ll download the package directly from Oracle again. The next day again it’s corrupt. I download a slightly different version. No problems. I grab the previous problematic package and put it in a different directory (with no other copies on the file system) - again it becomes corrupt.

    There was something specific about the java package that ZFS just thought it needed to “fix”. If I had to guess it was getting the file hash confused. I’m pretty sure we had dedupe turned on so that may have factored into it.

    Anyway that’s the first and only time I’ve seen a file system munge up a regular file for no reason - and it was on ZFS.
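
    If I hit that today, the first things I'd check would be something along these lines (dataset and file names here are placeholders, not what was on that box):

        # was dedup actually enabled on that dataset?
        zfs get dedup tank/packages
        # dedup table (DDT) statistics for the pool
        zpool status -D tank
        # compare the file against the vendor's published checksum
        sha256sum the-package.tar.gz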

  • Performance wasn't robust, especially on dead disks and rebuilds, but also on pools with many (>100) filesystems or snapshots. Performance would often degrade heavily and unpredictably on such occasions. We didn't lose data more often than with other systems.

    "play" comes from my distinct impression that the most vocal ZFS proponents are hobbyists and admins herding their pet servers (as opposed to cattle). ZFS comes at low/no cost nowadays and is easy to use, therefore ideal in this world.

    • Fair enough, I can’t argue with your personal experience, but I can assure you that ZFS is used “for real” at many shops.

      I’ve only used ZFS in two- or three-way mirror setups, on beefy boxes, where the issues you describe are minimal. Also JBOD only.

      The thing is that without checksumming you actually have no idea whether you’re losing data. I’ve had several pools over the years report automatic resilvering on checksum mismatches. Usually it’s been disks acting up well before SMART can tell, and having that reported has been invaluable.
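
      Concretely, the kind of check I mean looks roughly like this (pool name is a placeholder):

          # per-device READ/WRITE/CKSUM counters and any affected files
          zpool status -v tank
          # once the flaky disk has been dealt with, reset the counters
          zpool clear tank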

Sounds like you turned on dedupe, or had an absurdly wide stripe size. You do need to match your array structure to your needs as well as tune ZFS.

Our backup servers (45 disks, 6-wide Z2 stripes) easily handle wire-speed 10G with a 32G ARC.
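
For the curious, that layout is roughly the following (disk names are placeholders; keep adding 6-disk raidz2 groups until the chassis is full):

    # backup pool built from 6-wide raidz2 vdevs
    zpool create backup \
      raidz2 sdb sdc sdd sde sdf sdg \
      raidz2 sdh sdi sdj sdk sdl sdm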

And you're just wrong about snapshots and filesystem counts.

ZFS is no speed demon, but it performs just fine if you set it up correctly and tune it.

  • Stripe size could have been a problem, though we just went with the default there, AFAIR. The first tries mostly just followed the Sun docs; later we only changed things until performance was sufficient. Dedupe wasn't even implemented back then.

    Maybe you also don't see as massive an impact because your hardware is a lot faster. X4200s were predominantly meant to be cheap, not fast. No cache, insufficient RAM, slow controllers, etc.

    • X4200s were the devil's work. Terrible BMC, terrible RAID controller - even the disk caddies were poorly designed.

      The BMC controller couldn't speak to the disk controller so you had no out-of-band storage management.

      I had to run a fleet of 300 of them; truly an awful time.

ZFS performs quite well if you give it boatloads of RAM. It uses its own cache layer (the ARC) and eats RAM for breakfast. XFS OTOH is as fast as the hardware can go with any amount of RAM.
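
For reference, on Linux/ZoL the ARC can be capped via a module parameter if it shouldn't eat all the RAM; a rough sketch (the 16 GiB value is just an example):

    # as root: cap the ARC at 16 GiB (value in bytes, applied at module load)
    echo "options zfs zfs_arc_max=17179869184" > /etc/modprobe.d/zfs.conf
    # current ARC size vs. its target can be read from the kernel stats
    grep -w -e size -e c_max /proc/spl/kstat/zfs/arcstats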

  • Sort of. But no snapshots.

    Wanna use LVM for snapshots? A 33% performance hit for the entire LV per snapshot - that's inherent to the implementation.

    ZFS? ~1% hit. I've never been able to see any difference at the workloads I run, whereas with LVM it was pervasive and inescapable.
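
    For comparison, the two operations side by side (volume group, LV and dataset names are made up):

        # classic (non-thin) LVM snapshot: needs a preallocated CoW area, and every
        # write to the origin LV first copies the old block into it
        lvcreate --snapshot --size 20G --name data_snap vg0/data

        # ZFS snapshot: just a new named reference to existing blocks, nothing is copied
        zfs snapshot tank/data@before-upgrade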

    • That was with the old LVM snapshots. Modern CoW snapshots have a much smaller impact. Plus XFS developers are working on internal snapshots, multi-volume management, and live fsck (live check already works, live repair to come).

    • I don't doubt this but do you have any documentation?

      Asking for a friend who uses XFS on LVM for disk-heavy applications like databases, file servers, etc.


"After EoL a colleague installed Linux with dmraid, LVM and xfs on the same hardware: much faster, more robust."

Please let me know which company this is, so I can ensure that I never end up working there by accident. Much obliged in advance, thank you kindly.

  • Why? What is bad about playing around with leftover hardware?

    • Nothing at all; it's what was done to that hardware that's the travesty here. It takes an extraordinary level of incompetence and ignorance to even get the idea to slap Linux with dmraid and LVM on that hardware, and then to claim it was faster and more robust without understanding how unreliable and fragile that constellation is - it was faster precisely because all the reliability was gone.

dmraid raid5/6 loses data, sometimes catastrophically, in normal failure scenarios that the ZFS equivalent handles just fine. If a sector goes bad between the time you last scrubbed and the time a disk fails (which is pretty much inevitable with modern disk sizes), you're screwed.
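
The usual mitigation on the ZFS side is simply scrubbing on a schedule, so latent sector errors get repaired from redundancy while the redundancy still exists. Something like this (cron entry; pool name is a placeholder, adjust the zpool path for your distro):

    # scrub the pool every Sunday at 03:00
    0 3 * * 0  /usr/sbin/zpool scrub tank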