Comment by reacharavindh

6 years ago

As a heavy user of ZFS and Linux, what else is there that even comes close to what ZFS offers?

I want cheap and reliable snapshots, export & import of file systems like ZFS datasets, simple compression, caching facilities (like SLOG and ARC), and decent performance.
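
For readers who haven't used it, this is roughly what that feature set looks like from the ZFS CLI; the pool, dataset, and device names are all hypothetical:

    # Mirrored pool with an NVMe SLOG and an SSD L2ARC (the ARC itself is just RAM).
    zpool create tank mirror /dev/sda /dev/sdb log /dev/nvme0n1 cache /dev/nvme1n1
    zfs set compression=lz4 tank              # cheap inline compression
    zfs snapshot tank@before-upgrade          # near-free snapshot
    zfs send tank@before-upgrade | ssh backuphost zfs recv backup/tank   # export/import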

Bcachefs is probably the only thing that will get there. The codebase is clean and well maintained, built from solid technology (bcache), and will include most of the ZFS niceties. I just wish more companies would sponsor the project and stop wasting money on BTRFS.

  • Yes, I’m eagerly waiting for Bcachefs to get there at some point, but, if my understanding of its state is correct, it is several years away (rightly so, because it is hard and the developer is doing an amazing job).

    I have heard of durability issues with btrfs, and do not want to touch it if it fails at its primary job.

  • Which is why ZFS is still a thing today - there are no other alternatives. Everything else is coming "soon", while ZFS is actually here and clocking up a stability track record.

  • >Bcachefs is probably the only thing that will get there.

    Or Bcachefs is probably the only thing that might get there.

    The amount of engineering hours that went into ZFS is insane. It is easy to build a project that looks 80% similar on the surface, but then the last 20% and the edge cases take as much time as going from 0 to 80% did. ZFS has been battle-tested by many. Rsync.net runs on ZFS.

    The petabytes stored safely on ZFS over the years give peace of mind.

    Speaking of rsync.net, a ZFS topic on HN will normally have its founder resurface. I haven't seen a reply from him yet.

  • I’m looking forward to bcachefs becoming feature complete and upstreamed. We finally have a good chance of having a modern and reliable FS in the Linux kernel. My wish list includes snapshots and per volume encryption.

  • What if the main purpose of BTRFS is to have something "good enough" so no one starts working on a project that can compete with large commercial storage offerings?

    Does anyone remember the parity patches they rejected in 2014?

    > Your work is very very good, it just doesn’t fit our business case.

    I haven't followed it much. Does it have anything more than mirroring (that's stable) these days?

  • >stop wasting money on BTRFS

    You're saying they should stop supporting a project that was already considered stable by the time the other one started development. Why do that? What makes Bcachefs the better choice?

    • Take a cursory look at both codebases, and at the stability of every feature at launch and in maintenance. It's not hard to see BTRFS is a doomed project. Bcachefs is more like PostgreSQL: the developer doesn't add features until he has a solid, well-thought-out design. Hence why he hasn't implemented snapshots yet.

      I don't think too many people consider it stable enough for production, either. (Unless you count a very limited subset of its functionality).

      I'd rather run Bcachefs today than Btrfs, by a mile. At least with bcachefs I won't lose my data.

    • Btrfs is the only FS I've used that resulted in complete FS corruption, losing nearly all the data on disk, not once but three times.

      After that, none of the features like compression, snapshots, COW or checksums meant anything to me. I'm much happier with ext4 and xfs on lvm.

    • I don't think BTRFS has ever been considered stable.

      I think they just said: "The on-disk data structure is stable" and lots of people misinterpreted that as "the whole thing is stable"

      A stable on-disk data structure just means it's been frozen and can't be changed in non-backwards-compatible ways. It says nothing about code quality, feature completeness, or whether the frozen data structure was any good.

  • Snapshots don't seem to be done yet.

    • Kent has admitted (many times) that snapshots are one of the more difficult features to add in a reliable and safe way, and that they will require significant work to do right, especially for what he wants them to do (I assume "really damn fast and low overhead" is a major goal, plus some other tricks he has up his sleeve). So he has intentionally not tackled them yet, instead going after a slew of other features first: reflink, full checksumming, replication, caching, compression, native encryption, etc. All of that works today (rough sketch below).

      Snapshots are a huge feature for sure, but it's not like bcachefs is completely incapable without them.

      There was a very recent update he gave in late December (2019) that mentioned he's actively chipping away at the roadblocks for snapshots.
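
      For reference, a rough sketch of exercising those shipped features with bcachefs-tools; I'm writing the flags from memory and the device names are hypothetical, so double-check against bcachefs format --help:

          # Two-device filesystem with zstd compression, two replicas of
          # everything, and native encryption (checksumming is on by default).
          bcachefs format --compression=zstd --replicas=2 --encrypted /dev/sdb /dev/sdc
          mount -t bcachefs /dev/sdb:/dev/sdc /mnt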

I know this isn't an option for everyone, but this is part of why I run FreeBSD instead of Linux for servers where I need ZFS.

  • This isn't why I started running FreeBSD, but it is also one of the reasons I continue to run FreeBSD.

    • Yes, I run Linux for business but keep a FreeBSD box personally, so I stay used to it in case I need ZFS for business.

I agree that ZFS has a lot to offer. But the legal difficulties in merging ZFS support into the mainline kernal are understandable. It's a shame but I think he is making the right call.

  • Merging into the mainline kernel is not what the person he is replying to was even asking for. All they were asking is for Linux to stop putting APIs behind GPL-only export restrictions (a kind of DRM) that prevent non-GPL modules like ZFS from using them. That doesn't mean ZFS must be bundled with Linux.

    I think everyone is in agreement that ZFS can't be included in the mainline kernel. The question is just whether users should be able to install and use it themselves or not.

  • Kernal? If you can merge ZFS support into an 8 KB KERNAL, then you are no mere mortal, so no need to worry about any legal difficulties.

XFS on an LVM thin-pool LV should give you a very robust FS, cheap CoW snapshots, and multi-device support. If you want, you can put the thin pool on RAID via LVM RAID underneath it.

For import/export, IIRC XFS supports it via xfsdump/xfsrestore, and you can dump from an LV snapshot to get atomicity.

For caching there is LVM cache; it should again be possible to combine it with the thin pool & RAID, or you can use it separately for a normal LV.

All this is functionality tested by years of production use.
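
For the curious, a minimal sketch of that stack; the volume group, LV names, sizes, and devices are all hypothetical:

    # Thin pool inside an existing VG, a thin LV with XFS on it,
    # and an instant CoW snapshot of that LV.
    lvcreate --type thin-pool -L 1T -n pool0 vg0
    lvcreate --thin -V 500G -n data vg0/pool0
    mkfs.xfs /dev/vg0/data
    lvcreate --snapshot --name data-snap vg0/data
    lvchange -ay -K vg0/data-snap     # thin snapshots skip activation by default

    # Attach a fast SSD as an LVM cache to a normal LV.
    lvcreate --type cache-pool -L 100G -n cpool vg0 /dev/nvme0n1
    lvconvert --type cache --cachepool vg0/cpool vg0/bulk

    # Atomic backup: mount the snapshot (nouuid, since XFS refuses duplicate
    # UUIDs) and dump it with xfsdump.
    mount -o ro,nouuid /dev/vg0/data-snap /mnt/snap
    xfsdump -l 0 -L datasnap -M backup0 -f /backup/data.dump /mnt/snap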

For compression/deduplication, that is AFAIK a work in progress upstream, based on the open-sourced VDO code.
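
Until that lands upstream, the downstream vdo manager (as shipped on RHEL/Fedora) looks roughly like this; the name, device, and logical size are hypothetical:

    # Create a dedup/compression layer under the filesystem with VDO.
    vdo create --name=vdo0 --device=/dev/sdf --vdoLogicalSize=10T
    mkfs.xfs -K /dev/mapper/vdo0   # -K skips discards at mkfs time, per the VDO docs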

  • Interesting - a combination of tools I have used independently, but never as a replacement for my beloved ZFS.

    Never made snapshots with LVM. I've always used LVM as a way to carve up logical storage from a pool of physical devices, but nothing more. I need to RTFM on how snapshotting would work there - could I restore just a few files from an hour ago while leaving everything else as it is?

    With ZFS, I use RAM as a read cache (ARC) and an Optane disk as a sync write cache (SLOG). I wonder if LVM cache would let me do such a thing. Again, a pointer for more manual reading for me.

    Compression is a nice to have for me at this moment. Good to know that it is being worked on at the LVM layer.

  • Call me when somebody like a major cloud provider has used this system to drive millions of hard drives. I'm not going to patch my data security together like that.

    There is a difference between 'all these tools have been used in production' and 'this is an integrated tool that has been used for 15+ years in the biggest storage installations in the world'.

    • Yes! The problem with the LVM approach to replicating anything ZFS does is that you have to use a myriad of different tools. Then you have to pray that they all work correctly together; if one has a bug, you may lose all your data because of the corruption that emerges from it.

Honestly asking: how does Btrfs compare to ZFS?

There's also Lustre but it's a different beast altogether for a different scenario.

  • On the surface, btrfs is pretty close to zfs.

    Once you actually use them, you discover all the ways that btrfs is a pain and zfs is a (minor) joy:

    - snapshot management

    - online scrub

    - data integrity

    - disk management

    I lost data from perfectly healthy-appearing btrfs systems twice. I've never lost data on maintained zfs systems, and I now trust a lot more data to zfs than I ever have to btrfs.

    • At least disk management is far easier with btrfs: you can restripe at will (sketch below), while zfs has severe limitations around resizing, adding, and removing devices.

      Granted, at enterprise scale this hardly matters because you can just send-receive to rebuild pools if you have enough spares, but for consumer-grade deployments it's a non-negligible annoyance.
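
      For reference, that restriping is a single online operation; the mount point and target profile here are hypothetical:

          # Add a disk, then convert data and metadata to RAID10 while mounted.
          btrfs device add /dev/sdd /mnt/pool
          btrfs balance start -dconvert=raid10 -mconvert=raid10 /mnt/pool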

    • Since the plural of anecdote is data, I'll provide mine here. ZFS is the only file-system from which I've lost data on hardware that was functioning properly, though that does come with a caveat.

      Twice btrfs ended up in a non-mountable situation, but both times it was due to a known issue and #btrfs on freenode was able to walk me through getting it working again.

      With ZFS, I ended up with a non-mountable system, and the response in both #zfs and #zfsonlinux to my posting the error message was, "that sucks, hope you had backups." Since I had backups, and it was my laptop 2000 miles from home and my only computing device, I didn't dig deeper to see if I could discover the problem. FWIW, I've been using ZFS on that same hardware for almost 2 years since with no issues.

    • Thanks for your answer and sorry for your data loss.

      > I lost data from perfectly healthy-appearing btrfs systems twice.

      I still consider btrfs as beta-level software. This is why I never looked into it very seriously and asked this question.

      Looks like btrfs needs something like another five years to be considered serious at the scale where ZFS is just starting to warm up.

    • The one thing I can't understand about btrfs is that there is no definite answer to the question "How much disk space do I have left?". I don't get how that can be a "this much, maybe" answer.

    • btrfs is such a mess that for a database or VM to be marginally stable, you have to disable the CoW feature set for those files with the +C attribute. It's nowhere near a serious solution.
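
      For context, that workaround looks like this (path hypothetical); +C only takes effect on newly created files, which is why it is set on an empty directory that new files then inherit it from:

          # New files under this directory get NOCOW (no CoW, and thus no checksums).
          mkdir -p /var/lib/mysql
          chattr +C /var/lib/mysql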

  • Btrfs has eaten my data, and once that happens I will never, ever, literally ever go back to that system. It's unacceptable to me that a system eats data, especially after multiple rounds of 'it's stable now'.

    But in the end it always turns out that it will only spare your data if you 'use' it correctly.

    I used ZFS for far longer and had far fewer issues.

Stratis and VDO have a lot of promise, although it's still a little early. The approach that Stratis has taken is refreshing. It's very simple and reuses lots of already existing stuff so by the time it's released it will already be mature (since the underlying code has been running for many years).

Once a little more guidance comes out about how to properly use VDO and Stratis together, I'll move my personal stuff to it.

So besides the obvious btrfs answer, what about Ceph as clustered storage with very fast connectivity?

There is also BeeGFS, I haven't used it but /r/datahoarders sometimes touts it.

Not for Linux, but I have been keeping an eye on Matt Dillon's DragonFly BSD, where he has been working on HAMMER2, which is very interesting.

I don't know much about it, but bcachefs has been making more waves lately as well.

I think the bottom line is that people need to have good backup in place regardless.

Does btrfs meet your requirements?

  • I've tried btrfs without much luck.

    btrfs still has a write hole for RAID5/6 (the kind I primarily use) [0] and has since at least 2012.

    For a filesystem to have a bug leading to data loss unpatched for over 8 years is just plain unacceptable.

    I've also had issues even without RAID, particularly after power outages. Not minor issues but "your filesystem is gone now, sorry" issues.

    [0]: https://btrfs.wiki.kernel.org/index.php/RAID56

    • It's not a bug, but an unimplemented feature. They never made any promise that raid5 is production-ready.

      Pretty much all software-RAID systems suffer from it unless they explicitly patch over it via journaling. Hardware RAID gets away with it if it has battery backup; if it doesn't, it suffers from exactly the same problem.

    • My home NAS runs btrfs in RAID 5. The key is to use software RAID / LVM to present a single block device to btrfs. That way you never use btrfs's screwed-up RAID 5/6 implementation.
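
      A minimal sketch of that layout, with hypothetical device names; note that btrfs on a single md device can still detect corruption via its checksums, but cannot self-heal it.

          # Four disks in an md RAID5 array, presented to btrfs as one device.
          mdadm --create /dev/md0 --level=5 --raid-devices=4 /dev/sd[b-e]
          mkfs.btrfs /dev/md0
          mount /dev/md0 /mnt/nas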

    • Why use RAID5/6? RAID10 is much safer because you drastically reduce the chance of a cascading resilvering failure. Yes, you get less capacity per drive, but drives are (relatively) cheap.

      I thought I wanted RAID5, but after reading horror stories of drives failing when replacing a failed drive, I decided it just wasn't worth the risk.

      I currently run RAID1, and when I need more space, I'll double my drives and set up RAID10. I don't need most of the features of ZFS, so BTRFS works for me.

  • btrfs is not at all reliable, so if you care about your files staying working files, it probably doesn't meet your requirements. It is like the MongoDB 0.1 of filesystems.

Hardware RAID controllers can do most if not all of these things.

  • I've lost more data in hardware RAID than in ZFS but I have lost data in both.

    Hardware RAID has very poor longevity. Vendor support and battery-backup replacement collide badly with BIOS and host management.

    Disclaimer: I work on Dell rackmounts, which means that rather than native SAS I get 'Dell's hack on SAS', which is a problem; I know it's possible to 'downgrade' back to native.

    • Yeah we started ordering the ones with the supercap so we didn’t have to replace batteries anymore.

      Somewhat recently I dealt with LSI and Dell cards. Longevity seemed just fine for a normal 3 year server lifecycle. The only time we had an issue is when the power went down in the data center. The power spike fried a few of the cards. Luckily we had spares.

      Way way back I dealt with the Compaq/hp smartarrays. Those were awful. Also anything consumer grade is awful.

  • The problem with most of these is you have to bring the system down to do maintenance. You can do a scrub on zfs while it's up.
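
    For reference, a scrub on a live pool is a single command (pool name hypothetical):

        # Verify every checksum in the pool in the background, then check progress.
        zpool scrub tank
        zpool status tank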

    • Most non-hobbyist RAID hardware does online-scrub just fine (not that I would recommend wasting money on such hw).

      Btw, a ZFS scrub is not only a RAID block check but also a partial fsck, so it's not really comparable.

  • Pay more for less safety and put all your data into the hands of the guy who wrote the firmware for that thing. I'm sure that software is well-maintained open-source code.