
Comment by zozbot234

3 days ago

btrfs RAID is quite infamous for eating your data. Has it been fixed recently?

To be fair, your statement could be edited as follows to increase its accuracy:

> btrfs is quite infamous for eating your data.

This is the reason for the slogan on the bcachefs website:

"The COW filesystem for Linux that won't eat your data".

https://bcachefs.org/

After over a decade of in-kernel development, Btrfs still can't give an accurate answer to `df -h`, nor reliably repair a damaged volume.

Because it can't tell a program how much space is actually free, it's trivially easy to fill a volume. In my personal experience, writing to a full volume corrupts it 100% of the time, and the resulting damage cannot be repaired.
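To illustrate the mismatch, compare what the generic tools report with what Btrfs itself reports (a sketch; /mnt/data is a hypothetical mount point):

    df -h /mnt/data                         # the generic figure programs rely on
    sudo btrfs filesystem df /mnt/data      # per-profile breakdown: Data, Metadata, System
    sudo btrfs filesystem usage /mnt/data   # allocated vs. actually used, per device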

IMHO this is entirely unacceptable in an allegedly enterprise-ready filesystem.

The fact that its RAID is even more unstable merely seals the deal.

  • > Btrfs still can't either give an accurate answer to `df -h`, or repair a damaged volume.

    > In my personal experience, writing to a full volume corrupts it irretrievably 100% of the time, and then it cannot be repaired.

    While I get the frustration, I think you could probably have resolved both of those by reading the manual. Btrfs separates metadata from regular data, which means that if you create a lot of small files your filesystem may report as 'full' while still having space available; `btrfs f df -h <path>` gives you the breakdown. Since everything is copy-on-write, the filesystem will disallow most actions rather than risk actual damage. If you run into this, you can recover by adding an additional device for metadata (it can just be a loopback image), rebalancing, fixing the root cause, and finally removing the additional device.

    It may seem daunting, but it's actually only about six commands (sketched below).
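    For example, a minimal sketch of that recovery, assuming the full filesystem is mounted at /mnt/data (a hypothetical path) and a loopback file is used as the temporary device:

        # 1. Create a small file and attach it as a loop device
        truncate -s 4G /tmp/btrfs-spare.img
        sudo losetup -f --show /tmp/btrfs-spare.img      # prints e.g. /dev/loop0

        # 2. Add it to the full filesystem so allocations can succeed again
        sudo btrfs device add /dev/loop0 /mnt/data

        # 3. Rebalance so chunks get redistributed and space is freed up
        sudo btrfs balance start -dusage=50 -musage=50 /mnt/data

        # 4. Fix the root cause (delete files, drop old snapshots, ...), then
        #    remove the temporary device and detach the loop file
        sudo btrfs device remove /dev/loop0 /mnt/data
        sudo losetup -d /dev/loop0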

    • Hi. My screen name is my real name, and my experience with Btrfs stems from the fact that I worked for SUSE for 4 years in the technical documentation department.

      What that means is I wrote the manual.

      Now, disclaimer, not that manual: I did not work on filesystems or Btrfs, not at all. (I worked on SUSE's now-axed-because-of-Rancher container distro CaaSP, and on SLE's support for persistent memory, and lots of other stuff that I've now forgotten because it was 4 whole years and it was very nearly 4 years ago.)

      I am however one of the many people who have contributed to SUSE's excellent documentation, and while I didn't write the stuff about filesystems, it is an error to assume that I don't know anything about this. I really do. I had meetings with senior SUSE people where I attempted to discuss the critical weaknesses of Btrfs, and my points were pooh-poohed.

      Some of them still stalk me on social media and regularly attack me, my skills, my knowledge, and my reputation. I block them where I can. Part of the price of being online and using one's real name. I get big famous people shouting that I am wrong sometimes. It happens. Rare indeed is the person who can refute me and falsify my claims. (Hell, rare enough is the person who knows the difference between "rebut" and "refute".)

      So, no: while I accept that there may be workarounds that a smart human can perform, I strongly suspect that these things are not accessible to software, to tools such as Zypper and Snapper.

      In my repeated direct personal experience, using openSUSE Leap and openSUSE Tumbleweed, routine software upgrades can fill up the root filesystem. I presume this is because the packaging tools can't get accurate free-space figures, probably because Btrfs can't accurately account for space used, or about to be used, by snapshots. And once the root filesystem is corrupted, it can't be turned back into a valid, consistent one using the automated tools provided.
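      For context, the space held by snapshots is only visible through Btrfs-specific tooling, which the generic packaging stack doesn't consult. A sketch of how one might inspect it on an openSUSE-style root filesystem (assumes Snapper-managed snapshots and, for the qgroup report, that quotas are enabled; both are assumptions about the particular install):

          # List the snapshots Snapper is keeping
          sudo snapper list

          # Per-subvolume space accounting; requires quotas (btrfs quota enable /)
          sudo btrfs qgroup show /

          # Overall allocated vs. used picture for the root filesystem
          sudo btrfs filesystem usage /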

      Which is why both SUSE's and Btrfs's own docs say "do not use the repair tools unless you are instructed to by an expert."


No. RAID 5/6 is still fundamentally broken and probably won't get fixed.

  • This is incorrect; quoting the Linux 6.7 release notes (Jan 2024):

    "This release introduces the [Btrfs] RAID stripe tree, a new tree for logical file extent mapping where the physical mapping may not match on multiple devices. This is now used in zoned mode to implement RAID0/RAID1* profiles, but can be used in non-zoned mode as well. The support for RAID56 is in development and will eventually fix the problems with the current implementation."

    I've not kept up with more recent releases, but there has been progress on the issue.

I believe RAID5/6 is still considered experimental (although the main issues were reportedly worked out in early 2024), and I've seen reports of large arrays being stable since then. It's still recommended to run metadata in raid1/raid1c3.
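For reference, that split is expressed with separate data and metadata profiles, either at mkfs time or via a balance conversion (a sketch; the device names and mount point are placeholders):

    # Data striped with parity, metadata mirrored across three devices
    sudo mkfs.btrfs -d raid6 -m raid1c3 /dev/sdb /dev/sdc /dev/sdd /dev/sde

    # Or convert the metadata profile of an existing filesystem in place
    sudo btrfs balance start -mconvert=raid1c3 /mnt/array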

RAID0/1/10 has been stable for a while.