Comment by liuliu

6 years ago

bcachefs should be heavily supported; it doesn't get nearly enough support for what it sets out to do: https://www.patreon.com/bcachefs

I've been looking forward to using bcachefs as I had a few bad experiences with btrfs.

Is bcachefs more-or-less ready for some use cases now? Does it still support caching layers like bcache did?

  • It's quite usable, but of course, do not trust it with your unique unbacked-up data yet. I use it as the main FS for a desktop workstation and I'm pretty happy with it. Waiting impatiently for EC (erasure coding) to be implemented for efficient pooling of multiple devices.

    Regarding caching: "Bcachefs allows you to specify disks (or groups thereof) to be used for three categories of I/O: foreground, background, and promote. Foreground devices accept writes, whose data is copied to background devices asynchronously, and the hot subset of which is copied to the promote devices for performance."
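
    For a concrete picture of how those three groups interact, here is a toy Python model (illustrative only, not bcachefs code; the group names, the hotness threshold, and the whole API are made up for the sketch):

      from collections import defaultdict

      class TieredStore:
          """Toy model of foreground/background/promote target groups."""

          def __init__(self, foreground, background, promote):
              # In practice the promote group is often the same fast devices
              # as the foreground group.
              self.tiers = {name: {} for name in (foreground, background, promote)}
              self.foreground, self.background, self.promote = foreground, background, promote
              self.read_count = defaultdict(int)

          def write(self, block, payload):
              # All writes land on the fast foreground devices first.
              self.tiers[self.foreground][block] = payload

          def rebalance(self):
              # Done asynchronously in the real FS: copy data down to the
              # slower background devices.
              self.tiers[self.background].update(self.tiers[self.foreground])

          def read(self, block, hot_threshold=3):
              self.read_count[block] += 1
              for name in (self.promote, self.foreground, self.background):
                  if block in self.tiers[name]:
                      payload = self.tiers[name][block]
                      # The hot subset of reads gets copied up to the promote devices.
                      if self.read_count[block] >= hot_threshold:
                          self.tiers[self.promote][block] = payload
                      return payload
              raise KeyError(block)

      store = TieredStore(foreground="ssd", background="hdd", promote="ssd")
      store.write(0, b"hello")
      store.rebalance()
      print(store.read(0))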

  • To my knowledge, caching layers are supported, but they require some setup and don't have much documentation right now.

    If all you need is a simple root FS that is CoW and checksummed, bcachefs works pretty well, in my experience. I've been using it productively as a root and home FS for about two years.

Many of the advanced features aren't implemented yet though, like compression, encryption, snapshots, RAID5/6....

  • Compression and encryption have been implemented, but not snapshots and RAID5/6.

  • why would you want to embed raid5/6 in the filesystem layer? Linux has battle-tested mdraid for this; I'm not going to trust a new filesystem's own implementation over it.

    Same for encryption: there are already existing crypto layers at both the block level and the filesystem level (as an overlay).

    • Because the FS can be deeply integrated with the RAID implementation. With a conventional RAID, if the data at some address differs between the two disks, there's no way for the FS to tell which copy is correct: the RAID code essentially just picks one and can't even see the other. With ZFS, for example, a checksum is stored with the data, so when you read, ZFS checks both copies and returns the correct one. It will also overwrite the incorrect copy with the correct one and log the error. It's the same kind of story with encryption: if it's built in, you can do things like incremental backups of an encrypted drive without ever decrypting it on the target.
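
      A toy sketch of that self-healing read path (illustrative Python, not ZFS code; the in-memory "mirrors" and the checksum choice are made up for the example):

        import hashlib, logging

        def checksum(data: bytes) -> bytes:
            return hashlib.sha256(data).digest()

        def self_healing_read(block, mirror_a, mirror_b, expected_sum):
            """Read one logical block from a two-way mirror, using the
            filesystem's stored checksum to decide which copy is correct."""
            a, b = mirror_a[block], mirror_b[block]
            a_ok = checksum(a) == expected_sum
            b_ok = checksum(b) == expected_sum
            if a_ok and b_ok:
                return a
            if a_ok:  # mirror B is silently corrupted
                logging.warning("repairing mirror B, block %d", block)
                mirror_b[block] = a  # overwrite the bad copy with the good one
                return a
            if b_ok:  # mirror A is silently corrupted
                logging.warning("repairing mirror A, block %d", block)
                mirror_a[block] = b
                return b
            raise IOError(f"both copies of block {block} fail checksum")

        # A plain RAID1 layer below the filesystem can't do this: it stores no
        # checksum, so it just returns whichever copy it happens to read.
        mirror_a = {0: b"good data"}
        mirror_b = {0: b"bit-rotted"}
        print(self_healing_read(0, mirror_a, mirror_b, checksum(b"good data")))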


    • > why would you want to embed raid5/6 in the filesystem layer?

      There are valid reasons, most having to do with filesystem usage and optimization. Off the top of my head:

      - more efficient re-syncs after failure (don't need to re-sync every block, only the blocks that were in use on the failed disk)

      - can reconstruct data not only when a disk self-reports an error, but also on filesystem metadata errors (CRC errors, inconsistent dentries)

      - different RAID profiles for different parts of the filesystem (think: parity raid for large files, raid10 for database files, no raid for tmp, N raid1 copies for filesystem metadata)

      and for filesystem encryption:

      - CBC ciphers have a common weakness: the block size is constant. If you use FS-object encryption instead of whole-FS encryption, the block size, offset and even the encryption keys can be varied across the disk.
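
      As a toy illustration of that last point (illustrative Python only, not how any real filesystem derives keys; a production design would use a vetted KDF such as HKDF):

        import hashlib, os

        master_key = os.urandom(32)  # in practice: unwrapped from the user's passphrase

        def object_key(inode: int) -> bytes:
            """Derive a distinct encryption key for each filesystem object.

            Keyed BLAKE2b with a per-inode personalization string shows the
            idea: the key material used on disk varies per object."""
            return hashlib.blake2b(
                b"",
                key=master_key,
                person=inode.to_bytes(8, "little"),  # varies the key per object
                digest_size=32,
            ).digest()

        # Every file gets its own key, so identical plaintext blocks in
        # different files no longer encrypt to identical ciphertext blocks.
        assert object_key(1) != object_key(2)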

    • I think even calling volume management a "layer", as though traditional storage was designed from first principles, is a mistake.

      Volume management is just a hack. We had all of these single-disk filesystems, but single disks were too small. So volume management was invented to present the illusion (in other words, the lie) that they were still on single disks.

      If you replace "disk" with "DIMM", it's immediately obvious that volume management is ridiculous. When you add a DIMM to a machine, it just works. There's no volume management for DIMMs.


    • > why would you want to embed raid5/6 in the filesystem layer?

      One of the creators of ZFS, Jeff Bonwick, explained it in 2007:

      > While designing ZFS we observed that the standard layering of the storage stack induces a surprising amount of unnecessary complexity and duplicated logic. We found that by refactoring the problem a bit -- that is, changing where the boundaries are between layers -- we could make the whole thing much simpler.

      * https://blogs.oracle.com/bonwick/rampant-layering-violation

    • It's not about ZFS; it's about CoW filesystems in general: since they offer functionality beyond the FS layer, they are both filesystems and logical volume managers.

Honestly just use ZFS. We've wasted enough effort over obscure licensing minutia.

  • > We've wasted enough effort over obscure licensing minutia.

    Which was precisely Sun/Oracle's goal when they released ZFS under the purposefully GPL-incompatible CDDL. Sun was hoping to make OpenSolaris the next Linux while ensuring that no code from OpenSolaris could be moved back to Linux. I can't think of another plausible reason why they would write a new open-source license for their open-source operating system and make it incompatible with the GPL.

  • I don't think something that is the subject of an ongoing multi-billion-dollar lawsuit can rightly be called "obscure licensing minutia." It is high-profile and its actual effects have proven pretty significant.

  • > Honestly just use ZFS. We've wasted enough effort over obscure licensing minutia.

    I am willing to bet that Google had the same thought. And I am also willing to bet that Google is regretting that thought now.