Comment by koverstreet
5 days ago
Erasure coding is almost done; all that's missing is some of the device evacuate and reconstruct paths, and people have been testing it and giving positive feedback (especially w.r.t. performance).
It most definitely does have fsck and has since the beginning, and it's a much more robust and dependable fsck than btrfs's. Scrub isn't quite done - I actually was going to have it ready for this upcoming merge window except for a nasty bout of salmonella :)
Send/recv is a long ways off; there might be some low level database improvements needed before that lands.
Short term (next year or two) priorities are finishing off online fsck, more scalability work (upcoming version for this merge window will do 50PB, but now we need to up the limit on number of drives), and quashing bugs.
Hearing that it is missing some code for reconstruction makes it sound like it is missing something fairly important. The original purpose of parity RAID is to support reconstruction.
We can do reconstruct reads, what's missing is the code to rewrite missing blocks in a stripe after a drive dies.
In general, due to the scope of the project, I've been prioritizing the functionality that's needed to validate the design and the parts that are needed for getting the relationships between different components correct.
e.g. recently I've been doing a bunch of work on backpointers scalability, and that plus scrub are leading to more back and forth iteration on minor interactions with erasure coding.
So: erasure coding is complete enough to know that it works and for people to torture test it, but yes you shouldn't be running it in production yet (and it's explicitly marked as such). What's remaining is trivial but slightly tedious stuff that's outside the critical path of the rest of the design.
Some of the code I've been writing for scrub is turning out to also be what we want for reconstruct, so maybe we'll get there sooner rather than later...
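To make the distinction above concrete (reconstruct reads work, rewriting the missing blocks of a stripe is what's left), here is a minimal sketch using single-parity XOR. This is an illustration only and not bcachefs code: the real erasure coding scheme and on-disk layout are not shown in this thread, and every name below (`xor_blocks`, `make_stripe`, `reconstruct_read`, `repair_stripe`) is hypothetical.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_stripe(data_blocks):
    """Return the data blocks followed by one XOR parity block."""
    return list(data_blocks) + [xor_blocks(data_blocks)]

def reconstruct_read(stripe, missing):
    """Recompute block `missing` from the survivors; the stripe is not modified."""
    survivors = [b for i, b in enumerate(stripe) if i != missing]
    return xor_blocks(survivors)

def repair_stripe(stripe, missing):
    """Return a copy of the stripe with the missing block rewritten.
    This corresponds to the 'rewrite missing blocks' step, in toy form."""
    repaired = list(stripe)
    repaired[missing] = reconstruct_read(stripe, missing)
    return repaired

# Usage: lose one block, read through the hole, then repair the stripe.
stripe = make_stripe([b"aaaa", b"bbbb", b"cccc"])
degraded = list(stripe)
degraded[1] = None                       # simulate a dead drive
data = reconstruct_read(degraded, 1)     # data is still readable
healthy = repair_stripe(degraded, 1)     # redundancy is restored
```

In this toy model a reconstruct read leaves the stripe degraded, so a second failure would lose data; the repair step is what restores redundancy.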
>except for a nasty bout of salmonella
Did the Linux Foundation send you some "free" sushi? ;)
However, keep the good work rolling; super happy about a good, usable and modern filesystem native to Linux.
FYI: the main reason I gave up on bcachefs is that I can't use devices with native 16K blocks.
Hope that's coming this year. I have a bunch of old HDDs and SSDs and I could very easily assemble a spare storage server with about 4TB capacity. Already tested bcachefs with most of the drives and it performed very well.
Also, the lack of ability to reconstruct seems like another worrying omission.
I wasn't aware there were actual users needing bs > ps yet. Cool :)
That should be completely trivial for bcachefs to support; it'll mostly just be a matter of finding or writing the tests.
Seriously? But... NVMe drives! I stopped testing because I only have one spare NVMe and couldn't use it with bcachefs.
If you or others can get it done I'm absolutely starting to use bcachefs the month after. I do need fast storage servers in my home office.
Thank you, looking forward to it!