Comment by bayindirh

4 days ago

What I see with CoW filesystems is that when you force the FS to sync a lot (like apt does to maximize immunity against power losses), write performance drops visibly. This also means that when you're writing a lot of small files from a lot of processes and flooding the FS with syncs, you get the same slowdown, and everything gets slower in the process. This effect is better controlled in simpler filesystems, namely XFS and ext4. This is why I keep backups elsewhere and keep my single-disk rootfs on "simple" filesystems.

I'll be installing a 2-disk OpenZFS RAID1 volume on an SBC for high-value files soon-ish, and I might do some tests on it when it's up. Honestly, I don't expect stellar performance since I'll already be putting it on constrained hardware, but I'll let you know if I experience anything that doesn't feel right.

Thanks for the doc links, I'll be devouring them when my volume is up and running.

Where do you prefer your (bug and other) reports? GitHub? E-mail? IP over Avian Carriers?

Heavy synchronous IO from incredibly frequent fsync is a weak point. You can mitigate it with a SLOG device. I realize what I am about to say is not what you want to hear, but any application doing excessive fsync operations is probably doing things wrong. This is a view you will find prevalent among filesystem developers in general (the ext4 and XFS guys will have it too), because all filesystems run significantly faster when fsync() is used sparingly.

In the case of APT, it should install all of the files and then call sync() once. This is equivalent to calling fsync() on every file as APT currently does, but it aggregates the work for efficiency. The reason APT does not use sync() is probably a portability concern: the standard does not require sync() to be blocking, but on Linux it is:

https://www.man7.org/linux/man-pages/man2/sync.2.html
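
Here is a rough sketch of the difference, assuming a hypothetical install_one() helper and made-up paths; error handling is trimmed, and this is an illustration of the two strategies, not APT's actual code:

    /* Two flush strategies for installing a batch of files.  The
     * install_one() helper and the paths are hypothetical; error
     * handling is trimmed to keep the sketch short. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    /* Write one file's contents and return the still-open descriptor. */
    static int install_one(const char *path, const char *data, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd >= 0)
            write(fd, data, len);
        return fd;
    }

    /* Strategy A: roughly what dpkg does today - a blocking fsync() per
     * file, so the number of synchronous flushes scales with the number
     * of files installed. */
    static void install_with_fsync(const char **paths, int n)
    {
        for (int i = 0; i < n; i++) {
            int fd = install_one(paths[i], "payload\n", 8);
            fsync(fd);              /* one synchronous flush per file */
            close(fd);
        }
    }

    /* Strategy B: install everything, then flush once.  On Linux, sync()
     * blocks until writeback completes (see sync(2) above), so the batch
     * still ends with everything on disk, but with a single barrier. */
    static void install_with_single_sync(const char **paths, int n)
    {
        for (int i = 0; i < n; i++) {
            int fd = install_one(paths[i], "payload\n", 8);
            close(fd);              /* no per-file flush */
        }
        sync();                     /* one blocking flush for the batch */
    }

    int main(void)
    {
        const char *paths[] = { "/tmp/pkg-file-1", "/tmp/pkg-file-2" };
        install_with_fsync(paths, 2);
        install_with_single_sync(paths, 2);
        return 0;
    }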

From a power loss perspective, if power is lost when installing a package into the filesystem, you need to repair the package. Thus it does not really matter for power loss protection if you are using fsync() on all files or sync() once for all files, since what must happen next to fix it is the same. However, from a performance perspective, it really does matter.

That said, slow fsync performance generally is not an issue for desktop workloads because they rarely ever use fsync. APT is the main exception. You are the first to complain about APT performance in years as far as I know (there were fixes to improve APT performance 10 years ago, when its performance was truly horrendous).

You can file bug reports against ZFS here:

https://github.com/openzfs/zfs

I suggest filing a bug report against APT. There is no reason for it to be doing fsync calls on every file it installs in the filesystem. It is inefficient.

  • Actually, this was discussed recently [0]. While everybody knows it's not efficient, it's required to keep the update process resilient against unwanted shutdowns (like power losses, which corrupt the filesystem through uncommitted work left behind).

    > From a power loss perspective, if power is lost when installing a package into the filesystem, you need to repair the package.

    Yes, but at least you have all the files; otherwise you can end up with zero-length files, which can prevent your system from booting. In this case, your system boots, all files are in place, but some packages are in a semi-configured state. Believe me, apt can recover from many nasty corners without any ill effects as long as all the files are there. I used to be the tech lead of a Debian derivative back in the day, and I lived in the trenches of Debian for a long time, so I have seen things.

    Again, it was decided that the massive sync will stay in place for now, because the risks out in the wild don't justify the performance difference yet. If you prefer to be reckless, there are the "eatmydata" and "--force-unsafe-io" options baked in already (the trick eatmydata pulls is sketched at the end of this comment).

    Thanks for the links, I'll let you know if I find something. I just need to build the machine from the parts I have, then I'll be off to the races.

    [0]: https://lists.debian.org/debian-devel/2024/12/msg00533.html [warning, long thread]
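
    For reference, eatmydata works by LD_PRELOAD-ing a library that turns the flush calls into no-ops. Here is a minimal sketch of that trick; it is an illustration only, not the real libeatmydata source, which also intercepts things like msync() and O_SYNC opens:

        /* noflush.c - sketch of the eatmydata trick: preload a shared
         * object whose fsync()/fdatasync()/sync() do nothing, so dpkg's
         * per-file flushes cost nothing.  Illustration only, not the real
         * libeatmydata.
         *
         *   gcc -shared -fPIC -o noflush.so noflush.c
         *   LD_PRELOAD=$PWD/noflush.so apt-get install some-package
         */
        #include <unistd.h>

        int fsync(int fd)     { (void)fd; return 0; }  /* pretend it flushed */
        int fdatasync(int fd) { (void)fd; return 0; }
        void sync(void)       { /* do nothing */ }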

    • This email mentions a bunch of operations that are done per file to ensure the file put in the final location always has the correct contents:

      https://lists.debian.org/debian-devel/2024/12/msg00540.html

      It claims that the fsync is needed to avoid the file appearing at the final location with a zero length after a power loss. This is not true on ZFS.

      ZFS puts every filesystem operation into a transaction group that is committed atomically about every 5 seconds by default. On power loss, the transaction group either succeeds or never happens. The result is that even without using fsync, there will never be a zero length file at the final location because the rename being part of a successful transaction group commit implies that the earlier writes also were part of a successful transaction group commit.

      The result is that you can use --force-unsafe-io with dpkg on ZFS; things will run faster and there should be no issues for power-loss recovery as far as zero-length files go (the per-file sequence in question is sketched at the end of this comment).

      The following email mentions that sync() had been used at one point but caused problems when flash drives were connected, so it was dropped:

      https://lists.debian.org/debian-devel/2024/12/msg00597.html

      The timeline is unclear, but I suspect this happened before Linux 2.6.39 introduced syncfs(), which would have addressed that. Unfortunately, syncfs() would still have had problems on systems with things like a separate /usr mount, which require the package manager to realize that multiple syncfs() calls are needed. It sounds like dpkg was calling sync() per file, which is even worse than calling fsync() per file, although it would have ensured that the directory entries for prior files were present following a power loss event.

      The email also mentions that fsync is not called on directories. The result is that a power loss event (on any Linux filesystem, not just ZFS) could leave files missing from multiple packages that are marked as installed in the package database, which is said to use fsync to properly record installations. I find this situation odd, since I would use sync() to avoid it, but if they are comfortable with systems having multiple "installed" packages missing files after a power loss, then there is no need to use sync().
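
      To make the per-file sequence concrete, here is a rough sketch of it in C, with the step that --force-unsafe-io skips marked. The paths are invented for the example and error handling is trimmed; this is an illustration of the pattern, not dpkg's actual code:

          /* Per-file install sequence as described in the mail: write to
           * a temporary name, flush, then rename into place. */
          #include <fcntl.h>
          #include <stdio.h>
          #include <string.h>
          #include <unistd.h>

          static void install_file(const char *tmp, const char *final,
                                   const char *data, size_t len)
          {
              int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
              write(fd, data, len);

              /* The step --force-unsafe-io skips.  On ext4 without it,
               * the rename below can hit disk before the data does,
               * leaving a zero-length file at `final` after a power
               * loss.  On ZFS the writes and the rename land in the same
               * ordered stream of transaction group commits, so a rename
               * that survived the crash implies the data did too, even
               * without this call. */
              fsync(fd);

              close(fd);
              rename(tmp, final);   /* atomically put the file in place */

              /* Note: the parent directory is not fsynced, so after a
               * power loss the rename itself may be missing even though
               * the (fsynced) package database already says the package
               * is installed. */
          }

          int main(void)
          {
              install_file("/tmp/pkg-file.tmp", "/tmp/pkg-file",
                           "contents\n", strlen("contents\n"));
              return 0;
          }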

Hi! I am quite a beginner when it comes to file systems. Would this sync effect not be helped by direct IO in ZFS's case?

Also, given that you seem quite knowledgeable of the topic, what is your go-to backup solution?

I initially thought about storing `zfs send` streams as files in Backblaze (as a backup at a different location), but without recv-ing them, I don't think the usual checksumming works properly. I can checksum the whole thing before and after updating, but I'm not convinced this is the best solution.

  • No, it will not. It would be helped by APT switching to a single sync()/syncfs() call after installing all files, which is the performant way to do what it wants on Linux:

    https://www.man7.org/linux/man-pages/man2/sync.2.html
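
    Roughly, the single-barrier version would look like the sketch below. The mount point list is an assumption for the example (a separate /usr being the case, mentioned above, where one syncfs() call is not enough); a real package manager would derive it from where it actually wrote files:

        /* Flush once per filesystem after installing everything, instead
         * of once per file.  The list of mount points is an assumption
         * for the example. */
        #define _GNU_SOURCE
        #include <fcntl.h>
        #include <stdio.h>
        #include <unistd.h>

        int main(void)
        {
            const char *mounts[] = { "/", "/usr" };  /* filesystems written to */

            for (int i = 0; i < 2; i++) {
                int fd = open(mounts[i], O_RDONLY | O_DIRECTORY);
                if (fd < 0) {
                    perror(mounts[i]);
                    continue;
                }
                /* syncfs() (Linux 2.6.39+, glibc 2.14+) flushes only the
                 * filesystem containing fd and blocks until writeback is
                 * done, so a couple of calls here replace thousands of
                 * per-file fsync() calls. */
                if (syncfs(fd) < 0)
                    perror("syncfs");
                close(fd);
            }
            return 0;
        }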

    • After studying the DPKG developers' reasoning for using fsync so heavily, it turns out that there is no need for them to use it on a ZFS rootfs. When the rootfs is ZFS, you can use --force-unsafe-io to skip the fsync operations for a speed improvement, and there will be no safety issues due to how ZFS is designed.

      DPKG will write each file to a temporary location and then rename it to the final location. On ext4, without fsync, a power loss can leave the rename done without any of the writes, so you end up with a zero-length file at the final location. On ZFS, the rename happens after the writes, and the sequential nature of ZFS' transaction group commits means that a rename that made it to disk implies the writes did too. The file will therefore never appear at the final location without its contents after a power loss, which is why ZFS does not need the fsync there.