Comment by cmurf
10 years ago
The article makes me wonder whether there's enough abstraction being done via the VFS layer, because all this fsync business that application developers seem to have to do can be so workload and file system specific. And I think that's asking too much from application developers. You might have to fsync the parent dir? That's annoying.
I wonder if the article and the papers it's based on account for how the VFS actually behaves, and whether someone wanting to do more research in this area could investigate this, accounting for the recent VFS changes. On Linux in particular I think this is a bigger issue because there are so many file systems the user or distro could have picked, totally unbeknownst to and outside the control of the app developer.
That's definitely asking too much of app developers. Every time someone complains about any of this, the filesystem developers come back with a bit of lore about a non-obvious combination of renameat (yes that's a real thing) and fsync on the parent directory, or some particular flavor of fallocate, or just use AIO and manage queues yourself, or whatever depending on exactly which bogus behavior you're trying to work around. At best it's an unnecessary PITA. At worst it doesn't even do what they claimed, so now you've wasted even more time. Most often it's just non-portable (I'm so sick of hearing about XFS-specific ioctls as the solution to everything) or performs abominably because of fsync entanglement or some other nonsense.
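For what it's worth, the "lore" in question is roughly the following dance. This is a hedged sketch in Python (the same calls exist in C), not a claim that it covers every filesystem's quirks:

```python
import os

def safe_replace(path, data):
    """Durably replace `path` with `data` using the write-temp-file,
    fsync, rename, fsync-parent-directory pattern."""
    dirname = os.path.dirname(path) or "."
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)              # flush the file's data and metadata
    finally:
        os.close(fd)
    os.rename(tmp, path)          # atomically replace the old file
    # Without this step, the rename itself may not survive a crash:
    dfd = os.open(dirname, os.O_DIRECTORY)
    try:
        os.fsync(dfd)             # fsync the parent directory
    finally:
        os.close(dfd)
```

Four system calls and a directory fd just to save a file safely, which is exactly the kind of thing a library should be hiding.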
We have libraries to implement "best practices" for network I/O, portable across systems that use poll or epoll or kqueues with best performance on each etc. What we need is the same thing for file I/O. However imperfect it might be, it would be better than where we are now.
Very rudimentary, but: a way for an application developer to specify categories of performance/safety trade-off for operations. One app developer might have a simple app that only cares about performance, another only about safety, and there'd be a default somewhere in between. Another might have mixed needs depending on the type of data the app is generating. But this way, if they write the app with category A (let's say that means highest safety at the expense of performance) and their benchmarking determines this is crap, and they have to go to category B for writes, that's a simpler change than going back through their code and refactoring a pile of fsyncs or FUA writes.
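Something like the following hypothetical interface (the names `Durability` and `write_file` are made up for illustration, not any existing library): the caller states a category and the library does whatever fsync dance that category implies on the filesystem at hand.

```python
import enum
import os

class Durability(enum.Enum):
    FAST = "fast"        # buffered write, no fsync at all
    DEFAULT = "default"  # fsync the file itself
    SAFE = "safe"        # fsync the file and its parent directory

def write_file(path, data, level=Durability.DEFAULT):
    """Hypothetical category-based write: the caller picks a
    performance/safety trade-off instead of hand-rolling fsyncs."""
    with open(path, "wb") as f:
        f.write(data)
        if level is not Durability.FAST:
            f.flush()
            os.fsync(f.fileno())
    if level is Durability.SAFE:
        dfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
        try:
            os.fsync(dfd)
        finally:
            os.close(dfd)
```

Moving the whole app from SAFE to DEFAULT is then a one-argument change rather than a refactor.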
I mean, I thought this was a major reason for VFS abstraction between the application and kernel anyway. It's also an example of the distinction between open source and free (libre). If as an application developer you have to know such esoterics to sanely optimize, you in fact aren't really free to do what you want. You have to go down a particular rabbit hole and optimize for that fs, at the expense of others. That's not a fair choice to have to make.
The inherent issue is that there's a huge performance benefit to be gained by batching updates. FS safety will always come at the cost of performance.
The article doesn't say, but I suspect most of the issues it mentions can be mitigated by mounting with the "sync" and "dirsync" options, though that absolutely kills performance.
The APIs involved could definitely be friendlier, but the app developer is using an API that's explicitly performance oriented by default at the cost of safety, and needs to opt-in to get safer writes. Whether the default should be the other way around is one matter, but ultimately someone has to pick which one they want and live with the consequences.
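As a sketch of what that opt-in looks like per file: opening with O_SYNC makes each write synchronous, which is roughly what the "sync" mount option does for the whole filesystem, with the same performance cost. A minimal example, assuming a POSIX system:

```python
import os

def sync_write(path, data):
    """Open with O_SYNC so each write only returns once the data
    (and the metadata needed to retrieve it) reaches stable storage."""
    fd = os.open(path,
                 os.O_WRONLY | os.O_CREAT | os.O_TRUNC | os.O_SYNC,
                 0o644)
    try:
        os.write(fd, data)
    finally:
        os.close(fd)
```

Benchmark this against plain buffered writes and the trade-off being discussed becomes very visible.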