Comment by simcop2387

8 years ago

This is why calling fsync(fd) before closing the file and exiting is a good idea if you need to handle that kind of error. If the failure happens after write() has returned, fsync should report it through its return value.
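A minimal sketch of that pattern in Python, assuming a POSIX system. The helper name `write_durably` is hypothetical and not from the thread. In Python, `os.fsync` raises `OSError` rather than returning an error code, so a write-back failure surfaces before the descriptor is closed.

```python
import os


def write_durably(path: str, data: bytes) -> None:
    """Write data to path and fsync before closing (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested; loop until done.
            written = os.write(fd, view)
            view = view[written:]
        # Raises OSError (for example EIO) if flushing to the device fails.
        os.fsync(fd)
    finally:
        os.close(fd)
```

The point of the `try`/`finally` is only to avoid leaking the descriptor; the error from `os.fsync` still propagates to the caller, who has to decide what to do with it.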

O_[D]SYNC is better than a separate call to fsync, since it is not supposed to suffer from the race condition inherent to fsync. Arguably pedantic.
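For comparison, a rough sketch of the O_DSYNC approach in Python, assuming a platform where `os.O_DSYNC` is defined (Linux, for example). The helper name `append_synchronously` is hypothetical. With this flag, each write is requested to complete only once the data has reached stable storage, so an I/O error is raised by the write call itself.

```python
import os


def append_synchronously(path: str, record: bytes) -> int:
    """Append record to path with O_DSYNC set (hypothetical helper).

    Returns the number of bytes written by the single os.write call.
    """
    flags = os.O_WRONLY | os.O_CREAT | os.O_APPEND | os.O_DSYNC
    fd = os.open(path, flags, 0o644)
    try:
        # With O_DSYNC, an I/O failure shows up here as OSError,
        # not later at a separate fsync call.
        return os.write(fd, record)
    finally:
        os.close(fd)
```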

  • Definitely agree there. But it's not always a good idea if you're going to do a lot of writes and rewrites to the same area of a file. For a write-once or log-append pattern it usually won't make a difference. But if you're changing data a lot before closing/finishing the file, you might not want the dramatic performance hit that O_SYNC can bring, and the race between fsync and close might still be worth accepting (especially if you're also doing an fsync on all the directories involved to ensure metadata is committed too).

  • I agree with using O_DSYNC to surface the error to the write call, rather than waiting until the fsync call, which is often not checked by the user.

    I did some testing recently [1] with O_DIRECT + O_DSYNC and found some surprising performance results. On Linux, for hard drives, it can perform similarly to O_DIRECT + fsync() after every write. But as soon as you are doing grouped writes, performance is almost always better with O_DIRECT + fsync() at the end of the group.

    For SSDs, though, O_DIRECT + O_DSYNC can be faster than O_DIRECT + fsync() at the end of the group if you are pipelining your IO, e.g. you encrypt and checksum the next batch of sectors while you wait for the previous batch of checksummed and encrypted sectors to be written out. Because SSDs are so much faster, you can actually afford to slow down the write a little more by using O_DSYNC, so that your write is not faster than the related CPU work.

    [1] https://github.com/ronomon/direct-io

    • A more advanced (and somewhat easy to get wrong) option would be sync_file_range combined with fdatasync, which lets you roughly emulate O_DSYNC overall but without blocking synchronously for IO.

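The "group of writes, then one fsync" pattern and the directory fsync mentioned in the replies above can be sketched roughly as follows in Python, assuming a POSIX system where a directory can be opened read-only and passed to `fsync` (true on Linux). The helper name `write_group` is hypothetical, and O_DIRECT is deliberately left out because it requires aligned buffers and sizes, which would obscure the idea.

```python
import os
from typing import Iterable


def write_group(path: str, chunks: Iterable[bytes]) -> None:
    """Write all chunks, fsync once, then fsync the parent directory
    (hypothetical helper)."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        for chunk in chunks:
            view = memoryview(chunk)
            while view:
                # Handle short writes by advancing past the bytes written.
                view = view[os.write(fd, view):]
        # One flush for the whole group instead of one per write.
        os.fsync(fd)
    finally:
        os.close(fd)

    # fsync the containing directory so the new directory entry
    # is requested to be made durable as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

This trades the per-write durability of O_DSYNC for fewer flushes; an error in any earlier write of the group may only be reported at the final `os.fsync`.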