Comment by mrd999

4 years ago

Is it possible the next write was incomplete when the power cut out? Wouldn't this depend on how updates to file data are managed by the filesystem? The size and alignment of disk and filesystem data & metadata blocks?

Yes, kinda. If the drive completes the flush but gets disconnected before the kernel can read the ack, then I can get an error from fcntl() even though the data is on disk. In theory I could also get an error from write() even though it succeeded, but I don't know if that can happen in practice.

In either case the file's last line will have a counter value one higher than I expected (N+1). That is counted as a success.

Failure is only when a line was written to the file with counter==N, fcntl(fd, F_FULLFSYNC, 1) reports success all the way back to userspace, yet the file ends up with a value < N. This gives the drive a fairly big window to claim it finished flushing, since the ack still has to race back to userspace, but even so two of the drives failed. The SK Hynix Gold P31 sometimes lost multiple writes (the file ended at N-2), meaning two flush cycles were not enough.
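
For anyone who wants to try something similar, here is a rough Python sketch of the pass/fail rule described above. It is not the actual test program. The function names (full_flush, append_counter, last_counter, classify) are made up for illustration, the one-counter-per-line file format is assumed, and the os.fsync() fallback is only there so the sketch runs on systems without F_FULLFSYNC (it is not an equivalent guarantee).

```python
import fcntl
import os

def full_flush(fd):
    """Ask the OS (and ideally the drive) to flush everything for fd."""
    if hasattr(fcntl, "F_FULLFSYNC"):
        # macOS: request a flush of the drive's own write cache.
        fcntl.fcntl(fd, fcntl.F_FULLFSYNC, 1)
    else:
        # Assumption: fallback for platforms without F_FULLFSYNC.
        os.fsync(fd)

def append_counter(path, n):
    """Append one line holding counter n; return n only after the flush succeeds."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
    try:
        # A short line like this is assumed to go out in a single write().
        os.write(fd, f"{n}\n".encode())
        full_flush(fd)
    finally:
        os.close(fd)
    return n

def last_counter(path):
    """Return the last complete counter line in the file, or None."""
    try:
        with open(path, "rb") as f:
            data = f.read()
    except FileNotFoundError:
        return None
    # Drop whatever follows the final newline: it may be a torn write.
    complete = data.split(b"\n")[:-1]
    for line in reversed(complete):
        if line.strip().isdigit():
            return int(line)
    return None

def classify(path, last_acked):
    """'ok' if the file holds at least the last acknowledged counter, else 'lost'."""
    found = last_counter(path)
    if found is not None and found >= last_acked:
        return "ok"
    return "lost"
```

The idea is to record the last value returned by append_counter() somewhere other than the drive under test, cut the power, then run classify() after reboot. A file ending at N+1 still counts as "ok" because the comparison is >=, and only a value below the last acknowledged N counts as "lost".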