Comment by lxgr

2 years ago

I don't think that's how it works: flushing metadata before data would be a security concern (consider e.g. committing the metadata change that increases a file's length on an append before the appended data itself has been written, which could expose stale blocks), so file systems usually only ever do the opposite, which is safe.

Getting back zeroes after a metadata sync (which must follow a data sync) would accordingly be an indication of something weird having happened at the disk level: we'd expect to see either no data at all or the correct data, but not zeroes, another file's data, or previously written stale data.

The file isn't stored contiguously on disk, so that would depend on the implementation of the filesystem. Perhaps the size of the file can be changed without extents necessarily being allocated to cover the new size?

I seem to vaguely recall an issue like that for ext4 in particular. Of course it's possible in general for any filesystem that supports holes, but I don't think we can necessarily assume that the data is always written, and all the pointers to it also written, before the file size gets updated.

  • At least for ext4 and actually written data (i.e. not ftruncate’d files), I believe zeroes should really not occur.

    Both extents and the file size are metadata as far as I understand, and both would be updated atomically through the journal.

    Data is written out before the metadata is committed (in data=ordered mode, the default):

    > All data are forced directly out to the main file system prior to its metadata being committed to the journal.
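To make the ordering concrete, here is a minimal sketch of the application-side view, assuming Linux with ext4 in data=ordered mode and a hypothetical log path: after the append and fdatasync, a post-crash read should show either the old length or the new length with the appended bytes intact, not zeroes.

```c
/* Minimal sketch (assumes Linux, ext4 mounted data=ordered, hypothetical path).
 * In ordered mode the data blocks are forced out to the main filesystem before
 * the size/extent update commits to the journal, so after a crash a reader
 * should see either the old length or the new length with the bytes intact. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    const char msg[] = "record\n";
    int fd = open("/tmp/example.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
    if (fd < 0) { perror("open"); return 1; }

    if (write(fd, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)) {
        perror("write");
        return 1;
    }

    /* fdatasync() flushes the data plus any metadata needed to read it back
     * (such as the new file size). */
    if (fdatasync(fd) != 0) { perror("fdatasync"); return 1; }

    close(fd);
    return 0;
}
```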

I think there could semi-reasonably be a case for the zero bytes appearing if the fs knows there should be something written there, and the block has been allocated, but the write hasn't happened yet. Then it's not compromising confidentiality to zero the allocated block when the journal is replayed at mount time. But the origin of the zero bytes doesn't seem to be spelled out anywhere, so this is just off-the-cuff reasoning.

The file's size could have been set by the application before copying data to it. This will result in a file that reads as all zeroes.

Or if it were a hardware ordering fault, remember that SSD TRIM is typically used by modern filesystems to reclaim unused space, and TRIMmed blocks typically read back as zeroes.
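For the "size set before the copy" scenario above, a minimal sketch (hypothetical path and size) of how an application can end up with a correctly sized file that reads as all zeroes if the actual data never lands:

```c
/* Sketch of the "size first, data later" pattern (hypothetical path/size).
 * ftruncate() extends the nominal size immediately without writing any data
 * blocks; until the copy completes and is synced, the unwritten range reads
 * back as zeroes. */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    off_t target_size = 1 << 20;   /* 1 MiB, for illustration */
    int dst = open("/tmp/copy.dst", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (dst < 0) { perror("open"); return 1; }

    /* Set the final size up front; a crash after this point leaves a file of
     * the right length that reads entirely as zeroes. */
    if (ftruncate(dst, target_size) != 0) { perror("ftruncate"); return 1; }

    /* ... the actual copy loop (read from the source, write to dst) would
     * follow here ... */

    close(dst);
    return 0;
}
```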

  • > The file's size could have been set by the application before copying data to it. This will result in a file that reads as all zeroes.

    Hm, is that a common approach? I thought applications mostly use fallocate(2) for that if it's for performance reasons, which (with FALLOC_FL_KEEP_SIZE) does not change the nominal file size.

    Actually allocating zeroes sounds like it could be quite inefficient and confusing, but then again, fallocate is not portable POSIX.

    > Or if it were a hardware ordering fault

    That's what I suspect might be going on here.

    • fseek to the (new) end, write a byte there to extend the file, then seek back and fill in the data. That avoids a size/metadata update on every write. Not sure how common it is, but it does occur (sketched below).
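A sketch contrasting the two pre-extension patterns mentioned in this sub-thread (Linux-specific, hypothetical path): writing a byte at the new end leaves a hole that reads as zeroes, while fallocate(2) only keeps the nominal size unchanged when FALLOC_FL_KEEP_SIZE is passed.

```c
/* Sketch (Linux-specific, hypothetical path) of two ways to pre-extend a file
 * before filling it with data. */
#define _GNU_SOURCE             /* for fallocate(2) and FALLOC_FL_KEEP_SIZE */
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

int main(void)
{
    off_t new_size = 1 << 20;   /* 1 MiB, for illustration */
    int fd = open("/tmp/preextend.dat", O_RDWR | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) { perror("open"); return 1; }

    /* 1) Write a single byte at the new end: the nominal size jumps to
     *    new_size and the skipped range is a hole that reads as zeroes. */
    if (pwrite(fd, "", 1, new_size - 1) != 1) { perror("pwrite"); return 1; }

    /* 2) fallocate(2): with FALLOC_FL_KEEP_SIZE, blocks are reserved but the
     *    nominal size stays put; without the flag, the size is extended and
     *    the new range reads as zeroes. Not portable POSIX (the portable
     *    posix_fallocate() extends the size as needed). */
    if (fallocate(fd, FALLOC_FL_KEEP_SIZE, 0, new_size) != 0)
        perror("fallocate");    /* may fail on filesystems without support */

    close(fd);
    return 0;
}
```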

Ext3 will totally let you expose yourself to those security issues. I'm not sure about ext4.

  • Only in data=writeback mode, which is not the default in either ext3 or ext4.
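To check which mode a given mount is actually using, here is a minimal sketch (the mount point of interest is hypothetical) that prints the mount options via getmntent(3); note that the default data=ordered is often not listed explicitly.

```c
/* Sketch: print the options of a given mount (hypothetical mount point) so
 * the data= journaling mode can be checked. Note that the default mode
 * (data=ordered) is often omitted from the options string. */
#include <mntent.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    FILE *m = setmntent("/proc/self/mounts", "r");
    if (!m) { perror("setmntent"); return 1; }

    struct mntent *ent;
    while ((ent = getmntent(m)) != NULL) {
        if (strcmp(ent->mnt_dir, "/") == 0) {   /* the mount point of interest */
            /* look for data=ordered / data=writeback / data=journal in opts */
            printf("%s on %s type %s: %s\n",
                   ent->mnt_fsname, ent->mnt_dir, ent->mnt_type, ent->mnt_opts);
        }
    }
    endmntent(m);
    return 0;
}
```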