Comment by r1ch

2 years ago

We shipped a shader cache in the latest release of OBS and quickly had reports come in that the cached data was invalid. After investigating, we found the cache files were the correct size on disk but the contents were all zero. On a journaled file system this seems like it should be impossible, so the current guess is that some users have SSDs that are ignoring flushes and experience data corruption on crash / power loss.

I think this is typical behaviour with ext4 on Linux, if the application doesn't do fsync/fdatasync to flush the data to disk.

Depending on mount options, ext4 does metadata journaling, which ensures the FS itself is not borked, but not data journaling, which would safeguard the file contents in the event of an unclean shutdown with writes still pending in the caches.

The same phenomenon is at play when people complain that their log files contain NUL bytes after a crash. The file system metadata has been updated for the size of the file to fit the appended write, but the data itself was not written out yet.
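
To make that concrete, here is a minimal sketch (in C) of the write-then-sync pattern an application needs if new file contents have to survive a power loss. It is illustrative only: the paths, names, and abbreviated error handling are assumptions, not OBS's actual code.

    /* Illustrative sketch: write a file so that after a crash you see either
       the old contents or the new ones, never a size-extended file of zeroes.
       Error handling is abbreviated. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int write_file_durably(const char *dir, const char *name,
                                  const void *buf, size_t len)
    {
        char tmp[4096], final_path[4096];
        snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
        snprintf(final_path, sizeof final_path, "%s/%s", dir, name);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }

        /* Push the file data (and its new size) to stable storage *before*
           the rename makes it visible under the final name. */
        if (fdatasync(fd) != 0) { close(fd); return -1; }
        close(fd);

        if (rename(tmp, final_path) != 0)
            return -1;

        /* Sync the directory as well, so the rename itself is durable. */
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0)
            return -1;
        int rc = fsync(dfd);
        close(dfd);
        return rc;
    }

    int main(void)
    {
        const char data[] = "cached bytes go here";
        return write_file_durably("/tmp", "example.cache",
                                  data, sizeof data - 1) == 0 ? 0 : 1;
    }

The rename is what gives the "old file or new file, never zeroes" property on an ordered-mode journal; appending to a log in place has no such atomic switch-over, which is why logs are the classic place to see NUL bytes after a crash.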

  • The current default is data=ordered, which should prevent this problem if the hardware doesn't lie. The data doesn't go in the journal, but it has to be written before the journal is committed.

    There was a point where ext3 defaulted to data=writeback, which can definitely give you files full of null bytes.

    And data=journal exists but is overkill for this situation.

  • I don't think that's how it works: Flushing metadata before data would be a security concern (consider e.g. the metadata change of increasing a file's length due to an append before the data change itself), so file systems usually only ever do the opposite, which is safe.

    Getting back zeroes after a metadata sync (which must follow a data sync) would accordingly be an indication of something weird having happened at the disk level: we'd expect to see either no data at all or correct data, but not zeroes, another file's data, or previously written stale data.

    • The file isn't stored contiguously on disk, so that would depend on the implementation of the filesystem. Perhaps the size of the file can be changed, without extents necessarily being allocated to cover the new size?

      I seem to vaguely recall an issue like that, for ext4 in particular. Of course it's possible in general for any filesystem that supports holes, but I don't think we can necessarily assume that the data is always written, and all the pointers to it also written, before the file-size gets updated.

      3 replies →

    • I think there could semi-reasonably be a case for the zero bytes appearing if the fs knows something should have been written there, and the block has been allocated, but the write hasn't happened yet. Then it's not compromising confidentiality to zero the allocated block while recovering the journal when the disk is mounted. But the origin of the zero bytes doesn't seem to be spelled out anywhere, so this is just off-the-cuff reasoning.

    • The file's size could have been set by the application before copying data to it. This will result in a file that reads as all zeroes; see the sketch after this subthread.

      Or if it were a hardware ordering fault, remember that SSD TRIM is typically used by modern filesystems to reclaim unused space. TRIMmed blocks read as zero.

      2 replies →
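
For the "size set before the data is copied" case mentioned in the subthread above, here is a small self-contained illustration; the path is hypothetical and nothing in it is specific to ext4.

    /* Illustrative only: extend a new file's size before writing any data.
       The metadata now says 4096 bytes, but reading the file back (or
       crashing before the data ever lands) yields zeroes. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/prealloc-demo";   /* hypothetical path */
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* The size is updated, but no data blocks have been written. */
        if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

        char buf[16] = { 1 };                      /* non-zero sentinel */
        if (pread(fd, buf, sizeof buf, 0) < 0) { perror("pread"); return 1; }
        printf("first byte reads back as %d\n", buf[0]);   /* prints 0 */

        close(fd);
        unlink(path);
        return 0;
    }

Whether a real application hits this depends on how it writes (preallocating with ftruncate or fallocate and then copying is a common pattern), but it shows how "correct size, all zeroes" can arise without any hardware misbehaviour.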

I had this exact experience with my workstation SSD (NTFS) after a short power loss while NPM was running. After I turned the computer back on, several files (package.json, package-lock.json and many others inside node_modules) had the correct size on disk but were filled with zeros.

I think the last time I had corrupted files after a power loss was on a FAT32 disk on Win98, but you'd usually get garbage data, not all zeros.

  • > but you'd usually get garbage data, not all zeros.

    You are less likely to get garbage with an SSD in combination with a modern filesystem because of TRIM. Even if the SSD has not (yet) wiped the data, it knows that a block marked as unused can be returned as a block of 0s without needing to check what is currently stored in that block.

    Traditional drives had no such facility for marking blocks as unused from their PoV, so they always performed the read and returned what they found, which was most likely junk (old data from deleted files that would make sense in another context), though it could also be a block of zeros (because that block hadn't been used since the drive had a full format or someone zeroed free space).

  • They may be pointing to unallocated space, which on an SSD running TRIM would return all zeros. NTFS is an extremely resilient yet boring filesystem; I cannot remember the last time I had to run chkdsk, even after an improper shutdown.

    • As somebody who worked as a PC technician for a while until very recently, I've run chkdsk and had to repair errors on NTFS filesystems very, very, very often. It's almost an everyday thing. Anecdotal evidence is less than useful here.

      3 replies →

Journaling filesystems (including NTFS, and ext3/ext4 using default mount options) typically only track file structure metadata in the journal, so this is working as intended: the filesystem structure was not corrupted, but all bets are off when it comes to the contents of the files.

I lost Audacity projects due to BSODs on a Surface Book several times in ~2019: the *_data/**.au files were intact, each containing just a few seconds of audio, but the .aup XML file that maps them and contains whatever else makes up the project was all zeroed. My memory's fuzzy, but I think it was something like this: exiting sometimes triggered the BSOD, and save-on-exit corrupted the project consistently if it did BSOD, so the workaround was to remember to save first; then, if it BSODs, you're OK.

> experience data corruption on crash / power loss

You mean on a complete system crash, right? Your application crashing shouldn't lead to files being full of zeroes as long as you've already written everything out.
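
To illustrate the distinction: data handed to write() sits in the kernel's page cache, so it survives the process dying; only a kernel panic or power cut before the pages are flushed loses it. A tiny sketch, with a hypothetical path and deliberately no fsync:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* The written line survives abort(), because the dirty pages belong
           to the kernel, not the process. It would NOT necessarily survive
           a power cut at this point, since nothing has been fsync'd. */
        int fd = open("/tmp/crash-demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return 1;

        const char msg[] = "still here after the process dies\n";
        if (write(fd, msg, sizeof msg - 1) < 0)
            return 1;

        abort();   /* simulate an application crash */
    }

After running it, the file still contains the line; it takes a forced power-off (or a lying drive, as in the parent report) rather than a mere application crash to end up with zeros or missing data.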