Comment by r1ch

2 years ago

We shipped a shader cache in the latest release of OBS and quickly had reports come in that the cached data was invalid. After investigating, we found the cache files were the correct size on disk but the contents were all zero. On a journaled file system this seems like it should be impossible, so the current guess is that some users have SSDs that are ignoring flushes and experience data corruption on crash / power loss.

I think this is typical behaviour with ext4 on Linux, if the application doesn't do fsync/fdatasync to flush the data to disk.

Depending on mount options, ext4 does metadata journaling, which ensures the FS itself is not borked, but not data journaling, which would safeguard the file contents in the event of an unclean shutdown with writes still pending in the caches.

The same phenomenon is at play when people complain that their log files contain NUL bytes after a crash. The file system metadata has been updated for the size of the file to fit the appended write, but the data itself was not written out yet.
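
To make that concrete, here is a minimal sketch (in C) of the write-then-sync pattern an application needs if new file contents have to survive a power loss. It is illustrative only: the paths, names, and abbreviated error handling are assumptions, not OBS's actual code.

    /* Illustrative sketch: write a file so that after a crash you see either
       the old contents or the new ones, never a size-extended file of zeroes.
       Error handling is abbreviated. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    static int write_file_durably(const char *dir, const char *name,
                                  const void *buf, size_t len)
    {
        char tmp[4096], final_path[4096];
        snprintf(tmp, sizeof tmp, "%s/%s.tmp", dir, name);
        snprintf(final_path, sizeof final_path, "%s/%s", dir, name);

        int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) { close(fd); return -1; }

        /* Push the file data (and its new size) to stable storage *before*
           the rename makes it visible under the final name. */
        if (fdatasync(fd) != 0) { close(fd); return -1; }
        close(fd);

        if (rename(tmp, final_path) != 0)
            return -1;

        /* Sync the directory as well, so the rename itself is durable. */
        int dfd = open(dir, O_RDONLY | O_DIRECTORY);
        if (dfd < 0)
            return -1;
        int rc = fsync(dfd);
        close(dfd);
        return rc;
    }

    int main(void)
    {
        const char data[] = "cached bytes go here";
        return write_file_durably("/tmp", "example.cache",
                                  data, sizeof data - 1) == 0 ? 0 : 1;
    }

The rename is what gives the "old file or new file, never zeroes" property on an ordered-mode journal; appending to a log in place has no such atomic switch-over, which is why logs are the classic place to see NUL bytes after a crash.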

  • The current default is data=ordered, which should prevent this problem if the hardware doesn't lie. The data doesn't go in the journal, but it has to be written before the journal is committed.

    There was a point where ext3 defaulted to data=writeback, which can definitely give you files full of null bytes.

    And data=journal exists but is overkill for this situation.

  • I don't think that's how it works: Flushing metadata before data would be a security concern (consider e.g. the metadata change of increasing a file's length due to an append before the data change itself), so file systems usually only ever do the opposite, which is safe.

    Getting back zeroes after a metadata sync (which must follow a data sync) would accordingly be an indication of something weird having happened at the disk level: we'd expect to see either no data at all or correct data, but not zeroes, another file's data, or previously written stale data.

    • The file isn't stored contiguously on disk, so that would depend on the implementation of the filesystem. Perhaps the size of the file can be changed, without extents necessarily being allocated to cover the new size?

      I seem to vaguely recall an issue like that, for ext4 in particular. Of course it's possible in general for any filesystem that supports holes, but I don't think we can necessarily assume that the data is always written, and all the pointers to it also written, before the file-size gets updated.

      3 replies →

    • I think there could semi-reasonably be a case for the zero bytes appearing if the fs knows something should have been written there, and the block has been allocated, but the write hasn't happened yet. Then it's not compromising confidentiality to zero the allocated block while recovering the journal when the disk is mounted. But the origin of the zero bytes doesn't seem to be spelled out anywhere, so this is just off-the-cuff reasoning.

    • The file's size could have been set by the application before copying data to it. This will result in a file that reads as all zeroes; see the sketch after this subthread.

      Or if it were a hardware ordering fault, remember that SSD TRIM is typically used by modern filesystems to reclaim unused space. TRIMmed blocks read as zero.

      2 replies →
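
For the "size set before the data is copied" case mentioned in the subthread above, here is a small self-contained illustration; the path is hypothetical and nothing in it is specific to ext4.

    /* Illustrative only: extend a new file's size before writing any data.
       The metadata now says 4096 bytes, but reading the file back (or
       crashing before the data ever lands) yields zeroes. */
    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        const char *path = "/tmp/prealloc-demo";   /* hypothetical path */
        int fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0644);
        if (fd < 0) { perror("open"); return 1; }

        /* The size is updated, but no data blocks have been written. */
        if (ftruncate(fd, 4096) != 0) { perror("ftruncate"); return 1; }

        char buf[16] = { 1 };                      /* non-zero sentinel */
        if (pread(fd, buf, sizeof buf, 0) < 0) { perror("pread"); return 1; }
        printf("first byte reads back as %d\n", buf[0]);   /* prints 0 */

        close(fd);
        unlink(path);
        return 0;
    }

Whether a real application hits this depends on how it writes (preallocating with ftruncate or fallocate and then copying is a common pattern), but it shows how "correct size, all zeroes" can arise without any hardware misbehaviour.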

I had this exact experience with my workstation SSD (NTFS) after a short power loss while NPM was running. After I turned the computer back on, several files (package.json, package-lock.json and many others inside node_modules) had the correct size on disk but were filled with zeros.

I think the last time I had corrupted files after a power loss was on a FAT32 disk on Win98, but you'd usually get garbage data, not all zeros.

  • > but you'd usually get garbage data, not all zeros.

    You are less likely to get garbage with an SSD in combination with a modern filesystem because of TRIM. Even if the SSD has not (yet) wiped the data, it knows that a block marked as unused can be returned as a block of 0s without needing to check what is currently stored in that block.

    Traditional drives had no such facility for marking blocks as unused from their PoV, so they always performed the read and returned what they found, which was most likely junk (old data from deleted files that would make sense in another context), though it could also be a block of zeros (because that block hadn't been used since the drive had a full format or someone zeroed free space).

  • They may be pointing to unallocated space, which on an SSD running TRIM would return all zeros. NTFS is an extremely resilient yet boring filesystem; I cannot remember the last time I had to run chkdsk, even after an improper shutdown.

    • As somebody who worked as a PC technician for a while until very recently, I've run chkdsk and had to repair errors on NTFS filesystems very, very, very often. It's almost an everyday thing. Anecdotal evidence is less than useful here.

      3 replies →

Journaling filesystems (including NTFS, and ext3/ext4 using default mount options) typically only track file structure metadata in the journal, so this is working as intended: the filesystem structure was not corrupted, but all bets are off when it comes to the contents of the files.

I lost Audacity projects due to BSODs on a Surface Book several times in ~2019: the *_data/**.au files were intact, each containing just a few seconds of audio, but the .aup XML file that maps them and contains whatever else makes up the project was all zeroed. My memory's fuzzy, but I think it was something like this: exiting sometimes triggered the BSOD, and save-on-exit corrupted the project consistently if it did BSOD, so the workaround was to remember to save first; then, if it BSODs, you're OK.

> experience data corruption on crash / power loss

You mean on a complete system crash, right? Your application crashing shouldn't lead to files being full of zeroes as long as you've already written everything out.
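
To illustrate the distinction: data handed to write() sits in the kernel's page cache, so it survives the process dying; only a kernel panic or power cut before the pages are flushed loses it. A tiny sketch, with a hypothetical path and deliberately no fsync:

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
        /* The written line survives abort(), because the dirty pages belong
           to the kernel, not the process. It would NOT necessarily survive
           a power cut at this point, since nothing has been fsync'd. */
        int fd = open("/tmp/crash-demo", O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return 1;

        const char msg[] = "still here after the process dies\n";
        if (write(fd, msg, sizeof msg - 1) < 0)
            return 1;

        abort();   /* simulate an application crash */
    }

After running it, the file still contains the line; it takes a forced power-off (or a lying drive, as in the parent report) rather than a mere application crash to end up with zeros or missing data.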