Comment by rowanG077

4 years ago

I don't have a clue how a journaling FS works. But any ordering should not be observable unless you have a power outage. Can you give an example of how a journaling FS could observe something that shouldn't be observable?

> unless you have a power outage

Journaling FSes are all about safety in the face of such things. That is, unless the drive lies.

The simplest answer is that the journal size isn't infinite, and not everything goes into the journal (actual file data often doesn't). Therefore, stuff must be removed from the journal at some point. The filesystem only removes stuff from the journal once it has a clear message from the drive that the data written elsewhere is safe and secure. If the drive lies about that, then the filesystem may overwrite part of the journal that it thinks is no longer needed, and the drive may persist that journal overwrite before it persists the long-term representation. Lose power at that moment and neither copy is intact. That's how you get filesystem corruption.
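
To make that concrete, here is a toy simulation of the failure mode. It is only a sketch under loud assumptions: `ToyDrive`, `journaled_update`, `recover`, and `simulate` are hypothetical names invented for illustration, the "filesystem" is two blocks `A` and `B` that must change together, and `lucky_keys` stands in for whichever cached writes happened to reach the media before the power went out. No real filesystem or drive firmware works exactly like this.

```python
class ToyDrive:
    """Hypothetical drive: writes sit in a volatile cache until persisted."""

    def __init__(self, honest):
        self.honest = honest
        self.cache = []  # pending (key, value) writes, lost on power failure
        self.durable = {"journal": None, "A": 0, "B": 0}

    def write(self, key, value):
        self.cache.append((key, value))

    def flush(self):
        # An honest drive persists its cache before acknowledging.
        # A lying drive acknowledges without persisting anything.
        if self.honest:
            for key, value in self.cache:
                self.durable[key] = value
            self.cache.clear()
        return True  # the acknowledgement looks the same either way

    def power_cut(self, lucky_keys):
        # Only cached writes whose key is in lucky_keys reached the media.
        for key, value in self.cache:
            if key in lucky_keys:
                self.durable[key] = value
        self.cache.clear()
        return dict(self.durable)


def journaled_update(drive):
    drive.write("journal", {"A": 1, "B": 1})  # record the intent
    drive.flush()                             # journal must be durable first
    drive.write("A", 1)                       # in-place update
    drive.write("B", 1)
    drive.flush()                             # data must be durable before reclaim
    drive.write("journal", None)              # reclaim the journal slot


def recover(state):
    # On mount, replay the journal record if one survived.
    state = dict(state)
    if state["journal"] is not None:
        state.update(state["journal"])
        state["journal"] = None
    return state


def simulate(honest, lucky_keys):
    drive = ToyDrive(honest)
    journaled_update(drive)
    return recover(drive.power_cut(lucky_keys))
```

With an honest drive, `simulate(True, {"journal", "A"})` ends with `A` and `B` both updated, and `simulate(True, set())` gets there by replaying the journal. With a lying drive, `simulate(False, {"journal", "A"})` ends with the journal already cleared, `A` updated, and `B` stale, which is exactly the "journal overwrite landed before the long-term data" case.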

> But any ordering should not be observable unless you have a power outage.

But what if the front does fall off?

A crash or a lockup is the same as a power failure.

  • No, during a crash or lockup, acknowledged writes are not lost. (Because the drive has acknowledged them, they are in the drive's internal queue and thus need no further action from the OS to be committed to durable storage.) Only power loss/power cycle causes this.

  • Why? During a crash or lockup, acked writes have still reached the drive. They will be flushed to storage eventually by the SSD controller. As long as you have power, that is.

    • The key word is ‘eventually’. How long? Seconds, or even minutes? If your machine locks up, you turn it off and on again. If the drive didn’t bother to flush its internal caches by then, that data is lost, just as in a power failure.


  • macOS flushes the NVMe cache on kernel panic.

    Probably not on lockups though. A watchdog reset won't flush NVMe. Not sure if they have a special pre-fire path that tries a last-ditch NVMe flush...
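
For what the subthread above means from the application side, here is a minimal sketch of a "really durable" write in Python. Assumptions: `durable_write` is a hypothetical helper, the platform is Unix-like (the `fcntl` module is not available elsewhere), and the drive honors the flush it is sent. On macOS, plain `fsync` is documented as not necessarily forcing the drive to empty its own cache, so the sketch also issues `F_FULLFSYNC` when the constant exists.

```python
import fcntl
import os


def durable_write(path, data):
    """Write bytes, then ask the OS (and, where possible, the drive) to persist them."""
    with open(path, "wb") as f:
        f.write(data)
        f.flush()                 # push Python's userspace buffer to the kernel
        os.fsync(f.fileno())      # ask the kernel to write the file out
        full_sync = getattr(fcntl, "F_FULLFSYNC", None)
        if full_sync is not None:
            try:
                # macOS only: also request a flush of the drive's cache.
                fcntl.fcntl(f.fileno(), full_sync)
            except OSError:
                pass  # not supported on this filesystem; fsync above still ran
```

None of this helps if the drive acknowledges the flush without doing it, which is the whole point of the thread: until a flush has actually completed, acknowledged data only survives as long as the drive keeps power.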