Comment by monocasa

4 years ago

Journaling file systems depend on commands like FLUSH to create the appropriate barriers to build their journal's semantics.

From the filesystem's perspective looking at storage, the flush does indeed preserve the semantics. It is just that you can't rely on the contents of anything if the power goes out.

I don't have a clue how a journaling FS works. But any ordering should not be observable unless you have a power outage. Can you give an example of how a journaling FS could observe something that shouldn't be observable?

  • > unless you have a power outage

    Journaling FSes are all about safety in the face of such things. That is, unless the drive lies.

  • The simplest answer is that the journal size isn't infinite, and not everything goes into the journal (like often actual file data). Therefore, stuff must be removed from the journal at some point. The filesystem only removes stuff from the journal once it has a clear message from the drive that the data that has been written elsewhere is safe and secure. If the drive lies about that, then the filesystem may overwrite part of the journal that it thinks is no longer needed, and the drive may write that journal-overwrite before it writes the long-term representation. That's how you get filesystem corruption.

  • > But any ordering should not be observable unless you have a power outage.

    But what if the front does fall off?

  • A crash or a lockup is the same as a power failure.

    • No, during a crash or lockup, acknowledged writes are not lost. (Because the drive has acknowledged them, they are in the drive's internal queue and thus need no further action from the OS to be committed to durable storage.) Only power loss/power cycle causes this.

    • Why? During a crash or lockup, acked writes have still reached the drive. They will eventually be flushed to storage by the SSD controller. As long as you have power, that is.

    • macOS flushes the NVMe cache on kernel panic.

      Probably not on lockups though. A watchdog reset won't flush NVMe. Not sure if they have a special pre-fire path that tries a last ditch NVMe flush...
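
The failure mode described in the thread (a drive that acknowledges a flush without persisting, then writes the journal overwrite before the long-term copy) can be sketched as a toy simulation. Everything here is hypothetical and made up for illustration: `ToyDrive`, its three block names, and the particular reordering chosen in `run` are assumptions, not a model of any real drive or filesystem.

```python
class ToyDrive:
    """Hypothetical drive with a volatile write cache (not a real device API)."""

    def __init__(self, honest_flush):
        # Contents that survive power loss.
        self.durable = {"journal": "empty", "dir": "old", "inode": "old"}
        # Acknowledged writes that are still only in the volatile cache.
        self.cache = []
        self.honest_flush = honest_flush

    def write(self, block, data):
        # The write is acknowledged as soon as it is queued in the cache.
        self.cache.append((block, data))

    def persist_only(self, blocks):
        # Persist cached writes to the given blocks, leaving the rest cached.
        # This models the drive being free to reorder writes between flushes.
        remaining = []
        for block, data in self.cache:
            if block in blocks:
                self.durable[block] = data
            else:
                remaining.append((block, data))
        self.cache = remaining

    def flush(self):
        # An honest drive empties its cache; a lying drive just reports success.
        if self.honest_flush:
            self.persist_only({block for block, _ in self.cache})

    def host_crash(self):
        # The drive still has power, so it drains its cache on its own.
        self.persist_only({block for block, _ in self.cache})

    def power_loss(self):
        # Anything still in the volatile cache is gone.
        self.cache.clear()


def journaled_update(drive):
    # 1. Record the intended change in the journal and wait for durability.
    drive.write("journal", "dir=new,inode=new")
    drive.flush()
    # 2. Write the long-term (in-place) copies and wait for durability.
    drive.write("dir", "new")
    drive.write("inode", "new")
    drive.flush()
    # 3. The filesystem now believes step 2 is safe, so it reuses the journal.
    drive.write("journal", "empty")


def run(honest_flush, failure):
    drive = ToyDrive(honest_flush)
    journaled_update(drive)
    # Assumed reordering: the journal overwrite and one in-place block
    # happen to reach durable storage before the failure.
    drive.persist_only({"journal", "dir"})
    if failure == "power_loss":
        drive.power_loss()
    else:
        drive.host_crash()
    return drive.durable


print(run(honest_flush=True, failure="power_loss"))
print(run(honest_flush=False, failure="power_loss"))
print(run(honest_flush=False, failure="host_crash"))
```

In this sketch, only the combination of a lying flush and a power loss leaves `dir` updated, `inode` stale, and no journal record left to replay. With a host crash the drive keeps power and drains its cache, which matches the distinction drawn in the replies above.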