Comment by shadowgovt
4 years ago
As a frame of reference, how much loss of FLUSH'd data should be expected on power loss for a semi-permanent storage device (including spinning-platter hard drives, if anyone still installs them in machines these days)?
I'm far more used to the mainframe space where the rule is "Expect no storage reliability; redundancy and checksums or you didn't want that data anyway" and even long-term data is often just stored in RAM (and then periodically cold-storage'd to tape). I've lost track of what the expected practice is for desktop / laptop stuff these days.
The semantics of a FLUSH command (per the NVMe spec) are that all previously sent write commands, along with any internal metadata, must be written to durable storage before it returns success.
Basically the drive is saying "yup, it's all on NAND - not in some internal buffer. You can power off or whatever you want, nothing will be lost".
Some drives are doing work in response to that FLUSH but still lose data on power loss.
A flush command only guarantees, upon completion, that all writes COMPLETED prior to submission of the flush are non-volatile. Not all previously sent writes. NVMe base specification 2.0b section 7.1.
That's a very important distinction. You can't assume a write is durable just because it completed before the flush completed. It is covered only if it completed before you sent the flush.
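To make that ordering rule concrete, here is a toy model in Python. It is only a sketch: `ToyDrive` and all of its methods are made-up names, not any real NVMe interface, and it models the worst case the rule allows (a real drive may well persist more than the minimum). It also assumes each LBA is written only once.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDrive:
    """Hypothetical worst-case model of the flush rule; not a real NVMe API."""
    in_flight: dict = field(default_factory=dict)  # submitted, not yet completed
    cached: dict = field(default_factory=dict)     # completed, still volatile
    durable: dict = field(default_factory=dict)    # on non-volatile media

    def submit_write(self, lba, data):
        self.in_flight[lba] = data

    def complete_write(self, lba):
        # The host sees a completion, but the data may still be in a cache.
        self.cached[lba] = self.in_flight.pop(lba)

    def submit_flush(self):
        # Only writes that have ALREADY completed are covered by this flush.
        return dict(self.cached)

    def complete_flush(self, covered):
        for lba, data in covered.items():
            self.durable[lba] = data
            if self.cached.get(lba) == data:
                del self.cached[lba]

    def power_loss(self):
        # Everything that is not durable is gone.
        self.in_flight.clear()
        self.cached.clear()
        return dict(self.durable)


def demo():
    drive = ToyDrive()
    drive.submit_write(0, b"A")
    drive.submit_write(1, b"B")
    drive.complete_write(0)          # completed before the flush is sent
    covered = drive.submit_flush()
    drive.complete_write(1)          # completed after the flush was sent
    drive.complete_flush(covered)
    return drive.power_loss()


print(demo())  # {0: b'A'}
```

Write 1 completed before the flush completed, but it was not covered, so in this worst-case model it does not survive the power loss.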
I'm not very confident that software is actually getting this right all that often, although it probably is in this fsync test.
Is there a separate barrier command so you don't have to track all the writes individually in software?
> how much loss of FLUSH'd data should be expected on power loss for
0%
In enterprise settings you are expected to plan for lost data, but only if a drive fails and needs to be replaced, or if the data has not yet been flushed.
None. If the drive responds that the data has been written, it is expected to be there after a power failure.
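For reference, this is roughly what relying on that guarantee looks like from the application side on a POSIX system. It is a minimal sketch, not a definitive recipe: `durable_write` is a hypothetical helper name, and whether `os.fsync` actually results in a flush reaching the drive depends on the OS and filesystem (on macOS, for example, the documented way to request a full flush is `fcntl` with `F_FULLFSYNC`, not plain `fsync`).

```python
import os
import tempfile


def durable_write(path, data):
    """Hypothetical helper: write data, then ask the OS to push it to stable storage."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # Returns only after the OS reports the file data as flushed.
        os.fsync(fd)
    finally:
        os.close(fd)

    # Also fsync the containing directory so the new directory entry
    # is durable (POSIX behaviour; this part does not apply on Windows).
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


# Usage: write into a scratch directory and read the data back.
scratch = tempfile.mkdtemp()
target = os.path.join(scratch, "example.bin")
durable_write(target, b"hello")
with open(target, "rb") as f:
    result = f.read()
```

Only once `durable_write` returns is the software entitled to treat the data as safe, and even then only if the drive honours the flush it was sent.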