Comment by colanderman

4 years ago

Hard drive write caches are supposed to be battery-backed (i.e., internal to the drive) for exactly this reason. (Apparently the drives tested are not.) Data integrity should not be dependent on the power supply (UPS or not) in any way; it's unnecessary coupling of failure domains (two different domains, no less -- availability vs. integrity).

The entire point of the FLUSH command is to flush caches that aren't battery backed.

Battery-backed drives are free to ignore such commands. Those that aren't need to honor them. That's the point.

Battery- or capacitor-backed enterprise drives are intended to give you more performance by allowing the drive and indeed the OS to elide flushes. They aren't supposed to give you more reliability if the drive and software are working properly. You can achieve identical reliability with software that properly issues flush requests, assuming your drive is honoring them as required by the NVMe spec.
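
As a concrete illustration of "properly issues flush requests", here is a minimal sketch at the POSIX layer. It assumes Linux semantics, where fsync() on a filesystem with default barrier settings also asks the device to flush its volatile write cache; the filename is made up.

    #include <fcntl.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void) {
        int fd = open("journal.log", O_WRONLY | O_CREAT | O_APPEND, 0644);
        if (fd < 0) { perror("open"); return EXIT_FAILURE; }

        const char rec[] = "record\n";
        if (write(fd, rec, sizeof rec - 1) != (ssize_t)(sizeof rec - 1)) {
            perror("write");  /* data may still be only in the page cache */
            return EXIT_FAILURE;
        }

        /* Ask the kernel to persist the file and send a cache-flush
         * command to the device; only after this returns should the
         * application treat the record as durable. */
        if (fsync(fd) != 0) { perror("fsync"); return EXIT_FAILURE; }

        return close(fd) == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
    }

A drive with a battery- or capacitor-backed cache can acknowledge that flush immediately; a drive without one has to drain its cache first. Either way the guarantee visible to the application is the same.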

  • I don't think I said anything to the contrary?

    • You said caches should be battery-backed, implying that it's wrong for them not to be. I'm saying FLUSH is what you use to maintain data integrity when caches are not battery-backed, which is a perfectly valid use case. Modern drives are not expected to have battery-backed caches; instead, the software knows how to ask them to flush to preserve integrity. We've traded away some performance to keep the integrity.

      The problem is these drives don't provide integrity even when you explicitly ask them to.
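
      To make that failure mode concrete, here is a sketch of the classic write-ahead ordering that FLUSH is meant to enforce (plain POSIX calls; the record format and the split into two file descriptors are hypothetical). If the drive acknowledges the first flush without actually draining its cache, a crash can leave a commit record pointing at data that never reached stable storage.

          #include <stdio.h>      /* snprintf */
          #include <sys/types.h>  /* off_t, ssize_t */
          #include <unistd.h>     /* pwrite, write, fdatasync */

          /* Append a data block, then a commit record, with a flush
           * barrier between them. Integrity depends entirely on the
           * drive honoring both fdatasync()-driven flushes. */
          static int commit_block(int data_fd, int log_fd,
                                  const void *buf, size_t n, off_t off) {
              if (pwrite(data_fd, buf, n, off) != (ssize_t)n) return -1;
              if (fdatasync(data_fd) != 0) return -1;  /* barrier 1: data first */

              char rec[96];
              int len = snprintf(rec, sizeof rec, "commit off=%lld len=%zu\n",
                                 (long long)off, n);
              if (write(log_fd, rec, (size_t)len) != (ssize_t)len) return -1;
              return fdatasync(log_fd);                /* barrier 2: then the record */
          }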

As a systems engineer, I think we should be careful about throwing around words like “should”. Maybe data integrity isn’t something that’s guaranteed by a single piece of hardware, but instead by a cluster or a larger, eventually consistent system?

There will always be trade-offs to any implementation. If you’re just using your M.2 SSD to store games downloaded off Steam, I doubt it really matters how well it flushes data. However, if your financial startup is using such drives without an understanding of the risks and how to mitigate them, you may have a bad time.

  • The OS or application can always decide not to wait for an acknowledgement from the disk if it's not necessary for the application. The disk doesn't need to lie to the OS for the OS to provide that benefit.
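
    As a sketch of that choice (POSIX open() flags; the filename and helper are made up): with O_DSYNC every write() blocks until the data is on stable storage, while without it a write() completes as soon as the page cache has a copy. Neither path requires the disk to lie about anything.

        #include <fcntl.h>

        /* The application picks durability per use case; the disk
         * behaves identically either way and honors FLUSH when asked. */
        int open_data_file(int need_durability) {
            int flags = O_WRONLY | O_CREAT | O_APPEND;
            if (need_durability)
                flags |= O_DSYNC;  /* e.g. a transaction ledger */
            return open("data.bin", flags, 0644);
        }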

> it's unnecessary coupling

I think much improved write performance is a good example of how it can be beneficial, with minimal risk.

Everything can be a nice ideal of abstraction until you want to push the envelope.

  • Accidental drive pulls happen -- think JBODs and RAID. Ideally, if an operator pulls the wrong drive and then shoves it back in within a short amount of time, you want to be able to recover from that without a full RAID rebuild. You can't do that correctly if the RAID's bookkeeping structures (e.g. a write-intent bitmap) are not consistent with the rest of the data on the drive. (To be fair, in practice, an error arising in this case would likely be caught by RAID parity.)

    Not saying UPS-based integrity solutions don't make sense; you're right that it's a tradeoff. The issue to me is more that device vendors misstate their devices' capabilities.