Comment by hughrr
4 years ago
Not really a problem when your computer has a large UPS built into it. Desktop macs, not so good.
But really, isn’t the point of a journaling file system to make sure the filesystem is consistent as of one guaranteed point in time, not necessarily to avoid incidental data loss?
Journaling file systems depend on commands like FLUSH to create the appropriate barriers to build their journal's semantics.
From the filesystem's perspective looking at storage, the flush does indeed preserve the semantics. It is just that you can't rely on the contents of anything if the power goes out.
I don't have a clue how a journaling FS works. But any ordering should not be observable unless you have a power outage. Can you give an example of how a journaling FS could observe something that shouldn't be observable?
> unless you have a power outage
Journaling FSes are all about safety in the face of such things. That is, unless the drive lies.
The simplest answer is that the journal size isn't infinite, and not everything goes into the journal (like often actual file data). Therefore, stuff must be removed from the journal at some point. The filesystem only removes stuff from the journal once it has a clear message from the drive that the data that has been written elsewhere is safe and secure. If the drive lies about that, then the filesystem may overwrite part of the journal that it thinks is no longer needed, and the drive may write that journal-overwrite before it writes the long-term representation. That's how you get filesystem corruption.
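That checkpointing discipline can be sketched as a toy simulation (all names are illustrative, not a real filesystem or drive API):

```python
# A journal entry is only freed after the drive acknowledges that a
# flush made earlier writes durable. A drive that "lies" acks the flush
# while data is still in its volatile cache, so a power cut can leave
# neither the journal entry nor the long-term copy on media.

class Drive:
    def __init__(self, lies):
        self.lies = lies
        self.cache = {}   # volatile write cache
        self.media = {}   # durable storage

    def write(self, addr, data):
        self.cache[addr] = data

    def flush(self):
        # honest path: move everything in the cache to durable media
        self.media.update(self.cache)
        self.cache.clear()

    def flush_and_ack(self):
        if not self.lies:
            self.flush()
        return True       # a lying drive acks without persisting

    def power_loss(self):
        self.cache.clear()  # the volatile cache is gone

def journaled_update(drive, addr, data):
    drive.write("journal", (addr, data))  # 1. record intent in the journal
    drive.flush_and_ack()                 # 2. barrier: journal must be durable
    drive.write(addr, data)               # 3. write the long-term location
    drive.flush_and_ack()                 # 4. barrier before...
    drive.write("journal", None)          # 5. ...reusing the journal slot

results = {}
for d in (Drive(lies=False), Drive(lies=True)):
    journaled_update(d, "block7", "new")
    d.power_loss()
    # recoverable iff the journal entry or the final write survived
    ok = d.media.get("journal") is not None or d.media.get("block7") == "new"
    results["liar" if d.lies else "honest"] = ok
print(results)  # the honest drive is recoverable, the lying one is not
```

With the honest drive, either the journal entry or the checkpointed data survives a power cut; with the lying drive, both can be lost at once, which is exactly the corruption described above.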
> But any ordering should not be observable unless you have a power outage.
But what if the front does fall off?
A crash or a lockup is the same as a power failure.
> Not really a problem when your computer has a large UPS built into it.
Actually it is (though a small one). To name some examples where it can still lose data without a full sync:
- OS crashes
- random hard resets, e.g. from bit flips caused by cosmic radiation (it happens), or from someone putting a magnetic earphone case or similar on your laptop.
Also, any application that cares about data integrity will do full syncs and in turn will take a huge performance penalty.
I have no idea why people are so adamant about defending Apple in this case. It's pretty clear that they messed up: performance with a full flush is just WAY too low, and this affects anything that uses full flushes, which any application should do at least on (auto-)save.
The point of a journaling file system is to make it less likely that the file system _itself_ is corrupted. Not that the files aren't corrupted if they don't use full sync!
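For the "full sync on save" point, here is a minimal sketch of a durable save in Python. `F_FULLFSYNC` is the fcntl command Apple documents on macOS; the fallback path and the function name are my own:

```python
import fcntl
import os
import sys

def durable_save(path, data):
    # Write data, then force it to stable storage. On macOS, plain
    # fsync() only pushes data to the drive's volatile cache; the
    # F_FULLFSYNC fcntl asks the drive itself to flush. Elsewhere
    # (and as a fallback) use fsync().
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        if sys.platform == "darwin" and hasattr(fcntl, "F_FULLFSYNC"):
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
        else:
            os.fsync(fd)
    finally:
        os.close(fd)
```

The F_FULLFSYNC path is the one whose cost the benchmarks in question are measuring; the fsync-only path is the one where the man page warns your data may vanish on power loss.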
I had an NVMe controller randomly reset itself a few days ago. I think it was a heat issue. Not really sure though, may be that the motherboard is dodgy.
This shit does happen.
OS crashes do not cause acknowledged writes to be lost. They are already in the drive's queue.
They do if you don't use F_FULLFSYNC; even Apple acknowledges it (quoting Apple's man pages):
> Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written.
It's also worse than just lost writes:
> The disk drive may also re-order the data so that later writes may be present, while earlier writes are not.
Hard drive write caches are supposed to be battery-backed (i.e., internal to the drive) for exactly this reason. (Apparently the drives tested are not.) Data integrity should not be dependent on power supply (UPS or not) in any way; it's unnecessary coupling of failure domains (two different domains nonetheless -- availability vs. integrity).
The entire point of the FLUSH command is to flush caches that aren't battery backed.
Battery-backed drives are free to ignore such commands. Those that aren't need to honor them. That's the point.
Battery- or capacitor-backed enterprise drives are intended to give you more performance by allowing the drive and indeed the OS to elide flushes. They aren't supposed to give you more reliability if the drive and software are working properly. You can achieve identical reliability with software that properly issues flush requests, assuming your drive is honoring them as required by the NVMe spec.
I don't think I said anything to the contrary?
As a systems engineer, I think we should be careful throwing words around like “should”. Maybe the data integrity isn’t something that’s guaranteed by a single piece of hardware but instead a cluster or a larger eventually consistent system?
There will always be trade-offs to any implementation. If you're just using your M2 SSD to store games downloaded off Steam, I doubt it really matters how well they flush data. However, if your financial startup is using them without an understanding of the risks and how to mitigate them, then you may have a bad time.
The OS or application can always decide not to wait for an acknowledgement from the disk if it's not necessary for the application. The disk doesn't need to lie to the OS for the OS to provide that benefit.
> it's unnecessary coupling
I think much improved write performance is a good example of how it can be beneficial, with minimal risk.
Everything can be nice ideals of abstraction, until you want to push the envelope.
Accidental drive pulls happen -- think JBODs and RAID. Ideally, if an operator pulls the wrong drive, and then shoves it back in in a short amount of time, you want to be able to recover from that without a full RAID rebuild. You can't do that correctly if the RAID's bookkeeping structures (e.g. write-intent bitmap) are not consistent with the rest of the data on the drive. (To be fair, in practice, an error arising in this case would likely be caught by RAID parity.)
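The write-intent ordering described above can be sketched like this (a toy model; `write_with_intent` and the callbacks are hypothetical, not md/RAID's real API):

```python
# The intent bit must be durable *before* the data write, so after a
# pull-and-reinsert only the dirty regions need resyncing instead of
# the whole array. If the drive reorders these past an acked flush,
# the bookkeeping no longer covers the writes in flight.

REGION_BLOCKS = 64  # blocks covered by one bitmap bit (illustrative)

def write_with_intent(bitmap, flush, write, block, data):
    bitmap.add(block // REGION_BLOCKS)  # 1. record intent for this region
    flush()                             # 2. barrier: intent hits stable storage
    write(block, data)                  # 3. only now write the data
    # the bit is cleared lazily once the mirrors agree again

def regions_to_resync(bitmap):
    # after an accidental pull, rebuild only these regions
    return sorted(bitmap)

dirty = set()
log = []
write_with_intent(dirty,
                  flush=lambda: log.append("FLUSH"),
                  write=lambda b, d: log.append(("WRITE", b)),
                  block=130, data="payload")
print(regions_to_resync(dirty))  # prints [2]: block 130 lives in region 2
```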
Not saying UPS-based integrity solutions don't make sense, you are right it's a tradeoff. The issue to me is more device vendors misstating their devices' capabilities.
UPS won't help if kernel panics.
It doesn't need to, kernel panic alone does not cause acknowledged data not to be written to the drive.
UPS is not perfect though, it's better if your data integrity guarantees are valid independent of power supply. All that requires is that the drive doesn't lie.
kernel panic wouldn't take out the SSD firmware...
To quote Apple's man pages:
> Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written.
macOS issues an NVMe flush on kernel panics.
> Not really a problem when your computer has a large UPS built into it.
Except for that _one time_ you need to keep working until the battery fails to power the device at 8%, because its capacity has degraded to 80%. Granted, this is only after a few years of regular use...
In Apple's defense, they probably have enough power even in the worst case to limp along long enough to flush in the laptop form factor, even if the power management components refuse to power the main CCXs. Speccing out enough caps in the desktop case would be very Apple as well.
Apple do not have PLP on their desktop machines (at least not the Mac Mini). I've tested over 5 seconds of written but not FLUSHed data loss, and confirmed via hypervisor tracing that macOS doesn't do anything when you yank power. It just dies.
Once voltage from the battery gets too low (despite reporting whatever % charge), you aren't getting anything from the battery.
There is no defense for lying about sync like this. Ever.
It seems pretty clear that desktop Macs are an afterthought for Apple.
Power loss is not the only way things can stop working.