Comment by hughrr
4 years ago
Not really a problem when your computer has a large UPS built into it. Desktop macs, not so good.
But really, isn’t the point of a journaling file system to make sure the filesystem is consistent as of one guaranteed point in time, not necessarily to avoid incidental data loss?
Journaling file systems depend on commands like FLUSH to create the appropriate barriers to build their journal's semantics.
From the filesystem's perspective looking at storage, the flush does indeed preserve the semantics. It is just that you can't rely on the contents of anything if the power goes out.
I don't have a clue how a journaling FS works. But any ordering should not be observable unless you have a power outage. Can you give an example of how a journaling FS could observe something that shouldn't be observable?
> unless you have a power outage
Journaling FSes are all about safety in the face of such things. That is, unless the drive lies.
The simplest answer is that the journal size isn't infinite, and not everything goes into the journal (like often actual file data). Therefore, stuff must be removed from the journal at some point. The filesystem only removes stuff from the journal once it has a clear message from the drive that the data that has been written elsewhere is safe and secure. If the drive lies about that, then the filesystem may overwrite part of the journal that it thinks is no longer needed, and the drive may write that journal-overwrite before it writes the long-term representation. That's how you get filesystem corruption.
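That checkpointing discipline can be sketched as a toy simulation (all names are illustrative, not a real filesystem or drive API):

```python
# A journal entry is only freed after the drive acknowledges that a
# flush made earlier writes durable. A drive that "lies" acks the flush
# while data is still in its volatile cache, so a power cut can leave
# neither the journal entry nor the long-term copy on media.

class Drive:
    def __init__(self, lies):
        self.lies = lies
        self.cache = {}   # volatile write cache
        self.media = {}   # durable storage

    def write(self, addr, data):
        self.cache[addr] = data

    def flush(self):
        # honest path: move everything in the cache to durable media
        self.media.update(self.cache)
        self.cache.clear()

    def flush_and_ack(self):
        if not self.lies:
            self.flush()
        return True       # a lying drive acks without persisting

    def power_loss(self):
        self.cache.clear()  # the volatile cache is gone

def journaled_update(drive, addr, data):
    drive.write("journal", (addr, data))  # 1. record intent in the journal
    drive.flush_and_ack()                 # 2. barrier: journal must be durable
    drive.write(addr, data)               # 3. write the long-term location
    drive.flush_and_ack()                 # 4. barrier before...
    drive.write("journal", None)          # 5. ...reusing the journal slot

results = {}
for d in (Drive(lies=False), Drive(lies=True)):
    journaled_update(d, "block7", "new")
    d.power_loss()
    # recoverable iff the journal entry or the final write survived
    ok = d.media.get("journal") is not None or d.media.get("block7") == "new"
    results["liar" if d.lies else "honest"] = ok
print(results)  # the honest drive is recoverable, the lying one is not
```

With the honest drive, either the journal entry or the checkpointed data survives a power cut; with the lying drive, both can be lost at once, which is exactly the corruption described above.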
> But any ordering should not be observable unless you have a power outage.
But what if the front does fall off?
A crash or a lockup is the same as a power failure.
> Not really a problem when your computer has a large UPS built into it.
Actually it is (though a small one). To name some examples where it can still lose data without a full sync:
- OS crashes
- random hard resets, e.g. from bit flips caused by cosmic radiation (it happens), or from someone putting a magnetic earphone case or similar on your laptop.
Also, any application that cares about data integrity will do full syncs and in turn will take a huge performance penalty.
I have no idea why people are so adamant about defending Apple in this case. It's pretty clear that they messed up: performance with a full flush is just WAY too low, and this affects anything that uses full flushes, which any application should do at least on (auto-)save.
The point of a journaling file system is to make it less likely that the file system _itself_ is corrupted. Not that the files aren't corrupted if they don't use full sync!
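For the "full sync on save" point, here is a minimal sketch of a durable save in Python. `F_FULLFSYNC` is the fcntl command Apple documents on macOS; the fallback path and the function name are my own:

```python
import fcntl
import os
import sys

def durable_save(path, data):
    # Write data, then force it to stable storage. On macOS, plain
    # fsync() only pushes data to the drive's volatile cache; the
    # F_FULLFSYNC fcntl asks the drive itself to flush. Elsewhere
    # (and as a fallback) use fsync().
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        if sys.platform == "darwin" and hasattr(fcntl, "F_FULLFSYNC"):
            fcntl.fcntl(fd, fcntl.F_FULLFSYNC)
        else:
            os.fsync(fd)
    finally:
        os.close(fd)
```

The F_FULLFSYNC path is the one whose cost the benchmarks in question are measuring; the fsync-only path is the one where the man page warns your data may vanish on power loss.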
I had an NVMe controller randomly reset itself a few days ago. I think it was a heat issue. Not really sure though, may be that the motherboard is dodgy.
This shit does happen.
OS crashes do not cause acknowledged writes to be lost. They are already in the drive's queue.
They do if you don't use F_FULLFSYNC; even Apple acknowledges it (quoting Apple's man pages):
> Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written.
It's also worse than just lost writes:
> The disk drive may also re-order the data so that later writes may be present, while earlier writes are not.
Hard drive write caches are supposed to be battery-backed (i.e., internal to the drive) for exactly this reason. (Apparently the drives tested are not.) Data integrity should not be dependent on power supply (UPS or not) in any way; it's unnecessary coupling of failure domains (two different domains nonetheless -- availability vs. integrity).
The entire point of the FLUSH command is to flush caches that aren't battery backed.
Battery-backed drives are free to ignore such commands. Those that aren't need to honor them. That's the point.
Battery- or capacitor-backed enterprise drives are intended to give you more performance by allowing the drive and indeed the OS to elide flushes. They aren't supposed to give you more reliability if the drive and software are working properly. You can achieve identical reliability with software that properly issues flush requests, assuming your drive is honoring them as required by the NVMe spec.
I don't think I said anything to the contrary?
As a systems engineer, I think we should be careful throwing words around like “should”. Maybe the data integrity isn’t something that’s guaranteed by a single piece of hardware but instead a cluster or a larger eventually consistent system?
There will always be trade-offs to any implementation. If you're just using your M2 SSD to store games downloaded off Steam, I doubt it really matters how well they flush data. However, if your financial startup is using them without an understanding of the risks and how to mitigate them, then you may have a bad time.
The OS or application can always decide not to wait for an acknowledgement from the disk if it's not necessary for the application. The disk doesn't need to lie to the OS for the OS to provide that benefit.
> it's unnecessary coupling
I think much improved write performance is a good example of how it can be beneficial, with minimal risk.
Everything can be nice ideals of abstraction, until you want to push the envelope.
Accidental drive pulls happen -- think JBODs and RAID. Ideally, if an operator pulls the wrong drive, and then shoves it back in in a short amount of time, you want to be able to recover from that without a full RAID rebuild. You can't do that correctly if the RAID's bookkeeping structures (e.g. write-intent bitmap) are not consistent with the rest of the data on the drive. (To be fair, in practice, an error arising in this case would likely be caught by RAID parity.)
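The write-intent ordering described above can be sketched like this (a toy model; `write_with_intent` and the callbacks are hypothetical, not md/RAID's real API):

```python
# The intent bit must be durable *before* the data write, so after a
# pull-and-reinsert only the dirty regions need resyncing instead of
# the whole array. If the drive reorders these past an acked flush,
# the bookkeeping no longer covers the writes in flight.

REGION_BLOCKS = 64  # blocks covered by one bitmap bit (illustrative)

def write_with_intent(bitmap, flush, write, block, data):
    bitmap.add(block // REGION_BLOCKS)  # 1. record intent for this region
    flush()                             # 2. barrier: intent hits stable storage
    write(block, data)                  # 3. only now write the data
    # the bit is cleared lazily once the mirrors agree again

def regions_to_resync(bitmap):
    # after an accidental pull, rebuild only these regions
    return sorted(bitmap)

dirty = set()
log = []
write_with_intent(dirty,
                  flush=lambda: log.append("FLUSH"),
                  write=lambda b, d: log.append(("WRITE", b)),
                  block=130, data="payload")
print(regions_to_resync(dirty))  # prints [2]: block 130 lives in region 2
```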
Not saying UPS-based integrity solutions don't make sense, you are right it's a tradeoff. The issue to me is more device vendors misstating their devices' capabilities.
UPS won't help if kernel panics.
It doesn't need to, kernel panic alone does not cause acknowledged data not to be written to the drive.
UPS is not perfect though, it's better if your data integrity guarantees are valid independent of power supply. All that requires is that the drive doesn't lie.
kernel panic wouldn't take out the SSD firmware...
To quote Apple's man pages:
> Specifically, if the drive loses power or the OS crashes, the application may find that only some or none of their data was written.
macOS issues an NVMe flush on kernel panics.
> Not really a problem when your computer has a large UPS built into it.
Except for that _one time_ you need to keep working until the battery fails to power the device at 8%, because its capacity has degraded to 80%. Granted, this is only after a few years of regular use...
In Apple's defense, they probably have enough power even in the worst case to limp along long enough to flush in the laptop form factor, even if the power management components refuse to power the main CCXs. Speccing out enough caps in the desktop case would be very Apple as well.
Apple do not have PLP on their desktop machines (at least not the Mac Mini). I've tested over 5 seconds of written but not FLUSHed data loss, and confirmed via hypervisor tracing that macOS doesn't do anything when you yank power. It just dies.
Once voltage from the battery gets too low (despite reporting whatever % charge), you aren't getting anything from the battery.
There is no defense for lying about sync like this. Ever.
It seems pretty clear that desktop Macs are an afterthought for Apple.
Power loss is not the only way things can stop working.