Comment by gruez

4 years ago

Correct me if I'm wrong, but if these drives are used for consumer applications, this behavior is probably not a big deal? If you made changes to a document, pressed control-S, and then 1 second later the power went out, then you might lose that last save. That'd suck, but you would have lost the data anyways if the power loss occurred 2s before, so it's not that bad. As long as other properties weren't violated (e.g. ordering), your data should mostly be okay, aside from that 1s of data. It's a much bigger issue for enterprise applications, e.g. when a bank's mainframe responsible for processing transactions tells a client that the transaction went through, but a power loss occurs and the transaction is lost.

Modern SSDs, and especially NVMe drives, have extensive logic for reordering both reads and writes, which is part of why they perform best at high queue depths. So it's not just possible but expected that the drive will be reordering the queue. Also, as batteries age, it becomes quite common to lose power without warning while on a battery.

In general it's strange to hear excuses for this behavior since it's obviously an attempt to pass off the drive's performance as better than it really is by violating design constraints that are basic building blocks of data integrity.

  • >Modern SSDs, and especially NVMe drives, have extensive logic for reordering both reads and writes, which is part of why they perform best at high queue depths. So it's not just possible but expected that the drive will be reordering the queue.

    If we're already in speculation territory, I'll further speculate that it's not hard to have some sort of WAL mechanism to ensure the writes appear in order. That way you can lie to the software that the writes made it to persistent memory, but still have consistent ordering when there's a crash.

    >Also, as batteries age, it becomes quite common to lose power without warning while on a battery.

    That's... totally consistent with my comment? If you're going for hours without saving and only saving when the OS tells you there's only 3% battery left, then you're already playing fast and loose with your data. Like you said yourself, it's common for old laptops to lose power without warning, so waiting until there's a warning to save is just asking for trouble. Play stupid games, win stupid prizes. Of course, it doesn't excuse their behavior, but I'm just pointing out that, to the typical consumer, the actual impact isn't as bad as people think.

It’s a big deal because they are lying. That sets false expectations for the system. There are different commands for ensuring write ordering.

> As long as other properties weren't violated (eg. ordering), your data should mostly be okay, aside from that 1s of data.

That's the thing though—ordering isn't guaranteed as far as I remember. If you want ordering you do syncs/flushes, and if the drive isn't respecting those, then ordering is out of the window. That means FS corruption and such. Not good.
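To make the reliance on flushes concrete, here is a minimal sketch of the usual "write, fsync, then rename" pattern, assuming a POSIX system and Python. `durable_replace` is a made-up helper name, not something from the tweet or from any particular application. The `fsync` before the rename is the ordering barrier: if the drive acknowledges the flush without actually persisting the data, the rename can reach stable storage before the contents do.

```python
import os
import tempfile


def durable_replace(path: str, data: bytes) -> None:
    """Hypothetical helper: replace `path` so a crash leaves old or new contents."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            # Ordering barrier: the new contents are supposed to be on stable
            # storage before the rename below becomes visible.
            os.fsync(tmp.fileno())
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temporary file if anything failed before the rename.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    # Flush the directory entry too, so the rename itself is durable.
    # Opening a directory like this is a POSIX assumption.
    dir_fd = os.open(directory, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

None of this helps if the drive ignores the flush; the code can only ask.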

  • The tweet only mentioned data loss when you yanked the power cable. That doesn't say anything about whether the ordering is preserved. It's possible to have a drive that lies about data written to persistent storage, but still keeps the writes in order.

> If you made changes to a document, pressed control-S, and then 1 second later the power went out, then you might lose that last save.

If you made changes to a document, pressed control-S, and then 1 second later the power went out, then the entire filesystem might become corrupted and you lose all data.

Keep in mind that small writes happen a lot -- a lot a lot. Every time you click a link in a web page it will hit cookies, update your browser history, etc etc, all of which will trigger writes to the filesystem. If one of these writes triggers a modification to the superblock, and during the update a FLUSH is ignored and the superblock is in a temporary invalid state, and the power goes out, you may completely hose your OS.

Nope, the problem here is that it violates a very basic ordering guarantee that all kinds of applications build on top of. Consider all of the cases of these hybrid drives, or just multiple hard drives, where you fsync on one to journal that you did something on the other (e.g. Steam storing the actual games on another drive).

This behavior will cause all kinds of weird data inconsistencies in super subtle ways.
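A rough sketch of the cross-drive pattern described above, assuming Python and two paths that may live on different drives. The function names and the BEGIN/COMMIT journal format are invented for illustration and are not how Steam or any specific application does it.

```python
import os


def fsync_write(path: str, data: bytes, mode: str = "wb") -> None:
    """Write `data` and ask the OS and the drive to flush it before returning."""
    with open(path, mode) as f:
        f.write(data)
        f.flush()
        os.fsync(f.fileno())


def install(journal_path: str, payload_path: str, payload: bytes) -> None:
    """Hypothetical two-drive update: the journal on one drive, the payload on another."""
    name = os.fsencode(payload_path)
    # 1. Record the intent in the journal and flush it.
    fsync_write(journal_path, b"BEGIN " + name + b"\n", "ab")
    # 2. Write the payload on the other drive and flush it.
    fsync_write(payload_path, payload)
    # 3. Only now record the commit. This is only correct if step 2's flush
    #    really reached stable storage.
    fsync_write(journal_path, b"COMMIT " + name + b"\n", "ab")


def is_committed(journal_path: str, payload_path: str) -> bool:
    """Recovery check: was a COMMIT record written for this payload?"""
    marker = b"COMMIT " + os.fsencode(payload_path)
    try:
        with open(journal_path, "rb") as journal:
            return any(line.rstrip(b"\n") == marker for line in journal)
    except FileNotFoundError:
        return False
```

If the payload drive acknowledges the flush in step 2 without persisting it, a power cut can leave a COMMIT record on one drive pointing at data that never landed on the other, and recovery code has no way to notice.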

> As long as other properties weren't violated (eg. ordering)

That is primarily what fsync is used to ensure. (SCSI provides other means of ensuring ordering, but AFAIK they're not widely implemented.)

EDIT: per your other reply, yes, it's possible the drives maintain ordering of FLUSHed writes, but not durability. I'm curious to see that tested as well. (Still an integrity issue for any system involving more than just one single drive though.)

> That'd suck, but you would have lost the data anyways if the power loss occurred 2s before,

But if you knew power was failing, which is why you did the ^S in the first place, it would not just suck, it would be worse than that because your expectations were shattered.

It's all fine and good to have the computers lie to you about what they're doing, especially if you're in on the gag.

But when you're not, it makes the already confounding and exasperating computing experience just that much worse.

Go back to floppies, at least you know the data is saved when the disk stops spinning.

  • >But if you knew power was failing, which is why you did the ^S in the first place, it would not just suck, it would be worse than that because your expectations were shattered.

    The only situation I can think of where this applies is a laptop running low on battery. Even then, my guess is that there is enough variance in terms of battery chemistry/operating conditions that you're already playing fast and loose with regard to your data if you're saving data when there's only a few seconds of battery left. I agree that having it not lose data is objectively better than having it lose data, but that's why I characterized it as "not a big deal".

    • You've not been to my house.

      Contractor: "Hi, we need to kill the power to the house now."

      Me: "Oh, ok, let me shut down my computer."

      And everything I've been reading lately says simply that there's nothing safe about this. How is shutting down a computer ever safe now? How long do we have to wait to ensure our data is flushed correctly, by everything?
