Comment by throwawaylinux

4 years ago

> No, not without that. Even with that, you can't have durable writes; not on a Mac, or Linux, or anywhere else. If you are worried about hardware failure, fsync()/fcntl+F_FULLFSYNC do nothing to protect against it: the only thing that does is shipping the data someplace else (and, depending on the criticality of the data, possibly quite far).

"The sun might explode so nothing guarantees integrity", come on, get real. This is pointless nitpicking.

Of course fsync ensures durable writes on systems like Linux with drives that honor FUA (Force Unit Access). The reliability of the device and stack in question is implied in this, and anybody who talks about data integrity understands that. This is how you calculate and manage the error rates of your system.
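
To make the pattern concrete, here is a minimal sketch of a durable write at the syscall level, with abbreviated error handling. durable_write is just my name for it, and it assumes a storage stack that honors flush requests:

    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* Write a buffer and push it through to stable media. */
    int durable_write(const char *path, const void *buf, size_t len)
    {
        int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
        if (fd < 0)
            return -1;
        if (write(fd, buf, len) != (ssize_t)len) { /* short writes not retried here */
            close(fd);
            return -1;
        }
    #ifdef F_FULLFSYNC
        /* macOS: fsync() only pushes data to the drive, not through its
           cache; F_FULLFSYNC requests a full flush. Fall back to fsync()
           where the filesystem doesn't support it. */
        if (fcntl(fd, F_FULLFSYNC) < 0 && fsync(fd) < 0) {
    #else
        /* Linux: fsync() is expected to flush the drive cache (or use
           FUA) on a storage stack that doesn't lie. */
        if (fsync(fd) < 0) {
    #endif
            close(fd);
            return -1;
        }
        /* A newly created file also needs an fsync() of its parent
           directory before the name itself is durable; omitted here. */
        return close(fd);
    }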

> "The sun might explode so nothing guarantees integrity", come on, get real. This is pointless nitpicking.

I think most people understand that there is a huge difference between the sun exploding and a single hardware failure.

If you really don't understand that, I have no idea what to say.

> Of course fsync ensures durable writes on systems like Linux with drives that honor FUA

No it does not. The drive can still fail after you write() and nobody will care how often you called fsync(). The only thing that can help is writing it more than once.

  • What is the difference in the context of your comment? The likelihood of the risk, and nothing else. So what is the exact magic amount of risk that makes one thing durable and another not, and who made you the arbiter of this?

    > No it does not. The drive can still fail after you write() and nobody will care how often you called fsync(). The only thing that can help is writing it more than once.

    It does to anybody who actually understands these definitions. It is durable according to the design (i.e., the uncorrectable bit error rate, UBER) of your system. That's what it means; that's what it has always meant. If you really don't understand that, I have no idea what to say.

    > The only thing that can help is writing it more than once.

    This just shows a fundamental misunderstanding. You achieve a desired uncorrected error rate by looking at the risks and designing parts, redundancy, and error correction appropriately. One drive/system might be more reliable than two less reliable ones combined, so "writing it more than once" is not only not the only thing that can help; it doesn't even necessarily achieve the required durability.
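
    To make that concrete with made-up numbers, assuming independent failures over the data's retention window:

        P(loss | two mirrored drives, each p = 10^-2)  =  p^2  =  10^-4
        P(loss | one higher-grade drive,    p = 10^-5)         =  10^-5

    The single better drive beats the mirror of two cheap ones by a factor of ten, and neither figure came from counting copies.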

    • > What is the difference in the context of your comment? The likelihood of the risk, and nothing else. So what is the exact magic amount of risk that makes one thing durable and another not, and who made you the arbiter of this?

      What's the difference between the sun exploding and a single machine failing?

      I have no idea how to answer that. Maybe it's because many people have seen a single machine fail, but nobody has seen the sun explode? I guess I've never had a need to give it more thought than that.

      > It does to anybody who actually understands these definitions. It is durable according to the design (i.e., the uncorrectable bit error rate, UBER) of your system.

      You are wrong about that: nobody cares whether something is "designed to be durable according to the definition in the design". That's just more weasel words. They care about what the risks are, how you actually protect against them, and what that costs. That's it.

    • There is a point at which a redundant array of inexpensive and unreliable replicas is more durable than a single drive. Even N in-memory databases spread across the world are more durable than a single one with fsync.

      Unfortunately, few databases besides maybe blockchains have been engineered with that in mind.
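
      A sketch of the idea: replica_send/replica_ack below are hypothetical stand-ins for whatever RPC layer is in use, stubbed out so the example compiles.

          #include <stdbool.h>
          #include <stddef.h>

          /* Stubs for illustration; a real system fans out in parallel,
             with timeouts, retries, and re-replication after a loss. */
          static bool replica_send(int r, const void *buf, size_t len)
          { (void)r; (void)buf; (void)len; return true; }
          static bool replica_ack(int r) { (void)r; return true; }

          /* Acknowledge a write only once a majority of replicas hold it:
             the data then survives any minority of simultaneous failures,
             even though no single node ever called fsync(). */
          static bool quorum_write(int n, const void *buf, size_t len)
          {
              int acks = 0;
              for (int i = 0; i < n; i++)
                  if (replica_send(i, buf, len) && replica_ack(i))
                      acks++;
              return acks >= n / 2 + 1;
          }

          int main(void)
          {
              const char msg[] = "commit record";
              return quorum_write(5, msg, sizeof msg) ? 0 : 1;
          }

      The stubs hide all the hard parts, but the durability argument only needs the majority count.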

  • Say you have mirrored devices. Or RAID-5, whatever. Say the devices don't lie about flushing caches. And you fsync(), and then power fails, and on the way back up you find data loss or, worse, data corruption. The devices didn't fail. The OS did.

    One need not even assume no device failure, since that's the point of RAID: to make up for some not-insignificant device failure rate. We need only assume that not too many devices fail at the same time. A pretty reasonable assumption. One relied upon all over the world, across many data centers.

  • This is not about hardware failure but about OS crashes and bugs, which are much more frequent.

    • If the OS has bugs that will make it crash, what makes you think those bugs aren’t going to affect fsync()?