Comment by aaronbwebber
24 days ago
It's not just better performance on latency benchmarks; it likely improves throughput as well, because the writes will be batched together.
Many applications do not require true durability and it is likely that many applications benefit from lazy fsync. Whether it should be the default is a lot more questionable though.
It’s like using a non-cryptographically secure RNG: if you don’t know enough to look for the fsync-off flag yourself, it’s unlikely you know enough to evaluate the impact of durability on your application.
> if you don’t know enough to look for the fsync-off flag yourself,
Yeah, it should use safe defaults.
Then you can always go read the corners of the docs for the "go faster" mode.
Just like Postgres's infamous "non-durable settings" page... https://www.postgresql.org/docs/18/non-durability.html
You can batch writes while at the same time not acknowledging them to clients until they are flushed; it just takes more bookkeeping.
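To make the "more bookkeeping" concrete, here is a rough Python sketch of the idea (often called group commit): writers queue a payload and get back an event, and a flusher writes the whole queue, calls fsync once, and only then sets the events. The class name `GroupCommitLog`, its methods, and the single-flusher assumption are all made up for illustration; this is not how any particular database implements it.

```python
import os
import threading


class GroupCommitLog:
    """Hypothetical append-only log: one fsync covers every write queued before it."""

    def __init__(self, path):
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._lock = threading.Lock()
        self._pending = []      # (payload, event) pairs not yet durable
        self.fsync_calls = 0    # counter kept only to show the batching effect

    def submit(self, payload: bytes) -> threading.Event:
        """Queue a write. The returned event is set only after the data is fsynced."""
        done = threading.Event()
        with self._lock:
            self._pending.append((payload, done))
        return done

    def flush(self) -> int:
        """Write and fsync everything queued so far, then acknowledge those writers.

        Assumes a single flusher thread (an assumption of this sketch).
        """
        with self._lock:
            batch, self._pending = self._pending, []
        if not batch:
            return 0
        view = memoryview(b"".join(payload for payload, _ in batch))
        while len(view) > 0:            # os.write may write fewer bytes than asked
            written = os.write(self._fd, view)
            view = view[written:]
        os.fsync(self._fd)              # one fsync for the whole batch
        self.fsync_calls += 1
        for _, done in batch:           # ack only after the data is on disk
            done.set()
        return len(batch)

    def close(self):
        os.close(self._fd)
```

A client would call `submit(...)` and then `wait()` on the returned event before reporting success; how often `flush()` runs (every few milliseconds, or whenever the previous fsync finishes) sets the latency versus batch-size trade-off.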
I also think fsync before acking writes is a better default. That aside, if you were to choose async for batching writes, their default value surprises me. 2 minutes seems like an eternity. Would you not get very good batching for throughput even at something like 2 seconds too? Still not safe, but safer.
For transactional durability, the writes will definitely be batched ("group commit"), because otherwise throughput would collapse.
> Many applications do not require true durability
Pretty much no application requires true durability.
Maybe what's confusing here is "true durability", but most people want to know that, when data is committed, they can reason about the durability of that data using something like a basic MTBF formula. That is, your durability is "X computers of Y total have to fail at the same time, at which point N data loss occurs". They expect that as the number Y goes up, X goes up too.
When your system doesn't do things like fsync, you can't do that at all. X is 1. That is not what people expect.
Most people probably don't require X == Y, but they may have requirements that X > 1.
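As a back-of-the-envelope version of that reasoning, the sketch below computes the chance that at least X of Y machines fail in the same window. It assumes failures are independent with the same per-machine probability, which real clusters often violate (shared power, shared racks); the function name and the 1% figure are invented for illustration.

```python
from math import comb


def p_at_least_x_fail(y_total: int, x_needed: int, p_fail: float) -> float:
    """P(at least x_needed of y_total machines fail in one window).

    Assumes independent failures with identical probability p_fail
    (a simplifying assumption, not a claim about real hardware).
    """
    return sum(
        comb(y_total, k) * p_fail**k * (1 - p_fail) ** (y_total - k)
        for k in range(x_needed, y_total + 1)
    )


# Hypothetical 3-machine cluster, 1% failure chance per machine per window.
print(p_at_least_x_fail(3, 1, 0.01))  # X = 1: any single failure loses data, about 0.0297
print(p_at_least_x_fail(3, 2, 0.01))  # X = 2: two must fail together, about 0.000298
```

Under these assumptions, going from X = 1 to X = 2 cuts the loss probability by roughly a factor of 100, which is the gap people are implicitly counting on when they add replicas.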
For the vast majority of applications a rare event of data loss is no big deal and even expected.