Comment by rockwotj
5 hours ago
The claim of sub-millisecond writes with data in S3 is false and impossible. If you look at the benchmark, the fsync is not timed, so this is just the latency of either the network or in-kernel file operations, depending on the mount settings.
I hate it when databases celebrate their performance without synchronous flushing. You should be clear about the data loss window (which should be zero for committed transactions by default!) and the flushing interval to persistent storage.
I'm okay if you batch writes, I'm okay if you offer a low-latency mode with less durability, but by being unclear about this it just feels like a scam.
Yeah, in this case the footnote to the write latency specifically says “at rest in S3”, which is what caused me to go look at the source. To be clear, I have no problem with ZeroFS only flushing on fsync.
I am very excited for object-storage-first systems like this to leverage low-latency zonal storage for write-ahead logs, keeping the disaggregated storage while greatly reducing write latency. That ends up being more expensive, but is likely a good tradeoff in lots of cases I have seen.
ZeroFS aims to be a POSIX filesystem, so the semantics here are the standard ones (ext4 and xfs behave the same): write() is buffered (that's the batching) and "committed" maps to fsync(), which returns only once data is durable.
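To make the distinction concrete, here is a minimal sketch (not the ZeroFS benchmark) that times write() and fsync() separately on whatever filesystem holds the target path. The function name `timed_write_and_fsync` and the file name `probe.bin` are made up for illustration; the only calls used are Python's standard `os.open`, `os.write`, `os.fsync` and `time.perf_counter`.

```python
import os
import tempfile
import time


def timed_write_and_fsync(path, payload):
    """Return (write_seconds, fsync_seconds) for one buffered write plus one fsync.

    Hypothetical helper for illustration; not part of ZeroFS or its benchmark.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        t0 = time.perf_counter()
        written = 0
        # os.write may write fewer bytes than requested, so loop until done.
        while written < len(payload):
            written += os.write(fd, payload[written:])
        t1 = time.perf_counter()
        # fsync asks the OS to push the file's buffered data to stable storage.
        os.fsync(fd)
        t2 = time.perf_counter()
    finally:
        os.close(fd)
    return t1 - t0, t2 - t1


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.bin")
        write_s, fsync_s = timed_write_and_fsync(target, b"x" * 4096)
        print(f"write: {write_s * 1e6:.1f} us, fsync: {fsync_s * 1e6:.1f} us")
```

A benchmark that reports only the first number is measuring the buffered write, not the time until the data is durable. Point the path at the mount you care about to see how the two numbers differ there.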
Nothing wrong with that, but you should remove the “at rest in S3” footnote from the write latency on the front page of the website, because that is not what is measured.