Comment by polyrand

3 years ago

As others have pointed out, that benchmark is from 2017. However, the "SQLite: Past, Present, and Future" paper[0] has an updated version of this benchmark (see section 4.3 Blob manipulation), and also compares it with DuckDB.

Edit:

Another thing that is sometimes forgotten when comparing SQLite to the filesystem is that files are hard[1]. It's not only about performance, but also about all the guarantees that you get "for free", and all the complexity you can remove from your codebase if you need ACID interactions with your files.

[0]: https://www.vldb.org/pvldb/vol15/p3535-gaffney.pdf

[1]: https://danluu.com/file-consistency/

4 comments

polyrand

lanstin 3 years ago

Rename is totally atomic in a way that is simple to reason about. Ten processes try a rename to a per process name, and just one wins, even on an NFS.That one is free to do whatever and then rename back.

polyrand 3 years ago
Yes, rename is atomic, but that doesn't get you very far. For example, if you want to do a partial update to a file, you need to copy the file (you already need more storage), edit the file and then rename it. But then you can't do this between multiple processes because maybe you have copied the file while another process was already changing a different copy (but not yet renamed). And that's not even considering fsync, or supporting different Operating Systems.
- the8472 3 years ago
  
  Basic atomic filesystem operations support consensus numbers[0] 1 or 2 (via rename and RENAME_EXCHANGE, respectively), but for some concurrent algorithms one needs higher ones. File locks are the only way to get those (if we ignore IPC), and those APIs are a pain.
  [0] https://en.wikipedia.org/wiki/Read%E2%80%93modify%E2%80%93wr...

the8472 3 years ago

> It's not only about performance, but also about all the guarantees that you get "for free"

Note that some filesystems have mount options that let you remove some of those guarantees.