Comment by formerly_proven

5 years ago

This particular demo page actually makes queries against a nearly 700 MB SQLite database (it would fill a whole CD!). Because the amount of data actually read is almost negligible (a few hundred kB), performance is limited by latency, as you say. However, high-latency links also tend to have low bandwidth, so downloading the entire database up front would almost always be much slower.

For example, on a 1 Mbit/s link with 300 ms RTT, one of the example queries would spend about 2 seconds on the data transfer itself and another 3 seconds or so waiting on round trips. Downloading the entire file would take around an hour and a half.

For your 10 MB database, transferring it as a whole would take 80 seconds. Assuming this solution instead needs to read, say, 250 kB (2 seconds of transfer), it could still make around 250 round trips to the database before those 10 MB were fully downloaded. (That would be a really odd query, since it would read only about two pages per request on average.)
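
To make that back-of-envelope math concrete, here is a quick sketch of the arithmetic (the 1 Mbit/s and 300 ms figures are just the assumptions above, not measurements):

    // Rough bandwidth/latency arithmetic for the numbers quoted above.
    // Assumes a 1 Mbit/s link with 300 ms RTT; purely illustrative.
    const LINK_MBIT_PER_S = 1;
    const RTT_S = 0.3;

    // Seconds needed to move `megabytes` over the link, ignoring protocol overhead.
    const transferSeconds = (megabytes: number): number =>
      (megabytes * 8) / LINK_MBIT_PER_S;

    console.log(transferSeconds(700));  // ~5600 s, i.e. about an hour and a half
    console.log(transferSeconds(10));   // 80 s for the 10 MB database
    console.log(transferSeconds(0.25)); // 2 s for a query that reads ~250 kB

    // Round trips a latency-bound query could make in the time it takes to
    // download the whole 10 MB file: (80 - 2) / 0.3, roughly 260 -- the
    // "around 250" figure above.
    console.log((transferSeconds(10) - transferSeconds(0.25)) / RTT_S);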

Right, but that is an artificially created demo by the author to justify the solution being presented (no offense). The question is how common ~GB-sized SQLite databases are in the real world relative to ones that are ~MB-sized.

In my experience, SQLite databases with millions of rows of raw tabular data tend to compress very well, down to dozens of megabytes. Indeed, SQLite is often touted as a file format for applications.

  • I'm running magnetico (https://github.com/boramalper/magnetico) on my VPS. I currently have an index of 1.6M magnet links stored in a 5GB database.

    SQLite is most interesting not when the database is small, but when there are very few writes and all you do is read. You can also look at https://datasette.io/ to see how well SQLite works for representing and querying a lot of datasets.

  • Even if you compress it, you need it in client memory, which can also be a dealbreaker. Some people might need it for some projects; I know I did.

    • > you need it in client memory, which can also be a dealbreaker

      Most workstations have gigabytes of available memory. If not, you can dump it into IndexedDB as a raw data store (see the sketch below).

      I never disputed that it would be useful for some use cases. I only said it would be unusable over high-latency links. If you have a low-latency link and aren’t running very complex queries with lots of random seeks, then this should work fine for you.
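
      A minimal sketch of that idea, caching the downloaded database file in IndexedDB so it only has to be fetched once (the database name, store name, and URL key here are made up for illustration):

          // Hypothetical sketch: fetch the SQLite file once and keep the raw
          // bytes in IndexedDB so later visits skip the download.
          function openCache(): Promise<IDBDatabase> {
            return new Promise((resolve, reject) => {
              const req = indexedDB.open('db-cache', 1);
              req.onupgradeneeded = () => req.result.createObjectStore('files');
              req.onsuccess = () => resolve(req.result);
              req.onerror = () => reject(req.error);
            });
          }

          async function getOrFetchDatabase(url: string): Promise<ArrayBuffer> {
            const cache = await openCache();
            // Look for a previously stored copy, keyed by URL.
            const cached = await new Promise<ArrayBuffer | undefined>((resolve, reject) => {
              const req = cache.transaction('files').objectStore('files').get(url);
              req.onsuccess = () => resolve(req.result as ArrayBuffer | undefined);
              req.onerror = () => reject(req.error);
            });
            if (cached) return cached;

            // First visit: download the whole file, then store the raw bytes.
            const bytes = await (await fetch(url)).arrayBuffer();
            await new Promise<void>((resolve, reject) => {
              const tx = cache.transaction('files', 'readwrite');
              tx.objectStore('files').put(bytes, url);
              tx.oncomplete = () => resolve();
              tx.onerror = () => reject(tx.error);
            });
            return bytes;
          }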

  • My team has a few TB of data in SQLite files that are themselves dozens of GB each.

    We're using them as a replacement for LevelDB's SSTables, but with the structure of full SQL. It is highly effective.
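
    For readers unfamiliar with the pattern, here is a rough sketch of what "SSTable-like, but with full SQL" could look like; the schema and the better-sqlite3 calls are illustrative assumptions, not the poster's actual setup:

        // Illustrative only: a sorted key/value table in SQLite, standing in
        // for an immutable SSTable, while still allowing arbitrary SQL.
        import Database from 'better-sqlite3';

        const db = new Database('segment-0001.sqlite3'); // invented file name

        db.exec(`
          CREATE TABLE IF NOT EXISTS kv (
            key   BLOB PRIMARY KEY,  -- b-tree keeps keys sorted, like an SSTable
            value BLOB NOT NULL
          ) WITHOUT ROWID;
        `);

        const put = db.prepare('INSERT OR REPLACE INTO kv (key, value) VALUES (?, ?)');
        put.run(Buffer.from('user:42'), Buffer.from('{"name":"example"}'));

        // Point lookup, as with an SSTable...
        const row = db.prepare('SELECT value FROM kv WHERE key = ?')
                      .get(Buffer.from('user:42'));

        // ...but range scans and arbitrary SQL come for free.
        const range = db.prepare(
          'SELECT key, value FROM kv WHERE key BETWEEN ? AND ? ORDER BY key'
        ).all(Buffer.from('user:'), Buffer.from('user:~'));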