Many Small Queries Are Efficient in SQLite

5 hours ago (sqlite.org)

I am using SQLite on paperless-ngx (an app to manage pdf [4]). It is quite difficult to beat SQLite if you do not have a very huge parallelism factor in writes.

SQLite is an embedded database: no socket to open, you directly access to it via file system.

If you do not plan to use BigData with high number of writers, you will have an hard time beating SQLite on modern hardware, on average use cases.

I have written a super simple search engine [1] using python asyncio and SQLite is not the bottleneck so far.

If you are hitting the SQLite limit, I have an happy news: PostgreSQL upgrade will be enough for a lot of use cases [2]: you can use it to play with a schemaless mongo-like database, a simple queue system [3] or a search engine with stemming. After a while you can decide if you need a specialized component (i.e. Kafka, Elastic Search, etc) for one of your services.

[1]: https://github.com/daitangio/find

[2]: https://gioorgi.com/2025/postgres-all/

[3]: https://github.com/daitangio/pque

[4]: https://docs.paperless-ngx.com

There is some risk that, if you design your website to use a local database (sqlite, or a traditional database over a unix socket on the same machine), then switching later to a networked database is harder. In other words, once you design a system to do 200 queries per page, you’d essentially have to redesign the whole thing to switch later.

It seems like it mostly comes down to how likely it is that the site will grow large enough to need a networked database. And people probably wildly overestimate this. HackerNews, for example, runs on a single computer.

  • I don't see how anyone would design a system that executes 200 queries per page. I understand having a system that is ín use for many many years and accumulates a lot of legacy code eventually ends up there, but designing? Never. That's not design, that's doing a bad job at design.

    • > I don't see how anyone would design a system that executes 200 queries per page.

      They call it the n+1 problem. 200 queries is the theoretically correct approach, but due to high network latency of networked DMBSes you have to hack around it. But if the overhead is low, like when using SQLite, then you would not introduce hacks in the first place.

      The parent is saying that if you correctly design your application, but then move to system that requires hacks to deal with its real-world shortcomings, that you won't be prepared. Although I think that's a major overstatement. If you have correctly designed the rest of your application too, introducing the necessary hacks into a couple of isolated places is really not a big deal at all.

  • The same is true for regular databases though, isn't it?

    Network adds latency and while it might be fine to run 500 queries with the database being on the same machine, adding 1-5ms per query makes it feel not okay.

    • Yes, that is why I said “local database (sqlite, or a traditional database over a unix socket on the same machine).”

      This isn’t an sqlite-specific point, although sqlite often runs faster on a single machine because local sockets have some overhead.

A lot of skepticism in comments. Let me remind them doing N loops over local disk with in memory cached pages is absolutely different compared to doing RT over typical VPS network. Having said that there is no silver bullet for dumb code! So let's not conflate the argument the author is trying to make.

>For a 50-entry timeline, the latency is usually less than 25 milliseconds. Profiling shows that few of those milliseconds were spent inside the database engine.

And instead were spent blocking on the disk for all of the extra queries that were made? Or is it trying to say that the concatenation a handful of strings takes 22 ms. Considering how much games can render with a 16 ms budget I don't see where that time is going rendering html.

This feels like a very elaborate way of saying that doing O(N) work is not a problem, but doing O(N) network calls is.

  • Rather I think their point is that since O(N) is really X * N, it's not the N that gets you, it's the X.

    • Right — the network database is also doing O(N) work to return O(N) results from one query but the multiplier is much lower because it doesn't include a network RTT.

    • ...and the difference between "a fancy hash table" (in-process SQLite) and doing a network roundtrip is a few orders of magnitude.

  • It being so obvious, why is sqlite not the de facto standard?

    • Isn't SQLite a de facto standard? Seems like it to me. If I want an embedded SQL engine, it is the "nobody got fired for selecting" choice. A competitor needs to offer something very compelling to unseat it.

      1 reply →

    • Partly for the same reason it’s fast for small sites. In their words: “SQLite is not client/server”

    • I haven't investigated this so I might be behind the times, but last I checked remotely managing an SQLite database, or having some sort of dashboarding tool run management reporting queries and the likes, or make a Retool app for it, was very messy. The benefit of not being networked becomes a downside.

      Maybe this has been solved though? Anybody here running a serious backend-heavy app with SQLite in production and can share? How do you remotely edit data, do analytics queries etc on production data?

    • It is for use cases like local application storage, but it doesn't do well in (or isn't designed for) concurrent use cases like any networked services. SQLite is not like the other databases.

  • IMO the page is concise and well written. I wouldn’t call it very elaborate.

    Maybe the page could have been shorter, but not my much.

    • It's inline with what I perceive as the more informal tone of the sqlite documentation in general. It's slightly wordier but fun to read, and feels like the people who wrote it had a good time doing so.

Well, it depends. I vividly remember removing 200 small SQLite queries from a routing algorithm in a mobile app (by moving the metadata into a small in-memory data store instead) and roughly doubling its speed. :-) It was a pretty easy call after seeing sqlite3_step being the top CPU user by a large margin.

The article doesnt make it at all clear what it is comparing to - mysql running remotely or on the same server? I'm sure sqlite still has less "latency" than mysql on localhost or unix socket, but surely not meaningfully so. So, is SQLite really just that much faster at any SELECT query, or are they just comparing apples and oranges?

Or am i mistaken in thinking that communicating to mysql on localhost is comparable latency to sqlite?

  • Even if you're on the same local server, you're still going over a socket to a different service, whereas with sqlite you remain in the same application / address space / insert words I don't fully understand here. So while client/server SQL servers are faster locally than on a remote server, they can (theoretically) never be as fast as SQLite in the same process.

    Of course, SQLite and client/server database servers have different use cases, so it is kind of an apples and oranges comparison.

  • I think they're trying to not shame other services, but yes the comparison is vs networked whether that's local on loopback or not. For a small query, which is what they're talking about, it's not inconceivable that formatting into a network packet, passing through the userspace networking functions, into and through kernel, all back out the other side, then again for the response, is indeed meaningfully slower than a simple function call within the program.

  • Connecting to localhost still involves the network stack and a fair bit of overhead.

    SQLite is embedded in your program's address space. You call its functions directly like any other function. Depending on your language, there is probably some FFI overhead but it's a lot less than than an external localhost connection

  • I think the most common set up is to have your application server and DB on different hosts. That way you can scale each independently.

I do t have time to test myself now, but it would be interesting to see a proper benchmark. We all know it's not suitable for high write concurrency, but SQLite should be a very good amount faster for reads because of the lack of overhead. But how much faster is it really?

  • as an in memory database, I got around 40,000,000 reads per second. Using WAL and a file rather than in memory, around 900,000 reads per second. This is single threaded, for a simple integer ID to string Value query, and a few years old at this point, and only minor config optimizations eg not even using memory mapped io and a ~3gb database with a million or so rows on a Windows machine. The performance really is amazing.

I’ve been experimenting with LiveStoreJS which uses a custom SQLite WASM binary for event sync, so for simplicity I’ve also used it for regular application data in browser and found no issues (yet). It surprised me that using a full database engine in memory could perform well vs native JS objects at scale but perhaps at scale is when it starts to shine. Just be wary of size limits beyond 16-20mb.

Has anyone tried using distributed versions of sqlite, such as rqlite? How reliable is it?

Definitely was something surprising that I discovered when building with Sqlite recently. We're tought to avoid N+1 queries at almost any cost in RDBMs but in Sqlite, the N+1 can actually be the best option in most cases.

I had to build some back-office tools and used Ruby on Rails with SQLITE and didn't bother with doing "efficient" joins or anything. Just index the foreign keys, do N+1s everywhere - you'll be fine. The app is incredibly easy to maintain and add features because of this and the db is super easy to backup - literally just scp the sqlite db file somewhere else. Couldn't be happier with this setup.

  • scp works as long as the app is not making changes at the same time.

    If there's a chance someone is writing to the database during the copy, you should "sqlite3 database.sqlite .backup" (or ".dump") first; Or, alternatively, on a new enough sqlite3, you have a builtin sqlite3_rsync that is like rsync except it interacts with the sqlite3 updates to guarantee a good copy at the other end.

One index scan beats 200 index lookups though surely?

I.e. sometimes one query is cheaper. It is not network anymore.

Also you can run your "big" DB like postgres on the same machine too. No law against that.

  • For analytic queries, yes, a single SQL query often beats many small ones. The query optimizer is allowed to see more opportunities to optimize and avoid unnecessary work.

    Most SQLite queries however, are not analytic queries. They're more like record retrievals.

    So hitting a SQLite table with 200 "queries" is similar hitting a webserver with 200 "GET" commands.

    In terms of ergonomics, SQLite feels more like a application file-format with a SQL interface. (though it is an embedded relational database)

    https://www.sqlite.org/appfileformat.html

  • Depends. Throughput is probably higher, but the latency of a big scan might be larger than a small one, so many small lookups might feel more responsive if they’re each rendered independently. The example on the page doesn’t look like it can be merged into a single scan. I’m not a SQL expert but at a glance it does look like it could maybe be compressed into one or two dozen larger lookups.

  • One query isn't cheaper than two queries that do the same amount of IO and processing and operate in the same memory space

    • Yes, (index) scans are rarely faster typical web apps.

      Unless you have toy amounts data... or doing batch operations which is not typical (and can be problematic for other transactions due to locking, etc...)