
Comment by endisneigh

3 years ago

Somewhat related - I’m very very curious to hear a detailed account of someone who uses SQLite for a production app with high traffic.

For the embedded use case I think it’s a slam dunk, but there are many interesting use cases for server side but they all seem to be toyish.

The locking behavior of SQLite is somewhat problematic unless you use WAL, and even then it's not perfect.

> there are many interesting use cases for server side but they all seem to be toyish.

> The locking behavior of SQLite is somewhat problematic unless you use WAL and even then not perfect

SQLite with WAL and synchronous configured appropriately will insert a row in ~50 µs on NVMe hardware. This is fully serialized throughput (i.e., the serializable isolation level, the highest available in SQL Server et al.). At scale, this can be reasonably described as "inserting a billion rows per day".
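A minimal sketch of the configuration the commenter describes, using Python's stdlib sqlite3; the table name and the timing loop are illustrative, not from the comment, and the actual per-insert latency depends on your storage:

```python
import os
import sqlite3
import tempfile
import time

path = os.path.join(tempfile.mkdtemp(), "bench.db")
conn = sqlite3.connect(path, isolation_level=None)  # true autocommit
conn.execute("PRAGMA journal_mode=WAL")        # writers don't block readers
conn.execute("PRAGMA synchronous=NORMAL")      # fsync at checkpoints, not every commit
conn.execute("CREATE TABLE events (id INTEGER PRIMARY KEY, payload TEXT)")

n = 10_000
start = time.perf_counter()
for i in range(n):
    # each insert is its own serialized transaction
    conn.execute("INSERT INTO events (payload) VALUES (?)", (f"row {i}",))
elapsed = time.perf_counter() - start
per_insert_us = elapsed / n * 1e6  # the comment above reports ~50 µs on NVMe
rows = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
conn.close()
```

With `synchronous=NORMAL` in WAL mode, durability is deferred to checkpoints, which is what buys most of the latency win.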

I have yet to witness a database engine with latency that can touch SQLite. For some applications, this is a go/no-go difference. Why eschew the majesty of SQL because you can't wait for a network hop? Bring that engine in process to help solve your tricky data problems. I'd much rather write SQL than LINQ if I have to join more than 2 tables.

We've been exclusively using SQLite in production for a long time. Big, multi-user databases. Looking at migrating to SQL Server Hyperscale, but not because SQLite is slow or causing any technical troubles. We want to consolidate our installs into one physical place so we can keep a better eye on them as we grow.

Fun fact: SQL Server "Hyperscale" is capped at 100MB/s on its transaction log. I have written to SQLite databases at rates far exceeding this.

A super smart guy on my team at a previous job replaced around $100,000 of server and SAN hardware with a three-node Flask app running on 2 cores and 4 GB of RAM each. The hardware backed an (admittedly absurdly designed, well before our time) analytics system built from MySQL filtered replication triggering stored procedures in a magical Rube Goldberg-ian dystopia. The Flask app performed the analytics work JIT, using in-memory SQLite3 tables to do the same work as the stored procedure operations, at a cost of about 50 ms of extra time per request for a feature of the app that was rarely used.

Admittedly, not high traffic like you asked, but one of my favorite uses of SQLite hands down.
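The in-memory approach in that anecdote can be sketched in a few lines; the table, columns, and `analyze` function here are invented for illustration:

```python
import sqlite3

def analyze(requests):
    """requests: iterable of (user, ms) tuples; returns per-user stats."""
    db = sqlite3.connect(":memory:")  # lives only for the duration of this call
    db.execute("CREATE TABLE hits (user TEXT, ms REAL)")
    db.executemany("INSERT INTO hits VALUES (?, ?)", requests)
    # do the join/aggregate work the stored procedures used to do
    out = db.execute(
        "SELECT user, COUNT(*), AVG(ms) FROM hits GROUP BY user ORDER BY user"
    ).fetchall()
    db.close()
    return out

stats = analyze([("alice", 12.0), ("bob", 30.0), ("alice", 18.0)])
```

Because the database is per-request and in memory, there is no locking contention at all: each request gets its own private engine.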

Apps aren’t divided into “high-traffic” and “toys.” There are plenty of use cases where you have a low-write server in a production environment, and SQLite would work fine there.

If you need high write volume, then yes, the locking behavior means SQLite is not a good fit.

  • SQLite can easily hit 15k INSERTs per minute or more (pinning the process to a single core helps drive the max rate up). However, if a process begins a transaction and then stalls, it halts all DML.

    I think performance can be good, as long as a competent schema design is in place. Allowing ad-hoc queries from less trusted users will surely tank performance.
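The stall described above is easy to reproduce; the connection names and table are invented for illustration (Python's stdlib sqlite3):

```python
import os
import sqlite3
import tempfile

path = os.path.join(tempfile.mkdtemp(), "lock.db")
a = sqlite3.connect(path, isolation_level=None)
a.execute("PRAGMA journal_mode=WAL")
a.execute("CREATE TABLE t (x INTEGER)")

a.execute("BEGIN IMMEDIATE")              # take the write lock... and stall here
a.execute("INSERT INTO t VALUES (1)")

b = sqlite3.connect(path, timeout=0.2)    # give up waiting after 200 ms
blocked = False
try:
    b.execute("INSERT INTO t VALUES (2)") # all other DML waits on connection a
except sqlite3.OperationalError:          # "database is locked"
    blocked = True
    b.rollback()                          # clear the stalled transaction

a.execute("COMMIT")                       # writer releases the lock
b.execute("INSERT INTO t VALUES (2)")     # now it goes through
b.commit()
rows = b.execute("SELECT COUNT(*) FROM t").fetchone()[0]
```

WAL lets readers proceed while `a` holds the write lock, but there is still only ever one writer at a time.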

There are plenty of use cases of SQLite being used in production servers, usually for cache locality of small datasets (think more complex read-focused Redis use cases) or for direct data operations (transforms, for instance).

That being said, even if it weren't usable in the web service space, does that make it any less reasonable a database? That whole mentality sounds like a web-developer-centric one. Berkeley DB was used for decades for application and system databases, a field in which SQLite largely replaced it, and one that MySQL, Postgres, Oracle, etc. are generally completely unsuited for. It's the same reason Microsoft offered MS Access for so long alongside MSSQL, until MSSQL had its own decent embeddable option to deprecate it.

I've been pushing to try SQLite as a "sacrificial memoization." Basically we have two tasks separated by 5-10 days. When we do the first task, we calculate a bunch of information as a "side effect". At the second task, we don't have that information, and trying to reconstruct it without the original context is very slow, because a lot of it depends on temporal state: what was happening 5-10 days ago.

The other use case I'm eager to explore is persisting status data in a long-running process. Occasionally the task blows up halfway, and although we can recover the functional changes, we lose the reporting data for the original run. If we save it in a SQLite database instead of just in-script data structures, we don't have to try to reverse engineer it anymore.

In both cases, I like the idea of "everything's one file and we nuke it when we're done" rather than "deal with the hostile operations team to spin up MySQL infrastructure."
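The "one file, nuke it when we're done" lifecycle can be sketched as follows; the `memo` table and its keys are hypothetical stand-ins for the side-effect data described above:

```python
import os
import sqlite3
import tempfile

scratch = os.path.join(tempfile.mkdtemp(), "run_cache.db")

# Task 1: record the expensive side-effect data as it is computed.
db = sqlite3.connect(scratch)
db.execute("CREATE TABLE memo (key TEXT PRIMARY KEY, value TEXT)")
db.execute("INSERT INTO memo VALUES ('state@day0', 'expensive-to-recompute')")
db.commit()
db.close()

# Task 2, days later: read it back instead of reconstructing it.
db = sqlite3.connect(scratch)
value = db.execute("SELECT value FROM memo WHERE key = 'state@day0'").fetchone()[0]
db.close()

os.remove(scratch)  # done: nuke the file, nothing to decommission
gone = not os.path.exists(scratch)
```

The entire "infrastructure" is one file on disk, which is exactly what makes the cleanup story trivial compared to a provisioned database server.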

Yes, the SQLite defaults are quite terrible out of the box. I'm not sure why they never changed them; it will start choking at 5k inserts where other DBs can do 100x that (and so will SQLite in WAL mode with a few other settings). Getting it to perform well in high-traffic scenarios would be a lot of effort. I struggle to get it to be even vaguely performant in embedded use cases and often roll my own poor man's version unless I really care about data integrity (which is rare).
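One common reason for the choke under default settings: in autocommit mode, every insert pays a full journal sync. Wrapping the same inserts in a single transaction amortizes that cost. A rough illustration (the function, table, and row count are invented; the exact ratio depends on hardware):

```python
import os
import sqlite3
import tempfile
import time

def insert_rows(batch: bool, n: int = 500) -> float:
    """Time n inserts, either one-commit-per-row or one commit total."""
    path = os.path.join(tempfile.mkdtemp(), "demo.db")
    conn = sqlite3.connect(path, isolation_level=None)  # autocommit
    conn.execute("CREATE TABLE t (x INTEGER)")
    start = time.perf_counter()
    if batch:
        conn.execute("BEGIN")
    for i in range(n):
        conn.execute("INSERT INTO t VALUES (?)", (i,))  # autocommit unless batched
    if batch:
        conn.execute("COMMIT")
    elapsed = time.perf_counter() - start
    conn.close()
    return elapsed

slow = insert_rows(batch=False)  # one implicit commit (and sync) per row
fast = insert_rows(batch=True)   # one commit for all rows
```

Batching plus `PRAGMA journal_mode=WAL` and `PRAGMA synchronous=NORMAL` is the usual recipe for closing that 100x gap.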

  • > I'm not sure why they never changed them

    Because SQLite runs in many different environments and scenarios, and what's terrible in one scenario is perfect for another. There are no defaults that will work for everyone. This also applies to MySQL, PostgreSQL, etc., but the range of scenarios for those is more limited (no embedded use, for example), so the defaults can be tuned a bit more toward what's typical.

  • Isn't it more rare to not care about data integrity? Being unsure what state your data is in at any point in time does not seem like a safe scenario.

High-volume single-writer + WAL mode is perfectly doable for a lot of applications, but you have to keep the concurrency model in mind. I think of it more like a really advanced data structure library than a "database" in that sense.

An old product I worked on (which is still around) used (still uses?) SQLite for storing filesystem blocks that came from the paired backup software. A single database contained only the blocks backed up from a single disk (keyed by its disk UUID). So this is a perfect scenario: there's only one writer at a time, because there's only one device and one backup on that device at a time. We could write blocks to SQLite fast enough to saturate the IOPS of the underlying device and filesystem containing the database. This was over 10 years ago. Very practical and durable, and way better than anything we could have done in-house.

Today, you can definitely do GB/s on NVMe-class hardware with the right pragma settings and the right workload. So for certain classes of multi-tenant solutions I think it's not so bad: you can naturally have a single writer per data store (or otherwise linearize writes), and the raw performance will be excellent and more than enough.
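The single-writer linearization mentioned above is often implemented as a dedicated writer thread consuming a queue, so the WAL write lock is never contended. A minimal sketch (the `log` table, `writer_loop` function, and message format are invented):

```python
import os
import queue
import sqlite3
import tempfile
import threading

def writer_loop(path: str, q: "queue.Queue") -> None:
    """The one and only writer: applies enqueued messages in order."""
    conn = sqlite3.connect(path)  # created and used only in this thread
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE IF NOT EXISTS log (msg TEXT)")
    while True:
        msg = q.get()
        if msg is None:  # sentinel: stop
            break
        conn.execute("INSERT INTO log VALUES (?)", (msg,))
        conn.commit()
    conn.close()

path = os.path.join(tempfile.mkdtemp(), "q.db")
q: "queue.Queue" = queue.Queue()
t = threading.Thread(target=writer_loop, args=(path, q))
t.start()
for i in range(5):
    q.put(f"event {i}")  # any thread may enqueue; only one thread writes
q.put(None)
t.join()

rows = sqlite3.connect(path).execute("SELECT COUNT(*) FROM log").fetchone()[0]
```

Readers can open the same file concurrently; in WAL mode they never block the writer thread, which is what makes this pattern behave like "a really advanced data structure library".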