Comment by daitangio

6 hours ago

I am using SQLite with paperless-ngx (an app to manage PDFs [4]). It is quite difficult to beat SQLite unless you have a very high degree of write parallelism.

SQLite is an embedded database: there is no socket to open; you access it directly via the file system.
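If you have never used it, the whole thing in Python looks like this (`app.db` is a made-up path; the file is created on first connect):

    import sqlite3

    # No server and no socket: the connection is a handle to a local file.
    con = sqlite3.connect("app.db")  # illustrative path; created if missing
    con.execute("CREATE TABLE IF NOT EXISTS docs (id INTEGER PRIMARY KEY, title TEXT)")
    con.execute("INSERT INTO docs (title) VALUES (?)", ("hello",))
    con.commit()
    print(con.execute("SELECT id, title FROM docs").fetchall())
    con.close()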

If you are not planning big-data workloads with a high number of writers, you will have a hard time beating SQLite on modern hardware for average use cases.

I have written a super simple search engine [1] using Python asyncio, and SQLite is not the bottleneck so far.

If you are hitting SQLite's limits, I have happy news: upgrading to PostgreSQL will be enough for a lot of use cases [2]. You can use it as a schemaless, Mongo-like database, a simple queue system [3], or a search engine with stemming. After a while you can decide whether you need a specialized component (e.g. Kafka, Elasticsearch) for one of your services.
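To make [2] concrete, here is a hedged sketch of all three uses through psycopg2 (the DSN, tables, and columns are all invented for the example):

    import psycopg2

    con = psycopg2.connect("dbname=app")  # assumed DSN
    cur = con.cursor()

    # 1. Schemaless, Mongo-like documents: a JSONB column plus containment queries.
    cur.execute("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body jsonb)")
    cur.execute("INSERT INTO docs (body) VALUES (%s::jsonb)", ['{"kind": "note", "tags": ["a"]}'])
    cur.execute("SELECT body FROM docs WHERE body @> %s::jsonb", ['{"kind": "note"}'])

    # 2. A simple queue: SKIP LOCKED lets competing workers claim jobs without blocking.
    cur.execute("CREATE TABLE IF NOT EXISTS jobs (id serial PRIMARY KEY, payload text, done boolean DEFAULT false)")
    cur.execute("SELECT id, payload FROM jobs WHERE NOT done ORDER BY id FOR UPDATE SKIP LOCKED LIMIT 1")

    # 3. Search with stemming: 'managing' matches the query 'manage'.
    cur.execute("SELECT to_tsvector('english', 'managing documents') @@ to_tsquery('english', 'manage')")
    print(cur.fetchone())  # (True,)

    con.commit()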

[1]: https://github.com/daitangio/find

[2]: https://gioorgi.com/2025/postgres-all/

[3]: https://github.com/daitangio/pque

[4]: https://docs.paperless-ngx.com

The pattern I like to advocate for now is per-customer sharding with SQLite. Cloudflare makes this easy with D1; you can tie Durable Objects to a user as an afterthought.
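Outside Cloudflare you can sketch the same idea by hand: one SQLite file per customer plus a tiny router (everything below is illustrative; D1 and Durable Objects essentially give you this per-tenant isolation as a managed service):

    import sqlite3
    from pathlib import Path

    DATA_DIR = Path("tenants")  # made-up layout: one database file per customer

    def tenant_db(customer_id: str) -> sqlite3.Connection:
        """Open (or create) the SQLite database belonging to one customer."""
        DATA_DIR.mkdir(exist_ok=True)
        con = sqlite3.connect(DATA_DIR / f"{customer_id}.db")
        con.execute("PRAGMA journal_mode=WAL")  # one tenant's writes never contend with another's
        con.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, body TEXT)")
        return con

    con = tenant_db("acme")  # each customer's traffic lands in its own file
    con.execute("INSERT INTO events (body) VALUES (?)", ("signup",))
    con.commit()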

The nice thing about this pattern is that you can create foreign data wrappers for your customer SQLite databases and query them as if they were in Postgres. Cross-customer aggregations are slow, but individual-customer analytics are quite fast, and this gives you near-infinite scalability.
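The wrapper setup looks roughly like this. It assumes the third-party sqlite_fdw extension, and the server/table names and paths are invented, so check the extension's docs for the exact options:

    import psycopg2

    con = psycopg2.connect("dbname=app")  # assumed DSN
    cur = con.cursor()

    # Expose one customer's SQLite file as a set of foreign tables in Postgres.
    cur.execute("CREATE EXTENSION IF NOT EXISTS sqlite_fdw")
    cur.execute("""
        CREATE SERVER acme_srv FOREIGN DATA WRAPPER sqlite_fdw
        OPTIONS (database '/data/tenants/acme.db')
    """)
    cur.execute("""
        CREATE FOREIGN TABLE acme_events (id bigint, body text)
        SERVER acme_srv OPTIONS (table 'events')
    """)

    # Per-customer analytics now run as ordinary Postgres queries.
    cur.execute("SELECT count(*) FROM acme_events")
    print(cur.fetchone())
    con.commit()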

You hit those write limits surprisingly early if you use background workers, though. I had a project with very little user traffic that choked on SQLite simply because a few Celery workers were updating job statuses concurrently. It wasn't the volume of data, just the contention from the workers, that forced the switch to Postgres.

  • Are you sure it choked on writes, and not on reads and writes together? SQLite's default setup is inefficient in many ways (as are its default compilation options), and that often causes issues.

    (I am just asking: are you sure WAL is on?)
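    Checking (and persistently switching) the journal mode is a one-liner; a Python sketch with an illustrative path:

      import sqlite3

      con = sqlite3.connect("app.db")  # illustrative path
      print(con.execute("PRAGMA journal_mode").fetchone())     # ('delete',) is the default
      print(con.execute("PRAGMA journal_mode=WAL").fetchone()) # ('wal',); the mode persists in the file
      con.close()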

  • SQLite emphatically warns against concurrent writes. It is not designed for that.

    I'm seeing these numbers on my current scratch benchmark:

    - Events are appended to a 10M+ record table (4+ GiB database).

    - Reads are fetched from a separate computed table, which is trigger-updated from the append-only table (a minimal sketch of this pattern appears at the end of this comment).

    - WAL mode ON, auto-vacuum ON:

      ;; HikariCP-style pool config; the SQLite pragmas live under :dataSourceProperties.
      {:dbtype "sqlite",
       :auto_vacuum "INCREMENTAL",
       :connectionTestQuery "PRAGMA journal_mode;",
       :preferredTestQuery "PRAGMA journal_mode;",
       :dataSourceProperties
       {:journal_mode "WAL",            ; readers no longer block the writer
        :limit_worker_threads 4,
        :page_size 4096,
        :busy_timeout 5000,             ; wait up to 5 s on a locked database before erroring
        :enable_load_extension true,
        :foreign_keys "ON",
        :journal_size_limit 0,
        :cache_size 15625,              ; in 4 KiB pages: ~64 MB of page cache
        :maximumPoolSize 1,             ; a single pooled connection, so one writer at a time
        :synchronous "NORMAL"}},
    

    - 1,600 sequential read-after-write transactions (in a single process), append-only, no batching.

    - With a separate (sequential) writer process and, concurrently, two reader processes: 400+ append transactions/second into the append-only table (no batching), and a total of 41,000 reads per second doing `select *` on the trigger-updated table.

    My schema is (deliberately) poor: most of it is TEXT.

    (edit: add clarifying text)
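    The trigger-updated pattern mentioned above boils down to this minimal sketch (my real schema differs; this only shows the shape):

      import sqlite3

      con = sqlite3.connect(":memory:")
      con.executescript("""
          -- Append-only event log: writers only ever INSERT here.
          CREATE TABLE events (id INTEGER PRIMARY KEY, key TEXT, val TEXT);

          -- Computed read-side table, kept up to date by a trigger.
          CREATE TABLE latest (key TEXT PRIMARY KEY, val TEXT);

          CREATE TRIGGER events_ai AFTER INSERT ON events BEGIN
              INSERT OR REPLACE INTO latest (key, val) VALUES (NEW.key, NEW.val);
          END;
      """)
      con.execute("INSERT INTO events (key, val) VALUES ('a', '1')")
      con.execute("INSERT INTO events (key, val) VALUES ('a', '2')")
      print(con.execute("SELECT key, val FROM latest").fetchall())  # [('a', '2')]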