Comment by cyanmagenta

14 days ago

There is some risk that, if you design your website to use a local database (sqlite, or a traditional database over a unix socket on the same machine), then switching later to a networked database is harder. In other words, once you design a system to do 200 queries per page, you’d essentially have to redesign the whole thing to switch later.

It seems like it mostly comes down to how likely it is that the site will grow large enough to need a networked database. And people probably wildly overestimate this. HackerNews, for example, runs on a single computer.

The thing is, sqlite can scale further vertically than most networked databases. In some contexts, like writes and interactive transactions, it outright scales further. [1]

That's before you even get into sharding sqlite.
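To make the write side concrete, here's a rough toy sketch (Python's stdlib sqlite3, my own example, not the benchmark from [1]): WAL mode plus batching writes into larger transactions is where most of SQLite's single-box write throughput comes from.

    # Toy sketch, not the linked benchmark: WAL + batched transactions
    # are the standard way to get high write throughput out of SQLite.
    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; fine for most apps in WAL mode
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")

    rows = [(f"event-{i}",) for i in range(100_000)]
    with conn:  # one transaction for the whole batch instead of 100k tiny ones
        conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)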

[1] - https://andersmurphy.com/2025/12/02/100000-tps-over-a-billio...

  • Sqlite isn't the part that needs to scale in most cases, though. As soon as you need multiple servers to handle the traffic you're getting (serializing data, concatenating strings for HTML, lots of network throughput, or even just handling amounts of data that press you up against your memory limit), you're probably not going to have a great time with sqlite. Having multiple boxes talk to the same sqlite file is not something I've ever seen anyone do well at scale.

    Yes, you can get by with one box for probably quite a while. But eventually a service of any significant size is going to need multiple boxes. Hell, even just having near-zero-downtime deployments essentially requires it. Vertical scaling is generally a whole lot less cost-effective than horizontal scaling (for rented servers), especially if your peak usage is much higher than off-hours use.

    • I'd argue the opposite: vertical scaling is a whole lot more cost-effective than horizontal scaling if you're using a language that has both real threads and green/virtual threads (Go, or anything on the JVM). You get such insane bang for your buck these days that even over-provisioning is cheap. Hell, direct NVMe can easily give 10-100x vs the crappy network drives AWS provides.

      Zero-downtime deploys have been solved for single machines. But even then, I'd argue most businesses can live with an hour of downtime a month. I mean, that's about the same reliability as AWS these days.
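      (One common single-machine approach, sketched roughly below in Python rather than any particular framework: the new process binds the same port with SO_REUSEPORT while the old one is still serving, then the old one drains and exits.)

          # Rough sketch (Linux-only): SO_REUSEPORT lets the old and the new
          # server process accept on the same port during a deploy, so there
          # is no gap while the old one shuts down.
          import socket

          def listen_for_deploy(port: int) -> socket.socket:
              s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
              s.bind(("0.0.0.0", port))
              s.listen(128)
              return s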

      Really, there are a handful of cases where you need multiple servers:

      - You're network limited (basically you're a CDN).

      - You're drive limited: you need to get data off drives faster than their bandwidth allows.

      - Some legal requirement.

      This is before we get into how trivial it is to shard sqlite by region or customer company. You can even shard sqlite on the same machine if you need higher write throughput.
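      (Rough sketch of what I mean, with made-up names: one SQLite file per shard, routed by customer id. Same idea per region, or per write shard on one machine.)

          # Toy sharding sketch (names and schema are made up): each shard is
          # its own SQLite file, so writes to different shards never contend
          # on the same write lock.
          import sqlite3

          NUM_SHARDS = 8

          def shard_conn(customer_id: int) -> sqlite3.Connection:
              shard = customer_id % NUM_SHARDS
              conn = sqlite3.connect(f"shard-{shard}.db")
              conn.execute("PRAGMA journal_mode=WAL")
              return conn

          conn = shard_conn(customer_id=42)
          with conn:
              conn.execute("CREATE TABLE IF NOT EXISTS orders "
                           "(id INTEGER PRIMARY KEY, customer INTEGER, total REAL)")
              conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", (42, 19.99))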

  • Is Postgres with "no network" running over a unix socket or an IP socket on the same machine?

    • Yes, a unix socket, using the Java 16 socket channels. Interestingly, there was only a 5-10% improvement vs IP sockets (with no SSL).
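      (For comparison, not the Java 16 setup above: with libpq-based clients like Python's psycopg2 the difference is just where you point host: a socket directory for a unix socket, an address for TCP. Paths and credentials below are placeholders.)

          # Illustration only; database name, user and socket path are placeholders.
          import psycopg2

          # Unix domain socket: host is the directory containing the socket file.
          local = psycopg2.connect(host="/var/run/postgresql", dbname="app", user="app")

          # TCP loopback on the same machine.
          tcp = psycopg2.connect(host="127.0.0.1", port=5432, dbname="app", user="app")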

The same is true for regular databases though, isn't it?

Network adds latency, and while it might be fine to run 500 queries with the database on the same machine, adding 1-5ms per query makes it feel not okay.
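(At 500 queries, 1-5ms each is 0.5-2.5 seconds of round-trip latency per page before the queries do any actual work.)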

  • > adding 1-5ms per query makes it feel not okay

    Or going from ~1ms over a local wired network to ~10ms over a wireless network.

    Had a customer performance complaint that boiled down to that: something that should have taken minutes took hours. We could not reproduce it internally.

    After a lot of back and forth, I asked if the user's machine was wired. Nope, wireless laptop. Got them to plug in like their colleagues and it was fast again.

  • Yes, that is why I said “local database (sqlite, or a traditional database over a unix socket on the same machine).”

    This isn’t an sqlite-specific point, although sqlite is often faster on a single machine because even local sockets have some overhead that an in-process database avoids.

Most of us, the majority of the time, don’t need that level of optimization, because not every project is destined to grow 10x quickly.

LLMs also have this tendency toward premature optimization, where they start writing very complex classes for users who only want to extract some information to resolve a quick problem.

I don't see how anyone would design a system that executes 200 queries per page. I understand that a system that is in use for many, many years and accumulates a lot of legacy code eventually ends up there, but designing it that way? Never. That's not design, that's doing a bad job at design.

  • > I don't see how anyone would design a system that executes 200 queries per page.

    They call it the n+1 problem. 200 queries is the theoretically correct approach, but due to the high network latency of networked DBMSes you have to hack around it. But if the overhead is low, like when using SQLite, then you would not introduce hacks in the first place.

    The parent is saying that if you correctly design your application, but then move to a system that requires hacks to deal with its real-world shortcomings, you won't be prepared. Although I think that's a major overstatement: if you have correctly designed the rest of your application too, introducing the necessary hacks into a couple of isolated places is really not a big deal at all.
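    (Concretely, the usual n+1 shape vs. the batched workaround, sketched with a made-up schema:)

        # Toy sketch of n+1 vs. the batched workaround (schema is made up).
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
            CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
        """)

        # n+1: one query for the page of posts, then one query per post.
        # Fine when each round trip costs microseconds, painful at 1-5ms each.
        posts = conn.execute("SELECT id, title FROM posts LIMIT 200").fetchall()
        comments_by_post = {
            post_id: conn.execute(
                "SELECT body FROM comments WHERE post_id = ?", (post_id,)
            ).fetchall()
            for post_id, _title in posts
        }

        # The "hack": collapse it into a single round trip with a join.
        rows = conn.execute("""
            SELECT p.id, p.title, c.body
            FROM posts p LEFT JOIN comments c ON c.post_id = p.id
        """).fetchall()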

    • I'd point to the difference between vector-based vs scalar-based systems in numerics. If your web programming language is more like MATLAB or APL than PHP, then maybe it can naturally generate the code to do it all with sets. As it is, we are usually writing set-based implementations in scalar-based languages.

      Part of the "object-relational mapping" problem has always been that SQL is superior to conventional programming languages in many ways.
