Comment by cyanmagenta

14 days ago

There is some risk that, if you design your website to use a local database (sqlite, or a traditional database over a unix socket on the same machine), then switching later to a networked database is harder. In other words, once you design a system to do 200 queries per page, you’d essentially have to redesign the whole thing to switch later.

It seems like it mostly comes down to how likely it is that the site will grow large enough to need a networked database. And people probably wildly overestimate this. HackerNews, for example, runs on a single computer.

The thing is, sqlite can scale further vertically than most networked databases. In some contexts, like writes and interactive transactions, it outright scales further. [1]

That's before you even get into sharding sqlite.
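To make the write side concrete, here's a rough toy sketch (Python's stdlib sqlite3, my own example, not the benchmark from [1]): WAL mode plus batching writes into larger transactions is where most of SQLite's single-box write throughput comes from.

    # Toy sketch, not the linked benchmark: WAL + batched transactions
    # are the standard way to get high write throughput out of SQLite.
    import sqlite3

    conn = sqlite3.connect("app.db")
    conn.execute("PRAGMA journal_mode=WAL")    # readers no longer block the writer
    conn.execute("PRAGMA synchronous=NORMAL")  # fewer fsyncs; fine for most apps in WAL mode
    conn.execute("CREATE TABLE IF NOT EXISTS events (id INTEGER PRIMARY KEY, payload TEXT)")

    rows = [(f"event-{i}",) for i in range(100_000)]
    with conn:  # one transaction for the whole batch instead of 100k tiny ones
        conn.executemany("INSERT INTO events (payload) VALUES (?)", rows)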

[1] - https://andersmurphy.com/2025/12/02/100000-tps-over-a-billio...

  • Sqlite isn't the part that needs to scale in most cases, though. As soon as you need multiple servers to handle the traffic you're getting (serializing data, concatenating strings for HTML, lots of network throughput, or even just handling amounts of data that press you up against your memory limit), you're probably not going to have a great time with sqlite. Having multiple boxes talk to the same sqlite file is not something I've ever seen anyone do well at scale.

    Yes, you can get by with one box for probably quite a while. But eventually a service of any significant size is going to need multiple boxes. Hell, even just having near-zero-downtime deployments essentially requires it. Vertical scaling is generally a whole lot less cost-effective than horizontal scaling (for rented servers), especially if your peak usage is much higher than off-hours use.

    • I'd argue the opposite: vertical scaling is a whole lot more cost-effective than horizontal scaling if you're using a language that has both real threads and green/virtual threads (Go, or anything on the JVM). You get such insane bang for your buck these days that even over-provisioning is cheap. Hell, direct NVMe can easily give 10-100x vs the crappy network drives AWS provides.

      Zero-downtime deploys have been solved for single machines. But even then, I'd argue most businesses can live with an hour of downtime a month. I mean, that's about the same reliability as AWS these days.
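      (One common single-machine approach, sketched roughly below in Python rather than any particular framework: the new process binds the same port with SO_REUSEPORT while the old one is still serving, then the old one drains and exits.)

          # Rough sketch (Linux-only): SO_REUSEPORT lets the old and the new
          # server process accept on the same port during a deploy, so there
          # is no gap while the old one shuts down.
          import socket

          def listen_for_deploy(port: int) -> socket.socket:
              s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
              s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEPORT, 1)
              s.bind(("0.0.0.0", port))
              s.listen(128)
              return s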

      Really, there are a handful of cases where you need multiple servers:

      - You're network limited (basically you're a CDN).

      - You're drive limited: you need to get data off drives faster than their bandwidth allows.

      - Some legal requirement.

      This is before we get into how trivial it is to shard sqlite by region or customer company. You can even shard sqlite on the same machine if you need higher write throughput.
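      (Rough sketch of what I mean, with made-up names: one SQLite file per shard, routed by customer id. Same idea per region, or per write shard on one machine.)

          # Toy sharding sketch (names and schema are made up): each shard is
          # its own SQLite file, so writes to different shards never contend
          # on the same write lock.
          import sqlite3

          NUM_SHARDS = 8

          def shard_conn(customer_id: int) -> sqlite3.Connection:
              shard = customer_id % NUM_SHARDS
              conn = sqlite3.connect(f"shard-{shard}.db")
              conn.execute("PRAGMA journal_mode=WAL")
              return conn

          conn = shard_conn(customer_id=42)
          with conn:
              conn.execute("CREATE TABLE IF NOT EXISTS orders "
                           "(id INTEGER PRIMARY KEY, customer INTEGER, total REAL)")
              conn.execute("INSERT INTO orders (customer, total) VALUES (?, ?)", (42, 19.99))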

  • Is Postgres with "no network" running over a unix socket or an IP socket on the same machine?

    • Yes, a unix socket, using the Java 16 socket channels. Interestingly, there was only a 5-10% improvement vs IP sockets (with no SSL).
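      (For comparison, not the Java 16 setup above: with libpq-based clients like Python's psycopg2 the difference is just where you point host: a socket directory for a unix socket, an address for TCP. Paths and credentials below are placeholders.)

          # Illustration only; database name, user and socket path are placeholders.
          import psycopg2

          # Unix domain socket: host is the directory containing the socket file.
          local = psycopg2.connect(host="/var/run/postgresql", dbname="app", user="app")

          # TCP loopback on the same machine.
          tcp = psycopg2.connect(host="127.0.0.1", port=5432, dbname="app", user="app")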

The same is true for regular databases though, isn't it?

Network adds latency, and while it might be fine to run 500 queries with the database on the same machine, adding 1-5ms per query makes it feel not okay.
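(At 500 queries, 1-5ms each is 0.5-2.5 seconds of round-trip latency per page before the queries do any actual work.)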

  • > adding 1-5ms per query makes it feel not okay

    Or going from ~1ms over a local wired network to ~10ms over a wireless network.

    Had a customer performance complaint that boiled down to that: something that should have taken minutes took hours. We could not reproduce it internally.

    After a lot of back and forth, I asked if the user's machine was wired. Nope, wireless laptop. Got them to plug in like their colleagues and it was fast again.

  • Yes, that is why I said “local database (sqlite, or a traditional database over a unix socket on the same machine).”

    This isn’t an sqlite-specific point, although sqlite is often faster on a single machine because even local sockets have some overhead that an in-process database avoids.

Most of us, the majority of the time, don’t need that level of optimization, because not every project is destined to grow 10x quickly.

LLMs also have this tendency toward premature optimization, where they start writing very complex classes for users who only want to extract some information to resolve a quick problem.

I don't see how anyone would design a system that executes 200 queries per page. I understand that a system that is in use for many, many years and accumulates a lot of legacy code eventually ends up there, but designing it that way? Never. That's not design, that's doing a bad job at design.

  • > I don't see how anyone would design a system that executes 200 queries per page.

    They call it the n+1 problem. 200 queries is the theoretically correct approach, but due to the high network latency of networked DBMSes you have to hack around it. But if the overhead is low, like when using SQLite, then you would not introduce hacks in the first place.

    The parent is saying that if you correctly design your application, but then move to a system that requires hacks to deal with its real-world shortcomings, you won't be prepared. Although I think that's a major overstatement: if you have correctly designed the rest of your application too, introducing the necessary hacks into a couple of isolated places is really not a big deal at all.
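    (Concretely, the usual n+1 shape vs. the batched workaround, sketched with a made-up schema:)

        # Toy sketch of n+1 vs. the batched workaround (schema is made up).
        import sqlite3

        conn = sqlite3.connect(":memory:")
        conn.executescript("""
            CREATE TABLE posts (id INTEGER PRIMARY KEY, title TEXT);
            CREATE TABLE comments (id INTEGER PRIMARY KEY, post_id INTEGER, body TEXT);
        """)

        # n+1: one query for the page of posts, then one query per post.
        # Fine when each round trip costs microseconds, painful at 1-5ms each.
        posts = conn.execute("SELECT id, title FROM posts LIMIT 200").fetchall()
        comments_by_post = {
            post_id: conn.execute(
                "SELECT body FROM comments WHERE post_id = ?", (post_id,)
            ).fetchall()
            for post_id, _title in posts
        }

        # The "hack": collapse it into a single round trip with a join.
        rows = conn.execute("""
            SELECT p.id, p.title, c.body
            FROM posts p LEFT JOIN comments c ON c.post_id = p.id
        """).fetchall()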

    • I'd point to the difference between vector-based vs scalar-based systems in numerics. If your web programming language is more like MATLAB or APL than PHP, then maybe it can naturally generate the code to do it all with sets. As it is, we are usually writing set-based implementations in scalar-based languages.

      Part of the "object-relational mapping" problem has always been that SQL is superior to conventional programming languages in many ways.
