Comment by jwr
21 hours ago
I don't get this scalability craze either. Computers are stupid fast these days and unless you are doing something silly, it's difficult to run into CPU speed limitations.
I've been running a SaaS for 10 years now. Initially on a single server, after a couple of years moved to a distributed database (RethinkDB) and a 3-server setup, not for "scalability" but to get redundancy and prevent data loss. Haven't felt a need for more servers yet. No microservices, no Kubernetes, no AWS, just plain bare-metal servers managed through ansible.
I guess things look different if you're using somebody else's money.
It's not about scalability. It's about copying what the leaders in a space do, regardless of whether it makes sense or not. It's pervasive in most areas of life.
One of the silliest things you can do to cripple your performance is build something that is artificially over distributed, injecting lots of network delays between components, all of which have to be transited to fulfill a single user request. Monoliths are fast. Yes, sometimes you absolutely have to break something into a standalone service, but that’s rare.
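As a rough illustration of how those hops add up (the hop count and per-hop latencies below are made-up numbers, not measurements from any real system):

    # Toy model: the same request served by in-process calls vs. serial RPC hops.
    # All figures are hypothetical, just to show the arithmetic.
    in_process_call_ms = 0.001   # roughly a microsecond per function call, being generous
    network_hop_ms = 1.5         # roughly 1.5 ms round trip per intra-datacenter RPC
    hops = 8                     # request passes serially through 8 components

    print(f"monolith call overhead:    ~{hops * in_process_call_ms:.3f} ms")
    print(f"distributed call overhead: ~{hops * network_hop_ms:.1f} ms")

Eight in-process calls are effectively free; eight serial network hops are already ~12 ms before any actual work gets done.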
I've noticed a strong correlation between artificially over-distributing and not understanding things like the CAP theorem. So you end up with a slow system that has also picked up a bunch of unsolvable distributed systems problems on its fast path.
(Most distributed systems problems are solvable, but only if the person that architected the system knows what they're doing. If they know what they're doing, they won't over-distribute stuff.)
Yes, that too. If you look at the commits for Heisenbugs associated with the system, you have a good chance of seeing artificial waits injected to “fix” things.
You can solve just about any distributed systems problem by accepting latency, but nobody wants to accept latency :)
...despite the vast majority of latency issues being extremely low-hanging fruit, like "maybe don't have tens of megabytes of data required to do first paint on your website" or "hey maybe have an index in that database?".
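For the missing-index case, a minimal self-contained sketch with SQLite (the table, columns, and row count are invented for illustration):

    import sqlite3, time

    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)")
    conn.executemany("INSERT INTO orders (customer_id, total) VALUES (?, ?)",
                     [(i % 10_000, i * 0.01) for i in range(500_000)])

    def lookup():
        start = time.perf_counter()
        conn.execute("SELECT SUM(total) FROM orders WHERE customer_id = ?", (42,)).fetchone()
        return time.perf_counter() - start

    no_index = lookup()    # full table scan
    conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
    with_index = lookup()  # index lookup
    print(f"without index: {no_index*1000:.1f} ms, with index: {with_index*1000:.2f} ms")

Same query, same data; the only change is one CREATE INDEX.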
There's no need to deploy separate services on separate machines.
I ran a SaaS for 10 years. Two products. Profitable from day 1 as customers paid $500/month and it ran on a couple of EC2 instances as well as a small RDS database.
Another thing one has to consider is the market size and time window of your SaaS. No sense in building for scalability if the business opportunity is only 100 customers and only for a few years.
> unless you are doing something silly, it's difficult to run into CPU speed limitations.
Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.
That said, anyone know what's up with the slow deletion of Safari history? It's clearly O(n), but as shown in this blog post, it still only deletes at a rate of 22 items per 10 seconds: https://benwheatley.github.io/blog/2025/06/19-15.56.44.html
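On the accidental big-O point, this is the classic way it sneaks in; the function names and data sizes are just illustrative:

    import time

    def dedupe_slow(items):
        # "x not in out" scans a list, so the loop is O(n^2) overall
        out = []
        for x in items:
            if x not in out:
                out.append(x)
        return out

    def dedupe_fast(items):
        # set membership is O(1) on average, so this is O(n)
        seen, out = set(), []
        for x in items:
            if x not in seen:
                seen.add(x)
                out.append(x)
        return out

    data = list(range(5_000)) * 2
    for fn in (dedupe_slow, dedupe_fast):
        start = time.perf_counter()
        fn(data)
        print(fn.__name__, f"{time.perf_counter() - start:.3f}s")

Both are one innocuous-looking loop; only one of them falls over when the input grows.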
>> Yes, but it's not difficult to do something silly without even noticing until too late. Implicitly (and unintentionally) calling something with the wrong big-O, for example.
On a non-scalable system you're going to notice that big-O problem and correct it quickly. On a scalable system you're not going to notice it until you get your AWS bill.
Also, instead of having a small team of people to fight scalable infrastructure configuration, you could put 1-2 full time engineers on performance engineering. They'd find big-O and constant factor problems way before they mattered in production.
Of course, those people's weekly status reports would always be "we spent all week tracking down a dumb mistake, wrote one line of code and solved a scaling problem we'd hit at 100x our current scale".
That's equivalent to waving a "fire me" flag at the bean counters and any borderline engineering managers.
For how many users, and at what transaction rate?
Not disagreeing that you can do a lot on a lot less than in the old days, but your story would be much more impactful with that information. :)
Scalability isn't just about CPU.
It's just as much about storage and IO and memory and bandwidth.
Different types of sites have completely different resource profiles.
Microservices are not a solution for scalability. There are multiple options for building scalable software; even a monolith or a modular monolith behind a properly load-balanced setup will drastically reduce the complexity compared to microservices and still reach massive scale. The only bottleneck will be the DB.
Microservices take an organizational problem:
The teams don't talk, and always blame each other
and add distributed systems problems and additional organizational problems:
Each team implements one half of dozens of bespoke network protocols, but they still don't talk, and still always blame each other. Also, now they have access to weaponizable uptime and latency metrics, because each team "owns" the server half of one network endpoint, but not the client half.
Is it about scalability, or about resiliency?
Or is it about outsourcing problems?
There are a lot of off-the-shelf microservices that can solve difficult problems for me, like Keycloak for user management. Isn't that a good reason?
Or Grafana for log visualization?
Should I build that into the monolith too? Or should I just skip it?