How many HTTP requests/second can a single machine handle? (2024)

1 month ago (binaryigor.com)

From the article: “Huge machine - 8 CPUs, 16 GB of memory”

That’s barely more than a Raspberry Pi (4 vs 8 cores)? Huge machines today have 20+ TB of RAM and hundreds of cores. Even top-end consumer machines can have 512 GB of RAM!

I do agree with the author that single machines can scale far beyond what most orgs/companies need, but I think they may be underestimating how far that goes by orders of magnitude.

  • A large number of issues filed against open-source servers are people wondering why perf is so bad when they give it a single core. Single-core performance hasn’t improved much in the last 10-15 years, but more and more cores are available. It blows me away that cores are still expensive enough that people need to worry about it.

    • Intel's single core performance has 3.4xed in 15 years (980X vs 285K)

      Single-core perf doubled roughly every 8 years (3.4x over 15 years works out to a doubling about every 8.5 years), multicore every 6 years, and GPUs every 3 years!

  • True :) I did it on purpose to show that even with these modest resources you can achieve amazing performance - better than most systems would ever need

  • At Hetzner, you can spend less than double what DigitalOcean charges for 8 cores and get ten times as many cores on a single machine.

Well, you have to understand what you're testing.

With a test like this, you're really testing two different things:

1. How fast your database is,

2. How fast your frontend is

Since the query is simple, your frontend is basically a DB access layer and should be taking no time. And since the table is indexed the query should also take no time.

The only other interesting question is whether the database can handle the number of connections and whether the storage can keep up. The app is using connection pools, but the actual size of the database machine is never mentioned...which is a problem. How big is the DB instance? A small instance could be crushed by 80 connections. A database on a hard drive may not be able to handle the load either (though since the data volume is small, it could be that everything ends up cached anyway).

So this is sort of interesting, but sort of not interesting.

  • It's all described in the blog post and there is a link to the source code as well :)

    Both the app and db are hosted on the same machine - they are sharing resources. This fact, type of storage and other details of the setup are contained in this section: https://binaryigor.com/how-many-http-requests-can-a-single-m...

    I think you're right that I didn't mention the details of the db connection pool; they are here: https://github.com/BinaryIgor/code-examples/blob/master/sing...

    Long story short, there's a Hikari Connection Pool with initial 10 connections, resizable to 20.
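
    In HikariCP terms that's roughly the following (a sketch - the JDBC URL and credentials are placeholders, not the values from the repo):

        import com.zaxxer.hikari.HikariConfig;
        import com.zaxxer.hikari.HikariDataSource;

        public class PoolSetup {
            public static HikariDataSource dataSource() {
                HikariConfig config = new HikariConfig();
                // Placeholder connection details, not the article's actual values
                config.setJdbcUrl("jdbc:postgresql://localhost:5432/app");
                config.setUsername("app");
                config.setPassword("secret");
                // Keep 10 idle connections around, allow growth up to 20
                config.setMinimumIdle(10);
                config.setMaximumPoolSize(20);
                return new HikariDataSource(config);
            }
        }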

    • 60,000 is near the theoretical max for a 5-tuple with all but the client port fixed (there are only ~64k possible client ports). If you are going to test with this many connections per client, you are hopefully using multiple IPs per client or multiple server IPs.

  • Postgres with unmodified default settings can handle thousands of requests like that per second on relatively small hardware. The connection pool is a potential bottleneck, but one you should be able to avoid. I think the default limit for Postgres is something like 100 connections; that's plenty with a pool in front of it.
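
    If in doubt, you can read that limit off the server and keep the pool's max size well below it - roughly like this (a sketch, assuming the Postgres JDBC driver and an already-configured DataSource):

        import java.sql.Connection;
        import java.sql.ResultSet;
        import java.sql.Statement;
        import javax.sql.DataSource;

        class ConnectionLimitCheck {
            // Prints the server's max_connections so the pool can be sized safely below it
            static void printMaxConnections(DataSource ds) throws Exception {
                try (Connection conn = ds.getConnection();
                     Statement stmt = conn.createStatement();
                     ResultSet rs = stmt.executeQuery("SELECT current_setting('max_connections')")) {
                    if (rs.next()) {
                        System.out.println("Postgres max_connections = " + rs.getString(1));
                    }
                }
            }
        }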

A single machine can handle much, much more if you use sqlite and batch updates/inserts.

Honestly, unless you're bandwidth/uplink limited (e.g. running a CDN), a single machine will take you really far.

Also simpler systems tend to have better uptime/reliability. Doesn't get much simpler than a single box.

  • On my pretty modest dev machine with 12 CPUs, I once managed to achieve 14k RPS with Go+SQLite in a write+read test on a real project I was developing (it used a framework so there was also some overhead due to all the abstractions). I didn't even batch anything. The only problem was, I quickly found that SQLite's WAL checkpointer couldn't keep up with the write rates, the WAL file quickly grew to 100s of GBs (this is actually a known issue and is mentioned in their docs), so I had to add a special goroutine to monitor the size of the WAL file and force checkpointing manually when it got too big.

    So when people say 1k is "highload" and requires a whole cluster, I'm not sure what to think of it. You can squeeze so much more out of a single fairly modest machine.

    • SQLite has some sharp edges for sure. Honestly, even basic batching - wrapping all inserts/updates in a transaction that commits every 100ms - will get you to 30,000+ updates a second on a 4-core shared-CPU VPS (assuming NVMe drives); a rough JDBC sketch is at the end of this comment.

      That's the other thing: AWS tends to have really dated SSDs.

      Honestly, it's like the industry has jumped the shark. 1k is not a lot of load. It's like when people say a single writer means you can't be performant - it's the opposite: most of the time a single writer lets you batch, and batching is where the magic happens.
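
      Roughly what that batching looks like over JDBC (a sketch - Java rather than Go, and the table name, driver and batch contents are made up):

          import java.sql.Connection;
          import java.sql.DriverManager;
          import java.sql.PreparedStatement;
          import java.sql.Statement;
          import java.util.List;

          public class BatchedWriter {
              public static void main(String[] args) throws Exception {
                  // Assumes the org.xerial sqlite-jdbc driver is on the classpath
                  try (Connection conn = DriverManager.getConnection("jdbc:sqlite:app.db")) {
                      try (Statement s = conn.createStatement()) {
                          s.execute("PRAGMA journal_mode=WAL");
                          s.execute("CREATE TABLE IF NOT EXISTS events(id INTEGER PRIMARY KEY, payload TEXT)");
                      }
                      conn.setAutoCommit(false);

                      // Stand-in for ~100ms worth of buffered writes
                      List<String> pending = List.of("a", "b", "c");
                      try (PreparedStatement ps = conn.prepareStatement("INSERT INTO events(payload) VALUES (?)")) {
                          for (String p : pending) {
                              ps.setString(1, p);
                              ps.addBatch();
                          }
                          ps.executeBatch();
                      }
                      conn.commit(); // one commit per batch instead of one per row

                      // If the WAL grows without bound, force a checkpoint now and then
                      conn.setAutoCommit(true); // leave transaction scope so the checkpoint isn't blocked
                      try (Statement s = conn.createStatement()) {
                          s.execute("PRAGMA wal_checkpoint(TRUNCATE)");
                      }
                  }
              }
          }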

Sometimes it confuses me how much we are just sort of treading water on the server performance front: the C10K problem was solved in 1999. WhatsApp was hosting a million TCP connections per box in 2011.

It is not all that hard to hit 10k requests/second on modern hardware. 100k requests/second is achievable with some careful technology choices.

I applaud the author’s curiosity but hope they realize this is like comparing the 0-60 performance of a Cadillac SUV vs a Ford Excursion.

A low-end ARM processor (like a Raspberry Pi) can crank out 1,000 requests a second with a CGI-style program handling the requests - using a single CPU core. Of course, this doesn't happen with traditional CGI (actual performance with traditional CGI will be more like 20-50/s or worse); a minimal sketch of the in-process style follows at the end of this comment.

Like the stereotypical drivers of such vehicles, the industry has become so fat and stupid that an x86 system handling 500 requests/sec actually sounds impressive. Sadly, considering the bloated nature of modern stacks, it kinda is.
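
To be concrete about the "not traditional CGI" part: the throughput comes from handling requests in one long-lived process rather than fork/exec per request. A minimal JDK-only sketch of that style (the port and response are arbitrary, and this isn't tied to any measured number):

    import com.sun.net.httpserver.HttpServer;
    import java.io.OutputStream;
    import java.net.InetSocketAddress;
    import java.nio.charset.StandardCharsets;

    public class TinyServer {
        public static void main(String[] args) throws Exception {
            // One long-lived process handles every request - no fork/exec per request,
            // which is where classic CGI loses most of its throughput
            HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
            server.createContext("/", exchange -> {
                byte[] body = "hello".getBytes(StandardCharsets.UTF_8);
                exchange.sendResponseHeaders(200, body.length);
                try (OutputStream os = exchange.getResponseBody()) {
                    os.write(body);
                }
            });
            server.setExecutor(null); // use the default dispatcher thread
            server.start();
        }
    }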

  • True :) My main motivation was to at least realistically nudge us in the right (simpler) direction - away from the still-popular microservices architectures deployed to multi-machine Kubernetes clusters to handle, on average, 5 req/s.

database on the same machine as the application server, RPS limits enforced via

            var issuedRequests = i + 1;
            if (issuedRequests % REQUESTS_PER_SECOND == 0 && issuedRequests < REQUESTS) {
                System.out.println("%s, %d/%d requests were issued, waiting 1s before sending next batch..."
                    .formatted(LocalDateTime.now(), issuedRequests, REQUESTS));
                Thread.sleep(1000);
            }

don't take any conclusions away from this post, friends

  • That's intentional - I wanted to test at the REQUESTS_PER_SECOND max in every test case.

    Same with the db - I wanted to see what kind of load a system (not just an app) deployed to a single machine can handle.

    It can obviously be optimized even further; I didn't try to do that in the article.

    • Based on that code snippet, and making some (possibly unjustified) assumptions about the rest of the code, your actual request rate could be as low as 50% of your claimed request rate:

      Suppose it takes 0.99s to send REQUESTS_PER_SECOND requests. Then you sleep for 1s. Result: you send REQUESTS_PER_SECOND requests every 1.99s. (If sending the batch of requests takes longer than a second, the situation gets even worse.) One possible fix is sketched at the end of this comment.

      The issue GP has with app and DB on the same box is a red herring -- that was explicitly the condition under test.
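
      One way to tighten the pacing is to sleep only for whatever is left of the one-second window instead of a flat 1000 ms. A sketch (REQUESTS_PER_SECOND, the loop structure and sendRequest() are stand-ins, not the article's actual code):

          import java.time.Duration;

          public class PacedLoadSketch {
              static final int REQUESTS_PER_SECOND = 500; // hypothetical target rate
              static final int SECONDS = 10;

              public static void main(String[] args) throws InterruptedException {
                  for (int second = 0; second < SECONDS; second++) {
                      long batchStart = System.nanoTime();
                      for (int i = 0; i < REQUESTS_PER_SECOND; i++) {
                          sendRequest();
                      }
                      // Sleep only for the remainder of the 1s window, so send time
                      // doesn't silently dilute the intended rate
                      long elapsedMs = Duration.ofNanos(System.nanoTime() - batchStart).toMillis();
                      long remainingMs = 1000 - elapsedMs;
                      if (remainingMs > 0) {
                          Thread.sleep(remainingMs);
                      }
                      // If remainingMs <= 0, the generator can't keep up and the run should be flagged
                  }
              }

              static void sendRequest() {
                  // No-op placeholder; the real client would fire an async HTTP request here
              }
          }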

    • I mean, the details are far beyond what can be effectively communicated in an HN comment, but if your loadgen tool is doing anything like sleep(1000ms), it is definitely not generating a sound requests-per-second load against its target

      And furthermore, if the application and DB are co-located on the same machine, you're co-mingling service loads and definitely not measuring or capturing any kind of useful load numbers in the end

      tl;dr: these benchmarks/results are ultimately unsound; it's not about optimization, it's about validity

      If you want to benchmark the application, then either you (a) mock the DB at as close to zero cost as you can, or (b) point all application endpoints at the same shared (separate-machine) DB instance, and make sure each benchmark run executes exactly the same set of queries against a DB instance that is 100% equivalent to the other runs, resetting in between each run

This entire post could be 3 paragraphs of test conditions, 2 paragraphs of narrative, and one graph. It would have been more informative to boot.

A picture would have been worth quite a bit more than a thousand words.

A toy example but it's an interesting read nonetheless. We also host our monolith app on a few bare metal machines (considerably beefier than the example however) and it works well, although the app does considerably more queries (and more complex queries) than this. Transaction locking issues are our bane though.

  • How many queries do you usually handle? Why a few? One doesn't suffice? What resources do they have?

Personally, I use Cloudflare Workers not because one host couldn't handle the traffic (it could), but because the maintenance is a breeze

Obviously at high load (1k+ TPS), plain servers are way cheaper than serverless, so the tradeoff can start to swing

> External volume for the database - it does not write to the local file system (we use DigitalOcean Block Storage)

Is this common? Why not use the local filesystem? Actually, I thought that using anything other than the local filesystem for the database was a no-no. Am I missing something?

  • Databases on cloud providers are usually not on file systems local to the instance, because instances are expected to fail at any time.

    Block storage is meant to be reliable, so databases go there. Yes it's slower but you don't lose data.

    Generally, the only time you want a local database in the cloud is if it's being used for short-lived data meaningful only to that particular instance in time.

    Or it can work if your database rarely changes and you make regular backups that are easy to revert to, like for a blog.

    • Databases have tools to work with storage or servers that can fail. You would need to use replication between multiple database servers and a backup method to some other storage.

      Databases with high availability and robust storage were possible before the cloud.

I always find that my regular CRUD apps kind of grow into something not-so-cruddy due to a single feature (realtime communication, a bursty usage profile, large batch jobs to precompute something too expensive to do at request time) and the architecture just explodes from there.

Also, it always feels like I need a second instance at the very least for redundancy, but then we have to ensure they're stateless and that batch jobs are sharded across them (or only run on one), and again we hit an architecture explosion. I wish I were more comfortable just dropping a single Spring Boot instance on a VM and calling it a day; Spring Boot has a lot of bells and whistles and you can get pretty far without the architecture explosion, but it is almost inevitable.

  • One of the reasons I have completely dropped interpreted languages (Python, Java, JS) and followed the "back to compiled software" hype. I am now writing my software purely in Go and Rust, and again using pipes, queues, and temp storage to connect smaller programs, tools, and services. The Unix philosophy was revolutionary and (for me) the ultimate solution for software organisation. But one never knows what they had until they lose it, so I treat the few years of experimentation with interpreted alternatives as a positive.

Load testing: how many HTTP requests/second can a single machine handle while doing <insert thing/things>?

> very_high_load: 4000 requests per second - 4 machines x 1000 RPS

This is an incredibly naive article.

Another way to look at this is the TechEmpower benchmarks. They test on big machines, but you can get > 1M req/s across a wide variety of environments.

Needs (2024)

  • Needs (2000) really. This contrived test is using tiny VPSes (even the "big" machine is tiny), slow network-mounted DB storage, nothing like a production stack that a real API server would use. Bespoke simple profiling mechanism. Nothing wrong with OP learning the basics and experimenting, but there's nothing of value in the findings.

    • Regarding the machines' sizes - I did it on purpose, to showcase that even with these limited resources you can still achieve way better performance than most systems will ever need.

      I know that you can have significantly bigger machines; network-mounted DB storage, on the other hand, is not slow - it's designed specifically for this kind of use case
