
Comment by kstrauser

1 day ago

I’m not sure that’s it either. PostgreSQL has a feature, synchronized sequential scans, where multiple readers can share a single sequential table scan.

Suppose client A runs “select * from foo” against a table with a thousand rows. PostgreSQL starts streaming results to A from row 1. Now suppose the scan is at row 500 when client B issues the same query. Instead of starting over for B, PostgreSQL begins streaming results to B at row 501, and from then on each row it reads goes to both clients.

When the scan finishes row 1000, client A’s query is complete. The scan then wraps back to row 1 for B and continues through row 500.

Hypothetically, you can serve N clients with a total of two table scans’ worth of I/O, as long as they all arrive before the first client’s scan finishes.
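A minimal sketch of two overlapping scans with psycopg 3 (the connection string and table foo are made up; the sharing behavior depends on the synchronize_seqscans setting, which is on by default):

    import threading
    import psycopg

    DSN = "dbname=test"  # hypothetical connection string

    def scan(name: str) -> None:
        with psycopg.connect(DSN) as conn:
            # a named (server-side) cursor streams rows instead of
            # fetching the whole result set at once
            with conn.cursor(name=name) as cur:
                cur.execute("SELECT * FROM foo")
                for _ in cur:
                    pass  # consume the stream row by row

    a = threading.Thread(target=scan, args=("scan_a",))
    b = threading.Thread(target=scan, args=("scan_b",))
    a.start()
    b.start()  # B starts while A is mid-table and may join A's scan in progress
    a.join()
    b.join()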

So that’s the kind of magic where I think this is going to shine. Queue up a few queries and it’s likely that several will be able to share the same underlying work.

That isn’t what pipelining is about in general, nor is it relevant to this benchmark, which is an insertion workload. The performance benefit observed is simply the ability to start executing the second request before the first one has been fully acknowledged.
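A minimal sketch of that mechanism using psycopg 3, which exposes libpq’s pipeline mode (the connection string and table are made up; pipeline mode requires libpq 14 or newer):

    import psycopg

    with psycopg.connect("dbname=test") as conn:
        # inside the pipeline block, each statement is sent without
        # waiting for the previous statement's result to come back
        with conn.pipeline():
            with conn.cursor() as cur:
                for i in range(1000):
                    cur.execute("INSERT INTO foo (n) VALUES (%s)", (i,))
        # exiting the block issues a sync and raises if any statement failed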

It’s also not true pipelining, since you can’t send a follow-up request that depends on the results of a previous, still-incomplete request (see Cap’n Proto’s promise pipelining, for example). As such, the benefit in practice is more limited than it sounds, especially compared to connection pooling, where you send the requests over different connections in the first place. I’d expect very similar performance numbers for this benchmark with a pool, assuming enough connections are open in parallel to keep the DB busy.
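For comparison, the pooling alternative looks roughly like this (a sketch using psycopg_pool; the pool size, connection string, and table are made up):

    from concurrent.futures import ThreadPoolExecutor
    from psycopg_pool import ConnectionPool

    # eight connections working in parallel instead of one pipelined connection
    pool = ConnectionPool("dbname=test", min_size=8, max_size=8)

    def insert(i: int) -> None:
        with pool.connection() as conn:  # borrow a connection, return it on exit
            conn.execute("INSERT INTO foo (n) VALUES (%s)", (i,))

    with ThreadPoolExecutor(max_workers=8) as ex:
        list(ex.map(insert, range(1000)))

    pool.close()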