← Back to context

Comment by WilcoKruijer

1 day ago

I feel pipelines (or batches) are slept upon. So many applications use interactive transactions to ‘batch’ multiple queries, waiting for the result of each individual query. Network roundtrip is the biggest contributor to latency in most applications, and this makes it so much worse. Most Postgres drivers don’t even support batching, at least in the JavaScript world.

In many cases it would be good to forego interactive transactions and instead execute all read-only queries at once, and another write batch after doing processing on the obtained data. That way, the amount of roundtrips is bounded. There are some complications of course, like dealing with concurrency becomes more complicated. I’m currently prototyping a library exploring these ideas.

Batching in general is slept upon. So many queue systems support batch injection, and I have seen countless cases where a poorly performing system is “fixed” simply by moving away from incremental injection. This stuff is usually on page two of the docs, which explains why it’s so overlooked…

  • My guess is that this is because our default way of expressing code execution is the procedure call, meaning the default unit of code that we can later is the procedure, which needs to execute synchronously. That's what our programming languages support directly, and that's just how "things are done".

    Everything else both feels weird and also truly is awkward to express because our programming languages don't really allow us to express it well. And usually by the time we figure out that we need a more reified, batch-oriented mechanism. (the one on page 2) it is too late, the procedural assumptions have been deeply baked into the code we've written so far.

    See Can programmers escape the gentle tyranny of call/return? by yours truly.

    https://news.ycombinator.com/item?id=45367519

    • This analysis makes sense to me, but at the same time: we’re already switching between procedural and declarative when switching from [mainstream language] to SQL. This impedance mismatch (or awkwardness) is already there, might as well embrace it.

      3 replies →

I would expect most drivers to support (anonymous) stored procedures so you can batch/pipeline multiple queries into one statement to be executed by the database. Probably more a problem of developers not knowing how to use databases properly, not so much a limitation of technology.

  • People don't do that because when you're writing insert/update queries, you tend to want to write logic based on the value of intermediate results, and also you can't return tabular data from a DO block (they operate as a function returning void).

    You also can't use parameterized values like $1, $2.

    It seems more niche than you're suggesting. Though I wish people would write app layer pseudocode to demonstrate what they are referring to.

I have started to use batching with the Go pgx driver for simple transactions of multiple inserts. Since a batch is automatically a transaction, it’s actually fewer lines of code.

Most of my big clients have about 10 intermediaries between them and the data: the antivirus, the browser, the VPN, the company proxy, the API gateway, their authentication layer, the virtualization layer, the application server, the microservice it requests and whatever data source this one requests.

So unless you are a lean startup, the reasons many products are horribly slow are very low hanging fruits no body are ever going to bother picking.

If you ever reach the time where pipelining is giving you a boost in perf, your app was already in a nice state.

It's so nice to be able to code on a baremetal server where my monolith has directly access to my postgres instance on my personal projects.