Comment by andersmurphy
5 hours ago
> for inserts only into single table with
Actually, there are no inserts in this example: each transaction is 2 updates wrapped in a logical transaction that can be rolled back (savepoint). So in raw terms you are talking about 200k updates per second and 600k reads per second (there's a 75%/25% read/write mix in that example). Also worth keeping in mind that updates are slower than inserts.
> no indexes.
The tables have an index on the primary key, with a billion rows. More indexes would add write amplification, which would affect both databases negatively (likely PG more).
> Also, I didn't get why sqlite was allowed to do batching and pgsql was not.
Interactive transactions [1] are very hard to batch over a network. To get the same effect you'd have to limit PG to a single connection (defeating the point of MVCC).
- [1] An interactive transaction is a transaction where you intermingle database queries and application logic (running in the application).
Thank you for the clarification, I was wrong in my previous comment.
> - [1] An interactive transaction is a transaction where you intermingle database queries and application logic (running in the application).
Could you give a specific example of why you think SQLite can do batching and PG cannot?
Not the person you are responding to, but SQLite is single-threaded for writes (even multi-process, you get one write transaction at a time).
So, if you have a network server that does BEGIN TRANSACTION, (processes 1000 requests), COMMIT, (sends 1000 acks to clients), then with SQLite your rollback rate from conflicts will be zero.
For PG with multiple clients, it’ll tend to 100% rollbacks if the transactions can conflict at all.
You could configure PG to only allow one network connection at a time, and get a similar effect, but then you’re paying for MVCC, and a bunch of other stuff that you don’t need.
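The single-writer batching described above can be sketched with Python's stdlib sqlite3: one physical write transaction per batch, with a SAVEPOINT per request so an individual request can still be rolled back logically. The table, the overdraft rule, and all names here are hypothetical, just to make the shape concrete.

```python
import sqlite3

# Sketch (hypothetical schema): one physical write transaction per batch,
# one SAVEPOINT per request so a single request can be rolled back logically.
conn = sqlite3.connect(":memory:", isolation_level=None)  # manual tx control
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 100)")

def apply_batch(conn, requests):
    conn.execute("BEGIN")  # single write transaction for the whole batch
    for i, (acct, delta) in enumerate(requests):
        conn.execute(f"SAVEPOINT req_{i}")
        try:
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?",
                (delta, acct))
            # application logic: reject overdrafts by undoing this request only
            (bal,) = conn.execute(
                "SELECT balance FROM accounts WHERE id = ?", (acct,)).fetchone()
            if bal < 0:
                raise ValueError("overdraft")
            conn.execute(f"RELEASE req_{i}")
        except ValueError:
            conn.execute(f"ROLLBACK TO req_{i}")  # undo just this request
            conn.execute(f"RELEASE req_{i}")
    conn.execute("COMMIT")  # one commit/ack for the whole batch

apply_batch(conn, [(1, -30), (2, -150), (1, -30)])
print(conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall())
# → [(40,), (100,)]  (account 2's overdraft was rolled back via its savepoint)
```

Because SQLite has exactly one writer, none of these savepoints can conflict with another connection, which is what keeps the rollback rate at zero.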
In your example, clients can't have their own transactions? You commit/rollback all 1000 clients' requests together?
An interactive transaction works like this in pseudo code:

    beginTx
      result = query(...)
      ...application logic running against result...
      query(...)
    endTx
How would you batch this in postgres and get any value? You could nest them all in a single transaction, but because they are interactive transactions that doesn't reduce your number of network hops.
The only thing you can batch in postgres to avoid network hops is bulk inserts/updates.
But the minute you have interactive transactions, you cannot batch and gain anything when there is a network.
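The hop-count argument can be made concrete with a toy stand-in for a driver connection (the `Conn` class and SQL below are illustrative, not a real client library): wrapping N interactive units in one outer transaction still costs O(N) round trips, while a single bulk statement costs one.

```python
class Conn:
    """Hypothetical stand-in for a network connection that counts hops."""
    def __init__(self):
        self.round_trips = 0
    def execute(self, sql, params=None):
        self.round_trips += 1  # each statement sent over the wire is one hop

def interactive(conn, jobs):
    # One outer transaction wrapping N interactive units: each unit still
    # issues its own queries, so hops grow linearly with N.
    conn.execute("BEGIN")
    for job in jobs:
        conn.execute("SELECT balance FROM accounts WHERE id = %s", (job,))
        # ...application logic here decides the new value...
        conn.execute("UPDATE accounts SET balance = balance - 1 WHERE id = %s",
                     (job,))
    conn.execute("COMMIT")
    return conn.round_trips

def bulk(conn, jobs):
    # A single multi-row UPDATE: one hop regardless of N.
    conn.execute("UPDATE accounts SET balance = balance - 1 WHERE id = ANY(%s)",
                 (list(jobs),))
    return conn.round_trips

print(interactive(Conn(), range(1000)))  # → 2002 (BEGIN + 2*1000 + COMMIT)
print(bulk(Conn(), range(1000)))         # → 1
```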
Your best bet is to not have an interactive transaction and port all of that application code to a stored procedure.
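One way the ported version could look, as a hedged sketch: the read-decide-write logic moves server-side into a PL/pgSQL function, so the whole logical transaction becomes one network hop. The table, function name, and overdraft rule are hypothetical; the DDL is held as a string here since running it requires a live Postgres.

```python
# Hypothetical example: the interactive logic (read balance, decide, update)
# now runs inside the database, so the client pays one round trip total.
TRANSFER_FN = """
CREATE OR REPLACE FUNCTION debit(acct bigint, amount bigint)
RETURNS boolean LANGUAGE plpgsql AS $$
DECLARE bal bigint;
BEGIN
    SELECT balance INTO bal FROM accounts WHERE id = acct FOR UPDATE;
    IF bal - amount < 0 THEN
        RETURN false;  -- the "application logic" now runs in-database
    END IF;
    UPDATE accounts SET balance = bal - amount WHERE id = acct;
    RETURN true;
END $$;
"""

# Client side then becomes a single statement per logical transaction,
# e.g. with psycopg:  cur.execute("SELECT debit(%s, %s)", (1, 30))
print("FOR UPDATE" in TRANSFER_FN)  # → True (row lock replaces app-side logic)
```

The trade-off is the one the comment names: you give up writing that logic in your application language.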
> How would you batch this in postgres and get any value? You could nest them all in a single transaction, but because they are interactive transactions that doesn't reduce your number of network hops.
You can write it as a stored procedure in your favorite language, or use a Unix domain socket, where communication happens through local kernel buffers with no network involved.
In your post, I think the big performance hit for Postgres potentially comes from the focus on update-only statements: in SQLite updates likely happen in place, while Postgres creates a separate record on disk for each updated row (MVCC), or maybe some other internal stuff is going on.
Your benchmark is very simplistic; it is hard to tell what the behavior of SQLite would be if you switched to inserts, for example, or had many writers competing for the same record, or longer transactions. The industry has built various benchmarks for this, TPC for example.
Also, if you want readers to understand your posts better, consider using a less exotic language in the future. It's hard to tell what is batched there and how.