Comment by vbezhenar
17 days ago
In a properly optimized database, the vast majority of queries will hit indices and most data will be in the memory cache, so most transactions will be CPU- or RAM-bound. Increasing the number of concurrent transactions will therefore reduce throughput. There will be a few transactions waiting for I/O, but if the majority of transactions are waiting for I/O, it's either a horrifically inefficient database or a very non-standard usage pattern.
Your arguments make sense for concurrent queries (though high-latency storage like S3 is becoming increasingly popular, especially for analytic loads).
But transactions aren't processing queries all the time. Often the application will do processing between sending queries to the database. During that time a transaction is open but isn't doing any work on the database server.
That is bad application architecture. Database work should be concentrated in minimal transactional units, and the connection should be released between these units. All data should be prepared before a unit starts, and additional processing should take place after the transaction has ended. Long transactions will cause locks, even deadlocks, and should generally be avoided. That's my experience, at least. Sometimes a business transaction should be split into several database transactions.
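A minimal sketch of that pattern, using Python's stdlib sqlite3 purely for illustration (the table and values are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, total REAL)")

# 1. Prepare all data BEFORE the transactional unit starts.
rows = [(1, 10.0), (2, 25.5)]

# 2. Keep the transaction itself minimal: just the writes.
with conn:  # opens a transaction; commits on success, rolls back on error
    conn.executemany("INSERT INTO orders (id, total) VALUES (?, ?)", rows)

# 3. Additional processing happens AFTER the transaction has ended,
#    so no locks are held while the application computes.
grand_total = sum(total for _, total in rows)
```

The point is only where the transaction boundaries sit: nothing slow happens between BEGIN and COMMIT.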
I'm talking about relatively short-running transactions (one call to the HTTP API) which load data, process it in the application, and commit the result to the database. Even for those, a significant portion of the time can be spent in the application, or in communication latency between the database and the application. Splitting those into two shorter transactions might sometimes improve performance a bit, but it usually isn't worth the complexity (especially since that means the database no longer ensures that the business transaction is serializable as a whole).
For long-running operations, I usually create a long-running read-only transaction/query with snapshot consistency (so the database doesn't need to track what it reads), combined with one or more short writing transactions.
Your database usage should not involve application-focused locks; MVCC will restart your transaction if needed to resolve concurrency conflicts.
If you aren't hitting I/O (I don't mean HDDs) on a large fraction of queries, then either you skipped putting a cache in front of the DB, or your data is very small, or you spent too much on RAM and too little on NVMe, to the point that your NVMe isn't even a bottleneck.