Comment by branko_d
1 day ago
I never quite understood these kinds of arguments, even though they seem to be repeated a lot:
> 1. Many critical API calls would spend most of their CPU time inside the finite database servers. New APIs, or updates to existing APIs, meant you were hogging DB/CPU time from everyone else.
It's not that easy for a stored procedure to spend CPU on anything other than data processing. If you are going to do that data processing, it's going to cost the same amount of CPU regardless of whether it is initiated internally (through a stored procedure) or externally (by the client submitting queries). Furthermore, a frequently-used stored procedure's plan is likely already cached, so it doesn't have to be re-parsed and re-planned on each call. On the other hand, when the client submits the text of a query, that text has to at least be hashed so it can be compared against the hashes in the query plan cache (even if the plan is already in the cache). So if anything, client-submitted queries will consume at least as much CPU as stored procedures, if not more.
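As a rough illustration (dbo.GetOrders is a hypothetical procedure name), you can see this in SQL Server's plan cache: the proc shows up as a single reusable 'Proc' entry, while every distinct ad-hoc text must first be hashed to look for a matching 'Adhoc' entry:

```sql
-- Hypothetical example: compare cache entries for a proc vs. ad-hoc text.
EXEC dbo.GetOrders @CustomerId = 42;

SELECT cp.objtype,    -- 'Proc' for the stored procedure, 'Adhoc'/'Prepared' for client text
       cp.usecounts,  -- how many times the cached plan was reused
       st.text
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE '%GetOrders%';
```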
> 2. API latency was at the mercy of the Query Planner. Any slight change in data statistics or business logic would cause a different Query Plan to be picked, which drastically changed the execution time, and usually caused timeouts. A hash join suddenly became a full table scan across billions of rows without any warning.
This is also true for client-submitted queries. Basically the same techniques are useful for both stored procedures and client-submitted queries: query hints and plan guides, and above all, sound indexing. That being said, this is a big problem in practice and, I think, a fundamental problem in the design of SQL as a "declarative" language, where you are not supposed to concern yourself with the physical access path even though it can make a big difference in performance. Fortunately, SQL Server has mitigations for this problem (hints and plan guides), but not all DBMSes do (PostgreSQL still doesn't support hints natively, if I'm not mistaken).
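A minimal sketch of both mitigations in SQL Server (table names are made up): an inline hint pins the join strategy, and sp_create_plan_guide attaches the same hint to a statement submitted by an application you can't modify:

```sql
-- Inline hint: force a hash join regardless of what the planner would pick.
SELECT o.OrderId, c.Name
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId
OPTION (HASH JOIN);

-- Plan guide: attach the same hint to a statement you cannot change.
-- @stmt must match the submitted query text exactly.
EXEC sp_create_plan_guide
    @name = N'ForceHashJoinGuide',
    @stmt = N'SELECT o.OrderId, c.Name
FROM dbo.Orders AS o
JOIN dbo.Customers AS c ON c.CustomerId = o.CustomerId',
    @type = N'SQL',
    @module_or_batch = NULL,
    @params = NULL,
    @hints = N'OPTION (HASH JOIN)';
```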
1. Beyond just querying, the stored proc spent a lot of time processing data. As in, looping through cursors, making business logic decisions, calculating things, etc. (see the sketch after this list).
2. Having the business logic (not just the data loading) inside the stored procs meant that a change in business logic which would normally require only an application-code update now invalidates the stored procedure's cached execution plan.
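For what it's worth, here's a minimal (entirely hypothetical) sketch of the row-by-row pattern described in point 1:

```sql
-- Hypothetical sketch: per-row business logic running on the DB server's CPU.
DECLARE @OrderId INT, @Amount DECIMAL(18, 2);

DECLARE order_cursor CURSOR FAST_FORWARD FOR
    SELECT OrderId, Amount FROM dbo.Orders WHERE Status = 'Pending';

OPEN order_cursor;
FETCH NEXT FROM order_cursor INTO @OrderId, @Amount;

WHILE @@FETCH_STATUS = 0
BEGIN
    -- Business decision computed one row at a time.
    IF @Amount > 1000.00
        UPDATE dbo.Orders SET Discount = @Amount * 0.05 WHERE OrderId = @OrderId;
    ELSE
        UPDATE dbo.Orders SET Discount = 0 WHERE OrderId = @OrderId;

    FETCH NEXT FROM order_cursor INTO @OrderId, @Amount;
END

CLOSE order_cursor;
DEALLOCATE order_cursor;
```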
If you're going to process a lot of data, doing it in the database is usually faster, since you avoid moving all the data into a separate server process and then moving it back again. For many queries, the round trip from the application server to the database server takes longer than the query itself.
> looping through cursors, making business logic decisions, calculating things, etc.
Interesting. Can you share more details about the "non-data" processing that was done? Were they doing heavy mathematical calculations and such?
> change in business logic that would normally only need to update application code, now invalidates the stored procedure's cached execution plan
As for plan cache invalidation: the most extreme case I saw was on the order of 5-10 seconds. Basically, it depends on the size of that one stored procedure, not on all the other stored procedures that may call it or be called by it. What was the actual time that they got?
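In case it helps, a quick (hypothetical) way to see the parse/compile cost yourself in SQL Server:

```sql
-- Report parse/compile and execution time for the following statements.
SET STATISTICS TIME ON;

-- WITH RECOMPILE forces a fresh plan, mimicking what happens after invalidation.
-- dbo.GetOrders is the same hypothetical procedure as in the examples above.
EXEC dbo.GetOrders @CustomerId = 42 WITH RECOMPILE;

SET STATISTICS TIME OFF;
```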