← Back to context

Comment by CuriouslyC

2 years ago

Your statement about breaking large queries apart is wrong. You can write queries with CTES to improve readability, and extract CTEs into functions that can be selected as queries get too large and unwieldly. SQL is just as composable as any other language.

The thing that's lacking right now is the tooling for managing/testing/deploying database code. There are solutions out there and the supabase folks have been working to make things better but database first development still has some hurdles in terms of DX.

One issue with functions though is that they can change performance in unpredictable ways. For example a colleague of mine recently altered a function I had written that was used in multiple hot-path queries. The change he made accidentally caused the function to no longer be inline-able on PostgreSQL. Once the function couldn't be inlined then the PostgreSQL planner wasn't able to select the appropriate indices and the performance of several of the queries exploded by about 100x.

So while it's true it can be composed etc the current state of the art planners struggle except under very simple/constrained scenarios.

  • If a function is marked stable it should not impact the query plan at all, since stable functions are essentially in-lined before planning. If logic is unstable a view is probably going to be a better abstraction than a function.

    • Well I thought so but even with the functions marked IMMUTABLE, which is even more stringent than STABLE the in-lining was not successful, this was apparent in the query plan.

      This might be a special case however as the function called another function internally (also IMMUTABLE) which was essentially memoized using an expression index. This is the index that was no-longer hit when inlining failed.

      If you think this is bug I think I can create a minimal reproduction.

      5 replies →

  • I'm not sure how adding another layer of abstraction helps with that.

    The problem you encoutered is that the rewritten function was either no longer table-valued, or else it was no longer deterministic (which is what that big list of rules for inlining really means). But that problem doesn't go away by adding a layer of abstraction. The need to understand relational determinism doesn't disappear. The need to understand SARGability doesn't go away. You can't really abstract the problem away.

Do you have any links to a basic example on using CTEs and functions to keep SQL maintainable?

I've used CTEs, but I had not tried breaking up an SQL query into functions. Didn't know that was possible!

For whatever reason, I feel like I end up with a giant blob of SQL when writing SQL and it's incredibly frustrating.

  • You can just use chatgpt to rewrite sql with ctes, and extract functions. It's quite good at it, particularly gpt4.

    That being said, CTEs are a really good way to write complex queries. They let you tag bits of query with meaningful names, and each thing you tag is accessible to every CTE after it so you can build up an almost imperative data flow by just doing select transforms one after another. That way you're building hard queries from the bottom up rather than the top down.

    • Agreed. It's so nice to be able to query intermediates to see what their output looks like and just check each step of the CTE during debugging. It was such an improvement over most subselects!

> You can write queries with CTES to improve readability, and extract CTEs into functions that can be selected as queries get too large and unwieldly.

Personally I'd go for breaking them into views. IIRC as of around postgres 11-13 they're no longer a barrier for the query planner.

  • You are right, views are a good choice in a lot of instances. Functions give you more behavioral flexibility, easy multiple version support (in postgres) and fewer issues with ddl dependencies, but views are semantically clearer, easier to work with and give good usage flexibility.