← Back to context

Comment by magicalhippo

1 year ago

> What’s wrong with CTEs though?

Depends on DB engines I suppose. I've come across that certain operations were not allowed in CTEs, and they can be an optimization barrier.

However if your query is dynamically modified at runtime, then CTEs can be a no-go. For example, we have a grid component which first does a count and then only selects the visible rows. This is great if you have expensive subselects as columns in a large table. However to do the counting it turns the main query into a sub-query, and it doesn't handle CTEs.

Understood. I should have asked my question a bit more specifically: what's wrong with CTEs that wouldn't be an issue with this new pipe syntax. I briefly scanned the paper and it appears there aren't any specific benefits to the pipe syntax that would make optimization easier. So we can expect that if a SQL engine doesn't optimize CTEs well it would likely have the same limitations for the pipe syntax.

Section 2.1.4 the paper lists the benefits of the pipe syntax over CTEs, and they are all based on ergonomics. As someone who has never had issues with the ergonomics of CTEs I must say I am not convinced that proposed syntax is better. It may be that I've been doing SQL for so long that I don't see its warts. Overall SQL feels like a very well designed and consistent language to me. The new pipe syntax appears to bolt on an imperative construct to an otherwise purely functional language.

  • > The new pipe syntax appears to bolt on an imperative construct to an otherwise purely functional language.

    It's not imperative. The pipe symbol is a relational operator that takes one table as input and produces one as output. It's still purely functional, but it has the advantage of making the execution order obvious. That is, the order is a linear top-down flow, not the inside-out flow implicit in vanilla SQL. Further, when your wanted flow doesn't match vanilla SQL's implicit ordering, you don't have to invent CTEs to wire up your flow. You just express it directly.

    As for ergonomics, consider a simple task: Report some statistics over the top 100 items in a table. Since LIMIT/ORDER processing is last in vanilla SQL's implied ordering, you can't directly compute the stats over the top items. You must create a CTE to hold the top items and then wire it into a second SELECT statement to compute the stats. That's busywork. With pipe syntax, there's no need to invent that intermediate CTE.

    • > It's not imperative. The pipe symbol is a relational operator that takes one table as input and produces one as output.

      Maybe I used the wrong term. In my mental model, the query planner decides the order in which the query is evaluated based on what table stats predict is most efficient query plan, and I actually don't really want to think about the order too much. For example, if I create a CTE, I don't necessarily want it to be executed in that order. Maybe a condition on the later query can be pushed back into the earlier CTE so that less data can be scanned.

      I will admit that technically there should be no difference in how a query planner handles either. But to me the pipe syntax does not hint as much at these non-linear optimizations than CTEs do. I called the CTE syntax more functional as it implies less to me.

      > but it has the advantage of making the execution order obvious.

      So we're back to ergonomics which I just never had an issue with...

      > As for ergonomics, consider a simple task: Report some statistics over the top 100 items in a table. Since LIMIT/ORDER processing is last in vanilla SQL's implied ordering, you can't directly compute the stats over the top items.

      Could I not compute the stats over all values, then order and limit them, and depend on the query planner to not do the stat calculation for items outside the limit? If the order/limit does not depend on a computed statistic that should be possible? Or does that not happen in practice?

      2 replies →