Comment by geertj

1 year ago

What’s wrong with CTEs though? I have never thought of them as busywork and start with them if I know my query is going to be a multi-step process. I already thought them as “pipes” (in the abstract, not related to this paper). If the query flow is linear so is the sequence of CTEs.

The only slightly annoying thing with CTEs is that you always have to name them. You might consider this an overhead if the flow is strictly linear. But when the flow is not linear (eg when doing two different aggregations over the same data, to join them later) you need a way to refer to the intended CTE, so always requiring a name does add some consistency and doesn’t seem a big deal overall.

CTEs are not inherently busywork. I rather like them. What is busywork is having to chop a linear flow of operations into chunks and then wrap those chunks in CTEs that you must wire together. All this, simply because the SQL syntax doesn't let you express that flow directly.

> What’s wrong with CTEs though?

Depends on DB engines I suppose. I've come across that certain operations were not allowed in CTEs, and they can be an optimization barrier.

However if your query is dynamically modified at runtime, then CTEs can be a no-go. For example, we have a grid component which first does a count and then only selects the visible rows. This is great if you have expensive subselects as columns in a large table. However to do the counting it turns the main query into a sub-query, and it doesn't handle CTEs.

  • Understood. I should have asked my question a bit more specifically: what's wrong with CTEs that wouldn't be an issue with this new pipe syntax. I briefly scanned the paper and it appears there aren't any specific benefits to the pipe syntax that would make optimization easier. So we can expect that if a SQL engine doesn't optimize CTEs well it would likely have the same limitations for the pipe syntax.

    Section 2.1.4 the paper lists the benefits of the pipe syntax over CTEs, and they are all based on ergonomics. As someone who has never had issues with the ergonomics of CTEs I must say I am not convinced that proposed syntax is better. It may be that I've been doing SQL for so long that I don't see its warts. Overall SQL feels like a very well designed and consistent language to me. The new pipe syntax appears to bolt on an imperative construct to an otherwise purely functional language.

    • > The new pipe syntax appears to bolt on an imperative construct to an otherwise purely functional language.

      It's not imperative. The pipe symbol is a relational operator that takes one table as input and produces one as output. It's still purely functional, but it has the advantage of making the execution order obvious. That is, the order is a linear top-down flow, not the inside-out flow implicit in vanilla SQL. Further, when your wanted flow doesn't match vanilla SQL's implicit ordering, you don't have to invent CTEs to wire up your flow. You just express it directly.

      As for ergonomics, consider a simple task: Report some statistics over the top 100 items in a table. Since LIMIT/ORDER processing is last in vanilla SQL's implied ordering, you can't directly compute the stats over the top items. You must create a CTE to hold the top items and then wire it into a second SELECT statement to compute the stats. That's busywork. With pipe syntax, there's no need to invent that intermediate CTE.

      4 replies →

> What’s wrong with CTEs though?

At least in SQL Server CTE's are syntax level, so multiple uses of a CTE in a query causes it to get expanded in each of those places, which typically increases the complexity of the query and can cause issues with the optimizer and performance.