Comment by tmoertel

1 year ago

No, the wanted stats are a function of the top 100 items.

As a concrete example, consider computing the average sales volume by category for the top 100 items. Here's the vanilla SQL for it:

    WITH
      TopItems AS (
        SELECT category, sales_volume
        FROM Items
        ORDER BY sales_volume DESC
        LIMIT 100
      )
    SELECT category, AVG(sales_volume) AS avg_sales_volume
    FROM TopItems
    GROUP BY category;

Because ORDER/LIMIT processing is implicitly last in vanilla SQL, if you need to do anything after that processing, you must do it in a new SELECT statement. Thus you must capture the ORDER/LIMIT results (e.g., as a CTE or, heaven forbid, as a nested SELECT statement) and then wire those results into that new SELECT statement via its FROM clause.

In contrast, with SQL pipes you can express any ordering you want, so you can feed the ORDER/LIMIT results directly into the statistical computations:

    FROM Items
    |> ORDER BY sales_volume DESC
    |> LIMIT 100
    |> AGGREGATE AVG(sales_volume) AS avg_sales_volume
       GROUP BY category

That's way simpler and the data flows just as it reads: from top to bottom.

1 comment

tmoertel

geertj 1 year ago

Okay, thanks for that example. The ability of the pipe syntax to re-order the standard SQL pipeline order does indeed provide for more compact queries in this case.