Comment by singron

1 year ago

In the query plan, filtering before or after an aggregation is the same, so it's a strange quirk that SQL requires a different word.

10 comments

singron

andy800 1 year ago

I was not there at the original design decisions of the language, but I imagine it was there specifically to help the person writing/editing the query easily recognize and interpret filtering before or after an aggregation. The explicitness makes debugging a query much easier and ensures it fails earlier. I don't see much reason to stop distinguishing one use case from the other, I'm not sure how that helps anything.

singron 1 year ago
I also wasn't there, but I think this actually wasn't to help authors and instead was a workaround for the warts of SQL. It's a pain to write
SELECT * FROM (SELECT * FROM ... GROUP BY ...) t WHERE ...
and they decided this was common enough that they would introduce a HAVING clause for this case
SELECT * FROM ... GROUP BY ... HAVING ...
But the real issue is that in order to make operations in certain orders, SQL requires you to use subselects, which require restating a projection for no reason and a lot of syntactical ceremony. E.g. you must give the FROM item a name (t), but it's not required for disambiguation.
Another common case is projecting before the filter. E.g. you want to reuse a complicated expression in the SELECT and WHERE clauses. Standard SQL requires you to repeat it or use a subselect since the WHERE clause is evaluated first.
0cf8612b2e1e 1 year ago

I think this stems from the non-linear approach to reading a SQL statement. If it were top-to-bottom linear, like PRQL, then the distinction does not seem merited. It would then always be filtering from what you have collected up to this line.
yen223 1 year ago

I think the original sin here is not making aggregation an explicitly separate thing, even though it should be. Adding a count(*) fundamentally changes what the query does, and what it returns, and what restrictions apply.

0cf8612b2e1e 1 year ago

Indeed. Just as I think git’s N different ways to refer to the same operation was a blunder.

andy800 1 year ago
But pre- and post- aggregation filtering is not really "the same" operation.
- 0cf8612b2e1e 1 year ago
  
  If I use a CTE and filter the aggregate, feels the same to me.
  
  3 replies →