Comment by magicalhippo
4 years ago
Now this is actually nice, unlike the other suggestion posted today[1].
Maybe I'm just too used to non-standard extensions of our database but the SQL example could, at least for our db, be rewritten as
SELECT TOP 20
title,
country,
AVG(salary) AS average_salary,
SUM(salary) AS sum_salary,
AVG(gross_salary) AS average_gross_salary,
SUM(gross_salary) AS sum_gross_salary,
AVG(gross_cost) AS average_gross_cost,
SUM(gross_cost) AS sum_gross_cost,
COUNT(*) as count
FROM (
SELECT
title,
country,
salary,
(salary + payroll_tax) AS gross_salary,
(salary + payroll_tax + healthcare_cost) AS gross_cost
FROM employees
WHERE country = 'USA'
) emp
WHERE gross_cost > 0
GROUP BY title, country
ORDER BY sum_gross_cost
HAVING count > 200
This cuts down the repetition a lot, and can also help the optimizer in certain cases. Could do another nesting to get rid of the HAVING if needed.
Still, think the PRQL looks very nice, especially with a "let" keyword as mentioned in another thread here.
With a CTE it would read a bit more like prql:
Readability is pretty similar to prql. It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
My brain just thinks in CTEs over sub queries. I really dislike that my co-workers use these ridiculously nested sub sub sub queries.
I just look at something like this and I immediately know what's going on. If it's nested sub queries it always takes me much longer.
To me those nested sub sub sub SQL queries come from a similar place as beginner coders who tend to make nested IF statements - a lack of experience with the language.
For very complicated stuff, SQL does become very hard to read compared to e.g. tidyverse + targets in R.
In some cases for removing repeating (intermediate) calculations, I generally find it easier to use a lateral join (in postgres), like
So now we have easily come up with three different ways of rewriting the query to avoid that duplication (which obviously was not a problem at all to begin with): subquery, CTE and lateral join. And there are also several more well known ways (views, custom functions, computed columns etc) so the whole premise now for even inventing a "better" language than SQL is then false? Or what am I missing.
It's also weird how people always argue for immutability and eliminating local state, when using procedural languages, but as soon as they switch to SQL, that actually works like this, they immediately want to introduce mutability and local state.
3 replies →
Column aliases would have saved me hundreds of hours over the course of my career. Sorely missing from standard SQL, and would make the need for PRQL less acute.
Snowflake lets you refer to column aliases, and it's great!
There's the slight issue of shadowing of table column names, which they resolve by preferring columns to aliases if both are named the same. So sometimes my aliases end up prefixed with underscores, but that's not a big deal.
The trade-off is that a schema change (adding a column) unrelated to your query can modify its behavior.
Favoring aliases over columns instead has the potential to introduce irresolvable ambiguities as you can’t “qualify” a column alias with a SELECT list or subquery ID the way you can qualify a column by its table/view alias.
> With a CTE
The DB we use supports those, I just learned about them too late so keep forgetting they exist :(
> It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
The DB we use supports that, so in your CTE you could write
We do that all the time, which will be a pain now that we're migrating to a different DB server which doesn't.
Not all database systems can optimize queries well over CTE boundaries. I believe this is still true for PostgreSQL (no longer true, see below -- it was true a few years ago). So there's a potential performance hit for (the otherwise excellent advice of) writing with CTE's.
IRC tells me this has been fixed now.
1 reply →
Sybase IQ allows you to use the column alias anywhere else in the query.
what expressions are being repeated here?
> (salary + payroll_tax) AS gross_salary,
> (salary + payroll_tax + healthcare_cost) AS gross_cost
> AND (salary + payroll_tax + healthcare_cost) > 0
And his is a simple example.
In Microsoft SQL cross apply can be used for this in even more situations and with less repetition: