Comment by ako
4 years ago
With a CTE it would read a bit more like PRQL:
WITH usa_employees AS (
    SELECT
        title,
        country,
        salary,
        (salary + payroll_tax) AS gross_salary,
        (salary + payroll_tax + healthcare_cost) AS gross_cost
    FROM employees
    WHERE country = 'USA'
        AND (salary + payroll_tax + healthcare_cost) > 0
)
SELECT
    title,
    country,
    AVG(salary) AS average_salary,
    SUM(salary) AS sum_salary,
    AVG(gross_salary) AS average_gross_salary,
    SUM(gross_salary) AS sum_gross_salary,
    AVG(gross_cost) AS average_gross_cost,
    SUM(gross_cost) AS sum_gross_cost,
    COUNT(*) AS emp_count
FROM usa_employees
GROUP BY title, country
HAVING COUNT(*) > 200
ORDER BY sum_gross_cost
LIMIT 3
Readability is pretty similar to PRQL. It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
My brain just thinks in CTEs over subqueries. I really dislike that my co-workers use these ridiculously nested sub-sub-subqueries.
I just look at something like this and I immediately know what's going on. If it's nested sub queries it always takes me much longer.
To me, those nested sub-sub-subqueries come from a similar place as beginner coders who tend to write nested IF statements: a lack of experience with the language.
For very complicated stuff, SQL does become very hard to read compared to e.g. tidyverse + targets in R.
In some cases, to remove repeated (intermediate) calculations, I generally find it easier to use a lateral join (in Postgres).
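A minimal sketch of what the lateral-join version might look like in PostgreSQL, assuming the same `employees` columns as the CTE example at the top of the thread. This is an illustration, not the commenter's original query, and the aliases `e`, `g` and `c` are made up for the example:

```sql
-- Sketch only: each LATERAL subquery names one intermediate expression once,
-- so later clauses can reuse it instead of repeating the arithmetic.
SELECT
    e.title,
    e.country,
    AVG(e.salary)       AS average_salary,
    SUM(e.salary)       AS sum_salary,
    AVG(g.gross_salary) AS average_gross_salary,
    SUM(g.gross_salary) AS sum_gross_salary,
    AVG(c.gross_cost)   AS average_gross_cost,
    SUM(c.gross_cost)   AS sum_gross_cost,
    COUNT(*)            AS emp_count
FROM employees AS e
-- gross_salary is computed per row from the outer table e
CROSS JOIN LATERAL (SELECT e.salary + e.payroll_tax AS gross_salary) AS g
-- gross_cost builds on the previous lateral result
CROSS JOIN LATERAL (SELECT g.gross_salary + e.healthcare_cost AS gross_cost) AS c
WHERE e.country = 'USA'
  AND c.gross_cost > 0
GROUP BY e.title, e.country
HAVING COUNT(*) > 200
ORDER BY sum_gross_cost
LIMIT 3;
```

Each lateral subquery returns exactly one row per employee, so the joins do not change the row count; they only give the expressions a name.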
So now we have easily come up with three different ways of rewriting the query to avoid that duplication (which obviously was not a problem at all to begin with): subquery, CTE and lateral join. And there are also several more well-known ways (views, custom functions, computed columns, etc.) so the whole premise now for even inventing a "better" language than SQL is then false? Or what am I missing.
It's also weird how people always argue for immutability and eliminating local state when using procedural languages, but as soon as they switch to SQL, which actually works like this, they immediately want to introduce mutability and local state.
> so the whole premise now for even inventing a “better” language than SQL is then false?
I don't think anyone is using the above examples to try to invalidate PRQL, just suggesting the baseline for comparisons should account for all constructs available in the SQL standards and common implementations thereof.
> Or what am I missing.
The statement “I can do X better than <SQL example> with <something else>” does not properly show the benefit of <something else> if “I can do X better than <SQL example> with <another SQL example>” is also true (assuming <another SQL example> is actually agreed to be better, not for instance convoluted/confusing/long-winded/other so just replacing some problems with others).
If there are multiple ways to do the same thing, that's usually a BAD thing in terms of language design. Especially if some approaches are just newbie traps that experts learn to avoid, or if deciding the best method is a really subtle context-dependent decision. The ideal design is that the language encourages the one obviously "good" way to do it.
Column aliases would have saved me hundreds of hours over the course of my career. Sorely missing from standard SQL, and would make the need for PRQL less acute.
Snowflake lets you refer to column aliases, and it's great!
There's the slight issue of shadowing of table column names, which they resolve by preferring columns to aliases if both are named the same. So sometimes my aliases end up prefixed with underscores, but that's not a big deal.
The trade-off is that a schema change (adding a column) unrelated to your query can modify its behavior.
Favoring aliases over columns instead has the potential to introduce irresolvable ambiguities as you can’t “qualify” a column alias with a SELECT list or subquery ID the way you can qualify a column by its table/view alias.
> With a CTE
The DB we use supports those, I just learned about them too late so keep forgetting they exist :(
> It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
The DB we use supports that, so your CTE could refer to the aliases directly instead of repeating the expressions.
We do that all the time, which will be a pain now that we're migrating to a different DB server which doesn't.
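A sketch of what alias reuse could look like in a dialect that permits it (Snowflake is mentioned elsewhere in the thread). The commenter's database is not named, so treat the exact rules as an assumption; this is not standard SQL, and PostgreSQL, for one, rejects it:

```sql
-- Sketch only: relies on a dialect extension that lets later expressions
-- and the WHERE clause refer to aliases defined in the same SELECT list.
WITH usa_employees AS (
    SELECT
        title,
        country,
        salary,
        salary + payroll_tax           AS gross_salary,
        gross_salary + healthcare_cost AS gross_cost  -- reuses the alias above
    FROM employees
    WHERE country = 'USA'
      AND gross_cost > 0                              -- alias reused in WHERE
)
SELECT *
FROM usa_employees;
```

Note the shadowing caveat raised above: if `employees` ever gains a real column named `gross_salary`, a dialect that prefers columns over aliases would silently pick the column.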
Not all database systems can optimize queries well across CTE boundaries. I believe this is still true for PostgreSQL (no longer true, see below -- it was true a few years ago). So there's a potential performance hit for (the otherwise excellent advice of) writing with CTEs.
IRC tells me this has been fixed now.
Awesome news! Thank you for sharing this. I found this post, which confirms IRC and suggests it was an improvement in PG 12:
https://paquier.xyz/postgresql-2/postgres-12-with-materializ...
Today is a great day to have been wrong on the Internet. :)
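For reference, a short sketch of the PG 12 syntax, using the CTE from the top of the thread. Since version 12, PostgreSQL inlines a non-recursive, side-effect-free CTE that is referenced only once, and the `MATERIALIZED` / `NOT MATERIALIZED` keywords override that default:

```sql
-- PostgreSQL 12+ only. NOT MATERIALIZED asks the planner to fold the CTE
-- into the outer query, so the filter and grouping can be planned together.
-- Writing MATERIALIZED instead restores the old optimization-fence behaviour.
WITH usa_employees AS NOT MATERIALIZED (
    SELECT
        title,
        country,
        salary + payroll_tax + healthcare_cost AS gross_cost
    FROM employees
    WHERE country = 'USA'
)
SELECT title, country, SUM(gross_cost) AS sum_gross_cost
FROM usa_employees
GROUP BY title, country;
```

With a single reference like this one the keyword is redundant on PG 12+, since inlining is already the default; it matters mainly when the CTE is referenced more than once.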
Sybase IQ allows you to use the column alias anywhere else in the query.
What expressions are being repeated here?
> (salary + payroll_tax) AS gross_salary,
> (salary + payroll_tax + healthcare_cost) AS gross_cost
> AND (salary + payroll_tax + healthcare_cost) > 0
And this is a simple example.