Comment by ako

4 years ago

With a CTE it would read a bit more like prql:

  with usa_employees as (
    SELECT
            title,
            country,
            salary,
            (salary + payroll_tax)                   AS gross_salary,
            (salary + payroll_tax + healthcare_cost) AS gross_cost
    FROM  employees
    WHERE country = 'USA'
    AND   (salary + payroll_tax + healthcare_cost) > 0
  )
  select  title,
        country,
        AVG(salary)         AS average_salary,
        SUM(salary)         AS sum_salary,
        AVG(gross_salary)   AS average_gross_salary,
        SUM(gross_salary)   AS sum_gross_salary,
        AVG(gross_cost)     AS average_gross_cost,
        SUM(gross_cost)     AS sum_gross_cost,
        COUNT(*) as emp_count
  from      usa_employees
  group by  title, country
  having    count(*) > 200
  order by  sum_gross_cost
  limit 3

Readability is pretty similar to prql. It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.
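
To make the repetition concrete, here is a minimal sketch using the same `employees` columns as the query above. In standard SQL the `WHERE` clause is evaluated before the select list, so an alias defined there is not visible to it. The alias `e` is only illustrative.

```sql
-- Rejected by standard SQL (some engines accept it as an extension):
-- gross_cost is a select-list alias, and WHERE is evaluated first.
--
--   SELECT title,
--          (salary + payroll_tax + healthcare_cost) AS gross_cost
--   FROM   employees
--   WHERE  gross_cost > 0;

-- Portable workaround: define the alias one level down, filter one level up.
SELECT title,
       gross_cost
FROM (
    SELECT title,
           (salary + payroll_tax + healthcare_cost) AS gross_cost
    FROM   employees
) AS e
WHERE gross_cost > 0;
```

The CTE above is the same idea with the inner query given a name and moved to the top.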

17 comments


gunshai 4 years ago

My brain just thinks in CTEs over subqueries. I really dislike that my co-workers use these ridiculously nested sub-sub-subqueries.

I just look at something like this and I immediately know what's going on. If it's nested subqueries it always takes me much longer.

Maxion 4 years ago

To me those nested sub sub sub SQL queries come from a similar place as beginner coders who tend to make nested IF statements - a lack of experience with the language.
For very complicated stuff, SQL does become very hard to read compared to e.g. tidyverse + targets in R.

cribwi 4 years ago

In some cases for removing repeating (intermediate) calculations, I generally find it easier to use a lateral join (in postgres), like

    select
        title,
        country,
        avg(salary)         as average_salary,
        sum(salary)         as sum_salary,
        avg(gross_salary)   as average_gross_salary,
        sum(gross_salary)   as sum_gross_salary,
        avg(gross_cost)     as average_gross_cost,
        sum(gross_cost)     as sum_gross_cost,
        count(*)            as emp_count
    from
        employees,
        lateral ( select
            (salary + payroll_tax)                   as gross_salary,
            (salary + payroll_tax + healthcare_cost) as gross_cost
        ) employee_ext
    where
        country = 'USA'
        and gross_cost > 0
    group by  title, country
    having    count(*) > 200
    order by  sum_gross_cost
    limit 3;

jseban 4 years ago
So now we have easily come up with three different ways of rewriting the query to avoid that duplication (which obviously was not a problem at all to begin with): subquery, CTE, and lateral join. And there are also several more well-known ways (views, custom functions, computed columns, etc.) so the whole premise now for even inventing a "better" language than SQL is then false? Or what am I missing.
It's also weird how people always argue for immutability and eliminating local state when using procedural languages, but as soon as they switch to SQL, which actually works like this, they immediately want to introduce mutability and local state.
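
As a rough sketch of the "views" option mentioned above: the derived expressions are named once in a view and reused by name afterwards. The view name `employees_ext` is hypothetical, and the columns are the ones from the original query.

```sql
-- Name the derived expressions once (employees_ext is an illustrative name).
CREATE VIEW employees_ext AS
SELECT title,
       country,
       salary,
       (salary + payroll_tax)                   AS gross_salary,
       (salary + payroll_tax + healthcare_cost) AS gross_cost
FROM   employees;

-- Queries against the view can filter and aggregate on the names directly.
SELECT title,
       country,
       AVG(gross_cost) AS average_gross_cost,
       COUNT(*)        AS emp_count
FROM   employees_ext
WHERE  country = 'USA'
AND    gross_cost > 0
GROUP BY title, country
HAVING COUNT(*) > 200;
```

Unlike a CTE, the view is a schema object, so it needs create privileges and is visible to other queries.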
- dspillett 4 years ago
  
  > so the whole premise now for even inventing a “better” language than SQL is then false?
  I don't think anyone is using the above examples to try to invalidate PRQL, just suggesting the baseline for comparisons should account for all constructs available in the SQL standards and common implementations thereof.
  > Or what am I missing.
  The statement “I can do X better than <SQL example> with <something else>” does not properly show the benefit of <something else> if “I can do X better than <SQL example> with <another SQL example>” is also true (assuming <another SQL example> is actually agreed to be better, not for instance convoluted/confusing/long-winded/other so just replacing some problems with others).
- paulhodge 4 years ago
  
  If there's multiple ways to do the same thing that's usually a BAD thing in terms of language design. Especially if some approaches are just newbie traps that experts learn to avoid, or if deciding the best method is a really subtle context-dependent decision. The ideal design is that the language encourages the one obviously "good" way to do it.
  

dvasdekis 4 years ago

Column aliases would have saved me hundreds of hours over the course of my career. Sorely missing from standard SQL, and would make the need for PRQL less acute.

correct-me-plz 4 years ago

Snowflake lets you refer to column aliases, and it's great!

There's the slight issue of shadowing of table column names, which they resolve by preferring columns to aliases if both are named the same. So sometimes my aliases end up prefixed with underscores, but that's not a big deal.
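
A small sketch of the shadowing case, assuming an engine that lets `WHERE` see select-list aliases and, as described above for Snowflake, prefers a real column when both names match. The threshold `50000` is just an illustrative value.

```sql
-- The alias collides with the existing column employees.salary.
-- Under a columns-first rule, WHERE filters on the base column,
-- not on the (salary + payroll_tax) sum.
SELECT (salary + payroll_tax) AS salary
FROM   employees
WHERE  salary > 50000;

-- An underscore prefix avoids the collision, so the filter
-- applies to the computed value.
SELECT (salary + payroll_tax) AS _salary
FROM   employees
WHERE  _salary > 50000;
```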

twoodfin 4 years ago

The trade-off is that a schema change (adding a column) unrelated to your query can modify its behavior.
Favoring aliases over columns instead has the potential to introduce irresolvable ambiguities as you can’t “qualify” a column alias with a SELECT list or subquery ID the way you can qualify a column by its table/view alias.
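
The schema-change hazard can be sketched like this, again assuming an engine with a columns-first resolution rule. The `ALTER TABLE` statement and its `NUMERIC` type are hypothetical, standing in for any unrelated migration.

```sql
-- Today: employees has no gross_cost column, so the name in WHERE
-- can only resolve to the select-list alias.
SELECT title,
       (salary + payroll_tax + healthcare_cost) AS gross_cost
FROM   employees
WHERE  gross_cost > 0;

-- Later, an unrelated migration adds a column with the same name:
--
--   ALTER TABLE employees ADD COLUMN gross_cost NUMERIC;
--
-- The query text is unchanged, but WHERE gross_cost > 0 now tests the
-- new column (NULL for existing rows) instead of the computed expression.

-- A column can always be pinned down with its table alias; a select-list
-- alias has no comparable qualifier.
SELECT e.title,
       (e.salary + e.payroll_tax + e.healthcare_cost) AS gross_cost
FROM   employees AS e
WHERE  e.salary > 0;
```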

magicalhippo 4 years ago

> With a CTE

The DB we use supports those, I just learned about them too late so keep forgetting they exist :(

> It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.

The DB we use supports that, so in your CTE you could write

   AND   gross_cost > 0

We do that all the time, which will be a pain now that we're migrating to a different DB server which doesn't.

gmfawcett 4 years ago

Not all database systems can optimize queries well over CTE boundaries. I believe this is still true for PostgreSQL (no longer true, see below; it was true a few years ago). So there's a potential performance hit for (the otherwise excellent advice of) writing with CTEs.

Rovanion 4 years ago
IRC tells me this has been fixed now.
- gmfawcett 4 years ago
  
  Awesome news! Thank you for sharing this. I found this post which confirms IRC and suggests it was an improvement in PG 12:
  https://paquier.xyz/postgresql-2/postgres-12-with-materializ...
  Today is a great day to have been wrong on the Internet. :)
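
For reference, a minimal sketch of the PostgreSQL 12+ syntax behind this. From version 12, a non-recursive, side-effect-free CTE that is referenced only once is inlined into the outer query by default, and the `MATERIALIZED` / `NOT MATERIALIZED` keywords override that choice. The `title = 'Engineer'` filter is an illustrative value, not from the thread.

```sql
-- PostgreSQL 12+: ask for the CTE to be folded into the outer query,
-- so the outer filter on title can be planned together with the
-- country filter inside the CTE.
WITH usa_employees AS NOT MATERIALIZED (
    SELECT title, country, salary
    FROM   employees
    WHERE  country = 'USA'
)
SELECT *
FROM   usa_employees
WHERE  title = 'Engineer';

-- The older behaviour (CTE computed separately, acting as an
-- optimization fence) is still available on request.
WITH usa_employees AS MATERIALIZED (
    SELECT title, country, salary
    FROM   employees
    WHERE  country = 'USA'
)
SELECT *
FROM   usa_employees
WHERE  title = 'Engineer';
```

Comparing `EXPLAIN` output for the two forms is a quick way to see whether the fence matters for a given query.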

mmsimanga 4 years ago

Sybase IQ allows you to use the column alias anywhere else in the query.

jsyolo 4 years ago

what expressions are being repeated here?

oblio 4 years ago

> (salary + payroll_tax) AS gross_salary,
> (salary + payroll_tax + healthcare_cost) AS gross_cost
> AND (salary + payroll_tax + healthcare_cost) > 0
And this is a simple example.