← Back to context

Comment by magicalhippo

4 years ago

Now this is actually nice, unlike the other suggestion posted today[1].

Maybe I'm just too used to non-standard extensions of our database but the SQL example could, at least for our db, be rewritten as

    SELECT TOP 20
        title,
        country,
        AVG(salary) AS average_salary,
        SUM(salary) AS sum_salary,
        AVG(gross_salary) AS average_gross_salary,
        SUM(gross_salary) AS sum_gross_salary,
        AVG(gross_cost) AS average_gross_cost,
        SUM(gross_cost) AS sum_gross_cost,
        COUNT(*) as count
    FROM (
        SELECT
            title,
            country,
            salary,
            (salary + payroll_tax) AS gross_salary,
            (salary + payroll_tax + healthcare_cost) AS gross_cost
        FROM employees
        WHERE country = 'USA'
    ) emp
    WHERE gross_cost > 0
    GROUP BY title, country
    ORDER BY sum_gross_cost
    HAVING count > 200

This cuts down the repetition a lot, and can also help the optimizer in certain cases. Could do another nesting to get rid of the HAVING if needed.

Still, think the PRQL looks very nice, especially with a "let" keyword as mentioned in another thread here.

[1]: https://news.ycombinator.com/item?id=30053860

With a CTE it would read a bit more like prql:

  with usa_employees as (
    SELECT
            title,
            country,
            salary,
            (salary + payroll_tax)                   AS gross_salary,
            (salary + payroll_tax + healthcare_cost) AS gross_cost
    FROM  employees
    WHERE country = 'USA'
    AND   (salary + payroll_tax + healthcare_cost) > 0
  )
  select  title,
        country,
        AVG(salary)         AS average_salary,
        SUM(salary)         AS sum_salary,
        AVG(gross_salary)   AS average_gross_salary,
        SUM(gross_salary)   AS sum_gross_salary,
        AVG(gross_cost)     AS average_gross_cost,
        SUM(gross_cost)     AS sum_gross_cost,
        COUNT(*) as emp_count
  from      usa_employees
  group by  title, country
  having    count(*) > 200
  order by  sum_gross_cost
  limit 3

Readability is pretty similar to prql. It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.

  • My brain just thinks in CTEs over sub queries. I really dislike that my co-workers use these ridiculously nested sub sub sub queries.

    I just look at something like this and I immediately know what's going on. If it's nested sub queries it always takes me much longer.

    • To me those nested sub sub sub SQL queries come from a similar place as beginner coders who tend to make nested IF statements - a lack of experience with the language.

      For very complicated stuff, SQL does become very hard to read compared to e.g. tidyverse + targets in R.

  • In some cases for removing repeating (intermediate) calculations, I generally find it easier to use a lateral join (in postgres), like

        select
            title,
            country,
            avg(salary)         as average_salary,
            sum(salary)         as sum_salary,
            avg(gross_salary)   as average_gross_salary,
            sum(gross_salary)   as sum_gross_salary,
            avg(gross_cost)     as average_gross_cost,
            sum(gross_cost)     as sum_gross_cost,
            count(*)            as emp_count
        from
            employees,
            lateral ( select
                (salary + payroll_tax)                   as gross_salary,
                (salary + payroll_tax + healthcare_cost) as gross_cost
            ) employee_ext
        where
            country = 'usa'
            and gross_cost > 0
        group by  title, country
        having    count(*) > 200
        order by  sum_gross_cost
        limit 3;

    • So now we have easily come up with three different ways of rewriting the query to avoid that duplication (which obviously was not a problem at all to begin with): subquery, CTE and lateral join. And there are also several more well known ways (views, custom functions, computed columns etc) so the whole premise now for even inventing a "better" language than SQL is then false? Or what am I missing.

      It's also weird how people always argue for immutability and eliminating local state, when using procedural languages, but as soon as they switch to SQL, that actually works like this, they immediately want to introduce mutability and local state.

      3 replies →

  • Column aliases would have saved me hundreds of hours over the course of my career. Sorely missing from standard SQL, and would make the need for PRQL less acute.

  • Snowflake lets you refer to column aliases, and it's great!

    There's the slight issue of shadowing of table column names, which they resolve by preferring columns to aliases if both are named the same. So sometimes my aliases end up prefixed with underscores, but that's not a big deal.

    • The trade-off is that a schema change (adding a column) unrelated to your query can modify its behavior.

      Favoring aliases over columns instead has the potential to introduce irresolvable ambiguities as you can’t “qualify” a column alias with a SELECT list or subquery ID the way you can qualify a column by its table/view alias.

  • > With a CTE

    The DB we use supports those, I just learned about them too late so keep forgetting they exist :(

    > It would really help in SQL if you could refer to column aliases so you don't have to repeat the expression.

    The DB we use supports that, so in your CTE you could write

       AND   gross_cost > 0
    

    We do that all the time, which will be a pain now that we're migrating to a different DB server which doesn't.

  • Not all database systems can optimize queries well over CTE boundaries. I believe this is still true for PostgreSQL (no longer true, see below -- it was true a few years ago). So there's a potential performance hit for (the otherwise excellent advice of) writing with CTE's.

  • what expressions are being repeated here?

    • > (salary + payroll_tax) AS gross_salary,

      > (salary + payroll_tax + healthcare_cost) AS gross_cost

      > AND (salary + payroll_tax + healthcare_cost) > 0

      And his is a simple example.

In Microsoft SQL cross apply can be used for this in even more situations and with less repetition:

    select top(20)
        title,
        country,
        ...
        avg(gross_salary) as average_gross_salary,
        ...
    from employees
    cross apply ( select
        gross_salary = employees.salary + employees.payroll_tax, -- or "as .."
        gross_cost = ...
    ) v -- some name required but don't need to use it if column names are unique
    where ...