← Back to context

Comment by audunw

2 years ago

I have a moderate amount of SQL experince, but I could not write that query at the top of my head. Maybe you misunderstand what the PRQL query is doing?

Here's the SQL it generates:

WITH table_0 AS ( SELECT , ROW_NUMBER() OVER ( PARTITION BY album_id ORDER BY milliseconds DESC ) AS _expr_0 FROM tracks ) SELECT FROM table_0 WHERE _expr_0 <= 1

If I understand PRQL correctly, it finds the longest song for each album? A simple concept, but not a simple MySQL query.

SELECT DISTINCT ON (album_id) * FROM tracks GROUP BY album_id ORDER BY milliseconds DESC;

For those unfamiliar with Postgres, DISTINCT ON takes only one row for each of the groups based on the supplied columns. So in this case, it will return only one row per album_id.

Without an ORDER BY, DISTINCT ON chooses a random row (not actually, but you can’t rely on it.) Since we ORDER BY milliseconds DESC, the first row of each group will be the longest one.

  • that works for top-1 but breaks completely if you want to extend it to top-2

    • The example was asking for top 1. Yes, you are correct that you'd have to switch up to a window function to handle top-n+1.

      Do you believe these cases are common enough to warrant discarding the tools and training available to SQL? Are you also certain that PRQL doesn't have corner cases where SQL is more concise and/or easier to understand?

It's a pretty common pattern (take the top N by group with some order), so I'd expect that it's a familiar pattern to a lot of people who regularly do analytical queries in SQL. It's clearer with formatting/naming:

    WITH ranked_tracks AS (
        SELECT *,
            ROW_NUMBER() OVER ( 
                PARTITION BY album_id
                ORDER BY milliseconds DESC
            ) AS row_num
        FROM tracks
    )
    SELECT *
    FROM ranked_tracks
    WHERE
        row_num = 1

But it is still super ugly for such a common need. If I were to add syntax to make this kind of thing easier, I'd just go for a syntax that made something like this valid:

    SELECT *
    FROM tracks
    WHERE
        ROW_NUMBER() OVER ( 
            PARTITION BY album_id
            ORDER BY milliseconds DESC
        ) = 1

Which apparently the QUALIFY statement does in a few dialects.