← Back to context

Comment by hans_castorp

2 years ago

> How long would it take you to write the SQL for that?

    select distinct on (album_id) *
    from tracks
    order by album_id, milliseconds desc;

Eh, `DISTICT ON` is a custom PostgreSQL extension.

A standard* method would be:

    SELECT *
    FROM tracks
    QUALIFY row_number() over (partition by album_id order by milliseconds desc) = 1;

But the QUALIFY clause is so new that it doesn't work on most RDBMSs. If you're on MS SQL Server, you're still using:

    SELECT *
    FROM (
      SELECT *
        ,row_number() over (partition by album_id order by milliseconds desc) rn
      FROM tracks
      ) x
    WHERE x.rn = 1;

That said, I still don't think PRQL is particularly amazing. I can't tell if it's merely syntactic sugar for SQL, or if it's actually meant to control query execution. If it's the former, it's likely to frustrate developers because it's actually just another layer of abstraction. If it's the latter, then it requires the developer to not only understand the data model well enough to be able to write SQL queries, they need to be able to understand the RDBMS impementation details well enough to be able to write queries that best take advantage of the current database's indexes, statistics, and configuration. Even something as simple as sorting before filtering or projecting can be a significant performance issue. Nevermind the fact that relational algebra done in the wrong order can be non-deterministic or not equivalent transformations, so even if the query processor is smart enough to do rewrites whatever the developer enters might be logically different unintentionally.

Ultimately I think it's a tool that lets the developer thinking about the problem in the way they prefer, rather than thinking about the problem in the way that best suits the problem at hand. Like insisting on writing documentation in LaTeX instead of Word or Markdown.

*: I believed this was in SQL 2023, but double checking it looks like it did not have make the final standard. I would be surprised if it didn't make it in the future, however.

  • The PRQL extension discussed here is for Postgres, so my solution was Postgres specific.

    Since when is QUALIFY part of the SQL standard? So far I have only seen it as a proprietary feature in Terradata.

  • Damn. Reading your comment, i was about to be really glad that this pain would be a thing of the past before too long. Too bad it didn't make the standard :(

    • Well, even when it's part of the standard it will take about 6 years before your vendor chooses to implment it.

      And even then, it'll be another 6 years before your application vendor finally upgrades to it.

      And even then, it'll be another 6 years before the database feature is allowed to be enabled.

      And even then, your reporting software won't support it.

      1 reply →

With the additional benefit that resources about SQL are everywhere and every question has been answered already multiple times. On top of that you are most likely to encounter SQL at a job than PRQL.

You'd also need a LIMIT or a TOP, and you might need a subquery for that depending on your SQL of choice.

  • The question was "the longest track per album", so I'd expect one row per album as the result (which is what my query does). But adding a LIMIT would not do that.

    It might be that the question was "the longest track across all albums" which indeed would require a LIMIT