← Back to context

Comment by ianbicking

4 years ago

I like it, it's readable, unlike some SQL alternatives I've seen it doesn't make me feel like I'm dumb and don't understand what a query even is.

I can't decide if it would be better or worse if it stuck more closely to SQL keywords. You use "from" and "select", but not "where", "order by", "group by". There's some danger of it being in an uncanny valley of SQLish, but I'm pretty sure I'd prefer just using those terms verbatim (including the space in "order by"... that style is less common in modern languages but it's not really that much harder to parse).

I'd like to see more examples of joins and composing SQL. Does this language make it easier to make more general SQL queries? Can I take two queries and squash them together in a reliable way? I feel like I end up with a lot of theme and variation in my queries, often involving optional filters.

I might even like a notion of encapsulation that could help this query language when it's embedded in other languages. Like if I could say, in the language itself, that a query has certain unbound variables (and not just ? or other placeholders). This language seems like it would be better for generating than SQL, and sometimes generation is just necessary (like in any application that supports data exploration), but for most common cases I'd hope to avoid that. Defining inputs and then making whole filter sections or other statements conditional on those inputs would help here.

Yup, I like a lot of things about the way this looks. In particular, I like how friendly this looks to be for things like auto complete (pretty annoying to need to practically type the entire sql query only to go back and fix up the columns in order to get autocomplete to work).

Specific things I'd like to see.

How do you handle column ambiguity. In the examples, they show a join of positions to employee on employee_id == id. But what happens when you have 2 columns with the same name that you are joining on? (like employee_id to employee_id in some mapping table).

Subqueries are pretty important in what I do, so what do those look like (perhaps covered by the "thinking about CTEs section").

How about opportunities for optimization hints? In T-SQL you can hint at which index the optimizer should prefer to a specific query.

Common SQL patterns would also be interesting. Like, how would you do keyset pagination?

Edit: Also, I'd like a discussion about null. SQL null handling rules are terrible. I understand them, I work with them, but at the same time, they are so different from other languages concept of "null" that they are easy to trip over.

  •   > SQL null handling rules are terrible. I understand them, I work with them, but at
      > the same time, they are so different from other languages concept of "null" that
      > they are easy to trip over.
    

    Could you elaborate? I'm really only versed in the MySQL accent, but I don't find anything unusual or unexpected about NULLS in MySQL. If there are any pitfalls that I should be aware of, I'd love to know about it here before my users start complaining about bugs.

    Thanks.

    • In SQL NULL does not mean "no value" it means "unknown value". Existence of such value introduces three value logic because expression "NULL = <anything>" is neither true nor false. This makes queries harder to understand without any benefit.

      Additionally "unknown value" concept is not used consistently. Things like DISTINCT, or UNIQUE indexes (in some databases) treat NULL as single "no value".

      3 replies →

    • Not the person you replied to, but I don't think by “from other languages” he means other dialects of SQL.

      Instead, I think other languages away from the database are being referred to - in many of those NULL is treated like any other value², for instance in Javascript¹ null==null is true and null!=null is false, and due to type coercion null on its own is “falsey”. Personally I have no problem with SQLs handling of NULL with one exception, and find other languages treating it as a single value rather than an unknown feels odd.

      The one thing that I have occasionally tripped over with NULL in SQL is the effect of “<val-or-var> NOT IN (<set>)” when NULL is one of the entries in <set> - it makes sense when you think about it because the IN operator can only return true or false and it can't definitively say the searched for value isn't equal to the unknown one(s)³ but this doesn't seem intuitive.

      Some SQL dialects do handle NULL a little differently, more like languages like JS. MS SQL Server can be made to with SET ANSI_NULLS OFF to force its ancient not-standards-compliant behaviour⁴.

      [1] quick & easy to test in your browser's console

      [2] well, technically in JS I think null is specifically a null object reference, that being one of the differences between null and undefined

      [3] more concretely, “var NOT IN (1, 2, NULL)” being equivalent to “var<>1 AND var<>2 AND var<>NULL” which becomes “true AND true AND NULL” which is NULL as any logical operator against NULL returns NULL.

      [4] though note that this option is officially deprecated, as of at least 2016, and might be removed or just ignored in future versions

      3 replies →

Thanks!

I just fleshed out composing CTEs, which is a small step towards the broader goal of making composition easier: https://github.com/max-sixty/prql/commit/dc68fcaaceef26cc078...

Let me know if you have a good case of the sort of composition you find difficult in SQL (either here or in an issue). Thank you!

  • I think supporting variables and functions already solves most of my composability gripes with SQL.

    Another problem that I have with composing SQL is that large queries quickly become unreadable, and error messages are also often not terribly helpful. I think having a more expressive type system would help with the error messages. Do you have any plans on adding a type system to PRQL?

> Can I take two queries and squash them together in a reliable way? I feel like I end up with a lot of theme and variation in my queries, often involving optional filters.

That's essentially what SQL views do. Each view is a query and then you can treat it like a table and filter/join on it.

Of course, then the problem becomes whether or not the query planner can see through the view to the underlying tables to optimize correctly.

> I can't decide if it would be better or worse if it stuck more closely to SQL keywords. You use "from" and "select", but not "where", "order by", "group by". There's some danger of it being in an uncanny valley of SQLish, but I'm pretty sure I'd prefer just using those terms verbatim (including the space in "order by"... that style is less common in modern languages but it's not really that much harder to parse)

I agree 100% here. As a SQL veteran, it would make the transition a lot easier if you used common SQL keywords like group by, order by, limit, etc. e.g.

    from employees
    where country = "USA"
    derive [
      gross_salary: salary + payroll_tax,
      gross_cost:   gross_salary + benefits_cost
    ]           
    where gross_cost > 0
    group by:[title, country] [
        average salary,
        sum     salary,
        average gross_salary,
        sum     gross_salary,
        average gross_cost,
        sum_gross_cost: sum gross_cost,
        count,
    ]
    order by:sum_gross_cost
    where count > 200
    limit 20