Comment by lkuty

2 years ago

I don't understand the need for libraries that abstract away SQL when you could just write SQL directly and have full access to the power of the language which is quite rich (recursive CTE, windowing, ... aka Modern SQL). You could also use stored procedures/functions for more complex stuff and e.g. JSON (or native types) to transfer data between the database and the application. Why limit ourselves with a sub-optimal language be it PRQL, Ecto, other ORMs, ... ?

94 comments

lkuty

snthpy 2 years ago

This comes up every time PRQL makes it onto HackerNews and is a fair question:

Short answer: DX

Slightly longer answer: Developer productivity and experience, especially for EDA and interactively writing complex analytical queries.

Most people that have tried PRQL just find it more convenient to write their analytical queries in it. PRQL compiles to SQL so it can't express anything you can't already do in SQL, but you can probably express yourself much faster in PRQL.

Just try the following query in the online PRQL Playground (https://prql-lang.org/playground/) to find the longest track per album:

```prql

from tracks

group album_id (

  sort {-milliseconds}

  take 1

  )

```

How long would it take you to write the SQL for that?

Disclaimer: I'm a PRQL contributor.

quesera 2 years ago
Reformatted for readability (indent code blocks with two spaces):
from tracks group album_id ( sort {-milliseconds} take 1 )
Editorializing:
Data query specification is all about getting the details right. This does not look simpler than the corresponding SQL to me though. All components must be present -- scope, group, limit, order.
SQL, for all its faults, is generally succinct at incorporating the required details. The PRQL sample here is succinct as well, but to me at least, not differentiating.
GordonS 2 years ago
> How long would it take you to write the SQL for that?
I don't want to appear rude, but unless I'm missing something, this is a pretty simple SQL query, of the kind anyone with mimimal SQL experience could write off the top of their head in seconds.
I like the idea of PRQL, but I think a better example is needed to sell it.
- audunw 2 years ago
  
  I have a moderate amount of SQL experince, but I could not write that query at the top of my head. Maybe you misunderstand what the PRQL query is doing?
  Here's the SQL it generates:
  WITH table_0 AS ( SELECT , ROW_NUMBER() OVER ( PARTITION BY album_id ORDER BY milliseconds DESC ) AS _expr_0 FROM tracks ) SELECT FROM table_0 WHERE _expr_0 <= 1
  If I understand PRQL correctly, it finds the longest song for each album? A simple concept, but not a simple MySQL query.
  
  4 replies →
- cmpb 2 years ago
  
  This "find the latest row for each <column>" query is kind of a poster-child for seemingly-simple but actually difficult to get right/performant sql.
  E.g. see: https://stackoverflow.com/questions/1313120/retrieving-the-l...
- quesera 2 years ago
  
  I think the logic here is: SQL is hard and most people don't know it well. So PRQL is perhaps easier to learn.
  The same logic is applied to TypeScript vs JavaScript. Or C vs assembly. Or Nano vs vim. Etc etc.
  It's a quandary though. SQL is clearly difficult for most people to get their heads around. It does require a different way of thinking about data, and you can get by with a minimal SQL knowledge for a long time, especially nowadays with ORMs.
  But like so many other things, making the investment is worthwhile and pays off in small and large ways, forever (so far).
  
  5 replies →
- bvrmn 2 years ago
  
  I'm not a PRQL fun. But TBH I can't write this SQL from my head. I have thousand lines of written SQL.
CuriouslyC 2 years ago
This seems like a micro-optimization to me. That's faster to write, but you pay for that with extra tooling and an abstraction layer that you now have to train or hire for, and I'm not sure that's a good payoff.
This idea seems like it'd be better as an editor plugin that lets you write shorthand and have it automatically expanded into correct SQL rather than as a build time thing.
- sph 2 years ago
  
  You can apply this logic to choosing a high level language vs writing assembly code.
  Yes the pipeline is more complex, there are more tools and more syntax to track, but the benefits are pretty clear (or we'd all be writing UI code in hand written asm)
  While leaky abstractions are a huge problem, the thing about abstractions is that, if they are any good, the benefits and improvements to productivity outweigh the negatives. You just have to figure out if the gains PRQL could give you are worth the effort.
  I rarely write SQL, so it's not worth it for me. But if PRQL were an actual query engine, not just a translation layer, and some database offered a native PRQL interface, I would immediately switch to it rather than to keep twisting my brain with SQL and it's inane syntax and rules.
  (I was a full time DBA in one previous life, so I should be more comfortable with SQL than most.)
  
  1 reply →
hans_castorp 2 years ago
> How long would it take you to write the SQL for that?
select distinct on (album_id) * from tracks order by album_id, milliseconds desc;
- da_chicken 2 years ago
  
  Eh, `DISTICT ON` is a custom PostgreSQL extension.
  A standard* method would be:
  SELECT * FROM tracks QUALIFY row_number() over (partition by album_id order by milliseconds desc) = 1;
  But the QUALIFY clause is so new that it doesn't work on most RDBMSs. If you're on MS SQL Server, you're still using:
  SELECT * FROM ( SELECT * ,row_number() over (partition by album_id order by milliseconds desc) rn FROM tracks ) x WHERE x.rn = 1;
  That said, I still don't think PRQL is particularly amazing. I can't tell if it's merely syntactic sugar for SQL, or if it's actually meant to control query execution. If it's the former, it's likely to frustrate developers because it's actually just another layer of abstraction. If it's the latter, then it requires the developer to not only understand the data model well enough to be able to write SQL queries, they need to be able to understand the RDBMS impementation details well enough to be able to write queries that best take advantage of the current database's indexes, statistics, and configuration. Even something as simple as sorting before filtering or projecting can be a significant performance issue. Nevermind the fact that relational algebra done in the wrong order can be non-deterministic or not equivalent transformations, so even if the query processor is smart enough to do rewrites whatever the developer enters might be logically different unintentionally.
  Ultimately I think it's a tool that lets the developer thinking about the problem in the way they prefer, rather than thinking about the problem in the way that best suits the problem at hand. Like insisting on writing documentation in LaTeX instead of Word or Markdown.
  *: I believed this was in SQL 2023, but double checking it looks like it did not have make the final standard. I would be surprised if it didn't make it in the future, however.
  
  4 replies →
- dewey 2 years ago
  
  With the additional benefit that resources about SQL are everywhere and every question has been answered already multiple times. On top of that you are most likely to encounter SQL at a job than PRQL.
  
  1 reply →
- hobs 2 years ago
  
  You'd also need a LIMIT or a TOP, and you might need a subquery for that depending on your SQL of choice.
  
  2 replies →
gigatexal 2 years ago
select id from tracks qualify row_number() over partition by (album_id order by milliseconds desc) =1
That should work
Look I’m a diehard SQL just use it guy but open to improvements. But I’m loathe to use abstractions for things when the underlying thing is so expressive.
Autocomplete of fields in a good editor, schema help, etc go a long way to making SQL being written raw very nice.
- orlp 2 years ago
  
  QUALIFY is not part of the SQL standard.
  
  4 replies →
OJFord 2 years ago
In the real world, probably something like:
select album_id ,first_value(id order by milliseconds desc) as longest_track_id from tracks group by album_id
I agree the PRQL's pretty nice here, but I think such a generalised example (chances you actually want to `select *`?) overstates the advantage.
- da_chicken 2 years ago
  
  This query would error: "Field `id` is neither in the GROUP BY nor in an aggregate function." Since `id` is a key in `tracks` and `album_id` is a foreign key and not unique in `tracks`, the RDBMS wouldn't be able to use any implicit GROUP BY determinism, either.
  You could do this:
  select distinct album_id ,first_value(id order by milliseconds desc) as longest_track_id from tracks
  But this is unlikely to perform as well as the row_number() method because it will cause the RDBMS to generate a record for every track and then waste time sorting the intermediate results to find the unique records in the output.
- riku_iki 2 years ago
  
  your query will return just album_id and id, not other track fields
halayli 2 years ago
They can ask chatgpt to generate the SQL query and use the SQL output that everyone is familiar with rather than use an abstraction that is prone to versioning and behavioral changes and will consume everyone else's time to go learn a new language and become profficient in it.
- oulipo 2 years ago
  
  if they need ChatGPT to generate the query (costly and possibly error-prone), then it means that not everyone is familiar with the output...
elbear 2 years ago

I have a feeling this language will be more familiar to programmers who think functionally. I say this because it seems to consist of transformations applied to data and of `derive`, which defines new variables later used in other transformations.
abirch 2 years ago
This would have been awesome for me 2 years ago.
Currently much of my complicated SQL is generated by a LLM.
- buremba 2 years ago
  
  Hmm, I would think that LLM helps adoption for the semantic layers such as PRQL, Malloy, and dbt since it's possible to generate/validate/iterate 5 lines of PRQL compared to 25 lines of SQL but considering none of them widely adopted yet, you might indeed be correct in a way that LLM makes it harder for the new tools to gain adoption by helping you to suffer less from the verboseness of SQL.
  
  2 replies →
lkuty 2 years ago

Sorry to be that guy :-) The `introduction.prql` example on the playground gives an idea of the better readability of PRQL vs SQL and your small example an idea of the speed you may gain from writing PRQL vs SQL. It is interesting. Indeed, me writing the SQL would have taken more time than you writing the PRQL.
beeboobaa 2 years ago

Won't this just lead to developers wrecking performance because they don't understand what's happening?

vendiddy 2 years ago

The thing I hate most about SQL is lack of composability.

In most languages it's easy to pull out functions.

In SQL you end up with a giant hard to comprehend mess.

I think the underlying relational concepts in SQL are sound but I'd love to see ideas like PRQL that aim make SQL easier to write and maintain.

Stored procedures and functions are nice but don't allow the basic idea of breaking a large query apart into smaller logical components.

CuriouslyC 2 years ago
Your statement about breaking large queries apart is wrong. You can write queries with CTES to improve readability, and extract CTEs into functions that can be selected as queries get too large and unwieldly. SQL is just as composable as any other language.
The thing that's lacking right now is the tooling for managing/testing/deploying database code. There are solutions out there and the supabase folks have been working to make things better but database first development still has some hurdles in terms of DX.
- jpgvm 2 years ago
  
  One issue with functions though is that they can change performance in unpredictable ways. For example a colleague of mine recently altered a function I had written that was used in multiple hot-path queries. The change he made accidentally caused the function to no longer be inline-able on PostgreSQL. Once the function couldn't be inlined then the PostgreSQL planner wasn't able to select the appropriate indices and the performance of several of the queries exploded by about 100x.
  So while it's true it can be composed etc the current state of the art planners struggle except under very simple/constrained scenarios.
  
  8 replies →
- vendiddy 2 years ago
  
  Do you have any links to a basic example on using CTEs and functions to keep SQL maintainable?
  I've used CTEs, but I had not tried breaking up an SQL query into functions. Didn't know that was possible!
  For whatever reason, I feel like I end up with a giant blob of SQL when writing SQL and it's incredibly frustrating.
  
  2 replies →
- Izkata 2 years ago
  
  > You can write queries with CTES to improve readability, and extract CTEs into functions that can be selected as queries get too large and unwieldly.
  Personally I'd go for breaking them into views. IIRC as of around postgres 11-13 they're no longer a barrier for the query planner.
  
  1 reply →
ako 2 years ago

Composition is available in sql, but works a bit different than in a procedural language. In sql you express sets of data, composition consists of defining subsets that you compose into more complex sets. Views and CTEs are the tools for composition in SQL.

hosteur 2 years ago

SQL is incompatible with many types of autocompletion. For example columns in a select statement are not known here you write FROM. This alone justifies PRQL in my opinion.

CuriouslyC 2 years ago

There are tools in most languages to deal with this. For example, take https://jawj.github.io/zapatos/. It introspects your database schema to generate types, and gives you autocompletion inside tagged template strings.
da_chicken 2 years ago
It's not a real problem in a practical sense. Yes, you have to write the FROM clause before autocomplete happens even though the SELECT must appear before it. Fortunately, however, text editiors used as IDEs allow for out-of-sequence code editing. You can just enter the FROM and move back to complete the SELECT.
It's like complaining that you have to know the variable name you're going to assign something to before you start writing the expression that will set the value.
int y = x * 2;
The idea is to evaluate the expression and store it, but the expression doesn't actually read that way left-to-right. Wouldn't it make more sense for it to be:
x * 2 assign to new int y;
Technically, that's written more in execution order. In practice it just isn't that big of a deal. It only trips up beginners.
nwatson 2 years ago

Jetbrains tools (Datagrip, IntelliJ IDEA, PyCharm) deal well with this if you leave a placeholder column before FROM, and write the FROM part with proper joins. You can go back to the columns and autocomplete goes just fine. You can amend our extended any part and the autocomplete logic adapts well.

haspok 2 years ago

One notable reason is being database-agnostic. Like ORM's, if you can generate SQL you can generate database-specific SQL as well.

SQL is also quite verbose in places (JOINs are the most trivial example), and lack a decent amount of abstraction (CTEs are relatively low level).

Updating a large set of FK'd tables can be a nightmare (this is what ORMs shine at).

Finally, some modern additions are quite unreadable, Postgres' JSON syntax, for example.

I'm not saying that PRQL solves any of the above, but these are all legit problems with "plain" SQL.

CuriouslyC 2 years ago
Database agnosticism is so 2010. There's very little reason to choose a DB other than postgres, and if you have a reason to choose a specific niche db you're not probably not going to be migrating away from it any time soon.
CTEs are a first step in structuring queries to make them decomposable. You can extract CTEs to functions and mark them stable and it's logically equivalent to the original query.
- haspok 2 years ago
  
  > There's very little reason to choose a DB other than postgres
  Sure, if you are a startup, or write your own code. But for most people the choice of database(s) is a given, and they are not in a position to challenge that. At the end of the day, Oracle has to make a living, too...
  
  1 reply →
- gaganyaan 2 years ago
  
  I like PostgreSQL quite a bit and pushed for using it by default at my company, but SQL pops up in a surprising number of places. You can use it in PowerBI, Snowflake, and pandas just to name a few. You don't always control which DB you're talking to, and it would be nice if the SQL interface that gets presented in those scenarios got an upgrade. PRQL looks like a really promising option for that.
- riku_iki 2 years ago
  
  > There's very little reason to choose a DB other than postgres
  Postgres is single server OLTP DB with complicated failover story, it is strong enough reason to consider some other contenders e.g. CocroachDB, SpannerDB for distributed OLTP or OLAP specialized ClickHouse, BigQuery, DuckDB.
  
  4 replies →
ako 2 years ago

How are joins verbose? It’s pretty straight to the point: combine these 2 tables on these columns… what do you want to remove to make it less verbose?

murkt 2 years ago

PRQL is not an ORM in any kind of way, it’s the less quirky SQL, basically.

adrian_b 2 years ago

In my opinion SQL is the sub-optimal language.

Whenever I am writing SQL I am not thinking in SQL, but I am thinking in what I consider to be the mathematical sound way, which I translate into SQL while writing. I consider thinking in SQL a much greater mental handicap than having to translate mentally into it.

I would prefer to write directly in what I would consider as a good query language and have it translated automatically into SQL, for compatibility with what is, for unfortunate historical reasons, the standard.

I have not attempted previously to do or use something like this, but work like that discussed here seems like a step in the right direction.

ttfkam 2 years ago

Can you expand on this? I have always seen SQL as a DSL for set theory. How are you seeing things "mathematically" that do not comport with set manipulation in SQL?

davedx 2 years ago

SQL is pretty great, but for some things (I thought the example in the readme was decent) it can be quite cumbersome. Stored procedures are very imperative (and hard to debug, depending on the platform IME). This seems like a more functional approach to stored procedures.

I don't think it's suggested that this replaces SQL. Use the right tool (and abstraction) for the job?

CuriouslyC 2 years ago

I think the debuggability is the #1 issue with database-as-a-platform. Using notices to debug functions is such a poor workflow that even though I'm bullish on putting stuff in PG I avoid writing complex code might need to be debugged as an integrated unit.
lkuty 2 years ago

Maybe. I already think in terms of transformations (relational algebra and its closure property) when I write SQL and use a lot of CTEs. But I guess the functional way might help people see what's going on.

baq 2 years ago

> Why limit ourselves with a sub-optimal language be it PRQL?

Actually why limit yourself with SQL...?

PRQL is a language compiled into SQL and makes certain hard-to-do things in SQL easy purely because it allows to streamline operations which SQL needs CTE joins or whatever hoop jumping to solve.

My favorite example which sounds easy but isn't - select the row which is MAX(...).

ttfkam 2 years ago
> Actually why limit yourself with SQL...?
Because it's everywhere, has extensive documentation and tutorials, all database tools support it, all relational engines support it, some non-relational engines support it, all programming languages have library support for it, it can be accessed through command line tools as well as graphical interfaces, etc.
You think an industry is going to give up 50 years of infrastructure because some (typically junior) devs think the syntax is "kinda icky"?
> My favorite example which sound easy but isn't - select the row which is MAX(...).
If you look earlier in the comments you will see queries that have "DISTINCT ON" in them. It solves the problem that sounds easy, but actually is pretty easy if you know SQL.
- jeroen79 2 years ago
  
  I agree, sql is well documented and an industry standard, no need to make it more complex by adding preprocessors that do nothing but change the syntax.
  Just bite in and learn proper sql.
  
  1 reply →

andix 2 years ago

Because SQL can be cumbersome to write. It's often repetitive, requires nesting, aliases, and a specific order of statements.

ttfkam 2 years ago
1. Notice language is complicated for some tasks
2. Propose newer, simpler language to take care of these
3. Newer, simpler language lacks features of original language
4. Newer language adds features, making it more complicated
5. GOTO 1
- andix 2 years ago
  
  That's why we stopped innovation of programming languages at C89? Why use sql at all if you can also do it in C?
  
  1 reply →