Comment by xi

4 years ago

Maybe you'd like to check FunSQL.jl, my library for compositional construction of SQL queries. It also follows algebraic approach and covers many analytical features of SQL including aggregates/window functions, recursive queries and correlated subqueries/lateral joins. One thing where it differs from dlpyr and similar packages is how it separates aggregation from grouping (by modeling GROUP BY with a universal aggregate function).

3 comments

maximilianroos 4 years ago

This is awesome! I'll add a link to it on PRQL.

I guess the biggest difference between FunSQL (and similarly dbplyr) and PRQL is that the former needs a Julia (or R) runtime to run.

I really respect the library and keen to see how it develops.

andreypopp 4 years ago
FunSQL.jl requires Julia to run (obviously as it is a Julia library) but it produces standard SQL so Julia in this case is just an implementation language.
I have re-implemented parts of FunSQL in Python and OCaml (the one I have ended up using) and have added a concrete syntax similar to what you have in PRQL.
from employees define salary + payroll_tax as gross_salary, gross_salary + benefits_cost as gross_cost where gross_cost > 0 and country = 'usa' group by title, country select title, country, avg(salary) as average_salary, sum(salary) as sum_salary, avg(gross_salary) as average_gross_salary, sum(gross_salary) as sum_gross_salary, avg(gross_cost) as average_gross_cost, sum(gross_cost) as sum_gross_cost, count() as count order by sum_gross_cost where count > 200 limit 20
But, in my mind, the biggest difference between PRQL and FunSQL is the way FunSQL treats relations with `GROUP BY` - as just another kind of namespaces, allowing to defer specifying aggregates. A basic example:
from users as u join (from comments group by user_id) as c on c.user_id = u.id select u.username, c.count() as comment_count, c.max(created_date) as comment_last_created_date
The `c` subrelation is grouped by `user_id` but it doesn't specify any aggregates - they are specified in the `select` below so you have all selection logic co-located in a single place.
I think this approach is very powerful as it allows you to build reusable query fragments in isolation but then combine them into a single query which fully specifies what's being selected.
clarkevans 4 years ago

Writing an alternative syntax is straight forward. Perhaps prototype PRQL using xi's excellent FunSQL backend? This way it's working out of the gate. Once syntax+semantics are pinned, writing another backend in the language of your choice would then be easier. Getting the backend correct is non-trivial work, and xi has done this already. Besides, we need a sandbox syntax anyway, so it might be fun to collaborate.