Comment by victorNicollet

4 years ago

There are plenty of good ideas in here, I really like it.

Here's a few things I can say from my experience (I have been working on a "better SQL" for almost 10 years now):

- Your "aggregate by" taking a set of calculations to be performed on the groups is an excellent idea: the fact that "group by X" isn't a separate statement from the "select" that describes the calculations, means you can keep the idea that each line is an operation applied to the result of the previous line.

- Be very careful with auto-generated column names, they will bite you when you implement "Go To Definition" or "Rename Symbol" tools.

- The use of "filter" instead of "where" is a missed opportunity to be easily understood by people who know SQL. I suspect that you wanted to avoid the WHERE/HAVING confusion, but I'm not sure it is worth it. I do appreciate "sort" instead of "order by" (it's good to have only one keyword).

- To support a space-based call convention `f a b` you will pay a very heavy price in the language grammar. Also, allowing optional arguments in this convention will prevent you from implementing first-class, higher-order or partial functions. This may be a price you have decided to pay, if not, look into the OCaml rules for optional arguments, they are very well thought-out (the general idea being, you must have at least one non-optional positional argument _after_ the optional ones).

- Having [X, Y] be your list syntax, and X be a shorthand notation for [X], works out pretty well in practice, so long as you have specific positions in your grammar where lists are expected, so there is no ambiguity between X-as-a-column and X-as-a-shorthand-for-[X]

- General syntax opinions: "from" is fine, your CTE syntax (non-point-free) is fine. Boolean operators should be usable as infix keywords (or prefix if unary), just like arithmetic ones, because that is the first thing users will attempt.

- Raw syntax: for the sake of your syntax highlighter, I recommend having asymmetrical delimiter pairs like [|SQL|] instead of symmetrical pairs like `SQL`.

In addition to the above, one of the issues I have with SQL in an analytics context is that it tends to "lose" the actual tables: since it operates by joining/filtering/aggregating tables into new tables, your "products" table from the database schema is quickly replaced by a CTE named "products5" which (in theory) contains the same lines as the original, plus some additional columns, but (in practice) some lines have been lost or duplicated in the middle. I called this the "vanishing schema" paradox[1]. Your design, with the "let" statements to add columns to a table, solves one portion of the problem, but I wonder if you would consider this worth solving more fully.

[1]: https://nicollet.net/blog/vanishing-schema-paradox.html