Comment by galkk

4 years ago

I'm quite opposed to the idea "from should be first".

I want to understand what exactly the query returns, not the implementation detail of the source of this data (that can later be changed).

Literally first example from page - I have no idea what is being returned:

    from employees
     filter country = "USA"                           # Each line transforms the previous result.
     let gross_salary = salary + payroll_tax          # This _adds_ a column / variable.
     let gross_cost   = gross_salary + benefits_cost  # Variables can use other variables.
     filter gross_cost > 0
     aggregate by:[title, country] [                  # `by` are the columns to group by.
          average salary,                              # These are the calcs to run on the groups.
          sum     salary,
          average gross_salary,
          sum     gross_salary,
          average gross_cost,
          sum     gross_cost,
          count,
 ]
     sort sum_gross_cost                              # Uses the auto-generated column name.
     filter count > 200
     take 20

of course, similar things are happening to SQL too, with CTEs becoming more widespread and "real" list of the columns hidden somewhere inside, but it's still parseable

44 comments

galkk

phailhaus 4 years ago

    SELECT id, name, author

Quick, what is this query about? What's ironic is that I think you have it backwards: the columns are the implementation detail, not the table. The table is the context: you can't change that without having to change everything else. But columns are the last step, the selection after the filters, joins, etc. They can be changed at any time without affecting the logic.

taeric 4 years ago
This is... An odd choice. I'd assume I'm not without context looking at a query to know why I would want those columns.
And the auto complete story is backwards. Often I know what columns I want, but I'm not clear what table I need to get them from. Such that, if you make a smarter suggest in the from to only include tables that have the columns, I'd be much happier.
- sanderjd 4 years ago
  
  Just throwing in another point of anecdata onto this pile: "Often I know what columns I want, but I'm not clear what table I need to get them from" does not make sense to me. I don't relate at all to their being a global namespace of columns, rather than a namespace of tables, each with its own columns specific to its context.
  
  4 replies →
BeefWellington 4 years ago
It's query asking for the id, name, and author fields. Very straightforward, I have no idea how this is confusing.
> The table is the context: you can't change that without having to change everything else.
Except even in the provided single-table example this isn't true - you're getting subselected/CTEd results. No functional joins are demonstrated unfortunately.
For example:
from employees left_join positions [id=employee_id] ...is equivalent to... SELECT * FROM employees LEFT JOIN positions ON id = employee_id
No data is selected from positions in either example, and it's unclear on why we're joining that table (other than just for the heck of it). It's not a workable example.
- phailhaus 4 years ago
  
  You restated the query; I was asking what it's about. Is it a query across publications? Or is it a query over news articles? That context changes everything: how the query is written, what it can be joined with, how it can be filtered, how it is used, etc. Putting the FROM clause first means that you immediately have context to understand the rest of the query.
  
  2 replies →

inglor 4 years ago

The big advantage of "from first" like we have in Kusto KQL (a database we use at Microsoft) is that it provides much better autocomplete (if I write the `from` it can easily autocomplete the projection).

If you want an interesting example of how a query language built for developer experience and autocompletions looks definitely check it out!.

lemmsjid 4 years ago

That's interesting because it also explains why I was going to say I do like having from first. When trying to reason about a query, I mentally go through the following:
1. What tables are being pulled from? This speaks to the potential domain of the query. 2. What data is being selected (I can now know what is or isn't being pulled from the aforementioned tables...) 3. What operations, aggregations, groupings, etc. are being performed to work on the pulle data
Of course from vs select ordering is completely arguable, but my thinking process seems to follow that of the auto complete--in other words that my cognitive load of looking at the select statement is lessened when I know from what the columns are being selected.
It also follows (at least to me) the mental process of writing the query. First look at the tables, then decide what columns, then decide what functions to apply.
taeric 4 years ago

I said it in a sibling, but I feel this is somewhat missed. Auto complete that simply lists the tables is easier if from is first. But... Auto complete that helps me know what tables can give me my requested columns works the other direction.
majkinetor 4 years ago

Too bad we can't use Kusto with anything except Azure.
_jal 4 years ago
Designing languages around autocomplete is like designing toilets for better toilet paper dispensers.
The language should be right for human understanding, not automated mad-lib generation.
- inglor 4 years ago
  
  You would think that but having used both I find writing Kusto/KQL much smoother, neater and faster and if I have to choose between writing a query in either one I'd pick KQL.
  I understand this is just an opinion but it's an opinion held by everyone in my org who writes both.
  Theoretical correctness loses to pragmatism a lot and I'd read the KQL every day. Look at the examples at https://docs.microsoft.com/en-us/azure/data-explorer/kusto/q... - look at the examples at https://docs.microsoft.com/en-us/azure/data-explorer/write-q... and tell me they're not more readable than comparable SQL?
  (I can see the result type both by hovering on the query but also by just looking at the end of it - and in SQL most of the SELECTed items in complex queries are from subqueries anyway - at least in my use case)
  
  2 replies →
- phailhaus 4 years ago
  
  Building for autocomplete is building for human understanding. If it is impossible for a computer to determine the context of your query, why would a human do much better?
  
  4 replies →
- jiggawatts 4 years ago
  
  I’ve been frustrated by toilets where I have to contort my body to reach the dispenser. Similarly, I’ve had dispensers intrude on the space where my legs would normally be and make it awkward to even just sit on the toilet.
  Toilets are absolutely designed to make the dispenser placement convenient. You just don’t think about it because 95% of toilets get it right, so it just doesn’t bother you that much that it can be wrong.
  In SQL, some decisions are right about 10% of the time and are annoying and awkward the other 90%.
  That’s why the order matters. Because everything else got it right.
- oldsecondhand 4 years ago
  
  There's a person behind the IDE. If you help the IDE, you help the person using the IDE.
- emteycz 4 years ago
  
  I'd agree if there was any way whatsoever of fixing this issue, but there simply isn't. The editor can't even begin to guess what you might want until you write your FROM.
  
  6 replies →
- Xelbair 4 years ago
  
  And one of use cases is writing queries which it helps immensely. Best of both worlds would allow both orders. Just automatically transform the query to the usual form after it's execution.

kortex 4 years ago

I think it's quite a common convention in engineering - not just software - that the input to a process "goes in the top and out the bottom". We humans read top->bottom (regardless of left/right/vertical, I don't know any languages that write bottom up). Conventional voltage in circuit diagrams usually flow top to bottom. Gravity loads in schematics flow top to bottom. Chemical pathways are usually written top to bottom. And of course functions take arguments up top and return at the bottom, maybe with some short circuits. I think the only counter example of note is distillation columns.

Where is the data coming from? Employees table. What's coming out? 20 rows of sum_gross_cost.

What could improve this is function signatures. It's kind of nice to have the whole abstraction up top...like an abstract.

quocanh 4 years ago

I agree that the columns of the results should be more obvious. But I am a proponent of "from should be first". I have never written a SQL query without thinking about the contents of a table or its relations. If it was my way, I would describe where the data I'm pulling from, then describe any filters/joins, then describe the columns that I'm interested in (last).

BeefWellington 4 years ago
You've never authored a SQL query that does things like check special functions that don't exist in a table?
For example:
select @@version
- fragmede 4 years ago
  
  Seems easy enough to work around with a magic table name in this hypothetical future reworked dialect of SQL?
  from @@special select version
  
  1 reply →

barrkel 4 years ago

It's a fair sentiment, but it can be handled without losing directional flow and composability, some of the bigger advantages of reworking SQL.

One idea would be along the lines of a function prototype: a declaration, up front, about the columns and types that a query is expected to return. It's a good place to put documentation, it's redundant information which should protect against mistakes but not so redundant that it would be too taxing - the author should know what the query returns. The prototype would only be used for validation of column names and types.

Another idea would be requiring the last element in a query to be a projection, a bit like the return statement in a function body: here's what I'm returning out of the grand set of symbols available (e.g. via various joins) in scope from previous operations in the flow.

samatman 4 years ago

I'm also completely unfamiliar with the PRQL syntax, outside of right now.

Reading the comment however, it would seem that `let` adds columns which are implicitly returned in the order they are defined.

I do see benefits in this, and can imagine pitfalls. Hard to judge without kicking the tires.

Update: It's quite possible we saw different syntax!

https://news.ycombinator.com/item?id=30063266

Without the `let` I would imagine having trouble reading it as well, I'm not sure if that would go away with familiarity but my instinct is that it's a useful addition.

KerryJones 4 years ago

This feels like a English-language thing. In english we tend to put our adjectives first, it feels natural, "Where is my red, round ball?", rather than some other languages (like German) where you put the subject first. Equivalent of "Where is my ball, red & round?"

While it inherently feels unnatural I do agree with the others here that the context is actually easier to understand once over the initial uncomfort.

zigzag312 4 years ago

_from_ is kind of one of the most important context about the data being returned. It provides the type information. Columns you select are just properties of that type.

In SQL, where _from_ is placed at the end, we are essentially writing equivalent of 'property.object'; eg.: name.person, age.person

mcsoft 4 years ago

Both CTEs and this idea address the same problem: poor readability of complex SQL queries. Compared to CTEs, the author takes the idea to split the complex query into parts to the next level.

To your point - a solid IDE will show you what's being processed at each line (or returned, if the cursor is on the last line) - in an autocomplete window or a side panel.