Comment by BeefWellington
1 year ago
Every time this FROM-first syntax style crops up it's always the most basic simple query (one table, no projections / subselects / consideration to SP/Views).
Just for once I want to see complete examples of the syntax on an actual advanced query of any kind right away. Sure, toss out one simple case, but then show me how it looks when I have to join 4-5 reference tables to a fact table and then filter based on those things.
Once you do that, it becomes clear why SELECT first won out originally: legibility and troubleshooting.
As long as DBs continue to support standard SQL they can add whatever additional syntax support they want but based on history this'll wind up being a whole new generation of emacs vs vi style holy war.
Sounds a bit like "new thing scary" unless you show why having select in front actually avoids problems. I don't think there's a clear problem it avoids, but it does make autocomplete really hard (can you even do it properly?), while something along the lines of "just swap SELECT and FROM" is well defined.
> Sounds a bit like "new thing scary" unless you show why having select in front actually avoids problems
This isn't really fair. BeefWellington gave a reason why SQL is how it is (and how it has been for ~50 years). It's reasonable to ask for a compelling reason to change the clause order. Simon's post says it "has always been confusing", but doesn't really explain why except by linking to a blog post that says that the SQL engine (sort of but not really) executes the clauses in a different order.
I think the onus of proof that SQL clauses are in the wrong order is on the people who claim they're in the wrong order.
But it has been explained many times from many angles.
* SELECT first makes autocomplete hard
* SELECT is the only out-of-order clause in the SQL statement when you look at it from an execution perspective
* you cannot use aliases defined in SELECT in following clauses
* in some places SELECT is pointless but it is still required (to keep things consistent?)
Probably many more.
This is a case where stating your opinion and credentials will make you sound really old and conservative so it will be easy to take cheap shots like "you are just afraid of change".
At my previous gig I worked for a decade on an application that meant creating and maintaining large, hairy SQL written to offload application logic to the database (_very_ original). We used to talk about this "wrong order" often, but I never once actually missed it. At most it was a bit annoying when you jumped onto a server to troubleshoot, knew the two columns you were interested in, and could have saved two seconds. But when maintaining those massive queries it always felt good to have the projection up top, because that is the end result and what the query is all about. I would not have liked it if the method signature in, e.g., Java were just the parameters, with the return type after the final brace. This analogy falls apart, of course, since params are all over the place, but swapping things around wouldn't help.
So just go 'SELECT *...' and go back and expand later, I want my sql syntax "simple". /old developer
It really isn't. I've been working in this field for ages and did a lot of those years as a DBA and data modeler. I've worked with other syntaxes too, mostly MDX but some others specific to Hadoop/Spark. I'm not afraid of new things. I just want them to improve on what we have. I want them to be honest about situations where their solution isn't great.
SQL has lots of warts, e.g.: the fact that you can write SQL that joins tables without including those tables in a JOIN, which leads to confusion. It's fragmented too -- the other example I posted shows two different syntaxes for TOP N / LIMIT N because different vendors went different ways. The fact that some RDBMSes provide locking hint mechanics and some don't (at least not reliably). The fact that there's no standard set of "library" functions defined anywhere, so porting between databases requires a lot of validation work. It makes portability hard, and some of those features are missing from standards.
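To make the "join without a JOIN" wart concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `customers` and `orders` tables and their columns are hypothetical, invented purely for illustration. It shows the implicit (comma) form and the explicit JOIN form returning the same rows, and what happens when the WHERE condition is forgotten.

```python
import sqlite3

# Hypothetical two-table schema, used only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.0);
""")

# Implicit join: both tables are listed in FROM and the join
# condition sits in WHERE next to any ordinary filters.
implicit_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers c, orders o
    WHERE c.id = o.customer_id
    ORDER BY o.id
""").fetchall()

# Explicit join: the relationship is stated in the JOIN ... ON clause.
explicit_rows = conn.execute("""
    SELECT c.name, o.total
    FROM customers c
    JOIN orders o ON o.customer_id = c.id
    ORDER BY o.id
""").fetchall()

# Forgetting the WHERE condition in the implicit form is still valid SQL;
# it quietly produces every customer/order combination.
cross_rows = conn.execute(
    "SELECT c.name, o.total FROM customers c, orders o"
).fetchall()

print(implicit_rows == explicit_rows)  # same result set either way
print(len(cross_rows))                 # 2 customers x 3 orders
```

Both forms are accepted by SQLite here; the confusion comes from the join condition being indistinguishable from a filter in the implicit version.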
You'll note I also mentioned that if they want to add it that's fine but it's gonna wind up being a point of contention in a lot of places. That's because I've seen the same thing happen with the "Big Data" vs "what we have works" crowd.
Having select up front avoids problems in a couple key ways:
1. App devs who are working on their application can immediately see what fields they should expect in their resultset. For CRUD, it's probably usually just whatever fields they selected or `*` because everyone's in the habit of asking for every field they'll never use.
2. Troubleshooting problems is far easier because they almost always stem from a field in the projection. The projected field list (and thus the table aliases those fields come from) is literally the first piece of information you need (what field is it and where does it come from) to start troubleshooting. This is why SELECT ... FROM makes the most sense -- it's literally the two most crucial pieces of information right up front.
3. Query planners already optimize and essentially compile the entire thing anyways, so legibility trumps other options IME.
Another point I'd make to you and everyone else bringing up autocomplete: If you need it, nothing is stopping you from writing your FROM clause first and then moving a line up to write your SELECT. Kinda like how you might stub out a function definition and later add arguments. This doesn't affect the final form for legibility.
> becomes clear why SELECT first won out originally: legibility and troubleshooting
nothing "becomes clear" just by you claiming so, better elaborate
For examples of larger queries, see here for all TPC-H queries in standard syntax and converted to pipe syntax: https://github.com/google/zetasql/blob/master/zetasql/exampl...
And several more examples with pipe syntax here: https://github.com/google/zetasql/blob/master/zetasql/exampl...
> Once you do that, it becomes clear why SELECT first won out originally: legibility and troubleshooting.
Select first was largely an accident of "it sounded better as an English sentence" to the early SQL designers. They were also working with early-era parsers with very limited lookahead, and putting the primary "verb" up front was important at the time.
But English is very flexible, especially in "command syntax" and From first is surprisingly common: "From the middle cupboard, grab a plate". SQL trying to sound like English here only shows how inflexible it still is in comparison to actual English.
I've been using C#'s LINQ since it was added to the language in 2007 and the from/where/join/group by/select order feels great, is very legible especially because it gives you great autocomplete support, and troubleshooting is easier than people think.
https://prql-lang.org/ has a bunch of good examples on its home page.
If you engage the syntax with your System 2 thinking (prefrontal cortex, slow, the part of thinking we're naturally lazy to engage) rather than System 1 (automated, instinctual, optimized brain path to things we're used to) you'll most likely find that it is simpler, makes more logical sense (you're filtering things down naturally, like a sieve), and composes far better than SQL as complexity grows.
After you've internalized that, imagine the kind of developer tooling we can build on top of that logical structure.
> If you engage the syntax with your System 2 thinking (prefrontal cortex, slow, the part of thinking we're naturally lazy to engage) rather than System 1 (automated, instinctual, optimized brain path to things we're used to)
You might not have intended it this way, but your choice of phrasing is very condescending.
Re-reading it, I can see how it could be perceived by some people as such; thanks for pointing it out. There's probably better phrasing, or adding more context could make it more amicable:
The goal was to explicitly tell people not to bother "just reading it" as one (and by one I mean myself and most people I know, surely there are exceptions) is naturally inclined to do unless something is particularly piquing our interest.
Without active, conscious effort, syntax that is different from what we're used to (especially something as established as SQL), where the changes aren't groundbreaking at first glance, can easily make us dismissive without realizing the benefits. After seeing this too many times with all kinds of technologies that stray from the familiar, I just want to prepare the reader so that their judgment can be formed with full use of their faculties rather than as a reflex response.
Edit: In my pre-coffee rush this morning I completely missed the grouping by role (which is not that much harder FWIW). This unfortunately invalidates my entire post as it was posted and I don't want to spread misinfo.
I don't think your alternatives actually solve the same problem. Your alternatives would give you the single most recently joined employee. The actual problem being solved is to find the most recently joined employee in each role.
You'd need to do some grouping in there to be able to get one employee per role instead of a single employee out of the whole data set.
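For anyone following along, the difference between the two problems can be sketched like this with Python's sqlite3. The `employees` table, its columns, and the sample rows are hypothetical and are not taken from the original post. The first query is the "sort and take one" shape; the second groups by role first.

```python
import sqlite3

# Hypothetical data; ISO-8601 date strings sort correctly as text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, role TEXT, join_date TEXT);
    INSERT INTO employees VALUES
        ('Ann', 'engineer', '2021-03-01'),
        ('Bob', 'engineer', '2023-06-15'),
        ('Cat', 'analyst',  '2022-01-10'),
        ('Dan', 'analyst',  '2020-09-30');
""")

# Sort-and-limit: one row for the whole table, regardless of role.
latest_overall = conn.execute("""
    SELECT name, role
    FROM employees
    ORDER BY join_date DESC
    LIMIT 1
""").fetchall()

# Group by role to find each role's latest join date, then join back
# to recover the employee who holds that date.
latest_per_role = conn.execute("""
    SELECT e.name, e.role
    FROM employees e
    JOIN (
        SELECT role, MAX(join_date) AS latest
        FROM employees
        GROUP BY role
    ) m ON m.role = e.role AND m.latest = e.join_date
    ORDER BY e.role
""").fetchall()

print(latest_overall)
print(latest_per_role)
```

A window function such as ROW_NUMBER() partitioned by role would be another way to write the second query; the grouped subquery is used here only because it runs on older SQLite builds too. Note that ties on the latest date would return more than one employee for that role.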
As a test, I refactored a roughly 500-line analytical query that joins more than 20 tables with dozens of complex CTEs, and I can say that this FROM-first syntax is superior to the legacy syntax in almost every single aspect.
> SELECT first won out originally: legibility and troubleshooting.
It's quite interesting to dive into the history of SQL alternatives in the '70s/'80s.
> Once you do that, it becomes clear why SELECT first won out originally: legibility and troubleshooting.
Also, tools can trivially tell DQL from DML by the first word they encounter, barring data-modifying functions (o great heavens, no!).
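A rough sketch of what that first-word check might look like in a tool. The function name `classify_statement` and the keyword set are assumptions for illustration, not any real tool's behaviour, and the sketch deliberately ignores leading comments, dialect-specific keywords, and data-modifying functions.

```python
import re

# Hypothetical helper: classify a statement by its leading keyword only.
DML_KEYWORDS = {"INSERT", "UPDATE", "DELETE", "MERGE"}

def classify_statement(sql: str) -> str:
    """Return 'DQL', 'DML', or 'unknown' based on the first keyword."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    if match is None:
        return "unknown"
    first = match.group(1).upper()
    if first == "SELECT":
        return "DQL"
    if first in DML_KEYWORDS:
        return "DML"
    # Anything else (e.g. WITH, which can introduce either kind of
    # statement) needs real parsing, so this sketch gives up.
    return "unknown"

print(classify_statement("select * from t"))
print(classify_statement("  UPDATE t SET x = 1"))
print(classify_statement("WITH c AS (SELECT 1) SELECT * FROM c"))
```

The WITH case shows the check is already not quite "first word only" in standard SQL once CTEs are involved.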
FROM order is, like, the least offensive and least wrong thing about SQL.
Bikeshedding par excellence.