Comment by suzzer99
6 hours ago
I’ve had to do a ton of SQL stuff lately, which I haven’t really worked with since the late 90s. ChatGPT has been a godsend, not just for me but also for our only coworker who knows SQL well, whom I’d otherwise be bugging several times a day, at my wits’ end.
But no one cares about those kinds of productivity gains. Just the ones that will completely replace us.
I find SQL and data(bases) in general to be LLMs’ Achilles’ heel. Databases are rarely under version control, so the training data only has one half of the knowledge.
My comments are more in the context of OLAP queries and other non-normalised data often queried via SQL.
I train non-LLM transformer models on (older and rarer) datasets. When it comes to automating the ingestion of sprawling datasets with hundreds of columns, often in a variety of local languages, with different naming conventions adopted over decades and quite a few duplicated columns, the LLMs perform badly. It’s nigh impossible for me to test as a user in prod, and it’s nearly impossible for the LLM companies to test in training, so they can’t really apply RLVR or RLHF to it.
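One piece of this can be checked mechanically, whatever the LLM proposes: the duplicated columns. Here is a minimal sketch, assuming the data fits in memory as a mapping of column name to values. The function name `find_duplicate_columns` and the sample column names are made up for illustration.

```python
from collections import defaultdict


def find_duplicate_columns(table):
    """Group column names whose values are identical row for row.

    `table` maps column name -> list of values (a hypothetical in-memory layout).
    """
    groups = defaultdict(list)
    for name, values in table.items():
        # Columns with the same value sequence land in the same bucket.
        groups[tuple(values)].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Made-up sample: one district code stored under three naming conventions.
sample = {
    "district_code": [101, 102, 103],
    "DistCd": [101, 102, 103],
    "kreis_nr": [101, 102, 103],
    "population": [5400, 1200, 880],
}

print(find_duplicate_columns(sample))
```

This only catches exact duplicates. Columns that differ by units, encoding, or a few corrected rows would need a fuzzier comparison.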
I'm the old-school type who writes out a Markdown document explaining what I plan on doing, even if it's generic like "a window with x and y buttons", along with the logic flow. Then I use that to have AI write a plan with me before I send it off to execute it. This has worked super well.
I do enjoy giving the frontier models wacky projects that I can't even find examples of online. I don't expect or need any results; some models have done really well with it while others fall on their face.
I'm always amazed by those comments. Why couldn't you buy a book on SQL[0], and spend a week on it? Or just go over to YouTube for a refresher?
[0]: Like https://www.oreilly.com/library/view/sql-queries-for/9780134...
I'm amazed you think that, instead of using an LLM, someone will go buy a book and spend a week learning something that, judging by the fact that they last used it 30 years ago, likely won't be relevant for them soon.
It's not only that I rarely use it, it's also that it's ugly. It's Relational Cobol. It's as loveable as Oracle. The vendor-specific dialects don't even agree on how to do recursive queries, do they?
Unfortunately I am very good at forgetting things I resented having to learn, and SQL is definitely one of them.
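On recursive queries: the closest thing to common ground is the recursive common table expression. As far as I know, PostgreSQL, SQLite and MySQL 8+ spell it `WITH RECURSIVE`, SQL Server uses plain `WITH`, and Oracle long relied on its own `CONNECT BY`. Here is a minimal sketch run through Python's built-in `sqlite3`. The `employee` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employee VALUES
        (1, 'Ada', NULL),
        (2, 'Bob', 1),
        (3, 'Cy', 2),
        (4, 'Dee', 1);
""")

# Walk the reporting chain from the top of the tree downwards.
REPORTS_SQL = """
WITH RECURSIVE reports(id, name, depth) AS (
    -- Anchor member: employees with no manager.
    SELECT id, name, 0 FROM employee WHERE manager_id IS NULL
    UNION ALL
    -- Recursive member: anyone who reports to a row found so far.
    SELECT e.id, e.name, r.depth + 1
    FROM employee AS e
    JOIN reports AS r ON e.manager_id = r.id
)
SELECT name, depth FROM reports ORDER BY depth, name
"""

rows = conn.execute(REPORTS_SQL).fetchall()
print(rows)  # [('Ada', 0), ('Bob', 1), ('Dee', 1), ('Cy', 2)]
```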
When you have a general idea of what smells bad vs what's okay...why?
I'd rather get it from the LLM and review
This is fine for a moderately sized query. When your queries start taking in 8 joins and 20 fields per table because you're running queries on Presto with 5 TB of data, not only is it drastically better at writing (because it doesn't mess up the fields), but you can also ask it to try the query 5 different ways to help you land on the most efficient one.
That's exactly where I would expect it to fail somewhere, changing some part of the query every time it writes one.
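One way to guard against that is to treat each rewrite as a candidate and check that it returns the same rows as the version you trust before comparing speed. Here is a minimal sketch using SQLite in place of Presto. The helper `same_result`, the tables and the queries are all made up, and on 5 TB you would presumably run this against a sample.

```python
import sqlite3
from collections import Counter


def same_result(conn, query_a, query_b):
    """Return True if both queries yield the same multiset of rows (order ignored)."""
    rows_a = Counter(conn.execute(query_a).fetchall())
    rows_b = Counter(conn.execute(query_b).fetchall())
    return rows_a == rows_b


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER, customer_id INTEGER, total REAL);
    CREATE TABLE customers (id INTEGER, region TEXT);
    INSERT INTO customers VALUES (1, 'north'), (2, 'south');
    INSERT INTO orders VALUES (10, 1, 20.0), (11, 1, 5.0), (12, 2, 7.5);
""")

# Trusted version: a plain join with aggregation.
variant_join = """
    SELECT c.region, SUM(o.total)
    FROM orders o JOIN customers c ON c.id = o.customer_id
    GROUP BY c.region
"""
# Rewrite that should be equivalent on this data: a correlated subquery.
variant_subquery = """
    SELECT c.region,
           (SELECT SUM(o.total) FROM orders o WHERE o.customer_id = c.id)
    FROM customers c
"""
# Rewrite with a silently changed join key (o.id instead of o.customer_id).
variant_wrong = """
    SELECT c.region, SUM(o.total)
    FROM orders o JOIN customers c ON c.id = o.id
    GROUP BY c.region
"""

print(same_result(conn, variant_join, variant_subquery))  # True
print(same_result(conn, variant_join, variant_wrong))     # False
```

Matching output on one dataset does not prove two queries are equivalent in general, but it does catch a join key or field that changed between rewrites.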
This is a great example of AI tech-debt and fragility.
An eight-join query is going to be nigh on unmaintainable should the requirements change, leading to a change-break-change-break spiral as your preferred coding agent tries to fix its previous fixes.
Maybe the wise way to use AI would be to sort out the schema.