Comment by ignoreusernames
1 day ago
I recommend anyone who works with databases to write a simple engine. It's a lot simpler than you may think and it's a great exercise. If using python, sqlglot (https://github.com/tobymao/sqlglot) let's you skip all the parsing and it even does some simple optimizations. From the parsed query tree it's pretty straightforward to build a logical plan and execute that. You can even use python's builtin ast module to convert sql expressions into python ones (so no need for a custom interpreter!)
Sorry for slight digression.
In a larger system we are building we need a text-to-sql capability for some structured data retrieval.
Is there a way one could utilize this library (sqlglot) to build a multi-dialect sql generator -- that is not currently solved by directly relying on a LLM that is better at code generation in general?
You can use an LLM to generate query-builder expressions from popular libraries in whatever language.
For example, on the JVM there is jOOQ, which allows you to write something like:
And then it will render dialect-specific SQL. It has very advanced emulation functionality for things like JSON aggregations and working around quirks of dialects.
Alternatively, you can ask an LLM to generate a specific dialect of SQL, and then use jOOQ to parse it to an AST, and then render it as a different dialect, like:
Unsure if functionality like this exists in other Query Builder libraries for other languages.
This is a SQL to X library, though. I don’t think it’s what you need.
Second this - building even a simple engine gives real insight into query planning and execution. Once parsing is handled, the core ideas are a lot more approachable than they seem.