Comment by ignoreusernames

2 days ago

I recommend that anyone who works with databases write a simple engine. It's a lot simpler than you might think, and it's a great exercise. If you're using Python, sqlglot (https://github.com/tobymao/sqlglot) lets you skip all the parsing, and it even does some simple optimizations. From the parsed query tree it's pretty straightforward to build a logical plan and execute it. You can even use Python's builtin ast module to convert SQL expressions into Python ones (so no need for a custom interpreter!)
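
To make the "skip all the parsing" part concrete, here's a minimal sketch of that flow with sqlglot; the toy query and variable names are just illustrative, not from any real system:

    import sqlglot
    from sqlglot import exp

    # Toy query, purely illustrative.
    sql = "SELECT name, AVG(score) AS avg_score FROM todos WHERE done = 1 GROUP BY name"

    # sqlglot does the parsing and hands back a typed expression tree.
    tree = sqlglot.parse_one(sql)

    # Walking the tree gives you the pieces a logical plan needs:
    # projections, source tables, filters, and aggregates are all nodes.
    projections = [e.alias_or_name for e in tree.expressions]
    tables = [t.name for t in tree.find_all(exp.Table)]
    print(projections)  # ['name', 'avg_score']
    print(tables)       # ['todos']

    # The same tree can also be rendered back out in another dialect.
    print(tree.sql(dialect="spark"))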

Sorry for the slight digression.

In a larger system we are building, we need a text-to-SQL capability for some structured data retrieval.

Is there a way to use this library (sqlglot) to build a multi-dialect SQL generator that isn't already covered by relying directly on an LLM that is better at code generation in general?

  • You can use an LLM to generate query-builder expressions from popular libraries in whatever language you're using.

    For example, on the JVM there is jOOQ, which allows you to write something like:

      select(field("foo"), avg(field("bar", SQLDataType.NUMERIC))).from(table("todos"))
    

    And then it will render dialect-specific SQL. It has very advanced emulation functionality for things like JSON aggregations and for working around dialect quirks.

    Alternatively, you can ask an LLM to generate SQL in a specific dialect, use jOOQ to parse it into an AST, and then render it in a different dialect, like:

        val parser = DSL.using(SQLDialect.POSTGRES).parser()
        val parsedQuery = parser.parseQuery(postgresQuery)
        val renderedMySQL = DSL.using(SQLDialect.MYSQL).renderInlined(parsedQuery)
        println(renderedMySQL)
    

    Unsure if functionality like this exists in query-builder libraries for other languages.
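
    For the sqlglot library mentioned in the question, the equivalent parse-then-re-render flow is built in as transpile. A minimal sketch (the Postgres snippet is just a toy example):

        import sqlglot

        postgres_query = "SELECT id, created_at::date FROM todos LIMIT 10"

        # Parse as Postgres, render as MySQL in one call.
        mysql_query = sqlglot.transpile(postgres_query, read="postgres", write="mysql")[0]
        print(mysql_query)

    Same idea as the jOOQ parser/renderer above, just on the Python side.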

Second this - building even a simple engine gives real insight into query planning and execution. Once parsing is handled, the core ideas are a lot more approachable than they seem.