Comment by snthpy
13 days ago
Thank you @xlinux and @mbid. Very interesting and not something I knew much about before.
I had a look at eglog and egglog and if I'm understanding things correctly then one possible use case is type inference and optimization. I'm particular I looked at the example in [1].
I'm thinking that this could be useful in the PRQL [2] compiler, in particular for: a) inference of type restrictions on input relations and resultant output relation types, b) optimization of resultant SQL queries.
Would you be able to comment on whether that's correct?
Any links to related examples, papers, or work would be appreciated. Thanks!
1: https://egglog-python.readthedocs.io/latest/tutorials/sklear...
I actually started working on Eqlog because I wanted to use it to implement a type checker. You might want to skim the posts in my series on implementing a Hindley-Milner type system using Eqlog, starting here [1]. The meat is in posts 3 - 5.
The type checker of Eqlog is mostly implement in Eqlog itself [2]. The general idea is that your parser populates a Database with syntax nodes, which are represented as `...Node` types in the Eqlog program at [2], and then you propagate type information with Datalog/Eqlog evaluation. Afterwards, you check whether the Database contains certain patterns that you want to rule out, e.g. a variable that doesn't have a type [3].
There are still some unsolved problems if you're interested in writing the whole type checker in Datalog. For example, variable lookup requires quadratic memory when implemented in Datalog. I mention this and a possible solution at [4]. However, Datalog as is can probably still be useful for some subtasks during type checking. For example, the Rust compiler uses Datalog in some parts of the type checker I believe. Reach out via e.g. github to mbid@ if you'd like to discuss in more detail.
Regarding optimization you probably want to talk with somebody working with egglog, they have a dedicated Zulip [5]. I'd imagine that for prql you want to encode the algebraic rules of pipeline transformations, e.g. associativity of filter over append. Given the query AST, eqlog or egglog would give you all equivalent ways to write the query according to your rules. You'd then select the representation you estimate to be the most performant based on a score you assign to (sub)expression.
[1] https://www.mbid.me/posts/type-checking-with-eqlog-parsing/
[2] https://github.com/eqlog/eqlog/blob/9efb4d3cb3d9b024d681401b...
[3] https://github.com/eqlog/eqlog/blob/9efb4d3cb3d9b024d681401b...
[4] https://www.mbid.me/posts/dependent-types-for-datalog/#morph...
[5] https://egraphs.zulipchat.com
Thank you. Will try to get into this on the weekend. I'll reach out once I can ask a more informed question.