Comment by AndyKelley

2 days ago

OK, parser then:

https://github.com/rust-lang/rust/tree/1.87.0/compiler/rustc... (main logic seems to be in expr.rs)

vs

https://github.com/ziglang/zig/blob/0.14.1/lib/std/zig/Parse...

Again, for those who wish to form their own opinions.

I think a reasonable comparison would have to be DoD Rust parser vs current Rust parser. Comparing across languages isn't very useful, because Zig has very different syntax rules, and doesn't provide diagnostics anywhere near the level that Rust does. The Rust compiler (and also its parser) spends an incredible amount of effort on diagnostics, to the point of actually trying to parse syntax from other languages (e.g. Python), just to warn people not to use Python syntax in Rust. Not to mention that it needs to deal with decl and proc macros, intertwine that with name resolution, etc. This all of course hurts parsing performance quite a lot, and IMO it would make it much harder to write the whole thing in DoD, and the DoD performance benefits would also not be that big, because of all the heterogeneous functionality the Rust frontend handles. Those are of course deliberate decisions of Rust that favor other things over compilation performance.

  • [edited to correct formatting]

    Your points here don't really make sense. There are many ways you can apply DoD to a codebase, but by far the main one (both easiest and most important) is to optimize the in-memory layout of long-lived objects. I won't claim to be familiar with the Rust compiler pipeline, but for most compilers, that means you'd have a nice compact representation for a `Token` and `AstNode` (or whatever you call those concepts), but the code between them -- i.e. the parser -- isn't really affected. In other words, all the fancy features you describe -- macros intertwined with name resolution, parsing syntax from other languages, high-quality diagnostics -- don't care about DoD! Our approach in the Zig compiler has evolved over time, but we're slowly converging towards a style where all of the access to the memory-efficient dense representation is abstracted behind functions. So, you write your actual processing (e.g. your parser with all the features you mention) just the same; the only real difference is that when your parser wants to, for instance, get a token (as input) or emit an AST node (as output), it calls functions to do that, and those functions pull out the bytes you need into a lovely `struct` or (in Rust terms) `enum` or whatever the case may be.
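
    To make that concrete, here is a rough Rust sketch of the idea. It is not code from the Zig compiler or rustc: `TokenList`, `TokenTag`, `scan_len` and the tiny token grammar are all made up for illustration. Tokens live in two parallel arrays (a one-byte tag and a four-byte start offset per token), and `get` rebuilds a convenient `Token` value on demand, so a parser written on top of it never touches the packed layout:

```rust
// Hypothetical sketch: names and token grammar are invented for illustration.

/// One byte per token kind.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
#[repr(u8)]
enum TokenTag {
    Ident,
    Number,
    Plus,
    Invalid,
    Eof,
}

/// The convenient view handed to the parser; never stored in bulk.
#[derive(Debug, PartialEq, Eq)]
struct Token<'src> {
    tag: TokenTag,
    text: &'src str,
}

/// Dense storage: parallel arrays instead of a Vec of fat structs.
struct TokenList<'src> {
    source: &'src str,
    tags: Vec<TokenTag>,
    starts: Vec<u32>,
}

/// Length in bytes of a token of kind `tag` that begins at the start of `rest`.
fn scan_len(tag: TokenTag, rest: &str) -> usize {
    match tag {
        TokenTag::Ident => rest
            .bytes()
            .take_while(|b| b.is_ascii_alphanumeric() || *b == b'_')
            .count(),
        TokenTag::Number => rest.bytes().take_while(|b| b.is_ascii_digit()).count(),
        TokenTag::Plus => 1,
        TokenTag::Invalid => rest.chars().next().map_or(0, |c| c.len_utf8()),
        TokenTag::Eof => 0,
    }
}

impl<'src> TokenList<'src> {
    /// "Write" side: the only place that knows how tokens get packed.
    fn tokenize(source: &'src str) -> Self {
        let bytes = source.as_bytes();
        let (mut tags, mut starts) = (Vec::new(), Vec::new());
        let mut i = 0;
        while i < bytes.len() {
            let b = bytes[i];
            if b.is_ascii_whitespace() {
                i += 1;
                continue;
            }
            let tag = if b.is_ascii_alphabetic() || b == b'_' {
                TokenTag::Ident
            } else if b.is_ascii_digit() {
                TokenTag::Number
            } else if b == b'+' {
                TokenTag::Plus
            } else {
                TokenTag::Invalid
            };
            tags.push(tag);
            starts.push(i as u32);
            i += scan_len(tag, &source[i..]);
        }
        tags.push(TokenTag::Eof);
        starts.push(source.len() as u32);
        TokenList { source, tags, starts }
    }

    /// "Read" side: unpack one token into an ordinary value.
    fn get(&self, index: usize) -> Token<'src> {
        let source: &'src str = self.source;
        let tag = self.tags[index];
        let start = self.starts[index] as usize;
        let len = scan_len(tag, &source[start..]);
        Token { tag, text: &source[start..start + len] }
    }
}
```

    With this, `TokenList::tokenize("foo + 42").get(0)` gives back an `Ident` token whose text is `foo`. The token's end is not stored at all; it is recomputed by re-scanning from the start offset, which trades a little work per access for a smaller footprint.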

    Our typical style in Zig, or at least what we tend to do when writing DoD structures nowadays, is to have the function[s] for "reading" that long-lived data (e.g. getting a single token out from a memory-efficient packed representation of "all the tokens") in the implementation of the DoD type, and the functions for "writing" it in the one place that generates that thing. For instance, the parser has functions to deal with writing a "completed" AST node to the efficient representation it's building, and the AST type itself has functions (used by the next phase of the compiler pipeline, in our case a phase called AstGen) to extract data about a single AST node from that efficient representation. That way, barely any code has to actually be aware of the optimized representation being used behind the scenes. As mentioned above, what you end up with is that the actual processing phases look more-or-less identical to how they would without DoD.
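
    As a rough illustration of that split, again in Rust and again with invented names (`Ast`, `Parser::add_node`, `eval` are assumptions for this sketch, not the Zig compiler's actual types): the producer owns the "write" function, the AST type owns the "read" function, and a later phase only ever sees the decoded `Node` enum.

```rust
// Hypothetical sketch: a toy expression AST, not the real Zig or rustc layout.

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct NodeIndex(u32);

/// Decoded, convenient view of a single node.
#[derive(Debug, PartialEq, Eq)]
enum Node {
    Int(u32),
    Add { lhs: NodeIndex, rhs: NodeIndex },
    Neg(NodeIndex),
}

/// One-byte discriminant stored in the dense representation.
#[derive(Clone, Copy)]
#[repr(u8)]
enum Tag {
    Int,
    Add,
    Neg,
}

/// Dense representation: a tag array plus two u32 payload slots per node.
#[derive(Default)]
struct Ast {
    tags: Vec<Tag>,
    data: Vec<[u32; 2]>,
}

impl Ast {
    /// "Read" side, implemented on the DoD type itself.
    fn get(&self, idx: NodeIndex) -> Node {
        let [a, b] = self.data[idx.0 as usize];
        match self.tags[idx.0 as usize] {
            Tag::Int => Node::Int(a),
            Tag::Add => Node::Add { lhs: NodeIndex(a), rhs: NodeIndex(b) },
            Tag::Neg => Node::Neg(NodeIndex(a)),
        }
    }
}

/// The producer; the "write" side lives here and nowhere else.
struct Parser {
    ast: Ast,
}

impl Parser {
    /// Pack a completed node into the dense arrays and return its index.
    fn add_node(&mut self, node: Node) -> NodeIndex {
        let (tag, data) = match node {
            Node::Int(value) => (Tag::Int, [value, 0]),
            Node::Add { lhs, rhs } => (Tag::Add, [lhs.0, rhs.0]),
            Node::Neg(operand) => (Tag::Neg, [operand.0, 0]),
        };
        let idx = NodeIndex(self.ast.tags.len() as u32);
        self.ast.tags.push(tag);
        self.ast.data.push(data);
        idx
    }
}

/// A later phase: reads nodes through `get` and never sees the packed arrays.
fn eval(ast: &Ast, idx: NodeIndex) -> i64 {
    match ast.get(idx) {
        Node::Int(value) => i64::from(value),
        Node::Add { lhs, rhs } => eval(ast, lhs) + eval(ast, rhs),
        Node::Neg(operand) => -eval(ast, operand),
    }
}
```

    Note that `eval` is written exactly as it would be against a boxed tree of enums; only `add_node` and `get` know that a node is really a tag byte plus two `u32` slots.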

    FWIW, I don't think the parser is our best code here: it's one of the oldest "DoD-ified" things in the Zig codebase, so it has some outdated patterns and questionable naming. Personally, I'm partial to `ZonGen`[0] as a fairly good example of a "processing" phase (although I'm admittedly biased!). It takes an AST as input and outputs a simple tree IR for a subset of Zig which is analogous to JSON. Then, for an example of code consuming that generated IR, take a look at `print_zoir`[1], which just dumps the tree to stdout (or whatever) for debugging purposes. The interesting logic is in `PrintZon.renderNode` in that file: note how it calls `node.get`, and then just has a nice convenient tagged union (`enum` in Rust terms) value to work with.

    [0]: https://github.com/ziglang/zig/blob/dd75e7bcb1fe142f4d60dc2d...

    [1]: https://github.com/ziglang/zig/blob/dd75e7bcb1fe142f4d60dc2d...