Comment by estebank

5 years ago

As you allude to, treating a compiler as an isolated batch process is an outdated strategy. Any compiler started today should be designed such that it can be used in an IDE from the start, even if you don't implement it in that way until after you've reached a certain level of maturity.

Error messages in particular are a very expansive topic, because a compiler might have the following components:

    Lexer
    Parser
    Macro Expansion
    Name Resolution
    Lowering
    Type Checking
    Privacy Checking
    Lifetime Analysis
    Code Generation
    Linking

and all of them can produce errors. If you introduce error recovery in your parser, then you'd better make sure you're carrying that information forward to name resolution and type checking. If you introduce an error type placeholder, you'd better account for it during the rest of your analysis, to avoid complaining about nonexistent methods on something that wasn't resolved in the first place. Then there is error deduplication. And exploring the grammar space to identify things that a human may attempt but that aren't allowed for good reason, and that will not blow up until some quasi-random stage. Now that I think about it, you could write a whole book about this topic.
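To make the error type placeholder and the deduplication points concrete, here is a minimal sketch in Rust. Everything in it (`Ty`, `Expr`, `Checker`, the variables `count` and `label`, and the two known methods) is hypothetical and invented for illustration; it is not how rustc or any particular compiler is structured. The idea is only that a failed lookup yields `Ty::Error`, and any later check that sees `Ty::Error` stays silent instead of reporting a follow-on error.

```rust
use std::collections::HashSet;

/// Types known to this toy checker. `Error` is the placeholder meaning
/// "something already went wrong here, and it has been reported".
#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Str,
    Error,
}

/// A tiny expression language: variables and method calls.
enum Expr {
    Var(String),
    MethodCall { receiver: Box<Expr>, method: String },
}

#[derive(Default)]
struct Checker {
    diagnostics: Vec<String>,
    seen: HashSet<String>,
}

impl Checker {
    /// Record a diagnostic once; an identical message is dropped.
    /// (Assumption: a real compiler would likely key on source span too.)
    fn report(&mut self, msg: String) {
        if self.seen.insert(msg.clone()) {
            self.diagnostics.push(msg);
        }
    }

    fn check(&mut self, expr: &Expr) -> Ty {
        match expr {
            Expr::Var(name) => match name.as_str() {
                "count" => Ty::Int,
                "label" => Ty::Str,
                _ => {
                    self.report(format!("cannot find value `{name}` in this scope"));
                    Ty::Error
                }
            },
            Expr::MethodCall { receiver, method } => {
                match (self.check(receiver), method.as_str()) {
                    // The receiver already failed: stay silent and propagate.
                    (Ty::Error, _) => Ty::Error,
                    (Ty::Str, "len") => Ty::Int,
                    (Ty::Int, "to_string") => Ty::Str,
                    (ty, _) => {
                        self.report(format!("no method `{method}` found for type `{ty:?}`"));
                        Ty::Error
                    }
                }
            }
        }
    }
}
```

With this shape, checking something like `missing.len().to_string()` produces a single "cannot find value" diagnostic rather than one error per method call in the chain, and checking the same expression twice does not repeat it.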

What is lowering?

  • Some languages require higher-level internal representations ("IRs") to be "lowered" to lower-level representations before code generation. In other words, the compiler takes high-level semantics or syntactic sugar and transforms them into simpler constructs that are easier to compile.

    For example, a language could support special iterator semantics that will be "lowered" into a simple for loop. Or a for loop itself may be lowered into a do-while loop, depending on the design of the lowered IR.
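    As a rough illustration of that kind of lowering, the sketch below turns a counted `for` loop into an initialization plus a plain conditional loop. Note that it lowers to a `while`-style loop rather than a do-while, purely to keep the example short. The `Surface` and `Core` enums and the `lower` and `run` functions are hypothetical names made up for this example, not taken from any real compiler.

```rust
use std::collections::HashMap;

/// Surface syntax: what the user wrote.
#[derive(Debug, Clone, PartialEq)]
enum Surface {
    /// `for var in start..end { body }`
    ForRange { var: String, start: i64, end: i64, body: Vec<Surface> },
    /// `print(var)`
    Print(String),
}

/// Lowered IR: no `for`, only assignment and a conditional loop.
#[derive(Debug, Clone, PartialEq)]
enum Core {
    /// `let var = value`
    Let { var: String, value: i64 },
    /// `while var < limit { body }`
    WhileLess { var: String, limit: i64, body: Vec<Core> },
    /// `var = var + 1`
    Increment(String),
    Print(String),
}

/// Rewrite every `for` into `let` + `while`, appending the increment
/// to the end of the loop body.
fn lower(stmts: &[Surface]) -> Vec<Core> {
    let mut out = Vec::new();
    for stmt in stmts {
        match stmt {
            Surface::Print(var) => out.push(Core::Print(var.clone())),
            Surface::ForRange { var, start, end, body } => {
                let mut lowered_body = lower(body);
                lowered_body.push(Core::Increment(var.clone()));
                out.push(Core::Let { var: var.clone(), value: *start });
                out.push(Core::WhileLess {
                    var: var.clone(),
                    limit: *end,
                    body: lowered_body,
                });
            }
        }
    }
    out
}

/// A later stage (here a toy interpreter) only has to understand `Core`.
/// Unset variables read as 0 in this sketch.
fn run(program: &[Core], env: &mut HashMap<String, i64>, printed: &mut Vec<i64>) {
    for stmt in program {
        match stmt {
            Core::Let { var, value } => {
                env.insert(var.clone(), *value);
            }
            Core::WhileLess { var, limit, body } => {
                while env.get(var).copied().unwrap_or(0) < *limit {
                    run(body, env, printed);
                }
            }
            Core::Increment(var) => {
                *env.entry(var.clone()).or_insert(0) += 1;
            }
            Core::Print(var) => printed.push(env.get(var).copied().unwrap_or(0)),
        }
    }
}
```

    The point of the design is that `run` never mentions `ForRange`: once lowering has happened, every later stage deals with the smaller `Core` vocabulary only.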

    • To give you an example, rustc has an AST and a high-level intermediate representation tree called the HIR. The former can encode anything that is syntactically valid (and some things that aren't), while the HIR only encodes things that are valid and transforms things like `if foo {} else {}` to `match foo { true => {} false => {}}`. That makes this internal representation more fundamental: there are fewer constructs to care about, and some validations that would otherwise have to be written for every alternative user-visible construct can be unified. It also means later stages of the compiler don't have to care about how the code was written, just about this representation.
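    A toy version of the `if` to `match` transformation might look like the sketch below. The `Ast` and `Hir` enums here are hypothetical and far simpler than rustc's real data structures; they only model booleans, integers, and a `match` whose arm patterns are `true` or `false`. The sketch also shows one validation (`is_exhaustive`, an invented name) written once against the lowered form and therefore covering both `if` and `match` in the source.

```rust
/// Toy AST: has both `if` and `match`.
#[derive(Debug, Clone, PartialEq)]
enum Ast {
    Bool(bool),
    Int(i64),
    If { cond: Box<Ast>, then_branch: Box<Ast>, else_branch: Box<Ast> },
    /// Arms are `(pattern, body)` where the pattern is `true` or `false`.
    Match { scrutinee: Box<Ast>, arms: Vec<(bool, Ast)> },
}

/// Toy HIR: `if` no longer exists, only `match`.
#[derive(Debug, Clone, PartialEq)]
enum Hir {
    Bool(bool),
    Int(i64),
    Match { scrutinee: Box<Hir>, arms: Vec<(bool, Hir)> },
}

/// Lower the AST, rewriting `if c { a } else { b }` into
/// `match c { true => a, false => b }`.
fn lower(ast: &Ast) -> Hir {
    match ast {
        Ast::Bool(b) => Hir::Bool(*b),
        Ast::Int(n) => Hir::Int(*n),
        Ast::If { cond, then_branch, else_branch } => Hir::Match {
            scrutinee: Box::new(lower(cond)),
            arms: vec![(true, lower(then_branch)), (false, lower(else_branch))],
        },
        Ast::Match { scrutinee, arms } => Hir::Match {
            scrutinee: Box::new(lower(scrutinee)),
            arms: arms.iter().map(|(pat, body)| (*pat, lower(body))).collect(),
        },
    }
}

/// One validation, written once against the HIR: every `match` must
/// cover both `true` and `false`, recursively.
fn is_exhaustive(hir: &Hir) -> bool {
    match hir {
        Hir::Bool(_) | Hir::Int(_) => true,
        Hir::Match { scrutinee, arms } => {
            let covers = |value: bool| arms.iter().any(|(pat, _)| *pat == value);
            covers(true)
                && covers(false)
                && is_exhaustive(scrutinee)
                && arms.iter().all(|(_, body)| is_exhaustive(body))
        }
    }
}
```

    Because `If` always lowers to a two-armed `match`, it passes `is_exhaustive` by construction, while a hand-written `match` with a missing arm is caught by the same check.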