Comment by throwaway_pdp09

6 years ago

Text -> parser -> AST -> job done. If it's any different in an IDE vs anything else I'd like to know how.

Your IDE parser will be unusable if it goes bananas while you're typing the characters needed to get from one fully, correctly parseable state to the next.

It needs to be able to handle:

    printf("hello");

and also:

    prin

and also:

    printf("He

It also needs to be able to autocomplete function signatures that exist below the current line being edited, so the parser can't simply bail out as soon as it reaches the first incomplete or incorrect line.

  • Isn't that pretty standard for a parser? When was the last time a compiler bailed on the very first error it hit and refused to do anything else?

    The solution is to pick synchronisation points to start parsing again, i.e. ; at the end of a statement or } at the end of a block.

    • > When was the last time a compiler bailed on the very first error it hit and refused to do anything else?

      Make still does this. (That's the "Stop." in the famous "* missing separator. Stop.") Many errors in Python still do this.

      As late as 2010 I still saw some major C compilers do this.

      99% of the toy compilers written for DSLs do this, or worse.

      Good error recovery / line blaming is still an active field of development.

      2 replies →

    • How can you be sure that that } is the end of a certain defined block? This most importantly affects the scoping and in many cases it's ambiguous. IDEs do have rich metadata besides from the source code but then parsers should be aware of them.

      2 replies →

Partial parse state and recovery are critical. You don't want the entire bottom half of a file to lose semantic analysis while the programmer figures out what to put before a closing ).

  • Does that issue go away if you use packrat vs some other means?

    • Packrat parsers are notably faster than recursive descent parsers (also critical for IDE use) and by turning them "inside out" (replacing their memoization with dynamic programming) you get a pika parser which has very good recovery ability.

      There are varying techniques to improve error recovery for all forms of parsing but hacked-up recursive descent (certainly the most common kind of parser I still write for my hacked-up DSLs!) have poor error recovery unless you put in the work. Most LR parsers are also awful by default.

      When I was in university most focus was on LL and LR parsers with no discussion of error recovery and more focus on memory usage/bounds than speed. I also have no idea how common it is to teach combinator-based parser grammers these days; stuff like ANTLR and yacc dominated during my studies. This would add another level of unfamiliarity for students going to work on a "real compiler".

      4 replies →