Comment by santaclaus
10 years ago
> the whole Dragon book
I learned compilers from Al Aho, and believe me, the class was no better in person. Mostly in-person waxing poetic about his time at Bell Labs...
Dinner with Al Aho was one of the highlights of when I visited Columbia a while ago. He's simply an awesome person. And he's done some fantastic work - Aho-Corasick is a great algorithm. Alas, the Dragon Book was one of the low points of my undergraduate education, though I had a great teacher who pulled out a terrifically fun class despite the book. (Thanks, +Wilson Hsieh.)
In fairness, though -- or more an admission of how wrong we can be about things like this -- I subscribed to the "parsing is boring and solved" belief until Bryan Ford's Packrat Parsing work made me realize that it had just gotten stuck at a traffic light for a few decades.
Veering a bit off-topic, but does anyone have any pointers to important recent work on parsing? There are a lot of papers out there and I guess I don't know how to sift through them. I've heard the "parsing is solved" line before, but so much of my time is spent doing some type of parsing that even incremental improvements are extremely interesting.
It really depends on the kind of parsing that you're doing.
Full context-free grammars are supported by "generalised parsing". The older stuff is GLR by Tomita, Earley parsing, and the CYK algorithm. The newer stuff based on Tomita is the particular rabbit hole I stuck with for a while. I read about SGLR, which eliminates lexers, and GLL, which is GLR but based on the LL algorithm. The people who do GLL research also made improvements to SGLR, with better speed on right-nulled grammars. Then there are the SGLR improvements with disambiguation filters, automatic derivation of error recovery with island grammars, etc. The disambiguation filters include a kind of negation, making the current implementation of SGLR for SDF capable of parsing Boolean grammars, which are a superset of context-free grammars.
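To make the "older stuff" concrete, here is a minimal CYK recognizer in Python. It handles any grammar in Chomsky normal form; the grammar and rule names below are a made-up toy example, not from any of the systems mentioned above.

```python
def cyk(word, binary_rules, terminal_rules, start):
    """Return True if `word` is derivable from `start`.

    binary_rules:   list of (lhs, (B, C)) for rules lhs -> B C
    terminal_rules: list of (lhs, ch)     for rules lhs -> ch
    """
    n = len(word)
    if n == 0:
        return False  # this sketch does not handle nullable grammars
    # table[span-1][i] holds the nonterminals deriving word[i:i+span]
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, ch in enumerate(word):
        for lhs, t in terminal_rules:
            if t == ch:
                table[0][i].add(lhs)
    for span in range(2, n + 1):          # substring length
        for i in range(n - span + 1):     # start position
            for split in range(1, span):  # where to cut the substring
                for lhs, (b, c) in binary_rules:
                    if (b in table[split - 1][i]
                            and c in table[span - split - 1][i + split]):
                        table[span - 1][i].add(lhs)
    return start in table[n - 1][0]

# Toy grammar: S -> A B, A -> 'a', B -> 'b'
binary_rules = [("S", ("A", "B"))]
terminal_rules = [("A", "a"), ("B", "b")]
print(cyk("ab", binary_rules, terminal_rules, "S"))  # True
print(cyk("ba", binary_rules, terminal_rules, "S"))  # False
```

The triple loop is where the O(n³) cost comes from; GLR and Earley get better behavior on the deterministic parts of a grammar, which is why they displaced plain CYK in practice.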
Anyway, there's more than context-free grammars. Definitely look into data-dependent grammars! They can define a lot of interesting things, like network protocol formats where you parse the length of the payload, then based on that length you know how much to interpret as the payload before you read the footer of the message. And you write all of that in a more declarative way and get a nice parser generated from it.
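The length-prefixed-payload pattern described above, written by hand, looks something like this sketch (the message layout is hypothetical; a data-dependent grammar tool would generate equivalent code from a declarative rule):

```python
import struct

def parse_message(data):
    """Parse: 2-byte big-endian length | payload | 1-byte footer."""
    (length,) = struct.unpack_from(">H", data, 0)  # read the length field first
    payload = data[2:2 + length]   # data-dependent: how much to consume
                                   # comes from a value parsed earlier
    footer = data[2 + length]      # only now do we know where the footer is
    return length, payload, footer

msg = struct.pack(">H", 5) + b"hello" + bytes([0xFF])
print(parse_message(msg))  # (5, b'hello', 255)
```

This is exactly the dependency a plain context-free grammar cannot express: the structure of the rest of the input depends on a value parsed earlier.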
There is so much more, but I think I should stop now :)
I can probably help if you can narrow things down a bit. Are you parsing deterministic languages (e.g. for programming languages), general context-free languages, binary file formats, or something else?
Because parsing attracts little active research interest and most of the work happened more than 20 years ago, there are a lot of techniques from the 70s, 80s, and 90s that remain relatively unknown.
> Veering a bit offtopic, but does anyone have any pointers to important recent work on parsing?
Parser combinators are a relatively modern topic in parsing. The first research on the topic dates to the late 1980s, but parser combinators became popular after Parsec [0], a practical implementation of parser combinators in Haskell. These ideas have since been borrowed by many implementations in different languages.
[0] Daan Leijen: "Parsec, a fast combinator parser", 2001 http://research.microsoft.com/en-us/um/people/daan/download/...
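For anyone who hasn't seen the style, here is a minimal parser-combinator sketch in Python, loosely in the spirit of Parsec. All names are illustrative, not Parsec's API: a parser is a function from (string, position) to (value, new position), or None on failure, and combinators build bigger parsers out of smaller ones.

```python
def char(c):
    """Parser that matches exactly the character c."""
    def p(s, i):
        if i < len(s) and s[i] == c:
            return (c, i + 1)
        return None
    return p

def seq(*parsers):
    """Run parsers one after another, collecting their results."""
    def p(s, i):
        values = []
        for q in parsers:
            r = q(s, i)
            if r is None:
                return None
            v, i = r
            values.append(v)
        return (values, i)
    return p

def alt(*parsers):
    """Try each parser at the same position; first success wins."""
    def p(s, i):
        for q in parsers:
            r = q(s, i)
            if r is not None:
                return r
        return None
    return p

def many(q):
    """Apply q zero or more times."""
    def p(s, i):
        values = []
        while True:
            r = q(s, i)
            if r is None:
                return (values, i)
            v, i = r
            values.append(v)
    return p

# Grammar: ("ab" | "cd")*
word = alt(seq(char("a"), char("b")), seq(char("c"), char("d")))
print(many(word)("abcdab", 0))  # ([['a', 'b'], ['c', 'd'], ['a', 'b']], 6)
```

The appeal is that the grammar and the parser are the same program, so ordinary abstraction (functions, modules) works on grammars too.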
Check out ANTLR, which is probably the best modern parser generator. ANTLR 4 uses "adaptive LL(*)" (aka "ALL(*)") as its underlying theory. http://www.antlr.org/papers/allstar-techreport.pdf
PEG (Packrat) is hot now.
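The packrat idea is just recursive-descent PEG parsing with memoization, which is what buys the linear-time guarantee. A tiny hand-rolled sketch (the grammar is a made-up example: `Expr <- Term ('+' Term)*`, `Term <- [0-9]+`):

```python
from functools import lru_cache

def parse(s):
    """Return True if s matches Expr <- Term ('+' Term)* completely."""
    @lru_cache(maxsize=None)   # memoization: each rule runs at most
    def term(i):               # once per input position
        j = i
        while j < len(s) and s[j].isdigit():
            j += 1
        return j if j > i else None   # position after the match, or failure

    @lru_cache(maxsize=None)
    def expr(i):
        j = term(i)
        if j is None:
            return None
        while j < len(s) and s[j] == "+":
            k = term(j + 1)
            if k is None:
                break  # PEG-style: stop repeating, keep what matched so far
            j = k
        return j

    return expr(0) == len(s)

print(parse("1+23+4"))  # True
print(parse("1++2"))    # False
```

In this toy grammar every position is visited once anyway; the memo tables start to pay off when alternatives cause the same rule to be retried at the same position.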