Want to Write a Compiler? Read These Two Papers (2008)

10 years ago (prog21.dadgum.com)

72 comments

rspivak

'Compiler Construction' by Niklaus Wirth[1] is a pretty good book too. It's got the hands-on feel of Crenshaw's book with a quick but, not too superficial, introduction to theory. The book is little more than 100 pages long.

The compiler is written in Oberon however, which is also the source language (actually the source language is Oberon-0, a subset) but, Oberon is super simple and can be learned on the go.

[1] https://www.inf.ethz.ch/personal/wirth/CompilerConstruction/...

gnuvince 10 years ago

I wish someone would take that content and give it proper typesetting; the content is quite good and accessible, but the presentation makes reading quite unpleasant.
pjmlp 10 years ago

Then one can jump into "Project Oberon" and see how to build a graphical workstation OS in a GC enabled systems programming language.
Also accessible from that site.
fanf2 10 years ago

I quite like "understanding and writing compilers" by Richard Bornat - written in the 1970s using BCPL as the implementation language, so rather old-fashioned, but it gives a friendly gentle overview of how to do it, without excessive quantities of parsing theory.
eterps 10 years ago

Oberon was also a major inspiration for the Go language.
fanf2 10 years ago

I quite like "understanding and writing compilers" by Richard Bornat - written in the 1970s using BCPL as the implementation language, so rather old-fashioned, but it gives a friendly gentle overview of how to do it, without excessive quantities of parsing theory.

d0m 10 years ago

I agree. I remember my compiler class where we went through the whole Dragon book and we had a lot of theoretical knowledge, but we were disappointed because we practically couldn't build a compiler. We went in too much detail too fast without having a good understanding and practical examples with compilers.

santaclaus 10 years ago
> the whole Dragon book
I learned compilers from Al Aho, and believe me, the class was no better in person. Mostly in-person waxing poetic about his time at Bell Labs...
- dgacmu 10 years ago
  
  Dinner with Al Aho was one of the highlights of when I visited Columbia a while ago. He's simply an awesome person. And he's done some fantastic work - Aho-Corasick is a great algorithm. Alas, the Dragon Book was one of the low points of my undergraduate education, though I had a great teacher who pulled out a terrifically fun class despite the book. (Thanks, +Wilson Hsieh.)
  In fairness, though -- or more an admission of how wrong we can be about things like this -- I subscribed to the "parsing is boring and solved" belief until Bryan Ford's Packrat Parsing work made me realize that it had just gotten stuck at a traffic light for a few decades.
  
  13 replies →
ww520 10 years ago

I went through a undergrad compiler class with the Dragon book, not the whole thing though, just enough chapters to build a compiler. We did build a full fledged compiler for an OO language, with all the parts of a compiler all the way to code generating assembly. It's kind of fast pace with too many stuff crammed into a class. Some theoretical topics were kind of fuzzy at the time. Still it was one of the best classes I have ever attended. Later on when I re-read the book on my own pace, I thought the book was the best thing since slice bread. Got a much better understanding of the theories behind the compiler topics. A lot of the things I learned I can still apply to this day.
ChuckMcM 10 years ago

Interesting, I took compilers where the Dragon book was the standard text book, we implemented a PASCAL like language (which compiled for the DEC10 as a target) and it was one of the really transformative classes I had taken. It brought together algorithms and data structures into a curriculum where they joined up to convert text into executable code. Really one of those moments of sudden clarity for me.
YZF 10 years ago

In my compiler class we also used the Dragon book and we did write a compiler for a C like language... I don't think we covered the book in its entirety though but sounds like that was a better mix. I felt quite capable of writing simple compilers after I finished the course...
vidarh 10 years ago

I loved the Dragon book, but I agree not a practical book. I probably loved it because at the time I'd already been experimenting with compiler writing for 5-6 years, and it gave me a theoretical grounding for a lot of stuff I'd been doing or doing subtly "wrong".
I think it - and similar books - should be banished from first compiler classes, and instead be used later. Much more practical books like Wirth's that actually walk you through writing a compiler would be much better for most beginners.

Animats 10 years ago

I was amused by "The authors promote using dozens or hundreds of compiler passes, each being as simple as possible."

That's kind of a desperation move. There was a COBOL compiler for the IBM 1401 with about 79 passes.[1] They only had 4K or so of memory, but they had fast tape drives, so each pass read the previous intermediate form, did some processing on it, and wrote out the next intermediate form. Except for the passes which did sorts.

[1] http://bitsavers.informatik.uni-stuttgart.de/pdf/ibm/140x/C2...

hawkice 10 years ago

Not a bad way of organizing code, though, at least for relative beginners like me. I'm sure there are more clever things to do, or things that aren't quite possible to do reasonably or efficiently with independent small passes, but I like the way it lets me chunk everything I want to do. So maybe it started because there wasn't any other option -- but pretty good advice in the context of the article.
sklogic 10 years ago
This is the only sane approach. Small passes are easy to write, easy to understand and easy to reason about.
And if yoy worry about performance, your compiler can fuse passes together in many cases.
- vidarh 10 years ago
  
  Have you tried this approach? If so I'm curious to hear about experiences.
  My own experience with far fewer stages is that while it becomes easy to understand what each stage does and how, it becomes hard to keep track of how each stage interact, as each intermediate output in effect becomes a language dialect of sorts.
  I'm still not decided on whether it's a net win or loss.
  
  10 replies →
srean 10 years ago

Now, if they were written as generators or coroutines you may have a winner (RAM allowing).

bonobo3000 10 years ago

If you're totally new to this, Peter Norvigs guide to a scheme interpreter in Python[1][2] is the best! Its short and sweet and gave me enough of an idea of the basic parts to start piecing stuff together.

[1] http://norvig.com/lispy.html [2] http://norvig.com/lispy2.html

lmm 10 years ago

Post-2008 I'd really push http://www.hokstad.com/compiler . Writing a compiler in the same way you'd write an ordinary program, this was the first explanation where I actually understood the rationale for the choices being made.

vidarh 10 years ago
Thanks for the recommendation, though I'd give the caveat that I've been rambling all over the place, especially in the later parts. I also have lots of changes after the latest part that I need to write about...
I'd really like to get a chance to take the time to retrospectively go over and tighten it up. The trouble with that is that it's easily 10 times+ as much effort to follow along with a project this way and write about each change as it is to cover the finished code (especially thorny issues such as how to avoid wasting too much time on bug fixes that may or may not have affected understanding of previous parts).
Personally, while I'm very happy that people find my series useful, I'd recommend perhaps supplementing or starting with Niklaus Wirth's books or Crenshaws "Let's write a compiler" (one of the ones recommended in the linked artile) for something much more focused. I feel the angles are sufficiently different that it's worth it. And Wirth is my big hero when it comes to language and compiler construction.
While the other suggestion in the article is well worth a read, I'd give one caveat: Lots of passes is hard to keep mental track of. My compiler sort-of does something similar in applying multiple transformation stages to the AST, and I'm mulling collapsing it into fewer stages rather than more because I feel it's getting more complex than necessary. You can work around some of the debugging complexity by outputting each stage to a suitable format so you can test each transformation individually, but it's still hard to keep track of exactly what each stage expects and delivers.
- lmm 10 years ago
  
  > You can work around some of the debugging complexity by outputting each stage to a suitable format so you can test each transformation individually, but it's still hard to keep track of exactly what each stage expects and delivers.
  Honestly this sounds like a good case for a type system - which of course isn't available in ruby.
  
  2 replies →
sklogic 10 years ago
But you should not write your compiler the same way you write your ordinary, complex, convoluted, Turing-complete programs.
Compilers are far simpler than that. You do not need a Turing-complete language to write a compiler.
- lmm 10 years ago
  
  Maybe. Non-turing-complete type systems seem to be very limited.

toolslive 10 years ago

"Implementing functional languages: a tutorial" Simon Peyton Jones http://research.microsoft.com/en-us/um/people/simonpj/Papers...

Is very good and shows different strategies for the runtime.

emmanueloga_ 10 years ago

On a related subject, two good resources to learn how to write interpreters are [1] and [2]. The nice thing about approaching programming language implementation with Scheme is that the whole parsing subject is unneeded (the interpreters work by manipulating S-expressions directly), or can at least be delayed until later.

1: http://cs.brown.edu/courses/cs173/2012/ 2: http://www.eopl3.com/

userbinator 10 years ago

There's an x86 version of Crenshaw's excellent tutorial here: http://www.pp4s.co.uk/main/tu-trans-comp-jc-intro.html

I'd definitely recommend it highly; the only way it could be better is if it arrived at a compiler that could compile itself.

Another article which I think everyone attempting to write a compiler should read is https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm - how to simply and efficiently parse expressions.

cbsmith 10 years ago

I haven't read the Nanopass Framework paper, but...

The idea of doing tons of compiler passes works well in a functional context, where functional separation i the preferred method of abstraction, and you can use composition to effectively transform all your passes into a single pass.

However, actually doing that many passes seems like it is going to end badly. Even if you don't touch disk, you're doing mean things to the memory subsystem...

exDM69 10 years ago

Doing many passes seems to be working very well in LLVM, which is definitely not targeted at or implemented with functional languages and is not a toy compiler framework for educational purposes but a full-blown industrial strength compiler framework.
Yes, you need to take care of garbage collection (in one way or another) when you're manipulating tree and graph structures.
I can't share your opinion about "ending badly". This has been shown to be a good strategy in practice.
cbsmith 10 years ago

I always notice how badly the phone mucked it up after the edit rule passes... That should be:
"The idea of doing tons of compiler passes works well in a functional programming context, where functional separation is the preferred method of abstraction, and you can use composition to effectively transform all your passes into a single pass."
sklogic 10 years ago
It is much more memory-efficient than thrashing a single graph over and over again. Rewriting and compacting trees is good for cache locality.
Also, for the trivial rewrites (e.g., variable renaming) your compiler can safely choose to replace a cooying rewrite with a destructive update.
- cbsmith 10 years ago
  
  > It is much more memory-efficient than thrashing a single graph over and over again. Rewriting and compacting trees is good for cache locality.
  To be clear: if you are thrashing a single graph over-and-over again, you are doing many passes. Rewriting and compacting trees is just a way to make the multiple passes a bit less painful.
  
  9 replies →
pilingual 10 years ago

Its purpose is for education. It's a great course, and I wish Dybvig would create a Coursera.
mamcx 10 years ago

Any example on how do nano-pass with F# or oCalm?

PaulHoule 10 years ago

I like the nanopass concept, particularly I am applying it to domain specific languages. It is not necessarily so inefficient if it is optimized. For instance, data indexing and methods such as RETE networks can make the cost of running an irrelevant pass close to nothing.

fuzzieozzie 10 years ago

You want to be paid to write a compiler? Want to learn some techniques not yet covered in the literature?

Take a look at http://www.compilerworks.com/dev.html

kybernetyk 10 years ago

>Now imagine that it's more than just a poor choice, but that all the books on programming are at written at that level.

Hmm, I personally found the "Dragon Book" very accessible. And I'm someone without formal CS education.