Comment by projektfu

6 days ago

The dragon book almost convinced me never to try to write a compiler. I don't know why people recommend it. I guess you're a lot smarter than I am.

There are some excellent books out there. In its own way, the dragon book is excellent, but it is a terrible starting place.

Here are a bunch of references from the same vintage as OP. I recommend starting with a book that actually walks through the process of building a compiler and doesn't spend its time exclusively with theory.

https://news.ycombinator.com/item?id=136875

You're not the only one. In college I took a compilers course and we used the dragon book; to me it sucked the joy out of the magical concept of making a compiler.

Some years later I (re-)discovered Forth, and I thought "why not?" and built my own Forth in 32-bit Intel assembly; _that_ brought back the wonder and "magical" feeling of compilers again. All in less than 4KB.

I guess I wasn't the right audience for the dragon book.

Great thread. If you have 1 hour to get started, I recommend opening Engineering a Compiler and studying Static Single-Assignment (SSA) from ch 9.3.

The book is famous for its SSA treatment. Chapters 1-8 are not required to understand SSA. This allows you to walk away with a clear win. Refer to 9.2 if you're struggling with dominance + liveness.

http://www.r-5.org/files/books/computers/compilers/writing/K...
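To make the ch. 9.3 material concrete before diving in, here is a toy sketch (my own illustration, not from the book) of the core SSA idea for straight-line code: every assignment creates a fresh version of its target, and every use refers to the latest version. Phi-functions, dominance, and control flow (the actual meat of ch. 9) are deliberately left out.

```python
def to_ssa(stmts):
    """Rename straight-line three-address code into SSA form.

    stmts: list of (target, op, arg1, arg2) tuples.
    """
    version = {}                      # variable -> current version number

    def use(v):
        # names with no definition yet (inputs or constants) pass through;
        # defined variables get their latest version subscript
        return f"{v}{version[v]}" if v in version else v

    out = []
    for target, op, a, b in stmts:
        a, b = use(a), use(b)         # rename uses first (old version)
        version[target] = version.get(target, 0) + 1
        out.append((f"{target}{version[target]}", op, a, b))
    return out

# to_ssa([('x', '+', 'a', 'b'),
#         ('x', '*', 'x', '2'),
#         ('y', '-', 'x', 'a')])
# -> [('x1', '+', 'a', 'b'), ('x2', '*', 'x1', '2'), ('y1', '-', 'x2', 'a')]
```

Once every variable has exactly one definition, dominance and liveness (9.2) are what extend this renaming across branches and joins.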

  • I bought this book when I was working on a toy language and I think I was too stupid to understand most of it. The first few chapters were great, but it quickly surpassed my capacity to understand. Seeing it mentioned makes me want to revisit.

It was a product of its time, I guess; there are much better ones from a similar vintage:

The Tiger book (with C, Standard ML, and Java variants)

https://www.cs.princeton.edu/~appel/modern/

Compiler Design in C (freely available nowadays, beware this is between K&R C and C89)

https://holub.com/compiler/

lcc, A Retargetable Compiler for ANSI C

https://drh.github.io/lcc/

Or if one wants to go with more clever stuff,

Compiling with Continuations

Lisp in Small Pieces

  • Another vote for Lisp in Small Pieces. Great high level compiler book that teaches you how to build a Lisp and doesn’t get bogged down in lexing and parsing.

  • Instead of Lisp in Small Pieces I'd recommend SICP. No continuation passing, but much better written.

    • And no information on how to actually do a compiler, end to end, only a self-hosted interpreter.

      The authors don't have the same audience in mind.

      I would recommend both: one is about actual Lisp compilers, the other about alternative computation models.

Imho the problem is the fixation on parser generators and BNF. It's just a lot easier to write a recursive descent parser than to figure out the correct BNF for anything other than a toy language with horrible syntax.
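For anyone who hasn't seen one, a hand-written recursive descent parser really is this small. A Python sketch for a toy grammar of my own choosing (one function per precedence level, single-character tokens for brevity):

```python
# Toy grammar, one production per precedence level:
#
#   expr   := term   (('+' | '-') term)*
#   term   := factor (('*' | '/') factor)*
#   factor := digit | '(' expr ')'

def parse(src):
    tokens = list(src)                  # single-character tokens

    def peek():
        return tokens[0] if tokens else None

    def expr():
        node = term()
        while peek() in ('+', '-'):     # left-associative loop
            node = (tokens.pop(0), node, term())
        return node

    def term():
        node = factor()
        while peek() in ('*', '/'):
            node = (tokens.pop(0), node, factor())
        return node

    def factor():
        if peek() == '(':
            tokens.pop(0)               # consume '('
            node = expr()
            tokens.pop(0)               # consume ')'
            return node
        return tokens.pop(0)            # assume a single digit

    return expr()
```

Precedence and associativity fall out of the call structure: `expr` calls `term`, `term` calls `factor`, and each `while` loop groups to the left. No grammar formalism or generator required.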

  • Imo BNF (or some other formal notation) is quite useful for defining your syntax; my biggest gripe with BNF in particular is the way it handles operator precedence (through nested recursive expressions), which can get messy quite fast.

    Pratt parsers don't even use this recursion; they only have a concept of 'binding strength'. In layman's terms: if I'm parsing the left side of a binary expression, I've already parsed a subexpression, and the next token is another binary operator, do I continue parsing that subexpression (making it the RHS of the original expression), or do I finish my original expression (which then becomes the LHS of the new one)?

    It represents this through the concept of stickiness, with one simple rule: the subexpression always sticks to the operator that's more sticky.

    This is both quite easy to imagine and easy to encode, as stickiness is just a number.

    I think a simpler, more straightforward notation that incorporates precedence would be better.

  • I would argue the opposite: Being describable in BNF is exactly the hallmark of sensible syntax in a language, and of a language easily amenable to recursive descent parsing. Wirth routinely published (E)BNF for the languages he designed.

  • The problem with recursive descent parsers is that they don't restrict you to simple grammars.

    But then, pushing regular languages theory into the curriculum, just to rush over it so you can use them for parsing is way worse.

    • > But then, pushing regular languages theory into the curriculum, just to rush over it so you can use them for parsing is way worse.

      At least in the typical curriculum of German universities, the students already know the whole theory of regular languages from their Theoretical Computer Science lectures quite well, thus in a compiler lecture, the lecturer can indeed rush over this topic because it is just a repetition.

      1 reply →
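The binding-strength idea described above fits in about a dozen lines. A Python sketch (the precedence numbers and the list-of-tokens representation are my own assumptions):

```python
BINDING = {'+': 10, '-': 10, '*': 20, '/': 20}   # higher = stickier

def parse_expr(tokens, min_bp=0):
    """Parse a flat token list (consumed left to right) into a nested tuple AST."""
    lhs = tokens.pop(0)                  # assume a number or name
    while tokens and tokens[0] in BINDING:
        bp = BINDING[tokens[0]]
        if bp <= min_bp:                 # next operator is less sticky:
            break                        # lhs belongs to the outer expression
        op = tokens.pop(0)               # consume the operator
        rhs = parse_expr(tokens, bp)     # rhs absorbs only stickier operators
        lhs = (op, lhs, rhs)
    return lhs
```

The recursive call with `bp` as the new minimum is where stickiness happens: `1+2*3` groups as `('+', '1', ('*', '2', '3'))`, while `1-2-3` stays left-associative because equal binding power breaks the inner loop.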

When I was professionally writing a compiler (see https://ciex-software.com/intro-to-compilers.html) the Dragon book was the second book that I read. I found it very helpful. That was the first Dragon book. I got the second one later. I would have been ok to start with the Dragon book--the Compiler Generator book was a harder study.

I started with the dragon book, and I found it to be a good introductory text.

A lot of people say the dragon book is difficult, so I suppose there must be something there. But I don't see what it is; I thought it was quite accessible.

I'm curious, what parts/aspects of the dragon book make it difficult to start with?

  • It's been a few years since I worked with the dragon book, but I think the most common complaint was that it starts with like 350 pages on parser theory: generating bottom-up and top-down parsers from context free grammars, optimizing lexers for systems that don't have enough RAM to store an entire source file, etc... before ever getting to what most people who want to write a compiler care about (implementing type inference, optimizing intermediate representations, generating assembly code). Of course parsing is important, and very interesting to some. But there's a reason most modern resources skip over all of that and just make the reader write a recursive descent parser.

    • I guess "back in the day" you had to be able to write an efficient parser, as no parser generators existed. If you couldn't implement whatever you wanted due to memory shortage at the parser level, then obviously it's gonna be a huge topic. Even now I believe it is good to know about this - if only to avoid pitfalls in your own grammar.

      I repeatedly skip parts that are not important to me when reading books like this. I grabbed a book about embedded design and skipped about half of it, which was bus protocols, as I knew I wouldn't need it. There is no need to read the dragon book from front to back.

        > But there's a reason most modern resources skip over all of that and just make the reader write a recursive descent parser.
      

      Unless the reason is explicitly stated, there is no way to verify it's any good. There's a reason people use AI to do their homework - it just doesn't mean it's a good one. I can think of plenty of arguments for why you wouldn't look into the pros and cons of different parsing strategies in an introduction to compilers; "everyone is (or isn't) doing it" is not one of them. In the end, it has to be written down somewhere, and if no other book is doing it for whatever reason, then the dragon book it shall be. You can always recommend skipping that part if someone asks about what book to use.

    • The thing about parsing (and algorithms in general) is that it can be hair-raisingly complex for arbitrary grammars, but in practice people have recently discovered that designing simple, unambiguous grammars, and avoiding problems like context-dependent parsing, makes the parsing problem trivial.

      Accepting such constraints is quite practical and leads to little to no loss of power.

      In fact, most modern languages are designed for simple parsing with little to no backtracking, Go and Rust being noteworthy examples.

      2 replies →

    • I actually think the parsing part is more important for laymen. There may be a total of 10K programmers interested in learning compiler theory, but maybe 100 of them will ever write a backend -- the rest are stuck with either toy languages, or use parsing to help with their job. Parsing is definitely more useful for most of us who are not smart enough :D

      1 reply →

the dragon book is how to write a production-grade thing, i guess. it has all the interesting concepts elaborated on in detail, which is great, but it dives quickly into things that can clutter a project if it's just for fun.

  • It’s academic and comprehensive, that’s the issue. It’s not about writing a production grade compiler, though, in my humble opinion. There are more things to learn for that, unfortunately… is just a pretty big topic with lots of stuff to learn.

    • the dragon book is all i have on the topic. it was a big investment for me.

      it taught me to think very differently but i am sure i am still not ready to write a compiler :D

> The dragon book almost convinced me never to try to write a compiler.

That was the point. That's why it's not a cute beaver on the cover :)