Comment by tom_mellior

5 years ago

> I'm not an expert by far, but I feel like many people who write compilers or develop languages no longer recommend The Dragon Book as it's very outdated.

Yes. There are newer editions that I can't judge, but the 1986 edition linked from the page should nowadays only be read by historians. If you want to implement a compiler, look elsewhere for something newer.

This isn't to say that this was a bad book 35 years ago. It wasn't. But it is simply outdated and does not contain things that are important today. It has very little treatment of type systems and type inference, for example. Its treatment of data flow analysis is based on intervals, which have limitations (though also some interesting theoretical advantages) and are used by zero compilers written in this millennium (and, IIRC, are not even described very well in the Dragon Book). And then of course there is the parsing problem; see my rant elsethread.

As for the article, I think A Catalogue of Optimizing Transformations (https://www.clear.rice.edu/comp512/Lectures/Papers/1971-alle...) would make an interesting addition.

Edit: Forgot to add that the old old Dragon book doesn't cover two topics that the article itself considers very relevant: SSA form (which I believe was discovered just around that time) and linear scan register allocation (which is much newer).

What would you recommend that covers data flow analysis better than the dragon book? I'm currently starting data flow analysis for my own compiler using intervals and/or modular congruences (operations can be reduced to congruences, then combined with the Chinese Remainder Theorem). What are the theoretical advantages of intervals and what do you recommend over that?
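The CRT combination step described above can be sketched roughly like this (a minimal sketch; the helper names are my own, not from any particular compiler, and a real analysis would track these facts per variable and per program point):

```python
def egcd(a, b):
    """Extended Euclid: returns (g, x, y) with a*x + b*y == g == gcd(a, b)."""
    if b == 0:
        return a, 1, 0
    g, x, y = egcd(b, a % b)
    return g, y, x - (a // b) * y

def crt_combine(r1, m1, r2, m2):
    """Combine the facts 'v == r1 (mod m1)' and 'v == r2 (mod m2)' into a
    single congruence modulo lcm(m1, m2), or return None if the two facts
    contradict each other (i.e., the code path is infeasible)."""
    g, p, _ = egcd(m1, m2)
    if (r2 - r1) % g != 0:
        return None  # inconsistent congruences
    lcm = m1 // g * m2
    r = (r1 + m1 * ((r2 - r1) // g) * p) % lcm
    return r, lcm

# v == 1 (mod 4) and v == 2 (mod 3)  combine to  v == 5 (mod 12)
print(crt_combine(1, 4, 2, 3))  # -> (5, 12)
```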

  • Ah, sorry, I should probably clarify that there are two completely distinct uses of the term "interval" in data flow analysis.

    One (introduced in https://amturing.acm.org/p137-allen.pdf and discussed in the old Dragon Book) groups program points into nested "intervals", i.e., intervals are pieces of code. Analysis information is then propagated among these intervals. This is what I was referring to above. As far as I know this approach is completely obsolete by now, with everyone using methods based more or less indirectly on Kildall's worklist iteration approach (http://static.aminer.org/pdf/PDF/000/546/451/a_unified_appro...). The theoretical advantage I was referring to above was that the interval approach is AFAIR sometimes more efficient because it can take program structure (loop nesting) into account more directly.

    The other meaning of "intervals" is used for an analysis domain, i.e., the kind of data you want your analysis to compute. For example, you might determine that the possible values of some variable x are in the interval [10, 20], the values of y are in the interval [5, 15], and then you can compute that the values of z = x + y must be in [15, 35]. There is nothing wrong with this approach, and any optimizing compiler is pretty much forced to use something like this. Based on your reference to modular congruences I think you want to do analysis of numerical values and meant this sense, which is anyway the only one in common use nowadays. Go ahead and do this. Common extensions combine intervals with congruences, like "x is in [10, 20] and also congruent to 1 modulo 4", and/or with "known bits", like "x is in [10, 20], and I know that (counting from the LSB) its bit 2 is 0 and bit 3 is 1".

    Personally I learned about data flow analysis in a university course based on Principles of Program Analysis: https://www.springer.com/gp/book/9783540654100. It's a book that goes into a lot of mathematical depth but not much breadth of analyses, and it doesn't handle data flow analysis on SSA form. Then I learned additional details from various well-known papers like the ones linked above, and the classical early papers around analysis on SSA form, like https://www.cse.wustl.edu/~cytron/531Pages/f11/Resources/Pap.... I'm sure there must be a more straightforward text to get you there, but I don't think I can give a definite recommendation. I have vague memories of the coverage in Muchnick's "Advanced Compiler Design and Implementation" and/or Cooper and Torczon's "Engineering a Compiler" being reasonable, but I can't find my copies right now.
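To make the two senses concrete together: a Kildall-style worklist iteration over the interval domain can be sketched in a few dozen lines. Everything here is illustrative (the toy CFG, the helper names); a real implementation works on basic blocks, supports many more operators, and needs widening to guarantee termination on loops.

```python
BOT = None  # "no information yet" for a variable

def join(a, b):
    """Least upper bound of two intervals (or BOT)."""
    if a is BOT: return b
    if b is BOT: return a
    return (min(a[0], b[0]), max(a[1], b[1]))

def add(a, b):
    """Abstract addition: [lo1, hi1] + [lo2, hi2] = [lo1+lo2, hi1+hi2]."""
    if a is BOT or b is BOT: return BOT
    return (a[0] + b[0], a[1] + b[1])

# Toy CFG: entry branches to b1 and b2, which meet at merge, then exit.
# b1: x = 10;  b2: x = 20;  merge: z = x + [5, 15]
succs = {"entry": ["b1", "b2"], "b1": ["merge"], "b2": ["merge"],
         "merge": ["exit"], "exit": []}

def transfer(node, env):
    env = dict(env)
    if node == "b1":
        env["x"] = (10, 10)
    elif node == "b2":
        env["x"] = (20, 20)
    elif node == "merge":
        env["z"] = add(env.get("x", BOT), (5, 15))
    return env

# Worklist fixpoint: states[n] is the environment at entry to node n
# (None means the node has not been reached yet).
states = {n: None for n in succs}
states["entry"] = {}
worklist = ["entry"]
while worklist:
    node = worklist.pop()
    out = transfer(node, states[node])
    for s in succs[node]:
        if states[s] is None:
            merged = out
        else:
            merged = {v: join(states[s].get(v, BOT), out.get(v, BOT))
                      for v in set(states[s]) | set(out)}
        if merged != states[s]:
            states[s] = merged
            worklist.append(s)  # successor changed, revisit it

print(states["exit"]["x"], states["exit"]["z"])  # (10, 20) (15, 35)
```

At the merge point x is joined to [10, 20], and the abstract addition yields z in [15, 35], matching the example above.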