Comment by jcranmer
1 day ago
Well, course numbers are regular enough that you can look up what the "intro compilers" course is: https://www.cs.cornell.edu/courses/cs4120/2026sp/?schedule
The short answer is that compilers is basically broken up into two courses, with the first course largely being the minimum necessary to build a compiler (lexing, parsing, codegen, register allocation), and the second course being how to build an optimizing compiler.
The academic literature on register allocation is scary.
First is presented a linear time optimal algorithm for graph coloring then it is claimed better can be done by a O(N^2) algorithm that uses a heuristic.
I do believe the dragon book got caught with the emperor's new register allocator and the literature hasn't really recovered yet.
I didn't believe the optimal algorithm is linear time, so I checked the source:
"Most of the performance" is not optimal. Were you referring to a different source?
The big caveat in what I was saying is that it is only the pure graph coloring portion that has an optimal linear time algorithm.
Take code in static single assignment form, or another form where it is values that are tracked with live ranges. For a single basic block the interference graph of the live ranges is an interval graph. As the live ranges are just intervals from the instruction that produces the value to the last instruction that consumes the value.
It is annoying but totally doable to follow this into graph theory and see that interval graphs, are a subset of chordal graphs, and these graphs have an algorithm that colors the graph optimally in linear time.
Poke a little more and it turns out that algorithm is the same algorithm as linear-scan register allocation.
Add multiple basic blocks and it is trickier to see, but there are proofs in the academic literature that the interference graph remains an interval graph. Something about phi functions not counting.
Which is what I meant when I said a linear time algorithm.
Compare that to the O(N^2) heuristic from the paper register allocation by graph coloring.
The big change is going from a linear time optimal algorithm to a quadratic time heuristic and proclaiming the quadratic time heuristic is better. (What???)
In practice there are a lot of complications that are not accounted for by graph coloring. Instructions and function calls that take fixed registers. Spilling registers when you need more values alive then there are registers. Deliberately keeping a non-minimal number of values in registers to increase instruction level parallelism.
To the best of my knowledge all of those problems are very tractable if you start with a linear scan register allocator, as the code can be changed without requiring the register allocation to restart.
Code based on computing the interference graph has to reconpute the interference graph and restart whenever the code has to be changed. Making it much worse time wise. If for no other reason than computing the interference graph is O(N^2).
So yes more complications but only because the model described is overly simplified.
2 replies →
Looks like there is quite a bit of overlap with regards to the optimizing parts between these two courses. I guess it's switching from the dragon book to academic papers that makes it advanced.
I am actively opposed to this design for a first compiler. There is no need for a lexer with a recursive descent parser. Register allocation is also an unnecessary distraction. It is better for a first compiler to compile to a higher level language in which neither register assignment nor memory management are necessary.
Optimizing compilers are suboptimal in that they waste enormous amount of time optimizing code that can't or needn't be optimized and even where the optimizations are helpful, they are opaque and at risk of unexpectedly regressing both due to small changes at the source code level or changes in the compiler optimizer, both of which are quite insidious.
If instead of optimizing compilers, we had languages that allowed for seamless interop between low level and high level functions, then perhaps an llm becomes the optimizer (or you can invoke the compiler to optimize a specific function by source level rewrite). The benefit of this compared to a traditional optimizing compiler is that the optimization is done once per function and never repeated (until prompted) and the implementation is human readable and not buried in a binary. Moreover, and perhaps even more importantly, by not doing optimizations in the compiler, compilation times can be much faster, easily 100-1000x than state of the art optimizing compiler, while generating equivalent or even better runtime performance. As it has been said: premature optimization is the root of all evil.
I disagree with several parts here. But hopefully, this leads to a fun discussion!
> no need for a lexer with a recursive descent parser
I'd argue that teaching how to write a lexer + recursive descent parser is more relevant in the context of production compilers: many major production compilers out there use hand-written recursive descent parsers (cpp, javac, rust, go,javascript...). Recursive descent parsers are also really nice for emitting error messages.
> It is better for a first compiler to compile to a higher level language in which neither register assignment nor memory management are necessary.
Compiling to a high-level target can be a reasonable first project(e.g., you can emit LLVM), but imo its a different objective from learning the full stack. Emitting actual ISA instructions(even sub-optimally, after all it's a university course) forces you to learn calling conventions, isel, register pressure, stack layouts etc. Building a compiler,at least for me, is probably one of the easiest ways to understand how all of it works together.
> optimize a specific function by source level rewrite
I don't think replacing optimizations with a per-function source-level rewrite works as a general model. Many optimizations are not local to a single function (for example, inlining function calls can lead to new constant-propagation opportunities). If your argument rests on the fact that not all functions are hot, a lot of general-purpose JIT compilers out there already use runtime info to decide when to optimize hot functions, so part of what you're proposing already exists.
> implementation is human readable and not buried in a binary
Is this really a requirement for your program? In most cases, I think the optimization story is more like: "code you want to write" != "code you want to run"
> Moreover, and perhaps even more importantly, by not doing optimizations in the compiler, compilation times can be much faster, easily 100-1000x than state of the art optimizing compiler, while generating equivalent or even better runtime performance
I think the actual answer here is "it depends". For long-running programs, one tradeoff is build time vs future execution time. Also many optimizations cannot be expressed in source code itself. For example, in C++, you can do stuff like whole program de-virtualization only at link time, which is why LTO exists.
Aside: I personally work on source-to-source automatic differentiation inside compilers, and I can give examples for missed optimizations in generated derivative code if you don't run existing optimization passes like LICM/CSE before differentiating a function.
> even more importantly, by not doing optimizations in the compiler, compilation times can be much faster, easily 100-1000x than state of the art optimizing compiler, while generating equivalent or even better runtime performance
As Kent Dybvig (of Chez Scheme fame) said, "As I tell my compiler students now, there is a fine line between "optimization" and "not being stupid". There is a lot of small, simple tweaks that don't take all that much time during the compilation yet drastically improve the produced code.