Comment by munificent
20 hours ago
> I'm not sure why people seem to be under the impression that writing a compiler means that the language the compiler is implemented in should have "low level" features.
Performance.
You definitely can write a compiler in a high-level language and given the choice I certainly prefer to on my hobby projects. Having a garbage collector makes so many compiler algorithms and data structures easier.
But I also accept that that choice means there's an upper limit to how fast my compiler will. If you're writing a compiler that will be used to (at least aspirationally) compile huge programs, then performance really matters. Users hate waiting on the compiler.
When you want to squeeze every ounce of speed you can get out of the hardware, a low-level language that gives you explicit control over things like memory layout matters a lot.
I think once you get the design of the IR right and implement it relatively efficiently, an optimizing compiler is going to be complicated enough that tweaking the heck out of low-level data structures won't help much. (For a baseline compiler, maybe...but).
E.g. when I ported C1 from C++ to Java for Maxine, straightforward choices of modeling the IR the same and basic optimizations allowed me to make it even faster than C1. C1X was a basic SSA+CFG design with a linear scan allocator. Nothing fancy.
The Virgil compiler is written in Virgil. It's a very similar SSA+CFG design. It compiles plenty fast without a lot of low-level tricks. Though, truth be told I went overboard optimizing[1] the x86 backend and it's significantly faster (maybe 2x) than the nicer, more pretty x86-64 backend. I introduced a bunch of fancy representation optimizations for Virgil since then, but they don't really close the gap.
[1] It's sad that even in the 2020s the best way to make something fast is to give up on abstractions and use integers and custom encodings into integers for everything. Trying to fix that though!
For context, this is the author of https://craftinginterpreters.com/ , and https://gameprogrammingpatterns.com/ , and https://journal.stuffwithstuff.com/2015/09/08/the-hardest-pr... .
I promise that he knows a thing or two about compilers and performance!
For what it's worth, I agree with him. A recent example is the porting of the TypeScript compiler to Go: it hasn't been fully released yet, but people are already going wild for its performance improvement over the original in-TS compiler.
Of course, it took them over a decade to reach the point where a port was necessary - so it's up to you to decide when that decision makes sense for your language.
Does “low level” translate to performance? Is Rust a “low level” language?
Take C#. You can write a compiler in it that is very fast. It gives you explicit control over memory layout of data structures and of course total control over what you wrote to disk. It is certainly not “low level”.
>Having a garbage collector makes so many compiler algorithms and data structures easier.
Does it really? Compilers tend to be programs that just appends a bunch of data to lists, hashmaps, queues and trees, processes it, then shuts down. So you can just make append-only data structures and not care too much about freeing stuff.
I never worry about memory management when I write compilers in C.
Any AOT compiled language offers enough performance for writing full compiler toolchain.
> But I also accept that that choice means there's an upper limit to how fast my compiler will.
Don't buy it.
A decent OCaml version of a C or Zig compiler would almost certainly not be 10x slower. And it would be significantly easier to parallelize without introducing bugs so it might even be quite a bit faster on big codebases.
Actually designing your programming language to be processed quickly (can definitively figure things out with local parsing, minimizing the number of files that need to be touched, etc.) is WAY more important than the low-level implementation for overall compilation speed.
And I suspect that the author would have gotten a lot further had he been using a GC language and not had to deal with all the low-level issues and debugging.
I like Zig, and I use it a lot. But it is NOT my general purpose language. I'm definitely going to reach for Python first unless I absolutely know that I'm going to be doing systems programming. Python (or anything garbage collected with solid libraries) simply is way more productive on short time scales for small codebases.