
Comment by DannyBee

2 years ago

Some of this reasoning is just wild.

1. "We can attract direct contributions from Intel, ARM, RISC-V chip manufacturers, etc., who have a vested interest in making our machine code better on their CPUs."

They are nowhere near popular enough (or used enough by some particularly important customer) for any major architecture vendor to spend any time contributing except as someone's random side project. To think otherwise is ... really out there.

2. "We can implement our own optimization passes that push the state of the art of computing forward."

There are, in fact, no magic bullets. Once you pass the baseline of optimization capability, the reason these compilers do well is that they've been worked on forever, and made better 0.2% at a time.

Also, anything you implement, they can implement. Maybe it takes annotations or whatever, but that's about speed, not capability.

3. "Compilation speed is increased by orders of magnitude."

Uh, not if you are doing #2. Most optimization passes, especially when you are first productionizing research, are quite slow. It takes a tremendous amount of applied engineering to make them fast.

This is what I did on both GCC and LLVM: implement and speed up, ad nauseam. I implemented plenty of high-optimization-value, never-before-productionized algorithms. It usually took a few versions and lots of slow-compiler bugs to figure out the best way to implement them. It turns out most researchers are not spending their time working on compilation speed. At best, they care that it's passable.

For existing well-productionized algorithms (which don't push the state of the art), you will not get orders of magnitude speedup. You may get some percent depending on how you structure your compiler.

There are certainly slow parts of LLVM, but it's hubris to believe you are going to make something that both optimizes better and is seriously faster, for this kind of language. There are other languages for which that is true. Zig is super unlikely to be one of them.

The way you gain compilation speed for this kind of language is to optimize less. Spend as little time as possible processing things into machine code, use the fastest algorithms you can, and where you can't, rely more on heuristics to generate good-enough code most of the time.
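
As a toy illustration of that tradeoff, here is a single-pass peephole optimizer sketch in Go. Everything in it (the `inst` type, the opcode names) is invented for this example, not taken from any real compiler; the point is the shape: one linear scan, a cheap pattern rule, no iterative analysis.

    package main

    import "fmt"

    // Toy three-address instruction. All names here are made up for
    // illustration; this is not any real compiler's IR.
    type inst struct {
        op  string
        reg string
        imm int
    }

    // peephole is the "optimize less" strategy in miniature: a single
    // linear pass that rewrites a few common patterns and deliberately
    // leaves everything else alone. No dataflow analysis, no fixed point.
    func peephole(code []inst) []inst {
        for i, in := range code {
            // Strength reduction: multiply by a power of two -> shift.
            if in.op == "mul" && in.imm > 0 && in.imm&(in.imm-1) == 0 {
                n := 0
                for v := in.imm; v > 1; v >>= 1 {
                    n++
                }
                code[i] = inst{op: "shl", reg: in.reg, imm: n}
            }
        }
        return code
    }

    func main() {
        code := []inst{{op: "mul", reg: "r1", imm: 8}}
        fmt.Println(peephole(code)) // [{shl r1 3}]
    }

It misses plenty that a real optimizer would catch, but it is O(N), never revisits anything, and has trivially predictable compile time, which is exactly the trade being described.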

There is more, but man, this feels out there.

If they said "we want to get 90% of the performance at 60% of the cost", sure, maybe. But saying, basically, that they will get >100% of the performance at "orders of magnitude" (their claim) less cost is just, as I said, a wild idea. I wish them the best of luck.

Everyone who tries to reinvent good infrastructure is doomed to discover why that infrastructure was invented in the first place.

Long-time compiler hacker/engineer and compiler/programming-language PhD here: all great points. Worth saying out loud that much of the reason this stuff is slow is not bad code; it's that many of the best algorithms are in higher complexity classes and simply scale poorly with program/translation-unit size. For example, when I was working on `rustc`, a big challenge was its heavy reliance on inlining: inlining has a cost of its own, and it increases program size, which makes everything else take longer as well.

I feel like Go already went through a whole saga of this, where the community started with "LLVM and SSA are bad and slow"; then a few years went by, and they ended up building their own SSA IR and spending a bunch of time trying to bring compilation time closer to what it was before, since the change had made everything much slower.

  • > I feel like Go already went through a whole saga of this where the community started with "LLVM and SSA are bad and slow"

    I've been a contributor since the Go compiler was a tree-based C program and I've never heard anyone say that. What they said (and it's in the Go FAQ page) is: "At the beginning of the project we considered using LLVM for gc but decided it was too large and slow to meet our performance goals." [1]

    If you're building a language with the explicit goal of compiling fast, it's objectively true that starting out with LLVM is not the best approach. You'll get incredible runtime performance from the generated code right from the early days, but NOT fast compilation. The Go makers chose a different tradeoff.

    > and they end up building their own SSA IR

    They switched to an SSA IR because it was a good idea to begin with, after an initial phase with the tree-based prototype. I've also never heard anyone argue that "SSA is bad", despite what you claim. The first compiler was tree-based because they reused a simple tree-based C compiler from Plan 9.

    > building their own SSA IR and spending a bunch of time trying to bring compilation time closer to what it was before as it made everything much slower

    The new compiler was ported to Go (machine-rewritten from C), and that's the main reason it was ~2x slower than the old compiler. It's not due to the switch to an SSA IR.

    [1] https://go.dev/doc/faq#Implementation

  • "Worth saying out loud that many reasons why this stuff is slow is not due to bad code, its due to the fact that many of the best algorithms are in higher complexity classes and just scale poorly with program/translation unit size"

    Yes, this is totally true.

    It's also possible to affect this in theory, but it's hard in practice. Usually you shoot for making it O(N^2) (or whatever) where N is the number of variables you want to optimize, instead of N being the number of blocks.

    The complete GVN and GVN-PRE in LLVM are theoretically O(N^3) or O(N^4), but as engineered they are often much faster (while getting more optimization), or at least not more than a few percent slower than the existing O(N^2) GVN. They achieve this by being sparser in most cases. The old GVN has to iterate the algorithm, and it iterates non-sparsely: everything is reprocessed because it doesn't know what can change. The new one iterates only the things that could change, using fine-grained dependency tracking (something akin to how sparse constant propagation works; see the sketch at the end of this comment). This is often the best you can do.

    I will say it's possible to affect these time bounds in theory because, if you go down the single-static rabbit hole, you realize it's more generally applicable. Kenny (and others) proved this in his thesis about sparse dataflow; that was the whole point. SSA is just one example of a form that enables things to be linear time (and in fact, you can do it without SSA at all, using interval methods and such). There are papers that show this about other SS* forms and give examples, but they were mostly (sadly) ignored.

    Concretely, for most optimizations (PRE, etc.), there are linear-time single-static transforms of the IR that will make the optimization linear time (or at least remove a factor of N for you) by explicitly exposing the dataflow in a way that matches what the optimization algorithm wants.

    This is awesome in theory - like, really cool when you think about it (if you are a compiler nerd) - but also completely impractical (at least, AFAIK). Rewriting the IR for each optimization to the exact requirements of that optimization (single static use, single static exposed uses, single static upwards-exposed uses, single static exposed defs, etc.) is much more expensive in practice than well-engineered, higher-complexity, "N in the number of variables" optimizations.

    Equality saturation stands more of a chance, IMHO.
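
    As promised above, here is a minimal Go sketch of the sparse-worklist idea, in the spirit of sparse conditional constant propagation. Everything in it (the `value` type, the three-point lattice, the `transfer` function) is invented for illustration; it is not LLVM's actual code.

        package main

        import "fmt"

        // Three-point lattice: unvisited -> constant -> varies.
        type state int

        const (
            top     state = iota // not yet computed
            isConst              // known constant
            bottom               // varies at runtime
        )

        type value struct {
            name     string
            operands []*value
            users    []*value // def-use edges: the fine-grained dependencies
            st       state
        }

        // transfer is a placeholder transfer function: a value is constant
        // only if all of its operands are (leaves are seeded up front).
        func transfer(v *value) state {
            for _, op := range v.operands {
                if op.st != isConst {
                    return bottom
                }
            }
            return isConst
        }

        // propagate revisits only the users of values whose state actually
        // changed, instead of rescanning everything on every iteration.
        func propagate(work []*value) {
            for len(work) > 0 {
                v := work[len(work)-1]
                work = work[:len(work)-1]
                if s := transfer(v); s != v.st {
                    v.st = s
                    work = append(work, v.users...) // only affected values
                }
            }
        }

        func main() {
            a := &value{name: "a", st: isConst}
            b := &value{name: "b", st: bottom}
            c := &value{name: "c", operands: []*value{a, b}, st: top}
            a.users = []*value{c}
            b.users = []*value{c}
            propagate([]*value{c})
            fmt.Println(c.name, c.st == bottom) // "c true": c varies
        }

    The whole trick is in `propagate`: a value goes back on the worklist only when one of its inputs actually changed state, which is what keeps the iteration sparse.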

I agree that it's probably impossible to write something that is orders of magnitude faster than LLVM and better than LLVM at optimizing, but I don't see that claim in the original text.

It seems like a lot of work, but also doable, to create something that is orders of magnitude faster than LLVM for unoptimized debug builds. Furthermore, the Zig compiler might be easier to work with than LLVM/Clang for productionizing optimization research.

  • Sure, the latter is possible.

    I read the claims and comments differently than you, fwiw. It definitely reads like people who think they can make it both faster and better.

  • Just to play devil's advocate: who would've thought JavaScript would ever end up as blazingly fast as it has become since V8? Maybe Zigworld can do it.

    • As the parent comment said, and as I mentioned in my reply, JavaScript was just incredibly poorly optimized/not compiled, and they applied 20-30 years' worth of compiler research to make it significantly faster. You also had an alliance of every hyperscaler working on the tooling for a decade-plus, with help from all major hardware vendors, to bring the best performance out of it. One driver of LLVM was Apple and WebKit, which at one point used LLVM for its JIT compiler, so many improvements figured out in that period have already been applied to LLVM.

      LLVM already has decades of research applied to it to make it produce fast code; it will be incredibly challenging to even match its performance across all the targets it supports, let alone improve on it in significant ways. It would be better to spend the time building an optimization pipeline for Zig itself, and being more thoughtful about what code you send to LLVM, versus trying to replace it wholesale.


    • Anyone who has ever played with dynamic languages like Smalltalk and SELF, or read their research papers.

My impression is that LLVM spends the bulk of its time chasing pointers, so one could fix the issue by "only" changing the layout of the data to one that is friendlier to the CPU's caches.
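
To make that concrete, here is a toy sketch in Go (every type and name below is invented; this is neither LLVM's nor Zig's actual IR) contrasting a pointer-linked IR, where each operand access dereferences some scattered heap allocation, with an index-based one, where instructions live in a contiguous slice:

    package main

    import "fmt"

    // Pointer-linked node: operands are heap pointers, so a pass that
    // walks the IR chases pointers through scattered allocations.
    type node struct {
        op       string
        operands []*node
    }

    // Index-based IR: all instructions sit in one contiguous slice and
    // operands are small integer indices into it, not pointers.
    type inst struct {
        op       string
        lhs, rhs int32 // indices into fn.insts
    }

    type fn struct {
        insts []inst
    }

    // countMuls walks the whole function in one linear, prefetch-friendly
    // scan over contiguous memory; no pointer chasing involved.
    func countMuls(f *fn) int {
        n := 0
        for _, in := range f.insts {
            if in.op == "mul" {
                n++
            }
        }
        return n
    }

    func main() {
        f := &fn{insts: []inst{
            {op: "const"}, {op: "const"},
            {op: "mul", lhs: 0, rhs: 1},
        }}
        fmt.Println(countMuls(f)) // 1
    }

With the flat layout, a linear pass streams through memory and the hardware prefetcher does most of the work; with the pointer-linked one, each step waits on a dependent load.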

  • downvoters are of course free to post perf traces demonstrating that LLVM retires many instructions per cycle :)