Comment by DannyBee
2 years ago
"Worth saying out loud that many reasons why this stuff is slow is not due to bad code, its due to the fact that many of the best algorithms are in higher complexity classes and just scale poorly with program/translation unit size"
Yes, this is totally true.
It's also possible to affect this in theory, but it's hard in practice. Usually you shoot for making it O(N^2) (or whatever) where N is the number of variables you want to optimize rather than the number of blocks.
The complete GVN and GVN-PRE in LLVM is theoretically O(N^3) or O(N^4), but as engineered it's often much faster (while getting more optimization), or at least not more than a few percent slower, than the existing O(N^2) GVN. It achieves this by being sparser in most cases. The old GVN has to iterate the algorithm, and it iterates non-sparsely: everything is reprocessed because it doesn't know what can change. The new one iterates only the things that could change, using fine-grained dependency tracking (something akin to how sparse constant propagation works). This is often the best you can do.
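As a rough illustration of what "sparse" buys you, here is a minimal sketch (in Python, over a made-up toy IR; this is not LLVM's code, and all the names are invented for the example) of worklist-driven constant propagation along def-use edges. Only the users of a value that actually changed get reprocessed, instead of re-running the whole pass over every block each round:

    from collections import deque

    # Toy straight-line SSA-ish IR: name -> (op, operands). Operands are
    # either literal ints or names of earlier instructions.
    instrs = {
        "a": ("const", 2),
        "b": ("const", 3),
        "c": ("add", "a", "b"),
        "d": ("mul", "c", "c"),
    }
    users = {"a": ["c"], "b": ["c"], "c": ["d"], "d": []}  # explicit def-use edges

    values = {}            # name -> known constant value; absent means "unknown"
    work = deque(instrs)   # seed the worklist with every instruction once

    def evaluate(name):
        op, *args = instrs[name]
        if op == "const":
            return args[0]
        vals = [values.get(a) if isinstance(a, str) else a for a in args]
        if any(v is None for v in vals):
            return None    # some operand is still unknown; try again later
        return vals[0] + vals[1] if op == "add" else vals[0] * vals[1]

    while work:
        n = work.popleft()
        new = evaluate(n)
        if new is not None and values.get(n) != new:
            values[n] = new
            work.extend(users[n])  # only reprocess things that could change

    print(values)  # {'a': 2, 'b': 3, 'c': 5, 'd': 25}

The work done is proportional to the def-use edges that actually carry a changed fact, which is the same flavor of sparsity described above: N in the variables and their dependencies, not in the blocks.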
I will say it's possible to affect these time bounds in theory, because if you go down the single static rabbit hole, you realize it's more generally applicable. Kenny (and others) proved this in his thesis about sparse dataflow; that was the whole point. SSA is just one example of a form that enables things to be linear time (and in fact, you can do it without SSA at all, using interval methods and such). There are papers that show this for other SS* forms and give examples, but they were mostly (sadly) ignored.
Concretely, for most optimizations like PRE, etc., there are linear-time single static transforms of the IR that will make the optimization linear time (or at least remove a factor of N for you) by explicitly exposing the dataflow in a way that matches what the optimization algorithm wants.
This is awesome in theory - like, really cool when you think about it (if you are a compiler nerd) - but also completely impractical (at least, AFAIK). Rewriting the IR form for each optimization to the exact requirements of that optimization (single static use, single static exposed uses, single static upwards-exposed uses, single static exposed defs, etc.) is much more expensive in practice than well-engineered, higher-complexity, "N in the number of variables" optimizations.
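To make the "explicitly exposing the dataflow" point concrete, here is a toy sketch (again Python, with my own made-up IR and names, and deliberately not one of the SS* forms above, just the same flavor of idea): one linear pass builds the def map the pass wants, and dead-code elimination then runs in linear time by marking from the roots, rather than repeatedly rescanning the function to ask "does anything still use this value?":

    # Toy straight-line IR: (dest, op, operands); operands are ints or names.
    # 'v' and 'w' are dead: nothing reachable from the return uses them.
    body = [
        ("t", "add", ("x", "y")),
        ("u", "mul", ("t", "t")),
        ("v", "sub", ("t", 1)),
        ("w", "add", ("v", 2)),
        ("r", "ret", ("u",)),
    ]

    def dce(body):
        defs = {dest: (op, ops) for dest, op, ops in body}           # one linear pass
        live, work = set(), [d for d, op, _ in body if op == "ret"]  # roots
        while work:                    # mark: each def is visited at most once
            name = work.pop()
            if name in live or name not in defs:
                continue
            live.add(name)
            work.extend(o for o in defs[name][1] if isinstance(o, str))
        return [inst for inst in body if inst[0] in live]            # sweep

    print(dce(body))
    # [('t', 'add', ('x', 'y')), ('u', 'mul', ('t', 't')), ('r', 'ret', ('u',))]

Without the precomputed edges, the natural way to answer "is this value used?" is another scan over the function, and that is exactly where the extra factor of N comes from.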
Equality saturation stands more of a chance, IMHO.