Comment by nicula

4 months ago

This issue doesn't require large switch tables in order to show up. Even with only 4 cases and the rest falling through to `default`, Clang 18 optimizes the switch into a lookup table, while Clang 19 takes the (potentially) inefficient labels-and-jumps approach: https://godbolt.org/z/Y6njP8j38

This whole investigation started because I was writing some Rust code with a couple of small `match`es, and for some reason they weren't being optimized into a lookup table. I wrote a more minimal reproduction of the issue in C++ and eventually found the Clang regression. Since Rust also uses LLVM, `match`es suffer from the same regression (depending on which Rust version you're using).

So it's not a Clang regression per se, it's an issue with the LLVM core? Clang is just a frontend, and Rust AFAIK does not use Clang at all. If you run LLVM 18's `opt` on the bitcode generated by Clang 19 and then compile it, does it also generate the same bad assembly?

  • > So it's not a Clang regression per se, it's an issue with the LLVM core?

    Yes.

    > If you run LLVM 18's `opt` on the bitcode generated by Clang 19 and then compile it, does it also generate the same bad assembly?

    No. If you pass the LLVM bitcode generated by Clang 18 to Clang 19, the resulting assembly is good.

    I called it a 'Clang regression' in the sense that I discovered and tested this performance difference via Clang. So from a typical user's perspective (someone who doesn't care about Clang's inner workings and distinct components), this is a 'Clang regression'.