Comment by packetlost
1 year ago
> Vector stuff is typically hand coded with intrinsics or assembly. Autovectorization has mixed results because there’s no way to request the compiler to promise that it vectorized the code.
Right, but most of the time those are architecture specific and RVV 1.0 is substantially different than say, NEON or SSE2, so you need to change it anyways. You also typically use specialized registers for those, not the general purpose registers. I'm not saying there isn't work to be done (especially in for an application like this one, that is extremely performance sensitive), I'm saying that most applications won't have these problems are be so sensitive that register spills matter much if at all.
I’m highlighting that the compiler doesn’t automatically take care of vector code quite as automatically and as well as it does register allocation and instruction selection which are slightly more solved problems. And it’s easy to imagine that a compiler will fail to optimize a piece of code as well on something that’s architecturally quite novel. RISCV and ARM aren’t actually hugely dissimilar architectures at a high level that completely different optimization need to be written and even selectively weighted by architecture, but I imagine something like a Mill CPU might require quite a reimagining to get anything approaching optimal performance.