Comment by vintagedave

2 months ago

I’m curious whether WoW uses any newer x86 instruction sets like AVX. I’ve been running some math benchmarks on ARM under x64 emulation, and saw very little performance improvement from the AVX2+FMA builds compared to the SSE4.x builds (that is, going from x86-64-v2 to x86-64-v3). That was unexpected.

It’s the first Windows build with Prism and the first time they’ve introduced AVX(2) support, so I wonder whether the performance simply isn’t there yet. I’ve found very little info online about this.

AVX(2)'s main advantage is its 256-bit width: many of its operations are simply two 128-bit ops concatenated, one per lane (which is weird for ops like VPALIGNR), and cross-lane operations are expensive. NEON, on the other hand, only supports 128-bit ops, so the emulator has to split each 256-bit AVX operation into multiple NEON ops.

I'd expect more of a gain from enabling FMA, but that's assuming the program actually got built to use FMA: it needs to either call it explicitly or be compiled with floating-point relaxations that allow a separate multiply and add to be contracted into one FMA. Oryon has 4 x 128-bit NEON pipes with 3-cycle latency fadd and 4-cycle latency fmul/fma, so it easily ends up latency bottlenecked unless there are plenty of independent calculations.