Comment by oxxoxoxooo

7 hours ago

On x86, there is no vector instruction that returns the upper half of a 64-bit x 64-bit integer product. ARM SVE2 and RISC-V RVV have one; x86 unfortunately does not (and probably won't for a long time, as AVX10 does not add it either).

There is something close: AVX-512 IFMA, which recycles the f64 FMA hardware and exists for bignum libraries. It's a 52-bit unsigned multiply that accumulates either the low or the high 52 bits of the 104-bit product into a 64-bit accumulator.

It's certainly not 64 bits, but it's much more than 32 bits. And since it gives you access to the high halves, you can use it to compute 32x32->64 on vectors, even if the operands are only half as densely packed as they could be.
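To make the 32x32->64 trick concrete, here is a rough scalar sketch of what one 64-bit lane does. This is a plain C model of the semantics, not the actual intrinsics. The names `MASK52`, `madd52lo`, `madd52hi` and `mul32x32_to_64` are made up for illustration, and `unsigned __int128` is a GCC/Clang extension. The real instructions are VPMADD52LUQ and VPMADD52HUQ, which do this across all lanes of a vector.

```c
#include <stdint.h>

/* Only the low 52 bits of each multiplicand lane are used. */
#define MASK52 ((UINT64_C(1) << 52) - 1)

/* Model of one lane of the "low" form: add bits 0..51 of the
   104-bit product to the 64-bit accumulator. */
static uint64_t madd52lo(uint64_t acc, uint64_t a, uint64_t b) {
    unsigned __int128 p = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + ((uint64_t)p & MASK52);
}

/* Model of one lane of the "high" form: add bits 52..103 of the
   104-bit product to the 64-bit accumulator. */
static uint64_t madd52hi(uint64_t acc, uint64_t a, uint64_t b) {
    unsigned __int128 p = (unsigned __int128)(a & MASK52) * (b & MASK52);
    return acc + (uint64_t)(p >> 52);
}

/* 32x32->64 built from the two halves. The product is below 2^64,
   so the high part is below 2^12 and the shift cannot overflow. */
static uint64_t mul32x32_to_64(uint32_t a, uint32_t b) {
    uint64_t lo = madd52lo(0, a, b);
    uint64_t hi = madd52hi(0, a, b);
    return (hi << 52) | lo;
}
```

Each 32-bit operand occupies a full 64-bit lane here, which is where the "half as packed" cost comes from.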