
Comment by LeFantome

2 months ago

Interestingly, RISC-V vector extensions are variable length.

So, you can compile your RISC-V software to require the equivalent of AVX and it will run on whatever size vectors the hardware supports.

So, on x86-64, if I write AVX2 software and run it on AVX512 capable hardware, I am leaving performance on the table. But if I write software that uses AVX512, it will not run on hardware that does not support those extensions (flags).

On RISC-V, the same binary that uses 256 bit vectors on hardware that only supports that will use 512 bit vectors on hardware that supports it, or even 1024 bit vectors on hardware like the A100 cores of the SpacemiT K3.

So, I guess x86-64 is the RyanAir of processors.
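To make the "same binary, different vector width" claim concrete, here is a minimal scalar model of the strip-mining loop that vector-length-agnostic RVV code typically uses. It is a sketch, not real RVV code: `vsetvl` and `vector_add` are hypothetical helper names, and the model assumes the hardware always grants min(remaining, VLMAX) elements, which is a simplification of what the real `vsetvli` instruction is allowed to do.

```python
def vsetvl(remaining, vlen_bits, sew_bits=32, lmul=1):
    """Simplified model of vsetvli: grant min(remaining, VLMAX) elements."""
    vlmax = vlen_bits * lmul // sew_bits  # elements per vector register group
    return min(remaining, vlmax)


def vector_add(a, b, vlen_bits):
    """Strip-mined a[i] + b[i]. Returns (result, number of loop iterations)."""
    out, i, iterations = [], 0, 0
    while i < len(a):
        vl = vsetvl(len(a) - i, vlen_bits)  # hardware decides the chunk size
        out.extend(x + y for x, y in zip(a[i:i + vl], b[i:i + vl]))
        i += vl
        iterations += 1
    return out, iterations


if __name__ == "__main__":
    a = list(range(100))
    b = list(range(100, 200))
    # The loop body never mentions the vector width; only the modelled
    # hardware parameter changes between runs.
    for vlen in (256, 512, 1024):
        result, iterations = vector_add(a, b, vlen)
        print(vlen, iterations)
```

The loop is identical for every `vlen_bits`; wider hardware simply finishes in fewer iterations, which is the property the comment describes.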

(Personal opinion) I get the impression that RISC-V-related discussions often lack awareness of prior work/alternatives. A large amount of (x86) software actually uses our Highway library to run on whatever size vectors and instructions the CPU offers.

This works quite well in practice. As to leaving performance on the table, it seems RVV has some egregious performance differences/cliffs. For example, should we use vrgather (with what LMUL), or interesting workarounds such as widening+slide1, to implement a basic operation such as interleaving two vectors?

  • > For example, should we use vrgather (with what LMUL), or interesting workarounds such as widening+slide1, to implement a basic operation such as interleaving two vectors?

    Use Zvzip; in the meantime:

    zip: vwmaccu.vx(vwaddu.vv(a, b), -1, b), or segmented load/store when you are touching memory anyway

    unzip: vnsrl

    trn1/trn2: masked vslide1up/vslide1down with even/odd mask

    The only thing base RVV does badly here is register-to-register zip, which takes twice as many instructions as on other ISAs. Zvzip gives you dedicated instructions for the above.
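The arithmetic behind the widening zip trick can be checked with a small scalar model. The sketch below assumes 8-bit elements (SEW = 8) and little-endian reinterpretation of each 16-bit result as two 8-bit lanes. The Python functions are illustrative stand-ins named after the instructions, not real intrinsics. The key identity is that -1 read as an unsigned SEW-bit scalar is 2^SEW - 1, so (a + b) + (2^SEW - 1) * b = a + 2^SEW * b, which places a in the low half and b in the high half of each widened element.

```python
SEW = 8                            # assumed element width in bits
MASK = (1 << SEW) - 1              # 0xFF
WIDE_MASK = (1 << (2 * SEW)) - 1   # 0xFFFF


def vwaddu_vv(a, b):
    """Model of a widening unsigned add: 2*SEW-bit result a[i] + b[i]."""
    return [(x + y) & WIDE_MASK for x, y in zip(a, b)]


def vwmaccu_vx(acc, scalar, b):
    """Model of a widening unsigned multiply-accumulate: acc[i] += scalar * b[i]."""
    s = scalar & MASK  # -1 is read as the unsigned value 2^SEW - 1
    return [(d + s * y) & WIDE_MASK for d, y in zip(acc, b)]


def vnsrl(wide, shift):
    """Model of a narrowing logical right shift: keep SEW bits of wide[i] >> shift."""
    return [(w >> shift) & MASK for w in wide]


def zip_vectors(a, b):
    """Interleave a and b: [a0, b0, a1, b1, ...]."""
    wide = vwmaccu_vx(vwaddu_vv(a, b), -1, b)  # each element is a + (b << SEW)
    out = []
    for w in wide:                 # view each wide element as two SEW-bit lanes
        out += [w & MASK, w >> SEW]
    return out


def unzip_vectors(interleaved):
    """Split [a0, b0, a1, b1, ...] back into (a, b) with two narrowing shifts."""
    wide = [lo | (hi << SEW)
            for lo, hi in zip(interleaved[0::2], interleaved[1::2])]
    return vnsrl(wide, 0), vnsrl(wide, SEW)


if __name__ == "__main__":
    a, b = [1, 2, 3], [10, 20, 30]
    z = zip_vectors(a, b)
    print(z)                   # [1, 10, 2, 20, 3, 30]
    print(unzip_vectors(z))    # ([1, 2, 3], [10, 20, 30])
```

The model also shows why the trick cannot overflow: with both inputs at 255, the widened value is 510 + 255 * 255 = 65535, which still fits in 16 bits.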