Comment by janwas

2 months ago

(Personal opinion) I get the impression that RISC-V-related discussions often lack awareness of prior work and alternatives. A large amount of (x86) software actually uses our Highway library to run on whatever vector sizes and instructions the CPU offers.

This works quite well in practice. As for leaving performance on the table, RVV seems to have some egregious performance differences/cliffs. For example, should we use vrgather (and with what LMUL), or interesting workarounds such as widening+slide1, to implement a basic operation such as interleaving two vectors?

camel-cdr 2 months ago

> For example, should we use vrgather (with what LMUL), or interesting workarounds such as widening+slide1, to implement a basic operation such as interleaving two vectors?

Use Zvzip; in the meantime:

zip: vwmaccu.vx(vwaddu.vv(a, b), -1, b), or segmented load/store when you are touching memory anyway

unzip: vnsrl (shift by 0 for the even elements, by SEW for the odd ones)

trn1/trn2: masked vslide1up/vslide1down with even/odd mask

The only thing base RVV does badly among these is register-to-register zip, which takes twice as many instructions as on other ISAs. Zvzip gives you dedicated instructions for all of the above.
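To make the base-RVV recipes above concrete, here is a scalar, lane-by-lane C++ model of what each sequence computes. It assumes SEW = 8 and a mask-undisturbed policy for the masked slides. The helper names (`Zip`, `AsWide`, `AsNarrow`, `NarrowShiftRight`, `Trn1`, `Trn2`) are hypothetical and are not RVV intrinsics. The point is only to check the arithmetic, in particular why the wmacc trick works: a + b + (2^SEW - 1)·b = a + 2^SEW·b, which places a[i] in the low half and b[i] in the high half of each widened lane.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar, lane-by-lane model of the base-RVV idioms, assuming SEW = 8
// (uint8_t lanes, uint16_t widened lanes). Helper names are hypothetical;
// these are NOT RVV intrinsics, only a check of what each sequence computes.
using Vec8 = std::vector<uint8_t>;
using Vec16 = std::vector<uint16_t>;

// Reinterpret a 2*SEW vector as twice as many SEW lanes. RISC-V is
// little-endian, so the low half of each wide lane comes first.
Vec8 AsNarrow(const Vec16& wide) {
  Vec8 out(2 * wide.size());
  for (std::size_t i = 0; i < wide.size(); ++i) {
    out[2 * i] = static_cast<uint8_t>(wide[i] & 0xFF);
    out[2 * i + 1] = static_cast<uint8_t>(wide[i] >> 8);
  }
  return out;
}

// Inverse view: each pair of SEW lanes as one 2*SEW lane (little-endian).
Vec16 AsWide(const Vec8& v) {
  Vec16 out(v.size() / 2);
  for (std::size_t i = 0; i < out.size(); ++i)
    out[i] = static_cast<uint16_t>(v[2 * i] | (v[2 * i + 1] << 8));
  return out;
}

// zip: vwaddu.vv gives a[i] + b[i] at 2*SEW; vwmaccu.vx then adds x * b[i],
// where the scalar -1 is taken as the SEW-wide unsigned value 2^SEW - 1.
// Total: a[i] + 2^SEW * b[i], i.e. a[i] in the low half, b[i] in the high.
Vec8 Zip(const Vec8& a, const Vec8& b) {
  const uint16_t x = static_cast<uint8_t>(-1);  // 255 == 2^8 - 1
  Vec16 wide(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    uint16_t acc = static_cast<uint16_t>(a[i] + b[i]);  // vwaddu.vv
    acc = static_cast<uint16_t>(acc + x * b[i]);        // vwmaccu.vx
    wide[i] = acc;
  }
  return AsNarrow(wide);
}

// unzip: vnsrl.wi (narrowing shift right logical) on the 2*SEW view.
// Shift 0 keeps the low halves (even lanes), shift 8 the high (odd lanes).
Vec8 NarrowShiftRight(const Vec16& wide, unsigned shift) {
  Vec8 out(wide.size());
  for (std::size_t i = 0; i < wide.size(); ++i)
    out[i] = static_cast<uint8_t>(wide[i] >> shift);
  return out;
}

// trn1: vslide1up of b, masked to odd lanes, into a (mask-undisturbed):
// out[2i] = a[2i], out[2i+1] = b[2i].
Vec8 Trn1(const Vec8& a, const Vec8& b) {
  Vec8 out = a;
  for (std::size_t i = 1; i < a.size(); i += 2) out[i] = b[i - 1];
  return out;
}

// trn2: vslide1down of a, masked to even lanes, into b (mask-undisturbed):
// out[2i] = a[2i+1], out[2i+1] = b[2i+1].
Vec8 Trn2(const Vec8& a, const Vec8& b) {
  Vec8 out = b;
  for (std::size_t i = 0; i + 1 < a.size(); i += 2) out[i] = a[i + 1];
  return out;
}
```

For example, with a = {1, 2, 3, 4} and b = {10, 20, 30, 40}, `Zip` gives {1, 10, 2, 20, 3, 30, 4, 40}, and `Trn1`/`Trn2` give {1, 10, 3, 30} and {2, 20, 4, 40}. The model ignores LMUL. On hardware, the widened zip result occupies a register group twice the size of the inputs.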

janwas 2 months ago

Looks like the ratification plan for Zvzip is November. So maybe 3 years until HW is actually usable? That's a neat trick with wmacc, congrats. But still, half the speed for quite a fundamental operation that has been heavily used in other ISAs for 20+ years :(
Great that you did a gap analysis [1]. I'm curious whether one of the inputs for that was the list of Highway ops [2].
[1]: https://gist.github.com/camel-cdr/99a41367d6529f390d25e36ca3...
[2]: https://github.com/google/highway/blob/master/g3doc/quick_re...