Comment by dragontamer

5 days ago

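Oh, if you need the best of both worlds, consider pshufb (SSSE3), which acts as a 4-bit lookup table: each index byte selects one of 16 table bytes. If you have access to AVX512 (specifically AVX512-VBMI), you could use vpermi2b as an effective 7-bit lookup table, since it indexes into 128 bytes spread across two 512-bit registers.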

It's not quite a full memory lookup table, but these instructions give you lookup-like behavior while staying entirely in the vector units (128-bit or 512-bit registers).
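
To make the pshufb idea concrete, here is a rough sketch in C of a per-byte popcount built from a 16-entry nibble table. The names `lookup16`, `popcount_bytes16` and `NIBBLE_POPCOUNT` are made up for illustration, and the scalar branch is only my model of what the instruction does per byte, there so the code still builds without SSSE3 enabled.

```c
#include <stdint.h>
#if defined(__SSSE3__)
#include <tmmintrin.h> /* _mm_shuffle_epi8, the intrinsic for pshufb */
#endif

/* Hypothetical 16-entry table: number of set bits in each 4-bit value. */
static const uint8_t NIBBLE_POPCOUNT[16] = {
    0, 1, 1, 2, 1, 2, 2, 3, 1, 2, 2, 3, 2, 3, 3, 4
};

/* Look up 16 table entries at once: out[i] = table[idx[i]] for idx[i] in 0..15.
 * With SSSE3 this is a single pshufb; otherwise a scalar loop models the same
 * per-byte behavior (an index with its high bit set yields 0). */
static void lookup16(const uint8_t table[16], const uint8_t idx[16],
                     uint8_t out[16])
{
#if defined(__SSSE3__)
    __m128i t = _mm_loadu_si128((const __m128i *)table);
    __m128i i = _mm_loadu_si128((const __m128i *)idx);
    _mm_storeu_si128((__m128i *)out, _mm_shuffle_epi8(t, i));
#else
    for (int k = 0; k < 16; k++)
        out[k] = (idx[k] & 0x80) ? 0 : table[idx[k] & 0x0F];
#endif
}

/* Per-byte popcount of 16 bytes: split each byte into two nibbles,
 * look both up in the 4-bit table, and add the results. */
static void popcount_bytes16(const uint8_t in[16], uint8_t out[16])
{
    uint8_t lo[16], hi[16], plo[16], phi[16];
    for (int k = 0; k < 16; k++) {
        lo[k] = in[k] & 0x0F;
        hi[k] = in[k] >> 4;
    }
    lookup16(NIBBLE_POPCOUNT, lo, plo);
    lookup16(NIBBLE_POPCOUNT, hi, phi);
    for (int k = 0; k < 16; k++)
        out[k] = (uint8_t)(plo[k] + phi[k]);
}
```

The nibble split and the final add are left as scalar loops to keep the sketch short; in real code you would do those with vector AND, shift and add as well, so the whole thing stays in registers. On GCC or Clang, building with -mssse3 (or -march=native on a suitable machine) should select the pshufb path.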