Comment by Sesse__

6 months ago

“Memory movement”? None of the instructions you list involve memory.

I find the perfect hash implementation a bit weird; it seems to obfuscate that you simply look at the lowest two bits (since they differ between the four values). You can do the x + 3 and 3 - expr at the very end, once, instead of for every element.
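To make the two points concrete, here is a minimal sketch. The thread does not list the four values, so the key set `b"abcd"` is a hypothetical stand-in chosen only because its lowest two bits are all distinct. The per-element adjustment `3 - h(x)` is likewise an assumed shape, not the article's actual expression.

```python
# Hypothetical key set (NOT from the article): four bytes whose lowest
# two bits are pairwise different: 0x61 -> 01, 0x62 -> 10, 0x63 -> 11, 0x64 -> 00.
KEYS = b"abcd"


def low_two_bits(byte: int) -> int:
    """Perfect hash for KEYS: just keep the lowest two bits."""
    return byte & 0b11


def total_per_element(data: bytes) -> int:
    """Apply the (assumed) adjustment 3 - h(x) to every element."""
    return sum(3 - low_two_bits(b) for b in data)


def total_hoisted(data: bytes) -> int:
    """Sum the raw hashes, then apply the adjustment once at the end."""
    return 3 * len(data) - sum(low_two_bits(b) for b in data)
```

Both totals agree because the adjustment is affine, so it can be factored out of the sum; the loop body then only has to mask and accumulate.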

3 comments

Voultapher 6 months ago

Doing the phf as shown compiles to an and plus a neg instruction, while plain % 4 is just the and. I tested it on an Apple M1 machine and saw no difference in performance at all. It's possible to go much faster with vectorization: about 3x on the Zen 3 machine.
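The reason % 4 can reduce to a single and is that, for unsigned byte inputs, the remainder modulo 4 is exactly the lowest two bits. A small sketch that checks this exhaustively follows; the function names are made up for illustration, and it says nothing about the generated machine code or its timing.

```python
def mod4(byte: int) -> int:
    """Remainder modulo 4, as written in source code."""
    return byte % 4


def and3(byte: int) -> int:
    """Mask with 0b11, which is what a compiler can emit for an unsigned % 4."""
    return byte & 3


def equivalent_for_all_bytes() -> bool:
    """Compare both forms over every possible byte value (0..255)."""
    return all(mod4(b) == and3(b) for b in range(256))
```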

Sesse__ 6 months ago

I didn't say it was slower, just that it was more obfuscated.

akoboldfrying 6 months ago

You're right about memory movement, not sure what I was thinking.