Comment by dan-robertson
1 day ago
This description reminds me of a hashtable used for some APL things, which is described in a talk online somewhere. The difference is that the APL table did Robin Hood insertions (an open table where new elements are inserted at or after the position corresponding to their hash, but you rearrange things to try to reduce the maximum distance any element is from where it should be) so it stored 4 bits of offset information and 4 bits of hash in each entry of the metadata table. Lookups were still sse based, something like:
position = (hash>>4) & mask
test = set1_16x8(hash&0xf) | 0xf0e0d0c0b0a090807060504030201000
candidates = eq_16x8(load_16x8(table + position), test)
// convert candidates to mask and test each option
FWIW, I found that when doing Robin Hood, SSE is just a loss, since you usually find the element so soon. (If you are dominated by non-matching lookups, you can consider putting a Bloom filter in front.)
With linear probing RH, a failed lookup can stop as early as a successful one, as long as you also store the hash (useful for fast insertion): stop looking when you'd insert the key you're looking for before the current entry.