← Back to context

Comment by connicpu

7 hours ago

Unless your lookup table is small enough to only use a portion of your L1 cache and you're calling it so much that the lookup table is never evicted :)

It's still less space for other things in the L1 cache, isn't it?

Even that is not necessarily needed, I have gotten major speedups from LUTs even as large as 1MB because the lookup distribution was not uniform. Modern CPUs have high cache associativity and faster transfers between L1 and L2.

L1D caches have also gotten bigger -- as big as 128KB. A Deflate/zlib implementation, for instance, can use a brute force full 32K entry LUT for the 15-bit Huffman decoding on some chips, no longer needing the fast small table.