Comment by mananaysiempre
7 days ago
> Lookup tables are always faster than calculation - is that true?
Maybe on the Z80. RAM of that era was quite fast relative to the CPU, by our sad modern standards.
A table lookup per byte will see you hit a pretty hard limit of about 1 cycle per byte on all x86 CPUs of the last decade. If you’re doing a state machine or a multistage table[1], where the next table index depends on both the next byte and the previous table value, you’ll be lucky to see half that, because the dependent loads serialize. You’re not outracing your SSD[2] with this approach.
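To make the dependency concrete, here’s a minimal sketch of that state-machine pattern (the table contents are hypothetical, just to show the shape): each iteration’s load address depends on the previous iteration’s result, so the loop is bounded by load latency rather than load throughput.

    #include <stddef.h>
    #include <stdint.h>

    // Hypothetical 256-state DFA: state x input byte -> next state.
    // The chain state -> table[state][b] -> table[state'][b'] -> ...
    // means each load must finish before the next can even start.
    static uint8_t table[256][256];

    uint8_t run_dfa(const uint8_t *buf, size_t len) {
        uint8_t state = 0;
        for (size_t i = 0; i < len; i++)
            state = table[state][buf[i]];
        return state;
    }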
If instead you can load a 64-bit chunk (or several!) at a time, you’ll have quite a bit of leeway to do some computation on it before you start losing to the lookup table, especially since you’ve got fast shifts and even multiplies (another difference from the Z80). And if you’re doing 128- or 256-bit vectors, you’ve got even more compute budget, but you’re also going to spend a good portion of it just shuffling the right bytes into the right positions. Ultimately, though, one of your best tools there is going to be ... an instruction that does 16 (or 32) lookups in a 16-entry table at a time[3].
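A sketch of both ideas, using standard textbook tricks rather than anything specific to the linked posts: the first function is the classic SWAR test for a zero byte in a 64-bit chunk, pure computation with no table; the second uses PSHUFB (_mm_shuffle_epi8, SSSE3) as 16 simultaneous lookups in a 16-entry nibble table to get a per-byte popcount, in the spirit of [3].

    #include <stdint.h>
    #include <tmmintrin.h>  // SSSE3 intrinsics

    // SWAR: does the 64-bit chunk v contain a zero byte?
    // The subtraction borrows through any all-zero byte and the mask
    // picks up the high bit it sets. Eight bytes tested per iteration.
    static int has_zero_byte(uint64_t v) {
        return ((v - 0x0101010101010101ULL) & ~v
                & 0x8080808080808080ULL) != 0;
    }

    // PSHUFB as a 16-wide lookup in a 16-entry table: popcount of each
    // byte, computed from its low and high nibbles (cf. [3]).
    static __m128i popcount_bytes(__m128i v) {
        const __m128i lut     = _mm_setr_epi8(0,1,1,2,1,2,2,3,
                                              1,2,2,3,2,3,3,4);
        const __m128i lo_mask = _mm_set1_epi8(0x0f);
        __m128i lo = _mm_and_si128(v, lo_mask);
        __m128i hi = _mm_and_si128(_mm_srli_epi16(v, 4), lo_mask);
        return _mm_add_epi8(_mm_shuffle_epi8(lut, lo),
                            _mm_shuffle_epi8(lut, hi));
    }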
So no, if you want to be fast on longer pieces of data, in-memory tables are not your friend. On smaller data, with just a couple of lookups, they could be[4]. In any case, you need to be thinking about your code’s performance in detail for these things to matter—I can’t think of a situation where “prefer a lookup table” is a useful heuristic. “Consider a lookup table” (then measure), maybe.
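Reference [4] is a good example of the small-data case: a 200-byte table of digit pairs halves the number of divisions in integer-to-string conversion. A minimal sketch of that trick (the function name and buffer convention are mine, not from the post):

    #include <stdint.h>
    #include <string.h>

    // All two-digit pairs "00".."99", 200 bytes plus the terminator.
    static const char digits2[201] =
        "0001020304050607080910111213141516171819"
        "2021222324252627282930313233343536373839"
        "4041424344454647484950515253545556575859"
        "6061626364656667686970717273747576777879"
        "8081828384858687888990919293949596979899";

    // Hypothetical helper: writes the decimal digits of v ending just
    // before 'end' and returns a pointer to the first digit. The caller
    // supplies a large-enough buffer.
    static char *u32_to_str(uint32_t v, char *end) {
        while (v >= 100) {
            end -= 2;
            memcpy(end, &digits2[(v % 100) * 2], 2);
            v /= 100;
        }
        if (v >= 10) {
            end -= 2;
            memcpy(end, &digits2[v * 2], 2);
        } else {
            *--end = (char)('0' + v);
        }
        return end;
    }

Usage would be something like: char buf[16]; char *s = u32_to_str(12345, buf + sizeof buf); with the digits then occupying [s, buf + 16).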
[1] https://www.unicode.org/versions/latest/ch05.pdf
[2] https://lemire.me/en/talk/perfsummit2020/
[3] http://0x80.pl/notesen/2008-05-24-sse-popcount.html
[4] https://tia.mat.br/posts/2014/06/23/integer_to_string_conver...