Comment by xeeeeeeeeeeenu
8 hours ago
> So on x86_64 processors, we have to branch to say “a 32-bit zero value has 32 leading zeros”.
Not if you're targeting x86-64-v3 or higher, which includes the LZCNT instruction that doesn't have this problem. Intel has supported LZCNT since Haswell; AMD has had it since the ABM extension in K10 (Barcelona), well before Piledriver.
You can also trivially do (codepoint | 1).leading_zeros(), which never returns 32, so you can shave one byte off the LEN table. (This doesn't affect the result because the table entries for 31 and 32 leading zeros are both 1.)
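
A minimal Rust sketch of the `| 1` trick, for illustration only. The 32-entry `LEN` table here is hypothetical (the article's actual table contents are not reproduced in this thread), and the 0 placeholder for values wider than 21 bits is an assumption:

```rust
/// Hypothetical 32-entry table indexed by leading-zero count (0..=31).
/// Entries 0..=10 correspond to values wider than 21 bits, which are not
/// valid code points; 0 is an arbitrary placeholder (assumption).
const LEN: [u8; 32] = [
    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, // lz 0..=10: more than 21 bits
    4, 4, 4, 4, 4,                   // lz 11..=15: 17..=21 bits
    3, 3, 3, 3, 3,                   // lz 16..=20: 12..=16 bits
    2, 2, 2, 2,                      // lz 21..=24: 8..=11 bits
    1, 1, 1, 1, 1, 1, 1,             // lz 25..=31: 1..=7 bits
];

/// Setting the lowest bit guarantees a non-zero input, so the count is
/// always in 0..=31. Only the input 0 changes (32 becomes 31).
pub fn lz_nonzero(codepoint: u32) -> u32 {
    (codepoint | 1).leading_zeros()
}

/// UTF-8 encoded length via the table; no special case for zero.
pub fn utf8_len(codepoint: u32) -> u8 {
    LEN[lz_nonzero(codepoint) as usize]
}
```

Since 0 and 1 both encode as a single byte, mapping 0 to the same table slot as 1 leaves the result unchanged, and slot 32 is never needed.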
It's easy to count leading zeros in a branch-free manner without a hardware instruction, using a conditional move and a de Bruijn sequence; see https://github.com/llvm/llvm-project/blob/main/flang/include...
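
For readers who haven't seen the technique, here is a rough Rust sketch of a de Bruijn leading-zero count. It is not the flang code linked above; it uses the widely circulated 32-bit multiply-and-lookup constant and table, and it handles zero with an arithmetic fix-up instead of relying on the compiler to pick a conditional move. The names `clz_debruijn`, `DEBRUIJN`, and `LOG2_TABLE` are made up for this example:

```rust
/// Multiplier for the 32-bit multiply-and-lookup floor(log2) trick.
const DEBRUIJN: u32 = 0x07C4_ACDD;

/// Maps the top 5 bits of the product to floor(log2(x)).
const LOG2_TABLE: [u8; 32] = [
    0, 9, 1, 10, 13, 21, 2, 29, 11, 14, 16, 18, 22, 25, 3, 30,
    8, 12, 20, 28, 15, 17, 24, 7, 19, 27, 23, 6, 26, 5, 4, 31,
];

/// Leading-zero count with no hardware instruction and no explicit branch.
pub fn clz_debruijn(x: u32) -> u32 {
    // Smear the highest set bit into every lower position.
    let mut v = x;
    v |= v >> 1;
    v |= v >> 2;
    v |= v >> 4;
    v |= v >> 8;
    v |= v >> 16;

    // The top 5 bits of the product are unique per smeared value.
    let idx = (v.wrapping_mul(DEBRUIJN) >> 27) as usize;
    let log2 = LOG2_TABLE[idx] as u32;

    // For x == 0 the lookup yields log2 = 0, i.e. 31; the comparison
    // adds 1 to reach 32 without a branch in the source.
    31 - log2 + (x == 0) as u32
}
```

Whether the generated code is actually branch-free depends on the compiler and target, so it is worth checking the assembly before relying on that.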
Isn't there another way to do this without so many data races?
I feel like this should be
By data races I assume you actually mean data dependencies?