Comment by FpUser
6 months ago
I did another type of experiment which evaluates benefits of branch prediction on AMD 9950X on contiguous array with 1,000,000 elements. Calculated sum adding element if it is bigger than 125 (50% of 256). Difference between random and sorted was 10 times. I guess branch prediction plays a huge role as well.
Thanks for sharing that.
Presumably if you'd split the elements into 16 shares (one for each CPU), summed with 16 threads, and then summed the lot at the end, then random would be faster than sorted?
I don’t think random should be faster than contiguous access, if you parallelize both of them.
Although, it looks like that chip has a 1MB L2 cache for each core. If these are 4 Bytes ints, then I guess they won’t all fit in one core’s L2, but maybe they can all start out in their respective cores’ L2 if it is parallelized (well, depends on how you set it up).
Maybe it will be closer. Contiguous should still win.
What if you factored in time to sort them first?