Comment by bee_rider
18 hours ago
I don’t think random should be faster than contiguous access, if you parallelize both of them.
Although, it looks like that chip has a 1MB L2 cache for each core. If these are 4 Bytes ints, then I guess they won’t all fit in one core’s L2, but maybe they can all start out in their respective cores’ L2 if it is parallelized (well, depends on how you set it up).
Maybe it will be closer. Contiguous should still win.
No comments yet
Contribute on Hacker News ↗