Comment by reificator

5 years ago

Are you reading the whole 1 MiB in linear order? If so, then the prefetcher should help in exactly the same way.

If you're not, then why is all of that data packed together? Is there an alternate layout you could use instead, where you iterate over only the exact values you need? If performance is important to you in that context, it might be worth it.
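To make the "alternate layout" idea concrete, here is a minimal C sketch. It assumes a hypothetical record with a small `uint64_t` key and a large payload the search never reads (`PAYLOAD_BYTES`, `record_aos`, `table_split` and the `find_*` functions are all made-up names for illustration). In the array-of-structs version, every record the scan steps over drags its payload through the cache. In the split version the keys are packed contiguously and the payloads sit in a parallel array at the same index, so the scan touches only keys.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical payload size; the search never looks at it. */
#define PAYLOAD_BYTES 240

/* Array-of-structs: key and payload interleaved in memory. */
struct record_aos {
    uint64_t key;
    unsigned char payload[PAYLOAD_BYTES];
};

/* Linear scan over AoS: most of every cache line fetched is payload
   bytes that are never compared. */
ptrdiff_t find_aos(const struct record_aos *recs, size_t n, uint64_t key) {
    assert(recs != NULL || n == 0); /* precondition check */
    for (size_t i = 0; i < n; i++)
        if (recs[i].key == key)
            return (ptrdiff_t)i;
    return -1;
}

/* Split layout: keys packed contiguously, payloads in a parallel array
   where payloads[i] belongs to keys[i]. */
struct table_split {
    uint64_t *keys;
    unsigned char (*payloads)[PAYLOAD_BYTES];
    size_t n;
};

/* Linear scan that reads only the packed keys. */
ptrdiff_t find_split(const struct table_split *t, uint64_t key) {
    assert(t != NULL); /* precondition check */
    for (size_t i = 0; i < t->n; i++)
        if (t->keys[i] == key)
            return (ptrdiff_t)i;
    return -1;
}
```

Both functions return the same index. The difference is only in how many bytes the scan has to pull through the cache to get there, which is where the split layout would pay off if the payload really is never needed during the search.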

Right, I suppose any large associated data that doesn't need to be searched can be malloc()'d (with the standard allocator or a fancier one), and you just store a pointer to it. Or, if the other data is relatively small, just keep it in another array.
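A minimal C sketch of the pointer approach, assuming the searched structure is a sorted array (the `sorted_index` type and the `index_find`/`index_insert` functions are hypothetical names, not from the thread). Keys stay packed for binary search, and each large payload is malloc()'d separately with only its pointer kept in a parallel array. An insert memmove()s the tail of both arrays, so only keys and pointers are shifted, never the payloads themselves.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h> /* malloc()/free() for the payloads, NULL */
#include <string.h> /* memmove() */

/* Hypothetical sorted index: keys searched, payload pointers carried along. */
struct sorted_index {
    uint64_t *keys;  /* sorted ascending; this is what gets searched */
    void **payloads; /* payloads[i] belongs to keys[i]; never searched */
    size_t n, cap;   /* entries in use, capacity of both arrays */
};

/* Binary search: index of the first key >= key (lower bound). */
static size_t lower_bound(const struct sorted_index *ix, uint64_t key) {
    size_t lo = 0, hi = ix->n;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (ix->keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return lo;
}

/* Returns the payload pointer stored for key, or NULL if key is absent. */
void *index_find(const struct sorted_index *ix, uint64_t key) {
    size_t i = lower_bound(ix, key);
    return (i < ix->n && ix->keys[i] == key) ? ix->payloads[i] : NULL;
}

/* Inserts key/payload in sorted position by shifting the tail of both
   arrays right by one slot. Returns 0 on success, -1 if the index is full. */
int index_insert(struct sorted_index *ix, uint64_t key, void *payload) {
    assert(ix->n <= ix->cap); /* invariant check */
    if (ix->n == ix->cap)
        return -1;
    size_t i = lower_bound(ix, key);
    size_t tail = ix->n - i; /* number of entries that must move */
    memmove(&ix->keys[i + 1], &ix->keys[i], tail * sizeof ix->keys[0]);
    memmove(&ix->payloads[i + 1], &ix->payloads[i], tail * sizeof ix->payloads[0]);
    ix->keys[i] = key;
    ix->payloads[i] = payload;
    ix->n++;
    return 0;
}
```

Usage would be along the lines of `index_insert(&ix, 42, malloc(payload_size))`, with the caller responsible for free()ing payloads later. If the associated data is small instead, the `void **payloads` array could hold the values directly, which is the "just in another array" variant.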

One case I don't know what to think about: how memmove()-ing the subarray affects cache invalidation on NUMA hardware in an extremely multithreaded application.