Comment by PhilipTrettner

8 days ago

I looked into this because part of our pipeline is forced to be chunked. Most advice I've seen boils down to "more contiguity = better", but without numbers, or at least none that generalize.

My concrete tasks already reach peak performance below 128 kB, and I couldn't find pure processing workloads that benefit significantly beyond a 1 MB chunk size. The code is linked in the post; it would be nice to see results on more systems.
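For anyone who wants to reproduce the general shape of this on their own machine, here is a minimal sketch of a chunk-size sweep (not the benchmark from the linked post, and the per-byte transform is a placeholder workload): two pipeline stages run back-to-back on each chunk, so small chunks stay cache-resident between stages while very large chunks spill to DRAM.

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <cstdio>
#include <vector>

int main() {
    const std::size_t total = std::size_t{1} << 28;   // 256 MB working set
    std::vector<std::uint8_t> data(total, 1);

    for (std::size_t chunk = 4 << 10; chunk <= (64u << 20); chunk *= 2) {
        std::uint64_t sum = 0;
        auto t0 = std::chrono::steady_clock::now();
        for (std::size_t off = 0; off < total; off += chunk) {
            const std::size_t end = std::min(off + chunk, total);
            // Stage 1: trivial per-byte transform (placeholder for real work).
            for (std::size_t i = off; i < end; ++i)
                data[i] = static_cast<std::uint8_t>(data[i] * 3 + 1);
            // Stage 2: consume the chunk produced by stage 1.
            for (std::size_t i = off; i < end; ++i)
                sum += data[i];
        }
        auto t1 = std::chrono::steady_clock::now();
        const double s = std::chrono::duration<double>(t1 - t0).count();
        std::printf("chunk %9zu B: %6.2f GB/s (checksum %llu)\n",
                    chunk, 2.0 * total / s / 1e9,
                    static_cast<unsigned long long>(sum));
    }
}
```

The exact sweet spot will depend on cache sizes and how heavy the per-chunk work is, which is why results from more systems would be interesting.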

Your results match similar analyses of database systems I’ve seen.

64KB-128KB seems like the sweet spot.

Doesn't it depend on what you're doing? xz data compression or some video codecs? Retrograde chess analysis (endgame tablebases)? Number Field Sieve factorization in the linear algebra phase?