
Comment by Nokinside

8 years ago

You should think of memory as fixed-length contiguous blocks of different sizes.

Inside the processor there are cache lines (usually 64 bytes). These are blocks of memory that the CPU tags and moves together as a single unit.
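Here is a minimal sketch of what this means for a single access, assuming 64-byte lines. Even a one-byte read is served by the whole line that contains it, and an access that straddles a line boundary involves two lines. The names `LINE_SIZE` and `lines_covering` are hypothetical and only for illustration.

```python
LINE_SIZE = 64  # assumed cache-line size in bytes (typical, not universal)


def lines_covering(addr: int, size: int) -> list[int]:
    """Return the base addresses of every cache line touched by [addr, addr + size)."""
    if size <= 0:
        raise ValueError("size must be positive")
    first = addr - addr % LINE_SIZE           # line holding the first byte
    last_byte = addr + size - 1
    last = last_byte - last_byte % LINE_SIZE  # line holding the last byte
    return list(range(first, last + 1, LINE_SIZE))


# A 1-byte read at address 130 still brings in the whole line starting at 128,
# while a 4-byte read at address 62 spans two lines.
print(lines_covering(130, 1))
print(lines_covering(62, 4))
```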

In a typical n-way set-associative cache, each memory address maps to exactly one set, and each set can hold at most n lines. Consecutive cache lines map to consecutive sets, so the lines that share a set are spread through RAM at a fixed stride rather than forming one contiguous block. For example, an 8-way cache with 1,024 sets (512 KB with 64-byte lines) divides 1 GB of RAM into 1,024 groups of 1 MB each, where each group consists of lines spaced 64 KB (1,024 × 64 bytes) apart. If you work with more than 512 bytes (= 8 × 64) from the same group at a time, there will be conflict misses. In other words, a CPU cache has only a limited number of lines available for any given group of addresses (unless it is fully associative).
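A toy model can make the conflict-miss effect concrete. This sketch assumes 1,024 sets, 8 ways, 64-byte lines and true LRU replacement; real CPUs use their own replacement policies and index schemes, so treat it as an illustration rather than a model of any specific processor. The class `SetAssociativeCache` and the helper `conflict_misses` are hypothetical names.

```python
from collections import OrderedDict


class SetAssociativeCache:
    """Toy n-way set-associative cache with LRU replacement (a model, not real hardware)."""

    def __init__(self, num_sets: int, ways: int, line_size: int = 64):
        self.num_sets = num_sets
        self.ways = ways
        self.line_size = line_size
        # One OrderedDict per set: keys are tags, ordered from least to most recently used.
        self.sets = [OrderedDict() for _ in range(num_sets)]
        self.misses = 0

    def access(self, addr: int) -> bool:
        """Touch byte address `addr`; return True on a hit, False on a miss."""
        line = addr // self.line_size
        index = line % self.num_sets  # which set this line maps to
        tag = line // self.num_sets   # distinguishes lines that share that set
        s = self.sets[index]
        if tag in s:
            s.move_to_end(tag)        # mark as most recently used
            return True
        self.misses += 1
        if len(s) >= self.ways:
            s.popitem(last=False)     # evict the least recently used line
        s[tag] = None
        return False


def conflict_misses(lines_in_one_set: int, rounds: int = 10) -> int:
    """Cycle over `lines_in_one_set` addresses that all map to set 0; return total misses."""
    cache = SetAssociativeCache(num_sets=1024, ways=8)
    stride = cache.num_sets * cache.line_size  # 64 KB apart: same set, different tag
    for _ in range(rounds):
        for i in range(lines_in_one_set):
            cache.access(i * stride)
    return cache.misses


# 8 lines (512 bytes) in one set: only the first round misses.
# 9 lines in one set: with LRU and a cyclic pattern, every access misses.
print(conflict_misses(8), conflict_misses(9))
```

The same 9 lines laid out contiguously (64 bytes apart instead of 64 KB apart) land in 9 different sets and would miss only on the first pass.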

Between the CPU and DRAM there are typically 64-byte and 4096-byte regions with boundary-crossing penalties (64 bytes being the cache line and 4096 bytes the usual page size). I think the 64-byte crossing penalty is typically less than 10 cycles, while the 4096-byte crossing penalty is several tens of cycles (this is on top of the cost of accessing DRAM itself).
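A small sketch of how to tell which boundary an access crosses, assuming 64-byte lines and 4096-byte pages. It only classifies the access; the cycle costs mentioned above vary by microarchitecture and are not modeled. The names `crosses` and `classify` are hypothetical.

```python
CACHE_LINE = 64  # bytes, assumed line size
PAGE = 4096      # bytes, assumed page size (a multiple of CACHE_LINE)


def crosses(addr: int, size: int, boundary: int) -> bool:
    """True if the byte range [addr, addr + size) spans a multiple of `boundary`."""
    if size <= 0:
        raise ValueError("size must be positive")
    return addr // boundary != (addr + size - 1) // boundary


def classify(addr: int, size: int) -> str:
    # Check the page first: since 4096 is a multiple of 64, a page crossing
    # is always also a cache-line crossing.
    if crosses(addr, size, PAGE):
        return "page crossing"
    if crosses(addr, size, CACHE_LINE):
        return "cache-line crossing"
    return "no crossing"


for addr in (0, 60, 4092):
    print(addr, classify(addr, 8))
```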