Comment by cjhanks
8 years ago
There is typically L1 cache, L2 cache, L3 cache, RAM, and SWAP. These ranges from kb to mb, and then to gb. And each lower level is faster than all levels above it.
Most of the relevant memory here is L2 and L3. As you have more items in memory, the statistical likelihood that the page you're requesting will be in an efficient cache level decreases. Eventually you have so many items that you're statistically only getting the performance of RAM (let's ignore the swap case)... that's the point where you'll get closer to a flat `O(1)`... but it will be a lot slower than the `O(1)` with more cache hits.
All true, to add to this there's all sorts of other caveats on modern processors. E.g. it may be slower to access any of L1-3 or RAM depending on what processor core you're on.
This is because even though your program can transparently access all that memory, the underlying machine is implemented in terms of memory areas and it can be costly to "jump across" a memory boundary[1][2][3].
1. https://linux.die.net/man/8/numactl
2. https://linux.die.net/man/1/taskset
3. http://iopscience.iop.org/article/10.1088/1742-6596/664/9/09...