Comment by OskarS
8 years ago
Is that really true, though? It's certainly the case that a cache miss takes longer than a cache hit, but the time for a cache miss doesn't depend on how much memory you have. Or does it? Like, a cache miss isn't going to be slower if you have 64 gigabytes of memory compared to having 1 gigabyte of memory, it's just a cache miss. That makes random memory access O(1) in the sense of complexity analysis.
There is typically L1 cache, L2 cache, L3 cache, RAM, and swap. These range in size from KB to MB and then to GB, and each level in that list is faster (and smaller) than all the levels after it.
Most of the relevant memory here is L2 and L3. As you have more items in memory, the statistical likelihood that the page you're requesting will be in an efficient cache level decreases. Eventually you have so many items that you're statistically only getting the performance of RAM (let's ignore the swap case)... that's the point where you'll get closer to a flat `O(1)`... but it will be a lot slower than the `O(1)` with more cache hits.
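(A rough sketch of how you can see this on your own machine: time random reads over progressively larger arrays. The sizes, read counts, and the xorshift index trick below are illustrative, not tuned for any particular CPU.)

```c
/* Rough sketch: time random reads over working sets of increasing size.
 * As the array outgrows L1, L2, and L3, the average access time jumps,
 * even though each individual access is still "O(1)". */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double time_random_reads(size_t n_elems, size_t n_reads) {
    long *a = malloc(n_elems * sizeof *a);
    if (!a) return 0.0;
    for (size_t i = 0; i < n_elems; i++)
        a[i] = (long)i;

    /* A simple xorshift index sequence to defeat the prefetcher. */
    size_t idx = 12345;
    volatile long sink = 0;

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n_reads; i++) {
        idx ^= idx << 13; idx ^= idx >> 7; idx ^= idx << 17;
        sink += a[idx % n_elems];
    }
    clock_gettime(CLOCK_MONOTONIC, &t1);

    free(a);
    (void)sink;
    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)n_reads;   /* average ns per read */
}

int main(void) {
    /* From a few KB (fits in L1) up to hundreds of MB (RAM-bound). */
    for (size_t kb = 4; kb <= 512 * 1024; kb *= 4) {
        size_t n = kb * 1024 / sizeof(long);
        printf("%8zu KB: %6.2f ns/read\n", kb,
               time_random_reads(n, 10 * 1000 * 1000));
    }
    return 0;
}
```

On a typical desktop you'd expect the ns/read figure to step up around the L1, L2, and L3 capacities and then flatten out once the array is comfortably RAM-bound.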
All true. To add to this, there are all sorts of other caveats on modern processors, e.g. accessing any of L1-L3 or RAM may be slower depending on which processor core you're running on.
This is because even though your program can transparently access all that memory, the underlying machine is implemented in terms of distinct memory areas (NUMA nodes), and it can be costly to "jump across" a memory boundary[1][2][3].
1. https://linux.die.net/man/8/numactl
2. https://linux.die.net/man/1/taskset
3. http://iopscience.iop.org/article/10.1088/1742-6596/664/9/09...
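(To make that boundary cost concrete, here's a minimal sketch using libnuma, the C-level counterpart of the numactl/taskset tools in [1] and [2]. It assumes a Linux machine with libnuma installed and is built with -lnuma; the buffer size is arbitrary.)

```c
/* Minimal libnuma sketch (Linux, link with -lnuma): place a buffer on a
 * chosen NUMA node and sweep it from a thread pinned to node 0. On a
 * multi-socket machine the remote-node case is typically measurably
 * slower, even though both are plain loads from the program's view. */
#include <numa.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

static double sweep_ns(volatile long *buf, size_t n) {
    struct timespec t0, t1;
    long sum = 0;
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < n; i += 8)   /* stride over cache lines */
        sum += buf[i];
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void)sum;
    return (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
}

int main(void) {
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available on this system\n");
        return 1;
    }

    numa_run_on_node(0);               /* pin this thread to node 0 */
    size_t n = 64UL * 1024 * 1024;     /* 64M longs = 512 MB, arbitrary */

    for (int node = 0; node <= numa_max_node(); node++) {
        long *buf = numa_alloc_onnode(n * sizeof *buf, node);
        if (!buf) continue;
        for (size_t i = 0; i < n; i++) buf[i] = 1;   /* fault pages in */
        printf("node %d: %.0f ms to sweep\n", node, sweep_ns(buf, n) / 1e6);
        numa_free(buf, n * sizeof *buf);
    }
    return 0;
}
```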
More RAM doesn't mean more cache misses, but more elements in your data set does — "n" isn't the amount of memory in your system, it has the same meaning here as it does everywhere else in complexity analysis.
Small datasets fit in L3, smaller datasets fit in L2, and if they're smaller still they'll fit in L1, all of which are progressively faster. If your data set is bigger than L3, you need to hit RAM, and bigger still you'll need to hit the SSD or HDD — going progressively slower.
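(If you want to know where those boundaries sit on a given machine, a small sketch: on Linux/glibc, sysconf can report the cache sizes. The _SC_LEVEL*_CACHE_SIZE names are glibc extensions and may return 0 where the kernel doesn't expose the information.)

```c
/* Small sketch: ask glibc for the cache sizes on this machine, to see
 * roughly where a data set stops fitting. */
#include <stdio.h>
#include <unistd.h>

int main(void) {
    printf("L1d: %ld bytes\n", sysconf(_SC_LEVEL1_DCACHE_SIZE));
    printf("L2:  %ld bytes\n", sysconf(_SC_LEVEL2_CACHE_SIZE));
    printf("L3:  %ld bytes\n", sysconf(_SC_LEVEL3_CACHE_SIZE));
    return 0;
}
```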
I mean, yeah, obviously, I'm not disputing that, but the point is that it's still "constant time": it doesn't scale with the amount of memory.
My point wasn't that it doesn't get slower if you have larger datasets: of course there's a change when you go from cache-hits to cache-misses. My point was that trying to say "it has higher complexity than O(1)" is an inaccurate way of describing that slow-down. "O(1) random memory access" doesn't mean "fast memory access", it means "memory access where the time is bounded by a constant".
That's because people use big-O notation when they really mean something closer to big-Θ, i.e. they're looking for a tight average-case bound (e.g. any graph where you benchmark "average"-case data).
E.g. for almost any commonly used algorithm you could say it is O(e^n) and you would be technically correct as far as big-O is concerned.
O(1) is indeed an upper bound on a memory access, but in the presence of caches it is not a tight, useful bound, hence one reason rough big-O estimates diverge from the numbers you actually measure.
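(For reference, the standard definitions being contrasted here; nothing specific to memory access:)

```latex
% Big-O is only an upper bound; big-Theta bounds from both sides.
\begin{align*}
  f(n) \in O(g(n))      &\iff \exists\, c>0,\ n_0 \ \text{s.t. } f(n) \le c\,g(n) \ \text{for all } n \ge n_0 \\
  f(n) \in \Omega(g(n)) &\iff \exists\, c>0,\ n_0 \ \text{s.t. } f(n) \ge c\,g(n) \ \text{for all } n \ge n_0 \\
  f(n) \in \Theta(g(n)) &\iff f(n) \in O(g(n)) \ \text{and}\ f(n) \in \Omega(g(n))
\end{align*}
```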
Yes. Here is a four-part series exploring RAM models: http://www.ilikebigbits.com/blog/2014/4/21/the-myth-of-ram-p...
You should see it from the other side: a cache miss takes longer the bigger the data set you have to fit in your caches, because misses fall through to bigger and slower cache levels.
So, as your N (length of data) changes, access time changes too.