
Comment by pdpi

8 years ago

More RAM doesn't mean more cache misses, but more elements in your data set does — "n" isn't the amount of memory in your system; it has the same meaning here as it does everywhere else in complexity analysis.

Small datasets fit in L3, smaller datasets fit in L2, and if they're smaller still they'll fit in L1, all of which are progressively faster. If your data set is bigger than L3, you need to hit RAM, and bigger still you'll need to hit the SSD or HDD — going progressively slower.
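
A minimal pointer-chasing sketch of that tiering effect, for illustration only: it times one dependent random load as the working set grows past each cache level. The helper name, the sizes, and the 10-million-step count are arbitrary choices, the largest step allocates roughly half a gigabyte, and the absolute numbers depend entirely on the machine.

    /* Minimal pointer-chasing sketch (illustrative): build a random cycle,
     * then time dependent loads as the working set grows past L1/L2/L3.
     * Build: cc -O2 chase.c -o chase   (uses POSIX clock_gettime)          */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    /* Returns average nanoseconds per dependent random load over a working
     * set of n_elems indices, chased for `steps` iterations.               */
    static double chase(size_t n_elems, size_t steps) {
        size_t *next = malloc(n_elems * sizeof *next);
        size_t *perm = malloc(n_elems * sizeof *perm);
        if (!next || !perm) { perror("malloc"); exit(1); }

        /* Random permutation (Fisher-Yates), then link it into one big cycle
         * so every load depends on the previous one; this defeats prefetching. */
        for (size_t i = 0; i < n_elems; i++) perm[i] = i;
        for (size_t i = n_elems - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = perm[i]; perm[i] = perm[j]; perm[j] = t;
        }
        for (size_t i = 0; i < n_elems; i++)
            next[perm[i]] = perm[(i + 1) % n_elems];

        volatile size_t idx = perm[0];       /* volatile: keep the loop alive */
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        for (size_t s = 0; s < steps; s++) idx = next[idx];
        clock_gettime(CLOCK_MONOTONIC, &t1);

        free(next); free(perm);
        double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
        return ns / (double)steps;
    }

    int main(void) {
        /* Working sets from 16 KiB (inside L1) up to 256 MiB (far past L3). */
        for (size_t kib = 16; kib <= 256 * 1024; kib *= 4) {
            size_t n = kib * 1024 / sizeof(size_t);
            printf("%8zu KiB: %5.1f ns/access\n", kib, chase(n, 10 * 1000 * 1000));
        }
        return 0;
    }

On a typical desktop the per-access time steps up each time the working set outgrows a cache level, which is exactly the progression described above.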

I mean, yeah, obviously, I'm not disputing that, but the point is that it's still "constant time", it doesn't scale with the amount of memory.

My point wasn't that it doesn't get slower if you have larger datasets: of course there's a change when you go from cache-hits to cache-misses. My point was that trying to say "it has higher complexity than O(1)" is an inaccurate way of describing that slow-down. "O(1) random memory access" doesn't mean "fast memory access", it means "memory access where the time is bounded by a constant".
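
For reference, the textbook definition that last sentence appeals to, written here with T(n) standing for the time of one access and n for the number of elements (symbols chosen for illustration):

    % Definition of big-oh, then specialised to g(n) = 1:
    \[
      T(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 \ge 0 \;:\; T(n) \le c\, g(n) \text{ for all } n \ge n_0
    \]
    % With g(n) = 1 this says only that some constant c bounds every access
    % once n is large enough -- "bounded by a constant", not "fast".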

  • That's because people use big-oh notation when they really mean something closer to big-theta and are looking for a tighter average-case bound (e.g. any graph where you benchmark 'average'-case data); see the definitions sketched after this subthread.

    E.g. for almost any commonly used algo you could say it is O(e^n) and you would be technically correct as far as what big-oh demands.

    O(1) is indeed an upper bound on memory access, but in the presence of cache it is not the tightest possible bound, hence one reason for divergence from the numerical estimates given by rough big-oh bounds.

    • > O(1) is indeed an upper bound on memory access, but in the presence of cache it is not the tightest possible bound, hence one reason for divergence from the numerical estimates given by rough big-oh bounds.

      Quite the opposite — O(1) is in fact too tight.

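A minimal sketch of the standard definitions behind the big-oh vs big-theta point in the subthread above, with T(n) for running time (or access time) as a function of input size n:

    \[
      T(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 :\; T(n) \le c\, g(n) \text{ for all } n \ge n_0
    \]
    \[
      T(n) = \Theta(g(n)) \iff \exists\, c_1, c_2 > 0,\ n_0 :\; c_1\, g(n) \le T(n) \le c_2\, g(n) \text{ for all } n \ge n_0
    \]
    % Big-oh only bounds from above, so an algorithm that runs in c*n steps is
    % also, vacuously, O(e^n); Theta(n) is the statement people usually intend.

In that vocabulary, the "too tight" reply reads as a claim that the true access cost grows with n once the working set outruns the caches, so a constant upper bound understates it rather than overstates it.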