I've been playing a lot of Baldur's Gate 3 lately, and this kind of reminds me of when you have to roll a d20 in a skill check. Early in the game there are skill checks with a difficulty class of 2 (you need to roll a 2 or higher to succeed), but you might have proficiency in that skill, so you get a +3 modifier to your roll.
But if your base d20 roll is a 1, it is considered a critical fail and your modifiers are not added – you fail no matter what. So there is always a 5% chance of failing (1 in 20).
However, you can have special items or spells which grant you advantage on a roll: you roll 2 d20 instead of 1 and take the higher of the two. Now the only way to critically fail is if you roll a 1 on both d20s, which has only a 0.25% chance (1 in 400).
So the "2 random choices" cache eviction policy feels kinda like rolling with advantage in D&D: Roll 2 die (pick to random keys) and evict the higher time since last access to evict.
I love that game and all the source material, but I really dislike the d20 as a base. I've been working on a system that uses a fixed number of d6, since with just a few d6 you get something closer to a normal distribution, so you can model real-life skills more accurately and give higher-level players a better spread of outcomes than "I have a 5% chance to fail, or a 90+% chance at something I'm not good at".
20 years of d&d will motivate some unofficial errata.
Check out the Dominions series, which just released its 6th installment this week! It uses "exploding d6s" for its rolls, which means rolling a bunch of d6s and then re-rolling any 6s (recursively) and summing the result. This results in a fascinating long-tailed distribution (similar to advantage/disadvantage), so that even the weakest unit can occasionally deal damage to bosses.
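If anyone wants to see that tail, here's a quick Python sketch of my reading of an exploding d6 (just the mechanic as described above, not Dominions' actual formula):

    import random

    def exploding_d6():
        """Roll a d6; every 6 is added and rolled again, so the total is unbounded."""
        total = 0
        while True:
            roll = random.randint(1, 6)
            total += roll
            if roll != 6:
                return total

    # the long tail: even a 2-die pool occasionally spikes far above its mean of ~8.4
    rolls = [exploding_d6() + exploding_d6() for _ in range(100_000)]
    print(max(rolls), sum(r > 20 for r in rolls) / len(rolls))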
I love the Shadowrun system, where you roll a whole bunch of d6. To succeed, a certain number of these need to hit a threshold. Proficiency and other advantages are granted by increasing the number of d6 you roll.
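Something like this, if I remember the mechanic right (the hit number and threshold below are placeholders from memory, not the actual Shadowrun values):

    import random

    def dice_pool_test(pool, hit_on=5, hits_needed=3):
        """Roll `pool` d6; every die at or above `hit_on` counts as a hit.
        Skill raises the pool size instead of adding a flat modifier."""
        hits = sum(random.randint(1, 6) >= hit_on for _ in range(pool))
        return hits >= hits_needed

    # a bigger pool shifts the whole distribution of hits, not just the ceiling
    print(sum(dice_pool_test(6) for _ in range(100_000)) / 100_000)
    print(sum(dice_pool_test(12) for _ in range(100_000)) / 100_000)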
The critical fail on a 1 is a variant rule that I hate; it makes everything feel like slapstick comedy. I do wish it were played as written.
> So we've seen that this works, but why would anyone think to do this in the first place? The Power of Two Random Choices: A Survey of Techniques and Results by Mitzenmacher, Richa, and Sitaraman has a great explanation. The mathematical intuition is that if we (randomly) throw n balls into n bins, the maximum number of balls in any bin is O(log n / log log n) with high probability, which is pretty much just O(log n). But if (instead of choosing randomly) we choose the least loaded of k random bins, the maximum is O(log log n / log k) with high probability, i.e., even with two random choices, it's basically O(log log n) and each additional choice only reduces the load by a constant factor.
This sounds very powerful! And intuitively it doesn't make any sense to me. So, say I have n choices and I do not know which choice is better. Does this result say that it's more efficient to randomly choose k of them and pick the best among those (even for k=2), instead of randomly choosing a single one? Could someone point me in the right direction?
> This sounds very powerful! And intuitively it doesn't make any sense to me. So, say I have n choices and I do not know which choice is better. Does this result say that it's more efficient to randomly choose k of them and pick the best among those (even for k=2), instead of randomly choosing a single one? Could someone point me in the right direction?
Yes. Let's compare k=2 with selecting a single random item from a set of N items:
Selecting one item uniformly at random from a set of N is equivalent to selecting two distinct items and then picking one of those two uniformly at random.
So if you select items A and B and then pick the better of the two, you end up with an item of MAX(A,B) utility instead of (on average) MEAN(A,B) utility. MAX(A,B) >= MEAN(A,B) should hopefully be obvious.
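A quick simulation also makes the quoted balls-and-bins claim concrete (just a sketch, not from the survey): with one random choice the fullest bin grows like log n / log log n, while best-of-two keeps it down around log log n.

    import random

    def max_load(n, choices):
        """Throw n balls into n bins; each ball goes to the least loaded of `choices` random bins."""
        bins = [0] * n
        for _ in range(n):
            candidates = [random.randrange(n) for _ in range(choices)]
            bins[min(candidates, key=lambda i: bins[i])] += 1
        return max(bins)

    n = 100_000
    print("one choice: ", max_load(n, 1))  # ~ log n / log log n, typically 7 or 8 at this n
    print("two choices:", max_load(n, 2))  # ~ log log n, typically 3 or 4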
Random numbers sound possibly expensive for 'small' caches. Though, offhand, I guess if there's a hardware RNG source it doesn't need to be cryptographically secure or trusted for this task, as long as it's fast.
Similarly, if updates are bursty, an idle cull could use a bitvector to mark free slots within a fixed array, for rather low overhead and a burst of speedy inserts when load picks up. Something close to this is probably already necessary overhead for when the cache isn't yet populated.
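Even without a hardware source, a xorshift step is about as cheap as RNG gets for picking eviction candidates; a rough sketch (the shift constants are the standard xorshift64 ones, the seed is arbitrary):

    def xorshift64(state):
        """One step of a cheap, non-cryptographic PRNG: fine for picking eviction candidates."""
        state ^= (state << 13) & 0xFFFFFFFFFFFFFFFF
        state ^= state >> 7
        state ^= (state << 17) & 0xFFFFFFFFFFFFFFFF
        return state & 0xFFFFFFFFFFFFFFFF

    state = 0x9E3779B97F4A7C15  # any nonzero 64-bit seed works
    state = xorshift64(state)
    print(state % 16)  # pick a slot in a 16-entry cache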
You've explained that wonderfully, thank you! I'm not the original commenter, but was also confused by my intuition here. That makes a lot more sense now :)
The algorithm described here (randomly picking some small number of keys and evicting the oldest one among them) is the core of how Redis' "LRU" cache eviction policies work, in case anyone was wondering.
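For illustration, a simplified Python sketch of that sampled eviction (not Redis's actual implementation, which is in C and, IIRC, also keeps a pool of good candidates between evictions; the class name and parameters here are made up):

    import random
    import time

    class SampledLRUCache:
        """Toy cache: on overflow, sample a few random keys and evict
        the least recently used key in the sample."""

        def __init__(self, capacity, sample_size=5):
            self.capacity = capacity
            self.sample_size = sample_size
            self.data = {}
            self.last_access = {}

        def get(self, key):
            if key in self.data:
                self.last_access[key] = time.monotonic()
                return self.data[key]
            return None

        def put(self, key, value):
            if key not in self.data and len(self.data) >= self.capacity:
                sample = random.sample(list(self.data), min(self.sample_size, len(self.data)))
                victim = min(sample, key=self.last_access.__getitem__)  # oldest access loses
                del self.data[victim]
                del self.last_access[victim]
            self.data[key] = value
            self.last_access[key] = time.monotonic()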
In a similar vein, there's a neat "Efficient Page Replacement" strategy described in a LeanStore paper [0] that combines random selection with FIFO:
> Instead of tracking frequently accessed pages in order to avoid evicting them, our replacement strategy identifies infrequently-accessed pages. We argue that with the large buffer pool sizes that are common today, this is much more efficient as it avoids any additional work when accessing a hot page
> by speculatively unswizzling random pages, we identify infrequently-accessed pages without having to track each access. In addition, a FIFO queue serves as a probational cooling stage during which pages have a chance to be swizzled. Together, these techniques implement an effective replacement strategy at low cost
[0] https://db.in.tum.de/~leis/papers/leanstore.pdf
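My loose reading of the quoted strategy, as a toy model (real LeanStore does this with pointer swizzling inside a buffer manager; this just sketches the random-pick plus FIFO cooling state machine, with made-up names):

    import random
    from collections import OrderedDict

    class CoolingBuffer:
        def __init__(self, cooling_size=4):
            self.hot = set()                # accessed with zero per-access tracking
            self.cooling = OrderedDict()    # FIFO probational stage
            self.cooling_size = cooling_size

        def access(self, page):
            if page in self.cooling:        # a cooling page got touched: rescue it
                del self.cooling[page]
            self.hot.add(page)

        def cool_one(self):
            """Speculatively pick a random hot page and put it on probation."""
            if not self.hot:
                return
            page = random.choice(tuple(self.hot))
            self.hot.discard(page)
            self.cooling[page] = None
            if len(self.cooling) > self.cooling_size:
                victim, _ = self.cooling.popitem(last=False)  # oldest cooling page is evicted
                print("evicted page", victim)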
Came here to say the same thing.
There's a follow-up paper from last year that talks about some refinements of this strategy (Section 6.2, Page Replacement), among other things.
"The Evolution of LeanStore"
https://dl.gi.de/server/api/core/bitstreams/edd344ab-d765-44...
See also https://www.eecs.harvard.edu/~michaelm/postscripts/handbook2... and https://en.wikipedia.org/wiki/2-choice_hashing
It depends on the distribution of access probabilities. If the least recently used data has the same probability of being requested as the rest, then random picking will do as well as LRU.
When the least recently used data does have a lower probability of being requested, then LRU will outperform random picking.
There is no silver bullet algorithm.
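A rough way to check that for a given workload, here with a skewed, Zipf-ish access pattern (all the sizes and the key-popularity model below are arbitrary):

    import random
    from collections import OrderedDict

    def hit_rate(policy, capacity=100, n_keys=1000, n_ops=200_000):
        """Hit rate of LRU, uniform-random, or 2-random eviction under a skewed workload."""
        cache = OrderedDict()   # keys ordered from least to most recently used
        hits = 0
        weights = [1.0 / (i + 1) for i in range(n_keys)]  # key 0 is the hottest
        for key in random.choices(range(n_keys), weights=weights, k=n_ops):
            if key in cache:
                hits += 1
                cache.move_to_end(key)
            else:
                if len(cache) >= capacity:
                    keys = list(cache)
                    if policy == "lru":
                        cache.popitem(last=False)
                    elif policy == "random":
                        del cache[random.choice(keys)]
                    else:  # "2-random": evict the less recently used of two samples
                        a, b = random.sample(keys, 2)
                        del cache[a if keys.index(a) < keys.index(b) else b]
                cache[key] = None
        return hits / n_ops

    for policy in ("lru", "2-random", "random"):
        print(policy, round(hit_rate(policy), 3))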
Even just as a cheap way of approximating LRU, the k-random algorithm is quite nice.
Honestly, I think LRU doesn't get enough recognition for being provably only a fixed factor away from what an optimal algorithm could do with less memory [1]. In fact, LRU does about as well as any online algorithm can do without knowing up front which memory is about to be accessed (you can do statistical analysis, but that means other access patterns must perform worse). My take from it is that more memory beats clever algorithms.
The paper is interesting, by the way. I think it's one of the first to use potential functions to prove amortized optimality, and it actually generalizes a bit further, showing that "move-to-front" is similarly close to optimal if the cost of accessing an item increases with some convex function.
[1]: https://dl.acm.org/doi/10.1145/2786.2793
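For anyone who doesn't want to dig through the paper, the bound I remember (worth double-checking the additive term) compares LRU with a cache of size k against the optimal offline algorithm OPT with a smaller cache of size h <= k, counting faults F on any request sequence sigma:

    F_{\mathrm{LRU}}(\sigma) \;\le\; \frac{k}{k - h + 1}\, F_{\mathrm{OPT}}(\sigma) + O(1)

So giving LRU ten times the memory (k = 10h) already pins it within roughly 11% of optimal's miss count, which is where the "more memory beats clever algorithms" takeaway comes from.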
> My take from it is that more memory beats clever algorithms.
Well, isn't that to be expected from a cache?
Not really. I mean, more CPU only beats better algorithms up to a point. With caching you can guarantee 90% of the performance of the best possible algorithm by increasing the RAM tenfold. This is not the case with CPU: you can increase it however many times you want, but there's no 'master' algorithm that gets within 90% of the optimal algorithm no matter how much more CPU you throw at the problem (though it would be handy; we could finally solve the Collatz conjecture).
I mean, I do vaguely recall that there are sometimes ways to guarantee close-to-optimal asymptotic complexity by running all possible programs in parallel and seeing which one finishes first, but that's nowhere near as practical as LRU.