Comment by waynesonfire

2 years ago

It didnt make any sense to me.

I think an intuitive perspective is that if you pick one item randomly, you cant use the LRU signal. So, you pick two items. Now you get to also use LRU. Notice that it doesnt add any value to pick three random items and then apply LRU.

> Notice that it doesnt add any value to pick three random items and then apply LRU.

It actually does. The more items you pick, the better it works, because at the ultimate stage of k=N you will always be picking the best item. For a lot of real-world distributions, there are rapidly diminishing returns with higher k, but the returns are still there.

[edit]

The same example as above works for k=3, observing that discarding one of the 3 chosen uniformly randomly devolves to the k=2 form.

k=3: MAX(A,B,C)

k=2: MEAN(MAX(A,B),MAX(B,C),MAX(A,C))

k=1:

  • If you think you need a better K, because you have insight into your distribtution, then don't use uniform, instead, use something more appropriate.

    When k=N you degrade to LRU.

    Your MAX / MEAN logic is completely flawed. If k=N then you'd have MAX(0..N), wouldn't this be optimal? But, this is LRU and it's not optimal.

    All this is hand-waving. The proper response is to use the language of mathematics to prove how this algorithm behaves. Perhaps there is an optimal k.

    • I didn't realize we had moved on to talking about cache-replacement rather than putting balls in buckets. For putting balls in buckets (or any case where it is possible to compare options and come out with a "best" then k=N is optimal.

      For cache-replacement strategies, you can prove very little for the general case, since for any two reasonable replacement strategies, there is usually an access pattern that favors one over the other. When TFA says "Random and FIFO are both strictly worse than either LRU or 2-random" the context is in running SPEC CPU with an 8-way associative cache. It's not hard to come up with artificial benchmarks that invert that relationship.

      Also TFA contradicts your earlier assertion that k=3 is not better than k=2, at least when used with a pseudo-LRU: "Also, we can see that pseudo 3-random is substantially better than pseudo 2-random, which indicates that k-random is probably an improvement over 2-random for the k."