Comment by equark

13 years ago

Obviously epsilon-greedy is not optimal. But I think the point is it's surprisingly effective and probably better than A/B testing in most applications. Most alternatives get very complicated very quickly.

Even your thought experiment is not conclusive. First, it will take some time to be convinced which variation is better. The lost revenue during the 50/50 experimentation phase may outweigh the long-run benefit if your discount rate is high. This can happen in markets where customer acquisition is slow and expensive (i.e. a niche targeted via AdWords). Second, except in the unrealistic 0% vs 100% case, you could get stuck on wrong choice forever if you pass a statistical significance test by chance. Epsilon-greedy will correct itself over time and is asymptotically guaranteed to make the correct choice most of the time. A/B testing only has this asymptotic guarantee if you spend asymptotically long experimenting, which defeats the whole exercise.

Or, get complicated and vary the epsilon based on the recent change in conversions.

For example, record the conversions from the exploration path vs each individual exploitation attempt. Then, adjust epsilon accordingly, so that it is within a certain percentage of one of your best choices.