Comment by aaronjg
13 years ago
There are appropriate solutions to the multi-armed bandit problem, and a wealth of literature out there, however this is not one of those solutions.
Here's a simple thought experiment to show that this will not 'beat A/B testing every time.' Imagine you have two designs, one has a 100% conversion rate, one has a 0% conversion rate. Simple A/B testing will allow you to pick the the winning example. Whereas this solution is still picking the 0% design 10% of the time.
For some other implementations check out the following links:
For Dynamic Resampling:
http://jmlr.csail.mit.edu/papers/volume3/auer02a/auer02a.pdf
For Optimal Termination Time:
http://blog.custora.com/2012/05/a-bayesian-approach-to-ab-te...
OK, in AB testing that same 0% design, you're showing it 50% of the time.
You seem to be saying "I'll AB test it just for a little, then weed out the 0% one. but in the case of this new algorithm, I'll let it run for a long time." That's not exactly fair. Not to mention, both algorithms would allow you to clearly see the 0% option sucks.
But the only way this testing method is superior (at least as explained in the article) is that it automatically adjusts itself. If you're going in and adjusting manually, it sounds like this is — at best — precisely as reliable as A/B testing and subject to the same critique the OP levels at A/B testing.
"But the only way this testing method is superior (at least as explained in the article) is that it automatically adjusts itself."
That's actually very useful for me though. Especially if a site has a lot of tests, or I'm running tests for a multitude of clients. It means I have to babysit the tests less frequently.
It will adjust itself in the sense that while it is still enabled and testing (a suboptimal state for your product), it will let your product perform better than plain old A/B.
At some point you still step in and decide based on the data, especially if you detect degenerate cases, and move on to the next experiment.
There's no point in continuing to run A/B testing indefinitely once a winner has been chosen, and like-wise there is no point in running this indefinitely either.
Sure the OP advocates "set and forget", but it seems simply silly to do this indefinitely. Obviously, after a while, you're going to want to check in on the results and prune out low performers.
Is that not the entire point of all these different testing methodologies to begin with?
Also, if you were to follow the same "set and forget" strategy with A/B, you'd have a 50% chance of showing the 0% CR design, where-as it's only 10% with this method.
Seems like it wins out in both cases to me.
Possibly pedantic - but it would only show the loser 5% of the time. 10% of the time it picks a random design, including the winner.
Obviously epsilon-greedy is not optimal. But I think the point is it's surprisingly effective and probably better than A/B testing in most applications. Most alternatives get very complicated very quickly.
Even your thought experiment is not conclusive. First, it will take some time to be convinced which variation is better. The lost revenue during the 50/50 experimentation phase may outweigh the long-run benefit if your discount rate is high. This can happen in markets where customer acquisition is slow and expensive (i.e. a niche targeted via AdWords). Second, except in the unrealistic 0% vs 100% case, you could get stuck on wrong choice forever if you pass a statistical significance test by chance. Epsilon-greedy will correct itself over time and is asymptotically guaranteed to make the correct choice most of the time. A/B testing only has this asymptotic guarantee if you spend asymptotically long experimenting, which defeats the whole exercise.
Or, get complicated and vary the epsilon based on the recent change in conversions.
For example, record the conversions from the exploration path vs each individual exploitation attempt. Then, adjust epsilon accordingly, so that it is within a certain percentage of one of your best choices.