Comment by crazygringo
6 days ago
No, multi-armed bandit doesn't "beat" A/B testing, nor does it beat it "every time".
Statistical significance is statistical significance, end of story. If you want to show that option B is better than A, then you need to test B enough times.
It doesn't matter if you test it half the time (in the simplest A/B) or 10% of the time (as suggested in the article). If you do it 10% of the time, it's just going to take you five times longer.
And A/B testing can handle multiple options just fine, contrary to the post. The name "A/B" suggests two, but you're free to use more, and this is extremely common. It's still called "A/B testing".
Generally speaking, you want to find the best option and then remove the other ones because they're suboptimal and code cruft. The author suggests always keeping 10% exploring other options. But if you already know they're worse, that's just making your product worse for those 10% of users.
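For concreteness, the "always keep 10% exploring" scheme the article describes is epsilon-greedy. A minimal sketch (helper name and reward representation are mine, not the article's):

```python
import random

def epsilon_greedy_choice(estimated_rewards, epsilon=0.10):
    """Explore a uniformly random arm with probability epsilon;
    otherwise exploit the arm with the best current estimate."""
    if random.random() < epsilon:
        return random.randrange(len(estimated_rewards))
    return max(range(len(estimated_rewards)), key=lambda i: estimated_rewards[i])
```

With epsilon=0.10, one in ten requests is routed to a randomly chosen arm even after a winner is established, which is exactly the permanent 10% cost being objected to here.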
Multi-arm bandit does beat A/B testing in the sense that standard A/B testing does not seek to maximize reward during the testing period, MAB does. MAB also generalizes better to testing many things than A/B testing.
This is a double-edged sword. There are often cases in real-world systems where the "reward" the MAB maximizes is biased by eligibility issues, system caching, bugs, etc. If this happens, your MAB has the potential to converge on the worst possible experience for your users, something a static treatment allocation won't do.
I haven’t seen these particular shortcomings before, but I certainly agree that if your data is bad, this ML approach will also be bad.
Can you share some more details about your experiences with those particular types of failures?
No -- you can't have your cake and eat it too.
You get zero benefits from MAB over A/B if you simply end your A/B test once you've achieved statistical significance and pick the best option, which is what any efficient A/B test does -- there is no reason to have any fixed "testing period" beyond what is needed to achieve statistical significance.
To the contrary, the MAB described in the article does not maximize reward, as I explained in my previous comment. Because the post's version runs indefinitely, it has worse long-term reward: it continues to test inferior options long after they've been proven worse. If you leave it running, you're harming yourself.
And I have no idea what you mean by MAB "generalizing" more. But it doesn't matter if it's worse to begin with.
(Also, it's a huge red flag that the post doesn't even mention statistical significance.)
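To spell out the stopping rule being argued for: the usual check for a conversion-rate experiment is a two-proportion z-test. A rough sketch (function name mine; assumes a fixed-horizon test, since the 1.96 critical value is calibrated for a sample size chosen in advance):

```python
import math

def z_score(conv_a, n_a, conv_b, n_b):
    """Two-proportion z-test: how many pooled standard errors
    apart are the observed conversion rates of A and B?"""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_b - p_a) / se

# |z| > 1.96 corresponds to p < 0.05 (two-sided); at that point
# you declare a winner, ship it, and delete the losing variants.
```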
> you can't have your cake and eat it too
I disagree. There is a vast array of literature on solving the MAB problem that may as well be grouped into a bin called “how to optimally strike a balance between having one’s cake and eating it too.”
The optimization techniques to solve MAB problem seek to optimize reward by giving the right balance of exploration and exploitation. In other words, these techniques attempt to determine the optimal way to strike a balance between exploring if another option is better and exploiting the option currently predicted to be best.
There is a strong reason this literature doesn’t start and end with: “just do A/B testing, there is no better approach”
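One well-known technique from that literature is Thompson sampling, where exploration decays on its own as evidence accumulates rather than being pinned at a fixed 10%. A minimal sketch, assuming Bernoulli (convert / don't convert) rewards; names are mine:

```python
import random

def thompson_sample(successes, failures):
    """Draw one sample from each arm's Beta posterior
    (uniform Beta(1, 1) prior) and play the highest draw."""
    draws = [random.betavariate(s + 1, f + 1)
             for s, f in zip(successes, failures)]
    return max(range(len(draws)), key=lambda i: draws[i])
```

Early on, the posteriors are wide and every arm gets traffic; once one arm clearly dominates, its draws almost always win and the inferior arms are nearly never served, which is the exploration/exploitation balance described above.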
Isn't that the point of testing (to not maximize reward but rather wait and collect data)? It sounds like maximizing reward during the experiment period can bias the results
The great thing is that you can do both.