Comment by 3pt14159
13 years ago
I might not be understanding you correctly, but wouldn't the independent improvement also help the random bandit choices? If you are using the forgetting factor this shouldn't be a real issue.
My problem with the bandit method is that I want to show the same test choice to the same person every time he sees the page so you can hide that there is a test. If I do this with the bandit algo then it warps the results because different cohorts have different weightings of the choices and differing cohorts behave very differently for lots of reasons.
The independent improvement also helps the random bandit choices. The problem is that you are comparing A from a largely new population with Bs that are mostly from an old population. It takes a long time to accumulate enough new Bs to resolve this issue.
A forgetting factor will help.
This is a variant of the cohort issue that you're talking about.
The cohort issue that you're talking about raises another interesting problem. If you have a population of active users, and you want to test per user, you often will find that your test population ramps up very quickly until most active users are in, and then slows down. The window where most users are assigned is a period where you have poor data (you have not collected for long, users have not necessarily had time to go to final sale).
It seems to me that if you want to use a bandit method in this scenario, you'd be strongly advised to make your fundamental unit the impression, and not the user. But then you can't hide the fact that the test is going on. Whether or not this is acceptable is a business problem, and the answer is not always going to be yes.