Comment by timr

13 years ago

"After a few days of testing, can't you take a look at the statistics and analyze them the same way that you would for an A/B test? Then just stop testing?"

No, because the assumptions that underpin many statistical techniques are violated when you're not assigning people to cohorts consistently, and at random.

If you are using an epsilon-greedy approach (or something similar), then I believe the data collected during the exploration portion (the random draws) is still open to standard hypothesis testing, albeit with less power due to the reduced sample size. Think of it this way: you might normally run your experiment on only a subset of your traffic (population), say 20%, with the rest (80%) getting the current experience. With the e-greedy type of approach you are just swapping the 'current experience' for the maximum estimated experience, but that other 20% is still a random draw.
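A minimal sketch of that argument in Python (the arm names, the epsilon value, and the Bernoulli-reward setup are illustrative assumptions, not anything from the thread): the explore branch never consults the history, so the rows in `explore_log` form a plain random sample.

```python
import random

# Minimal epsilon-greedy sketch (illustrative; two arms, Bernoulli rewards).
# Explore draws are logged separately so they can later be analyzed like
# ordinary A/B data.

EPSILON = 0.2                      # fraction of traffic assigned at random
arms = ["A", "B"]
successes = {a: 0 for a in arms}
trials = {a: 0 for a in arms}
explore_log = []                   # (arm, reward) pairs from the random slice

def estimated_rate(arm):
    return successes[arm] / trials[arm] if trials[arm] else 0.0

def assign_user():
    """Return (arm, explored): explored marks the uniform random draws."""
    if random.random() < EPSILON:
        # Explore: uniform over arms, independent of all past observations.
        return random.choice(arms), True
    # Exploit: the arm with the highest estimated success rate so far.
    return max(arms, key=estimated_rate), False

def record(arm, reward, explored):
    trials[arm] += 1
    successes[arm] += reward
    if explored:
        explore_log.append((arm, reward))
```

Logging the explore draws separately is the key design choice here: the exploit traffic can keep shifting with the estimates without contaminating the sample you intend to test.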

  • The draws aren't independent. At any given time, the probability of assigning a user to a cohort is a function of the previous observations (in other words, it's a Markov model).

    The standard confidence tests -- t-tests, G-tests, chi-squared tests, etc. -- are based on distributions of independent, identically distributed (iid) data.

    I'd have to think about it more, but I believe that btilly's examples are also the most intuitive reasons why independence matters. If your data is time-dependent, then assigning users to cohorts based on past performance lets the time dependency dominate. There may be other good examples.

    • Is that true in the e-greedy case? Sure, during the exploit calls the draws are not independent, but during the explore portion I would assume they are, since users have been randomly assigned into the exploration pool (epsilon) and then given an arm by a uniform random draw. As far as I can see, no information from prior draws is used.
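If that reasoning holds, the explore-only rows can be fed to an ordinary test. A hedged sketch reusing the hypothetical `explore_log` from the earlier snippet: a two-proportion z-test on just the random draws, which sidesteps the dependence in the exploit traffic at the cost of the reduced sample size mentioned above (it also assumes both arms appear in the log).

```python
from math import sqrt

# Two-proportion z-test restricted to the explore draws. Valid only under
# the assumption that those draws are iid uniform assignments; assumes
# both arms appear in the log at least once.

def two_proportion_z(explore_log):
    n = {"A": 0, "B": 0}
    s = {"A": 0, "B": 0}
    for arm, reward in explore_log:
        n[arm] += 1
        s[arm] += reward
    p_a, p_b = s["A"] / n["A"], s["B"] / n["B"]
    pooled = (s["A"] + s["B"]) / (n["A"] + n["B"])
    se = sqrt(pooled * (1 - pooled) * (1 / n["A"] + 1 / n["B"]))
    return (p_a - p_b) / se        # compare against a normal critical value
```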