Comment by conductrics

13 years ago

If you are using an epsilon-greedy approach (or something similar), then I believe that the data collected during the exploration portion - (the random calls) are open, albeit with less power due to reduced sample size, to standard hypothesis testing. Think of it this way, you might normally run your experiment on a subset of your traffic (population) - so only 20%, with the rest (80%) getting the current experience. With the e-greedy type of approach you are just swapping the 'current experience' with the maximum estimated experience, but that other 20% is still a random draw.

The draws aren't independent. At any given time, the probability of assigning a user to a cohort is dependent upon a function of the previous observations (in other words, it's a markov model).

The standard confidence tests -- t-tests, G-tests, chi-squared tests, etc. -- based on distributions of independent, identically distributed (iid) data.

I'd have to think about it more, but I believe that btilly's examples are also the most intuitive reasons why independence matters. If your data is time-dependent, then assigning users to cohorts based on past performance lets the time dependency dominate. There may be other good examples.

  • Is that true in the e-greedy case? Sure, during the exploit call, they are not independent, but during the explore portion I would assume they are, since they have been randomly assigned into the exploration pool (epsilon) and then drawn from a uniform random draw. There is no information that I can see from prior draws being used.