Comment by medell
13 years ago
Nice post. One hypothetical case where it could end up serving the worse design more often is when your users behave very differently at different times of day because of time zones, e.g. Europe vs. North America.
Say you have a sports site and test a new soccer-oriented layout against an old baseball-heavy one. During the North American day, the old baseball version wins easily. When NA goes to sleep, it would keep serving baseball to the Europeans until it loses, and only after several hours would soccer become the winner. But by then it is too late: the Europeans go to sleep, and on and on. This is an odd example that assumes the two audiences are evenly balanced, and the site should really be localized, but you get the point.
This is actually pretty straightforward to overcome (and it is one of the real strengths of the Bayesian approach). Rather than using only the raw success counts for each group, add a prior belief that Americans will favor baseball and non-Americans will favor soccer (you can experiment to determine how strong to make that prior).
You can now evaluate the results conditioned on each group (American / non-American).
Or, if you don't want to add a specific prior belief about Americans vs. non-Americans, just keep one counter per option per continent-of-source-IP (or whatever) and the learning algorithm should work it out on its own. Of course, if you use too many bins, each bin sees only a small slice of the traffic and learning is going to take far too long.
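
To make both suggestions concrete, here is a minimal sketch, not the method from the post itself. It assumes a Beta-Bernoulli model with Thompson sampling as one possible "Bayesian approach", and it keeps a separate success/failure counter per (segment, option) pair. The class name SegmentedBandit, the segment labels, and the prior values are all made up for illustration.

```python
import random
from collections import defaultdict


class SegmentedBandit:
    """Hypothetical sketch: Thompson sampling with one counter pair per (segment, option)."""

    def __init__(self, options, priors=None, seed=None):
        self.options = list(options)
        self.rng = random.Random(seed)
        # priors maps (segment, option) -> (pseudo_successes, pseudo_failures).
        # Anything not listed falls back to a flat Beta(1, 1) prior.
        self.priors = priors or {}
        self.successes = defaultdict(int)
        self.failures = defaultdict(int)

    def _posterior(self, segment, option):
        # Posterior parameters = prior pseudo-counts + observed counts.
        a0, b0 = self.priors.get((segment, option), (1.0, 1.0))
        key = (segment, option)
        return a0 + self.successes[key], b0 + self.failures[key]

    def choose(self, segment):
        # Draw one plausible success rate per option and serve the best draw.
        def draw(option):
            a, b = self._posterior(segment, option)
            return self.rng.betavariate(a, b)
        return max(self.options, key=draw)

    def record(self, segment, option, clicked):
        key = (segment, option)
        if clicked:
            self.successes[key] += 1
        else:
            self.failures[key] += 1

    def mean(self, segment, option):
        # Posterior mean success rate for this option within this segment.
        a, b = self._posterior(segment, option)
        return a / (a + b)


# Assumed prior: NA visitors lean baseball, EU visitors lean soccer.
priors = {
    ("NA", "baseball"): (8, 2), ("NA", "soccer"): (2, 8),
    ("EU", "baseball"): (2, 8), ("EU", "soccer"): (8, 2),
}
bandit = SegmentedBandit(["baseball", "soccer"], priors=priors, seed=0)
layout = bandit.choose("EU")           # pick a layout for a European visitor
bandit.record("EU", layout, clicked=True)
```

Leaving out priors gives the second variant from the comment: every bin starts flat and has to learn from its own traffic, which is why the number of bins matters.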