Comment by TimJRobinson

13 years ago

I created a tool a few years ago built on a similar strategy, but instead of only showing the best-performing variation, each variation's chance of being shown was based on how well it was converting (so in an A/B/C test with conversion rates of 3%/2%/1%, version A would show 1/2 of the time, version B 1/3 of the time, and version C 1/6 of the time).
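
A rough sketch of that proportional-weighting idea in Python (the function name and data layout are my own illustration, not the original tool's code):

```python
import random

def pick_variation(stats):
    """Pick a variation with probability proportional to its observed conversion rate.

    stats maps variation name -> (conversions, visitors).
    """
    rates = {}
    for name, (conversions, visitors) in stats.items():
        # Avoid dividing by zero for a brand-new variation.
        rates[name] = conversions / visitors if visitors else 0.0

    if sum(rates.values()) == 0:
        # No conversions anywhere yet: fall back to a uniform random choice.
        return random.choice(list(stats))

    # random.choices accepts weights that don't need to sum to 1.
    names = list(rates)
    return random.choices(names, weights=[rates[n] for n in names])[0]

# The A/B/C example above: 3% / 2% / 1% conversion rates give
# selection probabilities of 1/2, 1/3 and 1/6.
stats = {"a": (30, 1000), "b": (20, 1000), "c": (10, 1000)}
print(pick_variation(stats))
```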

There was one major flaw with this strategy though:

Let's say you're testing a landing page and have had 1000 visitors, and version A is converting at 40% while version B is converting at 30%. So it looks like this:

Version A - 200 / 500 - 40%
Version B - 150 / 500 - 30%

A new affiliate comes on board and decides to send 200 visitors to your page from some "Buy 200 visitors for $1" domain redirection service. These visitors are such low quality that they will never buy anything and will probably just close the window immediately (or are bots). Because the optimizer sends most of its traffic to the current winner, nearly all of those visitors land on version A, and your results now look something like this:

Version A - 200 / 680 - 29.4%
Version B - 150 / 520 - 28.8%

And with just 200 visitors, some random affiliate has killed all your results. You could add filtering and reports based on the affiliate or traffic source, but that's more code and more attention you have to pay to the test.

If you were running a traditional A/B test your results would look like this:

Version A - 200 / 600 - 33%
Version B - 150 / 600 - 25%

And even though the overall conversion rate is lower, you can still see that version A is better than version B.
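
To make the dilution concrete, here is the arithmetic from the example as a quick script (the 180/20 split of junk traffic between A and B is an assumption about how the optimizer would route it, chosen to match the figures above):

```python
def rate(conversions, visitors):
    return conversions / visitors

# Before the junk traffic:
print(f"A {rate(200, 500):.1%}  B {rate(150, 500):.1%}")             # A 40.0%  B 30.0%

# Auto-optimizer: the current winner gets nearly all of the 200 junk
# visitors (an assumed 180/20 split):
print(f"A {rate(200, 500 + 180):.1%}  B {rate(150, 500 + 20):.1%}")  # A 29.4%  B 28.8%

# Traditional 50/50 A/B test: the junk traffic is split evenly:
print(f"A {rate(200, 500 + 100):.1%}  B {rate(150, 500 + 100):.1%}") # A 33.3%  B 25.0%
```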

The idea is good and I love the idea of auto-optimization, but it does have its flaws, which require more than 20 lines of code to overcome.

You might want to look at Boltzmann/softmax if you want to weight the probability of selection as a function of the current estimated value. One tricky bit is figuring out a good setting for the temperature parameter. Another poster alluded to softmax. In my experience it doesn't really perform better than a simple epsilon-greedy approach, but maybe it has worked well for others?
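
For anyone curious, a minimal sketch of both approaches (the temperature and epsilon values are arbitrary placeholders, which is exactly the tuning problem mentioned above):

```python
import math
import random

def softmax_pick(estimates, temperature=0.1):
    """Boltzmann/softmax selection: better-looking arms are shown more often;
    the temperature controls how greedy the selection is."""
    # Subtract the max estimate for numerical stability before exponentiating.
    m = max(estimates.values())
    weights = {name: math.exp((value - m) / temperature)
               for name, value in estimates.items()}
    names = list(weights)
    return random.choices(names, weights=[weights[n] for n in names])[0]

def epsilon_greedy_pick(estimates, epsilon=0.1):
    """Epsilon-greedy: exploit the best-looking arm most of the time,
    explore uniformly at random the rest of the time."""
    if random.random() < epsilon:
        return random.choice(list(estimates))
    return max(estimates, key=estimates.get)

estimates = {"A": 0.40, "B": 0.30}  # current estimated conversion rates
print(softmax_pick(estimates), epsilon_greedy_pick(estimates))
```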