
Comment by mattmanser

13 years ago

These also don't make much sense to me, but to avoid getting downvoted to oblivion I'll say why.

Firstly, neither you nor the author is really citing any maths or real-world papers. He does back his claim up by saying that all the advertisers use this over A/B testing, which is a fairly strong argument. But it occurs to me that for most of your points to stand, you need to tackle this particular paragraph:

Like many techniques in machine learning, the simplest strategy is hard to beat. More complicated techniques are worth considering, but they may eke out only a few hundredths of a percentage point of performance. The strategy that has been shown to win out time after time in practical problems is the epsilon-greedy method.
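For reference, a minimal sketch of the epsilon-greedy method that paragraph describes might look like this (the 10% exploration rate and the names are illustrative, not the post's actual code):

```python
import random

def choose_arm(counts, rewards, epsilon=0.1):
    """Epsilon-greedy: explore a random variant 10% of the time,
    otherwise show the variant with the best observed success rate."""
    if random.random() < epsilon:
        return random.randrange(len(counts))
    rates = [rewards[i] / counts[i] if counts[i] else 0.0
             for i in range(len(counts))]
    return max(range(len(rates)), key=lambda i: rates[i])

def record_result(counts, rewards, arm, converted):
    """Add the observed outcome to the shown variant's running totals."""
    counts[arm] += 1
    rewards[arm] += 1 if converted else 0
```

Every page view calls choose_arm and every outcome calls record_result; that really is the whole strategy.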

So, to tackle your points:

1. This only stands if you show his paragraph above to be wrong. Does epsilon-greedy work only on consistent payouts, or does it work on fluctuating payouts too? It seems to me that fluctuating payouts would be a common occurrence in website advertising. I imagine there is some research out there to settle this!

2. He addresses this directly in the post: "This won't adapt to change. (Your visitors probably don't change. But if you really want to, in the reward function, multiply the old reward value by a forgetting factor)"

3. There is no difference between this and A/B testing: the mock code he shows is supposed to go in your A/B testing framework, and the code in the controllers is supposed to be the same (and you can remove it the same way).

4. Isn't A/B testing just as bad at testing multiple factors? And why wouldn't you 'notice'? You should theoretically see the same percentages at each stage, and so would be able to spot the oddity.

5. Again this only stands if you show his paragraph above to be wrong. You are suggesting that a complicated strategy will win, which he says isn't true.

I showed that paragraph to be wrong. Not slightly, but wildly in the common scenario where, while a test is running, you make another change to your website that lifts conversion.

The difference in this case is not a few hundredths of a point of conversion. It is a question of potentially drawing the wrong conclusion, and favoring the wrong version, for 100x as long as you need to.
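To make that concrete, here is a toy simulation of the scenario (the conversion rates, 10% exploration, and horizon are made-up illustrative numbers, not figures from either post). Arm 0 starts out better and epsilon-greedy locks onto it; then a mid-test site change makes arm 1 the true winner, and we count how many trials it takes before the stale running averages cross over:

```python
import random

def simulate(rates_before, rates_after, change_at, horizon, epsilon=0.1):
    counts = [0, 0]
    rewards = [0.0, 0.0]
    for t in range(horizon):
        rates = rates_before if t < change_at else rates_after
        # Epsilon-greedy choice over the two arms.
        if random.random() < epsilon:
            arm = random.randrange(2)
        else:
            means = [rewards[i] / counts[i] if counts[i] else 0.0 for i in (0, 1)]
            arm = 0 if means[0] >= means[1] else 1
        counts[arm] += 1
        rewards[arm] += 1.0 if random.random() < rates[arm] else 0.0
        # After the change, report how long until arm 1 finally looks best.
        if t >= change_at:
            means = [rewards[i] / counts[i] if counts[i] else 0.0 for i in (0, 1)]
            if means[1] > means[0]:
                return t - change_at
    return None  # never recovered within the horizon

random.seed(1)
# Arm 0 converts at 10% vs 5%, so the algorithm locks onto it; after
# 10,000 trials a site change makes arm 1 the true winner (20% vs 12%).
lag = simulate((0.10, 0.05), (0.12, 0.20), change_at=10_000, horizon=500_000)
print("trials after the change before the better arm looks best:", lag)
```

Because the losing arm only sees about 5% of the traffic and its old data is never discounted, dragging its lifetime average above the incumbent's takes on the order of thousands of trials, far longer than a fresh test would need.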

  • What you're forgetting is that it's adaptive. That 10% random factor means it's constantly adding in new information. Also, you can graph trends over time, so if you make a significant change you could reset the historical data to zero; but even if you simply let it run, it will adapt to the change.

    If you're really concerned about rapidly changing conditions, just add a diminishing return: multiply both the success count and the total trial count by, say, 0.9999 after each test (decay first, then add the new result), so 34/2760 becomes 34.9966/2760.724 on a success or 33.9966/2760.724 on a failure.
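    In code, that decay-then-update step looks roughly like this (a sketch; the function name is mine):

    ```python
    def update_with_decay(successes, trials, converted, decay=0.9999):
        # Age the old totals so recent results gradually outweigh the
        # past, then record the new observation.
        successes = successes * decay + (1.0 if converted else 0.0)
        trials = trials * decay + 1.0
        return successes, trials

    # Reproduces the 34/2760 example above (up to float rounding):
    print(update_with_decay(34, 2760, converted=True))   # ~ (34.9966, 2760.724)
    print(update_with_decay(34, 2760, converted=False))  # ~ (33.9966, 2760.724)
    ```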

    • I am not forgetting that it is adaptive. I'm pointing out that the new information being added will cause it to mis-adapt for a surprisingly long time. Adding in a diminishing return is possible. But by what factor do we diminish? A decay of 0.9999, for instance, amounts to an effective memory of roughly 1/(1 - 0.9999) = 10,000 trials, so the choice matters. We could easily end up requiring a disturbing amount of tuning.
