Comment by Retric
13 years ago
What you're forgetting is that it's adaptive. That 10% random factor means it's constantly adding in new information. Also, you can graph trends over time, so if you make a significant change you could reset the historical data to zero, but even if you simply let it run, it will adapt to the change.
If you're really concerned about rapidly changing events, just add a diminishing return, i.e. multiply both the success count and the trial count by, say, 0.9999 after each test before adding the new result. So 34/2760 becomes 34.9966/2760.724 on a success, or 33.9966/2760.724 on a failure.
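A minimal sketch of the scheme described above (10% random exploration plus a per-test decay), assuming Python; the bucket names, counts, and constant names are illustrative, not from the original:

    import random

    EPSILON = 0.1    # the 10% random factor from the comment above
    DECAY = 0.9999   # assumed per-test diminishing-return factor

    # Per-bucket [successes, trials], stored as floats so they can decay.
    buckets = {"A": [34.0, 2760.0], "B": [12.0, 980.0]}  # illustrative numbers

    def choose_bucket():
        # Exploit the best-looking bucket 90% of the time, explore 10%.
        if random.random() < EPSILON:
            return random.choice(list(buckets))
        return max(buckets, key=lambda b: buckets[b][0] / buckets[b][1])

    def record_result(bucket, success):
        # Decay the tested bucket's counts, then add the new observation:
        # 34/2760 becomes 34.9966/2760.724 on a success,
        # or 33.9966/2760.724 on a failure.
        s, n = buckets[bucket]
        buckets[bucket] = [s * DECAY + (1.0 if success else 0.0), n * DECAY + 1.0]

One nice property of the decay: the trial count can never grow past 1 / (1 - 0.9999) = 10,000, so the amount of remembered evidence is bounded and old results fade out on their own.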
I am not forgetting that it is adaptive. I'm pointing out that the new information that is added will cause it to mis-adapt for a surprisingly long time. Adding a diminishing return is possible, but by what factor do we diminish? We could easily end up requiring a disturbing amount of per-case customization.
The time it takes to adapt is directly related to the magnitude of the difference: if it takes six months to go from a 1.006% efficient strategy to a 1.007% efficient one, that's not very important. The goal is to find significant wins quickly, and any strategy that focuses on micro-optimization will tend to find local maxima, not global ones. If the top two strategies are close enough, this greedy algorithm will tend to bounce between them, and that's OK.
As to the diminishing factor, you diminish both the numerator and the denominator for a bucket every time you test that bucket. If you want something next to perfect, try http://en.wikipedia.org/wiki/Bayesian_statistics, but that eats a lot of CPU and is harder to code, for minimal gain.
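For the Bayesian route that link gestures at, one standard bandit formulation is Thompson sampling over Beta posteriors. A hedged sketch, assuming the same successes/failures bookkeeping as above; this is one common Bayesian approach, not necessarily the one the commenter had in mind:

    import random

    # [successes, failures] per bucket, with an implicit uniform Beta(1, 1) prior.
    buckets = {"A": [34, 2726], "B": [12, 968]}  # illustrative numbers

    def thompson_choose():
        # Draw a plausible conversion rate from each bucket's Beta posterior
        # and play the best draw; exploration comes from posterior uncertainty
        # rather than a fixed 10% random factor.
        draws = {b: random.betavariate(s + 1, f + 1) for b, (s, f) in buckets.items()}
        return max(draws, key=draws.get)

    def record(bucket, success):
        buckets[bucket][0 if success else 1] += 1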