Comment by heyitsnick
13 years ago
Maybe I'm missing it (it's late), but nowhere in the article does it explain why 10% of the time it picks a choice at random ("explores"). In fact, the article basically argues why it's not needed (it self-rights if the wrong choice becomes temporarily dominant). It also doesn't explain why specifically it should be a 10% randomization.
A random choice is needed to allow people to give rewards to options other than the dominant one. I'm sure this doesn't have to be random -- and I'd be curious to see the logic behind the 10% choice -- but you have to have something that gives the other options a chance.
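For reference, a minimal sketch of the selection rule being discussed, assuming each option just tracks a success count and a trial count. The function name `choose` and its parameters are made up for illustration; they are not from the article.

```python
import random

def choose(successes, trials, epsilon=0.1, rng=random):
    """Epsilon-greedy pick: explore with probability epsilon, otherwise exploit."""
    if rng.random() < epsilon:
        # Explore: any option, uniformly at random, so non-dominant options
        # still get shown and can collect rewards.
        return rng.randrange(len(trials))
    # Exploit: the option with the best observed success ratio
    # (untried options are treated as 0.0 here, which is an assumption).
    rates = [s / t if t else 0.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=rates.__getitem__)
```

With `epsilon=0.0` this always returns the current leader, which is exactly the case where the other options never get a chance.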
Makes me wonder if the 10% number couldn't be changed to something that's a function of the total number of rewards; the longer it runs, the less variation there is and the more confident you are in the choice made.
That would be the epsilon-decreasing strategy or VDBE:
http://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform...
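A rough sketch of the epsilon-decreasing idea, assuming a simple 1/n-style schedule. The name `decayed_epsilon` and the `start` and `scale` values are hypothetical, and this is not VDBE, which adapts epsilon from changes in the value estimates instead.

```python
def decayed_epsilon(total_rewards, start=0.1, scale=100.0):
    """Exploration rate that shrinks as more rewards have been observed.

    `start` is the initial rate (10% here) and `scale` controls how fast it
    decays; both are illustrative values, not tuned recommendations.
    """
    return start / (1.0 + total_rewards / scale)
```

Under these assumptions the rate starts at 10%, halves after 100 rewards, and keeps falling toward zero without ever reaching it, so no option is shut out permanently.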
Like many things in machine learning, this is a simplified version of Metropolis-Hastings.
The problem is that, depending on your initial settings and early results, certain settings could get such low payoff estimates that they're never tried at all (let's say one gets to 50% after one trial, and the rest all stay above 75%). You want to make sure that your solver adequately explores all choices.
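To make the lock-out concrete, here is a small simulation sketch. All the numbers are invented: option 0 is assumed to be the better one (90% true rate) but starts with an unlucky 1-of-2 record, while option 1 (80% true rate) starts at 80 of 100.

```python
import random

def run(epsilon, rounds=1000, seed=0):
    """Return how often each option was shown under epsilon-greedy selection."""
    rng = random.Random(seed)
    true_rates = [0.9, 0.8]   # hypothetical: option 0 is actually better
    successes = [1, 80]       # option 0 had an unlucky start: 1 of 2 (50%)
    trials = [2, 100]         # option 1 looks solid: 80 of 100 (80%)
    pulls = [0, 0]
    for _ in range(rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(2)  # explore
        else:
            # exploit: highest observed success ratio
            arm = max(range(2), key=lambda i: successes[i] / trials[i])
        pulls[arm] += 1
        trials[arm] += 1
        successes[arm] += int(rng.random() < true_rates[arm])
    return pulls
```

With `run(0.0)` the estimate for option 0 never gets updated because it is never shown, so it stays stuck at 50%; with `run(0.1)` it keeps getting occasional trials and has a chance to recover.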
Presumably you would want the randomness to give choices that got to 0% another chance. It seems like a better way could be to use the previous success ratios as a probability distribution over the choices.
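One way that idea could look, as a sketch only. Note that raw ratios would give an option at 0% a weight of zero, so this version adds a hypothetical `prior` pseudo-count to keep it selectable; that smoothing is an assumption added here, not part of the suggestion above.

```python
import random

def pick_proportional(successes, trials, rng=random, prior=1.0):
    """Pick an option with probability proportional to its smoothed success ratio."""
    # (s + prior) / (t + 2 * prior) acts like one imaginary success and one
    # imaginary failure per option, so a 0% option keeps a small nonzero weight.
    weights = [(s + prior) / (t + 2 * prior) for s, t in zip(successes, trials)]
    return rng.choices(range(len(weights)), weights=weights, k=1)[0]
```

With `prior=0.0` this reduces to the raw success ratios (and needs at least one trial per option to avoid dividing by zero), and an option sitting at 0% is then never picked again.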