Comment by daeken
13 years ago
A random choice is needed so that options other than the dominant one still get a chance to earn rewards. I'm sure it doesn't have to be random -- and I'd be curious to see the logic behind the 10% figure -- but you need something that gives the other options a chance.
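To make that concrete, here is a rough sketch of the selection step with a fixed 10% exploration rate. The function name, the `successes`/`trials` lists, and treating untried options as a 0.0 rate are my own assumptions for illustration, not anything taken from the article.

```python
import random

def choose_option(successes, trials, epsilon=0.1, rng=random):
    """Return an option index: explore with probability epsilon, else exploit.

    successes[i] / trials[i] is the observed reward rate of option i
    (hypothetical bookkeeping, assumed for this sketch).
    """
    if rng.random() < epsilon:
        # Explore: pick uniformly, so non-dominant options keep getting tried.
        return rng.randrange(len(trials))
    # Exploit: pick the best observed rate; untried options count as 0.0.
    rates = [s / t if t else 0.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=rates.__getitem__)
```

Without the explore branch, an option that got unlucky early would never be shown again, so its estimate could never recover.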
Makes me wonder if the 10% number couldn't be replaced with a function of the total number of rewards: the longer it runs, the less variation there is and the more confident you are in the choice made, so the less you need to explore.
That would be the epsilon-decreasing strategy or VDBE:
http://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform...
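A minimal sketch of the epsilon-decreasing idea only (not VDBE), assuming a simple hyperbolic decay. The schedule, the `half_life` parameter, and the function name are illustrative choices of mine; other decay curves would fit the same description.

```python
def decayed_epsilon(total_trials, epsilon0=0.1, half_life=100):
    """Exploration rate that starts at epsilon0 and shrinks as trials accumulate.

    half_life is a hypothetical tuning knob: after that many trials the
    rate has dropped to half of epsilon0.
    """
    return epsilon0 * half_life / (half_life + total_trials)

# Usage: recompute the rate before each choice instead of hard-coding 10%.
for n in (0, 100, 1000, 10000):
    print(n, decayed_epsilon(n))
```

The rate never reaches exactly zero, so every option keeps a small chance of being retried however long the test runs.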
Like many things in machine learning, this is a simplified version of Metropolis-Hastings.