Comment by daeken

13 years ago

A random choice is needed so that people get a chance to reward options other than the dominant one. I'm sure this doesn't have to be random -- and I'd be curious to see the logic behind the 10% choice -- but you have to have something that gives the other options a chance.
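For anyone who wants to see the mechanics, here is a rough sketch of the kind of selection rule being discussed: show a random option 10% of the time, otherwise show the one with the best observed reward rate. The names (`choose_option`, `successes`, `trials`) are made up for illustration, and treating untried options as having a rate of 1.0 is my own assumption, not something from the article.

```python
import random

def choose_option(successes, trials, epsilon=0.10, rng=random):
    """Return the index of the option to show next (epsilon-greedy sketch).

    successes[i] / trials[i] is the observed reward rate of option i.
    All names here are hypothetical, chosen for this example only.
    """
    # Explore: with probability epsilon, pick any option uniformly at random.
    if rng.random() < epsilon:
        return rng.randrange(len(trials))
    # Exploit: pick the option with the highest observed rate.
    # Assumption: untried options count as rate 1.0 so each gets tried once.
    rates = [s / t if t else 1.0 for s, t in zip(successes, trials)]
    return max(range(len(rates)), key=rates.__getitem__)
```

With `epsilon=0.10` and three options, the two non-dominant options together get roughly 6-7% of the impressions, which is the "chance" the random branch exists to provide.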

Makes me wonder if the 10% number couldn't be changed to something that's a function of the total number of rewards; the longer it runs, the less variation there is and the more confident you are in the choice made.
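One way this could look, purely as a sketch: start at 10% and shrink the exploration rate as rewards accumulate. The function name, the hyperbolic decay shape, and the `scale` of 1000 are all assumptions picked for illustration; counting total trials instead of total rewards would work the same way.

```python
def decayed_epsilon(total_rewards, start=0.10, scale=1000):
    """Exploration rate that begins at `start` and falls as data accumulates.

    Hypothetical schedule: epsilon halves once `scale` rewards have been
    recorded, drops to a third after 2 * scale, and so on.
    """
    return start / (1.0 + total_rewards / scale)

# Example values under these assumed parameters:
# decayed_epsilon(0)    -> 0.10
# decayed_epsilon(1000) -> 0.05
```

The result would be used in place of the fixed 10% whenever the next option is chosen, so early traffic explores a lot and later traffic mostly goes to the leader.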

2 comments

nerdo 13 years ago

That would be the epsilon-decreasing strategy, or VDBE:

http://en.wikipedia.org/wiki/Multi-armed_bandit#Semi-uniform...

jasonwatkinspdx 13 years ago

Like many things in machine learning, this is a simplified version of Metropolis-Hastings.