Comment by munro
18 days ago
Here's an interesting write-up comparing various bandit algorithms and different epsilon-greedy exploration percentages.
https://github.com/raffg/multi_armed_bandit
It shows 10% exploration performing the best; it's a simple algorithm that works surprisingly well.
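(For anyone unfamiliar, a minimal epsilon-greedy sketch to illustrate the idea; the variable names and the incremental-mean update are mine, not taken from the repo:)

    import random

    def select_arm(estimates, epsilon=0.1):
        # explore: with probability epsilon, try a random arm
        if random.random() < epsilon:
            return random.randrange(len(estimates))
        # exploit: otherwise pull the arm with the best estimate so far
        return max(range(len(estimates)), key=estimates.__getitem__)

    def update(estimates, counts, arm, reward):
        # incremental mean: new_avg = old_avg + (reward - old_avg) / n
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]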
It also shows the Thompson Sampling algorithm converging a bit faster: the best arm is chosen by sampling from each arm's beta distribution, which eliminates the separate explore phase entirely. And you can use the built-in random.betavariate!
https://github.com/raffg/multi_armed_bandit/blob/42b7377541c...
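(Roughly, the idea looks like this; a sketch assuming Bernoulli rewards and a Beta(1,1) prior, not the exact code from that file:)

    import random

    def thompson_select(successes, failures):
        # sample a plausible win rate for each arm from its Beta posterior,
        # then greedily pick the arm whose sample came out highest
        samples = [random.betavariate(s + 1, f + 1)
                   for s, f in zip(successes, failures)]
        return samples.index(max(samples))

    def thompson_update(successes, failures, arm, reward):
        # Bernoulli reward: 1 counts as a success, 0 as a failure
        if reward:
            successes[arm] += 1
        else:
            failures[arm] += 1

Arms that have paid off get tighter, higher-centered posteriors and get picked more often, while uncertain arms still occasionally win the sampling draw, so exploration falls out of the posterior uncertainty for free.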