Comment by nphrk
13 years ago
There are better approaches for tackling this problem (with 0-regret asymptotically). You can take a look at the UCB (Upper Confidence Bound) algorithm, and you can do even more if you assume some continuity, e.g. what is commonly done is to assume that the whole distribution is from a Gaussian Processes. Many interesting ideas in the literature indeed :)
No comments yet
Contribute on Hacker News ↗