Comment by conductrics
13 years ago
Of course, if the environment is truly stationary, then the easiest hack for exploration is to seed the initial value for each option (A/B/.../Z) with an optimistic guess (something you know is higher than the true value). Then just make decisions based on the current best estimate. The estimates will be driven down over time toward their true values. I'm not claiming you should do this, or that it's optimal or anything, but keep it in mind as a quick hack to solve the problem.
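For what it's worth, here is a rough sketch of that idea in Python, not a reference implementation. Everything in it is an assumption made for illustration: the function name `optimistic_greedy`, the 0/1 (conversion-style) rewards, the made-up conversion rates, and the choice to count the optimistic guess as one pseudo-observation. The starting value of 2.0 is picked only because it is above any reward the simulation can produce.

```python
import random


def optimistic_greedy(true_rates, optimistic_value=2.0, steps=5000, seed=0):
    """Purely greedy selection with optimistic starting estimates.

    true_rates are hypothetical conversion rates used only to simulate
    0/1 rewards; a real system would observe rewards instead.
    """
    rng = random.Random(seed)
    n = len(true_rates)
    # Seed every option with a guess above any achievable reward.
    estimates = [optimistic_value] * n
    # Assumption: the guess counts as one pseudo-observation per option.
    counts = [1] * n
    pulls = [0] * n

    for _ in range(steps):
        # No explicit exploration: always take the current best estimate.
        arm = max(range(n), key=lambda i: estimates[i])
        reward = 1.0 if rng.random() < true_rates[arm] else 0.0
        counts[arm] += 1
        # Running mean; real rewards pull the optimistic estimate down.
        estimates[arm] += (reward - estimates[arm]) / counts[arm]
        pulls[arm] += 1

    return estimates, pulls


if __name__ == "__main__":
    estimates, pulls = optimistic_greedy([0.05, 0.10, 0.20])
    for i, (est, cnt) in enumerate(zip(estimates, pulls)):
        print(f"option {i}: estimate={est:.3f} pulls={cnt}")
```

Because every untried option still sits at the optimistic value, each one gets tried at least once before any option is repeated. After that the selection is purely greedy, so it can still settle on a worse option after an unlucky run, which fits the "quick hack, not optimal" caveat.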