Comment by sravfeyn

13 years ago

I can draw many parallels between this and genetic algorithms. The next choice (in a GA, the children) is picked probabilistically, with the highest probability going to the most profitable (in a GA, the fittest) choices. The solutions evolve, and the most profitable solutions (the fittest in a GA) remain.

How is this different from Genetic Algorithms?

A GA is a zeroth-order optimization method. A bandit is a type of decision problem: it is a single-state RL problem where one is trying to make decisions in an environment in order to minimize regret. A GA is a general optimization approach for when there is no gradient or second-order information about the problem to use. Take a look at XCS classifiers for an approach that can solve bandit-type problems but uses GAs to estimate the mappings between features and rewards.
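
To make the "single-state decision problem, minimize regret" framing concrete, here is a rough sketch of an epsilon-greedy bandit. It is only an illustration, not the method from the article: the function name `run_epsilon_greedy`, the Bernoulli arm probabilities, and the `epsilon` and `steps` values are all made up for the example. Note there is no population, crossover, or mutation; the agent just keeps a running reward estimate per arm and pays regret for every pull that is not the best arm.

```python
import random


def run_epsilon_greedy(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Play a Bernoulli bandit with epsilon-greedy; arm_probs are hypothetical."""
    rng = random.Random(seed)
    n_arms = len(arm_probs)
    counts = [0] * n_arms    # number of pulls per arm
    values = [0.0] * n_arms  # running mean reward per arm
    best = max(arm_probs)    # known only to the simulator, used to score regret
    regret = 0.0

    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)  # explore: pick a random arm
        else:
            arm = max(range(n_arms), key=lambda a: values[a])  # exploit

        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the mean reward for the chosen arm.
        values[arm] += (reward - values[arm]) / counts[arm]
        # Expected regret: gap between the best arm and the arm actually pulled.
        regret += best - arm_probs[arm]

    return counts, values, regret


if __name__ == "__main__":
    counts, values, regret = run_epsilon_greedy([0.2, 0.5, 0.7])
    print("pulls per arm:", counts)
    print("estimated means:", [round(v, 3) for v in values])
    print("cumulative expected regret:", round(regret, 1))
```

A GA could be used to tune something like `epsilon` here, which is the sense in which the two are different kinds of thing: one is the problem, the other is an optimizer you might point at it.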