Comment by nonameiguess

3 days ago

RL is a technique for finding an optimal policy for Markov decision processes. If you can define state spaces and action spaces for a sequential decision problem with uncertain outcomes, then reinforcement learning is typically a pretty good way of finding a function mapping states to actions, provided the problem isn't small enough that an exact solution (e.g. via dynamic programming) already exists.

I don't really see why you would want to use it for binary classification or continuous predictive modeling. RL excels in game play and operational control precisely because you need to make decisions now that constrain possible decisions in the future, yet you cannot know the outcome until that future arrives, and even once you learn the outcome you cannot cleanly attribute causality to it. This isn't "hot dog/not a hot dog," where there is generally an unambiguously correct answer and the classification itself is directly either correct or incorrect. In RL, a decision made early in a game probably contributes causally to a particular outcome somewhere down the line, but the exact extent of any single action's contribution is unknown, and probably unknowable in many cases.
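The credit-assignment problem above can be made concrete with a tiny sketch (the episode and discount factor are assumptions for illustration): with a sparse terminal reward, a simple Monte Carlo return assigns every earlier action a geometrically discounted share of the final outcome, because nothing in the data isolates any single action's causal contribution.

```python
# One episode with a sparse reward: nothing until a win at the end.
gamma = 0.9
rewards = [0.0, 0.0, 0.0, 1.0]

# Discounted return G_t = sum_{k >= t} gamma^(k - t) * r_k, computed backwards.
returns = []
G = 0.0
for r in reversed(rewards):
    G = r + gamma * G
    returns.append(G)
returns.reverse()

# returns == [0.729, 0.81, 0.9, 1.0]: the opening move is credited with
# gamma^3 of the win purely by temporal position, not by measured causality.
```

Contrast this with supervised classification, where each prediction is scored immediately and independently against its label; there is no temporal chain of credit to untangle.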