Comment by cbarrick
1 day ago
You should read more about Markov Decision Processes (MDPs) and Game Theory in general.
The word "solve" is well defined in this context, and is used accurately by OP. You "solve" a game by finding the policy that maximizes the expected return.
Obviously we can only talk about the "expected" return because the outcomes are random. But that randomness doesn't mean an optimal policy doesn't exist.
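To make that concrete, here's a minimal sketch of "solving" a tiny MDP exactly with value iteration. The MDP itself (the states, actions, transitions, and rewards) is invented for illustration:

```python
# transitions[state][action] -> list of (probability, next_state, reward)
# This toy MDP is made up purely for illustration.
transitions = {
    "s0": {
        "a": [(0.8, "s1", 1.0), (0.2, "s0", 0.0)],
        "b": [(1.0, "s0", 0.1)],
    },
    "s1": {
        "a": [(1.0, "s1", 0.5)],
        "b": [(0.5, "s0", 2.0), (0.5, "s1", 0.0)],
    },
}
gamma = 0.9  # discount factor

# Value iteration: repeatedly back up the best expected return per state.
values = {s: 0.0 for s in transitions}
for _ in range(1000):
    values = {
        s: max(
            sum(p * (r + gamma * values[s2]) for p, s2, r in outcomes)
            for outcomes in transitions[s].values()
        )
        for s in transitions
    }

# The optimal policy greedily picks the action with the best expected return.
policy = {
    s: max(
        transitions[s],
        key=lambda a: sum(p * (r + gamma * values[s2]) for p, s2, r in transitions[s][a]),
    )
    for s in transitions
}
print(values, policy)
```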
Optimal policies are also often stochastic, expressed like "in state S, perform action A with probability P and action B with probability 1-P". A policy then boils down to a table, with a row for each state and a column for each action, where each cell holds the probability of performing that action in that state.
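In code, such a policy table is just a nested mapping from states to action probabilities. A minimal sketch, with hypothetical states and actions:

```python
import random

# policy[state][action] = probability of taking that action in that state.
# The states ("S", "T") and actions ("A", "B") are hypothetical.
policy = {
    "S": {"A": 0.7, "B": 0.3},
    "T": {"A": 0.1, "B": 0.9},
}

def act(state):
    """Sample an action according to the policy's row for this state."""
    actions, probs = zip(*policy[state].items())
    return random.choices(actions, weights=probs, k=1)[0]

print(act("S"))  # "A" about 70% of the time
```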
Even more interesting are partially observable Markov decision processes (POMDPs), where your agent doesn't even know what state it is in. Instead, you get observations that hint at the true state, and you model the state as a probability distribution over the possible concrete states. Solving these POMDPs is quite a bit harder than solving traditional MDPs.
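For intuition, here's a minimal sketch of the observation half of a POMDP belief update. It's plain Bayes' rule, it skips the transition step for brevity, and all the numbers are invented:

```python
# belief[s] = current probability that the true (hidden) state is s.
belief = {"healthy": 0.5, "faulty": 0.5}

# Observation model: P(observation | state), invented for illustration.
obs_model = {
    "healthy": {"ok": 0.9, "alarm": 0.1},
    "faulty":  {"ok": 0.3, "alarm": 0.7},
}

def update(belief, observation):
    """Bayes' rule: new belief(s) is proportional to P(observation | s) * belief(s)."""
    unnormalized = {s: obs_model[s][observation] * p for s, p in belief.items()}
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

belief = update(belief, "alarm")
print(belief)  # mass shifts toward "faulty": {'healthy': 0.125, 'faulty': 0.875}
```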
It is possible to solve some MDPs (and POMDPs) by hand, but in practice we often use reinforcement learning to learn the policy table by simulating games.
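As a rough sketch of what "learning the policy table by simulating games" looks like, here's tabular Q-learning on a made-up two-state MDP; the dynamics and hyperparameters are invented for illustration:

```python
import random

states, actions = ["s0", "s1"], ["a", "b"]
alpha, gamma, epsilon = 0.1, 0.9, 0.1  # learning rate, discount, exploration rate

def step(state, action):
    """Hypothetical simulator: returns (next_state, reward)."""
    if state == "s0" and action == "a":
        return ("s1", 1.0) if random.random() < 0.8 else ("s0", 0.0)
    if state == "s1" and action == "b":
        return ("s0", 2.0) if random.random() < 0.5 else ("s1", 0.0)
    return state, 0.1

Q = {(s, a): 0.0 for s in states for a in actions}

state = "s0"
for _ in range(50_000):
    # Epsilon-greedy: mostly exploit the current Q table, sometimes explore.
    if random.random() < epsilon:
        action = random.choice(actions)
    else:
        action = max(actions, key=lambda a: Q[(state, a)])
    next_state, reward = step(state, action)
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += alpha * (reward + gamma * best_next - Q[(state, action)])
    state = next_state

# Read the learned (greedy, deterministic) policy out of the Q table.
policy = {s: max(actions, key=lambda a: Q[(s, a)]) for s in states}
print(policy)
```

Note that plain Q-learning recovers a deterministic greedy policy; learning the stochastic policies mentioned above calls for other methods, like policy gradients.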