Comment by egl2020
3 days ago
Three considerations that come into play in deciding about using RL: 1) how informative is the loss on each example, 2) can you see how to adjust the model based on the loss signal, and 3) how complex is the feature space?
For the house value problem, you can quantify how far the prediction is from the true value, there are lots of regression models with proven methods of adjusting the model parameters (e.g. gradient descent), and the feature space comprises mostly monotone, weakly interacting features like quality of neighborhood schools and square footage. It's a "traditional" problem and can be solved as well as possible by the traditional methods we know and love. RL is unnecessary, might require more data than you have, and might produce an inferior result.
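To make the "traditional" case concrete, here is a minimal sketch of gradient descent on a one-feature linear model. The data is made up for illustration (size in thousands of square feet, price in units of $100k), and `fit_linear` is a hypothetical helper, not a library function. The point is only that every example yields a signed error that says which way to move the parameters.

```python
import random

def fit_linear(xs, ys, lr=0.1, steps=500):
    """Fit y ~ w*x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    losses = []
    for _ in range(steps):
        # Signed per-example error: how far off, and in which direction.
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        losses.append(sum(e * e for e in errors) / n)
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2.0 * sum(errors) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b, losses

# Synthetic, made-up data: price = 1.5 * size + 0.5 plus a little noise.
rng = random.Random(0)
sizes = [rng.uniform(0.5, 3.0) for _ in range(200)]
prices = [1.5 * s + 0.5 + rng.gauss(0.0, 0.1) for s in sizes]

w, b, losses = fit_linear(sizes, prices)
print(f"w={w:.2f} b={b:.2f} first_loss={losses[0]:.3f} last_loss={losses[-1]:.3f}")
```

The fitted `w` and `b` should land close to the values used to generate the data, and the loss should fall steadily, which is what an informative per-example loss buys you.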
In contrast, for a sequential decision problem like playing Go, the binary win/loss signal doesn't tell us much about how well or poorly the game was played, it's not clear how to improve the strategy, and there are a large number of moves at each turn with no evident ranking. In this setting RL is a difficult but possible approach.
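A rough sketch of why the win/loss signal is so uninformative: if the only reward arrives at the end of the game, the Monte Carlo return assigned to each move is just the final outcome, optionally discounted. `returns_from_outcome` is a hypothetical helper written for this illustration, and the +1/-1 reward encoding is an assumption.

```python
def returns_from_outcome(num_moves, won, gamma=1.0):
    """Per-move Monte Carlo returns when the only reward is the final result.

    Assumes a reward of +1 for a win and -1 for a loss, given after the
    last move, and zero reward everywhere else.
    """
    outcome = 1.0 if won else -1.0
    # Move t is (num_moves - 1 - t) steps away from the terminal reward.
    return [outcome * gamma ** (num_moves - 1 - t) for t in range(num_moves)]

print(returns_from_outcome(3, won=True))              # [1.0, 1.0, 1.0]
print(returns_from_outcome(3, won=False))             # [-1.0, -1.0, -1.0]
print(returns_from_outcome(3, won=True, gamma=0.5))   # [0.25, 0.5, 1.0]
```

Without discounting, a brilliant move and a blunder in the same game get identical credit. Separating them takes many games and some form of value estimation, which is where the difficulty comes from.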