Comment by hodgehog11
16 hours ago
This argument, that LLMs can develop wild new strategies using RLVR on math problems (as happened with chess), turns out to be false without a serious paradigm shift. Essentially, the search space is far too large, and the model will need help to explore it effectively, probably via human feedback.
The search space for the game of Go was also thought to be too large for computers to manage.
It still is [1].
[1] https://www.vice.com/en/article/a-human-amateur-beat-a-top-g...
The blind-spot-exploiting strategy you link to was found by an adversarial ML model...
Yes, and making a horse-drawn cart drive itself was thought to be impossible, so why don't we have faster-than-light travel yet...
Yes, but "the search space is too large" is something that has been said about innumerable AI problems that were then solved. So it's not unreasonable to doubt the merit of the statement when it's made for the umpteenth time.
I agree that LLMs are a bad fit for mathematical reasoning, but it's very hard for me to buy that humans are a better fit than a computational approach. Search will always beat our intuition.
Yes and no. I think we have vastly underestimated the extent of the search space for math problems. I also think we underestimate the degree to which our worldview influences the directions in which we attempt proofs. Problems are derived from constructions that we can relate to, often physically. Consequently, the technique in the solution often involves a construction that is similarly physical in its form. I think measure theory is a prime example of this: it effectively unlocked solutions to a lot of long-standing statistical problems.