Comment by cxvrfr
1 day ago
Well, being able to extrapolate solutions to "novel" mathematical exercises from a very large sample of similar tasks in your dataset seems like a reasonable explanation.
The question is how well it would do if it were trained without those samples.
Gee, I don't know. How would you do at a math competition if you weren't trained with math books? Sample problems and solutions are not sufficient unless you can genuinely apply human-level inductive and deductive reasoning to them. If you don't understand that and agree with it, I don't see a way forward here.
A more interesting question is, how would you do at a math competition if you were taught to read, then left alone in your room with a bunch of math books? You wouldn't get very far at a competition like IMO, calculator or no calculator, unless you happen to be some kind of prodigy at the level of von Neumann or Ramanujan.
> A more interesting question is, how would you do at a math competition if you were taught to read, then left alone in your room with a bunch of math books?
But that isn't how an LLM learnt to solve math olympiad problems. This isn't a base model just trained on a bunch of math books.
The way they get LLMs to be good at specialized things like math olympiad problems is to custom-train them for this using reinforcement learning: they give the LLM lots of examples of similar math problems being solved, showing all the individual solution steps, train on these, and reward the model when, by selecting an appropriate sequence of solution steps, it manages to solve a problem correctly itself.
So, it's not a matter of the LLM reading a bunch of math books and then becoming an expert at math reasoning and problem solving; it's more along the lines of "monkey see, monkey do". The LLM was explicitly shown how to solve these problems step by step, then trained extensively until it got it and was able to do it itself. It's probably a reflection of the self-contained and logical nature of math that this works - that the LLM can be trained on one group of problems and the generalizations it has learnt carry over to unseen problems.
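To make that concrete, here's a very rough Python sketch of what an outcome-rewarded training loop of that sort might look like. To be clear, this is just an illustration: the `model` object, its `sample_solution`/`policy_update` methods, and the `ANSWER:` convention are all made up for the example, and real pipelines use PPO/GRPO-style policy-gradient updates with far more machinery.

```python
import random

def extract_final_answer(solution_text: str) -> str:
    # Toy parser: assumes the model ends its solution with a line "ANSWER: <value>".
    for line in reversed(solution_text.strip().splitlines()):
        if line.startswith("ANSWER:"):
            return line[len("ANSWER:"):].strip()
    return ""

def reward(solution_text: str, reference_answer: str) -> float:
    # Verifiable outcome reward: 1.0 only if the final answer matches the known answer.
    return 1.0 if extract_final_answer(solution_text) == reference_answer else 0.0

def rl_finetune(model, problems, steps=1000, samples_per_problem=8):
    # problems: list of {"question": str, "answer": str}
    for _ in range(steps):
        problem = random.choice(problems)
        # Sample several step-by-step solution attempts from the current model.
        attempts = [model.sample_solution(problem["question"])
                    for _ in range(samples_per_problem)]
        rewards = [reward(a, problem["answer"]) for a in attempts]
        # Reinforce the attempts that reached the correct answer; in practice this
        # is a PPO/GRPO-style policy-gradient update, abstracted away here.
        model.policy_update(attempts, rewards)
```

The key point the sketch shows is that the reward hinges on a verifiable final answer - exactly the property math has and most other domains lack.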
The dream is to be able to teach LLMs to reason more generally, but the reasons this works for math don't generally apply, so it's not clear that this math success can be used to predict future LLM advances in general reasoning.
> The dream is to be able to teach LLMs to reason more generally, but the reasons this works for math don't generally apply
Why is that? Any suggestions for further reading that justifies this point?
Ultimately, reinforcement learning is still just a matter of shoveling in more text. Would RL work on humans? Why or why not? How similar is it to what kids are exposed to in school?