Comment by alach11
4 days ago
> it will likely generalize to all kinds of reasoning problems, not just mathematical proofs
Big if true. Setting up an RL loop for training on math problems seems significantly easier than for many other reasoning domains. It's much easier to verify the correctness of a proof than of a short story (what would "correct" even mean there?).