Comment by blazespin
21 hours ago
Verifying math requires something like Lean, which is a huge bottleneck, as the paper explains.
Plus, there isn't a lot of training data in Lean.
Most gains come from training on stuff that's already out there, not from the RLVR part, which just amps it up a bit.
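
For anyone who hasn't seen what "verifying with Lean" looks like in practice, here is a tiny sketch in Lean 4. The theorem name `add_comm_example` is made up for illustration; `Nat.add_comm` is the existing core library lemma. The claim has to be stated formally and the proof term has to type-check before it counts as verified, and getting informal math into that form is roughly where the bottleneck shows up.

```lean
-- Illustrative only: a formal statement plus a proof the Lean 4 kernel can check.
-- `add_comm_example` is a hypothetical name; `Nat.add_comm` is from Lean core.
theorem add_comm_example (a b : Nat) : a + b = b + a :=
  Nat.add_comm a b
```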