Comment by cubefox
9 hours ago
I assume they use self-verification only during RL training to provide the reward signal, but not for benchmarks.