Comment by mrtesthah
17 hours ago
>"is the RLHF judge happy with the answer."
Reinforcement Learning with Verifiable Rewards (RLVR), used to improve math and coding success rates, seems like an exception.