Comment by Davidzheng
6 days ago
RLHF means Reinforcement Learning from Human Feedback. The setups that score answers as right or wrong are called either plain RL or RLVR (RL with Verifiable Rewards).
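To make the distinction concrete, here is a rough sketch of the two kinds of reward signal. The function names and the lookup table are hypothetical, made up for illustration. A real RLHF pipeline would typically use a reward model trained on human preference data, not a dictionary.

```python
def verifiable_reward(answer: str, expected: str) -> float:
    """RLVR-style reward: checked automatically against a known correct answer."""
    # 1.0 if the answer matches the ground truth exactly (ignoring surrounding whitespace).
    return 1.0 if answer.strip() == expected.strip() else 0.0


def human_feedback_reward(answer: str, human_scores: dict[str, float]) -> float:
    """RLHF-style reward stand-in: a score derived from human ratings.

    Assumption: a plain lookup table replaces the learned reward model here.
    """
    return human_scores.get(answer, 0.0)


# Usage: a math answer can be verified; a "which reply is nicer" judgment cannot.
print(verifiable_reward(" 4 ", "4"))
print(human_feedback_reward("polite reply", {"polite reply": 0.7}))
```

The first reward needs no human in the loop once the expected answer is known, while the second depends entirely on human judgments.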