Comment by Davidzheng
7 months ago
RLHF means Reinforcement Learning from Human Feedback. The right/wrong ones are called either RL or RLVR (Verifiable Rewards).