Comment by ben_w
15 hours ago
We could call this "reinforcement learning from human feedback" (RLHF) :)
https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...