Comment by ben_w

14 hours ago

We could call this "reinforcement learning from human feedback" (RLHF) :)

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...

