Slacker News

Comment by wwind123

13 hours ago

This kind of approach would generally still need human guidance; otherwise these models might get stuck in weird niche corners of the problem space that are not relevant to any real-world project.

1 comment

ben_w  13 hours ago

We could call this "reinforcement learning from human feedback" (RLHF) :)

https://en.wikipedia.org/wiki/Reinforcement_learning_from_hu...
