Slacker News

Comment by nrhrjrjrjtntbt

20 hours ago

Yes. The learning comes from running tests on the program and checking whether they pass, so the model is running as an agent. The tests and the compiler give hard feedback; that's the data outside the model that it learns from.
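A minimal sketch of the kind of hard feedback described above. It assumes the candidate program is Python source and each test is a short Python snippet (such as an `assert`) run after it in a fresh interpreter. The name `reward_from_tests` and the pass-fraction scoring are illustrative assumptions, not any particular training pipeline's API; a syntax error stands in for a compiler error and earns a reward of 0.

```python
import subprocess
import sys
from typing import List


def reward_from_tests(candidate_src: str, tests: List[str], timeout: float = 5.0) -> float:
    """Fraction of tests that pass for a candidate Python program (0.0 to 1.0)."""
    # "Compiler" check: source that does not even parse earns no reward.
    try:
        compile(candidate_src, "<candidate>", "exec")
    except SyntaxError:
        return 0.0
    if not tests:
        return 0.0

    passed = 0
    for test in tests:
        # Run the candidate plus one test snippet in a fresh interpreter, so a
        # crash or hang in one test cannot affect the others.
        program = candidate_src + "\n" + test + "\n"
        try:
            result = subprocess.run(
                [sys.executable, "-c", program],
                capture_output=True,
                timeout=timeout,
            )
        except subprocess.TimeoutExpired:
            continue  # a timeout counts as a failure
        if result.returncode == 0:  # exit code 0: no exception, no failed assert
            passed += 1
    return passed / len(tests)


if __name__ == "__main__":
    tests = ["assert add(1, 2) == 3", "assert add(0, 0) == 0"]
    print(reward_from_tests("def add(a, b):\n    return a + b", tests))  # 1.0
    print(reward_from_tests("def add(a, b):\n    return a - b", tests))  # 0.5
    print(reward_from_tests("def add(a, b) return a + b", tests))       # 0.0
```

The buggy `a - b` version still passes `add(0, 0) == 0`, which is why partial credit shows up. This sketch does no sandboxing, so it should only be run on trusted code.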

I think modern RLHF schemes use models to train LLMs, so LLMs teaching each other isn't new.

My knowledge is limited, though; it's just based on a read of https://huyenchip.com/2023/05/02/rlhf.html.
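In the common InstructGPT-style RLHF setup (the one the linked post describes), the "model that trains the LLM" is a reward model: it is first trained on human comparisons between pairs of responses, then used to score the LLM's outputs during the reinforcement-learning stage (e.g. with PPO). Below is a minimal sketch of the usual pairwise loss for that reward model, -log(sigmoid(r_chosen - r_rejected)). It assumes plain floats standing in for the reward model's scores; `pairwise_reward_loss` is a hypothetical name, and a real implementation would compute this over batched tensors and backpropagate through the model.

```python
import math
from typing import List, Tuple


def pairwise_reward_loss(pairs: List[Tuple[float, float]]) -> float:
    """Mean of -log(sigmoid(r_chosen - r_rejected)) over comparison pairs.

    Each pair holds the reward model's scalar scores for the response a human
    preferred (r_chosen) and the one they rejected (r_rejected).
    """
    if not pairs:
        raise ValueError("need at least one comparison pair")
    total = 0.0
    for r_chosen, r_rejected in pairs:
        d = r_chosen - r_rejected
        # -log(sigmoid(d)) == softplus(-d), computed in a numerically stable form.
        total += max(-d, 0.0) + math.log1p(math.exp(-abs(d)))
    return total / len(pairs)


if __name__ == "__main__":
    # Equal scores: the model expresses no preference, so the loss is log(2).
    print(pairwise_reward_loss([(0.0, 0.0)]))
    # Scoring the preferred answer higher gives a lower loss than the reverse.
    print(pairwise_reward_loss([(2.0, 0.0)]) < pairwise_reward_loss([(0.0, 2.0)]))  # True
```

Minimizing this loss pushes the reward model to score human-preferred responses above rejected ones; that learned score then serves as the reward signal for the LLM.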

1 comment


suddenlybananas  20 hours ago

RLHF
