
Comment by nrhrjrjrjtntbt

1 day ago

An LLM could generate such a corpus, right? With feedback mechanisms such as side-by-side tests.

So… the LLM learns from a corpus it has created?

  • It’s basically called “reinforcement learning”, and it’s a common technique in machine learning.

    You provide the goal as a big reward (e.g. the tests passing) and smaller rewards for any particular behaviours you want to encourage, then leave the machine to figure out the best way to earn those rewards through trial and error.

    After a few million attempts, you generally have either a decent result or more data on what additional weighting you need to apply before iterating on the training again.
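A toy sketch of the "big reward plus small shaping rewards, learned by trial and error" idea, assuming a simple epsilon-greedy bandit rather than anything an LLM lab actually uses. The candidate names, pass probabilities and reward sizes (`CANDIDATES`, `GOAL_REWARD`, `BEHAVIOUR_BONUS`) are made up for illustration:

```python
import random

# Hypothetical candidate behaviours the learner can try. Each entry is
# (probability the tests pass, whether it shows a behaviour we want to encourage).
# The numbers are invented purely for illustration.
CANDIDATES = {
    "quick_hack": (0.3, False),
    "careful_fix": (0.8, True),
    "rewrite_everything": (0.5, False),
}

GOAL_REWARD = 10.0     # big reward: the tests pass
BEHAVIOUR_BONUS = 1.0  # small reward: an encouraged behaviour was shown


def shaped_reward(name: str, rng: random.Random) -> float:
    """Big reward for reaching the goal plus a small shaping bonus."""
    pass_prob, good_behaviour = CANDIDATES[name]
    reward = GOAL_REWARD if rng.random() < pass_prob else 0.0
    if good_behaviour:
        reward += BEHAVIOUR_BONUS
    return reward


def train(episodes: int = 5000, epsilon: float = 0.1, seed: int = 0) -> dict[str, float]:
    """Epsilon-greedy trial and error: estimate each candidate's average reward."""
    rng = random.Random(seed)
    estimates = {name: 0.0 for name in CANDIDATES}
    counts = {name: 0 for name in CANDIDATES}
    for _ in range(episodes):
        if rng.random() < epsilon:
            choice = rng.choice(list(CANDIDATES))       # explore: try something at random
        else:
            choice = max(estimates, key=estimates.get)  # exploit: best estimate so far
        reward = shaped_reward(choice, rng)
        counts[choice] += 1
        # Incremental mean update of this candidate's value estimate.
        estimates[choice] += (reward - estimates[choice]) / counts[choice]
    return estimates


if __name__ == "__main__":
    values = train()
    best = max(values, key=values.get)
    print(best, {k: round(v, 2) for k, v in values.items()})
```

With these made-up numbers the learner should settle on `careful_fix`, whose expected reward is 0.8 × 10 + 1 = 9. Changing `GOAL_REWARD` or `BEHAVIOUR_BONUS` and re-running is the toy version of "apply additional weighting and iterate again".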