Comment by tbossanova

20 hours ago

So… llm learns from a corpus it has created?

8 comments

tbossanova

Yes. The learning comes from running tests on the program and ensuring they pass. So running as an agent. Tests and compiler give hard feedback- thats the data outside the model that it learns from.

I think modern RLHF schemes have models that train LLMs. LLMs teaching each other isn't new.

My knowledge is limited, just based on a read of https://huyenchip.com/2023/05/02/rlhf.html though.

suddenlybananas 20 hours ago

RLHF

hnlmorg 20 hours ago

It’s basically called “reinforced learning” and it’s a common technique for machine learning.

You provide a goal as a big reward (eg test passing), and smaller rewards for any particular behaviours you want to encourage, and then leave the machine to figure out the best way to achieve those rewards through trial and error.

After a few million attempts, you generally either have a decent result, or more data around additional weights you need to apply before reiterating on the training.

suddenlybananas 20 hours ago
How do you define the goal? This kind of de novo neural program synthesis is a very hard problem.
- hnlmorg 19 hours ago
  
  Defining the goal is the easy part: as I said in my OP, the goal is unit tests passing.
  It’s the other weights that are harder. You might want execution speed to be one metric. But how do you add weights to prevent cheating (eg hardcoding the results)? Or use of anti-patterns like global variables? (For example. Though one could argue that scoped variables aren’t something an AI-first language would need)
  This is where the human feedback part comes into play.
  It’s definitely not an easy problem. But it’s still more pragmatic than having a human curate the corpus. Particularly considering the end goal (no pun intended) is having an AI-first programming language.
  I should close off by saying that I’m very skeptical that there’s any real value in an AI-first PL. so all of this is just a thought experiment rather than something I’d advocate.
  
  2 replies →
- nrhrjrjrjtntbt 19 hours ago
  
  1. Choose set of code challenges (generate them, leetcode, AOC etc.)
  2. LLM generates python solution and seperate python test (as in python test calls code as black box process so it can test non python code)
  3. Agent using skills etc. tries to write new language let's call it Shark.
  4. Run Shark code against test. If fails use agentic flows to correct until test passes.
  5. Now have list of challenges, working code (maybe not beautiful) for training.
  A bit of human spot checking may not go amiss!