Comment by mrtranscendence

3 years ago

> An LLM can pull training text, LLM code, and fine-tuning examples from itself. Then it can monitor its own retraining.

You can train a weaker model with output from a stronger one, but can you train an LLM on its own output?

Yes, if you amplify the model. There are many ways to push its output above what a single pass gives: check for consistency between multiple attempts, have it reflect on its own output, use more intermediate steps, give it external tools and extra information from search engines, formulate the task as a game with a score, etc. You just need to put the LLM in an environment that is stronger than the LLM alone. AlphaGo famously used Monte Carlo Tree Search to amplify one-step predictions.

In essence the idea is: use more expensive computation to derive a better result, then retrain the model on the new data. System 2 does the work (model + toys), then System 1 learns from it (model by itself).
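To make the "amplify, then retrain" idea concrete, here is a rough sketch of just one of the amplification tricks, consistency between multiple attempts. Everything in it is made up for illustration: `noisy_model` is a toy stand-in for a single LLM sample (it adds two numbers and is right about 60% of the time), and `build_training_set` only collects the labels, it does not do any actual retraining.

```python
import random
from collections import Counter

def noisy_model(question, rng, accuracy=0.6):
    """Hypothetical stand-in for one LLM sample: correct with probability `accuracy`."""
    a, b = question
    truth = a + b
    if rng.random() < accuracy:
        return truth
    # Wrong answers are scattered, so they rarely agree with each other.
    return truth + rng.choice([-2, -1, 1, 2])

def amplified_answer(question, rng, n_samples=15):
    """'System 2': sample many times, return the majority answer and its vote share."""
    votes = Counter(noisy_model(question, rng) for _ in range(n_samples))
    answer, count = votes.most_common(1)[0]
    return answer, count / n_samples

def build_training_set(questions, rng, min_agreement=0.5):
    """Keep only the questions where the samples agree strongly enough to trust."""
    data = []
    for q in questions:
        answer, agreement = amplified_answer(q, rng)
        if agreement >= min_agreement:
            data.append((q, answer))
    return data

rng = random.Random(0)
questions = [(rng.randint(0, 99), rng.randint(0, 99)) for _ in range(200)]

# Accuracy of a single pass vs. accuracy of the labels we would retrain on.
single = sum(noisy_model(q, rng) == sum(q) for q in questions) / len(questions)
dataset = build_training_set(questions, rng)
voted = sum(ans == sum(q) for q, ans in dataset) / len(dataset)

print(f"single-sample accuracy: {single:.2f}")
print(f"majority-vote label accuracy: {voted:.2f} on {len(dataset)} kept examples")
```

The point is only that the voted labels are cleaner than any single sample, so the "System 1" model would be retrained on data better than what it produces in one pass. The trick relies on errors being uncorrelated; if the model is consistently wrong in the same way, voting will not help.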

On top of that, what's the method to keep "errors" from compounding? It also seems like the capabilities of the trained model would approach an asymptote that is the limit of the training model, and never pass it.

  • You loop the LLM with code execution, a simulator, or a game environment, and use the feedback to learn.
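A minimal sketch of the code-execution version of that loop, under some loud assumptions: the `candidates` list is hard-coded and stands in for samples from an LLM, `solve` is a hypothetical function name the model was asked to implement, and `exec` is used without any sandboxing, which you would not want with real model output.

```python
def run_candidate(source, tests):
    """Execute candidate source and check it against (args, expected) pairs."""
    namespace = {}
    try:
        exec(source, namespace)  # sketch only: real model output needs a sandbox
        fn = namespace["solve"]
        return all(fn(*args) == expected for args, expected in tests)
    except Exception:
        # Syntax errors, missing functions, and runtime crashes all count as failures.
        return False

# Hypothetical model outputs for "write solve(a, b) that adds two numbers".
candidates = [
    "def solve(a, b):\n    return a - b\n",        # runs, but wrong
    "def solve(a, b):\n    return a + b\n",        # correct
    "def solve(a, b)\n    return a + b\n",         # syntax error
    "def solve(a, b):\n    return sum([a, b])\n",  # correct, different style
]
tests = [((1, 2), 3), ((0, 0), 0), ((-5, 5), 0)]

# Only candidates that pass the external check become training examples.
verified = [src for src in candidates if run_candidate(src, tests)]
print(f"kept {len(verified)} of {len(candidates)} candidates")
```

This is also what keeps errors from compounding: the signal comes from the interpreter and the tests rather than from the model's own opinion of its output, so wrong samples get filtered out instead of fed back in. Because the check is external, the ceiling is set by the environment rather than by the model that generated the data.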