Comment by visarga

3 years ago

Yes, if you amplify the model. It can do many things to raise its level: look for consistency between multiple attempts, reflect on its own output, use more intermediate steps, use external tools and extra information from search engines, formulate the task as a game with a score, etc. You just need to give the LLM an environment that is stronger than the LLM alone. AlphaGo famously used Monte Carlo Tree Search to amplify one-step predictions.
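
To make the "consistency between multiple attempts" item concrete, here is a minimal sketch of majority voting over repeated samples. `noisy_model` is a hypothetical stand-in for one sampled LLM answer (it is right 60% of the time on a toy addition task), and `amplified_answer` is just an illustrative name, not anyone's real API.

```python
import random
from collections import Counter


def noisy_model(question, rng):
    """Hypothetical stand-in for a single sampled LLM answer.

    The toy task is adding the numbers in `question`. The answer is
    correct with probability 0.6, otherwise it is off by 1 or 2.
    """
    correct = sum(question)
    if rng.random() < 0.6:
        return correct
    return correct + rng.choice([-2, -1, 1, 2])


def amplified_answer(question, rng, attempts=25):
    """Sample the model several times and return the most common answer."""
    votes = Counter(noisy_model(question, rng) for _ in range(attempts))
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    rng = random.Random(0)
    questions = [(rng.randint(0, 9), rng.randint(0, 9)) for _ in range(200)]
    single = sum(noisy_model(q, rng) == sum(q) for q in questions) / len(questions)
    voted = sum(amplified_answer(q, rng) == sum(q) for q in questions) / len(questions)
    print(f"single attempt accuracy: {single:.2f}")
    print(f"majority vote accuracy:  {voted:.2f}")
```

The vote only helps because the wrong answers are scattered while the right one repeats. If the model were consistently wrong in the same way, voting would amplify the error instead.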

In essence the idea is: use more expensive computation to derive a better result, then retrain the model on the new data. System 2 works (model + toys), then system 1 learns (model by itself).
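
A toy sketch of that loop, under heavy assumptions: `TinyModel` is a hypothetical "system 1" that either recalls a memorized answer or makes a noisy guess, `amplify` is the expensive "system 2" step (majority vote over many guesses), and "retraining" is reduced to storing the amplified answers. A real setup would fine-tune weights instead of filling a table.

```python
import random
from collections import Counter


class TinyModel:
    """Hypothetical 'system 1': memorized answers plus a noisy fallback guess."""

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.table = {}  # what the model has "learned" so far

    def predict(self, x):
        if x in self.table:
            return self.table[x]
        truth = x * x  # toy task: squaring a number
        # Untrained guess: right 60% of the time, otherwise off by one.
        if self.rng.random() < 0.6:
            return truth
        return truth + self.rng.choice([-1, 1])

    def train(self, pairs):
        """Stand-in for retraining: absorb the amplified answers."""
        self.table.update(pairs)


def amplify(model, x, attempts=31):
    """'System 2': spend more compute by voting over many cheap guesses."""
    votes = Counter(model.predict(x) for _ in range(attempts))
    return votes.most_common(1)[0][0]


model = TinyModel()
tasks = list(range(20))

# System 2 works: expensive, but better than a single guess.
labels = {x: amplify(model, x) for x in tasks}

# System 1 learns: the model alone now reproduces the amplified answers.
model.train(labels)
```

After `train`, a single cheap `model.predict(x)` returns what previously took 31 samples to get, which is the whole point of distilling the amplified system back into the base model.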