Comment by int_19h

5 months ago

At this point we have considerable evidence in favor of the hypothesis that LLMs construct world models. Ones trained on some specific task construct a model relevant to that task (see Othello GPT). The generic ones, trained on basically "stuff humans write", can therefore be assumed to contain very crude models of human thinking. It is still "just predicting tokens", but if you demand sufficient accuracy at prediction, and the thing being predicted was produced by reasoning, the predictor necessarily has to learn some approximation of that reasoning (unless it is large enough to simply memorize all the training data).
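
To make the "probe the predictor for a world model" idea concrete, here is a minimal toy sketch in the spirit of the Othello-GPT experiments. Everything in it (the toy hidden-state world, the GRU predictor, the hyperparameters) is my own illustrative choice, not taken from that work: train a small model purely on next-token prediction over sequences emitted by a latent state, then check whether a linear probe can read that state back out of the model's activations.

```python
# Toy sketch: does a next-token predictor build an internal model of the
# latent state that generates its data? All numbers here are illustrative.
import torch
import torch.nn as nn

torch.manual_seed(0)

def sample_batch(batch=64, length=50, flip=0.05, noise=0.3):
    """0/1 token sequences emitted noisily by a latent state that rarely flips."""
    states = torch.zeros(batch, length, dtype=torch.long)
    s = torch.randint(0, 2, (batch,))
    for t in range(length):
        s = torch.where(torch.rand(batch) < flip, 1 - s, s)
        states[:, t] = s
    tokens = torch.where(torch.rand(batch, length) < noise, 1 - states, states)
    return tokens, states

class TinyPredictor(nn.Module):
    def __init__(self, hidden=32):
        super().__init__()
        self.embed = nn.Embedding(2, hidden)
        self.rnn = nn.GRU(hidden, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)

    def forward(self, tokens):
        h, _ = self.rnn(self.embed(tokens))
        return self.head(h), h  # next-token logits, plus per-step hidden states

model = TinyPredictor()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# 1. Train purely on next-token prediction; the latent state is never shown.
for _ in range(1000):
    tokens, _ = sample_batch()
    logits, _ = model(tokens)
    loss = loss_fn(logits[:, :-1].reshape(-1, 2), tokens[:, 1:].reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()

# 2. Freeze the predictor; fit a linear probe from its hidden states to the
#    latent state that actually generated the data.
probe = nn.Linear(32, 2)
popt = torch.optim.Adam(probe.parameters(), lr=1e-2)
for _ in range(300):
    tokens, states = sample_batch()
    with torch.no_grad():
        _, h = model(tokens)
    ploss = loss_fn(probe(h.reshape(-1, 32)), states.reshape(-1))
    popt.zero_grad()
    ploss.backward()
    popt.step()

# 3. The current token alone matches the latent state only ~70% of the time;
#    if the probe does clearly better, the predictor is tracking the state
#    internally, because that is what accurate prediction requires here.
tokens, states = sample_batch(batch=256)
with torch.no_grad():
    _, h = model(tokens)
    acc = (probe(h.reshape(-1, 32)).argmax(-1) == states.reshape(-1)).float().mean()
print(f"probe accuracy on the latent state: {acc.item():.2f}")
```

Othello GPT is, as I understand it, the same experiment at scale: train only on legal move sequences, then show that a probe can recover the board state from the activations, even though the board was never part of the training signal.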