CoT is literally just telling an LLM to "reason through it step by step", so that it talks itself through the solution instead of just giving the final answer. There's no searching involved in any of that.
I don't quite understand how that would lead to anything but a slightly different response. How can token prediction have this capability without explicitly enabling some heretofore unenabled mechanism? People have been asking this for years.
Let's just assume the model is a statistical parrot, which it probably is. The probability of the next token is influenced by the input. So far so good: if I ask a question, the probability that the model generates the corresponding answer increases. But is it the right one? This is exactly where CoT comes in: by generating intermediate context, you change the probability of the tokens for the answer, and we can at least show experimentally that the answers get better. Perhaps it is easier to speak of a kind of refinement: the more context is generated, the more focused the model becomes on the currently relevant topic.
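To make that concrete, here is a toy sketch (my own illustration, not from the comment, and not a rigorous experiment). It uses GPT-2 via Hugging Face `transformers` purely because it is small and public; the prompts, the target token, and the model choice are all arbitrary assumptions. All it demonstrates is the mechanics: extra "reasoning" tokens in the prompt change the conditional distribution over the answer token.

```python
# Toy sketch: how extra "reasoning" tokens in the context shift the
# probability the model assigns to an answer token. GPT-2 is used only
# because it is small; the prompts and target token are illustrative.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

def next_token_prob(prompt: str, target: str) -> float:
    """Probability the model assigns to `target` as the single next token."""
    input_ids = tokenizer(prompt, return_tensors="pt").input_ids
    target_id = tokenizer(target, add_special_tokens=False).input_ids[0]
    with torch.no_grad():
        logits = model(input_ids).logits[0, -1]  # logits for the next position
    return torch.softmax(logits, dim=-1)[target_id].item()

direct = "Q: What is 17 + 25? A:"
with_steps = (
    "Q: What is 17 + 25?\n"
    "Reasoning: 17 + 25 = 17 + 20 + 5 = 37 + 5 = 42.\n"
    "A:"
)

# The point is only that the added tokens condition the next-token
# distribution; whether it moves in the "right" direction depends on
# the model and the prompt.
print(next_token_prob(direct, " 42"))
print(next_token_prob(with_steps, " 42"))
```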
At this point we have considerable evidence in favor of the hypothesis that LLMs construct world models. Ones that are trained at some specific task construct a model that is relevant for that task (see Othello GPT). The generic ones that are trained on, basically, "stuff humans write", can therefore be assumed to contain very crude models of human thinking. It is still "just predicting tokens"; it's just that if you demand sufficient accuracy at prediction, and you're predicting something that is produced by reasoning, the predictor will necessarily have to learn some approximation of reasoning (unless it's large enough to just remember all the training data).
The theory is that you fill the context with more tokens relevant to the problem at hand, and to its solution, which makes it more likely that the model predicts the correct solution.
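On the generation side, that looks roughly like the sketch below (again my own illustration, not from the thread): you append a cue that elicits intermediate steps and let the model produce them before the answer. GPT-2 is far too small to actually benefit from this, so the snippet shows only where the cue goes and how the output is read, nothing about answer quality.

```python
# Generation-side sketch: where the "think step by step" cue goes.
# GPT-2 is too weak to really profit from CoT; this only shows the
# mechanics of eliciting intermediate tokens before the final answer.
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = (
    "Q: A train travels 60 km in 1.5 hours. What is its average speed?\n"
    "A: Let's think step by step."
)
input_ids = tokenizer(prompt, return_tensors="pt").input_ids
output = model.generate(
    input_ids,
    max_new_tokens=64,
    do_sample=False,                      # greedy, for reproducibility
    pad_token_id=tokenizer.eos_token_id,  # silences the padding warning
)
# Everything after the prompt is the generated "reasoning" plus answer.
print(tokenizer.decode(output[0], skip_special_tokens=True))
```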
English is Turing complete.