Comment by daxfohl
13 days ago
Which matches what they are. They're first and foremost pattern recognition engines extraordinaire. If they can identify some pattern that's out of whack in your code compared to something in the training data, or a bug that is similar to others that have been fixed in their training set, they can usually thwack those patterns over to your latent space and clean up the residuals. On pattern matching alone, they are significantly superhuman.
"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape. Their ability to pattern match makes reasoning seem more powerful than it actually is. If your bug is within some reasonable distance of a pattern it has seen in training, reasoning can get it over the final hump. But if your problem is too far removed from what it has seen in its latent space, it's not likely to figure it out by reasoning alone.
Exactly. I go back to a recent ancestor of LLMs, seq2seq. Its purpose was to translate things. That's all. That required representation learning and an attention mechanism, and it led to some really freaky emergent capabilities, but it's trained to translate language.
And that's exactly what it's good for. It works great if you've already solved a tough problem and provide the solution in natural language, because the program is already there; it just needs to be translated to Python.
Anything more than that which might emerge from this is going to be unreliable sleight of next-token prediction at best.
We need a new architectural leap to get these things to reason, maybe something that involves reinforcement learning at the token-representation level, idk. But scaling the context window and training data aren't going to cut it.
>"Reasoning", however, is a feature that has been bolted on with a hacksaw and duct tape.
What do you mean by this? Especially for tasks like coding, where there is a deterministic correct-or-incorrect signal, it should be possible to train.
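For concreteness, here's a toy sketch of what that deterministic signal looks like for code: run a candidate solution against fixed tests and return pass/fail. The `solve` entry point and the test pairs are made up for illustration, and this is only the reward side, not a training loop.

    # Toy "verifiable reward" for generated code: execute the candidate and
    # check it against fixed test cases. The `solve` entry point and the
    # test pairs are illustrative assumptions, not any real benchmark.
    def reward(candidate_src: str, tests: list[tuple[int, int]]) -> float:
        ns: dict = {}
        try:
            exec(candidate_src, ns)            # define the candidate function
            f = ns["solve"]                    # assumed entry-point name
            return float(all(f(x) == y for x, y in tests))
        except Exception:
            return 0.0

    good = "def solve(x):\n    return x * x"
    bad  = "def solve(x):\n    return x + x"
    tests = [(2, 4), (3, 9)]
    print(reward(good, tests), reward(bad, tests))  # -> 1.0 0.0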
It's meant in the literal sense, but with metaphorical hacksaws and duct tape.
Early on, some advanced LLM users noticed they could get better results by forcing insertion of a word like "Wait," or "Hang on," or "Actually," and then running the model for a few more paragraphs. This would increase the chance of a model noticing a mistake it made.
Reasoning is basically this.
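Roughly, the trick looks like this (a sketch using Hugging Face transformers; the model name and prompt are placeholders, and a model this small won't actually show the effect):

    # Sketch of the forced-reconsideration trick: generate once, append a
    # word like "Wait," and let the model keep going so it can second-guess
    # its own draft. Model and prompt are placeholders.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    name = "gpt2"  # placeholder; a real reasoning-tuned model would go here
    tok = AutoTokenizer.from_pretrained(name)
    model = AutoModelForCausalLM.from_pretrained(name)

    prompt = "Q: Is 3599 prime? Think step by step.\nA:"

    # First pass: the model's initial chain of thought.
    ids = tok(prompt, return_tensors="pt").input_ids
    draft = tok.decode(model.generate(ids, max_new_tokens=128, do_sample=False)[0],
                       skip_special_tokens=True)

    # Force the reconsideration word and continue decoding from there.
    ids2 = tok(draft + "\nWait,", return_tensors="pt").input_ids
    final = tok.decode(model.generate(ids2, max_new_tokens=128, do_sample=False)[0],
                       skip_special_tokens=True)
    print(final)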
It's not just force-inserting a word. Reasoning is integrated into the training process of the model.