Comment by direwolf20
22 days ago
It's meant in the literal sense, but with metaphorical hacksaws and duct tape.
Early on, some advanced LLM users noticed they could get better results by forcing the insertion of a word like "Wait," or "Hang on," or "Actually," and then letting the model run for a few more paragraphs. This increased the chance of the model noticing a mistake it had made.
Reasoning is basically this.
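Roughly what that trick looks like in code (a toy sketch using a small Hugging Face model as a stand-in; the "Wait," cue and token budgets here are illustrative, not what any lab actually ships):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; the trick was used on much larger models
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = "Q: What is 17 * 24? Show your working.\nA:"
inputs = tokenizer(prompt, return_tensors="pt")

# First pass: let the model produce an initial answer.
first = model.generate(**inputs, max_new_tokens=128, do_sample=False)

# Force-insert a reconsideration cue, then let it keep going for a bit longer.
continued_text = tokenizer.decode(first[0], skip_special_tokens=True) + "\n\nWait, "
continued_inputs = tokenizer(continued_text, return_tensors="pt")
second = model.generate(**continued_inputs, max_new_tokens=128, do_sample=False)

print(tokenizer.decode(second[0], skip_special_tokens=True))
```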
It's not just force-inserting a word. Reasoning is integrated into the model's training process.
Not into the core foundation model, though. The foundation model still only predicts the next token in a static way. The reasoning is tacked onto the InstructGPT-style finetuning step, and it's done through prompt engineering, which is the shittiest way a model like this could have been done, and it shows.