Comment by figarus314
15 days ago
A model solving original math problems may look like human reasoning, but internally the model is choosing the next token based on what it has learned about probability around various patterns and structures. The model knows about correlations between problems, proof techniques and answer structures, and when it "reasons" it's selecting a high probability trajectory through that learned knowledge.
A calculator is different because it is not probabilistic; it executes a fixed procedure. One of these models, when doing math, is more like a learned probabilistic system that understands enough structure around mathematics that some of its high probability trajectories seem like genuine reasoning.
The difference is that when a human reasoner goes to solve a problem, they'll think "this kind of proof usually goes this way" - following an explicit rule enforcement. The model may produce the same output, and may even appear to approach it the same way, but the mechanism is a probabilistic pattern selection rather than explicit rule enforcement.
You talk as if problem solving is a supervised (imitation) learning problem. No, it is a reinforcement learning problem, models learn by solving problems and getting rated. They generate their own training data. Optimal budget allocation is 1/3 cost pre-training, 1/3 for RL, and 1/3 on inference.
> The difference is that when a human reasoner goes to solve a problem, they'll think "this kind of proof usually goes this way" - following an explicit rule enforcement.
How is this different from "probabilistic pattern selection"?
Because... it's just different, that's all! OK?
I don’t think there’s any evidence that “human reasoning” isn’t also based on probabilistic pattern selection.
[flagged]
You should leave this site. Comments like this are not good for this site. You should go somewhere else.