Comment by skywhopper
4 hours ago
So, you agree with the point that they’re making and you’re mad about it? It’s important to state that the models aren’t doing real reasoning because they are being marketed and sold as if they are.
As for your question: ‘So what does "sophisticated simulators of reasoning-like text" even mean here?’
It means CoT interstitial “reasoning” steps produce text that looks like reasoning, but is just a rough approximation, given that the reasoning often doesn’t line up with the conclusion, or the priors, or reality.
What is "real reasoning"? The mechanism that the models use is well described. They do what they do. What is this article's complaint?
For example: at a minimum, the stated reasoning should match what actually happened. That is not a complete set of criteria for reasoning, but it is at least a minimal baseline. Currently, LLM programs generate BS in the "reasoning" part of the output. Ask an LLM program to "reason" about how it produces the sum of two numbers and you will see that the explanation doesn't match what the program actually did in the background; the "reasoning" it outputs is simply an extract of the reasoning humans wrote in its training data. Even Anthropic officially admits this.

If you ask a program how to do maintenance on a gearbox and it replies with a very well-articulated and correct (important!) guide to harvesting wheat, then we can't call that reasoning of any kind, even though the wheat-farming guide was correct and logical.
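Here is a rough sketch of the kind of outside-in check I mean (not Anthropic's actual interpretability setup, which works on the model's internals): ask for the sum with visible step-by-step "reasoning" and again with the answer only, then compare. It assumes the `anthropic` Python SDK with an API key in the environment; the model name and prompt format are placeholders.

    # Rough sketch only: from the API we can only compare the model's stated
    # steps and final answers, not inspect what the network actually computed.
    # Assumes the `anthropic` SDK and ANTHROPIC_API_KEY; MODEL is a placeholder.
    import random
    import re

    import anthropic

    client = anthropic.Anthropic()
    MODEL = "claude-3-5-sonnet-latest"  # placeholder; use whatever model you have access to

    def ask(prompt: str) -> str:
        msg = client.messages.create(
            model=MODEL,
            max_tokens=512,
            messages=[{"role": "user", "content": prompt}],
        )
        return msg.content[0].text

    def parse_int(text: str) -> int | None:
        # Pull the number after "ANSWER:" if present, else the trailing number.
        m = re.search(r"ANSWER:\s*([\d,]+)", text) or re.search(r"([\d,]+)\s*$", text.strip())
        return int(m.group(1).replace(",", "")) if m else None

    a, b = random.randint(10_000, 99_999), random.randint(10_000, 99_999)

    with_cot = ask(
        f"Explain step by step how you compute {a} + {b}. "
        f"End with a final line formatted exactly as ANSWER: <number>."
    )
    answer_only = ask(f"What is {a} + {b}? Reply with only the number.")

    print("true sum:         ", a + b)
    print("with 'reasoning': ", parse_int(with_cot))
    print("answer only:      ", parse_int(answer_only))
    # Even if both answers are right, the printed steps are a digit-by-digit story
    # told after the fact, not a trace of what the network did internally.

Getting the right sum both ways doesn't mean the stated steps are how the answer was produced; that gap is exactly what Anthropic's own interpretability work describes.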
As soon as you introduce multiple constraints on what is and isn't reasoning, people get confused and disengage.
I like this approach of setting a minimum constraint, but I feel adding more will just make people ignore the point entirely.
“the mechanism the models use is well described”
Vs
Total AI capex in the past 6 months was greater than US consumer spending
Or
AGI is coming
Or
AI Agents will be able to do most white collar work
——
The paper is addressing parts of the conversation and expectations of AI that are in the HYPE quadrant. There’s money riding on the idea that AI is going to begin to reason reliably. That it will work as a ghost in the machine.
This is why research like this is important and needs to keep being published.
What we have seen over the last few years is a conscious marketing effort to rebrand everything ML as AI and to use terms like "Reasoning", "Extended Thinking", and others that, for many non-technical people, give the impression the software is doing far more than it actually is.
Many of us here can see this research and be like... well yeah, we already knew this. But there is a very well-funded effort to oversell what these systems can actually do, and it is reaching the people who ultimately make the decisions at companies.
So the question is no longer whether AI agents will be able to do most white-collar work. They can probably fake it well enough to accomplish a few tasks, and management will see that. The question is whether the output will actually be valuable long term, or only deliver short-term gains.
"the reasoning often doesn’t line up with the conclusion, or the priors, or reality."
My dude, have you ever interacted with human reasoning?
Are you sure you are not comparing to human unreason?
Most of what humans think of as reason is actually "will to power". The capability to use our faculties in a way that produces logical conclusions seems like an evolutionary accident, an off-label use of the brain's machinery for complex social interaction. Most people never learn to catch themselves doing the former when they intend to engage in the latter; some don't know the difference. Fortunately, the latter provides a means of self-correction, and the research here hopes to elucidate whether an LLM-based reasoning system has the same property.
In other words, given consistent application of reason, I would expect a human to eventually draw logically correct conclusions, decline to answer, rephrase the question, etc. But with an LLM, should I expect a non-deterministic infinite walk through plausible nonsense? I expect reasoning to converge.
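To make that expectation concrete, one rough way to read "converge" is to sample the same question several times and see whether the answers settle on one value or wander. A sketch under the same assumptions as above (the `anthropic` Python SDK, a placeholder model name, an arbitrary example question):

    # Sketch of a convergence check: repeated sampling at nonzero temperature.
    # A system that converges should concentrate on one answer; a plausible-text
    # generator may spread over several confident-sounding ones.
    from collections import Counter

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
    MODEL = "claude-3-5-sonnet-latest"  # placeholder model name

    def sample_answer(question: str, temperature: float = 1.0) -> str:
        msg = client.messages.create(
            model=MODEL,
            max_tokens=100,
            temperature=temperature,
            messages=[{"role": "user", "content": question + " Reply with only the final answer."}],
        )
        return msg.content[0].text.strip()

    QUESTION = ("A train leaves at 9:40 and the trip takes 2 hours and 35 minutes. "
                "What time does it arrive?")

    counts = Counter(sample_answer(QUESTION) for _ in range(10))
    print(counts.most_common())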