Comment by Yizahi
1 day ago
For example: at a minimum, the reasoning should match what actually happened. That is not even a complete set of criteria for reasoning, just a minimal baseline. Currently, LLM programs generate BS in the "reasoning" part of their output. Ask an LLM to "reason" about how it produces the sum of two numbers and you will see that the explanation doesn't match at all what the program actually did in the background. The "reasoning" it outputs is simply an extract of the reasoning humans wrote in the LLM's training dataset. Even Anthropic officially admits this. If you ask a program how to do maintenance on a gearbox and it replies with a very well-articulated and correct (important!) guide to harvesting wheat, then we can't call that reasoning of any kind, even though the wheat-farming guide was correct and logical.
As soon as you introduce multiple constraints on what is and isn't reasoning, people get confused and disengage.
I like this approach of setting a minimum constraint, but I feel adding more will just make people ignore the point entirely.
The reality is obvious. The only way not to see it when looking at research like this is to not want to see it. The idea that this critique is somehow more confusing than the use of the word "reasoning" itself is farcical.
LLMs are cool and some of the things they can do now are useful, even surprising. But when it comes to AI, business leaders are talking their book, and many people are swept up by that breathless talk, frequently parroted by the media, and by their own misleading intuitions.
The "but human reasoning is also flawed, so I can't possibly understand what you mean!" objection cannot be sustained in good faith short of delusion.