Comment by syllogism

6 hours ago

What is "real reasoning"? The mechanism that the models use is well described. They do what they do. What is this article's complaint?

For example: at a minimum, the stated reasoning should match what actually happened. This is not a complete set of criteria for reasoning, but it is at least a minimal baseline. Currently, LLM programs generate BS in the "reasoning" part of their output. Ask an LLM program to "reason" about how it produces the sum of two numbers and you will see that the explanation doesn't match what the program actually did in the background; the "reasoning" it outputs is simply an extract of the reasoning that humans did in the LLM's training data. Even Anthropic officially admits this. If you ask a program how to do maintenance on a gearbox and it replies with a very well-articulated and correct (important!) guide to harvesting wheat, then we can't call that reasoning of any kind, even though the wheat-farming guide was correct and logical.
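To make that baseline concrete, here is a toy sketch in Python (my own illustration, not Anthropic's actual circuit analysis): two procedures that both get 36 + 59 right, one using the digit-by-digit carry algorithm a model typically verbalizes when asked, the other using the rough-magnitude-plus-exact-ones-digit shortcut that Anthropic's interpretability work describes the model actually using internally. Same answer, different mechanism, which is why a correct, well-articulated explanation is not evidence of a matching process.

    import random

    def carry_algorithm(a, b):
        # The schoolbook procedure a model typically *says* it uses:
        # add digit by digit, right to left, propagating a carry.
        total, carry, place = 0, 0, 1
        while a or b or carry:
            d = a % 10 + b % 10 + carry
            total += (d % 10) * place
            carry = d // 10
            a, b, place = a // 10, b // 10, place * 10
        return total

    def parallel_heuristics(a, b):
        # Roughly the mechanism interpretability found instead: a fuzzy
        # estimate of the overall magnitude combined with an exact
        # computation of the last digit, then reconciled at the end.
        rough = (a + b) + random.randint(-3, 3)  # imprecise magnitude pathway
        ones = (a % 10 + b % 10) % 10            # exact ones-digit pathway
        low = rough - ((rough - ones) % 10)      # nearest candidate <= rough ending in `ones`
        high = low + 10                          # nearest candidate above it
        return low if abs(low - rough) <= abs(high - rough) else high

    assert carry_algorithm(36, 59) == parallel_heuristics(36, 59) == 95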

  • As soon as you introduce multiple constraints on what is and isn't reasoning, people get confused and disengage.

    I like this approach of setting a minimum constraint. But I feel adding more will just make people ignore the point entirely.

    • The reality is obvious. The only way not to see it when looking at research like this is to not want to see it. The idea that this critique is somehow more confusing than the use of the word "reasoning" itself is farcical.

      LLMs are cool and some of the things they can do now are useful, even surprising. But when it comes to AI, business leaders are talking their books and many people are swept up by that breathless talk and their own misleading intuitions, frequently parroted by the media.

      The "but human reasoning is also flawed, so I can't possibly understand what you mean!" objection cannot be sustained in good faith short of delusion.

“the mechanism that the models use is well described”

Vs

Total AI capex contributed more to US GDP growth in the past 6 months than all consumer spending

Or

AGI is coming

Or

AI Agents will be able to do most white collar work

——

The paper is addressing the parts of the conversation, and the expectations for AI, that sit in the HYPE quadrant. There’s money riding on the idea that AI is going to begin to reason reliably, that it will work as a ghost in the machine.

  • The scary thing about ML isn’t that it’s poised to eat a lot of lower-reasoning tasks; it’s that we’re going to find ourselves in a landscape of “that’s just what the AI said to do” excuses for all kinds of bad behavior, while we remain completely unwilling to explore what biases are encoded in the models we’re producing. It’s like how Facebook abdicates responsibility for how users feel because it’s just the product of an algorithm. And if I were a betting person, I’d bet all this stuff is going to be used for making rental determinations and for deciding who gets exceptions to overdraft fees well before it’s used for anything else. It’s an enabling technology for all kinds of inhumanity.

  • This is why research like this is important and needs to keep being published.

    What we have seen over the last few years is a conscious marketing effort to rebrand everything ML as AI, and to use terms like "Reasoning", "Extended Thinking", and others that, for many non-technical people, give the impression that the system is doing far more than it actually is.

    Many of us here can look at this research and think: well, yeah, we already knew this. But there is a very well-funded effort to oversell what these systems can actually do, and it is reaching the people who ultimately make the decisions at companies.

    So the question is no longer whether AI Agents will be able to do most white collar work. They can probably fake it well enough to accomplish a few tasks, and management will see that. The question is whether the output will actually be valuable in the long term, or only deliver short-term gains.