
Comment by crazygringo

5 days ago

You're claiming that the thinking is just a fictional story intended to look like thinking.

But this is false, because the thinking exhibits cause and effect and a lot of good reasoning. If you change the inputs, the thinking continues to be pretty good with the new inputs.

It's not a story, it's not fictional, it's producing genuinely reasonable conclusions around data it hasn't seen before. So how is it therefore not actual thinking?

And I have no idea what your short document example has to do with anything. It seems nonsensical and bears no resemblance to the actual, grounded chain-of-thought processes that high-quality reasoning LLMs produce.

> OK, so that document technically has a "chain of thought" and "reasoning"... But whose?

What does it matter? If an LLM produces output, we say it's the LLM's. I fail to see why that distinction is significant.

> So how is it therefore not actual thinking?

Many consider "thinking" something only animals can do, and they are uncomfortable with the idea that animals are biological machines or that life, consciousness, and thinking are fundamentally machine processes.

When an LLM generates chain-of-thought tokens, what we might casually call “thinking,” it fills its context window with a sequence of tokens that improves its ability to answer correctly.

This “thinking” process is not rigid deduction like in a symbolic rule system; it is more like an associative walk through a high-dimensional manifold shaped by training. The walk is partly stochastic (depending on temperature, sampling strategy, and similar factors) yet remarkably robust.
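
To make the “partly stochastic” part concrete, here is a minimal, self-contained sketch of temperature-scaled sampling over next-token logits (toy numbers, not from any real model); it only illustrates why the walk becomes more or less exploratory as temperature changes:

    import numpy as np

    def sample_next_token(logits, temperature=0.8, rng=None):
        # Lower temperature sharpens the distribution (a more deterministic walk);
        # higher temperature flattens it (a more exploratory walk).
        rng = rng if rng is not None else np.random.default_rng()
        scaled = np.asarray(logits, dtype=float) / max(temperature, 1e-6)
        scaled -= scaled.max()  # subtract max for numerical stability
        probs = np.exp(scaled) / np.exp(scaled).sum()
        return int(rng.choice(len(probs), p=probs))

    # Toy logits over a 5-token vocabulary: at T=0.2 the samples are near-greedy,
    # at T=1.5 they spread across more of the vocabulary.
    logits = [2.0, 1.5, 0.3, -1.0, -2.0]
    print([sample_next_token(logits, temperature=0.2) for _ in range(10)])
    print([sample_next_token(logits, temperature=1.5) for _ in range(10)])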

Even when you manually introduce logical errors into a chain-of-thought trace, the model’s overall accuracy usually remains better than if it had produced no reasoning tokens at all. Unlike a strict forward- or backward-chaining proof system, the LLM’s reasoning relies on statistical association rather than brittle rule-following. In a way, that fuzziness is its strength because it generalizes instead of collapsing under contradiction.
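
If you want to probe that claim yourself, a rough sketch follows; `complete` is a placeholder for whatever model API you actually call, and the question/trace/answer triples are your own evaluation set, so treat every name here as hypothetical:

    # Hypothetical harness: compare answer accuracy with no reasoning tokens,
    # with the model's own chain-of-thought trace, and with a deliberately
    # corrupted trace. `complete` is a stub; wire it up to your LLM API.
    def complete(prompt: str) -> str:
        raise NotImplementedError("plug in your model call here")

    def corrupt(trace: str) -> str:
        # Inject a logical error into one intermediate step of the trace.
        return trace.replace("therefore", "therefore the opposite,", 1)

    def accuracy(items, build_prompt) -> float:
        # items: iterable of (question, chain_of_thought_trace, known_answer)
        correct = 0
        for question, trace, answer in items:
            correct += answer.lower() in complete(build_prompt(question, trace)).lower()
        return correct / len(items)

    no_cot        = lambda q, t: f"{q}\nAnswer with just the final answer."
    own_cot       = lambda q, t: f"{q}\nReasoning so far:\n{t}\nFinal answer:"
    corrupted_cot = lambda q, t: f"{q}\nReasoning so far:\n{corrupt(t)}\nFinal answer:"

    # The claim above predicts that accuracy(items, corrupted_cot) still beats
    # accuracy(items, no_cot) on most evaluation sets.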

  • Well put, and if it doesn't notice/collapse under introduced contradictions, that's evidence it's not the kind of reasoning we were hoping for. The "real thing" is actually brittle when you do it right.

    • Human reasoning is, in practice, much closer to statistical association than to brittle rule-following. The kind of strict, formal deduction we teach in logic courses is a special, slow mode we invoke mainly when we’re trying to check or communicate something, not the default way our minds actually operate.

      Everyday reasoning is full of heuristics, analogies, and pattern matches: we jump to conclusions, then backfill justification afterward. Psychologists call this “post hoc rationalization,” and there’s plenty of evidence that people form beliefs first and then search for logical scaffolding to support them. In fact, that’s how we manage to think fluidly at all; the world is too noisy and underspecified for purely deductive inference to function outside of controlled systems.

      Even mathematicians, our best examples of deliberate, formal thinkers, often work this way. Many major proofs have been discovered intuitively and later found to contain errors that didn’t actually invalidate the final result. The insight was right, even if the intermediate steps were shaky. When the details get repaired, the overall structure stands. That’s very much like an LLM producing a chain of reasoning tokens that might include small logical missteps yet still landing on the correct conclusion: the “thinking” process is not literal step-by-step deduction, but a guided traversal through a manifold of associations shaped by prior experience (or training data, in the model’s case).

      So if an LLM doesn’t collapse under contradictions, that’s not necessarily a bug; it may reflect the same resilience we see in human reasoning. Our minds aren’t brittle theorem provers; they’re pattern-recognition engines that trade strict logical consistency for generalization and robustness. In that sense, the fuzziness is the strength.

The problem is that the overwhelming majority of input it has in fact seen somewhere in the corpus it was trained on. Certainly not one-for-one, but easily a 98% match. I think that's the whole point the other person is trying to make. The reality is that most of language is 99% regurgitation, used to communicate an internal state in a very compressed form. That remaining 1%, though, may be the magic that makes us human: we create net-new information unseen in the corpus.

  • > the overwhelming majority of input it has in fact seen somewhere in the corpus it was trained on.

    But it thinks just great on stuff it wasn't trained on.

    I give it code I wrote that is not in its training data, using new concepts I've come up with in an academic paper I'm writing, and ask it to extend the code in a certain way in accordance with those concepts, and it does a great job.

    This isn't regurgitation. Even if a lot of LLM usage is, the whole point is that it does fantastically with stuff that is brand new too. It's genuinely creating new, valuable stuff it's never seen before. Assembling it in ways that require thinking.

    • I think it would be hard to prove that it's truly so novel that nothing similar is present in the training data. I've certainly seen in research that it's quite easy to miss related work even with extensive searching.

  • Except it's more than capable of solving novel problems that aren't in the training set and aren't a close match to anything in the training set. I've done it multiple times across multiple domains.

    Creating complex Excel spreadsheet structures comes to mind; I just did that earlier today, with plain GPT-5, not even -Thinking. Sure, maybe the Excel formulas themselves are a "98% match" to training data, but it takes real cognition (or whatever you want to call it) to figure out which formulas to use, how to apply them appropriately to a given situation, how to structure the spreadsheet, and so on.