Comment by pierrekin

21 hours ago

I agree that the model can help troubleshoot and debug itself.

I argue that the model has no access to its thoughts at the time.

Split brain experiments notwithstanding I believe that I can remember what my faulty assumptions were when I did something.

If you ask a model “why did you do that” it is literally not the same “brain instance” anymore and it can only create reasons retroactively based on whatever context it recorded (chain of thought for example).

23 comments

pierrekin

XenophileJKO 20 hours ago

Anthropic's introspection experiments have seemed to show that your argument is falsifiable.

https://www.anthropic.com/research/introspection

sumeno 19 hours ago
> In fact, most of the time models fail to demonstrate introspection—they’re either unaware of their internal states or unable to report on them coherently.
You got the wrong takeaway from your link.
- XenophileJKO 19 hours ago
  
  The parent said: "I argue that the model has no access to its thoughts at the time."
  This is falsified by that study, showing that on the frontier models generalized introspection does exist. It isn't consistent, but is is provable.
  "no access" vs. "limited access"
  
  2 replies →

fragmede 20 hours ago

Claude code and codex both hide the Chain of Thought (CoT) but it's just words inside a set of <thinking> tags </thinking> and the agent within the same session has access to that plaintext.

fc417fc802 20 hours ago
Those are just words inside arbitrary tags, they aren't actually thoughts. Think of it as asking the model to role play a human narrating his internal thought process. The exercise improves performance and can aid in human understanding of the final output but it isn't real.
- lmm 9 hours ago
  
  What would be different if it was "real"? What makes you think that when humans "narrate" "their" "internal thought process", it's any more "real"?
- antonvs 19 hours ago
  
  Why do you believe that humans have access to an “internal thought process”? I.e. what do you think is different about an agent’s narration of a thought process vs. a human’s?
  I suspect you’re making assumptions that don’t hold up to scrutiny.
  
  6 replies →

jmalicki 20 hours ago

It does have access to its thoughts. This is literally what thinking models do. They write out thoughts to a scratch pad (which you can see!) and use that as part of the prompt.

fc417fc802 20 hours ago

It's important to be aware that while those "thoughts" can be a useful aid for human understanding they don't seem to reliably reflect what's going on under the hood. There are various academic papers on the matter or you can closely inspect the traces of a more logically oriented question for yourself and spot impossible inconsistencies.
mmoll 20 hours ago
It doesn’t mean that these “thoughts” influenced their final decision the way they would in humans. An LLM will tell you a lot of things it “considered” and its final output might still be completely independent of that.
- jmalicki 18 hours ago
  
  Its output quite literally is not independent, as the "thinking tokens" are attended to by the attention mechanism.
grey-area 20 hours ago

They do not in fact do that. The ‘thoughts’ are not a chain of logic.
sumeno 19 hours ago

You have a fundamental misunderstanding of what the model is doing. It's not your fault though, you're buying into the advertising of how it works
eleumik 15 hours ago

Those are a funny progress bar made by a micro model , is just ui