
Comment by Terr_

1 day ago

> inability to self-reflect

IMO the One Weird Trick for LLMs is recognizing that there's no real entity, and that users are being tricked into a suspended-disbelief story.

In most cases you're contributing text-lines for a User-character in a movie-script document, and the LLM algorithm is periodically triggered to autocomplete incomplete lines for a Chatbot character.

You can have an interview with a vampire DraculaBot, but that character can only "self-reflect" in the same shallow/fictional way that it can "thirst for blood" or "turn into a cloud of bats."
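
To make the framing concrete, here's a minimal sketch (plain Python; complete_text() is an illustrative stub, not any real vendor's API) of how the chat UI maintains one growing script and only ever asks the model to continue the Chatbot character's line:

      # The chat UI keeps one growing "movie script"; the model is only ever
      # asked to continue the Chatbot character's next line.
      transcript = (
          "A conversation between User and Chatbot.\n"
          "User: Can you self-reflect?\n"
          "Chatbot:"
      )

      def complete_text(prompt: str) -> str:
          """Stand-in for a base model's next-token sampling loop."""
          return " Of course. I often pause to reconsider my answers."

      # The "character" is nothing more than whatever text lands after "Chatbot:".
      transcript += complete_text(transcript)
      print(transcript)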

This is a tired semantic argument that does not bring any insight into the discussion. A token-predictor could still be trained to predict the tokens “I’m not sure what you mean because of points x, y, and z; could you elaborate?”

  • It could be trained to say that, but it's not exactly clear how you would reinforce the absence of certain training data in order to emit that response accurately, rather than just based on embedding proximity.

  • It means if you want something resembling a self-introspective theory of mind, you need to arrange the overall document to resemble documents where such things are (or appear to be) happening.

    This leads us to new questions: How can we characterize and identify real-world documents which fit? How can we determine what features may be significant, and which of those can be easily transplanted to our use-case?

    • There are a lot of words here, but it feels like you have never really used LLMs (apologies for the bluntness).

      We see LLMs introspecting all the time[1].

      >Notably, DeepSeek-AI et al. report that the average response length and downstream performance of DeepSeek-R1-Zero increases as training progresses. They further report an “aha moment” during training, which refers to the “emergence” of the model’s ability to reconsider its previously generated content. As we show in Section 3.2, this reconsideration behaviour is often indicated by the generation of phrases such as ‘wait, ...’ or ‘alternatively, ...’

      [1] https://arxiv.org/pdf/2504.07128

      2 replies →

    • You are just doubling down on protecting your argument.

      I operate LLMs in many conversational modes where they do ask clarifying questions, probing questions, and baseline-determining questions.

      It takes at most one sentence in the prompt to get them to act this way.

      5 replies →

  • How would an LLM “know” when it isn’t sure? Their baseline for truth is competent text; they don’t have a baseline for truth based on observed reality. That’s why they can be “tricked” into things like “Mr Bean is the president of the USA”.

    • It would "know" the same way it "knows" anything else: The probability of the sequence "I don't know" would be higher than the probability of any other sequence.
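
      As a toy illustration (numbers invented for the example, not taken from any real model), "not being sure" exists only as one candidate continuation outscoring the others:

            import math

            # Invented probabilities for competing continuations of a prompt
            # the model has little evidence about.
            candidates = {
                "I don't know.": 0.46,
                "The answer is 42.": 0.31,
                "Mr Bean is the president.": 0.23,
            }

            # Greedy decoding simply emits the highest-scoring continuation;
            # uncertainty shows up only through these relative (log-)probabilities.
            best = max(candidates, key=lambda s: math.log(candidates[s]))
            print(best)  # -> "I don't know."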

      1 reply →

    • The answer is the same as how the messy bag of chemistry that is the human brain "knows" when it isn't sure:

      Badly, and with great difficulty, so while it can just about be done, even then only kinda.

      7 replies →

    • Humans can just as easily be tricked. Something like 25% of the American electorate believed Obama was the antichrist.

      So saying LLMs have no "baseline for truth" doesn't really mean much one way or the other; they are much smarter and more accurate than 99% of humans.

  • I agree that it's a tired argument, but there appear to be two separate things being discussed in this little corner of HN: clarity in the problem it's being asked to solve, and confidence that the answer it has is correct.

    I can trivially get any of the foundational models to ask me clarifying questions. I've never had one respond with 'I don't know'.

    • I've gotten lots of responses like "with the information you provided, I cannot answer that. Can you provide more information?"

      Which IMO is the same as "idk".

  • I disagree, it's a very insightful comment.

    The problem is that any information about any internal processes used to generate a particular token is lost; the LLM is stateless, apart from the generated text. If you ask an LLM-character (which I agree should be held distinct from the LLM itself and exists at a different layer of abstraction) why it said something, the best it can do is a post-hoc guess. The "character", and any internal state we might wish it to have, only exists insofar as it can be derived anew from the text.
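
    A small sketch of that statelessness (generate() is an illustrative stub, not a real inference API): whatever internal computation produced a reply is discarded when the call returns, so a follow-up "why did you say that?" can only be answered from the transcript itself.

          # Internal state exists only inside a single call; only the returned
          # text survives into the next call.
          def generate(transcript: str, canned_reply: str) -> str:
              activations = [0.0] * len(transcript)  # stand-in for internal state
              return canned_reply                    # activations are discarded here

          transcript = "User: Why is the sky blue?\nChatbot:"
          transcript += generate(transcript, " Rayleigh scattering.")

          # The explanation below is reconstructed from the text alone: a fresh
          # guess, not a memory of how the first reply was actually computed.
          transcript += "\nUser: Why did you say that?\nChatbot:"
          transcript += generate(transcript, " Because it seemed the most standard answer.")
          print(transcript)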

    • I certainly agree with the point about post-hoc justifications – but isn't it amazing that it's also something very familiar to us humans, who do that all the time and manage to lie to ourselves about it very convincingly?! The more you read about neuropsychology, the more you're forced to assume a view where the conscious self, whatever it is, has only a very tenuous grasp of what is going on and how much it actually has control over things.

      In any case, you don't need accurate understanding of how your mind works (hello humans, again!) to be able to converge on

             INSUFFICIENT DATA FOR A MEANINGFUL ANSWER
      

      when there's no other uniquely good local optimum in the search space.

  • Anthropic found that Claude will pretend that it used the "standard" way to do addition (add the digits, carry the 1, etc.), but the pattern of activations showed it using a completely different algorithm. So these things can role-play at introspecting: they come up with plausible post-hoc explanations for their output, but they are still just pretending, so they will get it wrong.

    So you can teach a model to sometimes ask for clarification, but will it actually have insight into when it really needs it, or will it just interject for clarification more or less at random? These models have really awful insight into their own capabilities; ChatGPT, for example, insists to me that it can read braille, and then cheerfully generates a pure hallucination.

    • > Anthropic found that Claude will pretend that it used the "standard" way to do addition (add the digits, carry the 1, etc.), but the pattern of activations showed it using a completely different algorithm.

      That doesn't mean much; humans sometimes do the same thing. I recall a fun story about a mathematician with synesthesia multiplying numbers by mixing the colours together. With a bit of training such a person could also pretend to be executing a normal algorithm for the purposes of passing tests.

      1 reply →

  • It's not a tired argument, and not just a semantic one: it's a foundational characteristic of LLMs.

    > A token-predictor could still be trained to predict the tokens “I’m not sure what you mean because of points x, y, and z; could you elaborate?”

    This is entirely true, and the key insight is right there in your sentence, but you don't seem to grasp it: “could still be trained”. You can train an LLM to do whatever you want it to, but you have to train it specifically for that!

    In the early days of LLMs we witnessed an impressive phenomenon where the models exhibited emergent capabilities (I'm particularly thinking about LLMs being few-shot learners on stuff that wasn't in their training corpus). And these emergent capabilities legitimately raised the question of “how intelligent these things are, really”.

    But for the past three years, the key lesson has been that this kind of emergent effect is too small to be useful, and the focus has shifted towards purpose-built datasets (with tons of “artificial data”) that train the model to explicitly do the things we want it to do. And it works pretty well, as models' capabilities have kept improving at a fast pace (and in particular, I don't see why we couldn't overcome the problem highlighted by this paper with more synthetic data specifically designed for multi-turn conversation). But their progress is now strictly limited by their makers' own intelligence. You cannot just scrape the web, throw compute at the problem, and expect emergent intelligence to occur anymore. It's more “simulated intelligence” than “artificial intelligence”, really.
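
    For instance, a purpose-built dataset for clarifying questions might contain synthetic multi-turn examples along these lines (the structure and wording are purely illustrative, not any lab's actual format):

          # One synthetic training example teaching the model to ask for
          # clarification before answering (invented for illustration).
          example = {
              "messages": [
                  {"role": "user",
                   "content": "Fix my script, it crashes."},
                  {"role": "assistant",
                   "content": "I'm not sure what you mean yet: which script, "
                              "what error do you see, and how do you run it?"},
                  {"role": "user",
                   "content": "parse.py, KeyError on line 12, run via cron."},
                  {"role": "assistant",
                   "content": "Thanks, that narrows it down."},
              ]
          }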

    • It's definitely a tired and semantic one because, as he said, it brings no insight and doesn't even work at the analogy level. I can't have a conversation with Dracula, and Dracula can't make decisions that affect the real world, so LLMs already break key aspects and assumptions of the 'Document Simulator'.

      Pre-trained LLMs will ask clarifying questions just fine. So I think this is just another consequence of post-training recipes.

      6 replies →