Comment by bertil

4 hours ago

> the AI says things like “Interesting!”

My experience of those utterances is that they’re purely phatic mimicry: they lack genuine intuitive surprise and just mark a very odd shift in direction. The problem isn’t the lack of a path; it’s that the rhetorical follow-up to those leaps is usually a relevant result, so the stream of tokens ends up rapidly overplaying its own conviction. That’s why it’s necessary (and often ineffective) to tell them to validate their findings thoroughly: too much of their training is “That’s odd” followed by “Eureka!” and not “Nevermind…”

etherealG 2 hours ago

And what I find fascinating is that I see similar mimicking by my 5-year-old. Perhaps we shouldn’t be so quick to call this a lack of being genuine. Sometimes emotions are learned in humans, but we wouldn’t call them fake.

I don’t want to declare machines to have emotion outright, but to call mimicry evidence of falsehood is also itself false.

saidnooneever 15 minutes ago

Most emotions in humans are learnt through self-exploration; this is more obvious in kids.
First there is only good and bad, then more nuanced emotions based on increased understanding of the context in which they arise.

nkrisc 2 hours ago

Mimicry is how kids learn the expected reactions to particular emotions. A kid mimicking your surprise doesn’t mean they are surprised (as surprise requires an existing expectation of an outcome they may not have the experience for), but when they do feel genuine surprise, they’ll know how to express it.
- orangebread 1 hour ago
  
  How do we know that AI isn't feeling genuine surprise then?
  

fnordpiglet 25 minutes ago

I think sometimes, though, there are harness LLMs providing guidance. For instance, I’ve recently seen coding agents doing an analysis, then mid-response saying “no wait, that’s not right” and course correcting. This feels implausible as an autoregressive rhetorical tic. LLM harnesses are widely used in advanced agentic systems, and I’m sure the Pro-level reasoning models exploit them extensively. I’m not saying this is what happened here, but there is a chance it was something injected by the harness into its thinking.

jackcarter 4 hours ago

It’s funny that this is probably due to bias in the training texts, right? Humans are way more likely to publish their “Eureka!” moments than their screwups… if they published both, maybe models wouldn’t exhibit this behavior.

Now that AI labs have all these “Nevermind” texts to train on, maybe it’s getting easier to correct? (Would require some postprocessing to classify the AI outputs as successful or not before training)

embedding-shape 2 hours ago

I think it's more explicit than that: part of post-training is there to enforce this kind of behavior. I don't think it's emergent; rather, researchers steer the model to do that because they saw the CoT gets slightly better if the model tries to doubt itself or cheer itself on. I don't recall if there was a paper outlining this. I tried finding where I got this from, but searches/LLMing have turned up nothing so far.

Forgeties79 4 hours ago

My understanding is that it’s more the result of these companies making sure to keep you engaged/happy than of the data these companies train with.
I don’t know if it’s true or not, but it certainly tracks given LLMs are way more polite than the average post on the internet lol

sigbottle 4 hours ago

I think that a lot of models have to sprinkle a lot of "fluff" into their thinking to stay within the right distribution. Language is their only medium; the way we annotate context is via brackets, and we then train them to hopefully respect the brackets. I'd imagine that either top labs explicitly train the models, or the models implicitly learn through the RL process, to spam tokens that keep them 'within distribution', since everything goes through the same channel and there's no fine-grained separation between things.

Philosophically, it's not like you're a detached observer who simply reasons over all possible hypotheses. Ever get stuck in a dead end and find it hard to dig yourself out? If you were a detached observer, it'd be pretty easy to just switch gears. But it's not (for humans).

WarmWash 2 hours ago

Language really only exists at the input and output surfaces of the models. In the middle it's all numerical values. You might be quick to dismiss that as just a numeric cipher of the words, which, while not totally false, misses that it is also a numeric cipher of anything. You can train a transformer on anything that you can assign tokens to.
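
As a rough illustration of "anything that you can assign tokens to", here is a minimal sketch of how arbitrary discrete symbols become integer IDs before a model ever sees them. The helper names (`build_vocab`, `encode`, `decode`) and the toy word and chess-move sequences are hypothetical, chosen only for illustration; real tokenizers are far more elaborate.

```python
def build_vocab(sequences):
    """Assign each distinct symbol an integer ID in order of first appearance."""
    vocab = {}
    for seq in sequences:
        for symbol in seq:
            if symbol not in vocab:
                vocab[symbol] = len(vocab)
    return vocab


def encode(seq, vocab):
    """Map symbols to their integer IDs."""
    return [vocab[symbol] for symbol in seq]


def decode(ids, vocab):
    """Map integer IDs back to symbols (the 'output surface')."""
    inverse = {index: symbol for symbol, index in vocab.items()}
    return [inverse[index] for index in ids]


# Hypothetical toy data: words and chess moves share one vocabulary.
words = ["that", "is", "odd"]
chess_moves = ["e4", "e5", "Nf3", "e4"]

vocab = build_vocab([words, chess_moves])
word_ids = encode(words, vocab)
move_ids = encode(chess_moves, vocab)

print(word_ids)
print(move_ids)
print(decode(move_ids, vocab))
```

Nothing downstream of `encode` knows whether an ID came from a word or a chess move, which is the sense in which the middle of the model is "all numerical values".
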
- sigbottle 1 hour ago
  
  That's not my point. I'm talking about something far more mundane: transformers do inference over raw tokens and perform an n^2 loop over them, but the tokens are themselves the context. So it's better to have more raw tokens in your input that all nudge it toward the right idea space, even if technically it doesn't need all those tokens. ICL and CoT have been studied a lot at this point; these are well-known phenomena.
  This applies to any transformer-based architecture, including JEPA, which tries to make the tokens predict some kind of latent space (I've separately heard arguments as to why the two are equivalent, but that's a different discussion).
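
The "n^2 loop over tokens" can be sketched in a few lines of plain Python. This is a simplified, assumption-laden illustration: it uses unmasked scaled dot-product attention with no learned projections, and the function name `attention_weights` and the toy vectors are hypothetical. It only shows that every token is scored against every other token, so every extra token in the context shifts the weights.

```python
import math


def attention_weights(vectors):
    """Return (weights, comparisons) for unmasked scaled dot-product attention.

    Each token vector is compared with every token vector, so the number of
    comparisons grows as n * n for n tokens.
    """
    n = len(vectors)
    dim = len(vectors[0])
    comparisons = 0
    weights = []
    for i in range(n):
        scores = []
        for j in range(n):
            dot = sum(a * b for a, b in zip(vectors[i], vectors[j]))
            scores.append(dot / math.sqrt(dim))
            comparisons += 1
        # Softmax over the row, shifted by the max for numerical stability.
        row_max = max(scores)
        exps = [math.exp(s - row_max) for s in scores]
        total = sum(exps)
        weights.append([e / total for e in exps])
    return weights, comparisons


# Hypothetical 2-dimensional token vectors, for illustration only.
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
weights, comparisons = attention_weights(tokens)

print(comparisons)  # n * n pairwise scores for n tokens
for row in weights:
    print([round(w, 3) for w in row])
```

Adding a fifth vector to `tokens` changes every row of `weights`, which is one way to read the claim that extra "fluff" tokens can nudge the computation even when they carry little content.
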
- pohl 1 hour ago
  
  Similarly, none of our comments actually exist as language on Hacker News—just numerical values from the ASCII table. We're deluding each other into thinking we're using language.

hmontazeri 3 hours ago

The new Opus 4.7 thinks quite often with: Hmmmm…

Haha anyone else seen this?

holoduke 1 hour ago

Indeed. I think it's the client, not the model.

epolanski 4 hours ago

Interestingly this is strikingly similar to how my mind would process something I find genuinely interesting.

animal531 4 hours ago

I've somehow managed to train mine out of trying to fluff me up the whole time; it's become very factual.

Overall it saves me a lot of time reading when it's just focusing on the details.