
Comment by phpnode

6 hours ago

Claude doesn't know why it acted the way it did; it is only predicting why it acted. I see people falling for this trap all the time

Yes, this pitfall is a hard one. It is very easy to interpret the LLM in a way that has no real grounding.

It's not even predicting why it acted; it's predicting an explanation of why it acted, which is even worse, since there's no consistent mental model behind it.

IDK how far AIs are from intelligence, but they are close enough that there is no room for anthropomorphizing them. When they are anthropomorphized, it's assumed to be a misunderstanding of how they work.

Whereas someone might say "geeze, my computer really hates me today" if it's slow to start, and we wouldn't feel the need to explain that the computer cannot actually feel hatred. We understand the analogy.

I mean, your distinction is totally valid, and I don't blame you for observing it, because I think there is a huge misunderstanding. But when I have the same thought, it often occurs to me that people aren't necessarily speaking literally.

  • This is an interesting point: it's true that knowingly metaphorical anthropomorphisation is hard to distinguish from genuine anthropomorphisation with these models, and that's food for thought, but it just doesn't apply to the actual situation here. This is a very specific misconception that people hold all the time. The OP explicitly thought that the model would know why it did the wrong thing, or at least followed a strategy adjacent to that misunderstanding. He was surprised that adding extra slop to the prompt was no more effective than telling it what to do himself. It's not a figure of speech.

    • A good time to quote our dear leader:

      > No one gets in trouble for saying that 2 + 2 is 5, or that people in Pittsburgh are ten feet tall. Such obviously false statements might be treated as jokes, or at worst as evidence of insanity, but they are not likely to make anyone mad. The statements that make people mad are the ones they worry might be believed. I suspect the statements that make people maddest are those they worry might be true.

      People are upset when AIs are anthropomorphized because they feel threatened by the idea that they might actually be intelligent.

      Hence the woefully insufficient descriptions of AIs, such as "next token predictors", which are about as fitting as describing Terry Tao as an advanced gastrointestinal processor.

    • There's an underlying assumption of consistency too: people seem to easily grasp that, when starting on a task, the LLM could go in a completely unexpected direction, but once that direction has been set, a lot of people expect the model to stay consistent. The confidence with which it answers questions plays tricks on the interlocutor.

    • What's not a figure of speech?

      I am speaking in general terms, not just about this conversation. The only specific figure of speech I see in the original comment is "self reflection", which doesn't seem to be in question here.