Comment by roywiggins
19 hours ago
Add to that, LLMs should be discouraged from pretending to report on their internal state or "why" they did anything, because we know that they are really just guessing. If someone asks "why did you make that mistake" the answer should be "this is a language model, and self-introspection is not part of its abilities"
Outputs that look like introspection are often uncritically accepted as actual introspection when it categorically isn't. You can, eg, tell ChatGPT it said something wrong and then ask it why it said that when it never output that in the first place because that's how these models work. Any "introspection" is just an LLM doing more roleplaying, but it's basically impossible to convince people of this. A chatbot that looks like it's introspecting is extremely convincing for most people.
Humans have limited ability to self-introspect, too. Even if we understood exactly how our brains work, answering “why?” we do things might still be very difficult and complex.
You can trivially gaslight Claude into "apologizing" for and "explaining" something that ChatGPT said if you pass it a ChatGPT conversation but attributed to itself. The causal connection between the internal deliberations that produced the initial statements and the apologies is essentially nil, but the output will be just as convincing.
Can you do this with people? Yeah, sometimes. But with LLMs it's all they do: they roleplay as a chatbot and output stuff that a friendly chatbot might output. This should not be the default mode of these things, because it's misleading. They could be designed to resist these sorts of "explain yourself" requests, because their developers know that it is at best fabricating plausible explanations.
I think more often it is not willing to say or admit rather than not knowing.
Humans have a lot of experience with themselves, if you ask why they did something they can reflect on their past conduct or their internal state. LLM's don't have any of that.
The linguistic traps are so tricky here.
You clearly know what's going on, but still wrote that you should "discourage" an LLM from doing things. It's tough to maintain discipline in calling out the companies rather than the models as if the models had motivations.
[dead]