← Back to context

Comment by mkolodny

19 hours ago

Humans have limited ability to self-introspect, too. Even if we understood exactly how our brains work, answering “why?” we do things might still be very difficult and complex.

You can trivially gaslight Claude into "apologizing" for and "explaining" something that ChatGPT said if you pass it a ChatGPT conversation but attributed to itself. The causal connection between the internal deliberations that produced the initial statements and the apologies is essentially nil, but the output will be just as convincing.

Can you do this with people? Yeah, sometimes. But with LLMs it's all they do: they roleplay as a chatbot and output stuff that a friendly chatbot might output. This should not be the default mode of these things, because it's misleading. They could be designed to resist these sorts of "explain yourself" requests, because their developers know that it is at best fabricating plausible explanations.

Humans have a lot of experience with themselves, if you ask why they did something they can reflect on their past conduct or their internal state. LLM's don't have any of that.