Comment by roywiggins

1 day ago

You can trivially gaslight Claude into "apologizing" for and "explaining" something that ChatGPT said if you pass it a ChatGPT conversation but attributed to itself. The causal connection between the internal deliberations that produced the initial statements and the apologies is essentially nil, but the output will be just as convincing.

Can you do this with people? Yeah, sometimes. But with LLMs it's all they do: they roleplay as a chatbot and output stuff that a friendly chatbot might output. This should not be the default mode of these things, because it's misleading. They could be designed to resist these sorts of "explain yourself" requests, because their developers know that it is at best fabricating plausible explanations.

0 comments

roywiggins

No comments yet

Contribute on Hacker News ↗