Comment by mplanchard

1 day ago

The genre of LLM output when it is asked to “explain itself” is fascinating. Obviously it shows that the person prompting it doesn’t understand the system they’re working with, but the tone of the resulting output is remarkably consistent between this and the last “an LLM deleted my prod database” Twitter post I remember seeing: https://xcancel.com/jasonlk/status/1946025823502578100

Two interpretations: either it's pure pattern-completion landing in the same trough every time, or whatever's underneath has a stable shape that the explanation tracks. Both are interesting. The "users don't understand the system" framing doesn't distinguish between them.

Go watch an episode of COPS. Humans giving post-hoc explanations of their own behavior do the exact same thing.