Comment by dTal
2 days ago
I disagree, it's a very insightful comment.
The problem is that any information about any internal processes used to generate a particular token is lost; the LLM is stateless, apart from the generated text. If you ask an LLM-character (which I agree should be held distinct from the LLM itself and exists at a different layer of abstraction) why it said something, the best it can do is a post-hoc guess. The "character", and any internal state we might wish it to have, only exists insofar as it can be derived anew from the text.
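To make the statelessness point concrete, here is a rough Python sketch (the generate function is a made-up stand-in, not any real API): the only thing that survives between turns is the accumulated text, so any "why did you say that" has to be reconstructed from the transcript after the fact.

    # Hypothetical stand-in for a forward pass: text in, next reply out.
    # Any internal activations used to produce the reply are discarded.
    def generate(transcript: str) -> str:
        return "(model reply)"

    transcript = ""
    for user_turn in ["Why did you say that?", "Are you sure?"]:
        transcript += f"User: {user_turn}\n"
        reply = generate(transcript)   # sees only the text, not past internal state
        transcript += f"Assistant: {reply}\n"
        # Whatever "reasoning" produced `reply` is gone; the next call can only
        # re-derive the character from `transcript`, hence the post-hoc guesses.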
I certainly agree with the point about post-hoc justifications – but isn't it amazing that this is also very familiar to us humans, who do it all the time and manage to lie to ourselves about it very convincingly?! The more you read about neuropsychology, the more you're forced toward a view where the conscious self, whatever it is, has only a very tenuous grasp of what is going on and of how much control it actually has.
In any case, you don't need an accurate understanding of how your mind works (hello again, humans!) to be able to converge on a consistent self-model, when there's no other uniquely good local optimum in the search space.