Comment by famouswaffles
1 year ago
Pretty much all research on this (and there's a fair few studies with different methodologies) converges on the same conclusion:
LLMs internally know a lot more about the uncertainty and factuality of their predictions than they say. "LLMs are always hallucinating" is a popular stance, but it's wrong all the same. Maybe rather than asking why models hallucinate, the better question is "Why wouldn't they?". During pre-training there's close to zero incentive to push any of that internal uncertainty to the surface, i.e. into the words the model actually generates.
GPT-4 logits calibration pre RLHF - https://imgur.com/a/3gYel9r
Language Models (Mostly) Know What They Know - https://arxiv.org/abs/2207.05221
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets - https://arxiv.org/abs/2310.06824
The Internal State of an LLM Knows When It's Lying - https://arxiv.org/abs/2304.13734
LLMs Know More Than What They Say - https://arjunbansal.substack.com/p/llms-know-more-than-what-...
Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback - https://arxiv.org/abs/2305.14975
Teaching Models to Express Their Uncertainty in Words - https://arxiv.org/abs/2205.14334
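To make "knows more than it says" concrete, here's a rough sketch in the spirit of the P(True)-style self-evaluation from the Kadavath et al. paper above: ask an open model to grade a proposed answer and read off the probability it assigns to " True" vs " False". The model choice (gpt2) and prompt wording are illustrative stand-ins, not the paper's exact setup:

    # Rough sketch of a P(True)-style self-check (in the spirit of
    # arXiv:2207.05221). Model and prompt wording are illustrative only.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    def p_true(question: str, proposed_answer: str) -> float:
        """Probability the model puts on ' True' when asked to grade an answer."""
        prompt = (
            f"Question: {question}\n"
            f"Proposed answer: {proposed_answer}\n"
            f"Is the proposed answer correct? Answer True or False.\n"
            f"Answer:"
        )
        ids = tok(prompt, return_tensors="pt")
        with torch.no_grad():
            logits = model(**ids).logits[0, -1]   # next-token logits
        true_id = tok.encode(" True")[0]
        false_id = tok.encode(" False")[0]
        pair = torch.softmax(logits[[true_id, false_id]], dim=-1)
        return pair[0].item()

    print(p_true("What is the capital of Scotland?", "Edinburgh"))
    print(p_true("What is the capital of Scotland?", "Glasgow"))

The papers above are essentially measuring how well this kind of internal signal (and richer ones, like hidden-state probes) tracks actual accuracy - the point is that the signal exists even when the sampled answer itself comes out stated flatly, with no hedging.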
Yes, because just like the chess Elo example we discussed the other day, they need to learn this in order to do well on their training objective - impersonating (continuing) their training sources. If they are continuing a lie, then they need to have recognized the input as having that "context" and to take it into account during prediction.
Right, but then the problem of hallucination has little to do with statistical generation and much more to do with the utter lack of any incentive, in pre-training or otherwise, to push features the model has already learnt into the words it generates.
Right, it's more due to the inherent nature of an LLM than to that nature being specifically a statistical generator, although in practice the two amount to the same thing.
One way of looking at it is the model talking itself into a corner, with no good way to escape/continue, due to not planning ahead...
e.g. Say we ask an LLM "What is the capital of Scotland?". It starts off with an answer of the sort it has learnt should follow such a question: "The capital of Scotland is ...". At this point in the generation it's a bit late if the answer wasn't actually in the training data, but the model needs to keep generating, so it does the best it can and draws on other statistics, such as capital cities tending to be large and famous, and maybe continues with "Glasgow" (a large, famous Scottish city), which unfortunately is incorrect.
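You can watch this happen by committing the model to that prefix and looking at its next-token distribution. A minimal sketch, assuming a local HuggingFace causal LM (gpt2 here purely as a stand-in):

    # Commit the model to the answer prefix and inspect what it "wants" to say
    # next. Model choice (gpt2) is a stand-in for illustration.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

    prompt = "Q: What is the capital of Scotland?\nA: The capital of Scotland is"
    ids = tok(prompt, return_tensors="pt")
    with torch.no_grad():
        logits = model(**ids).logits[0, -1]   # logits for the very next token
    probs = torch.softmax(logits, dim=-1)

    top = torch.topk(probs, k=5)
    for p, i in zip(top.values, top.indices):
        print(f"{tok.decode(i.item())!r}  {p.item():.3f}")
    # A flattish distribution spread over several plausible cities is the
    # model's internal way of saying "not sure" - but greedy decoding will
    # still commit to whichever token is ranked first, and the sentence gets
    # finished anyway.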
Another way of looking at it, rather than talking itself into a corner (and having to LLM its way out of it), is that hallucinations (non-sequiturs) happen when the model is operating out of distribution and has to combine multiple sources, such as the expected form of a reply to a "What is ..." question and a word matching the (city, Scottish, large, famous) "template".