Comment by HarHarVeryFunny

1 year ago

Yes, because just like the chess Elo ratings we discussed the other day, they need to learn this in order to do well on their training objective: impersonating (continuing) their training sources. If they are continuing a lie, then they need to have recognized the input as having this "context" and to take that into account during prediction.

Right, but then the problem of hallucination has little to do with statistical generation and much more to do with the utter lack of any incentive, in pre-training or otherwise, to push features the model has already learnt into the words it generates.

  • Right, it's more due to the inherent nature of an LLM than to that nature being statistical generation as such, although in the end those amount to the same thing.

    One way of looking at it is the model talking itself into a corner, with no good way to escape/continue, due to not planning ahead...

    e.g. Say we ask an LLM "What is the capital of Scotland?", and so it starts off with the sort of answer it has learnt should follow such a question: "The capital of Scotland is ...". Now, at this point in the generation it's a bit late if the answer wasn't actually in the training data, but the model needs to keep on generating, so it does the best it can and draws upon other statistics, such as capital cities being large and famous, and maybe continues with "Glasgow" (a large, famous Scottish city), which unfortunately is incorrect (the capital is Edinburgh). There's a toy sketch of this greedy, commit-as-you-go behaviour at the end of this comment.

    Another way of looking at it, rather than talking itself into a corner (and having to LLM its way out of it), is that hallucinations (non-sequiturs) happen when the model is operating out of distribution and has to combine multiple sources, such as the expected form of a reply to a "What is ..." question and a word matching the (city, Scottish, large, famous) "template".
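
    To make the "talking itself into a corner" picture concrete, here's a toy sketch of greedy decoding over a made-up next-token lookup table. The table and its probabilities are purely illustrative assumptions; a real LLM computes its next-token distribution with a network, not a lookup.

    ```python
    # Toy sketch of greedy autoregressive decoding; all probabilities are invented.
    def next_token_distribution(prefix: str) -> dict[str, float]:
        """Hypothetical next-token probabilities after a given prefix."""
        table = {
            "What is the capital of Scotland?":
                {"The": 0.95, "Edinburgh": 0.05},
            "What is the capital of Scotland? The":
                {"capital": 1.0},
            "What is the capital of Scotland? The capital":
                {"of": 1.0},
            "What is the capital of Scotland? The capital of":
                {"Scotland": 1.0},
            "What is the capital of Scotland? The capital of Scotland":
                {"is": 1.0},
            # If the fact is weakly represented, the model falls back on weaker
            # statistics ("large, famous Scottish city") and may prefer the wrong city.
            "What is the capital of Scotland? The capital of Scotland is":
                {"Glasgow": 0.6, "Edinburgh": 0.4},
        }
        return table[prefix]

    def greedy_generate(prompt: str, steps: int = 6) -> str:
        text = prompt
        for _ in range(steps):
            dist = next_token_distribution(text)
            text += " " + max(dist, key=dist.get)  # greedy: commit to the single best token
        return text

    print(greedy_generate("What is the capital of Scotland?"))
    # -> "... The capital of Scotland is Glasgow": the answer *form* was committed to
    #    long before the point where the factual gap shows up.
    ```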

    • I think this may be the best explanation I've seen on the topic!

      But shouldn't that situation be handled, at least somewhat, by backtracking sampling techniques like beam search? Maybe that isn't used much in practice due to being more expensive... I don't know.
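
      Something like the toy sketch below is what I have in mind (the tokens and probabilities are made up purely for illustration): keep the top-k partial sequences ranked by total log-probability instead of committing greedily to one token at a time, so a locally attractive but globally worse first choice can be dropped later.

      ```python
      # Minimal beam-search sketch over a made-up next-token table.
      import math

      TOY_MODEL = {
          (): {"A": 0.6, "B": 0.4},      # greedy would commit to "A" here...
          ("A",): {"x": 0.5, "y": 0.5},  # ...but "A" leads to a 50/50 guess,
          ("B",): {"z": 0.9, "w": 0.1},  # while "B" has a confident continuation.
      }

      def beam_search(steps: int = 2, beam_width: int = 2):
          # Each hypothesis is (tokens, cumulative log-probability).
          beams = [((), 0.0)]
          for _ in range(steps):
              candidates = []
              for tokens, score in beams:
                  for tok, p in TOY_MODEL[tokens].items():
                      candidates.append((tokens + (tok,), score + math.log(p)))
              # Keep the top `beam_width` partial sequences instead of a single one.
              beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
          return beams

      best_tokens, best_logprob = beam_search()[0]
      print(best_tokens, round(math.exp(best_logprob), 2))
      # -> ('B', 'z') 0.36 -- better than the 0.30 a greedy decoder reaches after
      #    committing to "A"; the cost is roughly beam_width times more forward passes.
      ```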
