Comment by atleastoptimal

21 hours ago

I think the use of the word "hallucination" with respect to AI confidently making errors has led a lot of people astray, including the author.

He claims that his company has "solved" hallucination by creating a verifiable fact-finding system, which is like saying that a person has solved plane crashes by creating a plane that never leaves the ground.

When an LLM says something incorrect, it is often because that LLM has reached the limits of its abilities, but it doesn't "know" (for lack of a better term) what being wrong feels like, so it will try its best to fit the information it has into a compelling story. Scaling leads to fewer hallucinations because the model can hold more abstractions and more facts about the world; it can work through the complex, vague machinery of reason with more scaffolding, and with more of a buffer (via its weights) to reason with nuance. This is why LLMs are useful: not because they can be fed into a fact-retrieval system, but because they can produce new information via the association of things they know.

The point is, we want LLMs to actually produce new information and work things out via their thinking, not be limited to citing facts that already exist while steering clear of the limits of their abilities. In that sense hallucination is really just exposing the limits of scale, which would necessitate scaling models further.

Scaling is the only way we have gotten to this interesting, emergent property of LLMs. Further, the best way we've found so far to make small models that don't hallucinate is to train a big model first, then distill it or use it as a teacher for a smaller model. Either way, pursuing scale is the most defensible strategy, and a more robust solution to hallucination.

> it can work through the complex, vague machinery of reason with more scaffolding

No, it can hold more floating point numbers.

I'm not an expert in the field, but I've yet to see a solid rebuttal to this paper:

https://arxiv.org/abs/2401.11817

  • The claim that LLMs can, in a theoretical sense, be 100% accurate all the time is not the same as the claim that scaling models with more compute/params will reduce hallucination. The former is a far stronger claim, and I agree with the paper that it probably doesn't hold, but we don't rely on general reasoners (a.k.a. humans) to be 100% accurate all the time either.

    > No, it can hold more floating point numbers.

    Fallacy of composition. Just because an LLM is made up of floating point numbers doesn't mean its capabilities are limited to that of bare floating point numbers, in the same way that the individual faculties of a neuron don't preclude the human brain from emergent properties born from the synthesis of its synapses.

    • You're the one who started with the "complex, vague machinery of reason with more scaffolding" here. I'm simply pointing out that that's not actually a thing: it's just floating point numbers.

  • That paper shows hallucinations can't be eliminated, due to approximation error. But it is completely compatible with hallucination becoming less probable as scale reduces that approximation error.