
Comment by brokenmachine

11 hours ago

LLMs do not have the ability to make decisions, and they have no awareness of the veracity of the tokens they produce.

They are useful for certain tasks, but have no inherent intelligence.

There is also no guarantee that they will keep improving, as seen by GPT-5 doing worse than GPT-4 on some metrics.

Increasing a model's training data and parameter count does not automatically eliminate hallucinations; it can sometimes worsen them, and it can make the remaining errors both more confident and more complex.

Overstating their abilities just continues the hype train.

LLMs do have some internal representations that predict pretty well when they are making stuff up.

https://arxiv.org/abs/2509.03531v1 - We present a cheap, scalable method for real-time identification of hallucinated tokens in long-form generations, and scale it effectively to 70B parameter models. Our approach targets *entity-level hallucinations* -- e.g., fabricated names, dates, citations -- rather than claim-level, thereby naturally mapping to token-level labels and enabling streaming detection. We develop an annotation methodology that leverages web search to annotate model responses with grounded labels indicating which tokens correspond to fabricated entities. This dataset enables us to train effective hallucination classifiers with simple and efficient methods such as linear probes. Evaluating across four model families, our classifiers consistently outperform baselines on long-form responses, including more expensive methods such as semantic entropy (e.g., AUC 0.90 vs 0.71 for Llama-3.3-70B).
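For concreteness, here's a minimal sketch of the linear-probe idea the abstract describes, not the authors' code: a plain logistic-regression classifier trained on per-token hidden states against binary "fabricated entity" labels. The hidden-state width, the random stand-in data, and the hyperparameters are all placeholders I've assumed, not values from the paper.

```python
# Sketch of a token-level hallucination probe (synthetic stand-in data).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)

# In practice: X would be hidden states from an intermediate layer of the LLM
# (one vector per generated token), and y would be web-search-grounded labels
# (1 = token is part of a fabricated entity, 0 = otherwise).
hidden_dim = 4096          # assumed width, stands in for the model's hidden size
n_tokens = 20_000
X = rng.standard_normal((n_tokens, hidden_dim)).astype(np.float32)
y = rng.integers(0, 2, size=n_tokens)

split = int(0.8 * n_tokens)
X_train, X_test = X[:split], X[split:]
y_train, y_test = y[:split], y[split:]

# The "probe" is just a linear classifier over the hidden state.
probe = LogisticRegression(max_iter=1000)
probe.fit(X_train, y_train)

# Streaming use: score each new token's hidden state as it is generated and
# flag tokens whose fabrication probability crosses a threshold.
token_scores = probe.predict_proba(X_test)[:, 1]
print("AUC:", roc_auc_score(y_test, token_scores))
```

The cheapness the abstract claims comes from this structure: once the probe is trained, detection is a single dot product per token, so it can run alongside generation with negligible overhead.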