Comment by camgunz
13 hours ago
I could quibble with some things, but this is right. I don't have a paid account so I can't ping away at 5.4 or whatever, but I do have access to frontier models at work, and they hallucinate regularly. Dunno what to do if you don't believe this; good luck I guess.
I agree that they hallucinate sometimes. I agree they bullshit sometimes. But the extent is way overblown. They basically never bullshit under the constraints of:
1. 2-3 pages of text context
2. GPT-5.4 thinking
I don't think the spirit of the original article (not your comments to be fair) captured this, hence the challenge. I believe we are on the same page here.
> I don't think the spirit of the original article (not your comments to be fair) captured this, hence the challenge. I believe we are on the same page here.
No. GPT-5 has a 40% hallucination rate [0] on SimpleQA [1] without web searching. The SimpleQA questions meet your criterion of "2-3 pages of text context." Unless 5.4 + web searching erases that (I bet it doesn't!), these are bullshit machines.
[0]: https://arxiv.org/pdf/2601.03267
[1]: https://github.com/openai/simple-evals
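For concreteness, here is a minimal sketch of how a SimpleQA-style "hallucination rate" is commonly computed: each graded answer falls into one of three buckets (correct, incorrect, or not attempted), and the rate is the share of *attempted* answers that were wrong. This is an illustrative sketch, not the actual simple-evals code, and the bucket names and function are hypothetical.

```python
def hallucination_rate(grades):
    """grades: list of 'correct', 'incorrect', or 'not_attempted'.

    Returns the fraction of *attempted* answers graded incorrect,
    which is one common reading of a benchmark "hallucination rate".
    """
    attempted = [g for g in grades if g != "not_attempted"]
    if not attempted:
        return 0.0
    return sum(g == "incorrect" for g in attempted) / len(attempted)

# Example: 3 correct, 2 incorrect, 5 declined -> 2/5 = 40% of attempts wrong.
grades = ["correct"] * 3 + ["incorrect"] * 2 + ["not_attempted"] * 5
print(hallucination_rate(grades))
```

Note that under this definition a model can lower its hallucination rate by declining to answer more often, which is part of why "without web searching" matters to the comparison.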
Specifically, in the case where it can use tools, no, it doesn't hallucinate. Which is why you are struggling to find counterexamples.