Comment by johnfn
16 hours ago
I'm curious about the statement that hallucinations are "fundamentally unsolvable". I don't think an AI agent has left a hallucination in my code - by which I mean a reference to something which doesn't exist at all - in many months. I have had great luck driving hallucinations to effectively 0% by using a language with static typechecking, telling LLMs to iterate on type errors until there are none left, and of course having a robust unit and e2e test suite. I mean, sure, I run into other problems -- it does make logic errors at some rate, but those I would hardly categorize the same as hallucinations.
So type errors are not hallucinations in your book, but "a reference to something which doesn't exist at all" is?
In the context of AI, most people I know mean wrong output generally, not just hallucinations in the literal sense of the word or things you cannot catch in an automated way.
My statement is that if your only hallucinations are type errors, that can be solved by simply wrapping the LLM in a harness that says "Please continue working until `yarn run tsc` is clean". Yes, the LLM still hallucinates, but it doesn't affect me, because by the time I see the code, the hallucinations are gone.
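The harness I'm describing is just a loop. Here's a minimal sketch of the idea — not my actual setup: `run_agent` and `type_errors` are hypothetical stand-ins for an LLM call and a `yarn run tsc --noEmit` invocation.

```python
# Sketch of an "iterate until tsc is clean" harness. `run_agent` and
# `type_errors` are hypothetical stand-ins: a real harness would call an
# LLM API and shell out to `yarn run tsc --noEmit` to collect errors.

def fix_until_clean(run_agent, type_errors, max_rounds=10):
    """Re-prompt the agent with the current type errors until none remain."""
    for attempt in range(max_rounds):
        errors = type_errors()
        if not errors:
            # Compiler is clean: any hallucinated references are gone
            # before a human ever reads the code.
            return attempt
        run_agent("Please continue working until the type check is clean:\n"
                  + "\n".join(errors))
    raise RuntimeError("type errors persist; hand off to a human")

# Usage with a fake agent that clears one error per round:
remaining = ["error TS2339: Property 'fooBar' does not exist"]
rounds = fix_until_clean(
    run_agent=lambda prompt: remaining.pop(),
    type_errors=lambda: list(remaining),
)
```

The point is that the loop, not the model, is what guarantees no dangling references reach review.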
This is something I do every day; to be quite honest, it's a fairly mundane use of AI and I don't understand why it's controversial. For context, I've probably generated somewhere on the order of 100k lines of AI-generated code, and I can't remember the last time I saw a hallucination.
Well, of course it'll eventually work; even a random text generator will eventually produce code that passes your tests if you run it hard enough.
The problem is that it devours your tokens as it does so. On a subsidized plan that seems like a non-issue, but once the providers start charging you actual costs for usage... yeah, the hallucinations will be a showstopper for you.
Maybe you're lucky. I had Opus 4.6 hallucinate a non-existent configuration key in a well-known framework literally a few hours ago.
Granted, it fixed the problem in the very next prompt.
Couldn’t that problem be solved with static typechecking?
In a yaml file? I don't think so.
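Not static typechecking proper, but a lint-style check could catch it (this is my own suggestion, not something the framework necessarily ships): validate the parsed YAML mapping's keys against the framework's documented key set. The key names below are invented for illustration.

```python
# Lint-style check for hallucinated keys in a parsed YAML config.
# KNOWN_KEYS and the sample config are made up for illustration; a real
# check would use the framework's documented schema.
KNOWN_KEYS = {"host", "port", "timeout"}

def unknown_keys(config):
    """Return keys the (hypothetical) framework does not define."""
    return set(config) - KNOWN_KEYS

# "max_retires" plays the role of the hallucinated/non-existent key.
cfg = {"host": "localhost", "port": 8080, "max_retires": 3}
```

Of course, this only works when the framework publishes (or you maintain) an authoritative list of valid keys.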
ChatGPT 5.2 kept gaslighting me yesterday, telling me that LLMs were explainable with Shapley values. It kept referencing papers that mention both LLMs and SHAP, but those papers are about LLMs being used to explain the SHAP values of other ML models.
I encounter stuff like this every week, I don't know how you don't. I suppose a well-structured codebase in a statically typed language might not provide as much of a surface for hallucinations to present themselves? But like you say, logical problems of course still occur.
I meant to say that hallucinations never survive into the generated code, not that the model never produces them. I suppose that was unclear.
>> I don't think an AI agent has left a hallucination in my code
I literally just went on Gemini, latest and best model and asked it "hey can you give me the best prices for 12TB hard drives available with the British retailer CeX?" and it went "sure, I just checked their live stock and here they are:". Every single one was made up. I pointed it out, it said sorry, I just checked again, here they are, definitely 100% correct now. Again, all of them were made up. This repeated a few times, I accused it of lying, then it went "you're right, I don't actually have the ability to check, so I just used products and values closest to what they should have in stock".
So yeah, hallucinations are still very much there and still very much feeding people garbage.
Not to mention I'm a part of multiple FB groups for car enthusiasts and the amount of AI misinformation that we have to correct daily is just staggering. I'm not talking political stuff - just people copy pasting responses from AI which confidently state that feature X exists or works in a certain way, where in reality it has never existed at all.
My comment was about code, not fact checking - that’s why I said they were a solved problem provided you use static typechecking and tests.