Comment by johnfn

16 hours ago

I'm curious about the statement that hallucinations are "fundamentally unsolvable". I don't think an AI agent has left a hallucination in my code - by which I mean a reference to something which doesn't exist at all - in many months. I have had great luck driving hallucinations to effectively 0% by using a language with static typechecking, telling LLMs to iterate on type errors until there are none left, and of course having a robust unit and e2e test suite. I mean, sure, I run into other problems -- it does make logic errors at some rate, but those I would hardly categorize the same as hallucinations.

12 comments

johnfn

alpaca128 12 hours ago

So type errors are not hallucinations in your book, but "a reference to something which doesn't exist at all" is?

In the context of AI most people I know tend to mean wrong output, not just hallucinations in the literal sense of the word or things you cannot catch in an automated way.

johnfn 12 hours ago
My statement is that if your only hallucinations are type errors, that can be solved by simply wrapping the LLM in a harness that says "Please continue working until `yarn run tsc` is clean". Yes, the LLM still hallucinates, but it doesn't affect me, because by the time I see the code, the hallucinations are gone.
This is something I do every day; to be quite honest, it's a fairly mundane use of AI and I don't understand why it's controversial. To give context, I've probably generated somewhere on the order of 100k loc of AI generated code and I can't remember the last time I have seen a hallucination.
- parliament32 12 hours ago
  
  Well of course it'll eventually work, just a random text generator will eventually produce code that passes your tests if you run it hard enough.
  The problem is it's devouring your tokens as it does so. While you're on a subsidized plan that seems like a non-issue, but once the providers start charging you actual costs for usage.. yeah, the hallucinations will be a showstopper for you.
  
  1 reply →

bojan 15 hours ago

Maybe you're lucky. I had Opus 4.6 hallucinate a non-existing configuration key in a well known framework literally a few hours ago.

Granted, it fixed the problem in the very next prompt.

johnfn 15 hours ago
Couldn’t that problem be solved with static typechecking?
- bojan 8 hours ago
  
  In a yaml file? I don't think so.

bogzz 15 hours ago

ChatGPT 5.2 kept gaslighting me yesterday telling me that LLMs were explainable with Shapley values, and it kept referencing papers which talk about LLMs, and about SHAP, but talk about LLMs being used to explain the SHAP values of other ML models.

I encounter stuff like this every week, I don't know how you don't. I suppose a well-structured codebase in a statically typed language might not provide as much of a surface for hallucinations to present themselves? But like you say, logical problems of course still occur.

johnfn 14 hours ago

I mean to say that code generation never hallucinates. I suppose that was unclear.

gambiting 15 hours ago

>> I don't think an AI agent has left a hallucination in my code

I literally just went on Gemini, latest and best model and asked it "hey can you give me the best prices for 12TB hard drives available with the British retailer CeX?" and it went "sure, I just checked their live stock and here they are:". Every single one was made up. I pointed it out, it said sorry, I just checked again, here they are, definitely 100% correct now. Again, all of them were made up. This repeated a few times, I accused it of lying, then it went "you're right, I don't actually have the ability to check, so I just used products and values closest to what they should have in stock".

So yeah, hallucinations are still very much there and still very much feeding people garbage.

Not to mention I'm a part of multiple FB groups for car enthusiasts and the amount of AI misinformation that we have to correct daily is just staggering. I'm not talking political stuff - just people copy pasting responses from AI which confidently state that feature X exists or works in a certain way, where in reality it has never existed at all.

johnfn 14 hours ago

My comment was about code, not fact checking - that’s why I said they were a solved problem provided you use static typechecking and tests.