Comment by v_CodeSentinal
13 days ago
This is the classic 'plausible hallucination' problem. In my own testing with coding agents, we see this constantly—LLMs will invent a method that sounds correct but doesn't exist in the library.
The only fix is tight verification loops. You can't trust the generative step without a deterministic compilation/execution step immediately following it. The model needs to be punished/corrected by the environment, not just by the prompter.
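For concreteness, here's roughly the shape of the loop I mean, sketched in TypeScript. generateCode is a placeholder for whatever model call you actually use (not a real API); the deterministic judge is just tsc run over the candidate file:

```typescript
import { execSync } from "node:child_process";
import { writeFileSync } from "node:fs";

// Hypothetical model call: stands in for whichever generation API you actually use.
declare function generateCode(prompt: string, feedback?: string): Promise<string>;

async function generateVerified(prompt: string, maxAttempts = 3): Promise<string> {
  let feedback: string | undefined;
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const source = await generateCode(prompt, feedback);
    writeFileSync("candidate.ts", source);
    try {
      // Deterministic step: the compiler, not the prompter, judges the output.
      execSync("npx tsc --noEmit --strict candidate.ts", { encoding: "utf8" });
      return source; // it type-checks, so calls to nonexistent methods were already caught
    } catch (err: any) {
      // Compiler errors (e.g. a hallucinated method) feed straight into the next attempt.
      feedback = (err.stdout ?? "") + (err.stderr ?? "");
    }
  }
  throw new Error(`no candidate passed the compiler after ${maxAttempts} attempts`);
}
```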
Yes, and better still, the AI will fix its own mistakes if it has direct access to verification tools. You can also have it write and execute tests, then on failure decide whether the code it wrote or the tests it wrote are wrong. There is a chance of confirmation bias, but it often works well enough.
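Something like this is the decision step I mean; askModel is a made-up placeholder for the judgment call, and it assumes the project has an npm "test" script:

```typescript
import { execSync } from "node:child_process";

// Placeholder for a model call that returns a verdict; not a real API.
declare function askModel(question: string): Promise<"code" | "tests">;

async function testAndTriage(): Promise<void> {
  let output: string;
  try {
    // Run the test suite the agent just wrote.
    output = execSync("npm test", { encoding: "utf8" });
    return; // green: nothing to decide
  } catch (err: any) {
    output = (err.stdout ?? "") + (err.stderr ?? "");
  }
  // On failure, have the model decide which side to revise. This is where the
  // confirmation-bias risk lives: the same model wrote both artifacts.
  const verdict = await askModel(
    "These tests failed:\n" + output + "\nIs the implementation wrong, or are the tests wrong?"
  );
  console.log(verdict === "code" ? "revise the implementation" : "revise the tests");
}
```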
> decide if the code it wrote or the tests it wrote are wrong
Personally I think it's too early for this. Either you need to strictly control the code, or you need to strictly control the tests. If you let the AI do both, it'll take shortcuts, and misunderstandings propagate and solidify much more easily.
Personally I chose to tightly control the tests, as most tests LLMs create are utter shit, and it's very obvious. You can prompt against this, but eventually they find a hole in your reasoning and figure out a way to make the tests pass without actually exercising the code they're supposed to exercise.
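To make "not actually exercising the code" concrete, this is the kind of shortcut I mean, in jest/vitest-style syntax with made-up names:

```typescript
import { parseConfig } from "./config"; // hypothetical module under test

// The kind of test an LLM will happily produce: it passes, but it never pins
// down real behaviour, it only checks that something came back.
test("parseConfig works", () => {
  const result = parseConfig('{"retries": 3}');
  expect(result).toBeDefined();
});

// A tightly controlled test: concrete input, concrete expected output, and a
// failure case, so a wrong implementation can't sneak through.
test("parseConfig reads retries and rejects malformed input", () => {
  expect(parseConfig('{"retries": 3}').retries).toBe(3);
  expect(() => parseConfig("not json")).toThrow();
});
```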
I haven’t found that to be the case in practice. There is a limit on how big the code can be for this to work, and it still can’t reliably subdivide problems on its own (yet?), but give it a small enough module and it can write both the code and the tests for it.
You should never let the LLM look at the code when it's writing tests, so you need to have it settle the interface ahead of time. Ideally you wouldn't let it look at the tests when it's writing code either, but it has to be able to tell which of the two is wrong. I haven't been able to add a separate investigator to my workflow yet, so I'm letting the code writer run the tests and judge their correctness itself (adding an investigator to do this instead would avoid the confirmation bias, what you call it finding a loophole).
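For what it's worth, "settle the interface ahead of time" can be as literal as pinning down a type first and handing only that to each side; the names below are purely illustrative:

```typescript
// Step 1: the agreed interface, the only thing both the test writer and the
// code writer get to see. (Hypothetical example.)
export interface RateLimiter {
  // Returns true if the call is allowed, false if the caller must back off.
  tryAcquire(key: string): boolean;
}

// Step 2: tests are written against the interface, without seeing the implementation.
export function checkLimiter(limiter: RateLimiter): void {
  if (!limiter.tryAcquire("user-1")) throw new Error("first call should be allowed");
}

// Step 3: the implementation is written against the same interface, without seeing the tests.
export class FixedWindowLimiter implements RateLimiter {
  private counts = new Map<string, number>();
  constructor(private readonly limit: number) {}
  tryAcquire(key: string): boolean {
    const n = (this.counts.get(key) ?? 0) + 1;
    this.counts.set(key, n);
    return n <= this.limit;
  }
}
```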
> LLMs will invent a method that sounds correct but doesn't exist in the library
I find that this is usually a pretty strong indication that the method should exist in the library!
I think there was a story here a while ago about LLMs hallucinating a feature in a product, so in the end they just implemented that feature.
Honestly, I feel humans are similar. It's the generator <-> executive loop that keeps things right.
So you want the program to always halt at some point. How would you write a deterministic test for it?
I imagine you would use something that errs on the side of safety - e.g. insist on total functional programming and use something like Idris' totality checker.
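In a language without a totality checker you can still err on the side of safety with a fuel bound: anything that doesn't finish within the budget is rejected, even if it might have halted eventually, which keeps the check deterministic. A toy TypeScript example:

```typescript
// Fuel-bounded evaluation: a safe-side substitute for a totality checker.
// Anything that needs more than `fuel` steps is rejected outright.
function collatzSteps(n: number, fuel: number): number {
  let steps = 0;
  while (n !== 1) {
    if (steps >= fuel) throw new Error("fuel exhausted: cannot certify termination");
    n = n % 2 === 0 ? n / 2 : 3 * n + 1;
    steps++;
  }
  return steps;
}

// A deterministic "test": either termination is certified within the budget, or it fails.
console.log(collatzSteps(27, 1_000)); // 111
```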
I've been using Codex and never had a compile-time error by the time it finishes. Maybe add instructions for your agent to run the TS compiler, lint, and format before it finishes, and only stop when everything passes.
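Something like this as the final gate works (a sketch; it assumes tsc, eslint and prettier are already configured in the project):

```typescript
import { execSync } from "node:child_process";

// Final gate for the agent: don't stop until the compiler, linter and formatter all pass.
const checks = [
  "npx tsc --noEmit",        // type errors, including hallucinated methods
  "npx eslint .",            // lint
  "npx prettier --check .",  // formatting
];

for (const cmd of checks) {
  try {
    execSync(cmd, { stdio: "inherit" });
  } catch {
    console.error(`check failed: ${cmd}`);
    process.exit(1);
  }
}
console.log("all checks passed");
```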
I’m not sure why you were downvoted. It’s a primary concern for any agentic task to set it up with a verification path.
> LLMs will invent a method that sounds correct but doesn't exist in the library
Often, if not usually, that means the method should exist.
Only if it's actually possible and not a fictional plot device aka MacGuffin.