
Comment by tptacek

3 days ago

Why do you care about hallucination for coding problems? You're in an agent loop; the compiler is ground truth. If the LLM hallucinates, the agent just iterates. You don't even see it unless you make the mistake of looking closely.

What on earth are you talking about??

If the LLM hallucinates, then the code it produces is wrong. That wrong code isn't obviously or programmatically determinable as wrong; the agent has no way to figure out that it's wrong, and it's not as if the LLM also produces tests that identify the hallucinated code as wrong. The only way this wrong code can be identified as wrong is by the human user "looking closely" and figuring out that it is wrong.

You seem to have this fundamental belief that the code that's produced by your LLM is valid and doesn't need to be evaluated, line-by-line, by a human, before it can be committed?? I have no idea how you came to this belief, but it certainly doesn't match my experience.

  • No, what's happening here is we're talking past each other.

    An agent lints and compiles code. The LLM is stochastic and unreliable. The agent is ~200 lines of Python code that checks the exit code of the compiler and relays it back to the LLM (a rough sketch of that kind of loop appears at the end of this thread). You can easily fool an LLM. You can't fool the compiler.

    I didn't say anything about whether code needs to be reviewed line-by-line by humans. I review LLM code line-by-line. Lots of code that compiles clean is nonetheless horrible. But none of it includes hallucinated API calls.

    Also, where did this "you seem to have a fundamental belief" stuff come from? You had like 35 words to go on.

    • > If the LLM hallucinates, then the code it produces is wrong. That wrong code isn't obviously or programmatically determinable as wrong; the agent has no way to figure out that it's wrong, and it's not as if the LLM also produces tests that identify the hallucinated code as wrong. The only way this wrong code can be identified as wrong is by the human user "looking closely" and figuring out that it is wrong

      The LLM can easily hallucinate code that will satisfy the agent and the compiler but will still fail the actual intent of the user.

      > I review LLM code line-by-line. Lots of code that compiles clean is nonetheless horrible.

      Indeed, most code that LLMs generate compiles clean and is nevertheless horrible! I'm happy that you recognize this truth, but the fact that you review that LLM-generated code line-by-line makes you an extraordinary exception compared to the normal user, who generates LLM code and absolutely does not review it line-by-line.

      > But none of [the LLM generated code] includes hallucinated API calls.

      Hallucinated API calls are just one of many possible kinds of hallucinated code that an LLM can generate; by no means does "hallucinated code" describe only "hallucinated API calls"!

      8 replies →

    • Linting isn't verification of correctness, and yes, you can fool the compiler, linters, etc. Work with some human interns; they are great at it. Agents will do crazy things to get around linting errors, including removing functionality.

      6 replies →

    • My guy, didn't you spend like half your life in the field where your job was to sift through code that compiled but nonetheless had bugs that you tried to exploit? How can you possibly have this belief about AI-generated code?

      1 reply →

  • > You seem to have this fundamental belief that the code that's produced by your LLM is valid and doesn't need to be evaluated, line-by-line, by a human, before it can be committed??

    This is a mistaken understanding. The person you responded to has already written about exactly this, and used some memorable words in response to that proposal:

    > Are you a vibe coding Youtuber? Can you not read code? If so: astute point. Otherwise: what the fuck is wrong with you?

    It should be obvious that one reads and verifies the code before committing it, especially if one works on a team.

    https://fly.io/blog/youre-all-nuts/

    • We should go one step past this and come up with an industry practice where we get someone other than the author to read the code before we merge it.

      2 replies →
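
For concreteness, the "agent" described in this thread is essentially a compile-and-retry loop. Below is a minimal sketch of that kind of loop, not the actual tool anyone in the thread uses: the `ask_llm` function is a hypothetical stand-in for whatever model API the agent wraps, and `go build` is an arbitrary choice of compiler step.

```python
import subprocess
import tempfile

MAX_ATTEMPTS = 5


def ask_llm(prompt: str) -> str:
    """Hypothetical stand-in for the model client; returns source code."""
    raise NotImplementedError("plug in your LLM API here")


def agent_loop(task: str) -> str:
    """Ask the LLM for code, compile it, and feed any compiler errors
    back to the LLM until the build succeeds or we give up."""
    prompt = task
    for _ in range(MAX_ATTEMPTS):
        source = ask_llm(prompt)
        with tempfile.NamedTemporaryFile("w", suffix=".go", delete=False) as f:
            f.write(source)
            path = f.name
        # The compiler is the ground truth here: a hallucinated API call
        # surfaces as a nonzero exit code.
        result = subprocess.run(
            ["go", "build", path], capture_output=True, text=True
        )
        if result.returncode == 0:
            return source  # compiles clean; a human still reviews it
        # Relay the compiler's complaint back to the LLM and iterate.
        prompt = (
            f"{task}\n\n"
            "Your previous attempt failed to compile with:\n"
            f"{result.stderr}\n"
            "Fix it and return the full corrected source."
        )
    raise RuntimeError("gave up after repeated compile failures")
```

The loop's guarantee is narrow: it catches code that doesn't compile, including hallucinated identifiers, but not code that compiles and still misses the user's intent, which is what the line-by-line human review discussed above is for.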