Comment by deanc

7 hours ago

> Can you predict when and how the SOTA model will hallucinate? Yes or no. Can you predict the severity impact of that error beforehand? Yes or no.

No, but the same can be said for your colleagues. You might call what the LLM does hallucinations, I'd call them mistakes. I think we have totally forgotten that humans make them all the time and are confidently wrong too.

Your original question doesn't really get to the bottom of the point I'm trying to make, and I don't feel it fairly represents the issue we are talking about here. They are not the same things.

This is such a tired, meaningless argument. I've never seen a human in 10 years of professional software engineering at a large company ever so confidently, consistently create and send out seemingly well-reasoned code that's as wrong as what SOTA models using CC or Codex do. If a human did this, they would be fired or perpetually remain a junior who no one wants to work with.

Also, if a human does this, you can replace them and get a human who will not do it. The default for an LLM is to generate plausible-looking text that may or may not be completely incoherent. That is not the default for a human. Again, if you find that your colleague consistently fabricates APIs, you can hire someone who isn't crazy instead, but you cannot do the same with LLMs.

If a human was hallucinating and polluting a codebase with errors, they would be fired and possibly treated for dementia. Even worse, an LLM is trained to produce plausible-looking results, so it's harder to detect the mistakes.

> No, but the same can be said for your colleagues.

That's absolutely false. My colleagues don't routinely and confidently invent APIs that are not there, or spectacularly and repeatedly misunderstand the purpose of certain functions, or exhibit extreme forgetfulness. Especially when I've warned them. Hallucinations and confabulations in otherwise healthy individuals are mental disorders. When I ask them why they made a certain kind of error, I can expect to get a reasonable answer. No one has uttered the phrase "Bob hallucinated again while writing those tests" when the Bob in question is a human.

  • Well, your experience doesn't align with mine. I have been using, and am part of an organisation that is extensively using, Claude with Opus for everything for about 3 months now and I am not experiencing the problems you describe. We'll have to agree to disagree here.

    • Not only have I never run across a hallucination in the past ~6 months or so; the latest Opus models have gotten to the point where they can emit inline assembly that is _superior_ to what gcc or clang can generate from optimized C++. Had it rewrite a hot SIMD loop that took it from ~10 flops/cycle to ~14 by shaving off broadcasts. I _could not_ get any compiler to do this, no matter which flags I tried to use. So I literally have no idea what these people are talking about when they claim that SOTA models hallucinate constantly.

    • That is fine. "Your experience may vary" is, amusingly, the crux of my argument. You can't have only just realized that people are having different experiences using AI, or even that the same person has different experiences when they change domains or technical contexts. There have been lots of comments on this forum to that effect.

      Calling hallucinations simply mistakes does not seem to me to be a healthy way to reason about LLMs. I can ask a colleague how well they can program in Ada and adjust my expectations on productivity and bug rates. I can't ask an LLM how well it can code in Ada (just a throwaway example), or even how much Ada was in its training data. I have to actually spend money and time on code review before I can even formulate any expectations at all.