Comment by coldtea

7 hours ago

It might be an extra demand for rigor that's not equally applied to humans. One could argue that other coders on our teams, or even we ourselves, often fail in "a miserable way", say about 20% of the time. But we block this out, or consider it "regular functioning", or write it off as a one-off caused by something we got wrong, "just a try" that we redo, etc.

But when an LLM does it on an area we know, we notice and suddenly it's too much.

8 comments

nibbleyou 5 hours ago

Because a human fails in a known way. If a human does not have expertise in domain X or tech Y, they will fail there and the expectation is that they will fail.

With an LLM you never know where it can fail. There is no domain expertise for an LLM. It can fail in a miserable way in the same domain where it previously worked spectacularly.

Aeolos 19 minutes ago

Humans fail in infinitely more complicated ways than LLMs. They can have a difficult personality, a medical issue, family stress, a hangover, sleep deprivation, or they can just wake up on the wrong side of the bed. On any given day, you never know if you will get an expert in domain X or a sleep-deprived version of the same person who accidentally drops a database.

Indeed, if you remember, before AI took the world by storm HN used to be chock-full of articles about how the hiring process is broken for both employers and candidates, where you can never tell if what you see is what you get.

When I run a local LLM I get none of that. I hit the intelligence walls or buggy behaviour, but it doesn't matter if it's 8am or 8pm, the model behaves exactly the same. If something doesn't work as I wish, I can retry as many times as I want without the model getting angry at me.

- darkwater 6 minutes ago
  
  Damned squishy humans, with their feelings and moods...

jtbayly 4 hours ago

No. It is not intelligent at all to confidently assert false things you know nothing about, and humans don’t do this outside of compulsive liars. For example…

A few days ago I asked ChatGPT where a Spurgeon quote came from. Response:

“That quote is widely attributed to Charles Spurgeon, but pinning down an exact sermon or written source is surprisingly difficult—and that’s a red flag.

Short answer
There’s no well-attested primary source (sermon, lecture, or publication) where Spurgeon clearly says that exact wording.” Etc. etc. …

Why it sounds like Spurgeon
It fits his theology and rhetoric almost perfectly: • etc etc. …

Closest authentic themes (but not the quote)
Spurgeon repeatedly says things like: • etc etc. …

So the quote is basically: a modern condensation of real Spurgeon ideas, not a verifiable citation etc. etc.”

Utter bullshit. One web search produces the full sermon manuscript with the quote.

One could argue that the previous context in the thread primed the LLM to fail here, but once again, a person is not confused by the change of topic.

542354234235 3 hours ago

> It is not intelligent at all to confidently assert false things you know nothing about, and humans don’t do this outside of compulsive liars.

"The Dunning-Kruger effect describes a disturbing cognitive bias that afflicts us all. People with limited expertise in an area tend to overestimate how much they know—and we all have gaps in our expertise." [1]

[1] https://www.openmindmag.org/articles/david-dunning-on-expert...

girvo 5 hours ago

> But when an LLM does it on an area we know, we notice and suddenly it's too much.

Well of course. The owners of the companies building this are constantly talking about it replacing us all. Why would it be surprising that it would then be held to a higher standard?

coldtea 4 hours ago

Because it doesn't need to meet a higher standard to "replace us all". It's enough that it works to the same standard, or even a lower one, but cheaper, with no complaints, and 24/7.

- lenkite 2 hours ago
  
  Anthropic says that LLM code "structurally exceeds human standards".