Comment by coldtea
5 hours ago
It might be an extra demand for rigor that isn't applied equally to humans. One could argue that other coders on our teams, or even we ourselves, often fail in "a miserable way", say about 20% of the time. But we block this out, or consider it "regular functioning", or dismiss it as a one-off caused by something we got wrong, "just a try" we redo, etc.
But when an LLM does it on an area we know, we notice and suddenly it's too much.
Because a human fails in a known way. If a human does not have expertise in domain X or tech Y, they will fail there and the expectation is that they will fail.
With an LLM you never know where it can fail. There is no domain expertise for an LLM. It can fail in a miserable way in the same domain it worked spectacularly for.
No. It is not intelligent at all to confidently assert false things you know nothing about, and humans don’t do this outside of compulsive liars. For example…
A few days ago I asked ChatGPT where a Spurgeon quote came from. Response:
“That quote is widely attributed to Charles Spurgeon, but pinning down an exact sermon or written source is surprisingly difficult—and that’s a red flag.
Short answer: There’s no well-attested primary source (sermon, lecture, or publication) where Spurgeon clearly says that exact wording. Etc. etc.
…
Why it sounds like Spurgeon: It fits his theology and rhetoric almost perfectly: • etc etc.
…
Closest authentic themes (but not the quote): Spurgeon repeatedly says things like: • etc etc.
…
So the quote is basically: a modern condensation of real Spurgeon ideas, not a verifiable citation etc. etc.”
Utter bullshit. One web search produces the full sermon manuscript with the quote.
One could argue that the previous context in the thread primed the LLM to fail here, but once again, a person is not confused by the change of topic.
>It is not intelligent at all to confidently assert false things you know nothing about, and humans don’t do this outside of compulsive liars.
"The Dunning-Kruger effect describes a disturbing cognitive bias that afflicts us all. People with limited expertise in an area tend to overestimate how much they know—and we all have gaps in our expertise." [1]
[1] https://www.openmindmag.org/articles/david-dunning-on-expert...
> But when an LLM does it on an area we know, we notice and suddenly it's too much.
Well of course. The owners of the companies building this are constantly talking about it replacing us all. Why would it be surprising that it would then be held to a higher standard?
Because it doesn't need to meet a higher standard to "replace us all". It's enough that it meets the same standard, or even a lower one, but more cheaply, without complaints, and 24/7.
Anthropic says that LLM code "structurally exceeds human standards".