Comment by rainsford
14 days ago
While humans can and do make mistakes, it seems to me the larger problem is that LLMs make mistakes for different reasons than humans do, and that those reasons make them much worse than humans at certain types of problems (e.g. OCR). Worse, this weakness might be fundamental to LLM design rather than something that can be fixed by just LLM-ing harder.
I think a lot of this gets lost in the discussion because people insist on using terminology that anthropomorphizes LLMs and makes their mistakes sound human. So LLMs are "hallucinating" rather than producing faulty output because their lossy, probabilistic model fundamentally doesn't "understand" what's being asked of it the way a human would.
This is what a lot of people miss. We have thousands of years of understanding the kinds of mistakes that humans make; we only have months to years of experience with the mistakes that LLMs and other AIs make.
This means that most of our verification and testing processes won't inherently catch AI errors because they're designed to catch human errors. Things like "check to see if the two sides of these transactions sum to 0" are fine for human typos, but they won't catch a fake (yet accurately entered) transaction.
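To make that concrete, here's a minimal sketch of the kind of check I mean, in Python (the Entry type, account names, and amounts are all invented for illustration). A transposition typo breaks the zero-sum invariant and gets flagged; a fabricated transaction that was entered carefully does not.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    account: str
    amount: int  # cents; positive = debit, negative = credit

def balances(entries: list[Entry]) -> bool:
    # Classic double-entry check: all legs of a transaction must sum to zero.
    return sum(e.amount for e in entries) == 0

# A human transposition typo (1350 entered against a 1530 credit) is caught:
typo = [Entry("cash", -1530), Entry("office supplies", 1350)]
assert not balances(typo)

# A fabricated but internally consistent transaction sails right through:
fabricated = [Entry("cash", -49900), Entry("consulting fees", 49900)]
assert balances(fabricated)  # the books balance; the transaction never happened
```

The check is doing exactly what it was designed to do; it was just designed around the failure modes of a tired human, not a model confidently inventing plausible entries.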
It's similar to a language barrier. You don't realize how much you rely on context clues until you spend 3 days of emails trying to communicate a complex topic to someone in their second language.
> we only have months to years of experience with the mistakes that LLMs and other AIs make.
The mistakes are also very much model-dependent. That you have built a system which improves the accuracy of one model's output gives you no confidence that it will work even on the next generation of the same model.