Comment by kundan_s__r
1 month ago
Whether or not hallucination "happens often" depends heavily on the task domain and how you define correctness. In a simple conversational question about general knowledge, an LLM might be right more often than not, but in complex domains like cloud config, compliance, law, or system design, even a single confidently wrong answer can be catastrophic.
The real risk isn't frequency averaged across all use cases; it's the impact when a hallucination does occur. That's why confidence alone isn't a good proxy: models generate equally fluent text whether or not they know the right answer.
A better way to think about it is: Does this output satisfy the contract you intended for your use case? If not, it’s unfit for production regardless of overall accuracy rates.
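To make the "contract" idea concrete, here is a minimal sketch in Python. Everything in it is illustrative (the field names, allowed values, and the ContractViolation helper are assumptions, not anything from the comment): it parses a model's answer and rejects it when it breaks the contract, no matter how confident the text sounds.

```python
import json

# Hypothetical contract for an LLM-generated cloud-config answer.
ALLOWED_REGIONS = {"us-east-1", "eu-west-1"}
REQUIRED_KEYS = {"instance_type", "region", "encrypted"}


class ContractViolation(Exception):
    """Raised when model output fails the intended output contract."""


def enforce_contract(raw_output: str) -> dict:
    """Parse model output and validate it against the contract."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        raise ContractViolation(f"not valid JSON: {exc}") from exc

    missing = REQUIRED_KEYS - data.keys()
    if missing:
        raise ContractViolation(f"missing keys: {sorted(missing)}")

    if data["region"] not in ALLOWED_REGIONS:
        raise ContractViolation(f"disallowed region: {data['region']!r}")

    if not isinstance(data["encrypted"], bool) or not data["encrypted"]:
        raise ContractViolation("encryption must be explicitly enabled")

    return data


if __name__ == "__main__":
    # A fluent, confident-sounding answer can still fail the contract.
    model_answer = '{"instance_type": "m5.large", "region": "mars-central-1", "encrypted": true}'
    try:
        config = enforce_contract(model_answer)
        print("accepted:", config)
    except ContractViolation as err:
        print("rejected:", err)
```

The point of the sketch is that acceptance is decided by explicit checks on the output, not by how plausible the generated text reads or by any confidence signal from the model.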