
Comment by LeftHandPath

5 days ago

I was worried about that a couple of years ago, when there was a lot of hope that deeper reasoning skills and hallucination avoidance would simply arrive as emergent properties of a large enough model.

More recently, it seems like that's not the case. Larger models sometimes even hallucinate more [0]. I think the entire sector is suffering from a Dunning-Kruger effect: making an LLM is difficult, and they managed to get something incredible working much sooner than anyone really expected back in the early 2010s. But that led to overconfidence and hype, and I think the tail of future improvements will be much longer than the industry would like to admit.

Even the more advanced reasoning models struggle to play a valid game of chess, much less win one, despite having plenty of chess games in their training data [1]. I think that, combined with the persistence of hallucinations, hints at where the real limits of the technology are.
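The legality failure is easy to reproduce yourself. Here's a minimal sketch of the kind of check the linked article [1] reports on, assuming the python-chess library and a hypothetical get_model_move() wrapper around whatever model you're testing; the prompt format and harness details are illustrative, not the article's actual setup.

    # Sketch: ask a model for the next move in SAN, verify it with python-chess,
    # and count how many plies pass before the first illegal or unparsable move.
    import chess

    def get_model_move(fen: str) -> str:
        """Hypothetical LLM call returning a move in SAN, e.g. 'Nf3'."""
        raise NotImplementedError("wire up your model client here")

    def plies_before_illegal(max_plies: int = 200) -> int:
        board = chess.Board()
        for ply in range(max_plies):
            if board.is_game_over():
                break
            san = get_model_move(board.fen())
            try:
                # push_san raises a ValueError subclass on illegal or ambiguous moves
                board.push_san(san)
            except ValueError:
                print(f"illegal move {san!r} at ply {ply}")
                return ply
        return max_plies

Counting plies to the first illegal move separates "can't produce a valid game" from "can't win one", which is the distinction above.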

Hopefully LLMs will scare society into planning for the mass automation of thinking and logic before a more powerful technology that can really do it arrives.

[0]: https://techcrunch.com/2025/04/18/openais-new-reasoning-ai-m...

[1]: https://dev.to/maximsaplin/can-llms-play-chess-ive-tested-13...

Really? I find that newer models hallucinate less, and I think there's still room for improvement with better training.

I believe hallucinations are partly an artifact of imperfect model training, and thus can be ameliorated with better technique.