Comment by jmyeet
1 month ago
This is a known problem and an active area of research [1][2][3][4].
[1]: https://arxiv.org/html/2505.15623v1
[2]: https://medium.com/@adnanmasood/why-large-language-models-st...
[3]: https://www.reachcapital.com/resources/thought-leadership/wh...
[4]: https://mathoverflow.net/questions/502120/examples-for-the-u...
The research doesn't capture the fact that LLMs can easily get these multiplications right. I mean, they literally won gold at the IMO and the Putnam.
Take 10,000 such multiplications. I'm sure not even a single one would be incorrect with GPT-5.2 (thinking). Want a wager?
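The wager is easy to operationalize, for what it's worth. Here is a minimal Python sketch of a harness; the ask_model callable, the 6-digit operand size, and the strict parse-or-count-as-wrong scoring are my assumptions, not anything from the papers above:

    import random

    def make_problems(n=10_000, digits=6, seed=0):
        """Generate n random pairs of `digits`-digit factors."""
        rng = random.Random(seed)
        lo, hi = 10 ** (digits - 1), 10 ** digits - 1
        return [(rng.randint(lo, hi), rng.randint(lo, hi)) for _ in range(n)]

    def score(problems, ask_model):
        """Count wrong answers; ask_model(a, b) returns the model's reply as text."""
        wrong = 0
        for a, b in problems:
            try:
                answer = int(ask_model(a, b).strip().replace(",", ""))
            except (ValueError, AttributeError):
                answer = None  # unparseable reply counts as wrong
            if answer != a * b:
                wrong += 1
        return wrong

    if __name__ == "__main__":
        probs = make_problems(n=100)  # small n for a dry run
        # Exact oracle stands in for the LLM, so this prints 0. To settle
        # the bet, swap the lambda for a call to whatever model API you use.
        print(score(probs, lambda a, b: str(a * b)))

Swap the stand-in lambda for a real API call and the bet settles itself: score() returns the number of wrong answers out of n.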