Comment by wedn3sday
1 month ago
The only metric I really care about, and the one that I think shows the fundamental failure of LLMs as a technology, is this one here [1]. The fact that o1 fails a non-zero amount of the time on the question, "what is 6*1?" means that the models just do not "understand" _anything_ and are still just fancy stochastic parrots. Now, stochastic parrots are still useful! Just not the digital god a lot of people seem to think we're heading towards.
[1] https://www.reddit.com/media?url=https%3A%2F%2Fpreview.redd....
I'm not seeing anything in that graph that implies that o1 ever fails on "what is 6*1?" The chart is graphing the number of digits on each axis; it fails on "what is (some 6 digit number) * (some 1 digit number)"
I don't think this will or necessarily should ever be fixed. The eventual solution (I imagine) will be to simply plug in a calculator. All the MCP talk on HN pushed me to try MCP out, and I'm sold. A Swiss army knife of tools like a calculator available would let a brain do what a brain is best at, and a calculator what a calculator is best at.
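The tool side of that is simple enough to sketch. This isn't the actual MCP wiring (the server/protocol glue is elided), just an illustrative Python calculator tool of the kind you'd expose: it safely evaluates arithmetic strings so the model never has to multiply in its weights. The function name and structure here are hypothetical.

```python
# Illustrative "calculator tool" an LLM could call instead of doing
# arithmetic itself. Uses Python's ast module to evaluate arithmetic
# only, rejecting anything else. Not the real MCP API, just the idea.
import ast
import operator

_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.Pow: operator.pow,
    ast.USub: operator.neg,
}

def calculator(expression: str):
    """Evaluate a plain arithmetic expression like '123456 * 7'."""
    def ev(node):
        if isinstance(node, ast.Expression):
            return ev(node.body)
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.operand))
        raise ValueError("only arithmetic is allowed")
    return ev(ast.parse(expression, mode="eval"))

# The model hands the string over instead of guessing:
print(calculator("123456 * 7"))  # 864192
```

The point being: a (6-digit) * (1-digit) product is trivial for the tool and unreliable for the model, so route it to the tool.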
The chart you show is about the accuracy of x*y where x and y have an increasing number of digits.
This graph shows that both o1 and o3-mini are better at mental arithmetic than any human I have known. It only starts to break down towards calculating the product of two eight-digit factors etc.
Humanity fails that question an embarrassingly large number of times.
I can proudly (?) proclaim I will never fail that question. Pretty sure I don't know anyone who would either, including my 7yo.
I've certainly heard things wrong before, and had high fever, and loads of alcohol, and migraines, and high sleep deprivation, and adrenaline. All things that greatly affect whether I can do something seemingly simple or not.
Your 7yo can multiply a 5-digit number by another 5-digit number with >95% accuracy?