Comment by otabdeveloper4

1 hour ago

> the spicy autocomplete can solve difficult open math problems

No it can't. It can't even solve my son's 4th grade math homework. (This is a real use case for me, not a dumb benchmark.)

You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

13 comments

ConceptJunkie 8 minutes ago

Terence Tao disagrees with what you're saying. I think he's in a slightly better position to speak on the subject.

sanderjd 1 hour ago

I would genuinely be interested in knowing what you're doing that led you to this conclusion.

I would be shocked if I was unable to solve 4th grade math homework with any of the contemporary frontier models. I spend most days using them to do significantly more complex things than that.

margalabargala 1 hour ago
If they took a blurry photo of the piece of paper and uploaded it to ChatGPT saying "solve this" then I would totally believe it. The frontier models are mostly obnoxiously bad at OCR and properly ingesting what's on an image of a page.
If you write out the 4th grade math problem, they would have no trouble.
- otabdeveloper4 1 hour ago
  
  No, LLMs just can't do math.
  

skinner_ 1 hour ago

> You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

Not the parent poster here. I do know things about math. I wrote a few papers related to the unit distance problem (https://arxiv.org/abs/2311.10069, https://arxiv.org/abs/2406.15317) and spent quite some time trying to solve it. I had no chance of coming up with the proof that the spicy autocomplete came up with. Dumb benchmark, sure.

otabdeveloper4 1 hour ago

LLMs are good with symbolic manipulation but can't reason.
You can skirt around not reasoning in research math because so much of it is just extremely tedious symbolic manipulation.
You can't cheat with advanced fourth grade math, though. They don't know algebra yet and can't substitute verbosity for reasoning.

threatofrain 1 hour ago

We're already long past that threshold.

simonw 1 hour ago

Reasoning models with access to Python have been able to solve 4th grade math homework for over a year now. Prove me wrong: show me a 4th grade math problem they can't handle.

otabdeveloper4 1 hour ago
> show me a 4th grade math problem they can't handle
Sure.
"8 7 6 5 4 3 2 1 - add minus signs and parentheses to get 31."
P.S. There is an answer online and some LLMs will just copy it verbatim. This doesn't count.
- simonw 44 minutes ago
  
  Whoa, 4th grade math problems got hard! I'm not sure how I'd tackle that one myself.
- dwohnitmok 2 minutes ago
  
  [dead]
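
For anyone who wants to check the "8 7 6 5 4 3 2 1" puzzle mechanically, here is a minimal brute-force sketch. It rests on an assumption the puzzle statement does not spell out: adjacent digits may be left joined into multi-digit numbers (e.g. "87"), and a minus sign goes between every pair of resulting numbers. The names `groupings`, `reachable`, and `find_solutions` are made up for this illustration. With single digits only, every reachable value is even, so 31 cannot be hit without joining digits.

```python
from functools import lru_cache
from itertools import product

DIGITS = "87654321"
TARGET = 31


def groupings(digits):
    """Yield every split of the digit string into consecutive numbers.

    Assumption: leaving two neighbouring digits unsplit joins them
    into one multi-digit number.
    """
    for cuts in product([False, True], repeat=len(digits) - 1):
        tokens, current = [], digits[0]
        for ch, cut in zip(digits[1:], cuts):
            if cut:
                tokens.append(int(current))
                current = ch
            else:
                current += ch
        tokens.append(int(current))
        yield tuple(tokens)


def reachable(tokens):
    """Map each value reachable with '-' and parentheses to one expression."""

    @lru_cache(maxsize=None)
    def solve(i, j):
        # tokens[i:j] holds a single number: nothing to combine.
        if j - i == 1:
            return {tokens[i]: str(tokens[i])}
        out = {}
        # Try every position for the outermost minus sign.
        for k in range(i + 1, j):
            for left_val, left_expr in solve(i, k).items():
                for right_val, right_expr in solve(k, j).items():
                    out.setdefault(
                        left_val - right_val,
                        f"({left_expr} - {right_expr})",
                    )
        return out

    return solve(0, len(tokens))


def find_solutions(digits=DIGITS, target=TARGET):
    """Return one expression per digit grouping that reaches the target."""
    found = []
    for tokens in groupings(digits):
        expr = reachable(tokens).get(target)
        if expr is not None:
            found.append(expr)
    return found


if __name__ == "__main__":
    for expr in find_solutions():
        print(expr, "=", eval(expr))
```

Under that assumption, one expression that works out by hand is 87 - 6 - (54 - 3 - 2) - 1 = 31. Whether this matches the answer the puzzle's author had in mind is not something the thread confirms.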