Comment by otabdeveloper4

1 hour ago

> the spicy autocomplete can solve difficult open math problems

No it can't. It can't even solve my son's 4th grade math homework. (This is a real use case for me, not a dumb benchmark.)

You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

Terence Tao disagrees with what you're saying. I think he's in a slightly better position to speak on the subject.

I would genuinely be interested in knowing what you're doing that led you to this conclusion.

I would be shocked if I was unable to solve 4th grade math homework with any of the contemporary frontier models. I spend most days using them to do significantly more complex things than that.

  • If they took a blurry photo of the piece of paper and uploaded it to ChatGPT saying "solve this" then I would totally believe it. The frontier models are mostly obnoxiously bad at OCR and at properly ingesting what's on an image of a page.

    If you write out the 4th grade math problem, they would have no trouble.

> You just know nothing about math and are happy to parrot bullshit AI salesmen are selling you.

Not the parent poster here. I do know things about math. I wrote a few papers related to the unit distance problem (https://arxiv.org/abs/2311.10069, https://arxiv.org/abs/2406.15317) and spent quite some time trying to solve it. I had no chance of coming up with the proof that the spicy autocomplete came up with. Dumb benchmark, sure.

  • LLMs are good with symbolic manipulation but can't reason.

    You can skirt around not reasoning in research math because so much of it is just extremely tedious symbolic manipulation.

    You can't cheat with advanced fourth grade math, though. They don't know algebra yet and can't substitute verbosity for reasoning.

Reasoning models with access to Python have been able to solve 4th grade math homework for over a year now. Prove me wrong: show me a 4th grade math problem they can't handle.

  • > show me a 4th grade math problem they can't handle

    Sure.

    "8 7 6 5 4 3 2 1 - add minus signs and parentheses to get 31."

    P.S. There is an answer online and some LLMs will just copy it verbatim. This doesn't count.
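
For what it's worth, the puzzle is small enough to brute-force, so nobody has to copy the online answer. One observation first: if every digit stays a separate number, every result has the same parity as 8+7+6+5+4+3+2+1 = 36, so an odd target like 31 is out of reach. The sketch below therefore assumes (my reading, the post doesn't say so) that adjacent digits may be glued into multi-digit numbers, and that the only operator is binary minus. The names `reachable`, `splits` and `solve` are made up for illustration.

```python
from itertools import product


def reachable(tokens):
    """Map every value obtainable from `tokens` (kept in order) using only
    binary minus and parentheses to one expression string that produces it."""
    n = len(tokens)
    # best[i][j] holds {value: expression} for the slice tokens[i..j] inclusive.
    best = [[{} for _ in range(n)] for _ in range(n)]
    for i, t in enumerate(tokens):
        best[i][i][t] = str(t)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            j = i + length - 1
            cell = best[i][j]
            # Try every position for the outermost minus sign.
            for k in range(i, j):
                for a, ea in best[i][k].items():
                    for b, eb in best[k + 1][j].items():
                        if a - b not in cell:
                            cell[a - b] = f"({ea} - {eb})"
    return best[0][n - 1]


def splits(digits):
    """Yield every way to cut the digit string into consecutive numbers."""
    for cuts in product([False, True], repeat=len(digits) - 1):
        tokens, current = [], digits[0]
        for d, cut in zip(digits[1:], cuts):
            if cut:
                tokens.append(int(current))
                current = d
            else:
                current += d  # assumption: gluing digits is allowed
        tokens.append(int(current))
        yield tokens


def solve(digits="87654321", target=31, allow_concat=True):
    """Return one fully parenthesized expression hitting `target`, or None."""
    candidates = splits(digits) if allow_concat else [[int(d) for d in digits]]
    for tokens in candidates:
        found = reachable(tokens)
        if target in found:
            return found[target]
    return None


if __name__ == "__main__":
    print(solve(allow_concat=False))  # single digits only
    print(solve())                    # digits may be glued together
```

Under the single-digit reading the search comes back empty, which matches the parity argument. With gluing allowed it finds a solution; one that can be checked by hand is 87 - 6 - (54 - 3 - 2) - 1 = 31. Whether that is the intended reading of the homework is a separate question.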