Comment by tptacek

1 month ago

I remember that being true of early ChatGPT, but it's certainly not true anymore; GPT-4o and GPT-5 have tagged along with me through all of MathAcademy MFII, MFIII, and MFML (this is roughly undergrad Calc 2 and then like half a stat class and 2/3rds of a linear algebra class) and I can't remember them getting anything wrong.

Presumably this is all a consequence of better tool call training and better math tool calls behind the scenes, but: they're really good at math stuff now, including checking my proofs (of course, the proof stuff I've had to do is extremely boring and nothing resembling actual science; I'm just saying, they don't make 7th-grader mistakes anymore).

tombert 1 month ago

It's definitely gotten considerably better, though I still have issues with it generating proofs, at least with TLAPS.

I think behind the scenes it's phoning Wolfram Alpha nowadays for a lot of the numeric and algebraic stuff. For all I know, they might even have an Isabelle instance running for some of the even-more abstract mathematics.

I agree that this is largely an early ChatGPT problem, though; I just thought it was interesting that they were "plausible" mistakes. I could totally see twelve-year-old tombert making these exact mistakes, so it struck me that a robot was making the same mistakes an amateur human makes.

CamperBob2 1 month ago

> I think behind the scenes it's phoning Wolfram Alpha nowadays for a lot of the numeric and algebraic stuff. For all I know, they might even have an Isabelle instance running for some of the even-more abstract mathematics.

Maybe, but they swear they didn't use external tools on the IMO problem set.

tptacek 1 month ago

I assumed it was just writing SymPy or something.