Comment by untitled2

7 months ago

Exactly. Whom to believe?

13 comments

untitled2

The last time someone claimed a medal in an olympiad like this, turned out they manually translated the problem into Lean and then ran a brute force search algorithm to find a proof. For 60 hours. On a supercomputer.

Meanwhile high schoolers get a piece of paper and 4.5 hours.

wslh 7 months ago

Even though chess is now effectively solved against human players, I still remember Kasparov's suspicion that one of Deep Blue's moves had a human touch. It was never proven or disproven, but I trust Kasparov's deep intuition amplified by Kasparov requesting access to Deep Blue’s logs, and IBM refusing to share them in full. For more discussions see [1][2][3].
[1] https://chess.stackexchange.com/questions/9959/did-deep-blue...
[2] https://nautil.us/why-the-chess-computer-deep-blue-played-li...
[3] https://en.chessbase.com/post/deep-blue-s-cheating-move
throwawaymaths 7 months ago

kinda wild that an llm cant translate to lean?

changoplatanero 7 months ago

Both are true. One spent $400 in compute and the other one spent a lot more.

masterjack 7 months ago

Exactly. And presumably had a more sophisticated harness around the model, longer reasoning chains, best of N, self judging, etc

kenjackson 7 months ago

OpenAI achieved Gold on an unreleased model. GPT-5. Read the tweets and they explain what they did.

idiotsecant 7 months ago
Actually, I did it a year ago but I just don't want to release my model.
- senkora 7 months ago
  
  Where should I address the billion dollar check?
- emp17344 7 months ago
  
  My buddy did it 5 years ago. You wouldn’t know him, he lives in Canada.
- souldeux 7 months ago
  
  my model goes to a different school
- esafak 7 months ago
  
  The dog ate mine. And the solution didn't fit in the margin, anyway.
e1g 7 months ago
OpenAI explicitly said it’s not GPT-5 but another experimental research model https://x.com/alexwei_/status/1946477756738629827?s=46
- kenjackson 7 months ago
  
  Thanks. I parsed that wrong. In either case not the same thing Math Arena used.