Comment by charlieyu1

4 days ago

Only have basic o3 to try. It spent about 10 minutes but returned no response due to a network error. Checking its thoughts, the model was doing a lot of brute forcing up to n=8 and found k=0,1,3, but I saw no mathematical reasoning.
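The kind of brute force described above can be sketched as an exhaustive search. This is a minimal sketch that assumes the thread is about IMO 2025 Problem 1 (the "sunny lines" problem): cover every point (a, b) with positive integers a, b and a + b <= n + 1 using n distinct lines, where a line is sunny if it is not parallel to the x-axis, the y-axis, or x + y = 0. All function names are hypothetical. Lines that miss every point are skipped. This is safe because the triangle can't be covered by n - 1 lines: the side x = 1 holds n points, so it must itself be one of the lines, and induction on the smaller triangle finishes the argument.

```python
from itertools import combinations
from math import gcd


def triangle_points(n):
    """Grid points (a, b) with a, b >= 1 and a + b <= n + 1 (assumed problem setup)."""
    return [(a, b) for a in range(1, n + 1) for b in range(1, n + 2 - a)]


def normalize(a, b, c):
    """Canonical integer form of the line a*x + b*y = c."""
    g = gcd(gcd(a, b), c)
    a, b, c = a // g, b // g, c // g
    if a < 0 or (a == 0 and b < 0):
        a, b, c = -a, -b, -c
    return (a, b, c)


def candidate_lines(points):
    """Every line worth using in a cover, as (covered points, is_sunny) pairs."""
    lines = set()
    for (x1, y1), (x2, y2) in combinations(points, 2):
        a, b = y2 - y1, x1 - x2
        lines.add(normalize(a, b, a * x1 + b * y1))
    for x, y in points:
        # Non-sunny lines through each point, kept even if they cover only that point.
        lines.update({(0, 1, y), (1, 0, x), (1, 1, x + y)})
    cands = {}
    for a, b, c in lines:
        covered = frozenset(p for p in points if a * p[0] + b * p[1] == c)
        # Sunny: not horizontal (a == 0), vertical (b == 0), or parallel to x + y = 0 (a == b).
        cands[(a, b, c)] = (covered, a != 0 and b != 0 and a != b)
    for p in points:
        # Some sunny line through p misses every other grid point; one per point suffices.
        cands[("sunny", p)] = (frozenset([p]), True)
    return list(cands.values())


def achievable_ks(n):
    """Set of k such that n distinct lines, exactly k of them sunny, cover the triangle."""
    points = triangle_points(n)
    cands = candidate_lines(points)
    through = {p: [cs for cs in cands if p in cs[0]] for p in points}
    found = set()

    def search(uncovered, lines_left, sunny):
        if not uncovered:
            # n - 1 lines never suffice, so a finished cover uses all n lines.
            if lines_left == 0:
                found.add(sunny)
            return
        # Any line covers at most n grid points of the triangle.
        if len(uncovered) > lines_left * n:
            return
        p = min(uncovered)
        # Branch on lines through the first uncovered point; each is new, so lines stay distinct.
        for covered, is_sunny in through[p]:
            search(uncovered - covered, lines_left - 1, sunny + is_sunny)

    search(frozenset(points), n, 0)
    return found


if __name__ == "__main__":
    print(achievable_ks(4))
```

For n = 3 and n = 4 this should give {0, 1, 3}, the same values o3 found. The search is exponential, so larger n quickly becomes expensive in pure Python, which is consistent with brute forcing only getting you evidence, not a proof.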

CamperBob2 4 days ago

See how this compares to what you got from o3: https://chatgpt.com/share/687bf8bf-c1b0-800b-b316-ca7dd9b009...

It convincingly argues that Gemini's answer was wrong, and Gemini agrees ( https://g.co/gemini/share/aa26fb1a4344 ).

So that's pretty cool, IMO. Pitting these two models against each other in a cage match is an underused hack in my experience.

Another observation worth making is that (looking at the GitHub link) OpenAI didn't just paste an image of the question into the prompt, hit the button and walk away, like I did. They rewrote the prompts carefully to get the best results, and I'm a little surprised people aren't crying foul about that. So I'm pretty impressed with o3-pro's unassisted performance.

charlieyu1 4 days ago

This answer from o3 looks better, but there are still some holes. Lemma 1 works on the four right-most columns, and then the model tries to apply it to any four columns. "Hence at least four columns lack a vertical line; take the last four columns above." is a slip in logic, and Lemma 1 doesn't work for any two columns. For example, if we choose columns 1, 3, 5, 7 as the columns lacking a vertical line, then we can take a sunny line with slope -1/2 that passes through a grid point in each of these four columns. Still, this looks more promising than what I got.
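To see the slope -1/2 counterexample concretely, here is a small check. It assumes "meets" means passing through a grid point of the triangle (a, b >= 1, a + b <= n + 1) in each column, writes the slope -1/2 lines as x + 2y = c, and takes n = 10 purely for illustration, since the comment doesn't fix n. The helper names are hypothetical.

```python
def columns_hit(n, c):
    """Columns a where the line x + 2y = c (slope -1/2) passes through a triangle point."""
    return {a for a in range(1, n + 1) for b in range(1, n + 2 - a) if a + 2 * b == c}


def intercepts_hitting(n, cols):
    """All c for which x + 2y = c passes through a triangle point in every column in cols."""
    # a + 2b ranges from 3 (at (1, 1)) to 2n + 1 (at (1, n)) over the triangle.
    return [c for c in range(3, 2 * n + 2) if set(cols) <= columns_hit(n, c)]


if __name__ == "__main__":
    # n = 10 is an illustrative choice; the comment does not fix n.
    print(intercepts_hitting(10, [1, 3, 5, 7]))
```

With n = 10 this yields c = 9, 11, 13, 15. For example, x + 2y = 15 passes through (1, 7), (3, 6), (5, 5) and (7, 4), all inside the triangle, and the line is sunny because it is neither horizontal, vertical, nor of slope -1.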
Interesting result from Gemini. I don't know its thought process, but it seemed like Gemini tried to improve on its own previous answer and then got there.