Comment by falcor84

5 months ago

That's a good point. Assuming they're strategic about releasing this benchmark, they likely already evaluated o3 on it and saw that it performs favorably. Perhaps they're now holding off until they have a chance to tune it further, planning to release a strong improvement and generate additional buzz a bit later on.

Although I wouldn't bet against o3, I think it works in their favor to release it later no matter how well it is doing.

Case 1, does worse than or is on par with o1: This would be shocking and not a great sign for their test-time compute approach, at least in this domain. Obviously they would not want to release results.

Case 2, slightly better than o1: I think "holding off until they have a chance to tune it further" applies.

Case 3, does much better than o1: They get to release it after another model makes a noticeable improvement on the benchmark, earning another good press release to keep hype high, and they get to tune it further before releasing results.

  • Altman stated they won't release o3 by itself; they plan to release it as part of GPT-5, which will incorporate all model subtypes: reasoning, image, video, voice, etc.