Comment by lvl155
10 hours ago
Seems OpenAI knew this is forthcoming so they front ran the news? I was really high on Gemini 2.5 Pro after release but I kept going back to o3 for anything I cared about.
>I was really high on Gemini 2.5 Pro after release but I kept going back to o3 for anything I cared about
Same here. I was impressed by its benchmarks and how it tops most leaderboards, but in day-to-day use it still feels so far behind.
I use o3, the OpenAI API, and Claude Code. Genuinely curious: what about Gemini 2.5 is so far behind?
I think that's most likely just your view, and not really based on evidence.
I regularly have the opposite experience: o3 is almost unusable, and Gemini 2.5 Pro is reliably great. Claude Opus 4 is a close second.
o3 is so bad it makes me wonder if I'm being served a different model. My o3 responses are so truncated and simplified as to be useless. Maybe my problems aren't a good fit, but whatever it is, o3's output isn't useful.
I have this distinct feeling that o3 intentionally tries to trick me when it can't solve a problem, cleverly hiding its mistakes. But I could be imagining it.
It's certainly the "laziest" model, in the sense that it seems the likeliest to avoid doing the actual work, generating "TBD" stubs instead.
Are you using a tool other than ChatGPT? If so, check the full prompt that's being sent. It can sometimes kneecap the model.
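If you want to see exactly what's going over the wire, one option is to point the tool at a small logging proxy and read the request bodies. A rough sketch, assuming the tool is built on the official OpenAI SDK and honors OPENAI_BASE_URL (many are; the env var name varies for others):

    # Minimal logging proxy for inspecting the full request a tool sends
    # to the OpenAI API. Sketch only: no error handling, and streamed
    # responses get buffered, so expect a pause before the reply arrives.
    import json
    from http.server import BaseHTTPRequestHandler, HTTPServer
    from urllib.request import Request, urlopen

    UPSTREAM = "https://api.openai.com"

    class LoggingProxy(BaseHTTPRequestHandler):
        def do_POST(self):
            body = self.rfile.read(int(self.headers.get("Content-Length", 0)))
            try:
                # The part you care about: system prompt, messages, tools.
                print(json.dumps(json.loads(body), indent=2))
            except json.JSONDecodeError:
                print(body[:2000])
            # Pass the request through to the real API unchanged.
            req = Request(UPSTREAM + self.path, data=body, method="POST")
            for header in ("Authorization", "Content-Type"):
                if self.headers.get(header):
                    req.add_header(header, self.headers[header])
            with urlopen(req) as resp:
                data = resp.read()
            self.send_response(resp.status)
            self.send_header("Content-Type", resp.headers.get("Content-Type", "application/json"))
            self.send_header("Content-Length", str(len(data)))
            self.end_headers()
            self.wfile.write(data)

    if __name__ == "__main__":
        HTTPServer(("localhost", 8080), LoggingProxy).serve_forever()

Then run the tool with OPENAI_BASE_URL=http://localhost:8080/v1 and compare what it actually sends against what you'd type into ChatGPT directly.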
Tools with slightly unsuitable built-in prompts/context can lead models to say weird stuff out of the blue, rather than it being behavior "baked into" the model itself. I've seen this happen with both Gemini 2.5 Pro and o3.
Are you using o3 in the official ChatGPT app or via the API? I use it in the app and it performs very well; it's my go-to model for general-purpose LLM use.
official ChatGPT app