Comment by littlestymaar

1 year ago

press X to doubt

o3-mini wasn't even in second place for non-STEM tasks, and in today's announcement they don't even publish benchmarks for those. What's impressive about Gemini 2.5 pro (and was also really impressive with R1) is how good the model is across a very broad range of tasks, not just benchmaxing on AIME.

kadushka 1 year ago

I had a philosophical discussion with the o3 model earlier today. It was much better than 2.5 pro. In fact, it was pretty much what I would expect from a professional philosopher.

littlestymaar 1 year ago
I don't expect someone paying $200 a month for access to something to be objective about it.
Also “what I would expect from a professional philosopher”, is that your argument, really?
- kadushka 1 year ago
  
  I’m paying $20/mo, and I’m paying the same for Gemini and for Claude.
  What’s wrong with my argument? You questioned the performance of the model on non-STEM tasks, and I gave you my impression.
  