Comment by nopinsight

4 hours ago

I assume you're using the "regular" Pro version of Gemini 3.1 for the above, rather than the Deep Think mode, which is more comparable to GPT-5.5 Pro. To my knowledge, regular 3.1 Pro is a tier below and often makes mistakes.

Moreover, there's no reason to believe the progress of LLMs, which couldn't reliably solve high-school math problems just 3–4 years ago, will stop anytime soon.

You might want to track the progress of these models on the CritPt benchmark, which is built on *unpublished, research-level* physics problems:

https://critpt.com/

Frontier models are still nowhere near solving it, but progress has been rapid.

* o3 (high), less than 1.5 years ago: 1.4%

* GPT-5.4 (xhigh): 23.4%

* GPT-5.5 (xhigh): 27.1%

* GPT-5.5 Pro (xhigh): 30.6%

https://artificialanalysis.ai/evaluations/critpt


FrojoS 2 hours ago

> there's no reason to believe the progress of LLMs [...] will stop anytime soon

Wrong. Every advancement has followed an S-curve. Where we are on that curve is anyone's guess. Or maybe "this time it's different".

aurareturn 2 hours ago
He said "will stop anytime soon". He didn't say forever.
- Lionga 2 hours ago
  
  Which still makes no sense. We are just as likely to be flatlining now as in, e.g., 3 or 5 years.
  

civvv 2 hours ago

There are many indications that model progress is slowing down, so that is not entirely accurate.

StrauXX 2 hours ago
Which indications are those?
- lionkor 1 hour ago
  
  Nobody is releasing NEW models
  
- overfeed 2 hours ago
  
  Investment dollars.
  