Comment by stephc_int13

6 hours ago

This is a nice benchmark IMO. I would be curious to see how competitors and improved models would compare.

And how long will it take before an open model recreates this? The "vibe" consensus before "thinking" models really took off was that open models were ~6 months behind SotA. With the massive RL improvements, over the past 6 months I've thought the gap was actually increasing. This will be a nice little verifiable test going forward.