Comment by stephc_int13

6 hours ago

This is a nice benchmark IMO. I would be curious to see how competitors and improved models would compare.


stephc_int13

And how long will it take before an open model recreates this? The "vibe" consensus before "thinking" models really took off was that open models were ~6 months behind SotA. With the massive RL improvements over the past 6 months, I've thought the gap was actually widening. This will be a nice little verifiable test going forward.