Comment by jkelleyrtp

6 hours ago

Claude's SWE-bench score is 80.8 and Codex's is 56.8.

Seems like 4.6 is still all-around better?

It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.