Comment by jkelleyrtp

5 hours ago

Claude's SWE-bench score is 80.8 and Codex's is 56.8

Seems like 4.6 is still all-around better?

It's SWE-bench Pro, not SWE-bench Verified. The Verified benchmark has stagnated.