Comment by jpau
2 months ago
Interesting!
Is there anything to read into needing twice the "Avg Attempts", or is this column relatively uninteresting in the overall context of the bench?
2 months ago
No, it's definitely interesting. It suggests that Opus 4 failed to write proper syntax on the first attempt, but given feedback it absolutely nailed the second attempt. My takeaway is that this is great for peer-coding workflows: less "FIX IT CLAUDE".