Comment by Jarwain

1 year ago

Aider's benchmarks show that 4.1 (and 4o) work better in its architect mode for planning the changes, with o3 making the actual edits.


Jarwain

You have that backwards. The leaderboard results have the thinking model as the architect.

In this case, o3 is the architect and 4.1 is the editor.

I see o3 (high) + gpt-4.1 at 82.7%, currently the highest score on the benchmark.
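To make the roles concrete, here is a rough sketch of the two-stage split: the architect plans the change, and the editor gets that plan and writes the edits. This is not aider's actual code. `call_model`, the prompts, and the `ArchitectEditorPair` class are all hypothetical stand-ins, and the fake model only records which model was asked.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical model client: takes (model_name, prompt) and returns the reply text.
ModelFn = Callable[[str, str], str]


@dataclass
class ArchitectEditorPair:
    architect: str  # reasoning model that plans the change (o3 in the top result)
    editor: str     # model that turns the plan into concrete edits (gpt-4.1)

    def run(self, call_model: ModelFn, request: str, source: str) -> str:
        # Stage 1: the architect proposes how to solve the request.
        plan = call_model(
            self.architect,
            f"Describe how to change this code.\nRequest: {request}\n\n{source}",
        )
        # Stage 2: the editor receives the plan and returns the edited code.
        return call_model(
            self.editor,
            f"Apply this plan and return the full edited code.\nPlan: {plan}\n\n{source}",
        )


if __name__ == "__main__":
    calls = []

    def fake_model(model: str, prompt: str) -> str:
        # Stand-in for a real API call; records which model was asked.
        calls.append(model)
        return f"<{model} reply>"

    pair = ArchitectEditorPair(architect="o3", editor="gpt-4.1")
    result = pair.run(fake_model, "rename foo to bar", "def foo(): pass")
    print(calls)   # ['o3', 'gpt-4.1']
    print(result)  # <gpt-4.1 reply>
```

The point is only the ordering: the thinking model goes first and never touches the files directly, and the editor's output is what gets applied.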