Slacker News Slacker News logo featuring a lazy sloth with a folded newspaper hat
  • top
  • new
  • show
  • ask
  • jobs
Library
← Back to context

Comment by Jarwain

2 months ago

Aider's benchmarks show 4.1 (and 4o) work better in its architect mode, for planning the changes, and o3 for making the actual edits

2 comments

Jarwain

Reply

SparkyMcUnicorn  2 months ago

You have that backwards. The leaderboard results have the thinking model as the architect.

In this case, o3 is the architect and 4.1 is the editor.

drewnick  1 month ago

I see o3 (high) + gpt-4.1 at 82.7% -- the highest on the benchmark currently.

Slacker News

Product

  • API Reference
  • Hacker News RSS
  • Source on GitHub

Community

  • Support Ukraine
  • Equal Justice Initiative
  • GiveWell Charities