Comment by prodigycorp
2 days ago
This article was more true than not a year ago, but the harnesses are now so far past the simple agent loop that I'd argue this is not even close to an accurate mental model of what Claude Code is doing.
Obviously modern harnesses have better features but I wouldn't say it invalidates the mental model. Simpler agents aren't that far behind in performance if the underlying model is the same, including very minimal ones with basic tools.
I'd say it's similar to how a "make your own relational DB" article might feature a basic B-tree with merge-joins. Yeah, obviously real engines have sophisticated planners, multiple join methods, bloom filters, etc., but the underlying mental model is still accurate.
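For what it's worth, the core loop the article describes really is tiny. Here's a minimal sketch, assuming the Anthropic Python SDK; the `run_tool` dispatcher, the tool schemas, and the model name are all placeholders, not anything Claude Code actually ships:

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

def run_tool(name: str, args: dict) -> str:
    # Hypothetical dispatcher: map tool names (bash, read_file, edit_file, ...)
    # to local implementations and return their output as a string.
    raise NotImplementedError

def agent_loop(task: str, tools: list):
    messages = [{"role": "user", "content": task}]
    while True:
        response = client.messages.create(
            model="claude-opus-4-5",  # placeholder model name
            max_tokens=4096,
            tools=tools,
            messages=messages,
        )
        messages.append({"role": "assistant", "content": response.content})
        if response.stop_reason != "tool_use":
            return response  # no more tool calls: the model is done
        # Execute every requested tool and feed the results back in.
        tool_results = [
            {
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": run_tool(block.name, block.input),
            }
            for block in response.content
            if block.type == "tool_use"
        ]
        messages.append({"role": "user", "content": tool_results})
```

Everything a modern harness adds (planning, subagents, context management) wraps around a loop like this rather than replacing it.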
You’re not wrong but I still think that the harness matters a lot when trying to accurately describe Claude Code.
Here’s a reframing:
If you asked people, “What would you rather work with: today’s Claude Code harness with Sonnet 3.7, or the 200-line agentic loop in the article with Opus 4.5?”
I suspect many people would choose 3.7 with the harness. And if that is true, then I’d say the article is no longer useful for a modern understanding of Claude Code.
I don't think so, model improvements far outweigh any harness or tooling.
Look at https://github.com/SWE-agent/mini-swe-agent for proof
3 replies →
Any person who would choose 3.7 with a fancy harness has a very poor memory about how dramatically the model capabilities have improved between then and now.
4 replies →
This is SO wrong.
I actually wrote my own simple agent (with some twists) in part so I could compare models.
Opus 4.5 is in a completely different league to Sonnet 4.5, and 3.7 isn't even on the same planet.
I happily use my agent with Opus but there is no world in which I'd use a Sonnet 3.7 level model for anything beyond simple code completion.
But does that extra complexity actually improve performance?
https://www.tbench.ai/leaderboard/terminal-bench/2.0 says yes, but not as much as you'd think. "Terminus" is basically just a tmux session and LLM in a loop.
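That pattern really is just a few moving parts. A rough sketch of the idea, assuming an existing tmux session named "term" and glossing over how a real harness like Terminus waits for commands to finish (the prompt and model name here are made up):

```python
import subprocess
import time
import anthropic

client = anthropic.Anthropic()

def tmux(*args: str) -> str:
    # Run a tmux subcommand and return its stdout.
    return subprocess.run(["tmux", *args], capture_output=True, text=True).stdout

def terminal_agent(task: str, max_steps: int = 50) -> None:
    history = [{"role": "user", "content":
                f"Task: {task}\nReply with exactly one shell command per turn, or DONE."}]
    for _ in range(max_steps):
        reply = client.messages.create(
            model="claude-sonnet-4-5",  # placeholder model name
            max_tokens=1024,
            messages=history,
        ).content[0].text
        if reply.strip() == "DONE":
            break
        # Type the command into the pane and press Enter...
        tmux("send-keys", "-t", "term", reply.strip(), "Enter")
        time.sleep(2)  # crude; a real harness polls for the command to finish
        # ...then show the model what the terminal looks like now.
        screen = tmux("capture-pane", "-t", "term", "-p")
        history += [{"role": "assistant", "content": reply},
                    {"role": "user", "content": screen}]
```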
I'm not a good representative for Claude Code because I'm primarily a Codex user now, but I know that if Codex had subagents it would be at least twice as productive. Time spent is an important aspect of performance, so yup, the complexity improved performance.
Not necessarily true. Subagents allow for parallelization but they can decrease accuracy dramatically if you're not careful because there are often dependencies between tasks and swapping context windows with a summary is extremely lossy.
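To make the lossiness concrete: a subagent is essentially a fresh agent loop whose entire run gets compressed into one message for the parent. A sketch, reusing the hypothetical agent_loop from the earlier comment:

```python
def spawn_subagent(subtask: str, tools: list) -> str:
    # Fresh context: none of the parent's conversation, discoveries, or
    # constraints carry over unless they're spelled out in `subtask` itself.
    result = agent_loop(subtask, tools)
    # The parent sees only this final text. Everything the subagent read or
    # learned along the way is thrown out, which is where the accuracy risk
    # from dependencies between tasks comes in.
    return result.content[-1].text
```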
For the longest time, Claude Code itself didn't really use subagents much by default, beyond supporting them as a feature eager users could configure. (Source: reverse engineering we did on Claude Code using the fantastic CC tracing tool Simon Willison once wrote about. This is also no longer true on the latest versions, which have e.g. an Explore subagent that is actively used.)
1 reply →
Are subagents a fundamental change, or just acting as inner loops to the agentic loop similar to the one in the article?
1 reply →
The article was also published one year ago, in January 2025.
(Should have 2025 in the title? Time flies)
Claude Code didn't exist in January 2025. I think it's a typo and should be 2026.
You’re right. No wonder the date felt odd. IIRC Claude Code was released around March.
1 reply →
Less true than you think. A lot of the progress in the last year has come from tightening agentic prompts/tools and getting out of the way so the model can flex. Subagents/MCP/Skills are all pretty mid, and while there has been some context-pruning optimization to avoid carrying tool output along forever, that mainly benefits long-running agents; for short tasks you won't notice it.
All of these things you mentioned are put into a footnote of the article.
It seems to have changed a ton in recent versions too. I would love more details on what exactly has changed.
I now frequently find it doing things that, in the past, I had to interrupt it and tell it to do.
For one thing, it seems to be splitting up the work and making some determination of complexity, then allocating each piece to a model based on that complexity to save resources. When I run Claude with Opus 4.5 and run /cost, I see tokens for Opus 4.5, but also a lot in Sonnet and Haiku, with the majority of tokens actually being used by Haiku.
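If that is what's happening, the shape would be something like this hypothetical router (pure speculation about CC internals; the function, its thresholds, and the scoring are all made up):

```python
def route_model(subtask_complexity: float) -> str:
    # Hypothetical thresholds: cheap model for trivial steps, mid-tier for
    # routine edits, the big model only for hard reasoning.
    if subtask_complexity < 0.3:
        return "claude-haiku-4-5"
    if subtask_complexity < 0.7:
        return "claude-sonnet-4-5"
    return "claude-opus-4-5"
```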
Haiku is called often, but not always in the way you'd think. E.g. every time you write something, CC invokes Haiku multiple times to generate the 'delightful 1-2 word phrase used to indicate progress to the user' (Doing Stuff, Wizarding, etc.).
1 reply →
Agreed. You can get a better mental model by reading the codex-cli repo and having an agent help you analyze the core functionality.
I'm interested, could you expand on that?
Off the top of my head: parallel subagents, hooks, skills, and a much better plan mode. These features enable way better steering than we had last year. Subagents are a huge boon to productivity.
Are subagents just tools that are agents themselves?
1 reply →