Comment by isaacdl

1 month ago

Anywhere we can read more about what a "harness issue" means? What was the impact of it?

6 comments

isaacdl

One thing that could be a strong degradation, especially for benchmarks, is that they switched the default "Exit Plan" mode option from:

    "Proceed"

to:

    "Clear Context and Proceed"

It's rare you'd want to do that unless you're actually near the context window after planning.

I pressed it accidentally once, and it managed to forget one of the clarifying questions it asked me because it hadn't properly written that to the plan file.

If you're running in yolo mode (--dangerously-skip-permissions) then it wouldn't surprise me to see many tasks suddenly do a lot worse.

Even in the best case, you've just used a ton of tokens searching your codebase, and it then has to repeat all that to implement because it's been cleared.

I'd like to see the option of:

    "Compact and proceed"

because that would be useful, but just proceed should still be the default imo.
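
To make the difference between the three options concrete, here is a toy sketch in Python. It is not how Claude Code implements any of this: `Session`, `proceed`, `clear_and_proceed`, and `compact_and_proceed` are names made up for illustration, and real compaction would presumably have the model write a summary rather than insert a placeholder line. It only shows which information survives into the implementation step under each choice.

```python
from dataclasses import dataclass, field


@dataclass
class Session:
    """Toy model of an agent session: chat history plus a plan file (hypothetical)."""
    history: list[str] = field(default_factory=list)
    plan_file: list[str] = field(default_factory=list)


def proceed(s: Session) -> list[str]:
    # Keep everything: implementation sees the full history and the plan.
    return s.history + s.plan_file


def clear_and_proceed(s: Session) -> list[str]:
    # Drop the history: only what was written to the plan file survives.
    return list(s.plan_file)


def compact_and_proceed(s: Session, keep_last: int = 2) -> list[str]:
    # Assumed compaction: older messages collapse into one placeholder line,
    # the most recent `keep_last` messages are kept verbatim.
    split = max(len(s.history) - keep_last, 0)
    older, recent = s.history[:split], s.history[split:]
    summary = [f"[summary of {len(older)} earlier messages]"] if older else []
    return summary + recent + s.plan_file


session = Session(
    history=[
        "searched repo for auth code",
        "agent asked: JWT or sessions?",
        "user answered: JWT",
    ],
    plan_file=["1. add auth middleware"],  # the answer never made it into the plan
)

for option in (proceed, clear_and_proceed, compact_and_proceed):
    context = option(session)
    print(option.__name__, "keeps the answer:", "user answered: JWT" in context)
```

In this toy run only the clear option loses the clarifying answer, because it was never written to the plan file.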

samusiam 25 days ago
I disagree that this was the issue, or that it's "rare that you'd want to do that unless you're near the context window". Clearing context after writing a plan, before starting implementation of said plan, is common practice (probably standard practice) with spec driven development. If the plan is adequate, then compaction would be redundant.
- xnorswap 25 days ago
  
  For a 2M+ LOC codebase, the plans alone are never adequate. They miss nuance that the agent will only have to rediscover when it comes to operate on them.
  For spec driven development (which I do for larger issues), this badly affects the plan to generate the spec, not the spec itself.
  I'll typically put it in plan mode, and ask it to generate documentation about an issue or feature request.
  When it comes to write the output to the .typ file, it does much, much worse if it has a cleared context and a plan file than if it has its full context.
  The previous "thought" is typically, "I know what to write now, let me exit plan mode".
  Clearing context on exiting that plan mode is a disaster: it leaves you much worse off, with skeletal documentation and specs compared to letting it flow.
  A new context to then actually implement the documented spec is not so bad, although I'd still rather compact.
plexicle 25 days ago

"It's rare you'd want to do that unless you're actually near the context window after planning."
Highly disagree. It's rare you WOULDN'T want to do this. This was a good change, and a lot of us were doing this anyway, but just manually.
Getting the plan together and then starting fresh will almost always produce better results.
rubslopes 25 days ago

Not disagreeing with you, but FYI you can roll back to the conversation before the 'clear context and proceed' with 'claude --resume'.

airstrike 1 month ago

Pretty sure they mean the issue is in the agentic loop and related tool calling, not in the model itself

In other words, it was the Claude Code _app_ that was busted