One change that could cause a real degradation, especially on benchmarks, is that they switched the default "Exit Plan" option from:
"Proceed"
to
"Clear Context and Proceed"
It's rare you'd want to do that unless you're actually near the context window limit after planning.
I pressed it accidentally once, and it managed to forget one of the clarifying questions it asked me because it hadn't properly written that to the plan file.
If you're running in yolo mode (--dangerously-skip-permissions) then it wouldn't surprise me to see many tasks suddenly do a lot worse.
Even in the best case, you've just used a ton of tokens searching your codebase, and it then has to repeat all that to implement because it's been cleared.
I'd like to see the option of:
"Compact and proceed"
because that would be useful, but plain "Proceed" should still be the default imo.
Pretty sure they mean the issue is in the agentic loop and related tool calling, not in the model itself.
In other words, it was the Claude Code _app_ that was busted.