Comment by dudeinhawaii

6 months ago

I've noticed a degradation in Opus 4.5, also with Gemini-3-Pro. For me, it was a sudden rapid decline in adherence to specs in Claude Code. On an internal benchmark we developed, Gemini-3-Pro also dramatically declined. Going from being clearly beyond every other model (as benchmarks would lead you to believe) to being quite mediocre. Delivering mediocre results in chat queries and coding also missing the mark.

I didn't "try 100 times" so it's unclear if this is an unfortunate series of bad runs on Claude Code and Gemini CLI or actual regression.

I shouldn't have to benchmark this sort of thing but here we are.

1 comment

dudeinhawaii

acuozzo 6 months ago

Write your work order with phases (to a file) and, between each phase, give it a non-negotiable directive to re-read the entire work order file.

Claude-Code is terrible with context compaction. This solves that problem for me.