Comment by throwaway2037
10 hours ago
Spot on. This is excellent analysis.
I was also bothered by this:
> Until recently, I was rather skeptical of agentic code. February 2026, however, has been a sort of inflection point even stubborn developers like myself can’t ignore.
"February 2026" is just way to specific. It feels like a PR/marketing team wrote it. It acts like a jump scare in the post for any normie programmer.
Perhaps it's specific because it's Opus 4.6, released February 5th.
https://www.anthropic.com/news/claude-opus-4-6
Opus 4.5 to 4.6 was pretty incremental, I didn't see much of a difference.
The big coding model moments in recent recollection, IMO, were something like:
- Sonnet 3.5 update in October 2024: ability to generate actually-working code using context from a codebase became genuinely feasible.
- Claude 4 release in May 2025: big tool calling improvements meant that agentic editors like Claude Code could operate on a noticeably longer leash without falling apart.
- Gemini 3 Pro, Claude 4.5, GPT 5.2 in Nov/Dec 2025: with some caveats these were a pretty major jump in the difficulty and scale of tasks that coding assistants are able to handle, working on much more complex projects over longer time scales without supervision, and testing their own work effectively.
Maybe they're like me and didn't spend a lot of time investigating Claude until 4.6 launched, when the hype was enough of a tipping point to invest the energy. I do know that I've been having good/great results with Opus 4.6 and the CLI, but after an hour or so it'll suddenly forget that the codebase has tab-formatted files and burn up my quota trying to figure out how to read text files. And apparently this snafu has been around since at least late last year [0]. Again, I can't complain about the overall speed and quality for my relatively light projects; I'm just fascinated by people who say their agents can get through a whole weekend without supervision, when even 4.6 appears to randomly get tripped up in a very rookie way.
[0] https://github.com/anthropics/claude-code/issues/11447
This is also supported by the Opus degradation tracker [1]. The dotted line is when they switched from Opus 4.5 to 4.6. There's no statistically significant difference on the tested benchmark.
1: https://marginlab.ai/trackers/claude-code-historical-perform...
4.5 is a big jump, but there’s no way 4.5 to 4.6 is what convinced this person.
I feel like 4.6 is worse than 4.5 lol