Comment by StanAngeloff

8 hours ago

(Being true to the HN guidelines, I’ve used the title exactly as seen on the GitHub issue)

I was wondering if anyone else is also experiencing this? I have personally found that I have to add more and more CLAUDE.md guardrails, and my CLAUDE.md files have been exploding since around mid-March, to the point where I actually started looking online for information and for other people corroborating my personal observations.

This GH issue report sounds very plausible, but as with anything AI-generated (the issue itself appears to be largely AI-assisted) it’s kind of hard to know for sure whether it is accurate or completely made up. _Correlation does not imply causation_ and all that. Speaking personally, the findings match my own circumstances, where I’ve seen noticeable degradation in Opus outputs and thinking.

EDIT: The Claude Code Opus 4.6 Performance Tracker[1] is reporting Nominal.

[1]: https://marginlab.ai/trackers/claude-code/

What I've noticed is that whenever Claude says something like "the simplest fix is..." it's usually suggesting some horrible hack. And whenever I see that I go straight to the code it wants to write and challenge it.

  • That is the kind of thing that I've been fighting by being super explicit in CLAUDE.md. For whatever reason, instead of being much more thorough and making sure that files are being changed only after fully understanding the scope of the change (behaviour prior to Feb/Mar), Claude would just jump to the easiest fix now, with no backwards compatibility thinking and to hell with all existing tests. What is even worse is I've seen it try and edit files before even reading them on a couple of occasions, which is a big red flag. (/effort max)

    Another thing that worked like magic prior to Feb/Mar was how likely Claude was to load a skill whenever it deduced that a skill might be useful. I personally use [superpowers][1] a lot, and I've noticed that I have to be very explicit when I want a specific skill to be used - to the point that I have to reference the skill by name.

    [1]: https://github.com/obra/superpowers

    • I did not use the previous version of Opus to notice the difference, but Sonnet 4.6 seems optimized to output the shortest possible answer. Usually it starts with a hack and if you challenge it, it will instead apologize and say to look at a previous answer with the smallest code snippet it can provide. Agentic work isn't necessarily worse, but ideating and exploring are awful compared to 4.5.

    • Superpowers, Serena, Context7 feel like required plugins to me. Serena in particular feels like a secret weapon sometimes. But superpowers (with "brainstorm" keyword) might be the thing that helps people complaining about quality issues.

  • lol this one time Claude showed me two options for an implementation of a new feature on an existing project, one JavaScript client side and the other Python server side.

    I told it to implement the server-side one and it said ok. I tabbed away for a while and came back to find the JS implementation. Checking the log, Claude had said “on second thought I think I’ll do the client side version instead”.

    Rarely do I throw an expletive bomb at Claude - this was one such time.

    • Using superpowers in brainstorm mode like the parent suggested would have resulted in a plan markdown and a spec markdown for the subagents to follow.

  • This prompt is actually in the Claude CLI. It says something like "implement the simplest solution, don't over-abstract". On my phone, but I saw an article mention this in the leak analysis.

If that tracker is using paid tokens, as opposed to the regular subscription, then there's no financial incentive for Anthropic to degrade the thinking on those requests, so the tracker's benchmark likely would not be affected by the cost-cutting measures that regular users face.

Also, it's probably very easy to spot such benchmarks and lock in full thinking just for them. Some ISPs do the same, where your internet speed magically resets to normal as soon as you open speedtest.net ...

I haven't noticed any changes but my stuff isn't that complex. People are saying they quantized Opus because they're training the next model. No idea if that's true... It's certainly impacting my decision to upgrade to Max though. I don't want to pay for Opus and get an inferior version.

  • I haven't noticed any changes either, but I noticed that Opus 4.6 is now offered as part of Perplexity Enterprise Pro instead of Max, so I'm guessing another model is on the horizon.

    • I just finished reading the full analysis on GitHub.

      > When thinking is deep, the model resolves contradictions internally before producing output.

      > When thinking is shallow, contradictions surface in the output as visible self-corrections: "oh wait", "actually,", "let me reconsider", "hmm, actually", "no wait."

      Yeah, THIS is something that I've seen happen a lot. Sometimes even on Opus with max effort.
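
      For anyone who wants to check their own transcripts for this, here is a rough sketch that counts the phrases quoted above. The marker list is copied from the quote; the function name and the sample string are made up for illustration, and this is not the method used in the GitHub analysis.

```python
import re
from collections import Counter

# Phrases quoted in the analysis as visible self-corrections.
MARKERS = ["oh wait", "actually,", "let me reconsider", "hmm, actually", "no wait"]

# Try longer phrases first so "hmm, actually" is counted as one marker
# instead of also being picked up as a bare "actually,".
_PATTERN = re.compile(
    "|".join(re.escape(m) for m in sorted(MARKERS, key=len, reverse=True)),
    re.IGNORECASE,
)


def count_self_corrections(text: str) -> Counter:
    """Count non-overlapping, case-insensitive marker occurrences in text."""
    return Counter(match.lower() for match in _PATTERN.findall(text))


# Hypothetical sample; replace with the text of a real session log.
sample = "Hmm, actually, no wait. Let me reconsider the approach."
counts = count_self_corrections(sample)
print(sum(counts.values()), dict(counts))
```

      Plain substring matching will also flag ordinary uses of "actually," in prose, so treat the counts as a rough signal to compare between sessions, not as a measurement.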

Cannot say I've noticed, but I run virtually everything through plan mode and a few back and forth rounds of that for anything moderately complex, so that could be helping.

  • I used to one-shot design plans early in the year, but lately it is taking several iterations just to get the design plan right. Claude would frequently forget to update back references, and it would not keep the plan up to date with the evolving conversation. I have had to run several review loops on the design spec before I can move on to implementation because it has gotten so bad. At one point, I thought it was the actual superpowers plugin that got auto-updated and self-nerfed, but there weren't any updates on my end anyway. Shrug.