Comment by Terretta

5 hours ago

> Opus 4.6 which was bad for some reason

If I recall, that model had a couple of issues. One was being monkeyed with, for which they gave everyone credits.

The other feature/bug, depending on your POV, was being Anthropic's least personable release, not papering over everything with self-help-guru therapy language.

Opus 4.6 didn't LARP. It was more direct, less fussy, less discussy, and very much less "wait, one more thing" within a couple edits after embarking on what should have been the spec, than 4.7 or 4.8 are.

When in engineer-brain mode, working as you describe (good old-fashioned XP-style staff-engineer pair programming with a language-savvy mentee who is not yet full-stack or systems-wise), I found that the clearer I was about my goal and the better I could express it, the more often I'd get an expanded, clarified response. I could then iterate on it to steer toward ever tighter, cleaner, more specified responses, then let it go build the whole thing without agonizing and waffling.

The next two releases regressed on that dimension, wanting to figuratively "sit with" every decision and re-validate spiritual alignment along the way, no matter how clearly expressed.

Curiously to me, Fable seemed to hit the best of both worlds. I had my highest commit rate per turn with Fable, approaching 73%, where usually under 17% of the LOC written is good enough to commit and it takes 9 to 11 turns to get the code where I'm comfortable with it.

Thanks to this, Fable cost more, but actually cost less, if that makes sense.
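To make that arithmetic concrete, here is a minimal sketch. The commit rates (73% and 17%) are the ones quoted above; the prices are made-up relative units, and the 3x multiplier for Fable is purely an assumption for illustration, not a real price. The helper name `cost_per_committed_unit` is hypothetical as well.

```python
def cost_per_committed_unit(price_per_unit: float, commit_rate: float) -> float:
    """Effective cost of one committed unit of output, given the
    fraction of generated output that is good enough to commit."""
    if not 0 < commit_rate <= 1:
        raise ValueError("commit_rate must be in (0, 1]")
    return price_per_unit / commit_rate


# Commit rates come from the comment; prices are hypothetical relative units.
BASELINE_PRICE = 1.0  # assumption: baseline model costs 1 unit per unit of output
FABLE_PRICE = 3.0     # assumption: Fable costs 3x as much per unit of output

baseline = cost_per_committed_unit(BASELINE_PRICE, 0.17)
fable = cost_per_committed_unit(FABLE_PRICE, 0.73)

# Price multiple at which both models cost the same per committed unit.
break_even_ratio = 0.73 / 0.17

print(f"baseline: {baseline:.2f}  fable: {fable:.2f}  break-even: {break_even_ratio:.2f}x")
```

Under these assumed prices the pricier model still comes out cheaper per committed unit, and it would stay cheaper up to roughly a 4.3x price multiple. This ignores the human time spent on the extra turns, which would likely tilt things further.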

Arguably, Fable and 4.6 played more outcome-correctness-oriented than journey-experience-oriented. It's easy to see how the opposite could happen with reinforcement learning from human feedback if not all judges are staff or principal engineer level, or if constitution values are more Portlandia than Finlandia.

ANTHROP\C needs to balance these at the constitution level:

“We will work in a humane and thoughtful way, but production is the final judge. We will listen to people, but we will not let discussion replace decision. We will value craft, but not at the expense of usefulness. We will move fast, but not by hiding risk. We will measure outcomes, but not pretend that everything important is easy to measure.”

1 comment


lukaslalinsky 15 minutes ago

I considered Opus 4.5 to be the peak for a while. Opus 4.6 tended to overthink, and generally got lost in thinking. I would ask something and Claude Code would just spin for 15 minutes. It was not the harness: if I switched the model to 4.5, it was fine again. So I skipped the following releases. I've been working with Opus 4.8 for the last few weeks, and while I don't like how talkative it is, it is fine to work with interactively. I also used Fable for the few days it was available, and indeed, that was a model worth using for my use case. To the point, but still very interactive.