Comment by Retr0id

6 hours ago

This seems anecdotal but with extra words. I'm fairly sure this is just the "wow this is so much better than the previous-gen model" effect wearing off.

I've always been a believer in the "post-honeymoon new model phase" being a thing, but if you look at their analysis of how often the postEdit hooks fire, plus how Anthropic has started obfuscating thinking blocks, it seems fishy and not just vibes.

  • I was in this camp as well until recently. In the last 2-3 weeks I've been seeing problems I wasn't seeing before, largely in line with the issues highlighted in the ticket (ownership dodging, hacky fixes, not finishing tasks).

Nope, there is a categorical degradation in output quality, especially on medium- to high-effort thinking tasks.

I suspect you might be right, but I don't really know. Wouldn't these proposed regressions be trivial to confirm with benchmarks?