Comment by Retr0id
6 hours ago
This seems anecdotal but with extra words. I'm fairly sure this is just the "wow this is so much better than the previous-gen model" effect wearing off.
I've always been a believer in the "post-honeymoon new model phase" being a thing, but if you look at their analysis of how often the postEdit hooks fire, plus how Anthropic has started obfuscating thinking blocks, it seems fishy and not just vibes.
I was in this camp as well until recently; in the last 2-3 weeks I've been seeing problems that I wasn't seeing before, largely in line with the issues highlighted in the ticket (ownership dodging, hacky fixes, not finishing a task).
Nope, there is a categorical degradation in quality of output, especially with medium to high effort thinking tasks.
What about the evidence in the analysis?
You mean the Claude output? The same claude that has "regressed to the point it cannot be trusted"?
Are you saying the OP fabricated/hallucinated the evidence?
I suspect you might be right but I don't really know. Wouldn't these proposed regressions be trivial to confirm with benchmarks?