Comment by Retr0id
6 hours ago
This seems anecdotal but with extra words. I'm fairly sure this is just the "wow this is so much better than the previous-gen model" effect wearing off.
I've always been a believer in the "post-honeymoon new model phase" being a thing, but if you look at their analysis of how often the postEdit hooks fire, plus how Anthropic has started obfuscating thinking blocks, it seems fishy and not just vibes.
I was in this camp as well until recently; in the last 2-3 weeks I've been seeing problems that I wasn't seeing before, largely in line with the issues highlighted in the ticket (ownership dodging, hacky fixes, not finishing a task).
Nope, there is a categorical degradation in quality of output, especially with medium to high effort thinking tasks.
What about the evidence in the analysis?
You mean the Claude output? The same claude that has "regressed to the point it cannot be trusted"?
Are you saying the OP fabricated/hallucinated the evidence?
I suspect you might be right but I don't really know. Wouldn't these proposed regressions be trivial to confirm with benchmarks?