Comment by matheusmoreira

21 hours ago

I certainly noticed a significant drop in reasoning power at some point after I subscribed to Claude. Since then I've applied all sorts of fixes that range from disabling adaptive thinking to maxing out thinking tokens to patching system prompts with an ad-hoc shell script from a gist. Even after all this, Opus will still sometimes go round and round in illogical circles, self-correcting constantly with the telltale "no wait" and undoing everything until it ends up right where it started with nothing to show for it after 100k tokens spent.

Whether it's due to bugs or actual malice, it's not a good look. I genuinely can't tell if it's buggy, if it's been intentionally degraded, if it's placebo or if it's all just an elaborate OpenAI psyop.

beering 14 hours ago

The real question I see nobody asking is how GPT-5.4 beats Opus at a fraction of the price. I doubt it’s only a question of subsidization. My impression from the past is that GPT-5 was around a Sonnet-sized model, and 5-mini was Haiku-sized. At least on my codebase anyways, Codex one-shots tricky things that Opus needs several tries to fully get right.

alphabettsy 1 hour ago

IMO it doesn’t handily beat it.
It’s typically equivalent, sometimes better, sometimes behind: better at following a well-defined plan, less good at concept exploration and planning.
At 1M context it’s basically the same price.

matheusmoreira 13 hours ago

I wanted to choose Anthropic because they were apparently more ethical compared to OpenAI, but... Yeah.
Right now the only blocker for me is the lack of Linux support.

trollbridge 11 hours ago

Cursor?

babaganoosh89 21 hours ago

There's a github issue for this: https://github.com/anthropics/claude-code/issues/42796

watt 11 hours ago

That issue is now closed, probably as "not planned".

matheusmoreira 21 hours ago

Yes, I commented on it and applied all remedies suggested.
https://news.ycombinator.com/item?id=47664442
Configuration and environment variables seem to have improved things somewhat, but it's still hit or miss.