Comment by jclardy

15 hours ago

Just anecdotal, but I was using Claude Code for everything a few months ago, and it seemed great. Now, it is making a ton of mistakes, doing the wrong thing, misunderstanding context, and just generally being unusable.

I have now been using Codex and everything has been great (I still swap back and forth, but generally just to check things out).

My theory is just that the models are great after release to get people switching, then they cut them back in capabilities slowly over time until the next major release to increase the hype cycle.

Is it the models themselves or the tools around them? There's that patch[1] that floats around for Claude Code that's supposed to solve a lot of these problems by adjusting its tool-level prompts. Also, if it were the models themselves, wouldn't Cursor users have the same complaints (do they? I haven't heard anything but the only Cursor users I talk to are coworkers)?

I think it's more likely they're trying to optimize the Claude Code prompts to reduce load on their system and have overcorrected at the cost of quality.

1: https://gist.github.com/roman01la/483d1db15043018096ac3babf5...

People keep saying this, but I’m not sure I buy it.

I was using both Codex and Claude Code heavily on some projects this weekend.

In one project Codex was screwing everything up and in another one absolutely killing it. I’ve seen the same from Claude.

In the bad Codex example it had the wrong idea and kept trying to figure out how to accomplish the same thing no matter how many times I attempted to correct it. Undoing the recent changes where it went down the wrong path was the only way to get things back on track.

I wonder if context poisoning is a bigger problem than people realize.

Yeah, shorter time frame but I've been noticing that too. Just the other day I was experimenting with some workflow stuff. "Do x and y and run tests and then merge into develop."

Duly runs, and finishes. "All merged into develop".

I do some other work, don't see any of this, and double-check myself: I'm working off of develop.

"Hey, where is this work?"

"It is in this branch and this worktree, as you would expect, you will need to merge into develop."

"I'm confused, I asked you to do that and you said it was done."

"You're right and I did say that but I didn't do it. Shall I do it now?"

There's this really weird balancing act between managing usage and making people burn more tokens...