Comment by throwatdem12311

9 hours ago

I spent more than half my day yesterday telling Claude to correct itself because it did things I explicitly told it not to do in my prompt.

“You’re right - I overstepped”

Is the new “You’re absolutely right”.

I don’t know if we can qualify something that actively goes against the explicit instructions you give it as “something great”. It just sounds like Dario is building snake oil and selling it too.

I have a script at work that writes out some config files, and I'm having Claude run it after making changes. If the script detects breaking changes, it spits out a message saying what they are and does nothing else, telling you to rerun it with the override flag after validation.
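For anyone who hasn't seen this pattern, here is a rough sketch of a script that behaves this way. It is not the actual work script: the function names, the `--override` spelling, the sample data, the definition of "breaking" (a removed key or a changed type) and the nonzero exit code are all assumptions made for illustration.

```python
import argparse


def detect_breaking_changes(old: dict, new: dict) -> list:
    """Describe each key that was removed or whose value changed type.

    What counts as "breaking" here is an assumption for the sketch.
    """
    problems = []
    for key, old_value in old.items():
        if key not in new:
            problems.append(f"removed key: {key}")
        elif type(new[key]) is not type(old_value):
            problems.append(f"type changed for key: {key}")
    return problems


def regenerate(old: dict, new: dict, override: bool = False):
    """Return (written, messages).

    When breaking changes are found and override is False, nothing is
    written and the messages explain how to proceed.
    """
    problems = detect_breaking_changes(old, new)
    if problems and not override:
        messages = ["BREAKING CHANGES DETECTED, nothing was written:"]
        messages += [f"  - {p}" for p in problems]
        messages.append("Validate the changes, then rerun with --override.")
        return False, messages
    # A real script would write the config files at this point.
    return True, ["config files regenerated"]


def main(argv: list) -> int:
    """Parse the (hypothetical) --override flag and return an exit code."""
    parser = argparse.ArgumentParser(description="Regenerate config files")
    parser.add_argument("--override", action="store_true",
                        help="write files even if breaking changes are found")
    args = parser.parse_args(argv)

    # Hypothetical sample data standing in for the old and new config.
    old = {"port": 8080, "host": "localhost"}
    new = {"port": "8080"}

    written, messages = regenerate(old, new, override=args.override)
    print("\n".join(messages))
    # Assumption: a nonzero exit code marks the "refused to write" case.
    return 0 if written else 2
```

Calling `main([])` prints the breaking-change message and returns 2, while `main(["--override"])` reports the files as regenerated and returns 0. In this sketch the refusal shows up in the exit code as well as in the printed text.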

If I don't tell Claude about this behavior, it ignores the script output and lies about passing the tests that validate whether the config files were regenerated.

So I added to my prompt instructions to observe it, and if it sees that message, double check its work and then inform me and ask what to do before proceeding.

This has had the net result of Claude either running the script with the override flag from the get-go (explicitly forbidden) or seeing the message, convincing itself that the override is warranted, and running it a second time with the override flag. It has never once stopped to ask me what to do as instructed.

This is one of a few reasons I strongly prefer GPT and its Codex variants. It seldom frustrates me. Sure, it's not omnipotent in any way, but it just feels very "tuned in" when it comes to understanding intent and scope.

Imagine a worker who went through a loop of "you're absolutely right -> same fuckup again" multiple days every week, wasting the time of whoever gave them the task.

They'd be out of the company after a week.

  • Such workers exist. AI is cheaper and faster than such workers, though, so management might still like them. Ugh.

  • I do want to fire Claude at this point and switch to Codex. Unfortunately the guy with the purse strings is ride or die full Claude psychosis and our business can’t afford to just buy anything and everything for funsies.

  • That depends on the company. I worked at an S&P 500 company that muddled along like this. They still make critical software for local and state governments.