Comment by lucianbr
7 hours ago
> Needless to say, a human accountant would never behave in these ways. In fact, we explicitly prompt against this behavior in no uncertain terms, but the instructions – and the entire spirit of the task – are lost in the interest of making forward progress. Claude and Grok keep trying until they find some way to get past the checks, even if it explicitly violates their instructions and the core goal.
I recently read about something similar here on HN. There, the model was making commits with problems such as failing tests, so the human added a pre-commit hook. The model then started editing the hook to make forward progress, so the hook was made read-only, and then the model tried to make it writable...
To me it feels like the model clearly does not understand what is happening, what the goal is, or whether it is really making progress towards that goal. And this lack of understanding is an actual problem. You can paper over it for a short while, but as here and in the other article, over a longer experiment it results in failure.
Seriously, watching Cursor (backed by Claude) go off the rails sometimes can be... frustrating. If it misses the intention behind a fix, it can spin out, and all of a sudden you have hundreds of lines of changes across 10 different files when you just wanted a simple find/replace on a single line. If you don't catch it spinning out and stop it immediately, you will be manually rejecting a bunch of files.