Comment by 8note

6 hours ago

I've noticed a strong degradation as it's started doing more skill-like things and writing more one-off Python scripts rather than using tools.

The agent has a set of well-tested scripts, but instead it chooses to write a new bespoke script every time it needs to do something. As a result it writes the same bugs over and over again, plus unique new bugs every time.

2 comments

SkyPuncher 6 hours ago

I'm going absolutely insane with this. Nearly all of my "agent engineering" effort is now figuring out how to keep Opus from YOLO'ing its own implementation of everything.

I've lost track of the number of times it's started a task by building its own tools. I remind it that it already has a tool for that exact task, and then it proceeds to build its own tools anyway.

This wasn't happening 2 months ago.

giwook 4 hours ago

Can you just tell it not to do that? Maybe you have to remind it every so often once context starts filling up.
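
For what it's worth, a rough sketch of what "remind it every so often" could look like if you control the message list yourself. Everything here is an assumption for illustration: the `with_periodic_reminder` helper, the `REMINDER` wording, the every-N-turns interval, and the plain `{"role": ..., "content": ...}` message dicts with string content. It isn't tied to any particular agent framework, and I can't say whether it would actually stop the behavior described above.

```python
# Hypothetical reminder text; adjust to name your real tools.
REMINDER = (
    "Reminder: use the existing, tested scripts and tools for this task. "
    "Do not write a new one-off script when a tool already covers it."
)


def with_periodic_reminder(messages, every_n_turns=5, reminder=REMINDER):
    """Return a new message list where every Nth user message has the
    reminder appended. The input list and its dicts are not modified."""
    out = []
    user_turns = 0
    for msg in messages:
        if msg["role"] == "user":
            user_turns += 1
            if user_turns % every_n_turns == 0:
                # Build a new dict so the caller's message is left untouched.
                msg = {**msg, "content": f"{msg['content']}\n\n{reminder}"}
        out.append(msg)
    return out
```

You would run the conversation history through this before each model call. Appending to an existing user message, rather than adding a separate message, keeps the user/assistant turns alternating.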