Comment by pshirshov

3 days ago

> Is success mostly the result of it being able to test its output reliably, and of how easy it is to set up the environment for this testing?

I wouldn't say so. In my experience, the key to success is the ability to split big tasks into smaller ones and to help the model with solutions when it's stuck.

Reproducible environments (Nix) help a lot, yes, as do sound testing strategies. But the ability to plan is the key.

One other thing I've observed is that Claude fares much better in a well-engineered pre-existing codebase. It adapts to most of the style and has plenty of "positive" examples to follow. It also benefits from the existing test infrastructure. It will still tend to go into infinite loops or introduce bugs and then oscillate between them, but I've found it to be scarily efficient at implementing medium-sized features in complicated codebases.

  • Yes, that too, but this particular project was an ancient C++ codebase with extremely tight coupling, manual memory management and very little abstraction.