← Back to context

Comment by yosefk

3 days ago

I am very impressed with the kind of things people pull out of Claude's жопа but can't see such opportunities in my own work. Is success mostly the result of it being able to test its output reliably, and of how easy it is to set up the environment for this testing?

> Is success mostly the result of it being able to test its output reliably, and of how easy it is to set up the environment for this testing?

I won't say so. From my experience the key to success is the ability to split big tasks into smaller ones and help the model with solutions when it's stuck.

Reproducible environments (Nix) help a lot, yes, same for sound testing strategies. But the ability to plan is the key.

  • One other thing I've observed is that Claude fares much better in a well engineered pre-existing codebase. It adopts to most of the style and has plenty of "positive" examples to follow. It also benefits from the existing test infrastructure. It will still tend to go in infinite loops or introduce bugs and then oscillate between them, but I've found it to be scarily efficient at implement medium sized features in complicated codebases.

    • Yes, that too, but this particular project was an ancient C++ codebase with extremely tight coupling, manual memory management and very little abstraction.

Claude will also tend to go for the "test-passing" development style where it gets super fixated on making the tests pass with no regards to how the features will work with whatever is intended to be built later.

I had to throw away a couple days worth of work because the code it built to pass the tests wasn't able to do the actual thing it was designed for and the only workaround was to go back and build it correctly while, ironically, still keeping the same tests.

You kind of have to keep it on a short leash but it'll get there in the end... hopefully.