Comment by simonw
8 hours ago
I work in Python which helps a lot because there are a TON of good examples of pytest tests floating around in the training data, including things like usage of fixture libraries for mocking external HTTP APIs and snapshot testing and other neat patterns.
Or I can say "use pytest-httpx to mock the endpoints" and Claude knows what I mean.
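For anyone who hasn't used it, here's roughly the shape that takes - a minimal sketch where the fetch_user function and the URL are placeholders I made up, but httpx_mock.add_response() is the real pytest-httpx fixture API:

    import httpx


    def fetch_user(client: httpx.Client, user_id: int) -> dict:
        # Hypothetical function under test that calls an external API.
        response = client.get(f"https://api.example.com/users/{user_id}")
        response.raise_for_status()
        return response.json()


    def test_fetch_user(httpx_mock):
        # httpx_mock is the fixture provided by pytest-httpx; registered
        # responses are returned instead of hitting the real network.
        httpx_mock.add_response(
            url="https://api.example.com/users/1",
            json={"id": 1, "name": "Alice"},
        )
        with httpx.Client() as client:
            assert fetch_user(client, 1) == {"id": 1, "name": "Alice"}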
Keeping an eye on the tests is important. The most common anti-pattern I see is large amounts of duplicated test setup code - which isn't a huge deal, I'm much more tolerant of duplicated logic in tests than I am in implementation, but it's still worth pushing back on.
"Refactor those tests to use pytest.mark.parametrize" and "extract the common setup into a pytest fixture" work really well there.
Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.
I find that once a project has clean basic tests the new tests added by the agents tend to match them in quality. It's similar to working on a large project with a team of other developers - keeping the code clean means that when people look for examples of how to write a test they'll be pointed in the right direction.
One last tip I use a lot is this:
"Clone datasette/datasette-enrichments from GitHub to /tmp and imitate the testing patterns it uses"
I do this all the time with different existing projects I've written - the quickest way to show an agent how you like something to be done is to have it look at an example.
> Generally though the best way to get good tests out of a coding agent is to make sure it's working in a project with an existing test suite that uses good patterns. Coding agents pick the existing patterns up without needing any extra prompting at all.
Yeah, this is where I too have seen better results. The worst results have been on greenfield projects where I didn't have a clear idea of how to write tests myself (I'm a data person working on a Django app).
Thanks for the information, that's super helpful!
I work in Python as well and find Claude quite poor at writing proper tests, though I might be using it wrong. Just last week I asked Opus to create a small integration test (with pre-existing examples) and it tried to create a 200-line file with 20 tests I didn't ask for.
I'm not sure why, but it kept trying to do that despite several attempts to redirect it. I ended up writing it on my own, which was very odd. This was in Cursor, however.