Comment by danenania

3 years ago

Not at all stupid. I'm sure people are already using ChatGPT to generate unit tests. There are limits, of course, given that (for now) it doesn't have the full context of your code, but it's definitely capable of generating tests for pure functions that are named well and don't require too much outside context. Some projects have tons of these.

Yes, I use it in that way. And if ChatGPT didn't generate the code with pure functions (usually it doesn't), you can explicitly ask it to generate the code with pure functions, then ask it to generate the tests.

Usually I get good tests from ChatGPT when I approach it as an iterative process, requesting multiple improvements to the generated tests based on what it gives me. Note that it doesn't replace the skills you need to write good test coverage.

For example, you can ask it to generate integration tests instead of unit tests when it needs more context. Providing details on how the testing code should be generated really helps. It also helps to ask it to refactor the code to make it testable, for example by converting some functions into actual pure functions, or by extracting a piece of code it generated into a separate function. Then you ask it to generate tests for normal and also for boundary conditions. The more specific you get, the higher the chance of getting good, extensive tests from it.
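
To make that concrete, here is a minimal sketch of the kind of pure function plus normal-and-boundary tests I mean — the `clamp` function and the test names are purely illustrative, not from any real project:

```python
# Illustrative sketch: a pure function and pytest tests covering
# normal cases and boundary conditions. All names are hypothetical.
import pytest

def clamp(value: float, low: float, high: float) -> float:
    """Constrain value to the inclusive range [low, high]."""
    if low > high:
        raise ValueError("low must not exceed high")
    return max(low, min(value, high))

# Normal cases
def test_clamp_within_range():
    assert clamp(5, 0, 10) == 5

def test_clamp_below_range():
    assert clamp(-3, 0, 10) == 0

# Boundary conditions
def test_clamp_at_boundaries():
    assert clamp(0, 0, 10) == 0
    assert clamp(10, 0, 10) == 10

def test_clamp_invalid_range():
    with pytest.raises(ValueError):
        clamp(5, 10, 0)
```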

Having both the tests and the code generated by ChatGPT really helps to catch the subtle bugs it usually introduces in the generated code (which I then fix manually). Usually I get test coverage that proves the robustness I need for production code.

This approach still needs manual fine-tuning of the generated code; I think ChatGPT still struggles to get the context right. But in general, when it makes sense to use it, I'm more productive writing tests this way than manually.

I used my Copilot trial to explore writing tests more than writing code. I found that it actually worked well, especially for data- and boilerplate-heavy tests, like table-driven tests in Go. It was saving me quite literally dozens of keystrokes, if not a hundred or more. I write Go for side projects these days, not for work, so I don't pay for the license, but if I were writing Go professionally again I'd pay for the license for this alone.
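
A table-driven test is essentially a list of (input, expected) rows fed through a single assertion; the Go version uses a slice of structs with t.Run, but a rough pytest sketch of the same idea (with a purely illustrative function) looks like this:

```python
# Rough pytest analogue of a Go table-driven test: one "table" of cases,
# one parametrized test. word_count is illustrative only.
import pytest

def word_count(s: str) -> int:
    """Count whitespace-separated words."""
    return len(s.split())

# The table: each row is (description, input, expected).
CASES = [
    ("empty string", "", 0),
    ("single word", "hello", 1),
    ("multiple words", "one two three", 3),
    ("extra whitespace", "  spaced   out  ", 2),
]

@pytest.mark.parametrize("desc,text,expected", CASES)
def test_word_count(desc, text, expected):
    assert word_count(text) == expected, desc
```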

In my day job I write Ruby, and it didn't impress me much when I used it with RSpec. I'd say it was saving me maybe a dozen or so keystrokes in total when writing a new spec.

What about chatGPT’s unit tests? Does it even make sense to unit test code that generates a language model?

  • code with output that can only be evaluated artistically can't be called right or wrong anyway. As long as it throws no errors, it's a valid language model. I can only imagine how many nulls and undefineds and Infinitys they just throw out and shove under the rug. That's actually the kind of code where not knowing how it works is okay. Encouraged, even.

ChatGPT works surprisingly well for Python (and I'm sure other popular languages too). I can dump a bit of code and tell it to write the tests + fixtures for me. The tests it wrote actually showed it "understood" (or recognized the pattern of) the code too. For example, part of the code renamed some columns in a data frame loaded from a CSV. The fixture it created for the CSV had the correct column names that the code was renaming.
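
Something along the lines of the following sketch — the load_orders function, column names, and fixture here are hypothetical, not the actual code, but it's the shape of what it produced:

```python
# Sketch of a generated fixture + test; all names are hypothetical.
import pandas as pd
import pytest

def load_orders(path):
    """Code under test: load a CSV and rename its columns."""
    df = pd.read_csv(path)
    return df.rename(columns={"ord_id": "order_id", "cust": "customer"})

@pytest.fixture
def orders_csv(tmp_path):
    # The fixture writes a CSV whose headers match the names the code renames.
    path = tmp_path / "orders.csv"
    path.write_text("ord_id,cust\n1,alice\n2,bob\n")
    return path

def test_load_orders_renames_columns(orders_csv):
    df = load_orders(orders_csv)
    assert list(df.columns) == ["order_id", "customer"]
    assert len(df) == 2
```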

Unless you're doing TDD or chatGPT is too busy, there's almost no excuse now to not write some unit tests ;)

I wonder which GPT is ultimately better at: generating the implementation from the tests, or vice versa. As a noob, I think the former.