You end up with a pile of unit tests with names like "regression, don't crash when rhs null" or "regression, terminate on this", which seems fine.
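For example, one of those fuzzer-derived regression tests might look like this sketch (Evaluate, MakeBinaryOp, and MakeConstant are hypothetical stand-ins for whatever the fuzzer was exercising, in a GoogleTest-style framework):

```cpp
#include <gtest/gtest.h>

#include "evaluator.h"  // hypothetical header for the code under test

// Regression: the fuzzer found a crash when the right-hand side was null.
// The assertion is simply "this call returns"; a crash fails the test.
TEST(EvaluatorRegression, DontCrashWhenRhsNull) {
  Evaluate(MakeBinaryOp(MakeConstant(1), /*rhs=*/nullptr));
}
```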
The "did it change?" genre of characterisation/snapshot tests can be created very effectively using a fuzzer, but should probably be kept separate from the unit tests checking for specific behaviour, and partially regenerated when deliberately changing behaviour.
LLVM has a bunch of tests generated mechanically from whatever the implementation currently does and checked in. I don't rate these - they're thousands of lines long, they glow red in code review, and I'm pretty sure nobody reads them in practice - but because they exist, more focused tests don't get written.
What kind of bugs do you find this way, besides missing sanitization?
Pointer errors. Null pointer returns instead of using the correct types. Flow/state problems. Multithreading problems. I/O errors. Network errors. Parsing bugs... etc.
Basically the whole world of bugs introduced by someone being a too-clever C/C++ coder. You can battle-test parsers quite nicely with fuzzers, because parsers often have multiple states that implicitly assume well-formed input.
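For the parser case, the harness can be as small as feeding the raw fuzzer bytes straight into the parser; this is a sketch where ParseConfig is a hypothetical stand-in for whatever parser is under test:

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

#include "config_parser.h"  // hypothetical header providing ParseConfig()

// libFuzzer entry point: the fuzzer hands the parser arbitrary bytes rather
// than the well-formed input its states implicitly assume. Run under
// ASan/UBSan so pointer and state errors surface as crashes.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  std::string input(reinterpret_cast<const char*>(data), size);
  ParseConfig(input);
  return 0;
}
```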
You can use the fuzzer to generate test cases instead of writing them manually.
For example, you can make it generate queries and data for a database, along with a list of operations and the timings for those operations.
Then you can mix assertions into the test to make sure everything is going as expected.
This is very useful because there can be huge numbers of combinations of inputs, timings, etc., and it tests basically everything for you without you needing to write a million unit tests.
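A minimal sketch of that idea, assuming a hypothetical KeyValueStore as the system under test: decode the fuzzer's bytes into a sequence of operations, replay them against both the store and a trivially correct reference model, and assert agreement after every step:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical stand-in for the component under test.
class KeyValueStore {
  std::map<std::string, std::string> data_;

 public:
  void put(const std::string& k, const std::string& v) { data_[k] = v; }
  bool get(const std::string& k, std::string* out) const {
    auto it = data_.find(k);
    if (it == data_.end()) return false;
    *out = it->second;
    return true;
  }
  void erase(const std::string& k) { data_.erase(k); }
  size_t size() const { return data_.size(); }
};

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  KeyValueStore store;
  std::map<std::string, std::string> model;  // reference model ("oracle")

  // Decode the fuzzer input as a list of (opcode, key, value) triples.
  size_t i = 0;
  while (i + 2 < size) {
    uint8_t op = data[i++];
    std::string key(1, static_cast<char>(data[i++]));
    std::string val(1, static_cast<char>(data[i++]));
    switch (op % 3) {
      case 0:
        store.put(key, val);
        model[key] = val;
        break;
      case 1: {
        std::string got;
        bool found = store.get(key, &got);
        auto it = model.find(key);
        // Assertions mixed into the test: the store must agree with the model.
        assert(found == (it != model.end()));
        if (found) assert(got == it->second);
        break;
      }
      case 2:
        store.erase(key);
        model.erase(key);
        break;
    }
    // Invariant checked after every operation.
    assert(store.size() == model.size());
  }
  return 0;
}
```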
That sounds worse than letting an LLM dream up tests tbh. I wouldn't consider grooming a huge number of tests for their usefulness after they've been generated randomly. And keeping all of them would just lock the implementation in place where it currently is, not validate its correctness.
You can often find memory errors not directly related to string handling with fuzz testing. More generally, if your program embodies any kind of state machine, you may find that a good fuzzer drives it into states that you did not think should exist.
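For instance (a hedged sketch; Connection, Event, kNumEvents, and the state names are all hypothetical), you can assert the states you believe unreachable and let the fuzzer try to prove you wrong:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#include "connection.h"  // hypothetical: Connection, Event, State, kNumEvents

extern "C" int LLVMFuzzerTestOneInput(const uint8_t* data, size_t size) {
  Connection conn;
  for (size_t i = 0; i < size; ++i) {
    // Each input byte is decoded into one event driving the state machine.
    conn.handle(static_cast<Event>(data[i] % kNumEvents));
    // A state we believe should not exist; the fuzzer hunts for a sequence
    // of events that reaches it anyway.
    assert(conn.state() != State::kClosedButSending);
  }
  return 0;
}
```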
That sounds a bit like using a jackhammer to drive in a nail. Wouldn’t it be smarter to enumerate edge cases and test all permutations of those?