Comment by iainmerrick

16 days ago

One thing I'm curious about, which I couldn't figure out from a skim of your post, is whether the generated test inputs are random, sequential, or adversarial.

IIRC there are fuzz testers that will analyze the branches of the code to look for edge cases that might break it -- that seems like something that would be wonderful to have in a property tester, but it also seems very difficult to do, especially in a language-agnostic way.

How long does it take to find breaking cases like "0/0" or "ß"? Do they pop up immediately, or does it only happen after hundreds or thousands of runs?

They're random, but with a lot of tweaks to the distribution that make weird edge cases pop up with fairly high probability, plus some degree of internal mutation, followed by shrinking to turn failures into nice tidy test cases. In Python we do a little bit of code analysis to find interesting constants, but Hegel doesn't do that; it's just tuned to hit common edge cases.
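For a rough flavour of what a "tweaked distribution" means, here's a toy sketch. The pools and the 50/50 weighting are my own illustrative guesses, not Hegel's or Hypothesis's actual tuning: the idea is just to mix a curated pool of historically nasty values with plain random draws.

```python
import random

# Curated pools of values that disproportionately expose bugs.
# Hypothetical pools and weights, purely for illustration.
WEIRD_FLOATS = [0.0, -0.0, 1.0, -1.0,
                float("inf"), float("-inf"), float("nan"), 5e-324]
WEIRD_TEXT = ["", "\x00", " ", "ß", "\u202e"]

def draw_float(rng):
    # Half the time, draw a known-nasty value; otherwise draw uniformly.
    if rng.random() < 0.5:
        return rng.choice(WEIRD_FLOATS)
    return rng.uniform(-1e6, 1e6)

def draw_text(rng, max_len=10):
    # Same trick for strings: nasty pool half the time, random ASCII otherwise.
    if rng.random() < 0.5:
        return rng.choice(WEIRD_TEXT)
    return "".join(chr(rng.randrange(32, 127))
                   for _ in range(rng.randrange(max_len)))
```

Because the nasty pool is sampled so often, a property like `x == x` (false for NaN) or "upper-then-lower round-trips" (broken by "ß", since `"ß".upper()` is `"SS"` in Python) tends to fail within the first few dozen draws rather than after thousands.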

I think all the examples I had in the post are typically found in the first 100 test cases and reliably found in the first 1000, but I wouldn't swear to that without double-checking.
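As a back-of-envelope check on those numbers: if each test case hits the triggering value with probability p, the bug is found within n cases with probability 1 - (1 - p)^n, which is close to 1 well before n = 100 for typical tunings. A toy loop (hypothetical generator and property names, not real Hegel/Hypothesis API):

```python
import random

def draw_int(rng):
    # Bias heavily toward zero and other small magnitudes,
    # roughly imitating how property testers skew their draws.
    if rng.random() < 0.3:
        return rng.choice([0, 1, -1])
    return rng.randint(-10**6, 10**6)

def first_failure(prop, n=100, seed=0):
    # Return (case_number, inputs) for the first failing case, or None.
    rng = random.Random(seed)
    for i in range(1, n + 1):
        a, b = draw_int(rng), draw_int(rng)
        if not prop(a, b):
            return i, (a, b)
    return None

def division_never_raises(a, b):
    # The "0/0"-style property: dividing two generated ints never raises.
    try:
        a / b
        return True
    except ZeroDivisionError:
        return False
```

With b == 0 drawn with probability roughly 0.1 per case, the chance of a failure somewhere in the first 100 cases is about 1 - 0.9^100 ≈ 0.99997, which lines up with "typically in the first 100, reliably in the first 1000".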

We don't do any coverage guidance in Hegel or Hypothesis, because for unit-testing-style workflows it's rarely worth it: it's very hard to do good coverage guidance in under 10k test runs at a minimum, and 100k is more likely. In that budget you don't have enough time to get really good at exploring the state space, and you haven't hit the point where pure random testing has exhausted itself enough that you have to do something smarter to win.
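For contrast, here's a toy sketch of what coverage guidance looks like (AFL-style: keep a mutated input only if it reaches code the corpus hasn't reached yet). Everything here, from the `sys.settrace` probe to the mutation deltas and names, is an illustrative simplification of the general technique, not anything Hegel or Hypothesis does; it shows why guidance only pays off once the run budget is large enough to climb the branches one stepping stone at a time.

```python
import random
import sys

def coverage_of(fn, x):
    # Record (function name, line number) pairs executed by fn(x),
    # using sys.settrace as a crude coverage probe.
    lines = set()
    def tracer(frame, event, arg):
        if event == "line":
            lines.add((frame.f_code.co_name, frame.f_lineno))
        return tracer
    prev = sys.gettrace()
    sys.settrace(tracer)
    try:
        fn(x)
    except Exception:
        pass  # crashes still count as covered code
    finally:
        sys.settrace(prev)
    return lines

def coverage_guided_fuzz(fn, seed_inputs, rounds=300, seed=0):
    # AFL-flavoured loop: mutate corpus entries, and keep a mutant
    # only if it executes a line no previous input has executed.
    rng = random.Random(seed)
    corpus = list(seed_inputs)
    seen = set()
    for x in corpus:
        seen |= coverage_of(fn, x)
    for _ in range(rounds):
        mutant = rng.choice(corpus) + rng.choice([-101, -11, -1, 1, 11, 101])
        cov = coverage_of(fn, mutant)
        if not cov <= seen:  # reached new code: keep it as a stepping stone
            seen |= cov
            corpus.append(mutant)
    return corpus

def target(x):
    # A deliberately nested target: the deep branch is only reachable
    # by building on inputs that already cleared the outer branch.
    if x > 10:
        if x > 100:
            return "big"
        return "medium"
    return "small"
```

Running `coverage_guided_fuzz(target, [0])` gradually grows the corpus from `[0]` to inputs hitting all three branches, but it burns many executions per new branch, which is exactly the cost that makes this a poor fit for a sub-1000-case unit-test budget.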

It's been a long-standing desire of mine to figure out a way to use coverage to do better even on short runs, and there are some kinda neat things you can do with it, but we've not found anything really compelling.