Comment by Jtsummers

3 months ago

My experience is that the hard part of PBT is mostly devising the generators, not writing the tests themselves.

Since it came up in another thread (yes, it's trivial): a function `add` is no easier or harder to test with examples than with PBT. Here are some of the tests in both PBT style and example-based style:

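  import pytest
  from hypothesis import given, strategies as st

  # Hand-picked pairs for the example-based version (illustrative values).
  examples = [(0, 0), (1, 2), (-5, 3)]

  @given(st.integers())
  def test_left_identity_pbt(a):
    assert add(0, a) == a

  def test_left_identity():
    assert add(0, 10) == 10

  @given(st.integers(), st.integers())
  def test_commutative_pbt(a, b):
    assert add(a, b) == add(b, a)

  @pytest.mark.parametrize("a,b", examples)
  def test_commutative(a, b):
    assert add(a, b) == add(b, a)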

They're the same test, but one is more comprehensive than the other. And you can use them together. Supposing you do find an error, you add it to your example-based tests to build out your regression test suite. This is how I try to get people into PBT in the first place: just take your existing example-based tests and build a generator. If they start failing, that means your examples weren't sufficiently comprehensive (not surprising). Because PBT systems like Hypothesis run so many tests, though, you may need to either restrict the number of generated examples for performance reasons or break up complex tests into a set of smaller, faster-running tests to get the benefit.
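A minimal sketch of that workflow, assuming Hypothesis is installed. The `add` implementation and the pinned input are hypothetical stand-ins: `@example` pins a specific input so it always runs alongside the generated ones, and `@settings(max_examples=...)` caps how many cases get generated.

```python
from hypothesis import example, given, settings, strategies as st


def add(a, b):
    # Stand-in implementation so the sketch runs on its own (assumption).
    return a + b


@settings(max_examples=50)      # cap generated cases to keep the test fast
@given(st.integers(), st.integers())
@example(2**63, -1)             # hypothetical counterexample pinned as a regression case
def test_commutative(a, b):
    assert add(a, b) == add(b, a)
```

Whether a counterexample is pinned with `@example` or copied into a separate parametrized example-based test is a matter of taste; either way it keeps running on every test pass.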

Other things become much simpler, or at least simpler to test comprehensively, like stateful and end-to-end tests (assuming you have a way to programmatically control your system). In a real-world case, I used Hypothesis to drive an application by sending a series of commands/queries and seeing how it behaved. There are so many possible sequences that manually developing a useful set of end-to-end tests is non-trivial. With Hypothesis, however, it just generated sequences of interactions for me and found errors in the system. After each command (which may or may not change the application state) it issued queries in the invariant checks and verified the results against the model. As with the simpler tests, the failing sequences can be turned into hard-coded examples in your regression test suite.
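Roughly what that looks like with Hypothesis's rule-based state machines. This is only a sketch: `KeyValueApp` is a hypothetical stand-in for the real application, and the plain dict is the model. The rules are the commands, and the invariant issues queries and compares them with the model after every step.

```python
from hypothesis import settings, strategies as st
from hypothesis.stateful import RuleBasedStateMachine, invariant, rule


class KeyValueApp:
    """Hypothetical system under test: a tiny in-memory key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def delete(self, key):
        self._data.pop(key, None)

    def get(self, key):
        return self._data.get(key)

    def count(self):
        return len(self._data)


class AppMachine(RuleBasedStateMachine):
    def __init__(self):
        super().__init__()
        self.app = KeyValueApp()  # the system being driven
        self.model = {}           # simple model of the expected state

    @rule(key=st.text(max_size=3), value=st.integers())
    def put(self, key, value):
        self.app.put(key, value)
        self.model[key] = value

    @rule(key=st.text(max_size=3))
    def delete(self, key):
        self.app.delete(key)
        self.model.pop(key, None)

    @invariant()
    def queries_match_model(self):
        # Checked after every step: query the app and compare with the model.
        assert self.app.count() == len(self.model)
        for key, value in self.model.items():
            assert self.app.get(key) == value


# Collected by pytest/unittest; the settings keep the run short.
TestApp = AppMachine.TestCase
TestApp.settings = settings(max_examples=25, stateful_step_count=20)
```

Hypothesis picks the sequence of `put`/`delete` calls and their arguments, and on a failure it reports a shrunken sequence of steps that can be copied into a hand-written regression test.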

3 comments

ibizaman 3 months ago

For sure, the hardest part is to create meaningful generators for the problem at hand which can test interesting cases in a finite amount of time. That’s where the combinatorial explosion takes place in my experience.

I wanted to highlight one unexpected but very welcome side effect of having those stateful property tests: we could use them to design high-fidelity stubs. I wrote a follow-up blog post about it https://blog.tiserbox.com/posts/2024-07-08-make-good-stubs-w...

imiric 3 months ago

> Since it came up in another thread (yes, it's trivial), a function `add` is no easier or harder to test with examples than with PBT

Come on, that example is practically useless for comparing both approaches.

Take a look at the article linked above. The amount of non-trivial code required to set up a PBT should raise an eyebrow, at the very least.

It's quite possible that the value of such a test outweighs the complexity overhead, and that implementing all the test variations with EBT would be infeasible, but choosing one strategy over the other should be a conscious decision made by the team.

So as much as you're painting PBT in a positive light, I don't see it that clearly. I think that PBT covers certain scenarios better than EBT, while EBT can be sufficient for a wide variety of tests, and be simpler overall.

But again, I haven't actually written PBTs myself. I'm just going by the docs and articles mentioned here.

Jtsummers 3 months ago

> Come on, that example is practically useless for comparing both approaches.

Come on, I admitted it was trivial. It was a quick example that fit into a comment block. Did you expect a dissertation?

> that implementing all the test variations with EBT would be infeasible

That's kind of the point to my previous comment. PBTs will generate many more examples than you would create by hand. If you have EBTs already, you're one step away from PBTs (the generators, I never said this was trivial just to preempt another annoying "Come on"). And then you'll have more comprehensive testing than you would have had sticking to just your carefully handcrafted examples. This isn't the end of property-based testing, but it's a really good start and the easiest way to bring it into an existing project because you can mostly reuse the existing tests.

Extending this, once you get used to it, to stateful testing (which many PBT libraries support, including Hypothesis) you can generate a lot of very useful end-to-end tests that would be even harder to come up with by hand. And again, if you have any example-based end-to-end tests or integration tests, you need to come up with generators and you can start converting them into property-based tests.

> but choosing one strategy over the other should be a conscious decision made by the team.

Ok. What prompted this? I never said otherwise. It's also not an either/or situation, which you seem to want to make it. As I wrote in that previous comment, you can use both and use the property-based tests to bolster the example-based tests, and convert counterexamples into more example-based tests for your regression suite.

> I haven't actually written PBTs myself.

Huh.