← Back to context

Comment by rmunn

18 hours ago

I love property-based testing, especially the way it can uncover edge cases you wouldn't have thought about. Haven't used Hypothesis yet, but I once had FsCheck (property-based testing for F#) find a case where the data structure I was writing failed when there were exactly 24 items in the list and you tried to append a 25th. That was a test case I wouldn't have thought to write on my own, but the particular number (it was always the 25th item that failed) quickly led me to find the bug. Once my property tests were running overnight and not finding any failures after thousands and thousands of random cases, I started to feel a lot more confident that I'd nailed down the bugs.

I had a similar thing, with F# as well actually.

We had some code that used a square root, and in some very rare circumstances, we could get a negative number, which would throw an exception. I don't think i would have even considered that possibility if FsCheck hadn't generated it.

> Once my property tests were running overnight and not finding any failures

Quickly addressing the time it takes to run property-based tests, especially in Python: it's extremely helpful to make sure that running subsets of unit tests is convenient and documented and all your developers are doing it. Otherwise they may revolt against property-based testing because of the time it takes to run tests. Again, especially in Python.

I love them for this, too.

Sadly I have a really hard time getting teammates to agree to using property-based testing - or letting me use it - because they take "no non-deterministic tests" as ironclad dogma without really understanding the the principle's real intent.

(I can do it to find edge cases to convert to deterministic unit tests in the privacy of my own home, of course. But not being able to commit a library of useful strategies without people calling foul in code review is obnoxious.)

  • > "no non-deterministic tests"

    These tests can be made determenistic. Shrinking to the minimal reproducing case, is always deterministic. Also you can [re]start the tests with the specific seed - you typically log the seed, so you can replay the exact failure deterministically - prop-based tests are stochastic by design but reproducible by capture.

    • correct, I've done this at past places to verify rules engine output for variety of inputs.

That example caught my attention. What was it in your code that made length 24 special?

  • I was trying out an implementation of Relaxed Radix-Balanced Trees, a data structure for immutable lists that allows splitting and joining them in constant* time. In a normal RRB Tree implementation, each leaf in the tree has up to 32 items in it, and a tree node that's not a leaf has up to 32 pointers to leaves or to lower tree nodes. There's also a tail, a single array of up to 32 which, when full, will be promoted to a leaf and a new empty tail array created. (Or the full tail will be promoted the next time an item is appended and a new tail of length 1 will be created).

    For my implementation, I was using nodes of size 4 rather than size 32, because that let me test much easier; then after testing, I would switch the branching constant to 32 and run final tests. With a branching factor of 4, that means that a tree of size 24 contained one root node with two children, a left child that was a full tree of 16 (a tree node pointing to four leaf nodes, all of them full of 4 items), and a right child that was a leaf node of size 4. That makes 20 items in the tree, plus a full tail array of 4 making 24 items. The failure came when the next item was appended.

    Now, if you've never implemented an RRB tree you won't have noticed the mistake. But in the previous paragraph, when I said that the right child of the root was a leaf? That was the mistake; in an RRB tree all leaves are supposed to be at the same level, and the right child of the root should have been a one-child node, whose child was a leaf of size 4. My code that promoted the root up one level when it was full (when the tree was size 20, a full 16-item tree plus a tail of 4) was faulty, and had created a new root with the previous root as the left child and the previous tail as the right child, instead of creating a one-child node as the right child of the new root with the 4-item leaf as its child. The tree still functioned for all read-only purposes (traversal, lookups, etc) and even for appends, as long as the appends did no more than fill the tail. Once the tail was full (making a total of 24 items total stored in the structure) and I needed to append a 25th item, the tail-promotion code found an illegal tree state and did the wrong thing. I no longer remember which wrong thing it did, just that once that 25th item was appended, the resulting tree no longer passed all the property checks.

    * Constant for all practical purposes. In fact it's O(log32 N), but since log32 N is no more than 7 when N is 2^32, the operation is bounded by a constant. So it's O(1) for all practical purposes: most lists of any reasonable size will translate into trees whose height is no more than 2 or 3, which means only 2 or 3 nodes will need to be split when splitting the entire list (and the RRB tree structure that models the list) into two parts. And likewise for joining two RRB structures together (a list append operation): there might be up to O(H) merged nodes where H is the height of the tree, which again no more than log32 N for a tree of branching factor 32.