Comment by simonw

2 months ago

Depends on how comprehensive the test suite is.

And OK, it's not equivalent to a formal proof, but passing 1,000+ tests that cover every aspect of the specification is pretty close from a practical perspective, especially for a visual formatting tool.

boxed 2 months ago

With mutation testing you can guarantee that all the behavior in the code is tested.
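A minimal sketch of the idea follows. It assumes a toy, hypothetical function (`shipping_cost`) and a single mutation operator that swaps `<`/`<=` and `>`/`>=`. Real tools such as mutmut apply many more operators. A mutant that still passes the tests ("survives") marks a behaviour change the suite never checks. The sketch needs Python 3.9+ for `ast.unparse`.

```python
import ast

# Hypothetical function under test; any small pure function would do.
SOURCE = """
def shipping_cost(weight_kg):
    if weight_kg <= 1:
        return 5
    if weight_kg < 10:
        return 12
    return 30
"""

# The only mutation operator here: swap a comparison for its off-by-one neighbour.
SWAPS = {ast.Lt: ast.LtE, ast.LtE: ast.Lt, ast.Gt: ast.GtE, ast.GtE: ast.Gt}


class SwapNthComparison(ast.NodeTransformer):
    """Swap the operator of the n-th mutable comparison, leaving the rest alone."""

    def __init__(self, target):
        self.target = target
        self.seen = 0

    def visit_Compare(self, node):
        self.generic_visit(node)
        if type(node.ops[0]) in SWAPS:
            if self.seen == self.target:
                node.ops[0] = SWAPS[type(node.ops[0])]()
            self.seen += 1
        return node


def load(source):
    """Execute source text and return the shipping_cost function it defines."""
    namespace = {}
    exec(source, namespace)
    return namespace["shipping_cost"]


def passes(func, cases):
    return all(func(arg) == expected for arg, expected in cases)


def surviving_mutants(source, cases):
    """Return the source of every mutant that the test cases fail to kill."""
    assert passes(load(source), cases), "the unmutated code must pass first"
    n_sites = sum(
        isinstance(node, ast.Compare) and type(node.ops[0]) in SWAPS
        for node in ast.walk(ast.parse(source))
    )
    survivors = []
    for i in range(n_sites):
        mutant = ast.unparse(SwapNthComparison(i).visit(ast.parse(source)))
        if passes(load(mutant), cases):
            survivors.append(mutant)
    return survivors


weak_tests = [(0.5, 5), (5, 12), (20, 30)]
strong_tests = weak_tests + [(1, 5), (10, 30)]  # adds the boundary values

print(len(surviving_mutants(SOURCE, weak_tests)))    # 2
print(len(surviving_mutants(SOURCE, strong_tests)))  # 0
```

Both mutants survive the weak suite because no test sits on a boundary. Adding the inputs 1 and 10 kills them.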

odyssey7 2 months ago
UC Berkeley: “Top-level functional equivalence requires that, for any possible set of inputs x, the two pieces of code produce the same output. … testing, or input-output (I/O) equivalence, is the default correctness metric used by the community. … It is infeasible to guarantee full top-level functional equivalence (i.e., equivalence for any value of x) with testing since this would require testing on a number of inputs so large as to be practically infinite.”
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-...
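Some numbers make the quote concrete. Exhaustive I/O equivalence is cheap over a small input domain and hopeless over a large one. The sketch below checks two hypothetical popcount implementations on every 16-bit input. It then estimates how long the same check would take over all 64-bit inputs, assuming an optimistic rate of 10^9 comparisons per second.

```python
def popcount_loop(x):
    """Count set bits by clearing the lowest set bit until none remain."""
    count = 0
    while x:
        x &= x - 1
        count += 1
    return count


def popcount_str(x):
    """Count set bits via the binary string representation."""
    return bin(x).count("1")


def exhaustively_equivalent(f, g, bits):
    """True iff f and g agree on every unsigned integer of the given width."""
    return all(f(x) == g(x) for x in range(2**bits))


print(exhaustively_equivalent(popcount_loop, popcount_str, 16))  # True

# The same check over all 64-bit inputs at an assumed 1e9 comparisons/second.
seconds = 2**64 / 1e9
years = seconds / (365.25 * 24 * 3600)
print(round(years))  # 585
```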
- esrauch 2 months ago
  
  In practice, mutation-based fuzzers can see, white-box, where the branches are in the underlying code, and a differential fuzz test built on that approach is generally able to generate test cases that exercise all of them.
  So I think that in the computer-science-theory case of arbitrary functions it's not possible. But for the actual shape of behavior in question in this library, I think it's realistic that a decent corpus of 'real' examples followed by differential fuzzing would give you more confidence than anyone has in nearly any program's correctness here on real Earth.
  
- boxed 2 months ago
  
  How is that relevant for mutation testing?
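esrauch's "corpus of real examples, then differential fuzzing" idea can be sketched as follows. Both implementations are hypothetical. One is a slow reference `is_power_of_two`, and the other is a bit-trick port with a deliberate bug on `0`. Real coverage-guided fuzzers (libFuzzer, AFL, or Atheris for Python) decide which mutated inputs to keep based on the branches those inputs reach. As a simplification, this sketch only applies one random mutation to a fixed seed corpus and has no coverage feedback.

```python
import random


def reference_is_power_of_two(x):
    """Hypothetical reference: slow but obviously correct, strips factors of two."""
    if x < 1:
        return False
    while x % 2 == 0:
        x //= 2
    return x == 1


def fast_is_power_of_two(x):
    """Hypothetical bit-trick port with a deliberate bug: it also accepts 0."""
    return (x & (x - 1)) == 0


def mutate(x, rng):
    """Apply one small, fuzzer-style mutation to an integer input."""
    choice = rng.randrange(5)
    if choice == 0:
        return x + 1
    if choice == 1:
        return x - 1
    if choice == 2:
        return x >> 1
    if choice == 3:
        return x ^ (1 << rng.randrange(8))  # flip one of the low 8 bits
    return -x


def differential_fuzz(f, g, corpus, iterations=10_000, seed=0):
    """Return the first input on which f and g disagree, or None."""
    rng = random.Random(seed)
    for x in corpus:  # the 'real' examples are checked first
        if f(x) != g(x):
            return x
    for _ in range(iterations):
        x = mutate(rng.choice(corpus), rng)
        if f(x) != g(x):
            return x
    return None


corpus = [1, 2, 64, 1000]
print(differential_fuzz(reference_is_power_of_two, fast_is_power_of_two, corpus))  # 0
```

With the bug fixed (for example `x > 0 and (x & (x - 1)) == 0`), the same call returns `None`. That is evidence of agreement on the inputs tried, not a proof.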
wizzwizz4 2 months ago
You can guarantee that all the cases in the code are tested. That doesn't necessarily mean that all the behaviour is tested. If two implementations use very different approaches, which happen to have different behaviour on the Mersenne primes (for deep mathematical reasons), but one of them special-cases byte values using a lookup table generated from the other, you wouldn't expect mutation testing to catch the discrepancy. Each implementation is still the local optimum as far as passing tests is concerned, and the mutation test harness wouldn't know that "disable the small integer cache" is the kind of mutation that shouldn't affect whether tests pass.
There are only eight 32-bit Mersenne primes, four of which are byte-valued. Fuzzing might catch the bug if it happened to hit one of the other four 32-bit Mersenne primes (which, in many fuzzers, is more likely than a uniform distribution would suggest), but I'm sure you can imagine situations where it wouldn't.
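wizzwizz4's scenario can be made concrete with a toy model. Both functions below are hypothetical stand-ins. In `impl_b`, the `+ 1` on Mersenne primes simulates a discrepancy with a "deep mathematical" cause. `impl_a` copies `impl_b`'s answers for byte values into a lookup table. The sketch checks the arithmetic in the comment: eight 32-bit Mersenne primes, four of them bytes. It also shows why uniform sampling rarely finds the other four, while a fuzzer-style list of all-ones bit patterns finds them at once.

```python
# The eight Mersenne primes below 2**32: 2**p - 1 for these exponents p.
MERSENNE_PRIMES_32 = {2**p - 1 for p in (2, 3, 5, 7, 13, 17, 19, 31)}


def impl_b(n):
    """Hypothetical implementation B. The +1 on Mersenne primes stands in
    for a behavioural difference with a 'deep mathematical' cause."""
    return n % 7 + (1 if n in MERSENNE_PRIMES_32 else 0)


def impl_a_general(n):
    """Hypothetical implementation A's general code path."""
    return n % 7


# A special-cases byte values using a table generated from B.
BYTE_TABLE = [impl_b(n) for n in range(256)]


def impl_a(n):
    return BYTE_TABLE[n] if n < 256 else impl_a_general(n)


def disagreements(values):
    return sorted(n for n in values if impl_a(n) != impl_b(n))


# Every byte value agrees, because A copied B's answers for them.
print(disagreements(range(256)))  # []

# Only four 32-bit inputs disagree, so the chance that a million uniform
# samples hit any of them is under 0.1%.
p_hit = 1 - (1 - 4 / 2**32) ** 1_000_000
print(f"{p_hit:.5f}")  # 0.00093

# All-ones bit patterns (2**k - 1) are the kind of 'interesting value'
# many fuzzers try early, and they expose the discrepancy immediately.
print(disagreements(2**k - 1 for k in range(1, 33)))
# [8191, 131071, 524287, 2147483647]
```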
- boxed 2 months ago
  
  > but one of them special-cases byte values using a lookup table generated from the other, you wouldn't expect mutation testing to catch the discrepancy
  Sure you would, if the mutation tester mutates that lookup table, which is quite easy to do and which mutmut will do (if that lookup table is inside a function, because mutmut is based on mutant schemata).
  
- JonChesterfield 2 months ago
  
  I think if you hit full path coverage in each of them independently, run all the cases through both, and check that they're consistent, you're still done.
  Or use branch coverage for the lesser version. The idea is still to generate interesting cases based on each implementation, not solely on one of them.
  
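JonChesterfield's suggestion can be sketched with two hypothetical `days_in_month` implementations that disagree only on century years: `days_in_month_a` uses the Julian every-fourth-year rule. Coverage is recorded with `sys.settrace`. This is line coverage rather than the path or branch coverage the comment mentions, which is a simplification. `days_in_month_a` folds its leap-year branch into one line, so its coverage alone never asks for a century year. `days_in_month_b`'s coverage does.

```python
import sys


def days_in_month_a(year, month):
    """Hypothetical implementation A."""
    if month == 2:
        return 29 if year % 4 == 0 else 28  # Julian rule: wrong for 1900, 2100
    return 30 if month in (4, 6, 9, 11) else 31


def days_in_month_b(year, month):
    """Hypothetical implementation B, one branch per line."""
    if month in (4, 6, 9, 11):
        return 30
    if month != 2:
        return 31
    if year % 400 == 0:
        return 29
    if year % 100 == 0:
        return 28
    if year % 4 == 0:
        return 29
    return 28


def lines_hit(func, *args):
    """Run func(*args) and return the line numbers executed in func's own frame."""
    code, hit = func.__code__, set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # ignore every other frame
        if event == "line":
            hit.add(frame.f_lineno)
        return tracer

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return hit


def select_by_coverage(func, candidates):
    """Keep each candidate that executes a line of func not executed before."""
    seen, kept = set(), []
    for args in candidates:
        new_lines = lines_hit(func, *args) - seen
        if new_lines:
            seen |= new_lines
            kept.append(args)
    return kept


def inconsistencies(f, g, cases):
    return [args for args in cases if f(*args) != g(*args)]


candidates = [(year, month) for year in range(1801, 2101) for month in range(1, 13)]
cases_a = select_by_coverage(days_in_month_a, candidates)
cases_b = select_by_coverage(days_in_month_b, candidates)

print(inconsistencies(days_in_month_a, days_in_month_b, cases_a))  # []
print(inconsistencies(days_in_month_a, days_in_month_b, cases_a + cases_b))  # [(1900, 2)]
```

Cases chosen from `days_in_month_a`'s coverage agree. Adding the cases chosen from `days_in_month_b`'s coverage exposes February 1900. The candidate range starts at 1801 on purpose, because an 1800 case would expose the bug by luck.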