Comment by odyssey7
6 hours ago
UC Berkeley: “Top-level functional equivalence requires that, for any possible set of inputs x, the two pieces of code produce the same output. … testing, or input-output (I/O) equivalence, is the default correctness metric used by the community. … It is infeasible to guarantee full top-level functional equivalence (i.e., equivalence for any value of x) with testing since this would require testing on a number of inputs so large as to be practically infinite.”
https://www2.eecs.berkeley.edu/Pubs/TechRpts/2025/EECS-2025-...
In practice mutation fuzz testers are able to whitebox see where branches are in the underlying code, with a differential fuzz test under that approach its generally able to fuzz over test cases that go over all branches.
So I think under some computer science theory case for arbitrary functions its not possible, but for the actual shape of behavior in question from this library I think its realistic that a decent corpus of 'real' examples and then differential fuzzing would give you more confidence that anyone has in nearly any program's correctness here on real Earth.
Yes, there are different levels of sureness being described.
When I hear guarantee, it makes me think of correctness proofs.
Confidence is more of a practical notion for how much you trust the system for a given use case. Testing can definitely provide confidence in this scenario.