
Comment by manmal

8 days ago

> Tests are the source of truth more so than your code

Tests poke and prod with a stick at the SUT, and the SUT's behaviour is observed. The truth lives in the code, the documentation, and, unfortunately, in the heads of the dev team. I think this distinction is quite important, because this question:

> Do we have a bug? Or do we have a bad test?

cannot be answered by looking at the test and the implementation alone. When in doubt, the spec or the people have to be consulted.
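A minimal sketch of that ambiguity, in Python. The names `round_price` and `check_half_rounds_up` are hypothetical and exist only for illustration; the only real behaviour relied on is that Python's built-in `round()` rounds exact halves to the nearest even integer.

```python
def round_price(amount: float) -> int:
    """Hypothetical SUT: round a price to the nearest whole unit."""
    # Python's built-in round() sends exact halves to the nearest even integer.
    return round(amount)


def check_half_rounds_up() -> bool:
    """Hypothetical test: expects exact halves to round up."""
    return round_price(2.5) == 3


# The test fails: the code returns 2, the test expects 3.
# Nothing in these two functions says whether the requirement is
# round-half-up (a bug in the code) or round-half-to-even (a bad test).
print(round_price(2.5), check_half_rounds_up())  # 2 False
```

Both artifacts are internally consistent; deciding which one to change needs something outside of them.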

> The spec

The tests are your spec. They exist precisely to document what the program is supposed to do for other humans, with the secondary benefit of also telling a machine what the program is supposed to do, allowing implementations to automatically validate themselves against the spec. Writing specs and tests as independent things is how you end up with bad, brittle tests that make development a nightmare, unless you simply like pointless busywork, I suppose.

But, yes, you may still have to consult a human if there is reason to believe the spec isn't accurate.

  • Unfortunately, tests can never be a complete specification unless the system is simple enough to have a finite set of possible inputs.

    For all real-world software, a test suite samples a number of points in the space of possible inputs, and we hope that those points generalize well enough to pin down the overall behavior of the implementation.

    But there's no guarantee of that generalization. An implementation that fails a test is guaranteed to not implement the spec, but an implementation that passes all of the tests is not guaranteed to implement it.

    • > Unfortunately, tests can never be a complete specification

      They are for the human, which is the intended recipient.

      Given infinite time, the machine would also be able to validate against the complete specification, but, of course, we normally cut things short because we want to release the software in a reasonable amount of time. As before, though, the fact that this ability exists at all is merely a secondary benefit.

  • > The tests are your spec.

    That's not quite right, but it's almost right.

    Tests are an *approximation* of your spec.

    Tests are a description and, like all descriptions, they are noisy. The thing is, it is very, very difficult to know if your tests have complete coverage. It's very hard to know if your description is correct.

    How often do you figure out something you didn't realize previously? How often do you not realize something and it's instead pointed out by your peers? How often do you realize something after your peers say something that sparks an idea?

    Do you think that those events are over? No more things to be found? I know I'm not that smart, because if I were, I would have gotten it all right from the get-go.

    There are, of course, formal proofs, but even they aren't invulnerable to these issues. They also aren't commonly used in practice, and at that point we're back to programming/math, so I'm not sure we should go down that route.

    • > Tests are a description

      As is a spec. "Description" is literally found in the dictionary definition. Which stands to reason as tests are merely a way to write a spec. They are the same thing.

      > The thing is it is very very difficult to know if your tests have complete coverage.

      There is no way to avoid that, though. Like you point out, not even formal proofs, the closest speccing methodology we know of to try and avoid this, are immune.

      > Tests are an approximation of your spec.

      Specs are an approximation of what you actually want, sure, but that does not change the fact that tests are the spec. There are other ways to write a spec, of course, but if you went down that road you wouldn't also have tests. That would be not only pointless, but a nightmare, because there would be no single source of truth, which causes all kinds of social (and sometimes technical) problems.

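The point raised in this subthread, that an implementation can pass every test and still not implement the spec, can be illustrated with a small Python sketch. Everything here is hypothetical: `is_leap_year` stands in for an implementation, `SAMPLED_CASES` for a test suite, and the Gregorian rule in `is_leap_year_gregorian` is assumed to be the intended behaviour.

```python
def is_leap_year(year: int) -> bool:
    """Hypothetical implementation: wrong for most century years."""
    return year % 4 == 0


def is_leap_year_gregorian(year: int) -> bool:
    """Gregorian rule, assumed here to be the intended behaviour."""
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)


# A hypothetical test suite: a handful of points in the input space.
SAMPLED_CASES = {2000: True, 2023: False, 2024: True, 2025: False}

# The implementation agrees with every sampled point.
passes_suite = all(
    is_leap_year(year) == expected for year, expected in SAMPLED_CASES.items()
)

# Inputs in a small range where it departs from the intended rule.
counterexamples = [
    year
    for year in range(1800, 2101)
    if is_leap_year(year) != is_leap_year_gregorian(year)
]

print(passes_suite)     # True
print(counterexamples)  # [1800, 1900, 2100]
```

The suite is green, yet three inputs in a 300-year window are wrong. Whether a human reader of those four cases would infer the century rule is exactly what the replies above disagree on.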

None of the four (code, tests, spec, people's memory) is the single source of truth.

It's easy to see them as four cache layers, but empirically it's almost never the case that the correct thing to do when they disagree is to blindly purge and recreate levels that are farther from the "truth" (even ignoring the cost of doing that).

Instead, it's always an ad-hoc reasoning exercise in looking at all four of them, deciding what the correct answer is, and updating some or all of them.