Comment by dang

5 years ago

I puzzled about that for years and concluded that tests are a completely different kind of system, best thought of as executable requirements or executable documentation. For tests, you don't want a well-factored graph of abstractions—you want a flat set of concrete examples, each independently understandable. Duplication helps with that, and since the tests are executable, the downsides of duplication don't bite as hard.
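A minimal sketch of what "flat, concrete, duplicated" looks like in practice (all names invented for illustration): each test inlines its own setup data instead of pulling from shared fixtures, so any single test reads as a complete example on its own.

```python
def order_total(items, discount=0.0):
    """Toy function under test (hypothetical, for illustration only)."""
    subtotal = sum(price for _, price in items)
    return subtotal * (1 - discount)

# Each test repeats its setup inline. The duplication is deliberate:
# you can read any one test in isolation and see exactly what goes in
# and what should come out, with no shared helper to chase down.
def test_total_sums_item_prices():
    items = [("book", 10.0), ("pen", 2.0)]
    assert order_total(items) == 12.0

def test_discount_applies_to_subtotal():
    items = [("book", 10.0), ("pen", 2.0)]  # same data, repeated on purpose
    assert order_total(items, discount=0.5) == 6.0

test_total_sums_item_prices()
test_discount_applies_to_subtotal()
```

The cost of the duplication is that changing the data means touching several tests, but since the tests are executable, a stale copy fails loudly instead of rotting silently.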

A test suite with a lot of factored-out common bits makes the tests harder to understand. It's similar to the worked examples in a math textbook. If half a dozen similar examples factored out all the common bits (a la "now go do sub-example 3.3 and come back here", and so on), they would be harder to understand than repeating the similar steps each time. They would also start to use up the brain's capacity for abstraction, which is needed for understanding the math that the exercises illustrate.

These are two different cognitive styles: the top-down abstract approach of definitions and proofs, and the bottom-up concrete approach of examples and specific data. The brain handles these differently and they complement one another nicely as long as you keep them distinct. Most of us secretly 'really' learn the abstractions via the examples. Something clicks in your head as you grok each example, which gives you a mental model for 'free', which then allows you to understand the abstract description as you read it. Good tests do something like this for complex software.

Years ago when I used to consult for software teams, I would sometimes see test systems that had been abstracted into monstrosities that were as complicated as the production systems they were trying to test, and even harder to understand, because they weren't the focus of anybody's main attention. Nobody really cared about them, and customers didn't depend on them working, so they became a twilight zone. Bugs in such test layers were hard to track down because no one was fresh on how they worked. Sometimes it would turn out that the production system wasn't even being tested—only the magic in the monster middle layer.

An example would be factory code to initialize objects for testing, which gradually turns into a complex network of different sorts of factory routines, each of which contributes some bits and not others. Then one day there's a problem because object A needs something from both factory B and factory C, but other bits aren't compatible, so let's make a stub bit instead and pass that in... All of this builds up ad hoc into one of those AI-generated paintings that look sort of like reality but also like a nightmare or a bad trip. The solution in such cases was to gradually dissolve the middle layer by making the tests as 'naked' as possible, and the best technique we had for that was to shamelessly duplicate whatever data and even code we needed into each concrete test. But the same technique would be disastrous in the production system.
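To make the factory tangle concrete, here is a hypothetical sketch (all names and the `vat_due` function invented for illustration): layered factories that each contribute some bits, the fragile composition problem, and the 'naked' rewrite that just inlines the data.

```python
def vat_due(user):
    """Toy function under test: EU users on a paid plan owe VAT."""
    return user["region"] == "eu" and user["plan"] != "free"

# The tangle: factories that each contribute some bits and not others.
def base_user(**overrides):
    user = {"name": "alice", "plan": "free", "region": "us"}
    user.update(overrides)
    return user

def paid_user(**overrides):
    return base_user(plan="pro", **overrides)

def eu_user(**overrides):
    return base_user(region="eu", **overrides)

# Object A needs bits from both factories B and C: do you call
# paid_user(region="eu") or eu_user(plan="pro")? Either way, the
# reader can't see the actual test data without tracing the factories.

# The 'naked' alternative: shamelessly duplicate the data in each test.
def test_eu_pro_user_owes_vat():
    user = {"name": "alice", "plan": "pro", "region": "eu"}
    assert vat_due(user)

def test_us_free_user_owes_no_vat():
    user = {"name": "alice", "plan": "free", "region": "us"}
    assert not vat_due(user)

test_eu_pro_user_owes_vat()
test_us_free_user_owes_no_vat()
```

The naked tests are longer in total, but each one states its complete input on one line, so there is no middle layer left for bugs to hide in.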