Comment by WorldMaker

5 years ago

> It's OK to duplicate code in places where the abstractions are far enough apart that the alternative is worse.

I don't recall where I picked up from, but the best advice I've heard on this is a "Rule of 3". You don't have a "pattern" to abstract until you reach (at least) three duplicates. ("Two is a coincidence, three is pattern. Coincidences happen all the time.") I've found it can be a useful rule of thumb to prevent "premature abstraction" (an understandable relative of "premature optimization"). It is surprising sometimes the things you find out about the abstraction only happen when you reach that third duplicate (variables or control flow decisions, for instance, that seem constants in two places for instance; or a higher level idea of why the code is duplicated that isn't clear from two very far apart points but is clearer when you can "triangulate" what their center is).

I don't hate the rule of 3. But i think it's missing the point.

You want to extract common code if it's the same now, and will always be the same in the future. If it's not going to be the same and you extract it, you now have the pain of making it do two things, or splitting. But if it is going to be the same and you don't extract it, you have the risk of only updating one copy, and then having the other copy do the wrong thing.

For example, i have a program where one component gets data and writes it to files of a certain format in a certain directory, and another component reads those files and processes the data. The code for deciding where the directory is, and what the columns in the files are, must be the same, otherwise the programs cannot do their job. Even though there are only two uses of that code, it makes sense to extract them.

Once you think about it this way, you see that extraction also serves a documentation function. It says that the two call sites of the shared code are related to each other in some fundamental way.

Taking this approach, i might even extract code that is only used once! In my example, if the files contain dates, or other structured data, then it makes sense to have the matching formatting and parsing functions extracted and placed right next to each other, to highlight the fact that they are intimately related.

  • > You want to extract common code if it's the same now, and will always be the same in the future.

    I suppose I take that as a presumption before the Rule of 3 applies. I generally assume/take for granted that all "exact duplicates" that "will always be the same in future" are going to be a single shared function anyway. The duplication I'm concerned about when I think the Rule of 3 comes into play is the duplicated but diverging. ("I need to do this thing like X does it, but…")

    If it's a simple divergence, you can add a flag sometimes, but the Rule of 3 suggests that sometimes duplicating it and diverging it that second time "is just fine" (avoiding potentially "flag soup") until you have a better handle on the pattern for why you are diverging it, what abstraction you might be missing in this code.

  • The rule of three is a guideline or principle, not a strict rule. There's nothing about it that misses the point. If, from your experience and judgement, the code can be reused, reuse it. Don't duplicate it (copy/paste or write it a second time). If, from your experience and judgement, it oughtn't be reused, but you later see that you were wrong, refactor.

    In your example, it's about modularity. The directory logic makes sense as its own module. If you wrote the code that way from the start, and had already decoupled it from the writer, then reuse is obvious. But if the code were tightly coupled (embedded in some fashion) within the writer, than rewriting it would be the obvious step because reuse wouldn't be practical without refactoring. And unless you can see how to refactor it already, then writing it the second time (or third) can help you discover the actual structure you want/need.

    As people become more experienced programmers, the good ones at least, already tend to use modular designs and keep things decoupled which promotes reuse versus copy/paste. In that case, the rule of three gets used less often by them because they have fewer occurrences of real duplication.

    • I think the point you and a lot of other commenters make is that applying hard and fast rules without referring to context is simply wrong. Surely if all we had to do was apply the rules, somebody would have long ago written a program to write programs. ;-)

  • You have a point for extracting exact duplicates that you know will remain the same.

    But the point of the rule of 3 remains. Humans do a horrible job of abstracting from one or two examples, and the act of encountering an abstraction makes code harder to understand.

> “premature abstraction”

Also known as, “AbstractMagicObjectFactoryBuilderImpl” that builds exactly one (1) factory type that generates exactly (1) object type with no more than 2 options passed into the builder and 0 options passed into the factory. :-)

The Go proverb is "A little copying is better than a little dependency." Also don't deduplicate 'text' because it's the same, deduplicate implementations if they match in both mechanism 'what' it does as well as their semantic usage. Sometimes the same thing is done with different intents which can naturally diverge and the premature deduplication is debt.

I'm coming to think that the rule of three is important within a fairly constrained context, but that other principle is worthwhile when you're working across contexts.

For example, when I did work at a microservices shop, I was deeply dissatisfied with the way the shared utility library influenced our code. A lot of what was in there was fairly throw-away and would not have been difficult to copy/paste, even to four or more different locations. And the shared nature of the library meant that any change to it was quite expensive. Technically, maybe, but, more importantly, socially. Any change to some corner of the library needed to be negotiated with every other team that was using that part of the library. The risk of the discussion spiraling away into an interminable series of bikesheddy meetings was always hiding in the shadows. So, if it was possible to leave the library function unchanged and get what you needed with a hack, teams tended to choose the hack. The effects of this phenomenon accumulated, over time, to create quite a mess.

  • An old senior colleague of mine used to insist that if i added a script to the project, i had to document it on the wiki. So i just didn't add my scripts to the project.

  • I'd argue if the code was "fairly throw-away" it probably did not meet the "Rule of 3" by the time it was included in the shared library in the first place.