Comment by jakelazaroff

5 years ago

Conversely, fixing duplication is never hard. Just move the duplicated code into a function. Going in reverse can be much tougher if the function has become an abstraction, where you have to figure out what path each function call was actually taking.

Or, put another way: I'd much rather deal with duplication than with coupling problems.

The problem with duplication is that it is hard to spot and fix. Converting liters to ml or quarts isn't hard, but the factors are different, and there are also other units. If you only do a few of these it isn't a big deal, but if you suddenly realize that you have tons of different conversions scattered around and you really need to implement a good unit conversion system, it will be really hard to retrofit everything. Note that even if you have literToml, LiterToQuart and MileToKm functions, retrofitting the right system will be hard. (Where I work we have gone through 4 different designs of an uber unit system module before we actually got all the requirements right, and each transition was a major problem.)
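To make the unit example concrete, here is a minimal sketch of what one table-driven design could look like, as opposed to one function per pair of units. Everything in it (the `UNITS` table, the `convert` function, the choice of base units) is a hypothetical illustration, not the system described above, and a real one has many more requirements (offset units like temperature, compound units, precision).

```python
# Hypothetical sketch: names and layout are assumptions for illustration only.
# Each unit maps to (dimension, factor that converts it to the dimension's base unit).
UNITS = {
    "l": ("volume", 1.0),                 # base unit for volume: liter
    "ml": ("volume", 0.001),
    "us_quart": ("volume", 0.946352946),  # US liquid quart, in liters
    "m": ("length", 1.0),                 # base unit for length: meter
    "km": ("length", 1000.0),
    "mile": ("length", 1609.344),         # international mile, in meters
    "chain": ("length", 20.1168),         # 66 feet, in meters
}


def convert(value, src, dst):
    """Convert value from unit src to unit dst by going through the base unit."""
    src_dim, src_factor = UNITS[src]
    dst_dim, dst_factor = UNITS[dst]
    if src_dim != dst_dim:
        # Liters and meters are fundamentally different; refuse to mix them.
        raise ValueError(f"cannot convert {src_dim} ({src}) to {dst_dim} ({dst})")
    return value * src_factor / dst_factor
```

With this shape, adding a unit is one table row instead of a new function per pair, which is roughly the payoff the retrofit is chasing.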

> Conversely, fixing duplication is never hard. Just move the duplicated code into a function.

I think the single biggest factor determining the difficulty of a code change is the size of the codebase. Codebases with a lot of duplication are larger, and the scale itself makes them harder to deal with. You may not even realize when duplication exists because it may be spread throughout the code. It may be difficult to tell which code is a duplicate, which is different for arbitrary reasons, and which is different in subtle but important ways.

Once you get to a huge sprawl of code that has a ton of mostly-pointless repetition, it is a nightmare to tame it. I would much rather be handed a smaller but more intertwined codebase that only says something once.

I think the opposite is true. Bad abstractions can be automatically removed with copy/paste and constant propagation. There is no automatic way to combine N pieces of code that are mostly the same but have subtle differences into a single function.
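As a rough illustration of "copy/paste and constant propagation", here is a small sketch done by hand. The function `format_name` and its flags are hypothetical, made up only to show the mechanical step: copy the body to each call site, substitute the constant flag values, and delete the branches that can no longer run.

```python
# Hypothetical "bad abstraction": one function, several flag-controlled paths.
def format_name(first, last, *, last_first=False, upper=False):
    name = f"{last}, {first}" if last_first else f"{first} {last}"
    return name.upper() if upper else name


# Two call sites, each always passing the same constant flags.
def badge_label(first, last):
    return format_name(first, last, upper=True)


def index_entry(first, last):
    return format_name(first, last, last_first=True)


# The same call sites after pasting the body in and propagating the constants:
# last_first=False, upper=True  ->  only the "first last" + upper() path survives.
def badge_label_inlined(first, last):
    return f"{first} {last}".upper()


# last_first=True, upper=False  ->  only the "last, first" path survives.
def index_entry_inlined(first, last):
    return f"{last}, {first}"
```

Each step is local and checkable, which is the sense in which the removal is mechanical. Going the other way means guessing which differences between the copies matter.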

  • N pieces of code that are mostly the same but have subtle differences aren't repetition and probably shouldn't be combined into a single function, especially if the process of doing so is non-obvious.

    • > N pieces of code that are mostly the same but have subtle differences aren’t repetition

      Often, they are.

      IME, a very common pattern is divergent copypasta where, because there is no connection once the copying occurs, a bug is noticed and fixed in one place and not in others, later noticed separately and fixed a slightly different way in some of the others, in still others a different thing that needs done in the same function gets inserted in between parts of the copypasta, etc. It’s still essentially duplication. Moreover, it’s still the same logical function being performed in different places, but in slightly different ways, creating a significant multiplier on maintenance cost. That cost, not literal code duplication in and of itself, is the actual problem addressed by DRY, which is explicitly not about duplication of code but about a single source of truth for knowledge: “Every piece of knowledge must have a single, unambiguous, authoritative representation within a system”. Divergent implementations of the same logical function are different representations of the same knowledge.

    • Often it is. Earlier in this thread I used the example of a unit system: one example where there can be a ton of repetition to remove, but there are fundamental differences between liters and meters that make removing the duplication hard if you didn't realize up front you had a problem. Once you get it right, converting meters to chains isn't that hard (I wonder how many reading even know a chain was a unit of measure, much less have any idea how big it is), but there are a ton of choices to make before it works.

    • I think they mean if the code does the same thing but has syntax differences. Variables are named differently, one uses a for loop while the other uses functional list operations, etc.

    • You never know if these subtle differences were intentional to begin with. The copies might have been identical once upon a time, but then during an emergency outage someone made a quick fix in one place and forgot to update all the other copies of this code.

      Repeating what others already mentioned, often it can be the same thing but written in a slightly different way. Even basic stuff like string formatting vs string concatenation can make it non-obvious that two pieces of code are copies.
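A small sketch of the divergent-copy situation discussed in this subthread, with made-up names throughout (`normalize_email_signup`, `normalize_email_login`, `find_divergence`). The two copies have the same intent, are written in different styles, and only one received a later fix. Running both over the same sample inputs is one cheap way to surface where they disagree, though it only covers the inputs you think to try.

```python
# Hypothetical copy A: received a whitespace fix at some point.
def normalize_email_signup(raw):
    return raw.strip().lower()


# Hypothetical copy B: same intent, different style, never got the fix.
def normalize_email_login(raw):
    chars = []
    for ch in raw:
        chars.append(ch.lower())
    return "".join(chars)


def find_divergence(f, g, samples):
    """Return the sample inputs on which f and g produce different results."""
    return [s for s in samples if f(s) != g(s)]


samples = ["Ada@Example.com", " ada@example.com ", "ADA@EXAMPLE.COM"]
diverging = find_divergence(normalize_email_signup, normalize_email_login, samples)
# diverging holds only the padded address: copy B keeps the surrounding spaces.
```

Whether the difference is a missed fix or an intentional choice is exactly what the code alone cannot tell you.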

The issue I have is that duplication is a coupling problem, but there’s no evidence in the coupled parts of the code that the coupling exists. It can be ok on a single-developer project or if confined to a single file, but my experience is that it’s a major contributor to unreliability, especially when all the original authors of the code have left.

If you find a bug in the duplicated part and have no idea that it was actually duplicated (or, even if you do, where the other copies are), you still have multiple lurking bugs around.

Fixing duplication is never easy because by nature, duplicated code will drift over time even if it shouldn't have. So it's technically not "duplicate" anymore even if the copies are supposed to do the exact same thing.

Fixing a bad abstraction is only hard because there's some weird culture about not tearing down abstractions. Rip them apart and toss them on the heap. It's a million times easier than finding duplicate code that has inadvertently drifted apart over time.

  • There are certain abstractions that can cause real problems: XML-driven DI, bad ORMs, code generators, etc. But, in general, I agree: people are generally too unwilling to refactor aggressively.