How much duplicated code do you tolerate?

Refactoring, Coding Style, DRY, Code Duplication

Refactoring Problem Overview


In a recent code review I spotted a few lines of duplicated logic in a class (less than 15 lines). When I suggested that the author refactor the code, he argued that the code is simpler to understand that way. After reading the code again, I have to agree that extracting the duplicated logic would hurt readability a little.

I know DRY is a guideline, not an absolute rule. But in general, are you willing to hurt readability in the name of DRY?

Refactoring Solutions


Solution 1 - Refactoring

Refactoring: Improving the Design of Existing Code

The Rule of Three

> The first time you do something, you just do it. The second time you do
> something similar, you wince at the duplication, but you do the duplicate
> thing anyway. The third time you do something similar, you refactor.

Three strikes and you refactor.


Coders at Work

> Seibel: So for each of these XII calls you're writing an implementation.
> Did you ever find that you were accumulating lots of bits of very similar code?
>
> Zawinski: Oh, yeah, definitely. Usually by the second or third time you've cut and pasted
> that piece of code it's like, alright, time to stop cutting and pasting and put it in a subroutine.

Solution 2 - Refactoring

I tolerate none. I may end up having some due to time constraints or whatnot. But I still haven't found a case where duplicated code is really warranted.

Saying that it'll hurt readability only suggests that you are bad at picking names :-)

Solution 3 - Refactoring

Personally, I prefer keeping code understandable, first and foremost.

DRY is about easing the maintenance in code. Making your code less understandable in order to remove repeated code hurts the maintainability more, in many cases, than having some repeated lines of code.

That being said, I do agree that DRY is a good goal to follow, when practical.

Solution 4 - Refactoring

If the code in question has a clear business or technology-support purpose P, you should generally refactor it. Otherwise you'll have the classic problem with cloned code: eventually you'll discover a need to modify code supporting P, and you won't find all the clones that implement it.

Some folks suggest 3 or more copies is the threshold for refactoring. I believe that if you have two, you should do so; finding the other clone(s) [or even knowing they might exist] in a big system is hard, whether you have two or three or more.

Now this answer is provided in the context of not having any tools for finding the clones. If you can reliably find clones, then the original reason to refactor (avoiding maintenance errors) is less persuasive (the utility of having a named abstraction is still real). What you really want is a way to find and track clones; abstracting them is one way to ensure you can "find" them (by making finding trivial).

A tool that can find clones reliably can at least prevent you from making failure-to-update-clone maintenance errors. One such tool (I'm the author) is CloneDR (http://www.semanticdesigns.com/Products/Clone). CloneDR finds clones using the targeted language structure as guidance, and thus finds clones regardless of whitespace layout, changes in comments, renamed variables, etc. (It is implemented for a number of languages, including C, C++, Java, C#, COBOL and PHP.) CloneDR will find clones across large systems, without being given any guidance. Detected clones are shown, as well as the antiunifier, which is essentially the abstraction you might have written instead. Versions of it (for COBOL) now integrate with Eclipse, and show you when you are editing inside a clone in a buffer, as well as where the other clones are, so that you may inspect/revise the others while you are there. (One thing you might do is refactor them :).
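To make the idea of structure-guided clone detection concrete, here is a toy sketch in Python. It is not CloneDR's algorithm and makes no claim about how that tool works; it only illustrates why comparing syntax trees instead of text makes whitespace, comments, and renamed variables irrelevant. The names `fingerprint`, `find_clones`, and `SAMPLE` are invented for this illustration, and the sketch only compares whole functions.

```python
import ast
import copy
from collections import defaultdict


class _Normalizer(ast.NodeTransformer):
    """Replace every identifier with the same placeholder."""

    def visit_Name(self, node):
        # Variables (and called function names) all become "_".
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        # Parameter names become "_" as well.
        node.arg = "_"
        return node


def fingerprint(func):
    """Structural fingerprint of a function, ignoring names and layout."""
    clone = copy.deepcopy(func)  # leave the caller's tree untouched
    clone.name = "_"
    # ast.dump omits line/column attributes by default, so layout and
    # comments (which never reach the AST) cannot affect the result.
    return ast.dump(_Normalizer().visit(clone))


def find_clones(source):
    """Group function names whose normalized syntax trees are identical."""
    groups = defaultdict(list)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            groups[fingerprint(node)].append(node.name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


SAMPLE = '''
def total_price(items):
    total = 0
    for item in items:
        total += item
    return total

def sum_scores(scores):
    acc = 0          # different names, spacing, and a comment
    for s in scores:
        acc += s
    return acc

def largest(values):
    return max(values)
'''

print(find_clones(SAMPLE))  # [['sum_scores', 'total_price']]
```

Collapsing every identifier into one placeholder is deliberately crude: it can report false positives that a consistent-renaming check would reject, and it misses clones that are only part of a function.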

I used to think cloning was just outright wrong, but people do it because they don't know how the clone will vary from the original and so the final abstraction isn't clear at the moment the cloning act is occurring. Now I believe that cloning is good, if you can track the clones and you attempt to refactor after the abstraction becomes clear.

Solution 5 - Refactoring

As soon as you repeat anything, you're creating multiple places where you have to make edits if you find that you've made a mistake, need to extend it, edit it, delete it, or hit any of the dozens of other reasons that force a change.

In most languages, extracting a block to a suitably named method can rarely hurt your readability.

It is your code, with your standards, but my basic answer to your "how much?" is none ...

Solution 6 - Refactoring

You didn't say what language, but in most IDEs it is a simple Refactor -> Extract Method. How much easier could it be? A single method with some arguments is much more maintainable than two blocks of duplicate code.
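As a rough illustration of what Extract Method does, here is a minimal Python sketch. The question's actual code isn't shown, so every function name and the pricing logic below are hypothetical.

```python
# Before: the same subtotal logic appears in two functions.
def invoice_total_before(prices, tax_rate):
    subtotal = 0.0
    for p in prices:
        subtotal += p
    subtotal = round(subtotal, 2)
    return round(subtotal * (1 + tax_rate), 2)


def quote_total_before(prices, discount):
    subtotal = 0.0
    for p in prices:
        subtotal += p
    subtotal = round(subtotal, 2)
    return round(subtotal * (1 - discount), 2)


# After: the duplicated block becomes one named method (hypothetical name).
def subtotal_of(prices):
    """Sum the prices and round to cents."""
    return round(sum(prices), 2)


def invoice_total(prices, tax_rate):
    return round(subtotal_of(prices) * (1 + tax_rate), 2)


def quote_total(prices, discount):
    return round(subtotal_of(prices) * (1 - discount), 2)
```

Whether the "after" version reads better depends on how well the extracted name describes the block, which is the point the other answers are arguing over.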

Solution 7 - Refactoring

Very difficult to say in the abstract. But my own belief is that even one line of duplicated code should be made into a function. Of course, I don't always achieve this high standard myself.

Solution 8 - Refactoring

Refactoring can be difficult, and this depends on the language. All languages have limitations, and sometimes a refactored version of duplicated logic can be linguistically more complex than the repeated code.

Often duplications of code LOGIC occur when two objects, with different base classes, have similarities in the way they operate. For example 2 GUI components that both display values, but don't implement a common interface for accessing that value. Refactoring this kind of system either requires methods taking more generic objects than needed, followed by typechecking and casting, or else the class hierarchy needs to be rethought & restructured.

This situation is different than if the code was exactly duplicated. I would not necessarily create a new interface class if I only intended it to be used twice, and both times within the same function.
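The trade-off can be sketched in Python; the widget classes and function names here are hypothetical. The first function takes a more generic object than needed and type-checks it, while the second relies on a shared interface. Python's `typing.Protocol` allows that interface to be declared without touching either class hierarchy, which many languages do not permit, so the cost of this refactoring really does depend on the language.

```python
from typing import Protocol


class Slider:  # imagine this derives from one GUI base class
    def __init__(self, position: int) -> None:
        self.position = position

    def current_value(self) -> str:
        return str(self.position)


class TextField:  # and this one from an unrelated base class
    def __init__(self, text: str) -> None:
        self.text = text

    def current_value(self) -> str:
        return self.text


# Option 1: accept a generic object, then type-check each case.
def describe_by_typecheck(widget: object) -> str:
    if isinstance(widget, Slider):
        return f"value={widget.position}"
    if isinstance(widget, TextField):
        return f"value={widget.text}"
    raise TypeError(f"unsupported widget: {type(widget).__name__}")


# Option 2: a shared interface, declared structurally.
class HasValue(Protocol):
    def current_value(self) -> str: ...


def describe(widget: HasValue) -> str:
    return f"value={widget.current_value()}"


print(describe(Slider(3)))        # value=3
print(describe(TextField("hi")))  # value=hi
```

If the logic is only needed twice inside one function, the extra interface may well cost more than the duplication it removes.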

Solution 9 - Refactoring

I accept NO duplicate code. If something is used in more than one place, it will be part of the framework or at least a utility library.

The best line of code is a line of code not written.

Solution 10 - Refactoring

The point of DRY is maintainability. If code is harder to understand it's harder to maintain, so if refactoring hurts readability you may actually be failing to meet DRY's goal. For less than 15 lines of code, I'd be inclined to agree with your classmate.

Solution 11 - Refactoring

In general, no. Not for readability anyway. There is always some way to refactor the duplicated code into an intention-revealing common method that reads like a book, IMO.

If you want to make an argument for violating DRY in order to avoid introducing dependencies, that might carry more weight, and you can get Ayende's opinionated opinion along with code to illustrate the point here.

Unless your dev is actually Ayende, though, I would hold tight to DRY and get the readability through intention-revealing methods.

BH

Solution 12 - Refactoring

It really depends on many factors: how much the code is used, readability, etc. In this case, if there is just one copy of the code and it is easier to read this way, then maybe it is fine. But if you need to use the same code in a third place, I would seriously consider refactoring it into a common function.

Solution 13 - Refactoring

Readability is one of the most important things code can have, and I'm unwilling to compromise on it. Duplicated code is a bad smell, not a mortal sin.

That being said, there are issues here.

If this code is supposed to be the same, rather than is coincidentally the same, there's a maintainability risk. I'd have comments in each place pointing to the other, and if it needed to be in a third place I'd refactor it out. (I actually do have code like this, in two different programs that don't share appropriate code files, so comments in each program point to the other.)

You haven't said if the lines make a coherent whole, performing some function you can easily describe. If they do, refactor them out. This is unlikely to be the case, since you agree that the code is more readable embedded in two places. However, you could look for a larger or smaller similarity, and perhaps factor out a function to simplify the code. Just because a dozen lines of code are repeated doesn't mean a function should consist of that dozen lines and no more.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | Sylvain | View Question on Stackoverflow |
| Solution 1 - Refactoring | Nick Dandoulakis | View Answer on Stackoverflow |
| Solution 2 - Refactoring | Vinko Vrsalovic | View Answer on Stackoverflow |
| Solution 3 - Refactoring | Reed Copsey | View Answer on Stackoverflow |
| Solution 4 - Refactoring | Ira Baxter | View Answer on Stackoverflow |
| Solution 5 - Refactoring | Unsliced | View Answer on Stackoverflow |
| Solution 6 - Refactoring | user177800 | View Answer on Stackoverflow |
| Solution 7 - Refactoring | anon | View Answer on Stackoverflow |
| Solution 8 - Refactoring | Sanjay Manohar | View Answer on Stackoverflow |
| Solution 9 - Refactoring | Turing Complete | View Answer on Stackoverflow |
| Solution 10 - Refactoring | Darryl | View Answer on Stackoverflow |
| Solution 11 - Refactoring | Berryl | View Answer on Stackoverflow |
| Solution 12 - Refactoring | Justin Ethier | View Answer on Stackoverflow |
| Solution 13 - Refactoring | David Thornley | View Answer on Stackoverflow |