Are Subversion externals an antipattern?


Svn Problem Overview


Subversion lets you embed working copies of other repositories using externals, allowing easy version control of third-party library software in your project.

While these seem ideal for the reuse of libraries and version control of vendor software, they aren't without their critics:

>Please don't use Subversion externals (or similar in other tools), they are an anti-pattern and, therefore, unnecessary

Are there hidden risks in using externals? Please explain why they would be considered an antipattern.

Svn Solutions


Solution 1 - Svn

I am the author of the quote in the question, which came from a previous answer.

Jason is right to be suspicious of brief statements such as mine, and to ask for an explanation. Of course, if I fully explained everything in that answer, I would need to have written a book.

Mike is also right to point out that one of the problems with an svn:external-like feature is that changes in the targeted source could break your own source, especially if that targeted source is in a repository that you do not own.

In further explaining my comment, let me first say that there are "safe" ways to use the svn:external-like feature, just as with any other tool or feature. However, I refer to it as an antipattern because the feature is far more likely to be misused. In my experience, it has always been misused, and I find myself very unlikely to ever use it in that safe manner nor to ever recommend that use. Please further note that I mean NO disparagement to the Subversion team--I love Subversion, although I plan to move on to Bazaar.

The primary issue with this feature is that it encourages, and is typically used for, directly linking the source of one build ("project") to the source of another, or linking the project to a binary (DLL, JAR, etc.) on which it depends. Neither of these uses is wise, and they constitute an antipattern.

As I said in my other answer, I believe that an essential principle for software builds is that each project constructs exactly ONE binary or primary deliverable. This can be considered an application of the principle of separation of concerns to the build process. This is particularly true regarding one project directly referencing the source of another, which is also a violation of the principle of encapsulation. Another form of this kind of violation is attempting to create a build hierarchy to construct an entire system or subsystem by recursively invoking sub-builds. Maven strongly encourages/enforces this behavior, which is one of the many reasons that I don't recommend it.

Finally, I find that there are various practical matters that make this feature undesirable. For one, svn:external has some interesting behavioral characteristics (but the details escape me for the moment). For another, I always find that I need such dependencies to be explicitly visible to my project (build process), not buried as some source control metadata.

So, what is a "safe" manner of using this feature? I would consider that to be when it is used temporarily by only one person, such as a way to "configure" a working environment. I could see where a programmer might create their own folder in the repository (or one for each programmer) where they would configure svn:external links to the various other parts of the repository that they are currently working on. Then, a checkout of that one folder will create a working copy of all their current projects. When a project is added or finished, the svn:external definitions could be adjusted and the working copy updated appropriately. However, I prefer an approach that is not tied to a particular source control system, such as doing this with a script that invokes the checkouts.
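
As a rough illustration of the script-based alternative mentioned above, here is a minimal Python sketch. The project names, URLs, and the `work` directory are hypothetical placeholders, not anything from the original answer; the only real tool assumed is the standard `svn checkout` command.

```python
import subprocess

# Hypothetical project list: these names and URLs are placeholders.
PROJECTS = {
    "billing": "https://svn.example.com/repo/billing/trunk",
    "reports": "https://svn.example.com/repo/reports/trunk",
}


def checkout_commands(projects, workspace):
    """Return one `svn checkout` command (as an argument list) per project."""
    return [
        ["svn", "checkout", url, f"{workspace}/{name}"]
        for name, url in sorted(projects.items())
    ]


def checkout_all(projects, workspace, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in checkout_commands(projects, workspace):
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default, so nothing is checked out until you opt in.
    checkout_all(PROJECTS, "work")
```

Adding or retiring a project then means editing the `PROJECTS` mapping rather than a property stored in the repository, and the same idea carries over to other version control systems by swapping the command.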

For the record, my most recent exposure to this issue occurred during the summer of 2008 at a consulting client that was using svn:external on a massive scale--EVERYTHING was cross-linked to produce a single master working copy. Their Ant & Jython-based (for WebLogic) build scripts were built on top of this master working copy. The net result: NOTHING could be built stand-alone. There were literally dozens of subprojects, but not one was safe to check out or work on by itself. Therefore, any work on this system first required a checkout/update of over 2 GB of files (they put binaries in the repository also). Getting anything done was an exercise in futility, and I left after trying for three months (there were many other antipatterns present as well).

EDIT: Expound on recursive builds -

Over the years (especially the last decade), I have built massive systems for Fortune 500 companies and large government agencies involving many dozens of subprojects arranged in directory hierarchies that are many levels deep. I have used Microsoft Visual Studio projects/solutions to organize .NET-based systems, Ant or Maven 2 for Java-based systems, and I have begun using distutils and setuptools (easy_install) for Python-based systems. These systems have also included huge databases, typically in Oracle or Microsoft SQL Server.

I have had great success designing these massive builds for ease of use and repeatability. My design standard is that a new developer can show up on their first day, be given a new workstation (perhaps straight from Dell with just a typical OS installation), be given a simple setup document (usually just one page of installation instructions), and be able to fully set up the workstation and build the full system from source, unsupervised, unassisted, and in half a day or less. Invoking the build itself involves opening a command shell, changing to the root directory of the source tree, and issuing a one-line command to build EVERYTHING.

Despite that success, constructing such a massive build system requires great care and close adherence to solid design principles, just as with constructing a massive business-critical application/system. I have found that a crucial part is that each project (which produces a single artifact/deliverable) must have a single build script, which must have a well-defined interface (commands for invoking portions of the build process), and it must stand alone from all other (sub)projects. Historically, I found it easy to build the whole system but hard or impossible to build only one piece. Only recently have I learned to carefully ensure that each project truly stands alone.

In practice, this means that there must be at least two layers of build scripts. The lowest layer consists of the project build scripts that produce each deliverable/artifact. Each such script resides in the root directory of its project source tree (indeed, this script DEFINES its project source tree). These scripts know nothing about source control and expect to be run from the command line. They reference everything in the project relative to the build script, and they reference their external dependencies (tools or binary artifacts, no other source projects) based on a few configurable settings (environment variables, configuration files, etc.).
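
To make the first layer a little more concrete, here is a hedged Python sketch of how a project build script might resolve its external dependencies from configurable settings. The variable names `PROJECT_LIB_DIR` and `PROJECT_OUT_DIR`, and the `lib`/`build` defaults, are invented for illustration and are not part of the original answer.

```python
import os
import sys
from pathlib import Path


def resolve_dependencies(env, project_root):
    """Locate external dependencies from settings only, never from source control."""
    root = Path(project_root)
    return {
        # Directory of prebuilt binary artifacts this project consumes.
        "lib_dir": Path(env.get("PROJECT_LIB_DIR", root / "lib")),
        # Directory where this project writes its single deliverable.
        "out_dir": Path(env.get("PROJECT_OUT_DIR", root / "build")),
    }


if __name__ == "__main__":
    # The directory holding this script defines the project source tree.
    project_root = Path(sys.argv[0]).resolve().parent
    for name, path in resolve_dependencies(os.environ, project_root).items():
        print(f"{name} = {path}")
```

Because every path comes either from the script's own location or from an explicit setting, the project can be built in any directory without knowing how it was checked out.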

The second layer of build scripts is also intended to be invoked from the command line, but these know about source control. Indeed, this second layer is often a single script that is invoked with a project name and a version, then it checks out the source for the named project to a new temporary directory (perhaps specified on the command line) and invokes its build script.
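
A second-layer script along these lines could look roughly like the following Python sketch. The repository URL, the `tags/<version>` layout, and the `build.py all` entry point are assumptions made for the example; the answer does not specify any of them.

```python
import subprocess
import sys
import tempfile

# Placeholder repository root; substitute your own.
REPO_ROOT = "https://svn.example.com/repo"


def plan_build(project, version, workdir):
    """Return the checkout command, the build command, and the target directory."""
    source_url = f"{REPO_ROOT}/{project}/tags/{version}"  # assumed layout
    target = f"{workdir}/{project}-{version}"
    checkout = ["svn", "checkout", source_url, target]
    # Assumed first-layer entry point: a build.py in the project root.
    build = [sys.executable, "build.py", "all"]
    return checkout, build, target


def run_build(project, version, workdir=None):
    """Check out the named project into a fresh directory and invoke its build script."""
    workdir = workdir or tempfile.mkdtemp(prefix="build-")
    checkout, build, target = plan_build(project, version, workdir)
    subprocess.run(checkout, check=True)
    subprocess.run(build, cwd=target, check=True)
    return target
```

Calling `run_build("billing", "1.4.2")` would check out that tag into a new temporary directory and run the project's own build there. Only this layer knows about source control; the project script it invokes does not.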

There may need to be more variation to accommodate continuous integration servers, multiple platforms, and various release scenarios.

Sometimes there is a need for a third layer of scripts that invokes the second layer of scripts (which invoke the first layer) for the purpose of building specific subsets of the overall project set. For example, each developer may have their own script that builds the projects that they are working on today. There may be a script to build everything in order to generate the master documentation, or to calculate metrics.

Regardless, I have found that attempting to treat the system as a hierarchy of projects is counterproductive. It ties the projects to each other so that they cannot be freely built alone, or in arbitrary locations (temporary directory on the continuous integration server), or in arbitrary order (assuming dependencies are satisfied). Often, attempting to force a hierarchy breaks any IDE integration that one might attempt.

Finally, building a massive hierarchy of projects can simply be too performance intensive. For example, during the spring of 2007 I attempted a modest source hierarchy (Java plus Oracle) that I built using Ant, which eventually failed because the build always aborted with a Java OutOfMemoryError. This was on a 2 GB RAM workstation with 3.5 GB swap space for which I had tuned the JVM to be able to use all available memory. The application/system was relatively trivial in terms of amount of code, but the recursive build invocations eventually exhausted memory, no matter how much memory I gave it. Of course, it also took forever to execute (30-60 minutes was common, before it aborted). I know how to tune VERY well, but ultimately I was simply exceeding the limits of the tools (Java/Ant in this case).

So do yourself a favor, construct your build as stand-alone projects, then compose them into a full system. Keep it light and flexible. Enjoy.

EDIT: More on antipatterns

Strictly speaking, an antipattern is a common solution that looks like it solves the problem but doesn't, either because it leaves important gaps or because it introduces additional problems (often worse than the original problem). A solution necessarily involves one or more tools plus the technique for applying them to the problem at hand. Therefore, it is a stretch to refer to a tool or a specific feature of a tool as an antipattern, and it seems that people are detecting and reacting to that stretch--fair enough.

On the other hand, since it seems to be common practice in our industry to focus on tools rather than technique, it is the tool/feature that gets the attention (a casual survey of questions here on StackOverflow seems to easily illustrate). My comments, and this question itself, reflect that practice.

However, sometimes it seems particularly justified to make that stretch, such as in this case. Some tools seem to "lead" the user to particular techniques for applying them, to the point where some argue that tools shape thought (slightly rephrased). It is mostly in that spirit that I suggest that svn:external is an antipattern.

To more strictly state the issue, the antipattern is to design a build solution that includes tying projects together at the source level, or to implicitly version the dependencies between projects, or to allow such dependencies to implicitly change, because each of these invokes very negative consequences. The nature of the svn:external-like feature makes avoiding those negative consequences very difficult.

Properly handling the dependencies between projects involves addressing those dynamics along with the base problem, and the tools and techniques lead down a different path. An example that should be considered is Ivy, which helps in a manner similar to Maven but without the many downsides. I am investigating Ivy, coupled with Ant, as my short-term solution to the Java build problem. Long term, I am looking to incorporate the core concepts and features into an open-source tool that facilitates a multiplatform solution.

Solution 2 - Svn

I don't think this is an anti-pattern at all. I did a few quick searches on Google and came up with basically nothing... nobody is complaining that using svn:externals is bad or harmful. Of course there are some caveats that you have to be aware of... and it's not something that you should just sprinkle heavily into all of your repositories... but as for the original quotation, that's just his personal (and subjective) opinion. He never really discussed svn:externals, except to condemn them as an anti-pattern. Such sweeping statements, made without any support or at least some reasoning as to how the person arrived at them, are always suspect.

That said, there are some issues with using externals. Like Mike answered, they can be very helpful for pointing to stable branches of released software... especially software that you already control. We use them internally in a number of projects for utility libraries and such. We have a small group that enhances and works on the utility library base, but that base code is shared across a number of projects. We don't want various teams just checking in utility project code and we don't want to deal with a million branches, so for us svn:externals works very well. For some people, they may not be the answer. However, I would strongly disagree with the statement "Please don't use..." and that these tools represent an anti-pattern.

Solution 3 - Svn

The main risk with using svn:externals is that the referenced repository will be changed in a way that breaks your code or introduces a security vulnerability. If the external repository is also under your control, then this may be acceptable.

Personally, I only use svn:externals to point to "stable" branches of a repository that I own.

Solution 4 - Svn

An old thread, but I want to address the concern that a changing external could break your code. As pointed out previously, this is most often due to an incorrect usage of the external property. External references should, in almost all instances, point to a specific revision number in the external repository URI. This ensures that the external will never change unless you change it to point to a different revision number.
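
One way to enforce this is to scan the `svn:externals` property text for definitions that carry no revision. The Python sketch below is a simplified check, not a full parser: it assumes a definition is pinned if it has a `-r` argument or a URL ending in a numeric peg revision such as `@148`, and it treats lines starting with `#` as comments. The example URLs are placeholders.

```python
import re
import shlex

# A numeric peg revision at the end of a token, e.g. "http://host/lib@148".
_PEG_REVISION = re.compile(r"@\d+$")


def is_pinned(definition):
    """Return True if one externals definition line names a revision."""
    tokens = shlex.split(definition)
    has_operative_rev = any(token.startswith("-r") for token in tokens)
    has_peg_rev = any(_PEG_REVISION.search(token) for token in tokens)
    return has_operative_rev or has_peg_rev


def unpinned_externals(property_text):
    """Return the definition lines that would float with the external's HEAD."""
    lines = (line.strip() for line in property_text.splitlines())
    return [
        line
        for line in lines
        if line and not line.startswith("#") and not is_pinned(line)
    ]


if __name__ == "__main__":
    sample = """
    -r148 http://svn.example.com/skinproj third-party/skins
    http://svn.example.com/repos/sounds third-party/sounds
    """
    for line in unpinned_externals(sample):
        print("unpinned:", line)
```

The property text itself can be obtained with `svn propget svn:externals`, and a check like this could run in a pre-commit hook or a continuous integration step.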

For some of our internal libraries, which we use as externals in our end-user projects, I've found it useful to create a tag of the library at Major.Minor version, where we enforce no breaking changes. With a four-point versioning scheme (Major.Minor.BugFix.Build), we allow the tag to be kept current with BugFix.Build changes (again, enforcing no breaking changes). This allows us to use an external reference to the tag without a revision number. In the case of major or other breaking changes, a new tag is created.
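
The tagging rule described here can be stated in a few lines of Python. This is only a sketch of the naming logic under the four-point scheme above; the function names are made up for the example, and it treats only a Major or Minor change as requiring a new tag.

```python
def library_tag(version):
    """Map a Major.Minor.BugFix.Build version to its Major.Minor tag name."""
    parts = version.split(".")
    if len(parts) != 4 or not all(part.isdigit() for part in parts):
        raise ValueError(f"expected Major.Minor.BugFix.Build, got {version!r}")
    major, minor = parts[0], parts[1]
    return f"{major}.{minor}"


def needs_new_tag(old_version, new_version):
    """A new tag is needed only when Major or Minor changes."""
    return library_tag(old_version) != library_tag(new_version)
```

Under this rule, moving from 2.3.1.457 to 2.3.2.460 keeps the existing 2.3 tag current, while moving to 2.4.0.1 calls for a new tag and an explicit change to the external reference in each consuming project.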

Externals themselves aren't bad, but that doesn't stop people from creating bad implementations of them. It doesn't take much research, just a little bit of reading through some documentation, to learn how to use them safely and effectively.

Solution 5 - Svn

If a plain external is an anti-pattern because it can break your repository, then one with an explicit revision shouldn't be.

Excerpt from svn book:

> An externals definition is a mapping of a local directory to the URL**—and possibly a particular revision—**of a versioned resource.

I think it all depends on your purpose in using the feature; it is not an anti-pattern by itself.

Solution 6 - Svn

There are definite flaws in subversion externals, but we seem to use them reasonably successfully for including libraries (both our own and vendor) that the current project depends on. So I don't see them as an "anti-pattern". The important usage points for me are:

  • They point to a specific revision or tag (never the head) of the other project.
  • They are inserted into the current project well away from its own source code etc (e.g. in a subdirectory called "support files").
  • They refer only to the other project's "interface" files (e.g. include folder) and binary libraries (i.e. we don't get the full source of the other project).

I too would be interested in any major risks of this arrangement, and better approaches.

Solution 7 - Svn

Saying that a is b does not make a a b unless you say why this is so.

The main flaw I see with external references in subversion is that you're not guaranteed that the repository is present when you update your working copy.

Subversion external references can be used, and abused, and the feature itself is nothing but just that, a feature. It cannot be said to be a pattern, nor an antipattern.

I've read the answer by the person you quote, and I must say that I disagree. If your project requires files version XYZ from a repository, an external subversion reference can easily give you that.

Yes, you can use it wrong by not explicitly specifying which version of that reference you need. Will that give you problems? Likely!

Is it an antipattern? Well, it depends. If you follow the link given by the author of the text you quote, i.e. here, then no. That something can be used to provide a bad solution does not make the entire method of doing so an antipattern. If that were the rule, then I would say that programming languages by and large are antipatterns, because in every programming language you can make bad solutions.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stack Overflow |
| --- | --- | --- |
| Question | Ken | View Question on Stack Overflow |
| Solution 1 - Svn | Rob Williams | View Answer on Stack Overflow |
| Solution 2 - Svn | Jason Coco | View Answer on Stack Overflow |
| Solution 3 - Svn | Mike | View Answer on Stack Overflow |
| Solution 4 - Svn | ulty4life | View Answer on Stack Overflow |
| Solution 5 - Svn | smoothdeveloper | View Answer on Stack Overflow |
| Solution 6 - Svn | luapyad | View Answer on Stack Overflow |
| Solution 7 - Svn | Lasse V. Karlsen | View Answer on Stack Overflow |