Git nested submodules and dependencies

Git Problem Overview


Let's say I have four projects named Core, A, B, Super. The dependency tree is like this:

Super ---> Core
       |-> A -> Core
       |-> B -> Core

I want each project to be stand-alone, that is, I want to be able to check-out and compile each project on its own (each with its dependencies of course).

I thought about mapping each project to a repository and then referring dependencies with submodules, but I see the following issues with that approach:

  1. When checking out Super with all its dependencies, I'd end up with three copies of Core.
  2. Since submodules are fully independent, each of these three copies could be pointing to different revisions of Core and that would be a mess.

So... am I missing something? Did I misunderstand git submodules, or am I misusing them? Is there any other solution to this problem (other than resorting to binary dependencies)?

Git Solutions


Solution 1 - Git

You just discovered the lack of overridden dependencies with Git submodules:

If Super depends on Core, its dependency of Core should "override" the ones A and B have with Core.

The only way to emulate that would be to create your Super project the way you did,
and to remove the Core sub-module from A and B.
(meaning Super now depends on A' and B', where A' is A without Core and B' is B without Core)
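
As a rough sketch of what producing A' (or B') could look like, the function below removes a nested submodule on a separate branch. The submodule path `Core`, the branch name `without-core`, and the helper names are assumptions made for illustration, not something the answer specifies. Setting `DRY_RUN=1` only prints the commands.

```shell
# Sketch: drop the nested Core submodule from a checkout of A (or B).
# Assumptions: the submodule lives at path "Core"; the branch name
# "without-core" and both function names are hypothetical.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

drop_nested_core() {
    repo_dir=$1
    sub_path=${2:-Core}
    run git -C "$repo_dir" checkout -b without-core &&
    run git -C "$repo_dir" submodule deinit -f -- "$sub_path" &&
    run git -C "$repo_dir" rm -f -- "$sub_path" &&
    run git -C "$repo_dir" commit -m "Remove nested $sub_path submodule"
}
```

Usage would be something like `DRY_RUN=1 drop_nested_core path/to/A` to preview, then the same call without `DRY_RUN`. Super would then add Core once at its own top level, and A' and B' would need to find Core there, which depends on how your build locates it.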

Solution 2 - Git

Git repositories should be fairly atomic, in that each repository is a stand-alone entity with a specific purpose. What is the purpose of the Super project other than combining projects A and B? If there isn't anything unique in it (i.e. files that are not in A, B or Core), then it's fairly redundant.

EDIT: Because git submodules were especially painful at one place I worked, we set up our own dependency system that tracks dependent repos via text files. We set it up so that it always tracks the head of a branch, not a particular commit.

We were able to set up all our projects as though they are a part of the Super project like this:

Super
|-A
|-B
|-Core

The projects will reference each other using relative paths, e.g. ../A/include.h. Checking out repo A on its own will not work; you would have to create another "super" repo for working just on A:

AWorking
|-A
|-Core

EDIT: Another reason for this behaviour is that git can't track things that are above the root repo directory (i.e. above the folder containing the .git folder), which would definitely be required if you want your super-projects and sub-projects to refer to the same repositories.
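
A minimal sketch of such a text-file tracker, assuming a hypothetical `deps.txt` with one `name url [branch]` entry per line (the answer does not describe the real format). Dependencies are cloned as siblings in a shared workspace directory and fast-forwarded to the head of their branch.

```shell
# Sketch of a dependency tracker that follows branch heads.
# The "name url [branch]" file format, the default branch "master",
# and the function names are all assumptions for illustration.

# Print "name url branch" for each entry, skipping blanks and comments.
read_deps() {
    awk 'NF == 0 || $1 ~ /^#/ { next }
         { print $1, $2, (NF >= 3 ? $3 : "master") }' "$1"
}

# Clone each dependency into the workspace, or fast-forward it to the
# head of its branch if it has already been cloned.
sync_deps() {
    deps_file=$1
    workspace=$2
    read_deps "$deps_file" | while read -r name url branch; do
        if [ -d "$workspace/$name/.git" ]; then
            git -C "$workspace/$name" checkout "$branch" </dev/null &&
            git -C "$workspace/$name" pull --ff-only origin "$branch" </dev/null
        else
            git clone --branch "$branch" "$url" "$workspace/$name" </dev/null
        fi
    done
}
```

With A checked out at `AWorking/A`, something like `sync_deps AWorking/A/deps.txt AWorking` would place Core next to it, so relative paths such as ../Core resolve.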

Solution 3 - Git

I think the issue here is a mismatch between the design of Git and the problem you are looking to solve.

Git is good for keeping track of Trees. Dependency relationships between projects can (and likely do) form a Graph. A Tree is a Graph but a Graph is not necessarily a Tree. Since your problem is how to effectively represent a Graph, a Tree is not the best tool for the job.

Here's an approach that might work:

A git project has a .gitmodules file where it records "hints" stating which projects a commit may depend on, where they can be found, and what path inside the project they are expected to be inserted at. ( http://osdir.com/ml/git/2009-04/msg00746.html )

You could add a script which reads this information from a set of projects, maps the hints found in each project's .gitmodules file to the locations on the filesystem where those projects have actually been placed, and then adds symbolic links from the paths where git expects to check out submodules to the actual filesystem locations of the respective projects.
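
The reading half of such a script might look like the sketch below. It relies on `git config -f`, which can parse a .gitmodules file directly; the function name and the "name path url" output format are made up for this example.

```shell
# Sketch: print "name path url" for every submodule declared in a
# .gitmodules file. The function name and output format are hypothetical.
list_submodule_hints() {
    gitmodules=$1
    git config -f "$gitmodules" --get-regexp '^submodule\..*\.path$' |
    while read -r key path; do
        name=${key#submodule.}      # strip the "submodule." prefix
        name=${name%.path}          # strip the ".path" suffix
        url=$(git config -f "$gitmodules" --get "submodule.$name.url")
        echo "$name $path $url"
    done
}
```

Calling `list_submodule_hints A/.gitmodules` would list where A expects each dependency to be checked out, which is the information the symlink step needs.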

This approach uses symbolic links to break out of the Tree mold and build a Graph. If we record the links directly in git repos, we'd have relative paths specific to our local setup recorded in the individual projects, and the projects wouldn't be 'fully independent' like you wanted. Hence, the script to dynamically build the symlinks.

I'm thinking this approach might interfere with git in undesirable ways, since we've taken paths where it expects to find one thing, and put something else there instead. Maybe we could .gitignore the symlink paths. But now we're writing those paths down twice and violating DRY. At this point we've also gotten pretty far away from pretending to use submodules. We could record the dependencies elsewhere in each project, and leave the .gitmodules file for the things git expects. So we'll make up our own file, say, .dependencies, and each project can state its dependencies there. Our script will look there and then go and build its symlinks.
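
Here is a small sketch of that last idea. It assumes a made-up `.dependencies` format of one `name path` pair per line, and a workspace directory in which every project has already been checked out once; neither is a git feature.

```shell
# Sketch: build symlinks for the dependencies a project declares.
# Assumptions: ".dependencies" holds "name path" lines (hypothetical
# format), and $workspace is an absolute path containing one checkout
# per project, named after the project.
link_deps() {
    project_dir=$1
    workspace=$2
    while read -r name rel_path; do
        case $name in ''|'#'*) continue ;; esac   # skip blanks and comments
        target=$workspace/$name
        link=$project_dir/$rel_path
        if [ ! -d "$target" ]; then
            echo "missing dependency: $name" >&2
            return 1
        fi
        mkdir -p "$(dirname "$link")"
        ln -sfn "$target" "$link"                 # replace a stale link if present
    done < "$project_dir/.dependencies"
}
```

Running `link_deps "$PWD/A" "$PWD"` and `link_deps "$PWD/B" "$PWD"` would make both projects point at the same single checkout of Core. The link paths would still need to be ignored by git in each project.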

Hmm, I think I may have just described an ad-hoc package management system, with its own lightweight package format :)

megamic's suggestion seems like a good use of git submodules to me. We're only dealing with keeping track of a Set here rather than a Graph, and a Set fits easily into a Tree. A Tree one level deep is essentially a parent node and a Set of child nodes.

As you pointed out, that does not completely solve the problem stated in your question. We can break out two distinct types of "this works with that" information we're likely interested in:

  1. A statement from a version of a project (presumably by the project's author) saying "I require version X of project Y"
  2. A statement used by your own build setup saying "I've successfully tested our whole system using this set of project versions"

megamic's answer solved (2) but for (1) we still want projects to tell us what their dependencies are. Then we can use the info from (1) to compute those version sets which we'll end up recording as (2). This is a complex enough problem to warrant its own tool, which brings us back to package management systems :)

As far as I know, most of the good package management tools are made for users of a specific language or operating system. See Bundler for 'gem' packages in the Ruby world and apt for '.deb' packages in the Debian world.

If anyone knows of a good language-neutral, OS-neutral solution to this that is well-suited to 'polyglot' ( http://blog.heroku.com/archives/2011/8/3/polyglot_platform/ ) programming projects, I would be very interested! I should post that as a question.

Solution 4 - Git

I think you can manage consistency like this: define a "reference" branch or series of tags with the same name(s) across all your "Core" libraries (note: there is only one "Core" library in your example). Then instruct developers of sub-projects (A, B,...) to regularly upgrade to the reference version of "Core" as soon as they can.

Before running a build, easily check that "Core(s)" is consistently used across A, B, C,... by running these three commands in a clean, recursive, "Super", top-level checkout:

# 1.  Switch to the reference version (= "Force" consistency where need be)
git submodule foreach --recursive 'git checkout [origin/]reference || true'

# 2a. Show which inconsistencies you just forced; terse output
git status -s; git submodule foreach --recursive git status -s 2>/dev/null

# 2b. Same but verbose output
git submodule; git submodule foreach --recursive git submodule

# 3. Switch back to versions individually defined by sub-projects 
git submodule update --recursive

The "Terse output" command 2a above highlights which sub-project(s) are not using the "reference" version of Core.

You can easily extend the approach to show diffs, force upgrades, or do any other thing you like.
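
One possible extension, as a sketch: parse the output of `git submodule status --recursive` and list the distinct commits at which a submodule of a given directory name is checked out. More than one line of output means the copies disagree. The function names are hypothetical.

```shell
# Sketch: read `git submodule status --recursive` output on stdin and
# print the distinct commits checked out for submodules whose directory
# name is $1 (e.g. Core). Function names are hypothetical.
core_revisions() {
    awk -v name="$1" '
        {
            sha = $1
            sub(/^[-+U]/, "", sha)       # strip the status prefix, if any
            n = split($2, parts, "/")
            if (parts[n] == name) print sha
        }' | sort -u
}

# Succeed only when every copy of the named submodule is at one commit.
# Run from the top level of a recursive checkout of Super.
check_consistent() {
    count=$(git submodule status --recursive | core_revisions "$1" | wc -l | tr -d ' ')
    [ "$count" -le 1 ]
}
```

In a build script this could be used as `check_consistent Core || echo "Core revisions differ"`.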

Solution 5 - Git

A small utility task turning shared submodules into clones using hard-links might work.

You may read my full solution here: https://stackoverflow.com/a/10265084/84283

Solution 6 - Git

I would not try and map a dependency tree with sub-modules - for the reasons you have already discovered.

Sub-modules record one specific commit of each module, so they are useful for giving a snapshot of a consistent set of modules.

So, if your project requires a certain set of versions of different modules to be tracked as one unit, you can group them together as sub-modules. You can then tag different sets of modules at different versions, to give a history of the project, where each tag shows what versions of what modules were compatible at a point in time.

 tags/
     release1/ 
           |-> A@1.0
           |-> B@1.1
           |-> C@1.2
     release2/
           |-> A@2.0
           |-> B@1.3
           |-> C@1.5
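
To read such a snapshot back, something like the sketch below could list the commit each sub-module was pinned to at a given tag. It assumes the tags (release1, release2) live in the repository that holds A, B and C as sub-modules, and that sub-module paths contain no spaces; the function name is made up.

```shell
# Sketch: print "path commit" for every sub-module recorded in the tree
# of a tag (or any other revision) of the containing repository.
# Sub-modules appear in `git ls-tree` output as entries of type "commit".
pinned_versions() {
    git ls-tree -r "$1" | awk '$2 == "commit" { print $4, $3 }'
}
```

Comparing `pinned_versions release1` with `pinned_versions release2` then shows which modules moved between the two releases.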

At least that's how I understand them, although like most things with Git, there is probably a whole lot more to it than that. In terms of managing dependencies, all I can say is: find another way. It is not what Git, with or without sub-modules, was designed for, as I understand it.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Mauricio Scheffer | View Question on Stackoverflow
Solution 1 - Git | VonC | View Answer on Stackoverflow
Solution 2 - Git | Igor Zevaka | View Answer on Stackoverflow
Solution 3 - Git | Charlie | View Answer on Stackoverflow
Solution 4 - Git | MarcH | View Answer on Stackoverflow
Solution 5 - Git | Antonin Hildebrand | View Answer on Stackoverflow
Solution 6 - Git | aaa90210 | View Answer on Stackoverflow