Git-Based Source Control in the Enterprise: Suggested Tools and Practices?

GitEnterprise

Git Problem Overview


I use git for personal projects and think it's great. It's fast, flexible, powerful, and works great for remote development.

But now it's mandated at work and, frankly, we're having problems.

Out of the box, git doesn't seem to work well for centralized development in a large (20+ developer) organization with developers of varying abilities and levels of git sophistication - especially compared with other source-control systems like Perforce or Subversion, which are aimed at that kind of environment. (Yes, I know, Linus never intended it for that.)

But - for political reasons - we're stuck with git, even if it sucks for what we're trying to do with it.

Here are some of the things we're seeing:

  • The GUI tools aren't mature
  • Using the command line tools, it's far to easy to screw up a merge and obliterate someone else's changes
  • It doesn't offer per-user repository permissions beyond global read-only or read-write privileges
  • If you have a permission to ANY part of a repository, you can do that same thing to EVERY part of the repository, so you can't do something like make a small-group tracking branch on the central server that other people can't mess with.
  • Workflows other than "anything goes" or "benevolent dictator" are hard to encourage, let alone enforce
  • It's not clear whether it's better to use a single big repository (which lets everybody mess with everything) or lots of per-component repositories (which make for headaches trying to synchronize versions).
  • With multiple repositories, it's also not clear how to replicate all the sources someone else has by pulling from the central repository, or to do something like get everything as of 4:30 yesterday afternoon.

However, I've heard that people are using git successfully in large development organizations.

If you're in that situation - or if you generally have tools, tips and tricks for making it easier and more productive to use git in a large organization where some folks are not command line fans - I'd love to hear what you have to suggest.

BTW, I've asked a version of this question already on LinkedIn, and got no real answers but lots of "gosh, I'd love to know that too!"

UPDATE: Let me clarify...

Where I work, we can't use ANYTHING other than git. It's not an option. We're stuck with it. We can't use mercurial, svn, bitkeeper, Visual Source Safe, ClearCase, PVCS, SCCS, RCS, bazaar, Darcs, monotone, Perforce, Fossil, AccuRev, CVS, or even Apple's good ol' Projector that I used in 1987. So while you're welcome to discuss other options, you ain't gonna get the bounty if you don't discuss git.

Also, I'm looking for practical tips on how to use git in the enterprise. I put a whole laundry list of problems we're having at the top of this question. Again, people are welcome to discuss theory, but if you want to earn the bounty, give me solutions.

Git Solutions


Solution 1 - Git

Against the common opinion, I think that using a DVCS is an ideal choice in an enterprise setting because it enables very flexible workflows. I will talk about using a DVCS vs. CVCS first, best-practices and then about git in particular.

DVCS vs. CVCS in an enterprise context:

I wont talk about the general pros/cons here, but rather focus on your context. It is the common conception, that using a DVCS requires a more disciplined team than using a centralized system. This is because a centralized system provides you with an easy way to enforce your workflow, using a decentralized system requires more communication and discipline to stick to the established of conventions. While this may seem like it induces overhead, I see benefit in the increased communication necessary to make it a good process. Your team will need to communicate about code, about changes and about project status in general.

Another dimension in the context of discipline is encouraging branching and experiments. Here's a quote from Martin Fowler's recent bliki entry on Version Control Tools, he has found a very concise description for this phenomenon.

> DVCS encourages quick branching for > experimentation. You can do branches > in Subversion, but the fact that they > are visible to all discourages people > from opening up a branch for > experimental work. Similarly a DVCS > encourages check-pointing of work: > committing incomplete changes, that > may not even compile or pass tests, to > your local repository. Again you could > do this on a developer branch in > Subversion, but the fact that such > branches are in the shared space makes > people less likely to do so.

DVCS enables flexible workflows because they provide changeset tracking via globally unique identifiers in a directed acyclic graph (DAG) instead of simple textual diffs. This allows them to transparently track the origin and history of a changeset, which can be quite important.

Workflows:

Larry Osterman (a Microsoft dev working on the Windows team) has a great blog post about the workflow they employ at the Windows team. Most notably they have:

  • A clean, high quality code only trunk (master repo)
  • All development happens on feature branches
  • Feature teams have team repos
  • They do regularily merge the latest trunk changes into their feature branch (Forward Integrate)
  • Complete features must pass several quality gates e.g. review, test coverage, Q&A (repos on their own)
  • If a feature is completed and has acceptable quality it is merged into the trunk (Reverse Integrate)

As you can see, having each of these repositories live on their own you can decouple different teams advancing at different paces. Also the possibility to implement a flexible quality gate system distinguishes DVCS from a CVCS. You can solve your permission issues at this level too. Only a handful of people should be allowed access to the master repo. For each level of the hierachy, have a seperate repo with the corresponding access policies. Indeed, this approach can be very flexible on the team level. You should leave it up to each team to decide wether they want to share their team repo among themselves or if they want a more hierachical approach where only the team lead may commit to the team repo.

Hierachical Repositories

(The picture is stolen from Joel Spolsky's hginit.com.)

One thing remains to be said at this point though:- even though DVCS provides great merging capabilities, this is never a replacement for using Continuous Integration. Even at that point you have a great deal of flexibility: CI for the trunk repo, CI for team repos, Q&A repos etc.

Git in an enterprise context:

Git is maybe not the ideal solution for an enterprise context as you have already pointed out. Repeating some of your concerns, I think most notably they are:

  • Still somewhat immature support on Windows (please correct me if that changed recently) Now windows has github windows client , tortoisegit , SourceTree from atlassian
  • Lack of mature GUI tools, no first class citizen vdiff/merge tool integration
  • Inconsistent interface with a very low level of abstractions on top of its inner workings
  • A very steep learning curve for svn users
  • Git is very powerful and makes it easy to modify history, very dangerous if you don't know what you are doing (and you will sometimes even if you thought you knew)
  • No commercial support options available

I don't want to start a git vs. hg flamewar here, you have already done the right step by switching to a DVCS. Mercurial addresses some of the points above and I think it is therefore better suited in an enterprise context:

  • All plattforms that run python are supported
  • Great GUI tools on all major plattforms (win/linux/OS X), first class merge/vdiff tool integration
  • Very consistent interface, easy transition for svn users
  • Can do most of the things git can do too, but provides a cleaner abstraction. Dangerous operations are are always explicit. Advanced features are provided via extensions that must explicitly be enabled.
  • Commercial support is available from selenic.

In short, when using DVCS in an enterprise I think it's important to choose a tool that introduces the least friction. For the transition to be successful it's especially important to consider the varying skill between developers (in regards to VCS).


Reducing friction:

Ok, since you appear to be really stuck with the situation, there are two options left IMHO. There is no tool to make git less complicated; git is complicated. Either you confront this or work around git:-

  1. Get a git introductory course for the whole team. This should include the basics only and some exercises (important!).
  2. Convert the master repo to svn and let the "young-stars" git-svn. This gives most of the developers an easy to use interface and may compensate for the lacking discipline in your team, while the young-stars can continue to use git for their own repos.

To be honest, I think you really have a people problem rather than a tool problem. What can be done to improve upon this situation?

  • You should make it clear that you think your current process will end up with a maintainable codebase.
  • Invest some time into Continous Integration. As I outlined above, regardless which kind of VCS you use, there's never a replacement for CI. You stated that there are people who push crap into the master repo: Have them fix their crap while a red alert goes off and blames them for breaking the build (or not meeting a quality metric or whatever).

Solution 2 - Git

I'm the SCM engineer for a reasonably large development organization, and we converted to git from svn over the last year or so. We use it in a centralized fashion.

We use gitosis to host the repositories. We broke our monolithic svn repositories up into many smaller git repositories as git's branching unit is basically the repository. (There are ways around that, but they're awkward.) If you want per-branch kinds of access controls, gitolite might be a better approach. There's also an inside-the-firewall version of GitHub if you care to spend the money. For our purposes, gitosis is fine because we have pretty open permissions on our repositories. (We have groups of people who have write access to groups of repositories, and everyone has read access to all repositories.) We use gitweb for a web interface.

As for some of your specific concerns:

  • merges: You can use a visual merge tool of your choice; there are instructions in various places on how to set it up. The fact that you can do the merge and check its validity totally on your local repo is, in my opinion, a major plus for git; you can verify the merge before you push anything.
  • GUIs: We have a few people using TortoiseGit but I don't really recommend it; it seems to interact in odd ways with the command line. I have to agree that this is an area that needs improvement. (That said, I am not a fan of GUIs for version control in general.)
  • small-group tracking branches: If you use something that provides finer-grained ACLs like gitolite, it's easy enough to do this, but you can also create a shared branch by connecting various developers' local repositories — a git repo can have multiple remotes.

We switched to git because we have lots of remote developers, and because we had many issues with Subversion. We're still experimenting with workflows, but at the moment we basically use it the same way as we used to use Subversion. Another thing we liked about it was that it opened up other possible workflows, like the use of staging repositories for code review and sharing of code among small groups. It's also encouraged a lot of people to start tracking their personal scripts and so forth because it's so easy to create a repository.

Solution 3 - Git

> Yes, I know, Linus never intended it for that.

Actually, Linus argues that centralized systems just can't work.

And, what's wrong with the dictator-and-lieutenants workflow?

diagram

Remember, git is a distributed system; don't try to use it like a central one.

(updated)

Most of your problems will go away if you don't try to use git as if it was "svn on steroids" (because it's not).

Instead of using a bare repository as a central server where everyone can push to (and potentially screw up), setup a few integration managers that handle merges, so that only they can push to the bare repository.

Usually these people should be the team leads: each leader integrates his own team's work and pushes it to the blessed repository.

Even better, someone else (i.e. dictator) pulls from the team leaders and integrates their changes into the blessed repository.

> There's nothing wrong with that workflow, but we're an overworked startup and need our tools to substitute for human time and attention; nobody has bandwidth to even do code reviews, let alone be benevolent dictator.

If the integrators don't have time to review code, that's fine, but you still need to have people that integrate the merges from everybody.

Doing git pulls doesn't take all that much time.

git pull A
git pull B
git pull C

git does substitute for human time and attention; that's why it was written in the first place.

> * The GUI tools aren't mature

The gui tools can handle the basic stuff pretty well.

Advanced operations require a coder/nerdy mindset (e.g. I'm comfortable working from the command line). It takes a bit of time to grasp the concepts, but it's not that hard.

> * Using the command line tools, it's far to easy to screw up a merge and obliterate someone else's changes

This won't be a problem unless you have many incompetent developers with full write access to the "central repository".

But, if you set up your workflow so that only a few people (integrators) write to the "blessed" repository, that won't be a problem.

Git doesn't make it easy to screw up merges.

When there are merge conflicts, git will clearly mark the conflicting lines so you know which changes are yours and which are not.

It's also easy to obliterate other people's code with svn or any other (non-dsitributed) tool. In fact, it's way easier with these other tools because you tend to "sit on changes" for a long time and at some point the merges can get horribly difficult.

And because these tools don't know how to merge, you end up always having to merge things manually. For example, as soon as someone makes a commit to a file you're editing locally, it will be marked as a conflict that needs to be manually resolved; now that is a maintenance nightmare.

With git, most of the time there won't be any merge conflicts because git can actually merge. In the case where a conflict does occur, git will clearly mark the lines for you so you know exactly which changes are yours and which changes are from other people.

If someone obliterates other people's changes while resolving a merge conflict, it won't be by mistake: it will either be because it was necessary for the conflict resolution, or because they don't know what they're doing.

> * It doesn't offer per-user repository permissions beyond global read-only or read-write privileges > > * If you have a permission to ANY part of a repository, you can do that same thing to EVERY part of the repository, so you can't do something like make a small-group tracking branch on the central server that other people can't mess with. > * Workflows other than "anything goes" or "benevolent dictator" are hard to encourage, let alone enforce

These problems will go away when you stop trying to use git as if it was a centralized system.

> * It's not clear whether it's better to use a single big repository (which lets everybody mess with everything) or lots of per-component repositories (which make for headaches trying to synchronize versions).

Judgment call.

What kind of projects do you have?

For example: does version x.y of project A depend on exactly version w.z of project B such that every time you check x.y of project A you also have to checkout w.z of project B, otherwise it won't build? If so I'd put both project A and project B in the same repository, since they're obviously two parts of a single project.

The best practice here is to use your brain

> * With multiple repositories, it's also not clear how to replicate all the sources someone else has by pulling from the central repository, or to do something like get everything as of 4:30 yesterday afternoon.

I'm not sure what you mean.

Solution 4 - Git

I highly recommend http://code.google.com/p/gerrit/ for enterprise work. It gives you access control plus a built-in review based workflow. It authenticates against any LDAP system. You can hook it up to Hudson with http://wiki.hudson-ci.org/display/HUDSON/Gerrit+Plugin, letting you build and test changes while they're still under review; it's a really impressive setup.

If you decide to use gerrit, I recommend trying to keep a pretty linear history, not a branchy history like some of the open source guys like. Gerrit phrases this as "allow fast-forward changes only." Then you can use branching and merging in more the the way you're used to, for releases and whatnot.

Solution 5 - Git

I am answering this question based on my experience as developer manager in a large telco, where we adopted Git in 2010

You have quite different set of problems here:

  • workflows
  • client tools
  • server access control and integration

Workflows

We successfully adopted a central repository mode: what we have in our enterprise project (a large portal for a 5 million user base) is a de-facto central repository that produces the official builds then are taken trough the delivery process (which, in our case, is composed of three level of testing and two deployments). Every developer manages his own repo, and we work on a branch-per-feature basis.

Client tools

There are now several options available, this is now a very crowded area. Many developers are successfully using IntelliJ Idea and Eclipse with the Git plugin, without any other stuff. Also most of the Linux developers are using CLI git client, without any problem. Some Mac developers are successfully using Tower Git. Please note that none of these clients can prevent the user to "mess up" with the central repository: a server side control mechamism is needed

Server access control and integration

If you want to avoid developers "messing up" you Git repository, you reall need to choose a solution that:

  • exposes a decent web admin interface to do every operation
  • allows you to enforce user identities (using a "bare" Git repository is extremely easy to commit in behalf of someone else)
  • provides you fine grained security (so that for example you can prevent FORCE-PUSH and set some branches to read only for some developers / groups)
  • integrate with your corporate authentication system (i.e. LDAP, Windows ActiveDirectory)
  • provides you full audit (SOX compliance is sometimes very important for large corporates)

There are not so many ready-to-use server side solutions that can help this, I suggest you check out one of these:

  • Gitorious: it can provide basic access level security, but it lacks fine grained permissions control out of the box, so you will probably have to do some coding to handle scenarios such as branch level permissions. It also lacks integration with existing corporate authentication mechanisms
  • GitHub Enterprise: recently published by GitHub, it features GitHub in your corporate. It lacks SOX complianance and fine grained security
  • Gerrit: it can provide fine graind access level security and integration with corporate authentication systems but it lacks SOX compliance and SSO. Also some operations can only be done via SSH via CLI
  • GitEnterprise: it provides branch level permissions, SSO, SOX compliance, full web based administration. It was recently also integrated with Gerrit, so that it also provides you a full Gerrit instance

Hope this helps!

Solution 6 - Git

It sounds like your problem is that you haven't decided on or instituted a workflow. Git is flexible enough to use it like svn or any other VCS, but it's so powerful that if you don't establish rules that everybody must follow then you're just gonna end up with a mess. I would recommend the dictator-lieutenant workflow that somebody mentioned above, but combined with the branching model described by Vincent Driessen. For more info see these screencasts by David Bock, and this one by Mark Derricutt.

Solution 7 - Git

On the tools, MacOS-X users find GitX (http://gitx.frim.nl/) very simple and effective. Drawback is that doesn't support Git Client hooks (the ones under $GIT_ROOT/.git/hooks).

Overall I do strongly to chose a tool that supports fine-grained access control on:

  • branches (in order to segregate the stable release branches with strict security from the topic-branches that needs more agility and flexibility)
  • identity enforcement (author / committer). This is KEY for SOX
  • git commands restrictions
  • audit-trail. This is KEY for SOX

The ones I've successfully used with those features are:

  1. Gerrit Code Review (http://code.google.com/p/gerrit/)
  2. GitEnterprise (http://gitenterprise.com)
  3. CollabNet TeamForge (http://www.collab.net/gotgit), uses Gerrit 2.1.8 behind the scenes

P.S. Do not underestimate SOX and CMMI compliance: many times there is limited set of choice you have that is dictated by your Company Enterprise Policies on Security.

Hope this helps.

Luca.

Solution 8 - Git

We recently switched from svn to git. Because git-daemon doesn't work with msysgit we opted for a central repository approach on a Linux server with gitosis.

To eliminate the possibility to screw up master we simply delted it. Instead we prepare all releases by merging the branches that are selected for testing and tag the merge. If it passes tests the commit is tagged with a version and put in production.

To handle this we have a rotating role of release manager. The release manager is responsible for reviewing each branch before it is ready for test. Then when the product ownder decides it is time to bundle the approved branches together for a new test release the release manager perform the merge.

We also have a rotating role of 2'nd level help desk and at least for us the workload is such that it is possible to have both roles at the same time.

As a benefit of not having a master it is not possible to add any code to the project without going through the release manager so we discovered directly how much code that was silently added to the project before.

The review process starts with the branch owner submiting the diff to reviewboard and putting up a green post-it on the whiteboard with the branch name (we have a Kanban based workflow) under "for review", or if it's part of a completed user story, move the entire story card to "for review" and put the postit on that. The relase manager is the one who moves cards and post-its to "ready for test" and then the product owner can select which ones to incled in the next test release.

When doing the merge the release manager also makes sure that the merge commit has a sensible commit message which can be used in the changelog for the product owner.

When a release has been put in production the tag is used as the new base for branches and all existing branches are merged with it. This way all branches has a common parent which makes it easier to handle merges.

Solution 9 - Git

I'll add in a "have you considered" post too.

One of the great things about Bazaar is its flexibility. This is where it beats all the other distributed systems. You can operate Bazaar in centralized mode, distributed mode, or get this: both (meaning developers can choose which model they're comfortable with or which works best for their workgroup). You can also disconnect a centralized repository while you're on the road and reconnect it when you get back.

On top of that, excellent documentation and something which will make your enterprise happy: commercial support available.

Solution 10 - Git

  • Install a decent web interface, like Github FI
  • Stick to a relatively centralized model (initially) to keep people comfortable.
  • Run a Continuous Integration build for every shared branch.
  • Share a good set of global git config options.
  • Integrate git into your shell, with bash completion, and a prompt with the current branch.
  • Try IntelliJ's Git Integration as a merge tool.
  • Make sure you .gitignore as appropriate.

Solution 11 - Git

Regarding points 3 & 4 (per-user, per-section, per-branch permissions), have a look at gitolite (covered in the Pro Git book: http://progit.org/book/ch4-8.html).

Politics or not, Git is as good a choice of a DCVS as any. Like any powerful tool, it is worth spending a little bit of time up front in understanding how the tool is designed to work, and, to this end, I highly recommend the Pro Git book. A couple of hours spent with it will save lots of frustration in the long run.

Solution 12 - Git

GUI: At the moment, TortoiseGit v1.7.6 should be fine for most daily operations. Log, Commit, Push, Pull, Fetch, Diff, Merge, Branch, Cherry-pick, Rebase, Tag, Export, Stash, Add submodule, etc... Supports x64 natively too

Solution 13 - Git

In order to use git efficiently in a development team with lots of developers, a CI system that builds and tests continuously is required. Jenkins provides such a vehicle and I highly recommend that. The integration piece has to be done no matter what and it's a lot cheaper doing that earlier and more often.

Solution 14 - Git

More suited for collabrative development than gitosis or gitolite but open-source is Gitorious. It's a Ruby on Rails application which handles management of repositories and merging. It should solve many of your problems.

Solution 15 - Git

Git allows to create private branches. This encourage developers to commit often so as to break down modifications into small commits. When the developer is ready to publish his changes, he pushes to central server. He can make use of pre-commit scripts to verify his code if needed.

Solution 16 - Git

NXP is managing Git and Subversion with one common platform (at enterprise-scale), integrating Android mobile development with traditional software projects: http://www.youtube.com/watch?v=QX5wn0igv7Q

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionBob MurphyView Question on Stackoverflow
Solution 1 - GitJohannes RudolphView Answer on Stackoverflow
Solution 2 - GitebneterView Answer on Stackoverflow
Solution 3 - GithasenView Answer on Stackoverflow
Solution 4 - GitRussell MullView Answer on Stackoverflow
Solution 5 - GitBruno BossolaView Answer on Stackoverflow
Solution 6 - GitGuillermo GarzaView Answer on Stackoverflow
Solution 7 - GitlucamilanesioView Answer on Stackoverflow
Solution 8 - GitJohn NilssonView Answer on Stackoverflow
Solution 9 - GitwadesworldView Answer on Stackoverflow
Solution 10 - GitretronymView Answer on Stackoverflow
Solution 11 - GitJeetView Answer on Stackoverflow
Solution 12 - GitlinquizeView Answer on Stackoverflow
Solution 13 - GitWilliam CheungView Answer on Stackoverflow
Solution 14 - GitMilliamsView Answer on Stackoverflow
Solution 15 - GitlinquizeView Answer on Stackoverflow
Solution 16 - GitLothar SchubertView Answer on Stackoverflow