How often should you use git-gc?

Git Problem Overview

The manual page simply says:

>Users are encouraged to run this task on a regular basis within each repository to maintain good disk space utilization and good operating performance.

Are there commands that report object counts, so I can tell whether it's time to gc?

Git Solutions


Solution 1 - Git

It depends mostly on how much the repository is used. With one user checking in once a day and a branch/merge/etc. operation once a week, you probably don't need to run it more than once a year.

With several dozen developers working on several dozen projects each checking in 2-3 times a day, you might want to run it nightly.

It won't hurt to run it more frequently than needed, though.

What I'd do is run it now; then, a week from now, measure disk utilization, run it again, and measure disk utilization again. If the size drops by about 5%, run it once a week. If it drops more, run it more often; if it drops less, run it less often.
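That measurement can be scripted. A minimal sketch, assuming it is run from the top level of a non-bare repository; the helper name `gc_savings_percent` is hypothetical, and `du -sk` only approximates on-disk size:

```shell
#!/bin/sh
# Hypothetical helper: report how much `git gc` shrank .git, as an integer percentage.
# Assumes the current directory is the top level of a non-bare repository.
gc_savings_percent() {
  before=$(du -sk .git | cut -f1)   # size of .git in KiB before gc
  git gc --quiet || return 1
  after=$(du -sk .git | cut -f1)    # size of .git in KiB after gc
  if [ "$before" -gt 0 ] && [ "$after" -lt "$before" ]; then
    echo $(( (before - after) * 100 / before ))
  else
    echo 0                          # nothing reclaimed (or .git grew)
  fi
}
```

Run it now and again a week later; a result of about 5 on the second run suggests a weekly schedule under the rule of thumb above.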

Solution 2 - Git

Note that the downside of garbage-collecting your repository is that, well, the garbage gets collected. As we all know as computer users, files we consider garbage right now might turn out to be very valuable three days in the future. The fact that git keeps most of its debris around has saved my bacon several times – by browsing all the dangling commits, I have recovered much work that I had accidentally canned.

So don’t be too much of a neat freak in your private clones. There’s little need for it.
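Before a gc run, you can list the dangling commits that would eventually be pruned. A minimal sketch; the helper name is hypothetical, and `--no-reflogs` is passed so that commits reachable only from a reflog (for example, ones you reset away) are reported too:

```shell
#!/bin/sh
# Hypothetical helper: print the hashes of dangling commits in the current repository.
# With --no-reflogs, commits referenced only by reflog entries count as dangling too.
list_dangling_commits() {
  git fsck --no-reflogs 2>/dev/null |
    awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}
```

`git show <hash>` inspects one of them, and `git branch rescue <hash>` keeps it alive (the branch name is arbitrary).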

OTOH, the value of data recoverability is questionable for repos used mainly as remotes, e.g. the place all the devs push to and/or pull from. There, it might be sensible to kick off a GC run and a repack frequently.

Solution 3 - Git

Recent versions of git run gc automatically when required, so you shouldn't have to do anything. See the Options section of man git-gc(1) (http://www.kernel.org/pub/software/scm/git/docs/git-gc.html#_options): "Some git commands run git gc --auto after performing operations that could create many loose objects."
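To see what triggers the automatic run, here is a small sketch (hypothetical helper name) that prints the effective `gc.auto` threshold. The fallback of 6700 is Git's documented default for that setting, and a value of 0 disables automatic gc:

```shell
#!/bin/sh
# Hypothetical helper: print the loose-object threshold used by `git gc --auto`.
# Falls back to 6700, the documented default, when gc.auto is not configured.
gc_auto_threshold() {
  git config --get gc.auto || echo 6700
}
```

`git gc --auto` itself does nothing unless that threshold (or the pack-count limit, `gc.autoPackLimit`) is exceeded, so it is cheap to call from scripts.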

Solution 4 - Git

If you're using Git-Gui, it tells you when you should worry:

> This repository currently has approximately 1500 loose objects.

The following command reports a similar number:

$ git count-objects

Judging from its source, though, git-gui does the math itself, counting files in the .git/objects folder, and probably produces an approximation (I don't know Tcl well enough to read it properly!).

In any case, it seems to give the warning based on an arbitrary number around 300 loose objects.
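Without git-gui, a script can read the `count:` line of `git count-objects -v` and compare it with a threshold. In this sketch the helper name and the default limit of 300 (taken from the rough figure above) are assumptions:

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) when the loose-object count reported by
# `git count-objects -v` exceeds the given limit (default: 300, an assumption).
needs_gc() {
  limit=${1:-300}
  loose=$(git count-objects -v | awk -F': ' '$1 == "count" { print $2 }')
  [ "${loose:-0}" -gt "$limit" ]
}
```

Typical use: `needs_gc && git gc`.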

Solution 5 - Git

Drop it in a cron job that runs every night (afternoon?) when you're sleeping.
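A minimal sketch of such a nightly job, assuming the repositories to clean are passed as arguments. The script name and the paths in the crontab line are hypothetical:

```shell
#!/bin/sh
# Hypothetical nightly-gc script, e.g. installed as /usr/local/bin/nightly-gc and
# scheduled with a crontab line such as:
#   30 3 * * * /usr/local/bin/nightly-gc /srv/git/*.git
# Each argument may be a bare repository or a working tree.
nightly_gc() {
  status=0
  for repo in "$@"; do
    # -C runs git as if started in that directory; --quiet keeps cron mail short.
    git -C "$repo" gc --quiet || status=1
  done
  return "$status"
}

nightly_gc "$@"
```

The function keeps going after a failure and returns non-zero if any repository could not be collected.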

Solution 6 - Git

You can run it without interrupting your work, thanks to the gc.autodetach setting, new in Git 2.0 (Q2 2014).

See commit 4c4ac4d and commit 9f673f9 (Nguyễn Thái Ngọc Duy, aka pclouds):

> gc --auto takes time and can block the user temporarily (but not any less annoyingly).
Make it run in background on systems that support it.
The only thing lost with running in background is printouts. But gc output is not really interesting.
You can keep it in foreground by changing gc.autodetach.
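For example, a one-off foreground run can pass the setting on the command line with `git -c`, leaving the repository's configuration untouched, while `git config gc.autoDetach false` makes it permanent for one repository. The helper name below is hypothetical:

```shell
#!/bin/sh
# Hypothetical helper: run `git gc --auto` in the foreground for this invocation only,
# so any output or warnings appear in the terminal instead of being detached.
gc_auto_foreground() {
  git -c gc.autoDetach=false gc --auto "$@"
}
```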


Since that 2.0 release there has been a bug, though: Git 2.7 (Q4 2015) makes sure the error message is not lost.
See commit 329e6e8 (19 Sep 2015) by Nguyễn Thái Ngọc Duy (pclouds).
(Merged by Junio C Hamano -- gitster -- in commit 076c827, 15 Oct 2015)

> gc: save log from daemonized gc --auto and print it next time

> While commit 9f673f9 (gc: config option for running --auto in background - 2014-02-08) helps reduce some complaints about 'gc --auto' hogging the terminal, it creates another set of problems.

> The latest in this set is, as the result of daemonizing, stderr is closed and all warnings are lost. This warning at the end of cmd_gc() is particularly important because it tells the user how to avoid "gc --auto" running repeatedly.
Because stderr is closed, the user does not know, naturally they complain about 'gc --auto' wasting CPU.

> Daemonized gc now saves stderr to $GIT_DIR/gc.log.
Following gc --auto will not run and gc.log printed out until the user removes gc.log.
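A small sketch for checking whether a detached gc has left such a log behind; the helper name is hypothetical, and it locates the file with `git rev-parse --git-dir`:

```shell
#!/bin/sh
# Hypothetical helper: print $GIT_DIR/gc.log if a detached `gc --auto` left one,
# returning 1 in that case so callers know automatic gc is being skipped.
show_gc_log() {
  log="$(git rev-parse --git-dir)/gc.log"
  if [ -f "$log" ]; then
    cat "$log"
    return 1
  fi
  return 0
}
```

After fixing whatever the log reports, delete it (`rm "$(git rev-parse --git-dir)/gc.log"`) so that gc --auto can run again.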

Solution 7 - Git

I run git gc after a big checkout that creates a lot of new objects; it can save space. For example, if you check out a big SVN project using git-svn and then run git gc, you typically save a lot of space.

Solution 8 - Git

This quote is taken from Version Control with Git:

> Git runs garbage collection automatically:
>
> • If there are too many loose objects in the repository
> • When a push to a remote repository happens
> • After some commands that might introduce many loose objects
> • When some commands such as git reflog expire explicitly request it
>
> And finally, garbage collection occurs when you explicitly request it using the git gc command. But when should that be? There’s no solid answer to this question, but there is some good advice and best practice.
>
> You should consider running git gc manually in a few situations:
>
> • If you have just completed a git filter-branch. Recall that filter-branch rewrites many commits, introduces new ones, and leaves the old ones on a ref that should be removed when you are satisfied with the results. All those dead objects (that are no longer referenced since you just removed the one ref pointing to them) should be removed via garbage collection.
> • After some commands that might introduce many loose objects. This might be a large rebase effort, for example.
>
> And on the flip side, when should you be wary of garbage collection?
>
> • If there are orphaned refs that you might want to recover
> • In the context of git rerere and you do not need to save the resolutions forever
> • In the context of only tags and branches being sufficient to cause Git to retain a commit permanently
> • In the context of FETCH_HEAD retrievals (URL-direct retrievals via git fetch) because they are immediately subject to garbage collection
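The filter-branch case can be sketched along the lines of the shrinking checklist in the git-filter-branch manual page. The helper name is hypothetical, and because it discards the refs/original/ backups and every reflog entry, it is only safe once you are satisfied with the rewritten history:

```shell
#!/bin/sh
# Hypothetical helper: drop filter-branch backups and prune what they kept alive.
# Irreversible: run it only after checking the rewritten history.
cleanup_after_filter_branch() {
  # Delete every backup ref that filter-branch left under refs/original/.
  git for-each-ref --format='%(refname)' refs/original/ |
    while read -r ref; do
      git update-ref -d "$ref"
    done
  # Expire all reflog entries now, so they no longer keep old commits reachable.
  git reflog expire --expire=now --all
  # Repack and prune unreachable objects immediately instead of after the grace period.
  git gc --prune=now --quiet
}
```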

Solution 9 - Git

I use it after a big commit, especially when I have removed many files from the repository; afterwards, commits are faster.

Solution 10 - Git

You don't have to use git gc very often, because git gc (garbage collection) runs automatically (as git gc --auto) after several frequently used commands:

git pull
git merge
git rebase
git commit

Source: git gc best practices and FAQs

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | readonly | View Question on Stackoverflow
Solution 1 - Git | Adam Davis | View Answer on Stackoverflow
Solution 2 - Git | Aristotle Pagaltzis | View Answer on Stackoverflow
Solution 3 - Git | mrowe | View Answer on Stackoverflow
Solution 4 - Git | cregox | View Answer on Stackoverflow
Solution 5 - Git | Pat Notz | View Answer on Stackoverflow
Solution 6 - Git | VonC | View Answer on Stackoverflow
Solution 7 - Git | Amandasaurus | View Answer on Stackoverflow
Solution 8 - Git | Teoman shipahi | View Answer on Stackoverflow
Solution 9 - Git | ghiboz | View Answer on Stackoverflow
Solution 10 - Git | Immi | View Answer on Stackoverflow