How to shrink the .git folder

Git

Git Problem Overview


My current codebase has a total size of approx. 200 MB.

But my .git folder has an amazing size of 5 GB (!). Since I push my work to an external server, I don't need any big local history...

How can I shrink the .git folder to free up some space on my notebook? Can I delete all changes that are older than 30 days?

Git Solutions


Solution 1 - Git

You should not delete all changes older than 30 days (I think it's somehow possible by exploiting Git, but it's really not recommended).

You can call git gc --aggressive --prune, which will perform garbage collection in your repository and prune old objects. Do you have a lot of binary files (archives, images, executables) that change often? Those usually lead to huge .git folders (remember, Git stores snapshots for each revision, and binary files compress badly).
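To check whether garbage collection actually freed anything, you can compare the output of git count-objects -vH before and after. A minimal sketch, assuming Bash with Git on the PATH; gc_with_report is a hypothetical helper name, not a Git command:

```shell
#!/bin/bash
# Hypothetical helper: print object-store statistics, run an aggressive
# garbage collection, then print the statistics again for comparison.
gc_with_report() {
    local repo="${1:-.}"
    echo "Before:"
    git -C "$repo" count-objects -vH
    # --prune without a date uses the default expiry (gc.pruneExpire,
    # two weeks unless configured otherwise)
    git -C "$repo" gc --aggressive --prune --quiet
    echo "After:"
    git -C "$repo" count-objects -vH
}
```

Run it as gc_with_report /path/to/repo and compare the "size" (loose objects) and "size-pack" lines.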

Solution 2 - Git

Here is what Linus Torvalds, the creator of Git, has to say about how to shrink your Git repo:

> The equivalent of "git gc --aggressive" - but done *properly* - is to do (overnight) something like

> git repack -a -d --depth=250 --window=250

> where that depth thing is just about how deep the delta chains can be (make them longer for old history - it's worth the space overhead), and the window thing is about how big an object window we want each delta candidate to scan.

> And here, you might well want to add the "-f" flag (which is the "drop all old deltas", since you now are actually trying to make sure that this one actually finds good candidates.

source: http://gcc.gnu.org/ml/gcc/2007-12/msg00165.html
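Putting Linus's suggestion together, including the -f flag he mentions, a minimal sketch might look like this. It assumes Bash; deep_repack is a hypothetical helper name, and the verify-pack step is only there to show the delta-chain histogram ("chain length = N: M objects") of the resulting pack:

```shell
#!/bin/bash
# Hypothetical helper: recompute every delta from scratch (-f) using the deep
# chains and wide window Linus suggests, then print the chain-length histogram.
deep_repack() (
    cd "${1:-.}" || exit 1
    # -a: pack everything into one pack; -d: drop redundant packs and loose
    # objects; -f: ignore existing deltas so better candidates can be found
    git repack -a -d -f --depth=250 --window=250 -q
    pack_dir="$(git rev-parse --git-dir)/objects/pack"
    for idx in "$pack_dir"/pack-*.idx; do
        # verify-pack -v ends with "chain length = N: M objects" summary lines
        git verify-pack -v "$idx" | grep 'chain length' || true
    done
)
```

As Linus says, this is the kind of thing to run overnight on a large repository.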

Will git repack get rid of binary data that is orphaned in my repo? No. "git repack" will not get rid of images or binary data that you have checked into your repo and then deleted. To delete that kind of data permanently from your repo, you have to rewrite your history. A common example of this is when you accidentally check your passwords into Git. You can go back and delete some files, but then you have to rewrite your history from then to now and then force push the new repo to your origin.
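The difference between "deleted from the working tree" and "deleted from history" is easy to see in a throwaway repository. The sketch below (Bash, assuming git and head -c are available; big.bin and the temporary directory are made up for the demo) commits a 1 MiB file, deletes it in the next commit, repacks and even runs gc --prune=now, and shows that the blob is still stored because the older commit still references it:

```shell
#!/bin/bash
# Throwaway demo: a file deleted in a later commit is still reachable from
# the older commit, so repack/gc keep its blob.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
head -c 1048576 /dev/urandom > big.bin     # 1 MiB of incompressible data
git add big.bin && git commit -q -m "add big.bin"
big_blob=$(git rev-parse HEAD:big.bin)
git rm -q big.bin && git commit -q -m "delete big.bin"

git repack -a -d -q
git gc --prune=now --quiet

# The blob survives because the first commit still references it.
if git cat-file -e "$big_blob"; then
    echo "big.bin blob $big_blob is still in the repository"
fi
git log --oneline --all -- big.bin
```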

Solution 3 - Git

I tried these but my repository was still very large. The problem was I had accidentally checked in some generated large files. After some searching I found a great tutorial which makes it easy to delete the large generated files. This tutorial allowed me to shrink my repository from 60 MB to < 1 MB.

> Steve Lorek, How to Shrink a Git Repository

Updated: Here's a copy-paste version of the blog post.

How to Shrink a Git Repository

Our main Git repository had suddenly ballooned in size. It had grown overnight to 180MB (compressed) and was taking forever to clone.

The reason was obvious; somebody, somewhere, somewhen, somehow, had committed some massive files. But we had no idea what those files were.

After a few hours of trial, error and research, I was able to nail down a process to:

  • Discover the large files
  • Clean them from the repository
  • Modify the remote (GitHub) repository so that the files are never downloaded again

This process should never be attempted unless you can guarantee that all team members can produce a fresh clone. It involves altering the history and requires anyone who is contributing to the repository to pull down the newly cleaned repository before they push anything to it.

Deep Clone the Repository

If you don't already have a local clone of the repository in question, create one now:

git clone remote-url

Now, you may have cloned the repository, but you don't yet have local branches for all of the remote branches, and having them is imperative to ensure a proper 'deep clean'. To create them, we'll need a little Bash script:

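#!/bin/bash
# Create a local tracking branch for every branch on origin,
# skipping origin/HEAD and the already checked-out master.
for branch in $(git branch -r | grep -v HEAD | grep -v 'origin/master$'); do
    # Strip only the "origin/" prefix so names like feature/login survive
    git branch --track "${branch#origin/}" "$branch"
done

Thanks to bigfish on StackOverflow for the original version of this script, adapted here so that branch names containing slashes are kept intact.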

Copy this code into a file, chmod +x filename.sh, and then execute it with ./filename.sh. You will now have all of the remote branches as well (it's a shame Git doesn't provide this functionality).
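If you'd like to rehearse this step before running it on a real repository, the following sketch builds a local stand-in for the remote in a temporary directory and runs a loop in the spirit of the script above. It assumes Bash, a single remote called origin, and a default branch called master; the branch names dev and feature/login are invented for the demo. The loop strips only the leading origin/, so a name containing a slash keeps its full path:

```shell
#!/bin/bash
# Throwaway rehearsal: build a local "remote" with several branches, clone it,
# then create a local tracking branch for every remote branch.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

sandbox=$(mktemp -d)
git -c init.defaultBranch=master init -q "$sandbox/seed"
git -C "$sandbox/seed" commit -q --allow-empty -m "initial"
git -C "$sandbox/seed" branch dev
git -C "$sandbox/seed" branch feature/login
git clone -q --bare "$sandbox/seed" "$sandbox/origin.git"
git clone -q "$sandbox/origin.git" "$sandbox/work"
cd "$sandbox/work"

# Skip origin/HEAD and origin/master, then track everything else.
for branch in $(git branch -r | grep -v HEAD | grep -v 'origin/master$'); do
    git branch -q --track "${branch#origin/}" "$branch"
done
git branch --list
```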

Discovering the large files

Credit is due to Antony Stubbs here - his Bash script identifies the largest files in a local Git repository, and is reproduced verbatim below:

#!/bin/bash
#set -x 

# Shows you the largest objects in your repo's pack file.
# Written for osx.
#
# @see http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/
# @author Antony Stubbs

# set the internal field separator to line break, so that we can iterate easily over the verify-pack output
IFS=$'\n';

# list all objects including their size, sort by size, take top 10
objects=`git verify-pack -v .git/objects/pack/pack-*.idx | grep -v chain | sort -k3nr | head`

echo "All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file."

output="size,pack,SHA,location"
for y in $objects
do
	# extract the size in bytes and convert it to kB
	size=$((`echo $y | cut -f 5 -d ' '`/1024))
	# extract the compressed size in bytes and convert it to kB
	compressedSize=$((`echo $y | cut -f 6 -d ' '`/1024))
	# extract the SHA
	sha=`echo $y | cut -f 1 -d ' '`
	# find the objects location in the repository tree
	other=`git rev-list --all --objects | grep $sha`
	#lineBreak=`echo -e "\n"`
	output="${output}\n${size},${compressedSize},${other}"
done

echo -e $output | column -t -s ', '

Execute this script as before, and you'll see some output similar to the below:

All sizes are in kB. The pack column is the size of the object, compressed, inside the pack file.

size     pack    SHA                                       location
1111686  132987  a561d25105c79aa4921fb742745de0e791483afa  08-05-2012.sql
5002     392     e501b79448b9e970ab89b048b3218c2853fdfc88  foo.sql
266      249     73fa731bb90b04dcf79eeea8fdd637ba7df4c089  app/assets/images/fw/iphone.fw.png
265      43      939b31c563bd40b1ca70e4f4a9f7d67c27c936c0  doc/models_complete.svg
247      39      03514d9e84418573f26b205bae7e4e57057c036f  unprocessed_email_replies.sql
193      49      6e601c4067aaddb26991c4bd5fbddef003800e70  public/assets/jquery-ui.min-0424e108178defa1cc794ee24fc92d24.js
178      30      c014b20b6fed9f17a0b2809ac410d74f291da26e  foo.sql
158      158     15f9e56bc0865f4f303deff053e21909661a716b  app/assets/images/iphone.png
103      36      3135e15c5cec75a4c85a0636b154b83221020c97  public/assets/application-c65733a4a64a1a885b1c32694574b12a.js
99       85      c1c80bc4c09e692d5e2127e39c87ecacdb1e816f  app/assets/images/fw/lovethis_logo_sprint.fw.png

Yep - looks like someone has been pushing some rather unnecessary files somewhere, including a lovely 1.1GB present in the form of a SQL dump file!
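If you'd rather not depend on pack-file internals, a commonly used alternative (not part of the original post) pairs git rev-list --objects --all with git cat-file --batch-check, which also sees loose objects that haven't been packed yet. A minimal sketch, assuming Bash; largest_blobs is a hypothetical helper, and sizes here are in bytes rather than kB:

```shell
#!/bin/bash
# Hypothetical helper: list the largest blobs reachable from any ref.
# Output columns: SHA, size in bytes, size on disk in bytes, path.
largest_blobs() {
    local count="${1:-10}"
    git rev-list --objects --all |
        git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(objectsize:disk) %(rest)' |
        grep '^blob ' |        # keep blobs only (skip commits and trees)
        cut -d ' ' -f 2- |     # drop the type column, keep the path intact
        sort -k2,2nr |         # biggest uncompressed size first
        head -n "$count"
}
```

Run it from inside the repository, for example largest_blobs 20 to see the top 20 blobs.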

Cleaning the files

Cleaning the file will take a while, depending on how busy your repository has been. You just need one command to begin the process:

git filter-branch --tag-name-filter cat --index-filter 'git rm -r --cached --ignore-unmatch filename' --prune-empty -f -- --all

This command is adapted from other sources—the principal addition is --tag-name-filter cat which ensures tags are rewritten as well.

After this command has finished executing, your repository should now be cleaned, with all branches and tags intact.

Reclaim space

While we may have rewritten the history of the repository, those files still exist in there, stealing disk space and generally making a nuisance of themselves. Let's nuke the bastards:

rm -rf .git/refs/original/

git reflog expire --expire=now --all

git gc --prune=now

git gc --aggressive --prune=now

Now we have a fresh, clean repository. In my case, it went from 180MB to 7MB.
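Because filter-branch rewrites history, it can be worth rehearsing the cleanup on a scratch repository first. The sketch below (Bash; dump.sql and the scratch directory are hypothetical stand-ins) runs the filter-branch command from above with filename replaced by dump.sql, followed by the reclaim steps (leaving out the slower gc --aggressive pass), then checks whether the offending blob is still in the object database. FILTER_BRANCH_SQUELCH_WARNING=1 only silences the warning that newer Git versions print before filter-branch runs:

```shell
#!/bin/bash
# Throwaway rehearsal of the clean-up steps on a scratch repository.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

scratch=$(mktemp -d)
git init -q "$scratch"
cd "$scratch"
echo "hello" > README
head -c 1048576 /dev/urandom > dump.sql
git add README dump.sql && git commit -q -m "initial"
git rm -q dump.sql && git commit -q -m "remove dump"
git tag v1
dump_blob=$(git rev-parse HEAD~1:dump.sql)

# Rewrite every branch and tag without dump.sql; commits left empty are dropped.
FILTER_BRANCH_SQUELCH_WARNING=1 git filter-branch --tag-name-filter cat \
    --index-filter 'git rm -r --cached --ignore-unmatch dump.sql' \
    --prune-empty -f -- --all > /dev/null

# Reclaim space: drop the backup refs and reflogs, then prune.
rm -rf .git/refs/original/
git reflog expire --expire=now --all
git gc --prune=now --quiet

if git cat-file -e "$dump_blob" 2>/dev/null; then
    echo "dump.sql is still present"
else
    echo "dump.sql blob $dump_blob is gone"
fi
```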

Push the cleaned repository

Now we need to push the changes back to the remote repository, so that nobody else will suffer the pain of a 180MB download.

git push origin --force --all

The --all argument pushes all your branches as well. That's why we needed to clone them at the start of the process.

Then push the newly-rewritten tags:

git push origin --force --tags

Tell your teammates

Anyone else with a local clone of the repository will need to either use git rebase or create a fresh clone. Otherwise, when they push again, those files will get pushed along with their changes, and the repository will be reset to the state it was in before.

Solution 4 - Git

5GB vs 200MB is kind of weird. Try to run git gc.

But no, unless you split your repository into modules, you can't decrease the size of the .git directory.

Each clone of a Git repo is a full-fledged repository that can act as a server. That's the basic principle of distributed version control.

Solution 5 - Git

How to shrink your .git folder in your git repo

Summary

Do, in this order, from least-dangerous and/or most-effective and/or fastest to more-dangerous and/or less-effective and/or slowest:

These test results are for a repo where du -hs --exclude=.git . shows that the total repo size, NOT including the .git dir, is about 80 GB, and du -hs .git showed that the .git folder alone started out at about 162 GB:

#                                                                   Disk space saved
#                                               Time it took        in .git dir
#                                               ------------        ------------
time git lfs prune                              #  1~60 min          62 GB
time git gc                                     #  3 min            < 1 GB
time git prune                                  #  1 min            < 1 GB
time git repack -a -d --depth=250 --window=250  #  2 min            < 1 GB
# (Note: `--prune` does nothing extra here; `man git gc` says 
# `--prune is on by default`)
time git gc --aggressive --prune                #  1.25 hrs         < 1 GB

As you can see, the last command takes a very long time for very little benefit, so don't even run it!

Also, an alternative to running git lfs prune is to just delete the whole .git/lfs directory manually instead, then re-fetch the LFS (Git Large File Storage) contents from scratch afterwards.
CAUTION: do NOT accidentally delete the whole .git directory instead! YOU'LL LOSE ALL GIT HISTORY, BRANCHES, AND COMMITS FOR THIS REPO! Delete only the .git/lfs directory. Something like this might work:

# Delete the whole git lfs directory
rm -rf .git/lfs

# Re-fetch all git lfs contents again from scratch.
# See: https://stackoverflow.com/a/54356137/4561887
git lfs fetch --all
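To produce numbers like the ones in the table above for your own repository, you can wrap each step so that it reports the size of the .git directory before and after. A minimal sketch, assuming Bash, du and date; measure_step is a hypothetical helper, and sizes are in KiB as reported by du -sk:

```shell
#!/bin/bash
# Hypothetical helper: run one clean-up command and report the size of the
# .git directory (KiB, via du -sk) before and after, plus elapsed seconds.
measure_step() {
    local git_dir before after start
    git_dir=$(git rev-parse --git-dir)
    before=$(du -sk "$git_dir" | cut -f 1)
    start=$(date +%s)
    "$@"
    after=$(du -sk "$git_dir" | cut -f 1)
    echo "$*: ${before} KiB -> ${after} KiB in $(( $(date +%s) - start ))s"
}

# Usage, from inside the repository, in the same order as the table
# (git lfs prune is left out so this also works without Git LFS installed):
#   measure_step git gc
#   measure_step git prune
#   measure_step git repack -a -d --depth=250 --window=250
```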

Details

First off, you need to know what in the .git folder is taking up so much space. One technique is to run the ncurses-based (GUI-like) ncdu (NCurses Disk Usage) command inside your repo. Another way is to run this:

du -h --max-depth=1 .git

Side note: To see how big your repo is, NOT including your .git folder, run this instead:

du -h --max-depth=1 --exclude=.git .

Sample output of the 1st command above:

$ du -h --max-depth=1 .git
158G    .git/lfs
6.2M    .git/refs
4.0K    .git/branches
2.5M    .git/info
3.7G    .git/objects
6.2M    .git/logs
68K     .git/hooks
162G    .git

As you can see, my total .git folder size is 162 GB, but 158 GB of that is my .git/lfs folder since I am using the 3rd-party "Git Large File Storage" (git lfs) tool to store large binary files. So, run this to reduce that significantly. Note: the time part of all commands below is optional:

time git lfs prune

(If git lfs prune fails with "panic: runtime error: invalid memory address or nil pointer dereference", see my notes below.)

Source: https://stackoverflow.com/questions/59680238/how-to-shrink-a-git-lfs-repo/59680272#59680272
Official documentation: git-lfs-prune(1) -- Delete old LFS files from local storage

That took 60 seconds to run!

Now I've just freed up 62 GB! My .git/lfs folder is now only 96 GB, as shown here:

$ du -h --max-depth=1 .git
96G     .git/lfs
6.2M    .git/refs
4.0K    .git/branches
2.5M    .git/info
3.0G    .git/objects
6.2M    .git/logs
68K     .git/hooks
99G     .git

Next, run this to shrink the .git/objects folder by a few hundred MB to ~1 GB or so:

time git gc
time git prune

git gc takes about 3 minutes to run, and git prune takes about 1 minute.

Check your disk usage again with du -h --max-depth=1 .git. If you'd like to save even more space, run this:

time git repack -a -d --depth=250 --window=250

That takes about 2 minutes and saves a few hundred more MB.

Now, you can stop here, OR you can run this final command:

time git gc --aggressive --prune

That final command will save a few hundred more MB but will take about 1.25 hours.

If git lfs prune fails with "panic: runtime error: invalid memory address or nil pointer dereference"

If git lfs prune fails with:

> panic: runtime error: invalid memory address or nil pointer dereference

then you may have an old version of git-lfs installed and need to update it. Here is how:

First, check to see what version you have installed. Run man git-lfs and scroll to the bottom to see the date. Maybe it says it is from 2017, for instance. Now, update your version with these commands. The first command comes from here: https://packagecloud.io/github/git-lfs/install.

curl -s https://packagecloud.io/install/repositories/github/git-lfs/script.deb.sh | sudo bash
sudo apt update
sudo apt install git-lfs

Run man git-lfs again and scroll to the bottom. I now see my date as "March 2021", when previously it was some date in 2017.

Also, if I run sudo apt install git-lfs again, it tells me:

> git-lfs is already the newest version (2.13.3).

So, the update for git-lfs worked, and now the error is gone and git lfs prune works again!

I first documented this in a comment on GitHub here: https://github.com/git-lfs/git-lfs/issues/3395#issuecomment-889393444.

References:

  1. @knittl: https://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder/5613380#5613380
  2. @David Dehghan: https://stackoverflow.com/questions/5613345/how-to-shrink-the-git-folder/8483112#8483112
  3. git lfs prune: https://stackoverflow.com/questions/59680238/how-to-shrink-a-git-lfs-repo/59680272#59680272
  4. Linus Torvalds on git repack -a -d --depth=250 --window=250: https://gcc.gnu.org/legacy-ml/gcc/2007-12/msg00165.html
  5. https://github.com/git-lfs/git-lfs/blob/main/docs/man/git-lfs-prune.1.ronn

See also:

  1. [my Q&A] https://stackoverflow.com/questions/68552775/how-to-resume-git-lfs-post-checkout-hook-after-failed-git-checkout/68555047#68555047
  2. Note: for pure synchronization, try FreeFileSync or rsync, as I explain in another answer of mine. That being said, I occasionally use git for synchronization too, as I explain for my sync_git_repo_from_pc1_to_pc2.sh tool and in my other answer, "Work on a remote project with Eclipse via SSH".

Solution 6 - Git

> Shrink a Git Repository by removing some files log history from the .git Folder based on their last updated time.

I faced the same issue on my local machine. The reason was that I had deleted some massive files locally and committed that to the central repository. But even after git status, git fetch and git pull, my .git folder was still about 3 GB. Later I ran the following command to reduce the size of the .git folder, considering the files that had changed/expired more than a month ago.

Command

$ git remote prune origin && git repack && git prune-packed && git reflog expire --expire=1.month.ago && git gc --aggressive

Git Commands and their short description:

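  • git remote prune origin - Delete remote-tracking branches under origin that no longer exist on the remote
  • git-prune - Prune all unreachable objects from the object database (git gc runs this for you)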
  • git-repack - Pack unpacked objects in a repository
  • git-prune-packed - Remove extra objects that are already in pack files.
  • git reflog: Git keeps track of updates to the tip of branches using a mechanism called reference logs, or "reflogs." Reflogs track when Git refs were updated in the local repository. In addition to branch tip reflogs, a special reflog is maintained for the Git stash. Reflogs are stored in directories under the local repository's .git directory: .git/logs/refs/heads/, .git/logs/HEAD, and also .git/logs/refs/stash if git stash has been used in the repo. To expire every reflog entry immediately:
    git reflog expire --expire=now --expire-unreachable=now --all
    In addition to preserving history in the reflog, Git has internal expiration dates on when it will prune detached commits. Again, these are all implementation details that git gc handles, and git prune should not be used standalone.
  • git gc --aggressive: git-gc - Cleanup unnecessary files and optimize the local repository.
    Behind the scenes, git gc actually executes a bundle of other internal subcommands, such as git prune, git repack, git pack-refs and git rerere. The high-level responsibility of these commands is to identify any Git objects that are outside the threshold levels set in the git gc configuration. Once identified, these objects are then compressed or pruned accordingly.

Command with outcome:

$ git remote prune origin && git repack && git prune-packed && git reflog expire --expire=1.month.ago && git gc --aggressive
Enumerating objects: 535, done.
Counting objects: 100% (340/340), done.
Delta compression using up to 2 threads
Compressing objects: 100% (263/263), done.
Writing objects: 100% (340/340), done.
Total 340 (delta 104), reused 0 (delta 0)
Enumerating objects: 904, done.
Counting objects: 100% (904/904), done.
Delta compression using up to 2 threads
Compressing objects: 100% (771/771), done.
Writing objects: 100% (904/904), done.
Total 904 (delta 343), reused 561 (delta 0)
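The reflog point in the list above is easy to observe in a scratch repository: a commit that no branch points to any more is still kept by git gc while the HEAD reflog remembers it, and only disappears once the reflog entries are expired and unreachable objects are pruned. A sketch in Bash (the scratch branch and big.bin are invented for the demo):

```shell
#!/bin/bash
# Throwaway demo: an abandoned commit survives a plain "git gc" because the
# HEAD reflog still points at it; expiring the reflogs lets gc delete it.
set -eu
export GIT_AUTHOR_NAME=Demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=Demo GIT_COMMITTER_EMAIL=demo@example.com

demo=$(mktemp -d)
git init -q "$demo"
cd "$demo"
git commit -q --allow-empty -m "base"
git checkout -q -b scratch
head -c 1048576 /dev/urandom > big.bin
git add big.bin && git commit -q -m "add big.bin"
big_blob=$(git rev-parse HEAD:big.bin)
git checkout -q -            # back to the original branch
git branch -q -D scratch     # no branch references big.bin any more

git gc --quiet               # default expiry: the HEAD reflog keeps it alive
git cat-file -e "$big_blob" && echo "after plain gc: still present"

git reflog expire --expire=now --expire-unreachable=now --all
git gc --prune=now --quiet
if git cat-file -e "$big_blob" 2>/dev/null; then
    echo "after reflog expire + prune: still present"
else
    echo "after reflog expire + prune: gone"
fi
```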

Solution 7 - Git

I'm using git more as a synchronization mechanism than for version history. So my solution to this problem has been to make sure I have all my current sources in a satisfactory state, and then just delete .git and re-initialize the repo. Disk space problem solved. :-) History gone. :-( I do this because my repo is on a small USB key. I don't want or need my entire history. If I had a method for just truncating the history, I would use that.

If I were interested in keeping my history, I would archive the current repository. At some point later I could clone the original repository and copy over all the changes from the new repo (let's assume I haven't done much, if any, renaming or deleting), and then make one big commit that would represent all the changes made in the new repo as a single commit in the old repo. Is it possible to merge the histories? Maybe if I used a branch and then deleted the objects I didn't need. (I don't know enough about Git internals to start fooling around like that.)
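The delete-and-reinitialize approach, including archiving the old history first, can be wrapped in a small function. A minimal sketch, assuming Bash and tar; restart_history is a hypothetical name. Run it from the top of the working tree and put the archive outside the working tree (otherwise the archive itself gets committed). This really does discard all local history, branches and stashes:

```shell
#!/bin/bash
# Hypothetical helper: archive the existing .git directory, then start over
# with a single fresh commit containing the current working tree.
restart_history() {
    if [ $# -ne 1 ]; then
        echo "usage: restart_history /path/outside/repo/backup.tar.gz" >&2
        return 1
    fi
    local archive="$1"
    [ -d .git ] || { echo "run this from the top of a repository" >&2; return 1; }
    # Keep the old history in a tarball; stop if archiving fails.
    tar -czf "$archive" .git || return 1
    rm -rf .git
    git init -q
    git add -A
    git commit -q -m "Fresh start (previous history archived in $archive)"
}
```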

Solution 8 - Git

I tried the methods above, but none of them worked in my case (I had accidentally killed the git process during a git push), so I finally had to delete the repo and clone it again. Now the .git folder is a normal size.

Solution 9 - Git

The best option is to use BFG Repo-Cleaner (it is recommended by Bitbucket and is much faster than any other option): https://rtyley.github.io/bfg-repo-cleaner/

I have also tried Steve Lorek's solution, and it works too: https://web.archive.org/web/20190207210108/http://stevelorek.com/how-to-shrink-a-git-repository.html

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type      Original Author
Question          JMW
Solution 1 - Git  knittl
Solution 2 - Git  David Dehghan
Solution 3 - Git  Chris Hinshaw
Solution 4 - Git  Šimon Tóth
Solution 5 - Git  Gabriel Staples
Solution 6 - Git  Yash
Solution 7 - Git  Darrel Lee
Solution 8 - Git  GorvGoyl
Solution 9 - Git  endo