git find fat commit

Tags: Git, Statistics, Find Commit

Git Problem Overview


Is it possible to get information about how much space each commit's changes take up, so that I can find the commits that added big files or a lot of files? This is all in an effort to reduce the git repo size (by rebasing and possibly filtering commits).

Git Solutions


Solution 1 - Git

You could do this:

git ls-tree -r -t -l --full-name HEAD | sort -n -k 4

This will show the largest files at the bottom (the fourth column is the file (blob) size).

If you need to look at different branches, you'll want to change HEAD to those branch names. Or, put this in a loop over the branches, tags, or revs you are interested in, as sketched below.
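
A minimal sketch of such a loop over all local branches (the tail -5 limit is just illustrative):

# Show the 5 largest blobs reachable from each local branch.
for ref in $(git for-each-ref --format='%(refname:short)' refs/heads); do
  echo "== $ref =="
  git ls-tree -r -t -l --full-name "$ref" | sort -n -k 4 | tail -5
done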

Solution 2 - Git

I forgot to reply; my answer is:

git rev-list --all --pretty=format:'%H%n%an%n%s'    # get all commits
git diff-tree -r -c -M -C --no-commit-id $sha       # get new blobs for each commit
git cat-file --batch-check                          # reads blob ids on stdin, prints the size of each
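
Stitched together, that outline becomes a runnable pipeline (a sketch; like the Perl script below, it takes the fourth raw diff-tree column as the new blob id):

# Print "<bytes added> <commit sha>" for every commit, smallest first.
for sha in $(git rev-list --all); do
  total=$(git diff-tree -r -c -M -C --no-commit-id "$sha" \
    | awk '{ print $4 }' \
    | grep -v '^0\{40\}$' \
    | git cat-file --batch-check \
    | awk '{ sum += $3 } END { print sum + 0 }')
  echo "$total $sha"
done | sort -n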

Solution 3 - Git

All of the solutions provided here focus on file sizes, but the original question asked about commit sizes, which in my opinion (and in my actual case) was more important to find: I wanted to get rid of many small binaries introduced in a single commit, which summed up to a lot of space even though each file was small individually.

A solution that focuses on commit sizes is this Perl script:

#!/usr/bin/perl
use strict;
use warnings;

foreach my $rev (`git rev-list --all --pretty=oneline`) {
  my $tot = 0;
  (my $sha = $rev) =~ s/\s.*$//;          # first field is the commit sha
  foreach my $blob (`git diff-tree -r -c -M -C --no-commit-id $sha`) {
    $blob = (split /\s/, $blob)[3];       # fourth field is the new blob id
    next if $blob eq "0000000000000000000000000000000000000000"; # deleted
    my $size = `echo $blob | git cat-file --batch-check`;
    $size = (split /\s/, $size)[2];       # third field is the size in bytes
    $tot += int($size);
  }
  my $revn = substr($rev, 0, 40);
#  if ($tot > 1000000) {
    print "$tot $revn " . `git show --pretty="format:" --name-only $revn | wc -l`;
#  }
}

I call it like this:

./git-commit-sizes.pl | sort -n -k 1

Solution 4 - Git

Personally, I found this answer to be most helpful when trying to find large files in the history of a git repo: https://stackoverflow.com/questions/298314/find-files-in-git-repo-over-x-megabytes-that-dont-exist-in-head/7945209#7945209

Solution 5 - Git

This script prints how many bytes a commit's tree grew (or shrank) compared to its parent:

#!/bin/bash
COMMITSHA=$1

# Sum the blob sizes (fourth column of `git ls-tree -l`) in each commit's tree.
CURRENTSIZE=$(git ls-tree -lrt "$COMMITSHA" | awk '$2 == "blob" { s += $4 } END { print s + 0 }')
PREVSIZE=$(git ls-tree -lrt "$COMMITSHA^" | awk '$2 == "blob" { s += $4 } END { print s + 0 }')
echo $((CURRENTSIZE - PREVSIZE))

Solution 6 - Git

git fat find N, where N is a size in bytes, will return all the files in the whole history that are larger than N bytes.

You can find out more about git-fat here: https://github.com/cyaninc/git-fat
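
For example, to list every file in history larger than 10 MB:

git fat find 10485760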

Solution 7 - Git

git cat-file -s <object> prints the size in bytes of any object, where <object> can refer to a commit, blob, tree, or tag. Note that for a commit this is the size of the commit object itself (typically a few hundred bytes), not the total size of the files it points to.
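
For example (HEAD:README.md is just an illustrative path; substitute any file tracked in your repo):

git cat-file -s HEAD              # size of the commit object itself, not its tree
git cat-file -s 'HEAD^{tree}'     # size of the top-level tree object
git cat-file -s HEAD:README.md    # size of the blob (file contents) at that commit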

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type     | Original Author  | Original Content on Stackoverflow
-----------------|------------------|----------------------------------
Question         | tig              | View Question on Stackoverflow
Solution 1 - Git | Pat Notz         | View Answer on Stackoverflow
Solution 2 - Git | tig              | View Answer on Stackoverflow
Solution 3 - Git | knocte           | View Answer on Stackoverflow
Solution 4 - Git | Michael Baltaks  | View Answer on Stackoverflow
Solution 5 - Git | Stas Dashkovsky  | View Answer on Stackoverflow
Solution 6 - Git | Caustic          | View Answer on Stackoverflow
Solution 7 - Git | artagnon         | View Answer on Stackoverflow