Git Blame Commit Statistics

Git Problem Overview


How can I "abuse" blame (or some better-suited function, possibly in conjunction with shell commands) to get a statistic of how many lines (of code) currently in the repository originate from each committer?

Example Output:

Committer 1: 8046 Lines
Committer 2: 4378 Lines

Git Solutions


Solution 1 - Git

Update
git ls-tree -r -z --name-only HEAD -- */*.c  | sed 's/^/.\//' | xargs -0 -n1 git blame \
--line-porcelain HEAD |grep -ae "^author "|sort|uniq -c|sort -nr

I updated some things on the way.

For convenience, you can also put this into its own command:

#!/bin/bash

# save as e.g. git-authors and set the executable flag
git ls-tree -r -z --name-only HEAD -- $1 | sed 's/^/.\//' | xargs -0 -n1 git blame \
 --line-porcelain HEAD |grep -ae "^author "|sort|uniq -c|sort -nr

Store this somewhere in your path, or modify your path, and use it like this:

  • git authors '*/*.c' # look for all files recursively ending in .c
  • git authors '*/*.[ch]' # look for all files recursively ending in .c or .h
  • git authors 'Makefile' # just count lines of authors in the Makefile

Original Answer

While the accepted answer does the job, it's very slow.

$ git ls-tree --name-only -z -r HEAD|grep -z -E '\.(cc|h|cpp|hpp|c|txt)$' \
  |xargs -0 -n1 git blame --line-porcelain|grep "^author "|sort|uniq -c|sort -nr

is almost instantaneous.

To get a list of files currently tracked you can use

git ls-tree --name-only -r HEAD

This solution avoids calling file to determine the file type and instead uses grep to match the wanted extensions, for performance reasons. If all files should be included, just remove the grep stage from the pipeline.

grep -E '\.(cc|h|cpp|hpp|c)$' # for C/C++ files
grep -E '\.py$'               # for Python files

If file names can contain spaces, which shells split on, you can use:

git ls-tree -z --name-only -r HEAD | grep -z -E '\.py$'|xargs -0 ... # names are separated by '\0' instead of newlines

Given a list of files (through a pipe), one can use xargs to call a command and distribute the arguments. For commands that can process multiple files at once, omit the -n1. In this case we call git blame --line-porcelain, and every call receives exactly one argument.

xargs -n1 git blame --line-porcelain
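
To see what -n1 changes, here is a small sketch that uses printf and echo as stand-ins for git ls-tree and git blame (the stand-ins, the function names, and the file names are assumptions made for illustration). Each NUL-terminated name becomes exactly one invocation, and a name containing a space stays in one piece:

```shell
#!/bin/bash
# Stand-in for `git ls-tree -z --name-only -r HEAD`: emits NUL-terminated names.
list_names() {
    printf '%s\0' 'src/main.c' 'docs/read me.txt'
}

# With -n1, xargs runs the command once per name.
one_call_per_name() {
    list_names | xargs -0 -n1 echo "blame:"
}

# Without -n1, xargs packs as many names as fit into a single call.
one_call_for_all() {
    list_names | xargs -0 echo "blame:"
}
```

Calling one_call_per_name prints one "blame:" line per file, while one_call_for_all prints a single line with both names.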

We then filter the output for occurrences of "author ", sort the list, and count duplicate lines with:

grep "^author "|sort|uniq -c|sort -nr
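
The counting stage can be tried without a repository. The sketch below wraps it in a function and feeds it a hand-written sample that contains only "author" header lines; the function names and the author names are hypothetical, and the extra sed stage that strips the "author " prefix is an addition for readability, not part of the pipeline above:

```shell
#!/bin/bash
# Reads `git blame --line-porcelain` output on stdin and prints
# "<count> <author>" lines, highest count first.
count_authors() {
    grep -a '^author ' | sed 's/^author //' | sort | uniq -c | sort -nr
}

# Hand-written sample with only the "author" header lines (made-up names).
sample_headers() {
    printf 'author %s\n' 'Alice' 'Bob' 'Alice' 'Alice'
}
```

Running sample_headers | count_authors lists Alice with 3 lines ahead of Bob with 1. In real porcelain output the source lines themselves start with a tab, so they cannot be mistaken for an "author " header.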

Note

Other answers actually filter out lines that contain only whitespace.

grep -Pzo "author [^\n]*\n([^\n]*\n){10}[\w]*[^\w]"|grep "author "

The command above prints the authors of lines containing at least one non-whitespace character. You can also use the match \w*[^\w#], which additionally excludes lines whose first non-whitespace character is a # (the comment marker in many scripting languages).
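
An alternative way to skip whitespace-only lines, not taken from any of the answers, is to parse the porcelain format with awk. In --line-porcelain output every source line is prefixed with a tab, so a sketch can remember the most recent author header and count only source lines that contain a non-whitespace character. The function names and the sample data are made up for illustration:

```shell
#!/bin/bash
# Reads `git blame --line-porcelain` output on stdin and prints
# "<count> <author>" for lines that are not blank or whitespace-only.
count_nonblank_lines_by_author() {
    awk '
        /^author / { author = substr($0, 8); next }   # remember current author
        /^\t/ {                                        # a source line (tab-prefixed)
            if (substr($0, 2) ~ /[^[:space:]]/) count[author]++
        }
        END { for (a in count) printf "%d %s\n", count[a], a }
    ' | sort -nr
}

# Hand-written sample: two real lines by Alice, one whitespace-only line
# by Alice, and one empty line by Bob.
sample_porcelain() {
    printf 'author Alice\n\tint x = 1;\nauthor Alice\n\t   \nauthor Bob\n\t\nauthor Alice\n\treturn x;\n'
}
```

With this sample, only Alice is reported, since Bob contributed nothing but an empty line.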

Solution 2 - Git

I wrote a gem called git-fame that might be useful.

Installation and usage:

  1. $ gem install git_fame
  2. $ cd /path/to/gitdir
  3. $ git fame

Output:

Statistics based on master
Active files: 21
Active lines: 967
Total commits: 109

Note: Files matching MIME type image, binary has been ignored

+----------------+-----+---------+-------+---------------------+
| name           | loc | commits | files | distribution (%)    |
+----------------+-----+---------+-------+---------------------+
| Linus Oleander | 914 | 106     | 21    | 94.5 / 97.2 / 100.0 |
| f1yegor        | 47  | 2       | 7     |  4.9 /  1.8 / 33.3  |
| David Selassie | 6   | 1       | 2     |  0.6 /  0.9 /  9.5  |
+----------------+-----+---------+-------+---------------------+

Solution 3 - Git

git ls-tree -r HEAD|sed -re 's/^.{53}//'|while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'|while read filename; do git blame -w "$filename"; done|sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'|sort|uniq -c

Step by step explanation:

List all the files under version control

git ls-tree -r HEAD|sed -re 's/^.{53}//'

Prune the list down to only text files

|while read filename; do file "$filename"; done|grep -E ': .*text'|sed -r -e 's/: .*//'

Git blame all the text files, ignoring whitespace changes

|while read filename; do git blame -w "$filename"; done

Pull out the author names

|sed -r -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'

Sort the list of authors, and have uniq count the number of consecutively repeating lines

|sort|uniq -c

Example output:

   1334 Maneater
   1924 Another guy
  37195 Brian Ruby
   1482 Anna Lambda
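
The author-extraction step can be tried in isolation on a couple of made-up blame lines. This is a sketch only: the function names, hashes, authors, and dates are hypothetical, and it uses sed -E, which GNU sed accepts as an equivalent of -r:

```shell
#!/bin/bash
# Reads default `git blame` output on stdin and prints one author name per line.
extract_blame_authors() {
    sed -E -e 's/.*\((.*)[0-9]{4}-[0-9]{2}-[0-9]{2} .*/\1/' -e 's/ +$//'
}

# Two made-up blame lines (hash, author, and date are hypothetical).
sample_blame() {
    printf '%s\n' \
        '1a2b3c4d (Jane Doe   2020-01-02 10:00:00 +0000  1) int main(void) {' \
        '5e6f7a8b (Sam Roe    2021-03-04 11:30:00 +0000  2)     return 0;'
}
```

Piping sample_blame into extract_blame_authors yields the two names with the padding removed. Because the capture group is greedy, a source line that itself contains a date-like string can extend the match past the author name.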

Solution 4 - Git

git summary, provided by the git-extras package, is exactly what you need. Check out the documentation at git-extras - git-summary:

git summary --line

Gives output that looks like this:

project  : TestProject
lines    : 13397
authors  :
8927 John Doe            66.6%
4447 Jane Smith          33.2%
  23 Not Committed Yet   0.2%
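
If git-extras is not available, a similar percentage column can be approximated from the "<count> <author>" lines that the sort | uniq -c pipelines in the other answers produce. The following is a rough sketch under that assumption; the function name is hypothetical, and it is not how git summary itself computes its figures:

```shell
#!/bin/bash
# Reads "<count> <author>" lines (the shape produced by `sort | uniq -c`)
# and appends the share of the total for each author.
add_percentages() {
    awk '
        {
            n[NR] = $1                              # line count for this author
            sub(/^[ \t]*[0-9]+[ \t]+/, "")          # drop the count, keep the name
            name[NR] = $0
            total += n[NR]
        }
        END {
            for (i = 1; i <= NR; i++)
                printf "%d %s %.1f%%\n", n[i], name[i], (total ? 100 * n[i] / total : 0)
        }
    '
}
```

It is meant to be appended to a counting pipeline, for example ... | sort | uniq -c | sort -nr | add_percentages.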

Solution 5 - Git

Erik's solution was awesome, but I had some problems with diacritics (despite my LC_* environment variables being set ostensibly correctly) and noise leaking through on lines of code that actually had dates in them. My sed-fu is poor, so I ended up with this frankenstein snippet with ruby in it, but it works for me flawlessly on 200,000+ LOC, and it sorts the results:

git ls-tree -r HEAD | gsed -re 's/^.{53}//' | \
while read filename; do file "$filename"; done | \
grep -E ': .*text' | gsed -r -e 's/: .*//' | \
while read filename; do git blame "$filename"; done | \
ruby -ne 'puts $1.strip if $_ =~ /^\w{8} \((.*?)\s*\d{4}-\d{2}-\d{2}/' | \
sort | uniq -c | sort -rg

Also note gsed instead of sed because that's the binary homebrew installs, leaving the system sed intact.

Solution 6 - Git

git shortlog -sn

This shows the number of commits per author, sorted in descending order. Note that it counts commits, not lines.

Solution 7 - Git

Here is the primary snippet from @Alex 's answer that actually does the operation of aggregating the blame lines. I've cut it down to operate on a single file rather than a set of files.

git blame --line-porcelain path/to/file.txt | grep  "^author " | sort | uniq -c | sort -nr

I post this here because I come back to this answer often, and re-reading the post and re-digesting the examples to extract the portion I value is taxing. Nor is it generic enough for my use case; its scope is a whole C project.

I like to list stats per file, achieved with a bash for loop instead of xargs, as I find xargs less readable and harder to use and memorize. The advantages and disadvantages of xargs vs. for should be discussed elsewhere.

Here is a practical snippet that will show results for each file individually:

for file in $(git ls-files); do \
    echo "$file"; \
    git blame --line-porcelain "$file" \
        | grep  "^author " | sort | uniq -c | sort -nr; \
    echo; \
done

I tested this: running it straight in a bash shell is Ctrl+C safe. If you put it inside a bash script, you might need to trap SIGINT and SIGTERM if you want the user to be able to break out of the for loop.
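
A minimal sketch of what such a trap could look like inside a script is shown below. The handler and function names are hypothetical, and whether the trap is actually needed depends on how the script is invoked, so treat it as a starting point only:

```shell
#!/bin/bash
# Hypothetical handler: leave the whole script, not just the current git call.
on_interrupt() {
    echo "interrupted, stopping" >&2
    exit 1   # any non-zero status signals that the run was cut short
}

# Reads file names (one per line) on stdin and prints per-file author counts.
per_file_author_counts() {
    trap on_interrupt INT TERM
    local file
    while IFS= read -r file; do
        echo "$file"
        git blame --line-porcelain -- "$file" \
            | grep "^author " | sort | uniq -c | sort -nr
        echo
    done
}
```

Inside a repository it would be used as git ls-files | per_file_author_counts. Reading names with while read instead of for ... in $(...) also keeps file names with spaces intact.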

Solution 8 - Git

Check out the gitstats command available from http://gitstats.sourceforge.net/

Solution 9 - Git

I have this solution that counts the blamed lines in all text files (excluding the binary files, even the versioned ones):

IFS=$'\n'
for file in $(git ls-files); do
	git blame `git symbolic-ref --short HEAD` --line-porcelain "$file" | \
		grep  "^author " | \
		grep -v "Binary file (standard input) matches" | \
		grep -v "Not Committed Yet" | \
		cut -d " " -f 2-
	done | \
		sort | \
		uniq -c | \
		sort -nr

Solution 10 - Git

I adapted the top answer to PowerShell:

(git ls-tree -rz --name-only HEAD).Split(0x00) | where {$_ -Match '.*\.py'} |%{git blame -w --line-porcelain HEAD $_} | Select-String -Pattern '^author ' | Group-Object | Select-Object -Property Count, Name | Sort-Object -Property Count -Descending

Running git blame with the -w switch is optional; I added it because it ignores whitespace changes.

Performance on my machine was in favor of PowerShell (~50s vs ~65s for the same repo), although the Bash solution was running under WSL2.

Solution 11 - Git

This works in any directory of the source structure of the repo, in case you want to inspect a certain source module.

find . -name '*.c' -print0 | xargs -0 -n1 git blame --line-porcelain | grep "^author "|sort|uniq -c|sort -nr

Solution 12 - Git

I made my own script, which is a combination of @nilbus's and @Alex's answers:

#!/bin/sh

for f in $(git ls-tree -r  --name-only HEAD --);
do
	j=$(file "$f" | grep -E ': .*text'| sed -r -e 's/: .*//');
	if [ "$f" != "$j" ]; then
		continue;
	fi
	git blame -w --line-porcelain HEAD "$f" | grep  "^author " | sed 's/author //'
done | sort | uniq -c | sort -nr

Solution 13 - Git

A Bash function that targets a single source file; it runs on macOS.

function glac {
    # git_line_author_counts
    git blame -w "$1" |  sed -E "s/.*\((.*) +[0-9]{4}-[0-9]{2}.*/\1/g" | sort | uniq -c | sort -nr
}

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Era | View Question on Stackoverflow
Solution 1 - Git | Alexander Oh | View Answer on Stackoverflow
Solution 2 - Git | Linus Oleander | View Answer on Stackoverflow
Solution 3 - Git | Edward Anderson | View Answer on Stackoverflow
Solution 4 - Git | adius | View Answer on Stackoverflow
Solution 5 - Git | gtd | View Answer on Stackoverflow
Solution 6 - Git | moinudin | View Answer on Stackoverflow
Solution 7 - Git | ThorSummoner | View Answer on Stackoverflow
Solution 8 - Git | Ivan | View Answer on Stackoverflow
Solution 9 - Git | Gabriel Diego | View Answer on Stackoverflow
Solution 10 - Git | Mattwmaster58 | View Answer on Stackoverflow
Solution 11 - Git | Martin G | View Answer on Stackoverflow
Solution 12 - Git | vossman77 | View Answer on Stackoverflow
Solution 13 - Git | jxramos | View Answer on Stackoverflow