How can I graph the Lines of Code history for a git repo?

Git Problem Overview


Basically I want to get the number of lines-of-code in the repository after each commit.

The only (really crappy) ways I have found are to use git filter-branch to run wc -l *, or a script that runs git reset --hard on each commit and then runs wc -l.

To make it a bit clearer: when the tool is run, it should output the lines of code of the very first commit, then the second, and so on. This is what I want the tool to output (as an example):

me@something:~/$ gitsloc --branch master
10
48
153
450
1734
1542

I've played around with the Ruby 'git' library, but the closest I found was using the .lines() method on a diff, which seems like it should give the added lines (but does not: it returns 0 when you delete lines, for example):

require 'rubygems'
require 'git'

total = 0
g = Git.open(working_dir = '/Users/dbr/Desktop/code_projects/tvdb_api')    

last = nil
g.log.each do |cur|
  diff = g.diff(last, cur)
  total = total + diff.lines
  puts total
  last = cur
end

Git Solutions


Solution 1 - Git

You might also consider gitstats, which generates this graph as an HTML file.

Solution 2 - Git

You can get both the added and removed line counts for each commit with git log, like:

git log --shortstat --reverse --pretty=oneline

From this output, you can write a script similar to yours that keeps a running total. In Python:

#!/usr/bin/env python3

"""
Display the per-commit size of the current git branch.
"""

import re
import subprocess
import sys

# git omits the insertions or deletions part when its count is zero,
# and uses the singular ("1 file changed") for a count of one.
STAT_RE = re.compile(
    r"(\d+) files? changed"
    r"(?:, (\d+) insertions?\(\+\))?"
    r"(?:, (\d+) deletions?\(-\))?")

def main(argv):
  out = subprocess.check_output(
      ["git", "log", "--shortstat", "--reverse", "--pretty=oneline"],
      text=True, errors="replace")
  total_files, total_insertions, total_deletions = 0, 0, 0
  commit = None
  for line in out.split("\n"):
    if not line: continue
    if line[0] != " ":
      # This is a description line: "<hash> <subject>"
      commit = line.split(" ", 1)[0]
      continue
    # This is a stat line
    match = STAT_RE.search(line)
    if match is None: continue
    files, insertions, deletions = (int(x or 0) for x in match.groups())
    total_files += files
    total_insertions += insertions
    total_deletions += deletions
    print("%s: %d files, %d lines" % (commit, total_files,
                                      total_insertions - total_deletions))


if __name__ == '__main__':
  sys.exit(main(sys.argv))
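
If matching the --shortstat summary feels fragile, git log --numstat prints one tab-separated "added, deleted, path" line per file, which is easier to parse. Here is a minimal sketch of that variant. It assumes a "commit <hash>" header produced with --pretty=format:, and the function names git_numstat_log and running_totals are made up for illustration:

```python
import subprocess


def git_numstat_log():
    """Return `git log --numstat` output for the current branch, oldest first."""
    return subprocess.check_output(
        ["git", "log", "--numstat", "--reverse", "--pretty=format:commit %H"],
        text=True, errors="replace")


def running_totals(log_text):
    """Yield (commit_hash, net_lines) after each commit in the log text."""
    total = 0
    commit = None
    for line in log_text.splitlines():
        if line.startswith("commit "):
            if commit is not None:
                yield commit, total
            commit = line.split(" ", 1)[1]
        elif line.strip():
            added, deleted, _path = line.split("\t", 2)
            # Binary files are reported as "-\t-\tpath"; count them as zero.
            if added != "-":
                total += int(added) - int(deleted)
    if commit is not None:
        yield commit, total
```

Inside a repository, list(running_totals(git_numstat_log())) should give one (hash, net line count) pair per commit. Like the script above, it sums per-commit deltas, so the totals can drift on histories with merges.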

Solution 3 - Git

http://github.com/ITikhonov/git-loc worked right out of the box for me.

Solution 4 - Git

The first thing that jumps to mind is that your git history may be nonlinear. If it contains branches and merges, you might have difficulty determining a sensible sequence of commits.
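
One way to get a sensible sequence, as a rough sketch rather than the definitive answer: follow only first parents with git rev-list --first-parent, and count each commit's lines absolutely by diffing it against the empty tree, so merges cannot skew a running total. The function names below are made up for illustration, and the hard-coded empty-tree ID assumes a SHA-1 repository:

```python
import subprocess

# Well-known ID of the empty tree in SHA-1 repositories (assumption: the
# repository does not use the SHA-256 object format).
EMPTY_TREE = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"


def git(*args):
    """Run a git command in the current directory and return its stdout."""
    return subprocess.check_output(("git",) + args, text=True, errors="replace")


def first_parent_commits(branch="master"):
    """List commit IDs on `branch`, oldest first, following only first parents."""
    return git("rev-list", "--first-parent", "--reverse", branch).split()


def total_added(numstat_text):
    """Sum the 'added' column of `git diff --numstat` output."""
    total = 0
    for line in numstat_text.splitlines():
        added = line.split("\t", 1)[0]
        if added.isdigit():  # binary files show "-" and are skipped
            total += int(added)
    return total


def lines_at(commit):
    """Count lines in all text files of `commit` by diffing against the empty tree."""
    return total_added(git("diff", "--numstat", EMPTY_TREE, commit))


def loc_history(branch="master"):
    """Yield (commit, line_count) pairs in first-parent order."""
    for commit in first_parent_commits(branch):
        yield commit, lines_at(commit)
```

Printing the second element of each pair from loc_history("master") gives one number per commit, in the shape of the gitsloc output in the question. It runs one git diff per commit, so it will be slow on long histories.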

Having said that, it seems like you could keep a log of commit ids and the corresponding lines of code in that commit. In a post-commit hook, starting from the HEAD revision, work backwards (branching to multiple parents if necessary) until all paths reach a commit that you've already seen before. That should give you the total lines of code for each commit id.
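
The bookkeeping described above could look roughly like this. It is only a sketch: the JSON log file, the function names, and the count_lines callback (any function that returns the line count of a given commit) are assumptions for illustration. The parent lookup uses git rev-list --parents.

```python
import json
import os
import subprocess


def git_parents(commit):
    """Return the parent IDs of `commit` (empty list for a root commit)."""
    out = subprocess.check_output(
        ["git", "rev-list", "--parents", "-n", "1", commit], text=True)
    return out.split()[1:]  # the first field is the commit itself


def update_loc_log(head, parents_of, count_lines, seen):
    """Record line counts for `head` and every ancestor not yet in `seen`."""
    stack = [head]
    while stack:
        commit = stack.pop()
        if commit in seen:
            continue  # this path has reached a commit recorded earlier
        seen[commit] = count_lines(commit)
        stack.extend(parents_of(commit))
    return seen


def load_log(path):
    """Load the commit-to-line-count log, or start empty if it does not exist."""
    if not os.path.exists(path):
        return {}
    with open(path) as fh:
        return json.load(fh)


def save_log(path, seen):
    """Write the log back as JSON."""
    with open(path, "w") as fh:
        json.dump(seen, fh, indent=2)
```

A post-commit hook could resolve HEAD with git rev-parse HEAD, call update_loc_log(head, git_parents, count_lines, load_log(path)), and save the result. After the first run, each commit only costs one new count, because the walk stops as soon as every path hits a recorded commit.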

Does that help any? I have a feeling that I've misunderstood something about your question.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | dbr | View Question on Stackoverflow
Solution 1 - Git | cboettig | View Answer on Stackoverflow
Solution 2 - Git | fserb | View Answer on Stackoverflow
Solution 3 - Git | ma11hew28 | View Answer on Stackoverflow
Solution 4 - Git | Greg Hewgill | View Answer on Stackoverflow