Git: discover which commits ever touched a range of lines

GitBlame

Git Problem Overview


I'm having trouble figuring out how to use git blame for getting the set of commits that ever touched a given range of lines. There are similar questions like this one but the accepted answer doesn't bring me much further.

Let's say I have a definition that starts on line 1000 of foo.rb. It's only only 5 lines long, but the number of commits that ever changed those lines is enormous. If I do

git blame foo.rb -L 1000,+5

I get references to (at most) five distinct commits that changed these lines, but I'm also interested in the commits "behind them".

Similarly,

git rev-list HEAD -- foo.rb | xargs git log --oneline

is almost what I want, but I can't specify line ranges to git rev-list

Can I pass a flag to git blame to get the list of commits that ever touched those five lines, or what's the quickest way to build a script that extracts such information? Let's ignore for the moment the possibility that the definition once had more or less than 5 lines.

Git Solutions


Solution 1 - Git

Since Git 1.8.4, git log has -L to view the evolution of a range of lines.

For example, suppose you look at git blame's output:

((aa27064...))[mlm@macbook:~/w/mlm/git]
$ git blame -L150,+11 -- git-web--browse.sh
a180055a git-web--browse.sh (Giuseppe Bilotta 2010-12-03 17:47:36 +0100 150)            die "The browser $browser is not
a180055a git-web--browse.sh (Giuseppe Bilotta 2010-12-03 17:47:36 +0100 151)    fi
5d6491c7 git-browse-help.sh (Christian Couder 2007-12-02 06:07:55 +0100 152) fi
5d6491c7 git-browse-help.sh (Christian Couder 2007-12-02 06:07:55 +0100 153) 
5d6491c7 git-browse-help.sh (Christian Couder 2007-12-02 06:07:55 +0100 154) case "$browser" in
81f42f11 git-web--browse.sh (Giuseppe Bilotta 2010-12-03 17:47:38 +0100 155) firefox|iceweasel|seamonkey|iceape)
5d6491c7 git-browse-help.sh (Christian Couder 2007-12-02 06:07:55 +0100 156)    # Check version because firefox < 2.0 do
5d6491c7 git-browse-help.sh (Christian Couder 2007-12-02 06:07:55 +0100 157)    vers=$(expr "$($browser_path -version)" 
5d6491c7 git-browse-help.sh (Christian Couder 2007-12-02 06:07:55 +0100 158)    NEWTAB='-new-tab'
5d6491c7 git-browse-help.sh (Christian Couder 2007-12-02 06:07:55 +0100 159)    test "$vers" -lt 2 && NEWTAB=''
a0685a4f git-web--browse.sh (Dmitry Potapov   2008-02-09 23:22:22 -0800 160)    "$browser_path" $NEWTAB "$@" &

And you want to know the history of what is now line 155.

Then:

((aa27064...))[mlm@macbook:~/w/mlm/git]
$ git log --topo-order --graph -u -L 155,155:git-web--browse.sh
* commit 81f42f11496b9117273939c98d270af273c8a463
| Author: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
| Date:   Fri Dec 3 17:47:38 2010 +0100
| 
|     web--browse: support opera, seamonkey and elinks
|     
|     The list of supported browsers is also updated in the documentation.
|     
|     Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
|     Signed-off-by: Junio C Hamano <gitster@pobox.com>
| 
| diff --git a/git-web--browse.sh b/git-web--browse.sh
| --- a/git-web--browse.sh
| +++ b/git-web--browse.sh
| @@ -143,1 +143,1 @@
| -firefox|iceweasel)
| +firefox|iceweasel|seamonkey|iceape)
|  
* commit a180055a47c6793eaaba6289f623cff32644215b
| Author: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
| Date:   Fri Dec 3 17:47:36 2010 +0100
| 
|     web--browse: coding style
|     
|     Retab and deindent choices in case statements.
|     
|     Signed-off-by: Giuseppe Bilotta <giuseppe.bilotta@gmail.com>
|     Signed-off-by: Junio C Hamano <gitster@pobox.com>
| 
| diff --git a/git-web--browse.sh b/git-web--browse.sh
| --- a/git-web--browse.sh
| +++ b/git-web--browse.sh
| @@ -142,1 +142,1 @@
| -    firefox|iceweasel)
| +firefox|iceweasel)
|  
* commit 5884f1fe96b33d9666a78e660042b1e3e5f9f4d9
  Author: Christian Couder <chriscool@tuxfamily.org>
  Date:   Sat Feb 2 07:32:53 2008 +0100
  
      Rename 'git-help--browse.sh' to 'git-web--browse.sh'.
      
      Signed-off-by: Christian Couder <chriscool@tuxfamily.org>
      Signed-off-by: Junio C Hamano <gitster@pobox.com>
  
  diff --git a/git-web--browse.sh b/git-web--browse.sh
  --- /dev/null
  +++ b/git-web--browse.sh
  @@ -0,0 +127,1 @@
  +    firefox|iceweasel)

If you use this functionality frequently, you might find a git alias useful. To do that, put in your ~/.gitconfig:

[alias]
    # Follow evolution of certain lines in a file
    # arg1=file, arg2=first line, arg3=last line or blank for just the first line
    follow = "!sh -c 'git log --topo-order -u -L $2,${3:-$2}:"$1"'" -

And now you can just do git follow git-web--browse.sh 155.

Solution 2 - Git

I think this is what you want:

git rev-list HEAD -- foo.rb | ( 
    while read rev; do
        git blame -l -L 1000,+5 $rev -- foo.rb | cut -d ' ' -f 1
    done;
) | awk '{ if (!h[$0]) { print $0; h[$0]=1 } }'

That'll output the the revision number for each commit that has an edit to the lines you've chosen.

Here are the steps:

  1. The first part git rev-list HEAD -- foo.rb outputs all revisions in which the chosen file is edited.

  2. Each of those revisions then goes into the second part, which takes each one and puts it into git blame -l -L 1000,+5 $rev -- foo.rb | cut -d ' ' -f 1. This is a two-part command.

  3. git blame -l -L 1000,+5 $rev -- foo.rb outputs the blame for the chosen lines. By feeding it the revision number, we are telling it to start from that commit and go from there, rather than starting at the head.

  4. Since blame outputs a bunch of info we don't need, cut -d ' ' -f 1 gives us the first column (the revision number) of the blame output.

  5. awk '{ if (!h[$0]) { print $0; h[$0]=1 } }' takes out non-adjacent duplicate lines while maintaining the order they appeared in. See http://jeetworks.org/node/94 for more info about this command.

You could add a last step here to get prettier output. Pipe everything into xargs -L 1 git log --oneline -1 and get the corresponding commit message for the list of revisions. I had a weird issue using this last step where I had to keep pressing next every few revisions that were output. I'm not sure why that was, which is why I didn't include it in my solution.

Solution 3 - Git

Not sure what you want to do, but maybe git log -S can do the trick for you:

> -S > Look for differences that introduce or remove an instance of . > Note that this is different than the string simply appearing > in diff output; see the pickaxe entry in gitdiffcore(7) for more > details.

You can put in string the change (or part of the change) you are trying to follow and this will list the commits that ever touched this change.

Solution 4 - Git

I liked this puzzle, it's got its subtleties. Source this file, say init foo.rb 1000,1005 and follow the instructions. When you're done, file @changes will have the correct list of commits in topological order and @blames will have the actual blame output from each.

This is dramatically more complex than the accepted solution above. It produces output that will sometimes be more useful, and hard to reproduce, and it was fun to code.

The problem with trying to track line-number ranges automatically while stepping backward through history is if a change hunk crosses line-numbered range boundaries you can't automatically determine where in that hunk the new range boundary should be, and you'll either have to include a big range for big additions and so accumulate (sometimes lots of) irrelevant changes, or drop into manual mode to be sure it's right (which of course gets you right back here), or accept extreme lossage at times.

If you want your output to be exact, use the answer above with trustworthy regex ranges like `/^type function(/,/^}/', or use this, which isn't actually that bad, a couple seconds per step back in time.

In exchange for the extra complexity, it does produces the hitlist in topological sequence and it does at least (fairly successfully) try to ameliorate the pain at each step. It never runs a redundant blame, for instance, and update-ranges makes adjusting line numbers easier. And of course there's the reliability of having had to individually eyeball the hunks... :-P

To run this on full auto, say { init foo.rb /^class foo/,/^end/; auto; } 2>&-

 ### functions here create random @-prefix files in the current directory ###
#
# git blame history for a range, finding every change to that range
# throughout the available history.  It's somewhat, ahh, "intended for
# customization", is that enough of a warning?  It works as advertised
# but drops @-prefix temporary files in your current directory and
# defines new commands
#
# Source this file in a subshell, it defines functions for your use.
# If you have @-prefix files you care about, change all @ in this file
# to something you don't have and source it again.
#
#    init path/to/file [<start>,<end>]  # range optional
#    update-ranges           # check range boundaries for the next step
#    cycle [<start>,<end>]   # range unchanged if not supplied
#    prettyblame             # pretty colors, 
# 		blue="child commit doesn't have this line"
#		green="parent commit doesn't have this line"
#           brown=both
#    shhh # silence the pre-cycle blurb
#
# For regex ranges, you can _usually_ source this file and say `init
# path/to/file /startpattern/,/endpattern/` and then cycle until it says 0
# commits remain in the checklist
#
# for line-number ranges, or regex ranges you think might be unworthy, you
# need to check and possibly update the range before each cycle.  File
# @next is the next blame start-point revision text; and command
# update-ranges will bring up vim with the current range V-selected.  If
# that looks good, `@M` is set up to quit even while selecting, so `@M` and
# cycle.  If it doesn't look good, 'o' and the arrow keys will make getting
# good line numbers easy, or you can find better regex's.  Either way, `@M`
# out and say `cycle <start>,<end>` to update the ranges.

init () { 
    file=$1;
    range="$2"
    rm -f @changes
    git rev-list --topo-order HEAD -- "$file" \
    | tee @checklist \
    | cat -n | sort -k2 > @sequence
    git blame "-ln${range:+L$range}" -- "$file" > @latest || echo >@checklist
    check-cycle
    cp @latest @blames
}

update-latest-checklist() {
    # update $latest with the latest sha that actually touched our range,
    # and delete that and everything later than that from the checklist.
    latest=$(
        sed s,^^,, @latest \
        | sort -uk1,1 \
        | join -1 2 -o1.1,1.2 @sequence - \
        | sort -unk1,1 \
        | sed 1q \
        | cut -d" " -f2
    )
    sed -i 1,/^$latest/d @checklist
}
shhh () { shhh=1; }

check-cycle () {
    update-latest-checklist
    sed -n q1 @checklist || git log $latest~..$latest --format=%H\ %s | tee -a @changes
    next=`sed 1q @checklist`
    git cat-file -p `git rev-parse $next:"$file"` > @next
    test -z "$shh$shhh$shhhh" && {
        echo "A blame from the (next-)most recent alteration (id `git rev-parse --short $latest`) to '$file'"
        echo is in file @latest, save its contents where you like
        echo 
        echo you will need to look in file @next to determine the correct next range,
        echo and say '`cycle its-start-line,its-end-line`' to continue
        echo the "update-ranges" function starts you out with the range selected
    } >&2
    ncommits=`wc -l @checklist | cut -d\  -f1`
    echo  $ncommits commits remain in the checklist >&2
    return $((ncommits==0))
}

update-ranges () {
    start="${range%,*}"
    end="${range#*,}"
    case "$start" in
    */*)    startcmd="1G$start"$'\n' ;;
    *)      startcmd="${start}G" ;;
    esac
    case "$end" in
    */*)    endcmd="$end"$'\n' ;;
    [0-9]*) endcmd="${end}G" ;;
    +[0-9]*) endcmd="${end}j" ;;
    *) endcmd="echohl Search|echo "can\'t" get to '${end}'\"|echohl None" ;;
    esac
    vim -c 'set buftype=nofile|let @m=":|q'$'\n"' -c "norm!${startcmd}V${endcmd}z.o" @next
}

cycle () {
    sed -n q1 @checklist && { echo "No more commits to check"; return 1; }
    range="${1:-$range}"
    git blame "-ln${range:+L$range}" $next -- "$file" >@latest || echo >@checklist
    echo >>@blames
    cat @latest >>@blames
    check-cycle
}

auto () {
    while cycle; do true; done
}

prettyblames () {
cat >@pretty <<-\EOD
BEGIN {
    RS=""
    colors[0]="\033[0;30m"
    colors[1]="\033[0;34m"
    colors[2]="\033[0;32m"
    colors[3]="\033[0;33m"
    getline commits < "@changes"
    split(commits,commit,/\n/)
}
NR!=1 { print "" }
{
    thiscommit=gensub(/ .*/,"",1,commit[NR])
    printf "%s\n","\033[0;31m"commit[NR]"\033[0m"
    split($0,line,/\n/)
    for ( n=1; n<=length(line); ++n ) {
        color=0
        split(line[n],key,/[1-9][0-9]*)/)
        if ( NR!=1 && !seen[key[1]] ) color+=1
        seen[key[1]]=1;
        linecommit = gensub(/ .*/,"",1,line[n])
        if (linecommit==thiscommit) color+=2
        printf "%s%s\033[0m\n",colors[color],line[n]
    }
}
EOD
awk -f @pretty @blames | less -R
}

Solution 5 - Git

Please refer to the answer posted here https://stackoverflow.com/questions/3701404/list-all-commits-for-a-specific-file/31980243#31980243. Its exactly what you need.

Solution 6 - Git

A few thoughts..

This sounds similar to this post, and it looks like you might get close with something like this:

git blame -L '/variable_name *= */',+1

As long as you know the definition to match against (for the regex).

There is a thread discussion here, about using tig and git gui (which apparently might handle this). I haven't tried this myself yet, so can't verify it (I'll give this a try later).

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionJoao TavoraView Question on Stackoverflow
Solution 1 - GitMatt McClureView Answer on Stackoverflow
Solution 2 - GitJonathan WrenView Answer on Stackoverflow
Solution 3 - GitFerCaView Answer on Stackoverflow
Solution 4 - GitjthillView Answer on Stackoverflow
Solution 5 - GitjitendrapurohitView Answer on Stackoverflow
Solution 6 - GitJackView Answer on Stackoverflow