Can Git really track the movement of a single function from 1 file to another? If so, how?


Git Problem Overview


Several times, I have come across the statement that, if you move a single function from one file to another file, Git can track it. For example, this entry says, "Linus says that if you move a function from one file to another, Git will tell you the history of that single function across the move."

But I know a little about Git's under-the-hood design, and I don't see how this is possible. So I'm wondering ... is this a correct statement? And if so, how is it possible?

My understanding is that Git stores each file's contents as a Blob, and each Blob has a globally unique identity which arises from the SHA hash of its contents and size. Git then represents folders as Trees. Any filename information belongs to the Tree, not to the Blob, so a file rename for example shows up as a change to a Tree, not to a Blob.
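To make the blob model concrete, here is a minimal sketch that recomputes a blob ID by hand. It assumes a default SHA-1 repository format and that `sha1sum` is installed (on some systems the equivalent tool is `shasum`); the sample content is arbitrary.

```shell
# Git's blob ID is SHA-1 over a small header ("blob <size>\0") plus the bytes.
# The file name is not part of the input, so two files with equal contents
# would map to the same blob.
content='hello world'

# ID as computed by Git itself (reads the bytes from stdin, no repo needed).
git_id=$(printf '%s\n' "$content" | git hash-object --stdin)

# Same ID computed by hand: the payload is 12 bytes including the newline.
manual_id=$(printf 'blob 12\000%s\n' "$content" | sha1sum | cut -d' ' -f1)

echo "git:    $git_id"
echo "manual: $manual_id"
```

Both lines should print the same hash, which shows that nothing about the path (foo versus bar) enters the blob's identity.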

So if I have a file called "foo" with 20 functions in it, and a file called "bar" with 5 functions in it, and I move one of the functions from foo into bar (resulting in 19 and 6, respectively), how can Git detect that I moved that function from one file to another?

From my understanding, this would cause 2 new blobs to exist (one for the modified foo and one for the modified bar). I realize a diff could be calculated to show that the function was moved from one file to the other. But I don't see how history about the function could possibly become associated with bar instead of foo (not automatically, anyway).

If Git were to actually look inside of single files, and compute a blob per function (which would be crazy / infeasible, because you'd have to know how to parse any possible language), then I could see how this might be possible.

So ... is the statement correct or not? And if it is correct, then what is lacking in my understanding?

Git Solutions


Solution 1 - Git

This functionality is provided through git blame -C <file>.

The -C option makes Git look for lines that were moved or copied from other files modified in the same commit as the file being examined. Giving it twice (-C -C) also searches other files in the commit that created the file, and giving it three times (-C -C -C) searches other files in any commit.

Try it yourself in a test repo with git blame -C, and you'll see that the block of code you just moved is attributed to the file it originally came from.

From the git help blame manual page:

> The origin of lines is automatically followed across whole-file renames (currently there is no option to turn the rename-following off). To follow lines moved from one file to another, or to follow lines that were copied and pasted from another file, etc., see the -C and -M options.
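The experiment suggested above can be sketched roughly as follows. The repository, file contents, and function names are made up for illustration; the moved function is kept fairly long on purpose, because git blame -C ignores very short matches (the documented default is 40 alphanumeric characters).

```shell
# Throwaway repo; file names follow the foo/bar example from the question.
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

# Commit 1: foo holds two functions, bar holds one.
cat > foo <<'EOF'
int alpha(int x) {
    return x + 1;
}

int beta(int first_operand, int second_operand) {
    int intermediate_result = first_operand * second_operand;
    return intermediate_result + first_operand;
}
EOF
cat > bar <<'EOF'
int gamma(int x) {
    return x * 2;
}
EOF
git add foo bar && git commit -q -m "initial functions"

# Commit 2: move beta() (lines 5-8 of foo) to the end of bar.
{ echo; sed -n '5,8p' foo; } >> bar
sed '4,8d' foo > foo.tmp && mv foo.tmp foo
git commit -q -am "move beta to bar"

# Plain blame credits the moved lines to commit 2; -C traces them back to foo.
git blame -s bar
git blame -C -s bar
blame_c=$(git blame -C --line-porcelain bar)
```

In the second listing the beta() lines should carry the ID of the first commit and the file name foo, even though they now live in bar.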

Solution 2 - Git

As of Git 2.15, git diff can detect moved lines with the --color-moved option. It works for moves across files.

It works, obviously, for colorized terminal output. As far as I can tell, there is no option to indicate moves in plain text patch format, but that makes sense.

For default behavior, try

git diff --color-moved

The option also takes a mode. In Git 2.15 the modes were no, default, plain, zebra and dimmed_zebra; later versions add blocks and spell the last one dimmed-zebra (use git help diff to see the modes your version supports and their descriptions). For example:

git diff --color-moved=zebra

As to how it is done, you can glean some understanding from this email exchange by the author of the functionality.
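If you want to see the effect without touching a real project, a sketch along these lines should do. It assumes Git 2.15 or later; the repository and file contents are invented for the example, and the comparison at the end only checks that move colouring changes the output, not which colours your configuration uses.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

# Commit 1: a multi-line function lives in foo; bar has unrelated content.
printf '%s\n' 'int beta(int first_operand, int second_operand) {' \
    '    return first_operand * second_operand + first_operand;' '}' > foo
printf '%s\n' 'int gamma(int x) {' '    return x * 2;' '}' > bar
git add foo bar && git commit -q -m "initial functions"

# Commit 2: move the function to bar, leaving a placeholder comment in foo.
cat foo >> bar
echo '/* beta moved to bar */' > foo
git commit -q -am "move beta to bar"

# Same diff twice: once with ordinary colouring, once with move colouring.
plain=$(git diff --color=always --color-moved=no HEAD~1 HEAD)
moved=$(git diff --color=always --color-moved=zebra HEAD~1 HEAD)
printf '%s\n' "$moved"
```

In a terminal, the three lines removed from foo and the three lines added to bar should appear in the "moved" colours rather than the usual removed/added colours.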

Solution 3 - Git

A bit of this functionality is in git gui blame (+ filename). It shows an annotation of the lines of a file, each indicating when it was created and when last changed. For code movement across a file, it shows the commit of the original file as a creation, and the commit where it was added to the current file as last change. Try it.

What I would really like is to pass git log a line-number range in addition to a file path and have it show the history of that block of code. When I wrote this there was no such option, if the documentation was right; since Git 1.8.4, git log -L <start>,<end>:<file> does this. From Linus' statement I, too, would have expected such a command to be readily available.
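A minimal sketch of that kind of line-range query, using git log -L (Git 1.8.4 or later) in a throwaway repository. The file contents and commit messages are made up for the example.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

# Three commits: create the file, then edit each function separately.
printf '%s\n' 'alpha() {' '  return 1' '}' '' 'beta() {' '  return 2' '}' > foo
git add foo && git commit -q -m "add foo"
sed 's/return 1/return 10/' foo > foo.tmp && mv foo.tmp foo
git commit -q -am "tweak alpha"
sed 's/return 2/return 20/' foo > foo.tmp && mv foo.tmp foo
git commit -q -am "tweak beta"

# History of lines 1-3 (alpha) only; "tweak beta" should not be listed.
history=$(git log -L 1,3:foo --format='%s')
printf '%s\n' "$history"
```

The output lists only the commits that touched the requested lines, each followed by the relevant part of its diff.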

Solution 4 - Git

Git doesn't actually track renames at all. A rename is stored as just a delete and an add, that's all. Any tool that shows renames reconstructs them from this history information.

As such, tracking function moves is a matter of analyzing the diffs of all files in each commit after the fact. There's nothing particularly impossible about it; the existing rename detection already handles 'fuzzy' renames, in which the file is changed as well as renamed, and this requires looking at the contents of the files. It would be a natural extension to look for moved functions as well.

I don't know if the base git tools actually do this however - they try to be language neutral, and function identification is very much not language neutral.
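The "delete plus add, reconstructed afterwards" point can be seen directly. This is a small sketch in a throwaway repository; the file name baz is hypothetical and not part of the question.

```shell
repo=$(mktemp -d) && cd "$repo" && git init -q .
git config user.name "Example" && git config user.email "example@example.com"

printf '%s\n' 'int alpha(int x) {' '    return x + 1;' '}' > foo
git add foo && git commit -q -m "add foo"
git mv foo baz && git commit -q -m "rename foo to baz"

# With rename detection off: one path removed, one path added.
raw=$(git diff --no-renames --name-status HEAD~1 HEAD)
# With rename detection on: the pair is matched up by comparing contents.
detected=$(git diff -M --name-status HEAD~1 HEAD)
printf '%s\n---\n%s\n' "$raw" "$detected"
```

The first listing shows an A and a D entry; the second collapses them into a single R entry with a similarity score, computed at diff time rather than stored in the commit.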

Solution 5 - Git

There's git diff that will show you that certain lines disappeared from foo and reappeared in bar. If there are no other changes in these files in the same commit, the change will be easy to spot.

An intelligent Git client would be able to show you how lines moved from one file to another. A language-aware IDE would be able to associate this change with a particular function.

A very similar thing happens when a file gets renamed. It just disappears under one name and reappears under another, but any reasonable tool is able to notice this and present it as a rename.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Charlie Flowers | View Question on Stackoverflow
Solution 1 - Git | JN Avila | View Answer on Stackoverflow
Solution 2 - Git | Inigo | View Answer on Stackoverflow
Solution 3 - Git | Paŭlo Ebermann | View Answer on Stackoverflow
Solution 4 - Git | bdonlan | View Answer on Stackoverflow
Solution 5 - Git | 9000 | View Answer on Stackoverflow