Does Git store diff information in commit objects?


Git Problem Overview


According to this:

> It is important to note that this is very different from most SCM systems that you may be familiar with. Subversion, CVS, Perforce, Mercurial and the like all use Delta Storage systems - they store the differences between one commit and the next. Git does not do this - it stores a snapshot of what all the files in your project look like in this tree structure each time you commit. This is a very important concept to understand when using Git.

Yet when I run git show $SHA1ofCommitObject...

commit 4405aa474fff8247607d0bf599e054173da84113
Author: Joe Smoe <joe.smoe@example.com>
Date:   Tue May 1 08:48:21 2012 -0500

    First commit

diff --git a/index.html b/index.html
new file mode 100644
index 0000000..de8b69b
--- /dev/null
+++ b/index.html
@@ -0,0 +1 @@
+<h1>Hello World!</h1>
diff --git a/interests/chess.html b/interests/chess.html
new file mode 100644
index 0000000..e5be7dd
--- /dev/null
+++ b/interests/chess.html
@@ -0,0 +1 @@
+Did you see on Slashdot that King's Gambit accepted is solved! <a href="http://game

... it outputs the diff between this commit and the previous one. I know that Git doesn't store diffs in blob objects, but does it store diffs in commit objects? Or is git show calculating the diff dynamically?

Git Solutions


Solution 1 - Git

What the statement means is that most other version control systems need a point of reference in the past to be able to re-create the current commit.

For example, at some point in the past, a diff-based VCS (version control system) would have stored a full snapshot:

x = snapshot
+ = diff
History:
x-----+-----+-----+-----(+) Where we are now
                         

So, in such a scenario, to re-create the state at (now), it would have to check out (x) and then apply the diff for each (+) until it gets to now. Note that it would be extremely inefficient to store deltas forever, so every so often, delta-based VCSes store a full snapshot. Here's how it's done for Subversion.
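To make the reconstruction cost concrete, here is a minimal Python sketch of a delta-based history. The list-of-entries layout and the names `reconstruct` and `history` are hypothetical, chosen for illustration; no real VCS stores data this way. Rebuilding a revision starts at the nearest earlier full snapshot and replays every delta after it:

```python
# Toy delta-based history (hypothetical layout, not any real VCS's format):
# each entry is ("snapshot", full_state) or ("delta", changes), where a change
# maps a path to its new content, or to None if the file was deleted.
# Assumes entry 0 is always a full snapshot.
from typing import Dict, List, Optional, Tuple

History = List[Tuple[str, Dict[str, Optional[str]]]]

def reconstruct(history: History, rev: int) -> Dict[str, str]:
    """Rebuild revision `rev` from the nearest earlier full snapshot."""
    base = rev
    while history[base][0] != "snapshot":   # walk back to a full snapshot
        base -= 1
    state = {p: c for p, c in history[base][1].items() if c is not None}
    for _kind, changes in history[base + 1 : rev + 1]:   # replay deltas in order
        for path, content in changes.items():
            if content is None:
                state.pop(path, None)
            else:
                state[path] = content
    return state

history: History = [
    ("snapshot", {"index.html": "<h1>Hello</h1>\n"}),
    ("delta", {"chess.html": "King's Gambit\n"}),
    ("delta", {"index.html": "<h1>Hello World!</h1>\n"}),
    # Periodic full snapshot, so later reads need not replay from revision 0.
    ("snapshot", {"index.html": "<h1>Hello World!</h1>\n", "chess.html": "King's Gambit\n"}),
    ("delta", {"chess.html": None}),
]

print(reconstruct(history, 2))   # replays two deltas on top of revision 0
```

Reading revision 4 replays only one delta thanks to the snapshot at revision 3; without periodic snapshots, every read would start from revision 0.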

Now, Git is different. Git stores references to complete blobs, which means that with Git, a single commit is sufficient to recreate the codebase at that point in time. Git does not need to look up information from past revisions to create a snapshot.
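By contrast, here is a minimal Python sketch of Git's snapshot model. Blob IDs follow Git's real scheme (SHA-1 over a `blob <size>` header, a NUL byte, and the content), so `put("blob", b"hello\n")` yields the same ID as `echo hello | git hash-object --stdin`. The tree and commit bodies use a simplified, hypothetical text encoding rather than Git's binary tree format, and `put`, `commit` and `checkout` are illustrative names. The point is that `checkout` needs only one commit:

```python
# Minimal content-addressed object store in the spirit of Git's object model.
# Blob IDs match real Git: SHA-1 over b"blob <size>\0" + content.
# Tree and commit bodies use a simplified, hypothetical text encoding, so
# their IDs will NOT match what real Git produces.
import hashlib
from typing import Dict, Optional

objects: Dict[str, bytes] = {}   # object ID -> object body

def put(kind: str, body: bytes) -> str:
    """Store `body` under the SHA-1 of its "<kind> <size>\\0" header plus body."""
    oid = hashlib.sha1(f"{kind} {len(body)}\0".encode() + body).hexdigest()
    objects[oid] = body          # identical content maps to the same ID
    return oid

def commit(files: Dict[str, str], message: str, parent: Optional[str] = None) -> str:
    # Every file is stored as a complete blob, never as a diff.
    entries = sorted((path, put("blob", text.encode())) for path, text in files.items())
    tree = put("tree", "".join(f"{oid} {path}\n" for path, oid in entries).encode())
    header = f"tree {tree}\n" + (f"parent {parent}\n" if parent else "")
    return put("commit", f"{header}\n{message}\n".encode())

def checkout(commit_id: str) -> Dict[str, str]:
    """Recreate the full snapshot from a single commit, ignoring its parents."""
    first_line = objects[commit_id].decode().split("\n", 1)[0]   # "tree <id>"
    tree = first_line.split(" ", 1)[1]
    files: Dict[str, str] = {}
    for line in objects[tree].decode().splitlines():
        oid, path = line.split(" ", 1)
        files[path] = objects[oid].decode()
    return files

c1 = commit({"index.html": "<h1>Hello World!</h1>\n"}, "First commit")
c2 = commit({"index.html": "<h1>Hello World!</h1>\n",
             "interests/chess.html": "<p>chess notes</p>\n"},
            "Second commit", parent=c1)
```

Because `index.html` has identical content in both commits, both trees point at the same blob ID, so the unchanged file is stored once and no diff is involved.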

So if that is the case, then where does the delta compression that git uses come in?

Well, it is nothing but a compression concept: there is no point storing the same information twice if only a tiny amount has changed. So Git stores only what has changed, as a delta against a similar base object. This happens purely at the storage layer: the commit is still, in effect, a tree of references to complete blobs, so it can be re-created without looking at past commits, and Git simply expands the delta when it reads the blob. The thing is, though, that Git does not do this immediately after every commit, but rather when it packs objects, for example on a garbage collection run. So, if Git has not packed them yet, you can see loose objects with very similar content in your object database (.git/objects).
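The copy/insert idea behind delta compression can be sketched in Python as follows. Git's pack deltas are likewise built from "copy a range from the base" and "insert literal bytes" instructions, but this sketch finds matches with `difflib.SequenceMatcher` and keeps the instructions in a plain list instead of Git's binary encoding; `make_delta` and `apply_delta` are hypothetical names. Applying the delta always yields the complete object again, which is why the commit's tree can keep referring to whole blobs:

```python
# Illustrative delta compression: describe `target` as instructions that either
# copy a byte range from `base` or insert literal bytes. Same copy/insert idea
# as Git's pack deltas, but NOT Git's binary delta format.
from difflib import SequenceMatcher
from typing import List, Tuple, Union

Op = Union[Tuple[str, int, int], Tuple[str, bytes]]  # ("copy", start, end) | ("insert", data)

def make_delta(base: bytes, target: bytes) -> List[Op]:
    ops: List[Op] = []
    matcher = SequenceMatcher(None, base, target, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))           # reuse bytes already in base
        elif j2 > j1:                               # "replace"/"insert": new bytes
            ops.append(("insert", target[j1:j2]))
        # "delete" contributes nothing to the target, so it emits no op.
    return ops

def apply_delta(base: bytes, ops: List[Op]) -> bytes:
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            out += base[op[1]:op[2]]               # copy range from the base object
        else:
            out += op[1]                           # append literal bytes
    return bytes(out)

old = b"<h1>Hello World!</h1>\n" * 50
new = old + b"<p>One more line</p>\n"
delta = make_delta(old, new)
print(sum(len(op[1]) for op in delta if op[0] == "insert"), "literal bytes stored")
```

Only the appended line is stored literally; everything else is a reference into the base object.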

However, when Git runs its garbage collection (or when you call git gc manually), the loose objects are packed into a read-only pack file, in which similar objects can be stored as deltas against one another. You don't have to worry about running garbage collection manually: Git uses heuristics, such as the number of loose objects, to decide when to do so.
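A simplified Python sketch of the kind of check behind automatic packing follows. It assumes a SHA-1 repository layout, where each loose object is a file `.git/objects/XY/<remaining 38 hex digits>`, and uses 6700, Git's default `gc.auto` value, as the threshold. Real `git gc --auto` only estimates the loose-object count and also weighs other factors, such as the number of pack files, so treat `count_loose_objects` and `needs_auto_gc` as hypothetical helpers rather than Git's actual logic:

```python
# Simplified "should we pack yet?" check (hypothetical helpers, not Git's code).
# Assumes SHA-1 object IDs: loose objects live at objects/XY/<38 hex chars>.
import os
import re

LOOSE_DIR = re.compile(r"^[0-9a-f]{2}$")
LOOSE_FILE = re.compile(r"^[0-9a-f]{38}$")

def count_loose_objects(objects_dir: str) -> int:
    total = 0
    for name in os.listdir(objects_dir):
        sub = os.path.join(objects_dir, name)
        # Only two-hex-digit fan-out dirs hold loose objects; "pack" and
        # "info" (packed objects and metadata) are skipped.
        if LOOSE_DIR.match(name) and os.path.isdir(sub):
            total += sum(1 for f in os.listdir(sub) if LOOSE_FILE.match(f))
    return total

def needs_auto_gc(objects_dir: str, threshold: int = 6700) -> bool:
    """6700 mirrors Git's default gc.auto; a threshold of 0 disables packing."""
    return threshold > 0 and count_loose_objects(objects_dir) > threshold
```

Pointed at `.git/objects` of a real repository, `count_loose_objects` gives an exact count where Git itself would settle for an estimate.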

Solution 2 - Git

No, commit objects in Git don't contain diffs. Instead, each commit object contains the hash of a tree, which recursively and completely defines the content of the source tree at that commit, along with the hashes of any parent commits, the author, the committer and the message. There's a nice explanation in the Git community book of what goes into blob objects, tree objects and commit objects.
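What a commit object actually contains can be checked with `git cat-file -p <commit>`. The Python sketch below parses text in that shape. The sample is hand-written to mirror the question's first commit: the tree hash is a placeholder, and since it is a root commit it has no `parent` line. `parse_commit` is a hypothetical helper that does not handle multi-line headers such as `gpgsig`. Nothing in the parsed result is a diff:

```python
# Hypothetical parser for the text form of a commit object, i.e. what
# `git cat-file -p <commit>` prints. Simplification: multi-line headers
# (such as gpgsig signatures) are not handled.
from typing import Dict, List, Tuple

def parse_commit(raw: str) -> Tuple[Dict[str, List[str]], str]:
    header_text, _, message = raw.partition("\n\n")   # blank line ends headers
    headers: Dict[str, List[str]] = {}
    for line in header_text.splitlines():
        key, _, value = line.partition(" ")
        headers.setdefault(key, []).append(value)     # "parent" repeats in merges
    return headers, message

# Hand-written sample; the tree hash is a placeholder, not a real object ID.
SAMPLE = (
    "tree 0123456789abcdef0123456789abcdef01234567\n"
    "author Joe Smoe <joe.smoe@example.com> 1335880101 -0500\n"
    "committer Joe Smoe <joe.smoe@example.com> 1335880101 -0500\n"
    "\n"
    "First commit\n"
)

headers, message = parse_commit(SAMPLE)
print(sorted(headers))   # ['author', 'committer', 'tree']
```

A non-root commit would add one `parent <hash>` line per parent, but still no diff.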

All the diffs that are shown to you by git's tools are calculated on demand from the complete content of files.
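Computing a diff on demand can be sketched in Python with `difflib.unified_diff`, comparing two complete snapshots (here, dicts mapping paths to contents, a hypothetical stand-in for trees; `diff_snapshots` is an illustrative name). For a root commit the old snapshot is empty, much as `git show` diffs the question's first commit against `/dev/null`. The hunks resemble Git's, but Git-specific lines such as `diff --git` and `index` are not produced:

```python
# Sketch: derive a diff on demand from two complete snapshots.
# difflib's unified format resembles Git's output but is not identical.
import difflib
from typing import Dict, List

def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> List[str]:
    out: List[str] = []
    for path in sorted(old.keys() | new.keys()):
        a, b = old.get(path), new.get(path)
        if a == b:
            continue                      # unchanged file: nothing to show
        out.extend(difflib.unified_diff(
            (a or "").splitlines(keepends=True),
            (b or "").splitlines(keepends=True),
            fromfile=f"a/{path}" if a is not None else "/dev/null",
            tofile=f"b/{path}" if b is not None else "/dev/null",
        ))
    return out

# A root commit is diffed against an empty snapshot.
first = {"index.html": "<h1>Hello World!</h1>\n"}
print("".join(diff_snapshots({}, first)), end="")
```

Nothing is stored for the diff itself: it is recomputed from the two snapshots every time it is requested.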

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Alexander Bird | View Question on Stackoverflow
Solution 1 - Git | Carl | View Answer on Stackoverflow
Solution 2 - Git | Mark Longair | View Answer on Stackoverflow