What is git actually doing when it says it is "resolving deltas"?

Git Problem Overview


During the initial clone of a repository, git first receives the objects (which is obvious enough), and then spends about the same amount of time "resolving deltas". What is actually happening during this phase of the clone?

Git Solutions


Solution 1 - Git

The stages of git clone are:

  1. Receive a "pack" file of all the objects in the repo database
  2. Create an index file for the received pack
  3. Check out the head revision (for a non-bare repo, obviously)

"Resolving deltas" is the message shown for the second stage, indexing the pack file ("git index-pack").

Pack files do not contain the actual object IDs, only the object content. So to determine the object IDs, git has to decompress each object in the pack and compute the SHA1 of its content (together with a short type-and-size header); the resulting ID is then written into the index file.
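As a rough illustration of the "decompress+SHA1" step, here is a minimal Python sketch. It relies on the fact that a git object ID is the SHA1 of a header of the form "<type> <size>" followed by a NUL byte and then the object content. The function names git_object_id and id_from_compressed are made up for this example, and the zlib-compressed bytes only stand in for a pack entry; the real pack format stores type and size in a compact binary header, which is not parsed here.

```python
import hashlib
import zlib


def git_object_id(obj_type: str, content: bytes) -> str:
    """Return the SHA1 object ID for an object of the given type and content."""
    # Git hashes "<type> <size>\0" followed by the raw (uncompressed) content.
    header = f"{obj_type} {len(content)}".encode("ascii") + b"\x00"
    return hashlib.sha1(header + content).hexdigest()


def id_from_compressed(obj_type: str, compressed: bytes) -> str:
    """Decompress a zlib stream holding object content, then compute its ID."""
    return git_object_id(obj_type, zlib.decompress(compressed))


if __name__ == "__main__":
    # Stand-in for the compressed payload of one pack entry (assumption).
    stored = zlib.compress(b"hello world\n")
    print(id_from_compressed("blob", stored))
    # prints 3b18e512dba79e4c8300dd08aeb37f8e728b8dad
```

The printed value should match what "echo 'hello world' | git hash-object --stdin" reports, which is a quick way to check the sketch against a real git installation.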

An object in a pack file may be stored as a delta, i.e. a sequence of instructions for building it from some other object. In this case, git needs to retrieve the base object, apply the delta commands, and SHA1 the result. The base object itself might have to be derived by applying a sequence of delta commands. (Even though, in the case of a clone, the base object will have been encountered already, there is a limit to how many reconstructed objects are cached in memory, so some of them may have to be rebuilt.)
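To make "apply the commands" more concrete, here is a hedged Python sketch of applying one delta to a base object. It follows the delta encoding as described in git's pack format documentation: two variable-length sizes (base and result), followed by "copy from base" and "insert literal bytes" instructions. The names read_varint and apply_delta are invented for this example, and this is not git's own implementation.

```python
def read_varint(buf: bytes, pos: int) -> tuple[int, int]:
    """Read a little-endian base-128 size; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        shift += 7
        if not byte & 0x80:
            return value, pos


def apply_delta(base: bytes, delta: bytes) -> bytes:
    """Rebuild an object by applying delta instructions to its base object."""
    base_size, pos = read_varint(delta, 0)
    result_size, pos = read_varint(delta, pos)
    if base_size != len(base):
        raise ValueError("delta was made against a different base object")

    out = bytearray()
    while pos < len(delta):
        op = delta[pos]
        pos += 1
        if op & 0x80:
            # Copy instruction: low 4 bits select offset bytes, next 3 select size bytes.
            offset = 0
            size = 0
            for i in range(4):
                if op & (1 << i):
                    offset |= delta[pos] << (8 * i)
                    pos += 1
            for i in range(3):
                if op & (0x10 << i):
                    size |= delta[pos] << (8 * i)
                    pos += 1
            if size == 0:
                size = 0x10000  # a size of zero means 64 KiB
            out += base[offset:offset + size]
        elif op:
            # Insert instruction: the next `op` bytes are literal data.
            out += delta[pos:pos + op]
            pos += op
        else:
            raise ValueError("instruction byte 0 is reserved")

    if len(out) != result_size:
        raise ValueError("reconstructed object has the wrong size")
    return bytes(out)


if __name__ == "__main__":
    base = b"hello world\n"
    # Hand-built delta: base size 12, result size 10,
    # copy 6 bytes from offset 0, then insert the 4 bytes "git\n".
    delta = bytes([0x0C, 0x0A, 0x90, 0x06, 0x04]) + b"git\n"
    print(apply_delta(base, delta))  # prints b'hello git\n'
```

Once the result is rebuilt, it would be hashed like any other object to obtain its ID. If the base is itself a delta, the same function has to be applied along the chain first, which is why caching reconstructed objects matters.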

In summary, the "resolving deltas" stage involves decompressing and checksumming the entire repo database, which not surprisingly takes quite a long time. Presumably decompressing and calculating SHA1s actually takes more time than applying the delta commands.

In the case of a subsequent fetch, the received pack file may contain references (as delta object bases) to other objects that the receiving git is expected to already have. In this case, the receiving git actually rewrites the received pack file to include any such referenced objects, so that any stored pack file is self-sufficient. This might be where the message "resolving deltas" originated.

Solution 2 - Git

Git uses delta encoding to store some of the objects in packfiles. However, you don't want to have to play back every single change ever on a given file in order to get the current version, so Git also has occasional snapshots of the file contents stored as well. "Resolving deltas" is the step that deals with making sure all of that stays consistent.

Here's a chapter from the "Git Internals" section of the Pro Git book, which is available online, that talks about this.

Solution 3 - Git

Amber seems to be describing the object model used by Mercurial or similar systems. Git does not store the deltas between successive versions of an object, but rather a full snapshot of the object every time. It then compresses these snapshots using delta compression, trying to find good deltas to use, regardless of where in the history they occur.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author    Original Content on Stack Overflow
Question            Nik Reiman         View Question on Stack Overflow
Solution 1 - Git    araqnid            View Answer on Stack Overflow
Solution 2 - Git    Amber              View Answer on Stack Overflow
Solution 3 - Git    Johan              View Answer on Stack Overflow