How does git detect that a file has been modified?

Git Problem Overview


How does git detect a file modification so fast?

Does it hash every file in the repo and compare SHA1s? This would take a lot of time, wouldn't it?

Or does it compare atime, ctime or mtime?

Git Solutions


Solution 1 - Git

Git tries hard to convince itself from the lstat() values alone that the worktree matches the index, because falling back on file contents is very expensive.
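To make the "stat first, contents only as a fallback" idea concrete, here is a minimal Python sketch. It is not Git's code: `CachedEntry`, `snapshot` and `is_modified` are hypothetical names, the set of stat fields is an assumption for illustration, and the hash is a plain SHA-1 of the file bytes rather than a real Git object id. It is meant for regular files only.

```python
import hashlib
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class CachedEntry:
    """Hypothetical stand-in for what an index entry remembers about one file."""
    mtime_ns: int
    ctime_ns: int
    size: int
    ino: int
    mode: int
    sha1: str  # plain SHA-1 of the bytes (illustration only)


def content_sha1(path):
    with open(path, "rb") as f:
        return hashlib.sha1(f.read()).hexdigest()


def stat_fields(path):
    # lstat() does not follow symlinks.
    st = os.lstat(path)
    return (st.st_mtime_ns, st.st_ctime_ns, st.st_size, st.st_ino, st.st_mode)


def snapshot(path):
    """Record stat data and content hash, roughly what happens when a file is staged."""
    return CachedEntry(*stat_fields(path), sha1=content_sha1(path))


def is_modified(path, entry):
    cached = (entry.mtime_ns, entry.ctime_ns, entry.size, entry.ino, entry.mode)
    if stat_fields(path) == cached:
        return False  # cheap path: stat data matches, the file is never opened
    # Expensive path: stat data differs, so read and hash the contents.
    return content_sha1(path) != entry.sha1
```

With this sketch, a file that was only touched (new mtime, same bytes) takes the expensive path once and is still reported as unmodified, while an untouched file costs a single lstat().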

Documentation/technical/racy-git.txt describes what stat fields are used, and how some race conditions due to low mtime granularity are avoided. This article has some more detail.
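As I read racy-git.txt, the core rule is that an entry whose cached mtime is the same as or newer than the index file's own timestamp is "racily clean": the file could have been changed again within the same timestamp tick, so matching stat data cannot be trusted and the contents have to be compared. A rough Python sketch of that decision (the function names and return strings are made up for illustration):

```python
def is_racily_clean(entry_mtime_ns, index_mtime_ns):
    """True when the cached mtime is not strictly older than the index timestamp."""
    return entry_mtime_ns >= index_mtime_ns


def classify(stat_matches, entry_mtime_ns, index_mtime_ns):
    """Decide how much work one entry needs (hypothetical helper)."""
    if not stat_matches:
        return "stat differs"
    if is_racily_clean(entry_mtime_ns, index_mtime_ns):
        return "compare contents"  # stat matches, but the match is not trustworthy
    return "clean"
```

For example, `classify(True, 100, 100)` returns `"compare contents"`, while `classify(True, 99, 100)` returns `"clean"`.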

stat values aren't tamper-proof; see futimens(3). Git may be fooled into missing a change to a file, but that does not compromise the integrity of content hashing.
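A small Python sketch of that kind of tampering, using `os.utime()` to put the old timestamps back after a same-size rewrite. The helper names are hypothetical, and the check being fooled here looks only at mtime and size; ctime cannot be reset this way, so a check that also compares ctime would still notice.

```python
import hashlib
import os
import tempfile


def mtime_and_size(path):
    st = os.lstat(path)
    return (st.st_mtime_ns, st.st_size)


def overwrite_keeping_mtime(path, new_bytes):
    """Rewrite the file, then restore the previous atime and mtime."""
    st = os.lstat(path)
    with open(path, "wb") as f:
        f.write(new_bytes)
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns))


def demo():
    """Return (stat_looks_same, content_differs) for a tampered temp file."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "f.txt")
        with open(path, "wb") as f:
            f.write(b"aaaa")
        before_stat = mtime_and_size(path)
        before_hash = hashlib.sha1(b"aaaa").hexdigest()
        overwrite_keeping_mtime(path, b"bbbb")  # same size, different bytes
        with open(path, "rb") as f:
            after_hash = hashlib.sha1(f.read()).hexdigest()
        return (before_stat == mtime_and_size(path), before_hash != after_hash)
```

The stat-based view says nothing changed, yet the hash of the contents is different, which is why the hash remains the source of truth.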

Solution 2 - Git

There's an initial mtime check for reports like "git status", but when the final commit is computed, mtimes don't matter... it's the SHA1 that matters.
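For reference, the SHA1 that ends up in a commit for a file's contents is the blob id: the SHA-1 of a short header (`blob`, the byte length, a NUL) followed by the contents. A minimal Python sketch, assuming a repository that uses the default SHA-1 object format (`git_blob_sha1` is just an illustrative name):

```python
import hashlib


def git_blob_sha1(data: bytes) -> str:
    """Hash bytes the way `git hash-object` does for a blob."""
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()
```

Note that no timestamp goes into the hash, so two files with identical bytes get the same id regardless of their mtimes. The empty file, for instance, always hashes to `e69de29bb2d1d6434b8b29ae775ad8c2e48c5391`.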

Solution 3 - Git

Well, I would hazard a guess that it's using a combination of stat() calls to work out what looks like it might have changed, then in turn actually trying to ascertain, using its diffing engine, that this is the case.

You can see the code for the diff engine here to get some idea. I traced through the codebase to be sure that the status command does indeed call down into this code (it looks like a lot of stuff does!). All this makes a lot of sense when you know that Git performs pretty badly on Windows, where it uses an emulation layer to perform these POSIX-type calls: a git status is an order of magnitude slower on that platform.

Anyway, short of reading all the code from top to bottom (which I may do later if I have time!), that's as far as I can take you for now... maybe someone who has worked with the codebase can be more definitive.

Note: another possible speedup comes from judicious use of inline functions where it makes sense; you can see this in the headers.

[edit: see here for an explanation of stat()]

Solution 4 - Git

Depending on platform, you should be able to find out what syscalls Git uses to figure out its status. Try strace git status on Linux, truss git status on SunOS, or the seemingly DTrace-based tool that Apple ships with its Developer Tools on Mac OS X.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | hdorio | View Question on Stackoverflow
Solution 1 - Git | Tobu | View Answer on Stackoverflow
Solution 2 - Git | Randal Schwartz | View Answer on Stackoverflow
Solution 3 - Git | jkp | View Answer on Stackoverflow
Solution 4 - Git | Max A. | View Answer on Stackoverflow