How to assign a Git SHA1 to a file without Git?

Git, SHA1

Git Problem Overview


As I understand it, when Git assigns a SHA1 hash to a file, this SHA1 is unique to the file based on its contents.

As a result, if a file moves from one repository to another, the SHA1 for the file remains the same because its contents have not changed.

How does Git calculate the SHA1 digest? Does it do it on the full uncompressed file contents?

I would like to emulate assigning SHA1s outside of Git.

Git Solutions


Solution 1 - Git

This is how Git calculates the SHA1 for a file (or, in Git terms, a "blob"):

sha1("blob " + filesize + "\0" + data)

So you can easily compute it yourself without having Git installed. Here filesize is the size of the data in bytes, written as a decimal number, and "\0" is a single NUL byte (value 0), not a two-character string.

For example, the hash of an empty file:

sha1("blob 0\0") = "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391"

$ touch empty
$ git hash-object empty
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

Another example:

sha1("blob 7\0foobar\n") = "323fae03f4606ea9991df8befbb2fca795e648fa"

$ echo "foobar" > foo.txt
$ git hash-object foo.txt 
323fae03f4606ea9991df8befbb2fca795e648fa

Here is a Python implementation:

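from hashlib import sha1

def githash(data):
    # data must be bytes (in Python 3, open the file in 'rb' mode)
    s = sha1()
    # header: "blob ", the length in bytes as decimal, then a NUL byte
    s.update(b"blob %d\0" % len(data))
    s.update(data)
    return s.hexdigest()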

Solution 2 - Git

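A little goodie, in shell:

printf 'blob %s\0%s' "${#CONTENTS}" "$CONTENTS" | sha1sum

Note that ${#CONTENTS} counts characters in the current locale, so this is only correct when every character is a single byte (for example, ASCII text).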

Solution 3 - Git

You can make a bash shell function to calculate it quite easily if you don't have git installed.

git_id () { printf 'blob %s\0' "$(ls -l "$1" | awk '{print $5;}')" | cat - "$1" | sha1sum | awk '{print $1}'; }

Solution 4 - Git

Take a look at the man page for git-hash-object. You can use it to compute the git hash of any particular file. I think that git feeds more than just the contents of the file into the hash algorithm, but I don't know for sure, and if it does feed in extra data, I don't know what it is.
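
The extra data is the short header described in Solution 1. As a minimal sketch (the function name git_blob_hash is only illustrative and not part of Git), the following Python 3 snippet compares a plain SHA-1 of the contents with the Git blob hash for an empty file:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # Git hashes a header plus the contents: "blob <size in bytes>", a NUL byte, then the data.
    header = b"blob " + str(len(data)).encode("ascii") + b"\0"
    return hashlib.sha1(header + data).hexdigest()

plain = hashlib.sha1(b"").hexdigest()  # SHA-1 of the raw contents only
blob = git_blob_hash(b"")              # SHA-1 of header + contents

print(plain)  # da39a3ee5e6b4b0d3255bfef95601890afd80709
print(blob)   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
```

The two digests differ only because of the header; no file name, timestamp, or path goes into a blob hash.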

Solution 5 - Git

/// Calculates the SHA1 for a given string
let calcSHA1 (text:string) =
    text 
      |> System.Text.Encoding.ASCII.GetBytes
      |> (new System.Security.Cryptography.SHA1CryptoServiceProvider()).ComputeHash
      |> Array.fold (fun acc e -> 
           let t = System.Convert.ToString(e, 16)
           if t.Length = 1 then acc + "0" + t else acc + t) 
           ""
/// Calculates the SHA1 like git
let calcGitSHA1 (text:string) =
    let s = text.Replace("\r\n","\n")
    sprintf "blob %d%c%s" (s.Length) (char 0) s
      |> calcSHA1

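This is a solution in F#. Note that it assumes ASCII text (the character count is used as the length) and that it converts CRLF line endings to LF before hashing.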

Solution 6 - Git

Full Python3 implementation:

import os
from hashlib import sha1

def hashfile(filepath):
    filesize_bytes = os.path.getsize(filepath)

    s = sha1()
    s.update(b"blob %u\0" % filesize_bytes)

    with open(filepath, 'rb') as f:
        s.update(f.read())

    return s.hexdigest() 
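
The function above reads the whole file into memory at once. For large files, a chunked variant may be preferable. This is only a sketch: the name hashfile_streaming and the 64 KiB chunk size are arbitrary choices for illustration, and the temporary file exists only for the demonstration.

```python
import hashlib
import os
import tempfile

def hashfile_streaming(filepath, chunk_size=65536):
    # The header needs the total size up front, so ask the file system for it.
    s = hashlib.sha1()
    s.update(b"blob %d\0" % os.path.getsize(filepath))
    with open(filepath, "rb") as f:
        # Feed the file to the hash in fixed-size chunks until read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            s.update(chunk)
    return s.hexdigest()

# Demonstration with a temporary file containing "foobar\n" (7 bytes).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"foobar\n")
digest = hashfile_streaming(tmp.name)
os.remove(tmp.name)
print(digest)  # 323fae03f4606ea9991df8befbb2fca795e648fa
```

The digest matches the foo.txt example from Solution 1, since the hash depends only on the bytes and their count.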

Solution 7 - Git

And in Perl (see also Git::PurePerl at http://search.cpan.org/dist/Git-PurePerl/ )

use strict;
use warnings;
use Digest::SHA1;

my @input = <>;

my $content = join("", @input);

my $git_blob = 'blob' . ' ' . length($content) . "\0" . $content;

my $sha1 = Digest::SHA1->new();

$sha1->add($git_blob);

print $sha1->hexdigest();

Solution 8 - Git

In Perl:

#!/usr/bin/env perl
use Digest::SHA1;

my $content = do { local $/ = undef; <> };
print Digest::SHA1->new->add('blob '.length($content)."\0".$content)->hexdigest(), "\n";

As a shell command:

perl -MDigest::SHA1 -E '$/=undef;$_=<>;say Digest::SHA1->new->add("blob ".length()."\0".$_)->hexdigest' < file

Solution 9 - Git

Using Ruby, you could do something like this:

require 'digest/sha1'

def git_hash(file)
  data = File.read(file)
  size = data.bytesize.to_s
  Digest::SHA1.hexdigest('blob ' + size + "\0" + data)
end

Solution 10 - Git

A little shell script that should produce identical output to git hash-object (stat -c is the GNU coreutils syntax):

#!/bin/sh
( 
	printf 'blob %s\0' "$(stat -c%s "$1")";
	cat "$1" 
) | sha1sum | cut -d\  -f 1

Solution 11 - Git

You can apply the same to files as well. This one-liner only handles a file with a single line, because read consumes just one line:

$ echo "foobar" > foo.txt
$ echo "$(cat foo.txt)"|(read f; echo -en "blob "$((${#f}+1))"\0$f\n" )|openssl sha1
323fae03f4606ea9991df8befbb2fca795e648fa

Solution 12 - Git

In JavaScript

const crypto = require('crypto')
const bytes = require('utf8-bytes')

function sha1(data) {
    const shasum = crypto.createHash('sha1')
    shasum.update(data)
    return shasum.digest('hex')
}

function shaGit(data) {
    const total_bytes = bytes(data).length
    return sha1(`blob ${total_bytes}\0${data}`)
}
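
The size in the header is a byte count, which is why the snippet above measures the UTF-8 bytes rather than the string length. A rough Python 3 illustration of the difference follows; git_blob_hash_text is a hypothetical helper name, and UTF-8 is assumed as the file encoding.

```python
import hashlib

def git_blob_hash_text(text: str, encoding: str = "utf-8") -> str:
    data = text.encode(encoding)        # hash the encoded bytes, not the characters
    header = b"blob %d\0" % len(data)   # len(data) is the byte count
    return hashlib.sha1(header + data).hexdigest()

text = "h\u00e9llo\n"            # "héllo" followed by a newline
char_count = len(text)
byte_count = len(text.encode("utf-8"))

print(char_count)  # 6
print(byte_count)  # 7, because "é" takes two bytes in UTF-8
```

Using the character count in the header would give a hash that differs from Git's for any content with multi-byte characters.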

Solution 13 - Git

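Note that Git itself does not add a newline to the data before hashing; it hashes the file's bytes exactly as they are. Many editors, and commands such as echo, end the file with a newline, so a file that appears to contain nothing but "Hello World!" is often 13 bytes long and gets a blob hash of 980a0d5..., which is the same as this one: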

$ php -r 'echo sha1("blob 13" . chr(0) . "Hello World!\n") , PHP_EOL;'
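
To see the effect of the trailing newline, one can hash both variants of the data. This is a small Python 3 sketch, with git_blob_hash as an illustrative helper name:

```python
import hashlib

def git_blob_hash(data: bytes) -> str:
    # "blob <size in bytes>", a NUL byte, then the data itself.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()

with_newline = git_blob_hash(b"Hello World!\n")   # 13 bytes
without_newline = git_blob_hash(b"Hello World!")  # 12 bytes

print(with_newline)
print(without_newline)
print(with_newline != without_newline)  # True
```

Only the 13-byte variant should start with 980a0d5, matching the PHP command above.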

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | git-noob | View Question on Stackoverflow
Solution 1 - Git | Ferdinand Beyer | View Answer on Stackoverflow
Solution 2 - Git | knittl | View Answer on Stackoverflow
Solution 3 - Git | CB Bailey | View Answer on Stackoverflow
Solution 4 - Git | Dale Hagglund | View Answer on Stackoverflow
Solution 5 - Git | forki23 | View Answer on Stackoverflow
Solution 6 - Git | Tomer | View Answer on Stackoverflow
Solution 7 - Git | Alec the Geek | View Answer on Stackoverflow
Solution 8 - Git | dolmen | View Answer on Stackoverflow
Solution 9 - Git | leifericf | View Answer on Stackoverflow
Solution 10 - Git | Fordi | View Answer on Stackoverflow
Solution 11 - Git | Feras | View Answer on Stackoverflow
Solution 12 - Git | EnZo | View Answer on Stackoverflow
Solution 13 - Git | Nudge | View Answer on Stackoverflow