How to extract one file with commit history from a Git repo with index-filter & co?


Git Problem Overview


I have a Git repo that was converted from SVN to Mercurial to Git, and I wanted to extract just one source file. The filenames also contained spaces and weird characters (an encoding mismatch had corrupted a Unicode ä).

How can I extract one file from a repository and place it at the root of the new repo?

Git Solutions


Solution 1 - Git

A faster and easier-to-understand filter that accomplishes the same thing:

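# $your $files $here are placeholders: write the literal paths to keep in both
# places, and quote any path that contains spaces.
git filter-branch --index-filter '
    git read-tree --empty
    git reset $GIT_COMMIT -- $your $files $here
' -- --all -- $your $files $here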
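
Since the question mentions spaces in filenames, here is a minimal sketch of the same filter run against a throwaway repository. The repository layout, file names and commit messages are made up for illustration, and I added --prune-empty and -q, which are not in the snippet above.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips the startup warning on newer Git versions

# Throwaway repository with hypothetical files.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir docs src
echo "first draft" > "docs/my notes.txt"
echo "int main(void) { return 0; }" > src/other.c
git add .
git commit -q -m "Add notes and code"

echo "/* unrelated change */" >> src/other.c
git commit -q -am "Touch only the code"

echo "second draft" >> "docs/my notes.txt"
git commit -q -am "Update notes"

# Keep only one file. The path is written literally and quoted, both inside
# the filter and after the final --, so the space is not split into two words.
git filter-branch --prune-empty --index-filter '
    git read-tree --empty
    git reset -q "$GIT_COMMIT" -- "docs/my notes.txt"
' -- --all -- "docs/my notes.txt" >/dev/null

remaining=$(git ls-tree -r --name-only HEAD)   # files left in the rewritten tip
commits=$(git rev-list --count HEAD)           # commits left in the rewritten history
printf '%s\n%s\n' "$remaining" "$commits"
```

The commit that only touched src/other.c is not visited at all, because the path after the final -- limits which commits get rewritten.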

Solution 2 - Git

It seems this is not particularly easy, which is why I'm answering my own question despite the many similar questions about git filter-branch [--index-filter|--subdirectory-filter|--tree-filter]: I needed to use all of them to achieve this!

First, a quick note: even a spell like the one in a comment on https://stackoverflow.com/questions/5998987/splitting-a-set-of-files-within-a-git-repo-into-their-own-repository-preserving

SPELL='git ls-tree -r --name-only --full-tree "$GIT_COMMIT" | grep -v "trie.lisp" | tr "\n" "\0" | xargs -0 git rm --cached -r --ignore-unmatch'
git filter-branch --prune-empty --index-filter "$SPELL" -- --all

will not help with files named like imaging/DrinkkejaI<0300>$'\302\210'.txt_74x2032.gif. The aI<0300>$'\302\210' part once was a single letter: ä.

So in order to extract a single file, in addition to filter-branch I also needed to do:

git filter-branch -f --subdirectory-filter lisp/source/model HEAD

Alternatively, you can use --tree-filter. The test is needed because the file lived in another directory in earlier commits; see: https://stackoverflow.com/questions/3142419/how-can-i-move-a-directory-in-a-git-repo-for-all-commits

MV_FILTER='test -f source/model/trie.lisp && mv ./source/model/trie.lisp . || echo "Nothing to do."'
git filter-branch --tree-filter "$MV_FILTER" HEAD --all

To see all the names a file has had, use:

git log --pretty=oneline --follow --name-only git-path/to/file | grep -v ' ' | sort -u
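
Note that grep -v ' ' also throws away any filename that contains a space. A variant that should avoid this, as a sketch only: print an empty header for each commit with --pretty=format: and drop the blank lines. The throwaway repository and its file names below are invented for the demonstration.

```shell
set -eu

# Throwaway repository in which one file is renamed once (hypothetical names).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"

mkdir -p source/model
echo "(defun trie () nil)" > source/model/trie.lisp
git add .
git commit -q -m "Add trie"

git mv source/model/trie.lisp trie.lisp
git commit -q -m "Move trie to the root"

# An empty per-commit header leaves only the file names (plus blank lines,
# which sed removes), so names containing spaces are kept.
names=$(git log --follow --name-only --pretty=format: -- trie.lisp | sed '/^$/d' | sort -u)
printf '%s\n' "$names"
```

This lists both the old and the current path of the file.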

As described at http://whileimautomaton.net/2010/04/03012432

Also follow these cleanup steps afterwards:

$ git reset --hard
$ git gc --aggressive
$ git prune
$ git remote rm origin # Otherwise changes will be pushed to where the repo was cloned from
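
One thing to keep in mind: git filter-branch keeps backups of the old refs under refs/original/, and the reflog also points at the old commits, so gc alone may not drop the old history. Below is a sketch of a fuller cleanup, roughly following the shrinking checklist in the git filter-branch documentation. The demo repository and its file names are assumptions for illustration.

```shell
set -eu
export FILTER_BRANCH_SQUELCH_WARNING=1   # skips the startup warning on newer Git versions

# Throwaway repository (hypothetical files) whose history gets rewritten.
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "keep me" > keep.txt
echo "drop me" > drop.txt
git add .
git commit -q -m "Add two files"
old_head=$(git rev-parse HEAD)           # commit that should disappear after cleanup

git filter-branch --index-filter 'git rm -q --cached --ignore-unmatch drop.txt' -- --all >/dev/null

# 1. Delete the backup refs that filter-branch stored under refs/original/.
git for-each-ref --format='%(refname)' refs/original/ |
while read -r ref; do
    git update-ref -d "$ref"
done

# 2. Expire all reflog entries so they no longer reference the old commits.
git reflog expire --expire=now --all

# 3. Repack and prune everything that is now unreachable.
git gc -q --prune=now

if git cat-file -e "$old_head" 2>/dev/null; then
    old_commit=present
else
    old_commit=pruned
fi
echo "$old_commit"
```

Only run this once you are sure the rewritten history is what you want, because the backups are gone afterwards.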

Solution 3 - Git

Note that things get much easier if you combine this with the additional step of moving the desired file(s) into a new directory.

This might be a quite common use case (e.g. moving the desired single file to the root dir).
I did it (using git 1.9) like this (first moving the file(s), then deleting the old tree):

git filter-branch -f --tree-filter 'mkdir -p new_path && git mv -k -f old_path/to/file new_path/'
git filter-branch -f --prune-empty --index-filter 'git rm -r --cached --ignore-unmatch old_path'

You can even easily use wildcards for the desired files (without messing around with grep -v).

I'd think that this ('mv' and 'rm') could also be done in one filter-branch, but it didn't work for me.

I didn't try it with weird characters but I hope this helps anyway. Making things easier seems always to be a good idea to me.

Hint:
This is a time-consuming action on large repos. So if you want to do several actions (like getting a bunch of files and then rearranging them in 'new_path/subdirs'), it's a good idea to do the 'rm' part as early as possible to get a smaller and faster tree.

Solution 4 - Git

I've found an elegant solution using git log and git am here: https://www.pixelite.co.nz/article/extracting-file-folder-from-git-repository-with-full-git-history/

In case it goes away, here's how you do it:

  1. in the original repo,

     git log --pretty=email --patch-with-stat --reverse --full-index --binary -- path/to/file_or_folder > /tmp/patch
    
  2. if the file was in a subdirectory, or if you want to rename it

     sed -i -e 's/deep\/path\/that\/you\/want\/shorter/short\/path/g' /tmp/patch
    
  3. in a new, empty repo

     git am < /tmp/patch
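
The three steps above can be tried end to end on throwaway repositories. This is only a sketch: the paths deep/path and short, the file names and the commit messages are made up, and sed writes to a second file instead of using -i so that it also runs on non-GNU sed.

```shell
set -eu
work=$(mktemp -d)

# 1. Original repo (hypothetical content): export the history of one file.
git init -q "$work/old"
cd "$work/old"
git config user.name "Demo User"
git config user.email "demo@example.com"
mkdir -p deep/path
echo "one" > deep/path/file.txt
echo "x" > unrelated.txt
git add .
git commit -q -m "Add file and something unrelated"
echo "two" >> deep/path/file.txt
git commit -q -am "Extend file"

git log --pretty=email --patch-with-stat --reverse --full-index --binary \
    -- deep/path/file.txt > "$work/patch"

# 2. Shorten the path. Note that sed rewrites every match in the patch,
#    including any commit message that happens to mention the old path.
sed -e 's#deep/path/#short/#g' "$work/patch" > "$work/patch.renamed"

# 3. New, empty repo: replay the patches as commits.
git init -q "$work/new"
cd "$work/new"
git config user.name "Demo User"
git config user.email "demo@example.com"
git am -q < "$work/patch.renamed"

new_files=$(git ls-tree -r --name-only HEAD)
new_commits=$(git rev-list --count HEAD)
printf '%s\n%s\n' "$new_files" "$new_commits"
```

The new repo ends up with both commits, and unrelated.txt never appears in it because the patch was limited to the one path.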
    

Solution 5 - Git

The following will rewrite the history and keep only commits that touch the list of files you give. You probably want to do that in a clone of your repository to avoid losing the original history.

FILES='path/to/file1 other-path/to/file2 file3'
git filter-branch --prune-empty --index-filter "
                        git read-tree --empty
                        git reset \$GIT_COMMIT -- $FILES
                " \
        -- --all -- $FILES

Then you can merge that new branch into your target repository, via normal merge or rebase commands according to your use-case.
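
As a sketch of that last step, assuming Git 2.9 or newer (which needs --allow-unrelated-histories when the two histories share no commit): fetch the filtered history into the target repository and merge it. Both repositories below are throwaway stand-ins with invented file names.

```shell
set -eu
work=$(mktemp -d)

# Stand-in for the filtered clone that now contains only the extracted file.
git init -q "$work/extracted"
cd "$work/extracted"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "(defun trie () nil)" > trie.lisp
git add trie.lisp
git commit -q -m "Add trie"

# Stand-in for the target repository with its own, unrelated history.
git init -q "$work/target"
cd "$work/target"
git config user.name "Demo User"
git config user.email "demo@example.com"
echo "app" > app.txt
git add app.txt
git commit -q -m "Add app"

# Fetch the other history by path and merge whatever its HEAD points to.
git fetch -q "$work/extracted" HEAD
git merge -q --allow-unrelated-histories -m "Import extracted history" FETCH_HEAD

merged_files=$(git ls-tree -r --name-only HEAD)
total_commits=$(git rev-list --count HEAD)
printf '%s\n%s\n' "$merged_files" "$total_commits"
```

Fetching HEAD by path avoids depending on what the branch is called in either repository.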

Solution 6 - Git

Nowadays there is a newer tool, git filter-repo.
It offers more options and better performance than git filter-branch.
See its man page.
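
A minimal sketch of what this could look like for the question's use case, assuming git-filter-repo is installed separately and found on PATH (the demo skips itself otherwise). The repository content is made up, and --force is only there because the demo repository is not a fresh clone.

```shell
set -eu

if command -v git-filter-repo >/dev/null 2>&1; then
    # Throwaway repository with hypothetical content.
    repo=$(mktemp -d)
    git init -q "$repo"
    cd "$repo"
    git config user.name "Demo User"
    git config user.email "demo@example.com"
    mkdir -p lisp/source/model
    echo "(defun trie () nil)" > lisp/source/model/trie.lisp
    echo "notes" > README
    git add .
    git commit -q -m "Add trie and README"

    # Keep only the one file and move it to the repository root.
    git filter-repo --force \
        --path lisp/source/model/trie.lisp \
        --path-rename lisp/source/model/trie.lisp:trie.lisp >/dev/null

    result=$(git ls-tree -r --name-only HEAD)
else
    result="skipped: git-filter-repo is not installed"
fi
echo "$result"
```

On a real repository you would normally run this in a fresh clone, in which case --force should not be needed.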

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | peterhil | View Question on Stackoverflow
Solution 1 - Git | jthill | View Answer on Stackoverflow
Solution 2 - Git | peterhil | View Answer on Stackoverflow
Solution 3 - Git | Roman | View Answer on Stackoverflow
Solution 4 - Git | Marius Gedminas | View Answer on Stackoverflow
Solution 5 - Git | PowerKiKi | View Answer on Stackoverflow
Solution 6 - Git | Roman | View Answer on Stackoverflow