Can I use Git to search for matching filenames in a repository?


Git Problem Overview


Say I have a file named "HelloWorld.pm" in multiple subdirectories of a Git repository.

I would like to issue a command to find the full paths of all the files matching "HelloWorld.pm":

For example:

/path/to/repository/HelloWorld.pm
/path/to/repository/but/much/deeper/down/HelloWorld.pm
/path/to/repository/please/dont/make/me/search/through/the/lot/HelloWorld.pm

How can I use Git to efficiently find all the full paths that match a given filename?

I realise I can do this with the Linux/Unix find command but I was hoping to avoid scanning all subdirectories looking for instances of the filename.

Git Solutions


Solution 1 - Git

git ls-files lists all files in the current state of the repository (the cache, also called the index). You can pass a pattern to list only the files matching it.

git ls-files HelloWorld.pm '**/HelloWorld.pm'

If you would like to find a set of files and grep through their contents, you can do that with git grep:

git grep some-string -- HelloWorld.pm '**/HelloWorld.pm'
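Since the question asks for full paths, here is a minimal sketch that turns the ls-files output into absolute paths. The function name find_in_index is hypothetical. The sketch assumes a Git version that understands the :(top,glob) pathspec magic, and file names free of glob characters and of characters that Git would quote in its output.

```shell
#!/bin/sh
# find_in_index NAME (hypothetical helper, not a Git built-in):
# print the absolute path of every file in the index whose last path
# component is exactly NAME.
find_in_index() {
    top=$(git rev-parse --show-toplevel) || return 1
    # ':(top,glob)' anchors the pattern at the repository root, and '**/'
    # then matches any number of leading directories, including none.
    git ls-files --full-name -- ":(top,glob)**/$1" |
    while IFS= read -r path
    do
        printf '%s/%s\n' "$top" "$path"
    done
}
```

Run from anywhere inside the work tree, find_in_index HelloWorld.pm should print one absolute path per tracked match. Only the index is consulted, so no directory scan takes place.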

Solution 2 - Git

Hmm, the original question was about the repository. A repository contains more than one commit (in the general case at least), but the answers given before search through only one commit.

Because I could not find an answer that really searches the whole commit history, I wrote a quick brute-force script, git-find-by-name, that takes (nearly) all commits into consideration.

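#! /bin/sh
# Usage: git-find-by-name <pattern>
tmpdir=$(mktemp -d "${TMPDIR:-/tmp}/git-find.XXXXXX") || exit 1
trap 'rm -rf "$tmpdir"' EXIT INT TERM

allrevs=$(git rev-list --all)
# well, nearly all revs, we could still check the log if we have
# dangling commits and we could include the index to be perfect...

for rev in $allrevs
do
  # one listing file per revision, named after the commit hash
  git ls-tree --full-tree -r "$rev" >"$tmpdir/$rev"
done

cd "$tmpdir" || exit 1
# matches are reported per listing file, that is, per commit hash
grep -- "$1" *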

Maybe there is a more elegant way.

Please note the trivial way the parameter is passed to grep: it is treated as a regular expression and matches anywhere in the line, so it will also match parts of a filename. If that is not desired, anchor your search expression and/or add suitable grep options.
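To illustrate the difference anchoring makes, here is a small sketch that runs without a repository. The sample paths are made up and merely stand in for the path column of the ls-tree listings; the anchored pattern is one possible choice, not the only one.

```shell
#!/bin/sh
# Made-up sample paths standing in for the path column of `git ls-tree -r`.
paths='HelloWorld.pm
lib/HelloWorld.pm
lib/HelloWorld.pm.bak
lib/MyHelloWorld.pm
lib/HelloWorldXpm'

# Unanchored: the pattern may match anywhere, and "." matches any character.
loose_count=$(printf '%s\n' "$paths" | grep -c 'HelloWorld.pm')

# Anchored: the name must start at the beginning of the line or right after
# a slash, the dot is literal, and the name must end the line.
anchored='(^|/)HelloWorld\.pm$'
exact_count=$(printf '%s\n' "$paths" | grep -Ec "$anchored")

echo "unanchored: $loose_count, anchored: $exact_count"
```

Against real ls-tree output the same anchored expression works unchanged, because the path is the last field on each line.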

For deep histories the output might be too noisy. I thought about a script that converts a list of revisions into a range, the opposite of what git rev-list does, but so far it has remained a thought.
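One possibly more elegant alternative, offered as a sketch rather than a drop-in replacement: let git log walk the history and print only the paths that match a pathspec. The function name find_in_history is hypothetical. Note the assumptions: by default git log shows no file list for merge commits, so a file that was only ever introduced by a merge would be missed, and paths with unusual characters appear quoted.

```shell
#!/bin/sh
# find_in_history NAME (hypothetical helper, not a Git built-in):
# print every path named exactly NAME that was added, changed or deleted
# in any non-merge commit reachable from any ref.
find_in_history() {
    # --format= suppresses the commit header, --name-only lists the touched
    # paths, and the pathspec restricts that list to files called NAME.
    git log --all --name-only --format= -- ":(top,glob)**/$1" |
    sed '/^$/d' |
    sort -u
}
```

Unlike the per-revision listings, this prints each path once and does not say which commits contained it.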

Solution 3 - Git

Try:

git ls-tree -r HEAD | grep HelloWorld.pm

Solution 4 - Git

git ls-files | grep -i HelloWorld.pm

The grep -i makes grep case insensitive.

Solution 5 - Git

[It's a bit of comment abuse, I admit, but I can't comment yet and thought I would improve @uwe-geuder's answer.]

#!/bin/bash
#
#

# I'm using a fixed string here, not a regular expression, but you can easily
# use a regular expression by altering the call to grep below.
name="$1"

# Verify usage.
if [[ -z "$name" ]]
then
    echo "Usage: $(basename "$0") <file name>" 1>&2
    exit 100
fi  

# Search all revisions; get unique results.
while IFS= read -r rev
do
    # Find $name in $rev's tree; --name-only prints just the path,
    # so paths containing spaces stay intact.
    git ls-tree --full-tree -r --name-only "$rev" | grep -F -- "$name"
done < \
    <(git rev-list --all) \
    | sort -u

Again, +1 to @uwe-geuder for a great answer.

If you're interested in the Bash itself:

Unless you can rely on how word splitting will behave in a for loop (as you can when iterating over an array: for item in "${array[@]}"), I highly recommend using while IFS= read -r var ; do ... ; done < <(command) when the command output you're looping over is separated by newlines (or read -r -d '' when it is separated by NUL characters, $'\0'). While git rev-list --all is guaranteed to print plain hexadecimal hashes (40 characters in a SHA-1 repository, never containing spaces), I never like to take chances. I can now easily change the command from git rev-list --all to any other command that produces lines.
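A small sketch of why this matters, using made-up sample lines rather than Git output: the for loop splits on every whitespace character, while the while/read loop sees exactly one item per line. It needs Bash because of the process substitution.

```shell
#!/bin/bash
# Made-up sample output: two lines, the first one containing a space.
sample_lines() {
    printf '%s\n' 'lib/Hello World.pm' 'lib/HelloWorld.pm'
}

# for + command substitution: the unquoted expansion is split into words.
for_count=0
for item in $(sample_lines)
do
    for_count=$((for_count + 1))
done

# while + read: IFS= keeps leading and trailing blanks, -r keeps backslashes.
# Feeding the loop with < <(...) instead of a pipe keeps it in the current
# shell, so read_count is still set after the loop.
read_count=0
while IFS= read -r line
do
    read_count=$((read_count + 1))
done < <(sample_lines)

echo "for: $for_count items, while read: $read_count items"
```

The for loop counts three items and the while loop two, which is the number of lines actually produced.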

I also recommend using built-in Bash mechanisms (redirection, pipes and process substitution) to inject input and filter output instead of temporary files.

Solution 6 - Git

The script by Uwe Geuder (@uwe-geuder) is great, but there really is no need to dump each of the ls-tree outputs, unfiltered, into its own file.

It is much faster, and uses less storage, to run grep on each ls-tree output first and store only the matches, as shown in this gist.

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author    Original Content on Stack Overflow
Question            Newbie Git         View Question on Stack Overflow
Solution 1 - Git    Brian Campbell     View Answer on Stack Overflow
Solution 2 - Git    Uwe Geuder         View Answer on Stack Overflow
Solution 3 - Git    Greg Hewgill       View Answer on Stack Overflow
Solution 4 - Git    Bull               View Answer on Stack Overflow
Solution 5 - Git    Dean Hall          View Answer on Stack Overflow
Solution 6 - Git    dirkjot            View Answer on Stack Overflow