How can I get `find` to ignore .svn directories?

Linux, Find, Bash, Grep, Svn

Linux Problem Overview


I often use the find command to search through source code, delete files, whatever. Annoyingly, because Subversion stores duplicates of each file in its .svn/text-base/ directories my simple searches end up getting lots of duplicate results. For example, I want to recursively search for uint in multiple messages.h and messages.cpp files:

# find -name 'messages.*' -exec grep -Iw uint {} +
./messages.cpp:            Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./messages.cpp:    Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./messages.cpp:                Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./messages.cpp:        Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./messages.cpp:        for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./.svn/text-base/messages.cpp.svn-base:            Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./.svn/text-base/messages.cpp.svn-base:    Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base:                Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base:            Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./.svn/text-base/messages.cpp.svn-base:            Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base:        Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./.svn/text-base/messages.cpp.svn-base:        for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/messages.h:    void _progress(const std::string &fileName, uint scanCount);
./virus/messages.h:    ProgressMessage(const std::string &fileName, uint scanCount);
./virus/messages.h:    uint        _scanCount;
./virus/.svn/text-base/messages.cpp.svn-base:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.cpp.svn-base:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/.svn/text-base/messages.h.svn-base:    void _progress(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base:    ProgressMessage(const std::string &fileName, uint scanCount);
./virus/.svn/text-base/messages.h.svn-base:    uint        _scanCount;

How can I tell find to ignore the .svn directories?


Update: If you upgrade your SVN client to version 1.7 this is no longer an issue.

> A key feature of the changes introduced in Subversion 1.7 is the centralization of working copy metadata storage into a single location. Instead of a .svn directory in every directory in the working copy, Subversion 1.7 working copies have just one .svn directory—in the root of the working copy. This directory includes (among other things) an SQLite-backed database which contains all of the metadata Subversion needs for that working copy.

Linux Solutions


Solution 1 - Linux

Why not just:

find . -not -iwholename '*.svn*'

The -not predicate negates the test that follows it, so this skips every path that has .svn anywhere in it (matched case-insensitively, because -iwholename is used).

So in your case it would be:

find -not -iwholename '*.svn*' -name 'messages.*' -exec grep -Iw uint {} +
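One thing to keep in mind is that '*.svn*' matches .svn anywhere in the path, not only the Subversion directory itself. The sketch below compares it with a stricter -path test on a throwaway tree; the file names are made up for illustration, and GNU find is assumed for -iwholename.

```shell
# Throwaway tree; every file name here is hypothetical.
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn/text-base"
touch "$tmp/messages.cpp" "$tmp/notes.svn.txt" \
      "$tmp/.svn/text-base/messages.cpp.svn-base"

# Loose pattern: drops every path containing ".svn", including notes.svn.txt.
loose=$(cd "$tmp" && find . -type f -not -iwholename '*.svn*' | LC_ALL=C sort)

# Stricter pattern: drops only files below a directory named exactly .svn.
strict=$(cd "$tmp" && find . -type f -not -path '*/.svn/*' | LC_ALL=C sort)

printf 'loose:\n%s\nstrict:\n%s\n' "$loose" "$strict"
rm -rf "$tmp"
```

Whether the loose form is acceptable depends on whether ordinary files in the tree can have .svn in their names.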

Solution 2 - Linux

As follows:

find . -path '*/.svn*' -prune -o -print

Or, alternatively based on a directory and not a path prefix:

find . -name .svn -a -type d -prune -o -print
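Applied to the search from the question, the pruning form might look like the following sketch. It runs on a throwaway tree with made-up file contents, and -H is added to grep only so that the file name is printed even when a single file matches.

```shell
# Throwaway tree standing in for an old-style Subversion working copy.
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn/text-base"
printf 'uint id;\n' > "$tmp/messages.h"
printf 'uint id;\n' > "$tmp/.svn/text-base/messages.h.svn-base"

# Prune .svn directories, then grep only the messages.* files that remain.
hits=$(cd "$tmp" && find . -name .svn -type d -prune -o \
           -name 'messages.*' -exec grep -IwH uint {} +)

echo "$hits"
rm -rf "$tmp"
```

Because -exec is an explicit action on the right-hand side of -o, no trailing -print is needed here.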

Solution 3 - Linux

For searching, can I suggest you look at ack? It's a source-code-aware find, and as such it automatically ignores many file types, including version-control metadata such as the .svn directories above.

Solution 4 - Linux

To ignore .svn, .git and other hidden directories (starting with a dot), try:

find . -type f -not -path '*/\.*'
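As a quick check of what that pattern covers, here is a small sketch on a throwaway tree (the directory and file names are hypothetical). Note that it hides dot-files in visible directories too, not just the contents of hidden directories.

```shell
# Throwaway tree with one hidden directory, one hidden file, one normal file.
tmp=$(mktemp -d)
mkdir -p "$tmp/.git" "$tmp/src"
touch "$tmp/.git/config" "$tmp/.hidden-file" "$tmp/src/main.c"

# Any path with a component that starts with a dot is filtered out.
visible=$(cd "$tmp" && find . -type f -not -path '*/\.*' | LC_ALL=C sort)

echo "$visible"
rm -rf "$tmp"
```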

However, if the purpose of using find is searching within the files, you may try to use these commands:

  • git grep - specially designed command for searching patterns within the Git repository.
  • ripgrep - which by default ignores hidden files and files specified in .gitignore.

Related: <https://stackoverflow.com/q/16956810/55075>

Solution 5 - Linux

Here is what I would do in your case:

find . -path '*/.svn' -prune -o -name 'messages.*' -exec grep -Iw uint {} +

Emacs' built-in rgrep command ignores .svn directories, along with many other files you're probably not interested in when performing a find | grep. Here is what it uses by default:

find . \( -path \*/SCCS -o -path \*/RCS -o -path \*/CVS -o -path \*/MCVS \
          -o -path \*/.svn -o -path \*/.git -o -path \*/.hg -o -path \*/.bzr \
          -o -path \*/_MTN -o -path \*/_darcs -o -path \*/\{arch\} \) \
     -prune -o \
       \( -name .\#\* -o -name \*.o -o -name \*\~ -o -name \*.bin -o -name \*.lbin \
          -o -name \*.so -o -name \*.a -o -name \*.ln -o -name \*.blg \
          -o -name \*.bbl -o -name \*.elc -o -name \*.lof -o -name \*.glo \
          -o -name \*.idx -o -name \*.lot -o -name \*.fmt -o -name \*.tfm \
          -o -name \*.class -o -name \*.fas -o -name \*.lib -o -name \*.mem \
          -o -name \*.x86f -o -name \*.sparcf -o -name \*.fasl -o -name \*.ufsl \
          -o -name \*.fsl -o -name \*.dxl -o -name \*.pfsl -o -name \*.dfsl \
          -o -name \*.p64fsl -o -name \*.d64fsl -o -name \*.dx64fsl -o -name \*.lo \
          -o -name \*.la -o -name \*.gmo -o -name \*.mo -o -name \*.toc \
          -o -name \*.aux -o -name \*.cp -o -name \*.fn -o -name \*.ky \
          -o -name \*.pg -o -name \*.tp -o -name \*.vr -o -name \*.cps \
          -o -name \*.fns -o -name \*.kys -o -name \*.pgs -o -name \*.tps \
          -o -name \*.vrs -o -name \*.pyc -o -name \*.pyo \) \
     -prune -o \
     -type f \( -name pattern \) -print0 \
     | xargs -0 -e grep -i -nH -e regex

It ignores directories created by most version control systems, as well as generated files for many programming languages. You could create an alias that invokes this command and replace find and grep patterns for your specific problems.
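Instead of an alias, a small shell function makes it easier to pass the two patterns as arguments. The following is only a sketch: the name srcgrep is hypothetical, and the list of pruned directories is a trimmed-down subset of what Emacs uses.

```shell
# srcgrep NAME_GLOB REGEX [DIR]
# Hypothetical helper, not part of Emacs. Prunes a few version-control
# directories, then greps the regular files whose name matches NAME_GLOB.
srcgrep() {
    find "${3:-.}" \
        \( -name .svn -o -name .git -o -name .hg -o -name CVS \) -prune -o \
        -type f -name "$1" -exec grep -nH -e "$2" {} +
}
```

For the search in the question this would be invoked as srcgrep 'messages.*' uint.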

Solution 6 - Linux

With GNU find:

find .  ! -regex ".*[/]\.svn[/]?.*"

Solution 7 - Linux

I use grep for this purpose. Put this in your ~/.bashrc

export GREP_OPTIONS="--binary-files=without-match --color=auto --devices=skip --exclude-dir=CVS --exclude-dir=.libs --exclude-dir=.deps --exclude-dir=.svn"

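grep then applies these options on every invocation. Note that GNU grep declared GREP_OPTIONS obsolescent in version 2.21 and ignores it from version 3.6 onward, so this only works with older releases.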
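A shell function is one way to carry the same defaults without relying on the environment variable. This is a sketch assuming GNU grep; the name sgrep is hypothetical, and the options are the ones from the GREP_OPTIONS line above.

```shell
# Hypothetical wrapper: calls the real grep with the default options and
# then whatever arguments the caller supplied.
sgrep() {
    command grep --binary-files=without-match --color=auto --devices=skip \
        --exclude-dir=CVS --exclude-dir=.libs --exclude-dir=.deps \
        --exclude-dir=.svn "$@"
}
```

A recursive search such as sgrep -rw uint . then skips the .svn directories.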

Solution 8 - Linux

Create a script called ~/bin/svnfind:

#!/bin/bash
#
# Attempts to behave identically to a plain `find' command while ignoring .svn/
# directories.

OPTIONS=()
PATHS=()
EXPR=()

while [[ $1 =~ ^-[HLP]+ ]]; do
    OPTIONS+=("$1")
    shift
done

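# The regex is kept in a variable: bash 3.2+ treats a quoted right-hand side
# of =~ as a literal string, so the quoted form would never match.
EXPR_START='^[-(),!]'
while [[ $# -gt 0 ]] && ! [[ $1 =~ $EXPR_START ]]; do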
    PATHS+=("$1")
    shift
done

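# If user's expression contains no action then we'll add the normally-implied
# `-print'. -prune is deliberately not listed below: find still applies the
# implied -print when -prune is the only action in the expression.
ACTION=-print

while [[ $# -gt 0 ]]; do
    case "$1" in
       -delete|-exec|-execdir|-fls|-fprint|-fprint0|-fprintf|-ok|-print|-okdir|-print0|-printf|-quit|-ls)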
            ACTION=;;
    esac

    EXPR+=("$1")
    shift
done

if [[ ${#EXPR[@]} -eq 0 ]]; then
    EXPR=(-true)
fi

exec -a "$(basename "$0")" find "${OPTIONS[@]}" "${PATHS[@]}" -name .svn -type d -prune -o '(' "${EXPR[@]}" ')' $ACTION

This script behaves identically to a plain find command, except that it prunes .svn directories.

Example:

# svnfind -name 'messages.*' -exec grep -Iw uint {} +
./messages.cpp:            Log::verbose << "Discarding out of date message: id " << uint(olderMessage.id)
./messages.cpp:    Log::verbose << "Added to send queue: " << *message << ": id " << uint(preparedMessage->id)
./messages.cpp:                Log::error << "Received message with invalid SHA-1 hash: id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Received " << *message << ": id " << uint(incomingMessage.id)
./messages.cpp:            Log::verbose << "Sent message: id " << uint(preparedMessage->id)
./messages.cpp:        Log::verbose << "Discarding unsent message: id " << uint(preparedMessage->id)
./messages.cpp:        for (uint i = 0; i < 10 && !_stopThreads; ++i) {
./virus/messages.cpp:void VsMessageProcessor::_progress(const string &fileName, uint scanCount)
./virus/messages.cpp:ProgressMessage::ProgressMessage(const string &fileName, uint scanCount)
./virus/messages.h:    void _progress(const std::string &fileName, uint scanCount);
./virus/messages.h:    ProgressMessage(const std::string &fileName, uint scanCount);
./virus/messages.h:    uint        _scanCount;

Solution 9 - Linux

find . | grep -v '\.svn'

Solution 10 - Linux

Why don't you pipe your command through grep, which is easy to understand:

your find command | grep -v '\.svn'

Solution 11 - Linux

Just thought I'd add a simple alternative to Kaleb's and others' posts (which detailed the use of the find -prune option, ack, repofind commands etc.) which is particularly applicable to the usage you have described in the question (and any other similar usages):

  1. For performance, you should always try to use find ... -exec grep ... + (thanks Kenji for pointing this out) or find ... | xargs egrep ... (portable) or find ... -print0 | xargs -0 egrep ... (GNU; works on filenames containing spaces) instead of find ... -exec grep ... \;.

    The find ... -exec ... + and find | xargs forms do not fork egrep for each file, but rather for a batch of files at a time, resulting in much faster execution.

  2. When using the find | xargs form you can also use grep to easily and quickly prune .svn (or any directories or regular expression), i.e. find ... | grep -v '/\.svn' | xargs egrep ... or, with NUL-delimited names, find ... -print0 | grep -zv '/\.svn' | xargs -0 egrep ... (the -z option is a GNU grep extension). This is useful when you need something quick and can't be bothered to remember how to set up find's -prune logic.

    The find | grep | xargs approach is similar to GNU find's -regex option (see ghostdog74's post), but is more portable (will also work on platforms where GNU find is not available.)
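The NUL-delimited variant of the find | grep | xargs pipeline can be sketched as follows. It assumes GNU grep for the -z option, which makes grep read and write NUL-terminated records; the tree and file contents are made up for illustration.

```shell
# Throwaway tree with a space in a directory name.
tmp=$(mktemp -d)
mkdir -p "$tmp/my dir/.svn/text-base"
printf 'uint id;\n' > "$tmp/my dir/messages.h"
printf 'uint id;\n' > "$tmp/my dir/.svn/text-base/messages.h.svn-base"

# -print0, grep -z and xargs -0 keep the names NUL-delimited end to end,
# so the space in "my dir" does not split the file name.
hits=$(cd "$tmp" && find . -type f -name 'messages.*' -print0 \
           | grep -zv '/\.svn/' | xargs -0 grep -IwH uint)

echo "$hits"
rm -rf "$tmp"
```

Unlike -prune, this still makes find walk the .svn directories; the filtering only happens afterwards.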

Solution 12 - Linux

In a source code repository, I generally want to do things only to the text files.

The first line of the pipeline lists all regular files, excluding CVS, SVN, and Git repository files.

The second line drops every file whose description from the file command contains the word "binary".

find . -not \( -name .svn -prune -o -name .git -prune -o -name CVS -prune \) -type f -print0 | \
xargs -0 file -n | grep -v binary | cut -d ":" -f1

Solution 13 - Linux

I use find with the -not -path options. I have not had good luck with prune.

find .  -name "*.groovy" -not -path "./target/*" -print

This finds the .groovy files that are not under the target directory.

Solution 14 - Linux

To resolve this problem, you can simply use this find condition:

find \( -name 'messages.*' ! -path "*/.svn/*" \) -exec grep -Iw uint {} +

You can add more restrictions like this:

find \( -name 'messages.*' ! -path "*/.svn/*" ! -path "*/CVS/*" \) -exec grep -Iw uint {} +

You can find more information about this in man page section "Operators": http://unixhelp.ed.ac.uk/CGI/man-cgi?find

Solution 15 - Linux

Note that if you do

find . -type f -name 'messages.*'

then -print is implied when the whole expression (-type f -name 'messages.*') is true, because there is no 'action' (like -exec).

To stop descending into certain directories, on the other hand, you should write a test that matches those directories and follow it with -prune (which is intended to stop descending into directories), like so:

find . -type d -name '.svn' -prune

This evaluates to true for the .svn directories, and we can use boolean short-circuiting by following it with -o (OR): whatever comes after the -o is only checked when the first part is false, that is, when the entry is not a .svn directory. In other words, the following:

find . -type d -name '.svn' -prune -o -name 'messages.*' -exec grep -Iw uint {} +

will only evaluate what is to the right of the -o, namely -name 'messages.*' -exec grep -Iw uint {} +, for files NOT inside .svn directories.

Note that because .svn is likely always a directory (and not, for example, a file), and in this case certainly doesn't match the name 'messages.*', you might as well leave out the -type d and do:

find . -name '.svn' -prune -o -name 'messages.*' -exec grep -Iw uint {} +

Finally, note that if you omit any action (-exec is an action), like so:

find . -name '.svn' -prune -o -name 'messages.*'

then the -print action is implied but applies to the WHOLE expression, including the -name '.svn' -prune -o part, and thus prints all .svn directories as well as the 'messages.*' files, which is probably not what you want. Therefore you should always use an action on the right-hand side of the boolean expression when using -prune in this way. And when that action is printing, you have to add it explicitly, like so:

find . -name '.svn' -prune -o -name 'messages.*' -print
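The difference between the implied and the explicit -print can be checked on a throwaway tree. This is a small sketch with made-up file names; LC_ALL=C is set on sort only to make the ordering predictable.

```shell
# Throwaway tree standing in for an old-style Subversion working copy.
tmp=$(mktemp -d)
mkdir -p "$tmp/.svn/text-base"
touch "$tmp/messages.h" "$tmp/.svn/text-base/messages.h.svn-base"

# No action: the implied -print covers the whole expression, .svn included.
implicit=$(cd "$tmp" && find . -name .svn -prune -o -name 'messages.*' \
               | LC_ALL=C sort)

# Explicit -print on the right-hand side: only the wanted files are listed.
explicit=$(cd "$tmp" && find . -name .svn -prune -o -name 'messages.*' -print \
               | LC_ALL=C sort)

printf 'implicit:\n%s\nexplicit:\n%s\n' "$implicit" "$explicit"
rm -rf "$tmp"
```

In both runs the copy under .svn/text-base is never visited, because the directory is pruned; only the listing of the .svn directory itself differs.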

Solution 16 - Linux

Try [findrepo][1], which is a simple wrapper around find/grep and much faster than ack. You would use it in this case like:

findrepo uint 'messages.*'

[1]: http://www.pixelbeat.org/scripts/findrepo "findrepo"

Solution 17 - Linux

wcfind is a find wrapper script that I use to automagically remove .svn directories.

Solution 18 - Linux

This works for me at the Unix prompt (gfind is the name GNU find is often installed under on non-GNU systems):

gfind . \( -not -wholename '*\.svn*' \) -type f -name 'messages.*' -exec grep -Iw uint {} +

The command above finds the FILES whose path does not contain .svn and runs the grep you mentioned on them.

Solution 19 - Linux

I usually pipe the output through grep one more time to remove .svn; in my use it isn't much slower. A typical example:

find -name 'messages.*' -exec grep -Iw uint {} + | grep -Ev '\.svn|\.git|\.anythingElseIwannaIgnore'

OR

find . -type f -print0 | xargs -0 egrep messages. | grep -Ev '\.svn|\.git|\.anythingElseIwannaIgnore'

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

| Content Type | Original Author | Original Content on Stackoverflow |
| --- | --- | --- |
| Question | John Kugelman | View Question on Stackoverflow |
| Solution 1 - Linux | whaley | View Answer on Stackoverflow |
| Solution 2 - Linux | Kaleb Pederson | View Answer on Stackoverflow |
| Solution 3 - Linux | Brian Agnew | View Answer on Stackoverflow |
| Solution 4 - Linux | kenorb | View Answer on Stackoverflow |
| Solution 5 - Linux | Antoine | View Answer on Stackoverflow |
| Solution 6 - Linux | ghostdog74 | View Answer on Stackoverflow |
| Solution 7 - Linux | Ronny Brendel | View Answer on Stackoverflow |
| Solution 8 - Linux | John Kugelman | View Answer on Stackoverflow |
| Solution 9 - Linux | me. | View Answer on Stackoverflow |
| Solution 10 - Linux | Vijay | View Answer on Stackoverflow |
| Solution 11 - Linux | vladr | View Answer on Stackoverflow |
| Solution 12 - Linux | rickfoosusa | View Answer on Stackoverflow |
| Solution 13 - Linux | scott m gardner | View Answer on Stackoverflow |
| Solution 14 - Linux | Code-Source | View Answer on Stackoverflow |
| Solution 15 - Linux | Carlo Wood | View Answer on Stackoverflow |
| Solution 16 - Linux | pixelbeat | View Answer on Stackoverflow |
| Solution 17 - Linux | leedm777 | View Answer on Stackoverflow |
| Solution 18 - Linux | Felix | View Answer on Stackoverflow |
| Solution 19 - Linux | geminiimatt | View Answer on Stackoverflow |