Use find command but exclude files in two directories

Tags: Linux, Shell, Unix, Find

Linux Problem Overview


I want to find files that end with _peaks.bed, but exclude files in the tmp and scripts folders.

My command is like this:

 find . -type f \( -name "*_peaks.bed" ! -name "*tmp*" ! -name "*scripts*" \)

But it didn't work: the files in the tmp and scripts folders are still displayed.

Does anyone have ideas about this?

Linux Solutions


Solution 1 - Linux

Here's how you can specify that with find:

find . -type f -name "*_peaks.bed" ! -path "./tmp/*" ! -path "./scripts/*"

Explanation:

  • find . - Start find from current working directory (recursively by default)
  • -type f - Specify to find that you only want files in the results
  • -name "*_peaks.bed" - Look for files with the name ending in _peaks.bed
  • ! -path "./tmp/*" - Exclude all results whose path starts with ./tmp/
  • ! -path "./scripts/*" - Also exclude all results whose path starts with ./scripts/

Testing the Solution:

$ mkdir a b c d e
$ touch a/1 b/2 c/3 d/4 e/5 e/a e/b
$ find . -type f ! -path "./a/*" ! -path "./b/*"

./d/4
./c/3
./e/a
./e/b
./e/5

You were pretty close: the -name option only considers the basename, whereas -path considers the entire path.
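The difference between -name and -path can be checked on a throwaway tree. The following is only a sketch: the directory layout and the file names (a_peaks.bed and so on) are made up for the demonstration, and the variable names are arbitrary.

```shell
# Build a throwaway tree (directory and file names are made up for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/tmp" "$demo/scripts" "$demo/results"
touch "$demo/tmp/a_peaks.bed" "$demo/scripts/b_peaks.bed" "$demo/results/c_peaks.bed"
cd "$demo" || exit 1

# -name is matched against the basename only (e.g. "a_peaks.bed"),
# so "*tmp*" and "*scripts*" never match and nothing is excluded.
by_name=$(find . -type f -name "*_peaks.bed" ! -name "*tmp*" ! -name "*scripts*" | sort)

# -path is matched against the whole path (e.g. "./tmp/a_peaks.bed"),
# so the directory prefix can be used to exclude files.
by_path=$(find . -type f -name "*_peaks.bed" ! -path "./tmp/*" ! -path "./scripts/*" | sort)

printf 'with -name:\n%s\n\nwith -path:\n%s\n' "$by_name" "$by_path"
```

With this layout, the -name version should still list all three files, while the -path version should list only the file under ./results.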

Solution 2 - Linux

Here is one way you could do it...

find . -type f -name "*_peaks.bed" | grep -Ev '^\./(tmp|scripts)/'

Solution 3 - Linux

Use

find \( -path "./tmp" -o -path "./scripts" \) -prune -o  -name "*_peaks.bed" -print

or

find \( -path "./tmp" -o -path "./scripts" \) -prune -false -o  -name "*_peaks.bed"

or

find \( -path "./tmp" -o -path "./scripts" \) ! -prune -o  -name "*_peaks.bed"

The order is important. It evaluates from left to right. Always begin with the path exclusion.

Explanation

Do not use -not (or !) to exclude a whole directory; use -prune instead. As explained in the manual:

−prune    The primary shall always evaluate as  true;  it
          shall  cause  find  not  to descend the current
          pathname if it is a directory.  If  the  −depth
          primary  is specified, the −prune primary shall
          have no effect.

and in the GNU find manual:

-path pattern
              [...]
              To ignore  a  whole
              directory  tree,  use  -prune rather than checking
              every file in the tree.

Indeed, if you use -not -path "./pathname", find will evaluate the expression for each node under "./pathname".

find expressions are just condition evaluation.

  • \( \) - groups operations (you can use -path "./tmp" -prune -o -path "./scripts" -prune -o instead, but it is more verbose).
  • -path "./scripts" -prune - if -path returns true and the node is a directory, return true for that directory and do not descend into it.
  • -path "./scripts" ! -prune - it evaluates as (-path "./scripts") AND (! -prune). It reverses the "always true" of -prune to always false, which avoids printing "./scripts" as a match.
  • -path "./scripts" -prune -false - since -prune always returns true, you can follow it with -false (a GNU extension) to get the same effect as !.
  • -o - OR operator. If no operator is specified between two expressions, it defaults to the AND operator.

Hence, \( -path "./tmp" -o -path "./scripts" \) -prune -o -name "*_peaks.bed" -print is expanded to:

[ (-path "./tmp" OR -path "./scripts") AND -prune ] OR ( -name "*_peaks.bed" AND -print )

The -print is important here because, without it, the expression is expanded to:

{ [ (-path "./tmp" OR -path "./scripts") AND -prune ] OR (-name "*_peaks.bed") } AND -print

find adds -print itself when the expression contains no action, which is why most of the time you do not need to add it to your expression. But since -prune returns true, the implicit -print would also print "./scripts" and "./tmp".

It is not necessary in the others because we switched -prune to always return false.
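The effect of the explicit -print can be observed on a small test tree. This is a sketch only: the layout and file names are invented for the demonstration, and the commands pass "." explicitly as the starting point.

```shell
# Throwaway tree; the names are made up for the demo.
demo=$(mktemp -d)
mkdir -p "$demo/tmp" "$demo/scripts" "$demo/results"
touch "$demo/tmp/a_peaks.bed" "$demo/scripts/b_peaks.bed" "$demo/results/c_peaks.bed"
cd "$demo" || exit 1

# Explicit -print: only the right-hand branch of -o prints anything.
with_print=$(find . \( -path "./tmp" -o -path "./scripts" \) -prune -o -name "*_peaks.bed" -print | sort)

# No action at all: find applies an implicit -print to the whole expression,
# so the pruned directories (where -prune returned true) are printed as well.
without_print=$(find . \( -path "./tmp" -o -path "./scripts" \) -prune -o -name "*_peaks.bed" | sort)

# ! -prune still prunes but evaluates to false, so the implicit -print
# never fires for the pruned directories.
negated=$(find . \( -path "./tmp" -o -path "./scripts" \) ! -prune -o -name "*_peaks.bed" | sort)

printf 'explicit -print:\n%s\n\nimplicit -print:\n%s\n\n! -prune:\n%s\n' \
  "$with_print" "$without_print" "$negated"
```

In this layout, the first and third commands should list only the file under ./results, while the second should additionally list ./scripts and ./tmp themselves.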

Hint: with GNU find, you can use find -D opt expr 2>&1 1>/dev/null to see how the expression is optimized and expanded, and
find -D search expr 2>&1 1>/dev/null to see which paths are checked.

Solution 4 - Linux

For me, this solution didn't work when running a command with find's -exec (I don't really know why), so my solution is:

find . -type f -path "./a/*" -prune -o -path "./b/*" -prune -o -exec gzip -f -v {} \;

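Explanation: the same as sampson-chen's answer, with the addition of:

-prune - do not process the path matched by the preceding test

-o - OR: when the left-hand side did not match, evaluate the right-hand side, here the -exec (in short: skip the excluded paths and act on the rest)

Note that -type f is bound only to the first branch, so directories also reach the final -exec; that is why gzip reports them as ignored in the test below.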

18:12 $ mkdir a b c d e
18:13 $ touch a/1 b/2 c/3 d/4 e/5 e/a e/b
18:13 $ find . -type f -path "./a/*" -prune -o -path "./b/*" -prune -o -exec gzip -f -v {} \;

gzip: . is a directory -- ignored
gzip: ./a is a directory -- ignored
gzip: ./b is a directory -- ignored
gzip: ./c is a directory -- ignored
./c/3:	  0.0% -- replaced with ./c/3.gz
gzip: ./d is a directory -- ignored
./d/4:	  0.0% -- replaced with ./d/4.gz
gzip: ./e is a directory -- ignored
./e/5:	  0.0% -- replaced with ./e/5.gz
./e/a:	  0.0% -- replaced with ./e/a.gz
./e/b:	  0.0% -- replaced with ./e/b.gz
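The "is a directory" messages above can probably be avoided by pruning the directories themselves and moving -type f into the same branch as -exec. This is one possible variant under those assumptions, not the answer author's command; it reuses the a to e test tree and assumes gzip is installed.

```shell
# Same test tree as above, created in a throwaway directory.
demo=$(mktemp -d)
cd "$demo" || exit 1
mkdir a b c d e
touch a/1 b/2 c/3 d/4 e/5 e/a e/b

# Prune ./a and ./b themselves, so find never descends into them.
# -type f now sits in the same branch as -exec, so gzip only receives files.
# "{} +" batches the file names into as few gzip invocations as possible.
find . \( -path ./a -o -path ./b \) -prune -o -type f -exec gzip -f {} +

# List what is left to confirm which files were compressed.
remaining=$(find . -type f | sort)
printf '%s\n' "$remaining"
```

The files under ./a and ./b should be left untouched, and everything else should end in .gz.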

Solution 5 - Linux

You can try below:

find ./ ! \( -path ./tmp -prune \) ! \( -path ./scripts -prune \) -type f -name '*_peaks.bed'

Solution 6 - Linux

Try something like

find . \( -type f -name \*_peaks.bed -print \) -or \( -type d -and \( -name tmp -or -name scripts \) -and -prune \)

and don't be too surprised if I got it a bit wrong. If the goal is an exec (instead of print), just substitute it in place.
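One thing worth knowing about this form: -name tmp matches a directory called tmp at any depth, whereas -path ./tmp matches only the top-level one. The sketch below illustrates the difference on an invented layout (a nested results/tmp directory is added on purpose).

```shell
# Throwaway tree with a top-level tmp and a nested results/tmp (made-up names).
demo=$(mktemp -d)
mkdir -p "$demo/tmp" "$demo/results/tmp"
touch "$demo/tmp/a_peaks.bed" "$demo/results/tmp/b_peaks.bed" "$demo/results/c_peaks.bed"
cd "$demo" || exit 1

# -name tmp: every directory named tmp is pruned, wherever it is.
any_depth=$(find . -type d -name tmp -prune -o -type f -name "*_peaks.bed" -print | sort)

# -path ./tmp: only the top-level tmp is pruned; results/tmp is still searched.
top_only=$(find . -path ./tmp -prune -o -type f -name "*_peaks.bed" -print | sort)

printf 'pruned by -name:\n%s\n\npruned by -path:\n%s\n' "$any_depth" "$top_only"
```

Which behaviour is wanted depends on whether nested tmp or scripts directories should also be skipped.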

Solution 7 - Linux

The template below covers this use case and many others. Just combine the parts you need.

MODEL

find ./ \
 -iname "some_arg" -type f \
 ! -iname "some_arg" -type f \
 ! -path "./file_name" \
 ! -path "./folder_name/*" \
 -exec grep -IiFl 'text_content' -- {} \;

  • -iname "some_arg" -type f - File(s) that you want to find, at any level of the hierarchy.
  • ! -iname "some_arg" -type f - File(s) to exclude, at any level of the hierarchy.
  • ! -path "./file_name" - File(s) to exclude at this level of the hierarchy.
  • ! -path "./folder_name/*" - Folder(s) to exclude at this level of the hierarchy.
  • -exec grep -IiFl 'text_content' -- {} \; - Search the content of the file(s) found for the text, case-insensitively ("-i") and skipping binary files ("-I").

EXAMPLE

find ./\
 -iname "*" -type f\
 ! -iname "*pyc" -type f\
 ! -path "./.gitignore"\
 ! -path "./build/*"\
 ! -path "./__pycache__/*"\
 ! -path "./.vscode/*"\
 ! -path "./.git/*"\
 -exec grep -IiFl 'title="Brazil - Country of the Future",' -- {} \;


[Ref(s).: https://unix.stackexchange.com/q/73938/61742 ]


EXTRA:

You can use the commands above together with your favorite editor to analyze the contents of the files found (note that the unquoted $(...) below assumes the file names contain no whitespace), for example...

vim -p $(find ./\
 -iname "*" -type f\
 ! -iname "*pyc" -type f\
 ! -path "./.gitignore"\
 ! -path "./build/*"\
 ! -path "./__pycache__/*"\
 ! -path "./.vscode/*"\
 ! -path "./.git/*"\
 -exec grep -IiFl 'title="Brazil - Country of the Future",' -- {} \;)
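If file names may contain spaces, one alternative is to let find start the editor itself: a first -exec with grep -q acts as a test, and a second -exec with "{} +" passes all matching files in one invocation. The sketch below assumes a hypothetical helper named open_matches and an invented test tree; a printf stand-in replaces the editor so that the result can be inspected.

```shell
# Throwaway tree with spaces in the names (all names are made up for the demo).
demo=$(mktemp -d)
mkdir -p "$demo/build" "$demo/src dir"
printf 'title="Brazil"\n' > "$demo/src dir/page one.html"
printf 'title="Brazil"\n' > "$demo/build/copy.html"
printf 'nothing here\n'   > "$demo/src dir/other.html"
cd "$demo" || exit 1

# open_matches is a hypothetical helper: $1 is the fixed string to look for,
# the remaining arguments are the editor command (for example: vim -p).
open_matches() {
  needle=$1
  shift
  # grep -q makes the first -exec a silent test; only files where it
  # succeeds reach the second -exec, which receives them as separate
  # arguments, so whitespace in names is preserved.
  find . -type f ! -path "./build/*" \
    -exec grep -qIiF -- "$needle" {} \; \
    -exec "$@" {} +
}

# Stand-in "editor" that prints each file name it receives in brackets.
opened=$(open_matches 'title="brazil"' printf '[%s]\n')
printf '%s\n' "$opened"
```

With a real editor the call would look like open_matches 'title="Brazil"' vim -p.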

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author   Original Content on Stackoverflow
Question             Hanfei Sun        View Question on Stackoverflow
Solution 1 - Linux   sampson-chen      View Answer on Stackoverflow
Solution 2 - Linux   alex              View Answer on Stackoverflow
Solution 3 - Linux   f380cedric        View Answer on Stackoverflow
Solution 4 - Linux   al3x2ndru         View Answer on Stackoverflow
Solution 5 - Linux   Jacky Jiang       View Answer on Stackoverflow
Solution 6 - Linux   DrC               View Answer on Stackoverflow
Solution 7 - Linux   Eduardo Lucio     View Answer on Stackoverflow