Iterate over a list of files with spaces

Linux, Bash, Shell

Linux Problem Overview


I want to iterate over a list of files. This list is the result of a find command, so I came up with:

getlist() {
  for f in $(find . -iname "foo*")
  do
    echo "File found: $f"
    # do something useful
  done
}

It's fine except if a file has spaces in its name:

$ ls
foo_bar_baz.txt
foo bar baz.txt

$ getlist
File found: foo_bar_baz.txt
File found: foo
File found: bar
File found: baz.txt

What can I do to avoid the split on spaces?

Linux Solutions


Solution 1 - Linux

You could replace the word-based iteration with a line-based one:

find . -iname "foo*" | while IFS= read -r f
do
    # IFS= keeps leading/trailing whitespace, -r keeps backslashes literal
    # ... loop body
done

Solution 2 - Linux

There are several workable ways to accomplish this.

If you wanted to stick closely to your original version it could be done this way:

getlist() {
        # local keeps the IFS change from leaking out of the function
        local IFS=$'\n'
        for file in $(find . -iname 'foo*') ; do
                printf 'File found: %s\n' "$file"
        done
}

This will still fail if file names have literal newlines in them, but spaces will not break it.

However, messing with IFS isn't necessary. Here's my preferred way to do this:

getlist() {
    while IFS= read -d $'\0' -r file ; do
            printf 'File found: %s\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}

If you find the < <(command) syntax unfamiliar you should read about process substitution. The advantage of this over for file in $(find ...) is that files with spaces, newlines and other characters are correctly handled. This works because find with -print0 will use a null (aka \0) as the terminator for each file name and, unlike newline, null is not a legal character in a file name.
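As a minimal sketch of the same idea, the loop can also collect the names into an array for later use. The function name collect_foo, the array name found and the scratch directory are made up for this illustration; they are not part of the original answer.

```shell
#!/usr/bin/env bash
# Hypothetical helper: gather every foo* match under a directory into the
# global array "found", one element per file name.
collect_foo() {
    local dir=$1 file
    found=()
    # read -d '' stops at each null byte written by find -print0
    while IFS= read -r -d '' file; do
        found+=("$file")
    done < <(find "$dir" -iname 'foo*' -print0)
}

# Demo in a throwaway directory (assumes mktemp -d is available).
demo_dir=$(mktemp -d)
touch "$demo_dir/foo bar baz.txt" "$demo_dir/foo"$'\n'"newline"
collect_foo "$demo_dir"
printf 'Number of matches: %d\n' "${#found[@]}"
rm -rf "$demo_dir"
```

Because each name is stored as one array element, a later for file in "${found[@]}" sees the name with the embedded newline as a single item.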

The advantage to this over the nearly-equivalent version

getlist() {
        find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
                printf 'File found: %s\n' "$file"
        done
}

is that any variable assignment in the body of the while loop is preserved. That is, if you pipe to while as above, the body of the while runs in a subshell, so variables set there are lost when the loop ends, which may not be what you want.
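The difference is easy to see with a counter. This is only a sketch: the two function names are invented here, and it assumes bash's default settings (in particular, the lastpipe option is off).

```shell
#!/usr/bin/env bash
# Pipeline version: the while loop runs in a subshell, so the increments
# to n are lost once the pipeline finishes.
count_with_pipe() {
    local dir=$1 n=0 file
    find "$dir" -iname 'foo*' -print0 | while IFS= read -r -d '' file; do
        n=$((n + 1))
    done
    echo "$n"
}

# Process substitution version: the loop runs in the current shell, so n
# still holds the final count after the loop.
count_with_procsub() {
    local dir=$1 n=0 file
    while IFS= read -r -d '' file; do
        n=$((n + 1))
    done < <(find "$dir" -iname 'foo*' -print0)
    echo "$n"
}
```

In a directory containing two matching files, count_with_pipe prints 0 while count_with_procsub prints 2.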

The advantage of the process substitution version over find ... -print0 | xargs -0 is minimal: The xargs version is fine if all you need is to print a line or perform a single operation on the file, but if you need to perform multiple steps the loop version is easier.
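For comparison, here is a rough sketch of what several steps per file look like with xargs: the steps have to be wrapped in an inline sh -c script. The helper name report_sizes and the size report itself are invented for this example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the base name and size in bytes of each match.
# xargs hands the file names to the inline script as positional parameters,
# so spaces and newlines in names arrive intact.
report_sizes() {
    find "$1" -iname 'foo*' -type f -print0 |
        xargs -0 sh -c 'for f in "$@"; do
            size=$(( $(wc -c < "$f") ))
            printf "%s: %d bytes\n" "${f##*/}" "$size"
        done' sh
}
```

The trailing sh fills $0 of the inline script so that the first file name is not swallowed. It works, but the quoting gets awkward quickly, which is why the loop version is easier for anything longer.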

EDIT: Here's a nice test script so you can get an idea of the difference between the various attempts at solving this problem:

#!/usr/bin/env bash

dir=/tmp/getlist.test/
mkdir -p "$dir"
cd "$dir" || exit 1   # do not create or delete files if the cd failed

touch       'file not starting foo' foo foobar barfoo 'foo with spaces'\
    'foo with'$'\n'newline 'foo with trailing whitespace      '

# while with process substitution, null terminated, empty IFS
getlist0() {
    while IFS= read -d $'\0' -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}

# while with process substitution, null terminated, default IFS
getlist1() {
    while read -d $'\0' -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done < <(find . -iname 'foo*' -print0)
}

# pipe to while, newline terminated
getlist2() {
    find . -iname 'foo*' | while read -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}

# pipe to while, null terminated
getlist3() {
    find . -iname 'foo*' -print0 | while read -d $'\0' -r file ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}

# for loop over subshell results, newline terminated, default IFS
getlist4() {
    for file in "$(find . -iname 'foo*')" ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}

# for loop over subshell results, newline terminated, newline IFS
getlist5() {
    IFS=$'\n'
    for file in $(find . -iname 'foo*') ; do
            printf 'File found: '"'%s'"'\n' "$file"
    done
}


# see how they run
for n in {0..5} ; do
    printf '\n\ngetlist%d:\n' "$n"
    "getlist$n"
done

rm -rf "$dir"

Solution 3 - Linux

There is also a very simple solution: rely on bash globbing

$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid   file 3"
$ ls
stupid   file 3  stupid file1     stupid file2
$ for file in *; do echo "file: '${file}'"; done
file: 'stupid   file 3'
file: 'stupid file1'
file: 'stupid file2'

Note that I don't see any special setting in my shopt for this, and it should be "safe": the shell does not apply word splitting to the results of a glob, so each matching name stays one word (tested on OS X and Ubuntu).
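Applied to the question's foo* pattern, a glob-only version might look like the sketch below. The function name list_foo is made up, and note two differences from the find command: the glob is case-sensitive (unlike -iname) and it does not descend into subdirectories.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list foo* in one directory using only globbing.
list_foo() {
    local dir=$1 file
    # nullglob makes an unmatched pattern expand to nothing instead of
    # being left behind as the literal string "foo*".
    shopt -s nullglob
    for file in "$dir"/foo*; do
        printf "File found: '%s'\n" "${file##*/}"
    done
    # This sketch simply switches nullglob off again; it does not restore
    # whatever setting the caller had before.
    shopt -u nullglob
}
```

Without nullglob, an empty directory would produce one bogus iteration with the unexpanded pattern as the file name.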

Solution 4 - Linux

find . -iname "foo*" -print0 | xargs -L1 -0 echo "File found:"

Solution 5 - Linux

find . -name "fo*" -print0 | xargs -0 ls -l

See man xargs.

Solution 6 - Linux

Since you aren't doing any other type of filtering with find, you can use the following as of bash 4.0:

shopt -s globstar
getlist() {
    for f in **/foo*
    do
        echo "File found: $f"
        # do something useful
    done
}

The **/ will match zero or more directories, so the full pattern will match foo* in the current directory or any subdirectories.
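One thing the glob cannot do is filter by type, so directories named foo* are matched as well. A small sketch of a variant that skips them follows; the name getlist_files is invented here, and the nullglob setting is my addition so that an unmatched pattern yields no iterations.

```shell
#!/usr/bin/env bash
# Assumes bash 4.0 or newer for globstar.
shopt -s globstar nullglob

# Hypothetical variant of getlist: recursive match on foo*, regular
# files only.
getlist_files() {
    local f
    for f in **/foo*; do
        # a glob has no equivalent of find -type f, so test each match
        [[ -f $f ]] || continue
        echo "File found: $f"
    done
}
```

Run from the directory you want to search, since the pattern is relative to the current directory.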

Solution 7 - Linux

I really like for loops and array iteration, so I figure I will add this answer to the mix...

I also liked marchelbling's stupid file example. :)

$ mkdir test
$ cd test
$ touch "stupid file1"
$ touch "stupid file2"
$ touch "stupid   file 3"

Inside the test directory:

readarray -t arr <<< "$(ls -A1)"

This adds each file listing line into a bash array named arr with any trailing newline removed.

Let's say we want to give these files better names...

for i in "${!arr[@]}"
do
    newname=$(echo "${arr[$i]}" | sed 's/stupid/smarter/; s/  */_/g')
    mv "${arr[$i]}" "$newname"
done

${!arr[@]} expands to 0 1 2 so "${arr[$i]}" is the ith element of the array. The quotes around the variables are important to preserve the spaces.

The result is three renamed files:

$ ls -1
smarter_file1
smarter_file2
smarter_file_3
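Reading ls output line by line splits a name that contains a newline into two array entries. As a sketch of one way around that, the array can be filled straight from a glob instead. The function name rename_stupid is invented, and it only looks at names starting with "stupid".

```shell
#!/usr/bin/env bash
# Hypothetical helper: rename every stupid* file in a directory with the
# same sed script as above, filling the array from a glob rather than ls.
rename_stupid() {
    local dir=$1 path name newname
    shopt -s nullglob
    local arr=("$dir"/stupid*)   # one element per file, whatever the name
    shopt -u nullglob
    for path in "${arr[@]}"; do
        name=${path##*/}
        newname=$(printf '%s\n' "$name" | sed 's/stupid/smarter/; s/  */_/g')
        mv -- "$path" "$dir/$newname"
    done
}
```

Calling rename_stupid . inside the test directory gives the same three renamed files as above.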

Solution 8 - Linux

find has an -exec argument that loops over the find results and executes an arbitrary command. For example:

find . -iname "foo*" -exec echo "File found: {}" \;

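Here {} represents the found file. find passes each argument to the command directly, without going through a shell, so spaces in the file name need no extra handling; the double quotes only keep File found: {} together as a single argument to echo. Substituting {} inside a larger argument like this works with GNU find, but POSIX leaves it implementation-defined.

In many cases you can replace that last \; (which ends the command, run once per file) with a + (no escaping needed), which will put multiple files in the one command (not necessarily all of them at once though, see man find for more details). With +, the {} must stand alone as the last argument before the +.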
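When the per-file work is more than one command, a common pattern is to let -exec start a small inline shell script. This is a sketch only; the function name getlist_exec is made up for the example.

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a multi-step body for each match from find itself.
# With "+", find appends as many file names as fit after the trailing "sh",
# and the inline script receives them as "$@".
getlist_exec() {
    find "$1" -iname 'foo*' -exec sh -c '
        for f in "$@"; do
            printf "File found: %s\n" "$f"
            # do something useful with "$f"
        done' sh {} +
}
```

The word sh after the script becomes $0 of the inline shell, so the file names start at $1. Because {} stands alone right before the +, this form does not rely on find substituting {} inside a longer string.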

Solution 9 - Linux

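In some cases, for example if you just need to copy or move a list of files, you could pipe that list to awk as well.
The escaped double quotes (\" ... \") around the field $0 are important: $0 is the whole input line (one line of the list = one file), and the quotes keep it together as a single argument to mv.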

find . -iname "foo*" | awk '{print "mv \""$0"\" ./MyDir2" | "sh" }'

Solution 10 - Linux

Ok - my first post on Stack Overflow!

Though my problems with this have always been in csh rather than bash, the same idea carries over to bash. The issue is with the shell's interpretation of what "ls" returns. We can remove "ls" from the problem by simply using the shell expansion of the * wildcard, but this gives a "no match" error if there are no files in the current (or specified) folder. To get around this we simply extend the expansion to include dot-files, thus: * .* This will always yield results since the entries . and .. will always be present. So in csh we can use this construct:

foreach file (* .*)
   echo "$file"
end

If you want to filter out the standard dot-files, that is easy enough:

foreach file (* .*)
   if ("$file" == .) continue
   if ("$file" == ..) continue
   echo "$file"
end

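The code in the first post on this thread would be written thus in bash:

getlist() {
  # iterate over the glob results directly; no command substitution
  for f in * .*
  do
    # skip the . and .. entries
    case "$f" in .|..) continue ;; esac
    echo "File found: $f"
    # do something useful
  done
}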

Hope this helps!

Solution 11 - Linux

Another solution for the job.

The goal was to:

  • select/filter file names recursively in directories
  • handle each name, even when the path contains spaces

#!/bin/bash -e
## Trick to handle files with spaces in their path:
## split the find output on newlines only.
OLD_IFS=${IFS}
IFS=$'\n'
files=($(find "${INPUT_DIR}" -type f -name "*.md"))
for filename in "${files[@]}"
do
      # do your stuff
      #  ....
done
IFS=${OLD_IFS}

Solution 12 - Linux

I recently had to deal with a similar case, and I built a FILES array to iterate over the filenames:

eval FILES=($(find . -iname "foo*" -printf '"%p" '))

The idea here is to surround each filename with double quotes, separate them with spaces and use the result to initialize the FILES array. The use of eval is necessary to evaluate the double quotes in the find output correctly for the array initialization.

To iterate over the files, just do:

for f in "${FILES[@]}"; do
    # Do something with $f
done
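If a recent bash is available, the array can also be filled without eval, which avoids trouble with names that contain double quotes or dollar signs. A minimal sketch, assuming bash 4.4 or newer (where mapfile accepts -d); the function name load_files is invented here.

```shell
#!/usr/bin/env bash
# Assumes bash 4.4+, where mapfile (also spelled readarray) supports -d.
# Hypothetical helper: fill the global FILES array from null-terminated
# find output, one element per file name, with no eval involved.
load_files() {
    mapfile -d '' -t FILES < <(find "$1" -iname 'foo*' -print0)
}
```

After load_files . the same for f in "${FILES[@]}" loop works unchanged.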

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | gregseth | View Question on Stack Overflow
Solution 1 - Linux | martin clayton | View Answer on Stack Overflow
Solution 2 - Linux | sorpigal | View Answer on Stack Overflow
Solution 3 - Linux | marchelbling | View Answer on Stack Overflow
Solution 4 - Linux | Karoly Horvath | View Answer on Stack Overflow
Solution 5 - Linux | Torp | View Answer on Stack Overflow
Solution 6 - Linux | chepner | View Answer on Stack Overflow
Solution 7 - Linux | terafl0ps | View Answer on Stack Overflow
Solution 8 - Linux | naught101 | View Answer on Stack Overflow
Solution 9 - Linux | Steve | View Answer on Stack Overflow
Solution 10 - Linux | Andy Foster | View Answer on Stack Overflow
Solution 11 - Linux | Vince B | View Answer on Stack Overflow
Solution 12 - Linux | lemraus | View Answer on Stack Overflow