How can I select random files from a directory in bash?


Bash Problem Overview


I have a directory with about 2000 files. How can I select a random sample of N files, using either a bash script or a list of piped commands?

Bash Solutions


Solution 1 - Bash

Here's a script that uses GNU sort's random option:

ls | sort -R | tail -n "$N" | while IFS= read -r file; do
    # Something involving $file, or you can leave
    # off the while to just get the filenames
    printf '%s\n' "$file"
done

Solution 2 - Bash

You can use shuf (from the GNU coreutils package) for that. Just feed it a list of file names and ask it to return the first line from a random permutation:

ls dirname | shuf -n 1
# probably faster and more flexible:
find dirname -type f | shuf -n 1
# etc..

Adjust the -n (--head-count=COUNT) value to set the number of lines returned. For example, to return 5 random filenames, you would use:

find dirname -type f | shuf -n 5
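
If file names may contain newlines, a line-based pipeline can split a single name in two. Here is a minimal sketch of a NUL-delimited variant, assuming GNU find and GNU shuf are available; the function name random_files is made up for illustration.

```bash
#!/usr/bin/env bash
# random_files DIR N: print up to N randomly chosen regular files under DIR.
# Names travel through the pipe NUL-terminated, so spaces and newlines
# inside a name cannot split it.
# Assumes GNU find (-print0) and GNU shuf (-z).
random_files() {
    local dir=$1 n=$2 file
    find "$dir" -type f -print0 |
        shuf -z -n "$n" |
        while IFS= read -r -d '' file; do
            printf '%s\n' "$file"
        done
}
```

For example, random_files dirname 5 prints five names. Replace the printf with whatever should happen to each file.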

Solution 3 - Bash

Here are a few possibilities that don't parse the output of ls and that are 100% safe regarding files with spaces and funny symbols in their name. All of them will populate an array randf with a list of random files. This array is easily printed with printf '%s\n' "${randf[@]}" if needed.

  • This one will possibly output the same file several times, and N needs to be known in advance. Here I chose N=42.

      a=( * )
      randf=( "${a[RANDOM%${#a[@]}]"{1..42}"}" )
    

This feature is not very well documented.

  • If N is not known in advance, but you really liked the previous possibility, you can use eval. But it's evil, and you must really make sure that N doesn't come directly from user input without being thoroughly checked!

      N=42
      a=( * )
      eval randf=( \"\${a[RANDOM%\${#a[@]}]\"\{1..$N\}\"}\" )
    

I personally dislike eval and hence this answer!

  • The same using a more straightforward method (a loop):

      N=42
      a=( * )
      randf=()
      for((i=0;i<N;++i)); do
          randf+=( "${a[RANDOM%${#a[@]}]}" )
      done
    
  • If you don't want the same file to appear more than once:

      N=42
      a=( * )
      randf=()
      for((i=0;i<N && ${#a[@]};++i)); do
          ((j=RANDOM%${#a[@]}))
          randf+=( "${a[j]}" )
          a=( "${a[@]:0:j}" "${a[@]:j+1}" )
      done
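
Rebuilding the array after every pick copies all remaining elements each time. One alternative, sketched here as an assumption rather than as part of the original answer, is a partial Fisher-Yates shuffle: swap a randomly chosen element into position i, then keep the first N elements. The function name sample_files is hypothetical. Like the loops above, it relies on $RANDOM, so it only suits lists of fewer than 32768 files.

```bash
#!/usr/bin/env bash
# sample_files N FILE...: store up to N distinct arguments, chosen at random,
# in the global array randf (partial Fisher-Yates shuffle).
sample_files() {
    local n=$1 i j tmp
    shift
    local a=( "$@" )
    if (( n > ${#a[@]} )); then
        n=${#a[@]}
    fi
    for (( i = 0; i < n; ++i )); do
        # Pick j from the not-yet-fixed tail i..len-1, then swap a[i] and a[j].
        j=$(( i + RANDOM % (${#a[@]} - i) ))
        tmp=${a[i]}
        a[i]=${a[j]}
        a[j]=$tmp
    done
    randf=( "${a[@]:0:n}" )
}
```

It could be called as sample_files 42 * and the result printed with printf '%s\n' "${randf[@]}".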
    

Note. This is a late answer to an old post, but the accepted answer links to an external page that shows terrible bash practice, and the other answer is not much better as it also parses the output of ls. A comment to the accepted answer points to an excellent answer by Lhunath which obviously shows good practice, but doesn't exactly answer the OP.

Solution 4 - Bash

ls | shuf -n 10 # ten random files

Solution 5 - Bash

A simple solution for selecting 5 random files without parsing ls. It also works with files containing spaces, newlines and other special characters:

shuf -ezn 5 * | xargs -0 -n1 echo

Replace echo with the command you want to execute for your files.
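
To keep the selection in a Bash array instead of piping it to xargs, the same NUL-delimited output can be read back with mapfile. This is a sketch that assumes Bash 4.4 or later (for mapfile -d) and GNU shuf; the function name pick_into_array and the array name picked are placeholders.

```bash
#!/usr/bin/env bash
# pick_into_array N: fill the global array picked with up to N random
# entries of the current directory.
pick_into_array() {
    local n=$1
    # "--" ends option parsing, so names starting with "-" are not
    # mistaken for shuf options.
    mapfile -d '' -t picked < <(shuf -z -e -n "$n" -- *)
}
```

After pick_into_array 5, the names are available as "${picked[@]}". In an empty directory the unmatched * stays literal unless nullglob is set.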

Solution 6 - Bash

This is an even later response to @gniourf_gniourf's late answer, which I just upvoted because it's by far the best answer, twice over. (Once for avoiding eval and once for safe filename handling.)

But it took me a few minutes to untangle the "not very well documented" feature(s) this answer uses. If your Bash skills are solid enough that you saw immediately how it works, then skip this comment. But I didn't, and having untangled it I think it's worth explaining.

Feature #1 is the shell's own file globbing. a=(*) creates an array, $a, whose members are the files in the current directory. Bash understands all the weirdnesses of filenames, so that list is guaranteed correct, guaranteed escaped, etc. No need to worry about properly parsing textual file names returned by ls.

Feature #2 is Bash parameter expansions for arrays, one nested within another. This starts with ${#ARRAY[@]}, which expands to the length of $ARRAY.

That expansion is then used to subscript the array. The standard way to get a random number between 0 and N-1 is to take a random number modulo N. Array indices run from 0 to one less than the array's length, so that is exactly the range we want. Here's the approach, broken into two lines for clarity's sake:

LENGTH=${#a[@]}
randfile=${a[RANDOM%LENGTH]}

But this solution does it in a single line, removing the unnecessary variable assignment.

Feature #3 is Bash brace expansion, although I have to confess I don't entirely understand it. Brace expansion is used, for instance, to generate a list of 25 files named filename1.txt, filename2.txt, etc: echo "filename"{1..25}".txt".

The expression inside the array assignment above, "${a[RANDOM%${#a[@]}]"{1..42}"}", uses that trick to produce 42 separate expansions. The brace expansion places a number in between the ] and the }, which at first I thought was subscripting the array, but if so it would be preceded by a colon. (It would also have returned 42 consecutive items from a random spot in the array, which is not at all the same thing as returning 42 random items from the array.) I think it's just making the shell run the expansion 42 times, thereby returning 42 random items from the array. (But if someone can explain it more fully, I'd love to hear it.)

The reason N has to be hardcoded (to 42) is that brace expansion happens before variable expansion.
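
The ordering can be checked directly. In this small sketch (the variable names are illustrative), the brace expression containing $N is left as literal text because it is not a valid range at the time brace expansion runs. A C-style loop is the usual workaround.

```bash
#!/usr/bin/env bash
N=3

# Brace expansion runs before $N is substituted, so {1..$N} is not a
# valid range yet and is passed through literally.
literal=$(echo {1..$N})      # {1..3}

# With a literal number the range expands as expected.
expanded=$(echo {1..3})      # 1 2 3

# A loop evaluates N at run time instead.
looped=()
for (( i = 1; i <= N; ++i )); do
    looped+=( "$i" )
done

printf '%s\n' "$literal" "$expanded" "${looped[*]}"
```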

Finally, here's Feature #4, if you want to do this recursively for a directory hierarchy:

shopt -s globstar
a=( ** )

This turns on a shell option that causes ** to match recursively. Now your $a array contains every file and directory in the entire hierarchy.
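
Because ** also matches directories, a filter is needed when only regular files are wanted. Here is a minimal sketch, assuming Bash 4 or later for globstar; the function name collect_files and the array name files are made up for this example.

```bash
#!/usr/bin/env bash
# collect_files DIR: fill the global array files with every regular file
# below DIR (recursively). Hidden files are skipped unless dotglob is set.
# Note: globstar and nullglob stay enabled after the call.
collect_files() {
    local dir=$1 path
    files=()
    shopt -s globstar nullglob
    for path in "$dir"/**; do
        if [[ -f $path ]]; then
            files+=( "$path" )
        fi
    done
}
```

After collect_files ., a single random file is "${files[RANDOM % ${#files[@]}]}", provided the array is not empty.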

Solution 7 - Bash

If you have Python installed (works with either Python 2 or Python 3):

To select one file (or line from an arbitrary command), use

ls -1 | python -c "import sys; import random; print(random.choice(sys.stdin.readlines()).rstrip())"

To select N files/lines, use (note N is at the end of the command, replace this by a number)

ls -1 | python -c "import sys; import random; print(''.join(random.sample(sys.stdin.readlines(), int(sys.argv[1]))).rstrip())" N

Solution 8 - Bash

If you have a lot of files in your folder, you can use the piped command below, which I found on Unix Stack Exchange.

find /some/dir/ -type f -print0 | xargs -0 shuf -e -n 8 -z | xargs -0 cp -vt /target/dir/

Here I wanted to copy the files, but if you want to move files or do something else, just change the last command where I have used cp.

Solution 9 - Bash

This is the only script I can get to play nice with bash on MacOS. I combined and edited snippets from the following two links:

https://stackoverflow.com/questions/1767384/ls-command-how-can-i-get-a-recursive-full-path-listing-one-line-per-file

http://www.linuxquestions.org/questions/linux-general-1/is-there-a-bash-command-for-picking-a-random-file-678687/

#!/bin/bash

# Reads a given directory and picks a random file.

# The directory you want to use. You could use "$1" instead if you
# wanted to parametrize it.
DIR="/path/to/"
# DIR="$1"

# Internal Field Separator set to newline, so file names with
# spaces do not break our script.
IFS='
'
 
if [[ -d "${DIR}" ]]
then
  # Runs ls on the given dir, and dumps the output into a matrix,
  # it uses the new lines character as a field delimiter, as explained above.
  #  file_matrix=($(ls -LR "${DIR}"))

  file_matrix=($(ls -R $DIR | awk '; /:$/&&f{s=$0;f=0}; /:$/&&!f{sub(/:$/,"");s=$0;f=1;next}; NF&&f{ print s"/"$0 }'))
  num_files=${#file_matrix[*]}

  # This is the command you want to run on a random file.
  # Change "ls -l" by anything you want, it's just an example.
  ls -l "${file_matrix[$((RANDOM%num_files))]}"
fi

exit 0

Solution 10 - Bash

MacOS does not have the sort -R and shuf commands, so I needed a bash only solution that randomizes all files without duplicates and did not find that here. This solution is similar to gniourf_gniourf's solution #4, but hopefully adds better comments.

The script should be easy to modify to stop after N samples, either with a counter and an if or with gniourf_gniourf's for loop over N. $RANDOM only returns values from 0 to 32767, so this works for up to 32768 files, but that should do for most cases.

#!/bin/bash

array=(*)  # this is the array of files to shuffle
# echo ${array[@]}
for dummy in "${array[@]}"; do  # do loop length(array) times; once for each file
    length=${#array[@]}
    randomi=$(( $RANDOM % $length ))  # select a random index
    
    filename=${array[$randomi]}
    echo "Processing: '$filename'"  # do something with the file
    
    unset -v "array[$randomi]"  # remove the element at index $randomi, leaving a gap
    array=("${array[@]}")  # re-index the array to close the gap left by unset
done
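
For directories with more than 32768 entries, one common workaround is to combine two $RANDOM values into a single 30-bit number. The sketch below is an assumption layered on top of the script above, not part of the original answer. random_index is a hypothetical helper, and the modulo step introduces a slight bias that is usually negligible when sampling files.

```bash
#!/usr/bin/env bash
# random_index N: print a random integer in the range 0..N-1.
# Two 15-bit $RANDOM values are combined into one 30-bit value, so N can be
# as large as 2^30 (1073741824).
random_index() {
    local n=$1
    echo $(( ((RANDOM << 15) | RANDOM) % n ))
}
```

In the script above, the index could then be chosen with randomi=$(random_index "${#array[@]}").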

Solution 11 - Bash

If you want to copy a sample of those files to another folder:

ls | shuf -n 100 | xargs -I % cp % ../samples/

Create the samples directory first.

Solution 12 - Bash

I use this. It relies on a temporary file, but it descends through the directory tree until it finds a regular file and returns it.

# find for a quasi-random file in a directory tree:

# directory to start search from:
ROOT="/";  

tmp=/tmp/mytempfile    
TARGET="$ROOT"
FILE=""; 
n=
r=
while [ -e "$TARGET" ]; do 
    TARGET="$(readlink -f "${TARGET}/$FILE")" ; 
    if [ -d "$TARGET" ]; then
	  ls -1 "$TARGET" 2> /dev/null > $tmp || break;
	  n=$(cat $tmp | wc -l); 
	  if [ $n != 0 ]; then
	    FILE=$(shuf -n 1 $tmp)
# or if you dont have/want to use shuf:
#	    r=$(($RANDOM % $n)) ; 
#	    FILE=$(tail -n +$(( $r + 1 ))  $tmp | head -n 1); 
	  fi ; 
    else
	  if [ -f "$TARGET"  ] ; then
	    rm -f $tmp
	    echo "$TARGET"
	    break;
	  else 
		# is not a regular file, restart:
	    TARGET="$ROOT"
	    FILE=""
	  fi
    fi
done;
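
The same descent can be sketched without a temporary file or ls by globbing each directory into an array. This is an illustrative variant, not the author's script: the function name random_descend is made up, and it gives up on an empty directory or a non-regular file instead of restarting from the root.

```bash
#!/usr/bin/env bash
# random_descend ROOT: walk down from ROOT, picking a random entry at each
# level, until something that is not a directory is reached. Prints the path
# and returns 0 if that entry is a regular file, returns 1 otherwise.
random_descend() {
    local target=$1 entries
    while [[ -d $target ]]; do
        entries=( "$target"/* )
        # In an empty directory the glob stays literal (or expands to
        # nothing with nullglob); either way the first entry does not exist.
        [[ -e ${entries[0]} || -L ${entries[0]} ]] || return 1
        target=${entries[RANDOM % ${#entries[@]}]}
    done
    [[ -f $target ]] && printf '%s\n' "$target"
}
```

A caller that wants the retry behaviour of the script above could loop with until file=$(random_descend "$ROOT"); do :; done. That loop never ends if the tree contains no regular files.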

Solution 13 - Bash

How about a Perl solution slightly doctored from Mr. Kang over here:
https://stackoverflow.com/questions/2153882/how-can-i-shuffle-the-lines-of-a-text-file-on-the-unix-command-line-or-in-a-shel

ls | perl -MList::Util=shuffle -e '@lines = shuffle(<>); print @lines[0..4]'

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type        Original Author    Original Content on Stackoverflow
Question            Marlo Guthrie      View Question on Stackoverflow
Solution 1 - Bash   Josh Lee           View Answer on Stackoverflow
Solution 2 - Bash   Nordic Mainframe   View Answer on Stackoverflow
Solution 3 - Bash   gniourf_gniourf    View Answer on Stackoverflow
Solution 4 - Bash   silgon             View Answer on Stackoverflow
Solution 5 - Bash   scai               View Answer on Stackoverflow
Solution 6 - Bash   Ken                View Answer on Stackoverflow
Solution 7 - Bash   Mark               View Answer on Stackoverflow
Solution 8 - Bash   25b3nk             View Answer on Stackoverflow
Solution 9 - Bash   benmarbles         View Answer on Stackoverflow
Solution 10 - Bash  cat                View Answer on Stackoverflow
Solution 11 - Bash  NaWeeD             View Answer on Stackoverflow
Solution 12 - Bash  bzimage            View Answer on Stackoverflow
Solution 13 - Bash  AAAfarmclub        View Answer on Stackoverflow