Best way to choose a random file from a directory in a shell script

Bash, File, Shell, Random

Bash Problem Overview


What is the best way to choose a random file from a directory in a shell script?

Here is my solution in Bash, but I would be very interested in a more portable (non-GNU) version for use on Unix proper.

dir='some/directory'
file=`/bin/ls -1 "$dir" | sort --random-sort | head -1`
path=`readlink --canonicalize "$dir/$file"` # Converts to full path
echo "The randomly-selected file is: $path"

Anybody have any other ideas?

Edit: lhunath makes a good point about parsing ls. I guess it comes down to whether you want to be portable or not. If you have the GNU findutils and coreutils then you can do:

find "$dir" -maxdepth 1 -mindepth 1 -type f -print0 \
  | sort --zero-terminated --random-sort \
  | sed 's/\d000.*//g'

Whew, that was fun! Also it matches my question better since I said "random file". Honestly though, these days it's hard to imagine a Unix system deployed out there having GNU installed but not Perl 5.

Bash Solutions


Solution 1 - Bash

files=(/my/dir/*)
printf "%s\n" "${files[RANDOM % ${#files[@]}]}"

And don't parse ls. Read http://mywiki.wooledge.org/ParsingLs

Edit: Good luck finding a non-bash solution that's reliable. Most will break for certain types of filenames, such as filenames with spaces or newlines or dashes (it's pretty much impossible in pure sh). To do it right without bash, you'd need to fully migrate to awk/perl/python/... without piping that output for further processing or such.
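
One gap in the two-liner above is the empty directory: without nullglob the unmatched pattern stays in the array as a literal string. The following is a minimal sketch of the same array approach wrapped in a function, assuming Bash. The name pick_random_entry is hypothetical and not part of the original answer. Like the original, it skips dot files, and $RANDOM only goes up to 32767.

```shell
#!/bin/bash
# pick_random_entry DIR: print one randomly chosen entry of DIR.
# (Hypothetical helper name, used here for illustration only.)
# The body runs in a subshell, so the nullglob setting does not leak out.
pick_random_entry() (
  shopt -s nullglob              # an unmatched glob expands to nothing
  files=("$1"/*)
  if [ "${#files[@]}" -eq 0 ]; then
    echo "no entries in $1" >&2
    exit 1                       # leaves only the subshell; the caller sees status 1
  fi
  printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
)
```

It would be called as pick_random_entry /my/dir, and the exit status tells the caller whether anything was found.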

Solution 2 - Bash

Is "shuf" not portable?

shuf -n1 -e /path/to/files/*

or use find if the files are nested more than one directory deep:

find /path/to/files/ -type f | shuf -n1

shuf is part of coreutils, but you'll need version 6.4 or newer to get it, so older RH/CentOS releases that ship an earlier coreutils do not include it.
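
The find | shuf pipeline above is line-based, so a file name containing a newline would be split. A hedged sketch of a NUL-delimited variant follows. It assumes GNU find (-print0) and a GNU shuf recent enough to support -z, and the function name random_file_shuf is made up for this example.

```shell
# random_file_shuf DIR: print one regular file found anywhere under DIR.
# Assumes GNU find and a shuf that understands -z (NUL-terminated records).
random_file_shuf() {
  # Records stay NUL-terminated until one is chosen, then the NUL is dropped.
  find "$1" -type f -print0 | shuf -z -n 1 | tr -d '\0'
  echo   # finish the output with a newline
}
```

Calling random_file_shuf /path/to/files prints a single path, or just an empty line if no regular file exists.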

Solution 3 - Bash

Something like:

path=("$dir"/*)   # array of the entries in $dir (variable from the question)
let x="$RANDOM % ${#path[@]}"
echo "The randomly-selected file is ${path[$x]}"

$RANDOM in Bash is a special variable that returns a random integer between 0 and 32767. Modulus division by the array length turns it into a valid index, which is then used to reference that element of the array.
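
Plain modulus is slightly biased toward low indexes whenever the array length does not divide 32768 evenly. If that matters, rejection sampling is one way around it. This is only a sketch, assuming Bash and at most 32768 entries; rand_below is a hypothetical name.

```shell
#!/bin/bash
# rand_below N: print a uniformly distributed integer in 0..N-1 (1 <= N <= 32768).
rand_below() {
  local n=$1 limit r
  limit=$(( 32768 / n * n ))   # largest multiple of N within RANDOM's range
  while :; do
    r=$RANDOM
    (( r < limit )) && break   # discard values from the incomplete last bucket
  done
  echo $(( r % n ))
}

# Usage sketch: files=(/my/dir/*); echo "${files[$(rand_below "${#files[@]}")]}"
```

For small directories the bias is tiny, so this is mostly worth it when the selection has to be demonstrably fair.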

Solution 4 - Bash

# Pick a random regular file below the current directory and open it.
# Note: file names containing newlines are not handled.
function randomFile {
  local tmpFile total randomNumber i line
  tmpFile=$(mktemp) || return 1

  find . -type f > "$tmpFile"
  total=$(wc -l < "$tmpFile")
  if [ "$total" -eq 0 ]; then rm -f "$tmpFile"; return 1; fi
  randomNumber=$((RANDOM % total))

  i=0
  while IFS= read -r line; do
    if [ "$i" -eq "$randomNumber" ]; then
      # Do stuff with file
      amarok "$line"
      break
    fi
    i=$((i + 1))
  done < "$tmpFile"
  rm -f "$tmpFile"
}

Solution 5 - Bash

This boils down to: How can I create a random number in a Unix script in a portable way?

Because if you have a random number n between 1 and N (the number of files), you can use head -n "$n" | tail -n 1 to extract the corresponding line of the listing. Unfortunately, I know of no portable way to do this with the shell alone. If you have Python or Perl, you can easily use their random-number support, but AFAIK there is no standard rand(1) command.
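
The head/tail step can be sketched as a small helper that works in plain POSIX sh. The name nth_line is hypothetical, and the random index still has to come from somewhere else (awk, Perl, and so on).

```shell
# nth_line N: print line N (1-based) of standard input.
# If N is larger than the number of lines, the last line is printed.
nth_line() {
  head -n "$1" | tail -n 1
}

# Example: pick line 2 of a three-line list.
printf '%s\n' alpha beta gamma | nth_line 2   # prints "beta"
```

Since it works on lines, file names that contain newlines will not survive this approach.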

Solution 6 - Bash

I think Awk is a good tool for getting a random number. According to the Advanced Bash-Scripting Guide, Awk's rand() is a good replacement for $RANDOM.

Here's a version of your script that avoids Bash-isms and GNU tools.

#! /bin/sh

dir='some/directory'
n_files=`/bin/ls -1 "$dir" | wc -l | cut -f1`
rand_num=`awk "BEGIN{srand();print int($n_files * rand()) + 1;}"`
file=`/bin/ls -1 "$dir" | sed -ne "${rand_num}p"`
path=`cd "$dir" && echo "$PWD/$file"` # Converts to full path.
echo "The randomly-selected file is: $path"

It inherits the problems other answers have mentioned should files contain newlines.

Solution 7 - Bash

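Word-splitting on spaces in file names can be avoided by restricting IFS to a newline in Bash. File names that contain newlines will still be split, and this still parses ls:

#!/bin/bash

OLDIFS=$IFS
IFS=$'\n'   # split command output on newlines only

DIR="/home/user"

for file in $(ls -1 "$DIR")
do
    echo "$file"
done

IFS=$OLDIFS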

Solution 8 - Bash

Here's a shell snippet that relies only on POSIX features and copes with arbitrary file names (but omits dot files from the selection). The random selection uses awk, because that's all you get in POSIX. It's a very poor random number generator, since awk's RNG is seeded with the current time in seconds (so it's easily predictable, and returns the same choice if you call it multiple times per second).

set -- *
n=$(echo $# | awk '{srand(); print int(rand()*$0) + 1}')
eval "file=\${$n}"
echo "Processing $file"

If you don't want to ignore dot files, the file name generation code (set -- *) needs to be replaced by something more complicated.

set -- *; [ -e "$1" ] || shift
set .[!.]* "$@"; [ -e "$1" ] || shift
set ..?* "$@"; [ -e "$1" ] || shift
if [ $# -eq 0 ]; then echo 1>&2 "empty directory"; exit 1; fi

If you have OpenSSL available, you can use it to generate random bytes. If you don't but your system has /dev/urandom, replace the call to openssl by dd if=/dev/urandom bs=3 count=1 2>/dev/null. Here's a snippet that sets n to a random value between 1 and $#, taking care not to introduce a bias. This snippet assumes that $# is at most 2^23-1.

while
  n=$(($(openssl rand 3 | od -An -t u4) + 1))
  [ $n -gt $((16777216 / $# * $#)) ]
do :; done
n=$((n % $# + 1))
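
For systems without OpenSSL, the /dev/urandom variant mentioned above might look like the sketch below. It assumes /dev/urandom exists, the function name rand_1_to_n is hypothetical, and reading the three bytes one by one with od -tu1 is a choice made here so the result does not depend on byte order. POSIX sh has no local variables, so max, limit and r are globals.

```shell
# rand_1_to_n N: print a uniform random integer in 1..N (N <= 16777216).
# Assumes /dev/urandom is available.
rand_1_to_n() {
  max=$1
  limit=$(( 16777216 / max * max ))   # largest multiple of N within 2^24
  while :; do
    # Read 3 random bytes as three decimal numbers, e.g. " 12 200 7".
    set -- $(dd if=/dev/urandom bs=3 count=1 2>/dev/null | od -An -tu1)
    r=$(( $1 * 65536 + $2 * 256 + $3 ))
    [ "$r" -lt "$limit" ] && break    # reject the biased tail
  done
  echo $(( r % max + 1 ))
}
```

After set -- * has filled the positional parameters, n=$(rand_1_to_n "$#") would give an index into them.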

Solution 9 - Bash

BusyBox (used on embedded devices) is usually configured to support $RANDOM but it doesn't have bash-style arrays or sort --random-sort or shuf. Hence the following:

#!/bin/sh
FILES="/usr/bin/*"
for f in $FILES; do  echo "$RANDOM $f" ; done | sort -n | head -n1 | cut -d' ' -f2-

Note the trailing "-" in cut -f2-; it is required to avoid truncating file names that contain spaces (or whatever separator you choose).

It won't handle filenames with embedded newlines correctly.

Solution 10 - Bash

Put each line of the output of ls into an awk array named line, then pick one element at random. Without the call to srand(), most awk implementations return the same choice on every run:

ls | awk 'BEGIN { srand() } { line[NR]=$0 } END { print line[(int(rand()*NR+1))]}'
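
A related technique, not what the answer above does, is reservoir sampling: keep line NR with probability 1/NR, so only one line is held in memory at a time. A minimal sketch in POSIX awk, with the hypothetical function name random_line:

```shell
# random_line: print one uniformly chosen line of standard input in a single pass.
random_line() {
  awk 'BEGIN { srand() }
       rand() * NR < 1 { pick = $0 }   # replace the current pick with probability 1/NR
       END { if (NR > 0) print pick }'
}

# Usage sketch: printf '%s\n' * | random_line
```

It shares the limits already noted for awk: the seed is the time in seconds, and file names with newlines are split.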

Solution 11 - Bash

My 2 cents, with a version that should not break when filenames with special chars exist:

#!/bin/bash --
dir='some/directory'

number_of_files=$(find "${dir}" -type f -print0 | grep -zc .)
rand_index=$((1 + RANDOM % number_of_files))

printf "the randomly-selected file is: "
find "${dir}" -type f -print0 | head -z -n "${rand_index}" | tail -z -n 1
printf "\n"
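
The script above runs find twice, so the file set could change between the count and the selection. One possible alternative is to read the NUL-delimited list into an array once. This sketch assumes Bash 4.4 or newer (for mapfile -d) and a find that supports -print0; random_file_nul is a hypothetical name.

```shell
#!/bin/bash
# random_file_nul DIR: print one regular file under DIR, whatever its name contains.
# Assumes Bash 4.4+ and find -print0.
random_file_nul() {
  local files=()
  # Each NUL-terminated path from find becomes one array element.
  mapfile -d '' -t files < <(find "$1" -type f -print0)
  (( ${#files[@]} > 0 )) || return 1   # nothing to choose from
  printf '%s\n' "${files[RANDOM % ${#files[@]}]}"
}
```

As with any $RANDOM-based index, this only reaches the first 32768 files.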

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type         Original Author                 Original Content on Stackoverflow
Question             JasonSmith                      View Question on Stackoverflow
Solution 1 - Bash    lhunath                         View Answer on Stackoverflow
Solution 2 - Bash    johnnyB                         View Answer on Stackoverflow
Solution 3 - Bash    fido                            View Answer on Stackoverflow
Solution 4 - Bash    Pipo                            View Answer on Stackoverflow
Solution 5 - Bash    Aaron Digulla                   View Answer on Stackoverflow
Solution 6 - Bash    ashawley                        View Answer on Stackoverflow
Solution 7 - Bash    gsbabil                         View Answer on Stackoverflow
Solution 8 - Bash    Gilles 'SO- stop being evil'    View Answer on Stackoverflow
Solution 9 - Bash    Robert Calhoun                  View Answer on Stackoverflow
Solution 10 - Bash   egm                             View Answer on Stackoverflow
Solution 11 - Bash   Jay jargot                      View Answer on Stackoverflow