How can I randomize the lines in a file using standard tools on Red Hat Linux?


Linux Problem Overview


How can I randomize the lines in a file using standard tools on Red Hat Linux?

I don't have the shuf command, so I am looking for something like a perl or awk one-liner that accomplishes the same task.

Linux Solutions


Solution 1 - Linux

Um, let's not forget:

sort --random-sort

Solution 2 - Linux

shuf is the best way.

sort -R is painfully slow. I tried it on a 5 GB file and gave up after 2.5 hours. shuf then shuffled the same file in about a minute.

Solution 3 - Linux

And a Perl one-liner you get!

perl -MList::Util -e 'print List::Util::shuffle <>'

It uses a module, but the module is part of the core Perl distribution. If that's not good enough, you may consider rolling your own.
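For rolling your own, one common choice is the Fisher-Yates shuffle. The following is a minimal sketch of that algorithm, written in Python rather than Perl for illustration; the function name fisher_yates_shuffle and its rng parameter are made up for this example and are not part of any library.

```python
import random


def fisher_yates_shuffle(items, rng=random):
    """Return a shuffled copy of items using the Fisher-Yates algorithm."""
    result = list(items)
    # Walk from the last index down to 1, swapping each slot with a
    # randomly chosen slot at or before it.
    for i in range(len(result) - 1, 0, -1):
        j = rng.randint(0, i)  # randint is inclusive on both ends
        result[i], result[j] = result[j], result[i]
    return result


# Example: shuffle a few lines; the order differs from run to run.
sample = ["one\n", "two\n", "three\n", "four\n"]
shuffled = fisher_yates_shuffle(sample)
```

Every element is swapped at most once per step, so the whole shuffle is linear in the number of lines, but like the Perl one-liner it holds the entire input in memory.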

I tried using the one-liner with the -i flag ("edit-in-place") to have it modify the file directly. The documentation suggests it should work, but it doesn't: the shuffled lines still go to stdout, and the original file's contents are lost. I suggest you don't use it.

Consider a shell script:

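#!/bin/bash
# Shuffle each file named on the command line in place.
# ([[ ... ]] is a bash feature, hence bash rather than sh.)

if [[ $# -eq 0 ]]
then
  echo "Usage: $0 [file ...]"
  exit 1
fi

for i in "$@"
do
  # Write the shuffled lines to a new file next to the original.
  perl -MList::Util -e 'print List::Util::shuffle <>' "$i" > "$i.new"
  # Replace the original only if the byte counts match.
  # Reading from stdin makes wc print the count without the file name.
  if [[ $(wc -c < "$i") -eq $(wc -c < "$i.new") ]]
  then
    mv "$i.new" "$i"
  else
    echo "Error for file $i!"
  fi
done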

Untested, but hopefully works.

Solution 4 - Linux

cat yourfile.txt | while IFS= read -r f; do printf "%05d %s\n" "$RANDOM" "$f"; done | sort -n | cut -c7-

This reads the file, prefixes every line with a random number, sorts on those prefixes, and then cuts the prefixes off. It is a one-liner that should work in any shell that provides $RANDOM, such as bash, ksh, or zsh.
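The same decorate, sort, undecorate idea can be sketched in Python. This is only an illustration of the technique, not a replacement for the pipeline; the function name shuffle_by_random_keys is hypothetical. One difference worth noting: bash's $RANDOM only yields values from 0 to 32767, so large files will have many tied keys, whereas the sketch uses floating-point keys, where ties are unlikely.

```python
import random


def shuffle_by_random_keys(lines):
    """Shuffle lines by tagging each with a random key, sorting, and dropping the key."""
    # Decorate: pair every line with a random key (the shell version uses $RANDOM).
    decorated = [(random.random(), line) for line in lines]
    # Sort on the key only, so the line content never influences the order.
    decorated.sort(key=lambda pair: pair[0])
    # Undecorate: drop the keys again, like cut -c7- in the pipeline.
    return [line for _, line in decorated]


# Example: the result contains the same lines in a run-dependent order.
result = shuffle_by_random_keys(["alpha\n", "beta\n", "gamma\n"])
```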

EDIT: incorporated Richard Hansen's remarks.

Solution 5 - Linux

A one-liner for python:

python -c "import random, sys; lines = open(sys.argv[1]).readlines(); random.shuffle(lines); sys.stdout.writelines(lines)" myFile

And for printing just a single random line:

python -c "import random, sys; sys.stdout.write(random.choice(open(sys.argv[1]).readlines()))" myFile

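But be aware of a drawback of Python's random.shuffle(). It still shuffles longer inputs, but with more than 2080 elements the number of possible orderings exceeds the period of the underlying Mersenne Twister generator, so most orderings can never be produced.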
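The 2080 figure can be checked with a few lines of arithmetic: it is the largest n for which n! still fits within the Mersenne Twister period of 2**19937 - 1. The sketch below computes that threshold and, as one possible workaround, shuffles with random.SystemRandom, which draws from the operating system's entropy source instead. The function names are hypothetical, and whether the limitation matters in practice depends on the use case.

```python
import random

MT_PERIOD = 2**19937 - 1  # period of the Mersenne Twister behind the random module


def largest_fully_coverable_length(period=MT_PERIOD):
    """Return the largest n such that n! does not exceed period."""
    n, factorial = 1, 1
    # Keep growing n while (n + 1)! still fits within the period.
    while factorial * (n + 1) <= period:
        n += 1
        factorial *= n
    return n


def shuffle_with_os_entropy(lines):
    """Return a shuffled copy of lines using random.SystemRandom."""
    result = list(lines)
    random.SystemRandom().shuffle(result)
    return result


threshold = largest_fully_coverable_length()
```

SystemRandom is typically slower than the default generator and cannot be seeded, so it trades reproducibility for independence from the generator's period.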

Solution 6 - Linux

Related to Jim's answer:

My ~/.bashrc contains the following:

unsort ()
{
    LC_ALL=C sort -R "$@"
}

With GNU coreutils' sort, -R is the same as --random-sort: it generates a random hash of each line and sorts by it. In some older (buggy) versions the randomized hash was not actually used in some locales, so the output came back in normal sorted order, which is why I set LC_ALL=C.
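Sorting by a random hash has one visible consequence: identical lines hash to the same value, so they end up next to each other instead of being spread out. The sketch below imitates the idea in Python to make that visible. It is an assumption-laden illustration, not a description of coreutils internals: SHA-256 with a random salt is simply a stand-in hash, and random_hash_sort is a hypothetical name.

```python
import hashlib
import os


def random_hash_sort(lines):
    """Order lines by a hash salted with fresh random bytes."""
    salt = os.urandom(16)  # new salt on every call, so each call gives a new order

    def key(line):
        # Equal lines produce equal digests and therefore sort together.
        return hashlib.sha256(salt + line.encode("utf-8")).digest()

    return sorted(lines, key=key)


# Example: the two "a" lines and the two "b" lines come out adjacent.
ordered = random_hash_sort(["a\n", "b\n", "a\n", "c\n", "b\n"])
```

If duplicates need to be scattered like any other line, a true shuffle such as shuf or the Perl one-liner is the better fit.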


Related to Chris's answer:

perl -MList::Util=shuffle -e'print shuffle<>'

is a slightly shorter one-liner. (-Mmodule=a,b,c is shorthand for -e 'use module qw(a b c);'.)

A plain -i doesn't work for shuffling in place because Perl expects the print to happen inside the same loop that reads the file, whereas print shuffle <> produces no output until all input files have been read and closed.

As a shorter workaround,

perl -MList::Util=shuffle -i -ne'BEGIN{undef$/}print shuffle split/^/m'

will shuffle files in-place. (-n means "wrap the code in a while (<>) {...} loop"; BEGIN{undef$/} makes Perl read a whole file at a time instead of a line at a time; and split/^/m is needed because $_=<> now holds an entire file rather than a single line.)
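For comparison, the same read-everything-then-replace approach can be sketched in Python. This is a minimal illustration under a few assumptions: the file fits in memory, it is text in the default encoding, and shuffle_file_in_place is a hypothetical helper name. The shuffled lines go to a temporary file first, and the original is replaced only after the write has finished.

```python
import os
import random
import tempfile


def shuffle_file_in_place(path):
    """Shuffle the lines of the file at path, replacing it only after a full write."""
    with open(path, "r") as src:
        lines = src.readlines()
    random.shuffle(lines)
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the same directory so the final
    # rename stays on one filesystem.
    with tempfile.NamedTemporaryFile("w", dir=directory, delete=False) as tmp:
        tmp.writelines(lines)
        tmp_path = tmp.name
    # Swap the shuffled copy into place.
    os.replace(tmp_path, path)
```

Note that the replacement is a new file, so the original's permissions and ownership are not carried over in this sketch.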

Solution 7 - Linux

When I install coreutils with homebrew

brew install coreutils

shuf becomes available as gshuf.

Solution 8 - Linux

Mac OS X with DarwinPorts:

sudo port install unsort
cat $file | unsort | ...

Solution 9 - Linux

FreeBSD has its own random utility:

cat $file | random | ...

It's in /usr/games/random, so if you have not installed games, you are out of luck.

You could consider installing ports like textproc/rand or textproc/msort. These might well be available on Linux and/or Mac OS X, if portability is a concern.

Solution 10 - Linux

On OS X, grab the latest release from http://ftp.gnu.org/gnu/coreutils/ and run something like

./configure
make
sudo make install

...should give you /usr/local/bin/sort --random-sort

without messing up /usr/bin/sort

Solution 11 - Linux

Or get it from MacPorts:

$ sudo port install coreutils

and then run:

$ /opt/local/libexec/gnubin/sort --random-sort

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Stuart Woodward | View Question on Stackoverflow
Solution 1 - Linux | Jim T | View Answer on Stackoverflow
Solution 2 - Linux | Michal Illich | View Answer on Stackoverflow
Solution 3 - Linux | Chris Lutz | View Answer on Stackoverflow
Solution 4 - Linux | ChristopheD | View Answer on Stackoverflow
Solution 5 - Linux | scai | View Answer on Stackoverflow
Solution 6 - Linux | ephemient | View Answer on Stackoverflow
Solution 7 - Linux | John McDonnell | View Answer on Stackoverflow
Solution 8 - Linux | Coroos | View Answer on Stackoverflow
Solution 9 - Linux | Coroos | View Answer on Stackoverflow
Solution 10 - Linux | Dan Brickley | View Answer on Stackoverflow
Solution 11 - Linux | Chadwick Boggs | View Answer on Stackoverflow