Why is rand()%6 biased?

Tags: c++, random, std

C++ Problem Overview


When reading how to use std::rand, I found this code on cppreference.com

int x = 7;
while(x > 6) 
    x = 1 + std::rand()/((RAND_MAX + 1u)/6);  // Note: 1+rand()%6 is biased

What is wrong with the expression on the right? I tried it and it works perfectly.

C++ Solutions


Solution 1 - C++

There are two issues with rand() % 6 (the 1+ doesn't affect either problem).

First, as several answers have pointed out, if the low bits of rand() aren't appropriately uniform, the result of the remainder operator is also not uniform.

Second, if the number of distinct values produced by rand() is not a multiple of 6, then the remainder will produce more low values than high values. That's true even if rand() returns perfectly distributed values.

As an extreme example, pretend that rand() produces uniformly distributed values in the range [0..6]. If you look at the remainders for those values, when rand() returns a value in the range [0..5], the remainder produces uniformly distributed results in the range [0..5]. When rand() returns 6, rand() % 6 returns 0, just as if rand() had returned 0. So you get a distribution with twice as many 0's as any other value.
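
To make the doubling concrete, here is a tiny tally over that hypothetical generator's seven possible outputs (the range [0..6] is the thought experiment above, not anything std::rand guarantees):

#include <iostream>

int main()
{
    int counts[6] = {};
    for (int r = 0; r <= 6; ++r)    // every output of the hypothetical generator
        ++counts[r % 6];
    for (int v = 0; v < 6; ++v)
        std::cout << v << " appears " << counts[v] << " time(s)\n";
    // 0 appears twice; 1 through 5 appear once each.
}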

The second is the real problem with rand() % 6.

The way to avoid that problem is to discard values that would produce non-uniform duplicates. You calculate the largest multiple of 6 that's less than or equal to RAND_MAX, and whenever rand() returns a value that's greater than or equal to that multiple you reject it and call rand() again, as many times as needed.

So:

int max = 6 * ((RAND_MAX + 1u) / 6);  // largest multiple of 6 that is <= RAND_MAX + 1
int value = rand();
while (value >= max)
    value = rand();

That's a different implementation of the code in question, intended to more clearly show what's going on.
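
Wrapped up as a function, a minimal sketch along those lines might look like this (the name roll_die is mine, not part of the answer):

#include <cstdlib>

// Uniform die roll in 1..6, assuming rand() itself is uniform over 0..RAND_MAX.
int roll_die()
{
    const int max = 6 * ((RAND_MAX + 1u) / 6);  // largest usable multiple of 6
    int value = std::rand();
    while (value >= max)                        // discard the leftover high values
        value = std::rand();
    return 1 + value % 6;
}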

Solution 2 - C++

There are hidden depths here:

  1. The use of the small u in RAND_MAX + 1u. RAND_MAX is defined to be an int type, and is often the largest possible int. The behaviour of RAND_MAX + 1 would be undefined in such instances, as you'd be overflowing a signed type. Writing 1u forces the conversion of RAND_MAX to unsigned, thereby avoiding the overflow (see the sketch after this list).

  2. The use of % 6 can (though on every implementation of std::rand I've seen it doesn't) introduce additional statistical bias above and beyond the alternative presented. The cases where % 6 is hazardous are those where the number generator has correlation planes in the low-order bits, such as a rather famous IBM implementation (in C) of rand in, I think, the 1970s, which flipped the high- and low-order bits as "a final flourish". A further consideration is that 6 is very small compared with RAND_MAX, so there will be minimal effect if RAND_MAX is not a multiple of 6, which it probably isn't.
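
To illustrate point 1, a minimal sketch (nothing here is specific to any particular implementation, but on platforms where RAND_MAX equals INT_MAX the plain + 1 spelling would be signed overflow):

#include <climits>
#include <cstdlib>
#include <iostream>

int main()
{
    // RAND_MAX + 1u is computed in unsigned arithmetic, which is well defined
    // even when RAND_MAX == INT_MAX; RAND_MAX + 1 would not be.
    unsigned bucket = (RAND_MAX + 1u) / 6;
    std::cout << "RAND_MAX = " << RAND_MAX
              << ", (RAND_MAX + 1u)/6 = " << bucket << '\n';
}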

In conclusion, these days, due to its tractability, I'd use % 6. It's not likely to introduce any statistical anomalies beyond those introduced by the generator itself. If you are still in doubt, test your generator to see if it has the appropriate statistical properties for your use case.

Solution 3 - C++

This example code illustrates that std::rand is a case of legacy cargo cult balderdash that should raise your eyebrows every time you see it.

There are several issues here:

The contract people usually assume—even the poor hapless souls who don't know any better and won't think of it in precisely these terms—is that rand samples from the uniform distribution on the integers in 0, 1, 2, …, RAND_MAX, and each call yields an independent sample.

The first problem is that the assumed contract, independent uniform random samples in each call, is not actually what the documentation says—and in practice, implementations historically failed to provide even the barest simulacrum of independence. For example, C99 §7.20.2.1 ‘The rand function’ says, without elaboration:

> The rand function computes a sequence of pseudo-random integers in the range 0 to RAND_MAX.

This is a meaningless sentence, because pseudorandomness is a property of a function (or family of functions), not of an integer, but that doesn't stop even ISO bureaucrats from abusing the language. After all, the only readers who would be upset by it know better than to read the documentation for rand for fear of their brain cells decaying.

A typical historical implementation in C works like this:

static unsigned int seed = 1;

static void
srand(unsigned int s)
{
    seed = s;
}

static int
rand(void)
{
    /* one step of a linear congruential generator */
    seed = (seed*1103515245 + 12345) % ((unsigned long)RAND_MAX + 1);
    return (int)seed;
}

This has the unfortunate property that even though a single sample may be uniformly distributed under a uniform random seed (which depends on the specific value of RAND_MAX), it alternates between even and odd integers in consecutive calls—after

int a = rand();
int b = rand();

the expression (a & 1) ^ (b & 1) yields 1 with 100% probability, which is not the case for independent random samples on any distribution supported on even and odd integers. Thus, a cargo cult emerged that one should discard the low-order bits to chase the elusive beast of ‘better randomness’. (Spoiler alert: This is not a technical term. This is a sign that whosever prose you are reading either doesn't know what they're talking about, or thinks you are clueless and must be condescended to.)
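
A small check of that claim (a sketch of mine, with the generator renamed lcg_rand so it doesn't collide with the library function; it assumes, as is typical, that RAND_MAX + 1 is a power of two):

#include <cstdlib>
#include <iostream>

static unsigned int lcg_seed = 1;

static unsigned int lcg_rand(void)
{
    lcg_seed = (lcg_seed*1103515245 + 12345) % ((unsigned long)RAND_MAX + 1);
    return lcg_seed;
}

int main()
{
    // Because the multiplier and increment are both odd, the low bit of the
    // state flips on every call, so this prints 1 ten times.
    for (int i = 0; i < 10; ++i) {
        unsigned a = lcg_rand(), b = lcg_rand();
        std::cout << ((a & 1) ^ (b & 1)) << ' ';
    }
    std::cout << '\n';
}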

The second problem is that even if each call did sample independently from a uniform random distribution on 0, 1, 2, …, RAND_MAX, the outcome of rand() % 6 would not be uniformly distributed in 0, 1, 2, 3, 4, 5 like a die roll, unless RAND_MAX is congruent to -1 modulo 6. Simple counterexample: If RAND_MAX = 6, then from rand(), all outcomes have equal probability 1/7, but from rand() % 6, the outcome 0 has probability 2/7 while all other outcomes have probability 1/7.

The right way to do this is with rejection sampling: repeatedly draw an independent uniform random sample s from 0, 1, 2, …, RAND_MAX, and reject (for example) the outcomes 0, 1, 2, …, ((RAND_MAX + 1) % 6) - 1—if you get one of those, start over; otherwise, yield s % 6.

unsigned int s;
/* reject the first (RAND_MAX + 1) % 6 outcomes so that the number of
   accepted outcomes is an exact multiple of 6 */
while ((s = rand()) < ((unsigned long)RAND_MAX + 1) % 6)
    continue;
return s % 6;

This way, the number of outcomes from rand() that we accept is evenly divisible by 6, and each possible outcome of s % 6 is obtained from the same number of accepted outcomes of rand(), so if rand() is uniformly distributed then so is s % 6. There is no bound on the number of trials, but the expected number is less than 2, and the probability of still not having an answer shrinks exponentially with the number of trials.

The choice of which outcomes of rand() you reject is immaterial, provided that you map an equal number of them to each integer below 6. The code at cppreference.com makes a different choice, because of the first problem above—that nothing is guaranteed about the distribution or independence of outputs of rand(), and in practice the low-order bits exhibited patterns that don't ‘look random enough’ (never mind that the next output is a deterministic function of the previous one).

Exercise for the reader: Prove that the code at cppreference.com yields a uniform distribution on die rolls if rand() yields a uniform distribution on 0, 1, 2, …, RAND_MAX.

Exercise for the reader: Why might you prefer one or the other subsets to reject? What computation is needed for each trial in the two cases?

A third problem is that the seed space is so small that even if the seed is uniformly distributed, an adversary armed with knowledge of your program and one outcome but not the seed can readily predict the seed and subsequent outcomes, which makes them seem not so random after all. So don't even think about using this for cryptography.

You can go the fancy overengineered route and use C++11's std::uniform_int_distribution class with an appropriate random device and your favorite random engine like the ever-popular Mersenne twister std::mt19937 to play at dice with your four-year-old cousin, but even that is not going to be fit for generating cryptographic key material—and the Mersenne twister is a terrible space hog too, with a multi-kilobyte state wreaking havoc on your CPU's cache and an obscene setup time, so it is bad even for, e.g., parallel Monte Carlo simulations with reproducible trees of subcomputations; its popularity likely arises mainly from its catchy name. But you can use it for toy dice rolling like this example!
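
For the toy dice-rolling use case, that route looks roughly like this (a minimal sketch of the standard <random> facilities; nothing about it is cryptographically suitable):

#include <iostream>
#include <random>

int main()
{
    std::random_device rd;                         // nondeterministic seed source (quality varies by platform)
    std::mt19937 gen(rd());                        // Mersenne twister engine
    std::uniform_int_distribution<int> die(1, 6);  // uniform integers in [1, 6]

    for (int i = 0; i < 10; ++i)
        std::cout << die(gen) << ' ';
    std::cout << '\n';
}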

Another approach is to use a simple cryptographic pseudorandom number generator with a small state, such as a simple [fast key erasure PRNG][2], or just a stream cipher such as AES-CTR or ChaCha20 if you are confident (e.g., in a Monte Carlo simulation for research in the natural sciences) that there are no adverse consequences to predicting past outcomes if the state is ever compromised.

[1]: https://en.wikipedia.org/wiki/Raphael_Weldon#Weldon%27s_dice "Wikipedia: Raphael Weldon, § Weldon's dice. Retrieved 2018-04-17."
[2]: https://blog.cr.yp.to/20170723-random.html "Daniel J. Bernstein, ‘Fast-key-erasure random-number generators’, blog.cr.yp.to, 2017-07-23."

Solution 4 - C++

I'm not an experienced C++ user by any means, but was interested to see if the other answers regarding std::rand()/((RAND_MAX + 1u)/6) being less biased than 1+std::rand()%6 actually holds true. So I wrote a test program to tabulate the results for both methods (I haven't written C++ in ages, please check it). A link for running the code is found [here][1]. It's also reproduced as follows:

// Example program
#include <cstdlib>
#include <iostream>
#include <ctime>
#include <string>

int main()
{
    std::srand(std::time(nullptr)); // use current time as seed for random generator
    
    // Roll the die 6000000 times using the supposedly unbiased method and keep track of the results
    
    int results[6] = {0,0,0,0,0,0};
 
    // roll a 6-sided die 6000000 times
    for (int n=0; n != 6000000; ++n) {
        int x = 7;
        while(x > 6) 
            x = 1 + std::rand()/((RAND_MAX + 1u)/6);  // Note: 1+rand()%6 is biased
        
        results[x-1]++;
    }
  
    for (int n=0; n !=6; n++) {
        std::cout << results[n] << ' ';
    }
    
    std::cout << "\n";
    
    
    // Roll the die 6000000 times using the supposedly biased method and keep track of the results
    
    int results_bias[6] = {0,0,0,0,0,0};
 
    // roll a 6-sided die 6000000 times
    for (int n=0; n != 6000000; ++n) {
        int x = 7;
        while(x > 6) 
            x = 1 + std::rand()%6;
        
        results_bias[x-1]++;
    }
  
    for (int n=0; n !=6; n++) {
        std::cout << results_bias[n] << ' ';
    }
}

I then took the output of this and used the chisq.test function in R to run a chi-square test to see if the results are significantly different from what's expected. This stackexchange question goes into more detail on using the chi-square test to test die fairness: [How can I test whether a die is fair?][2]. Here are the results for a few runs:

> ?chisq.test
> unbias <- c(100150, 99658, 100319, 99342, 100418, 100113)
> bias <- c(100049, 100040, 100091, 99966, 100188, 99666 )

> chisq.test(unbias)

Chi-squared test for given probabilities

data:  unbias
X-squared = 8.6168, df = 5, p-value = 0.1254

> chisq.test(bias)

Chi-squared test for given probabilities

data:  bias
X-squared = 1.6034, df = 5, p-value = 0.9008

> unbias <- c(998630, 1001188, 998932, 1001048, 1000968, 999234 )
> bias <- c(1000071, 1000910, 999078, 1000080, 998786, 1001075   )
> chisq.test(unbias)

Chi-squared test for given probabilities

data:  unbias
X-squared = 7.051, df = 5, p-value = 0.2169

> chisq.test(bias)

Chi-squared test for given probabilities

data:  bias
X-squared = 4.319, df = 5, p-value = 0.5045

> unbias <- c(998630, 999010, 1000736, 999142, 1000631, 1001851)
> bias <- c(999803, 998651, 1000639, 1000735, 1000064,1000108)
> chisq.test(unbias)

Chi-squared test for given probabilities

data:  unbias
X-squared = 7.9592, df = 5, p-value = 0.1585

> chisq.test(bias)

Chi-squared test for given probabilities

data:  bias
X-squared = 2.8229, df = 5, p-value = 0.7273

In the three runs that I did, the p-value for both methods was always greater than the typical alpha value used to test significance (0.05). This means that we wouldn't consider either of them to be biased. Interestingly, the supposedly unbiased method had consistently lower p-values, which hints that it might actually be more biased. The caveat is that I only did three runs.
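
For anyone who wants to check the X-squared arithmetic without R: the statistic is just the sum of (observed - expected)^2 / expected, with expected = total/6. A small C++ sketch using the first unbias tally above:

#include <iostream>

int main()
{
    double counts[6] = {100150, 99658, 100319, 99342, 100418, 100113};
    double total = 0;
    for (double c : counts) total += c;
    const double expected = total / 6;

    double chi_sq = 0;
    for (double c : counts)
        chi_sq += (c - expected) * (c - expected) / expected;

    std::cout << "X-squared = " << chi_sq << '\n';  // ~8.6168, matching chisq.test(unbias) above
}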

UPDATE: While I was writing my answer, Konrad Rudolph posted an answer that takes the same approach but gets a very different result. I don't have the reputation to comment on his answer, so I'm going to address it here. First, and most importantly, the code he uses seeds the random number generator with the same seed every time it's run. If you change the seed, you actually get a variety of results. Second, if you don't change the seed but change the number of trials, you also get a variety of results. Try increasing or decreasing the number of trials by an order of magnitude to see what I mean. Third, there is some integer truncation or rounding going on where the expected values aren't quite accurate. It probably isn't enough to make a difference, but it's there.

Basically, in summary, he may simply have landed on a seed and number of trials that happen to give a misleading result.

[1]: http://cpp.sh/2pdtx "here"
[2]: https://rpg.stackexchange.com/questions/70802/how-can-i-test-whether-a-die-is-fair

Solution 5 - C++

One can think of a random number generator as working on a stream of binary digits. The generator turns the stream into numbers by slicing it up into chunks. If the std::rand function is working with a RAND_MAX of 32767, then it is using 15 bits in each slice.

When one takes such a 15-bit number modulo 6, one finds that there are 5462 ways to get a '0' or a '1', but only 5461 ways to get a '2', '3', '4', or '5'. Hence the result is biased. The larger the RAND_MAX value is, the less bias there will be, but it is inescapable.
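
That split is easy to verify by brute force (assuming RAND_MAX is 32767):

#include <iostream>

int main()
{
    int counts[6] = {};
    for (int v = 0; v <= 32767; ++v)   // every value a 15-bit rand() could return
        ++counts[v % 6];
    for (int r = 0; r < 6; ++r)
        std::cout << r << ": " << counts[r] << '\n';
    // prints 5462 for 0 and 1, and 5461 for 2, 3, 4 and 5
}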

What is not biased is a number in the range [0..(2^n)-1]. You can generate a (theoretically) better number in the range 0..5 by extracting 3 bits, converting them to an integer in the range 0..7 and rejecting 6 and 7.
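
A sketch of that bit-extraction idea (the name roll_die_bits is mine; it assumes the individual bits of std::rand() are unbiased and independent, which, as the next paragraph notes, real implementations do not guarantee):

#include <cstdlib>

// Draw 3 bits, interpret them as 0..7, and reject 6 and 7.
int roll_die_bits()
{
    for (;;) {
        int three_bits = std::rand() & 7;   // low 3 bits: a value in 0..7
        if (three_bits < 6)
            return three_bits;              // uniform in 0..5 under the stated assumption
    }
}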

One hopes that every bit in the bit stream has an equal chance of being a '0' or a '1' irrespective of where it is in the stream or the values of other bits. This is exceptionally difficult in practice. The many different implementations of software PRNGs offer different compromises between speed and quality. A linear congruential generator such as std::rand offers fastest speed for lowest quality. A cryptographic generator offers highest quality for lowest speed.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Question: yO_ (View Question on Stackoverflow)
Solution 1 - C++: Pete Becker (View Answer on Stackoverflow)
Solution 2 - C++: Bathsheba (View Answer on Stackoverflow)
Solution 3 - C++: Squeamish Ossifrage (View Answer on Stackoverflow)
Solution 4 - C++: anjama (View Answer on Stackoverflow)
Solution 5 - C++: Simon G. (View Answer on Stackoverflow)