Data structures for loaded dice?

Algorithm Language Agnostic Data Structures Random Probability

Algorithm Problem Overview

Suppose that I have an n-sided loaded die, where each side k has some probability p_k of coming up when I roll it. I’m curious if there is a good data structure for storing this information statically (i.e., for a fixed set of probabilities), so that I can efficiently simulate a random roll of the die.

Currently, I have an O(lg n) solution for this problem. The idea is to store a table of the cumulative probability of the first k sides for all k, then generate a random real number in the range [0, 1) and perform a binary search over the table to get the largest index whose cumulative value is no greater than the chosen value.

I rather like this solution, but it seems odd that the runtime doesn’t take the probabilities into account. In particular, in the extreme cases of one side always coming up or the values being uniformly distributed, it’s possible to generate the result of the roll in O(1) using a naive approach, while my solution will still take logarithmically many steps.

Does anyone have any suggestions for how to solve this problem in a way that is somehow “adaptive” in it’s runtime?

Update: Based on the answers to this question, I have written up an article describing many approaches to this problem, along with their analyses. It looks like Vose’s implementation of the alias method gives Θ(n) preprocessing time and O(1) time per die roll, which is truly impressive. Hopefully this is a useful addition to the information contained in the answers!

Algorithm Solutions

Solution 1 - Algorithm

You are looking for the alias method which provides a O(1) method for generating a fixed discrete probability distribution (assuming you can access entries in an array of length n in constant time) with a one-time O(n) set-up. You can find it documented in chapter 3 (PDF) of "Non-Uniform Random Variate Generation" by Luc Devroye.

The idea is to take your array of probabilities p_k and produce three new n-element arrays, q_k, a_k, and b_k. Each q_k is a probability between 0 and 1, and each a_k and b_k is an integer between 1 and n.

We generate random numbers between 1 and n by generating two random numbers, r and s, between 0 and 1. Let i = floor(r*N)+1. If q_i < s then return a_i else return b_i. The work in the alias method is in figuring out how to produce q_k, a_k and b_k.

Solution 2 - Algorithm

Use a balanced binary search tree (or binary search in an array) and get O(log n) complexity. Have one node for each die result and have the keys be the interval that will trigger that result.

function get_result(node, seed):
    if seed < node.interval.start:
        return get_result(node.left_child, seed)
    else if seed < node.interval.end:
        // start <= seed < end
        return node.result
    else:
        return get_result(node.right_child, seed)

The good thing about this solution is that is very simple to implement but still has good complexity.

Solution 3 - Algorithm

I'm thinking of granulating your table.

Instead of having a table with the cumulative for each die value, you could create an integer array of length xN, where x is ideally a high number to increase accuracy of the probability.

Populate this array using the index (normalized by xN) as the cumulative value and, in each 'slot' in the array, store the would-be dice roll if this index comes up.

Maybe I could explain easier with an example:

Using three dice: P(1) = 0.2, P(2) = 0.5, P(3) = 0.3

Create an array, in this case I will choose a simple length, say 10. (that is, x = 3.33333)

arr[0] = 1,
arr[1] = 1,
arr[2] = 2,
arr[3] = 2,
arr[4] = 2,
arr[5] = 2,
arr[6] = 2,
arr[7] = 3,
arr[8] = 3,
arr[9] = 3

Then to get the probability, just randomize a number between 0 and 10 and simply access that index.

This method might loose accuracy, but increase x and accuracy will be sufficient.

Solution 4 - Algorithm

There are many ways to generate a random integer with a custom distribution (also known as a discrete distribution). The choice depends on many things, including the number of integers to choose from, the shape of the distribution, and whether the distribution will change over time.

One of the simplest ways to choose an integer with a custom weight function f(x) is the rejection sampling method. The following assumes that the highest possible value of f is max and each weight is 0 or greater. The time complexity for rejection sampling is constant on average, but depends greatly on the shape of the distribution and has a worst case of running forever. To choose an integer in [1, k] using rejection sampling:

Choose a uniform random integer i in [1, k].
With probability f(i)/max, return i. Otherwise, go to step 1. (For example, if all the weights are integers greater than 0, choose a uniform random integer in [1, max] and if that number is f(i) or less, return i, or go to step 1 otherwise.)

Other algorithms have an average sampling time that doesn't depend so greatly on the distribution (usually either constant or logarithmic), but often require you to precalculate the weights in a setup step and store them in a data structure. Some of them are also economical in terms of the number of random bits they use on average. Many of these algorithms were introduced after 2011, and they include—

The Bringmann–Larsen succinct data structure ("Succinct Sampling from Discrete Distributions", 2012),
Yunpeng Tang's multi-level search ("An Empirical Study of Random Sampling Methods for Changing Discrete Distributions", 2019), and
the Fast Loaded Dice Roller (2020).

Other algorithms include the alias method (already mentioned in your article), the Knuth–Yao algorithm, the MVN data structure, and more. See my section "Weighted Choice With Replacement" for a survey.

Content Type	Original Author	Original Content on Stackoverflow
Question	templatetypedef	View Question on Stackoverflow
Solution 1 - Algorithm	mhum	View Answer on Stackoverflow
Solution 2 - Algorithm	hugomg	View Answer on Stackoverflow
Solution 3 - Algorithm	andrewjsaid	View Answer on Stackoverflow
Solution 4 - Algorithm	Peter O.	View Answer on Stackoverflow