Generating Discrete random variables with specified weights using SciPy or NumPy

PythonRandomNumpyScipy

Python Problem Overview


I am looking for a simple function that can generate an array of specified random values based on their corresponding (also specified) probabilities. I only need it to generate float values, but I don't see why it shouldn't be able to generate any scalar. I can think of many ways of building this from existing functions, but I think I probably just missed an obvious SciPy or NumPy function.

E.g.:

>>> values = [1.1, 2.2, 3.3]
>>> probabilities = [0.2, 0.5, 0.3]
>>> print some_function(values, probabilities, size=10)
(2.2, 1.1, 3.3, 3.3, 2.2, 2.2, 1.1, 2.2, 3.3, 2.2)

Note: I found scipy.stats.rv_discrete but I don't understand how it works. Specifically, I do not understand what this (below) means nor what it should do:

numargs = generic.numargs
[ <shape(s)> ] = ['Replace with resonable value', ]*numargs

If rv_discrete is what I should be using, could you please provide me with a simple example and an explanation of the above "shape" statement?

Python Solutions


Solution 1 - Python

Drawing from a discrete distribution is directly built into numpy. The function is called random.choice (difficult to find without any reference to discrete distributions in the numpy docs).

elements = [1.1, 2.2, 3.3]
probabilities = [0.2, 0.5, 0.3]
np.random.choice(elements, 10, p=probabilities)

Solution 2 - Python

Here is a short, relatively simple function that returns weighted values, it uses NumPy's digitize, accumulate, and random_sample.

import numpy as np
from numpy.random import random_sample

def weighted_values(values, probabilities, size):
    bins = np.add.accumulate(probabilities)
    return values[np.digitize(random_sample(size), bins)]

values = np.array([1.1, 2.2, 3.3])
probabilities = np.array([0.2, 0.5, 0.3])

print weighted_values(values, probabilities, 10)
#Sample output:
[ 2.2  2.2  1.1  2.2  2.2  3.3  3.3  2.2  3.3  3.3]

It works like this:

  1. First using accumulate we create bins.
  2. Then we create a bunch of random numbers (between 0, and 1) using random_sample
  3. We use digitize to see which bins these numbers fall into.
  4. And return the corresponding values.

Solution 3 - Python

You were going in a good direction: the built-in scipy.stats.rv_discrete() quite directly creates a discrete random variable. Here is how it works:

>>> from scipy.stats import rv_discrete  

>>> values = numpy.array([1.1, 2.2, 3.3])
>>> probabilities = [0.2, 0.5, 0.3]

>>> distrib = rv_discrete(values=(range(len(values)), probabilities))  # This defines a Scipy probability distribution

>>> distrib.rvs(size=10)  # 10 samples from range(len(values))
array([1, 2, 0, 2, 2, 0, 2, 1, 0, 2])

>>> values[_]  # Conversion to specific discrete values (the fact that values is a NumPy array is used for the indexing)
[2.2, 3.3, 1.1, 3.3, 3.3, 1.1, 3.3, 2.2, 1.1, 3.3]

The distribution distrib above thus returns indexes from the values list.

More generally, rv_discrete() takes a sequence of integer values in the first elements of its values=(…,…) argument, and returns these values, in this case; there is no need to convert to specific (float) values. Here is an example:

>>> values = [10, 20, 30]
>>> probabilities = [0.2, 0.5, 0.3]
>>> distrib = rv_discrete(values=(values, probabilities))
>>> distrib.rvs(size=10)
array([20, 20, 20, 20, 20, 20, 20, 30, 20, 20])

where (integer) input values are directly returned with the desired probability.

Solution 4 - Python

The simplest DIY way would be to sum up the probabilities into a cumulative distribution. This way, you split the unit interval into sub-intervals of the length equal to your original probabilities. Now generate a single random number uniform on [0,1), and and see to which interval it lands.

Solution 5 - Python

You could also use Lea, a pure Python package dedicated to discrete probability distributions.

>>> distrib = Lea.fromValFreqs((1.1,2),(2.2,5),(3.3,3))
>>> distrib
1.1 : 2/10
2.2 : 5/10
3.3 : 3/10
>>> distrib.random(10)
(2.2, 2.2, 1.1, 2.2, 2.2, 2.2, 1.1, 3.3, 1.1, 3.3)

Et voilà!

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionTimYView Question on Stackoverflow
Solution 1 - PythongoebbeView Answer on Stackoverflow
Solution 2 - PythonfraxelView Answer on Stackoverflow
Solution 3 - PythonEric O LebigotView Answer on Stackoverflow
Solution 4 - Pythonev-brView Answer on Stackoverflow
Solution 5 - PythonPierre DenisView Answer on Stackoverflow