Weighted random numbers

C++ Problem Overview

I'm trying to implement weighted random numbers. I'm currently just banging my head against the wall and cannot figure this out.

In my project (Hold'em hand-ranges, subjective all-in equity analysis), I'm using Boost's random functions. So, let's say I want to pick a random number between 1 and 3 (so either 1, 2 or 3). Boost's Mersenne Twister generator works like a charm for this. However, I want the pick to be weighted, for example like this:

1 (weight: 90)
2 (weight: 56)
3 (weight:  4)

Does Boost have some sort of functionality for this?

C++ Solutions

Solution 1 - C++

There is a straightforward algorithm for picking an item at random, where items have individual weights:

calculate the sum of all the weights
pick a random number that is 0 or greater and is less than the sum of the weights
go through the items one at a time, subtracting their weight from your random number, until you get the item where the random number is less than that item's weight

Pseudo-code illustrating this:

int sum_of_weight = 0;
for(int i=0; i<num_choices; i++) {
   sum_of_weight += choice_weight[i];
}
int rnd = random(sum_of_weight);
for(int i=0; i<num_choices; i++) {
  if(rnd < choice_weight[i])
    return i;
  rnd -= choice_weight[i];
}
assert(!"should never get here");

This should be straightforward to adapt to your boost containers and such.
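The pseudo-code above can be sketched in standard C++11 roughly as follows. The function name pick_weighted and the use of std::vector<int> for the weights are assumptions made for illustration; the original answer does not prescribe either.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: returns the index of the chosen weight.
// Assumes non-negative integer weights whose sum is positive.
template <class Gen>
std::size_t pick_weighted(const std::vector<int>& weights, Gen& gen)
{
    const int sum_of_weight = std::accumulate(weights.begin(), weights.end(), 0);
    assert(sum_of_weight > 0);

    // Uniform integer in [0, sum_of_weight - 1].
    std::uniform_int_distribution<int> dist(0, sum_of_weight - 1);
    int rnd = dist(gen);

    // Walk the items, subtracting each weight until rnd falls inside one.
    for (std::size_t i = 0; i < weights.size(); ++i) {
        if (rnd < weights[i])
            return i;
        rnd -= weights[i];
    }
    assert(!"should never get here");
    return weights.size() - 1;
}
```

With the weights from the question, pick_weighted(std::vector<int>{90, 56, 4}, gen) returns 0, 1 or 2; add 1 to get the values 1 to 3.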

If your weights rarely change but you pick at random often, and your container stores pointers to the objects or holds more than a few dozen items (basically, you have to profile to know if this helps or hinders), then there is an optimisation:

By storing the cumulative weight sum in each item you can use a binary search to pick the item corresponding to the pick weight.
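A minimal sketch of that optimisation, assuming integer weights kept in a std::vector. The class name CumulativePicker is hypothetical; here the running totals live in a separate vector rather than in each item, which is one of several ways to arrange it.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <vector>

// Hypothetical helper: precomputes running totals once, then answers
// each pick with a binary search over them.
class CumulativePicker
{
public:
    explicit CumulativePicker(const std::vector<int>& weights)
        : cumulative_(weights.size())
    {
        // cumulative_[i] = weights[0] + ... + weights[i]
        std::partial_sum(weights.begin(), weights.end(), cumulative_.begin());
        assert(!cumulative_.empty() && cumulative_.back() > 0);
    }

    template <class Gen>
    std::size_t pick(Gen& gen) const
    {
        // Uniform integer in [0, total - 1].
        std::uniform_int_distribution<int> dist(0, cumulative_.back() - 1);
        const int rnd = dist(gen);
        // The chosen item is the first running total strictly greater than rnd.
        return static_cast<std::size_t>(
            std::upper_bound(cumulative_.begin(), cumulative_.end(), rnd)
            - cumulative_.begin());
    }

private:
    std::vector<int> cumulative_;
};
```

Each pick is O(log N), but any change to a weight means rebuilding the running totals, which is why this only pays off when weights change rarely.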

If you do not know the number of items in the list, then there's a very neat algorithm called reservoir sampling that can be adapted to be weighted.
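One weighted variant for picking a single item in one pass can be sketched like this: keep a running total, and let each new item replace the current pick with probability weight / total. The function name reservoir_pick is an assumption, and this is only one of several weighted reservoir schemes.

```cpp
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: reads weights from [first, last) exactly once,
// without needing to know how many there are, and returns the index of
// the pick. Assumes non-negative weights with at least one positive.
template <class InputIt, class Gen>
std::size_t reservoir_pick(InputIt first, InputIt last, Gen& gen)
{
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    double total = 0.0;
    std::size_t chosen = 0;
    std::size_t index = 0;
    for (; first != last; ++first, ++index) {
        const double w = *first;
        total += w;
        // Replace the current pick with probability w / total.
        if (w > 0.0 && unit(gen) * total < w)
            chosen = index;
    }
    assert(total > 0.0);
    return chosen;
}
```

Item i survives as the final pick with probability w_i / total, because each later item j leaves it in place with probability (total_j - w_j) / total_j and those factors telescope.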

Solution 2 - C++

Updated answer to an old question. You can easily do this in C++11 with just the std::lib:

#include <iostream>
#include <random>
#include <iterator>
#include <ctime>
#include <type_traits>
#include <cassert>

int main()
{
    // Set up distribution
    double interval[] = {1,   2,   3,   4};
    double weights[] =  {  .90, .56, .04};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));
    // Choose generator
    std::mt19937 gen(std::time(0));  // seed as wanted
    // Demonstrate with N randomly generated numbers
    const unsigned N = 1000000;
    // Collect number of times each random number is generated
    double avg[std::extent<decltype(weights)>::value] = {0};
    for (unsigned i = 0; i < N; ++i)
    {
        // Generate random number using gen, distributed according to dist
        unsigned r = static_cast<unsigned>(dist(gen));
        // Sanity check
        assert(interval[0] <= r && r <= *(std::end(interval)-2));
        // Save r for statistical test of distribution
        avg[r - 1]++;
    }
    // Compute averages for distribution
    for (double* i = std::begin(avg); i < std::end(avg); ++i)
        *i /= N;
    // Display distribution
    for (unsigned i = 1; i <= std::extent<decltype(avg)>::value; ++i)
        std::cout << "avg[" << i << "] = " << avg[i-1] << '\n';
}

Output on my system:

avg[1] = 0.600115
avg[2] = 0.373341
avg[3] = 0.026544

Note that most of the code above is devoted to just displaying and analyzing the output. The actual generation is just a few lines of code. The output demonstrates that the requested "probabilities" have been obtained. To compare, divide each requested weight by 1.5, since that is what the weights add up to (for example, .90 / 1.5 = .60).
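The division by 1.5 can also be checked without sampling, since the distribution exposes its normalized densities through the standard densities() member. A small sketch; the function name normalized_interval_weights is hypothetical.

```cpp
#include <cassert>
#include <iterator>
#include <random>
#include <vector>

// Hypothetical helper: builds the same distribution as above and returns
// the density of each interval after normalization.
std::vector<double> normalized_interval_weights()
{
    double interval[] = {1, 2, 3, 4};
    double weights[] = {.90, .56, .04};
    std::piecewise_constant_distribution<> dist(std::begin(interval),
                                                std::end(interval),
                                                std::begin(weights));
    // Every interval has width 1, so each density equals the probability
    // of landing in that interval: weight / 1.5.
    std::vector<double> densities = dist.densities();
    assert(densities.size() == 3);
    return densities;
}
```

The three values should come out at about 0.6, 0.3733 and 0.0267, matching the sampled averages.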

Solution 3 - C++

If your weights change more slowly than they are drawn, C++11 discrete_distribution is going to be the easiest:

#include <ctime>
#include <iterator>
#include <random>
#include <vector>

int main()
{
    std::vector<double> weights{90,56,4};
    // dist yields the indices 0, 1, 2; add 1 if you want the values 1..3
    std::discrete_distribution<int> dist(std::begin(weights), std::end(weights));
    std::mt19937 gen;
    gen.seed(std::time(0));//if you want different results from different runs
    int N = 100000;
    std::vector<int> samples(N);
    for(auto & i: samples)
        i = dist(gen);
    //do something with your samples...
}

Note, however, that the C++11 discrete_distribution computes all of the cumulative sums on initialization. Usually you want that, because it speeds up sampling for a one-time O(N) cost. But for a rapidly changing distribution it will incur a heavy calculation (and memory) cost. For instance, if the weights represent how many items there are, and every time you draw one you remove it, you will probably want a custom algorithm.

Will's answer (Solution 1, https://stackoverflow.com/a/1761646/837451) avoids this overhead but will be slower to draw from than the C++11 distribution because it can't use binary search.
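For the draw-and-remove case, a custom algorithm might look like the following sketch, which uses a linear scan over remaining counts so that nothing has to be rebuilt between draws. The struct name Urn and its members are assumptions for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <random>
#include <utility>
#include <vector>

// Hypothetical helper: counts[i] is how many copies of item i remain.
struct Urn
{
    std::vector<int> counts;
    int total;

    explicit Urn(std::vector<int> c)
        : counts(std::move(c)),
          total(std::accumulate(counts.begin(), counts.end(), 0))
    {
    }

    // Draws one item with probability proportional to its remaining
    // count, removes that copy, and returns the item's index.
    template <class Gen>
    std::size_t draw(Gen& gen)
    {
        assert(total > 0);
        std::uniform_int_distribution<int> dist(0, total - 1);
        int rnd = dist(gen);
        std::size_t i = 0;
        // rnd < total, so this loop always stops inside the vector.
        while (rnd >= counts[i]) {
            rnd -= counts[i];
            ++i;
        }
        --counts[i];
        --total;
        return i;
    }
};
```

Each draw costs O(N) for the scan but only O(1) to update, whereas rebuilding a discrete_distribution after every removal would redo the whole table each time.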

To see that discrete_distribution precomputes the cumulative sums, look at the relevant lines (/usr/include/c++/5/bits/random.tcc on my Ubuntu 16.04 + GCC 5.3 install):

  template<typename _IntType>
    void
    discrete_distribution<_IntType>::param_type::
    _M_initialize()
    {
      if (_M_prob.size() < 2)
        {
          _M_prob.clear();
          return;
        }

      const double __sum = std::accumulate(_M_prob.begin(),
                                           _M_prob.end(), 0.0);
      // Now normalize the probabilites.
      __detail::__normalize(_M_prob.begin(), _M_prob.end(), _M_prob.begin(),
                            __sum);
      // Accumulate partial sums.
      _M_cp.reserve(_M_prob.size());
      std::partial_sum(_M_prob.begin(), _M_prob.end(),
                       std::back_inserter(_M_cp));
      // Make sure the last cumulative probability is one.
      _M_cp[_M_cp.size() - 1] = 1.0;
    }
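The effect of that normalization step can be observed from user code through the standard probabilities() member. A small sketch; the function name normalized_probabilities is hypothetical.

```cpp
#include <cassert>
#include <random>
#include <vector>

// Hypothetical helper: returns the weights as discrete_distribution
// stores them, i.e. scaled so that they sum to 1.
std::vector<double> normalized_probabilities(const std::vector<double>& weights)
{
    assert(!weights.empty());
    std::discrete_distribution<int> dist(weights.begin(), weights.end());
    return dist.probabilities();
}
```

For the weights 90, 56 and 4 this should give roughly 0.6, 0.3733 and 0.0267.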

Solution 4 - C++

When I need to weight numbers, I use a random number to decide the outcome.

For example: I need to generate random numbers from 1 to 3 with the following weights:

10% of the random numbers should be 1
30% of the random numbers should be 2
60% of the random numbers should be 3

Then I use:

weight = rand() % 10;

switch( weight ) {

    case 0:
        randomNumber = 1;
        break;
    case 1:
    case 2:
    case 3:
        randomNumber = 2;
        break;
    case 4:
    case 5:
    case 6:
    case 7:
    case 8:
    case 9:
        randomNumber = 3;
        break;
}

With this, the result has a 10% probability of being 1, 30% of being 2 and 60% of being 3.

You can adjust it to your needs.

Hope I could help you, Good Luck!
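The same bucket idea can be written against the <random> header instead of rand(). This is only a sketch; the function name weighted_1_to_3 is an assumption.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: returns 1, 2 or 3 with probabilities 10%, 30%, 60%.
template <class Gen>
int weighted_1_to_3(Gen& gen)
{
    // Ten equally likely buckets, numbered 0..9.
    std::uniform_int_distribution<int> dist(0, 9);
    const int weight = dist(gen);
    assert(weight >= 0 && weight <= 9);

    if (weight < 1) return 1;  // bucket 0        -> 10%
    if (weight < 4) return 2;  // buckets 1, 2, 3 -> 30%
    return 3;                  // buckets 4..9    -> 60%
}
```

Using uniform_int_distribution avoids the slight bias that rand() % 10 has whenever RAND_MAX + 1 is not a multiple of 10.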

Solution 5 - C++

Build a bag (or std::vector) of all the items that can be picked.
Make sure that the number of each item is proportional to your weighting.

Example:

1 60%
2 35%
3 5%

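So have a bag with 100 items: 60 1's, 35 2's and 5 3's.
Now randomly shuffle the bag (std::random_shuffle, or std::shuffle in newer code, since std::random_shuffle was deprecated in C++14 and removed in C++17).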

Pick elements from the bag sequentially until it is empty.
Once empty re-randomize the bag and start again.
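A minimal sketch of such a bag using std::shuffle. The class name ShuffleBag and its interface are assumptions; the values are taken to be 1, 2, 3, ... in the order the counts are given.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical helper: a bag that is reshuffled each time it runs out.
class ShuffleBag
{
public:
    // counts[i] copies of the value i + 1 go into the bag.
    ShuffleBag(const std::vector<int>& counts, unsigned seed) : gen_(seed)
    {
        for (std::size_t i = 0; i < counts.size(); ++i)
            bag_.insert(bag_.end(), static_cast<std::size_t>(counts[i]),
                        static_cast<int>(i) + 1);
        assert(!bag_.empty());
        next_ = bag_.size();  // forces a shuffle on the first call to next()
    }

    // Returns the next element, reshuffling once the bag is exhausted.
    int next()
    {
        if (next_ == bag_.size()) {
            std::shuffle(bag_.begin(), bag_.end(), gen_);
            next_ = 0;
        }
        return bag_[next_++];
    }

private:
    std::vector<int> bag_;
    std::size_t next_ = 0;
    std::mt19937 gen_;
};
```

Unlike independent draws, every full pass over the bag yields exactly the configured counts, so the weighting is exact over each block of 100 picks rather than only on average.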

Solution 6 - C++

Choose a random number on [0,1); with Boost, boost::uniform_01 can produce this from a generator. Then choose the first item whose cumulative probability exceeds that number:

template <class It,class P>
It choose_p(It begin,It end,P const& p)
{
    if (begin==end) return end;
    double sum=0.;
    for (It i=begin;i!=end;++i)
        sum+=p(*i);
    double choice=sum*random01();
    for (It i=begin;;) {
        choice -= p(*i);
        It r=i;
        ++i;
        if (choice<0 || i==end) return r;
    }
    return begin; //unreachable
}

Where random01() returns a double >=0 and <1. Note that the above doesn't require the probabilities to sum to 1; it normalizes them for you.

p is just a function assigning a probability to an item in the collection [begin,end). You can omit it (or use an identity) if you just have a sequence of probabilities.
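The two pieces that choose_p leaves open could be filled in along these lines. Both the body of random01() and the PairWeight functor are assumptions for illustration, not part of the original answer.

```cpp
#include <cassert>
#include <random>
#include <utility>

// One possible random01(): a double in [0, 1) drawn from a
// function-local Mersenne Twister seeded once from std::random_device.
inline double random01()
{
    static std::mt19937 gen(std::random_device{}());
    static std::uniform_real_distribution<double> unit(0.0, 1.0);
    return unit(gen);
}

// Hypothetical weight functor for a sequence of (item, weight) pairs:
// it reports the second member as the item's weight.
struct PairWeight
{
    template <class A, class B>
    double operator()(const std::pair<A, B>& item) const
    {
        assert(item.second >= 0);  // weights must not be negative
        return static_cast<double>(item.second);
    }
};
```

With these in scope, a call such as choose_p(v.begin(), v.end(), PairWeight()) on a std::vector<std::pair<int, int> > holding (1, 90), (2, 56), (3, 4) would return an iterator to the chosen pair.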

Solution 7 - C++

This is my understanding of a "weighted random", I've been using this recently. (Code is in Python but can be implemented in other langs)

Let's say you want to pick a random person and they don't have equal chances of being selected. You can give each person a "weight" or "chance" value:

import math
import random

choices = [("Ade", 60), ("Tope", 50), ("Maryamu", 30)]

You use their weights to calculate a score for each, then find the choice with the highest score:

highest = [None, 0]
for p in choices:
	score = math.floor(random.random() * p[1])
	if score > highest[1]:
		highest[0] = p
		highest[1] = score

print(highest)

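For Ade the highest score they can get is just under 60, for Tope just under 50 and so on, meaning that Ade has a higher chance of generating the largest score than the rest.

You can use any range of weights; the greater the difference, the more skewed the distribution. E.g. if Ade had a weight of 1000 they would almost always be chosen. Note that this does not select each person with probability proportional to their weight: the larger weights are favoured more strongly than that.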

Test

votes = [{"name": "Ade", "votes": 0}, {"name": "Tope", "votes": 0}, {"name": "Maryamu", "votes": 0}]
for v in range(100):
    highest = [None, 0]
    for p in choices:
        score = math.floor(random.random() * p[1])
        if score > highest[1]:
            highest[0] = p
            highest[1] = score

    if highest[0] is None:  # every score rounded down to 0, nobody wins this round
        continue
    candidate = choices.index(highest[0])  # get index of person
    votes[candidate]["votes"] += 1  # increase vote count
print(votes)

# votes printed at the end. your results might be different
[{"name": "Ade", "votes": 45}, {"name": "Tope", "votes": 30}, {"name": "Maryamu", "votes": 25}]

Issues

It looks like the more voters there are, the more predictable the results. Welp.

Hope this gives someone an idea...
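Since the rest of this page is C++, here is a rough port of the highest-score idea. The function name pick_highest_score and the choice to return -1 when no score exceeds 0 are assumptions made for this sketch.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Hypothetical port of the scoring loop: each entry gets a score of
// floor(u * weight) with u uniform in [0, 1), and the highest score wins.
// Returns the winning index, or -1 if every score rounded down to 0.
template <class Gen>
int pick_highest_score(const std::vector<int>& weights, Gen& gen)
{
    std::uniform_real_distribution<double> unit(0.0, 1.0);
    int best = -1;
    double highest = 0.0;
    for (std::size_t i = 0; i < weights.size(); ++i) {
        assert(weights[i] >= 0);
        const double score = std::floor(unit(gen) * weights[i]);
        // Strictly greater, so ties go to the earlier entry.
        if (score > highest) {
            best = static_cast<int>(i);
            highest = score;
        }
    }
    return best;
}
```

Calling it repeatedly with the weights 60, 50 and 30 and tallying the returned indices reproduces the vote count experiment.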

Solution 8 - C++

I have just implemented the solution given by Will (Solution 1):

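#include <cstdlib>
#include <iostream>
#include <map>
#include <stdexcept>
#include <utility>

using namespace std;


template < class T >
class WeightedRandomSample
{
public:
    void SetWeigthMap( map< T , unsigned int >& WeightMap )
    {
        m_pMap = &WeightMap;
    }
    
    T GetRandomSample()
    {
        unsigned int sum_of_weight = GetSumOfWeights();
        // No map set, or all weights are zero: rand() % 0 would be undefined.
        if(sum_of_weight == 0)
            throw runtime_error("WeightedRandomSample: no positive weights");
        unsigned int rnd = (rand() % sum_of_weight);
        map<T , unsigned int>& w_map = *m_pMap;
        typename map<T , unsigned int>::iterator it;
        for(it = w_map.begin() ; it != w_map.end() ; ++it )
        {
            unsigned int w = it->second;
            if(rnd < w)
                return (it->first);
            rnd -= w;
        }
        // Unreachable while rnd < sum_of_weight; throw instead of
        // dereferencing a null pointer.
        throw logic_error("should never get here");
    }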
    
    unsigned int GetSumOfWeights()
    {
        if(m_pMap == NULL)
            return 0;
        unsigned int sum = 0;
        map<T , unsigned int>& w_map = *m_pMap;
        typename map<T , unsigned int>::iterator it;
        
        for(it = w_map.begin() ; it != w_map.end() ; ++it )
        {
            sum += it->second;
        }
        return sum;
    }

    
protected:
    map< T , unsigned int>* m_pMap = NULL;
    
};

typedef pair<int , int> PAIR_INT_INT;
typedef map<PAIR_INT_INT ,unsigned int> mul_table_weighted_map;

int main()
{
    
    mul_table_weighted_map m;
    m[PAIR_INT_INT(2,3)] = 10;
    m[PAIR_INT_INT(4,5)] = 20;
    m[PAIR_INT_INT(2,5)] = 10;
    
    WeightedRandomSample<PAIR_INT_INT> WRS;
    WRS.SetWeigthMap(m);
    unsigned int sum_of_weight = WRS.GetSumOfWeights();
    cout <<"Sum of weights : " << sum_of_weight << endl;
    
    unsigned int number_of_test = 10000;
    cout << "testing " << number_of_test << " ..." << endl;
    map<PAIR_INT_INT , unsigned int> check_map;
    for(unsigned int i = 0 ; i < number_of_test ; i++)
    {
        PAIR_INT_INT res = WRS.GetRandomSample();
        check_map[res]++;
        //cout << i+1 << ": random = " << res.first << " * " << res.second << endl;
    }
    cout << "results: " << endl;
    
    for(auto t : check_map)
    {
        PAIR_INT_INT p = t.first;
        unsigned int expected = (number_of_test * m[p]) / sum_of_weight;
        cout << " pair " << p.first << " * " << p.second 
            << ", counted = " << t.second
            << ", expected = " << expected
            << endl;
    }

    return 0;
}

Content Type	Original Author	Original Content on Stackoverflow
Question	nhaa123	View Question on Stackoverflow
Solution 1 - C++	Will	View Answer on Stackoverflow
Solution 2 - C++	Howard Hinnant	View Answer on Stackoverflow
Solution 3 - C++	mmdanziger	View Answer on Stackoverflow
Solution 4 - C++	Chirry	View Answer on Stackoverflow
Solution 5 - C++	Martin York	View Answer on Stackoverflow
Solution 6 - C++	Jonathan Graehl	View Answer on Stackoverflow
Solution 7 - C++	LeanKhan	View Answer on Stackoverflow
Solution 8 - C++	mohtashami740	View Answer on Stackoverflow