Understanding boost::disjoint_sets

Tags: C++, Boost, Disjoint Sets

C++ Problem Overview


I need to use boost::disjoint_sets, but the documentation is unclear to me. Can someone please explain what each template parameter means, and perhaps give a small code example for creating a disjoint_sets?

As requested: I am using disjoint_sets to implement Tarjan's off-line least common ancestors algorithm, i.e. the value type should be vertex_descriptor.

C++ Solutions


Solution 1 - C++

What I can understand from the documentation:

Disjoint sets need to associate a rank and a parent (in the forest of trees) with each element. Since you might want to work with any kind of data, you may not always want to use a map for the parent: with integers, for example, an array is sufficient. You also need a rank for each element (the rank is needed for the union step of union-find).

You'll need two "properties":

  • one to associate an integer with each element (first template argument): the rank
  • one to associate an element with another one (second template argument): the parent

For example:

std::vector<int>  rank (100);
std::vector<int>  parent (100);
boost::disjoint_sets<int*,int*> ds(&rank[0], &parent[0]);
// Call ds.make_set(i) for each element i before using union_set or find_set.

Plain arrays are used here (&rank[0], &parent[0]), so the types in the template are int*.

For a more complex example (using maps) you can look at Ugo's answer.

You are just giving the algorithm two structures in which to store the data (rank/parent) it needs.

Solution 2 - C++

disjoint_sets<Rank, Parent, FindCompress>
  • Rank: PropertyMap used to store the rank of each element (element -> integer, e.g. std::size_t). See union by rank.
  • Parent: PropertyMap used to store the parent of an element (element -> element). See path compression.
  • FindCompress: optional argument defining the find method. Defaults to find_with_full_path_compression, which should be what you need.

Example:

template <typename Rank, typename Parent>
void algo(Rank& r, Parent& p, std::vector<Element>& elements)
{
  boost::disjoint_sets<Rank, Parent> dsets(r, p);
  // Put every element in its own singleton set.
  for (std::vector<Element>::iterator e = elements.begin();
       e != elements.end(); ++e)
    dsets.make_set(*e);
  ...
}

int main()
{
  std::vector<Element> elements;
  elements.push_back(Element(...));
  ...

  typedef std::map<Element,std::size_t> rank_t; // => order on Element
  typedef std::map<Element,Element> parent_t;
  rank_t rank_map;
  parent_t parent_map;

  boost::associative_property_map<rank_t>   rank_pmap(rank_map);
  boost::associative_property_map<parent_t> parent_pmap(parent_map);

  algo(rank_pmap, parent_pmap, elements);
}

Note that "The Boost Property Map Library contains a few adaptors that convert commonly used data-structures that implement a mapping operation, such as builtin arrays (pointers), iterators, and std::map, to have the property map interface"

The list of these adaptors (like boost::associative_property_map) can be found in the Boost Property Map Library documentation.

Solution 3 - C++

For those of you who can't afford the overhead of std::map (or can't use it because your class has no default constructor), but whose data is not as simple as int, I wrote a guide to a solution using std::vector, which is close to optimal when you know the total number of elements beforehand.

The guide includes fully working sample code that you can download and test on your own.

The solution mentioned there assumes you have control of the class' code so that in particular you can add some attributes. If this is still not possible, you can always add a wrapper around it:

class Wrapper {
    UntouchableClass const& mInstance;
    size_t dsID;
    size_t dsRank;
    size_t dsParent;
};

Moreover, if you know the number of elements to be small, there's no need for size_t. In that case you can make the integer type a template parameter (UnsignedInt) and instantiate it with uint8_t, uint16_t, uint32_t or uint64_t, which you can get from <cstdint> in C++11 or from <boost/cstdint.hpp> otherwise.

template <typename UnsignedInt>
class Wrapper {
    UntouchableClass const& mInstance;
    UnsignedInt dsID;
    UnsignedInt dsRank;
    UnsignedInt dsParent;
};

Here's the link to the guide: http://janoma.cl/post/using-disjoint-sets-with-a-vector/

Solution 4 - C++

I wrote a simple implementation a while ago. Have a look.

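#include <iostream>
#include <utility>  // swap
#include <vector>
using namespace std;

struct DisjointSet {
    vector<int> parent;
    vector<int> size;

    // Every element 0 .. maxSize-1 starts as the root of its own set.
    DisjointSet(int maxSize) {
        parent.resize(maxSize);
        size.resize(maxSize);
        for (int i = 0; i < maxSize; i++) {
            parent[i] = i;
            size[i] = 1;
        }
    }

    // Returns the root of v's set, compressing the path on the way back.
    int find_set(int v) {
        if (v == parent[v])
            return v;
        return parent[v] = find_set(parent[v]);
    }

    // Union by size: the smaller tree is attached under the larger one.
    void union_set(int a, int b) {
        a = find_set(a);
        b = find_set(b);
        if (a != b) {
            if (size[a] < size[b])
                swap(a, b);
            parent[b] = a;
            size[a] += size[b];
        }
    }
};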

And the usage goes like this. It's simple, isn't it?

void solve() {
    int n;
    cin >> n;          // the unions below assume n > 7
    DisjointSet S(n);  // Initializing with maximum size: elements 0 .. n-1
    S.union_set(1, 2);
    S.union_set(3, 7);
    int parent = S.find_set(1);  // root of the set containing 1
}
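Building on the struct above, a few queries that often come up can be sketched as free functions. The helpers same_set, count_sets and set_size are made up for illustration and are not part of the original answer; the struct is repeated in compact form only so that the snippet compiles on its own.

```cpp
#include <utility>
#include <vector>

// Compact copy of the DisjointSet struct (same behaviour).
struct DisjointSet {
    std::vector<int> parent, size;

    explicit DisjointSet(int n) : parent(n), size(n, 1) {
        for (int i = 0; i < n; ++i) parent[i] = i;
    }
    int find_set(int v) {
        if (v == parent[v]) return v;
        return parent[v] = find_set(parent[v]);
    }
    void union_set(int a, int b) {
        a = find_set(a);
        b = find_set(b);
        if (a == b) return;
        if (size[a] < size[b]) std::swap(a, b);
        parent[b] = a;
        size[a] += size[b];
    }
};

// True when a and b have the same root.
bool same_set(DisjointSet& s, int a, int b) {
    return s.find_set(a) == s.find_set(b);
}

// Number of sets: every root is its own parent.
int count_sets(const DisjointSet& s) {
    int roots = 0;
    for (int v = 0; v < static_cast<int>(s.parent.size()); ++v)
        if (s.parent[v] == v) ++roots;
    return roots;
}

// Number of elements in the set containing v (stored at the root).
int set_size(DisjointSet& s, int v) {
    return s.size[s.find_set(v)];
}
```

The size array is only meaningful at roots, which is why set_size looks up the root first.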

Solution 5 - C++

Loic's answer looks good to me, but I needed to initialize the parent so that each element had itself as parent, so I used the iota function to generate an increasing sequence starting from 0.

I'm using Boost, and for simplicity I included bits/stdc++.h and used using namespace std.

#include <bits/stdc++.h>

#include <boost/pending/disjoint_sets.hpp>
using namespace std;

int main() {
  array<int, 100> rank{};  // ranks must start at 0
  array<int, 100> parent;

  iota(parent.begin(), parent.end(), 0);  // each element is its own parent
  boost::disjoint_sets<int*, int*> ds(rank.data(), parent.data());

  ds.union_set(1, 2);
  ds.union_set(1, 3);
  ds.union_set(1, 4);

  cout << ds.find_set(1) << endl;  // 1 or 2 or 3 or 4
  cout << ds.find_set(2) << endl;  // 1 or 2 or 3 or 4
  cout << ds.find_set(3) << endl;  // 1 or 2 or 3 or 4
  cout << ds.find_set(4) << endl;  // 1 or 2 or 3 or 4
  cout << ds.find_set(5) << endl;  // 5
  cout << ds.find_set(6) << endl;  // 6
}

I changed std::vector to std::array because pushing elements to a vector can make it reallocate its storage, which invalidates the pointers held by the disjoint sets object.

As far as I know, it's not guaranteed that the parent will be a specific number, so that's why I wrote 1 or 2 or 3 or 4 (it can be any of these). Maybe the documentation explains in more detail which number will be chosen as leader of the set (I haven't studied it).

In my case, the output is:

2
2
2
2
5
6

It seems simple, though it can probably be improved to make it more robust (somehow).

Note: std::iota fills the range [first, last) with sequentially increasing values, starting with value and repetitively evaluating ++value. More: https://en.cppreference.com/w/cpp/algorithm/iota

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Amir Rachum | View Question on Stack Overflow
Solution 1 - C++ | Loïc Février | View Answer on Stack Overflow
Solution 2 - C++ | log0 | View Answer on Stack Overflow
Solution 3 - C++ | Janoma | View Answer on Stack Overflow
Solution 4 - C++ | Appaji Chintimi | View Answer on Stack Overflow
Solution 5 - C++ | Chris Vilches | View Answer on Stack Overflow