std::set with user defined type, how to ensure no duplicates

C++, Set

C++ Problem Overview


So I have a std::set which needs to keep a specific ordering and also reject duplicates of a user-defined (by me) type. I can get the ordering to work correctly by overloading operator< in my type. However, the set does not detect duplicates the way I expect, and to be honest I'm not entirely sure how it does this internally. I have overloaded operator==, but I suspect this is not what the set is actually using. So the question is: how does the set determine duplicates when you add values? Here is the relevant code:

The user defined type:

//! An element used in the route calculation.
struct RouteElem {
	int shortestToHere; // Shortest distance from the start.
	int heuristic;		// The heuristic estimate to the goal.
	Coordinate position;
	bool operator<( const RouteElem& other ) const
	{
		return (heuristic+shortestToHere) < (other.heuristic+other.shortestToHere);
	}
	bool operator==( const RouteElem& other ) const
	{
		return (position.x == other.position.x && position.y == other.position.y);
	}
};

So the elements are equivalent when their position is equivalent, and an element is less than another if its combined functional is less than that of the other. The sorting works, but the set will accept two elements of the same position.

C++ Solutions


Solution 1 - C++

operator== is not used by std::set. Elements a and b are considered equal iff !(a < b) && !(b < a), where < is the set's comparison (operator< by default).
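
A minimal sketch of this rule, using a simplified stand-in for the question's type. The names `Elem`, `eqCalls`, `equivalent` and `demoEquivalence` are hypothetical and exist only for illustration. Two elements with different positions but the same total cost are treated as duplicates, and operator== is never called:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Hypothetical stand-in for RouteElem: `cost` plays the role of
// heuristic + shortestToHere, and (x, y) plays the role of position.
struct Elem {
    int cost;
    int x, y;
    static int eqCalls;  // counts calls to operator== (illustration only)

    bool operator<(const Elem& other) const { return cost < other.cost; }
    bool operator==(const Elem& other) const {
        ++eqCalls;
        return x == other.x && y == other.y;
    }
};
int Elem::eqCalls = 0;

// Equivalence as std::set sees it: neither element is less than the other.
bool equivalent(const Elem& a, const Elem& b) {
    return !(a < b) && !(b < a);
}

// Inserts two elements with equal cost but different positions and
// returns the resulting set size.
std::size_t demoEquivalence() {
    std::set<Elem> s;
    s.insert(Elem{5, 0, 0});
    const bool inserted = s.insert(Elem{5, 7, 7}).second;
    assert(!inserted);  // rejected: equivalent to the first under operator<
    (void)inserted;     // avoid an unused-variable warning when NDEBUG is set
    return s.size();
}
```

Calling `demoEquivalence()` leaves one element in the set and `Elem::eqCalls` at zero, which is the same symptom as in the question, seen from the other side.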

Solution 2 - C++

std::set supports specifying a comparison function. The default is std::less, which uses operator< both to order the elements and to decide whether two elements are equivalent. You can define your own comparison type and pass it as the second template argument instead:

std::set<RouteElem, mycomparefunction> myset; 

Note that it's not possible to separate the equality check from the sorting function. std::set is typically implemented as a balanced binary search tree, and if an element is neither bigger nor smaller than a specific element, it belongs in the same place. The place-finding algorithm does something like this:

if (a < b) {
    // check the left subtree
} else if (b < a) {
    // check the right subtree
} else {
    // the element should be placed here.
}
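
As a sketch of how a comparison type is passed to std::set, here is one possible comparator. It assumes a minimal `Coordinate` with integer `x` and `y` (the question does not show its definition), and `ByCostThenPosition` and `demoCustomComparator` are hypothetical names. It orders by total cost and breaks ties by position, so only elements with the same cost and the same position are equivalent. It does not reject an element whose position is already present with a different cost, which is consistent with the point above that ordering and uniqueness cannot be separated within one set:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>

// Assumed minimal definition; the question does not show Coordinate.
struct Coordinate { int x; int y; };

struct RouteElem {
    int shortestToHere;
    int heuristic;
    Coordinate position;
};

// Hypothetical comparator: total cost first, then position as a tie-breaker.
struct ByCostThenPosition {
    bool operator()(const RouteElem& a, const RouteElem& b) const {
        const int costA = a.heuristic + a.shortestToHere;
        const int costB = b.heuristic + b.shortestToHere;
        return std::tie(costA, a.position.x, a.position.y)
             < std::tie(costB, b.position.x, b.position.y);
    }
};

// Inserts four elements and returns how many the set kept.
std::size_t demoCustomComparator() {
    std::set<RouteElem, ByCostThenPosition> s;
    s.insert(RouteElem{3, 2, {0, 0}});  // cost 5 at (0,0)
    s.insert(RouteElem{1, 4, {7, 7}});  // cost 5 at (7,7): kept, position differs
    s.insert(RouteElem{2, 3, {0, 0}});  // cost 5 at (0,0): rejected as equivalent
    s.insert(RouteElem{4, 4, {0, 0}});  // cost 8 at (0,0): kept, cost differs
    assert(s.size() == 3);
    return s.size();
}
```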

Solution 3 - C++

rlbond's comparator does not prevent the insertion of elements which compare equal. Apparently it's difficult to prove this in comments, given the character limit, because rlbond appears to think that std::set guarantees that it will never contain two elements with !compare(a,b) && !compare(b,a) for his comparator. However, rlbond's comparator does not define a strict weak ordering, and therefore is not a valid parameter to std::set.

#include <set>
#include <iostream>
#include <iterator>
#include <algorithm>

struct BrokenOrder {
    int order;
    int equality;

    public:
    BrokenOrder(int o, int e) : order(o), equality(e) {}

    bool operator<(const BrokenOrder &rhs) const {
        return order < rhs.order;
    }
    bool operator==(const BrokenOrder &rhs) const {
        return equality == rhs.equality;
    }
};

std::ostream &operator<<(std::ostream &stream, const BrokenOrder &b) {
    return stream << b.equality;
}

// rlbond's magic comparator
// (the std::binary_function base is omitted: it was deprecated in C++11
// and removed in C++17, and std::set does not need it)
struct LessThan {
    bool operator()(const BrokenOrder& lhs, const BrokenOrder& rhs) const
    {
        return !(lhs == rhs) && (lhs < rhs);
    }
};

int main() {
    std::set<BrokenOrder,LessThan> s;
    for (int i = 0; i < 5; ++i) {
        s.insert(BrokenOrder(i,i));
    }
    for (int i = 0; i < 5; ++i) {
        s.insert(BrokenOrder(10-i,i));
    }
    std::copy(s.begin(), s.end(), 
        std::ostream_iterator<BrokenOrder>(std::cout, "\n"));
}

Output (the exact result can vary between implementations, since the comparator is invalid):

0
1
2
3
4
3
2
1
0

Duplicates. The magic comparator has failed. Different elements in the set have the same value of equality, and hence compare the same with operator==, because during insertion the set never compared the new element against its duplicate. The only duplicate which was excluded was 4, because the two 4's had sort orders 4 and 6. This put them close enough together in the set to be compared against each other.

From the C++ standard: 25.3:3 "For the algorithms to work correctly, comp has to induce a strict weak ordering on the values".

25.3:4 "... the requirements are that comp and equiv both be transitive relations:

comp(a,b) && comp(b,c) implies comp(a,c)"

Now, consider the elements a = BrokenOrder(1,1), b = BrokenOrder(2,2), and c = BrokenOrder(9,1), and comp of course equal to the magic comparator. Then:

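  • comp(a,b) is true since 1 != 2 (equality) and 1 < 2 (order)
  • comp(b,c) is true since 2 != 1 (equality) and 2 < 9 (order)
  • comp(a,c) is false since 1 == 1 (equality)

So comp(a,b) and comp(b,c) both hold but comp(a,c) does not: the magic comparator is not transitive, and therefore is not a strict weak ordering.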
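
The counterexample above can be checked mechanically. The following is a small sketch that reuses the BrokenOrder type and the magic comparator from the program above, without the constructor and the std::binary_function base. The helper names `violatesTransitivity` and `demoCounterexample` are hypothetical:

```cpp
#include <cassert>

struct BrokenOrder {
    int order;
    int equality;

    bool operator<(const BrokenOrder& rhs) const { return order < rhs.order; }
    bool operator==(const BrokenOrder& rhs) const { return equality == rhs.equality; }
};

// The magic comparator under test.
struct LessThan {
    bool operator()(const BrokenOrder& lhs, const BrokenOrder& rhs) const {
        return !(lhs == rhs) && (lhs < rhs);
    }
};

// True if comp(a,b) and comp(b,c) hold but comp(a,c) does not,
// i.e. the triple is a witness that comp is not transitive.
bool violatesTransitivity(const BrokenOrder& a,
                          const BrokenOrder& b,
                          const BrokenOrder& c) {
    LessThan comp;
    return comp(a, b) && comp(b, c) && !comp(a, c);
}

// Evaluates the three elements used in the text.
bool demoCounterexample() {
    const BrokenOrder a{1, 1};
    const BrokenOrder b{2, 2};
    const BrokenOrder c{9, 1};
    const bool broken = violatesTransitivity(a, b, c);
    assert(broken);  // a, b, c break transitivity
    return broken;
}
```

A single witness like this is enough to show that the comparator fails the requirement. Passing such checks on a few triples would not, of course, prove that a comparator is valid.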

Solution 4 - C++

The STL set implementation does something conceptually like this to detect equality:

bool equal = !(a < b) && !(b < a);

That is, if neither of two elements is less than the other, then the set treats them as equal. You can check this by setting a breakpoint on your operator==() method and seeing whether it ever gets called at all.

I would generally be suspicious of comparison operators that check completely different things. Your < operator is defined in terms of two things that are separate from how your == operator is defined. Generally you will want such comparisons to use consistent information.
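
As an illustration of comparisons that use consistent information, here is a sketch in which both operators are derived from the same key via std::tie. The `Point` type and `demoConsistentOperators` are hypothetical examples, not the question's type:

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <tuple>

// Hypothetical type whose operator< and operator== look at the same fields.
struct Point {
    int x;
    int y;

    bool operator<(const Point& o) const {
        return std::tie(x, y) < std::tie(o.x, o.y);
    }
    bool operator==(const Point& o) const {
        return std::tie(x, y) == std::tie(o.x, o.y);
    }
};

// Inserts three points, two of them identical, and returns the set size.
std::size_t demoConsistentOperators() {
    std::set<Point> s;
    s.insert(Point{1, 2});
    s.insert(Point{1, 2});  // rejected: neither point is less than the other
    s.insert(Point{2, 1});  // kept: differs in both fields
    assert(s.size() == 2);
    return s.size();
}
```

With this arrangement, `a == b` holds exactly when `!(a < b) && !(b < a)`, so the set's notion of a duplicate matches what operator== reports.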

Solution 5 - C++

You could try something like the following:

//! An element used in the route calculation.
struct RouteElem {
    int shortestToHere; // Shortest distance from the start.
    int heuristic;              // The heuristic estimate to the goal.
    Coordinate position;
    bool operator<( const RouteElem& other ) const
    {
      return (heuristic+shortestToHere) < (other.heuristic+other.shortestToHere);
    }
    bool operator==( const RouteElem& other ) const
    {
      return (position.x == other.position.x && position.y == other.position.y);
    }
};

struct CompareByPosition {
    bool operator()(const RouteElem &lhs, const RouteElem &rhs) const {
        if (lhs.position.x != rhs.position.x) 
            return lhs.position.x < rhs.position.x;
        return lhs.position.y < rhs.position.y;
    }
};

// first, use std::set to remove duplicates
std::set<RouteElem,CompareByPosition> routeset;
// ... add each RouteElem to the set ...

// now copy the RouteElems into a vector
std::vector<RouteElem> routevec(routeset.begin(), routeset.end());

// now sort via operator<
std::sort(routevec.begin(), routevec.end());

Obviously there's the copy in the middle, which looks slow. But any structure which indexes items by two different criteria is going to have some kind of extra overhead per item compared with a set. The whole of the code above is O(n log n), assuming that your implementation of std::sort uses introsort (C++11 and later require O(n log n) comparisons from std::sort in any case).

If you have it, under this scheme you could use unordered_set instead of set to do the initial uniqueifying. Since the hash would only have to depend on x and y, it should be faster than the O(log N) comparisons required to insert into a set.
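
A sketch of what the unordered_set variant might look like. It assumes a minimal `Coordinate` with integer members, since the question does not show its definition. `PositionHash`, `PositionEqual` and `demoUnorderedUnique` are hypothetical names, and the hash-combining formula is just one common choice:

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <unordered_set>

// Assumed minimal definition; the question does not show Coordinate.
struct Coordinate { int x; int y; };

struct RouteElem {
    int shortestToHere;
    int heuristic;
    Coordinate position;
};

// Hash that depends only on the position.
struct PositionHash {
    std::size_t operator()(const RouteElem& e) const {
        const std::size_t hx = std::hash<int>()(e.position.x);
        const std::size_t hy = std::hash<int>()(e.position.y);
        // One common way to combine two hash values; other mixes work too.
        return hx ^ (hy + 0x9e3779b9u + (hx << 6) + (hx >> 2));
    }
};

// Equality that depends only on the position, matching the hash.
struct PositionEqual {
    bool operator()(const RouteElem& a, const RouteElem& b) const {
        return a.position.x == b.position.x && a.position.y == b.position.y;
    }
};

// Inserts three elements, two sharing a position, and returns the set size.
std::size_t demoUnorderedUnique() {
    std::unordered_set<RouteElem, PositionHash, PositionEqual> seen;
    seen.insert(RouteElem{3, 2, {0, 0}});
    seen.insert(RouteElem{1, 1, {0, 0}});  // same position: rejected
    seen.insert(RouteElem{3, 2, {4, 5}});  // new position: kept
    assert(seen.size() == 2);
    return seen.size();
}
```

Unlike std::set, an unordered_set does use an equality predicate, so here the position-based equality is what decides duplicates.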

Edit: just noticed that you said you wanted to "keep" sort order, not that you wanted to process everything in a batch. Sorry about that. If you want to efficiently maintain order and exclude duplicates while adding elements, then I would recommend using the set or unordered set I define above, based on position, and also a std::multiset<RouteElem>, which will maintain the operator< order. For each new element, do:

if (routeset.insert(elem).second) {
    routemultiset.insert(elem);
}

Although beware that this offers no exception guarantee. If the second insert throws, then the routeset has been modified, so the state is no longer consistent. So I guess really you need:

if (routeset.insert(elem).second) {
    try {
        routemultiset.insert(elem); // I assume strong exception guarantee
    } catch(...) {
        routeset.erase(elem); // I assume nothrow. Maybe should check those.
        throw;
    }
}

Or an equivalent with RAII, which will be more verbose if there's only one place in your code you ever use the RAII class, but better if there's much repetition.
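
One possible shape for the RAII version, as a sketch only. `InsertGuard`, `addUnique` and `demoAddUnique` are hypothetical names, and `Coordinate` is assumed to have integer `x` and `y` members. The guard erases by iterator, which does not invoke the comparator:

```cpp
#include <cassert>
#include <cstddef>
#include <set>

struct Coordinate { int x; int y; };  // assumed minimal definition

struct RouteElem {
    int shortestToHere;
    int heuristic;
    Coordinate position;
    bool operator<(const RouteElem& o) const {
        return (heuristic + shortestToHere) < (o.heuristic + o.shortestToHere);
    }
};

struct CompareByPosition {
    bool operator()(const RouteElem& lhs, const RouteElem& rhs) const {
        if (lhs.position.x != rhs.position.x)
            return lhs.position.x < rhs.position.x;
        return lhs.position.y < rhs.position.y;
    }
};

using PositionSet = std::set<RouteElem, CompareByPosition>;

// Hypothetical guard: undoes an insertion into the position set unless
// commit() is called before the guard goes out of scope.
class InsertGuard {
public:
    InsertGuard(PositionSet& set, PositionSet::iterator it)
        : set_(set), it_(it), committed_(false) {}
    ~InsertGuard() { if (!committed_) set_.erase(it_); }
    InsertGuard(const InsertGuard&) = delete;
    InsertGuard& operator=(const InsertGuard&) = delete;
    void commit() { committed_ = true; }
private:
    PositionSet& set_;
    PositionSet::iterator it_;
    bool committed_;
};

// Adds elem to both containers or to neither. Returns true if it was new.
bool addUnique(PositionSet& routeset,
               std::multiset<RouteElem>& routemultiset,
               const RouteElem& elem) {
    auto result = routeset.insert(elem);
    if (!result.second) return false;  // position already present
    InsertGuard guard(routeset, result.first);
    routemultiset.insert(elem);        // if this throws, the guard rolls back
    guard.commit();
    return true;
}

// Adds three elements, two sharing a position; returns the multiset size.
std::size_t demoAddUnique() {
    PositionSet routeset;
    std::multiset<RouteElem> routemultiset;
    addUnique(routeset, routemultiset, RouteElem{3, 2, {0, 0}});  // new position
    addUnique(routeset, routemultiset, RouteElem{1, 1, {0, 0}});  // skipped
    addUnique(routeset, routemultiset, RouteElem{1, 1, {2, 2}});  // new position
    assert(routeset.size() == routemultiset.size());
    return routemultiset.size();
}
```

The guard only pays off if the insert-then-roll-back pattern recurs. For a single call site the try/catch version above is shorter.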

Solution 6 - C++

Beware of the ramifications of this. It looks like you are trying to do something like A*, and if you try to insert a "duplicate" it will be ignored, even if there is a "better" route.

NOTE: This solution doesn't work; see onebyone's explanation (Solution 3 above).

struct RouteElem 
{
    int shortestToHere; // Shortest distance from the start.
    int heuristic;              // The heuristic estimate to the goal.
    Coordinate position;
    bool operator<( const RouteElem& other ) const
    {
        return (heuristic+shortestToHere) < (other.heuristic+other.shortestToHere);
    }
    bool operator==( const RouteElem& other ) const
    {
        return (position.x == other.position.x && position.y == other.position.y);
    }
};

struct RouteElemLessThan  // std::binary_function base omitted: it was removed in C++17
{
    bool operator()(const RouteElem& lhs, const RouteElem& rhs) const
    {
        return !(lhs == rhs) && (lhs < rhs);
    }
};

std::set<RouteElem, RouteElemLessThan> my_set;

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | DeusAduro | View Question on Stackoverflow
Solution 1 - C++ | Paul | View Answer on Stackoverflow
Solution 2 - C++ | mmx | View Answer on Stackoverflow
Solution 3 - C++ | Steve Jessop | View Answer on Stackoverflow
Solution 4 - C++ | Greg Hewgill | View Answer on Stackoverflow
Solution 5 - C++ | Steve Jessop | View Answer on Stackoverflow
Solution 6 - C++ | rlbond | View Answer on Stackoverflow