Best practices for overriding isEqual: and hash

Objective CEquality

Objective C Problem Overview


How do you properly override isEqual: in Objective-C? The "catch" seems to be that if two objects are equal (as determined by the isEqual: method), they must have the same hash value.

The Introspection section of the Cocoa Fundamentals Guide does have an example on how to override isEqual:, copied as follows, for a class named MyWidget:

- (BOOL)isEqual:(id)other {
    if (other == self)
        return YES;
    if (!other || ![other isKindOfClass:[self class]])
        return NO;
    return [self isEqualToWidget:other];
}
 
- (BOOL)isEqualToWidget:(MyWidget *)aWidget {
    if (self == aWidget)
        return YES;
    if (![(id)[self name] isEqual:[aWidget name]])
        return NO;
    if (![[self data] isEqualToData:[aWidget data]])
        return NO;
    return YES;
}

It checks pointer equality, then class equality, and finally compares the objects using isEqualToWidget:, which only checks the name and data properties. What the example doesn't show is how to override hash.

Let's assume there are other properties that do not affect equality, say age. Shouldn't the hash method be overridden such that only name and data affect the hash? And if so, how would you do that? Just add the hashes of name and data? For example:

- (NSUInteger)hash {
    NSUInteger hash = 0;
    hash += [[self name] hash];
    hash += [[self data] hash];
    return hash;
}

Is that sufficient? Is there a better technique? What if you have primitives, like int? Convert them to NSNumber to get their hash? Or structs like NSRect?

(Brain fart: Originally wrote "bitwise OR" them together with |=. Meant add.)

Objective C Solutions


Solution 1 - Objective C

Start with

 NSUInteger prime = 31;
 NSUInteger result = 1;

Then for every primitive you do

 result = prime * result + var

For objects you use 0 for nil and otherwise their hashcode.

 result = prime * result + [var hash];

For booleans you use two different values

 result = prime * result + ((var)?1231:1237);

Explanation and Attribution

This is not tcurdt's work, and comments were asking for more explanation, so I believe an edit for attribution is fair.

This algorithm was popularized in the book "Effective Java", and the relevant chapter can currently be found online here. That book popularized the algorithm, which is now a default in a number of Java applications (including Eclipse). It derived, however, from an even older implementation which is variously attributed to Dan Bernstein or Chris Torek. That older algorithm originally floated around on Usenet, and certain attribution is difficult. For example, there is some interesting commentary in this Apache code (search for their names) that references the original source.

Bottom line is, this is a very old, simple hashing algorithm. It is not the most performant, and it is not even proven mathematically to be a "good" algorithm. But it is simple, and a lot of people have used it for a long time with good results, so it has a lot of historical support.

Solution 2 - Objective C

I'm just picking up Objective-C myself, so I can't speak for that language specifically, but in the other languages I use if two instances are "Equal" they must return the same hash - otherwise you are going to have all sorts of problems when trying to use them as keys in a hashtable (or any dictionary-type collections).

On the other hand, if 2 instances are not equal, they may or may not have the same hash - it is best if they don't. This is the difference between an O(1) search on a hash table and an O(N) search - if all your hashes collide you may find that searching your table is no better than searching a list.

In terms of best practices your hash should return a random distribution of values for its input. This means that, for example, if you have a double, but the majority of your values tend to cluster between 0 and 100, you need to make sure that the hashes returned by those values are evenly distributed across the entire range of possible hash values. This will significantly improve your performance.

There are a number of hashing algorithms out there, including several listed here. I try to avoid creating new hash algorithms as it can have large performance implications, so using the existing hash methods and doing a bitwise combination of some sort as you do in your example is a good way to avoid it.

Solution 3 - Objective C

> A simple XOR over the hash values of critical properties is sufficient > 99% of the time.

For example:

- (NSUInteger)hash
{
    return [self.name hash] ^ [self.data hash];
}

Solution found at http://nshipster.com/equality/ by Mattt Thompson (which also referred to this question in his post :~)

Solution 4 - Objective C

I found this thread extremely helpful supplying everything I needed to get my isEqual: and hash methods implemented with one catch. When testing object instance variables in isEqual: the example code uses:

if (![(id)[self name] isEqual:[aWidget name]])
    return NO;

This repeatedly failed (i.e., returned NO) without and error, when I knew the objects were identical in my unit testing. The reason was, one of the NSString instance variables was nil so the above statement was:

if (![nil isEqual: nil])
    return NO;

and since nil will respond to any method, this is perfectly legal but

[nil isEqual: nil]

returns nil, which is NO, so when both the object and the one being tested had a nil object they would be considered not equal (i.e., isEqual: would return NO).

This simple fix was to change the if statement to:

if ([self name] != [aWidget name] && ![(id)[self name] isEqual:[aWidget name]])
    return NO;

This way, if their addresses are the same it skips the method call no matter if they are both nil or both pointing to the same object but if either is not nil or they point to different objects then the comparator is appropriately called.

I hope this saves someone a few minutes of head scratching.

Solution 5 - Objective C

The hash function should create a semi-unique value that is not likely to collide or match another object's hash value.

Here is the full hash function, which can be adapted to your classes instance variables. It uses NSUInteger's rather than int for compatibility on 64/32bit applications.

If the result becomes 0 for different objects, you run the risk of colliding hashes. Colliding hashes can result in unexpected program behavior when working with some of the collection classes that depend on the hash function. Make sure to test your hash function prior to use.

-(NSUInteger)hash {
	NSUInteger result = 1;
	NSUInteger prime = 31;
	NSUInteger yesPrime = 1231;
	NSUInteger noPrime = 1237;
	
	// Add any object that already has a hash function (NSString)
	result = prime * result + [self.myObject hash];
	
    // Add primitive variables (int)
    result = prime * result + self.primitiveVariable; 

    // Boolean values (BOOL)
	result = prime * result + (self.isSelected ? yesPrime : noPrime);
	
	return result;
}

Solution 6 - Objective C

The easy but inefficient way is to return the same -hash value for every instance. Otherwise, yes, you must implement hash based only on objects which affect equality. This is tricky if you use lax comparisons in -isEqual: (e.g. case-insensitive string comparisons). For ints, you can generally use the int itself, unless you’ll be comparing with NSNumbers.

Don’t use |=, though, it will saturate. Use ^= instead.

Random fun fact: [[NSNumber numberWithInt:0] isEqual:[NSNumber numberWithBool:NO]], but [[NSNumber numberWithInt:0] hash] != [[NSNumber numberWithBool:NO] hash]. (rdar://4538282, open since 05-May-2006)

Solution 7 - Objective C

Remember that you only need to provide hash that's equal when isEqual is true. When isEqual is false, the hash doesn't have to be unequal though presumably it is. Hence:

Keep hash simple. Pick a member (or few members) variable that is the most distinctive.

For example, for CLPlacemark, the name only is enough. Yes there are 2 or 3 distincts CLPlacemark with the exact same name but those are rare. Use that hash.

@interface CLPlacemark (equal)
- (BOOL)isEqual:(CLPlacemark*)other;
@end

@implementation CLPlacemark (equal)

...

-(NSUInteger) hash
{
    return self.name.hash;
}


@end

Notice I do not bother specifying the city, the country, etc. The name is enough. Perhaps the name and CLLocation.

Hash should be evenly distributed. So you can combine several members variable using the caret ^ (xor sign)

So it's something like

hash = self.member1.hash ^ self.member2.hash ^ self.member3.hash

That way the hash will be evenly distributed.

Hash must be O(1), and not O(n)

So what to do in array?

Again, simple. You do not have to hash all members of the array. Enough to hash the first element, last element, the count, maybe some middle elements, and that's it.

Solution 8 - Objective C

The equals and hash contracts are well specified and thoroughly researched in the Java world (see @mipardi's answer), but all the same considerations should apply to Objective-C.

Eclipse does a reliable job of generating these methods in Java, so here's an Eclipse example ported by hand to Objective-C:

- (BOOL)isEqual:(id)object {
    if (self == object)
        return true;
    if ([self class] != [object class])
        return false;
    MyWidget *other = (MyWidget *)object;
    if (_name == nil) {
        if (other->_name != nil)
            return false;
    }
    else if (![_name isEqual:other->_name])
        return false;
    if (_data == nil) {
        if (other->_data != nil)
            return false;
    }
    else if (![_data isEqual:other->_data])
        return false;
    return true;
}

- (NSUInteger)hash {
    const NSUInteger prime = 31;
    NSUInteger result = 1;
    result = prime * result + [_name hash];
    result = prime * result + [_data hash];
    return result;
}

And for a subclass YourWidget which adds a property serialNo:

- (BOOL)isEqual:(id)object {
    if (self == object)
        return true;
    if (![super isEqual:object])
        return false;
    if ([self class] != [object class])
        return false;
    YourWidget *other = (YourWidget *)object;
    if (_serialNo == nil) {
        if (other->_serialNo != nil)
            return false;
    }
    else if (![_serialNo isEqual:other->_serialNo])
        return false;
    return true;
}

- (NSUInteger)hash {
    const NSUInteger prime = 31;
    NSUInteger result = [super hash];
    result = prime * result + [_serialNo hash];
    return result;
}

This implementation avoids some subclassing pitfalls in the sample isEqual: from Apple:

  • Apple's class test other isKindOfClass:[self class] is asymmetric for two different subclasses of MyWidget. Equality needs to be symmetric: a=b if and only if b=a. This could easily be fixed by changing the test to other isKindOfClass:[MyWidget class], then all MyWidget subclasses would be mutually comparable.
  • Using an isKindOfClass: subclass test prevents subclasses from overriding isEqual: with a refined equality test. This is because equality needs to be transitive: if a=b and a=c then b=c. If a MyWidget instance compares equal to two YourWidget instances, then those YourWidget instances must compare equal to each other, even if their serialNo differs.

The second issue can be fixed by only considering objects to be equal if they belong to the exact same class, hence the [self class] != [object class] test here. For typical application classes, this seems to be the best approach.

However, there certainly are cases where the isKindOfClass: test is preferable. This is more typical of framework classes than application classes. For example, any NSString should compare equal to any other other NSString with the same underlying character sequence, regardless of the NSString/NSMutableString distinction, and also regardless of what private classes in the NSString class cluster are involved.

In such cases, isEqual: should have well-defined, well-documented behaviour, and it should be made clear that subclasses can't override this. In Java, the 'no override' restriction can be enforced by flagging the equals and hashcode methods as final, but Objective-C has no equivalent.

Solution 9 - Objective C

Hold on, surely a far easier way to do this is to first override - (NSString )description and provide a string representation of your object state (you must represent the entire state of your object in this string).

Then, just provide the following implementation of hash:

- (NSUInteger)hash {
    return [[self description] hash];
}

This is based on the principle that "if two string objects are equal (as determined by the isEqualToString: method), they must have the same hash value."

Source: NSString Class Reference

Solution 10 - Objective C

I've found this page to be a helpful guide in override equals- and hash-type methods. It includes a decent algorithm for calculating hash codes. The page is geared towards Java, but it's pretty easy to adapt it to Objective-C/Cocoa.

Solution 11 - Objective C

This doesn't directly answer your question (at all) but I've used MurmurHash before to generate hashes: murmurhash

Guess I should explain why: murmurhash is bloody fast...

Solution 12 - Objective C

I'm an Objective C newbie too, but I found an excellent article about identity vs. equality in Objective C here. From my reading it looks like you might be able to just keep the default hash function (which should provide a unique identity) and implement the isEqual method so that it compares data values.

Solution 13 - Objective C

Quinn is just wrong that the reference to the murmur hash is useless here. Quinn is right that you want to understand the theory behind hashing. The murmur distills a lot of that theory into an implementation. Figuring out how to apply that implementation to this particular application is worth exploring.

Some of the key points here:

The example function from tcurdt suggests that '31' is a good multiplier because it is prime. One needs to show that being prime is a necessary and sufficient condition. In fact 31 (and 7) are probably not particularly good primes because 31 == -1 % 32. An odd multiplier with about half the bits set and half the bits clear is likely to be better. (The murmur hash multiplication constant has that property.)

This type of hash function would likely be stronger if, after multiplying, the result value were adjusted via a shift and xor. The multiplication tends to produce the results of lots of bit interactions at the high end of the register and low interaction results at the bottom end of the register. The shift and xor increases the interactions at the bottom end of the register.

Setting the initial result to a value where about half the bits are zero and about half the bits are one would also tend to be useful.

It may be useful to be careful about the order in which elements are combined. One should probably first process booleans and other elements where the values are not strongly distributed.

It may be useful to add a couple of extra bit scrambling stages at the end of the computation.

Whether or not the murmur hash is actually fast for this application is an open question. The murmur hash premixes the bits of each input word. Multiple input words can be processed in parallel which helps multiple-issue pipelined cpus.

Solution 14 - Objective C

Combining @tcurdt's answer with @oscar-gomez's answer for getting property names, we can create an easy drop-in solution for both isEqual and hash:

NSArray *PropertyNamesFromObject(id object)
{
    unsigned int propertyCount = 0;
    objc_property_t * properties = class_copyPropertyList([object class], &propertyCount);
    NSMutableArray *propertyNames = [NSMutableArray arrayWithCapacity:propertyCount];

    for (unsigned int i = 0; i < propertyCount; ++i) {
        objc_property_t property = properties[i];
        const char * name = property_getName(property);
        NSString *propertyName = [NSString stringWithUTF8String:name];
        [propertyNames addObject:propertyName];
    }
    free(properties);
    return propertyNames;
}

BOOL IsEqualObjects(id object1, id object2)
{
    if (object1 == object2)
        return YES;
    if (!object1 || ![object2 isKindOfClass:[object1 class]])
        return NO;
    
    NSArray *propertyNames = PropertyNamesFromObject(object1);
    for (NSString *propertyName in propertyNames) {
        if (([object1 valueForKey:propertyName] != [object2 valueForKey:propertyName])
            && (![[object1 valueForKey:propertyName] isEqual:[object2 valueForKey:propertyName]])) return NO;
    }

    return YES;
}

NSUInteger MagicHash(id object)
{
    NSUInteger prime = 31;
    NSUInteger result = 1;

    NSArray *propertyNames = PropertyNamesFromObject(object);

    for (NSString *propertyName in propertyNames) {
        id value = [object valueForKey:propertyName];
        result = prime * result + [value hash];
    }

    return result;
}

Now, in your custom class you can easily implement isEqual: and hash:

- (NSUInteger)hash
{
    return MagicHash(self);
}

- (BOOL)isEqual:(id)other
{
    return IsEqualObjects(self, other);
}

Solution 15 - Objective C

Note that if you're creating a object that can be mutated after creation, the hash value must not change if the object is inserted into a collection. Practically speaking, this means that the hash value must be fixed from the point of the initial object creation. See Apple's documentation on the NSObject protocol's -hash method for more information:

> If a mutable object is added to a collection that uses hash values to determine the object’s position in the collection, the value returned by the hash method of the object must not change while the object is in the collection. Therefore, either the hash method must not rely on any of the object’s internal state information or you must make sure the object’s internal state information does not change while the object is in the collection. Thus, for example, a mutable dictionary can be put in a hash table but you must not change it while it is in there. (Note that it can be difficult to know whether or not a given object is in a collection.)

This sounds like complete whackery to me since it potentially effectively renders hash lookups far less efficient, but I suppose it's better to err on the side of caution and follow what the documentation says.

Solution 16 - Objective C

Sorry if I risk sounding a complete boffin here but... ...nobody bothered mentioning that to follow 'best practices' you should definitely not specify an equals method that would NOT take into account all data owned by your target object, e.g whatever data is aggregated to your object, versus an associate of it, should be taken into account when implementing equals. If you don't want to take, say 'age' into account in a comparison, then you should write a comparator and use that to perform your comparisons instead of isEqual:.

If you define an isEqual: method that performs equality comparison arbitrarily, you incur the risk that this method is misused by another developer, or even yourself, once you've forgotten the 'twist' in your equals interpretation.

Ergo, although this is a great q&a about hashing, you don't normally need to redefine the hashing method, you should probably define an ad-hoc comparator instead.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionDave DribinView Question on Stackoverflow
Solution 1 - Objective CtcurdtView Answer on Stackoverflow
Solution 2 - Objective CBrian B.View Answer on Stackoverflow
Solution 3 - Objective CYariv NissimView Answer on Stackoverflow
Solution 4 - Objective CLavaSliderView Answer on Stackoverflow
Solution 5 - Objective CPaul SoltView Answer on Stackoverflow
Solution 6 - Objective CJens AytonView Answer on Stackoverflow
Solution 7 - Objective Cuser4951View Answer on Stackoverflow
Solution 8 - Objective CjedwidzView Answer on Stackoverflow
Solution 9 - Objective CJonathan EllisView Answer on Stackoverflow
Solution 10 - Objective CmipadiView Answer on Stackoverflow
Solution 11 - Objective CschwaView Answer on Stackoverflow
Solution 12 - Objective CceperryView Answer on Stackoverflow
Solution 13 - Objective CCesView Answer on Stackoverflow
Solution 14 - Objective CjohnboilesView Answer on Stackoverflow
Solution 15 - Objective Cuser10345View Answer on Stackoverflow
Solution 16 - Objective CThibaud de SouzaView Answer on Stackoverflow