What makes a user-defined class unhashable?


Python Problem Overview


The docs say that a class is hashable as long as it defines a __hash__ method and an __eq__ method. However:

class X(list):
  # read-only interface of `tuple` and `list` should be the same, so reuse tuple.__hash__
  __hash__ = tuple.__hash__

x1 = X()
s = {x1} # TypeError: unhashable type: 'X'

What makes X unhashable?

Note that I need identical lists (in terms of regular equality) to hash to the same value; otherwise, I would violate this requirement on hash functions:

> The only required property is that objects which compare equal have the same hash value

The docs do warn that a hashable object shouldn't be modified during its lifetime, and I don't modify instances of X after creation, although the interpreter won't check that anyway.

Python Solutions


Solution 1 - Python

Simply setting the __hash__ method to that of the tuple class is not enough. tuple.__hash__ only applies to tuple instances, so you haven't actually told Python how to hash an X. Tuples are hashable because they are immutable. If you really wanted to make your specific example work, it might look like this:

class X2(list):
    def __hash__(self):
        return hash(tuple(self))

In this case you are actually defining how to hash your custom list subclass. You just have to define exactly how it can generate a hash. You can hash on whatever you want, as opposed to using the tuple's hashing method:

def __hash__(self):
    return hash("foobar"*len(self))
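As a rough illustration of the first variant, the sketch below puts two equal X2 instances into a set and then mutates one of them. The variable names are made up for the demo, and the last step shows why the docs warn against changing a hashable object after it has been used as a key.

```python
class X2(list):
    """List subclass that hashes by its current contents."""

    def __hash__(self):
        # Convert to a tuple so the built-in tuple hash does the work.
        return hash(tuple(self))


a = X2([1, 2, 3])
b = X2([1, 2, 3])

# Equal lists produce equal hashes, so the set keeps only one of them.
same_hash = hash(a) == hash(b)
unique = {a, b}
print(same_hash, len(unique))

# Caveat: mutating an element after insertion changes its hash,
# so the set can no longer find it under the new hash.
a.append(4)
still_found = a in unique
print(still_found)
```

The hash is only stable while the list is left untouched, which is the condition the question already promises to respect.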

Solution 2 - Python

From the Python3 docs:

> If a class does not define an __eq__() method it should not define a __hash__() operation either; if it defines __eq__() but not __hash__(), its instances will not be usable as items in hashable collections. If a class defines mutable objects and implements an __eq__() method, it should not implement __hash__(), since the implementation of hashable collections requires that a key’s hash value is immutable (if the object’s hash value changes, it will be in the wrong hash bucket).

Ref: object.__hash__(self)

Sample code:

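class Hashable:
    pass

class Unhashable:
    # Defining __eq__ without __hash__ makes Python set __hash__ to None.
    def __eq__(self, other):
        # Compare by identity; `self == other` here would recurse forever.
        return self is other

class HashableAgain:
    def __eq__(self, other):
        return self is other

    def __hash__(self):
        return id(self)

def main():
    # OK
    print(hash(Hashable()))
    # Throws: TypeError("unhashable type: 'Unhashable'")
    try:
        print(hash(Unhashable()))
    except TypeError as exc:
        print(exc)
    # OK
    print(hash(HashableAgain()))

if __name__ == "__main__":
    main()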
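The rule quoted from the docs can be checked directly: a class that defines __eq__ alone ends up with __hash__ set to None, and assigning a parent's __hash__ opts back in. The class names below are hypothetical and exist only for this sketch.

```python
class OnlyEq:
    def __eq__(self, other):
        return self is other


# Python sets __hash__ to None on a class that defines __eq__ alone.
hash_is_none = OnlyEq.__hash__ is None

try:
    hash(OnlyEq())
    raised = False
except TypeError:
    raised = True


class EqWithInheritedHash:
    def __eq__(self, other):
        return self is other

    # Explicitly opt back in to the identity-based hash from object.
    __hash__ = object.__hash__


restored = isinstance(hash(EqWithInheritedHash()), int)
print(hash_is_none, raised, restored)
```

Reusing object.__hash__ is safe here because equality is also identity-based, so equal objects still share a hash.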

Solution 3 - Python

What you could and should do, based on your other question, is: don't subclass anything, just encapsulate a tuple. It's perfectly fine to do so in __init__.

class X(object):
    def __init__(self, *args):
        self.tpl = args
    def __hash__(self):
        return hash(self.tpl)
    def __eq__(self, other):
        return self.tpl == other
    def __repr__(self):
        return repr(self.tpl)

x1 = X()
s = {x1}

which yields (the set repr shown is from Python 2; Python 3 prints {()} instead):

>>> s
set([()])
>>> x1
()
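A slightly stricter variant of the same wrapper idea is sketched below. The class name Wrapped and the attribute _items are assumptions for this example, and unlike the code above it only treats other Wrapped instances as equal rather than comparing against arbitrary objects.

```python
class Wrapped:
    """Immutable-by-convention wrapper around a tuple of values."""

    def __init__(self, *args):
        self._items = args  # *args is already a tuple

    def __hash__(self):
        return hash(self._items)

    def __eq__(self, other):
        # Defer to the other operand for unrelated types.
        if not isinstance(other, Wrapped):
            return NotImplemented
        return self._items == other._items

    def __repr__(self):
        return f"Wrapped{self._items!r}"


p = Wrapped(1, 2)
q = Wrapped(1, 2)
r = Wrapped(3)

# p and q compare equal and hash alike, so the set holds two entries.
distinct = {p, q, r}
print(len(distinct))
```

Whether a wrapper should also compare equal to a bare tuple is a design choice; the stricter check avoids surprises when wrapped and unwrapped values are mixed in one dictionary.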

Solution 4 - Python

If you don't modify instances of X after creation, why aren't you subclassing tuple?

But I'll point out that this actually doesn't throw an error, at least in Python 2.6.

>>> class X(list):
...     __hash__ = tuple.__hash__
...     __eq__ = tuple.__eq__
... 
>>> x = X()
>>> s = set((x,))
>>> s
set([[]])

I hesitate to say "works" because this doesn't do what you think it does.

>>> a = X()
>>> b = X((5,))
>>> hash(a)
4299954584
>>> hash(b)
4299954672
>>> id(a)
4299954584
>>> id(b)
4299954672

It's just using the object id as a hash. When you actually call __hash__ you still get an error; likewise for __eq__.

>>> a.__hash__()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__hash__' for 'tuple' objects doesn't apply to 'X' object
>>> X().__eq__(X())
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: descriptor '__eq__' for 'tuple' objects doesn't apply to 'X' object

I gather that the Python internals, for some reason, are detecting that X has a __hash__ and an __eq__ method, but aren't calling them.
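On Python 3 the same mismatch shows up as an exception rather than a silent fallback to the object id. The sketch below assumes CPython 3 and relies only on the exception type, since the exact message text differs between versions.

```python
class X(list):
    # Same assignment as in the question.
    __hash__ = tuple.__hash__


x = X()

# Calling the borrowed slot wrapper directly on a non-tuple is rejected.
try:
    tuple.__hash__(x)
    direct_call_failed = False
except TypeError:
    direct_call_failed = True

# The built-in hash() fails as well, which is what the set literal hits.
try:
    hash(x)
    builtin_hash_failed = False
except TypeError:
    builtin_hash_failed = True

print(direct_call_failed, builtin_hash_failed)
```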

The moral of all this is: just write a real hash function. Since this is a sequence object, converting it to a tuple and hashing that is the most obvious approach.

def __hash__(self):
    return hash(tuple(self))
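The alternative raised at the top of this answer, subclassing tuple when instances are never modified, can be sketched as follows. FrozenX is a hypothetical name; the class inherits tuple's hashing and equality without defining either.

```python
class FrozenX(tuple):
    """Immutable sequence that inherits tuple's __hash__ and __eq__."""

    __slots__ = ()  # no per-instance __dict__, so no attributes can be added


x = FrozenX([1, 2, 3])
y = FrozenX([1, 2, 3])

# Equal contents mean equal hashes, so the set keeps a single entry.
members = {x, y}
matches_plain_tuple = hash(x) == hash((1, 2, 3))
print(len(members), matches_plain_tuple)
```

Because nothing can change the contents after construction, the hash stays valid for the lifetime of the object.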

Solution 5 - Python

An addition to the above answers, for the specific case of a dataclass in Python 3.7+: to make a dataclass hashable, you can use

from dataclasses import dataclass

@dataclass(frozen=True)
class YourClass:
    pass

as the decorator instead of

@dataclass
class YourClass:
    pass
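The difference between the two decorators is easier to see with fields. In the sketch below, the class and field names are invented for illustration: the plain dataclass gets __hash__ set to None because it generates __eq__, while the frozen one gets a field-based hash.

```python
from dataclasses import dataclass


@dataclass
class MutablePoint:
    x: int
    y: int


@dataclass(frozen=True)
class FrozenPoint:
    x: int
    y: int


# eq=True (the default) without frozen=True leaves the class unhashable.
mutable_is_unhashable = MutablePoint.__hash__ is None

# Frozen instances hash by their fields, so equal points collapse in a set.
points = {FrozenPoint(1, 2), FrozenPoint(1, 2), FrozenPoint(3, 4)}
print(mutable_is_unhashable, len(points))
```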

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | max | View Question on Stack Overflow
Solution 1 - Python | jdi | View Answer on Stack Overflow
Solution 2 - Python | kevinarpe | View Answer on Stack Overflow
Solution 3 - Python | ch3ka | View Answer on Stack Overflow
Solution 4 - Python | senderle | View Answer on Stack Overflow
Solution 5 - Python | Alex Joseph | View Answer on Stack Overflow