How can I make a Python dataclass hashable without making it immutable?
Python Problem Overview
Say I have a dataclass in Python 3. I want to be able to hash and order these objects, but I do not want them to be immutable.
I only want them ordered/hashed on id.
I see in the docs that I can just implement __hash__ and all that, but I'd like to get dataclasses to do the work for me because they are intended to handle this.

    from dataclasses import dataclass, field

    @dataclass(eq=True, order=True)
    class Category:
        id: str = field(compare=True)
        name: str = field(default="set this in post_init", compare=False)

    a = sorted(list(set([ Category(id='x'), Category(id='y')])))

    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: unhashable type: 'Category'

Python Solutions
Solution 1 - Python
From the docs:
> Here are the rules governing implicit creation of a __hash__() method:
>
> [...]
>
> If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
Since you set eq=True and left frozen at the default (False), your dataclass is unhashable.
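
The three cases in the quoted rules can be checked directly. The following is a minimal sketch; the class names FrozenPoint, MutablePoint and IdentityPoint are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass(eq=True, frozen=True)
class FrozenPoint:
    x: int


@dataclass(eq=True)  # frozen defaults to False
class MutablePoint:
    x: int


@dataclass(eq=False)
class IdentityPoint:
    x: int


# eq=True, frozen=True: a field-based __hash__ is generated.
print(hash(FrozenPoint(1)) == hash(FrozenPoint(1)))  # True

# eq=True, frozen=False: __hash__ is set to None, so hash() raises TypeError.
print(MutablePoint.__hash__ is None)  # True

# eq=False: the superclass __hash__ (here object.__hash__) is left in place.
print(IdentityPoint.__hash__ is object.__hash__)  # True
```

The second case is the one the question runs into, which is why building a set of Category instances fails.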
You have 3 options:
- Set frozen=True (in addition to eq=True), which will make your class immutable and hashable.
- Set unsafe_hash=True, which will create a __hash__ method but leave your class mutable, thus risking problems if an instance of your class is modified while stored in a dict or set:

      cat = Category('foo', 'bar')
      categories = {cat}
      cat.id = 'baz'
      print(cat in categories) # False

- Manually implement a __hash__ method.
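
As a rough sketch of the last two options applied to the question's class: both variants below hash on id only, because name is declared with compare=False. The name ManualCategory and the empty-string default for name are assumptions made for this illustration.

```python
from dataclasses import dataclass, field


@dataclass(eq=True, order=True, unsafe_hash=True)
class Category:
    id: str
    # compare=False keeps name out of ==, ordering and the generated hash.
    name: str = field(default="", compare=False)


@dataclass(eq=True, order=True)
class ManualCategory:
    id: str
    name: str = field(default="", compare=False)

    def __hash__(self):
        # dataclass() leaves an explicitly defined __hash__ untouched.
        return hash(self.id)


cats = sorted({Category("y"), Category("x"), Category("x", name="duplicate")})
print([c.id for c in cats])  # ['x', 'y']

cat = Category("x")
seen = {cat}
cat.name = "renamed"  # name is not part of the hash, so membership survives
print(cat in seen)    # True

manual = sorted({ManualCategory("b"), ManualCategory("a")})
print([c.id for c in manual])  # ['a', 'b']
```

Either way, changing id while an instance sits in a set or dict would still break lookups, so id should be treated as fixed after construction.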
Solution 2 - Python
TL;DR
Use frozen=True in conjunction with eq=True (which will make the instances immutable).
Long Answer
From the docs:
> __hash__() is used by built-in hash(), and when objects are added to hashed collections such as dictionaries and sets. Having a __hash__() implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer’s intent, the existence and behavior of __eq__(), and the values of the eq and frozen flags in the dataclass() decorator.
>
> By default, dataclass() will not implicitly add a __hash__() method unless it is safe to do so. Neither will it add or change an existing explicitly defined __hash__() method. Setting the class attribute __hash__ = None has a specific meaning to Python, as described in the __hash__() documentation.
>
> If __hash__() is not explicitly defined, or if it is set to None, then dataclass() may add an implicit __hash__() method. Although not recommended, you can force dataclass() to create a __hash__() method with unsafe_hash=True. This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully.
>
> Here are the rules governing implicit creation of a __hash__() method. Note that you cannot both have an explicit __hash__() method in your dataclass and set unsafe_hash=True; this will result in a TypeError.
>
> If eq and frozen are both true, by default dataclass() will generate a __hash__() method for you. If eq is true and frozen is false, __hash__() will be set to None, marking it unhashable (which it is, since it is mutable). If eq is false, __hash__() will be left untouched meaning the __hash__() method of the superclass will be used (if the superclass is object, this means it will fall back to id-based hashing).
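
A minimal sketch of the frozen=True route, applied to the question's Category class (the empty-string default for name is an assumption made for brevity). It shows that hashing and ordering work, at the cost of instances rejecting attribute assignment.

```python
from dataclasses import FrozenInstanceError, dataclass, field


@dataclass(eq=True, order=True, frozen=True)
class Category:
    id: str
    name: str = field(default="", compare=False)


# eq=True plus frozen=True makes dataclass() generate __hash__.
cats = sorted({Category("y"), Category("x")})
print([c.id for c in cats])  # ['x', 'y']

try:
    cats[0].name = "renamed"
except FrozenInstanceError as exc:
    # Assigning to any field of a frozen dataclass instance raises this error.
    print(f"cannot mutate: {exc}")
```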
Solution 3 - Python
I'd like to add a special note about the use of unsafe_hash.
You can exclude fields from the generated hash by setting compare=False or hash=False on the field (hash defaults to the value of compare).
This can be useful if you store nodes in a graph but want to mark them as visited without breaking their hashing (e.g. if they're in a set of unvisited nodes).

    from dataclasses import dataclass, field

    @dataclass(unsafe_hash=True)
    class node:
        x: int
        visit_count: int = field(default=10, compare=False)  # hash inherits compare setting. So valid.
        # visit_count: int = field(default=False, hash=False)  # also valid. Arguably easier to read, but can break some compare code.
        # visit_count: int = False  # if mutated, hashing breaks. (3* printed)

    s = set()
    n = node(1)
    s.add(n)
    if n in s: print("1* n in s")
    n.visit_count = 11
    if n in s:
        print("2* n still in s")
    else:
        print("3* n is lost to the void because hashing broke.")

This took me hours to figure out... The most useful further reading I found is the Python documentation on dataclasses. See specifically the documentation for field() and for the dataclass() arguments: https://docs.python.org/3/library/dataclasses.html
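
To make the difference between the compare=False and hash=False variants concrete, here is a small sketch. The class names CompareFalseNode and HashFalseNode are hypothetical. With hash=False the field is left out of the hash but still takes part in ==, which is the sense in which it can surprise comparison code.

```python
from dataclasses import dataclass, field


@dataclass(unsafe_hash=True)
class CompareFalseNode:
    x: int
    visit_count: int = field(default=0, compare=False)  # ignored by == and hash


@dataclass(unsafe_hash=True)
class HashFalseNode:
    x: int
    visit_count: int = field(default=0, hash=False)  # ignored by hash, still used by ==


a, b = CompareFalseNode(1, visit_count=0), CompareFalseNode(1, visit_count=5)
print(a == b, hash(a) == hash(b))  # True True

c, d = HashFalseNode(1, visit_count=0), HashFalseNode(1, visit_count=5)
print(c == d, hash(c) == hash(d))  # False True

# Same hash but unequal objects: c and d are kept as two separate set members.
print(len({a, b}), len({c, d}))  # 1 2
```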