How and when to appropriately use weakref in Python

PythonCircular Reference

Python Problem Overview


I have some code where instances of classes have parent<->child references to each other, e.g.:

class Node:

    def __init__(self):
        self.parent = None
        self.children = {}

    def AddChild(self, name, child):
        child.parent = self
        self.children[name] = child


def Run():
    root, c1, c2 = Node(), Node(), Node()
    root.AddChild('first', c1)
    root.AddChild('second', c2)


Run()

I think this creates circular references such that root, c1 and c2 won't be freed after Run() is completed, right?. So, how do get them to be freed? I think I can do something like root.children.clear(), or self.parent = None - but what if I don't know when to do that?

Is this an appropriate time to use the weakref module? What, exactly, do I weakref'ify? the parent attribute? The children attribute? The whole object? All of the above? I see talk about the WeakKeyDictionary and weakref.proxy, but its not clear to me how they should be used, if at all, in this case.

This is also on Python 2.4 (can't upgrade).

Update: Example and Summary

What objects to weakref-ify depends on which object can live without the other, and what objects depend on each other. The object that lives the longest should contain weakrefs to the shorter-lived objects. Similarly, weakrefs should not be made to dependencies - if they are, the dependency could silently disappear even though it is still needed.

If, for example, you have a tree structure, root, that has children, kids, but can exist without children, then the root object should use weakrefs for its kids. This is also the case if the child object depends on the existence of the parent object. Below, the child object requires a parent in order to compute its depth, hence the strong-ref for parent. The members of the kids attribute are optional, though, so weakrefs are used to prevent a circular reference.

class Node:

    def __init__(self):
        self.parent = None
        self.kids = weakref.WeakValueDictionary()

    def GetDepth(self):
        root, depth = self, 0
        while root:
            depth += 1
            root = root.parent
        return depth


root = Node()
root.kids['one'] = Node()
root.kids['two'] = Node()

To flip the relationship around, we have something like the below. Here, the Facade classes require a Subsystem instance to work, so they use a strong-ref to the subsystem they need. Subsystems, however, don't require a Facade to work. Subsystems just provide a way to notify Facades about each other's actions.

class Facade:

  def __init__(self, subsystem):
    self.subsystem = subsystem
    subsystem.Register(self)


class Subsystem:

    def __init__(self):
        self.notify = []

    def Register(self, who):
        self.notify.append(weakref.proxy(who))


sub = Subsystem()
cli = Facade(sub)

Python Solutions


Solution 1 - Python

Yep, weakref's excellent here. Specifically, instead of:

self.children = {}

use:

self.children = weakref.WeakValueDictionary()

Nothing else needs change in your code. This way, when a child has no other differences, it just goes away -- and so does the entry in the parent's children map that has that child as the value.

Avoiding reference loops is up high on a par with implementing caches as a motivation for using the weakref module. Ref loops won't kill you, but they may end up clogging your memory, esp. if some of the classes whose instances are involved in them define __del__, since that interferes with the gc's module ability to dissolve those loops.

Solution 2 - Python

I suggest using child.parent = weakref.proxy(self). This is a good solution to avoid circular references when the lifetime of parent covers the lifetime of child. Use self.children = weakref.WeakValueDictionary() (as Alex Martelli suggested) when the lifetime of child covers the lifetime of parent. But never use weak references when both parent and child can be alive independently. Here after these rules are illustrated with examples.

Use a weakly referenced parent if you bind the root to a name and pass it around, while children are accessed from it:

def Run():
    root, c1, c2 = Node(), Node(), Node()
    root.AddChild('first', c1)
    root.AddChild('second', c2)
    return root  # only root refers to c1 and c2 after return, 
                 # so this references should be strong

Use weakly referenced children if you bind each child to a name and pass them around, while root is accessed from them:

def Run():
    root, c1, c2 = Node(), Node(), Node()
    root.AddChild('first', c1)
    root.AddChild('second', c2)
    return c1, c2

Don’t use weak references in this case:

def Run():
    root, c1, c2 = Node(), Node(), Node()
    root.AddChild('first', c1)
    root.AddChild('second', c2)
    return c1

Solution 3 - Python

I wanted to clarify which references can be weak. The following approach is general, but I use the doubly-linked tree in all examples.

Logical Step 1.

You need to ensure that there are strong references to keep all the objects alive as long as you need them. It could be done in many ways, for example by:

  • [direct names]: a named reference to each node in the tree
  • [container]: a reference to a container that stores all the nodes
  • [root + children]: a reference to the root node, and references from each node to its children
  • [leaves + parent]: references to all the leaf nodes, and references from each node to its parent

Logical Step 2.

Now you add references to represent information, if required.

For instance, if you used [container] approach in Step 1, you still have to represent the edges. An edge between nodes A and B can be represented with a single reference; it can go in either direction. Again, there are many options, for example:

  • [children]: references from each node to its children
  • [parent]: a reference from each node to its parent
  • [set of sets]: a set containing 2-element sets; each 2-element contains references to nodes of one edge

Of course, if you used [root + children] approach in Step 1, all your information is already fully represented, so you skip this step.

Logical Step 3.

Now you add references to improve performance, if desired.

For instance, if you used [container] approach in Step 1, and [children] approach in Step 2, you might desire to improve the speed of certain algorithms, and add references between each each node and its parent. Such information is logically redundant, since you could (at a cost in performance) derive it from existing data.


All the references in Step 1 must be strong.

All the references in Steps 2 and 3 may be weak or strong. There is no advantage to using strong references. There is an advantage to using weak references until you know that cycles are no longer possible. Strictly speaking, once you know that cycles are impossible, it makes no difference whether to use weak or strong references. But to avoid thinking about it, you might as well use exclusively weak references in the Steps 2 and 3.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionRichard LevasseurView Question on Stackoverflow
Solution 1 - PythonAlex MartelliView Answer on Stackoverflow
Solution 2 - PythonDenis OtkidachView Answer on Stackoverflow
Solution 3 - PythonmaxView Answer on Stackoverflow