How to make heapq evaluate the heap off of a specific attribute?

PythonData StructuresHeap

Python Problem Overview


I wish to hold a heap of objects, not just numbers. They will have an integer attribute in them that the heap can sort by. The easiest way to use heaps in python is heapq, but how do I tell it to sort by a specific attribute when using heapq?

Python Solutions


Solution 1 - Python

According to the example from the documentation, you can use tuples, and it will sort by the first element of the tuple:

>>> h = []
>>> heappush(h, (5, 'write code'))
>>> heappush(h, (7, 'release product'))
>>> heappush(h, (1, 'write spec'))
>>> heappush(h, (3, 'create tests'))
>>> heappop(h)
(1, 'write spec')

So if you don't want to (or can't?) do a __cmp__ method, you can manually extract your sorting key at push time.

Note that if the first elements in a pair of tuples are equal, further elements will be compared. If this is not what you want, you need to ensure that each first element is unique.

Solution 2 - Python

heapq sorts objects the same way list.sort does, so just define a method __cmp__() within your class definition, which will compare itself to another instance of the same class:

def __cmp__(self, other):
    return cmp(self.intAttribute, other.intAttribute)

Works in Python 2.x.

In 3.x use:

def __lt__(self, other):
    return self.intAttribute < other.intAttribute

Solution 3 - Python

According to the Official Document, a solution to this is to store entries as tuples (please take a look at Section 8.4.1 and 8.4.2).

For example, your object is something like this in tuple's format (key, value_1, value_2)

When you put the objects (i.e. tuples) into heap, it will take the first attribute in the object (in this case is key) to compare. If a tie happens, the heap will use the next attribute (i.e. value_1) and so on.

For example:

import heapq

heap = []
heapq.heappush(heap, (0,'one', 1))
heapq.heappush(heap, (1,'two', 11))
heapq.heappush(heap, (1, 'two', 2))
heapq.heappush(heap, (1, 'one', 3))
heapq.heappush(heap, (1,'two', 3))
heapq.heappush(heap, (1,'one', 4))
heapq.heappush(heap, (1,'two', 5))
heapq.heappush(heap, (1,'one', 1))

show_tree(heap)

Output:

                                      (0, 'one', 1)                                       
                (1, 'one', 1)                                (1, 'one', 4)                
    (1, 'one', 3)         (1, 'two', 3)         (1, 'two', 2)         (1, 'two', 5)     
(1, 'two', 11)

About pretty print a heap in python (updated the link): show_tree()

Solution 4 - Python

I feel the simplest way is to override the existing cmp_lt function of the heapq module. A short example:

import heapq

# your custom function. Here, comparing tuples a and b based on their 2nd element
def new_cmp_lt(self,a,b):
    return a[1]<b[1]

#override the existing "cmp_lt" module function with your function
heapq.cmp_lt=new_cmp_lt

#Now use everything like normally used

Solution 5 - Python

I had the same question but none of the above answers hit the spot although some were close but not elaborated enough. Anyway, I did some research and tried this piece of code and hopefully this should be sufficient for someone next who is looking to get an answer:

The problem with using a tuple is it only uses the first item which is not very flexible. I wanted something similar to std::priority_queue in c++ like this: std::priority_queue<pair<int, int>, vector<pair<int, int>>, comparator> pq; where I could design my own comparator which is more common in real world applications.

Hopefully the below snippet helps: https://repl.it/@gururajks/EvenAccurateCylinders

import heapq
class PQNode:

    def __init__(self, key, value):
        self.key = key
        self.value = value
    
    # compares the second value
    def __lt__(self, other):
        return self.value < other.value

    def __str__(self):
        return str("{} : {}".format(self.key, self.value))

input = [PQNode(1, 4), PQNode(7, 4), PQNode(6, 9), PQNode(2, 5)]
hinput = []
for item in input:
    heapq.heappush(hinput, item)

while (hinput):
    print (heapq.heappop(hinput))

Solution 6 - Python

Unfortunately, you can't, although this is an often requested feature.

One option would be to insert (key, value) tuples into the heap. However, that won't work if the values throw an exception when compared (they will be compared in the case of a tie between keys).

A second option would be to define a __lt__ (less-than) method in the class that will use the appropriate attribute to compare the elements for sorting. However, that might not be possible if the objects were created by another package or if you need them to compare differently elsewhere in the program.

A third option would be to use the sortedlist class from the blist module (disclaimer: I'm the author). The constructor for sortedlist takes a key parameter that lets you specify a function to return the sort key of an element, similar to the key parameter of list.sort and sorted.

Solution 7 - Python

Python 3 Update

This other answers here are out-of-date:

  • Some are Python 2 specific. The __cmp__ method doesn't exist anymore.
  • Some do not reflect best practices and target only __lt__ instead of all the rich comparisons as recommended by PEP 8.
  • Some do not use modern tooling such as dataclasses, attrgetter, or total_ordering.

Modern solution with Dataclasses

With dataclasses, it is easy to make a data holder with customized ordering. For example, here is a Person class that excludes the name field from the comparison order:

from dataclasses import dataclass, field

@dataclass(order=True)
class Person:
    name: str = field(compare=False)
    age: int

actors = [
    Person('T Hanks', 65),
    Person('E Olson', 33),
    Person('A Tapping', 58),
]

This works perfectly with heaps:

>>> heapify(actors)
>>> heappop(actors)
Person(name='E Olson', age=33)
>>> heappop(actors)
Person(name='A Tapping', age=58)
>>> heappop(actors)
Person(name='T Hanks', age=65)

Handling Existing Classes

Sometimes you have to work with the data as provided and need to control the comparison order without changing the original class.

The solution is to add a wrapper with the new comparison. This leaves the unoriginal data and its class unchanged. Here is a modern recipe for adding such a wrapper:

from functools import total_ordering
from operator import attrgetter

def new_compare(*field_names):
    extract = attrgetter(*field_names)
    @total_ordering
    class ComparisonWrapper:
        def __init__(self, obj):
            self.obj = obj
        def __eq__(self, other):
            return extract(self.obj) == extract(other.obj)
        def __lt__(self, other):
            return extract(self.obj) < extract(other.obj)
    return ComparisonWrapper

For example, you may be given the following data and cannot alter it or its class directly:

class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age
    def __repr__(self):
        return f'Person({self.name!r}, {self.age})'

actors = [
    Person('T Hanks', 65),
    Person('E Olson', 33),
    Person('A Tapping', 58),
]

The wrapper can be applied gracefully with map(). To unwrap the data, access the obj attribute:

>>> from heapq import heapify, heappop

>>> data = list(map(new_compare('age'), actors))
>>> heapify(data)
>>> heappop(data).obj
Person('E Olson', 33)
>>> heappop(data).obj
Person('A Tapping', 58)
>>> heappop(data).obj
Person('T Hanks', 65)

Wrappers versus Decorating Tuples

As noted in the modern documentation, the traditional solution with decorating tuples no longer works for some essential use cases. In particular, if the objects in the heap are functions, a tuple in the form of (priority, task) no longer works in Python 3 because functions cannot be compared.

The new suggestion is to use a wrapper such as:

from dataclasses import dataclass, field
from typing import Any

@dataclass(order=True)
class PrioritizedItem:
    priority: int
    item: Any=field(compare=False)

This will always work even if the item objects aren't comparable.

Solution 8 - Python

You could implement a heapdict. Note the use of popitem() to get the lowest priority item.

import heapdict as hd
import string
import numpy as np

h = hd.heapdict()
keys = [char for char in string.ascii_lowercase[:10]]
vals = [i for i in np.random.randint(0,10, 10)]
for k,v in zip(keys,vals):
    h[k] = v
for i in range(len(vals)):
    print h.popitem()

Solution 9 - Python

There is a module called heaps. The Github address is https://github.com/gekco/heapy. You can apply your own key / sort function at instantiation of the class or when creating the heap from an array, which is very useful as this saves you adding it as an argument every time you perform an action.

Example where I want the list what the smallest element at the last position of the tuple be on top of the heap:

>>> from heapy.heap import Heap 
>>> a = [(3, 5, 10), (-5, 3, 8), (7, 8, 9), (-4, 0, 2)]
>>> x = Heap.from_array(a, key=lambda t : t[-1])
>>> x.length
4
>>> x.top()
(-4, 0, 2)
>>> x.insert((-1, 0, 1))
>>> x.length
5
>>> x.top()
(-1, 0, 1)
>>> a
[(3, 5, 10), (-5, 3, 8), (7, 8, 9), (-4, 0, 2)]
 

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestioncoffeeView Question on Stackoverflow
Solution 1 - PythonJanderView Answer on Stackoverflow
Solution 2 - PythoneumiroView Answer on Stackoverflow
Solution 3 - PythonCatbuiltsView Answer on Stackoverflow
Solution 4 - Pythontu_curiousView Answer on Stackoverflow
Solution 5 - PythonGuruView Answer on Stackoverflow
Solution 6 - PythonDaniel StutzbachView Answer on Stackoverflow
Solution 7 - PythonRaymond HettingerView Answer on Stackoverflow
Solution 8 - PythonDanGoodrickView Answer on Stackoverflow
Solution 9 - PythonElmex80sView Answer on Stackoverflow