How to delete in a heap data structure?

AlgorithmData StructuresHeap

Algorithm Problem Overview


I understand how to delete the root node from a max heap but is the procedure for deleting a node from the middle to remove and replace the root repeatedly until the desired node is deleted?

  1. Is O(log n) the optimal complexity for this procedure?

  2. Does this affect the big O complexity since other nodes must be deleted in order to delete a specific node?

Algorithm Solutions


Solution 1 - Algorithm

Actually, you can remove an item from the middle of a heap without trouble.

The idea is to take the last item in the heap and, starting from the current position (i.e. the position that held the item you deleted), sift it up if the new item is greater than the parent of the old item. If it's not greater than the parent, then sift it down.

That's the procedure for a max heap. For a min heap, of course, you'd reverse the greater and less cases.

Finding an item in a heap is an O(n) operation, but if you already know where it is in the heap, removing it is O(log n).

I published a heap-based priority queue for DevSource a few years back. The full source is at http://www.mischel.com/pubs/priqueue.zip

Update

Several have asked if it's possible to move up after moving the last node in the heap to replace the deleted node. Consider this heap:

        1
    6       2
  7   8   3

If you delete the node with value 7, the value 3 replaces it:

        1
    6       2
  3   8

You now have to move it up to make a valid heap:

        1
    3       2
  6   8

The key here is that if the item you're replacing is in a different subtree than the last item in the heap, it's possible that the replacement node will be smaller than the parent of the replaced node.

Solution 2 - Algorithm

The problem with removing an arbitrary element from a heap is that you cannot find it.

In a heap, looking for an arbitrary element is O(n), thus removing an element [if given by value] is O(n) as well.

If it is important for you to remove arbitrary elements form the data structure, a heap is probably not the best choice, you should consider full sorted data structurs instead such as balanced BST or a skip list.

If your element is given by reference, it is however possible to remove it in O(logn) by simply 'replacing' it with the last leaf [remember a heap is implemented as a complete binary tree, so there is a last leaf, and you know exactly where it is], remove these element, and re-heapify the relevant sub heap.

Solution 3 - Algorithm

If you have a max heap, you could implement this by assigning a value larger than any other (eg something like int.MaxValue or inf in whichever language you are using) possible to the item to be deleted, then re-heapify and it will be the new root. Then perform a regular removal of the root node.

This will cause another re-heapify, but I can't see an obvious way to avoid doing it twice. This suggests that perhaps a heap isn't appropriate for your use-case, if you need to pull nodes from the middle of it often.

(for a min heap, you can obviously use int.MinValue or -inf or whatever)

Solution 4 - Algorithm

Removing an element from a known heap array position has O(log n) complexity (which is optimal for a heap). Thus, this operation has the same complexity as extracting (i.e. removing) the root element.

The basic steps for removing the i-th element (where 0<=i<n) from heap A (with n elements) are:

  1. swap element A[i] with element A[n-1]
  2. set n=n-1
  3. possibly fix the heap such that the heap-property is satisfied for all elements

Which is pretty similar to how the extraction of the root element works.

Remember that the heap-property is defined in a max-heap as:

A[parent(i)] >= A[i], for 0 < i < n

Whereas in a min-heap it's:

A[parent(i)] <= A[i], for 0 < i < n

In the following we assume a max-heap to simplify the description. But everything works analogously with a min-heap.

After the swap we have to distinguish 3 cases:

  1. new key in A[i] equals the old key - nothing changes, done
  2. new key in A[i] is greater than the old key. Nothing changes for the sub-trees l and r of i. If previously A[parent(i)] >= A[j] was true then now A[parent(i)]+c >= A[j] must be true as well (for j in (l, r) and c>=0). But the ancestors of element i might need fixing. This fix-up procedure is basically the same as when increasing A[i].
  3. new key in A[i] is smaller than the old key. Nothing changes for the ancestors of element i, because if the previous value already satisfied the heap property, a smaller value values does it as well. But the sub-trees might now need fixing, i.e. in the same way as when extracting the maximum element (i.e. the root).

An example implementation:

void heap_remove(A, i, &n)
{
    assert(i < n);
    assert(is_heap(A, i));
    
    --n;
    if (i == n)
      return;
    
    bool is_gt = A[n] > A[i];

    A[i] = A[n]; 
    
    if (is_gt)
        heapify_up(A, i);
    else
        heapify(A, i, n);
}

Where heapifiy_up() basically is the textbook increase() function - modulo writing the key:

void heapify_up(A, i)
{
    while (i > 0) {
        j = parent(i);
        if (A[i] > A[j]) {
            swap(A, i, j);
            i = j;
        } else {
            break;
        }
    }
}

And heapify() is the text-book sift-down function:

void heapify(A, i, n)
{
    for (;;) {
        l = left(i);
        r = right(i);

        maxi = i;
        if (l < n && A[l] > A[i])
            maxi = l;
        if (r < n && A[r] > A[i])
            maxi = r;
        if (maxi == i)
            break;
        swap(A, i, maxi);
        i = maxi;
    }
}

Since the heap is an (almost) complete binary tree, its height is in O(log n). Both heapify functions have to visit all tree levels, in the worst case, thus the removal by index is in O(log n).

Note that finding the element with a certain key in a heap is in O(n). Thus, removal by key value is in O(n) because of the find complexity, in general.

So how can we keep track of the array position of an element we've inserted? After all, further inserts/removals might move it around.

We can keep track by also storing a pointer to an element record next to the key, on the heap, for each element. The element record then contains a field with the current position - which thus has to be maintained by modified heap-insert and heap-swap functions. If we retain the pointer to the element record after insert, we can get the element's current position in the heap in constant time. Thus, in that way, we can also implement element removal in O(log n).

Solution 5 - Algorithm

What you want to achieve is not a typical heap operation and it seems to me that once you introduce "delete middle element" as a method some other binary tree(for instance red-black or AVL tree) is a better choice. You have a red-black tree implemented in some languages(for instance map and set in c++).

Otherwise the way to do middle element deletion is as proposed in rejj's answer: assign a big value(for max heap) or small value(for min heap) to the element, sift it up until it is root and then delete it.

This approach still keeps the O(log(n)) complexity for middle element deletion, but the one you propose doesn't. It will have complexity O(n*log(n)) and therefor is not very good. Hope that helps.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questiontesfa koliView Question on Stackoverflow
Solution 1 - AlgorithmJim MischelView Answer on Stackoverflow
Solution 2 - AlgorithmamitView Answer on Stackoverflow
Solution 3 - AlgorithmrejjView Answer on Stackoverflow
Solution 4 - AlgorithmmaxschlepzigView Answer on Stackoverflow
Solution 5 - AlgorithmIvaylo StrandjevView Answer on Stackoverflow