How to delete in a heap data structure?
AlgorithmData StructuresHeapAlgorithm Problem Overview
I understand how to delete the root node from a max heap but is the procedure for deleting a node from the middle to remove and replace the root repeatedly until the desired node is deleted?
-
Is O(log n) the optimal complexity for this procedure?
-
Does this affect the big O complexity since other nodes must be deleted in order to delete a specific node?
Algorithm Solutions
Solution 1 - Algorithm
Actually, you can remove an item from the middle of a heap without trouble.
The idea is to take the last item in the heap and, starting from the current position (i.e. the position that held the item you deleted), sift it up if the new item is greater than the parent of the old item. If it's not greater than the parent, then sift it down.
That's the procedure for a max heap. For a min heap, of course, you'd reverse the greater and less cases.
Finding an item in a heap is an O(n) operation, but if you already know where it is in the heap, removing it is O(log n).
I published a heap-based priority queue for DevSource a few years back. The full source is at http://www.mischel.com/pubs/priqueue.zip
Update
Several have asked if it's possible to move up after moving the last node in the heap to replace the deleted node. Consider this heap:
1
6 2
7 8 3
If you delete the node with value 7, the value 3 replaces it:
1
6 2
3 8
You now have to move it up to make a valid heap:
1
3 2
6 8
The key here is that if the item you're replacing is in a different subtree than the last item in the heap, it's possible that the replacement node will be smaller than the parent of the replaced node.
Solution 2 - Algorithm
The problem with removing an arbitrary element from a heap is that you cannot find it.
In a heap, looking for an arbitrary element is O(n)
, thus removing an element [if given by value] is O(n)
as well.
If it is important for you to remove arbitrary elements form the data structure, a heap is probably not the best choice, you should consider full sorted data structurs instead such as balanced BST or a skip list.
If your element is given by reference, it is however possible to remove it in O(logn)
by simply 'replacing' it with the last leaf [remember a heap is implemented as a complete binary tree, so there is a last leaf, and you know exactly where it is], remove these element, and re-heapify the relevant sub heap.
Solution 3 - Algorithm
If you have a max heap, you could implement this by assigning a value larger than any other (eg something like int.MaxValue
or inf
in whichever language you are using) possible to the item to be deleted, then re-heapify and it will be the new root. Then perform a regular removal of the root node.
This will cause another re-heapify, but I can't see an obvious way to avoid doing it twice. This suggests that perhaps a heap isn't appropriate for your use-case, if you need to pull nodes from the middle of it often.
(for a min heap, you can obviously use int.MinValue
or -inf
or whatever)
Solution 4 - Algorithm
Removing an element from a known heap array position has O(log n)
complexity (which is optimal for a heap). Thus, this operation has the same complexity as extracting (i.e. removing) the root element.
The basic steps for removing the i-th element (where 0<=i<n
) from heap A (with n
elements) are:
- swap element
A[i]
with elementA[n-1]
- set
n=n-1
- possibly fix the heap such that the heap-property is satisfied for all elements
Which is pretty similar to how the extraction of the root element works.
Remember that the heap-property is defined in a max-heap as:
A[parent(i)] >= A[i], for 0 < i < n
Whereas in a min-heap it's:
A[parent(i)] <= A[i], for 0 < i < n
In the following we assume a max-heap to simplify the description. But everything works analogously with a min-heap.
After the swap we have to distinguish 3 cases:
- new key in
A[i]
equals the old key - nothing changes, done - new key in
A[i]
is greater than the old key. Nothing changes for the sub-treesl
andr
ofi
. If previouslyA[parent(i)] >= A[j]
was true then nowA[parent(i)]+c >= A[j]
must be true as well (forj in (l, r)
andc>=0
). But the ancestors of elementi
might need fixing. This fix-up procedure is basically the same as when increasingA[i]
. - new key in
A[i]
is smaller than the old key. Nothing changes for the ancestors of element i, because if the previous value already satisfied the heap property, a smaller value values does it as well. But the sub-trees might now need fixing, i.e. in the same way as when extracting the maximum element (i.e. the root).
An example implementation:
void heap_remove(A, i, &n)
{
assert(i < n);
assert(is_heap(A, i));
--n;
if (i == n)
return;
bool is_gt = A[n] > A[i];
A[i] = A[n];
if (is_gt)
heapify_up(A, i);
else
heapify(A, i, n);
}
Where heapifiy_up()
basically is the textbook increase()
function - modulo writing the key:
void heapify_up(A, i)
{
while (i > 0) {
j = parent(i);
if (A[i] > A[j]) {
swap(A, i, j);
i = j;
} else {
break;
}
}
}
And heapify()
is the text-book sift-down function:
void heapify(A, i, n)
{
for (;;) {
l = left(i);
r = right(i);
maxi = i;
if (l < n && A[l] > A[i])
maxi = l;
if (r < n && A[r] > A[i])
maxi = r;
if (maxi == i)
break;
swap(A, i, maxi);
i = maxi;
}
}
Since the heap is an (almost) complete binary tree, its height is in O(log n)
. Both heapify functions have to visit all tree levels, in the worst case, thus the removal by index is in O(log n)
.
Note that finding the element with a certain key in a heap is in O(n)
. Thus, removal by key value is in O(n)
because of the find complexity, in general.
So how can we keep track of the array position of an element we've inserted? After all, further inserts/removals might move it around.
We can keep track by also storing a pointer to an element record next to the key, on the heap, for each element. The element record then contains a field with the current position - which thus has to be maintained by modified heap-insert and heap-swap functions. If we retain the pointer to the element record after insert, we can get the element's current position in the heap in constant time. Thus, in that way, we can also implement element removal in O(log n)
.
Solution 5 - Algorithm
What you want to achieve is not a typical heap operation and it seems to me that once you introduce "delete middle element" as a method some other binary tree(for instance red-black or AVL tree) is a better choice. You have a red-black tree implemented in some languages(for instance map and set in c++).
Otherwise the way to do middle element deletion is as proposed in rejj's answer: assign a big value(for max heap) or small value(for min heap) to the element, sift it up until it is root and then delete it.
This approach still keeps the O(log(n)) complexity for middle element deletion, but the one you propose doesn't. It will have complexity O(n*log(n)) and therefor is not very good. Hope that helps.