Heap vs Binary Search Tree (BST)

Tags: Algorithm, Binary Tree, Heap, Binary Search Tree

Algorithm Problem Overview


What is the difference between a heap and BST?

When to use a heap and when to use a BST?

If you want to get the elements in a sorted fashion, is BST better over heap?

Algorithm Solutions


Solution 1 - Algorithm

Summary

          Type      BST (*)   Heap
Insert    average   log(n)    1
Insert    worst     log(n)    log(n) or n (***)
Find any  worst     log(n)    n
Find max  worst     1 (**)    1
Create    worst     n log(n)  n
Delete    worst     log(n)    log(n)

All average times on this table are the same as their worst times except for Insert.

  • *: everywhere in this answer, BST == Balanced BST, since unbalanced sucks asymptotically
  • **: using a trivial modification explained in this answer
  • ***: log(n) for pointer tree heap, n for dynamic array heap

Advantages of binary heap over a BST

  • average time insertion into a binary heap is O(1), while for a BST it is O(log(n)) (see "Average binary heap insert is O(1)" below)
  • heaps can be efficiently implemented on top of a dynamic array, while BSTs cannot (see the array implementation sections below)

Advantage of BST over binary heap

  • search for arbitrary elements is O(log(n)). This is the killer feature of BSTs.

    For a heap, it is O(n) in general, except for the largest element which is O(1); see the sketch below.
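A minimal sketch of the difference, assuming an array-backed heap stored in a std::vector (the function names are made up for illustration):

```cpp
#include <algorithm>
#include <set>
#include <vector>

// BST search: O(log(n)) descent down one branch of the tree.
bool bst_contains(const std::set<int>& bst, int x) {
    return bst.find(x) != bst.end();
}

// Heap search: the heap order says nothing about where x lives,
// so the only option is a linear scan, O(n).
bool heap_contains(const std::vector<int>& heap, int x) {
    return std::find(heap.begin(), heap.end(), x) != heap.end();
}
```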

"False" advantage of heap over BST

Average binary heap insert is O(1)

Intuitive argument:

  • bottom tree levels have exponentially more elements than top levels, so new elements are almost certain to go at the bottom
  • heap insertion starts from the bottom, BST must start from the top

In a binary heap, increasing the value at a given index is also O(1) for the same reason. But if you want to do that, it is likely that you will want to keep an extra index up-to-date on heap operations (https://stackoverflow.com/questions/17009056/how-to-implement-ologn-decrease-key-operation-for-min-heap-based-priority-queu), e.g. for Dijkstra. This is possible at no extra time cost.
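As a minimal sketch of that bottom-up insertion, assuming a 0-based max-heap stored in a std::vector (illustrative code, not from any particular library):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Dynamic-array max-heap insert: the new element starts at the bottom
// (end of the array) and bubbles up only as far as needed, which is
// O(1) swaps on average and O(log(n)) in the worst case.
void heap_insert(std::vector<int>& heap, int value) {
    heap.push_back(value);                // start at the bottom
    std::size_t i = heap.size() - 1;
    while (i > 0) {
        std::size_t parent = (i - 1) / 2; // 0-based parent index
        if (heap[parent] >= heap[i])
            break;                        // heap property already holds
        std::swap(heap[parent], heap[i]); // bubble up one level
        i = parent;
    }
}
```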

GCC C++ standard library insert benchmark on real hardware

I benchmarked the C++ std::set (Red-black tree BST) and std::priority_queue (dynamic array heap) insert to see if I was right about the insert times, and this is what I got:

[plot omitted: std::set vs std::priority_queue insert times on real hardware]

  • benchmark code
  • plot script
  • plot data
  • tested on Ubuntu 19.04, GCC 8.3.0 in a Lenovo ThinkPad P51 laptop with CPU: Intel Core i7-7820HQ CPU (4 cores / 8 threads, 2.90 GHz base, 8 MB cache), RAM: 2x Samsung M471A2K43BB1-CRC (2x 16GiB, 2400 Mbps), SSD: Samsung MZVLB512HAJQ-000L7 (512GB, 3,000 MB/s)
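The actual benchmark code is only linked above; as a rough, simplified stand-in (not the linked benchmark itself), a bulk-insert timing comparison could look like this:

```cpp
#include <chrono>
#include <cstdio>
#include <queue>
#include <random>
#include <set>

// Time n pushes into a red-black tree (std::set) vs. a dynamic-array heap
// (std::priority_queue). A real benchmark needs warm-up runs, repetitions
// and per-insert sampling; this only measures the totals.
int main() {
    std::mt19937 rng(42);
    std::uniform_int_distribution<int> dist;
    const int n = 1000000;

    auto t0 = std::chrono::steady_clock::now();
    std::set<int> bst;
    for (int i = 0; i < n; ++i) bst.insert(dist(rng));
    auto t1 = std::chrono::steady_clock::now();

    std::priority_queue<int> heap;
    for (int i = 0; i < n; ++i) heap.push(dist(rng));
    auto t2 = std::chrono::steady_clock::now();

    auto ms = [](auto d) {
        return std::chrono::duration_cast<std::chrono::milliseconds>(d).count();
    };
    std::printf("std::set insert:            %lld ms\n", (long long)ms(t1 - t0));
    std::printf("std::priority_queue insert: %lld ms\n", (long long)ms(t2 - t1));
}
```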

So clearly: the heap insert time stays roughly constant, while the BST insert time grows with log(n), matching the summary table above.

GCC C++ standard library insert benchmark on gem5

gem5 is a full system simulator, and therefore provides an infinitely accurate clock with m5 dumpstats. So I tried to use it to estimate timings for individual inserts.

[plot omitted: per-insert timings for std::set and std::priority_queue under gem5]

Interpretation:

  • heap is still constant, but now we see in more detail that there are a few lines, and each higher line is more sparse.

    This presumably corresponds to memory access latencies kicking in for higher and higher inserts.

  • TODO I can't fully interpret the BST one, as it does not look so logarithmic and looks somewhat more constant.

    With this greater detail, however, we can also see a few distinct lines, but I'm not sure what they represent: I would expect the bottom line to be thinner, since we insert top to bottom?

Benchmarked with this Buildroot setup on an aarch64 HPI CPU.

BST cannot be efficiently implemented on an array

Heap operations only need to bubble up or down a single tree branch, so O(log(n)) worst case swaps, O(1) average.

Keeping a BST balanced requires tree rotations, which can change the top element for another one, and would require moving the entire array around (O(n)).

Heaps can be efficiently implemented on an array

Parent and children indexes can be computed from the current index: for a 0-based array, parent(i) = (i - 1) / 2 (integer division), left(i) = 2i + 1 and right(i) = 2i + 2.

There are no balancing operations like in a BST.

Delete min is the most worrying operation as it has to be top down. But it can always be done by "percolating down" a single branch of the heap as explained here. This leads to an O(log(n)) worst case, since the heap is always well balanced.
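A minimal sketch of that percolate-down step, assuming a 0-based min-heap in a std::vector (names are illustrative; the caller must ensure the heap is non-empty):

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Delete-min on a 0-based dynamic-array min-heap: move the last element to
// the root, then percolate it down a single branch, O(log(n)) worst case.
int heap_pop_min(std::vector<int>& heap) {
    int min = heap.front();
    heap.front() = heap.back();
    heap.pop_back();
    std::size_t i = 0;
    const std::size_t n = heap.size();
    while (true) {
        std::size_t left = 2 * i + 1, right = 2 * i + 2, smallest = i;
        if (left < n && heap[left] < heap[smallest]) smallest = left;
        if (right < n && heap[right] < heap[smallest]) smallest = right;
        if (smallest == i) break;            // heap property restored
        std::swap(heap[i], heap[smallest]);  // percolate down one level
        i = smallest;
    }
    return min;
}
```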

If you are inserting a single node for every one you remove, then you lose the advantage of the asymptotic O(1) average insert that heaps provide as the delete would dominate, and you might as well use a BST. Dijkstra however updates nodes several times for each removal, so we are fine.

Dynamic array heaps vs pointer tree heaps

Heaps can also be efficiently implemented on top of pointer trees: https://stackoverflow.com/questions/19720438/is-it-possible-to-make-efficient-pointer-based-binary-heap-implementations

The dynamic array implementation is more space efficient. Suppose that each heap element contains just a pointer to a struct:

  • the tree implementation must store three pointers for each element: parent, left child and right child. So the memory usage is always 4n (3 tree pointers + 1 struct pointer).

    Tree BSTs would also need further balancing information, e.g. red–black coloring.

  • the dynamic array implementation can be of size 2n just after a doubling. So on average it is going to be 1.5n.

On the other hand, the tree heap has better worst case insert, because copying the backing dynamic array to double its size takes O(n) worst case, while the tree heap just does new small allocations for each node.

Still, the backing array doubling is O(1) amortized, so it comes down to a maximum latency consideration. Mentioned here.

Philosophy

  • BSTs maintain a global property between a parent and all descendants (left smaller, right bigger).

    The top node of a BST is the middle element, which requires global knowledge to maintain (knowing how many smaller and larger elements are there).

    This global property is more expensive to maintain (log n insert), but gives more powerful searches (log n search).

  • Heaps maintain a local property between parent and direct children (parent ≥ children, for a max-heap).

    The top node of a heap is the biggest element, which only requires local knowledge to maintain (knowing your parent).

Comparing BST vs Heap vs Hashmap:

  • BST: can be either a reasonable:

    • unordered set (a structure that determines if an element was previously inserted or not). But hashmap tends to be better due to O(1) amortized insert.
    • sorting machine. But heap is generally better at that, which is why heapsort is much more widely known than tree sort
  • heap: is just a sorting machine. Cannot be an efficient unordered set, because you can only check for the smallest/largest element fast.

  • hash map: can only be an unordered set, not an efficient sorting machine, because the hashing mixes up any ordering.

Doubly-linked list

A doubly linked list can be seen as a subset of the heap where the first item has the greatest priority, so let's compare them here as well:

  • insertion:
    • position:
      • doubly linked list: the inserted item must be either the first or last, as we only have pointers to those elements (unless we have a pointer to the position of interest e.g. during iteration)
      • binary heap: the inserted item can end up in any position. Less restrictive than linked list.
    • time:
      • doubly linked list: O(1) worst case since we have pointers to the items, and the update is really simple
      • binary heap: O(1) average, thus worse than linked list. Tradeoff for having more general insertion position.
  • search: O(n) for both

A use case for this is when the key of the heap is the current timestamp: in that case, new entries will always go to the beginning of the list. So we can even forget the exact timestamp altogether, and just keep the position in the list as the priority.

This can be used to implement an LRU cache. Just like for heap applications like Dijkstra, you will want to keep an additional hashmap from the key to the corresponding node of the list, to find which node to update quickly.
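As a minimal sketch of that combination (hypothetical class and member names; a real LRU cache would need more careful capacity and error handling):

```cpp
#include <list>
#include <string>
#include <unordered_map>
#include <utility>

// LRU cache sketch: the list is ordered by recency (front = most recently
// used) and the hash map points at each key's node so updates are O(1).
class LruCache {
    std::size_t capacity_;
    std::list<std::pair<std::string, int>> items_;
    std::unordered_map<std::string,
        std::list<std::pair<std::string, int>>::iterator> index_;

public:
    explicit LruCache(std::size_t capacity) : capacity_(capacity) {}

    void put(const std::string& key, int value) {
        auto it = index_.find(key);
        if (it != index_.end()) items_.erase(it->second);  // drop the old node
        items_.emplace_front(key, value);                  // newest goes first
        index_[key] = items_.begin();
        if (items_.size() > capacity_) {                   // evict least recent
            index_.erase(items_.back().first);
            items_.pop_back();
        }
    }

    const int* get(const std::string& key) {
        auto it = index_.find(key);
        if (it == index_.end()) return nullptr;
        // splice moves the node to the front in O(1) without invalidating it.
        items_.splice(items_.begin(), items_, it->second);
        return &it->second->second;
    }
};
```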

Comparison of different Balanced BST

Although the asymptotic insert and find times are the same for all data structures that are commonly classified as "Balanced BSTs" that I've seen so far, different BBSTs do have different trade-offs. I haven't fully studied this yet, but it would be good to summarize these trade-offs here:

  • Red-black tree. Appears to be the most commonly used BBST as of 2019, e.g. it is the one used by the GCC 8.3.0 C++ implementation
  • AVL tree. Appears to be a bit more strictly balanced than the red–black tree, so it could be better for find latency, at the cost of slightly more expensive inserts. Wiki summarizes: "AVL trees are often compared with red–black trees because both support the same set of operations and take [the same] time for the basic operations. For lookup-intensive applications, AVL trees are faster than red–black trees because they are more strictly balanced. Similar to red–black trees, AVL trees are height-balanced. Both are, in general, neither weight-balanced nor mu-balanced for any mu < 1/2; that is, sibling nodes can have hugely differing numbers of descendants."
  • WAVL. The original paper mentions advantages of that version in terms of bounds on rebalancing and rotation operations.

See also

Similar question on CS: https://cs.stackexchange.com/questions/27860/whats-the-difference-between-a-binary-search-tree-and-a-binary-heap

Solution 2 - Algorithm

Heap just guarantees that elements on higher levels are greater (for max-heap) or smaller (for min-heap) than elements on lower levels, whereas BST guarantees order (from "left" to "right"). If you want sorted elements, go with BST.

Solution 3 - Algorithm

> When to use a heap and when to use a BST

Heap is better at findMin/findMax (O(1)), while BST is good at all finds (O(logN)). Insert is O(logN) for both structures. If you only care about findMin/findMax (e.g. priority-related), go with heap. If you want everything sorted, go with BST.

First few slides from here explain things very clearly.

Solution 4 - Algorithm

As mentioned by others, Heap can do findMin or findMax in O(1), but not both in the same data structure. However, I disagree that Heap is better at findMin/findMax. In fact, with a slight modification, the BST can do both findMin and findMax in O(1).

In this modified BST, you keep track of the min node and the max node every time you do an operation that can potentially modify the data structure. For example, on insert you can check whether the min value is larger than the newly inserted value, and if so assign the min to the newly added node. The same technique can be applied to the max value. Hence, this BST contains this information, which you can retrieve in O(1), the same as a binary heap.

In this (balanced) BST, when you pop the min or pop the max, the next min value to be assigned is the successor of the min node, whereas the next max value to be assigned is the predecessor of the max node. Thus these lookups perform in O(1). However, we still need to re-balance the tree after the removal, so it will still run in O(log n), the same as a binary heap.
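A rough sketch of the idea on top of std::multiset (a red–black tree in common implementations); the wrapper and its names are made up for illustration. std::multiset already keeps its leftmost and rightmost nodes at hand, which gives the same effect as caching the min and max nodes manually:

```cpp
#include <cassert>
#include <set>

// Balanced BST with O(1) findMin/findMax and O(log n) insert/popMin,
// mirroring what a binary heap offers for one end of the ordering.
class MinMaxBst {
    std::multiset<int> tree_;  // multiset so duplicates behave like in a heap

public:
    void insert(int x) { tree_.insert(x); }            // O(log n)
    int find_min() const { return *tree_.begin(); }    // O(1)
    int find_max() const { return *tree_.rbegin(); }   // O(1)
    int pop_min() {                                     // O(log n) rebalance
        int min = *tree_.begin();
        tree_.erase(tree_.begin());
        return min;
    }
    bool empty() const { return tree_.empty(); }
};

int main() {
    MinMaxBst s;
    s.insert(3); s.insert(1); s.insert(2);
    assert(s.find_min() == 1 && s.find_max() == 3);
    assert(s.pop_min() == 1 && s.find_min() == 2);
}
```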

I would be interested to hear your thoughts in the comments below. Thanks :)

Update

Cross reference to similar question https://stackoverflow.com/q/7878622/764592 for more discussion on simulating Heap using BST.

Solution 5 - Algorithm

A binary search tree uses the definition that for every node, the node to its left has a smaller value (key) and the node to its right has a greater value (key).

Whereas the heap, being an implementation of a binary tree, uses the following definition:

If A and B are nodes, where B is a child node of A, then the value (key) of A must be larger than or equal to the value (key) of B. That is, key(A) ≥ key(B) (for a max-heap).

http://wiki.answers.com/Q/Difference_between_binary_search_tree_and_heap_tree

I ran into the same question today for my exam and I got it right. smile ... :)

Solution 6 - Algorithm

Another use of BST over Heap, because of an important difference:

  • finding the successor and predecessor in a BST takes O(h) time (O(log n) in a balanced BST)
  • while in a Heap, it would take O(n) time to find the successor or predecessor of some element.

Use of BST over a Heap: let's say we use a data structure to store the landing times of flights. We cannot schedule a flight to land if the difference in landing times is less than 'd'. And assume many flights have already been scheduled to land in a data structure (BST or Heap).

Now, we want to schedule another flight which will land at time t. Hence, we need to calculate the difference of t with its successor and predecessor (both should be > d). Thus, we need a BST for this, which does it fast, i.e. in O(log n) if balanced; see the sketch below.
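A minimal sketch of that check using std::set's ordered lookups (the helper name and long timestamps are illustrative assumptions):

```cpp
#include <iterator>
#include <set>

// Schedule a landing at time t only if both the closest earlier and the
// closest later landings are more than d away. Everything is O(log n).
bool schedule_landing(std::set<long>& landings, long t, long d) {
    auto next = landings.lower_bound(t);                // successor: first >= t
    if (next != landings.end() && *next - t <= d)
        return false;                                   // successor too close
    if (next != landings.begin() && t - *std::prev(next) <= d)
        return false;                                   // predecessor too close
    landings.insert(t);
    return true;
}
```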

Edit:

Sorting: a BST takes O(n) time to print the elements in sorted order (in-order traversal), while a Heap does it in O(n log n) time: the Heap extracts the min element and re-heapifies the array, which makes the sorting take O(n log n).

Solution 7 - Algorithm

Inserting all n elements from an array into a BST takes O(n log n). The n elements of an array can be turned into a heap in O(n) time, which gives the heap a definite advantage.
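For instance, with the standard library (std::make_heap is the linear-time bottom-up construction, while building a std::set performs n logarithmic inserts):

```cpp
#include <algorithm>
#include <set>
#include <vector>

int main() {
    std::vector<int> data = {5, 3, 8, 1, 9, 2, 7, 4, 6};

    std::vector<int> heap = data;
    std::make_heap(heap.begin(), heap.end());    // O(n) bottom-up heapify

    std::set<int> bst(data.begin(), data.end()); // n inserts, O(n log n)
}
```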

Solution 8 - Algorithm

Heap is a concrete data structure to implement a priority queue.

[figure omitted: a binary heap]

There is no reason to keep our branching factor fixed and equal to 2. In general we could have d-ary trees:

[figure omitted: a d-ary heap]
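The array index arithmetic generalizes directly; a quick sketch for a 0-based d-ary heap (illustrative helper names):

```cpp
#include <cstddef>

// 0-based d-ary heap: node i has children d*i + 1 ... d*i + d
// and parent (i - 1) / d (integer division).
std::size_t dary_parent(std::size_t i, std::size_t d) { return (i - 1) / d; }
std::size_t dary_child(std::size_t i, std::size_t d, std::size_t k) {
    return d * i + 1 + k;  // k-th child, with 0 <= k < d
}
```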

The trees used for heaps are complete trees. Heaps are created to get the highest priority element of the data, whereas binary search trees, as the name says, are for searching.

  • Heaps are not good for checking whether or not an element is stored in them. You have no choice other than going through all the elements until you find the one you are looking for, or you reach the end of the array. This means a linear time algorithm, whereas it is O(log(n)) for binary search trees.

  • A Binary Search Tree typically doesn't allow duplicates, whereas a Heap does.

  • The Binary Search Tree is ordered, but the Heap is not.

  • Insert and remove take O(log(n)) time in a heap. In a Binary Search Tree, this may take up to O(n) time if the tree is completely unbalanced. In the figure below you see two BSTs: the right one is chained, so insert and remove might take O(n), while for the left (balanced) one they take O(log(n)).

[figure omitted: a balanced BST next to a degenerate, chain-like BST]

  • A heap can be built in linear time, however the BST needs O(n log(n)) to be created.

  • Heaps are complete trees. This is a very important fact: when you insert, you insert at the last position, so that after swapping (sifting up) you still have a complete tree structure. Similarly, when you delete, deletion starts from the root: you delete the root, and to keep the complete tree structure, the last element in the tree is moved to the root (and sifted down).

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type            Original Author                     Original Content on Stackoverflow
Question                kc3                                 View Question on Stackoverflow
Solution 1 - Algorithm  Ciro Santilli Путлер Капут 六四事     View Answer on Stackoverflow
Solution 2 - Algorithm  Dante May Code                      View Answer on Stackoverflow
Solution 3 - Algorithm  xysun                               View Answer on Stackoverflow
Solution 4 - Algorithm  Yeo                                 View Answer on Stackoverflow
Solution 5 - Algorithm  June                                View Answer on Stackoverflow
Solution 6 - Algorithm  CODError                            View Answer on Stackoverflow
Solution 7 - Algorithm  AMR                                 View Answer on Stackoverflow
Solution 8 - Algorithm  Yilmaz                              View Answer on Stackoverflow