Down the rabbit hole

I was searching for an algorithm that was interesting yet simple enough to explain in depth in 15 minutes, and I came across A*. I quickly found that while it is very interesting, preparing to explain it briefly was not going to happen this weekend. A* is an algorithm for finding the shortest path between two vertices in a grid, often used in video games for path-finding. However, this rabbit hole led me to other interesting things.

One critical piece to implementing A* is a priority queue. A priority queue is similar to a normal queue, except instead of first-in-first-out, it is first-in-best-out. In other words, it gives priority to items based on some criteria and spits the highest priority ones out first.

A priority queue is usually implemented using a heap. I had heard of heaps before, but never actually implemented one so that seemed like a natural direction to pursue. That’s how I came to ask, “what the *bleep* is a heap?”

The heap explained

Turns out, a heap is a data structure that keeps the smallest value (for a min-heap) or the largest value (for a max-heap) in the root of a tree (usually a binary tree).

A good analogy for a heap is to imagine a dog who comes across a random pile (heap) of food containing everything from meat to dog treats to kibble to broccoli. The dog will sift through the heap to find the tastiest bits and bring them up to the top, and the less desirable pieces will sink to the bottom. First, the dog will eat the meat. Then he/she will move on to treats, and then the kibble. The broccoli will probably be left behind.

Similarly in a heap, each time something is removed from the top, the items come up to the top in priority order, while the low-priority items sink to the bottom. This is useful for many applications, including the famous heapsort sorting algorithm and the aforementioned priority queue.

Let’s build a heap

Heaps are often implemented with a fixed number of children for each node (usually 2). Due to these constraints, nodes can be stored in a dynamic array rather than in a true nodes-and-pointers tree. An item’s parent and children are found using some arithmetic rather than storing pointers alongside the data. In a 0-indexed array, the children are at 2i + 1 and 2i + 2, and the parent is at floor((i-1)/2).

At this point though, our heap is still just an array that is trying too hard to be a tree. A few further methods are required to make it a heap.

The first consideration for any heap is the “heap property”, which determines priority of heap items. In a min-heap, the heap property is that a parent node must always have a smaller value than its children. In a max-heap the heap property is the opposite — the parent must have a larger value than its children. In my implementation, I created MinHeap and MaxHeap classes to extend my generic Heap class. The two classes differ only in their ‘match_heap_property?(parent,child)’ method which returns true if the two parameters match the respective heap property.

The sift-down operation moves an out-of-place node further down in the tree until it either is in place or is a “leaf” (node with no children). This is used in the remove operation to rearrange the tree after removing the priority element, as well as in the heapify method (further explained below).

The sift-up operation moves an out-of-place node further up the tree until it either is in place or is the root of the tree (node with no parent). This is used to rearrange the tree after an item is added to the bottom.

Both of these operations are very similar to the well-known but terribly inefficient “bubble sort” in that items are swapped up the chain to the top. The difference here is that it is done in a binary tree, and only needs to be done once to bring the desired element to the top, so the time efficiency is a much better O(log n).

Another operation, which is actually the first one done when building a heap from an array, is the heapify operation. Starting from the lowest parent (the parent node of array.length-1) and iterating backwards through the array, repeatedly call sift-down on the current element. This operation, which only has to be done once, has a time efficiency of O(n).

You then have a heap which can be used for all kinds of things!

Heap sort

A heap is very closely related to “heap sort”, a classic sorting algorithm. Heap sort first does a max heapify on the array, then repeatedly swaps the first and last items. It keeps track of the border between the unsorted sub-list and the sorted sub-list which is being built. It then does a sift-down on the unsorted portion of the list. This is repeated until the list is sorted.

Check out my code on github! In it I implement min-heap, max-heap, a priority queue, and heap sort.