CS 261 – Data Structures
BuildHeap and Heap Sort
Heap Implementation: Constructors
void buildHeap(struct dyArray * data) {
    int i;
    int max = dyArraySize(data);
    // All nodes with index greater than max/2 - 1 are leaves and thus already satisfy the heap property.
    for (i = max / 2 - 1; i >= 0; i--)
        adjustHeap(data, max, i); // Make subtree rooted at i a heap.
}
At the beginning, only the leaves are proper heaps:
–Leaves are all nodes with indices greater than max / 2 - 1
At each step, the subtree rooted at index i becomes a heap
Heap Implementation: Build heap
void buildHeap(struct dyArray * data) {
    int i;
    int max = dyArraySize(data);
    // All nodes with index greater than max/2 - 1 are leaves and thus already satisfy the heap property.
    for (i = max / 2 - 1; i >= 0; i--)
        adjustHeap(data, max, i); // Make subtree rooted at i a heap.
}
For all subtrees that are not already heaps (initially, all inner, or non-leaf, nodes):
–Call adjustHeap with the largest node index that is not already guaranteed to be a heap
–Iterate until the subtree rooted at the root node (the whole tree) becomes a heap
Why call adjustHeap with the largest non-heap node?
–Because its children, having larger indices, are already guaranteed to be heaps
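The slides call adjustHeap but never show its body. The following is a minimal sketch of what it might look like for a min-heap, assuming a plain int array in place of struct dyArray. The helper isMinHeap is hypothetical and exists only to check the result; none of this is the course library's actual code.

```c
#include <assert.h>

/* Sift the value at index pos down until the subtree rooted there is a
   min-heap. Only indices 0 .. max-1 are treated as part of the heap.
   ASSUMPTION: plain int array instead of struct dyArray. */
void adjustHeap(int *a, int max, int pos) {
    assert(pos >= 0 && pos < max);
    for (;;) {
        int left = 2 * pos + 1;
        int right = left + 1;
        int smallest = pos;
        int tmp;
        if (left < max && a[left] < a[smallest]) smallest = left;
        if (right < max && a[right] < a[smallest]) smallest = right;
        if (smallest == pos) return;   /* heap property holds here */
        tmp = a[pos];                  /* swap with the smaller child */
        a[pos] = a[smallest];
        a[smallest] = tmp;
        pos = smallest;                /* continue one level down */
    }
}

/* Same loop as on the slide: start at the last non-leaf, walk back to the root. */
void buildHeap(int *a, int n) {
    int i;
    for (i = n / 2 - 1; i >= 0; i--)
        adjustHeap(a, n, i);
}

/* Hypothetical checker: returns 1 if every parent is <= its children. */
int isMinHeap(const int *a, int n) {
    int i;
    for (i = 1; i < n; i++)
        if (a[(i - 1) / 2] > a[i]) return 0;
    return 1;
}
```

Calling buildHeap on any int array and then isMinHeap on the same array should return 1, with the smallest value at index 0.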
Heap Implementation: adjustHeap (tree figures omitted)
First iteration: adjust the largest non-leaf node (index 4 = max/2 - 1); the leaf nodes are already heaps
Second iteration: adjust the largest non-heap node (index 3); no adjustment needed
Third iteration: adjust the largest non-heap node (index 2)
Fourth iteration: adjust the largest non-heap node (index 1)
Fifth iteration: adjust the largest non-heap node (index 0, the root)
After the fifth iteration: the entire tree is a heap
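The five iterations can be replayed in code. This sketch prints the array after each call to adjustHeap, again assuming a plain int array; traceBuildHeap and printStep are hypothetical names, and the sample values in the note below are made up for illustration (they are not the values from the original figures).

```c
#include <assert.h>
#include <stdio.h>

/* Min-heap sift-down on indices 0 .. max-1 (ASSUMPTION: plain int array). */
void adjustHeap(int *a, int max, int pos) {
    assert(pos >= 0 && pos < max);
    for (;;) {
        int left = 2 * pos + 1, right = 2 * pos + 2, smallest = pos, tmp;
        if (left < max && a[left] < a[smallest]) smallest = left;
        if (right < max && a[right] < a[smallest]) smallest = right;
        if (smallest == pos) return;
        tmp = a[pos]; a[pos] = a[smallest]; a[smallest] = tmp;
        pos = smallest;
    }
}

/* Print the array on one line, prefixed by the index that was just adjusted. */
void printStep(const int *a, int n, int i) {
    int k;
    printf("i=%d:", i);
    for (k = 0; k < n; k++) printf(" %d", a[k]);
    printf("\n");
}

/* buildHeap with a snapshot printed after every iteration. */
void traceBuildHeap(int *a, int n) {
    int i;
    for (i = n / 2 - 1; i >= 0; i--) {
        adjustHeap(a, n, i);
        printStep(a, n, i);
    }
}
```

With the made-up input {12, 7, 5, 2, 11, 9, 4, 10, 8, 6} (max = 10), the trace starts at i=4 and ends at i=0. The line for i=3 is identical to the line for i=4, since 2 is already smaller than both of its children, and the final array is {2, 6, 4, 7, 11, 9, 5, 10, 8, 12}.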
Heap Sort - Basic idea
Build the initial heap
Repeatedly
–Remove the smallest element (root)
–Rebuild the heap
So the heap starts out with n elements, then has n-1, then n-2, and so on
Where should you store the elements that are removed? Why not put them at the other end of the array? (The part that is no longer being used for the heap.)
Heap Implementation: sort
void heapSort(struct dyArray * data) {
    int i;
    buildHeap(data); // Build initial heap.
    for (i = dyArraySize(data) - 1; i > 0; i--) { // For each position from the last down to 1:
        dyArraySwap(data, 0, i); // Swap the first (smallest) element with the last unsorted element.
        adjustHeap(data, i, 0); // Restore the heap property on the first i elements.
    }
}
Sorts the data in descending order (from largest to smallest):
–Builds heap from initial (unsorted) data
–Iteratively swaps the smallest element (at index 0) with the last unsorted element
–Adjusts the heap after each swap, but only considers the unsorted data
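To see the descending order concretely, here is a self-contained sketch of the same sort on a plain int array. The int-array signatures are an assumption made so the example compiles without the dyArray library; the structure follows the slide.

```c
#include <assert.h>

/* Min-heap sift-down on indices 0 .. max-1 (ASSUMPTION: plain int array). */
void adjustHeap(int *a, int max, int pos) {
    assert(pos >= 0 && pos < max);
    for (;;) {
        int left = 2 * pos + 1, right = 2 * pos + 2, smallest = pos, tmp;
        if (left < max && a[left] < a[smallest]) smallest = left;
        if (right < max && a[right] < a[smallest]) smallest = right;
        if (smallest == pos) return;
        tmp = a[pos]; a[pos] = a[smallest]; a[smallest] = tmp;
        pos = smallest;
    }
}

void buildHeap(int *a, int n) {
    int i;
    for (i = n / 2 - 1; i >= 0; i--)
        adjustHeap(a, n, i);
}

/* In-place heap sort with a min-heap: the result is in descending order. */
void heapSort(int *a, int n) {
    int i;
    buildHeap(a, n);
    for (i = n - 1; i > 0; i--) {
        int tmp = a[0];        /* move the current minimum ...           */
        a[0] = a[i];           /* ... to the end of the unsorted region  */
        a[i] = tmp;
        adjustHeap(a, i, 0);   /* heap now occupies indices 0 .. i-1     */
    }
}
```

Sorting {5, 3, 9, 1, 7} this way yields {9, 7, 5, 3, 1}. For ascending output, the usual choice is a max-heap (flip the two comparisons in adjustHeap) with the same loop.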
View from Middle of Execution
Heap Analysis: sort
Execution time:
–Build heap: n/2 calls to adjustHeap, each O(log n), so at most O(n log n) (a tighter analysis gives O(n))
–Loop: n-1 calls to adjustHeap, each O(log n) = O(n log n)
–Total: O(n log n)
Advantages/disadvantages:
–Same O(n log n) average case as merge sort and quick sort
–Doesn't require extra space as merge sort does
–Doesn't suffer if data is already sorted or mostly sorted (as quick sort with a naive pivot choice can)
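The call counts in the analysis can be checked empirically. The sketch below instruments adjustHeap with two global counters and reports, per phase, how many times it was called and how many sift-down swaps it made. countHeapSortWork and the counters are hypothetical names, and the input (n descending values, n >= 1) is just one convenient test case.

```c
#include <assert.h>
#include <stdlib.h>

long adjustCalls = 0;   /* number of adjustHeap invocations        */
long swapCount = 0;     /* number of parent/child swaps inside it  */

/* Min-heap sift-down with instrumentation (ASSUMPTION: plain int array). */
void adjustHeap(int *a, int max, int pos) {
    assert(pos >= 0 && pos < max);
    adjustCalls++;
    for (;;) {
        int left = 2 * pos + 1, right = 2 * pos + 2, smallest = pos, tmp;
        if (left < max && a[left] < a[smallest]) smallest = left;
        if (right < max && a[right] < a[smallest]) smallest = right;
        if (smallest == pos) return;
        tmp = a[pos]; a[pos] = a[smallest]; a[smallest] = tmp;
        swapCount++;
        pos = smallest;
    }
}

/* Heap-sort the values n, n-1, ..., 1 and report the work done in the
   build phase and in the sort loop. Requires n >= 1.
   Returns 0 on success, -1 if allocation fails. */
int countHeapSortWork(int n, long *buildCalls, long *buildSwaps,
                      long *sortCalls, long *sortSwaps) {
    int i, tmp;
    int *a = malloc((size_t)n * sizeof *a);
    if (a == NULL) return -1;
    for (i = 0; i < n; i++) a[i] = n - i;

    adjustCalls = swapCount = 0;
    for (i = n / 2 - 1; i >= 0; i--) adjustHeap(a, n, i);    /* build heap */
    *buildCalls = adjustCalls;
    *buildSwaps = swapCount;

    adjustCalls = swapCount = 0;
    for (i = n - 1; i > 0; i--) {                            /* sort loop */
        tmp = a[0]; a[0] = a[i]; a[i] = tmp;  /* root/last swap, not counted */
        adjustHeap(a, i, 0);
    }
    *sortCalls = adjustCalls;
    *sortSwaps = swapCount;
    free(a);
    return 0;
}
```

For n = 16 this reports 8 calls in the build phase (n/2) and 15 in the loop (n-1). The build-phase swap count stays below n, which is consistent with the O(n) bound for building a heap, while the loop's swaps are bounded by (n-1) log n.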
On the worksheet I'm having you represent the heap as a tree (easier to visualize than the array representation). Build the initial heap. Then repeatedly remove the smallest element and rebuild the heap, until there are no elements left in the tree.