
1 Heaps and Heapsort Prof. Sin-Min Lee Department of Computer Science San Jose State University

2 Heaps: Heap Definition. Adding a Node. Removing a Node. Array Implementation. Analysis. Source Code.

3 What is a Heap? A heap is similar to a binary search tree: it is a binary tree whose node entries can be compared with a total order semantics (the six comparison operators). However, the arrangement of the elements in a heap follows rules that differ from those of a binary search tree.

4 Heaps A heap is a certain kind of complete binary tree. It is defined by the following properties.

5 Heaps (Priority Queues) A heap is a data structure that supports efficient implementation of three operations: insert, findMin, and deleteMin. (Such a heap is a min-heap, to be more precise.) Two obvious implementations are based on an array (assuming the size n is known) and on a linked list. Their relative time complexities compare as follows. Sorted array (from large to small): insert is O(n) (to move elements), findMin is O(1) (the minimum is at the end), deleteMin is O(1) (delete the end). Linked list: insert is O(1) (append), findMin is O(1) (keep a pointer to the minimum), deleteMin is O(n) (reset the minimum pointer).

6 Heaps 1. All leaf nodes occur at adjacent levels. When a complete binary tree is built, its first node must be the root.

7 Heaps 2. All levels of the tree, except for the bottom level, are completely filled. All nodes on the bottom level occur as far left as possible. The second node is always the left child of the root.

8 Heaps 3. The key of the root node is greater than or equal to the key of each child. Each subtree is also a heap. The third node is always the right child of the root.

9 Heaps The next nodes always fill the next level from left to right.

10 Heaps The next nodes always fill the next level from left to right.

11 Heaps Complete binary tree.

12 Heap Storage Rules A heap is a binary tree where the entries of the nodes can be compared with a total order semantics. In addition, these two rules are followed: The entry contained by a node is greater than or equal to the entries of the node’s children. The tree is a complete binary tree, so that every level except the deepest must contain as many nodes as possible; and at the deepest level, all the nodes are as far left as possible.
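As an illustration of the first rule, here is a small C++ sketch (not part of the original slides) that checks the heap-order rule on a linked binary tree; the Node struct and the isHeapOrdered name are made up for this example, and the completeness rule is not checked here.

    // Sketch: verify that every node's entry is >= the entries of its children.
    struct Node {
        int entry;
        Node* left  = nullptr;
        Node* right = nullptr;
    };

    bool isHeapOrdered(const Node* root) {
        if (root == nullptr) return true;   // an empty tree is trivially ordered
        if (root->left  != nullptr && root->left->entry  > root->entry) return false;
        if (root->right != nullptr && root->right->entry > root->entry) return false;
        return isHeapOrdered(root->left) && isHeapOrdered(root->right);
    }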

13 Heaps A heap is a certain kind of complete binary tree. The "heap property" requires that each node's key is >= the keys of its children.

14 Heaps This tree is not a heap. It breaks rule number 1.

15 Heaps This tree is not a heap. It breaks rule number 2.

16 Heaps This tree is not a heap. It breaks rule number 3.

17 Heap implementation of priority queue Each node of the heap contains one entry along with the entry's priority, and the tree is maintained so that it follows the heap storage rules, using the entries' priorities to compare nodes. Therefore: The entry contained by a node has a priority that is greater than or equal to the priorities of the entries of the node's children. The tree is a complete binary tree.

18 Adding a Node to a Heap 1. Put the new node in the next available spot.

19 Adding a Node to a Heap 2. Push the new node upward, swapping with its parent, until the new node reaches an acceptable location.

20 Adding a Node to a Heap 2. Push the new node upward, swapping with its parent, until the new node reaches an acceptable location.

21 Adding a Node to a Heap The upward pushing stops when: the parent has a key that is >= the new node's key, or the node reaches the root. The process of pushing the new node upward is called reheapification upward.

22 Removing the Top of a Heap 1. Move the last node onto the root.

23 Removing the Top of a Heap 1. Move the last node onto the root.

24 Removing the Top of a Heap 2. Push the out-of-place node downward, swapping with its larger child, until the node reaches an acceptable location.

25 Removing the Top of a Heap 2. Push the out-of-place node downward, swapping with its larger child, until the node reaches an acceptable location.

26 Removing the Top of a Heap 2. Push the out-of-place node downward, swapping with its larger child, until the node reaches an acceptable location.

27 Removing the Top of a Heap The downward pushing stops when: the children all have keys <= the out-of-place node's key, or the node reaches a leaf. The process of pushing the node downward is called reheapification downward.

28 Adding an Entry to a Heap As an example, suppose that we already have nine entries arranged in a heap; the slide's diagram shows priorities 45 (at the root), 35, 27, 23, 22, 21, 19, 5, and 4.

29 Suppose that we are adding a new entry with priority 42. The first step is to add this entry in a way that keeps the binary tree complete. In this case the new entry becomes the left child of the entry with priority 21.

30 The structure is no longer a heap, since the node with priority 21 has a child with a higher priority. The new entry (with priority 42) rises upward until it reaches an acceptable location. This is accomplished by swapping the new entry with its parent.

31 The new entry is still bigger than its parent, so a second swap is done.

32 Adding an entry to a priority queue Pseudocode for adding an entry, where the priority queue has been implemented as a heap: 1. Place the new entry in the heap in the first available location. This keeps the structure a complete binary tree, but it might no longer be a heap, since the new entry might have a higher priority than its parent. 2. While the new entry has a priority that is higher than its parent's, swap the new entry with its parent.

33 Reheapification upward Now the new entry stops rising, because its parent (with priority 45) has a higher priority than the new entry. In general, the rising stops when the new entry's parent has a priority at least as high as the new entry's, or when the new entry reaches the root. The rising process is called reheapification upward.
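A minimal C++ sketch of this insertion procedure (not the presentation's own source code), using the 0-based array representation introduced on the following slides; the vector name heap and the function name heapInsert are illustrative, and entries are compared directly rather than through a separate priority field.

    #include <vector>
    #include <utility>   // std::swap

    // Step 1: place the entry in the first available location.
    // Step 2: reheapification upward - swap with the parent while the entry is larger.
    void heapInsert(std::vector<int>& heap, int entry) {
        heap.push_back(entry);
        std::size_t i = heap.size() - 1;
        while (i > 0 && heap[i] > heap[(i - 1) / 2]) {   // parent of i lives at (i - 1) / 2
            std::swap(heap[i], heap[(i - 1) / 2]);
            i = (i - 1) / 2;
        }
    }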

34 Implementing a Heap We will store the data from the nodes in a partially-filled array.

35 Implementing a Heap Data from the root goes in the first location of the array.

36 Implementing a Heap Data from the next row goes in the next two array locations.

37 Implementing a Heap Data from the next row goes in the next two array locations. Any node at position n has its two children at positions (2n) and (2n + 1), numbering the array positions from 1.

38 Implementing a Heap Data from the next row goes in the next two array locations. We don't care what's in the unused part of the array.

39 Important Points about the Implementation The links between the tree's nodes are not actually stored as pointers, or in any other way. The only way we "know" that "the array is a tree" is from the way we manipulate the data.

40 Important Points about the Implementation If you know the index of a node, then it is easy to figure out the indexes of that node's parent and children.

41 Removing an entry from a Heap When an entry is removed from a priority queue, we must always remove the entry with the highest priority - the entry that stands "on top of the heap". For example, the entry at the root, with priority 45, will be removed.

42 The highest-priority item, at the root, will be removed from the heap.

43 Removing an entry from a heap During the removal, we must ensure that the resulting structure remains a heap. If the root entry is the only entry in the heap, then there is really no more work to do except to decrement the member variable that keeps track of the size of the heap. But if there are other entries in the tree, then the tree must be rearranged, because a heap is not allowed to run around without a root. The rearrangement begins by moving the last entry in the last level up to the root, like this:

44 Removing an entry from a Heap (before-and-after diagrams) The last entry in the last level has been moved to the root.
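A minimal C++ sketch of the whole removal (not the presentation's own source code), again on the 0-based array representation; heapRemoveMax is an illustrative name, and the heap is assumed to be non-empty.

    #include <vector>
    #include <utility>   // std::swap

    // Move the last entry onto the root, then do reheapification downward,
    // always swapping the out-of-place entry with its larger child.
    int heapRemoveMax(std::vector<int>& heap) {
        int top = heap[0];                       // the entry with the highest priority
        heap[0] = heap.back();
        heap.pop_back();
        std::size_t i = 0, n = heap.size();
        while (true) {
            std::size_t left = 2 * i + 1, right = 2 * i + 2, larger = i;
            if (left  < n && heap[left]  > heap[larger]) larger = left;
            if (right < n && heap[right] > heap[larger]) larger = right;
            if (larger == i) break;              // both children are <= the out-of-place entry
            std::swap(heap[i], heap[larger]);
            i = larger;
        }
        return top;
    }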

45 Heapsort Heapsort combines the time efficiency of mergesort and the storage efficiency of quicksort. Like mergesort, heapsort has a worst-case running time that is O(n log n), and like quicksort it does not require an additional array. Heapsort works by first transforming the array to be sorted into a heap.

46 Heap representation A heap is a complete binary tree, and an efficient method of representing a complete binary tree in an array follows these rules: Data from the root of the complete binary tree goes in the [0] component of the array. The data from the depth-one nodes is placed in the next two components of the array. We continue in this way, placing the four nodes of depth two in the next four components of the array, and so forth. For example, a heap with ten nodes can be stored in a ten-element array as shown below:

47 The ten-node heap stored in its array: 45, 27, 42, 21, 23, 22, 35, 19, 4, 5.

48 Rules for location of data in the array The data from the root always appears in the [0] component of the array. Suppose that the data for a node appears in component [i] of the array. Then the data for its parent is always at location [(i - 1) / 2] (using integer division).

49 Suppose that the data for a node appears in component [i] of the array. Then its children (if they exist) always have their data at these locations: the left child at component [2i + 1], and the right child at component [2i + 2].
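These index rules translate directly into code; the helper names below are illustrative.

    // 0-based array representation of a complete binary tree.
    inline int parentIndex(int i) { return (i - 1) / 2; }   // integer division
    inline int leftChild(int i)   { return 2 * i + 1; }
    inline int rightChild(int i)  { return 2 * i + 2; }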

50 The largest value in a heap is stored in the root, and in our array representation of a complete binary tree the root node is stored in the first array position. Thus, since the array represents a heap, we know that the largest array element is in the first array position. To get the largest value into its correct array position, we simply interchange the first and the final array elements. After interchanging these two array elements, we know that the largest array element is in the final array position, as shown below:

51 The dark vertical line in the array separates the sorted side of the array from the unsorted side (on the left). Moreover, the left side still represents a complete binary tree, which is almost a heap. The array is now 5, 27, 42, 21, 23, 22, 35, 19, 4 | 45.

52 When we consider the unsorted side as a complete binary tree, it is only the root that violates the heap storage rule: this root is smaller than its children. The next step of the heapsort is to reposition this out-of-place value in order to restore the heap. The process, called reheapification downward, begins by comparing the value in the root with the value of each of its children. If one or both children are larger than the root, then the root's value is exchanged with the larger of the two children. This moves the troublesome value down one level in the tree. For example:

53 After the root is exchanged with its larger child, the array is 42, 27, 5, 21, 23, 22, 35, 19, 4 | 45.

54 In the example, 5 is still out of place, so we once again exchange it with its larger child, resulting in the array 42, 27, 35, 21, 23, 22, 5, 19, 4 | 45.

55 The unsorted side of the array must be maintained as a heap. When the unsorted side of the array is once again a heap, the heapsort continues by exchanging the largest element in the unsorted side with the rightmost element of the unsorted side. For example, 42 is exchanged with 4, giving 4, 27, 35, 21, 23, 22, 5, 19 | 42, 45.

56 The sorted side now holds the two largest elements, and the unsorted side is once again almost a heap. Only the root (4) is out of place, and that may be fixed by another reheapification downward. After the reheapification downward, the largest value of the unsorted side will once again reside at location [0], and we can continue to pull out the next largest value.

57 Pseudocode for the heapsort algorithm
// Heapsort for the array called data, with n elements
1. Convert the array of n elements into a heap.
2. unsorted = n; // the number of elements in the unsorted side
3. while (unsorted > 1)
   {
      // Reduce the unsorted side by one
      unsorted--;
      Swap data[0] with data[unsorted].
      // The unsorted side of the array is now a heap with the root out of place.
      Do a reheapification downward to turn the unsorted side back into a heap.
   }
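A C++ sketch following this pseudocode (not the presentation's own source code); reheapifyDown and heapsort are illustrative names, and step 1 uses the bottom-up heap-building method discussed later in the slides.

    #include <utility>   // std::swap

    // Reheapification downward within data[0..n-1], starting at index i.
    static void reheapifyDown(int data[], int n, int i) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, larger = i;
            if (left  < n && data[left]  > data[larger]) larger = left;
            if (right < n && data[right] > data[larger]) larger = right;
            if (larger == i) return;
            std::swap(data[i], data[larger]);
            i = larger;
        }
    }

    void heapsort(int data[], int n) {
        // 1. Convert the array into a heap (bottom-up).
        for (int i = n / 2 - 1; i >= 0; --i)
            reheapifyDown(data, n, i);
        // 2-3. Repeatedly move the largest unsorted element to the end.
        int unsorted = n;
        while (unsorted > 1) {
            --unsorted;
            std::swap(data[0], data[unsorted]);
            reheapifyDown(data, unsorted, 0);
        }
    }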

58 Analysis While heapsort is a constant factor slower than quicksort on average data sets, it still has a complexity of O(n log n) for best, worst and average data. Reheapification is the process by which an out-of-place node is exchanged with its parent (upward) or with its larger child (downward) until the heap order is restored, that is, until the node's key is >= its children's keys; then reheapification is complete.

59 A simpler but related problem: find the smallest and the second smallest elements in an array of size n. A straightforward method: find the smallest using n - 1 comparisons, swap it to the end of the array, then find the next smallest in n - 2 comparisons, for a total of 2n - 3 comparisons. A method that "saves" the results of some earlier comparisons: (1) Divide the array into two halves and find the smallest element of each half; call them min1 and min2, respectively. (2) Compare min1 and min2 to find the smallest. (3) If min1 is the smallest, compare min2 with the remaining elements of the first half to determine the second smallest; similarly for the case in which min2 is the smallest. The number of comparisons is (n - 1) + (⌈n/2⌉ - 1) = ⌈3n/2⌉ - 2.
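A C++ sketch of the second method (not from the slides; the names findTwoSmallest, TwoSmallest, min1, and min2 are illustrative, and n >= 2 is assumed): after the two half-minima are compared, only the winning half is rescanned for the second smallest.

    #include <vector>

    struct TwoSmallest { int smallest; int secondSmallest; };

    TwoSmallest findTwoSmallest(const std::vector<int>& a) {
        std::size_t n = a.size(), mid = n / 2;
        // (1) The minimum of each half (i1 indexes min1, i2 indexes min2).
        std::size_t i1 = 0, i2 = mid;
        for (std::size_t i = 1; i < mid; ++i)     if (a[i] < a[i1]) i1 = i;
        for (std::size_t i = mid + 1; i < n; ++i) if (a[i] < a[i2]) i2 = i;
        // (2) Compare min1 and min2.
        bool firstHalfWins = a[i1] < a[i2];
        std::size_t winner = firstHalfWins ? i1 : i2;
        // (3) The second smallest is either the losing half's minimum or
        //     another element of the winning half.
        int second = firstHalfWins ? a[i2] : a[i1];
        std::size_t lo = firstHalfWins ? 0 : mid, hi = firstHalfWins ? mid : n;
        for (std::size_t i = lo; i < hi; ++i)
            if (i != winner && a[i] < second) second = a[i];
        return { a[winner], second };
    }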

60 The second method can be depicted as follows: min1 is the minimum of the first half of the array, min2 is the minimum of the second half, and min is the smaller of the two. In the figure, a link connects a smaller element to a larger one going downward; for example, min <= min1 and min <= min2, min1 <= each element in the first half of the array, and similarly for min2. This organization facilitates finding the smallest and second smallest elements.

61 A binary tree data structure for min-heaps: A binary tree is called a left-complete binary tree if (1) the tree is full at each level except possibly at the maximum level (depth), where "the tree is full at level i" means there are 2^i nodes at that level (recall the root is at level 0); and (2) at the tree's maximum level h, if there are fewer than 2^h nodes, then the missing nodes are on the right side of the tree. If there are n nodes in a left-complete binary tree of depth h, then 2^h <= n <= 2^(h+1) - 1. Thus, h <= lg n < h + 1. In particular, h = O(lg n).

62 A binary tree satisfies the (min-)heap-order property if the value at each node is less than or equal to the values at its child nodes (if they exist). A binary min-heap is a left-complete binary tree that also satisfies the heap-order property. Note that there is no definite order between values in sibling nodes; also, it is obvious that the minimum value is at the root.

63 Two basic operations on binary min-heaps: (1) Suppose the value at a node is decreased, which causes a violation of the heap-order property (because the value becomes smaller than that of its parent). The operation to restore the heap order is named percolateUp (siftUp). (2) Suppose the value at a node is increased so that it becomes larger than the value of either (or both) of its children. The operation to restore the heap order is named percolateDown (siftDown). Both operations can be completed in time proportional to the tree height, thus with time complexity O(lg n), where n is the total number of nodes. These operations are crucial in supporting the heap operations insert and deleteMin.

64 The percolateUp (siftUp) operation: in the slide's example the heap order is violated at the node valued 12, and percolateUp repeatedly swaps that node with its parent until the order is restored. Note that the time of each step (each level) is constant: one comparison with the parent and one swap.

65 The percolateDown (siftDown) operation: in the slide's example the heap order is violated at the node valued 43. Each step of percolateDown finds the smaller of the two children (if they exist) and swaps it with the violating node, continuing down until the order is restored.

66 Implementation of the heap operations: insert(): place the new node next to the last node, then call percolateUp from it, treating the new node as a violation. deleteMin(): delete the root node, move the last node to the root, then call percolateDown from the root. findMin(): return the root's value. Finally, a binary heap can be implemented using an array storing the heap elements from the root towards the leaves, level by level and from left to right within each level. The left-completeness property guarantees that there are no "holes" in the array. Also, if we use array indexes 1..n to store n elements, the parent's index (if it exists) of node i is ⌊i/2⌋, the left child's index is 2i, and the right child's index is 2i + 1 (if they exist). The time complexities of deleteMin and insert are both O(lg n); the time complexity of findMin is O(1).
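A minimal C++ sketch of these operations over a 1-based array (not the presentation's own code; the class name MinHeap and its members are illustrative).

    #include <vector>
    #include <utility>   // std::swap
    #include <cassert>

    class MinHeap {
    public:
        bool empty() const { return a.size() == 1; }
        int findMin() const { assert(!empty()); return a[1]; }          // O(1)

        void insert(int x) {                                            // O(lg n)
            a.push_back(x);                  // next free slot keeps the tree left-complete
            percolateUp(a.size() - 1);
        }

        int deleteMin() {                                               // O(lg n)
            assert(!empty());
            int min = a[1];
            a[1] = a.back();                 // move the last node to the root
            a.pop_back();
            if (!empty()) percolateDown(1);
            return min;
        }

    private:
        std::vector<int> a{0};               // index 0 unused; the root lives at index 1

        void percolateUp(std::size_t i) {
            while (i > 1 && a[i] < a[i / 2]) {          // smaller than the parent: swap up
                std::swap(a[i], a[i / 2]);
                i /= 2;
            }
        }

        void percolateDown(std::size_t i) {
            std::size_t n = a.size() - 1;               // number of stored elements
            while (2 * i <= n) {
                std::size_t child = 2 * i;              // left child
                if (child + 1 <= n && a[child + 1] < a[child]) ++child;  // pick the smaller child
                if (a[i] <= a[child]) break;
                std::swap(a[i], a[child]);
                i = child;
            }
        }
    };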

67 An array implementation of a binary heap: number the nodes (starting at 1) by levels, from top to bottom and from left to right within each level. The root is stored at index 1, the level-1 nodes at the next two indexes, the level-2 nodes at the next four indexes, and so on; a parent at index i has its children at indexes 2i and 2i + 1.

68 Convert an array T[1..n] into a heap (known as heapify). A top-down method: repeatedly call insert(), inserting T[i] into the heap T[1..(i - 1)], for i = 2 to n. For example, with T = (21, 12, 9, 13, 7): initially T[1] by itself is a heap; insert 12 into T[1..1], making T[1..2] a heap; insert 9 into T[1..2], making T[1..3] a heap; insert 13 into T[1..3], making T[1..4] a heap; insert 7 into T[1..4], making T[1..5] a heap. The total time complexity of top-down heapify is O(n lg n).
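Top-down heapify then amounts to a loop of insertions; below is a sketch that reuses the MinHeap class sketched above (buildHeapTopDown is an illustrative name).

    #include <vector>

    MinHeap buildHeapTopDown(const std::vector<int>& T) {
        MinHeap h;
        for (int x : T) h.insert(x);   // insert T[i] into the heap built from T[1..i-1]
        return h;                      // each insert may cost O(lg n), so O(n lg n) in total
    }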

69 A bottom-up heapify method: for i = ⌊n/2⌋ down to 1, call percolateDown(i) /* percolate down T[i], making the subtree rooted at i a heap */. In the example n = 5, so we start at node ⌊n/2⌋ = 2, the highest-index node that has any children. Complexity analysis: node 1 travels down at most h levels during percolateDown, nodes 2 and 3 each at most (h - 1) levels, nodes 4 through 7 each at most (h - 2) levels, etc. The total number of levels traveled is at most h + 2(h - 1) + 4(h - 2) + ... + 2^(h-1)(h - (h - 1)) = 2^(h-1)(1 + 2(1/2) + 3(1/2)^2 + 4(1/2)^3 + ...) <= 2n, because 2^(h-1) <= n/2 (since h <= lg n) and the infinite series 1 + 2x + 3x^2 + ... = 1/(1 - x)^2 when |x| < 1. Thus, the time complexity of bottom-up heapify is O(n).
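A C++ sketch of bottom-up heapify on a 1-based array a[1..n] (not from the slides; illustrative names, a[0] unused).

    #include <vector>
    #include <utility>   // std::swap

    // Percolate a[i] down within the first n stored elements (min-heap order).
    void percolateDown(std::vector<int>& a, std::size_t n, std::size_t i) {
        while (2 * i <= n) {
            std::size_t child = 2 * i;                                   // left child
            if (child + 1 <= n && a[child + 1] < a[child]) ++child;      // smaller child
            if (a[i] <= a[child]) break;
            std::swap(a[i], a[child]);
            i = child;
        }
    }

    // Bottom-up heapify: start at the last node that has children and work back to the root.
    void heapifyBottomUp(std::vector<int>& a, std::size_t n) {   // data stored in a[1..n]
        for (std::size_t i = n / 2; i >= 1; --i)
            percolateDown(a, n, i);
    }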

