Building Java Programs Chapter 18 Advanced Data Structures: Hashing and Heaps Copyright (c) Pearson 2013. All rights reserved.

1 Building Java Programs Chapter 18 Advanced Data Structures: Hashing and Heaps Copyright (c) Pearson 2013. All rights reserved.

2 Hashing Reading: 18.1

3 3 Recall: ADTs abstract data type (ADT): A specification of a collection of data and the operations that can be performed on it. –Describes what a collection does, not how it does it. Java's collection framework describes ADTs with interfaces: –Collection, Deque, List, Map, Queue, Set, SortedMap An ADT can be implemented in multiple ways by classes: –ArrayList and LinkedList implement List –HashSet and TreeSet implement Set –LinkedList, ArrayDeque, etc.implement Queue

4 4 SearchTree as a set We implemented a class SearchTree to store a BST of ints. Our BST is essentially a set of integers. Operations we support: –add –contains –remove... But there are other ways to implement a set... [Figure: a BST referenced by overallRoot, holding the values 91, 60, 87, 29, 55, 42, -3]

5 5 Sets set: A collection of unique values (no duplicates allowed) that can perform the following operations efficiently: –add, remove, search (contains) –The client doesn't think of a set as having indexes; we just add things to the set in general and don't worry about order set.contains("to") true set "the" "of" "from" "to" "she" "you" "him" "why" "in" "down" "by" "if" set.contains("be")false

6 6 Int Set ADT interface Let's think about how to write our own implementation of a set. –To simplify the problem, we only store int s in our set for now. –As is (usually) done in the Java Collection Framework, we will define sets as an ADT by creating a Set interface. –Core operations are: add, contains, remove. public interface IntSet { void add(int value); boolean contains(int value); void clear(); boolean isEmpty(); void remove(int value); int size(); }

7 7 Unfilled array set Consider storing a set in an unfilled array. –It doesn't really matter what order the elements appear in a set, so long as they can be added and searched quickly. –What would make a good ordering for the elements? If we store them in the next available index, as in a list,... –set.add(9); set.add(23); set.add(8); set.add(-3); set.add(49); set.add(12); –How efficient is add? contains? remove? O(1), O(N), O(N) (contains must loop over the array; remove must shift elements.)
Array (indexes 0-9): [9, 23, 8, -3, 49, 12, 0, 0, 0, 0], size 6

8 8 Sorted array set Suppose we store the elements in an unfilled array, but in sorted order rather than order of insertion. –set.add(9); set.add(23); set.add(8); set.add(-3); set.add(49); set.add(12); –How efficient is add? contains? remove? O(N), O(log N), O(N) (You can do an O(log N) binary search to find elements in contains, and to find the proper index in add / remove; but add / remove still need to shift elements right/left to make room, which is O(N) on average.)
Array (indexes 0-9): [-3, 8, 9, 12, 23, 49, 0, 0, 0, 0], size 6

9 9 A strange idea Silly idea: When client adds value i, store it at index i in the array. –Would this work? –Problems / drawbacks of this approach? How to work around them? set.add(7); set.add(1); set.add(9);... set.add(18); set.add(12);
After the first three adds (indexes 0-9): [0, 1, 0, 0, 0, 0, 0, 7, 0, 9], size 3
After all five adds (indexes 0-19): [0, 1, 0, 0, 0, 0, 0, 7, 0, 9, 0, 0, 12, 0, 0, 0, 0, 0, 18, 0], size 5

10 10 Hashing hash: To map a large domain of values to a smaller fixed domain. –Typically, mapping a set of elements to integer indexes in an array. –Idea: Store any given element value in a particular predictable index. That way, adding / removing / looking for it are constant-time (O(1)). –hash table: An array that stores elements via hashing. hash function: An algorithm that maps values to indexes. –hash code: The output of a hash function for a given value. –In previous slide, our "hash function" was: hash(i) → i. Drawbacks: Potentially requires a large array (a.length > i). Doesn't work for negative numbers. Array could be very sparse, mostly empty (memory waste).

11 11 Improved hash function To deal with negative numbers: hash(i) → abs(i). To deal with large numbers: hash(i) → abs(i) % length
set.add(37);   // abs(37) % 10 == 7
set.add(-2);   // abs(-2) % 10 == 2
set.add(49);   // abs(49) % 10 == 9
// inside HashIntSet class
private int hash(int i) {
    return Math.abs(i) % elements.length;
}
Table (indexes 0-9): [0, 0, -2, 0, 0, 0, 0, 37, 0, 49], size 3
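
One edge case the `abs(i) % length` formula does not cover: in Java, `Math.abs(Integer.MIN_VALUE)` overflows and is still negative, so the computed index would be negative. The sketch below contrasts the slide's formula with `Math.floorMod`, which always lands in `[0, length)`. The class name `HashIndex` and both method names are assumptions for illustration, not part of the slides. Note that `floorMod` maps negative values to different indexes than `abs` does (for example, -2 goes to 8 rather than 2), which is fine as long as one function is used consistently.

```java
// Hypothetical helper, not part of the slides' HashIntSet.
class HashIndex {
    // The slides' formula. Goes negative for Integer.MIN_VALUE because
    // Math.abs(Integer.MIN_VALUE) overflows and returns Integer.MIN_VALUE.
    static int absHash(int value, int length) {
        return Math.abs(value) % length;
    }

    // Math.floorMod returns a result in [0, length) for any int value
    // when length is positive.
    static int safeHash(int value, int length) {
        return Math.floorMod(value, length);
    }
}
```

For non-negative values the two methods agree, so the slides' examples (37, 49) land in the same buckets either way.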

12 12 Sketch of implementation public class HashIntSet implements IntSet { private int[] elements;... public void add(int value) { elements[hash(value)] = value; } public boolean contains(int value) { return elements[hash(value)] == value; } public void remove(int value) { elements[hash(value)] = 0; } –Runtime of add, contains, and remove : O(1) !! Are there any problems with this approach?

13 13 Collisions collision: When hash function maps 2 values to same index. set.add(11); set.add(49); set.add(24); set.add(37); set.add(54); // collides with 24! collision resolution: An algorithm for fixing collisions.
Table (indexes 0-9): [0, 11, 0, 0, 54, 0, 0, 37, 0, 49], size 5 (54 has overwritten 24 at index 4)

14 14 Probing probing: Resolving a collision by moving to another index. –linear probing: Moves to the next available index (wraps if needed). set.add(11); set.add(49); set.add(24); set.add(37); set.add(54); // collides with 24; must probe –variation: quadratic probing moves increasingly far away: +1, +4, +9,...
Table (indexes 0-9): [0, 11, 0, 0, 24, 54, 0, 37, 0, 49], size 5

15 15 Implementing HashIntSet Let's implement an int set using a hash table with linear probing. –For simplicity, assume that the set cannot store 0s for now. public class HashIntSet implements IntSet { private int[] elements; private int size; // constructs new empty set public HashIntSet() { elements = new int[10]; size = 0; } // hash function maps values to indexes private int hash(int value) { return Math.abs(value) % elements.length; }...

16 16 The add operation How do we add an element to the hash table? –Use the hash function to find the proper bucket index. –If we see a 0, put it there. –If not, move forward until we find an empty (0) index to store it. –If we see that the value is already in the table, don't re-add it. –set.add(54); // client code –set.add(14);
Table (indexes 0-9): [0, 11, 0, 0, 24, 54, 14, 37, 0, 49], size 6

17 17 Implementing add How do we add an element to the hash table?
public void add(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value) {
        h = (h + 1) % elements.length;   // linear probing for empty slot
    }
    if (elements[h] != value) {          // avoid duplicates
        elements[h] = value;
        size++;
    }
}
Table (indexes 0-9): [0, 11, 0, 0, 24, 54, 0, 37, 0, 49], size 5

18 18 The contains operation How do we search for an element in the hash table? –Use the hash function to find the proper bucket index. –Loop forward until we either find the value, or an empty index (0). –If we find the value, it is contained (true). If we find 0, it is not (false). –set.contains(24) // true –set.contains(14) // true –set.contains(35) // false
Table (indexes 0-9): [0, 11, 0, 0, 24, 54, 14, 37, 0, 49], size 6

19 19 Implementing contains
public boolean contains(int value) {
    int h = hash(value);
    while (elements[h] != 0) {           // linear probing to search
        if (elements[h] == value) {
            return true;
        }
        h = (h + 1) % elements.length;
    }
    return false;                        // not found
}
Table (indexes 0-9): [0, 11, 0, 0, 24, 54, 0, 37, 0, 49], size 5

20 20 The remove operation We cannot remove by simply zeroing out an element:
set.remove(54);    // set index 5 to 0
set.contains(14)   // false??? oops
The search for 14 starts at index 4, reaches the zeroed index 5, and stops before it gets to 14 at index 6. Instead, we replace the removed element with a special "removed" placeholder value (it can be re-used on add, but contains keeps searching past it).
After zeroing (indexes 0-9): [0, 11, 0, 0, 24, 0, 14, 34, 0, 49], size 5
With placeholder X (indexes 0-9): [0, 11, 0, 0, 24, X, 14, 34, 0, 49], size 5

21 21 Implementing remove
public void remove(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value) {
        h = (h + 1) % elements.length;
    }
    if (elements[h] == value) {
        elements[h] = -999;   // "removed" flag value
        size--;
    }
}
set.remove(54);   // client code
set.remove(11);
set.remove(34);
Table after set.remove(54) (indexes 0-9): [0, 11, 0, 0, 24, -999, 14, 34, 0, 49], size 5

22 22 Patching add, contains
private static final int REMOVED = -999;

public void add(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value && elements[h] != REMOVED) {
        h = (h + 1) % elements.length;
    }
    if (elements[h] != value) {
        elements[h] = value;
        size++;
    }
}
// Caution: stopping at the first REMOVED slot can store a duplicate if value
// sits later in the same probe sequence; checking contains(value) first avoids that.

// contains does not need patching;
// it should keep going on a -999, which it already does
public boolean contains(int value) {
    int h = hash(value);
    while (elements[h] != 0 && elements[h] != value) {
        h = (h + 1) % elements.length;
    }
    return elements[h] == value;
}

23 23 Problem: full array clustering: Clumps of elements at neighboring indexes. –Slows down the hash table lookup; you must loop through them. set.add(11); set.add(49); set.add(24); set.add(37); set.add(54); // collides with 24 set.add(14); // collides with 24, then 54 set.add(86); // collides with 14, then 37 Where does each value go in the array? How many indexes must be examined to answer contains(94)? What will happen if the array completely fills?
Starting table (indexes 0-9): [0, 0, 0, 0, 0, 0, 0, 0, 0, 0], size 0

24 24 Rehashing rehash: Growing to a larger array when the table is too full. –Cannot simply copy the old array to a new one. (Why not?) load factor: ratio of (# of elements) / (hash table length) –many collections rehash when load factor ≈ 0.75
Table of length 20 (indexes 0-19): [0, 0, 0, 0, 24, 0, 6, 0, 48, 0, 0, 11, 0, 0, 54, 95, 14, 37, 0, 0], size 8
Table of length 10 (indexes 0-9): [95, 11, 0, 0, 24, 54, 14, 37, 6, 48], size 8

25 25 Implementing rehash
// Grows hash table to twice its original size.
private void rehash() {
    int[] old = elements;
    elements = new int[2 * old.length];
    size = 0;
    for (int value : old) {
        if (value != 0 && value != REMOVED) {
            add(value);
        }
    }
}

public void add(int value) {
    if ((double) size / elements.length >= 0.75) {
        rehash();
    }
    ...
}
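
Putting the probing slides together, the following is a minimal sketch of a linear-probing set with "removed" markers and rehashing. The class name `ProbingIntSet`, the `used` counter, and the `indexOf` helper are assumptions added for illustration. Two choices differ from the slides: `add` keeps probing past a removed slot before reusing it, so a value stored later in the probe sequence is never duplicated, and the load factor counts removed markers too, so every probe loop is guaranteed to reach an empty slot. Like the slides, it cannot store 0 or -999.

```java
// Sketch of the slides' linear-probing set; ProbingIntSet is a hypothetical name.
class ProbingIntSet {
    private static final int EMPTY = 0;        // marks a never-used slot
    private static final int REMOVED = -999;   // marks a slot whose value was removed
    private static final double MAX_LOAD = 0.75;

    private int[] elements = new int[10];
    private int size = 0;   // values currently in the set
    private int used = 0;   // non-empty slots: values plus REMOVED markers

    private int hash(int value) {
        return Math.floorMod(value, elements.length);
    }

    // Returns the index holding value, or -1 if it is absent.
    private int indexOf(int value) {
        if (value == REMOVED) return -1;       // the marker is never a real element
        int h = hash(value);
        while (elements[h] != EMPTY) {         // REMOVED slots are probed past
            if (elements[h] == value) return h;
            h = (h + 1) % elements.length;
        }
        return -1;
    }

    public boolean contains(int value) {
        return indexOf(value) >= 0;
    }

    public void add(int value) {
        if (value == EMPTY || value == REMOVED) {
            throw new IllegalArgumentException("reserved value: " + value);
        }
        if ((double) (used + 1) / elements.length > MAX_LOAD) {
            rehash();
        }
        int h = hash(value);
        int firstRemoved = -1;                 // first reusable slot seen while probing
        while (elements[h] != EMPTY) {
            if (elements[h] == value) return;  // already present
            if (elements[h] == REMOVED && firstRemoved < 0) firstRemoved = h;
            h = (h + 1) % elements.length;
        }
        if (firstRemoved >= 0) {
            elements[firstRemoved] = value;    // reuse a removed slot
        } else {
            elements[h] = value;
            used++;
        }
        size++;
    }

    public void remove(int value) {
        int h = indexOf(value);
        if (h >= 0) {
            elements[h] = REMOVED;             // keep later probe sequences intact
            size--;
        }
    }

    public int size() {
        return size;
    }

    // Doubles the table and re-adds every live value under the new length.
    private void rehash() {
        int[] old = elements;
        elements = new int[2 * old.length];
        size = 0;
        used = 0;
        for (int value : old) {
            if (value != EMPTY && value != REMOVED) {
                add(value);
            }
        }
    }
}
```

A short usage example: after adding 11, 49, 24, 37, 54, 14 and then removing 54, `contains(14)` still returns true because the search passes over the marker left at index 5.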

26 26 Hash table sizes Can use prime numbers as hash table sizes to reduce collisions. Also improves spread / reduces clustering on rehash.
set.add(11);    // 11 % 13 == 11
set.add(39);    // 39 % 13 == 0
set.add(21);    // 21 % 13 == 8
set.add(29);    // 29 % 13 == 3
set.add(71);    // 71 % 13 == 6
set.add(41);    // 41 % 13 == 2
set.add(101);   // 101 % 13 == 10
Table (indexes 0-12): [39, 0, 41, 29, 0, 0, 71, 0, 21, 0, 101, 11, 0], size 7

27 27 Other details How would we implement toString on our HashIntSet? System.out.println(set); // [11, 24, 54, 37, 49]
Table (indexes 0-9): [0, 11, 0, 0, 24, 54, 0, 37, 0, 49], size 5

28 28 Separate chaining separate chaining: Solving collisions by storing a list at each index. –add/contains/remove must traverse lists, but the lists are short –impossible to "run out" of indexes, unlike with probing
private class Node {
    public int data;
    public Node next;
    ...
}
[Figure: hash table of length 10 with a linked list per index; index 1 holds 11, index 4 chains 24, 14, and 54, index 7 holds 7, index 9 holds 49]

29 29 Implementing HashIntSet Let's implement a hash set of int s using separate chaining. public class HashIntSet implements IntSet { // array of linked lists; // elements[i] = front of list #i (null if empty) private Node[] elements; private int size; // constructs new empty set public HashIntSet() { elements = new Node[10]; size = 0; } // hash function maps values to indexes private int hash(int value) { return Math.abs(value) % elements.length; }...

30 30 The add operation How do we add an element to the hash table? –When you want to modify a linked list, you must either change the list's front reference, or the next field of a node in the list. –Where in the list should we add the new element? –Must make sure to avoid duplicates. –set.add(24); index0123456789 value 54 14 24 11749 new node

31 31 Implementing add
public void add(int value) {
    if (!contains(value)) {
        int h = hash(value);               // add to front of list #h
        Node newNode = new Node(value);
        newNode.next = elements[h];
        elements[h] = newNode;
        size++;
    }
}

32 32 The contains operation How do we search for an element in the hash table? –Must loop through the linked list for the appropriate hash index, looking for the desired value. –Looping through a linked list requires a "current" node reference. –set.contains(14) // true –set.contains(84) // false –set.contains(53) // false index0123456789 value 54 14 2411749 current

33 33 Implementing contains public boolean contains(int value) { Node current = elements[hash(value)]; while (current != null) { if (current.data == value) { return true; } current = current.next; } return false; }

34 34 The remove operation How do we remove an element from the hash table? –Cases to consider: front (24), non-front (14), not found (94), null (32) –To remove a node from a linked list, you must either change the list's front reference, or the next field of the previous node in the list. –set.remove(54); index0123456789 value 54 14 2411749 current

35 35 Implementing remove
public void remove(int value) {
    int h = hash(value);
    if (elements[h] != null && elements[h].data == value) {
        elements[h] = elements[h].next;    // front case
        size--;
    } else {
        Node current = elements[h];        // non-front case
        while (current != null && current.next != null) {
            if (current.next.data == value) {
                current.next = current.next.next;
                size--;
                return;
            }
            current = current.next;
        }
    }
}
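
The separate-chaining slides can be assembled into one small class. This is a sketch under a few assumptions: the name `ChainedIntSet` is made up for illustration, `Node` gets a two-argument constructor for brevity, the hash uses `Math.floorMod` rather than `abs`, and rehashing is left out to keep the example short.

```java
// Sketch of the slides' separate-chaining set; ChainedIntSet is a hypothetical name.
class ChainedIntSet {
    private static class Node {
        final int data;
        Node next;

        Node(int data, Node next) {
            this.data = data;
            this.next = next;
        }
    }

    private Node[] elements = new Node[10];   // elements[i] = front of list #i (null if empty)
    private int size = 0;

    private int hash(int value) {
        return Math.floorMod(value, elements.length);
    }

    public boolean contains(int value) {
        for (Node current = elements[hash(value)]; current != null; current = current.next) {
            if (current.data == value) return true;
        }
        return false;
    }

    public void add(int value) {
        if (!contains(value)) {                            // avoid duplicates
            int h = hash(value);
            elements[h] = new Node(value, elements[h]);    // insert at front of list #h
            size++;
        }
    }

    public void remove(int value) {
        int h = hash(value);
        Node front = elements[h];
        if (front == null) return;                         // empty bucket: nothing to do
        if (front.data == value) {                         // front case
            elements[h] = front.next;
            size--;
            return;
        }
        for (Node current = front; current.next != null; current = current.next) {
            if (current.next.data == value) {              // non-front case
                current.next = current.next.next;
                size--;
                return;
            }
        }
    }

    public int size() {
        return size;
    }
}
```

The four cases the slides list for remove (front, non-front, not found, empty bucket) each correspond to one exit path in `remove` above.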

36 36 Rehashing w/ chaining Separate chaining handles rehashing similarly to linear probing. –Loop over the list in each hash bucket; re-add each element. –An optimal implementation re-uses node objects, but this is optional.
[Figure: before rehashing, a table of length 10 with 11 at index 1, a chain of 24, 54, 14 at index 4, 7 at index 7, and 49 at index 9; after rehashing, a table of length 20 in which 24 stays at index 4 while 14 and 54 move to index 14]

37 37 Hash set of objects
public class HashSet<E> implements Set<E> {
    ...
    private class Node {
        public E data;
        public Node next;
    }
}
It is easy to hash an integer i (use index abs(i) % length). –How can we hash other types of values (such as objects)?

38 38 The hashCode method All Java objects contain the following method: public int hashCode() Returns an integer hash code for this object. –We can call hashCode on any object to find its preferred index. –HashSet, HashMap, and the other built-in "hash" collections call hashCode internally on their elements to store the data. We can modify our set's hash function to be the following: private int hash(E e) { return Math.abs(e.hashCode()) % elements.length; }

39 39 Issues with generics You must make an unusual cast on your array of generic nodes:
public class HashSet<E> implements Set<E> {
    private Node[] elements;
    ...
    public HashSet() {
        elements = (Node[]) new HashSet.Node[10];
    }
Perform all element comparisons using equals:
public boolean contains(E value) {
    ...
    // if (current.data == value) {
    if (current.data.equals(value)) {
        return true;
    }
    ...

40 40 Implementing hashCode You can write your own hashCode methods in classes you write. –All classes come with a default version based on memory address. –Your overridden version should somehow "add up" the object's state. Often you scale/multiply parts of the result to distribute the results. public class Point { private int x; private int y;... public int hashCode() { // better than just returning (x + y); // spreads out numbers, fewer collisions return 137 * x + 23 * y; }

41 41 Good hashCode behavior A well-written hashCode method is: –Consistent with itself (must produce the same result on each call): o.hashCode() == o.hashCode(), if o's state doesn't change –Consistent with equality: a.equals(b) must imply that a.hashCode() == b.hashCode(); !a.equals(b) does NOT necessarily imply that a.hashCode() != b.hashCode() (why not?) When your class has an equals or hashCode, it should have both. –Well distributed: For a large set of objects with distinct states, the hash codes will generally be unique rather than all colliding into the same hash bucket.
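
To make the contract concrete, here is a small sketch of a class that defines equals and hashCode together. The class name `Point2D` is an assumption chosen to avoid clashing with the slides' `Point`; the weights 137 and 23 are the ones from the earlier Point slide. It also suggests an answer to the "(why not?)" question: there are far more possible object states than int values, so unequal objects sometimes share a hash code.

```java
// Hypothetical class illustrating the equals/hashCode contract.
final class Point2D {
    private final int x;
    private final int y;

    Point2D(int x, int y) {
        this.x = x;
        this.y = y;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Point2D)) return false;
        Point2D other = (Point2D) o;
        return x == other.x && y == other.y;
    }

    // Uses exactly the fields that equals compares, so equal points
    // always produce equal hash codes.
    @Override
    public int hashCode() {
        return 137 * x + 23 * y;
    }
}
```

With these weights, the points (23, 0) and (0, 137) are unequal yet both hash to 3151, which is allowed; a hash collection then falls back on equals to tell them apart.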

42 42 Example: String hashCode The hashCode function inside a String object looks like this:
public int hashCode() {
    int hash = 0;
    for (int i = 0; i < this.length(); i++) {
        hash = 31 * hash + this.charAt(i);
    }
    return hash;
}
–As with any general hashing function, collisions are possible. Example: "Ea" and "FB" have the same hash value. –Early versions of Java examined at most 16 characters of the string. For some common data this led to poor hash table performance.
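
The "Ea" / "FB" collision can be checked by hand: 31 * 'E' + 'a' = 31 * 69 + 97 = 2236, and 31 * 'F' + 'B' = 31 * 70 + 66 = 2236. The sketch below repeats the slide's loop as a standalone method so the arithmetic can be run; the names `StringHashDemo` and `polyHash` are assumptions for illustration.

```java
// Hypothetical demo class; polyHash mirrors the loop shown on the slide.
class StringHashDemo {
    static int polyHash(String s) {
        int hash = 0;
        for (int i = 0; i < s.length(); i++) {
            hash = 31 * hash + s.charAt(i);   // char is widened to its numeric code
        }
        return hash;
    }
}
```

Because `String.hashCode` is documented to compute this same polynomial, `polyHash(s)` and `s.hashCode()` agree for any string `s`.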

43 43 hashCode tricks If one of your object's fields is an object, call its hashCode:
public int hashCode() {   // Student
    return 531 * firstName.hashCode() + ...;
}
To incorporate a double or boolean, use the hashCode method from the Double or Boolean wrapper classes:
public int hashCode() {   // BankAccount
    return 37 * Double.valueOf(balance).hashCode() + Boolean.valueOf(isCheckingAccount).hashCode();
}
Guava includes an Objects.hashCode(...) method that takes any number of values and combines them into one hash code.
public int hashCode() {   // BankAccount
    return Objects.hashCode(name, id, balance);
}

44 44 Implementing a hash map A hash map is like a set where the nodes store key/value pairs:
public class HashMap<K, V> implements Map<K, V> {... }
//       key      value
map.put("Marty", 14);
map.put("Jeff", 21);
map.put("Kasey", 20);
map.put("Stef", 35);
–Must modify your Node class to store a key and a value
[Figure: hash table of length 10 whose nodes hold the pairs ("Jeff", 21), ("Marty", 14), ("Kasey", 20), ("Stef", 35)]

45 45 Map ADT interface Let's think about how to write our own implementation of a map. –As is (usually) done in the Java Collection Framework, we will define a map as an ADT by creating a Map interface. –Core operations: put (add), get, contains key, remove
public interface Map<K, V> {
    void clear();
    boolean containsKey(K key);
    V get(K key);
    boolean isEmpty();
    void put(K key, V value);
    void remove(K key);
    int size();
}

46 46 Hash map vs. hash set –The hashing is always done on the keys, not the values. –The contains method is now containsKey; there and in remove, you search for a node whose key matches a given key. –The add method is now put; if the given key is already there, you must replace its old value with the new one. map.put("Bill", 66); // replace 49 with 66
[Figure: hash table of length 10 holding ("Jeff", 21), ("Marty", 14), ("Kasey", 20), ("Stef", 35), ("Abby", 57), and ("Bill", 49), with Bill's value being replaced by 66]
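
The differences listed above can be sketched as a small chained map. Everything named here (`ChainedHashMap`, the static generic `Node`, the `find` helper) is an assumption for illustration rather than the slides' own code; the sketch omits remove and rehashing, and it does not support null keys.

```java
// Sketch of a separate-chaining map; ChainedHashMap is a hypothetical name.
class ChainedHashMap<K, V> {
    private static class Node<K, V> {
        final K key;
        V value;
        Node<K, V> next;

        Node(K key, V value, Node<K, V> next) {
            this.key = key;
            this.value = value;
            this.next = next;
        }
    }

    private Node<K, V>[] elements;
    private int size = 0;

    @SuppressWarnings("unchecked")
    ChainedHashMap() {
        elements = (Node<K, V>[]) new Node[10];   // generic arrays need a cast
    }

    // Hashing is done on the key only, never on the value.
    private int hash(K key) {
        return Math.floorMod(key.hashCode(), elements.length);
    }

    // Returns the node whose key equals the given key, or null if there is none.
    private Node<K, V> find(K key) {
        for (Node<K, V> n = elements[hash(key)]; n != null; n = n.next) {
            if (n.key.equals(key)) return n;
        }
        return null;
    }

    public boolean containsKey(K key) {
        return find(key) != null;
    }

    public V get(K key) {
        Node<K, V> n = find(key);
        return n == null ? null : n.value;
    }

    public void put(K key, V value) {
        Node<K, V> n = find(key);
        if (n != null) {
            n.value = value;                      // key already present: replace old value
        } else {
            int h = hash(key);
            elements[h] = new Node<>(key, value, elements[h]);
            size++;
        }
    }

    public int size() {
        return size;
    }
}
```

Calling `put("Bill", 49)` and then `put("Bill", 66)` leaves a single entry for "Bill" whose value is 66, matching the replacement shown on the slide.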

47 Priority Queues and Heaps Reading: 18.2

48 48 Prioritization problems print jobs: CSE lab printers constantly accept and complete jobs from all over the building. We want to print faculty jobs before staff before student jobs, and grad students before undergrad, etc. ER scheduling: Scheduling patients for treatment in the ER. A gunshot victim should be treated sooner than a guy with a cold, regardless of arrival time. How do we always choose the most urgent case when new patients continue to arrive? key operations we want: –add an element (print job, patient, etc.) –get/remove the most "important" or "urgent" element

49 49 Priority Queue ADT priority queue: A collection of ordered elements that provides fast access to the minimum (or maximum) element. –add adds in order –peek returns minimum or "highest priority" value –remove removes/returns minimum value –isEmpty, clear, size, iterator O(1) pq.add("if"); pq.add("from");... priority queue "the" "of" "from" "to" "she" "you" "him" "why" "in" "down" "by" "if" pq.remove() "by"

50 50 Unfilled array? Consider using an unfilled array to implement a priority queue. –add: Store it in the next available index, as in a list. –peek: Loop over elements to find minimum element. –remove: Loop over elements to find min. Shift to remove. queue.add(9); queue.add(23); queue.add(8); queue.add(-3); queue.add(49); queue.add(12); queue.remove(); –How efficient is add? peek? remove? O(1), O(N), O(N) (peek must loop over the array; remove must shift elements)
Array before the remove (indexes 0-9): [9, 23, 8, -3, 49, 12, 0, 0, 0, 0], size 6

51 51 Sorted array? Consider using a sorted array to implement a priority queue. –add: Store it in the proper index to maintain sorted order. –peek: Minimum element is in index [0]. –remove: Shift elements to remove min from index [0]. queue.add(9); queue.add(23); queue.add(8); queue.add(-3); queue.add(49); queue.add(12); queue.remove(); –How efficient is add? peek? remove? O(N), O(1), O(N) (add and remove must shift elements)
Array before the remove (indexes 0-9): [-3, 8, 9, 12, 23, 49, 0, 0, 0, 0], size 6

52 52 Linked list? Consider using a doubly linked list to implement a priority queue. –add: Store it at the end of the linked list. –peek: Loop over elements to find minimum element. –remove: Loop over elements to find min. Unlink to remove. queue.add(9); queue.add(23); queue.add(8); queue.add(-3); queue.add(49); queue.add(12); queue.remove(); –How efficient is add? peek? remove? O(1), O(N), O(N) (peek and remove must loop over the linked list)
List before the remove, front to back: 9, 23, 8, -3, 49, 12

53 53 Sorted linked list? Consider using a sorted linked list to implement a priority queue. –add: Store it in the proper place to maintain sorted order. –peek: Minimum element is at the front. –remove: Unlink front element to remove. queue.add(9); queue.add(23); queue.add(8); queue.add(-3); queue.add(49); queue.add(12); queue.remove(); –How efficient is add? peek? remove? O(N), O(1), O(1) (add must loop over the linked list to find the proper insertion point)
List before the remove, front to back: -3, 8, 9, 12, 23, 49

54 54 Binary search tree? Consider using a binary search tree to implement a PQ. –add: Store it in the proper BST L/R-ordered spot. –peek: Minimum element is at the far left edge of the tree. –remove: Unlink far left element to remove. queue.add(9); queue.add(23); queue.add(8); queue.add(-3); queue.add(49); queue.add(12); queue.remove(); –How efficient is add? peek? remove? O(log N), O(log N), O(log N)...? (good in theory, but the tree tends to become unbalanced to the right)
[Figure: the BST built by these adds: root 9; left child 8, whose left child is -3; right child 23, whose children are 12 and 49]

55 55 Unbalanced binary tree queue.add(9); queue.add(23); queue.add(8); queue.add(-3); queue.add(49); queue.add(12); queue.remove(); queue.add(16); queue.add(34); queue.remove(); queue.remove(); queue.add(42); queue.add(45); queue.remove(); –Simulate these operations. What is the tree's shape? –A tree that is unbalanced has a height close to N rather than log N, which breaks the expected runtime of many operations. 49 23 12 16 34 42 45

56 56 Heaps heap: A complete binary tree with vertical ordering. –complete tree: Every level is full except possibly the lowest level, which must be filled from left to right (i.e., a node may not have any children until all possible siblings exist)

57 57 Heap ordering heap ordering: P ≤ X for every element X with parent P. –Parents' values are always smaller than or equal to those of their children. –Implies that minimum element is always the root (a "min-heap"). variation: "max-heap" stores largest element at root, reverses ordering –Is a heap a BST? How are they related?

58 58 Which are min-heaps? [Figure: several candidate binary trees to classify as min-heaps or not]

59 59 Which are max-heaps? [Figure: several candidate binary trees to classify as max-heaps or not]

60 60 Heap height and runtime The height of a complete tree with N nodes is always ⌊log₂ N⌋. –How do we know this for sure? Because of this, if we implement a priority queue using a heap, we can provide the following runtime guarantees: –add: O(log N) –peek: O(1) –remove: O(log N) For an n-node complete tree of height h: $2^h \le n \le 2^{h+1} - 1$, so $h = \lfloor \log_2 n \rfloor$.
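
One way to answer "how do we know this for sure?": in a complete tree of height h, levels 0 through h-1 are full and contribute 2^h - 1 nodes, while the lowest level holds at least 1 and at most 2^h nodes. The derivation below is a sketch under the convention that a single-node tree has height 0.

```latex
\begin{aligned}
(2^h - 1) + 1 \;\le\; n \;&\le\; (2^h - 1) + 2^h \\
2^h \;\le\; n \;&\le\; 2^{h+1} - 1 \;<\; 2^{h+1} \\
h \;\le\; \log_2 n \;&<\; h + 1 \\
h \;&=\; \lfloor \log_2 n \rfloor
\end{aligned}
```

For example, a complete tree with n = 10 nodes has height 3, since 2^3 = 8 ≤ 10 ≤ 15 = 2^4 - 1.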

61 61 The add operation When an element is added to a heap, where should it go? –Must insert a new node while maintaining heap properties. –queue.add(15);
[Figure: min-heap, in level order 10, 20, 80, 40, 60, 85, 99, 50, 700, 65, with the new node 15 still to be placed]

62 62 The add operation When an element is added to a heap, it should be initially placed as the rightmost leaf (to maintain the completeness property). –But the heap ordering property becomes broken!
[Figure: the heap 10, 20, 80, 40, 60, 85, 99, 50, 700, 65 (level order) before the add, and the same heap with 15 appended as the rightmost leaf, a child of 60]

63 63 "Bubbling up" a node bubble up: To restore heap ordering, the newly added element is shifted ("bubbled") up the tree until it reaches its proper place. –Weiss: "percolate up" by swapping with its parent –How many bubble-ups are necessary, at most?
[Figure: before, level order 10, 20, 80, 40, 60, 85, 99, 50, 700, 65, 15; after 15 swaps with 60 and then with 20, level order 10, 15, 80, 40, 20, 85, 99, 50, 700, 65, 60]

64 64 Bubble-up exercise Draw the tree state of a min-heap after adding these elements: –6, 50, 11, 25, 42, 20, 104, 76, 19, 55, 88, 2
[Figure: the resulting heap, in level order 2, 19, 6, 25, 42, 11, 104, 76, 50, 55, 88, 20]

65 65 The peek operation A peek on a min-heap is trivial to perform. –because of heap properties, minimum element is always the root –O(1) runtime Peek on a max-heap would be O(1) as well (return max, not min) 996040 8020 10 5076 85 65

66 66 The remove operation When an element is removed from a heap, what should we do? –The root is the node to remove. How do we alter the tree? –queue.remove(); 996040 8020 10 50700 85 65

67 67 The remove operation When the root is removed from a heap, it should be initially replaced by the rightmost leaf (to maintain completeness). –But the heap ordering property becomes broken!
[Figure: before, level order 10, 20, 80, 40, 60, 85, 99, 700, 50, 65; after the rightmost leaf 65 replaces the root, level order 65, 20, 80, 40, 60, 85, 99, 700, 50]

68 68 "Bubbling down" a node bubble down: To restore heap ordering, the new improper root is shifted ("bubbled") down the tree until it reaches its proper place. –Weiss: "percolate down" by swapping with its smaller child (why?) –How many bubble-downs are necessary, at most?
[Figure: before, level order 65, 20, 80, 40, 60, 85, 99, 74, 50; after 65 swaps with 20, then 40, then 50, level order 20, 40, 80, 50, 60, 85, 99, 74, 65]

69 69 Bubble-down exercise Suppose we have the min-heap shown below. Show the state of the heap tree after remove has been called 3 times, and which elements are returned by the removal.
[Figure: min-heap, in level order 2, 19, 6, 25, 42, 11, 104, 76, 50, 55, 88, 20]

70 70 Array heap implementation Though a heap is conceptually a binary tree, since it is a complete tree, when implementing it we actually can "cheat" and just use an array! –index of root = 1 (leave 0 empty to simplify the math) –for any node n at index i : index of n.left = 2i index of n.right = 2i + 1 parent index of n? –This array representation is elegant and efficient (O(1)) for common tree operations.

71 71 Implementing HeapPQ Let's implement an int priority queue using a min-heap array. public class HeapIntPriorityQueue implements IntPriorityQueue { private int[] elements; private int size; // constructs a new empty priority queue public HeapIntPriorityQueue() { elements = new int[10]; size = 0; }... }

72 72 Helper methods Since we will treat the array as a complete tree/heap, and walk up/down between parents/children, these methods are helpful: // helpers for navigating indexes up/down the tree private int parent(int index) { return index/2; } private int leftChild(int index) { return index*2; } private int rightChild(int index) { return index*2 + 1; } private boolean hasParent(int index) { return index > 1; } private boolean hasLeftChild(int index) { return leftChild(index) <= size; } private boolean hasRightChild(int index) { return rightChild(index) <= size; } private void swap(int[] a, int index1, int index2) { int temp = a[index1]; a[index1] = a[index2]; a[index2] = temp; }

73 73 Implementing add Let's write the code to add an element to the heap: public void add(int value) {... } 996040 8020 10 50700 85 65 15 992040 8015 10 50700 85 65 60

74 74 Implementing add
// Adds the given value to this priority queue in order.
public void add(int value) {
    elements[size + 1] = value;   // add as rightmost leaf

    // "bubble up" as necessary to fix ordering
    int index = size + 1;
    boolean found = false;
    while (!found && hasParent(index)) {
        int parent = parent(index);
        if (elements[index] < elements[parent]) {
            swap(elements, index, parent(index));
            index = parent(index);
        } else {
            found = true;   // found proper location; stop
        }
    }
    size++;
}

75 75 Resizing a heap What if our array heap runs out of space? –We must enlarge it. –When enlarging hash sets, we needed to carefully rehash the data. –What must we do here? –(We can simply copy the data into a larger array.)

76 76 Modified add code // Adds the given value to this priority queue in order. public void add(int value) { // resize to enlarge the heap if necessary if (size == elements.length - 1) { elements = Arrays.copyOf(elements, 2 * elements.length); }... }

77 77 Implementing peek Let's write code to retrieve the minimum element in the heap: public int peek() {... } 992040 8015 10 50700 85 65 60

78 78 Implementing peek // Returns the minimum element in this priority queue. // precondition: queue is not empty public int peek() { return elements[1]; }

79 79 Implementing remove Let's write code to remove the minimum element in the heap: public int remove() {... } 996040 8020 10 70050 85 65 996040 8020 65 70050 85 65

80 80 Implementing remove
public int remove() {   // precondition: queue is not empty
    int result = elements[1];
    elements[1] = elements[size];   // last leaf -> root
    size--;

    int index = 1;                  // "bubble down" to fix ordering
    boolean found = false;
    while (!found && hasLeftChild(index)) {
        int left = leftChild(index);
        int right = rightChild(index);
        int child = left;
        if (hasRightChild(index) && elements[right] < elements[left]) {
            child = right;
        }
        if (elements[index] > elements[child]) {
            swap(elements, index, child);
            index = child;
        } else {
            found = true;   // found proper location; stop
        }
    }
    return result;
}
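
For reference, the add, peek, and remove slides can be combined into one compact class. This is a sketch rather than the book's exact code: the name `MinIntHeap` is an assumption, the index helpers are inlined as `index / 2` and `2 * index`, the loops use `break` in place of the `found` flag, and peek and remove throw on an empty heap instead of relying on a precondition.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Sketch combining the slides' add, peek, and remove; MinIntHeap is a hypothetical name.
class MinIntHeap {
    private int[] elements = new int[10];   // index 0 is unused; the root lives at index 1
    private int size = 0;

    public boolean isEmpty() {
        return size == 0;
    }

    public int size() {
        return size;
    }

    public void add(int value) {
        if (size == elements.length - 1) {              // array full: double it
            elements = Arrays.copyOf(elements, 2 * elements.length);
        }
        size++;
        elements[size] = value;                         // add as rightmost leaf
        int index = size;
        while (index > 1 && elements[index] < elements[index / 2]) {
            swap(index, index / 2);                     // bubble up past a larger parent
            index = index / 2;
        }
    }

    public int peek() {
        if (isEmpty()) throw new NoSuchElementException("empty heap");
        return elements[1];
    }

    public int remove() {
        int result = peek();
        elements[1] = elements[size];                   // last leaf -> root
        size--;
        int index = 1;
        while (2 * index <= size) {                     // while a left child exists
            int child = 2 * index;
            if (child + 1 <= size && elements[child + 1] < elements[child]) {
                child++;                                // pick the smaller child
            }
            if (elements[index] <= elements[child]) {
                break;                                  // heap ordering restored
            }
            swap(index, child);
            index = child;
        }
        return result;
    }

    private void swap(int i, int j) {
        int temp = elements[i];
        elements[i] = elements[j];
        elements[j] = temp;
    }
}
```

Adding the twelve values from the bubble-up exercise and then calling remove repeatedly yields them in ascending order, which is a quick way to sanity-check both bubbling directions and the resize.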

81 81 Int PQ ADT interface Let's write our own implementation of a priority queue. –To simplify the problem, we only store ints in our priority queue for now. –As is (usually) done in the Java Collection Framework, we will define the priority queue as an ADT by creating an interface. –Core operations are: add, peek (at min), remove (min).
public interface IntPriorityQueue {
    void add(int value);
    void clear();
    boolean isEmpty();
    int peek();     // return min element
    int remove();   // remove/return min element
    int size();
}

82 82 Generic PQ ADT Let's modify our priority queue so it can store any type of data. –As with past collections, we will use Java generics (a type parameter).
public interface PriorityQueue<E> {
    void add(E value);
    void clear();
    boolean isEmpty();
    E peek();     // return min element
    E remove();   // remove/return min element
    int size();
}

83 83 Generic HeapPQ class We can modify our heap priority queue class to use generics as usual...
public class HeapPriorityQueue<E> implements PriorityQueue<E> {
    private E[] elements;
    private int size;

    // constructs a new empty priority queue
    public HeapPriorityQueue() {
        elements = (E[]) new Object[10];
        size = 0;
    }
    ...
}

84 84 Problem: ordering elements // Adds the given value to this priority queue in order. public void add(E value) {... int index = size + 1; boolean found = false; while (!found && hasParent(index)) { int parent = parent(index); if (elements[index] < elements[parent]) { // error swap(elements, index, parent(index)); index = parent(index); } else { found = true; // found proper location; stop } } } –Even changing the < to a compareTo call does not work. Java cannot be sure that type E has a compareTo method.

85 85 Comparing objects Heaps rely on being able to order their elements. Operators like < and > do not work with objects in Java. –But we do think of some types as having an ordering (e.g. Dates). –(In other languages, we can enable <, > with operator overloading.) natural ordering: Rules governing the relative placement of all values of a given type. –Implies a notion of equality (like equals) but also < and >. –total ordering: All elements can be arranged in A ≤ B ≤ C ≤... order. –The Comparable interface provides a natural ordering.

86 86 The Comparable interface The standard way for a Java class to define a comparison function for its objects is to implement the Comparable interface.
public interface Comparable<T> {
    public int compareTo(T other);
}
A call of A.compareTo(B) should return: a value < 0 if A comes "before" B in the ordering, a value > 0 if A comes "after" B in the ordering, or exactly 0 if A and B are considered "equal" in the ordering. Effective Java Tip #12: Consider implementing Comparable.

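87 87 Bounded type parameters
<Type extends SuperType>
–An upper bound; accepts the given supertype or any of its subtypes. –Works for multiple superclass/interfaces with &:
<Type extends ClassA & InterfaceB & InterfaceC & ...>
<? super SubType>
–A lower bound; accepts the given type or any of its supertypes. (In Java, lower bounds can only be written on wildcards, not on declared type parameters.)
Example:
// can be instantiated with any animal type
public class Nest<T extends Animal> {... }
...
Nest nest = new Nest ();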

88 Corrected HeapPQ class
    public class HeapPriorityQueue<E extends Comparable<E>> implements PriorityQueue<E> {
        private E[] elements;
        private int size;
        // constructs a new empty priority queue
        public HeapPriorityQueue() {
            // E erases to Comparable, so the array must be a Comparable[];
            // casting an Object[] to E[] here would fail at runtime
            elements = (E[]) new Comparable[10];
            size = 0;
        }
        ...
        public void add(E value) {
            ...
            while (...) {
                if (elements[index].compareTo(elements[parent]) < 0) {
                    swap(...);
                }
                ...
            }
        }
    }
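For reference, here is one way the elided parts could be filled in. It is a sketch under assumptions, not the book's implementation: the class name HeapPQSketch, the resizing policy, and the method bodies are illustrative, and it does not implement the slides' PriorityQueue interface. It keeps the 1-based layout used in the add code (children of index i at 2i and 2i + 1), and creates the backing array as a Comparable[] because E is bounded by Comparable.

```java
import java.util.Arrays;
import java.util.NoSuchElementException;

// Sketch of a min-heap priority queue; index 0 is left unused.
class HeapPQSketch<E extends Comparable<E>> {
    private E[] elements;
    private int size;

    @SuppressWarnings("unchecked")
    HeapPQSketch() {
        elements = (E[]) new Comparable[10];
        size = 0;
    }

    boolean isEmpty() {
        return size == 0;
    }

    void add(E value) {
        if (size + 1 == elements.length) {
            elements = Arrays.copyOf(elements, elements.length * 2);
        }
        elements[++size] = value;
        int index = size;
        // Bubble up while the new value is smaller than its parent.
        while (index > 1 && elements[index].compareTo(elements[index / 2]) < 0) {
            swap(index, index / 2);
            index /= 2;
        }
    }

    E remove() {
        if (size == 0) {
            throw new NoSuchElementException();
        }
        E result = elements[1];
        elements[1] = elements[size];
        elements[size--] = null;
        int index = 1;
        // Bubble down: swap with the smaller child while out of order.
        while (2 * index <= size) {
            int child = 2 * index;
            if (child < size && elements[child + 1].compareTo(elements[child]) < 0) {
                child++;
            }
            if (elements[child].compareTo(elements[index]) >= 0) {
                break;
            }
            swap(index, child);
            index = child;
        }
        return result;
    }

    private void swap(int i, int j) {
        E temp = elements[i];
        elements[i] = elements[j];
        elements[j] = temp;
    }

    public static void main(String[] args) {
        HeapPQSketch<Integer> pq = new HeapPQSketch<Integer>();
        for (int n : new int[] {5, 3, 8, 1}) {
            pq.add(n);
        }
        while (!pq.isEmpty()) {
            System.out.print(pq.remove() + " ");   // 1 3 5 8
        }
        System.out.println();
    }
}
```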

89 Ordering and Comparators

90 What's the "natural" order?
    public class Rectangle implements Comparable<Rectangle> {
        private int x, y, width, height;
        public int compareTo(Rectangle other) {
            // ...?
        }
    }
What is the "natural ordering" of rectangles?
– By x, breaking ties by y?
– By width, breaking ties by height?
– By area? By perimeter?
Do rectangles have any "natural" ordering?
– Might we want to arrange rectangles into some order anyway?

91 Comparator interface
    public interface Comparator<T> {
        public int compare(T first, T second);
    }
A Comparator is an external object that specifies a comparison function over some other type of objects.
– Allows you to define multiple orderings for the same type.
– Allows you to define a specific ordering for a type even if there is no obvious "natural" ordering for that type.
– Allows you to externally define an ordering for a class that, for whatever reason, you are not able to modify to make it Comparable:
  – a class that is part of the Java class libraries
  – a class that is final and can't be extended
  – a class from another library or author, that you don't control
  – ...

92 Comparator examples
    public class RectangleAreaComparator implements Comparator<Rectangle> {
        // compare in ascending order by area (WxH)
        public int compare(Rectangle r1, Rectangle r2) {
            return r1.getArea() - r2.getArea();
        }
    }
    public class RectangleXYComparator implements Comparator<Rectangle> {
        // compare by ascending x, break ties by y
        public int compare(Rectangle r1, Rectangle r2) {
            if (r1.getX() != r2.getX()) {
                return r1.getX() - r2.getX();
            } else {
                return r1.getY() - r2.getY();
            }
        }
    }
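A caveat on the subtraction idiom used in these comparators: it is safe when both values are non-negative (as areas are), but it can overflow when the values may be large and of opposite sign, as coordinates can be. The sketch below uses two hypothetical helper methods (not from the slides) to contrast subtraction with Integer.compare on such a pair.

```java
class SafeCompareDemo {
    // The subtraction idiom: the sign is wrong when a - b overflows.
    static int bySubtraction(int a, int b) {
        return a - b;
    }

    // Integer.compare never overflows.
    static int byCompare(int a, int b) {
        return Integer.compare(a, b);
    }

    public static void main(String[] args) {
        int a = Integer.MIN_VALUE;
        int b = 1;
        // a is smaller than b, so a correct comparison is negative.
        System.out.println(bySubtraction(a, b) > 0);  // true: overflow flips the sign
        System.out.println(byCompare(a, b) < 0);      // true: correct ordering
    }
}
```

Replacing `r1.getX() - r2.getX()` with `Integer.compare(r1.getX(), r2.getX())` would make the XY comparator robust to extreme coordinates without changing its ordering for ordinary ones.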

93 Using Comparators
TreeSet, TreeMap, PriorityQueue can use Comparator:
    Comparator<Rectangle> comp = new RectangleAreaComparator();
    Set<Rectangle> set = new TreeSet<Rectangle>(comp);
    Queue<Rectangle> pq = new PriorityQueue<Rectangle>(10, comp);
Searching and sorting methods can accept Comparators.
    Arrays.binarySearch(array, value, comparator)
    Arrays.sort(array, comparator)
    Collections.binarySearch(list, value, comparator)
    Collections.max(collection, comparator)
    Collections.min(collection, comparator)
    Collections.sort(list, comparator)
Methods are provided to reverse a Comparator's ordering:
    public static Comparator Collections.reverseOrder()
    public static Comparator Collections.reverseOrder(comparator)
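To illustrate a few of these library calls together, the following sketch sorts an array in descending order and turns java.util.PriorityQueue into a max-first queue with Collections.reverseOrder(). The class and method names (ComparatorUseDemo, drainDescending) are made up for the example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;
import java.util.Queue;

class ComparatorUseDemo {
    // Adds the values to a reversed priority queue and removes them all,
    // so they come out largest first.
    static List<Integer> drainDescending(Integer... values) {
        Comparator<Integer> descending = Collections.reverseOrder();
        Queue<Integer> pq = new PriorityQueue<Integer>(10, descending);
        pq.addAll(Arrays.asList(values));
        List<Integer> out = new ArrayList<Integer>();
        while (!pq.isEmpty()) {
            out.add(pq.remove());
        }
        return out;
    }

    public static void main(String[] args) {
        Integer[] data = {40, 10, 30, 20};
        Arrays.sort(data, Collections.reverseOrder());
        System.out.println(Arrays.toString(data));     // [40, 30, 20, 10]
        System.out.println(drainDescending(5, 9, 1));  // [9, 5, 1]
    }
}
```

Note that the comparator-accepting overloads work on object arrays and collections, which is why the example uses Integer[] rather than int[].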

94 PQ and Comparator
Our heap priority queue currently relies on the Comparable natural ordering of its elements:
    public class HeapPriorityQueue<E extends Comparable<E>> implements PriorityQueue<E> {
        ...
        public HeapPriorityQueue() {...}
    }
To allow other orderings, we can add a constructor that accepts a Comparator so clients can arrange elements in any order:
    ...
    public HeapPriorityQueue(Comparator<E> comp) {...}
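The slides leave both constructor bodies elided. One plausible approach, shown here purely as an assumption, is to store the comparator in a field (null meaning "use the natural ordering") and route every comparison through a single helper. The class name OrderingSketch and the helper compare are hypothetical; a heap's add and remove would call the helper wherever they currently call compareTo.

```java
import java.util.Collections;
import java.util.Comparator;

class OrderingSketch<E extends Comparable<E>> {
    private final Comparator<E> comp;   // null means "use natural ordering"

    OrderingSketch() {
        this.comp = null;
    }

    OrderingSketch(Comparator<E> comp) {
        this.comp = comp;
    }

    // Single point of comparison for the rest of the class.
    int compare(E a, E b) {
        return (comp == null) ? a.compareTo(b) : comp.compare(a, b);
    }

    public static void main(String[] args) {
        OrderingSketch<String> natural = new OrderingSketch<String>();
        Comparator<String> reversed = Collections.reverseOrder();
        OrderingSketch<String> backwards = new OrderingSketch<String>(reversed);
        System.out.println(natural.compare("apple", "pear") < 0);    // true
        System.out.println(backwards.compare("apple", "pear") > 0);  // true
    }
}
```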

95 PQ Comparator exercise
Write code that stores strings in a priority queue and reads them back out in ascending order by length.
– If two strings are the same length, break the tie by ABC order.
    Queue<String> pq = new PriorityQueue<String>(...);
    pq.add("you");
    pq.add("meet");
    pq.add("madam");
    pq.add("sir");
    pq.add("hello");
    pq.add("goodbye");
    while (!pq.isEmpty()) {
        System.out.print(pq.remove() + " ");
    }
    // sir you meet hello madam goodbye

96 PQ Comparator answer
Use the following comparator class to organize the strings:
    public class LengthComparator implements Comparator<String> {
        public int compare(String s1, String s2) {
            if (s1.length() != s2.length()) {
                // if lengths are unequal, compare by length
                return s1.length() - s2.length();
            } else {
                // break ties by ABC order
                return s1.compareTo(s2);
            }
        }
    }
    ...
    Queue<String> pq = new PriorityQueue<String>(100, new LengthComparator());
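Putting the exercise and its answer together gives a complete program. The comparator is the one from the slide; the wrapper class LengthOrderDemo and its drain method are added here only to make the example runnable.

```java
import java.util.Comparator;
import java.util.PriorityQueue;
import java.util.Queue;

class LengthOrderDemo {
    // Adds all words to the queue, then removes them in priority order.
    static String drain(String... words) {
        Queue<String> pq = new PriorityQueue<String>(100, new LengthComparator());
        for (String w : words) {
            pq.add(w);
        }
        StringBuilder out = new StringBuilder();
        while (!pq.isEmpty()) {
            out.append(pq.remove()).append(' ');
        }
        return out.toString().trim();
    }

    public static void main(String[] args) {
        System.out.println(drain("you", "meet", "madam", "sir", "hello", "goodbye"));
        // sir you meet hello madam goodbye
    }
}

class LengthComparator implements Comparator<String> {
    @Override
    public int compare(String s1, String s2) {
        if (s1.length() != s2.length()) {
            return s1.length() - s2.length();   // shorter strings first
        }
        return s1.compareTo(s2);                // break ties by ABC order
    }
}
```

String lengths are never negative, so the subtraction cannot overflow here.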

97 Heap sort
heap sort: An algorithm to sort an array of N elements by turning the array into a heap, then calling remove N times.
– The elements will come out in sorted order.
– We can put them into a new sorted array.
– What is the runtime?

98 Heap sort implementation
    public static void heapSort(int[] a) {
        PriorityQueue<Integer> pq = new HeapPriorityQueue<Integer>();
        for (int n : a) {
            pq.add(n);
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = pq.remove();
        }
    }
– This code is correct and runs in O(N log N) time but wastes memory.
– It makes an entire copy of the array a into the internal heap of the priority queue.
– Can we perform a heap sort without making a copy of a?
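The same idea can be tried out with the library's java.util.PriorityQueue standing in for the slides' HeapPriorityQueue (a substitution made only so the sketch runs on its own). The O(N log N) bound follows from N adds and N removes, each costing O(log N).

```java
import java.util.Arrays;
import java.util.PriorityQueue;

class PQHeapSort {
    static void heapSort(int[] a) {
        PriorityQueue<Integer> pq = new PriorityQueue<Integer>();
        for (int n : a) {
            pq.add(n);              // N adds, O(log N) each
        }
        for (int i = 0; i < a.length; i++) {
            a[i] = pq.remove();     // N removes, O(log N) each
        }
    }

    public static void main(String[] args) {
        int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
        heapSort(a);
        System.out.println(Arrays.toString(a));
        // [10, 21, 22, 30, 38, 40, 45, 66, 70, 81, 88, 95]
    }
}
```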

99 Improving the code
Idea: Treat a itself as a max-heap, whose data starts at 0 (not 1).
– a is not actually in heap order.
– But if you repeatedly "bubble down" each non-leaf node, starting from the last one, you will eventually have a proper heap.
Now that a is a valid max-heap:
– Call remove repeatedly until the heap is empty.
– But make it so that when an element is "removed", it is moved to the end of the array instead of completely evicted from the array.
– When you are done, voila! The array is sorted.

100 Step 1: Build heap in-place
"Bubble" down non-leaf nodes until the array is a max-heap:
– int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
– Swap each node with its larger child as needed.
[Figure: the array drawn as a binary tree with root 21]
    index   0   1   2   3   4   5   6   7   8   9  10  11
    value  21  66  40  10  70  81  30  22  45  95  88  38
    size   12

101 Build heap in-place answer
– 30: nothing to do
– 81: nothing to do
– 70: swap with 95
– 10: swap with 45
– 40: swap with 81
– 66: swap with 95, then 88
– 21: swap with 95, then 88, then 70
[Figure: the resulting max-heap drawn as a binary tree with root 95]
    index   0   1   2   3   4   5   6   7   8   9  10  11
    value  95  88  81  45  70  40  30  22  10  21  66  38
    size   12
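The bubbling procedure can be sketched as follows, assuming a 0-based layout in which the children of index i sit at 2i + 1 and 2i + 2. The names bubbleDown, buildHeap, and isMaxHeap are chosen for this example; the slides do not show this code.

```java
import java.util.Arrays;

class BuildHeapDemo {
    // Sinks a[index] within a[0..size-1] until neither child is larger.
    static void bubbleDown(int[] a, int index, int size) {
        while (2 * index + 1 < size) {
            int child = 2 * index + 1;                      // left child
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;                                    // right child is larger
            }
            if (a[index] >= a[child]) {
                return;                                     // heap order holds here
            }
            int temp = a[index];
            a[index] = a[child];
            a[child] = temp;
            index = child;
        }
    }

    // Bubbles down every non-leaf node, starting from the last one.
    static void buildHeap(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            bubbleDown(a, i, a.length);
        }
    }

    // True if every element is no larger than its parent.
    static boolean isMaxHeap(int[] a) {
        for (int i = 1; i < a.length; i++) {
            if (a[(i - 1) / 2] < a[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
        buildHeap(a);
        System.out.println(Arrays.toString(a));
        System.out.println(isMaxHeap(a));   // true
    }
}
```

The loop starts at index a.length / 2 - 1 because every index after that is a leaf and has nothing to bubble down into.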

102 Remove to sort
Now that we have a max-heap, remove elements repeatedly until we have a sorted array.
– Move each removed element to the end, rather than tossing it.
[Figure: the max-heap drawn as a binary tree with root 95]
    index   0   1   2   3   4   5   6   7   8   9  10  11
    value  95  88  81  45  70  40  30  22  10  21  66  38
    size   12

103 Remove to sort answer
– 95: move 38 up, swap with 88, 70, 66
– 88: move 38 up, swap with 81, 40
– 81: move 21 up, swap with 70, 66
– 70: move 10 up, swap with 66, 45, 22
– ...
– (Notice that after 4 removes, the last 4 elements in the array are sorted. If we remove every element, the entire array will be sorted.)
[Figure: the remaining 8-element heap drawn as a binary tree with root 66, followed by the sorted tail]
    index   0   1   2   3   4   5   6   7 |  8   9  10  11
    value  66  45  40  22  21  38  30  10 | 70  81  88  95
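Both phases combine into an in-place heap sort. The sketch below is one possible rendering of the idea, again assuming 0-based indexing; the class and method names are illustrative. Each "remove" swaps the root with the last element of the shrinking heap region, then bubbles the new root down within that region.

```java
import java.util.Arrays;

class InPlaceHeapSort {
    static void heapSort(int[] a) {
        // Phase 1: build a max-heap by bubbling down every non-leaf node.
        for (int i = a.length / 2 - 1; i >= 0; i--) {
            bubbleDown(a, i, a.length);
        }
        // Phase 2: move the max to the end, shrink the heap, restore order.
        for (int end = a.length - 1; end > 0; end--) {
            swap(a, 0, end);
            bubbleDown(a, 0, end);   // heap now occupies a[0..end-1]
        }
    }

    // Sinks a[index] within a[0..size-1] until neither child is larger.
    private static void bubbleDown(int[] a, int index, int size) {
        while (2 * index + 1 < size) {
            int child = 2 * index + 1;
            if (child + 1 < size && a[child + 1] > a[child]) {
                child++;
            }
            if (a[index] >= a[child]) {
                return;
            }
            swap(a, index, child);
            index = child;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int temp = a[i];
        a[i] = a[j];
        a[j] = temp;
    }

    public static void main(String[] args) {
        int[] a = {21, 66, 40, 10, 70, 81, 30, 22, 45, 95, 88, 38};
        heapSort(a);
        System.out.println(Arrays.toString(a));
        // [10, 21, 22, 30, 38, 40, 45, 66, 70, 81, 88, 95]
    }
}
```

Using a max-heap rather than a min-heap is what lets the sorted values accumulate at the end of the same array in ascending order, with no second array needed.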

