
1 Advanced Algorithms Analysis and Design, Lecture 10: Hashing, Heaps and Binomial Trees

2 HASHING

3 Hash Tables All search structures so far have relied on a comparison operation, giving performance of O(n) or O(log n). Suppose instead we have a function f(key) → integer, i.e. one that maps a key to an integer. What performance might we expect now?

4 Hash Tables - Keys are integers We need a hash function h(key) → integer, i.e. one that maps a key to an integer. Applying this function to a key produces an address. If h maps each key to a unique integer in the range 0..m-1, then search is O(1).

5 Hash Tables - Hash functions Form of the hash function. Example using an n-character key:

int hash( char *s, int n )
{
    int sum = 0;
    while ( n-- )
        sum = sum + *s++;
    return sum % 256;    /* returns a value in 0..255 */
}

The xor operation is also commonly used: sum = sum ^ *s++;
Example: hash( "AB", 2 ) and hash( "BA", 2 ) return the same value! This is called a collision. A variety of techniques are used for resolving collisions.
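The collision just mentioned can be checked directly. Below is a small Java sketch of the same summing hash (the class name SumHash is ours, not from the slides):

```java
public class SumHash {
    // Java version of the C summing hash above: add the character
    // codes of the string and reduce modulo 256.
    public static int hash(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) sum += s.charAt(i);
        return sum % 256;
    }

    public static void main(String[] args) {
        System.out.println(hash("AB")); // 'A' + 'B' = 65 + 66 = 131
        System.out.println(hash("BA")); // same sum: 131, a collision
    }
}
```

Both calls return 131 because addition is commutative: any permutation of the same characters collides under this hash.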

6 Hashing: Collision Resolution Schemes Topics covered: collision resolution techniques; separate chaining; separate chaining with string keys; separate chaining versus open addressing; implementation of separate chaining; introduction to collision resolution using open addressing; linear probing.

7 Collision Resolution Techniques There are two broad approaches to collision resolution: 1. Separate chaining: an array-of-linked-lists implementation. 2. Open addressing: an array-based implementation, with (i) linear probing (linear search), (ii) quadratic probing (nonlinear search), and (iii) double hashing (uses two hash functions).

8 Separate Chaining The hash table is implemented as an array of linked lists. Inserting an item r that hashes to index i is simply insertion into the linked list at position i. Keys that collide at the same index are chained in the same linked list.

9 Separate Chaining (cont'd) Retrieval of an item r with hash address i is simply retrieval from the linked list at position i. Deletion of an item r with hash address i is simply deleting r from the linked list at position i. Example: load the keys 23, 13, 21, 14, 7, 8, and 15, in this order, into a hash table of size 7 using separate chaining with the hash function h(key) = key % 7:
h(23) = 23 % 7 = 2
h(13) = 13 % 7 = 6
h(21) = 21 % 7 = 0
h(14) = 14 % 7 = 0 (collision)
h(7) = 7 % 7 = 0 (collision)
h(8) = 8 % 7 = 1
h(15) = 15 % 7 = 1 (collision)
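The example above can be reproduced with a short Java sketch (the class name ChainingDemo and the build helper are ours, not from the slides):

```java
import java.util.LinkedList;

public class ChainingDemo {
    // Build a chained hash table of size m from the given keys,
    // using h(key) = key % m as on the slide.
    public static LinkedList<Integer>[] build(int[] keys, int m) {
        @SuppressWarnings("unchecked")
        LinkedList<Integer>[] table = new LinkedList[m];
        for (int i = 0; i < m; i++) table[i] = new LinkedList<>();
        for (int k : keys) table[k % m].add(k); // a collision just extends the chain
        return table;
    }

    public static void main(String[] args) {
        LinkedList<Integer>[] t = build(new int[]{23, 13, 21, 14, 7, 8, 15}, 7);
        for (int i = 0; i < t.length; i++) System.out.println(i + ": " + t[i]);
    }
}
```

Running this prints the chains of the example: slot 0 holds [21, 14, 7], slot 1 holds [8, 15], slot 2 holds [23], slot 6 holds [13], and slots 3, 4 and 5 stay empty.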

10 Separate Chaining with String Keys Recall that search keys can be numbers, strings or some other object. A hash function for a string s = c_0 c_1 c_2 … c_(n-1) can be defined as: hash = (c_0 + c_1 + c_2 + … + c_(n-1)) % tableSize. This can be implemented as:

public static int hash(String key, int tableSize) {
    int hashValue = 0;
    for (int i = 0; i < key.length(); i++) {
        hashValue += key.charAt(i);
    }
    return hashValue % tableSize;
}

Example: the following class describes commodity items:

class CommodityItem {
    String name;     // commodity name
    int quantity;    // commodity quantity needed
    double price;    // commodity price
}

11 Separate Chaining with String Keys (cont'd) Use the hash function hash to load the following commodity items into a hash table of size 13 using separate chaining:
onion 1 10.0
tomato 1 8.50
cabbage 3 3.50
carrot 1 5.50
okra 1 6.50
mellon 2 10.0
potato 2 7.50
banana 3 4.00
olive 2 15.0
salt 2 2.50
cucumber 3 4.50
mushroom 3 5.50
orange 2 3.00
Solution:
hash(onion) = (111 + 110 + 105 + 111 + 110) % 13 = 547 % 13 = 1
hash(salt) = (115 + 97 + 108 + 116) % 13 = 436 % 13 = 7
hash(orange) = (111 + 114 + 97 + 110 + 103 + 101) % 13 = 636 % 13 = 12
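The hand computations in the solution can be verified with the slide's own hash function (wrapped here in a class StringHash of our naming):

```java
public class StringHash {
    // the hash function from slide 10: sum of character codes mod tableSize
    public static int hash(String key, int tableSize) {
        int hashValue = 0;
        for (int i = 0; i < key.length(); i++) {
            hashValue += key.charAt(i);
        }
        return hashValue % tableSize;
    }

    public static void main(String[] args) {
        System.out.println(hash("onion", 13));  // 547 % 13 = 1
        System.out.println(hash("salt", 13));   // 436 % 13 = 7
        System.out.println(hash("orange", 13)); // 636 % 13 = 12
    }
}
```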

12 Separate Chaining with String Keys (cont'd) The resulting hash addresses:
Item      Qty  Price  h(key)
onion      1   10.0    1
tomato     1    8.50  10
cabbage    3    3.50   4
carrot     1    5.50   1
okra       1    6.50   0
mellon     2   10.0   10
potato     2    7.50   0
banana     3    4.00  11
olive      2   15.0   10
salt       2    2.50   7
cucumber   3    4.50   9
mushroom   3    5.50   6
orange     2    3.00  12
Chains: slot 0 holds okra and potato; slot 1 holds onion and carrot; slot 10 holds tomato, mellon and olive; the remaining items occupy slots 4, 6, 7, 9, 11 and 12 alone; slots 2, 3, 5 and 8 are empty.

13 Separate Chaining versus Open Addressing Organization: chaining. Advantages: unlimited number of elements; unlimited number of collisions. Disadvantages: overhead of multiple linked lists.

14 Introduction to Open Addressing All items are stored in the hash table itself. In addition to the cell data (if any), each cell keeps one of three states: EMPTY, OCCUPIED, DELETED. While inserting, if a collision occurs, alternative cells are tried until an empty cell is found. Deletion (lazy deletion): when a key is deleted, the slot is marked as DELETED rather than EMPTY; otherwise subsequent searches that hash at the deleted cell would be unsuccessful. Probe sequence: a probe sequence is the sequence of array indexes that is followed in searching for an empty cell during an insertion, or in searching for a key during find or delete operations. The most common probe sequences are of the form: h_i(key) = [h(key) + c(i)] % n, for i = 0, 1, …, n-1, where h is a hash function and n is the size of the hash table. The function c(i) is required to have the following two properties: Property 1: c(0) = 0. Property 2: the set of values {c(0) % n, c(1) % n, c(2) % n, ..., c(n-1) % n} must be a permutation of {0, 1, 2, ..., n-1}; that is, it must contain every integer between 0 and n-1 inclusive.

15 Introduction to Open Addressing (cont'd) The function c(i) is used to resolve collisions. To insert item r, we examine array location h_0(r) = h(r). If there is a collision, array locations h_1(r), h_2(r), ..., h_(n-1)(r) are examined until an empty slot is found. Similarly, to find item r, we examine the same sequence of locations in the same order. Note: for a given hash function h(key), the only difference between the open addressing collision resolution techniques (linear probing, quadratic probing and double hashing) is the definition of the function c(i). Common definitions of c(i):
Linear probing: c(i) = i
Quadratic probing: c(i) = ±i^2
Double hashing: c(i) = i · h_p(key), where h_p(key) is another hash function.
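The three choices of c(i) can be compared with a small sketch. The table size 13 matches the later examples; the secondary hash hp(key) = 7 - key % 7 is our assumption for illustration (the slides do not fix one), and only the +i² half of the ±i² quadratic rule is shown:

```java
public class ProbeSequences {
    static final int N = 13;                    // table size, as in the examples
    public static int h(int key) { return key % N; }

    // assumed secondary hash for double hashing (not from the slides)
    public static int hp(int key) { return 7 - key % 7; }

    public static int linear(int key, int i)     { return (h(key) + i) % N; }          // c(i) = i
    public static int quadratic(int key, int i)  { return (h(key) + i * i) % N; }      // c(i) = +i^2
    public static int doubleHash(int key, int i) { return (h(key) + i * hp(key)) % N; } // c(i) = i * hp(key)

    public static void main(String[] args) {
        // first five probe positions for key 18 under each technique
        for (int i = 0; i < 5; i++)
            System.out.println(linear(18, i) + " " + quadratic(18, i) + " " + doubleHash(18, i));
    }
}
```

For key 18 (h = 5, hp = 3) the sequences diverge after the first collision: linear visits 5, 6, 7, 8, 9; quadratic visits 5, 6, 9, 1, 8; double hashing visits 5, 8, 11, 1, 4.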

16 Introduction to Open Addressing (cont'd) Advantages of open addressing: all items are stored in the hash table itself, so there is no need for another data structure (no linked lists); open addressing is more efficient storage-wise. Disadvantages of open addressing: the keys of the objects to be hashed must be distinct; performance depends on choosing a proper table size; it requires a three-state flag (OCCUPIED, EMPTY or DELETED) in each cell.

17 Open Addressing Facts With any open addressing method of collision resolution, performance degrades severely as the table fills, so choosing a good table size is most important. Hashing has two parameters that affect its performance: initial capacity and load factor. The capacity is the number of buckets in the hash table, and the initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased: when the number of entries in the hash table exceeds the product of the load factor and the current capacity, the capacity is roughly doubled by rehashing. As a general rule, a default load factor of 0.75 offers a good trade-off between time and space costs. The load factor of the table is m/N, where m is the number of records currently in the table and N is the size of the array used to implement it. Load factors between 0.6 and 0.7 are common; load factors above 0.7 are undesirable.

18 Open Addressing: Linear Probing Example: perform the operations given below, in the given order, on an initially empty hash table of size 13, using linear probing with c(i) = i and the hash function h(key) = key % 13: insert(18), insert(26), insert(35), insert(9), find(15), find(48), delete(35), delete(40), find(9), insert(64), insert(47), find(35). The required probe sequences are given by: h_i(key) = (h(key) + i) % 13, for i = 0, 1, 2, ..., 12.
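The whole operation sequence can be simulated. The class below is our sketch of linear probing with the three-state cells described on slide 14 (class and method names are ours); running it reproduces the table shown on slide 19:

```java
import java.util.Arrays;

public class LinearProbing {
    static final int N = 13;
    static final char EMPTY = 'E', OCCUPIED = 'O', DELETED = 'D';
    public static char[] state = new char[N];
    public static int[] value = new int[N];

    static int h(int key) { return key % N; }

    public static void insert(int key) {
        for (int i = 0; i < N; i++) {
            int j = (h(key) + i) % N;
            if (state[j] != OCCUPIED) {       // EMPTY and DELETED cells are reusable
                state[j] = OCCUPIED; value[j] = key; return;
            }
        }
    }

    public static int find(int key) {         // index of key, or -1 if absent
        for (int i = 0; i < N; i++) {
            int j = (h(key) + i) % N;
            if (state[j] == EMPTY) return -1; // a truly empty cell ends the search
            if (state[j] == OCCUPIED && value[j] == key) return j;
        }
        return -1;
    }

    public static void delete(int key) {      // lazy deletion: mark, don't empty
        int j = find(key);
        if (j >= 0) state[j] = DELETED;
    }

    public static void runExample() {         // the operation sequence from slide 18
        Arrays.fill(state, EMPTY);
        insert(18); insert(26); insert(35); insert(9);
        find(15); find(48);
        delete(35); delete(40);
        find(9);
        insert(64); insert(47);
        find(35);
    }

    public static void main(String[] args) {
        runExample();
        for (int j = 0; j < N; j++)
            System.out.println(j + " " + state[j] + (state[j] == EMPTY ? "" : " " + value[j]));
    }
}
```

Note how find(9) still succeeds after delete(35): the DELETED mark at index 9 keeps the probe sequence alive, which is the whole point of lazy deletion.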

19 Linear Probing (cont'd) The resulting table (O = occupied, E = empty, D = deleted):
Index  Status  Value
0      O       26
1      E
2      E
3      E
4      E
5      O       18
6      E
7      E
8      O       47
9      D       35
10     O       9
11     E
12     O       64

20 Disadvantage of Linear Probing: Primary Clustering Linear probing is subject to a primary clustering phenomenon: elements tend to cluster around the table locations to which they originally hash, and primary clusters can combine to form larger clusters. This leads to long search sequences and hence deterioration in hash table efficiency. Example of a primary cluster: insert the keys 18, 41, 22, 44, 59, 32, 31, 73, in this order, into an originally empty hash table of size 13, using the hash function h(key) = key % 13 and c(i) = i:
h(18) = 5
h(41) = 2
h(22) = 9
h(44) = 5, then 1 extra probe → slot 6
h(59) = 7
h(32) = 6, then 2 extra probes → slot 8
h(31) = 5, then 5 extra probes → slot 10
h(73) = 8, then 3 extra probes → slot 11

21 HEAPS

22 Heaps A heap is a special kind of rooted tree that can be implemented efficiently in an array without any explicit pointers. It can be used for heap sort and the efficient representation of certain dynamic priority lists, such as the event list in a simulation or the list of tasks to be scheduled by an operating system. A heap is an essentially complete binary tree.

23 Heaps The figure illustrates an essentially complete binary tree containing 10 nodes. The five internal nodes occupy level 3 (the root), level 2, and the left side of level 1; the five leaves fill the right side of level 1 and then continue at the left of level 0. If an essentially complete binary tree has height k, then there is one node (the root) on level k, there are two nodes on level k-1, and so on; there are 2^(k-1) nodes on level 1, and at least 1 and not more than 2^k on level 0. A heap is an essentially complete binary tree, each of whose nodes includes an element of information called the value of the node, and which has the property that the value of each internal node is greater than or equal to the values of its children.

24 An essentially complete binary tree (figure): root T[1]; children T[2], T[3]; then T[4] to T[7]; leaves T[8], T[9], T[10]. Node T[i] has children T[2i] and T[2i+1].

25 A heap The figure shows an example of a heap with 10 nodes, whose array representation is [10, 7, 9, 4, 7, 5, 2, 2, 1, 6].

26 Heaps Now we have marked each node with its value. This same heap can be represented by the array [10, 7, 9, 4, 7, 5, 2, 2, 1, 6]. The crucial characteristic of this data structure is that the heap property can be restored efficiently if the value of a node is modified. If the value of a node increases to the extent that it becomes greater than the value of its parent, it suffices to exchange these two values, and then to continue the same process upwards in the tree if necessary until the heap property is restored. The modified value is percolated up to its new position in the heap; this operation is often called sifting up. If the value 1 in the figure is modified so that it becomes 8, we can restore the heap property by exchanging the 8 with its parent 4, and then exchanging it again with its new parent 7.

27 The heap, after percolating 8 to its place: [10, 8, 9, 7, 7, 5, 2, 2, 4, 6].

28 Heaps If on the contrary the value of a node is decreased so that it becomes less than the value of at least one of its children, it suffices to exchange the modified value with the larger of the values in the children, and then to continue this process downwards in the tree if necessary until the heap property is restored. The modified value has been sifted down to its new position. The heap, after sifting 3 (originally 10) down to its place: [9, 8, 5, 7, 7, 3, 2, 2, 4, 6].

29 Heaps The following procedures describe more formally the basic processes for manipulating a heap.

procedure alter-heap(T[1..n], i, v)
{T[1..n] is a heap. The value of T[i] is set to v and the heap property is re-established. Suppose that 1 ≤ i ≤ n.}
x ← T[i]
T[i] ← v
if v < x then sift-down(T, i)
else percolate(T, i)

30
procedure sift-down(T[1..n], i)
{This procedure sifts node i down so as to re-establish the heap property in T[1..n]. Suppose that T would be a heap if T[i] were sufficiently large, and that 1 ≤ i ≤ n.}
k ← i
repeat
    j ← k
    {find the larger child of node j}
    if 2j ≤ n and T[2j] > T[k] then k ← 2j
    if 2j + 1 ≤ n and T[2j+1] > T[k] then k ← 2j + 1
    exchange T[j] and T[k]
    {if j = k, then the node has arrived at its final position}
until j = k

31
procedure percolate(T[1..n], i)
{This procedure percolates node i up so as to re-establish the heap property in T[1..n]. Suppose that T would be a heap if T[i] were sufficiently small, and that 1 ≤ i ≤ n. The parameter n is not used here.}
k ← i
repeat
    j ← k
    if j > 1 and T[j ÷ 2] < T[k] then k ← j ÷ 2
    exchange T[j] and T[k]
    {if j = k, then the node has arrived at its final position}
until j = k
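The pseudocode above translates almost line for line into Java. The sketch below (class and method names are ours) uses a 1-indexed array with T[0] unused, mirroring T[1..n] on the slides; the main method replays the examples from slides 27 and 28:

```java
import java.util.Arrays;

public class HeapOps {
    // sift node i down until the max-heap property holds in T[1..n]
    public static void siftDown(int[] T, int n, int i) {
        int k = i;
        while (true) {
            int j = k;
            if (2 * j <= n && T[2 * j] > T[k]) k = 2 * j;          // left child larger?
            if (2 * j + 1 <= n && T[2 * j + 1] > T[k]) k = 2 * j + 1; // right child larger?
            int tmp = T[j]; T[j] = T[k]; T[k] = tmp;               // exchange T[j] and T[k]
            if (j == k) break;                                     // node is in final position
        }
    }

    // percolate node i up until the max-heap property holds
    public static void percolate(int[] T, int i) {
        int k = i;
        while (true) {
            int j = k;
            if (j > 1 && T[j / 2] < T[k]) k = j / 2;               // parent smaller?
            int tmp = T[j]; T[j] = T[k]; T[k] = tmp;
            if (j == k) break;
        }
    }

    // set T[i] to v and restore the heap property
    public static void alterHeap(int[] T, int n, int i, int v) {
        int x = T[i];
        T[i] = v;
        if (v < x) siftDown(T, n, i); else percolate(T, i);
    }

    public static void main(String[] args) {
        int[] T = {0, 10, 7, 9, 4, 7, 5, 2, 2, 1, 6}; // T[0] unused; the heap of slide 25
        alterHeap(T, 10, 9, 8);  // the value 1 (at index 9) becomes 8, as on slide 27
        System.out.println(Arrays.toString(T)); // [0, 10, 8, 9, 7, 7, 5, 2, 2, 4, 6]
        alterHeap(T, 10, 1, 3);  // the root 10 becomes 3, as on slide 28
        System.out.println(Arrays.toString(T)); // [0, 9, 8, 5, 7, 7, 3, 2, 2, 4, 6]
    }
}
```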

32 Heaps A heap is an ideal data structure for finding the largest element of a set, removing it, adding a new node, or modifying a node. These are exactly the operations we need to implement dynamic priority lists efficiently: the value of a node gives the priority of the corresponding event, the event with highest priority is always found at the root of the heap, and the priority of an event can be changed dynamically at any time. This is particularly useful in computer simulations and in the design of schedulers for an operating system. Some typical procedures are illustrated below.

33
function find-max(T[1..n])
{Returns the largest element of the heap T[1..n]}
return T[1]

procedure delete-max(T[1..n])
{Removes the largest element of the heap T[1..n] and restores the heap property in T[1..n-1]}
T[1] ← T[n]
sift-down(T[1..n-1], 1)

34
procedure insert-node(T[1..n], v)
{Adds an element whose value is v to the heap T[1..n] and restores the heap property in T[1..n+1]}
T[n+1] ← v
percolate(T[1..n+1], n+1)
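Together, find-max, delete-max and insert-node give a max-priority queue. The following self-contained Java sketch is our illustration (the fixed capacity of 100 is an arbitrary assumption, and names are ours):

```java
public class HeapPQ {
    static int[] T = new int[100]; // 1-indexed heap storage, T[1..n]
    static int n = 0;

    static void siftDown(int i) {
        int k = i;
        while (true) {
            int j = k;
            if (2 * j <= n && T[2 * j] > T[k]) k = 2 * j;
            if (2 * j + 1 <= n && T[2 * j + 1] > T[k]) k = 2 * j + 1;
            int tmp = T[j]; T[j] = T[k]; T[k] = tmp;
            if (j == k) return;
        }
    }

    static void percolate(int i) {
        int k = i;
        while (true) {
            int j = k;
            if (j > 1 && T[j / 2] < T[k]) k = j / 2;
            int tmp = T[j]; T[j] = T[k]; T[k] = tmp;
            if (j == k) return;
        }
    }

    public static int findMax() { return T[1]; }   // the root holds the maximum

    public static int deleteMax() {
        int max = T[1];
        T[1] = T[n--];    // move the last element to the root, shrink the heap
        siftDown(1);
        return max;
    }

    public static void insertNode(int v) {
        T[++n] = v;       // place v at the first free leaf
        percolate(n);
    }

    public static void main(String[] args) {
        for (int v : new int[]{10, 7, 9, 4, 7, 5, 2, 2, 1, 6}) insertNode(v);
        System.out.println(findMax());   // 10
        System.out.println(deleteMax()); // 10
        System.out.println(findMax());   // 9
    }
}
```

This is exactly the dynamic priority list of slide 32: the highest-priority item is always at the root, and both queue operations cost O(log n).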

35 Heaps There exists a cleverer algorithm for making a heap. Suppose, for example, that our starting point is the array [1, 6, 9, 2, 7, 5, 2, 7, 4, 10], represented by the tree in the figure (root 1; children 6, 9; then 2, 7, 5, 2; leaves 7, 4, 10).

36 Heaps We first make each of the subtrees whose roots are at level 1 into a heap; this is done by sifting down those roots, as illustrated in the figure. (The level 1 subtrees rooted at 2 and 7 become heaps rooted at 7 and 10 respectively.)

37 Heaps The figure shows the process for the left level 2 subtree; the other subtree at level 2 is already a heap. This results in an essentially complete binary tree corresponding to the array [1, 10, 9, 7, 7, 5, 2, 2, 4, 6].

38 It only remains to sift down the root to obtain the desired heap. The process goes as follows:
exchange 1 and 10 → [10, 1, 9, 7, 7, 5, 2, 2, 4, 6]
exchange 1 and 7 → [10, 7, 9, 1, 7, 5, 2, 2, 4, 6]
exchange 1 and 4 → [10, 7, 9, 4, 7, 5, 2, 2, 1, 6]
The final heap is [10, 7, 9, 4, 7, 5, 2, 2, 1, 6].
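The bottom-up construction just described (sift down every internal node, from the last one up to the root) is the classical linear-time make-heap. A Java sketch of it, with names of our choosing:

```java
import java.util.Arrays;

public class MakeHeap {
    public static void siftDown(int[] T, int n, int i) {
        int k = i;
        while (true) {
            int j = k;
            if (2 * j <= n && T[2 * j] > T[k]) k = 2 * j;
            if (2 * j + 1 <= n && T[2 * j + 1] > T[k]) k = 2 * j + 1;
            int tmp = T[j]; T[j] = T[k]; T[k] = tmp;
            if (j == k) break;
        }
    }

    // sift down the internal nodes T[n/2] .. T[1], deepest first
    public static void makeHeap(int[] T, int n) {
        for (int i = n / 2; i >= 1; i--) siftDown(T, n, i);
    }

    public static void main(String[] args) {
        int[] T = {0, 1, 6, 9, 2, 7, 5, 2, 7, 4, 10}; // T[0] unused; the array of slide 35
        makeHeap(T, 10);
        System.out.println(Arrays.toString(T)); // [0, 10, 7, 9, 4, 7, 5, 2, 2, 1, 6]
    }
}
```

Running it on the starting array [1, 6, 9, 2, 7, 5, 2, 7, 4, 10] reproduces the desired heap, matching the result of the step-by-step construction on slides 36 to 38.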

39 Heapsort Construct a heap from the array A = (16, 4, 10, 14, 7, 9, 3, 2, 8, 1). The figure shows the initial configuration and the result of maintaining the heap property (make-heap): [16, 14, 10, 8, 7, 9, 3, 2, 4, 1].

40 Heapsort Starting from the heap A = [16, 14, 10, 8, 7, 9, 3, 2, 4, 1]:
make-heap(T)
for i ← n downto 2 do
    exchange T[1] and T[i]
    sift-down(T[1..i-1], 1)

41
i = 10: exchange T[1] and T[10], sift-down(T[1..9], 1) → [14, 8, 10, 4, 7, 9, 3, 2, 1, 16]
i = 9: exchange T[1] and T[9], sift-down(T[1..8], 1) → [10, 8, 9, 4, 7, 1, 3, 2, 14, 16]
i = 8: exchange T[1] and T[8], sift-down(T[1..7], 1) → [9, 8, 3, 4, 7, 1, 2, 10, 14, 16]
i = 7: exchange T[1] and T[7], sift-down(T[1..6], 1) → [8, 7, 3, 4, 2, 1, 9, 10, 14, 16]
i = 6: exchange T[1] and T[6], sift-down(T[1..5], 1) → [7, 4, 3, 1, 2, 8, 9, 10, 14, 16]
i = 5: exchange T[1] and T[5], sift-down(T[1..4], 1) → [4, 2, 3, 1, 7, 8, 9, 10, 14, 16]

42
i = 4: exchange T[1] and T[4], sift-down(T[1..3], 1) → [3, 2, 1, 4, 7, 8, 9, 10, 14, 16]
i = 3: exchange T[1] and T[3], sift-down(T[1..2], 1) → [2, 1, 3, 4, 7, 8, 9, 10, 14, 16]
i = 2: exchange T[1] and T[2], sift-down(T[1..1], 1) → [1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
End of sorting.
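The complete heapsort, make-heap followed by the extraction loop of slide 40, fits in a few lines of Java (class name ours):

```java
import java.util.Arrays;

public class HeapSort {
    static void siftDown(int[] T, int n, int i) {
        int k = i;
        while (true) {
            int j = k;
            if (2 * j <= n && T[2 * j] > T[k]) k = 2 * j;
            if (2 * j + 1 <= n && T[2 * j + 1] > T[k]) k = 2 * j + 1;
            int tmp = T[j]; T[j] = T[k]; T[k] = tmp;
            if (j == k) break;
        }
    }

    public static void heapSort(int[] T, int n) {
        for (int i = n / 2; i >= 1; i--) siftDown(T, n, i); // make-heap
        for (int i = n; i >= 2; i--) {
            int tmp = T[1]; T[1] = T[i]; T[i] = tmp;        // move the max to the end
            siftDown(T, i - 1, 1);                          // restore heap in T[1..i-1]
        }
    }

    public static void main(String[] args) {
        int[] A = {0, 16, 4, 10, 14, 7, 9, 3, 2, 8, 1};     // A[0] unused; slide 39's array
        heapSort(A, 10);
        System.out.println(Arrays.toString(A)); // [0, 1, 2, 3, 4, 7, 8, 9, 10, 14, 16]
    }
}
```

Each of the n-1 extractions costs at most O(log n) for the sift-down, so the whole sort runs in O(n log n) in place.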

43 The sorted array: [1, 2, 3, 4, 7, 8, 9, 10, 14, 16].

44 Binomial Trees The figure shows the binomial trees B_0 to B_4. B_0 is a single node, and B_k is formed by linking two B_(k-1) trees so that the root of one becomes the leftmost child of the root of the other; B_k therefore has 2^k nodes.

45 (figure)

46 A binomial heap containing 11 items (figure). Each parent node is greater than or equal to its child nodes: a max binomial heap.
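Which binomial trees make up a heap of n items follows from the binary representation of n, since B_k holds exactly 2^k nodes. A small sketch of this correspondence (class name ours):

```java
import java.util.ArrayList;
import java.util.List;

public class BinomialShape {
    // Return the orders k of the binomial trees B_k in a heap of n items:
    // exactly the set bits of n, because B_k has 2^k nodes.
    public static List<Integer> trees(int n) {
        List<Integer> ks = new ArrayList<>();
        for (int k = 0; (1 << k) <= n; k++)
            if ((n & (1 << k)) != 0) ks.add(k);
        return ks;
    }

    public static void main(String[] args) {
        System.out.println(trees(11)); // 11 = 1011 in binary -> [0, 1, 3]
    }
}
```

For the 11-item heap in the figure this gives B_0, B_1 and B_3 (1 + 2 + 8 = 11 nodes), and the heap always contains at most ⌊log2 n⌋ + 1 trees.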

47 Linking two B_2's to make a B_3 (figure): the root of one B_2 becomes a child of the root of the other.

48 Merging two binomial heaps (figure).

49 BINOMIAL-HEAP-MERGE (figure): the root lists of head[H1] and head[H2] are combined into a single root list. Note: check whether the heap is a max- or min-heap before starting.

50 (figure: the merged root list head[H])

51 (figure: the heap head[H] after linking trees of equal degree)

52 Deleting a node (figure): (a) the node with value 1 is to be deleted; (b) the heap is separated into two heaps.

53 (figure): the node with value 1 has been deleted, leaving two heaps H and H'; the two heaps H and H' are then merged.

54 (figure): the value of node y is decreased from 26 to 7, and the heap property is then restored.

55 (figure: the heap head[H] after restoring the heap property)

56 (figure: a further binomial heap example, head[H])

