Hash Tables Briana B. Morrison Adapted from William Collins.

4 Sequential Search
Given a vector of integers: v = {12, 15, 18, 3, 76, 9, 14, 33, 51, 44}
- What is the best case for sequential search? O(1), when the value is the first element.
- What is the worst case? O(n), when the value is the last element or is not in the list.
- What is the average case? About n/2 comparisons, which is O(n).
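A minimal C++ sketch of sequential search on the slide's vector. The function name sequential_search and the comparison counter are illustrative additions, not part of the original slides. Counting comparisons makes the best, worst, and average cases concrete.

```cpp
#include <cstddef>
#include <vector>

// Returns the index of target in v, or v.size() if it is absent.
// 'comparisons' reports how many elements were examined.
std::size_t sequential_search(const std::vector<int>& v, int target,
                              std::size_t& comparisons) {
    comparisons = 0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        ++comparisons;
        if (v[i] == target) {
            return i;  // found after i + 1 comparisons
        }
    }
    return v.size();  // not found: all n elements were examined
}
```

For this v, searching for 12 takes 1 comparison. Searching for 44 (the last element) or for a missing value such as 100 takes all 10.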

7 Binary Search
Given a sorted vector of integers: v = {3, 9, 12, 14, 15, 18, 33, 44, 51, 76}
- What is the best case for binary search? O(1), when the value is the middle element.
- What is the worst case? O(log n), e.g., when the value is first, last, or not in the list.
- What is the average case? O(log n).
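A minimal C++ sketch of binary search. It assumes a sorted std::vector<int>, and the function name binary_search_index and the probe counter are hypothetical. The search halves the half-open range [lo, hi) on every probe.

```cpp
#include <cstddef>
#include <vector>

// Binary search on a sorted vector. Returns the index of target, or -1 if absent.
// 'probes' counts how many middle elements were checked.
long binary_search_index(const std::vector<int>& v, int target, int& probes) {
    probes = 0;
    std::size_t lo = 0, hi = v.size();  // search the half-open range [lo, hi)
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        ++probes;
        if (v[mid] == target) return static_cast<long>(mid);
        if (v[mid] < target) lo = mid + 1;  // discard the left half
        else hi = mid;                      // discard the right half
    }
    return -1;
}
```

On the slide's vector, 18 (the middle element under this indexing) is found on the first probe. 3 (the first element) takes 4 probes, and no search takes more than floor(log2 10) + 1 = 4.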

21 Map vs. Hashmap
What are the differences between a map and a hashmap?
- Interface
- Efficiency
- Applications
- Implementation
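In standard C++ the two ideas correspond to std::map and std::unordered_map. std::map is an ordered container, typically a balanced binary search tree, with O(log n) operations. std::unordered_map is a hash table with average O(1) operations. The sketch below uses hypothetical helper names. It shows that the interface is nearly the same, but only std::map guarantees sorted iteration.

```cpp
#include <map>
#include <string>
#include <unordered_map>
#include <vector>

// std::map keeps keys in sorted order, so iteration visits them sorted.
std::vector<std::string> keys_in_map_order(const std::vector<std::string>& words) {
    std::map<std::string, int> counts;
    for (const auto& w : words) ++counts[w];
    std::vector<std::string> out;
    for (const auto& entry : counts) out.push_back(entry.first);  // sorted keys
    return out;
}

// std::unordered_map offers the same operator[] and find interface,
// but its iteration order is unspecified.
int count_with_hashmap(const std::vector<std::string>& words, const std::string& w) {
    std::unordered_map<std::string, int> counts;
    for (const auto& x : words) ++counts[x];
    auto it = counts.find(w);
    return it == counts.end() ? 0 : it->second;
}
```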

24 CONTIGUOUS: array? vector? deque? heap?
LINKED: list? map?
BUT NONE OF THESE WILL GIVE CONSTANT AVERAGE TIME FOR SEARCHES, INSERTIONS AND REMOVALS.

32 To make these values fit into the table, we need to mod by the table size, i.e., compute key % (table size). OOPS!

36 Hash Codes
- Suppose we have a table of size N.
- A hash code is a number in the range 0 to N-1.
- We compute the hash code from the key.
- You can think of it as a default position when inserting, or a position hint when looking up.
- A hash function is a way of computing a hash code.
- Desire: the set of keys should spread evenly over the N values.
- When two keys have the same hash code: collision.
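A small C++ sketch of hash codes for integer keys, assuming the simplest hash function, key % N. The names hash_code and find_collisions are illustrative. The second function lists every pair of keys that share a hash code.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hash code for a non-negative integer key in a table of size N: a value in 0 .. N-1.
std::size_t hash_code(unsigned long key, std::size_t N) {
    return key % N;
}

// Returns every pair of keys that collide, i.e. have the same hash code.
std::vector<std::pair<unsigned long, unsigned long>>
find_collisions(const std::vector<unsigned long>& keys, std::size_t N) {
    std::vector<std::pair<unsigned long, unsigned long>> result;
    for (std::size_t i = 0; i < keys.size(); ++i)
        for (std::size_t j = i + 1; j < keys.size(); ++j)
            if (hash_code(keys[i], N) == hash_code(keys[j], N))
                result.push_back({keys[i], keys[j]});
    return result;
}
```

With N = 13, the keys 18, 41, 22 and 44 get hash codes 5, 2, 9 and 5, so 18 and 44 collide.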

37 Hash Functions
- A hash function should be quick and easy to compute.
- A hash function should distribute the keys that actually occur evenly across the range of indices, for both random and non-random data.
- The calculation should involve the entire search key.

38 Examples of Hash Functions
Usually involves taking the key, chopping it up, and mixing the pieces together in various ways. Examples:
- Truncation: ignore part of the key and use the remaining part as the index.
- Folding: partition the key into several parts and combine the parts in a convenient way (adding, etc.).
After calculating the index, use modular arithmetic: divide by the size of the index range and take the remainder as the result.
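A C++ sketch of truncation and folding for decimal integer keys. Splitting into 3-digit groups is an illustrative choice, not something the slide prescribes. Both functions finish with the modular step described above.

```cpp
#include <cstddef>

// Truncation: ignore all but the last 3 decimal digits (illustrative choice).
std::size_t truncation_hash(unsigned long key, std::size_t table_size) {
    unsigned long part = key % 1000;   // the remaining part of the key
    return part % table_size;          // fit it into the table
}

// Folding: partition the key into 3-digit groups and add the groups together.
std::size_t folding_hash(unsigned long key, std::size_t table_size) {
    unsigned long sum = 0;
    while (key > 0) {
        sum += key % 1000;  // take the next 3-digit group
        key /= 1000;
    }
    return sum % table_size;
}
```

For the key 123456789 and a table of size 1000, truncation keeps 789. Folding computes 123 + 456 + 789 = 1368, then 1368 % 1000 = 368. Truncation ignores most of the key, which conflicts with the guideline that the calculation should involve the entire search key.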

39 Example Hash Function

40 Devising Hash Functions
- Simple functions often produce many collisions... but complex functions may not be good either!
- It is often an empirical process.
- Adding letter values in a string gives the same hash for strings with the same letters in a different order.
- Better approach:
    size_t hash = 0;
    for (size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + s[i];
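A runnable C++ version of the two string hashes compared on this slide; the function names are illustrative. Casting each char to unsigned char keeps character values non-negative on platforms where char is signed.

```cpp
#include <cstddef>
#include <string>

// Adding letter values: anagrams such as "stop" and "pots" collide.
std::size_t additive_hash(const std::string& s) {
    std::size_t hash = 0;
    for (char c : s) hash += static_cast<unsigned char>(c);
    return hash;
}

// The slide's better approach: multiply by 31 before adding each character,
// so letter order matters. Unsigned overflow simply wraps around.
std::size_t polynomial_hash(const std::string& s) {
    std::size_t hash = 0;
    for (std::size_t i = 0; i < s.size(); ++i)
        hash = hash * 31 + static_cast<unsigned char>(s[i]);
    return hash;
}
```

additive_hash gives the same value for the anagrams "stop" and "pots", while polynomial_hash distinguishes them.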

41 Devising Hash Functions (2)
The String hash is good in that:
- Every letter affects the value.
- The order of the letters affects the value.
- The values tend to be spread well over the integers.

42 Devising Hash Functions (3)
Guidelines for good hash functions:
- Spread values evenly: as if random.
- Cheap to compute.
- Generally, the number of possible hash values is much greater than the table size.

43 Hash Code Maps
- Memory address: we reinterpret the memory address of the key object as an integer. Good in general, except for numeric and string keys.
- Integer cast: we reinterpret the bits of the key as an integer. Suitable for keys of length less than or equal to the number of bits of the integer type (e.g., char, short, int and float on many machines).
- Component sum: we partition the bits of the key into components of fixed length (e.g., 16 or 32 bits) and we sum the components (ignoring overflows). Suitable for numeric keys of fixed length greater than or equal to the number of bits of the integer type (e.g., long and double on many machines).
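A C++ sketch of two of these maps. It assumes a 32-bit IEEE 754 float and fixed-width integer types, and the function names are hypothetical. std::memcpy is the portable way to reinterpret an object's bits. Unsigned addition wraps, which is how the component sum ignores overflow.

```cpp
#include <cstdint>
#include <cstring>

// Integer cast: reinterpret the 32 bits of a float as a 32-bit unsigned integer.
std::uint32_t integer_cast_hash(float key) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &key, sizeof bits);
    return bits;
}

// Component sum: split a 64-bit key into two 32-bit components and add them;
// the unsigned addition wraps modulo 2^32, i.e. overflow is ignored.
std::uint32_t component_sum_hash(std::uint64_t key) {
    std::uint32_t high = static_cast<std::uint32_t>(key >> 32);
    std::uint32_t low = static_cast<std::uint32_t>(key);
    return high + low;
}
```

On an IEEE 754 machine, integer_cast_hash(1.0f) yields 0x3F800000, the bit pattern of 1.0.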

44 Hash Code Maps (cont.)
- Polynomial accumulation: we partition the bits of the key into a sequence of components of fixed length (e.g., 8, 16 or 32 bits), $a_0, a_1, \ldots, a_{n-1}$. We evaluate the polynomial
  $p(z) = a_0 + a_1 z + a_2 z^2 + \cdots + a_{n-1} z^{n-1}$
  at a fixed value $z$, ignoring overflows.
- Especially suitable for strings (e.g., the choice $z = 33$ gives at most 6 collisions on a set of 50,000 English words).
- Polynomial $p(z)$ can be evaluated in $O(n)$ time using Horner's rule. The following polynomials are successively computed, each from the previous one in $O(1)$ time:
  $p_0(z) = a_{n-1}$
  $p_i(z) = a_{n-i-1} + z\,p_{i-1}(z) \quad (i = 1, 2, \ldots, n-1)$
  We have $p(z) = p_{n-1}(z)$.
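A C++ sketch of polynomial accumulation evaluated with Horner's rule. It takes each character as one 8-bit component and uses z = 33 by default; the function name and the 32-bit result type are assumptions. The loop walks from a_{n-1} down to a_0, following the recurrence above.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// p(z) = a_0 + a_1 z + ... + a_{n-1} z^{n-1}, with a_i = s[i], via Horner's rule:
// p_0 = a_{n-1}, p_i = a_{n-i-1} + z * p_{i-1}. Overflow wraps (is ignored).
std::uint32_t polynomial_hash(const std::string& s, std::uint32_t z = 33) {
    std::uint32_t p = 0;
    for (std::size_t k = s.size(); k-- > 0;)          // k = n-1 down to 0
        p = static_cast<unsigned char>(s[k]) + z * p;
    return p;
}
```

The loop on slide 40 is the same Horner evaluation with z = 31. The only difference is that it treats the first character as the highest-degree coefficient.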

60 Collision Handlers
NOW WE'LL LOOK AT SPECIFIC COLLISION HANDLERS:
- Chaining
- Linear Probing (Open Addressing)
- Double Hashing
- Quadratic Hashing

61 Collision Handling
- Collisions occur when different elements are mapped to the same cell.
- Chaining: let each cell in the table point to a linked list of elements that map there.
- Chaining is simple, but requires additional memory outside the table.
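A minimal C++ sketch of chaining. It assumes std::hash for the keys and a fixed number of buckets (no expansion), and the class name ChainedHashSet is hypothetical. Each bucket is a std::list holding the keys that map there, so colliding keys simply share a list.

```cpp
#include <cstddef>
#include <functional>
#include <list>
#include <vector>

template <typename Key, typename Hash = std::hash<Key>>
class ChainedHashSet {
public:
    explicit ChainedHashSet(std::size_t buckets = 11) : table_(buckets) {}

    bool insert(const Key& key) {
        auto& chain = table_[index(key)];
        for (const Key& k : chain)
            if (k == key) return false;   // already present
        chain.push_back(key);             // colliding keys share this list
        return true;
    }

    bool contains(const Key& key) const {
        for (const Key& k : table_[index(key)])  // scan only one chain
            if (k == key) return true;
        return false;
    }

    bool erase(const Key& key) {
        auto& chain = table_[index(key)];
        for (auto it = chain.begin(); it != chain.end(); ++it)
            if (*it == key) { chain.erase(it); return true; }
        return false;
    }

private:
    std::size_t index(const Key& key) const { return Hash{}(key) % table_.size(); }
    std::vector<std::list<Key>> table_;  // one list per bucket
};
```

Only the chain for one bucket is ever scanned, so with a low load factor each operation examines just a few keys on average.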

64 Chaining with Separate Lists Example

65 Chaining Picture
- Two items hashed to bucket 3.
- Three items hashed to bucket 4.

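68 FOR THE find METHOD, WITH n ITEMS IN m BUCKETS:
$\text{averageTime}_S(n, m) \approx \frac{n}{2m}$ ITERATIONS $\le \frac{0.75}{2}$, BECAUSE THE LOAD FACTOR n/m IS KEPT AT OR BELOW 0.75.
SO $\text{averageTime}_S(n, m) \le$ A CONSTANT, I.E., $\text{averageTime}_S(n, m)$ IS CONSTANT.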

75 Hash Table Using Open Probe Addressing Example
Insert 45 (mod by table size: % 11)

76 Hash Table Using Open Probe Addressing Example
Insert 35

77 Hash Table Using Open Probe Addressing Example
Insert 76

78 Hash Table Using Open Probe Addressing Example

79 Linear Probing
- Open addressing: the colliding item is placed in a different cell of the table.
- Linear probing handles collisions by placing the colliding item in the next (circularly) available table cell.
- Each table cell inspected is referred to as a probe.
- Colliding items lump together, so future collisions require longer sequences of probes.
- Example: h(x) = x mod 13; insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.
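A C++ sketch of linear-probing insertion for this example. It assumes non-negative int keys and a bucket type with an explicit occupied flag; Slot is a hypothetical name. The probe loop stops after table.size() probes, so it cannot cycle forever.

```cpp
#include <cstddef>
#include <vector>

// Bucket with an explicit flag, so an empty cell is never confused with a key.
struct Slot {
    int key = 0;
    bool occupied = false;
};

// Linear probing insert with h(x) = x mod table size (non-negative keys assumed).
// Returns the index where key was placed, or -1 if every cell is occupied.
long linear_probe_insert(std::vector<Slot>& table, int key) {
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;  // home cell h(x)
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (!table[index].occupied) {        // first available cell
            table[index].key = key;
            table[index].occupied = true;
            return static_cast<long>(index);
        }
        index = (index + 1) % n;             // next cell, circularly
    }
    return -1;
}
```

Inserting 18, 41, 22, 44, 59, 32, 31, 73 into a table of 13 slots places them at indices 5, 2, 9, 6, 7, 8, 10 and 11. 44, 32, 31 and 73 all run into the cluster that grows from index 5.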

80 WE NEED TO KNOW WHEN A SLOT IS EMPTY OR OCCUPIED. HOW?
INSTEAD OF STORING JUST A T IN EACH BUCKET AND USING T() TO MEAN "EMPTY" (WHICH FAILS BECAUSE T() COULD BE A VALID VALUE), EACH BUCKET WILL STORE AN INSTANCE OF THE VALUE_TYPE CLASS.

83 Retrieve
What about when we want to retrieve? Consider the previous example...

84 Hash Table Using Open Probe Addressing Example (% 11)
- Find the value 35.
- Now find the value 76.
- Now find the value 33.

85 Hash Table Using Open Probe Addressing Example (% 11)
- Now delete 35.
- Now find the value 76.
- Now find the value 33.

86 Linear Probing
Probe by incrementing the index. If you fall off the end, wrap around to the beginning. Take care not to cycle forever!
1. Compute index as hash_fcn() % table.size().
2. If table[index] == NULL, the item is not in the table.
3. If table[index] matches the item, the item is found (done).
4. Increment index circularly and go to step 2.
Why must we probe repeatedly?
- hashCode may produce collisions.
- The remainder by table.size may produce collisions.
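The four steps above as a C++ sketch, assuming non-negative int keys. An unoccupied Slot (a hypothetical bucket type) plays the role of table[index] == NULL. Capping the loop at table.size() probes is one of the termination rules listed on the next slide.

```cpp
#include <cstddef>
#include <vector>

struct Slot {
    int key = 0;
    bool occupied = false;   // false plays the role of "table[index] == NULL"
};

// Returns the index holding key, or -1 if key is not in the table.
long linear_probe_find(const std::vector<Slot>& table, int key) {
    const std::size_t n = table.size();
    std::size_t index = static_cast<std::size_t>(key) % n;            // step 1
    for (std::size_t probes = 0; probes < n; ++probes) {
        if (!table[index].occupied) return -1;                        // step 2
        if (table[index].key == key) return static_cast<long>(index); // step 3
        index = (index + 1) % n;                                       // step 4
    }
    return -1;   // probed every slot: the table is full and key is absent
}
```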

87 Search Termination
Ways to obtain proper termination:
- Stop when you come back to your starting point.
- Stop after probing N slots, where N is the table size.
- Stop when you reach the bottom the second time.
- Ensure the table is never full: reallocate when occupancy exceeds a threshold.

89 Erase value false

90 Now search for 460.

92 SOLUTION: bool marked_for_removal;
THE CONSTRUCTOR FOR VALUE_TYPE SETS EACH BUCKET'S marked_for_removal FIELD TO false. insert SETS marked_for_removal TO false; erase SETS marked_for_removal TO true. SO AFTER THE INSERTIONS:
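A C++ sketch of the marked_for_removal idea on a linear-probing table of non-negative int keys. Bucket and ProbingTable are hypothetical names, and the table is assumed never to fill completely with live items. find keeps probing past removed buckets and stops only at a bucket that has never been used. insert may reuse a removed bucket.

```cpp
#include <cstddef>
#include <vector>

// Bucket in the spirit of VALUE_TYPE: 'occupied' means the bucket has held an
// item at some point; 'marked_for_removal' is set by erase.
struct Bucket {
    int key = 0;
    bool occupied = false;
    bool marked_for_removal = false;
};

class ProbingTable {
public:
    explicit ProbingTable(std::size_t length) : buckets_(length) {}

    // Assumes at least one bucket is unused or removed (e.g. the table expands first).
    void insert(int key) {
        if (find(key) >= 0) return;                    // no duplicates
        std::size_t i = home(key);
        while (buckets_[i].occupied && !buckets_[i].marked_for_removal)
            i = (i + 1) % buckets_.size();             // skip live items
        Bucket& b = buckets_[i];                       // unused or removed bucket
        b.key = key;
        b.occupied = true;
        b.marked_for_removal = false;                  // insert clears the mark
    }

    // Probe past removed buckets; stop at a never-used bucket or after length probes.
    long find(int key) const {
        std::size_t i = home(key);
        for (std::size_t probes = 0; probes < buckets_.size(); ++probes) {
            const Bucket& b = buckets_[i];
            if (!b.occupied) return -1;
            if (!b.marked_for_removal && b.key == key) return static_cast<long>(i);
            i = (i + 1) % buckets_.size();
        }
        return -1;
    }

    bool erase(int key) {
        long i = find(key);
        if (i < 0) return false;
        buckets_[static_cast<std::size_t>(i)].marked_for_removal = true;  // stays occupied
        return true;
    }

private:
    std::size_t home(int key) const {                 // assumes non-negative keys
        return static_cast<std::size_t>(key) % buckets_.size();
    }
    std::vector<Bucket> buckets_;
};
```

With length 11, inserting 35, 46 and 57 (all with home index 2) fills indices 2, 3 and 4. After erase(46), find(57) still reaches index 4, because the bucket at index 3 is only marked for removal rather than emptied. A later insert(68) reuses index 3.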

97 CLUSTER: A SEQUENCE OF NON-EMPTY LOCATIONS.
KEYS THAT HASH TO 54 FOLLOW THE SAME COLLISION-PATH AS KEYS THAT HASH TO 55, …

100 SOLUTION 1: DOUBLE HASHING, THAT IS, OBTAIN BOTH INDICES AND OFFSETS BY HASHING:
    unsigned long hash_int = hash (key);
    int index = hash_int % length,
        offset = hash_int / length;   // secondary hash function
NOW THE OFFSET DEPENDS ON THE KEY, SO DIFFERENT KEYS WILL USUALLY HAVE DIFFERENT OFFSETS, SO NO MORE PRIMARY CLUSTERING!

101 TO GET A NEW INDEX:
    index = (index + offset) % length;
Notice that if a collision occurs, you rehash from the NEW index value.

102 EXAMPLE: length = 11 (columns: key, index, offset)
WHERE WOULD THESE KEYS GO IN buckets?

104 PROBLEM: WHAT IF OFFSET IS A MULTIPLE OF length?
EXAMPLE: length = 11 (columns: key, index, offset)
// BUT 15 IS AT INDEX 4
// FOR KEY 246, NEW INDEX = (4 + 22) % 11 = 4. OOPS!

105 SOLUTION:
    if (offset % length == 0) offset = 1;
ON AVERAGE, offset % length WILL EQUAL 0 ONLY ONCE IN EVERY length TIMES.

106 FINAL PROBLEM: WHAT IF length HAS SEVERAL FACTORS?
EXAMPLE: length = 20 (columns: key, index, offset)
// BUT 30 IS AT INDEX 10
FOR KEY 110, NEW INDEX = (10 + 5) % 20 = 15, WHICH IS OCCUPIED, SO NEW INDEX = (15 + 5) % 20, WHICH IS OCCUPIED, SO NEW INDEX = ...

107 SOLUTION: MAKE length A PRIME.
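A C++ sketch of the probe sequence these slides build up. It assumes hash(key) is simply the key itself, and the function name double_hash_probes is hypothetical. It returns the first count indices visited.

```cpp
#include <cstddef>
#include <vector>

// index = hash_int % length, offset = hash_int / length (secondary hash),
// offset = 1 if it is a multiple of length, then index = (index + offset) % length.
// Assumption: hash_int is the key itself.
std::vector<std::size_t> double_hash_probes(unsigned long key, std::size_t length,
                                            std::size_t count) {
    unsigned long hash_int = key;
    std::size_t index = hash_int % length;
    std::size_t offset = hash_int / length;
    if (offset % length == 0) offset = 1;   // avoid revisiting the same index forever
    std::vector<std::size_t> probes;
    for (std::size_t i = 0; i < count; ++i) {
        probes.push_back(index);
        index = (index + offset) % length;  // rehash from the new index
    }
    return probes;
}
```

For key 246 with length 11, the offset 22 is replaced by 1, giving the probes 4, 5, 6 and so on. For key 110 with length 20, the offset 5 shares a factor with 20, so the probes cycle through 10, 15, 0, 5 and back to 10. With the prime length 23, the same key (index 18, offset 4) visits all 23 slots.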

108 Example of Double Hashing
Consider a hash table storing integer keys that handles collisions with double hashing:
- $N = 13$
- $h(k) = k \bmod 13$
- $d(k) = 7 - (k \bmod 7)$
Insert keys 18, 41, 22, 44, 59, 32, 31, 73, in this order.
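A C++ sketch that reproduces this example. It assumes non-negative keys and uses -1 as the empty marker, which is acceptable here only because no key is negative. The function name double_hash_insert is hypothetical. The i-th probe is (h(k) + i·d(k)) mod 13, and because d(k) lies between 1 and 7 it is never 0. This d(k) is a different secondary hash from the hash_int / length offset used on the earlier slides.

```cpp
#include <vector>

// Double hashing with h(k) = k mod N and d(k) = 7 - (k mod 7).
// Empty cells hold -1 (keys are assumed non-negative).
long double_hash_insert(std::vector<int>& table, int key) {
    const int N = static_cast<int>(table.size());
    const int h = key % N;          // primary hash: home index
    const int d = 7 - key % 7;      // secondary hash: step size, never 0
    for (int i = 0; i < N; ++i) {
        int index = (h + i * d) % N;
        if (table[index] == -1) {
            table[index] = key;
            return index;
        }
    }
    return -1;                       // no empty cell found on the probe path
}
```

The keys end up at indices 5, 2, 9, 10, 7, 6, 0 and 8. 44 collides with 18 and jumps by d(44) = 5 to index 10. 31 collides twice (at indices 5 and 9) before landing at index 0.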

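110 ANOTHER SOLUTION: QUADRATIC HASHING, THAT IS, ONCE A COLLISION OCCURS AT h, GO TO LOCATION h + 1; THEN, IF A COLLISION OCCURS THERE, GO TO LOCATION h + 4, THEN h + 9, THEN h + 16, ETC. (ALL MOD length).
    unsigned long hash_int = hash (key);
    int index = hash_int % length,
        offset = i * i;   // i-th probe: (index + offset) % length
Notice that h stays at the same location: each offset is added to the original h, not to the previous probe. No primary clustering, although keys with the same home index still follow the same probe sequence.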
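A C++ sketch of the quadratic probe sequence. It assumes hash(key) is the key itself, and the function name quadratic_probes is hypothetical. Every probe is measured from the home index h, as the slide notes.

```cpp
#include <cstddef>
#include <vector>

// The i-th probe is (h + i*i) % length, always measured from the home index h.
std::vector<std::size_t> quadratic_probes(unsigned long key, std::size_t length,
                                          std::size_t count) {
    const std::size_t h = key % length;           // index = hash_int % length
    std::vector<std::size_t> probes;
    for (std::size_t i = 0; i < count; ++i)
        probes.push_back((h + i * i) % length);   // offset = i * i
    return probes;
}
```

For length 11, key 35 (home index 2) probes 2, 3, 6, 0, 7. Key 46 has the same home index and therefore exactly the same sequence. When length is prime and the table is at most half full, the first (length + 1) / 2 probes are distinct, so an insertion always finds an empty slot.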

111 QUADRATIC REHASHING EXAMPLE: length = 11 (columns: key, index, offset, final place index)

112 Performance
HOW DOES DOUBLE-HASHING COMPARE WITH CHAINED HASHING?

113 Performance of Hash Tables
- Load factor = # filled cells / table size, between 0 and 1 for open addressing.
- Load factor has the greatest effect on performance:
  - Lower load factor, better performance.
  - Fewer collisions in sparsely populated tables.
- Knuth gives the expected number of probes p for a successful search with open addressing (linear probing) at load factor L:
  $p = \frac{1}{2}\left(1 + \frac{1}{1-L}\right)$
  As L approaches 1, p grows without bound.
- For chaining, $p = 1 + \frac{L}{2}$. Note: here L can be greater than 1!

114 Performance of Hash Tables (2)

115 Performance of Hash Tables (3)
- Hash table: insert average O(1); search average O(1).
- Sorted array: insert average O(n); search average O(log n).
- Binary search tree: insert average O(log n); search average O(log n). But balanced trees can guarantee O(log n).

116 We know that hashing becomes inefficient as the table fills up. What to do? EXPAND!
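A C++ sketch of expansion for a linear-probing table of non-negative int keys. It assumes a maximum load factor of 0.75 and growth to 2 * length + 1 slots. ExpandingTable, Slot and that growth rule are illustrative choices, not the slides' own, and 2 * length + 1 is not always prime. Every key must be re-inserted, because its index depends on the table length.

```cpp
#include <cstddef>
#include <vector>

struct Slot {
    int key = 0;
    bool occupied = false;
};

class ExpandingTable {
public:
    explicit ExpandingTable(std::size_t length = 11, double max_load = 0.75)
        : slots_(length), max_load_(max_load) {}

    void insert(int key) {
        if (contains(key)) return;
        // Expand before the load factor would pass max_load.
        if (static_cast<double>(count_ + 1) / slots_.size() > max_load_)
            rehash(2 * slots_.size() + 1);
        place(slots_, key);
        ++count_;
    }

    // Terminates as long as max_load < 1, since an empty slot then always exists.
    bool contains(int key) const {
        std::size_t i = static_cast<std::size_t>(key) % slots_.size();
        while (slots_[i].occupied) {
            if (slots_[i].key == key) return true;
            i = (i + 1) % slots_.size();
        }
        return false;
    }

    std::size_t length() const { return slots_.size(); }
    std::size_t size() const { return count_; }

private:
    // Linear-probing placement into a given table (assumes a free slot exists).
    static void place(std::vector<Slot>& table, int key) {
        std::size_t i = static_cast<std::size_t>(key) % table.size();
        while (table[i].occupied) i = (i + 1) % table.size();
        table[i].key = key;
        table[i].occupied = true;
    }

    // Re-insert every key, because each index depends on the table length.
    void rehash(std::size_t new_length) {
        std::vector<Slot> bigger(new_length);
        for (const Slot& s : slots_)
            if (s.occupied) place(bigger, s.key);
        slots_.swap(bigger);
    }

    std::vector<Slot> slots_;
    std::size_t count_ = 0;
    double max_load_;
};
```

Starting from 11 slots, the ninth insertion would raise the load factor to 9/11 ≈ 0.82. The table therefore first grows to 23 slots and re-inserts all keys.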

121 Summary Slide 1
§- Hash Table
- simulates the fastest searching technique (knowing the index of the required value in a vector or array and using that index to access the value directly) by applying a hash function that converts the data to an integer.
- After obtaining an index by dividing the value from the hash function by the table size and taking the remainder, access the table. Normally, the number of table slots is much smaller than the number of possible distinct data values, so collisions occur.
- To handle collisions, we must place a value that collides with an existing table element into the table in such a way that we can efficiently access it later.

122 Summary Slide 2
§- Hash Table (Cont…)
- The average running time for a search of a hash table is O(1).
- The worst case is O(n).

123 Summary Slide 3
§- Collision Resolution
- Types: 1) linear open probe addressing
- The table is a vector or array of static size.
- After using the hash function to compute a table index, look up the entry in the table.
- If the values match, perform an update if necessary.
- If the table entry is empty, insert the value in the table.

124 Summary Slide 4
§- Collision Resolution (Cont…)
- Types: 1) linear open probe addressing
- Otherwise, probe forward circularly, looking for a match or an empty table slot.
- If the probe returns to the original starting point, the table is full.
- A search may examine table items that hashed to different table locations.
- Deleting an item is difficult.

125 Summary Slide 5
§- Collision Resolution (Cont…)
- Types: 2) chaining with separate lists
- The hash table is a vector of list objects.
- Each list is a sequence of colliding items.
- After applying the hash function to compute the table index, search the list for the data value.
- If it is found, update its value; otherwise, insert the value at the back of the list.
- You search only items that collided at the same table location.

126 Summary Slide 6
§- Collision Resolution (Cont…)
- There is no limitation on the number of values in the table, and deleting an item from the table involves only erasing it from its corresponding list.

