Hash tables Hash table: a list of some fixed size, that positions elements according to an algorithm called a hash function … hash function h(element)

CSC 231: Introduction to Data Structures Search, Hash, Sort CSC 231 Data Structures Jack Tompkins

Hash tables
Hash table: a list of some fixed size that positions elements according to an algorithm called a hash function.
[Figure: elements (e.g., strings) are passed to the hash function h(element), which yields an index from 0 to length – 1 in the hash table.]

Hashing, hash functions
The idea: somehow we map every element to some index in the list ("hash" it). That index is the one and only place the element should go.
Lookup becomes constant-time: simply look at that one slot again later to see whether the element is there.
As long as no two elements map to the same slot, add, remove, and contains all become O(1)!

If the key is an integer …
For now, let's look at integers. A "hash function" h for an int is trivial: store int i at index i (a direct mapping).
If i >= len(alist), store i at index i % len(alist) instead.
In general: h(i) = i % len(alist)

Hash function example Elements = Integers h(i) = i % 10
Add 41, 34, 7, and 18. Lookup is constant-time: just look at i % 10 again later.
We lose all ordering information. Here's what you can't do quickly: getMin, getMax, removeMin, removeMax, printing items in sorted order.
Table (slot: value): 1: 41, 4: 34, 7: 7, 8: 18; all other slots empty
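The example above can be sketched in a few lines of Python. This is a minimal illustration, not the course's implementation: the names `h` and `build_table` are made up here, and the code assumes that no two values hash to the same slot.

```python
def h(i, size):
    """Hash an integer to a slot index by taking the remainder."""
    return i % size


def build_table(values, size=10):
    """Store each value at slot h(value). Assumes there are no collisions."""
    table = [None] * size
    for v in values:
        table[h(v, size)] = v
    return table


table = build_table([41, 34, 7, 18])
print(table)

# Lookup: check only the one slot the value hashes to.
print(table[h(34, 10)] == 34)  # 34 is present
print(table[h(28, 10)] == 28)  # 28 is absent (slot 8 holds 18)
```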

Hash collisions
Collision: the event that two hash table elements map into the same slot in the list.
Example: add 41, 34, 7, 18, then 21. 21 hashes into the same slot as 41! 21 should not replace 41 in the hash table; they should both be there.
Collision resolution: a means of fixing collisions in a hash table.
Table if 21 simply overwrote 41 (slot: value): 1: 21, 4: 34, 7: 7, 8: 18; all other slots empty

Separate Chaining
Chaining: all keys that map to the same hash value are kept in a linked list.
Table with 10 slots (slot: chain): 0: 10; 2: 22, 12, 42; 7: 107; all other slots empty
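A minimal sketch of separate chaining follows. The class name `ChainedHashTable` and its methods are hypothetical, integer keys hashed with `key % size` are assumed, and ordinary Python lists stand in for the linked lists to keep the example short.

```python
class ChainedHashTable:
    """Hash table for integers that resolves collisions by chaining."""

    def __init__(self, size=10):
        # One chain per slot; a Python list stands in for a linked list.
        self.slots = [[] for _ in range(size)]

    def _index(self, key):
        return key % len(self.slots)

    def add(self, key):
        chain = self.slots[self._index(key)]
        if key not in chain:  # avoid storing duplicates
            chain.append(key)

    def contains(self, key):
        # Only the chain at the key's slot has to be searched.
        return key in self.slots[self._index(key)]

    def remove(self, key):
        chain = self.slots[self._index(key)]
        if key in chain:
            chain.remove(key)


table = ChainedHashTable()
for key in [10, 22, 12, 42, 107]:
    table.add(key)
print(table.slots[2])  # the chain shared by 22, 12, and 42
```

Colliding keys simply share a chain, so an add never fails, but a lookup costs as much as the length of the chain it lands in.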

Linear probing
Linear probing: resolving collisions in slot i by putting the colliding element into the next available slot (i+1, i+2, ...), wrapping around to slot 0 at the end of the list.
Add 41, 34, 7, 18, then 21, then 57. 21 collides (41 is already there), so we search ahead until we find empty slot 2. 57 collides (7 is already there), so we search ahead twice until we find empty slot 9.
The lookup algorithm becomes slightly modified: we now have to loop until we find the element or an empty slot.
What happens when the table gets mostly full?
Table (slot: value): 1: 41, 2: 21, 4: 34, 7: 7, 8: 18, 9: 57; all other slots empty
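The add and lookup loops for linear probing might look like the sketch below. The function names are illustrative, keys are assumed to be non-negative integers hashed with `key % len(table)`, and `None` marks an empty slot.

```python
def linear_probe_add(table, key):
    """Insert key using linear probing and return the slot it landed in."""
    size = len(table)
    home = key % size
    for step in range(size):
        slot = (home + step) % size  # wrap around at the end of the list
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


def linear_probe_contains(table, key):
    """Probe from the home slot until the key or an empty slot is found."""
    size = len(table)
    home = key % size
    for step in range(size):
        slot = (home + step) % size
        if table[slot] is None:
            return False  # an empty slot ends the search
        if table[slot] == key:
            return True
    return False  # every slot was checked


table = [None] * 10
for key in [41, 34, 7, 18, 21, 57]:
    linear_probe_add(table, key)
print(table)
```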

Hash function in action
Add these elements to the hash table: 89 18 49 58 9

Clustering problem
Clustering: nodes being placed close together by probing, which degrades the hash table's performance.
Add 89, 18, 49, 58, 9 (with linear probing).
Now searching for the value 28 has to check about half the hash table (slots 8, 9, 0, 1, 2, and then empty slot 3)! Lookup is no longer constant time...
Table (slot: value): 0: 49, 1: 58, 2: 9, 8: 18, 9: 89; all other slots empty

Quadratic probing
Quadratic probing: resolving collisions on slot i by putting the colliding element into slot i+1, i+4, i+9, i+16, ... (wrapping around modulo the table size).
Add 89, 18, 49, 58, 9.
49 collides (89 is already there), so we search ahead by +1 to empty slot 0.
58 collides (18 is already there), so we search ahead by +1 to occupied slot 9, then +4 to empty slot 2.
9 collides (89 is already there), so we search ahead by +1 to occupied slot 0, then +4 to empty slot 3.
Clustering is reduced.
Table (slot: value): 0: 49, 2: 58, 3: 9, 8: 18, 9: 89; all other slots empty
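A short sketch of the quadratic probe sequence, under the same assumptions as before (integer keys, `key % len(table)` as the hash, `None` for an empty slot). The function name is made up for illustration.

```python
def quadratic_probe_add(table, key):
    """Insert key by probing home, home+1, home+4, home+9, ... (mod size)."""
    size = len(table)
    home = key % size
    for k in range(size):
        slot = (home + k * k) % size
        if table[slot] is None or table[slot] == key:
            table[slot] = key
            return slot
    # Unlike linear probing, this sequence does not visit every slot,
    # so an insert can fail even though some slot is still empty.
    raise RuntimeError("no free slot found along the probe sequence")


table = [None] * 10
for key in [89, 18, 49, 58, 9]:
    quadratic_probe_add(table, key)
print(table)
```

Because the probe sequence can miss empty slots, implementations that use quadratic probing usually keep the load factor low and pick a prime table size.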

Quadratic probing in action

Load factor
Load factor: the ratio of elements to capacity. The book uses the symbol lambda (λ) for the load factor.
load factor = size / capacity
Example: this table holds 6 elements in 10 slots, so λ = 6 / 10 = 0.6.
Table (slot: value): 1: 41, 2: 21, 4: 34, 7: 7, 8: 18, 9: 57; all other slots empty

Increasing the hash table size
If the load factor is high, increase the size of the hash table's list, and re-store all of the items into the new list using the hash function.
Can we just copy the old contents to the larger array?
When should we do this? Some options:
When the load reaches a certain level (e.g., λ = 0.5)
When an insertion fails

Why Increase the Size? What is the cost (Big-O) of increasing?
What is a good hash table list size? How much bigger should a hash table get when it grows?

Hash table removal
Lazy removal: instead of actually removing elements, replace them with a special REMOVED value. This avoids expensive re-shuffling of elements on remove.
Example: after removing 18, slot 8 holds REMOVED.
Table (slot: value): 1: 41, 2: 21, 4: 34, 7: 7, 8: REMOVED, 9: 57; all other slots empty

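Lazy Removal
The lookup algorithm becomes slightly modified. What should we do when we hit a slot containing the REMOVED value? Keep going.
The add algorithm becomes slightly modified: use that slot, replacing REMOVED with the new value.
Example: add(17) --> slot 8 (home slot 17 % 10 = 7 is occupied by 7, and the next slot holds REMOVED).
Table before add(17) (slot: value): 1: 41, 2: 21, 4: 34, 7: 7, 8: REMOVED, 9: 57; all other slots empty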
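Lazy removal on top of linear probing can be sketched as follows. The sentinel `REMOVED` and the function names are illustrative choices, and integer keys hashed with `key % len(table)` are assumed. One design decision not spelled out in the slides: `add` remembers the first REMOVED slot but keeps scanning until an empty slot, so a key that already sits further along is not stored twice.

```python
REMOVED = object()  # sentinel marking a lazily removed slot


def _probe(table, key):
    """Yield the slots linear probing visits for key, in order."""
    size = len(table)
    for step in range(size):
        yield (key % size + step) % size


def contains(table, key):
    for slot in _probe(table, key):
        if table[slot] is None:
            return False  # truly empty: the key cannot be further along
        if table[slot] is not REMOVED and table[slot] == key:
            return True
        # REMOVED or a different key: keep going
    return False


def remove(table, key):
    for slot in _probe(table, key):
        if table[slot] is None:
            return False
        if table[slot] is not REMOVED and table[slot] == key:
            table[slot] = REMOVED  # mark the slot instead of shifting elements
            return True
    return False


def add(table, key):
    first_free = None
    for slot in _probe(table, key):
        entry = table[slot]
        if entry is None:
            if first_free is None:
                first_free = slot
            break
        if entry is REMOVED:
            if first_free is None:
                first_free = slot  # reusable, but keep checking for duplicates
        elif entry == key:
            return slot  # already present
    if first_free is None:
        raise RuntimeError("hash table is full")
    table[first_free] = key
    return first_free


table = [None, 41, 21, None, 34, None, None, 7, 18, 57]
remove(table, 18)
print(contains(table, 57))  # the search passes over the REMOVED slot
print(add(table, 17))       # reuses the REMOVED slot
```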

Hashing practice problem
Draw a diagram of the state of a hash table of size 10, initially empty, after adding the following elements: 7, 84, 31, 57, 44, 19, 27, 14, and 64 Assume that the hash table uses linear probing. Repeat the above problem using quadratic probing.

Writing a hash function
If we write a hash table that can store objects, we need a hash function for the objects, so that we know at which index to store them.
We want a hash function to:
Be simple/fast to compute
Map equal elements to the same index
Map different elements to different indexes
Have keys distributed evenly among indexes

Hash functions Would Social Security numbers be a good hash value for a database of students? Student names? Student ID numbers?

Folding Method Hash function for integers
Break the key (for example, a phone number) into several equal-size parts and add them together; in the slide's example the parts sum to 210.
Then take the remainder on division by len(slots). If we assume our hash table has 11 slots, 210 % 11 is 1.
So the phone number hashes to slot 1.
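The folding method can be sketched as below. The function name `folding_hash`, the choice of two-digit parts, and the sample phone number are assumptions made for illustration; the number was picked so that its parts sum to 210, matching the total given above.

```python
def folding_hash(key, num_slots, part_size=2):
    """Split the key's digits into equal-size parts, add them, take the remainder."""
    digits = "".join(ch for ch in str(key) if ch.isdigit())
    parts = [int(digits[i:i + part_size])
             for i in range(0, len(digits), part_size)]
    return sum(parts) % num_slots


# Assumed example key: 43 + 65 + 55 + 46 + 01 = 210, and 210 % 11 = 1.
print(folding_hash("436-555-4601", 11))
```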

Mid-Square Method
We first square the item, and then extract some portion of the resulting digits. For example, if the item were 44, we would first compute 44^2 = 1,936. By extracting the middle two digits, 93, and performing the remainder step, we get 5 (93 % 11).
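A small sketch of the mid-square method. The function name and the rule for choosing the "middle" digits (centered, rounding toward the left when the count is uneven) are assumptions; the slide only says to extract some portion of the digits.

```python
def mid_square_hash(item, num_slots, num_digits=2):
    """Square the item, take its middle digits, then take the remainder."""
    squared = str(item * item)
    # Start index of the centered window; never negative for short squares.
    start = max(0, (len(squared) - num_digits) // 2)
    middle = squared[start:start + num_digits]
    return int(middle) % num_slots


# 44 * 44 = 1936, the middle two digits are 93, and 93 % 11 = 5.
print(mid_square_hash(44, 11))
```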

Hash function for strings
Elements = Strings
Let's view a string by its letters: String s : s_0, s_1, s_2, …, s_(n-1)
How do we map a string into an integer index? (How do we "hash" it?)
One possible hash function: treat the first character as an int, and hash on that:
h(s) = s_0 % len(slots)
Is this a good hash function? When will strings collide?

Better string hash functions
View a string by its letters: String s : s_0, s_1, s_2, …, s_(n-1)
Treat each character as an int, sum them, and hash on that:
h(s) = (s_0 + s_1 + … + s_(n-1)) % len(slots)
What's wrong with this hash function? When will strings collide?

An even better function
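Third option: perform a weighted sum of the letters, and hash on that:
h(s) = (w_0·s_0 + w_1·s_1 + … + w_(n-1)·s_(n-1)) % len(slots), where the weight w_i depends on the position i of the character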
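The three string hash functions can be compared side by side. The function names are hypothetical, characters are converted with `ord`, and the weights in the third function (position + 1) are just one possible choice, assumed here because the slide's exact weights are not recoverable.

```python
def hash_first_char(s, num_slots):
    """Option 1: hash on the first character only."""
    return ord(s[0]) % num_slots if s else 0


def hash_char_sum(s, num_slots):
    """Option 2: hash on the sum of all character codes."""
    return sum(ord(ch) for ch in s) % num_slots


def hash_weighted_sum(s, num_slots):
    """Option 3: weight each character by its position (assumed weight: pos + 1)."""
    return sum((pos + 1) * ord(ch) for pos, ch in enumerate(s)) % num_slots


# Option 1 collides for any strings sharing a first letter.
print(hash_first_char("cat", 11), hash_first_char("cow", 11))
# Option 2 collides for anagrams, since addition ignores order.
print(hash_char_sum("cat", 11), hash_char_sum("act", 11))
# Option 3 takes position into account, so these anagrams separate.
print(hash_weighted_sum("cat", 11), hash_weighted_sum("act", 11))
```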

Analysis of hash table search
Load: the load λ of a hash table is the ratio λ = (no. of elements) / (array size)

Analysis of hash table search
Analysis of search, with chaining:
Unsuccessful: λ (the average length of a list at hash(i))
Successful: 1 + (λ/2) (one node, plus half the average length of a list, not including the item)

Analysis of Hashing with Linear Probing
Average # comparisons, with linear probing (standard approximations):
Unsuccessful: (1/2)(1 + 1/(1 − λ)^2)
Successful: (1/2)(1 + 1/(1 − λ))

Analysis of Hashing with Chaining
Average # comparisons, with chaining:
Unsuccessful: λ
Successful: 1 + λ/2
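To get a feel for how the load factor drives search cost, the sketch below tabulates the average number of comparisons for both strategies. It uses λ and 1 + λ/2 for chaining, and for linear probing the standard textbook approximations (1/2)(1 + 1/(1 − λ)^2) for unsuccessful and (1/2)(1 + 1/(1 − λ)) for successful search. The function names are made up for illustration.

```python
def chaining_comparisons(load):
    """Return (unsuccessful, successful) average comparisons with chaining."""
    return load, 1 + load / 2


def linear_probing_comparisons(load):
    """Return (unsuccessful, successful) average comparisons with linear probing."""
    if not 0 <= load < 1:
        raise ValueError("linear probing requires 0 <= load < 1")
    unsuccessful = 0.5 * (1 + (1 / (1 - load)) ** 2)
    successful = 0.5 * (1 + 1 / (1 - load))
    return unsuccessful, successful


for load in (0.25, 0.5, 0.75, 0.9):
    print(load, chaining_comparisons(load), linear_probing_comparisons(load))
```

The linear probing figures grow sharply as λ approaches 1, which is the usual argument for resizing well before the table fills up.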

Linear Probing

Linear Probing Detail

λ vs Avg # Comparisons

Making the list bigger
When the load factor exceeds a threshold, roughly double the table size (smallest prime > 2 * old table size).
Rehash each record in the old table into the new table.
Expensive: O(N) work done in copying.
However, if the threshold is large (e.g., ½), then we need to rehash only once per O(N) insertions, so the cost is "amortized" constant-time.
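A sketch of the rehashing step, assuming integer keys, linear probing, and `None` for empty slots. The helper names are hypothetical. It also shows why the old contents cannot simply be copied across: each key's slot depends on the table size, so every key has to be hashed again.

```python
def is_prime(n):
    """Trial-division primality test; adequate for table sizes."""
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True


def next_prime_above(n):
    """Smallest prime strictly greater than n."""
    candidate = n + 1
    while not is_prime(candidate):
        candidate += 1
    return candidate


def rehash(old_table):
    """Build a table of about twice the size and re-insert every key: O(N) work."""
    new_size = next_prime_above(2 * len(old_table))
    new_table = [None] * new_size
    for key in old_table:
        if key is None:
            continue
        slot = key % new_size  # the slot changes because the size changed
        while new_table[slot] is not None:
            slot = (slot + 1) % new_size  # linear probing in the new table
        new_table[slot] = key
    return new_table


old = [None, 41, 21, None, 34, None, None, 7, 18, 57]
new = rehash(old)
print(len(new), new.index(41), new.index(57))
```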
