Hashing
Motivating Applications
>> Large collections of data
>> The data are dynamic (insertions and deletions happen over time)
>> Goal: efficient search, insertion, and deletion
>> Hashing applies ONLY to exact-match search (e.g., not to range queries)
Direct Address Tables
If the key domain is U, create an array T of size |U|. For each key k, store the object in T[k]. Insertion, deletion, and search then all take O(1) time.
Direct Address Tables
Alg.: DIRECT-ADDRESS-SEARCH(T, k): return T[k]
Alg.: DIRECT-ADDRESS-INSERT(T, x): T[key[x]] ← x
Alg.: DIRECT-ADDRESS-DELETE(T, x): T[key[x]] ← NIL
Running time for these operations: O(1)
Drawbacks:
>> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible)
>> Limited to integer keys, and does not support duplicate keys
Solution: use hash tables
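The three direct-address operations translate almost line for line into Python. This is a minimal sketch, assuming keys are integers in [0, |U|) and that each object is a dict carrying its key under a hypothetical "key" field; NIL is represented by None.

```python
class DirectAddressTable:
    """Direct-address table: one slot per possible key in U = {0, ..., u-1}."""

    def __init__(self, universe_size):
        # T has one slot for every key in the universe; None plays the role of NIL
        self.T = [None] * universe_size

    def search(self, k):
        # DIRECT-ADDRESS-SEARCH(T, k): return T[k]
        return self.T[k]

    def insert(self, x):
        # DIRECT-ADDRESS-INSERT(T, x): T[key[x]] <- x
        self.T[x["key"]] = x

    def delete(self, x):
        # DIRECT-ADDRESS-DELETE(T, x): T[key[x]] <- NIL
        self.T[x["key"]] = None


table = DirectAddressTable(universe_size=10)
table.insert({"key": 3, "name": "Alice"})
print(table.search(3))   # {'key': 3, 'name': 'Alice'}
table.delete({"key": 3})
print(table.search(3))   # None
```

Each operation is a single array access, hence O(1). Inserting a second object with the same key overwrites the first, which is the "no duplicates" drawback in action; the table also allocates |U| slots no matter how few keys are stored.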
Direct Address Tables: Example
U is the universe of keys; K is the set of keys actually stored.
Hashing
A technique that uses a hash function to map values from one domain (or range) to another domain (or range).
[Figure: a hash function maps a domain of string values to a domain of integer values such as 3, 15, 20, and 55.]
Hashing
Example: a hash function maps student IDs from the domain [950,000 … 960,000] to the range [0 … 10,000].
Hash Tables
When |K| is much smaller than |U|, a hash table requires much less space than a direct-address table:
>> Storage requirements can be reduced to Θ(|K|)
>> Search can still take O(1) time, but in the average case, not the worst case
Hash Tables: Main Idea
Use a hash function h to compute the slot for each key k, and store the element in slot h(k). Maintain a hash table T[0 … m − 1] of size m. The hash function h transforms a key into an index of T:
$h : U \to \{0, 1, \ldots, m - 1\}$
We say that k hashes to slot h(k).
Hash Tables: Main Idea
[Figure: keys k1 … k5 from the set of actual keys K ⊆ U (universe of keys) are mapped by h into a hash table of size m with slots 0 … m − 1; h(k2) = h(k5).]
>> m is much smaller than |U| (m << |U|)
>> m can even be smaller than |K|
Example
Back to the example of 100 students, each identified by a 9-digit SSN. A direct-address table would need 10^9 slots, one per possible SSN; all we need is a hash table of size 100.
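A minimal sketch of the 100-student example, assuming SSNs are stored as 9-digit integers and assuming the hash function h(k) = k mod 100 (a hypothetical choice that simply keeps the last two digits). The SSN values below are made up for illustration.

```python
M = 100  # table size m: one slot per student on average

def h(ssn):
    # Hypothetical hash function: k mod m, i.e., the last two digits of the SSN
    return ssn % M

T = [None] * M  # hash table T[0 .. m-1]

for ssn in (123456789, 987654321, 555000042):
    T[h(ssn)] = ssn
    print(ssn, "-> slot", h(ssn))
# 123456789 -> slot 89
# 987654321 -> slot 21
# 555000042 -> slot 42
```

The table holds 100 entries instead of 10^9, but nothing stops two students from sharing the same last two digits, so two keys can land in one slot.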
What About Collisions?
[Figure: keys from K ⊆ U mapped into slots 0 … m − 1 of the table; k2 and k5 collide because h(k2) = h(k5).]
A collision occurs when two or more keys hash to the same slot.
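One way to see collisions concretely is to group keys by the slot they hash to and report every slot that receives more than one key. This is a sketch assuming integer keys and the illustrative hash function h(k) = k mod m; the function name find_collisions is hypothetical.

```python
from collections import defaultdict

def find_collisions(keys, m):
    # Group keys by slot, using the assumed hash function h(k) = k mod m
    slots = defaultdict(list)
    for k in keys:
        slots[k % m].append(k)
    # A collision is any slot that receives two or more keys
    return {slot: ks for slot, ks in slots.items() if len(ks) > 1}

print(find_collisions([12, 25, 37, 42, 55], m=10))
# {2: [12, 42], 5: [25, 55]}
```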
Handling Collisions
There are many ways to handle collisions:
>> Chaining
>> Open addressing: linear probing, quadratic probing, double hashing
Chaining: Main Idea
Put all elements that hash to the same slot into a linked list (a chain). Slot j contains a pointer to the head of the list of all elements that hash to j.
Chaining - Discussion
Choosing the size of the hash table:
>> Small enough not to waste space
>> Large enough that the lists remain short
>> Typically 10%-20% of the total number of elements
Should the lists be kept ordered or not? Usually each list is an unsorted linked list.
Insertion in Hash Tables
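Alg.: CHAINED-HASH-INSERT(T, x): insert x at the head of list T[h(key[x])]
Worst-case running time: O(1)
Duplicates may or may not be allowed, depending on the application. Disallowing them means searching the list before inserting, so insertion is then no longer O(1) in the worst case.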
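A minimal sketch of CHAINED-HASH-INSERT, assuming integer keys, the illustrative hash function h(k) = k mod m, and chains stored as collections.deque objects standing in for linked lists (appendleft adds at the head in O(1)). The class name ChainedHashTable is hypothetical.

```python
from collections import deque

class ChainedHashTable:
    def __init__(self, m):
        self.m = m
        # Slot j holds the chain of all elements that hash to j
        self.T = [deque() for _ in range(m)]

    def h(self, k):
        # Assumed hash function for integer keys
        return k % self.m

    def insert(self, key, value):
        # CHAINED-HASH-INSERT: put x at the head of list T[h(key[x])].
        # No search for an existing key, so duplicates are allowed and the cost is O(1).
        self.T[self.h(key)].appendleft((key, value))


table = ChainedHashTable(m=7)
table.insert(3, "a")
table.insert(10, "b")     # 10 mod 7 == 3, so it shares slot 3 with key 3
print(list(table.T[3]))   # [(10, 'b'), (3, 'a')]
```

Inserting at the head rather than the tail avoids walking the chain, which is what keeps insertion O(1) regardless of chain length.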
Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x): delete x from the list T[h(key[x])]
The element to be deleted must first be found in the list.
Worst-case running time: dominated by searching the corresponding list.
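Deleting by key means searching chain T[h(key)] first, which is why the cost depends on the chain length. (If the list is doubly linked and the caller already holds a pointer to x's node, the unlink step itself is O(1).) This sketch assumes chains are collections.deque objects of (key, value) pairs and h(k) = k mod m; the function name chained_delete is hypothetical.

```python
from collections import deque

def chained_delete(T, key):
    # Locate the chain for this key, using the assumed hash h(k) = k mod m
    chain = T[key % len(T)]
    # Search the chain: O(length of chain)
    for i, (k, _) in enumerate(chain):
        if k == key:
            del chain[i]   # unlink; we return immediately, so iteration is not resumed
            return True
    return False           # key not present


m = 5
T = [deque() for _ in range(m)]
for k, v in [(2, "x"), (7, "y"), (4, "z")]:
    T[k % m].appendleft((k, v))   # 2 and 7 share slot 2

print(chained_delete(T, 7))    # True
print(list(T[2]))              # [(2, 'x')]
print(chained_delete(T, 12))   # False
```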
Searching in Hash Tables
Alg.: CHAINED-HASH-SEARCH(T, k): search for an element with key k in list T[h(k)]
The running time is proportional to the length of the list in slot h(k). What are the worst-case and average-case running times?
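A minimal sketch of CHAINED-HASH-SEARCH: only the chain in slot h(k) is scanned, so the cost is the number of entries examined in that one chain. It assumes chains are collections.deque objects of (key, value) pairs and h(k) = k mod m; the function name chained_search is hypothetical.

```python
from collections import deque

def chained_search(T, k):
    # Scan only chain T[h(k)], with the assumed hash h(k) = k mod m
    for key, value in T[k % len(T)]:
        if key == k:
            return value
    return None  # key not found


m = 5
T = [deque() for _ in range(m)]
for k, v in [(1, "a"), (6, "b"), (11, "c"), (3, "d")]:
    T[k % m].appendleft((k, v))   # keys 1, 6, 11 all land in slot 1

print(chained_search(T, 6))    # b
print(chained_search(T, 16))   # None (scans the whole 3-element chain in slot 1)
```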
Analysis of Hashing with Chaining: Worst Case
[Figure: table T with slots 0 … m − 1; every key lands in the same chain.]
All n keys hash to only one chain, so that chain has size O(n). Searching takes O(n) plus the time to compute h(k).
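The worst case can be forced with a degenerate hash function. This sketch compares a hypothetical constant hash h(k) = 0 against h(k) = k mod m on the same 1000 integer keys; the helper name build is an assumption for illustration.

```python
from collections import deque

def build(keys, m, h):
    # Build a chained table of size m using hash function h
    T = [deque() for _ in range(m)]
    for k in keys:
        T[h(k)].appendleft(k)
    return T

keys = range(1000)
bad = build(keys, m=100, h=lambda k: 0)          # every key hashes to slot 0
good = build(keys, m=100, h=lambda k: k % 100)   # keys spread evenly

print(max(len(c) for c in bad))    # 1000: one chain holds all n keys
print(max(len(c) for c in good))   # 10
```

With the constant hash, a search for a missing key must walk all 1000 entries, which is the O(n) bound above.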
Analysis of Hashing with Chaining: Average Case
[Figure: table T with slots 0 … m − 1 and chains of similar length.]
Assume a good hash function and a uniform distribution of keys: any given element is equally likely to hash into any of the m slots, so all chains have similar sizes.
Let n be the total number of keys and m the hash table size.
>> Average chain size: n/m
>> Average search time: O(1 + n/m), which is the common case (the 1 accounts for computing h(k))
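A quick experiment makes the average-case picture concrete. This sketch assumes random integer keys and h(k) = k mod m with a prime m = 101; the sizes n and m, and the helper name chain_lengths, are arbitrary choices for illustration. The average chain length is always exactly n/m; the interesting part is how close the shortest and longest chains stay to it.

```python
import random

def chain_lengths(keys, m):
    # Count how many keys land in each slot under h(k) = k mod m
    lengths = [0] * m
    for k in keys:
        lengths[k % m] += 1
    return lengths

random.seed(1)
n, m = 10_000, 101
keys = random.sample(range(10**9), n)    # n distinct random keys
lengths = chain_lengths(keys, m)

print("average chain length:", sum(lengths) / m)   # n/m, about 99
print("shortest / longest chain:", min(lengths), max(lengths))
```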
Analysis of Hashing with Chaining: Average Case
If m (number of slots) is proportional to n (number of keys), then m = Θ(n), so n/m = O(1).
Searching takes constant time on average.
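The same argument in symbols, writing the load factor as α = n/m (a standard textbook name, assumed here rather than taken from the slides) and using the known result that, with chaining and simple uniform hashing, an expected search takes Θ(1 + α) time:

```latex
\alpha = \frac{n}{m}, \qquad
m = \Theta(n) \;\Longrightarrow\; \alpha = \frac{n}{\Theta(n)} = O(1), \qquad
\text{expected search time} = \Theta(1 + \alpha) = O(1).
```

For instance, choosing m = n/10 (table size 10% of the number of elements) gives α = 10: chains are longer than with m = n, but still bounded by a constant as n grows.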
Hash Functions
Hash Functions
A hash function transforms a key k into a table address in the range 0 … m − 1. What makes a good hash function?
(1) Easy to compute
(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
(3) Reduces the number of collisions
Hash Functions
Goal: map a key k into one of the m slots of the hash table.
>> Make the table size m a prime number. Avoid even numbers and powers of 2: k mod 2^p depends only on the lowest p bits of k, so keys that differ only in their higher bits collide.
>> Common function: h(k) = F(k) mod m, where F is some function or operation on k that usually produces an integer. The output of "mod m" is a number in [0 … m − 1].
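The advice to avoid powers of 2 shows up clearly when keys share a common factor. This sketch uses keys that are all multiples of 8 (an assumed, deliberately patterned key set) and counts which slots h(k) = k mod m actually uses for m = 16 versus the prime m = 17; the helper name slots_used is hypothetical.

```python
def slots_used(keys, m):
    # Sorted list of distinct slots hit by h(k) = k mod m
    return sorted({k % m for k in keys})

keys = [8 * i for i in range(20)]   # patterned keys: 0, 8, 16, ..., 152

print(slots_used(keys, 16))   # [0, 8]: only 2 of 16 slots are ever used
print(slots_used(keys, 17))   # [0, 1, ..., 16]: all 17 slots are used
```

Because 8 divides 16, every key maps to slot 0 or 8 when m = 16. Since 17 is prime, it shares no factor with 8, so the same keys spread across every slot.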
Examples of Hash Functions
>> Collection of images: F(k) = sum of the pixel color values; h(k) = F(k) mod m
>> Collection of strings: F(k) = sum of the ASCII values of the characters; h(k) = F(k) mod m
>> Collection of numbers: F(k) = k; h(k) = F(k) mod m
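The three example hash functions, sketched in Python under stated assumptions: an image is represented as a list of rows of integer color values (a hypothetical representation), strings are hashed by summing character codes with ord() (which equals the ASCII value for ASCII text), and m = 13 is an arbitrary prime. The function names are hypothetical.

```python
def hash_string(s, m):
    # F(k) = sum of the ASCII values of the characters
    return sum(ord(c) for c in s) % m

def hash_number(k, m):
    # F(k) = k
    return k % m

def hash_image(pixels, m):
    # F(k) = sum of the pixel color values (image as a list of rows of ints)
    return sum(sum(row) for row in pixels) % m


m = 13
print(hash_string("cat", m))                 # 0
print(hash_string("act", m))                 # 0
print(hash_number(950123, m))                # 5
print(hash_image([[255, 0], [12, 34]], m))   # 2
```

Summing character codes is easy to compute, but it ignores character order, so anagrams such as "cat" and "act" always collide. The same holds for images whose pixels are a permutation of each other.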