Hashing.


1 Hashing

2 Motivating Applications
Large collections of data.
Datasets are dynamic (insertions and deletions).
Goal: efficient searching, insertion, and deletion.
Hashing is ONLY applicable to exact-match searching.

3 Direct Address Tables
If the key domain is U → create an array T of size |U|.
For each key k → store the object in T[k].
Supports insertion, deletion, and searching in O(1).

4 Direct Address Tables
Alg.: DIRECT-ADDRESS-SEARCH(T, k): return T[k]
Alg.: DIRECT-ADDRESS-INSERT(T, x): T[key[x]] ← x
Alg.: DIRECT-ADDRESS-DELETE(T, x): T[key[x]] ← NIL
Running time for these operations: O(1)
Drawbacks
>> If U is large, e.g., the domain of all integers, then T is large (sometimes infeasible)
>> Limited to integer keys, and does not support duplicate keys
Solution is to use hash tables
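The three direct-address operations above can be sketched in Python as follows. This is a minimal illustration, assuming integer keys in {0, …, u - 1} and using `None` in place of NIL; the class name `DirectAddressTable` is hypothetical, and for simplicity `insert` takes the key and the object separately instead of reading `key[x]`.

```python
from typing import Any, Optional


class DirectAddressTable:
    """Direct-address table for integer keys in {0, 1, ..., u - 1}.

    One slot per possible key, so space grows with u (the size of U),
    not with the number of keys actually stored.
    """

    def __init__(self, u: int) -> None:
        # T[0..u-1]; None plays the role of NIL
        self.T: list = [None] * u

    def search(self, k: int) -> Optional[Any]:
        # DIRECT-ADDRESS-SEARCH: return T[k]
        return self.T[k]

    def insert(self, key: int, obj: Any) -> None:
        # DIRECT-ADDRESS-INSERT: T[key[x]] <- x (a second insert with the same key overwrites)
        self.T[key] = obj

    def delete(self, key: int) -> None:
        # DIRECT-ADDRESS-DELETE: T[key[x]] <- NIL
        self.T[key] = None


table = DirectAddressTable(u=10)
table.insert(3, "record for key 3")
print(table.search(3))  # record for key 3
table.delete(3)
print(table.search(3))  # None
```

Each operation is a single array access, which is where the O(1) bound comes from; the cost is the `[None] * u` allocation, which is exactly the drawback listed when U is large.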

5 Direct Address Tables: Example
[Figure: U is the universe of keys; K is the set of keys actually in use.]

6 Hashing
A hash function maps values from one domain (or range) to another domain (or range).
[Figure: a hash function maps string values to integer values.]

7 Hashing
A hash function maps values from one domain (or range) to another domain (or range).
[Figure: a hash function maps student IDs in the range [950,000 … 960,000] to numbers in the range [0 … 10,000].]
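The slide does not say which function performs the student-ID mapping. One hypothetical function that matches the two ranges in the figure is to subtract the lowest ID; the constant and function names below are illustrative assumptions.

```python
LOW_ID = 950_000   # smallest student ID in the figure
HIGH_ID = 960_000  # largest student ID in the figure


def id_to_index(student_id: int) -> int:
    """Hypothetical hash function: shift IDs in [950,000 ... 960,000] down to [0 ... 10,000]."""
    if not LOW_ID <= student_id <= HIGH_ID:
        raise ValueError(f"student ID {student_id} is outside [{LOW_ID}, {HIGH_ID}]")
    return student_id - LOW_ID


print(id_to_index(950_000))  # 0
print(id_to_index(955_123))  # 5123
print(id_to_index(960_000))  # 10000
```

Because both ranges hold the same number of values, this particular mapping is one-to-one and never collides; the hash tables on the following slides use ranges much smaller than the key domain, where collisions become unavoidable.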

8 Hash Tables
When K is much smaller than U, a hash table requires much less space than a direct-address table.
Can reduce storage requirements to Θ(|K|).
Can still get O(1) search time, but in the average case, not the worst case.

9 Hash Tables: Main Idea
Use a hash function h to compute the slot for each key k, and store the element in slot h(k).
Maintain a hash table of size m → T[0…m-1].
A hash function h transforms a key into an index in the hash table T[0…m-1]:
h : U → {0, 1, …, m - 1}
We say that k hashes to slot h(k).

10 Hash Tables: Main Idea
[Figure: keys k1 … k5 from the set of actual keys K, drawn from the universe U, are hashed into slots 0 … m-1 of a hash table of size m; h(k2) = h(k5).]
>> m is much smaller than |U| (m << |U|)
>> m can be even smaller than |K|

11 Example
Back to the example of 100 students, each with a 9-digit SSN.
All we need is a hash table of size 100.
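A small sketch of this example, assuming the hash function h(k) = k mod 100 (the slide does not name one; this choice keeps the last two digits of the SSN). The SSNs are made up for illustration, and `M` and `h` are hypothetical names. Note that 100 is not prime, which a later slide advises against; it is kept here only to match this slide.

```python
M = 100  # table size from the slide


def h(ssn: int) -> int:
    # Hypothetical choice: h(k) = k mod m, i.e. the last two digits of the SSN
    return ssn % M


# Made-up 9-digit SSNs, for illustration only
ssns = [123456789, 987654321, 555000042, 314159265]
for ssn in ssns:
    print(ssn, "-> slot", h(ssn))

# A direct-address table would need one slot per possible 9-digit key
print("direct-address slots:", 10**9, "hash-table slots:", M)
```

Every key lands in a slot in [0 … 99], so the table has 100 slots instead of the 10^9 a direct-address table over all 9-digit numbers would need.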

12 What About Collisions?
[Figure: keys k1 … k5 hashed into slots 0 … m-1; h(k2) = h(k5) is a collision.]
A collision occurs when two or more keys hash to the same slot.
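A collision can be detected by grouping keys by slot. The sketch below assumes h(k) = k mod m; the helper name `find_collisions` and the keys are hypothetical.

```python
from collections import defaultdict


def find_collisions(keys: list[int], m: int) -> dict[int, list[int]]:
    """Return {slot: keys} for every slot that receives two or more keys under h(k) = k mod m."""
    slots: defaultdict = defaultdict(list)
    for k in keys:
        slots[k % m].append(k)
    return {slot: ks for slot, ks in slots.items() if len(ks) > 1}


# Two made-up 9-digit keys that share their last two digits collide when m = 100
print(find_collisions([123456789, 987654389, 555000042], m=100))
# {89: [123456789, 987654389]}
```

If there are more keys than slots (n > m), the pigeonhole principle guarantees at least one collision, so collisions must be handled rather than avoided entirely.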

13 Handling Collisions
There are many ways to handle collisions:
>> Chaining
>> Open addressing: linear probing, quadratic probing, double hashing
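In open addressing, the i-th attempt to place key k goes to slot h(k, i). The sketch below uses the standard textbook forms of the three probe sequences, not formulas taken from these slides; the base hash h'(k) = k mod m, the constants c1 = 0 and c2 = 1, and h2(k) = 1 + (k mod (m - 1)) are all assumptions.

```python
def linear_probe(k: int, i: int, m: int) -> int:
    # h(k, i) = (h'(k) + i) mod m, with h'(k) = k mod m
    return (k % m + i) % m


def quadratic_probe(k: int, i: int, m: int, c1: int = 0, c2: int = 1) -> int:
    # h(k, i) = (h'(k) + c1*i + c2*i^2) mod m; c1 and c2 are illustrative constants
    return (k % m + c1 * i + c2 * i * i) % m


def double_hash_probe(k: int, i: int, m: int) -> int:
    # h(k, i) = (h1(k) + i*h2(k)) mod m, with h1(k) = k mod m and h2(k) = 1 + (k mod (m - 1))
    h1 = k % m
    h2 = 1 + k % (m - 1)
    return (h1 + i * h2) % m


m, k = 13, 27
for name, probe in [("linear", linear_probe), ("quadratic", quadratic_probe), ("double", double_hash_probe)]:
    print(name, [probe(k, i, m) for i in range(5)])
# linear [1, 2, 3, 4, 5]
# quadratic [1, 2, 5, 10, 4]
# double [1, 5, 9, 0, 4]
```

Linear probing visits neighbouring slots, quadratic probing spreads the attempts out, and double hashing uses a key-dependent step size (here 4), so two keys with the same first slot usually follow different probe sequences.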

14 Chaining: Main Idea
Put all elements that hash to the same slot into a linked list (chain).
Slot j contains a pointer to the head of the list of all elements that hash to j.

15 Chaining - Discussion
Choosing the size of the hash table:
>> Small enough not to waste space
>> Large enough that the lists remain short
>> Typically 10%-20% of the total number of elements
How should we keep the lists: ordered or not?
>> Usually each list is an unsorted linked list

16 Insertion in Hash Tables
Alg.: CHAINED-HASH-INSERT(T, x): insert x at the head of list T[h(key[x])]
Worst-case running time is O(1), provided we do not first search for an existing element with the same key.
May or may not allow duplicates, depending on the application.
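CHAINED-HASH-INSERT can be sketched with a singly linked list per slot. This assumes h(k) = k mod m with m = len(T) and integer keys; the names `Node` and `chained_hash_insert` are hypothetical. No duplicate check is made, which is what keeps the insert O(1).

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Node:
    key: int
    value: Any
    next: Optional["Node"] = None


def chained_hash_insert(T: list, key: int, value: Any) -> None:
    """Put the new element at the head of chain T[h(key)], assuming h(k) = k mod len(T)."""
    j = key % len(T)
    T[j] = Node(key, value, T[j])  # the new node points at the old head


T: list = [None] * 7
for k in (3, 10, 17):  # all three keys hash to slot 3 when m = 7
    chained_hash_insert(T, k, f"item {k}")

# Walk chain 3 from head to tail
chain, node = [], T[3]
while node is not None:
    chain.append(node.key)
    node = node.next
print(chain)  # [17, 10, 3]: the most recent insert is at the head
```

Inserting at the head avoids walking the chain, so the cost does not depend on how long the chain already is.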

17 Deletion in Hash Tables
Alg.: CHAINED-HASH-DELETE(T, x): delete x from the list T[h(key[x])]
We first need to find the element to be deleted.
Worst-case running time: deletion depends on searching the corresponding list.

18 Searching in Hash Tables
Alg.: CHAINED-HASH-SEARCH(T, k): search for an element with key k in list T[h(k)]
Running time is proportional to the length of the list of elements in slot h(k).
What are the worst case and the average case?
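Search and deletion both scan a single chain. In this sketch Python lists stand in for the unsorted linked-list chains, h(k) = k mod 7 is an assumed hash function, and deletion is done by key (the slides delete an element x); the function names are hypothetical.

```python
from typing import Any, Optional

M = 7  # assumed (prime) table size


def h(k: int) -> int:
    # assumed hash function: h(k) = k mod m
    return k % M


def chained_hash_search(T: list, k: int) -> Optional[Any]:
    """Scan chain T[h(k)]; the cost is proportional to that chain's length."""
    for key, value in T[h(k)]:
        if key == k:
            return value
    return None


def chained_hash_delete(T: list, k: int) -> bool:
    """Delete by key: first find the element in T[h(k)], then remove it."""
    chain = T[h(k)]
    for i, (key, _) in enumerate(chain):
        if key == k:
            del chain[i]
            return True
    return False  # key not present


T: list = [[] for _ in range(M)]
for k in (3, 10, 17, 5):
    T[h(k)].insert(0, (k, f"item {k}"))  # insert at the head of the chain

print(chained_hash_search(T, 10))  # item 10
print(chained_hash_delete(T, 10))  # True
print(chained_hash_search(T, 10))  # None
```

Keys 3, 10, and 17 share slot 3, so looking up 10 walks past 17 first. If chains are doubly linked and the caller already holds a pointer to x, the unlink itself is O(1); the search is what costs time.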

19 Analysis of Hashing with Chaining: Worst Case
[Figure: table T with slots 0 … m-1, where every key is in a single chain.]
All keys go to only one chain.
Chain size is O(n).
Searching is O(n) + the time to compute h(k).
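The worst case is easy to trigger with a poorly matched hash function. This sketch assumes h(k) = k mod 10 and 100 keys that are all multiples of 10; the helper name `chain_lengths` is hypothetical.

```python
from typing import Callable


def chain_lengths(keys: list[int], m: int, h: Callable[[int], int]) -> list[int]:
    """Return the length of each of the m chains after hashing every key with h."""
    lengths = [0] * m
    for k in keys:
        lengths[h(k)] += 1
    return lengths


m = 10
keys = list(range(0, 1000, 10))  # 100 keys, all multiples of 10
lengths = chain_lengths(keys, m, lambda k: k % m)
print(lengths)  # [100, 0, 0, 0, 0, 0, 0, 0, 0, 0]
```

Every key lands in slot 0, so a search for a key that is not in the table compares against all n = 100 keys in that one chain.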

20 Analysis of Hashing with Chaining: Average Case
[Figure: table T with slots 0 … m-1 and chains of similar length.]
Assume a good hash function and a uniform distribution of keys: any given element is equally likely to hash into any of the m slots, so all chains have similar sizes.
Let n be the total number of keys and m the hash table size.
Average chain size: O(n/m)
Average search time: O(1 + n/m) (computing h(k) plus scanning one chain): the common case

21 Analysis of Hashing with Chaining: Average Case
If m (# of slots) is at least proportional to n (# of keys), then n = O(m), so n/m = O(1).
Searching takes constant time on average.
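When m grows in proportion to n, the average chain length n/m stays fixed. This sketch assumes consecutive integer keys and h(k) = k mod m, which spreads the keys perfectly evenly (a best case; real keys rely on a good hash function), and sets m to 10% of n, the low end of the sizing rule on slide 15. The helper name `max_chain_length` is hypothetical.

```python
def max_chain_length(keys: range, m: int) -> int:
    """Longest chain when every key k goes to slot k mod m."""
    lengths = [0] * m
    for k in keys:
        lengths[k % m] += 1
    return max(lengths)


for n in (1_000, 10_000, 100_000):
    m = n // 10  # m grows in proportion to n (here m = 10% of n)
    print(n, m, n / m, max_chain_length(range(n), m))
# 1000 100 10.0 10
# 10000 1000 10.0 10
# 100000 10000 10.0 10
```

The table grows 100-fold, yet every chain still has exactly 10 keys, so the cost of a search stays constant.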

22 Hash Functions

23 Hash Functions
A hash function transforms a key (k) into a table address (0…m-1).
What makes a good hash function?
(1) Easy to compute
(2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing)
(3) Reduces the number of collisions

24 Hash Functions
Goal: map a key k into one of the m slots in the hash table.
Make the table size m a prime number (avoid even numbers and powers of 2).
Common function: h(k) = F(k) mod m
>> F(k) is some function or operation on k (usually producing an integer)
>> The output of "mod" is a number in [0…m-1]
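A quick way to see why a power-of-2 table size is risky: k mod 2^p keeps only the lowest p bits of k, so keys that share those bits collide. The sketch below assumes 16 keys that are all multiples of 8 and compares m = 16 with the prime m = 17; the helper name `slots_used` is hypothetical.

```python
def slots_used(keys: list[int], m: int) -> int:
    """Number of distinct slots hit by h(k) = k mod m."""
    return len({k % m for k in keys})


keys = [8 * i for i in range(16)]  # 16 keys, all multiples of 8
print(slots_used(keys, 16))  # 2  (power of 2: only slots 0 and 8 are used)
print(slots_used(keys, 17))  # 16 (prime: every key gets its own slot)
```

With m = 17 no key shares a slot, because 8 and 17 have no common factor; with m = 16 the regular pattern in the keys turns into 14 collisions.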

25 Examples of Hash Functions
Collection of images: F(k) = sum of the pixel colors, h(k) = F(k) mod m
Collection of strings: F(k) = sum of the ASCII values, h(k) = F(k) mod m
Collection of numbers: F(k) = k (just return k), h(k) = F(k) mod m
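The three examples above can be sketched as follows, assuming m = 13 and representing an image as a 2-D list of integer pixel values (both assumptions; the function names are hypothetical). Python's `ord` returns the Unicode code point, which equals the ASCII value for ASCII text.

```python
def hash_string(s: str, m: int) -> int:
    # F(k) = sum of the character codes, h(k) = F(k) mod m
    return sum(ord(c) for c in s) % m


def hash_image(pixels: list[list[int]], m: int) -> int:
    # F(k) = sum of the pixel values, h(k) = F(k) mod m
    return sum(sum(row) for row in pixels) % m


def hash_number(k: int, m: int) -> int:
    # F(k) = k, h(k) = k mod m
    return k % m


m = 13
print(hash_string("abc", m))                # 8  (97 + 98 + 99 = 294, 294 mod 13 = 8)
print(hash_string("cab", m))                # 8  (same letters, same sum)
print(hash_image([[10, 20], [30, 40]], m))  # 9  (100 mod 13 = 9)
print(hash_number(42, m))                   # 3  (42 mod 13 = 3)
```

Summing ignores order, so anagrams such as "abc" and "cab" always collide, and two images with the same total brightness collide as well; these functions are easy to compute but only loosely approximate a random function.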

