Presentation is loading. Please wait.

Presentation is loading. Please wait.

Hashing. Motivating Applications Large collection of datasets Datasets are dynamic (insert, delete) Goal: efficient searching/insertion/deletion Hashing.

Similar presentations


Presentation on theme: "Hashing. Motivating Applications Large collection of datasets Datasets are dynamic (insert, delete) Goal: efficient searching/insertion/deletion Hashing."— Presentation transcript:

1 Hashing

2 Motivating Applications Large collection of datasets Datasets are dynamic (insert, delete) Goal: efficient searching/insertion/deletion Hashing is ONLY applicable for exact-match searching

3 Direct Address Tables If the keys domain is U Create an array T of size U For each key K add the object to T[K] Supports insertion/deletion/searching in O(1)

4 Direct Address Tables Alg.: DIRECT-ADDRESS-SEARCH( T, k ) return T[k] Alg.: DIRECT-ADDRESS-INSERT( T, x ) T[key[x]] x Alg.: DIRECT-ADDRESS-DELETE( T, x ) T[key[x]] NIL Running time for these operations: O(1) Drawbacks >> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible) >> Limited to integer values and does not support duplication Drawbacks >> If U is large, e.g., the domain of integers, then T is large (sometimes infeasible) >> Limited to integer values and does not support duplication Solution is to use hashing tables

5 Direct Access Tables: Example Example 1: Example 2: U is the domain K is the actual number of keys

6 Hashing A data structure that maps values from a certain domain or range to another domain or range Hash function Domain: String values Domain: Integer values

7 Hashing A data structure that maps values from a certain domain or range to another domain or range Hash function Domain: numbers [950,000 … 960,000] Student IDs … Domain: numbers [0 … 10,000] Range 0 …

8 Hash Tables When K is much smaller than U, a hash table requires much less space than a direct-address table – Can reduce storage requirements to |K| – Can still get O(1) search time, but on the average case, not the worst case

9 Hash Tables: Main Idea Use a hash function h to compute the slot for each key k Store the element in slot h(k) Maintain a hash table of size m T [0…m-1] A hash function h transforms a key into an index in a hash table T[0…m-1]: h : U {0, 1,..., m - 1} We say that k hashes to slot h(k)

10 Hash Tables: Main Idea U (universe of keys) K (actual keys) 0 m - 1 h(k 3 ) h(k 2 ) = h(k 5 ) h(k 1 ) h(k 4 ) k1k1 k4k4 k2k2 k5k5 k3k3 Hash Table (of size m) >> m is much smaller that U (m <> m can be even smaller than |K|

11 Example Back to the example of 100 students, each with 9-digit SSN All what we need is a hash table of size 100

12 What About Collisions U (universe of keys) K (actual keys) 0 m - 1 h(k 3 ) h(k 2 ) = h(k 5 ) h(k 1 ) h(k 4 ) k1k1 k4k4 k2k2 k5k5 k3k3 Collisions! Collision means two or more keys will go to the same slot

13 Handling Collisions Many ways to handle it – Chaining – Open addressing Linear probing Quadratic probing Double hashing

14 Chaining: Main Idea Put all elements that hash to the same slot into a linked list (Chain) Slot j contains a pointer to the head of the list of all elements that hash to j

15 Chaining - Discussion Choosing the size of the hash table – Small enough not to waste space – Large enough such that lists remain short – Typically 10% -20% of the total number of elements How should we keep the lists: ordered or not? – Usually each list is unsorted linked list

16 Insertion in Hash Tables Alg.: CHAINED-HASH-INSERT( T, x ) insert x at the head of list T[h(key[x])] Worst-case running time is O(1) May or may not allow duplication based on the application

17 Deletion in Hash Tables Alg.: CHAINED-HASH-DELETE(T, x) delete x from the list T[h(key[x])] Need to find the element to be deleted. Worst-case running time: – Deletion depends on searching the corresponding list

18 Searching in Hash Tables Alg.: CHAINED-HASH-SEARCH(T, k) search for an element with key k in list T[h(k)] Running time is proportional to the length of the list of elements in slot h(k) What is the worst case and average case??

19 Analysis of Hashing with Chaining: Worst Case All keys will go to only one chain Chain size is O(n) Searching is O(n) + time to apply h(k) 0 m - 1 T chain

20 Analysis of Hashing with Chaining: Average Case With good hash function and uniform distribution of keys – Any given element is equally likely to hash into any of the m slots All chain will have similar sizes Assume n (total # of keys), m is the hash table size – Average chain size O (n/m) 0 m - 1 T chain Average Search Time O(n/m): The common case

21 If m (# of slots) is proportional to n (# of keys): – m = O(n) – n/m = O(1) Searching takes constant time on average Analysis of Hashing with Chaining: Average Case

22 Hash Functions

23 A hash function transforms a key (k) into a table address (0…m-1) What makes a good hash function? (1) Easy to compute (2) Approximates a random function: for every input, every output is equally likely (simple uniform hashing) (3) Reduces the number of collisions

24 Hash Functions Goal: Map a key k into one of the m slots in the hash table Make table size (m) a prime number – Avoids even and power-of-2 numbers Common function h(k) = F(k) mod m Some function or operation on K (usually generates an integer) The output of the mod is number [0…m-1]

25 Examples of Hash Functions Collection of images F(k): Sum of the pixels colors h(k) = F(k) mod m Collection of strings F(k): Sum of the ascii values h(k) = F(k) mod m Collection of numbers F(k): just return k h(k) = F(k) mod m


Download ppt "Hashing. Motivating Applications Large collection of datasets Datasets are dynamic (insert, delete) Goal: efficient searching/insertion/deletion Hashing."

Similar presentations


Ads by Google