Presentation is loading. Please wait.

Presentation is loading. Please wait.

1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables.

Similar presentations


Presentation on theme: "1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables."— Presentation transcript:

1 1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables

2 1.2 Data Structure and Algorithm Introduction A hash table is a generalization of the simpler notion of an ordinary array Searching for an element in a hash table can take as long as searching for an element in an array/linked list i.e. O(N) time in the worst case. But under reasonable assumptions, hashing takes O(1) time to search an element in a hash table.

3 1.3 Data Structure and Algorithm Dictionary/Table Keys Given a student ID find the record (entry)

4 1.4 Data Structure and Algorithm Direct Addressing Direct Addressing is a simple technique that works well when the universe U of keys is reasonably small. Let Universe U= (0,1,…….m-1} where m is not too large We may assume that no two elements have the same key. To represent the dynamic set, we use an array or direct address table T[0..m-1] in which each position or slot correspond to a key in the universe U

5 1.5 Data Structure and Algorithm Direct Addressing k4k4 k5k5 U (universe of keys) actual keys 5 3 8 2 1 9 4 0 7 6 9 88 7 6 55 4 33 22 1 0 key data T Slot k points to an element in the set with key k If the set contains no element with key k, then T[k]=nil

6 1.6 Data Structure and Algorithm Direct addressing is a simple technique that works well when the universe of keys is small. Assuming each key corresponds to a unique slot. Direct-Address-Search(T,k) return T[k] Direct-Address-Insert(T,x) return T[key[x]]  x Direct-Address-Delete(T,x) return T[key[x]]  Nil 1 7 5 0 1 2 3 4 5 6 7 / / / / / 1 5 7 entry Direct-address Table O(1) time for all operations

7 1.7 Data Structure and Algorithm The Problem With Direct Addressing If the universe U is large, storing a table T of size |U| may be impractical or even impossible. Furthermore, the set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted. Solution: map keys to smaller range 0..m-1 This mapping is called a hash function

8 1.8 Data Structure and Algorithm Hash function Hash function h maps the universe U of keys into slots of a hash table T [0..m-1]: h : U {0,1,….m-1} But two keys may hash to the same slot – a collision T 0 m - 1 h(k 1 ) h(k 4 ) h(k 2 ) = h(k 5 ) h(k 3 ) k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys)

9 1.9 Data Structure and Algorithm Next Problem But two keys may hash to the same slot – a collision T 0 m - 1 h(k 1 ) h(k 4 ) h(k 2 ) = h(k 5 ) h(k 3 ) k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys)

10 1.10 Data Structure and Algorithm Resolving Collisions How can we solve the problem of collisions? Solution 1: chaining Solution 2: open addressing

11 1.11 Data Structure and Algorithm Chaining Chaining puts elements that hash to the same slot in a linked list: —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k4k4 k1k1 —— k7k7 k3k3 k8k8 k6k6 k5k5 k2k2

12 1.12 Data Structure and Algorithm Chaining (insert at the head) —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 ——

13 1.13 Data Structure and Algorithm Chaining (insert at the head) —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 —— k2k2 k3k3

14 1.14 Data Structure and Algorithm Chaining (insert at the head) —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 —— k2k2 k3k3 k4k4 k1k1

15 1.15 Data Structure and Algorithm Chaining (insert at the head) —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k1k1 —— k2k2 k3k3 k4k4 k1k1 k5k5 k2k2 k6k6

16 1.16 Data Structure and Algorithm Chaining (Insert to the head) —— T k4k4 k2k2 k3k3 k1k1 k5k5 U (universe of keys) K (actual keys) k6k6 k8k8 k7k7 k4k4 k1k1 —— k7k7 k3k3 k8k8 k6k6 k5k5 k2k2

17 1.17 Data Structure and Algorithm Operations Direct-Hash-Search(T,k) Search for an element with key k in list T[h(k)] (running time is proportional to length of the list) Direct-Hash-Insert(T,x) (worst case O(1)) Insert x at the head of the list T[h(key[x])] Direct-Hash-Delete(T,x) Delete x from the list T[h(key[x])] (same as searching)

18 1.18 Data Structure and Algorithm Open Addressing Basic idea (details in Section 12.4):  To insert: if slot is full, try another slot, …, until an open slot is found (probing)  To search, follow same sequence of probes as would be used when inserting the element  If reach element with correct key, return it  If reach a NULL pointer, element is not in table Good for fixed sets (adding but no deletion) Table needn’t be much bigger than n

19 1.19 Data Structure and Algorithm Choosing A Hash Function Choosing the hash function well is crucial  Bad hash function puts all elements in same slot  A good hash function:  Should distribute keys uniformly into slots  Should not depend on patterns in the data Three popular methods:  Division method  Multiplication method  Universal hashing

20 1.20 Data Structure and Algorithm The Division Method h(k) = k mod m  hash k into a table with m slots using the slot given by the remainder of k divided by m Elements with adjacent keys hashed to different slots: good If keys bear relation to m: bad In Practice: pick table size m = prime number not too close to a power of 2 (or 10)


Download ppt "1.1 Data Structure and Algorithm Lecture 9 Hashing Topics Reference: Introduction to Algorithm by Cormen Chapter 12: Hash Tables."

Similar presentations


Ads by Google