
11. Hash Tables. Hsu, Lih-Hsing, Computer Theory Lab.

11.1 Direct-address tables

Direct addressing is a simple technique that works well when the universe U of keys is reasonably small. Suppose that an application needs a dynamic set in which each element has a key drawn from the universe U = {0, 1, …, m−1}, where m is not too large. We shall assume that no two elements have the same key.

To represent the dynamic set, we use an array, or direct-address table, T[0..m−1], in which each position, or slot, corresponds to a key in the universe U.

(Figure: a direct-address table T[0..m−1]; slot k points to the element with key k, or contains NIL if the set has no such element.)

DIRECT-ADDRESS-SEARCH(T, k)
  return T[k]

DIRECT-ADDRESS-INSERT(T, x)
  T[key[x]] ← x

DIRECT-ADDRESS-DELETE(T, x)
  T[key[x]] ← NIL
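
The three constant-time operations above can be sketched in Python (a minimal illustration; the class and field names are ours, not from the slides):

```python
class Element:
    """An element with a key and satellite data."""
    def __init__(self, key, data):
        self.key = key
        self.data = data

class DirectAddressTable:
    """Direct-address table T[0..m-1]: one slot per possible key."""
    def __init__(self, m):
        self.slots = [None] * m

    def search(self, k):
        return self.slots[k]       # DIRECT-ADDRESS-SEARCH: return T[k]

    def insert(self, x):
        self.slots[x.key] = x      # DIRECT-ADDRESS-INSERT: T[key[x]] <- x

    def delete(self, x):
        self.slots[x.key] = None   # DIRECT-ADDRESS-DELETE: T[key[x]] <- NIL
```

Each operation is a single array access, so all three run in O(1) worst-case time.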

11.2 Hash tables

The difficulty with direct addressing is obvious: if the universe U is large, storing a table T of size |U| may be impractical, or even impossible. Furthermore, the set K of keys actually stored may be so small relative to U that most of the space allocated for T would be wasted. With hashing, the storage requirement can be reduced to O(|K|), even though searching for an element in the hash table still requires only O(1) expected time.

(Figure: a hash function h maps each key in the universe U to a slot of T[0..m−1]; distinct keys may map to the same slot.)

hash function: h : U → {0, 1, …, m−1}
hash table: T[0..m−1]
key k hashes to slot h(k); h(k) is the hash value of key k
collision: two distinct keys hash to the same slot

Collision resolution techniques: chaining, open addressing.

Collision resolution by chaining: In chaining, we put all the elements that hash to the same slot in a linked list.

(Figure: collision resolution by chaining; each nonempty slot T[j] points to a linked list of all the elements that hash to j.)

CHAINED-HASH-INSERT(T, x)
  insert x at the head of list T[h(key[x])]

CHAINED-HASH-SEARCH(T, k)
  search for an element with key k in list T[h(k)]

CHAINED-HASH-DELETE(T, x)
  delete x from the list T[h(key[x])]

Complexity: INSERT takes O(1) worst-case time; DELETE takes O(1) if the lists are doubly linked.
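
A chained hash table can be sketched as follows (a simple illustration of our own; it uses Python lists for the chains and the division method for h):

```python
class ChainedHashTable:
    """Hash table with chaining: each slot holds a list of (key, value) pairs."""
    def __init__(self, m):
        self.m = m
        self.table = [[] for _ in range(m)]

    def _h(self, k):
        return k % self.m                    # division-method hash, for illustration

    def insert(self, key, value):
        # CHAINED-HASH-INSERT: insert at the head of the chain, O(1)
        self.table[self._h(key)].insert(0, (key, value))

    def search(self, key):
        # CHAINED-HASH-SEARCH: scan the chain for the key
        for k, v in self.table[self._h(key)]:
            if k == key:
                return v
        return None

    def delete(self, key):
        # CHAINED-HASH-DELETE: remove the first matching entry from the chain
        chain = self.table[self._h(key)]
        for i, (k, _) in enumerate(chain):
            if k == key:
                del chain[i]
                return
```

With m = 7, the keys 5 and 12 collide (both hash to slot 5) and end up in the same chain.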

Analysis of hashing with chaining. Given a hash table T with m slots that stores n elements, define the load factor α = n/m (the average number of elements stored in a chain).

Assumption (simple uniform hashing): any given element is equally likely to hash into any of the m slots, independently of the other elements; computing the hash function takes O(1) time. For j = 0, 1, …, m−1, denote the length of the list T[j] by n_j, so that n = n_0 + n_1 + … + n_{m−1}, and the average value of n_j is E[n_j] = α = n/m.

Theorem 11.1
In a hash table in which collisions are resolved by chaining, an unsuccessful search takes expected time Θ(1 + α), under the assumption of simple uniform hashing.

Proof. The average length of the list T[h(k)] is α, so the expected number of elements examined in an unsuccessful search is α. The total time required (including the time for computing h(k)) is O(1 + α).

Theorem 11.2
In a hash table in which collisions are resolved by chaining, a successful search takes expected time Θ(1 + α), under the assumption of simple uniform hashing.

Proof. Assume that the key being searched for is equally likely to be any of the n keys stored in the table. Assume also that the CHAINED-HASH-INSERT procedure inserts a new element at the end of the list instead of at the front.

The expected number of elements examined in a successful search is
(1/n) Σ_{i=1}^{n} (1 + (i−1)/m) = 1 + (n−1)/(2m) = 1 + α/2 − α/(2n),
so the total time required for a successful search is Θ(2 + α/2 − α/(2n)) = Θ(1 + α).

11.3 Hash functions

What makes a good hash function? A good hash function satisfies (approximately) the assumption of simple uniform hashing: each key is equally likely to hash to any of the m slots.

Example: Assume the keys are random real numbers k independently and uniformly distributed in the range 0 ≤ k < 1. Set h(k) = ⌊km⌋; this satisfies simple uniform hashing.

Interpreting keys as natural numbers: most hash functions assume the keys are natural numbers. A character string can be interpreted as an integer in a suitable radix notation using its ASCII codes; for example, the string "pt" = (112, 116) in ASCII becomes 112·128 + 116 = 14452 as a radix-128 integer.
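
The radix conversion above can be written as a one-loop helper (the function name is ours):

```python
def string_to_key(s, radix=128):
    """Interpret the string s as a natural number in the given radix,
    using each character's code (ASCII for 7-bit strings) as a digit."""
    k = 0
    for ch in s:
        k = k * radix + ord(ch)   # shift left one radix position, add next digit
    return k
```

For "pt" this computes 112·128 + 116 = 14452, matching the example in the text.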

11.3.1 The division method

h(k) = k mod m.
Suggestion: choose m to be a prime not too close to an exact power of 2.
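
A direct transcription, with an arbitrary illustrative prime:

```python
def h_division(k, m):
    """Division-method hash: h(k) = k mod m.
    m should be a prime not too close to an exact power of 2."""
    return k % m

# 701 is a prime well away from the nearby powers of 2 (512 and 1024).
M = 701
```

For instance, h_division(123456, 701) maps the key 123456 to slot 80.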

11.3.2 The multiplication method

h(k) = ⌊m (kA mod 1)⌋, where 0 < A < 1 is a constant and kA mod 1 denotes the fractional part of kA.

Suggestion: choose A ≈ (√5 − 1)/2 = 0.6180339887… (Knuth).

Example: with k = 123456, m = 10000, and A = (√5 − 1)/2, we have kA = 76300.0041151…, so h(k) = ⌊10000 · 0.0041151…⌋ = 41.
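
The multiplication method is short enough to sketch directly (floating-point arithmetic is used here for illustration; a fixed-point variant with word-size multiplies is the usual practical implementation):

```python
import math

def h_multiplication(k, m, A=(math.sqrt(5) - 1) / 2):
    """Multiplication-method hash: h(k) = floor(m * (k*A mod 1)).
    The default A = (sqrt(5)-1)/2 is Knuth's suggested constant;
    m need not be prime here."""
    return math.floor(m * ((k * A) % 1))
```

Running it on the example from the text, h_multiplication(123456, 10000) yields 41.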

11.3.3 Universal hashing

Choose the hash function randomly, in a way that is independent of the keys that are actually going to be stored.

Let H be a finite collection of hash functions that map a given universe U of keys into the range {0, 1, …, m−1}. Such a collection is said to be universal if for each pair of distinct keys k, l ∈ U, the number of hash functions h ∈ H for which h(k) = h(l) is at most |H|/m.

In other words, with a hash function randomly chosen from H, the chance of a collision between distinct keys k and l is no more than the chance 1/m of a collision if h(k) and h(l) were chosen randomly and independently from the set {0, 1, …, m−1}.

Theorem 11.3
Suppose that a hash function h is chosen from a universal collection of hash functions and is used to hash n keys into a table T of size m, using chaining to resolve collisions. If key k is not in the table, then the expected length E[n_h(k)] of the list that key k hashes to is at most α. If key k is in the table, then the expected length E[n_h(k)] of the list containing key k is at most 1 + α.

Proof. For each pair k and l of distinct keys, define the indicator random variable X_kl = I{h(k) = h(l)}. Since H is universal, Pr{h(k) = h(l)} ≤ 1/m, and so E[X_kl] ≤ 1/m.

For each key k, define Y_k to be the number of keys other than k that hash to the same slot as k:
Y_k = Σ_{l∈T, l≠k} X_kl, and so E[Y_k] = Σ_{l∈T, l≠k} E[X_kl] ≤ Σ_{l∈T, l≠k} 1/m.

If k ∉ T, then n_h(k) = Y_k and |{l : l ∈ T and l ≠ k}| = n. Thus E[n_h(k)] = E[Y_k] ≤ n/m = α. If k ∈ T, then because key k appears in list T[h(k)] and the count Y_k does not include key k, we have n_h(k) = Y_k + 1 and |{l : l ∈ T and l ≠ k}| = n − 1. Thus E[n_h(k)] = E[Y_k] + 1 ≤ (n − 1)/m + 1 = 1 + α − 1/m < 1 + α.

Corollary 11.4
Using universal hashing and collision resolution by chaining in a table with m slots, it takes expected time Θ(n) to handle any sequence of n INSERT, SEARCH, and DELETE operations containing O(m) INSERT operations.

Proof. Since the number of insertions is O(m), we have n = O(m) and so α = O(1). The INSERT and DELETE operations take constant time and, by Theorem 11.3, the expected time for each SEARCH operation is O(1). By linearity of expectation, therefore, the expected time for the entire sequence of operations is O(n).

Designing a universal class of hash functions. We begin by choosing a prime number p large enough so that every possible key k is in the range 0 to p − 1, inclusive. Let Z_p denote the set {0, 1, …, p−1}, and let Z_p* denote the set {1, 2, …, p−1}. Since p is prime, we can solve equations modulo p with the methods given in Chapter 31. Because we assume that the size of the universe of keys is greater than the number of slots in the hash table, we have p > m.

We now define the hash function h_{a,b} for any a ∈ Z_p* and any b ∈ Z_p using a linear transformation followed by reductions modulo p and then modulo m:
h_{a,b}(k) = ((ak + b) mod p) mod m.
For example, with p = 17 and m = 6, we have h_{3,4}(8) = 5. The family of all such hash functions is
H_{p,m} = {h_{a,b} : a ∈ Z_p* and b ∈ Z_p}.
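
The family h_{a,b} is easy to sketch in Python (the function names are ours):

```python
import random

def h_ab(a, b, p, m):
    """Return the hash function h_{a,b}(k) = ((a*k + b) mod p) mod m."""
    return lambda k: ((a * k + b) % p) % m

def random_universal_hash(p, m):
    """Draw (a, b) uniformly from Z_p* x Z_p, giving a random member of H_{p,m}."""
    a = random.randrange(1, p)   # a in {1, ..., p-1}
    b = random.randrange(0, p)   # b in {0, ..., p-1}
    return h_ab(a, b, p, m)
```

With p = 17 and m = 6, h_ab(3, 4, 17, 6)(8) = ((3·8 + 4) mod 17) mod 6 = 11 mod 6 = 5, matching the example above.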

Each hash function h_{a,b} maps Z_p to Z_m. This class of hash functions has the nice property that the size m of the output range is arbitrary (not necessarily prime), a feature which we shall use later. Since there are p − 1 choices for a and p choices for b, there are p(p − 1) hash functions in H_{p,m}.

Theorem 11.5
The class H_{p,m} defined above is a universal class of hash functions.

Proof. Consider two distinct keys k and l from Z_p, so k ≠ l. For a given hash function h_{a,b}, let
r = (ak + b) mod p,
s = (al + b) mod p.
Note that r ≠ s, since r − s ≡ a(k − l) (mod p) and both a and k − l are nonzero modulo p. Moreover, given any pair of distinct values r and s, we can solve for the unique (a, b) that produces them:
a = ((r − s)((k − l)^(−1) mod p)) mod p,
b = (r − ak) mod p.
Hence, for any given pair of inputs k and l, if we pick (a, b) uniformly at random from Z_p* × Z_p, the resulting pair (r, s) is equally likely to be any pair of distinct values modulo p.

It then follows that the probability that distinct keys k and l collide is equal to the probability that r ≡ s (mod m) when r and s are chosen randomly as distinct values modulo p. For a given value of r, of the p − 1 possible remaining values for s, the number of values s such that s ≠ r and s ≡ r (mod m) is at most
⌈p/m⌉ − 1 ≤ ((p + m − 1)/m) − 1 = (p − 1)/m.
The probability that s collides with r when reduced modulo m is therefore at most ((p − 1)/m)/(p − 1) = 1/m.

11.4 Open addressing

All elements are stored in the hash table itself. The hash function takes the probe number as a second argument:
h : U × {0, 1, …, m−1} → {0, 1, …, m−1}.
With open addressing, we require that for every key k, the probe sequence ⟨h(k,0), h(k,1), …, h(k,m−1)⟩ be a permutation of ⟨0, 1, …, m−1⟩.

HASH-INSERT(T, k)
1  i ← 0
2  repeat j ← h(k, i)
3         if T[j] = NIL
4            then T[j] ← k
5                 return j
6            else i ← i + 1
7  until i = m
8  error "hash table overflow"

HASH-SEARCH(T, k)
1  i ← 0
2  repeat j ← h(k, i)
3         if T[j] = k
4            then return j
5         i ← i + 1
6  until T[j] = NIL or i = m
7  return NIL
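
The two procedures above translate directly to Python. This sketch (our own names; linear probing is used as the probe function, ahead of its definition below, just to make the class concrete) stores bare keys in the table itself:

```python
class OpenAddressTable:
    """Open-address hash table: all keys are stored in the table itself."""
    def __init__(self, m):
        self.m = m
        self.slots = [None] * m   # None plays the role of NIL

    def _probe(self, k, i):
        return (k + i) % self.m   # linear probing: h(k, i) = (k mod m + i) mod m

    def insert(self, k):
        # HASH-INSERT: probe until an empty slot is found
        for i in range(self.m):
            j = self._probe(k, i)
            if self.slots[j] is None:
                self.slots[j] = k
                return j
        raise OverflowError("hash table overflow")

    def search(self, k):
        # HASH-SEARCH: stop at the key, at an empty slot, or after m probes
        for i in range(self.m):
            j = self._probe(k, i)
            if self.slots[j] == k:
                return j
            if self.slots[j] is None:
                return None
        return None
```

With m = 8, inserting 3 and then 11 (which also hashes to slot 3) places 11 in slot 4 on the second probe.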

Linear probing: h(k, i) = (h′(k) + i) mod m, where h′ is an auxiliary hash function. It suffers from the primary clustering problem: long runs of occupied slots build up, increasing the average search time.

Quadratic probing: h(k, i) = (h′(k) + c₁i + c₂i²) mod m, where c₁ and c₂ are nonzero constants. It suffers from the milder secondary clustering problem: two keys with the same initial probe position have identical probe sequences.

Double hashing: h(k, i) = (h₁(k) + i·h₂(k)) mod m, where h₁ and h₂ are auxiliary hash functions. The value h₂(k) must be relatively prime to m for the entire table to be searched.
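
A double-hashing probe function can be sketched as follows (the parameter m2 for the second modulus is our own generalization; a common choice is m2 slightly less than m, e.g. m − 1 or m − 2):

```python
def double_hash_probe(k, i, m, m2):
    """Double hashing: h(k, i) = (h1(k) + i*h2(k)) mod m,
    with h1(k) = k mod m and h2(k) = 1 + (k mod m2).
    Adding 1 keeps h2(k) nonzero, so successive probes always advance."""
    h1 = k % m
    h2 = 1 + (k % m2)
    return (h1 + i * h2) % m
```

With m = 13 and m2 = 11, the key k = 14 has h₁(14) = 1 and h₂(14) = 4, so its probe sequence visits slots 1, 5, 9, ….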


Example: with m = 13, h₁(k) = k mod 13, and h₂(k) = 1 + (k mod 11), the key k = 14 has h₁(14) = 1 and h₂(14) = 4, so an insertion probes slots 1, 5, and 9 in turn.

Double hashing represents an improvement over linear and quadratic probing in that Θ(m²) probe sequences are used, rather than Θ(m), since each possible (h₁(k), h₂(k)) pair yields a distinct probe sequence. Its performance is closer to that of uniform hashing.

Analysis of open-address hashing

Theorem 11.6
Given an open-address hash table with load factor α = n/m < 1, the expected number of probes in an unsuccessful search is at most 1/(1 − α), assuming uniform hashing.

Proof. Define p_i = Pr{exactly i probes access occupied slots} for i = 0, 1, 2, …; note that p_i = 0 for i > n. The expected number of probes is 1 + Σ_{i≥0} i·p_i. Define q_i = Pr{at least i probes access occupied slots}; then Σ_{i≥0} i·p_i = Σ_{i≥1} q_i.

The first probe accesses an occupied slot with probability n/m; given that it does, the second probe accesses an occupied slot with probability (n−1)/(m−1); and so on. Hence for 1 ≤ i ≤ n,

q_i = (n/m)·((n−1)/(m−1))⋯((n−i+1)/(m−i+1)) ≤ (n/m)^i = α^i,

since (n−j)/(m−j) ≤ n/m whenever n < m; and q_i = 0 for i > n.

The expected number of probes is therefore

1 + Σ_{i≥1} q_i ≤ 1 + α + α² + α³ + ⋯ = 1/(1 − α).

Example: if α = 0.5, an unsuccessful search takes at most 1/(1 − 0.5) = 2 probes on average; if α = 0.9, it takes at most 1/(1 − 0.9) = 10 probes on average.
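
The bound of Theorem 11.6 is easy to evaluate numerically (a tiny helper of our own, not from the text):

```python
def expected_unsuccessful_probes(alpha):
    """Upper bound 1/(1 - alpha) on the expected number of probes
    in an unsuccessful search, assuming uniform hashing (Theorem 11.6)."""
    assert 0 <= alpha < 1, "load factor must satisfy 0 <= alpha < 1"
    return 1 / (1 - alpha)
```

At α = 0.5 the bound is 2 probes; at α = 0.9 it is 10 probes, illustrating how sharply search cost rises as the table fills.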

Corollary 11.7
Inserting an element into an open-address hash table with load factor α requires at most 1/(1 − α) probes on average, assuming uniform hashing.

Proof. An element is inserted only if there is room in the table, and thus α < 1. Inserting a key requires an unsuccessful search followed by placement of the key in the first empty slot found. Thus, the expected number of probes is at most 1/(1 − α).

Theorem 11.8
Given an open-address hash table with load factor α < 1, the expected number of probes in a successful search is at most (1/α) ln(1/(1 − α)), assuming uniform hashing and assuming that each key in the table is equally likely to be searched for.

Proof. A search for a key k follows the same probe sequence as was followed when the element with key k was inserted. By Corollary 11.7, if k was the (i+1)st key inserted into the hash table, the expected number of probes made in a search for k is at most 1/(1 − i/m) = m/(m − i).

Averaging over all n keys in the hash table gives the expected number of probes in a successful search:

(1/n) Σ_{i=0}^{n−1} m/(m − i) = (m/n) Σ_{i=0}^{n−1} 1/(m − i)
  = (1/α) Σ_{k=m−n+1}^{m} 1/k
  ≤ (1/α) ∫_{m−n}^{m} (1/x) dx
  = (1/α) ln(m/(m − n))
  = (1/α) ln(1/(1 − α)).

Example: if α = 0.5, the expected number of probes in a successful search is at most 2 ln 2 ≈ 1.39; if α = 0.9, it is at most (1/0.9) ln 10 ≈ 2.56.
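
The bound of Theorem 11.8 can likewise be evaluated directly (helper name is ours):

```python
import math

def expected_successful_probes(alpha):
    """Upper bound (1/alpha) * ln(1/(1 - alpha)) on the expected number of
    probes in a successful search, assuming uniform hashing (Theorem 11.8)."""
    assert 0 < alpha < 1, "load factor must satisfy 0 < alpha < 1"
    return (1 / alpha) * math.log(1 / (1 - alpha))
```

Note how much better successful searches fare than unsuccessful ones: even at α = 0.9 the bound is only about 2.56 probes, versus 10 for an unsuccessful search.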