# Part II Chapter 8 Hashing



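## Introduction

Suppose we perform insertion, searching, and deletion on a dictionary (symbol table) stored in an array, a linked list, or a tree. The costs are as follows, where h is the height of the tree:

| Operation | Array, sorted | Array, not sorted | Tree, unbalanced | Tree, balanced |
|---|---|---|---|---|
| Insertion | $O(n)$ | $O(1)$ | $O(h)$ | $O(\log_k n)$ |
| Searching | $O(\log n)$ | $O(n)$ | $O(h)$ | $O(\log_k n)$ |
| Deletion | $O(n)$ | $O(1)$ | $O(h)$ | $O(\log_k n)$ |

Is it possible to perform these operations in $O(1)$?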

If we can find a mapping from a key to an index, we can locate a record quickly by its key and access it directly (random access). (Figure: records S1, S2, S3, … mapped to indices 0, 1, 2, ….)

This mapping is called hashing: we define a function h such that h(Key) = i, where i is the index of the record's location. The function h is called a hash function. There are two kinds of hashing: static hashing and dynamic hashing.

## 8.2 Static Hashing

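### Definition

In static hashing, identifiers (keys) are stored in a table of fixed size called the hash table. The table consists of buckets, Bucket 0 through Bucket n.

Bucket: each bucket has its own address and is capable of holding keys. A hash function h maps an identifier x to its bucket address h(x). (Figure: identifier x passes through the hash function h to give the bucket address h(x); each bucket has two slots, slot1 and slot2.)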

Slot: each bucket may consist of s slots that hold synonyms. Two identifiers $i_1$ and $i_2$ are synonyms if $h(i_1) = h(i_2)$. Distinct synonyms are placed in the same bucket as long as the bucket has slots available.

### Example

Number of buckets: 26 (Bucket 0 to Bucket 25). Number of slots per bucket: 2. Define the hash function f(x) = {i | i is the order of the initial of x}, that is, the position of the first letter of x in the alphabet, counting A as 0. Then A and A2 are synonyms, and GA and GB are synonyms. If "Doll" enters, it is placed in bucket 3 according to the hash function. (Figure: buckets 0 to 25, each with two slots; bucket 0 holds A and A2.)

### Overflow and Collision

An overflow occurs when a new identifier is mapped into a full bucket. A collision occurs when two non-identical identifiers are hashed into the same bucket. If each bucket has only one slot, overflow and collision occur simultaneously. For example, when bucket 0 already holds A and A2 in its two slots and A3 enters bucket 0, A3 collides with A and A2, and the bucket overflows as well.
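The example above can be simulated with a small table. This is a minimal sketch, assuming 26 buckets of 2 slots, identifiers that start with a letter, and the initial-letter hash f with A mapped to 0; the class name `StaticHashTable` is hypothetical. It reports a collision whenever the home bucket already holds a synonym and raises an error on overflow.

```python
from typing import List


class StaticHashTable:
    """A fixed-size hash table with b buckets of s slots each (sketch)."""

    def __init__(self, b: int = 26, s: int = 2) -> None:
        self.s = s
        self.buckets: List[List[str]] = [[] for _ in range(b)]

    @staticmethod
    def f(x: str) -> int:
        # Order of the initial letter, counting A as 0 (so D -> 3, G -> 6).
        return ord(x[0].upper()) - ord("A")

    def insert(self, x: str) -> int:
        """Store x in its home bucket and return the bucket address."""
        i = self.f(x)
        bucket = self.buckets[i]
        if bucket:
            # Collision: x hashes to a bucket that already holds synonyms.
            print(f"collision: {x} with {bucket} in bucket {i}")
        if len(bucket) == self.s:
            # Overflow: every slot of the home bucket is taken.
            raise OverflowError(f"bucket {i} is full, cannot store {x}")
        bucket.append(x)
        return i


table = StaticHashTable()
for name in ["A", "A2", "GA", "GB", "Doll"]:
    table.insert(name)
try:
    table.insert("A3")
except OverflowError as err:
    print(err)
```

"Doll" lands in bucket 3, while A3 first collides with A and A2 and then overflows bucket 0, matching the two definitions.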

### 8.2.2 Hash Functions

Ideally, we want a hash function that is one-to-one and easy to compute. The hash function f(x) = {i | i is the order of the initial of x} can cause many collisions because it considers only the initial character. Key point: use as many characters of the identifier as possible.
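To illustrate the key point, the sketch below compares the initial-letter hash with one that uses every character. The all-characters version is a polynomial hash evaluated with Horner's rule, with `base = 31` and 26 buckets; both the technique and the constants are assumptions chosen for illustration, not taken from the slides.

```python
def initial_hash(x: str, b: int = 26) -> int:
    # f(x) from the slides: only the first character matters (A -> 0).
    return (ord(x[0]) - ord("A")) % b


def all_chars_hash(x: str, b: int = 26, base: int = 31) -> int:
    # Polynomial hash evaluated with Horner's rule: every character counts.
    h = 0
    for ch in x:
        h = (h * base + ord(ch)) % b
    return h


print(initial_hash("GA"), initial_hash("GB"))      # 6 6  (synonyms)
print(all_chars_hash("GA"), all_chars_hash("GB"))  # 4 5  (different buckets)
```

GA and GB are synonyms under f but land in different buckets once the second character contributes.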

Common approaches:

- Division
- Mid-square
- Folding
- Digit analysis

#### Division

Division is the most widely used hash function. The key k is divided by some number D, and the remainder is used as the bucket address: $h(k) = k \bmod D$. Since bucket addresses range from 0 to b-1 when there are b buckets, D is usually chosen to be the number of buckets, b.

#### Selecting the Divisor

When the divisor is even, odd integers hash into odd home buckets and even integers into even home buckets. With D = 14: 20 % 14 = 6, 30 % 14 = 2, 8 % 14 = 8 (even keys, even buckets); 15 % 14 = 1, 3 % 14 = 3, 23 % 14 = 9 (odd keys, odd buckets).

When the divisor is odd, odd (or even) integers may hash into any home bucket. With D = 15: 20 % 15 = 5, 30 % 15 = 0, 8 % 15 = 8; 15 % 15 = 0, 3 % 15 = 3, 23 % 15 = 8.

With an odd divisor, a bias in the keys (for example, mostly even keys) does not produce a bias toward either the odd or the even home buckets, so the home buckets have a better chance of being uniformly distributed. Therefore, do not use an even divisor.
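The parity effect can be checked directly. This sketch reproduces the slide's six keys and then hashes a hypothetical key set containing only even numbers; the helper names `division_hash` and `buckets_used` are illustrative.

```python
def division_hash(k: int, D: int) -> int:
    # h(k) = k % D: a bucket address in 0..D-1.
    return k % D


def buckets_used(keys, D: int) -> list:
    # Distinct home buckets reached by the given keys.
    return sorted({division_hash(k, D) for k in keys})


slide_keys = [20, 30, 8, 15, 3, 23]
print([division_hash(k, 14) for k in slide_keys])  # [6, 2, 8, 1, 3, 9]
print([division_hash(k, 15) for k in slide_keys])  # [5, 0, 8, 0, 3, 8]

even_keys = range(0, 100, 2)             # a key set biased toward even numbers
print(buckets_used(even_keys, 14))       # [0, 2, 4, 6, 8, 10, 12]
print(len(buckets_used(even_keys, 15)))  # 15
```

With D = 14, the even keys never reach the seven odd buckets; with D = 15, they spread over all fifteen.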

A similar biased distribution of home buckets is seen in practice when the divisor is a multiple of small primes such as 3, 5, 7, …. The effect of each prime divisor p of b decreases as p gets larger. Ideally, choose b to be a prime number. Alternatively, choose b so that it has no prime factor smaller than 20.
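The guideline can be turned into a small helper that picks a bucket count of at least a desired size. This is a sketch under the assumption that a prime b is always acceptable and that a composite b is acceptable when it has no prime factor below 20; the function names are hypothetical.

```python
from math import isqrt

SMALL_PRIMES = (2, 3, 5, 7, 11, 13, 17, 19)


def is_prime(n: int) -> bool:
    # Trial division up to the integer square root.
    return n >= 2 and all(n % d for d in range(2, isqrt(n) + 1))


def acceptable_divisor(b: int) -> bool:
    # Prime b is ideal; otherwise require no prime factor smaller than 20.
    return is_prime(b) or (b > 1 and all(b % p for p in SMALL_PRIMES))


def choose_bucket_count(n: int) -> int:
    """Smallest b >= n that satisfies the divisor guideline."""
    b = max(n, 2)
    while not acceptable_divisor(b):
        b += 1
    return b


print(choose_bucket_count(100))  # 101 (prime)
print(choose_bucket_count(528))  # 529 = 23 * 23, no prime factor below 20
```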

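#### Mid-square

Mid-square hashing squares the key and then uses an appropriate number of bits from the middle of the square as the bucket address. If r bits are used, the table has $2^r$ buckets. Example: suppose each character is represented in 6 bits, so the two-character identifier A1 forms a 12-bit key with value 92. Squaring it gives 92 × 92 = 8464, and the middle r bits of the binary representation of the square form the bucket address.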
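A bit-level version of mid-square might look like the sketch below. It assumes the square is viewed as a number of width 2 × 12 = 24 bits (twice the 12-bit key) and that "middle" means discarding equally many bits on each side; `r = 4` is an arbitrary illustrative choice, and the function name is hypothetical.

```python
def mid_square_bits(key: int, key_bits: int, r: int) -> int:
    """Return the middle r bits of key*key, viewed as a 2*key_bits-bit number."""
    square = key * key                   # fits in 2*key_bits bits
    shift = (2 * key_bits - r) // 2      # low-order bits to discard
    return (square >> shift) & ((1 << r) - 1)


print(92 * 92)                     # 8464
print(mid_square_bits(92, 12, 4))  # 8, a bucket address in 0..15
```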

Example: key = 113586 and m = 10000 buckets, so 9999 is the largest bucket address. Squaring the key gives 12901779396, and the middle four digits give h(x) = 1779.
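The decimal example can be reproduced as follows. When the number of leftover digits is odd, which four digits count as "the middle" is ambiguous; this sketch assumes the rule of discarding floor((n - 4) / 2) low-order digits, which reproduces 1779 for an 11-digit square. The function name is hypothetical.

```python
def mid_square_decimal(key: int, digits: int = 4) -> int:
    """Take `digits` digits from the middle of key*key (m = 10**digits buckets)."""
    square = key * key
    n = len(str(square))
    drop = max((n - digits) // 2, 0)    # low-order digits discarded
    return (square // 10 ** drop) % 10 ** digits


print(113586 ** 2)                 # 12901779396
print(mid_square_decimal(113586))  # 1779
```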

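#### Folding

The key k is partitioned into several parts of the same length (the last part may be shorter). These parts are then added together to obtain the hash address of k. There are two schemes: shift folding and folding at the boundaries. For example, the key 12320324111220 is partitioned into P1 = 123, P2 = 203, P3 = 241, P4 = 112, and P5 = 20.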

- Shift folding: add the parts as they are: $123 + 203 + 241 + 112 + 20 = 699$.
- Folding at the boundaries: reverse every other part (P2 and P4), as if the key were folded at the part boundaries: $123 + 302 + 241 + 211 + 20 = 897$.
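Both folding schemes fit in a few lines. This sketch assumes the key is handled as a digit string (so leading zeros in a part are preserved) and that parts are 3 digits wide, as in the example; the function names are hypothetical.

```python
from typing import List


def split_key(key: str, width: int) -> List[str]:
    # P1, P2, ...: equal-width parts, the last one possibly shorter.
    return [key[i:i + width] for i in range(0, len(key), width)]


def shift_fold(key: str, width: int = 3) -> int:
    return sum(int(p) for p in split_key(key, width))


def boundary_fold(key: str, width: int = 3) -> int:
    parts = split_key(key, width)
    # Reverse every other part (P2, P4, ...) before adding.
    return sum(int(p[::-1]) if i % 2 else int(p) for i, p in enumerate(parts))


print(split_key("12320324111220", 3))    # ['123', '203', '241', '112', '20']
print(shift_fold("12320324111220"))      # 699
print(boundary_fold("12320324111220"))   # 897
```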

### Overflow Handling

An overflow occurs when the home bucket for a new pair (key, element) is full. We may handle overflows by:

- Searching the hash table in some systematic fashion for a bucket that is not full:
  - Linear probing (linear open addressing)
  - Quadratic probing
  - Rehashing
- Eliminating overflows by permitting each bucket to keep a list of all pairs for which it is the home bucket:
  - Array linear list
  - Chain

#### Linear Probing

Linear probing, also called linear open addressing, searches the buckets one by one until an empty slot is found. Let b denote the number of buckets. The procedure is:

1. Compute h(k).
2. Examine the hash table buckets in the order ht[h(k)], ht[(h(k)+1) % b], …, ht[(h(k)+j) % b] until one of the following happens:
   - ht[(h(k)+j) % b] has a pair whose key is k: k is found.
   - ht[(h(k)+j) % b] is empty: k is not in the table.
   - The search returns to ht[h(k)]: the table is full.
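The procedure above can be sketched as a small class. For brevity it stores bare integer keys instead of (key, element) pairs and uses the division hash h(k) = k % b; the class and method names are hypothetical.

```python
from typing import List, Optional


class LinearProbingTable:
    def __init__(self, b: int) -> None:
        self.b = b
        self.ht: List[Optional[int]] = [None] * b

    def _probe(self, k: int) -> Optional[int]:
        """Index holding k or the first empty bucket; None if the table is full."""
        home = k % self.b                    # step 1: compute h(k)
        for j in range(self.b):              # step 2: ht[(h(k)+j) % b]
            i = (home + j) % self.b
            if self.ht[i] is None or self.ht[i] == k:
                return i
        return None                          # back at ht[h(k)]: table is full

    def search(self, k: int) -> bool:
        i = self._probe(k)
        return i is not None and self.ht[i] == k

    def insert(self, k: int) -> int:
        i = self._probe(k)
        if i is None:
            raise OverflowError("hash table is full")
        self.ht[i] = k
        return i


t = LinearProbingTable(17)
for k in (6, 12, 29):
    t.insert(k)
print(t.ht.index(29))                # 13: home bucket 12 is taken by 12
print(t.search(29), t.search(46))    # True False
```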

Example: divisor = b (number of buckets) = 17, so the bucket address is key % 17. Insert pairs whose keys are 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45. (Figure: hash table with buckets 0 to 16 after the insertions.)

Linear open addressing tends to create clusters, and these clusters grow larger as more synonyms enter.
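The example insertions and the resulting clusters can be computed directly. This sketch assumes bare integer keys, the divisor 17, and a table that never fills; a cluster is taken to be a maximal run of consecutive occupied buckets, wrapping from bucket 16 to bucket 0. The helper names are hypothetical.

```python
from typing import List, Optional


def build_table(keys, b: int = 17) -> List[Optional[int]]:
    ht: List[Optional[int]] = [None] * b
    for k in keys:
        i = k % b
        while ht[i] is not None:     # assumes the table never fills up
            i = (i + 1) % b
        ht[i] = k
    return ht


def clusters(ht: List[Optional[int]]) -> List[List[int]]:
    """Runs of consecutive occupied buckets, wrapping from b-1 to 0."""
    b = len(ht)
    empties = [i for i in range(b) if ht[i] is None]
    if not empties:
        return [list(range(b))]
    start, result, run = empties[0], [], []
    for step in range(1, b + 1):     # ends back at the empty start bucket
        i = (start + step) % b
        if ht[i] is None:
            if run:
                result.append(run)
                run = []
        else:
            run.append(i)
    return result


keys = [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]
ht = build_table(keys)
print(ht)
print([len(c) for c in clusters(ht)])   # [3, 9]
```

The last key, 45, has home bucket 11 but probes eight occupied buckets before landing in bucket 2, and buckets 11 through 16 and 0 through 2 end up as one cluster of nine.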

#### Quadratic Probing

Quadratic probing uses $i^2$ as the increment. When an overflow occurs, the search examines buckets $h(x)$, $(h(x)+i^2) \bmod b$, and $(h(x)-i^2) \bmod b$ for $1 \le i \le (b-1)/2$, where b is a prime number of the form $4j+3$, for example b = 3, 7, 11, …, 43, 59, …. With such a b, this sequence examines every bucket in the table.
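The probe sequence can be written as a generator. This sketch (function name hypothetical) yields h(x) followed by the ± i² positions; the usage shows full coverage for b = 7 and, for contrast, only 7 distinct buckets for b = 13, a prime of the form 4j+1.

```python
from typing import Iterator


def quadratic_probe(home: int, b: int) -> Iterator[int]:
    """Yield h(x), then (h(x)+i*i) % b and (h(x)-i*i) % b for 1 <= i <= (b-1)//2."""
    yield home % b
    for i in range(1, (b - 1) // 2 + 1):
        yield (home + i * i) % b
        yield (home - i * i) % b     # Python's % is non-negative for b > 0


print(list(quadratic_probe(0, 7)))        # [0, 1, 6, 4, 3, 2, 5]
print(len(set(quadratic_probe(0, 13))))   # 7: 13 = 4*3 + 1 does not qualify
```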

#### Rehashing

If an overflow occurs at $h_i(x)$, try $h_{i+1}(x)$. A series of hash functions $h_1, h_2, \ldots, h_m$ is used to find an empty bucket.
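Rehashing can be sketched as a table that tries each function in turn. The three hash functions in the usage are arbitrary placeholders chosen only to force overflows in a 7-bucket table; the class name is hypothetical, and search assumes no deletions (it stops at the first empty bucket).

```python
from typing import Callable, List, Optional, Sequence


class RehashingTable:
    def __init__(self, b: int, hash_functions: Sequence[Callable[[int], int]]) -> None:
        self.b = b
        self.hs = list(hash_functions)          # h_1, h_2, ..., h_m
        self.ht: List[Optional[int]] = [None] * b

    def insert(self, x: int) -> int:
        for h in self.hs:                       # overflow at h_i: try h_{i+1}
            i = h(x) % self.b
            if self.ht[i] is None or self.ht[i] == x:
                self.ht[i] = x
                return i
        raise OverflowError(f"h_1..h_{len(self.hs)} all hit full buckets for {x}")

    def search(self, x: int) -> bool:
        for h in self.hs:
            i = h(x) % self.b
            if self.ht[i] is None:              # x would have been stored here
                return False
            if self.ht[i] == x:
                return True
        return False


table = RehashingTable(7, [lambda k: k, lambda k: k // 7, lambda k: 3 * k + 1])
print([table.insert(k) for k in (3, 10, 17)])   # [3, 1, 2]
```

Inserting 24 next fails, because all three functions map it to the occupied bucket 3.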

#### Chaining

A disadvantage of linear probing is that a search may compare the key against identifiers with different hash values. Chaining uses a linked list to connect the identifiers with the same hash value, which also increases the capacity of each bucket. (Figure: chained hash table with buckets [0] to [16] holding the keys 6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45.)
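A chained table for the same keys and divisor 17 might look like this sketch. It assumes bare integer keys and inserts new nodes at the head of each chain; the class names are hypothetical. A search only walks the chain of the key's home bucket, so it never compares keys with different hash values.

```python
from typing import List, Optional


class Node:
    def __init__(self, key: int, nxt: Optional["Node"] = None) -> None:
        self.key = key
        self.next = nxt


class ChainedHashTable:
    def __init__(self, b: int = 17) -> None:
        self.b = b
        self.heads: List[Optional[Node]] = [None] * b

    def search(self, k: int) -> bool:
        node = self.heads[k % self.b]        # only synonyms of k are compared
        while node is not None:
            if node.key == k:
                return True
            node = node.next
        return False

    def insert(self, k: int) -> None:
        if not self.search(k):
            i = k % self.b
            self.heads[i] = Node(k, self.heads[i])   # prepend in O(1)

    def chain(self, i: int) -> List[int]:
        out: List[int] = []
        node = self.heads[i]
        while node is not None:
            out.append(node.key)
            node = node.next
        return out


table = ChainedHashTable(17)
for k in [6, 12, 34, 29, 28, 11, 23, 7, 0, 33, 30, 45]:
    table.insert(k)
print(table.chain(11))                     # [45, 11, 28]
print(table.search(45), table.search(62))  # True False
```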
