Hashing
Introduction - Hashing
- Use an array to store key values.
- Instead of storing each key in a position relative to the other keys (i.e. in ascending order), store it at a position computed from the key by a mathematical calculation.
- This mathematical calculation is called a “hash function”.
Discussion
- Advantages:
  - Fastest search times: O(1) in the best case, when the key sits in the bucket it hashes to.
  - Inserts and deletes are also fast.
- Disadvantages:
  - Key values are not available in order.
  - This eliminates the option of range queries.
  - What if two key values hash to the same place?
Basic Idea
- A hash algorithm has two parts:
  - The hash function
  - The collision resolution strategy
- Simple example:
  - A hash function for integers might be H(x) = x mod n, where “n” is the number of elements, or “buckets”, in the array.
  - To resolve collisions, simply search linearly for the next open bucket (linear probing).
Example
- First, hash 17:
Keys to hash: 17, 23, 89, 103, 44
H(x) = x mod 10 (10 buckets, numbered 0 to 9)
Example
- 17 hashes to bucket 7...no problem.
- Now hash 23:
Keys to hash: 23, 89, 103, 44
H(x) = x mod 10
Example
- 23 hashes to bucket 3...no problem.
- Now hash 89:
Keys to hash: 89, 103, 44
H(x) = x mod 10
Example
- 89 hashes to bucket 9...no problem.
- Now hash 103:
Keys to hash: 103, 44
H(x) = x mod 10
Example
- 103 hashes to bucket 3; since 23 is already there, place 103 in the next open bucket, bucket 4.
- Now hash 44:
Keys to hash: 44
H(x) = x mod 10
Example
- 44 hashes to bucket 4, which is full, so put 44 in bucket 5.
Keys to hash: done.
H(x) = x mod 10
Example
- Note that bucket 4 holds a key value (103) that does not hash to 4, while the key that does hash to 4 (44) sits in bucket 5.
- This makes delete tricky...
Keys to hash: done.
H(x) = x mod 10
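The insertion walk-through above can be reproduced with a short Python sketch. The function name `insert`, the use of `None` for an open bucket, and the wrap-around at the end of the array are assumptions made for illustration; the slides only specify H(x) = x mod 10 and "search linearly for the next open bucket".

```python
def insert(table, key):
    """Place key in the first open bucket at or after key mod n (linear probing)."""
    n = len(table)
    home = key % n                      # H(x) = x mod n
    for step in range(n):
        bucket = (home + step) % n      # wrap around at the end of the array
        if table[bucket] is None:       # None marks an open bucket
            table[bucket] = key
            return bucket
    raise RuntimeError("hash table is full")


table = [None] * 10
placed = [insert(table, key) for key in (17, 23, 89, 103, 44)]
print(placed)  # [7, 3, 9, 4, 5]
print(table)   # [None, None, None, 23, 103, 44, None, 17, None, 89]
```

Keys 103 and 44 end up one bucket past their home buckets, which is the cluster the last slide of the example points out.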
Searching a Hash Table
- Hash the key value to search for.
- If it is in that bucket, stop and report success.
- If not, search linearly from there until either:
  - it is found (success), or
  - an empty bucket is encountered (failure).
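A minimal Python sketch of this search, under the same assumptions as the insertion example (H(x) = x mod n, `None` marks an empty bucket, probing wraps around). The name `search` and the choice to return the bucket index are illustrative, not taken from the slides.

```python
def search(table, key):
    """Return the bucket that holds key, or None if key is not in the table."""
    n = len(table)
    home = key % n
    for step in range(n):
        bucket = (home + step) % n
        if table[bucket] is None:
            return None                 # empty bucket reached: failure
        if table[bucket] == key:
            return bucket               # found: success
    return None                         # every bucket probed without a match


# Table state after hashing 17, 23, 89, 103, 44 with H(x) = x mod 10.
table = [None, None, None, 23, 103, 44, None, 17, None, 89]
found_44 = search(table, 44)
found_13 = search(table, 13)
print(found_44)  # 5
print(found_13)  # None
```

The search for 13 starts at bucket 3, passes 23, 103 and 44, and stops at the empty bucket 6.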
Deleting from a Hash Table
- To delete, first hash the key value to find its bucket.
- If it is there, skip the next step.
- Otherwise, search linearly in the table until either:
  - you find it (success - continue), or
  - you reach an open bucket (failure - stop).
- Remove the key value from its bucket.
Deleting II
- But keys hashed earlier may have collided with this one and been pushed past it, so...
- For every key value between the current (now empty) bucket and the next open bucket:
  - Rehash the key value and move it if necessary.
Delete Example
- Delete 17:
Keys to delete: 17, 23, 103
H(x) = x mod 10
Delete Example
- 17 hashes to bucket 7. It is there, so remove it; bucket 8 is empty, so we’re done.
- Delete 23:
Keys to delete: 23, 103
H(x) = x mod 10
Delete Example
- 23 hashes to bucket 3; it is there, so remove it.
- Now rehash the keys in buckets 4 to 5 (bucket 6 is empty).
Keys to delete: 103
H(x) = x mod 10
Delete Example
- What do I do with 103, which sits in bucket 4 but hashes to bucket 3?
Keys to delete: 103
H(x) = x mod 10
Delete Example
- Move it to bucket 3.
- What about 44?
Keys to delete: 103
H(x) = x mod 10
Delete Example
- Move it to bucket 4.
- Done with delete of 23.
- Delete 103:
Keys to delete: 103
H(x) = x mod 10
Delete Example
- 103 hashes to bucket 3; it is there, so remove it.
- What about rehashes?
Keys to delete: done
H(x) = x mod 10
Delete Example
- I must rehash from bucket 4 to bucket 4 only, as bucket 5 is the next open bucket.
- 44 hashes to bucket 4, so no moves are necessary.
Keys to delete: done
H(x) = x mod 10
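The delete walk-through can be sketched in Python as follows. This is one possible reading of "rehash the key value and move it if necessary": each key after the freed bucket is lifted out and reinserted with the ordinary linear-probing insert. The names `insert` and `delete`, the `None` marker for open buckets, and the wrap-around are assumptions for illustration.

```python
def insert(table, key):
    """Linear-probing insert with H(x) = x mod n."""
    n = len(table)
    for step in range(n):
        bucket = (key % n + step) % n
        if table[bucket] is None:
            table[bucket] = key
            return bucket
    raise RuntimeError("hash table is full")


def delete(table, key):
    """Remove key, then rehash every key up to the next open bucket."""
    n = len(table)
    bucket = key % n
    for _ in range(n):
        if table[bucket] is None:
            return False                # open bucket reached: key is absent
        if table[bucket] == key:
            break                       # found the key to remove
        bucket = (bucket + 1) % n
    else:
        return False                    # probed every bucket without a match
    table[bucket] = None
    follower = (bucket + 1) % n
    while table[follower] is not None:
        # Lift the key out and reinsert it; it lands in its home bucket or
        # in the first open bucket after it, possibly the same place.
        moved, table[follower] = table[follower], None
        insert(table, moved)
        follower = (follower + 1) % n
    return True


table = [None, None, None, 23, 103, 44, None, 17, None, 89]
delete(table, 17)
delete(table, 23)
after_23 = list(table)
delete(table, 103)
print(after_23)  # [None, None, None, 103, 44, None, None, None, None, 89]
print(table)     # [None, None, None, None, 44, None, None, None, None, 89]
```

After deleting 23, keys 103 and 44 each move back one bucket, matching the slides; after deleting 103, key 44 is rehashed but stays in bucket 4.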
Picking a Good Hash Function
- Suppose I wish to hash the values to the right.
- Suppose further my table size is 16.
- I could try H(X) = X mod 16.
Picking a Good Hash Function II
- This doesn’t work out so well.
- What else can I do?
- Try looking at the bits:
[Table: Number, Mod]
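One reason to look at the bits: with a table size of 16, X mod 16 depends only on the lowest four bits of X. The short Python sketch below illustrates this with hypothetical keys (the values from the slide's table are not available, so these are an assumption chosen to make the effect obvious).

```python
# Hypothetical keys, NOT the values from the slide: all are multiples of 16,
# so their lowest four bits are all zero.
keys = [16, 32, 48, 64, 80]

buckets = [k % 16 for k in keys]
print(buckets)  # [0, 0, 0, 0, 0]

# For non-negative integers, x mod 16 is exactly the lowest four bits of x.
same_as_low_bits = all(x % 16 == (x & 0b1111) for x in range(1000))
print(same_as_low_bits)  # True

for k in keys:
    print(k, format(k, "08b"), k % 16)
```

If the keys differ mainly in their higher bits, as these do, a hash function built from those bits would spread them across more buckets.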
Picking a Good Hash Function III
[Table: Number, Mod]
Picking a Good Hash Function IV
[Table: Number, Mod, bit*8 + 8bit*4 + 16bit*2 + 32bit*]
Collision Resolution
- Linear probing often produces a “cluster” of values, which slows down finding and inserting a value.
- What else can we do?
  - Linear probing with a factor > 1.
  - Quadratic probing.
  - Double hashing.
  - Closed addressing.
Linear Probing with k>1
- Suppose we use a hash function such as
  - H(x) = (x + k*i) mod n
  - “k” is the multiplying constant.
  - Start with i=0 and proceed to n-1 until an open bucket is found.
- This will move values away from a cluster more quickly.
Linear Probing Problem
- In our previous example, n=16.
- With k=6, we have H(x) = (x + 6*i) mod 16.
- What is the probe sequence for x=3?
- 3, 9, 15, 5, 11, 1, 7, 13, 3
- OOPS!!! Not all buckets are reachable: the sequence repeats after only 8 of the 16 buckets.
Problem Solution
- “n” and “k” must be relatively prime.
- Thus, try it with k=3, using H(x) = (x + 3*i) mod 16.
- What is the probe sequence for x=3?
- 3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 10, 13, 0, 3
- The absolute best is to choose “n” prime, since every k from 1 to n-1 is then relatively prime to n.
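The two probe sequences can be checked with a few lines of Python. The helper name `probe_sequence` is hypothetical; the formula is the H(x) = (x + k*i) mod n given on the slides.

```python
from math import gcd


def probe_sequence(x, k, n):
    """Buckets visited by H(x) = (x + k*i) mod n for i = 0 .. n-1."""
    return [(x + k * i) % n for i in range(n)]


seq6 = probe_sequence(3, 6, 16)
seq3 = probe_sequence(3, 3, 16)

print(seq6[:8])                    # [3, 9, 15, 5, 11, 1, 7, 13]
print(len(set(seq6)), gcd(6, 16))  # 8 2
print(seq3)  # [3, 6, 9, 12, 15, 2, 5, 8, 11, 14, 1, 4, 7, 10, 13, 0]
print(len(set(seq3)), gcd(3, 16))  # 16 1
```

With k=6 the sequence reaches only 16 / gcd(6, 16) = 8 distinct buckets; with k=3 the gcd is 1 and all 16 buckets are reached.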
Quadratic Probing
- Uses something like H(x) = (x + i^2) mod n
- Advantage:
  - Moves values farther more quickly.
- Disadvantage:
  - Not guaranteed to probe every bucket.
  - (But it reaches at least half of the buckets if n is prime.)
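A small Python sketch makes the coverage difference concrete. The helper name `quadratic_probes`, the key x=3, and the table sizes 16 and 17 are assumptions chosen for illustration.

```python
def quadratic_probes(x, n):
    """Buckets visited by H(x) = (x + i^2) mod n for i = 0 .. n-1."""
    return [(x + i * i) % n for i in range(n)]


probes_16 = quadratic_probes(3, 16)   # n = 16 is not prime
probes_17 = quadratic_probes(3, 17)   # n = 17 is prime

print(len(set(probes_16)))       # 4
print(len(set(probes_17)))       # 9
print(len(set(probes_17[:9])))   # 9
```

With n=16 only 4 distinct buckets are ever probed, while with the prime n=17 the first 9 probes are all distinct, which is just over half the table.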
Double Hashing
- Uses something like
  - H1(x) = [x + i*H2(x)] mod n
  - H2(x) = R - (x mod R)
  - Here, it is very important to choose n and R prime, with R < n.
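As a rough sketch, the two formulas can be written in Python as below. The names `h2` and `double_hash_probes` and the values n=11, R=7 are assumptions for illustration; they merely satisfy the requirement that n and R be prime with R < n.

```python
def h2(x, r):
    """Step size H2(x) = R - (x mod R); always between 1 and R, never 0."""
    return r - (x % r)


def double_hash_probes(x, n, r):
    """Buckets visited by H1(x) = [x + i*H2(x)] mod n for i = 0 .. n-1."""
    step = h2(x, r)
    return [(x + i * step) % n for i in range(n)]


n, r = 11, 7                      # hypothetical primes with R < n
probes_3 = double_hash_probes(3, n, r)
probes_14 = double_hash_probes(14, n, r)

print(probes_3)   # [3, 7, 0, 4, 8, 1, 5, 9, 2, 6, 10]
print(probes_14)  # [3, 10, 6, 2, 9, 5, 1, 8, 4, 0, 7]
```

Keys 3 and 14 share the home bucket 3, but their step sizes differ (4 and 7), so they follow different probe sequences instead of piling into the same cluster. Because n is prime, each sequence visits all 11 buckets.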
Closed Addressing
- Idea: make collisions harmless by allowing more than one key value per bucket.
- How to allow multiple values per bucket?
  - Linked list
  - Binary search tree
  - AVL tree
  - Another hash table
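A minimal Python sketch of the idea, assuming H(x) = x mod n and using a plain Python list per bucket as a stand-in for the linked list option. The class name `ChainedHashTable` and its method names are hypothetical.

```python
class ChainedHashTable:
    """Hash table in which each bucket holds a list of keys."""

    def __init__(self, n):
        self.buckets = [[] for _ in range(n)]   # one independent list per bucket

    def _chain(self, key):
        return self.buckets[key % len(self.buckets)]   # H(x) = x mod n

    def insert(self, key):
        chain = self._chain(key)
        if key not in chain:        # ignore duplicates
            chain.append(key)

    def contains(self, key):
        return key in self._chain(key)

    def delete(self, key):
        chain = self._chain(key)
        if key in chain:
            chain.remove(key)       # no rehashing of other keys is needed
            return True
        return False


table = ChainedHashTable(10)
for key in (17, 23, 89, 103, 44):
    table.insert(key)
print(table.buckets[3], table.buckets[4])  # [23, 103] [44]

table.delete(23)
print(table.buckets[3], table.contains(103))  # [103] True
```

Here 23 and 103 share bucket 3, so 44 keeps its home bucket 4, and deleting 23 does not disturb any other key.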