Hashing
Hashing is the transformation of a string of characters into a usually shorter, fixed-length value or key that represents the original string.
Example: a group of people could be arranged in a database like this:
Allen, Jane
Moore, Sarah
Smith, Dan

[Figure: HASHING. A hash function maps the hash keys (Allen, Jane; Moore, Sarah; Smith, Dan) to the hash values (7864, 9802, 1990) used in the hash table.]

Hash Table
A hash table stores things and allows 3 operations: insert, search and delete.
A hash table is associated with a set of records.

[Figure: a hash function H maps the key John Smith to a slot of a table holding the records Bob Miller 34, Sally Wood 21 and John Smith 29.]

Each slot of a hash table is called a bucket, and hash values are called bucket indices.
[Figure: the bucket index 7864 points to the bucket holding Allen, Jane.]

HASH FUNCTION
A mapping of the keys to indices of a hash table. It is the composition of 2 maps:
Hash code map: key -> integer
Compression map: integer -> [0, N-1]

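The two-step composition can be sketched as follows. This is only an illustration: the polynomial hash code with base 31 and the names hash_code, compress and hash_index are assumptions, not taken from the slides.

```python
def hash_code(key: str) -> int:
    """Hash code map: key -> integer (polynomial accumulation, base 31 assumed)."""
    code = 0
    for ch in key:
        code = code * 31 + ord(ch)
    return code


def compress(code: int, n: int) -> int:
    """Compression map: integer -> [0, n-1]."""
    return code % n


def hash_index(key: str, n: int) -> int:
    """Hash function = compression map applied to the hash code."""
    return compress(hash_code(key), n)


print(hash_index("Allen, Jane", 10))
```

Any other hash code map could be substituted; the compression step is what keeps the result inside the table.
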
DIVISION
Map a key k into one of m slots by using this function:
h(k) = k mod m
Example: if table size m = 12 and key k = 100, then h(100) = 100 mod 12 = 4

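A minimal sketch of the division method, using the slide's numbers (the function name division_hash is an illustrative choice):

```python
def division_hash(k: int, m: int) -> int:
    """Division method: h(k) = k mod m."""
    return k % m


print(division_hash(100, 12))  # 4
```
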
MID-SQUARE FUNCTION
The key is squared and the middle part is used as the address.
Ex. k = 3121, then 3121^2 = 9740641, thus h(3121) = 406

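The mid-square idea might be coded like this. The slide does not say how many digits to keep or how to centre them when the square has an even number of digits, so the default of three digits and the rounding rule below are assumptions.

```python
def mid_square_hash(k: int, digits: int = 3) -> int:
    """Square the key and return the middle `digits` digits of the square."""
    squared = str(k * k)
    # Centre the window; when it cannot be centred exactly, lean to the left.
    start = max(0, (len(squared) - digits) // 2)
    return int(squared[start:start + digits])


print(mid_square_hash(3121))  # 406
```
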
Folding
The key is divided into several parts, which are then combined to form the address.
2 types:
1. shift folding
2. boundary folding

Shift Folding
Ex. (SSN) 123-45-6789

Using three parts:
1. Divide into 3 parts: 123, 456 and 789.
2. Add them: 123 + 456 + 789 = 1368
3. h(k) = k mod M where M = 1000
   h(1368) = 1368 mod 1000 = 368

Using five parts:
1. Divide into five parts: 12, 34, 56, 78 and 9.
2. Add them: 12 + 34 + 56 + 78 + 9 = 189
3. h(k) = k mod M where M = 1000
   h(189) = 189 mod 1000 = 189

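Both variants of the SSN example can be reproduced with one small function. This is a sketch; the name shift_fold_hash and the part_len parameter are illustrative.

```python
def shift_fold_hash(key: str, part_len: int, m: int) -> int:
    """Shift folding: split the digits into parts, add them, reduce mod m."""
    digits = "".join(ch for ch in key if ch.isdigit())
    parts = [int(digits[i:i + part_len]) for i in range(0, len(digits), part_len)]
    return sum(parts) % m


print(shift_fold_hash("123-45-6789", 3, 1000))  # 368
print(shift_fold_hash("123-45-6789", 2, 1000))  # 189
```
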
Extraction
Only a part of the key is used to compute the address.
Ex. (SSN) 123-45-6789
1st 4 digits = 1234
Last 4 digits = 6789
1st 2 combined with the last 2 = 1289 (address)

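As a sketch of the last variant (first two digits combined with the last two), assuming the key is given as a string that may contain dashes:

```python
def extraction_hash(key: str) -> int:
    """Use the first two and last two digits of the key as the address."""
    digits = "".join(ch for ch in key if ch.isdigit())
    return int(digits[:2] + digits[-2:])


print(extraction_hash("123-45-6789"))  # 1289
```
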
Hash Method: Folding
Chop the key in two parts.
Add the two parts to generate the hash.
The leading digit will be ignored.
Option: rotate the second digit.
[Example tables with columns Key, Parts, H(x); the values are not recoverable.]

Radix Transformation
K is transformed into another number base.
Ex. 212 (base 10) = 255 (base 9)
M = 100
H(k) = k mod M
H(255) = 255 mod 100 = 55

To convert 212 to base 9, divide 212 by 9.
9 divides into 212 23 times with remainder 5: 212 = 9(23) + 5
9 divides into 23 twice with remainder 5: 23 = 9(2) + 5
So 212 = 9(9(2) + 5) + 5 = 2(9^2) + 5(9) + 5.

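The repeated division above, followed by the mod M step, can be sketched as below. The helper names to_base and radix_hash are illustrative, and the sketch assumes a base between 2 and 10 so that each digit is a single decimal character.

```python
def to_base(k: int, base: int) -> str:
    """Digits of non-negative k in the given base (2..10), by repeated division."""
    if k == 0:
        return "0"
    digits = []
    while k > 0:
        k, remainder = divmod(k, base)
        digits.append(str(remainder))
    return "".join(reversed(digits))


def radix_hash(k: int, base: int, m: int) -> int:
    """Rewrite k in `base`, read the digits as a decimal number, reduce mod m."""
    return int(to_base(k, base)) % m


print(to_base(212, 9))          # 255
print(radix_hash(212, 9, 100))  # 55
```
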
Hash Collision
A hash collision occurs when different keys happen to have the same hash value.

[Figure: inserting the record Jane Depp 18 into a table that already holds Bob Miller 34, Sally Wood 21 and John Smith 29 causes a collision.]

Collision Resolution
There are two kinds of collision resolution:
1. Chaining makes each entry a linked list so that when a collision occurs the new entry is added to the end of the list.
2. Open Addressing uses probing to discover an empty spot.

Collision Resolution: Open Addressing
The table is probed for an open slot when the first one already has an element.
Linear probing, in which the interval between probes is fixed, often at 1.
Quadratic probing, in which the interval between probes increases linearly (hence, the indices are described by a quadratic function).
Double hashing, in which the interval between probes is fixed for each record but is computed by another hash function.

Linear Probing
Linear probing is a scheme for resolving hash collisions by sequentially searching the hash table for a free location.
It uses two values: one as a starting value and one as an interval between successive values.
newLocation = (startingValue + stepSize) % arraySize
H(x, i) = (H(x) + i) mod M

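Linear Probing - Example
The table has 10 slots (0..9), all empty, and H(x) = x mod 10.
Insert 15, 17, 8:
H(15) = 15 mod 10 = 5
H(17) = 17 mod 10 = 7
H(8) = 8 mod 10 = 8
Table after these inserts: slot 5 = 15, slot 7 = 17, slot 8 = 8; all other slots are empty.
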
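Insert 35:
H(35) = 35 mod 10 = 5 (occupied)
H(35, 1) = (5 + 1) mod 10 = 6 (free, 35 goes here)

Insert 25:
H(25) = 25 mod 10 = 5 (occupied)
H(25, 1) = (5 + 1) mod 10 = 6 (occupied)
H(25, 2) = (5 + 2) mod 10 = 7 (occupied)
H(25, 3) = (5 + 3) mod 10 = 8 (occupied)
H(25, 4) = (5 + 4) mod 10 = 9 (free, 25 goes here)

Insert 75:
H(75) = 75 mod 10 = 5 (occupied)
H(75, 1) = (5 + 1) mod 10 = 6 (occupied)
H(75, 2) = (5 + 2) mod 10 = 7 (occupied)
H(75, 3) = (5 + 3) mod 10 = 8 (occupied)
H(75, 4) = (5 + 4) mod 10 = 9 (occupied)
H(75, 5) = (5 + 5) mod 10 = 0 (free, 75 goes here)

Final table: slot 0 = 75, slot 5 = 15, slot 6 = 35, slot 7 = 17, slot 8 = 8, slot 9 = 25; slots 1 to 4 are empty.
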
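The example can be replayed with a short sketch of linear probing. It assumes integer keys, H(x) = x mod M and a step size of 1; the function name linear_probe_insert is illustrative.

```python
def linear_probe_insert(table, key):
    """Insert key using H(x, i) = (H(x) + i) mod M; return the slot used."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("hash table is full")


table = [None] * 10
for key in (15, 17, 8, 35, 25, 75):
    linear_probe_insert(table, key)
print(table)  # [75, None, None, None, None, 15, 35, 17, 8, 25]
```
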
Has anyone spotted the flaw in the linear probing technique?
Think about this: what would happen if we now inserted 85, then 95, then 55?

Each one would probe exactly the same positions as its predecessors. This is known as clustering. It leads to inefficient operations, because it causes the number of collisions to be much greater than it need be.

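To make the cost of clustering concrete, the sketch below counts how many slots each of the three new keys has to examine. It starts from the table state reached in the linear probing example and assumes the same H(x) = x mod 10 with step size 1; insert_counting_probes is an illustrative name.

```python
def insert_counting_probes(table, key):
    """Linear-probe insert; return how many slots were examined."""
    m = len(table)
    for i in range(m):
        slot = (key % m + i) % m
        if table[slot] is None:
            table[slot] = key
            return i + 1
    raise RuntimeError("hash table is full")


# State of the table after inserting 15, 17, 8, 35, 25, 75 (slots 0..9).
table = [75, None, None, None, None, 15, 35, 17, 8, 25]
probes = {key: insert_counting_probes(table, key) for key in (85, 95, 55)}
print(probes)  # {85: 7, 95: 8, 55: 9}
```

Every new key that hashes to 5 walks the whole cluster and then makes it one slot longer.
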
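Quadratic Probing - Example
Table size is 11 (0..10)
Hash function: h(x) = x mod 11
Insert keys:
20 mod 11 = 9
30 mod 11 = 8
2 mod 11 = 2
13 mod 11 = 2 (occupied), so try 2 + 1^2 = 3
25 mod 11 = 3 (occupied), so try 3 + 1^2 = 4
24 mod 11 = 2 (occupied), so try 2 + 1^2 = 3 (occupied), then 2 + 2^2 = 6
10 mod 11 = 10
9 mod 11 = 9 (occupied), so try 9 + 1^2 = 10 (occupied), then (9 + 2^2) mod 11 = 2 (occupied), then (9 + 3^2) mod 11 = 7
Final table: slot 2 = 2, slot 3 = 13, slot 4 = 25, slot 6 = 24, slot 7 = 9, slot 8 = 30, slot 9 = 20, slot 10 = 10; slots 0, 1 and 5 are empty.
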
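The same sequence of inserts can be checked with a small sketch of quadratic probing, assuming integer keys and the probe positions (h(x) + i^2) mod 11 used above (quadratic_probe_insert is an illustrative name):

```python
def quadratic_probe_insert(table, key):
    """Insert key at (h(key) + i^2) mod M for the first i with a free slot."""
    m = len(table)
    home = key % m
    for i in range(m):
        slot = (home + i * i) % m
        if table[slot] is None:
            table[slot] = key
            return slot
    raise RuntimeError("no free slot on the probe sequence")


table = [None] * 11
for key in (20, 30, 2, 13, 25, 24, 10, 9):
    quadratic_probe_insert(table, key)
print(table)
# [None, None, 2, 13, 25, None, 24, 9, 30, 20, 10]
```
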
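Quadratic probing has a disadvantage: not all hash table slots will be on the probe sequence.
Using p(K, i) = i^2 gives particularly inconsistent results. For many table sizes, this probe function cycles through a relatively small number of slots.
If all slots on that cycle happen to be full, this means that the record cannot be inserted at all!
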
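A quick way to see this is to list which slots the sequence (home + i^2) mod M can ever reach. The sketch below is illustrative only; reachable_slots is an assumed helper name, and the table sizes 11 and 16 are chosen just for demonstration.

```python
def reachable_slots(home, table_size):
    """Slots visited by the probe positions (home + i^2) mod table_size."""
    # i^2 mod M repeats with period M, so checking i = 0..M-1 is enough.
    return sorted({(home + i * i) % table_size for i in range(table_size)})


print(reachable_slots(0, 11))  # [0, 1, 3, 4, 5, 9]
print(reachable_slots(0, 16))  # [0, 1, 4, 9]
```

With 16 slots only four of them are ever probed from a given home position, so an insert can fail while three quarters of the table is still free.
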
Double Hashing
Increment P, not by a constant but by an amount that depends on the Key.
Instead of P = (1 + P) mod TABLE_SIZE
use P = (P + INCREMENT(Key)) mod TABLE_SIZE

Double Hashing - Example
P = (P + INCR(Key)) mod TABLE_SIZE
Suppose INCR(Key) = 1 + (Key mod 7)
Adding 1 guarantees it is never 0!
Insert 15, 17, 8 (TABLE_SIZE = 10, H(Key) = Key mod 10): they go to slots 5, 7 and 8 without collision.

Insert 35:
P = H(35) = 5 (occupied)
P = (5 + (1 + 35 mod 7)) mod 10 = 6
Insert 25:
P = H(25) = 5 (occupied)
P = (5 + (1 + 25 mod 7)) mod 10 = 0

Let's try!
Insert 75:
P = (P + INCR(Key)) mod TABLE_SIZE
Suppose INCR(Key) = 1 + (Key mod 7)

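The worked inserts, and the exercise for 75, can be checked with a short sketch of double hashing. It assumes TABLE_SIZE = 10 and H(Key) = Key mod 10, as the arithmetic in the example suggests; the function names are illustrative.

```python
def incr(key):
    """Second hash: 1 + (key mod 7), never 0."""
    return 1 + key % 7


def double_hash_insert(table, key):
    """Insert key, stepping by incr(key) after each collision; return the slot."""
    m = len(table)
    p = key % m
    for _ in range(m):
        if table[p] is None:
            table[p] = key
            return p
        p = (p + incr(key)) % m
    raise RuntimeError("no free slot found on the probe sequence")


table = [None] * 10
slots = {key: double_hash_insert(table, key) for key in (15, 17, 8, 35, 25, 75)}
print(slots)  # {15: 5, 17: 7, 8: 8, 35: 6, 25: 0, 75: 1}
```

Note that when the increment shares a factor with the table size (for example an increment of 5 with 10 slots), the probe sequence visits only some of the slots, which is why prime table sizes are commonly preferred.
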
Chaining/Separate Chaining
Uses an array as the primary hash table: an array of lists of entries.

Chaining
One way to handle collision is to store the collided records in a linked list. The array now stores pointers to such lists. If no key maps to a certain hash value, that array entry points to nil.
[Figure: an array with entries 1 to HASHMAX; most entries point to nil, and one points to a list node with the fields key, name: tom, score: 73.]

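A minimal sketch of separate chaining with the three hash table operations. For brevity it uses Python lists as the chains instead of hand-built linked lists (so an unused entry holds an empty list rather than nil), and it relies on Python's built-in hash(); the class name ChainedHashTable is an illustrative choice.

```python
class ChainedHashTable:
    """Separate chaining: each array entry holds a list of (key, value) pairs."""

    def __init__(self, size=8):
        self.buckets = [[] for _ in range(size)]

    def _bucket(self, key):
        return self.buckets[hash(key) % len(self.buckets)]

    def insert(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # overwrite an existing key
                return
        bucket.append((key, value))  # new entry goes to the end of the chain

    def search(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        return None

    def delete(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                return True
        return False


table = ChainedHashTable(size=4)
table.insert("tom", 73)
print(table.search("tom"))  # 73
```
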
Coalesced Hashing
Coalesced hashing is a collision resolution method that uses pointers to connect the elements of a synonym chain.
A hybrid of separate chaining and open addressing.
Linked lists within the hash table handle collisions.
This strategy is effective, efficient and very easy to implement.

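One common way to realise this idea is sketched below. The details are assumptions rather than something the slides specify: integer keys, h(key) = key mod table size, and colliding keys placed in the highest-numbered free slot and linked to the end of the chain.

```python
class CoalescedHashTable:
    """Coalesced hashing: chains are linked through slots of the table itself."""

    def __init__(self, size):
        self.keys = [None] * size
        self.next = [None] * size  # index of the next slot in the chain

    def insert(self, key):
        slot = key % len(self.keys)
        if self.keys[slot] is None:
            self.keys[slot] = key
            return slot
        # Walk to the end of the chain that passes through the home slot.
        while True:
            if self.keys[slot] == key:
                return slot  # already present
            if self.next[slot] is None:
                break
            slot = self.next[slot]
        # Take the highest-numbered free slot and link it to the chain.
        for free in range(len(self.keys) - 1, -1, -1):
            if self.keys[free] is None:
                self.keys[free] = key
                self.next[slot] = free
                return free
        raise RuntimeError("hash table is full")

    def search(self, key):
        slot = key % len(self.keys)
        while slot is not None and self.keys[slot] is not None:
            if self.keys[slot] == key:
                return slot
            slot = self.next[slot]
        return None


t = CoalescedHashTable(10)
print([t.insert(k) for k in (15, 25, 35, 9)])  # [5, 9, 8, 7]
```

Key 9 hashes to slot 9, which already belongs to the chain started at slot 5, so the two synonym chains coalesce into one.
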
TOMBSTONE DELETION
Deleting a record must not hinder later searches. The search process must still pass through the newly emptied slot to reach records whose probe sequence passed through this slot, so deletion should not mark the slot as empty.
The freed slot should be available to a future insertion.
Both requirements are met by marking the freed slot with a special value, called a tombstone.

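A sketch of how tombstones can work with open addressing. It assumes integer keys and linear probing with H(x) = x mod M; the names EMPTY, TOMBSTONE and ProbingTable are illustrative.

```python
EMPTY = object()      # slot never used
TOMBSTONE = object()  # slot freed by a deletion


class ProbingTable:
    """Open addressing with linear probing and tombstone deletion."""

    def __init__(self, size):
        self.slots = [EMPTY] * size

    def _probe(self, key):
        m = len(self.slots)
        for i in range(m):
            yield (key % m + i) % m

    def search(self, key):
        for slot in self._probe(key):
            if self.slots[slot] is EMPTY:
                return None  # a truly empty slot ends the search
            if self.slots[slot] == key:  # a tombstone never equals a key
                return slot
        return None

    def insert(self, key):
        if self.search(key) is not None:
            return
        for slot in self._probe(key):
            if self.slots[slot] is EMPTY or self.slots[slot] is TOMBSTONE:
                self.slots[slot] = key  # tombstones are reused
                return
        raise RuntimeError("hash table is full")

    def delete(self, key):
        slot = self.search(key)
        if slot is not None:
            self.slots[slot] = TOMBSTONE


t = ProbingTable(10)
for k in (15, 25, 35):
    t.insert(k)
t.delete(25)
print(t.search(35))  # 7
```

The search for 35 passes through the tombstone in slot 6 instead of stopping there, and a later insert of a key that hashes to 5 can reuse that slot.
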
Perfect Hash Functions
Quick to compute
Distributes keys uniformly throughout the table
No collisions
Perfect hash functions are very rare (birthday paradox).

A Perfect Hash Function for Strings
R. J. Cichelli gave an algorithm for finding perfect hash functions for strings.
He proposes the hash function:
h(s) = (size + g(s.charAt(0)) + g(s.charAt(size-1))) % n
where size = s.length().
The function g is to be constructed so that h(s) is unique for each string s.

Example 1: Illustrating Perfect Hashing
Use Cichelli's algorithm to build a minimal perfect hash function for the following nine strings:
DO DOWNTO ELSE END IF IN TYPE VAR WITH

Example 1: Solution
For Step 1 in the algorithm, we find the frequencies of the first and last letter of each word:
D=3 O=2 E=4 I=2 F=1 N=1 T=1 V=1 R=1 W=1 H=1
Next we find the sum of the frequencies of the first and last letter of each word:
DO=5 (D+O=3+2), DOWNTO=5, ELSE=8, END=7, IF=3, IN=3, TYPE=5, VAR=2, WITH=2
Sorting the keywords in decreasing frequency yields:
ELSE END DOWNTO DO TYPE IN IF VAR WITH
We are now at step 5 of the algorithm, the heart of the algorithm. We try the words in frequency order:

Example 1: Cichelli's Method (cont'd)
(* marks a trial value that collides with a slot already taken)
s = ELSE    g(E)=0          h(s) = s.length()+g(E)+g(E) = 4
s = END     g(D)=0          h(s) = s.length()+g(E)+g(D) = 3
s = DOWNTO  g(O)=0          h(s) = s.length()+g(D)+g(O) = 6
s = DO                      h(s) = s.length()+g(D)+g(O) = 2
s = TYPE    g(T)=0          h(s) = s.length()+g(T)+g(E) = 4*
s = TYPE    g(T)=1          h(s) = s.length()+g(T)+g(E) = 5
s = IN      g(I)=0, g(N)=0  h(s) = s.length()+g(I)+g(N) = 2*
s = IN      g(I)=1, g(N)=0  h(s) = s.length()+g(I)+g(N) = 3*
s = IN      g(I)=2, g(N)=0  h(s) = s.length()+g(I)+g(N) = 4*
s = IN      g(I)=3, g(N)=0  h(s) = s.length()+g(I)+g(N) = 5*
s = IN      g(I)=3, g(N)=1  h(s) = s.length()+g(I)+g(N) = 6*
s = IN      g(I)=3, g(N)=2  h(s) = s.length()+g(I)+g(N) = 7
s = IF      g(F)=0          h(s) = s.length()+g(I)+g(F) = 5*
s = IF      g(F)=1          h(s) = s.length()+g(I)+g(F) = 6*
s = IF      g(F)=2          h(s) = s.length()+g(I)+g(F) = 7*
s = IF      g(F)=3          h(s) = s.length()+g(I)+g(F) = 8

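The g-values accepted in the trial steps above can be checked mechanically. The sketch below takes n = 9 (one slot per keyword) and uses only the g-values that appear in those steps; the values for V, R, W and H are not shown there, so VAR and WITH are left out rather than guessed.

```python
# g-values read off the accepted trial steps; V, R, W and H are not listed
# there, so VAR and WITH are not checked here.
g = {"E": 0, "D": 0, "O": 0, "T": 1, "I": 3, "N": 2, "F": 3}
n = 9  # number of keywords, so a minimal table has slots 0..8


def h(s):
    """Cichelli-style hash: (length + g(first letter) + g(last letter)) mod n."""
    return (len(s) + g[s[0]] + g[s[-1]]) % n


words = ["ELSE", "END", "DOWNTO", "DO", "TYPE", "IN", "IF"]
slots = {w: h(w) for w in words}
print(slots)
# {'ELSE': 4, 'END': 3, 'DOWNTO': 6, 'DO': 2, 'TYPE': 5, 'IN': 7, 'IF': 8}
assert len(set(slots.values())) == len(words)  # no collisions so far
```

Slots 0 and 1 remain free for the two keywords still to be placed.
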
Example 1: Cichelli's Algorithm (cont'd)
0 VAR
1 WITH
2 DO
3 END
4 ELSE
5 TYPE
6 DOWNTO
7 IN
8 IF
The hash table above is fully occupied, with no empty slots.
Note that if there are empty slots or there is a collision, then the g-value assignments are in error.

Hash Table Uses
driver's license records
Internet search engines
telephone book databases
electronic library catalogs
implementing passwords for systems with multiple users. Hash tables allow for a fast retrieval of the password which corresponds to a given username.