Download presentation
Presentation is loading. Please wait.
1
CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu 1 Notes #8
2
key h(key) Hashing...... Buckets (typically 1 disk block) 2
3
...... Two alternatives records...... (1) key h(key) 3
4
(2) key h(key) Index record key 1 Two alternatives Option 2 for “secondary” search key 4
5
Example hash function Key = ‘x 1 x 2 … x n ’ n byte character string Have b buckets h: add x 1 + x 2 + … + x n –compute sum modulo b 5
6
This may not be the best function… Read Knuth Vol. 3 if you really need to select a good function Good hash Expected number of function:keys/bucket is the same for all buckets 6
7
Within a bucket: Do we keep keys sorted? Yes, if CPU time critical & inserts/deletes not too frequent 7
8
Next:example to illustrate inserts, overflows, deletes h(key) 8
9
EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 01230123 d a c b 9
10
EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 01230123 d a c b h(e) = 1 e 10
11
01230123 a b c e d EXAMPLE: deletion DELETE: e f f g 11
12
01230123 a b c e d EXAMPLE: deletion DELETE: e f f g maybe move “g” up 12
13
01230123 a b c e d EXAMPLE: deletion DELETE: e f f g c maybe move “g” up 13
14
01230123 a b c e d EXAMPLE: deletion DELETE: e f f g c d maybe move “g” up 14
15
Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit 15
16
Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket 16
17
How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear 17
18
Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(k) use i grows over time… 00110101 18
19
(b) Use directory h(k)[i ] to bucket............ 19
20
Example: h(k) is 4 bits; 2 keys/bucket i = 1 1 1 0001 1001 1100 Insert 1010 20
21
Example: h(k) is 4 bits; 2 keys/bucket i = 1 1 1 0001 1001 1100 Insert 1010 1 1100 1010 21
22
Example: h(k) is 4 bits; 2 keys/bucket i = 1 1 1 0001 1001 1100 Insert 1010 1 1100 1010 New directory 2 00 01 10 11 i = 2 2 22
23
1 0001 2 1001 1010 2 1100 00 01 10 11 2 i = Example continued 23
24
1 0001 2 1001 1010 2 1100 Insert: 0111 00 01 10 11 2 i = Example continued 0111 24
25
1 0001 2 1001 1010 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 25
26
1 0001 2 1001 1010 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 0000 0111 0001 26
27
1 0001 2 1001 1010 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 0000 0111 0001 2 2 27
28
00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Example continued 28
29
00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Insert: 1001 Example continued 1001 1010 29
30
00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Insert: 1001 Example continued 1001 1010 000 001 010 011 100 101 110 111 3 i = 3 3 30
31
Extensible hashing: deletion No merging of blocks Merge blocks and cut directory if possible (Reverse insert procedure) 31
32
Note: Still need overflow chains Example: many records with duplicate keys 1 1101 1100 22 insert 1100 1100 if we split: 1101 ? 32
33
Solution: overflow chains 1 1101 1100 1 insert 1100 add overflow block: 1101 33
34
Extensible hashing: Searching input: a search key K \\ H is the hash function, D is the directory, i is the current bit number. 1.m = the first i bits of H(K); 2.read in the disk block B with the address D[m]. Summary A. 34
35
Extensible hashing: Insertion input: a tuple t with search key K \\ H is the hash function, D is the directory, i is the current bit number. 1.m = the first i bits of H(K); 2.read in the disk block B with address D[m]; 3.IF B has room THEN add t in B 4.ELSE let j be the bit number of B IF i = j THEN {double the size of D, i = i + 1; and let the pointers in the new D[2h] and D[2h+1] both equal to that in the old D[h], 0 ≤ h ≤ 2 i ; } split B and t into B 1 and B 2, both with block bit number j+1; let the two corresponding pointers in D go to B 1 and B 2, resp. Summary B. 35
36
Extensible hashing Can handle growing files - with less wasted space - with no full reorganizations Summary C. + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - - 36
37
Linear hashing Another dynamic hashing scheme Two ideas: (a) Use i low order bits of hash 01110101 grows b i (b) File grows linearly 37
38
Example b=4 bits, 2 keys/bucket 00 01 10 0101 11 0000 10 Future growth buckets 38
39
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) 00 01 n =10 0101 11 0000 10 Future growth buckets 39
40
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 0101 11 0000 10 Future growth buckets 40
41
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 0101 11 0000 10 Future growth buckets 41
42
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 01 11 00 10 Future growth buckets 42 Rules: If h(k)[i ] < n, then look at bucket h(k)[i ]
43
Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 0101 11 0000 10 Future growth buckets 43 Rules: If h(k)[i ] < n, then look at bucket h(k)[i ] If h(k)[i ] ≥ n, then look at bucket h(k)[i ] - 2 i -1 (i.e., replacing the leading bit 1 of h(k)[i] by 0)
44
Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 00 01 n =10 0101 1111 0000 1010 Future growth buckets 0101 can have overflow chains! insert 0101 Rules: If h(k)[i ] < n, then look at bucket h(k)[i ] If h(k)[i ] ≥ n, then look at bucket h(k)[i ] - 2 i -1 (i.e., replacing the leading bit 1 of h(k)[i] by 0) 44
45
Increase size: b=4 bits, n=10, i=2 00 01 10 n =11 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 45
46
Increase size: b=4 bits, n=11, i=2 00 01 10 n=11 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 46
47
Insertion: b=4 bits, n=11, i =2 00 01 10 n=11 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 47
48
Increase size: b=4 bits, n=11, i =2 00 01 1011 100 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 100 48
49
Increase size: b=4 bits, n=100, i =3 000 001 010 011 100 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 100 49
50
Increase size: b=4 bits, n=100, i=3 000 001 010 011 100 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 110 50 1111 0101
51
If U > threshold then increase m (and maybe i ) When do we expand file? Keep track of: # used slots total # of slots = U 51
52
Linear hashing: Searching input: a search key K \\ H is the hash function, i is the current bit number, \\ n is the current upper bound. 1. m = the last i bits of H(K); 2. IF m ≥ n THEN m = m – 2 i-1 ; 3. read in the disk block B with the address m \\ If K is not in B, you may have to read the \\ overflow blocks. Summary A. 52
53
Linear hashing: Insertion input: a tuple t with search key K \\ H is the hash function, i is the current bit number. \\ n is the current upper bound 1. m = the last i bits of H(K); 2. IF m ≥ n THEN m = m – 2 i-1 ; 3. read in the disk block B with the address m; insert t \\ If B is full, you need to use an overflow block. Summary B. 53
54
Linear hashing: Increasing hash table size \\ H is the hash function, i is the current bit number, \\ n is the current upper bound. 1.read in the disk block B of address n – 2 i -1 ; 2.split (properly) the tuples in B and put them in the block B and the block B’of address n; 3. n = n + 1; 4.IF # bits of n is increased THEN i = i + 1. Summary C. 54
55
Linear hashing: Decreasing hash table size \\ H is the hash function, i is the current bit number, \\ n is the current upper bound. 1.n = n - 1; 2.IF # bits of n is decreased THEN i = i − 1. 3.move the tuples in the block of address n to the block of address n – 2 i -1. Summary D. 55
56
Linear Hashing Can handle growing files - with less wasted space - with no full reorganizations No indirection like extensible hashing Summary E. + + Can still have overflow chains - 56
57
Example: BAD CASE Very full Very emptyNeed to move m here… Would waste space… 57
58
Hashing - How it works - Dynamic hashing - Extensible - Linear Summary 58
Similar presentations
© 2024 SlidePlayer.com Inc.
All rights reserved.