Presentation is loading. Please wait.

Presentation is loading. Please wait.

CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 1 Notes #8.

Similar presentations


Presentation on theme: "CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 1 Notes #8."— Presentation transcript:

1 CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 Email: chen@cse.tamu.edu 1 Notes #8

2 key  h(key) Hashing...... Buckets (typically 1 disk block) 2

3 ...... Two alternatives records...... (1) key  h(key) 3

4 (2) key  h(key) Index record key 1 Two alternatives Option 2 for “secondary” search key 4

5 Example hash function Key = ‘x 1 x 2 … x n ’ n byte character string Have b buckets h: add x 1 + x 2 + … + x n –compute sum modulo b 5

6  This may not be the best function…  Read Knuth Vol. 3 if you really need to select a good function Good hash  Expected number of function:keys/bucket is the same for all buckets 6

7 Within a bucket: Do we keep keys sorted? Yes, if CPU time critical & inserts/deletes not too frequent 7

8 Next:example to illustrate inserts, overflows, deletes h(key) 8

9 EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 01230123 d a c b 9

10 EXAMPLE 2 records/bucket INSERT: h(a) = 1 h(b) = 2 h(c) = 1 h(d) = 0 01230123 d a c b h(e) = 1 e 10

11 01230123 a b c e d EXAMPLE: deletion DELETE: e f f g 11

12 01230123 a b c e d EXAMPLE: deletion DELETE: e f f g maybe move “g” up 12

13 01230123 a b c e d EXAMPLE: deletion DELETE: e f f g c maybe move “g” up 13

14 01230123 a b c e d EXAMPLE: deletion DELETE: e f f g c d maybe move “g” up 14

15 Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit 15

16 Rule of thumb: Try to keep space utilization between 50% and 80% Utilization = # keys used total # keys that fit If < 50%, wasting space If > 80%, overflows significant depends on how good hash function is & on # keys/bucket 16

17 How do we cope with growth? Overflows and reorganizations Dynamic hashing Extensible Linear 17

18 Extensible hashing: two ideas (a) Use i of b bits output by hash function b h(k)  use i  grows over time… 00110101 18

19 (b) Use directory h(k)[i ] to bucket............ 19

20 Example: h(k) is 4 bits; 2 keys/bucket i = 1 1 1 0001 1001 1100 Insert 1010 20

21 Example: h(k) is 4 bits; 2 keys/bucket i = 1 1 1 0001 1001 1100 Insert 1010 1 1100 1010 21

22 Example: h(k) is 4 bits; 2 keys/bucket i = 1 1 1 0001 1001 1100 Insert 1010 1 1100 1010 New directory 2 00 01 10 11 i = 2 2 22

23 1 0001 2 1001 1010 2 1100 00 01 10 11 2 i = Example continued 23

24 1 0001 2 1001 1010 2 1100 Insert: 0111 00 01 10 11 2 i = Example continued 0111 24

25 1 0001 2 1001 1010 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 25

26 1 0001 2 1001 1010 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 0000 0111 0001 26

27 1 0001 2 1001 1010 2 1100 Insert: 0111 0000 00 01 10 11 2 i = Example continued 0111 0000 0111 0001 2 2 27

28 00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Example continued 28

29 00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Insert: 1001 Example continued 1001 1010 29

30 00 01 10 11 2 i = 2 1001 1010 2 1100 2 0111 2 0000 0001 Insert: 1001 Example continued 1001 1010 000 001 010 011 100 101 110 111 3 i = 3 3 30

31 Extensible hashing: deletion No merging of blocks Merge blocks and cut directory if possible (Reverse insert procedure) 31

32 Note: Still need overflow chains Example: many records with duplicate keys 1 1101 1100 22 insert 1100 1100 if we split: 1101 ? 32

33 Solution: overflow chains 1 1101 1100 1 insert 1100 add overflow block: 1101 33

34 Extensible hashing: Searching input: a search key K \\ H is the hash function, D is the directory, i is the current bit number. 1.m = the first i bits of H(K); 2.read in the disk block B with the address D[m]. Summary A. 34

35 Extensible hashing: Insertion input: a tuple t with search key K \\ H is the hash function, D is the directory, i is the current bit number. 1.m = the first i bits of H(K); 2.read in the disk block B with address D[m]; 3.IF B has room THEN add t in B 4.ELSE let j be the bit number of B IF i = j THEN {double the size of D, i = i + 1; and let the pointers in the new D[2h] and D[2h+1] both equal to that in the old D[h], 0 ≤ h ≤ 2 i ; } split B and t into B 1 and B 2, both with block bit number j+1; let the two corresponding pointers in D go to B 1 and B 2, resp. Summary B. 35

36 Extensible hashing Can handle growing files - with less wasted space - with no full reorganizations Summary C. + Indirection (Not bad if directory in memory) Directory doubles in size (Now it fits, now it does not) - - 36

37 Linear hashing Another dynamic hashing scheme Two ideas: (a) Use i low order bits of hash 01110101 grows b i (b) File grows linearly 37

38 Example b=4 bits, 2 keys/bucket 00 01 10 0101 11 0000 10 Future growth buckets 38

39 Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) 00 01 n =10 0101 11 0000 10 Future growth buckets 39

40 Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 0101 11 0000 10 Future growth buckets 40

41 Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 0101 11 0000 10 Future growth buckets 41

42 Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 01 11 00 10 Future growth buckets 42 Rules: If h(k)[i ] < n, then look at bucket h(k)[i ]

43 Example b=4 bits, 2 keys/bucket n = 10 (1 + the largest index of the used blocks) i = 2 (# used bits = # bits of n) 00 01 n =10 0101 11 0000 10 Future growth buckets 43 Rules: If h(k)[i ] < n, then look at bucket h(k)[i ] If h(k)[i ] ≥ n, then look at bucket h(k)[i ] - 2 i -1 (i.e., replacing the leading bit 1 of h(k)[i] by 0)

44 Insertion: b=4 bits, 2 keys/bucket, n=10, i=2 00 01 n =10 0101 1111 0000 1010 Future growth buckets 0101 can have overflow chains! insert 0101 Rules: If h(k)[i ] < n, then look at bucket h(k)[i ] If h(k)[i ] ≥ n, then look at bucket h(k)[i ] - 2 i -1 (i.e., replacing the leading bit 1 of h(k)[i] by 0) 44

45 Increase size: b=4 bits, n=10, i=2 00 01 10 n =11 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 45

46 Increase size: b=4 bits, n=11, i=2 00 01 10 n=11 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 46

47 Insertion: b=4 bits, n=11, i =2 00 01 10 n=11 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 47

48 Increase size: b=4 bits, n=11, i =2 00 01 1011 100 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 100 48

49 Increase size: b=4 bits, n=100, i =3 000 001 010 011 100 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 100 49

50 Increase size: b=4 bits, n=100, i=3 000 001 010 011 100 0101 1111 0000 1010 n = 10 (max used block) Future growth buckets 11 1010 0101 insert 0101 110 50 1111 0101

51 If U > threshold then increase m (and maybe i )  When do we expand file? Keep track of: # used slots total # of slots = U 51

52 Linear hashing: Searching input: a search key K \\ H is the hash function, i is the current bit number, \\ n is the current upper bound. 1. m = the last i bits of H(K); 2. IF m ≥ n THEN m = m – 2 i-1 ; 3. read in the disk block B with the address m \\ If K is not in B, you may have to read the \\ overflow blocks. Summary A. 52

53 Linear hashing: Insertion input: a tuple t with search key K \\ H is the hash function, i is the current bit number. \\ n is the current upper bound 1. m = the last i bits of H(K); 2. IF m ≥ n THEN m = m – 2 i-1 ; 3. read in the disk block B with the address m; insert t \\ If B is full, you need to use an overflow block. Summary B. 53

54 Linear hashing: Increasing hash table size \\ H is the hash function, i is the current bit number, \\ n is the current upper bound. 1.read in the disk block B of address n – 2 i -1 ; 2.split (properly) the tuples in B and put them in the block B and the block B’of address n; 3. n = n + 1; 4.IF # bits of n is increased THEN i = i + 1. Summary C. 54

55 Linear hashing: Decreasing hash table size \\ H is the hash function, i is the current bit number, \\ n is the current upper bound. 1.n = n - 1; 2.IF # bits of n is decreased THEN i = i − 1. 3.move the tuples in the block of address n to the block of address n – 2 i -1. Summary D. 55

56 Linear Hashing Can handle growing files - with less wasted space - with no full reorganizations No indirection like extensible hashing Summary E. + + Can still have overflow chains - 56

57 Example: BAD CASE Very full Very emptyNeed to move m here… Would waste space… 57

58 Hashing - How it works - Dynamic hashing - Extensible - Linear Summary 58


Download ppt "CPSC-608 Database Systems Fall 2010 Instructor: Jianer Chen Office: HRBB 315C Phone: 845-4259 1 Notes #8."

Similar presentations


Ads by Google