CPSC-608 Database Systems, Fall 2011
Instructor: Jianer Chen. Office: HRBB 315C. Phone: 845-4259. Email: chen@cse.tamu.edu
Notes #11

[Figure: DBMS architecture, labeled "graduate database". Components shown: database administrator (DDL language), database programmer (DML (query) language), DDL compiler, DML compiler, query execution engine, index/file manager, buffer manager, main memory buffers, file manager, transaction manager, concurrency control, lock table, logging & recovery, and secondary storage (disks) holding the data in tables (relations).]

The Main Purpose of Index Structures: speed up the search process
Example query: σ_{a=6}(R). With an index, we can quickly figure out which disk blocks contain the desired tuples; otherwise we have to scan the entire R.
Example: B+ trees

Another Index Structure: Hash Tables
A hash function h maps a search key k to a bucket number h(k). A bucket is typically a disk block (probably with overflow blocks).
h(k), with 0 ≤ h(k) ≤ b-1 for b buckets, gives an easy way to compute the bucket address (direct: the address is computed from h(k); indirect: h(k) is the index in a directory).

Example hash function
Key k = 'x_1 x_2 … x_n', an n-byte character string; we have b buckets.
Hash function: h(k) = (x_1 + x_2 + … + x_n) mod b
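A minimal Python sketch of this byte-sum hash function. The name byte_sum_hash and the choice of UTF-8 to turn the string into bytes are assumptions made for illustration; the slide only specifies the sum of the n bytes modulo b.

```python
def byte_sum_hash(key: str, b: int) -> int:
    """h(k) = (x_1 + x_2 + ... + x_n) mod b, where x_1..x_n are the bytes of the key."""
    # Assumption: the key is encoded as UTF-8 to obtain its bytes.
    return sum(key.encode("utf-8")) % b


# Bytes of "abc" are 97, 98, 99; their sum is 294, and 294 mod 4 = 2.
bucket_abc = byte_sum_hash("abc", 4)
# Any permutation of the same bytes has the same sum, so it collides.
bucket_cba = byte_sum_hash("cba", 4)
print(bucket_abc, bucket_cba)
```

Because the sum ignores byte order, keys that are permutations of each other always land in the same bucket, which is one concrete way this function can distribute keys unevenly.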

This may not be the best function… Read Knuth Vol. 3 if you really need to select a good function.
Good hash function: the expected number of keys/bucket is roughly the same for all buckets.

Within a bucket: do we keep keys sorted? Yes, if CPU time is critical and inserts/deletes are not too frequent.

Next: an example to illustrate inserts, overflows, and deletes.

EXAMPLE (two records/bucket)
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0
  bucket 0: d
  bucket 1: a, c
  bucket 2: b
  bucket 3: (empty)
INSERT: h(e) = 1. Bucket 1 is already full, so e goes into an overflow block chained to bucket 1.
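The insert example can be replayed with a small Python sketch. The Bucket class, BLOCK_CAPACITY, and the lookup table H are hypothetical names introduced here; the hash values are copied from the slide rather than computed, and the policy of filling the first block with free space is an assumption.

```python
BLOCK_CAPACITY = 2  # two records per block, as in the example


class Bucket:
    """A primary block followed by a chain of overflow blocks (hypothetical helper)."""

    def __init__(self):
        self.blocks = [[]]  # blocks[0] is the primary block

    def insert(self, key):
        # Assumed policy: use the first block in the chain that has room.
        for block in self.blocks:
            if len(block) < BLOCK_CAPACITY:
                block.append(key)
                return
        # Every block in the chain is full: add a new overflow block.
        self.blocks.append([key])


# Hash values taken from the slide: h(a)=1, h(b)=2, h(c)=1, h(d)=0, h(e)=1.
H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}

table = [Bucket() for _ in range(4)]
for key in "abcde":
    table[H[key]].insert(key)

layout = [bucket.blocks for bucket in table]
print(layout)
```

After the five inserts, bucket 1 holds a and c in its primary block and e in one overflow block.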

EXAMPLE: deletion
[Figure: buckets 0-3 holding the keys a, b, c, d, e, f, g; the exact block layout is not recoverable from the transcript.]
DELETE: e, then f (maybe move "g" up), then c. The last frame of the figure also marks d.

Rule of thumb: try to keep space utilization between 50% and 80%.
Utilization = (# keys used) / (total # keys that fit)
If < 50%, wasting space.
If > 80%, overflows become significant; how significant depends on how good the hash function is and on # keys/bucket.
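As a worked instance of the rule of thumb, the sketch below computes the utilization for the five keys a to e stored in four buckets of two slots each. The function names and the choice to count only primary blocks in "total # keys that fit" are assumptions made for illustration.

```python
def utilization(keys_used: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Utilization = (# keys used) / (total # keys that fit)."""
    # Assumption: capacity counts primary blocks only, not overflow blocks.
    return keys_used / (num_buckets * keys_per_bucket)


def assess(u: float) -> str:
    """Apply the 50%-80% rule of thumb."""
    if u < 0.5:
        return "wasting space"
    if u > 0.8:
        return "overflows significant"
    return "within target range"


u = utilization(keys_used=5, num_buckets=4, keys_per_bucket=2)
print(u, assess(u))  # 5 / 8 = 0.625, inside the 50%-80% band
```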

How do we cope with growth?
  Overflows and reorganizations
  Dynamic hashing:
    Extensible
    Linear

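Extensible hashing: two ideas
(a) Use i of the b bits output by the hash function h(k); i grows over time. Example: h(k) = 00110101 has b = 8 bits, of which the first i are used.
(b) Use a directory: the i-bit value h(k)[i] selects a directory entry, which points to a bucket.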
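Idea (a) amounts to taking the leading i bits of a b-bit hash value. A minimal sketch, assuming hash values are stored as Python integers; the name first_bits is hypothetical.

```python
def first_bits(hash_value: int, b: int, i: int) -> int:
    """Return the first (most significant) i bits of a b-bit hash value."""
    # Shifting right by b - i drops the low-order bits that are not yet used.
    return hash_value >> (b - i)


hv = 0b00110101  # the 8-bit value from the slide
prefixes = [first_bits(hv, 8, i) for i in (1, 2, 3, 4)]
print([format(p, "b") for p in prefixes])
```

As i grows from 1 to 4 the prefix of 00110101 goes 0, 00, 001, 0011, so the same key can be re-addressed with one more bit each time the directory doubles.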

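Extensible Hashing: General framework
[Figure: a key k is hashed to h(k); the first i bits, h(k)[i], index a directory with entries 00…00, 00…01, …, 11…11; each entry points to a bucket. The directory is labeled with i, the # bits used by the directory; each bucket is labeled with j (j1, j2, …), the # bits used by that bucket.]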

Example: h(k) is 4 bits; 2 keys/bucket
Initial state, i = 1:
  directory 0 -> bucket (j = 1): 0001
  directory 1 -> bucket (j = 1): 1001, 1100
Insert 1010: the bucket for prefix 1 is full and its j equals i, so a new directory with i = 2 is built and the bucket is split:
  directory 00, 01 -> bucket (j = 1): 0001
  directory 10 -> bucket (j = 2): 1001, 1010
  directory 11 -> bucket (j = 2): 1100

Example continued (i = 2)
Before:
  directory 00, 01 -> bucket (j = 1): 0001
  directory 10 -> bucket (j = 2): 1001, 1010
  directory 11 -> bucket (j = 2): 1100
Insert 0111: it fits in the j = 1 bucket, which now holds 0001, 0111.
Insert 0000: that bucket is full, but its j = 1 is smaller than i = 2, so it is split without doubling the directory:
  directory 00 -> bucket (j = 2): 0000, 0001
  directory 01 -> bucket (j = 2): 0111
  directory 10 -> bucket (j = 2): 1001, 1010
  directory 11 -> bucket (j = 2): 1100

Example continued
Insert 1000: the bucket for prefix 10 (1001, 1010) is full and its j = 2 equals i, so the directory doubles to i = 3 and the bucket is split:
  directory 000, 001 -> bucket (j = 2): 0000, 0001
  directory 010, 011 -> bucket (j = 2): 0111
  directory 100 -> bucket (j = 3): 1000, 1001
  directory 101 -> bucket (j = 3): 1010
  directory 110, 111 -> bucket (j = 2): 1100

Extensible hashing: deletion
Two options:
  No merging of blocks
  Merge blocks and cut the directory if possible (reverse of the insert procedure)

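Note: Still need overflow chains
Example: many records with duplicate keys. A block with bit number 1 holds 1101 and 1100; insert another 1100. If we split, the new blocks have bit number 2, but all three keys begin with 11 and stay together, and the two copies of 1100 can never be separated by any number of splits. Where would 1101 go?

Solution: overflow chains
Keep the block at bit number 1 and add an overflow block for the extra record instead of splitting.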

Summary A. Extensible hashing: Searching
input: a search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with the address D[m].

Summary B. Extensible hashing: Insertion
input: a tuple t with search key k
\\ h is the hash function, D is the directory, i is the current bit number.
1. m = the first i bits of h(k);
2. read in the disk block B with address D[m];
3. IF B has room THEN add t to B
4. ELSE let j be the bit number of B;
     IF i = j THEN { double the size of D; let the pointers in the new D[2h] and D[2h+1] both equal that in the old D[h], for 0 ≤ h ≤ 2^i - 1 (here h is a directory index, not the hash function); i = i + 1; }
     split B + t into B_1 and B_2, both with block bit number j+1; let the two corresponding pointers in D go to B_1 and B_2, resp.
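The search and insertion procedures can be sketched in Python and checked against the running example (h(k) is 4 bits; 2 keys/bucket). This is an illustration under stated assumptions, not a definitive implementation: the class and method names are hypothetical, keys are given directly as their 4-bit hash values, blocks live in memory rather than on disk, and a split that leaves one block over capacity is handled by re-inserting the keys so that it splits again. Overflow chains are not modeled, so duplicate keys beyond the block capacity raise an error.

```python
B_BITS = 4    # h(k) is 4 bits, as in the running example
CAPACITY = 2  # 2 keys per bucket


class Block:
    def __init__(self, j):
        self.j = j      # bit number of the block
        self.keys = []


class ExtensibleHash:
    def __init__(self):
        self.i = 1                       # bits used by the directory
        self.D = [Block(1), Block(1)]    # directory of block pointers

    def search(self, hv):
        """Steps 1-2: m = first i bits of h(k); return the block D[m]."""
        return self.D[hv >> (B_BITS - self.i)]

    def insert(self, hv):
        B = self.search(hv)
        if len(B.keys) < CAPACITY:       # step 3: B has room
            B.keys.append(hv)
            return
        if B.j == B_BITS:
            # Assumption: no overflow chains in this sketch.
            raise OverflowError("cannot split further; overflow chain needed")
        if B.j == self.i:
            # Double D: new D[2h] and D[2h+1] both copy old D[h].
            self.D = [self.D[m // 2] for m in range(2 * len(self.D))]
            self.i += 1
        j = B.j + 1
        B1, B2 = Block(j), Block(j)
        for m in range(len(self.D)):
            if self.D[m] is B:
                # The j-th leading bit of the entry index picks B1 or B2.
                self.D[m] = B2 if (m >> (self.i - j)) & 1 else B1
        # Redistribute B + t; re-inserting splits again if one side overflows.
        for key in B.keys + [hv]:
            self.insert(key)

    def snapshot(self):
        """(bit number, sorted keys as bit strings) for each directory entry."""
        return [(b.j, sorted(format(k, "04b") for k in b.keys)) for b in self.D]


table = ExtensibleHash()
for key in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1000):
    table.insert(key)
for m, entry in enumerate(table.snapshot()):
    print(format(m, "03b"), entry)
```

Inserting the example's keys in the order shown ends with i = 3 and five distinct blocks, the same final directory as in the worked example.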

Summary C. Extensible hashing
(+) Can handle growing files
      with less wasted space
      with no full reorganizations
(-) Indirection (not bad if the directory is in memory)
(-) Directory doubles in size (now it fits, now it does not)
