
CS4432: Database Systems II


1 CS4432: Database Systems II
Lecture #12, Professor Elke A. Rundensteiner
CS 4432 lecture #11 - indexing & hashing

2 key → h(key) → bucket holding <key>
Buckets are typically one disk block each.

3 One example hash function
Key = 'x1 x2 … xn', an n-byte character string. With b buckets, an example hash function h adds the bytes and takes the sum modulo b:
$h(x_1 x_2 \ldots x_n) = (x_1 + x_2 + \cdots + x_n) \bmod b$
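A minimal Python sketch of this example function, assuming the key arrives as a Python `str` whose bytes are taken from its UTF-8 encoding (the slide only says "n-byte character string"). The name `sum_hash` is hypothetical.

```python
def sum_hash(key: str, b: int) -> int:
    """Add the bytes x1 + x2 + ... + xn of the key, then reduce modulo b."""
    # Iterating over a bytes object yields ints, so sum() adds the byte values
    return sum(key.encode("utf-8")) % b


# Map a few keys into b = 4 buckets
for k in ("a", "b", "ab", "ba"):
    print(k, sum_hash(k, 4))
```

Because addition ignores byte order, "ab" and "ba" always land in the same bucket, which is one reason this may not be the best function.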

4 This may not be the best function …
Read Knuth Vol. 3 if you really need to select a good function.
Good hash function: the expected number of keys/bucket is the same for all buckets.
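One way to check the "same expected number of keys per bucket" criterion is to hash a sample of keys and count how many land in each bucket; ideally every count is close to n / b. This sketch applies it to the byte-sum function of slide 3; the sample keys and the helper name `bucket_loads` are made up for illustration.

```python
from collections import Counter


def sum_hash(key: str, b: int) -> int:
    # Byte-sum modulo b, as in the slide 3 example
    return sum(key.encode("utf-8")) % b


def bucket_loads(keys, b):
    """Number of keys that hash into each of the b buckets."""
    counts = Counter(sum_hash(k, b) for k in keys)
    return [counts[j] for j in range(b)]  # Counter returns 0 for empty buckets


# Six permutations of "abc" plus two other keys
keys = ["abc", "acb", "bac", "bca", "cab", "cba", "abd", "xyz"]
b = 4
loads = bucket_loads(keys, b)
print(loads, "ideal keys/bucket:", len(keys) / b)
```

All six permutations of "abc" collide, so the loads come out far from the ideal of 2 per bucket: a skewed key set exposes a weak hash function.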

5 Within a bucket: do we keep keys sorted?
Yes, but only if CPU time is critical and inserts/deletes are not too frequent.
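Keeping a bucket sorted lets a lookup binary-search the block instead of scanning it, at the cost of shifting entries on every insert and delete. A small sketch using Python's standard `bisect` module; the `SortedBucket` class and its `capacity` parameter are hypothetical.

```python
import bisect


class SortedBucket:
    """One bucket (e.g. one disk block) whose keys are kept in sorted order."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.keys = []

    def insert(self, key) -> bool:
        if len(self.keys) >= self.capacity:
            return False  # full: the caller would need an overflow block
        bisect.insort(self.keys, key)  # shifts later keys: the extra insert cost
        return True

    def contains(self, key) -> bool:
        j = bisect.bisect_left(self.keys, key)  # binary search: less CPU than a scan
        return j < len(self.keys) and self.keys[j] == key

    def delete(self, key) -> bool:
        j = bisect.bisect_left(self.keys, key)
        if j < len(self.keys) and self.keys[j] == key:
            del self.keys[j]  # also shifts later keys
            return True
        return False


bucket = SortedBucket(capacity=4)
for k in (30, 10, 20):
    bucket.insert(k)
print(bucket.keys, bucket.contains(20))
```

Since the whole block is read with one IO either way, sorting only saves CPU work inside the block, which matches the slide's condition.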

6 Next: example to illustrate inserts, overflows, deletes

7 EXAMPLE: 2 records/bucket, buckets 0, 1, 2, 3
INSERT: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, h(e) = 1
[Figure: d goes to bucket 0; a and c fill bucket 1, so e goes to an overflow block chained to bucket 1; b goes to bucket 2.]

8 EXAMPLE: deletion
Delete: e, f
[Figure: the buckets before and after deleting e and f; after the deletions, maybe move "g" up out of its overflow block.]
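The insert and delete behaviour in these two examples can be sketched as a static hash table whose buckets are chains of fixed-size blocks: a primary block followed by overflow blocks. This is a minimal sketch, assuming 4 buckets, 2 records per block, and the h values from slide 7 supplied as a lookup table. On deletion it moves a record up from the last overflow block into the freed slot, one reading of "maybe move g up". The class and method names are hypothetical.

```python
class StaticHashTable:
    """Static hashing: each bucket is a chain [primary block, overflow block, ...]."""

    def __init__(self, h, num_buckets=4, block_capacity=2):
        self.h = h
        self.cap = block_capacity
        self.buckets = [[[]] for _ in range(num_buckets)]  # one empty primary block each

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < self.cap:
                block.append(key)
                return
        chain.append([key])  # every block in the chain is full: add an overflow block

    def delete(self, key) -> bool:
        chain = self.buckets[self.h(key)]
        for block in chain:
            if key in block:
                block.remove(key)
                last = chain[-1]
                if last is not block and last:
                    block.append(last.pop())  # move a record up from the last overflow block
                if len(chain) > 1 and not chain[-1]:
                    chain.pop()  # release an overflow block that is now empty
                return True
        return False


# h values from slide 7
h = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}.__getitem__
table = StaticHashTable(h)
for key in "abcde":
    table.insert(key)
print(table.buckets)
```

Deleting a from bucket 1 then pulls e up from the overflow block and frees that block, so the chain shrinks back to a single primary block.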

9 Rule of thumb: try to keep space utilization between 50% and 80%
$\text{Utilization} = \dfrac{\#\text{ keys used}}{\text{total }\#\text{ keys that fit}}$
If utilization < 50%, we are wasting space.
If utilization > 80%, overflows become significant; how significant depends on how good the hash function is and on the number of keys per bucket.
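As a worked example of the formula, assume "total # keys that fit" counts only the primary buckets (an assumption; the slide does not say whether overflow blocks count). The 5 records of the slide 7 example in 4 buckets of 2 then give 5 / 8. The helper names are hypothetical.

```python
def utilization(num_keys: int, num_buckets: int, keys_per_bucket: int) -> float:
    """Keys used divided by the total number of keys that fit in the primary buckets."""
    return num_keys / (num_buckets * keys_per_bucket)


def assess(u: float) -> str:
    # Thresholds from the 50%-80% rule of thumb
    if u < 0.5:
        return "below 50%: wasting space"
    if u > 0.8:
        return "above 80%: overflows likely significant"
    return "within the 50%-80% rule of thumb"


u = utilization(num_keys=5, num_buckets=4, keys_per_bucket=2)
print(f"{u:.1%}", assess(u))
```

At 62.5% this table sits inside the recommended band, even though bucket 1 already needed an overflow block because of how the keys hashed.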

10 How do we cope with growth?
- Overflows and reorganizations
- Dynamic hashing
  - Extensible hashing
  - Others …

11 Extensible hashing: idea 1
(a) Use i of the b bits output by the hash function h(K); i grows over time.
Note: this enables future doubling of space!

12 Extensible hashing: idea 2
(b) Hash to a directory of pointers to buckets (instead of to the buckets directly): h(K)[i], the first i bits of h(K), selects the directory entry that points to the bucket.
(c) Assume the directory of pointers is contiguous.
Note: double the space by doubling the directory!
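Ideas (a) to (c) can be sketched in a few lines, assuming the directory is indexed by the first (high-order) i bits, consistent with the 4-bit example in slides 13 to 15. Doubling the directory only copies pointers, so no record moves until a bucket is actually split. The function names and bucket labels are hypothetical.

```python
def dir_index(hash_value: int, i: int, b: int) -> int:
    """h(K)[i]: the first (high-order) i of the b hash bits, used as directory index."""
    return hash_value >> (b - i)


def double_directory(directory: list) -> list:
    """Entries 2j and 2j+1 of the new directory both point where entry j pointed."""
    return [ptr for ptr in directory for _ in range(2)]


b = 4
directory = ["bucket A", "bucket B"]  # i = 1: prefix 0 -> A, prefix 1 -> B
doubled = double_directory(directory)  # i = 2

# Doubling alone changes no mapping: every hash value still reaches the same bucket
for hv in range(2 ** b):
    assert doubled[dir_index(hv, 2, b)] == directory[dir_index(hv, 1, b)]
print(doubled)
```

This is why a contiguous directory makes "double the space" cheap: the old entry j simply appears twice, at 2j and 2j+1.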

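13 Example: h(k) is 4 bits; 2 keys/bucket
Insert 1010
[Figure: before the insert, i = 1, with directory entry 0 → {0001} and entry 1 → {1001, 1100}, both buckets with local depth 1. 1010 hashes to the full bucket for entry 1, so the new directory has i = 2 (entries 00, 01, 10, 11) and that bucket splits: 00 and 01 → {0001} (local depth 1), 10 → {1001, 1010} (local depth 2), 11 → {1100} (local depth 2).]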

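14 Example continued
Insert: 0111, 0000
[Figure: with i = 2, 0111 joins 0001 in the bucket shared by entries 00 and 01 (local depth 1). 0000 then overflows that bucket; because its local depth 1 is less than i = 2, the bucket splits without doubling the directory: 00 → {0000, 0001}, 01 → {0111}, both with local depth 2. Buckets {1001, 1010} and {1100} are unchanged.]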

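15 Example continued
Insert: 1001
[Figure: 1001 hashes to entry 10, whose bucket {1001, 1010} is full and has local depth 2 = i, so the directory doubles to i = 3 (entries 000 to 111) and the bucket splits on the third bit: 100 → {1001, 1001}, 101 → {1010}, both with local depth 3. The other buckets keep local depth 2: {0000, 0001}, {0111}, {1100}.]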
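The insert procedure of slides 13 to 15 can be sketched as follows. This is a minimal in-memory sketch, assuming 4-bit hash values, 2 keys per bucket, directory indexing by the first i bits, and a per-bucket "local depth" (our name for the number drawn on each bucket). A full bucket whose local depth equals i forces a directory doubling before the split; otherwise the bucket splits alone. The class names and the `OverflowError` guard for "all bits used" are hypothetical choices.

```python
class Bucket:
    def __init__(self, depth: int):
        self.depth = depth  # local depth: number of leading bits this bucket's keys share
        self.keys = []


class ExtensibleHash:
    def __init__(self, bits: int = 4, capacity: int = 2):
        self.bits = bits          # b: bits output by the hash function
        self.capacity = capacity  # keys per bucket
        self.i = 0                # global depth: leading bits used to index the directory
        self.directory = [Bucket(0)]

    def _index(self, h: int) -> int:
        return h >> (self.bits - self.i)  # first i bits of h

    def insert(self, h: int) -> None:
        while True:
            bucket = self.directory[self._index(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.depth == self.bits:
                raise OverflowError("all hash bits in use; overflow blocks would be needed")
            if bucket.depth == self.i:
                # Double the directory: entries 2j and 2j+1 both point where j pointed
                self.directory = [ptr for ptr in self.directory for _ in range(2)]
                self.i += 1
            self._split(bucket)  # then retry the insert

    def _split(self, bucket: Bucket) -> None:
        d = bucket.depth + 1
        zero, one = Bucket(d), Bucket(d)
        for k in bucket.keys:
            # The d-th leading bit of the key decides which half it goes to
            (one if (k >> (self.bits - d)) & 1 else zero).keys.append(k)
        for j, ptr in enumerate(self.directory):
            if ptr is bucket:
                # The d-th leading bit of the i-bit directory index picks the half
                self.directory[j] = one if (j >> (self.i - d)) & 1 else zero


# The first three inserts build slide 13's starting state; the rest replay slides 13 to 15
table = ExtensibleHash(bits=4, capacity=2)
for h in (0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001):
    table.insert(h)
print("i =", table.i)
for j, ptr in enumerate(table.directory):
    print(format(j, "03b"), ptr.depth, [format(k, "04b") for k in ptr.keys])
```

Several directory entries can share one bucket (for example 000 and 001 after the last insert), which is what lets the directory double without copying any records.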

16 Extensible hashing: deletion
Merge blocks and cut the directory if possible (reverse the insert procedure).
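A sketch of the reverse procedure, under assumptions the slide does not spell out: a bucket merges with its "buddy" (the bucket whose prefix differs only in the last of its d bits) when both have the same local depth and their keys fit in one bucket, and the directory is halved while every bucket's local depth is below i. It starts from the state after slide 14, built by hand, with 4-bit hash values and 2 keys per bucket; `Bucket` and `delete` are hypothetical names.

```python
class Bucket:
    def __init__(self, depth: int, keys):
        self.depth = depth  # local depth
        self.keys = list(keys)


def delete(directory, i, h, bits=4, capacity=2):
    """Delete hash value h, merge buddy buckets, and cut the directory if possible.
    Returns the (possibly smaller) directory and its global depth i."""
    idx = h >> (bits - i)  # first i bits of h
    bucket = directory[idx]
    bucket.keys.remove(h)

    # Reverse of a split: merge with the buddy bucket while the keys still fit
    while bucket.depth > 0:
        d = bucket.depth
        prefix = idx >> (i - d)  # this bucket's d-bit prefix
        buddy = directory[(prefix ^ 1) << (i - d)]  # prefix differing in the last bit
        if buddy.depth != d or len(bucket.keys) + len(buddy.keys) > capacity:
            break
        merged = Bucket(d - 1, buddy.keys + bucket.keys)
        start = (prefix >> 1) << (i - d + 1)
        for j in range(start, start + (1 << (i - d + 1))):
            directory[j] = merged
        bucket = merged

    # Reverse of doubling: halve the directory while no bucket needs all i bits
    while i > 0 and all(ptr.depth < i for ptr in directory):
        directory = directory[::2]
        i -= 1
    return directory, i


# State after slide 14 (i = 2), built by hand
directory = [Bucket(2, [0b0000, 0b0001]), Bucket(2, [0b0111]),
             Bucket(2, [0b1001, 0b1010]), Bucket(2, [0b1100])]
directory, i = delete(directory, 2, 0b0111)  # 00 and 01 merge; 10 and 11 still need 2 bits
directory, i = delete(directory, i, 0b1100)  # 10 and 11 merge, so the directory halves
print("i =", i, [[format(k, "04b") for k in ptr.keys] for ptr in directory])
# i = 1 [['0000', '0001'], ['1001', '1010']]
```

The directory can only be cut once no bucket depends on the last index bit, which is why the first deletion merges buckets but leaves i = 2.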

17 Extensible hashing: summary
If the directory fits into main memory, an access costs 1 IO; otherwise 2 IOs.
+ Can handle growing files with less wasted space
+ Can handle growing files with no full reorganizations
- Indirection (not bad if the directory is in memory)
- Directory doubles in size (now it fits, now it does not)

18 Use what when? Indexing: tree structures vs. hashing

19 Indexing vs. hashing
Hashing is good for equality probes on a given key, e.g.:
SELECT … FROM R WHERE R.A = 5

20 Indexing vs. hashing
Tree indexing (including B-trees) is good for range searches, e.g.:
SELECT … FROM R WHERE R.A > 5
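The contrast between the two queries can be sketched with in-memory stand-ins, assuming a small hypothetical relation R with unique A values: a Python `dict` plays the hash index and a sorted list searched with `bisect` plays the leaf level of a B-tree. The hash lookup answers R.A = 5 directly, but it has no notion of key order, so R.A > 5 needs the ordered structure (or a scan of every bucket).

```python
import bisect

# Hypothetical relation R as (A, row id) pairs
R = [(3, "r1"), (8, "r2"), (5, "r3"), (1, "r4"), (9, "r5")]

# Hash index on R.A (dict as a stand-in for a hash table)
hash_index = {a: rid for a, rid in R}

# Ordered index on R.A (sorted list as a stand-in for a B-tree's leaf level)
ordered = sorted(R)
ordered_keys = [a for a, _ in ordered]


def equality_probe(a):
    """WHERE R.A = a: one hash lookup."""
    return hash_index.get(a)


def range_search(a):
    """WHERE R.A > a: locate the first key > a, then scan forward in key order."""
    start = bisect.bisect_right(ordered_keys, a)
    return [rid for _, rid in ordered[start:]]


print(equality_probe(5))  # r3
print(range_search(5))    # ['r2', 'r5']
```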


Download ppt "CS4432: Database Systems II"

Similar presentations


Ads by Google