CS 4432 lecture #10 - indexing & hashing. CS4432: Database Systems II, Lecture #10, Professor Elke A. Rundensteiner.

2 Chapter 4 – INDEXING wrap-up: (1) B+-tree odds and ends; (2) hashing (briefly).

3 B+tree example (n = 3). Root: [100]. Interior nodes: [30] (left) and [120 150 180] (right). Leaves, left to right: [3 5 11] [30 35] [100 101 110] [120 130] [150 156 179] [180 200].
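A lookup in this tree reads one node per level, always descending into the child whose key range covers the search key. The sketch below encodes the slide's tree as it appears to be laid out (reconstructed from the flattened figure). The `Node` class and `search` function are hypothetical names, and leaf sibling pointers and record pointers are omitted for brevity.

```python
from bisect import bisect_right


class Node:
    """A B+tree node: interior nodes have children, leaves do not."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children  # None marks a leaf

    def is_leaf(self):
        return self.children is None


def search(root, key):
    """Return (found, nodes_read) for a lookup that starts at the root."""
    node, nodes_read = root, 1
    while not node.is_leaf():
        # Child i covers keys >= keys[i-1] and < keys[i]; bisect_right picks i.
        node = node.children[bisect_right(node.keys, key)]
        nodes_read += 1
    return key in node.keys, nodes_read


# The n = 3 tree from the slide (record pointers and sibling links omitted).
leaves = [Node(k) for k in ([3, 5, 11], [30, 35], [100, 101, 110],
                            [120, 130], [150, 156, 179], [180, 200])]
root = Node([100], [Node([30], leaves[:2]), Node([120, 150, 180], leaves[2:])])
```

Every lookup here costs three node reads (root, interior node, leaf), whether or not the key is present.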

4 Comparison: B-tree vs. indexed sequential file. Indexed sequential file: less space, so lookup is faster; inserts are managed by an overflow area; requires temporary restructuring; unpredictable performance. B-tree: consumes more space, so lookup is slower; each insert/delete potentially restructures; built-in restructuring; predictable performance.

5 Also, with an indexed sequential file the DBA does not know when to reorganize, or how full to load the pages of a new index. B-trees are better …

6 À la buffering: is LRU a good policy for B+tree buffers? Of course not! We should try to keep the root in memory at all times (and perhaps some nodes from the second level).
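One way to act on this is to run LRU for ordinary pages but exempt pinned pages, such as the root, from eviction. The class below is a minimal sketch under that assumption; `PinningLRUBuffer`, its method names, and the string stand-ins for page contents are all hypothetical, and real buffer managers also track dirty pages and pin counts.

```python
from collections import OrderedDict


class PinningLRUBuffer:
    def __init__(self, capacity, pinned=()):
        self.capacity = capacity
        self.pinned = set(pinned)    # page ids that must never be evicted
        self.pages = OrderedDict()   # page id -> contents, least recent first
        self.disk_reads = 0

    def get(self, page_id):
        if page_id in self.pages:
            self.pages.move_to_end(page_id)  # mark as most recently used
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            self._evict()
        self.disk_reads += 1
        self.pages[page_id] = f"<contents of {page_id}>"  # stand-in for a disk read
        return self.pages[page_id]

    def _evict(self):
        # Evict the least recently used page that is not pinned.
        victim = next((p for p in self.pages if p not in self.pinned), None)
        if victim is None:
            raise RuntimeError("every buffered page is pinned")
        del self.pages[victim]


def count_reads(pinned):
    """Three lookups each touch the root and then a different leaf."""
    buf = PinningLRUBuffer(capacity=2, pinned=pinned)
    for page in ["root", "leaf1", "leaf2", "leaf3", "root"]:
        buf.get(page)
    return buf.disk_reads
```

With two buffer slots, pinning the root gives 4 disk reads for this access pattern, while plain LRU evicts the root during the leaf accesses and needs 5.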

7 Interesting problem: for a B+tree, how large should n be? (n is the number of keys per node.)

8 Assumptions: n children per node and N records in the database. (1) Time to read a B-tree node from disk is $t_{seek} + t_{read} \cdot n$ msec. (2) Once the node is in main memory, binary search locates the key in $a + b \log_2 n$ msec. (3) We need to search (read) $\log_n N$ tree nodes. (4) Therefore $t_{search} = \left(t_{seek} + t_{read} \cdot n + a + b \log_2 n\right) \log_n N$.

9 We can get f(n) = time to find a record. [Plot: f(n) versus n, with its minimum at n_opt.] Find n_opt by setting f'(n) = 0. What happens to n_opt as the disk gets faster? As the CPU gets faster? …
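To experiment with these questions numerically, the sketch below evaluates f(n) from slide 8's formula and finds n_opt by brute force over integers instead of solving f'(n) = 0 analytically. The parameter values (10 ms seek, 0.01 ms per key read, a = 0.1 ms, b = 0.01 ms, N = one million) are hypothetical, chosen only for illustration.

```python
import math


def search_time(n, N, t_seek, t_read, a, b):
    """f(n) from slide 8: per-node cost times the log_n(N) nodes read."""
    per_node = t_seek + t_read * n + a + b * math.log2(n)
    return per_node * math.log(N, n)


def optimal_n(N, t_seek, t_read, a, b, max_n=10_000):
    """Smallest integer n in [2, max_n] minimizing f(n)."""
    return min(range(2, max_n + 1),
               key=lambda n: search_time(n, N, t_seek, t_read, a, b))


# Hypothetical parameters (msec); only the seek time differs.
slow_disk = optimal_n(N=10**6, t_seek=10.0, t_read=0.01, a=0.1, b=0.01)
fast_disk = optimal_n(N=10**6, t_seek=1.0, t_read=0.01, a=0.1, b=0.01)
```

In this model the binary-search term contributes $b \log_2 n \cdot \log_n N = b \log_2 N$, which does not depend on n, so b has no effect on n_opt. With these hypothetical numbers, the faster disk gives a noticeably smaller n_opt than the slower one.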

10 Bulk loading of a B+ tree: given a large collection of records, create a B+ tree on it. Method 1: repeatedly insert records (slow). Method 2: bulk loading (more efficient).

11 Bulk loading of a B+ tree. Initialization: sort all data entries; insert a pointer to the first (leaf) page in a new (root) page. [Figure: a root page pointing to the first of the sorted pages of data entries 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*, which are not yet in the B+ tree.]

12 Bulk loading (contd.): index entries for leaf pages are always entered into the right-most index page. When this page fills up, it splits. (The split may go up the right-most path to the root.) This is faster than repeated inserts, especially when one considers locking! [Figure: two snapshots of the growing index above the same sorted data entry pages 3* … 44*, some of which are not yet in the B+ tree.]

13 Summary of bulk loading. Method 1: multiple inserts. – Slow. – Does not give sequential storage of leaves. Method 2: bulk loading. – Has advantages for concurrency control. – Fewer I/Os during build. – Leaves will be stored sequentially (and linked). – Can control the “fill factor” on pages.
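The sketch below shows why bulk loading yields sequential leaves and a controllable fill factor. It is a simplified bottom-up variant, not the exact right-most-page-splitting procedure on slide 12: it packs the sorted keys into leaves, then repeatedly groups nodes into parents until one root remains. The function names, the list-based node representation, and the single `capacity` shared by leaves and interior nodes are assumptions made for illustration.

```python
def chunk(items, size):
    """Split a list into consecutive pieces of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def bulk_load(sorted_keys, capacity, fill_factor=1.0):
    """Build B+tree levels bottom-up from already-sorted keys.

    levels[0] holds the leaves (lists of keys) in key order; each higher level
    holds (separator_keys, child_positions) interior nodes.
    """
    per_node = int(capacity * fill_factor)  # entries actually placed per node
    if per_node < 2:
        raise ValueError("capacity * fill_factor must be at least 2")
    leaves = chunk(list(sorted_keys), per_node)
    levels = [leaves]
    smallest = [leaf[0] for leaf in leaves]  # smallest key under each node
    while len(levels[-1]) > 1:
        # A node with per_node separator keys has per_node + 1 children.
        groups = chunk(list(range(len(levels[-1]))), per_node + 1)
        if len(groups) > 1 and len(groups[-1]) < 2:
            groups[-1].insert(0, groups[-2].pop())  # avoid a one-child node
        levels.append([([smallest[c] for c in g[1:]], g) for g in groups])
        smallest = [smallest[g[0]] for g in groups]
    return levels


keys = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
levels = bulk_load(keys, capacity=2)
```

Assuming two entries per page, as the slides' data appears to use, the 17 keys fill 9 leaves and the build needs only one pass over the sorted input. Lowering `fill_factor` leaves free slots in every node for later inserts.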

14 Hashing: key → h(key) → bucket. Buckets are typically one disk block each.

15 Example hash function: the key is an n-byte character string $x_1 x_2 \ldots x_n$, and we have b buckets. h adds the bytes and computes the sum modulo b: $h(x_1 x_2 \ldots x_n) = (x_1 + x_2 + \cdots + x_n) \bmod b$.

16 This may not be the best function … Read Knuth Vol. 3 if you really need to select a good function. Good hash function: the expected number of keys per bucket is the same for all buckets.
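A quick way to see why the byte-sum function may not be a good one is to count keys per bucket. The sketch implements slide 15's function and histograms a hypothetical workload of all two-letter lowercase keys over 1000 buckets. `sum_hash` and `keys_per_bucket` are illustrative names, and UTF-8 encoding of the key is an assumption.

```python
from collections import Counter
from itertools import product
from string import ascii_lowercase


def sum_hash(key: str, b: int) -> int:
    """Slide 15's function: add the bytes x1 + x2 + ... + xn, then take mod b."""
    return sum(key.encode("utf-8")) % b


def keys_per_bucket(keys, b):
    """Histogram of how many keys land in each of the b buckets."""
    counts = Counter(sum_hash(k, b) for k in keys)
    return [counts[i] for i in range(b)]


# Hypothetical workload: all 676 two-letter lowercase keys, 1000 buckets.
keys = ["".join(p) for p in product(ascii_lowercase, repeat=2)]
histogram = keys_per_bucket(keys, 1000)
used = sum(1 for c in histogram if c > 0)
```

The byte sums of these keys only range from 194 ("aa") to 244 ("zz"), so just 51 of the 1000 buckets receive any key, and anagrams such as "abc" and "cab" always collide. The expected number of keys per bucket is therefore far from equal.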

17 Within a bucket: do we keep keys sorted? Yes, if CPU time is critical and inserts/deletes are not too frequent.
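The trade-off is visible in a sorted in-memory bucket: probes use binary search, but each insert or delete may shift the entries after it. This `SortedBucket` class is a hypothetical sketch built on Python's standard `bisect` module.

```python
from bisect import bisect_left, insort


class SortedBucket:
    """Keys kept in order: O(log k) probes, but inserts shift up to k entries."""

    def __init__(self):
        self.keys = []

    def insert(self, key):
        insort(self.keys, key)  # binary search for the slot, then shift

    def contains(self, key):
        i = bisect_left(self.keys, key)
        return i < len(self.keys) and self.keys[i] == key

    def delete(self, key):
        i = bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            del self.keys[i]  # later entries shift down
            return True
        return False
```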

18 Next: an example to illustrate inserts, overflows, and deletes.

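19 Example: 2 records/bucket, buckets 0-3. Insert: h(a) = 1, h(b) = 2, h(c) = 1, h(d) = 0, giving bucket 0: d; bucket 1: a, c; bucket 2: b; bucket 3: empty. Then insert e with h(e) = 1: bucket 1 is full, so e goes into an overflow block chained to bucket 1.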

20 Example: deletion. Delete e, then f. After f is deleted, maybe move “g” up from the overflow block into the freed slot. [Figure: buckets 0-3 holding a, b, c, d, e, f, g.]

21 Rule of thumb: try to keep space utilization between 50% and 80%, where utilization = (# keys used) / (total # keys that fit). If it is below 50%, we are wasting space; if it is above 80%, overflows become significant. Where exactly that happens depends on how good the hash function is and on the number of keys per bucket.
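The insert, overflow, delete, and utilization rules of slides 19-21 can be sketched as a static hash file whose buckets are chains of blocks. `StaticHashFile` is a hypothetical name, the hash function is passed in (here a lookup table reproducing slide 19's h values), and on deletion this sketch always repacks the chain, which is one way of moving records “up”.

```python
class StaticHashFile:
    """Fixed number of buckets; each bucket is a chain of blocks.

    Block 0 of a chain is the primary block; later blocks are overflow blocks.
    """

    def __init__(self, num_buckets, h, block_capacity=2):
        self.h = h                      # key -> bucket number in [0, num_buckets)
        self.capacity = block_capacity  # records per block
        self.buckets = [[[]] for _ in range(num_buckets)]

    def insert(self, key):
        chain = self.buckets[self.h(key)]
        for block in chain:
            if len(block) < self.capacity:
                block.append(key)
                return
        chain.append([key])  # every block is full: chain an overflow block

    def delete(self, key):
        i = self.h(key)
        keys = [k for block in self.buckets[i] for k in block]
        if key not in keys:
            return False
        keys.remove(key)
        # Repack so records move up and empty overflow blocks are released.
        self.buckets[i] = [keys[j:j + self.capacity]
                           for j in range(0, len(keys), self.capacity)] or [[]]
        return True

    def utilization(self):
        """# keys used / total # keys that fit in all allocated blocks."""
        blocks = sum(len(chain) for chain in self.buckets)
        used = sum(len(block) for chain in self.buckets for block in chain)
        return used / (blocks * self.capacity)


H = {"a": 1, "b": 2, "c": 1, "d": 0, "e": 1}  # slide 19's hash values
f = StaticHashFile(num_buckets=4, h=H.__getitem__)
for key in "abcde":
    f.insert(key)
```

After these inserts bucket 1 holds a and c plus an overflow block with e, and utilization is 5 keys out of 10 slots. Deleting a moves e up into the primary block and frees the overflow block.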

22 How do we cope with growth? Overflows and reorganizations; dynamic hashing: extensible hashing, others …

23 Extensible hashing, idea 1: (a) use only i of the b bits output by the hash function h(K); i grows over time. [Figure: a b-bit hash value such as 00110101, of which the first i bits are used.]

24 Extensible hashing, idea 2: (b) use a directory: h(K)[i] → directory entry → bucket.

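25 Example: h(k) is 4 bits; 2 keys/bucket. Start with i = 1: directory entry 0 → bucket [0001], entry 1 → bucket [1001, 1100], both with local depth 1. Insert 1010: bucket 1 is full, so it splits on the second bit into [1001, 1010] and [1100] (local depth 2), and the new directory doubles to i = 2 with entries 00, 01, 10, 11. Entries 00 and 01 still share bucket [0001] (local depth 1).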

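26 Example continued (i = 2): insert 0111, then 0000. 0111 goes into the bucket shared by entries 00 and 01, which then holds [0001, 0111]. Inserting 0000 overflows that bucket, so it splits (local depth 1 → 2) into 00 → [0001, 0000] and 01 → [0111]. The directory does not need to grow.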

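27 Example continued: insert 1001. It belongs in bucket 10 = [1001, 1010], which is full and already has local depth 2 = i, so the directory doubles to i = 3 (entries 000 … 111) and the bucket splits into 100 → [1001, 1001] and 101 → [1010], each with local depth 3. The other buckets keep local depth 2 and are each shared by two directory entries.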
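The insert procedure from this example can be sketched in a few dozen lines: index the directory by the first i bits of h(K), split a full bucket on its next bit, and double the directory only when the bucket's local depth already equals i. The classes below are hypothetical, treat each 4-bit key as its own hash value (h(K) = K), start from a single bucket with i = 0, and raise an error instead of chaining overflow blocks when no hash bits are left. Deletion (merging) is not shown.

```python
class Bucket:
    def __init__(self, local_depth):
        self.local_depth = local_depth  # number of hash bits this bucket depends on
        self.keys = []


class ExtensibleHash:
    """Directory indexed by the first i (= global_depth) bits of a bits-wide hash."""

    def __init__(self, capacity=2, bits=4):
        self.capacity = capacity  # keys per bucket
        self.bits = bits          # width b of the hash value h(K)
        self.global_depth = 0     # i in the slides
        self.directory = [Bucket(0)]

    def _index(self, h):
        return h >> (self.bits - self.global_depth)  # first i bits of h

    def lookup(self, h):
        return h in self.directory[self._index(h)].keys

    def insert(self, h):
        while True:
            bucket = self.directory[self._index(h)]
            if len(bucket.keys) < self.capacity:
                bucket.keys.append(h)
                return
            if bucket.local_depth == self.bits:
                raise OverflowError("no hash bits left; an overflow block would be needed")
            if bucket.local_depth == self.global_depth:
                # Double the directory: new entry j starts out as old entry j >> 1.
                self.directory = [b for b in self.directory for _ in range(2)]
                self.global_depth += 1
            self._split(bucket)  # then retry the insert

    def _split(self, bucket):
        bucket.local_depth += 1
        d = bucket.local_depth
        sibling = Bucket(d)
        old_keys = bucket.keys
        # The d-th bit from the left decides which half each key moves to.
        bucket.keys = [k for k in old_keys if not (k >> (self.bits - d)) & 1]
        sibling.keys = [k for k in old_keys if (k >> (self.bits - d)) & 1]
        for j, b in enumerate(self.directory):
            if b is bucket and (j >> (self.global_depth - d)) & 1:
                self.directory[j] = sibling


table = ExtensibleHash(capacity=2, bits=4)
for key in [0b0001, 0b1001, 0b1100, 0b1010, 0b0111, 0b0000, 0b1001]:
    table.insert(key)
```

The first three inserts reach the slides' starting state (i = 1); the remaining four replay slides 25-27 and end with i = 3.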

28 Extensible hashing: deletion. Merge blocks and cut the directory if possible (reverse the insert procedure).

29 Extensible hashing summary. (+) Can handle growing files, with less wasted space and with no full reorganizations. (-) Indirection (not bad if the directory is in memory). (-) Directory doubles in size (now it fits in memory, now it does not).

30 Indexing vs. hashing: hashing is good for probes given a key, e.g., SELECT … FROM R WHERE R.A = 5

31 Indexing vs. hashing: indexing (including B-trees) is good for range searches, e.g., SELECT … FROM R WHERE R.A > 5
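The contrast between slides 30 and 31 shows up even in memory: a hash table answers R.A = 5 with one probe, but only an ordered structure can find the first R.A > 5 entry and scan forward from there. In the sketch below, the rows are hypothetical (A, payload) tuples and a sorted list stands in for a B-tree's ordered leaf level.

```python
from bisect import bisect_right


def build_indexes(rows):
    """Index hypothetical (A, payload) rows both ways on attribute A."""
    hash_index = {}
    for row in rows:
        hash_index.setdefault(row[0], []).append(row)
    ordered = sorted(rows)  # stand-in for a B-tree's ordered leaf level
    return hash_index, ordered


def point_query(hash_index, a):
    """WHERE R.A = a: one probe into the hash index."""
    return hash_index.get(a, [])


def range_query_gt(ordered, a):
    """WHERE R.A > a: locate the first qualifying entry, then scan forward."""
    keys = [row[0] for row in ordered]  # a B-tree would descend to the leaf instead
    return ordered[bisect_right(keys, a):]


rows = [(7, "x"), (5, "y"), (2, "z"), (9, "w"), (5, "v")]
hash_index, ordered = build_indexes(rows)
```

The hash table keeps no key order, so answering the range query from it would mean probing every possible value of A or scanning all entries.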

32 The BIG picture: Chapters 2 & 3: storage, records, blocks, ... Chapters 4 & 5: access mechanisms (indexes, B-trees, hashing, multi-key). Chapters 6 & 7: query processing.

