CS4432: Database Systems II


1 CS4432: Database Systems II
Lecture #10 Professor Elke A. Rundensteiner CS 4432 lecture #10 - b+ tree indexing

2 Hierarchy of index structures
[Figure: a data file ordered on its sequence field (10, 20, 30, ...), a first-level index over it (dense, if the file is non-sequential), and a high-level index above that (always sparse).]
CS 4432 lecture #8 - indexing

3 Conventional indexes : pros/cons ?
Advantage:
- Simple
- Index is a sequential file, good for scans
- Search efficient for static data
Disadvantage:
- Inserts expensive, and/or
- Lose sequentiality & balance
- Then search time unpredictable

4 Example Sequential Index
[Figure: a sequential index file with key blocks 10, 20, 30, 40, 50, 60, 70, 80, 90 and continuous free space. Inserted keys 39, 31, 35, 36, 32, 38, 34, 33 land in an overflow area (not sequential).]

5 Problems … Problems … Problems …
- Without re-organization we get unpredictable performance
- Too much or too frequent re-organization brings too much overhead
- DBA does not know when to reorganize
- DBA does not know how full to load pages of new index

6 So Let’s Try Another Index . . .
- Give up “sequentiality” of index
- Predictable performance under updates
- Always keep the “tree” balanced
- Automate restructuring under updates

7 B+Tree Example (n=3)
[Figure: B+tree with n=3.]
Root: 100
Non-leaf nodes: 30 | 120 150 180
Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200

8 B+ Trees in Practice
- Typical order: [value missing in transcript]. Typical fill-factor: 67%.
- Average fanout = 133
- Typical capacities:
  - Height 4: 133^4 = 312,900,721 records
  - Height 3: 133^3 = 2,352,637 records
- Can often hold top levels in buffer pool:
  - Level 1 = 1 page = [size missing in transcript] Kbytes
  - Level 2 = 133 pages = 1 Mbyte
  - Level 3 = 17,689 pages = 133 Mbytes
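The capacity figures on this slide are plain powers of the average fanout. A minimal sketch of the arithmetic, assuming the fanout of 133 quoted above and counting the root as level 1 (the function names are illustrative, not from the lecture):

```python
FANOUT = 133  # average fanout quoted on the slide


def records_at_height(height: int, fanout: int = FANOUT) -> int:
    """Leaf entries reachable through `height` levels, each with `fanout` children."""
    return fanout ** height


def pages_at_level(level: int, fanout: int = FANOUT) -> int:
    """Pages on a given level, with the root counted as level 1."""
    return fanout ** (level - 1)


for h in (3, 4):
    print(f"height {h}: {records_at_height(h):,} records")
for lvl in (1, 2, 3):
    print(f"level {lvl}: {pages_at_level(lvl):,} pages")
```

The point of the slide follows directly: the top two or three levels are small enough to keep in the buffer pool, so most lookups cost only one or two disk reads.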

9 Sample non-leaf
[Figure: a non-leaf node holding keys 57, 81, 95 and four pointers.]
- Pointer 1: to keys k < 57
- Pointer 2: to keys 57 ≤ k < 81
- Pointer 3: to keys 81 ≤ k < 95
- Pointer 4: to keys k ≥ 95
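The routing rule in a non-leaf node amounts to a binary search over its keys. A small sketch, assuming keys are kept in a sorted Python list and pointers are numbered from 0 (`child_index` is a hypothetical helper name):

```python
from bisect import bisect_right


def child_index(keys, k):
    """Index of the pointer to follow for search key k.

    bisect_right counts the keys that are <= k, so a key equal to a
    separator is routed to the pointer on the separator's right.
    """
    return bisect_right(keys, k)


keys = [57, 81, 95]
for k in (10, 57, 80, 81, 95, 100):
    print(k, "-> pointer", child_index(keys, k))
```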

10 Sample leaf node
[Figure: a leaf node, reached from a non-leaf node, holding keys 57, 81, 95.]
- Pointer to record with key 57
- Pointer to record with key 81
- Pointer to record with key 95
- Sequence pointer to next leaf in sequence

11 In textbook’s notation n=3
[Figure: a leaf (keys 30, 35) and a non-leaf (key 30), each drawn in the slides' notation and in the textbook's notation.]

12 Size of node n
- n+1 pointers
- n keys
(fixed)

13 Don’t want nodes to be too empty
Use at least
- Non-leaf: ⌈(n+1)/2⌉ pointers
- Leaf: ⌊(n+1)/2⌋ pointers to data

14 Full and minimum nodes (n=3)
Non-leaf: at least ⌈(n+1)/2⌉ pointers. Leaf: at least ⌊(n+1)/2⌋ pointers to data.

            Full node      Min. node
Non-leaf    120 150 180    30
Leaf        3 5 11         30 35

[Figure annotation: "counts even if null".]

15 B+tree rules (tree of order n)
(1) All leaves at same lowest level (balanced tree)
(2) Pointers in leaves point to records, except for the “sequence pointer”

16 B+tree rules (continued)
(3) Number of pointers/keys for B+tree

                      Max ptrs   Max keys   Min ptrs→data   Min keys
Non-leaf (non-root)   n+1        n          ⌈(n+1)/2⌉       ⌈(n+1)/2⌉ - 1
Leaf (non-root)       n+1        n          ⌊(n+1)/2⌋       ⌊(n+1)/2⌋
Root                  n+1        n          1               1
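Rule (3) can be turned into a small lookup. This sketch assumes the textbook convention that the non-leaf minimum rounds (n+1)/2 up and the leaf minimum rounds it down, since the rounding brackets are not legible in the transcript. The function name and the dictionary layout are illustrative only.

```python
def occupancy(n: int, kind: str) -> dict:
    """Pointer/key bounds for a node in a B+tree of order n.

    kind is "nonleaf", "leaf" or "root". The rounding direction is an
    assumption: ceiling for non-leaf nodes, floor for leaves.
    """
    bounds = {"max_ptrs": n + 1, "max_keys": n}
    if kind == "root":
        bounds.update(min_ptrs=1, min_keys=1)
    elif kind == "nonleaf":
        m = (n + 2) // 2            # ceil((n+1)/2) in integer arithmetic
        bounds.update(min_ptrs=m, min_keys=m - 1)
    elif kind == "leaf":
        m = (n + 1) // 2            # floor((n+1)/2)
        bounds.update(min_ptrs=m, min_keys=m)
    else:
        raise ValueError(f"unknown node kind: {kind}")
    return bounds


for n in (3, 4):
    for kind in ("nonleaf", "leaf", "root"):
        print(n, kind, occupancy(n, kind))
```

For n=3 this gives a minimum of one key in a non-leaf and two keys in a leaf, which agrees with the minimum nodes (30 and 30 35) shown for n=3.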

17 B+Tree Example : Searches
[Figure: the n=3 example tree, used to trace searches.]
Root: 100
Non-leaf nodes: 30 | 120 150 180
Leaves: 3 5 11 | 30 35 | 100 101 110 | 120 130 | 150 156 179 | 180 200
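A search walks from the root to one leaf, choosing a child at each non-leaf node. The sketch below rebuilds the example tree as it can be read from the flattened figure (that reading is an assumption) and uses a hypothetical `Node` class in which `children is None` marks a leaf.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted list of keys
        self.children = children    # list of child nodes, or None for a leaf


def search(node, k):
    """Return True if key k is stored in a leaf of the tree rooted at node."""
    while node.children is not None:
        # Follow the pointer whose key range contains k.
        node = node.children[bisect_right(node.keys, k)]
    return k in node.keys


leaves = [Node(ks) for ks in ([3, 5, 11], [30, 35], [100, 101, 110],
                              [120, 130], [150, 156, 179], [180, 200])]
root = Node([100], [Node([30], leaves[0:2]),
                    Node([120, 150, 180], leaves[2:6])])

for k in (101, 30, 179, 99):
    print(k, search(root, k))
```

Every lookup touches exactly three nodes here, whether or not the key is present, which is the predictable performance the balanced tree is meant to provide.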

18 Insert into B+tree
(a) simple case: space available in leaf
(b) leaf overflow
(c) non-leaf overflow
(d) new root

19 (a) Insert key = 32 (n=3)
[Figure: root 100 over non-leaf node 30, with leaves 3 5 11 and 30 31. Key 32 fits into the second leaf, which becomes 30 31 32.]

20 (b) Insert key = 7 (n=3)
[Figure: root 100 over non-leaf node 30, with leaves 3 5 11 and 30 31. The leaf 3 5 11 is full, so it splits into 3 5 and 7 11, and key 7 is added to the parent, which becomes 7 30.]

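21 (c) Insert key = 160 (n=3)
[Figure: root 100 over the full non-leaf node 120 150 180, with leaves 150 156 179 and 180 200. The leaf 150 156 179 splits into 150 156 and 160 179. The non-leaf node then overflows and splits into 120 150 and 180, and key 160 moves up into the root, which becomes 100 160.]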

22 (d) New root, insert 45 (n=3)
[Figure: root node 10 20 30 with leaves 1 2 3, 10 12, 20 25 and 30 32 40. The leaf 30 32 40 splits into 30 32 and 40 45. The old root overflows and splits into 10 20 and 40, and key 30 becomes the only key of the new root.]

23 Recap: Insert Data into B+ Tree
- Find correct leaf L.
- Put data entry onto L.
  - If L has enough space, done!
  - Else, must split L (into L and a new node L2)
    - Redistribute entries evenly, copy up middle key.
    - Insert index entry pointing to L2 into parent of L.
- This can happen recursively
  - To split index node, redistribute entries evenly, but push up middle key. (Contrast with leaf splits.)
- Splits “grow” tree; root split increases height.
  - Tree growth: gets wider or one level taller at top.
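The recap above can be sketched as a recursive insert. This is a simplified illustration under several assumptions: keys only (no record pointers), no duplicate keys, no sequence pointers between leaves, and a hypothetical `Node`/`BPlusTree` pair in which n is the maximum number of keys per node. It is not the lecture's own code.

```python
from bisect import bisect_right, insort


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys            # sorted keys
        self.children = children    # None marks a leaf


class BPlusTree:
    def __init__(self, n):
        self.n = n                  # maximum number of keys per node
        self.root = Node([])

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:       # root split: the tree grows one level taller
            sep, right = split
            self.root = Node([sep], [self.root, right])

    def _insert(self, node, key):
        """Insert below node; return (separator, new right node) if node split."""
        if node.children is None:
            insort(node.keys, key)
            if len(node.keys) <= self.n:
                return None
            mid = len(node.keys) // 2
            right = Node(node.keys[mid:])
            node.keys = node.keys[:mid]
            return right.keys[0], right          # copy up: key also stays in the leaf
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.keys) <= self.n:
            return None
        mid = len(node.keys) // 2
        up = node.keys[mid]                      # push up: key leaves this node
        new_right = Node(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, new_right


def leaf_keys(node):
    """All keys stored in the leaves, left to right."""
    if node.children is None:
        return list(node.keys)
    return [k for child in node.children for k in leaf_keys(child)]


tree = BPlusTree(3)
for k in (3, 5, 11, 7):
    tree.insert(k)
print(tree.root.keys, [c.keys for c in tree.root.children])
```

Inserting 3, 5, 11 and then 7 overflows the single leaf, which splits into 3 5 and 7 11 with 7 copied up, matching the leaf-overflow case in the slides. The difference between the two split branches is the "copy up" versus "push up" contrast the recap points out.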

24 Deletion from B+tree
(a) Simple case
(b) Leaf-node: Coalesce with neighbor (sibling)
(c) Leaf-node: Re-distribute keys
(d) Cases (b) or (c) at non-leaf

25 (a) Delete key = 11 (n=3)
[Figure: root 100 over non-leaf node 30, with leaves 3 5 11 and 30 31. Removing 11 leaves 3 5 in the first leaf; no other node changes.]

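26 (b) Coalesce with sibling: Delete 50 (n=4)
[Figure: parent node 10 40 100 with leaves 10 20 30 and 40 50. After 50 is deleted the second leaf holds only 40, so it is merged into its sibling, giving 10 20 30 40, and key 40 is removed from the parent.]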

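27 (c) Redistribute keys: Delete 50 (n=4)
[Figure: parent node 10 40 100 with leaves 10 20 30 35 and 40 50. After 50 is deleted the second leaf holds only 40, so key 35 moves over from the full sibling, giving leaves 10 20 30 and 35 40, and the parent key 40 is replaced by 35.]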

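28 (d) Coalesce and non-leaf coalesce: Delete 37 (n=4)
[Figure: root 25 over non-leaf nodes 10 20 and 30 40, with leaves 1 3, 10 14, 20 22, 25 26, 30 37 and 40 45. After 37 is deleted the leaf holding 30 merges with 40 45 into 30 40 45. The non-leaf node is left with the single key 40, so it coalesces with its sibling and the root key into 10 20 25 40, which becomes the new root.]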

29 Delete Data from B+ Tree
- Start at root, find leaf L where entry belongs.
- Remove the entry.
  - If L is at least half-full, done!
  - If L has only d-1 entries (d being the minimum number of entries a node must hold),
    - Try to re-distribute, borrowing from sibling (adjacent node with same parent as L).
    - If re-distribution fails, merge L and sibling.
- If merge occurred, must delete entry (pointing to L or sibling) from parent of L.
- Merge could propagate to root, decreasing height.
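The leaf-level decision in this recap can be sketched as one function. The sketch assumes the underfull leaf has a left sibling under the same parent, works on plain key lists, and takes the leaf minimum as the floor of (n+1)/2. All names are illustrative. The recap says to try redistribution first, while the n=4 worked examples merge whenever the merged leaf fits, so a hypothetical `prefer_merge` flag selects between the two policies.

```python
def delete_from_leaf(left, leaf, key, n, prefer_merge=False):
    """Delete key from `leaf`; `left` is its left sibling under the same parent.

    Returns (action, left_keys, leaf_keys). On a merge the surviving node is
    returned as left_keys and leaf_keys is None; the caller must then remove
    the parent's entry for the vanished leaf. On a redistribute the parent's
    separator should become leaf_keys[0].
    """
    min_keys = (n + 1) // 2                 # assumed leaf minimum: floor((n+1)/2)
    left = list(left)
    leaf = [k for k in leaf if k != key]
    if len(leaf) >= min_keys:
        return "simple", left, leaf
    can_borrow = len(left) > min_keys       # sibling stays legal after lending
    can_merge = len(left) + len(leaf) <= n  # merged leaf fits in one node
    if can_merge and (prefer_merge or not can_borrow):
        return "merge", left + leaf, None
    leaf.insert(0, left.pop())              # borrow the sibling's largest key
    return "redistribute", left, leaf


# The two n=4 "Delete 50" examples from the slides:
print(delete_from_leaf([10, 20, 30], [40, 50], 50, n=4, prefer_merge=True))
print(delete_from_leaf([10, 20, 30, 35], [40, 50], 50, n=4, prefer_merge=True))
```

With `prefer_merge=True` the first call merges into 10 20 30 40 and the second borrows 35, as in the worked examples. With the default policy the first call would borrow 30 instead, which is equally valid under the occupancy rules.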

30 Discussion of B-trees (vs. static indexed sequential files)
- Concurrency control harder in B-trees
- B-tree consumes more space
- B-tree automatically decides:
  - when to reorganize
  - how full to load pages of new index

31 Comparison B-tree vs. indexed seq. file
Indexed sequential file:
- Less space, so lookup faster
- Inserts managed by overflow area
- Requires temporary restructuring
- Unpredictable performance
B-tree:
- Consumes more space, so lookup slower
- Each insert/delete potentially restructures
- Built-in restructuring
- Predictable performance

32 Speaking of buffering…
Is LRU a good policy for B+tree buffers?
- Of course not!
- Should try to keep root in memory at all times (and perhaps some nodes from second level)
- Should keep the “path” when going down to leaves (just in case of restructuring)

33 A la buffering…
Is LRU a good policy for B+tree buffers?
- Of course not!
- Should try to keep root in memory at all times (and perhaps some nodes from second level)

34 Interesting problem: For B+tree, how large should n be? …
n is number of keys / node

35 assumptions: n children per node and N records in database
- Time to read a B-tree node from disk is $(t_{seek} + t_{read} \cdot n)$ msec.
- Once in main memory, use binary search to locate key: $(a + b \log_2 n)$ msec.
- Need to search (read) $\log_n N$ tree nodes.
- Total search time:
  $t_{search} = \left(t_{seek} + t_{read} \cdot n + a + b \log_2 n\right) \cdot \log_n N$

36 Can get: f(n) = time to find a record
[Figure: plot of f(n) against n, with its minimum at n_opt.]
- Find n_opt by solving f'(n) = 0.
- What happens to n_opt as:
  - Disk gets faster?
  - CPU gets faster?
  - …
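The question of how n_opt moves can be explored numerically instead of solving f'(n) = 0 by hand. The sketch below evaluates the search-time model from the previous slide over a range of n and picks the minimum. Every constant in it (seek time, per-key read time, CPU constants a and b, number of records N) is a made-up value chosen only for illustration.

```python
import math


def search_time(n, N, t_seek, t_read, a, b):
    """f(n): cost per node visit times the number of levels, log_n(N)."""
    per_node = t_seek + t_read * n + a + b * math.log2(n)
    return per_node * math.log(N, n)


def n_opt(N, t_seek, t_read, a, b, n_max=10_000):
    """Brute-force minimiser of f(n) over integer n in [2, n_max]."""
    return min(range(2, n_max + 1),
               key=lambda n: search_time(n, N, t_seek, t_read, a, b))


# Hypothetical constants, in msec; N is the number of records.
base = dict(N=10**7, t_seek=10.0, t_read=0.01, a=0.001, b=0.001)
print("baseline      :", n_opt(**base))
print("faster seek   :", n_opt(**{**base, "t_seek": 5.0}))
print("faster read   :", n_opt(**{**base, "t_read": 0.005}))
print("faster CPU    :", n_opt(**{**base, "a": 0.0005, "b": 0.0005}))
```

Under this model and these constants, a shorter seek time pulls n_opt down and a faster per-key read pushes it up, because the fixed per-node cost is what large nodes amortise.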

37 Bulk Loading of B+ Tree
For a large collection of records, create a B+ tree.
- Method 1: Repeatedly insert records: slow.
- Method 2: Bulk loading: more efficient.

38 Bulk Loading of B+ Tree
- Initialization:
  - Sort all data entries
  - Insert pointer to first (leaf) page in new (root) page.
[Figure: an empty root page pointing to the first of the sorted pages of data entries, which are not yet in the B+ tree: 3* 4* | 6* 9* | 10* 11* | 12* 13* | 20* 22* | 23* 31* | 35* 36* | 38* 41* | 44*]

39 Bulk Loading (Contd.)
- Index entries for leaf pages are always entered into the right-most index page.
- When this fills up, it splits. (Split may go up the right-most path to the root.)
[Figure: two snapshots of the load over the sorted data entries 3* 4* 6* 9* 10* 11* 12* 13* 20* 22* 23* 31* 35* 36* 38* 41* 44*. In the first, the root holds 10 20 and the index pages below it hold 6, 12 and 23 35; the remaining data entry pages are not yet in the B+ tree. In the second, the root holds 20, the next level holds 10 and 35, and the index pages below hold 6, 12, 23 and 38; the remaining data entry pages are still not yet in the B+ tree.]

40 Summary of Bulk Loading
- Method 1: multiple inserts.
  - Slow.
  - Does not give sequential storage of leaves.
- Method 2: Bulk loading
  - Has advantages for concurrency control.
  - Fewer I/Os during build.
  - Leaves will be stored sequentially (and linked)
  - Can control “fill factor” on pages.
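A bulk load can be sketched in a few lines. Note that this sketch swaps in a simpler technique than the one on the slides: it builds the tree level by level from the bottom up, rather than feeding index entries into the right-most index page and splitting it. It also ignores minimum-occupancy rules for the last node on each level. `Node`, `bulk_load`, `leaf_capacity` and `fanout` are hypothetical names.

```python
from bisect import bisect_right


class Node:
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children    # None marks a leaf
        self.next = None            # sequence pointer, used on leaves only


def smallest_key(node):
    """Smallest key stored in the subtree rooted at node."""
    while node.children is not None:
        node = node.children[0]
    return node.keys[0]


def bulk_load(sorted_keys, leaf_capacity, fanout):
    """Build a B+tree bottom-up from keys that are already sorted."""
    if not sorted_keys:
        return Node([])
    # Pack the leaves to the chosen fill and link them in sequence.
    level = [Node(sorted_keys[i:i + leaf_capacity])
             for i in range(0, len(sorted_keys), leaf_capacity)]
    for a, b in zip(level, level[1:]):
        a.next = b
    # Group nodes under parents until a single root remains.
    while len(level) > 1:
        parents = []
        for i in range(0, len(level), fanout):
            group = level[i:i + fanout]
            parents.append(Node([smallest_key(c) for c in group[1:]], group))
        level = parents
    return level[0]


def search(root, k):
    node = root
    while node.children is not None:
        node = node.children[bisect_right(node.keys, k)]
    return k in node.keys


entries = [3, 4, 6, 9, 10, 11, 12, 13, 20, 22, 23, 31, 35, 36, 38, 41, 44]
root = bulk_load(entries, leaf_capacity=2, fanout=3)
print(root.keys, search(root, 23), search(root, 5))
```

Choosing `leaf_capacity` below the physical page capacity is how a bulk load controls the fill factor, and because the leaves are created in key order they can be written out sequentially, which are two of the advantages listed above.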

41 Summary
- B+ tree idea: self-balancing index structure that supports search, insert and delete in O(log_n N) time, for N records and n keys per node.
- B+ tree is versatile: handles equality and range searches
- B+ tree and its variants: common index structure in industrial DBMSs

