Published by Herbert Burditt. Modified about 1 year ago.

1
B-Trees

2
But first, a little note about data structures
Not all data structures work well as file structures
Example: Binary Search Tree

              Knight
       Gibson        Sanders
  Coleman  Hudson  Monroe

3
Motivation for B-Trees
The index is too large for memory
Search time should be better than binary search
Not just fast search, but also fast delete and fast insert
What does the "B" stand for?
  ◦ Bayer and McCreight
  ◦ Boeing
  ◦ balanced, bushy, broad

4
Indexed Files
Searching
  ◦ Dannelly
Deleting
  ◦ Duncan
Adding
  ◦ Walters
  ◦ Aardvark

Level One
Row  Key       IRRN
0    Adams     0
1    Davis     10
2    Foster    20
3    Ingram    30
4    Lambert   40
5    Norris    50
6    Randall   60
7    Tyler     70
8    West      80
9    Young     90

Level Two
Row  Key       DRRN
0    Adams     6
1    Barnes    2
2    Bell      18
3    Bishop    8
4    Camp      80
5    Carey     0
6    Conner    19
7    Critter   4
8    Crook     99
9    Dannelly  21
10   Davis     20
11   Dinkins   11
12   Duncan    10
...
18   Faulk     5
19   Farrow    9
20   Foster    1
21   Fuller    98
...
80   West      12
81   Wilks     7
...
98   Zane      81
99   Zinn      3

Data File
Row  Name      Yadda yadda
0    Carey
1    Foster
2    Barnes
3    Zinn
4    Critter
5    Faulk
6    Adams
7    Wilks
8    Bishop
9    Farrow
10   Duncan
11   Dinkins
12   West
...
18   Bell
19   Conner
20   Davis
21   Dannelly
...
80   Camp
81   Zane
...
98   Fuller
99   Crook
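
The two-level lookup on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names LEVEL_ONE, LEVEL_TWO, BLOCK, and find_drrn are hypothetical, the block size of 10 is inferred from the Level One pointers (0, 10, 20, ...), and only the Level Two rows 0 to 12 that appear on the slide are included.

```python
from bisect import bisect_right

BLOCK = 10  # assumed: each Level One entry covers 10 Level Two rows

# Level One: (first key of the block, row in Level Two where the block starts)
LEVEL_ONE = [("Adams", 0), ("Davis", 10), ("Foster", 20), ("Ingram", 30),
             ("Lambert", 40), ("Norris", 50), ("Randall", 60), ("Tyler", 70),
             ("West", 80), ("Young", 90)]

# Level Two: row -> (key, DRRN). Only rows 0..12 from the slide are listed.
LEVEL_TWO = {
    0: ("Adams", 6), 1: ("Barnes", 2), 2: ("Bell", 18), 3: ("Bishop", 8),
    4: ("Camp", 80), 5: ("Carey", 0), 6: ("Conner", 19), 7: ("Critter", 4),
    8: ("Crook", 99), 9: ("Dannelly", 21), 10: ("Davis", 20),
    11: ("Dinkins", 11), 12: ("Duncan", 10),
}

def find_drrn(name):
    """Return the data-file record number for name, or None if not found."""
    first_keys = [key for key, _ in LEVEL_ONE]
    block = bisect_right(first_keys, name) - 1   # last block whose first key <= name
    if block < 0:
        return None                              # name sorts before every indexed key
    start = LEVEL_ONE[block][1]
    for row in range(start, start + BLOCK):      # scan one block of Level Two
        entry = LEVEL_TWO.get(row)
        if entry is not None and entry[0] == name:
            return entry[1]
    return None

print(find_drrn("Dannelly"))  # 21
print(find_drrn("Duncan"))    # 10
print(find_drrn("Aardvark"))  # None
```

Searching touches one Level One block and one Level Two block before reading the data file. Adding a name such as Aardvark is the awkward case: it sorts before Adams, so every sorted index level would have to shift.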

5
B-Tree Informal Definition
Multi-level indexes
Nodes are indexes, indexes are nodes
"Order": the maximum number of references in a node; the minimum is ½ the order
When a node overflows, split it and move the largest key of each half up to the parent
When a node is too empty, combine it with a sibling and adjust the parent

6
Example Insertion
Insert these letters into an order-4 B-Tree: C S D T A M P I B W N G U R

After C, S, D, and T:
    CDST

After A, the node splits and the largest keys move up:
    DT
    ACD  ST

M and P are added to the right node, but so is I, which overflows it and forces another split:
    DPT
    ACD  IMP  ST

7
C S D T A M P I B W N G U R

B, W, and N are no problem:
    DPW
    ABCD  IMNP  STW

Insertion of G causes another split, then U is no problem:
    DMPW
    ABCD  GIM  NP  STUW

Inserting R causes the right node to split, then the root to split:
    PW
    DMP  TW
    ABCD  GIM  NP  RST  UW
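
The insertion walk-through can be replayed with a short Python sketch. It is a minimal illustration, not the course's implementation: Node, insert, and levels are hypothetical names, each internal key is assumed to be the largest key of its child (as in the diagrams), an overflowing node is assumed to keep the larger half on the left (ACD / ST), and duplicate keys are not handled.

```python
from bisect import bisect_left, insort

ORDER = 4  # maximum keys per node, as in the slide example

class Node:
    def __init__(self, keys, children=None):
        self.keys = keys                  # sorted; in internal nodes, keys[i] is the max of children[i]
        self.children = children or []    # empty list means this node is a leaf

def _split(node):
    """Keep the left half in node and return a new right sibling."""
    mid = (len(node.keys) + 1) // 2
    right = Node(node.keys[mid:], node.children[mid:])
    node.keys, node.children = node.keys[:mid], node.children[:mid]
    return right

def _insert(node, key):
    """Insert key below node; return a new right sibling if node split."""
    if not node.children:
        insort(node.keys, key)
    else:
        # First child whose largest key is >= key, else the last child.
        i = min(bisect_left(node.keys, key), len(node.keys) - 1)
        child = node.children[i]
        new_right = _insert(child, key)
        node.keys[i] = child.keys[-1]     # child's largest key may have changed
        if new_right is not None:
            node.keys.insert(i + 1, new_right.keys[-1])
            node.children.insert(i + 1, new_right)
    return _split(node) if len(node.keys) > ORDER else None

def insert(root, key):
    """Insert key and return the (possibly new) root."""
    if root is None:
        return Node([key])
    right = _insert(root, key)
    if right is not None:                 # root split: the tree grows one level
        root = Node([root.keys[-1], right.keys[-1]], [root, right])
    return root

def levels(root):
    """Return the keys of every node, grouped level by level."""
    out, level = [], [root]
    while level:
        out.append(["".join(n.keys) for n in level])
        level = [c for n in level for c in n.children]
    return out

root = None
for letter in "CSDTAMPIBWNGUR":
    root = insert(root, letter)
for level in levels(root):
    print(" ".join(level))
# PW
# DMP TW
# ABCD GIM NP RST UW
```

The tree only grows taller when the root itself splits, which is why all leaves stay at the same level.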

8
Analysis
Order Size?
  ◦ match to disk cluster size and memory
Number of file accesses to Search
  ◦ depth of tree
  ◦ so the bigger the Order, the better
Order = 100 and Levels = 4 gives 100^4 = 100 million records
Number of file accesses to Delete
  1. search downward to the leaf
  2. modify the node
  3. if it was the largest key, adjust the parent node
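
The capacity figure can be checked with a little arithmetic. The sketch below assumes every node is completely full, so it gives an upper bound; max_records and levels_needed are hypothetical helper names.

```python
def max_records(order, levels):
    """Upper bound on records reachable in a tree of full nodes."""
    return order ** levels

def levels_needed(order, records):
    """Smallest number of levels whose full-node capacity covers records."""
    levels, capacity = 1, order
    while capacity < records:
        capacity *= order
        levels += 1
    return levels

print(max_records(100, 4))                # 100000000
print(levels_needed(100, 100_000_000))    # 4
print(levels_needed(2, 100_000_000))      # 27
```

With a fan-out of 2, the same number of records needs 27 levels instead of 4, which is the reason a larger Order means fewer file accesses per search.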

9
Analysis
Number of file accesses to Add
best case
  1. search downward
  2. adjust the node
worst case (split)
  1. search downward to the leaf
  2. insert, detect overflow, split upward
  3. create new root node

10
Definition of B-Tree
In general, a B-Tree of Order N has the following properties:
  1. the root has at least two descendants, or is a leaf
  2. each node has no more than N descendants
  3. each node that is not the root or a leaf has at least N/2 descendants
  4. all leaf nodes are at the same level
  5. a nonleaf node with k descendants contains k-1 key values
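
The five properties translate fairly directly into a checker. The sketch below is illustrative only: Node and is_valid_btree are hypothetical names, "descendants" is read as direct children, N/2 is rounded up for odd orders (an assumption), and key ordering inside nodes is not checked.

```python
from math import ceil

class Node:
    def __init__(self, keys, children=()):
        self.keys = list(keys)
        self.children = list(children)   # empty for a leaf

def is_valid_btree(root, order):
    """Check properties 1 to 5 of the definition for the tree under root."""
    leaf_depths = set()

    def check(node, depth, is_root):
        k = len(node.children)
        if k == 0:                        # a leaf: only its depth matters here
            leaf_depths.add(depth)
            return True
        if k > order:                     # property 2
            return False
        minimum = 2 if is_root else ceil(order / 2)   # properties 1 and 3
        if k < minimum:
            return False
        if len(node.keys) != k - 1:       # property 5
            return False
        return all(check(child, depth + 1, False) for child in node.children)

    return check(root, 0, True) and len(leaf_depths) == 1   # property 4

good = Node(["M"], [Node(["C", "G"]), Node(["S", "T"])])
lopsided = Node(["M"], [Node(["C"]),
                        Node(["S"], [Node(["P"]), Node(["T"])])])
print(is_valid_btree(good, 4))       # True
print(is_valid_btree(lopsided, 4))   # False
```

The second tree fails because its leaves sit at two different depths, which violates property 4.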

11
B+ Trees
Since search time = depth of tree, we need to keep the tree short and wide
An uneven tree (some full nodes and some nearly empty nodes, or one that leans to one side) creates poor performance
Using a slightly smarter split method during add keeps the tree short and balanced
The B+ Tree is the de facto standard for databases

12
Storing a B-Tree in Files
Data File
  ◦ order does not matter
Index File
  ◦ lists of indexes
  ◦ each index: key, RRN

    PW
    DMP  TW
    ABCD  GIM  NP  RST  UW

Row  Key  IRN  RRN
0    P    4
1    W    8
2    --
3
4    D    12
5    M    16
6    P    20
7    --
8    T    24
9    W    28
10   --
11   --
12   A         4
13   B         8
14   C         0
15   D         2
16   G         11
17   I         7
18   M         5
19   --
20   N         10
21   P         6
22   --
23   --
24   R         13
25   S         1
26   T         3
27   --

When inserting a new node, does its placement in the index file matter?
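
A search over this index file layout might look like the following Python sketch. It is an illustration under assumptions: INDEX, SLOTS, and search are hypothetical names, each node is assumed to occupy 4 consecutive rows, rows 0 to 11 are read as internal records (pointer = IRN, a row in the index file) and rows 12 to 27 as leaf records (pointer = RRN, a row in the data file). Only the rows shown on the slide are included, so the node that W points to at row 28 is absent.

```python
SLOTS = 4  # rows per node, matching the order-4 example

# Each row is (key, IRN, RRN); None marks an empty slot.
INDEX = [
    ("P", 4, None), ("W", 8, None), None, None,                 # rows 0-3: root
    ("D", 12, None), ("M", 16, None), ("P", 20, None), None,    # rows 4-7
    ("T", 24, None), ("W", 28, None), None, None,               # rows 8-11
    ("A", None, 4), ("B", None, 8), ("C", None, 0), ("D", None, 2),
    ("G", None, 11), ("I", None, 7), ("M", None, 5), None,
    ("N", None, 10), ("P", None, 6), None, None,
    ("R", None, 13), ("S", None, 1), ("T", None, 3), None,
]

def search(index, key, root_row=0):
    """Return the data-file RRN for key, or None if it cannot be found."""
    row = root_row
    while 0 <= row and row + SLOTS <= len(index):
        node = [rec for rec in index[row:row + SLOTS] if rec is not None]
        # First record whose key is >= the search key.
        hit = next((rec for rec in node if key <= rec[0]), None)
        if hit is None:
            return None                  # key is larger than every key in this node
        k, irn, rrn = hit
        if rrn is not None:              # leaf record: points into the data file
            return rrn if k == key else None
        row = irn                        # internal record: follow the pointer
    return None                          # pointer leads past the rows listed here

print(search(INDEX, "M"))  # 5
print(search(INDEX, "S"))  # 1
print(search(INDEX, "H"))  # None
```

Each pass through the loop corresponds to reading one node from the index file, so a search costs one read per tree level plus one read of the data file. Because nodes refer to each other by row number, a lookup works the same wherever a node is stored in the file.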

13
Next Class
Review of the entire semester
Sorting and Binary Searching
FAT v. NTFS v. Linux
File Access Times
Fragmented Files
Which is the best storage method?
  ◦ Indexed
  ◦ B-Tree
  ◦ Hashed
