

B+ Trees
- What if you have A LOT of data that needs to be stored and accessed quickly?
- It won't all fit in memory.
- That means we have to go to the hard drive (disk access). Ugh!
- A B+ tree is a tree whose nodes are pages on disk.
- It offers fast search and fast tree traversal.
- B+ tree: the most widely used index for database management systems.

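B+ Tree:
- M-ary tree: a general tree with M-way branching.
- We decide how many keys go in each node, and that choice sets M (an interior node with k keys has k + 1 pointers).
- The tree is balanced: every path from the root to a leaf has the same depth.
[Figure: example B+ tree with a root above the leaf entries 10* 15* 20* 27* 33* 37* 40* 46* 51* 55* 63* 97*]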

B+ Tree: Leaf vs Interior:
- In B+ trees, we must distinguish between leaf nodes and interior nodes.
- Leaf nodes:
  - Leaf nodes are where the data is.
  - Leaf nodes are fixed in size.
  - Leaf nodes are on the disk.
  - The keys in a leaf are kept sorted, and the leaves are chained together as a linked list.
- Interior nodes:
  - Only used to navigate to the correct leaf node.
  - For each interior node, the number of pointers is the number of keys + 1.
  - The keys in an interior node are also kept as a sorted list.

Example of interior node (keys 57, 81, 95; four pointers):
- pointer 0: to keys k < 57
- pointer 1: to keys 57 ≤ k < 81
- pointer 2: to keys 81 ≤ k < 95
- pointer 3: to keys 95 ≤ k
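The routing rule in this example can be sketched in a few lines of C++. This is only an illustration: the function name `childIndex` and the use of a `std::vector<int>` for the keys are assumptions made here, not part of the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not from the slides): given the sorted keys of one
// interior node, return which of its keys.size() + 1 pointers to follow.
// Pointer i covers the range keys[i-1] <= k < keys[i].
std::size_t childIndex(const std::vector<int>& keys, int k) {
    assert(std::is_sorted(keys.begin(), keys.end()));
    // upper_bound finds the first key strictly greater than k; its position
    // equals the number of keys that are <= k, which is the pointer index.
    return static_cast<std::size_t>(
        std::upper_bound(keys.begin(), keys.end(), k) - keys.begin());
}
```

With the keys 57, 81, 95, a search key of 80 follows pointer 1 and a search key of 95 follows pointer 3.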

B+ Trees: Fill factor
- Nodes (both interior and leaf) may be partially filled.
- The fill factor is a percentage that sets the minimum number of keys in every non-root node. It is usually 50%.
- Every node must be sufficiently filled.
- So if each node has a capacity of 4 keys and the fill factor is 50%, no node other than the root can have fewer than 2 keys.
- A node that is too empty is underfilled. Only the root may be underfilled.
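The arithmetic behind the fill factor can be written out as a small C++ sketch. The names `minKeys` and `isUnderfilled` are made up for this example, and rounding the minimum up for odd capacities is an assumption; the slides only give the case of 4 keys at 50%.

```cpp
#include <cassert>

// Hypothetical helper: minimum number of keys a non-root node must hold.
// Assumption: fractional results round up (capacity 5 at 50% needs 3 keys).
int minKeys(int capacity, int fillPercent) {
    assert(capacity > 0 && fillPercent >= 0 && fillPercent <= 100);
    return (capacity * fillPercent + 99) / 100;
}

// The root is exempt from the minimum, as stated on the slide.
bool isUnderfilled(int numKeys, int capacity, int fillPercent, bool isRoot) {
    return !isRoot && numKeys < minKeys(capacity, fillPercent);
}
```

For the slide's numbers, `minKeys(4, 50)` is 2, so a non-root node holding a single key is underfilled while a root holding a single key is not.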

[Figure: example B+ tree whose root is underfilled; leaf entries include 2* 3* 5* 7* 8* 16* 17* 20* 22* 24* 27* 29* 33* 34* 38* 39*]

Searching B+ Tree:
- A lot like searching a binary search tree, except that every search must end at a leaf node.
- The leaf node either contains the key we're looking for, or it tells us definitively that the key is not in the tree.

    LeafNode search(Node p, Key key) {
        if p is a leaf node, return p;
        otherwise, if key < p.keys[0], return search(p.beforeptr, key);
        otherwise, find the i with key >= p.keys[i] and (p.keys[i] is the last key or key < p.keys[i+1]),
            and return search(p.keys[i].afterptr, key);
    }

- Searching visits at most log_d(n) nodes, where d is the fill of each node (at least half of the node's maximum) and n is the number of entries in the tree.
- The larger d is, the shorter the tree.
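A runnable version of the search might look like the sketch below. It is not the slides' code: it uses an iterative loop and a single `children` vector in place of the `beforeptr`/`afterptr` fields, and the `Node` layout and the `contains` helper are assumptions made for the example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed in-memory node layout; on disk each node would be one page.
struct Node {
    bool isLeaf = true;
    std::vector<int> keys;        // sorted keys
    std::vector<Node*> children;  // interior nodes only: keys.size() + 1 pointers
};

// Walk from the root down to the leaf that would hold `key`.
const Node* search(const Node* p, int key) {
    while (!p->isLeaf) {
        assert(p->children.size() == p->keys.size() + 1);
        // Number of keys <= key is the index of the pointer to follow.
        std::size_t i = static_cast<std::size_t>(
            std::upper_bound(p->keys.begin(), p->keys.end(), key) - p->keys.begin());
        p = p->children[i];
    }
    return p;
}

// The leaf settles the question: the key is either there or not in the tree.
bool contains(const Node* root, int key) {
    const Node* leaf = search(root, key);
    return std::binary_search(leaf->keys.begin(), leaf->keys.end(), key);
}
```

Every call ends at a leaf, even for a key that is absent; `contains` then checks only that one leaf.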

[Figure: the example B+ tree, used for a "Find" walk-through]

Inserting:
- Trickier than searching.
- We have to worry about overfilling a node.
- Case 1: the leaf node isn't full.
  - Insert the new key in order (remember, the keys are kept as a sorted list).
  - The node(s) above don't change.
- Aside on virtual memory: disk space can be made to look like memory, so we can pretend that what is stored on disk is stored in memory. It just takes a bit longer to load.

[Figure: Insert 23*. The target leaf has room, so 23* is added in sorted order and the nodes above are unchanged. No splitting required.]

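Leaf Nodes

    class LeafNode {
    public:
        int keys[4];          // sorted keys (this could be a linked list)
        Data *data[4];        // data[i] is the data associated with keys[i]
        int curr_fullness;    // number of slots currently in use
        LeafNode *nextleaf;   // next leaf in the chain of leaves
    };

    // To insert into a non-full leaf node, shift the larger keys right
    // so that the keys stay sorted (newdata is the record that goes with newkey):
    int i = curr_fullness;
    while (i > 0 && keys[i - 1] > newkey) {
        keys[i] = keys[i - 1];
        data[i] = data[i - 1];
        i--;
    }
    keys[i] = newkey;
    data[i] = newdata;
    curr_fullness++;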

B+ Tree insertion
- Case 2: the leaf node is full, but the parent node has space:
  1. Create a new sibling leaf node after the target leaf node (new_target).
  2. Split the (sorted) data in the full leaf node: half stays in the old leaf node, and half goes into the new target leaf node.
  3. Update the fullness count of each of the two leaf nodes.
- Now the parent must point to the new target node:
  - Take the first value in the new target node, new_target.keys[0], and insert this key value into the parent.

[Figure: Case 2 example, inserting 21*. The full target leaf splits in two, and 21 is inserted with a pointer into the parent node.]

Case 2 insertion pseudocode:
- If the leaf node's keys are full:
- Make a new node (the new node goes after the full node):
    newLeaf = new LeafNode();
- Split the sorted keys (the old keys plus the newly inserted key) between the old node and the new node:
    oldLeaf.keys[0 to fullness/2 - 1] stay in the old node
    oldLeaf.curr_fullness = fullness/2;
    newLeaf.keys[0 to fullness/2] become the sorted keys[fullness/2 to end]
    newLeaf.curr_fullness = fullness/2 + 1;
- Link the old node to the new node:
    tmp = oldLeaf.nextleaf;
    oldLeaf.nextleaf = newLeaf;
    newLeaf.nextleaf = tmp;
- Now insert the first key of the new node into the parent node:
    parent.keys[x] = newLeaf.keys[0];
    parent.leafptrs[x] = newLeaf;
We're done, because we specified that the parent was not full.
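The split step can be tried out with a short C++ sketch. It follows the same division as the pseudocode (the old leaf keeps `CAPACITY / 2` keys and the new leaf takes the rest), but the `Leaf` struct, the `splitInsert` name, and the use of `std::vector` instead of fixed arrays are assumptions made for the example. Updating the parent is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t CAPACITY = 4;  // keys per leaf, as in the slides' example

// Assumed leaf layout (data pointers omitted to keep the sketch short).
struct Leaf {
    std::vector<int> keys;  // sorted, at most CAPACITY entries
    Leaf* next = nullptr;   // next leaf in the chain
};

// Insert `key` into a full leaf by splitting it. Returns the new right
// sibling; its first key is the one the caller inserts into the parent.
// The caller owns the returned node.
Leaf* splitInsert(Leaf& old, int key) {
    assert(old.keys.size() == CAPACITY);
    // Merge the new key into a sorted copy of the old keys.
    std::vector<int> all = old.keys;
    all.insert(std::upper_bound(all.begin(), all.end(), key), key);

    const std::size_t keep = CAPACITY / 2;  // keys that stay in the old leaf
    Leaf* fresh = new Leaf;
    fresh->keys.assign(all.begin() + keep, all.end());
    old.keys.assign(all.begin(), all.begin() + keep);

    // Link the new leaf in right after the old one.
    fresh->next = old.next;
    old.next = fresh;
    return fresh;
}
```

Splitting a leaf holding 17, 20, 22, 23 on insertion of 21 leaves 17, 20 in the old leaf and 21, 22, 23 in the new one, so 21 is the key handed to the parent.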

Case 3: both the target leaf and its parent are full:
- Create a new leaf node, divide the keys in half, and place one half in each node. Use the first key in the new node as the key for the parent (like Case 2).
- The interior (parent) node is full, so:
  - Create a new interior node.
  - Divide the sorted keys (including the new leaf's key) between the old interior node and the new interior node.
  - Insert a pointer to the new interior node into its parent, with the key being the first key in the new interior node.
  - (That key is no longer needed in the new interior node, so it moves up instead of being copied.)
- Recursively insert the new key/pointer into the parent node, until a parent node is reached that is not full.
- If you split the root, make a new root whose before pointer points to the old root and whose key and pointer lead to the new split-off node.

[Figure: Case 3 example, Insert 8*. The target leaf and its parent are both full. The leaf splits and 5 is inserted into the parent; the parent then splits and 17 is brought up to its parent (or, in this case, a new root is made).]
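The interior split differs from the leaf split in that the middle key moves up rather than being copied. A minimal C++ sketch of that step follows; the `Interior` and `SplitResult` types and the `splitInterior` name are hypothetical, child pointers are stood in for by integer ids, and the interior keys used in the usage note (5, 13, 17, 24, 30) are assumed values chosen to match "Bring 17 up to the parent".

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Assumed interior-node layout; ints stand in for page pointers.
struct Interior {
    std::vector<int> keys;      // sorted separator keys
    std::vector<int> children;  // keys.size() + 1 child ids
};

struct SplitResult {
    int pushedUp;    // key that moves into the parent (or a new root)
    Interior right;  // new sibling holding the upper half
};

// Precondition: `node` is overfull, i.e. the new key and pointer have
// already been inserted in sorted position.
SplitResult splitInterior(Interior& node) {
    assert(node.children.size() == node.keys.size() + 1);
    const std::size_t mid = node.keys.size() / 2;

    SplitResult r;
    r.pushedUp = node.keys[mid];  // moves up; kept in neither half
    r.right.keys.assign(node.keys.begin() + mid + 1, node.keys.end());
    r.right.children.assign(node.children.begin() + mid + 1, node.children.end());

    node.keys.resize(mid);          // lower half stays in the old node
    node.children.resize(mid + 1);
    return r;
}
```

An interior node holding 5, 13, 17, 24, 30 splits into 5, 13 and 24, 30, with 17 handed to the level above.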

Deleting from a B+ Tree:
- Worry about underflow.
- Start at the root and find the leaf with the key.
- Remove the key.
- If the leaf is still at least half full, done!
- If the leaf is less than half full:
  - Try redistribution:
    - Borrow from a sibling (an adjacent node with the same parent as the leaf).
    - Change the key in the parent.
  - If redistribution fails, merge:
    - Merge with the sibling.
    - Delete the key from the parent of the leaf.
- Merging may need to propagate up the tree:
  - If the parent ends up with underflow, adopt from a neighbor and update the parent.
  - If necessary, merge and delete from the parent.
- If the root ends up with only one child (not key, child!), make the child the new root.

Case 1a: Delete 5:
[Figure: Delete 5*. The leaf 5* 7* 8* becomes 7* 8*.]
- The leaf is still at least half full.
- Note: we must modify the parent's key value (5 becomes 7).

Case 1b: Delete 17:
[Figure: Delete 17*. The leaf that held 17* 20* 22* becomes 20* 22*.]
- The leaf is still at least half full.
- Note: the value removed is the first value of a node pointed to by the BEFORE pointer, so here we must modify the key value in the parent's parent.

Case 2: Delete 20:
[Figure: Delete 20*. The leaf 20* 22* drops to 22*, so it borrows 24* from its sibling 24* 27* 29*, giving 22* 24* and 27* 29*; the parent key becomes 27.]
- The leaf is less than half full, so redistribute:
  - Borrow from a sibling (an adjacent node with the same parent as the leaf).
  - Change the key in the parent.
  - If the removed value was the first value under a before pointer, change the key in the parent's parent as well.

Case 3: Delete 24:
[Figure: Delete 24*. The leaf 22* 24* drops to 22*, and its sibling 27* 29* has nothing to spare, so the two merge into 22* 27* 29*.]
- Merge:
  - Merge with the sibling.
  - Delete the key from the parent of the leaf.
- Propagate merging up the tree:
  - If the parent ends up with underflow, adopt from a neighbor and update the parent.
  - If necessary, merge and delete from the parent.
- If the root ends up with only one child (not key, child!):
  - Insert the root key into the child.
  - Make the child the new root.
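The leaf-level part of deletion (remove, then borrow or merge) can be sketched in C++ as below. This is a simplification under stated assumptions: it only ever looks at the right sibling, it represents each leaf as a plain `std::vector<int>`, and the names `Fix` and `eraseFromLeaf` are invented for the example. Removing the separator from the parent after a merge, and any propagation further up, is left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

const std::size_t MIN_KEYS = 2;  // 50% fill of a 4-key leaf

enum class Fix { None, Borrowed, Merged };

// Remove `key` from `leaf`. `right` is the right sibling under the same
// parent and `separator` is the parent key between the two leaves.
// Returns None if the key was absent or the leaf is still full enough.
Fix eraseFromLeaf(std::vector<int>& leaf, std::vector<int>& right,
                  int& separator, int key) {
    assert(std::is_sorted(leaf.begin(), leaf.end()));
    auto it = std::lower_bound(leaf.begin(), leaf.end(), key);
    if (it == leaf.end() || *it != key) return Fix::None;
    leaf.erase(it);
    if (leaf.size() >= MIN_KEYS) return Fix::None;

    if (right.size() > MIN_KEYS) {
        // Redistribution: borrow the sibling's smallest key,
        // then change the key in the parent.
        leaf.push_back(right.front());
        right.erase(right.begin());
        separator = right.front();
        return Fix::Borrowed;
    }
    // Merge: absorb the sibling. The caller must now delete the
    // separator and the sibling's pointer from the parent.
    leaf.insert(leaf.end(), right.begin(), right.end());
    right.clear();
    return Fix::Merged;
}
```

Run on the slides' examples, deleting 20 from 20, 22 with sibling 24, 27, 29 borrows 24 and sets the separator to 27; deleting 24 afterwards forces a merge into 22, 27, 29.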

B+ Tree:
- Both insertion and deletion work in log_d(n) steps, the same bound as searching.
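To get a feel for the log_d(n) bound, the following C++ sketch counts how many levels are needed before d-way branching can reach n entries. The function name `levelsNeeded` is hypothetical, and the code uses plain integer multiplication, so it assumes n is small enough that the running product does not overflow.

```cpp
#include <cassert>

// Hypothetical helper: smallest h with d^h >= n, i.e. ceil(log_d(n)),
// computed with integers only. Assumes d >= 2 and that the running
// product stays within unsigned long long.
int levelsNeeded(unsigned long long n, unsigned long long d) {
    assert(d >= 2);
    int h = 0;
    unsigned long long reach = 1;  // entries reachable with h levels
    while (reach < n) {
        reach *= d;
        ++h;
    }
    return h;
}
```

For a million entries, d = 2 needs 20 levels while d = 100 needs only 3, which is why a larger d gives a shorter tree and fewer disk accesses.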