B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree

Presentation transcript:

B+-Trees (Part 1)
- What is a B+ tree?
- Why B+ trees?
- Searching a B+ tree
- Insertion in a B+ tree
Note: Some examples in this lecture are adapted from Internet sources.

What is a B+ tree?
A B+-tree of order M ≥ 3 is an M-ary tree with the following properties:
- Leaves contain data items or references to data items.
  - All leaves are at the same depth.
  - Each leaf holds ⌈L/2⌉ to L data items or data references (L may be equal to, less than, or greater than M; but usually L << M).
- Internal nodes contain search keys.
  - The keys in each node are sorted in increasing order.
  - Each node has at least ⌈M/2⌉ and at most M subtrees.
  - The number of search keys in each node is one less than the number of subtrees.
  - Key i in an internal node is the smallest key in subtree i+1.
- The root is either a single leaf or has 2 to M children.
- Nodes are at least half full, so the tree cannot degenerate into a simple binary tree or even a linked list.
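The occupancy rules above can be tabulated for a given M and L. The following is a minimal sketch; the function name `occupancy_bounds` and the dictionary keys are illustrative choices, not part of the lecture.

```python
def occupancy_bounds(M: int, L: int) -> dict:
    """Return (min, max) occupancy per node type for a B+ tree of order M
    with leaf capacity L, following the properties listed on the slide."""
    if M < 3 or L < 1:
        raise ValueError("expected M >= 3 and L >= 1")
    half_m = (M + 1) // 2   # integer form of ceil(M / 2)
    half_l = (L + 1) // 2   # integer form of ceil(L / 2)
    return {
        "leaf_items": (half_l, L),              # data items per leaf
        "internal_subtrees": (half_m, M),       # children per internal node
        "internal_keys": (half_m - 1, M - 1),   # keys = subtrees - 1
        "root_children": (2, M),                # when the root is not a leaf
    }


if __name__ == "__main__":
    for name, bounds in occupancy_bounds(5, 5).items():
        print(name, bounds)
```

For M = 5 and L = 5, every non-root leaf holds 3 to 5 items and every non-root internal node has 3 to 5 subtrees.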

The internal node structure of a B+ tree
- Each leaf node stores key-data pairs or key-dataReference pairs. Data or data references are in leaves only.
- Leaves form a doubly linked list that is sorted in increasing order of keys.
- Each internal node has the following structure:
    j | a_1 | k_1 | a_2 | k_2 | a_3 | … | k_j | a_{j+1}
  - j is the number of keys in the node.
  - a_i is a reference to a subtree.
  - k_i is the smallest key in subtree a_{i+1} and is greater than the largest key in subtree a_i.
  - k_1 < k_2 < k_3 < … < k_j
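One way to picture this layout in code is shown below. It is a sketch under assumptions: the class names `Leaf` and `Internal`, the field names, and the helper `check_internal` are hypothetical and chosen only to mirror the symbols j, a_i and k_i.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional


@dataclass(eq=False)
class Leaf:
    keys: List[Any] = field(default_factory=list)     # sorted keys
    records: List[Any] = field(default_factory=list)  # data or data references
    prev: Optional["Leaf"] = None                     # doubly linked leaf list
    next: Optional["Leaf"] = None


@dataclass(eq=False)
class Internal:
    keys: List[Any] = field(default_factory=list)      # k_1 < k_2 < ... < k_j
    children: List[Any] = field(default_factory=list)  # a_1 ... a_{j+1}


def min_key(node):
    """Smallest key stored in the subtree rooted at node."""
    while isinstance(node, Internal):
        node = node.children[0]
    return node.keys[0]


def max_key(node):
    """Largest key stored in the subtree rooted at node."""
    while isinstance(node, Internal):
        node = node.children[-1]
    return node.keys[-1]


def check_internal(node: Internal) -> bool:
    """Check the structural rules stated for one internal node."""
    j = len(node.keys)
    if len(node.children) != j + 1:
        return False
    if any(node.keys[i] >= node.keys[i + 1] for i in range(j - 1)):
        return False
    for i, k in enumerate(node.keys):
        # k_i equals the smallest key of a_{i+1} and exceeds every key of a_i
        if k != min_key(node.children[i + 1]) or k <= max_key(node.children[i]):
            return False
    return True
```

The check only inspects one node; validating a whole tree would apply it to every internal node and also verify leaf depth and occupancy.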

What is a B+ tree?
Example: A B+ tree of order M = 5, L = 5
- Records or references to records are stored at the leaves, but we only show the keys here.
- At the internal nodes, only keys (and references to subtrees) are stored.
Note: The index set (i.e., the internal nodes) contains distinct keys.

What is a B+ tree?
Example: A B+ tree of order M = 4, L = 4
Note: For simplicity, the doubly linked list references that join the leaf nodes are omitted.

Why B+ trees?
- As in a B-tree, each internal node and leaf node is designed to fit into one I/O block of data. An I/O block can usually hold quite a lot of data, so an internal node can keep many keys, i.e., M is large. This implies that the tree has only a few levels, and only a few disk accesses are needed for a search, insertion, or deletion.
- The B+-tree is a popular structure used in commercial databases. To further speed up searches, insertions, and deletions, the first one or two levels of the B+-tree are usually kept in main memory.
- B+ trees are preferred over B-trees in databases because they support both equality searches and range searches efficiently:
  - Example of an equality search: find the student record with key 950000.
  - Example of a range search: find all student records with an exam grade greater than 70 and less than 90.

Why B+ trees? (Cont'd)
A B+ tree supports equality and range searches efficiently.
[Figure: index entries at the top, used for direct search; data entries at the leaf level form the "sequence set".]
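A range search can be sketched as a walk along the linked leaves once the starting leaf has been located. The snippet below assumes a simplified, hypothetical `Leaf` class with only a forward link and uses exclusive bounds, as in the "greater than 70 and less than 90" example; the grades are made-up sample data.

```python
from typing import Any, Iterator, List, Optional, Tuple


class Leaf:
    """Simplified leaf: sorted keys, parallel records, link to the next leaf."""

    def __init__(self, keys: List[Any], records: List[Any]):
        self.keys = keys
        self.records = records
        self.next: Optional["Leaf"] = None


def range_scan(start: Optional[Leaf], low: Any, high: Any) -> Iterator[Tuple[Any, Any]]:
    """Yield (key, record) pairs with low < key < high.

    `start` is assumed to be the leaf reached by searching for `low`;
    the scan then follows the leaf list without revisiting the index."""
    leaf = start
    while leaf is not None:
        for key, record in zip(leaf.keys, leaf.records):
            if key >= high:
                return              # keys are sorted, so nothing further matches
            if key > low:
                yield key, record
        leaf = leaf.next


if __name__ == "__main__":
    l1 = Leaf([60, 65, 70], ["r60", "r65", "r70"])
    l2 = Leaf([72, 80, 88], ["r72", "r80", "r88"])
    l3 = Leaf([90, 95], ["r90", "r95"])
    l1.next, l2.next = l2, l3
    print(list(range_scan(l1, 70, 90)))
```

Only the leaves overlapping the range are touched, which is why the sequence set makes range queries cheap.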

B+ Trees in Practice
For a B+ tree of order M and L = M, with h levels of index, where h ≥ 1:
- The maximum number of records stored is n = (M − 1)^h.
- The space required to store the tree is O(n).
- Inserting a record requires O(log_M n) operations in the worst case.
- Finding a record requires O(log_M n) operations in the worst case.
- Removing a (previously located) record requires O(log_M n) operations in the worst case.
- Performing a range query with k elements occurring within the range requires O(log_M n + k) operations in the worst case.
Example for a B+ tree of order M = 134 and L = 133:
- A tree with 3 levels stores a maximum of 133^3 = 2,352,637 records.
- A tree with 4 levels stores a maximum of 133^4 = 312,900,721 records.
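The capacity figures in the example can be reproduced with a few lines of arithmetic. This sketch simply evaluates the slide's formula n = (M − 1)^h; the function name `max_records` is an illustrative choice.

```python
def max_records(M: int, h: int) -> int:
    """Evaluate the slide's capacity formula n = (M - 1)^h."""
    if M < 3 or h < 1:
        raise ValueError("expected M >= 3 and h >= 1")
    return (M - 1) ** h


if __name__ == "__main__":
    for levels in (3, 4):
        print(levels, f"{max_records(134, levels):,}")
```

With M = 134 this gives 2,352,637 for three levels and 312,900,721 for four.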

Searching a B+ Tree
To search for KEY:
- Start from the root.
- If an internal node is reached:
  - Search for KEY among the keys in that node (linear search or binary search).
  - If KEY < smallest key, follow the leftmost child reference down.
  - If KEY >= largest key, follow the rightmost child reference down.
  - If Ki <= KEY < Kj for adjacent keys Ki and Kj, follow the child reference between Ki and Kj.
- If a leaf is reached:
  - Search for KEY among the keys stored in that leaf.
  - If found, return the corresponding record; otherwise report not found.
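The search procedure can be sketched with binary search inside each node. The classes `Leaf` and `Internal` and the function `search` are hypothetical names; the sketch relies on `bisect_right` returning the number of keys that are <= KEY, which is exactly the index of the child to follow under the three rules above.

```python
from bisect import bisect_left, bisect_right
from typing import Any, List, Optional


class Leaf:
    def __init__(self, keys: List[Any], records: List[Any]):
        self.keys = keys          # sorted keys
        self.records = records    # records[i] belongs to keys[i]


class Internal:
    def __init__(self, keys: List[Any], children: List[Any]):
        self.keys = keys          # j sorted keys
        self.children = children  # j + 1 child references


def search(root, key) -> Optional[Any]:
    """Return the record stored under key, or None if it is not present."""
    node = root
    while isinstance(node, Internal):
        # index 0 if key < smallest key, index j if key >= largest key,
        # otherwise the child between the two adjacent keys around key
        node = node.children[bisect_right(node.keys, key)]
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None


if __name__ == "__main__":
    leaves = [
        Leaf([2, 3, 5, 7], ["r2", "r3", "r5", "r7"]),
        Leaf([14, 15, 16], ["r14", "r15", "r16"]),
        Leaf([19, 20, 22], ["r19", "r20", "r22"]),
    ]
    root = Internal([14, 19], leaves)
    print(search(root, 14), search(root, 8))
```

A linear scan of the node's keys would also work; binary search only matters when M is large.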

Searching a B+ Tree
- In processing a query, a path is traversed in the tree from the root to some leaf node.
- If there are K search-key values in the file, the path is no longer than ⌈log_{⌈m/2⌉}(K)⌉.
- With 1 million search-key values and m = 100, at most ⌈log_50(1,000,000)⌉ = 4 nodes are accessed in a lookup.
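The bound ⌈log_{⌈m/2⌉}(K)⌉ can be checked numerically. The sketch below uses integer multiplication instead of floating-point logarithms to avoid rounding problems; the function name `max_path_length` is an assumption made for illustration.

```python
def max_path_length(K: int, m: int) -> int:
    """Smallest integer p with ceil(m/2)**p >= K, i.e. ceil(log_{ceil(m/2)} K)."""
    if K < 1 or m < 3:
        raise ValueError("expected K >= 1 and m >= 3")
    base = (m + 1) // 2   # ceil(m / 2): minimum fanout of a non-root node
    levels, reach = 0, 1
    while reach < K:
        reach *= base
        levels += 1
    return levels


if __name__ == "__main__":
    print(max_path_length(1_000_000, 100))
```

For K = 1,000,000 and m = 100 the base is 50, and 50^3 = 125,000 < 1,000,000 <= 50^4, so the result is 4.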

Insertion in B+ Trees
Suppose that we want to insert a key K and its associated record into the B+ tree.
A B+ tree has two overflow conditions:
- A leaf node overflows if, after insertion, it contains L + 1 keys.
- A root node or an internal node of a B+ tree of order M overflows if, after a key insertion, it contains M keys.
Insertion algorithm:
- Search for the appropriate leaf node x in which to insert the key. Note: insertion of a key always starts at a leaf node.
- If the key already exists in leaf node x, report an error. Otherwise, insert the key in its proper sorted position in the leaf node.
- If the leaf does not overflow (x contains fewer than L + 1 keys after insertion), the insertion is done.
- If the leaf overflows, split it into two and COPY the smallest key y of the right split node to the parent of the node. (Records with keys < y go to the left leaf node; records with keys >= y go to the right leaf node.)
- If the parent overflows, split the parent into two. (Keys < middle key go to the left node; keys > middle key go to the right node.) The middle key PROPAGATES to the parent of the split parent.
- The process propagates upward until a parent that does not overflow is reached or the root node is reached. If the root node is reached and it overflows, create a new root node.

Insertion in B+ Trees: No overflow
Insert KEY:
- Search for KEY using the search operation. If the key is found in a leaf node, report an error.
- Insert KEY into that leaf.
  - If the leaf does not overflow (contains <= L keys), just insert KEY into it.
  - If the leaf overflows (contains L+1 keys), splitting is necessary.
Example: inserting O into a B+ tree of order M = 4, L = 3.
- Search for O; this leaf has 2 keys.
- Insert O and maintain the order.

Insertion in B+ Trees: Splitting a Leaf Node
If the leaf overflows (contains L+1 keys after insertion), splitting is necessary.
Splitting a leaf:
- Split it into 2 new leaves, LeftLeaf and RightLeaf.
  - LeftLeaf has the ⌈(L+1)/2⌉ smallest keys.
  - RightLeaf has the remaining ⌊(L+1)/2⌋ keys.
- Make a copy of the smallest key in RightLeaf, say MinKeyRight, to be the parent of LeftLeaf and RightLeaf [COPY UP].
- Insert MinKeyRight, together with LeftLeaf and RightLeaf, into the original parent node.
Example: inserting T into a B+ tree of order M = 4 and L = 3.
- Search for T; this leaf has 3 keys, so inserting T causes an overflow.

Insertion in B+ Trees: Splitting Leaf (Cont'd)
- Split the leaf (xL gets ⌈(L+1)/2⌉ keys, xR gets ⌊(L+1)/2⌋ keys) and take the minimum key in xR to be the parent of xL and xR.
- Make S the parent of the two new leaves, and insert S into the parent. Since the parent has only 2 keys (U, Y), we can insert the subtree rooted at S into it.
- Insert S into the parent. Maintain the order of keys and child references (DONE).
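The leaf split with copy-up can be sketched on a plain sorted list of keys. This is an illustration only: the function name `split_leaf` is hypothetical, and the sample keys are made up rather than taken from the slide's figure.

```python
from typing import Any, List, Tuple


def split_leaf(keys: List[Any]) -> Tuple[List[Any], List[Any], Any]:
    """Split an overflowing leaf holding L + 1 sorted keys.

    Returns (left_keys, right_keys, separator). The left leaf keeps the
    ceil((L+1)/2) smallest keys, the right leaf gets the rest, and the
    separator is a COPY of the smallest key in the right leaf."""
    if len(keys) < 2:
        raise ValueError("need at least two keys to split")
    n_left = (len(keys) + 1) // 2          # ceil((L + 1) / 2)
    left, right = keys[:n_left], keys[n_left:]
    return left, right, right[0]           # copy up: key stays in the leaf


if __name__ == "__main__":
    # L = 3, so a leaf with 4 keys has overflowed (sample keys)
    print(split_leaf([10, 20, 30, 40]))
```

The separator still appears in the right leaf after the split, which is what distinguishes copy-up from the push-up used for internal nodes.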

Insertion in B+ Trees: Splitting Internal Node
An insertion into a full parent node causes the parent to overflow; in that case, this internal node must be split.
Splitting an internal node:
- Split it into 2 new internal nodes, LeftNode and RightNode.
  - LeftNode has the smallest ⌈M/2⌉ − 1 keys.
  - RightNode has the largest ⌊M/2⌋ keys.
  - ⇒ NumberOfKeysInLeftNode <= NumberOfKeysInRightNode
  - Note that the ⌈M/2⌉th key is not in either node.
- Make the ⌈M/2⌉th key, say "MiddleKey", the parent of LeftNode and RightNode [PROPAGATE UP].
- Insert "MiddleKey", together with LeftNode and RightNode, into the original parent node.
Splitting the root:
- Follow exactly the same procedure as for splitting an internal node.
- "MiddleKey", the parent of LeftNode and RightNode, is now set to be the root of the tree.
- After splitting the root, the depth of the tree increases by 1.
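The internal-node split can be sketched the same way on parallel lists of keys and child references. The function name `split_internal` and the placeholder child labels are assumptions for illustration; the key values 5, 13, 17, 24, 30 are those of the M = 5 example in this lecture.

```python
from typing import Any, List, Tuple


def split_internal(keys: List[Any], children: List[Any]) -> Tuple[tuple, Any, tuple]:
    """Split an overflowing internal node holding M keys and M + 1 children.

    Returns ((left_keys, left_children), middle_key, (right_keys, right_children)).
    The middle key is PUSHED up: it appears in neither half."""
    M = len(keys)
    if len(children) != M + 1:
        raise ValueError("an internal node needs one more child than keys")
    mid = (M + 1) // 2 - 1                 # 0-based index of the ceil(M/2)-th key
    left = (keys[:mid], children[:mid + 1])
    right = (keys[mid + 1:], children[mid + 1:])
    return left, keys[mid], right


if __name__ == "__main__":
    keys = [5, 13, 17, 24, 30]                   # M = 5 keys: overflow
    children = ["c0", "c1", "c2", "c3", "c4", "c5"]  # placeholder references
    print(split_internal(keys, children))
```

For these keys the left node keeps 5 and 13, the right node keeps 24 and 30, and 17 moves up.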

Insertion in B+ Trees
Example: inserting M into a B+ tree of order M = 4 and L = 3.
- Search for M; this leaf has 3 keys.
- Insert M; the B+ tree condition is violated.
- Split the leaf and distribute the keys.

Insertion in B+ Trees
- Split the leaf and distribute the keys.
- Make L the parent of the two new leaves. However, we cannot just insert L into the parent, as it is already full.
- Insert L and its child references into the parent.

Insertion in B+ Trees
- The key J becomes the parent of the two internal nodes.
- Insert J into the next parent. Since that parent is not full, we can just insert the subtree rooted at J into it. Done.

Insertion in B+ Trees
Insert 16 then 8 into the following B+ tree of order M = 5, L = 4.
Note: A * after a leaf key indicates a key-dataReference pair.
[Figure: root with keys 13, 17, 24, 30; leaf 14* 15* 16* after inserting 16; leaf 2* 3* 5* 7* overflows when 8* is added.]
One new child (leaf node) is generated; one more reference must be added to its parent, and thus one more key value as well.

Insertion in B+ Trees
Inserting 8* (cont.): copy up the middle value (leaf split).
[Figure: the overflowing leaf 2* 3* 5* 7* 8* is split; 5 is the entry to be inserted in the parent node 13 17 24 30, which then holds 5 13 17 24 30 and overflows.]
Note that 5 is copied up and continues to appear in the leaf.

Insertion in B+ Trees
The node 5 13 17 24 30 overflows. We split this node, redistribute entries evenly, and propagate up the middle key.
[Figure: the node splits into 5 13 and 24 30; 17 is the entry to be inserted in the parent node.]
Note that 17 is pushed up and appears only once in the index. Contrast this with a leaf split.

Insertion in B+ Trees
[Figure: final tree. Root: 17. Internal nodes: 5 13 and 24 30. Leaf keys, left to right: 2* 3* 5* 7* 8* 14* 15* 19* 20* 22* 24* 27* 29* 33* 34* 38* 39*.]
Notice that the root was split, leading to an increase in height.

Inserting a Data Entry into a B+ Tree: Summary
- Find the correct leaf X.
- Put the data entry onto X.
  - If X has enough space, done!
  - Else, split X (into X and a new node X2): redistribute entries evenly, put the middle key in X2, and copy up the middle key. Insert a reference (index entry) referring to X2 into the parent of X.
- This can happen recursively. To split an index node, redistribute entries evenly, but push (propagate) up the middle key. (Contrast with leaf splits.)
- Splits "grow" the tree; a root split increases the height.
- Tree growth: the tree gets wider or one level taller at the top.
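The whole insertion procedure can be put together in a compact sketch. It is not the lecture's implementation: the class and method names (`BPlusTree`, `Leaf`, `Internal`, `leaf_keys`) are hypothetical, leaves store keys only, the leaf list is singly linked for brevity, and the leaf split puts the ⌈(L+1)/2⌉ smallest keys on the left.

```python
from bisect import bisect_left, bisect_right


class Leaf:
    def __init__(self, keys=None):
        self.keys = keys or []   # sorted keys (records omitted for brevity)
        self.next = None         # next leaf in key order


class Internal:
    def __init__(self, keys, children):
        self.keys = keys          # j sorted keys
        self.children = children  # j + 1 children


class BPlusTree:
    def __init__(self, M=4, L=3):
        self.M, self.L = M, L
        self.root = Leaf()

    def insert(self, key):
        split = self._insert(self.root, key)
        if split is not None:                 # root overflowed: new root,
            sep, right = split                # tree grows one level taller
            self.root = Internal([sep], [self.root, right])

    def _insert(self, node, key):
        """Insert key below node; return (separator, new_right_node) on split."""
        if isinstance(node, Leaf):
            i = bisect_left(node.keys, key)
            if i < len(node.keys) and node.keys[i] == key:
                raise KeyError(f"duplicate key: {key!r}")
            node.keys.insert(i, key)
            if len(node.keys) <= self.L:
                return None                   # no overflow: done
            n_left = (len(node.keys) + 1) // 2
            right = Leaf(node.keys[n_left:])
            node.keys = node.keys[:n_left]
            right.next, node.next = node.next, right
            return right.keys[0], right       # COPY up smallest key of right
        i = bisect_right(node.keys, key)
        split = self._insert(node.children[i], key)
        if split is None:
            return None
        sep, right = split
        node.keys.insert(i, sep)
        node.children.insert(i + 1, right)
        if len(node.keys) < self.M:
            return None                       # internal node did not overflow
        mid = (self.M + 1) // 2 - 1           # index of the ceil(M/2)-th key
        up = node.keys[mid]
        new_right = Internal(node.keys[mid + 1:], node.children[mid + 1:])
        node.keys, node.children = node.keys[:mid], node.children[:mid + 1]
        return up, new_right                  # PUSH up the middle key

    def leaf_keys(self):
        """All keys in order, read by walking the linked leaves."""
        node = self.root
        while isinstance(node, Internal):
            node = node.children[0]
        out = []
        while node is not None:
            out.extend(node.keys)
            node = node.next
        return out


if __name__ == "__main__":
    tree = BPlusTree(M=4, L=3)
    for k in range(1, 11):
        tree.insert(k)
    print(tree.root.keys, tree.leaf_keys())
```

Returning the separator and the new right node from the recursive call lets each level decide locally whether it also has to split, which mirrors the "this can happen recursively" step of the summary.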