CMSC 341 B-Trees
D. Frey, with apologies to Tom Anastasio

Large Tree
Tailored toward applications where the tree doesn’t fit in memory
– in-memory operations are much faster than disk accesses
– want to limit the number of levels in the tree (because each new level requires a disk access)
– keep the root and top level in memory

Textbook Errors
Please check the textbook web page for typos and other errors. In particular, the section on B-Trees (4.7) has a couple of typos (pages 166 and 167).
– Page 166, numbered item 5, right margin: should be “…and L data items…”, not “…L children...”
– Page 167, halfway down, left margin: change “and the first level” to “and the next level”

An alternative to BSTs
Up until now we assumed that each node in a BST stored the data. What about having the data stored only in the leaves? The internal nodes just guide our search to the leaf that contains the data we want. We’ll restrict this discussion of such trees to those in which all leaves are at the same level.

Figure 1 - A BST with data stored in the leaves

Observations
Store data only at leaves; all leaves at same level
– interior and exterior nodes have different structure
– interior nodes store one key and two subtree pointers
– all search paths have same length: ⌈lg n⌉
– can store multiple data elements in a leaf

M-Way Trees
A generalization of the previous BST model
– each interior node has M subtree pointers and M-1 keys; the previous BST would be called a “2-way tree” or “M-way tree of order 2”
– as M increases, height decreases: ⌈log_M n⌉
– a perfect M-way tree of height h has M^h leaves

An M-way tree of order 3
Figure 2 (next page) shows the same data as figure 1, stored in an M-way tree of order 3. In this example M = 3 and h = 2, so the tree can support 9 leaves, although it contains only 8. One way to look at the reduced path length with increasing M is that the number of nodes to be visited in searching for a leaf is smaller for large M. We’ll see that when data is stored on disk, each node visited requires a disk access, so reducing the number of nodes visited is essential.
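
The relationship between M, the height, and the number of leaves can be checked with a few lines of C++. This is only a minimal sketch: the function names perfectLeafCount and minHeight are made up for illustration and do not come from the course code, and the height is computed with integer arithmetic rather than a floating-point logarithm.

```cpp
#include <cassert>
#include <cstdint>

// Number of leaves in a perfect M-way tree of height h, i.e. M^h.
// (Hypothetical helper; no overflow checking.)
std::uint64_t perfectLeafCount(std::uint64_t m, unsigned h) {
    std::uint64_t leaves = 1;
    for (unsigned i = 0; i < h; ++i) {
        leaves *= m;
    }
    return leaves;
}

// Smallest height h with M^h >= n, i.e. ceil(log_M n).
// Integer loop avoids floating-point rounding near exact powers of M.
unsigned minHeight(std::uint64_t m, std::uint64_t n) {
    assert(m >= 2);  // with m < 2 the capacity would never grow
    unsigned h = 0;
    std::uint64_t capacity = 1;
    while (capacity < n) {
        capacity *= m;
        ++h;
    }
    return h;
}
```

With M = 3, perfectLeafCount(3, 2) gives the 9 leaves mentioned above, and minHeight(3, 8) gives height 2 for the 8 leaves actually stored.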

Figure 2 -- An M-Way tree of order 3

Searching in an M-way tree
Different from standard BST search
– search always terminates at a leaf node
– might need to scan more than one element at a leaf
– might need to scan more than one key at an interior node
Trade-offs
– tree height decreases as M increases
– computation at each node during search increases as M increases

Searching an M-way tree
Search (MWayNode *v, DataType element, bool& foundIt)
    if v == NULL return failure
    if v is a leaf
        search the list of values looking for element
        if found, return success; otherwise return failure
    else if v is an interior node
        search the keys to find which subtree element is in
        recursively search the subtree
For “real” code, see Dr. Anastasio’s postscript notes
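
The pseudocode above might be turned into runnable C++ roughly as follows. This is a sketch under several assumptions that are not in the slides: keys and data are plain ints, std::vector replaces the raw arrays and the List class, the result is returned as a bool instead of through foundIt, and subtree i is assumed to hold the elements x with keys[i-1] <= x < keys[i].

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Simplified, hypothetical node type used only for this example.
struct MWayNode {
    bool isLeaf = false;
    std::vector<int> keys;            // interior node: sorted keys
    std::vector<MWayNode*> subtrees;  // interior node: keys.size() + 1 children
    std::vector<int> data;            // leaf: stored elements
};

// Returns true if element is stored in the tree rooted at v.
bool search(const MWayNode* v, int element) {
    if (v == nullptr) {
        return false;
    }
    if (v->isLeaf) {
        // Scan the leaf's elements for a match.
        return std::find(v->data.begin(), v->data.end(), element) != v->data.end();
    }
    assert(v->subtrees.size() == v->keys.size() + 1);
    // Index of the first key greater than element; under the assumed key
    // convention this is the index of the subtree that may contain it.
    std::size_t i = static_cast<std::size_t>(
        std::upper_bound(v->keys.begin(), v->keys.end(), element) - v->keys.begin());
    return search(v->subtrees[i], element);
}
```

Scanning the keys with std::upper_bound is a binary search, which keeps the per-node work at O(log M) comparisons; a simple linear scan would also match the pseudocode.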

Figure 3 – searching in an M-way tree of order 4

Is it worth it?
Is it worthwhile to reduce the height of the search tree by letting M increase? Although the number of nodes visited decreases, the amount of computation at each node increases. Where’s the payoff?

An example
Consider storing 10^7 items in a balanced BST and in an M-way tree of order 10. The height of the BST will be lg(10^7) ≈ 24. The height of the M-way tree will be log_10(10^7) = 7. However, in the BST just one comparison will be done at each interior node, while in the M-way tree up to 9 will be done (worst case).
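
Spelling out the arithmetic behind these numbers may help. This is a rough count that assumes one comparison per BST node visited and at most M - 1 = 9 per M-way node visited, and it ignores the work done inside the leaves:

```latex
\begin{align*}
h_{\text{BST}} &= \lceil \lg 10^{7} \rceil
               = \lceil 7 / \log_{10} 2 \rceil
               = \lceil 23.25\ldots \rceil = 24 \\
h_{10\text{-way}} &= \lceil \log_{10} 10^{7} \rceil = 7 \\
\text{comparisons}_{\text{BST}} &\le 24 \times 1 = 24 \\
\text{comparisons}_{10\text{-way}} &\le 7 \times 9 = 63
\end{align*}
```

So the M-way tree visits fewer nodes (7 versus 24) but may perform more comparisons in total (63 versus 24).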

How can this be worth the price?
Only if it somehow takes longer to descend the tree than it does to do the extra computation.
This is exactly the situation when the nodes are stored externally (e.g. on disk).
Compared to disk access time, the time for the extra computation is insignificant.
We can reduce the number of accesses by sizing the M-way tree to match the disk block and record size. See the Weiss text, section 4.7, page 165 for an example.

A generic M-Way Tree Node
template <class Ktype>
class MWayNode {
public:
    // constructors, destructor, accessors, mutators
private:
    bool isLeaf;          // true if node is a leaf
    int m;                // the "order" of the node
    int nKeys;            // number of keys actually in use
    Ktype *keys;          // array of keys (size = m - 1)
    MWayNode **subtrees;  // array of subtree pointers (size = m)
    int nElems;           // number of elements that fit in a leaf
    List<Ktype> data;     // data storage if leaf
};

B-Tree Definition
A B-Tree of order M is an M-Way tree with the following constraints:
1. The root is either a leaf or has between 2 and M subtrees.
2. All interior nodes (except maybe the root) have between ⌈M/2⌉ and M subtrees (i.e. each interior node is at least “half full”).
3. All leaves are at the same level. A leaf must store between ⌈L/2⌉ and L data elements, where L is a fixed constant >= 1 (i.e. each leaf is at least half full, except when the tree has fewer than L/2 elements).

A B-Tree example
The following figure (also figure 3) shows a B-Tree with M = 4 and L = 3.
– The root node can have between 2 and M = 4 subtrees.
– Each other interior node can have between ⌈M/2⌉ = ⌈4/2⌉ = 2 and M = 4 subtrees and up to M - 1 = 3 keys.
– Each exterior node (leaf) can hold between ⌈L/2⌉ = ⌈3/2⌉ = 2 and L = 3 data elements.
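
The occupancy limits in the definition are easy to compute with integer arithmetic. The following is a minimal sketch; the Occupancy struct and the helper names are hypothetical and chosen only for this illustration.

```cpp
#include <cassert>

// Allowed range for a count (subtrees of an interior node, or elements in a leaf).
struct Occupancy {
    int minCount;
    int maxCount;
};

// ceil(x / 2) for non-negative x, using integer arithmetic.
int ceilHalf(int x) {
    assert(x >= 0);
    return (x + 1) / 2;
}

// Non-root interior node of a B-Tree of order m: between ceil(m/2) and m subtrees.
Occupancy interiorSubtrees(int m) {
    return {ceilHalf(m), m};
}

// Leaf with capacity l: between ceil(l/2) and l data elements.
Occupancy leafElements(int l) {
    return {ceilHalf(l), l};
}

// True if count lies within the allowed range.
bool withinBounds(int count, Occupancy o) {
    return count >= o.minCount && count <= o.maxCount;
}
```

For M = 4 and L = 3 this gives 2 to 4 subtrees per interior node and 2 to 3 elements per leaf, matching the example. The root and a nearly empty tree are exceptions, as noted in the definition, and are not modeled here.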

Figure 4 – A B-Tree with M = 4 and L = 3

Designing a B-Tree
Recall that M-way trees (and therefore B-trees) are often used when there is too much data to fit in memory. Therefore each interior node or leaf visited costs one disk access. When designing a B-Tree (choosing the values of M and L), we need to consider the size of the data stored in the leaves, the size of the keys and pointers stored in the interior nodes, and the size of a disk block.

Student Record Example
Suppose our B-Tree stores student records which contain name, address, and other data totaling 1024 bytes.
Further assume that the key to each student record (ssn??) is 8 bytes long.
Assume also that a pointer (really a disk block number, not a memory address) requires 4 bytes.
And finally, assume that our disk block is 4096 bytes.

Calculating L
L is the number of data records that can be stored in each leaf. Since we want to do just one disk access per leaf, this is the same as the number of data records per disk block. Since a disk block is 4096 bytes and a data record is 1024 bytes, we choose L = ⌊4096 / 1024⌋ = 4 data records per leaf.

Calculating M
Each interior node contains M pointers and M-1 keys. To maximize M (and therefore keep the tree flat and wide) and yet do just one disk access per node, the whole node must fit in one 4096-byte disk block:
4M + 8(M - 1) <= 4096
12M <= 4104
M <= 342
So let’s choose a nice round number like M = 300.
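
The two sizing calculations can be wrapped in one small function. This is a sketch only: BTreeParams and chooseParams are hypothetical names, and the function simply returns the largest values that fit in a block, leaving it to the designer to round M down to something like 300.

```cpp
#include <cassert>

// Largest M and L that let one node or leaf fit in one disk block.
struct BTreeParams {
    int M;  // maximum subtrees per interior node
    int L;  // maximum data records per leaf
};

BTreeParams chooseParams(int blockBytes, int recordBytes, int keyBytes, int pointerBytes) {
    assert(blockBytes > 0 && recordBytes > 0 && keyBytes > 0 && pointerBytes > 0);
    // L: whole records per block (integer division rounds down).
    int L = blockBytes / recordBytes;
    // M: largest M with pointerBytes*M + keyBytes*(M - 1) <= blockBytes,
    // which rearranges to M <= (blockBytes + keyBytes) / (pointerBytes + keyBytes).
    int M = (blockBytes + keyBytes) / (pointerBytes + keyBytes);
    return {M, L};
}
```

With the student record numbers, chooseParams(4096, 1024, 8, 4) yields L = 4 and M = 342.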

Performance of our B-Tree
With M = 300 the height of our tree for N students will be ⌈log_300(N)⌉. For example, with N = 100,000 (about 10 times the size of the UMBC student population) the height of the tree with M = 300 would be no more than 3, because log_300(100000) ≈ 2.02 and ⌈2.02⌉ = 3. So any student record can be found in 3 disk accesses.

Insertion of X in a B-Tree
Search to find which leaf X belongs in.
If the leaf has room (fewer than L elements), add X (and write the leaf back to disk).
If the leaf is full, split it into two leaves, each with half of the elements (and write the new leaves to disk).
– Update the keys in the parent
– If the parent was already full, split it in the same manner
– Splits may propagate all the way to the root, in which case the root is split (this is how the tree grows in height)
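
The leaf-level part of insertion can be sketched as follows. The assumptions here are not from the slides: a leaf is modeled as a sorted std::vector of ints, insertIntoLeaf is a hypothetical helper, disk I/O and the parent update are left to the caller, and on a split the left leaf keeps the extra element when the count is odd.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Inserts x into a sorted leaf that may hold at most maxElems (L) elements.
// Returns an empty vector if x fit; otherwise splits the overfull leaf and
// returns the new right-hand leaf. After a split the caller must add a key
// for the new leaf to the parent, which may in turn need to be split.
std::vector<int> insertIntoLeaf(std::vector<int>& leaf, int x, std::size_t maxElems) {
    assert(maxElems >= 1);
    // Keep the leaf in sorted order.
    leaf.insert(std::upper_bound(leaf.begin(), leaf.end(), x), x);
    if (leaf.size() <= maxElems) {
        return {};  // no split needed
    }
    // The leaf now holds L + 1 elements: the left half keeps ceil((L + 1) / 2).
    std::size_t keep = (leaf.size() + 1) / 2;
    std::vector<int> right(leaf.begin() + static_cast<std::ptrdiff_t>(keep), leaf.end());
    leaf.resize(keep);
    return right;
}
```

For instance, with maxElems = 3, inserting 33 into a leaf holding 32 and 34 needs no split, while inserting 35 into a leaf holding 32, 33, 34 produces the leaves {32, 33} and {34, 35} under this split rule. Both halves end up with at least ⌈L/2⌉ elements, so the B-Tree leaf constraint still holds.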

Insert 33 into this B-Tree
Figure 5 – before inserting 33

Inserting 33
Traversing the tree from the root, we find that 33 is less than 36 and not less than the root key before it, leading us to the 2nd subtree. Since 33 is greater than 32 we are led to the 3rd leaf (the one containing 32 and 34). Since there is room for an additional data item in the leaf, 33 is inserted (in sorted order, which means reorganizing the leaf).

After inserting 33
Figure 6 – after inserting 33

Now insert 35
This item also belongs in the 3rd leaf of the 2nd subtree. However, that leaf is full. Split the leaf in two and update the parent to get the tree in figure 7.

After inserting 35
Figure 7 – after inserting 35

Inserting 21
This item belongs in the 4th leaf of the 1st subtree (the leaf containing 18, 19, 20). Since the leaf is full, we split it and update the keys in the parent. However, the parent is also full, so it must be split and its parent (the root) updated. But this would give the root 5 subtrees, which is not allowed, so the root must also be split. This is the only way the tree grows in height.

After inserting 21
Figure 8 – after inserting 21

B-tree Deletion
Find the leaf containing the element to be deleted.
If that leaf is still full enough (still has at least ⌈L/2⌉ elements after the removal), write it back to disk without that element. Then change the key in the ancestor if necessary.
If the leaf is now too empty (has fewer than ⌈L/2⌉ elements), borrow an element from a neighbor.
– If the neighbor would then be too empty, combine the two leaves into one.
– This combining requires updating the parent, which may now have too few subtrees.
– If necessary, continue the combining up the tree.
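
The leaf-level part of deletion can be sketched in the same style. Again this is only an illustration under stated assumptions: leaves are sorted std::vectors of ints, the neighbor is assumed to be the right-hand sibling (so all its elements are larger), RemoveResult and removeFromLeaf are hypothetical names, and the parent update and disk writes are left to the caller.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum class RemoveResult { NotFound, Removed, Borrowed, Merged };

// Removes x from leaf. If the leaf drops below ceil(L / 2) elements it either
// borrows the smallest element of rightNeighbor or, if the neighbor cannot
// spare one, absorbs the whole neighbor (which is left empty).
// After Borrowed or Merged the caller must update the parent's keys; after
// Merged the parent loses a subtree and may itself need combining.
RemoveResult removeFromLeaf(std::vector<int>& leaf, std::vector<int>& rightNeighbor,
                            int x, std::size_t maxElems) {
    assert(maxElems >= 1);
    auto it = std::find(leaf.begin(), leaf.end(), x);
    if (it == leaf.end()) {
        return RemoveResult::NotFound;
    }
    leaf.erase(it);
    std::size_t minElems = (maxElems + 1) / 2;  // ceil(L / 2)
    if (leaf.size() >= minElems) {
        return RemoveResult::Removed;  // still full enough
    }
    if (rightNeighbor.size() > minElems) {
        // Neighbor can spare its smallest element without becoming too empty.
        leaf.push_back(rightNeighbor.front());
        rightNeighbor.erase(rightNeighbor.begin());
        return RemoveResult::Borrowed;
    }
    // Neighbor would become too empty: combine the two leaves into one.
    leaf.insert(leaf.end(), rightNeighbor.begin(), rightNeighbor.end());
    rightNeighbor.clear();
    return RemoveResult::Merged;
}
```

A merged leaf holds at most (⌈L/2⌉ - 1) + ⌈L/2⌉ <= L elements, so it always fits in a single leaf. A full implementation would also consider the left-hand neighbor and the special case of a tree with very few elements.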