1
Chap 9. Multilevel Indexing and B-Trees
File Structures by Folk, Zoellick, and Riccardi
Seoul National University, School of Computer Science and Engineering
Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB)
Prof. Hyoung-Joo Kim
File Structure, SNU-OOPSLA Lab
2
Chapter Objectives (1)
- Place the development of B-trees in the historical context of the problems they were designed to solve
- Look briefly at other tree structures that might be used on secondary storage, such as paged AVL trees
- Introduce multirecord and multilevel indexes and evaluate the speed of the search operation
- Provide an understanding of the important properties possessed by B-trees, and show how these properties are especially well suited to secondary storage applications
3
Chapter Objectives (2)
- Present the object-oriented design of B-trees
  - Define the classes BTreeNode and BTree
- Explain the implementation of the fundamental operations on B-trees
- Introduce the notion of page buffering and virtual B-trees
- Describe variations of the fundamental B-tree algorithms, such as those used to build B* trees and B-trees with variable-length records
4
Contents (1)
- 9.1 Introduction
- 9.2 Statement of the Problem
- 9.3 Indexing with Binary Search Trees: AVL Trees, Paged Binary Trees, Problems with Paged Trees
- 9.4 Multilevel Indexing
- 9.5 B-Trees
- 9.6 Example of Creating a B-Tree
- 9.7 An Object-Oriented Representation of B-Trees: Class BTreeNode, Class BTree
5
Contents (2)
- 9.8 B-Tree Methods: Search, Insert, and Others
- 9.9 B-Tree Nomenclature
- 9.10 Formal Definition of B-Tree Properties
- 9.11 Worst-case Search Depth
- 9.12 Deletion, Merging, and Redistribution
- 9.13 Redistribution During Insertion
- 9.14 B* Trees
- 9.15 Buffering of Pages: Virtual B-Trees
- 9.16 Variable-Length Records and Keys
6
9.1 Introduction: The Invention of the B-Tree
- 1972, Acta Informatica: R. Bayer and E. McCreight (at Boeing Corporation), "Organization and Maintenance of Large Ordered Indexes"
- 1979: the 'de facto' standard for database indexes; D. Comer, "The Ubiquitous B-Tree", ACM Computing Surveys
- Why the name B-tree? Balanced, Bushy, Broad, Boeing, Bayer
- Retrieval, insertion, and deletion time = $\log_K I$ (I: number of indexes in the file, K: number of indexes in a page)
- Excellent for dynamically changing random access files
7
9.2 Statement of the Problem
Problems in an index on secondary storage:
- Searching the index must be faster than binary searching
  - In binary search: 15 items - 4 seeks; 1,000 items - [count not shown] seeks
- Insertion and deletion must be as fast as search
  - In some file structures, inserting a key may involve moving many other keys
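The seek counts quoted for binary search can be checked with a short calculation. The sketch below assumes the slide means the worst case, where every probe costs one seek and d probes can distinguish at most 2^d - 1 items; the function name is illustrative and not taken from the textbook.

```cpp
#include <cstdint>

// Worst-case number of probes for a binary search over n sorted items:
// the smallest d such that 2^d - 1 >= n. If every probe hits the disk,
// this is also the worst-case number of seeks.
int worstCaseBinarySearchSeeks(std::uint64_t n) {
    int probes = 0;
    std::uint64_t covered = 0;      // items distinguishable with `probes` probes
    while (covered < n) {
        covered = 2 * covered + 1;  // 1, 3, 7, 15, ...
        ++probes;
    }
    return probes;
}
```

Under this worst-case reading, 15 items need 4 seeks, which matches the slide, and 1,000 items need 10.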
8
9.3 Indexing with Binary Search Trees
Binary Search Tree (1)
- Advantages
  - Data need not be physically sorted
  - Good performance on a balanced tree
  - Insert cost = search cost
- Disadvantages
  - In an out-of-balance binary tree, more seeks are required
9
Binary Search Tree (2)
- Sorted list of keys: AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, WS, YJ
- At most 4 seeks per record
[Figure: binary search tree representation of these keys, with KF at the root]
10
Internal Representation of Binary Tree
- Nodes are linked with RRNs (for fixed-length records) or pointers
[Figure: record array in which each entry holds a key field plus left-child and right-child fields, with ROOT identifying the root record]
11
Unbalanced Binary Tree
- At most 9 seeks per record
- Worst case: sequential search
[Figure: the binary search tree after further insertions (LV, LA, NP, MB, ND, NK), now out of balance]
12
AVL Tree (1)
- A height-balanced k tree (HB(k) tree): the allowable difference in height between any two subtrees sharing a common root is k
- AVL tree: an HB(1) tree, named after G. M. Adel'son-Vel'skii and E. M. Landis
- Maintenance overhead is needed
- Performance, given N keys:
  - Worst-case search in an AVL tree: $1.44 \log_2(N+2)$
  - cf. worst-case search in a completely balanced tree: $\log_2(N+1)$
13
AVL Tree (2)
[Figure: (a) examples of AVL trees; (b) examples of non-AVL trees]
14
AVL Tree (3)
- A binary tree structure that is balanced with respect to the heights of its subtrees
- Definition
  - An empty tree is height balanced
  - If T is a nonempty binary tree with TL and TR as its left and right subtrees, then T is height balanced iff (1) TL and TR are height balanced and (2) $|h_L - h_R| \le 1$, where $h_L$ and $h_R$ are the heights of TL and TR, respectively
15
AVL Tree (4)
- The balance factor, BF(T), of a node T in a binary tree is $h_L - h_R$, where $h_L$ and $h_R$ are the heights of the left and right subtrees of T
- For any node T in an AVL tree, BF(T) must be one of -1, 0, 1
- If BF(T) is -2 or 2, the proper rotation is carried out to restore balance
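The height-balance definition and the balance factor translate almost directly into code. The following is a minimal sketch, assuming a plain pointer-based node and an empty tree of height 0; the names `AvlCheckNode`, `height`, `balanceFactor`, and `isAvl` are illustrative. Heights are recomputed on every call, which is fine for a check but not how a real AVL tree would maintain them.

```cpp
#include <algorithm>
#include <cstdlib>

struct AvlCheckNode {
    int key;
    AvlCheckNode* left;
    AvlCheckNode* right;
    explicit AvlCheckNode(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Height of a subtree; an empty tree has height 0.
int height(const AvlCheckNode* t) {
    return t ? 1 + std::max(height(t->left), height(t->right)) : 0;
}

// BF(T) = hL - hR, the height of the left subtree minus that of the right.
int balanceFactor(const AvlCheckNode* t) {
    return t ? height(t->left) - height(t->right) : 0;
}

// A tree is height balanced when every node has |BF| <= 1.
bool isAvl(const AvlCheckNode* t) {
    if (!t) return true;
    return std::abs(balanceFactor(t)) <= 1 && isAvl(t->left) && isAvl(t->right);
}
```

A chain of three nodes linked only through right children has BF = -2 at its top node, which is the situation that triggers a rotation.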
16
AVL Tree (5)
- New identifier MARCH: no rebalancing needed
- New identifier MAY: no rebalancing needed; BF(MAR) = -1
- New identifier NOVEMBER: after insertion BF(MAR) = -2 and BF(MAY) = -1; an RR rotation makes MAY the root, with children MAR and NOV
[Figure: the tree after each insertion and after rebalancing]
17
AVL Tree (6)
- New identifier AUGUST: no rebalancing needed; BF(MAY) = +1, BF(MAR) = +1
[Figure: MAY at the root with children MAR and NOV; AUG is the left child of MAR]
18
AVL Tree (7)
- New identifier APRIL: after insertion BF(MAY) = +2 and BF(MAR) = +2; an LL rotation at MAR makes AUG the parent of APR and MAR
- After rebalancing: MAY (+1) at the root with children AUG and NOV
[Figure: the tree after insertion and after rebalancing]
19
AVL Tree (8)
- New identifier JANUARY: after insertion BF(MAY) = +2 and BF(AUG) = -1; an LR rotation makes MAR the root
- After rebalancing: MAR at the root, AUG (children APR and JAN) on the left, MAY (-1, child NOV) on the right
[Figure: the tree after insertion and after rebalancing]
20
AVL Tree (9)
- New identifier DECEMBER: no rebalancing needed
[Figure: MAR (+1) at the root; AUG (-1) with children APR and JAN (+1); DEC below JAN; MAY (-1) with child NOV]
21
AVL Tree (10)
- New identifier JULY: no rebalancing needed
[Figure: the same tree with JUL added as the second child of JAN]
22
AVL Tree (11)
- New identifier FEBRUARY: after insertion BF(MAR) = +2 and BF(AUG) = -2; an RL rotation at AUG makes DEC the root of that subtree
- After rebalancing: MAR (+1) at the root; DEC with children AUG (+1, child APR) and JAN (children FEB and JUL); MAY (-1) with child NOV
[Figure: the tree after insertion and after rebalancing]
23
AVL Tree (12)
- New identifier JUNE: after insertion BF(MAR) = +2; an LR rotation makes JAN the root
- After rebalancing: JAN at the root with children DEC and MAR; DEC has children AUG and FEB; MAR has children JUL and MAY
[Figure: the tree after insertion and after rebalancing]
24
AVL Tree (13)
- New identifier OCTOBER: after insertion BF(MAY) = -2 and BF(NOV) = -1; an RR rotation is required
[Figure: the tree after insertion; the rebalanced tree is on the next slide]
25
AVL Tree (14)
- After rebalancing: NOV takes the place of MAY, with children MAY and OCT
[Figure: JAN at the root with children DEC and MAR; MAR has children JUL and NOV]
26
AVL Tree (15)
- New identifier SEPTEMBER: no rebalancing needed
[Figure: the final tree, with SEP added below OCT]
27
AVL Tree: Rebalancing (1)
Rebalancing is carried out using four different kinds of rotations, named for where the new node Y is inserted relative to the unbalanced node A:
- LL: Y is inserted in the left subtree of the left subtree of A
- LR: Y is inserted in the right subtree of the left subtree of A
- RR: Y is inserted in the right subtree of the right subtree of A
- RL: Y is inserted in the left subtree of the right subtree of A
28
AVL Tree: Rebalancing (2)
[Figure: node A with the four possible positions (LL, LR, RL, RR) at which the new node Y may be inserted]
29
AVL Tree: Rebalancing (LL)
- Ordering of the parts: BL < B < BR < A < AR
- Unbalanced following insertion: the height of BL increases to h+1, so BF(B) = +1 and BF(A) = +2
- Rotation type LL: B becomes the root of the subtree, with BL as its left subtree and A as its right child; A keeps AR and takes BR as its left subtree
[Figure: the balanced subtree, the subtree unbalanced by the insertion, and the rebalanced subtree of height h+2]
30
AVL Tree: Rebalancing (RR)
- Ordering of the parts: AL < A < BL < B < BR
- Unbalanced following insertion: the height of BR increases to h+1, so BF(B) = -1 and BF(A) = -2
- Rotation type RR: B becomes the root of the subtree, with A as its left child and BR as its right subtree; A keeps AL and takes BL as its right subtree
[Figure: the balanced subtree, the subtree unbalanced by the insertion, and the rebalanced subtree of height h+2]
31
AVL Tree: Rebalancing (LR), case (a)
- Ordering: B < C < A
- Unbalanced following insertion: C is inserted as the right child of B, so BF(B) = -1 and BF(A) = +2
- Rotation type LR(a): C becomes the root of the subtree, with children B and A
[Figure: the balanced subtree, the subtree unbalanced by the insertion, and the rebalanced subtree]
32
AVL Tree: Rebalancing (LR), case (b)
- Ordering of the parts: BL < B < CL < C < CR < A < AR
- Unbalanced following insertion: BF(B) = -1 and BF(A) = +2
- Rotation type LR(b): C becomes the root of the subtree; B becomes its left child with subtrees BL and CL; A becomes its right child with subtrees CR and AR
[Figure: the balanced subtree, the subtree unbalanced by the insertion, and the rebalanced subtree of height h+2]
33
AVL Tree: Rebalancing (LR), case (c)
- Rotation type LR(c): as in case (b), C becomes the root of the subtree, with B (subtrees BL and CL) on the left and A (subtrees CR and AR) on the right; the cases differ in the resulting balance factors
- RL (a), (b), and (c) are symmetric to LR (a), (b), and (c)
[Figure: the balanced subtree, the subtree unbalanced by the insertion, and the rebalanced subtree of height h+2]
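The four rotation types reduce to two single rotations plus their compositions. The sketch below is one common way to write them, assuming a pointer-based node without stored balance factors; `RotNode` and the function names are illustrative, and updating balance factors after a rotation is deliberately left out.

```cpp
#include <cassert>

struct RotNode {
    int key;
    RotNode* left;
    RotNode* right;
    explicit RotNode(int k) : key(k), left(nullptr), right(nullptr) {}
};

// LL: A's left child B becomes the subtree root; BR moves under A.
RotNode* rotateLL(RotNode* a) {
    assert(a && a->left);
    RotNode* b = a->left;
    a->left = b->right;   // BR becomes A's left subtree
    b->right = a;
    return b;
}

// RR: mirror image of LL; BL moves under A.
RotNode* rotateRR(RotNode* a) {
    assert(a && a->right);
    RotNode* b = a->right;
    a->right = b->left;   // BL becomes A's right subtree
    b->left = a;
    return b;
}

// LR: C (right child of A's left child B) becomes the subtree root.
RotNode* rotateLR(RotNode* a) {
    assert(a && a->left);
    a->left = rotateRR(a->left);
    return rotateLL(a);
}

// RL: mirror image of LR.
RotNode* rotateRL(RotNode* a) {
    assert(a && a->right);
    a->right = rotateLL(a->right);
    return rotateRR(a);
}
```

Each function returns the new root of the rotated subtree, so the caller is expected to store the result back into the parent's child pointer (or the tree's root).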
34
Paged Binary Tree (1)
- Page: the unit of disk I/O for handling seek and transfer of disk data; typically 4 KB, 8 KB, 16 KB, ...
- Paged binary tree: divide a binary tree into pages, then store each page in a block of contiguous locations on disk
  - If every page holds 7 keys, any of 511 nodes (keys) can be reached in only three seeks
- Performance (number of seeks) for N keys:
  - A completely full balanced binary tree: $\log_2(N+1)$
  - A completely full paged tree: $\log_{k+1}(N+1)$ (k: number of keys held in a single page)
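The paged-tree formula can be evaluated with integer arithmetic. This sketch assumes a completely full tree, so d page levels hold (k+1)^d - 1 keys; `pagedTreeSeeks` is an illustrative name, not a textbook function.

```cpp
#include <cassert>
#include <cstdint>

// Number of page levels (seeks) needed for n keys when every page holds
// keysPerPage keys and the tree is completely full: the smallest d with
// (k+1)^d - 1 >= n. With keysPerPage == 1 this is the plain binary tree.
int pagedTreeSeeks(std::uint64_t n, std::uint64_t keysPerPage) {
    assert(keysPerPage >= 1);
    int levels = 0;
    std::uint64_t capacity = 0;  // keys held by `levels` full levels
    while (capacity < n) {
        capacity = capacity * (keysPerPage + 1) + keysPerPage;
        ++levels;
    }
    return levels;
}
```

With 7 keys per page, 511 keys fit in three levels, matching the slide; the same 511 keys in an unpaged binary tree (one key per "page") need nine.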
35
Paged Binary Tree (2)
[Figure: example of a paged binary tree]
36
The Problem with Paged Trees
- Only valid when we have the entire set of keys in hand before the tree is built
- Problems due to the tree going out of balance:
  - How to select a good separator
  - How to group keys
  - How to guarantee the maximum loading
- The B-tree provides a solution to the above problems
37
Paged Binary Tree (Out of Balance)
- Random input sequence: C S D T A M P I B W N G U R K E H O L J Y Q Z F X V
[Figure: the out-of-balance paged binary tree built from this input sequence]
38
9.4 Multilevel Indexing: A Better Approach to Tree Indexes
Multilevel Indexing
- Simple index record approach: limited in the number of keys allowed
- Multirecord index approach: consists of a sequence of simple index records; binary search is too expensive
- Multilevel index approach: reduces the number of records to be searched and speeds up the search
- Example: an 80 MB file of 8,000,000 records with 10-byte keys
39
Example of Multilevel Indexing
- 4th-level index: a single index record with 8 keys
- 3rd-level index: 8 index records, indexing the largest keys in the 800 second-level records
- 2nd-level index: 800 index records with 80,000 keys
  - Choose one of the keys in each index record as the key of that whole record
- The lowest-level index is an index to the data file; its reference fields are record addresses in the data file
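The record counts in this example follow from repeatedly dividing by the number of keys per index record. The sketch below assumes 100 keys per index record, a figure inferred from the slide's own numbers (80,000 keys spread over 800 records) rather than stated on it; `indexRecordsPerLevel` is an illustrative name.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of index records at each level of a multilevel index, lowest level
// first. Each level indexes the records of the level below it, using one key
// per record, until a single record remains.
std::vector<std::uint64_t> indexRecordsPerLevel(std::uint64_t keys,
                                                std::uint64_t keysPerRecord) {
    assert(keysPerRecord >= 2);  // otherwise the levels never shrink
    std::vector<std::uint64_t> levels;
    std::uint64_t entries = keys;
    do {
        // Ceiling division: a partially filled record still counts.
        const std::uint64_t records = (entries + keysPerRecord - 1) / keysPerRecord;
        levels.push_back(records);
        entries = records;
    } while (entries > 1);
    return levels;
}
```

For 8,000,000 keys and 100 keys per record this yields 80,000, 800, 8, and 1 records, so a search reads one index record per level, four in total, before going to the data file.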
40
Multilevel Indexing (3)
- How can we insert new keys into the multilevel index?
  - The index records at some level might be full
  - Several levels of the index might have to be rebuilt
  - An overflow chain may help, but it is still ugly
- The multilevel index structure is not well suited to dynamic data processing applications
- The B-tree gives the right solution
41
9.5 B-Trees: Working Up from the Bottom
- Bayer and McCreight, 1972, Acta Informatica
- Build trees upward from the bottom instead of downward from the top
- Each node of a B-tree is an index record consisting of key-reference pairs
- The order of a B-tree: the maximum number of key-reference pairs in a node
- Every index record should hold at least half that many pairs, i.e. be at least half full
42
Sample B-Tree
[Figure: a sample B-tree over the keys A, C, D, I, M, P, S, T]
43
9.6 Example of Creating a B-Tree
Splitting & Promoting (1)
- Splitting
  - Creation of two nodes out of one because the original node becomes overfull
  - Results in the need to promote a key to a higher-level node, to provide an index separating the two new nodes
- Promotion of a key
  - Movement of a key from one node into a higher-level node when a split occurs
44
Splitting & Promoting (2)
- Initial leaf of a B-tree with a page size of seven: A B C D E F G
- Insert the J key: the leaf is split to accommodate the new key
[Figure: the full leaf before and after the split (continued on the next slide)]
45
Splitting & Promoting (3)
- Promotion of the E key into a root node
[Figure: the new root holding E above the two leaves produced by the split]
46
Insertion in B-tree (1)
- Input sequence: C S D T A M P I B W N G U R K E H O L J Y Q Z F X V
- Insertion of C, S, D, T into the initial page: C D S T
- Insertion of A causes the node to split, and the largest key in each leaf node (D and T) is placed in the root node
[Figure: root D T; leaves A C D and S T]
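The split on this slide, where each leaf's largest key is copied into the parent, can be sketched on plain key lists. This is an illustration under stated assumptions, not the textbook's `BTreeNode::Split`: `SplitResult` and `splitLeaf` are hypothetical names, keys are strings, and the left leaf is given the extra key when the count is odd, which is what the slide's A C D / S T result suggests.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct SplitResult {
    std::vector<std::string> left;    // keys kept in the original leaf
    std::vector<std::string> right;   // keys moved to the new leaf
    std::string leftSeparator;        // largest key of the left leaf
    std::string rightSeparator;       // largest key of the right leaf
};

// Splits an overfull, sorted key list into two leaves and reports the largest
// key of each leaf, which is what the parent stores as a separator.
SplitResult splitLeaf(const std::vector<std::string>& keys) {
    assert(keys.size() >= 2);
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>((keys.size() + 1) / 2);
    SplitResult r;
    r.left.assign(keys.begin(), keys.begin() + mid);
    r.right.assign(keys.begin() + mid, keys.end());
    r.leftSeparator = r.left.back();
    r.rightSeparator = r.right.back();
    return r;
}
```

Applied to the overfull leaf A C D S T, the sketch produces leaves A C D and S T with separators D and T, the same root contents the slide shows.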
47
Insertion in B-tree (2)
- M and P are inserted into the rightmost leaf node, then the insertion of I causes it to split
[Figure: root D P T; leaves A C D, I M P, and S T]
48
Insertion in B-tree (3)
- Insertions of B, W, N, and G into leaf nodes cause another split, and the root is now full
[Figure: root D M P W; leaves A B C D, G I M, N P, and S T W]
49
Insertion in B-tree (4)
- Insertion of U proceeds without incident, but R would have to be inserted into the rightmost leaf, which is full
[Figure: root D M P W; leaves A B C D, G I M, N P, and S T U W]
50
Insertion in B-tree (5)
- Insertion of R causes the rightmost leaf node to split; the resulting insertion into the root causes the root to split, and the tree grows to three levels
[Figure: root P W; second level D M P and T W; leaves A B C D, G I M, N P, R S T, and U W]
51
Insertion in B-tree (6)
- Insertions of K, E, H, O, L, J, Y, Q, and Z continue, with another node split
[Figure: the three-level tree after these insertions, with root P Z]
52
Insertion in B-tree (7)
- Insertions of F, X, and V finish the insertion of the alphabet
[Figure: root I P Z; second level D G I, M P, and T X Z; leaves A B C D, E F G, H I, J K L M, N O P, Q R S T, U V W X, and Y Z]
53
Insertion in B-trees
- Major components of insertion:
  - Split the node
  - Promote the middle key
  - Increase the height of the B-tree
- Insertion may touch no more than 2 nodes per level
- Insertion cost is strictly linear in the height of the tree
54
9.7 An Object-Oriented Representation of B-Trees
Class BTreeNode (1)
- Represents B-tree nodes in memory
- A B-tree is an index file associated with a data file
- Specified in btnode.h of Appendix I
- The template class BTreeNode is based on the template class SimpleIndex
- Public methods:
  - SimpleIndex class: Insert, Remove, Clear, Search, Print, NumKeys
  - BTreeNode class: Insert, Remove, LargestKey, Split, Pack, Unpack
55
Class BTreeNode (2)
Members
- Public methods:
  - Insert: simply calls SimpleIndex::Insert and then checks for overflow
  - Remove a key, split and merge nodes
  - Search: inherited from the SimpleIndex class (works perfectly well)
  - Pack/Unpack: manage the difference between the memory and the disk representation of BTreeNode objects
- Protected members: store the file address of the node and the minimum and maximum number of keys
56
template <class keyType>
class BTreeNode : public SimpleIndex<keyType>
{
public:
    BTreeNode(int maxKeys, int unique = 1);
    int Insert(const keyType key, int recAddr);
    int Remove(const keyType key, int recAddr = -1);
    int LargestKey();                        // returns the value of the largest key
    int Split(BTreeNode<keyType>* newNode);  // moves keys into newNode
    int Pack(IOBuffer& buffer);
    int Unpack(IOBuffer& buffer);
protected:
    int MaxBKeys;                            // maximum number of keys in a node
    int Init();
    friend class BTree<keyType>;
};
57
Class BTree
- Uses in-memory BTreeNode objects
- Adds the file access portion
- Enforces the consistent size of the nodes
- Specified in btree.h of Appendix I
- Methods: Create, Open, and Close a B-tree; Search, Insert, and Remove key-reference pairs
- Protected area:
  - Fetch (transfers nodes from disk to memory)
  - Store (transfers nodes back to disk)
  - Root node, height of the tree, file of index records
  - BTNode ** Nodes: used to keep a collection of tree nodes in memory and reduce disk accesses
58
template <class keyType>
class BTree
{
public:
    BTree(int order, int keySize = sizeof(keyType), int unique = 1);
    int Open(char* name, int mode);
    int Create(char* name, int mode);
    int Close();
    int Insert(const keyType key, const int recAddr);
    int Remove(const keyType key, const int recAddr = -1);
    int Search(const keyType key, const int recAddr = -1);
protected:
    typedef BTreeNode<keyType> BTNode;
    BTNode* FindLeaf(const keyType key);
    BTNode* Fetch(const int recaddr);
    int Store(BTNode*);
    BTNode Root;
    int Height;
    int Order;
    BTNode** Nodes;
    RecordFile<BTNode> BTreeFile;
};
59
9.8 B-Tree Methods: Search, Insert, and Others
Page Structure
[Figure: a B-tree and the contents of its pages 2 and 3; each page consists of a KEYCOUNT field, a KEY array, and a CHILD array, with Nil child entries in a leaf page]
60
Algorithm for Search
- The searching procedure is iterative and works in two stages, operating alternately on entire pages (class BTree) and then within pages (class BTreeNode)
  - Step 1: Load a page into memory
  - Step 2: Search through the page, looking for the key
- The two steps repeat down the tree until the search reaches the leaf level
61
Search and FindLeaf Methods
- Specifications of the Search and FindLeaf methods (Fig 9.18):
    template <class keyType>
    int BTree<keyType>::Search(const keyType key, const int recAddr)
    template <class keyType>
    BTreeNode<keyType>* BTree<keyType>::FindLeaf(const keyType key)
- Search method, e.g. recAddr = btree.Search('L'):
  - Calls FindLeaf('L')
  - Searches for the key in the leaf node; if the key exists, returns the data file address of the record with key 'L'; otherwise, returns -1
- FindLeaf method:
  - Searches down to the leaf node, beginning at the root
  - Returns the address of the leaf node
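The two-stage search can be sketched on an in-memory tree. The code below is not the textbook's `BTree::Search`: `MemNode`, `findLeaf`, and `search` are hypothetical stand-ins that skip disk I/O entirely, and the sketch assumes the convention used in the insertion example, where an interior key is the largest key found under the corresponding child.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct MemNode {
    std::vector<char> keys;          // sorted; in interior nodes keys[i] is the
                                     // largest key under children[i]
    std::vector<int> recAddrs;       // leaves only: data file address of keys[i]
    std::vector<MemNode*> children;  // empty in a leaf
};

// Walks from the root to the leaf that would hold `key`.
const MemNode* findLeaf(const MemNode* root, char key) {
    assert(root != nullptr);
    const MemNode* node = root;
    while (!node->children.empty()) {
        std::size_t i = 0;
        // Pick the first child whose largest key is >= key (or the last child).
        while (i + 1 < node->keys.size() && key > node->keys[i]) ++i;
        node = node->children[i];
    }
    return node;
}

// Returns the record address stored with `key`, or -1 if the key is absent.
int search(const MemNode* root, char key) {
    const MemNode* leaf = findLeaf(root, key);
    for (std::size_t i = 0; i < leaf->keys.size(); ++i) {
        if (leaf->keys[i] == key) return leaf->recAddrs[i];
    }
    return -1;
}
```

In the real classes, each step down the tree would go through a fetch of the child page from the index file; here the child pointers stand in for those page addresses.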
62
Algorithm for Insertion (1)
- Observations on insertion, splitting, and promotion:
  - Proceed all the way down to the leaf level
  - After finding the insertion location at the leaf level, the work proceeds upward from the bottom
- An iterative procedure with three phases:
  - Search to the leaf level, using the FindLeaf method
  - Insertion, overflow detection, and splitting on the upward path
  - Creation of a new root node, if the current root was split
63
Algorithm for Insertion (2)
With no redistribution:
- (Step 1) Locate the node on the bottommost level in which to insert the record. The location is determined by key search.
- (Step 2) If a vacant record slot is available, insert the record so that key sequencing is maintained. Then update the pointer associated with the record (the pointer is null for level 0 records). Then stop.
- (Step 3) If no vacant record slot exists, identify the median record. All records and pointers to the left of the median record are stored in one node (the original), and those to the right are stored in another node (the new node).
64
Algorithm for Insertion (3)
- (Step 4) If the topmost node was split, create a new topmost node that contains the median record identified in Step 3, with pointers to the original and split nodes. Update the root to point to the new topmost node. Then stop.
- (Step 5) If the topmost node was not split, prepare to insert the median record identified in Step 3 and a pointer to the new node (created in Step 3). Then go to Step 2.
- Notes:
  - Step 4 makes the B-tree increase in height by 1 level
  - On average, B-trees have about 70% occupancy (as do B+-trees)
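Steps 1 through 5 can be sketched as a recursive in-memory insert with median promotion. This is a sketch under several assumptions, not the textbook's `BTree::Insert`: keys are plain integers with no attached records, nodes live in memory rather than on disk, the loop of Step 5 is expressed as recursion, and every name (`BNode`, `Promotion`, `insertInto`, `insert`, `inorder`, `height`) is illustrative.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <utility>
#include <vector>

struct BNode {
    std::vector<int> keys;                         // sorted keys
    std::vector<std::unique_ptr<BNode>> children;  // empty in a leaf
};

struct Promotion {                 // result of a split (Step 3)
    int median;                    // key to insert one level up
    std::unique_ptr<BNode> right;  // new node holding the keys right of it
};

// Inserts key below `node`; returns a Promotion if `node` had to split.
std::unique_ptr<Promotion> insertInto(BNode& node, int key, std::size_t maxKeys) {
    const std::ptrdiff_t pos =
        std::lower_bound(node.keys.begin(), node.keys.end(), key) - node.keys.begin();
    if (node.children.empty()) {
        node.keys.insert(node.keys.begin() + pos, key);           // Steps 1-2
    } else {
        std::unique_ptr<Promotion> up = insertInto(*node.children[pos], key, maxKeys);
        if (!up) return nullptr;
        node.keys.insert(node.keys.begin() + pos, up->median);    // Step 5
        node.children.insert(node.children.begin() + pos + 1, std::move(up->right));
    }
    if (node.keys.size() <= maxKeys) return nullptr;              // no overflow

    // Step 3: split around the median.
    const std::ptrdiff_t mid = static_cast<std::ptrdiff_t>(node.keys.size() / 2);
    std::unique_ptr<Promotion> promo(new Promotion);
    promo->median = node.keys[mid];
    promo->right.reset(new BNode);
    promo->right->keys.assign(node.keys.begin() + mid + 1, node.keys.end());
    node.keys.resize(mid);
    if (!node.children.empty()) {
        promo->right->children.assign(
            std::make_move_iterator(node.children.begin() + mid + 1),
            std::make_move_iterator(node.children.end()));
        node.children.resize(mid + 1);
    }
    return promo;
}

// Inserts key into the tree; grows a new root when the old root splits (Step 4).
void insert(std::unique_ptr<BNode>& root, int key, std::size_t maxKeys) {
    assert(root && maxKeys >= 2);
    std::unique_ptr<Promotion> up = insertInto(*root, key, maxKeys);
    if (!up) return;
    std::unique_ptr<BNode> newRoot(new BNode);
    newRoot->keys.push_back(up->median);
    newRoot->children.push_back(std::move(root));
    newRoot->children.push_back(std::move(up->right));
    root = std::move(newRoot);
}

// Helpers for inspecting the result.
void inorder(const BNode& node, std::vector<int>& out) {
    for (std::size_t i = 0; i < node.keys.size(); ++i) {
        if (!node.children.empty()) inorder(*node.children[i], out);
        out.push_back(node.keys[i]);
    }
    if (!node.children.empty()) inorder(*node.children.back(), out);
}

int height(const BNode& node) {
    return node.children.empty() ? 1 : 1 + height(*node.children.front());
}
```

With `maxKeys` set to 2, inserting 1 through 7 in order splits the root twice, leaving a three-level tree whose root holds the single key 4. The height grows only through Step 4, which is why all leaves stay on the same level.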
65
Insertion Example
[Figure: a B-tree before and after inserting the keys 1 and 9, with one of the insertions splitting a node]
66
Create, Open, and Close
- Specified in btree.tc of Appendix I
- Create writes the empty root node into the file BTreeFile, so that its first record is reserved for the root node
- Open opens BTreeFile and loads the root node into memory from the first record in the file
- Close simply stores the root node into BTreeFile and closes the file
67
9.9 B-Tree Nomenclature
- Be aware that terms are not uniform in the literature
- Definitions also differ considerably
- In fact, there are a number of B-tree variations
- This textbook uses "B-tree" for what other books call a B+ tree
- In this book, a "B+ tree" is a B+ tree with a linked list of sorted data blocks
68
Nomenclature: B-Tree
- Other books: B-tree; our book: N/A
[Figure: root C G; leaves A B, E F, and H I; the keys point to data blocks]
69
Nomenclature: B+-Tree
- Other books: B+-tree; our book: B-tree
[Figure: root C G I; leaves A B C, E F G, and H I; the leaf keys point to data blocks]
70
Nomenclature: B+-Tree with Linked List
- Other books: B+-tree with a linked list; our book: B+ tree
[Figure: root C G I; leaves A B C, E F G, and H I; the data blocks are chained in a linked list]
71
Another Aspect (Node Structures)
Homogeneous trees: the B-tree in other texts
- Leaf nodes and interior nodes have the same structure; each contains both data pointers and tree pointers
- Average search length is shorter for homogeneous trees, because some searches may conclude before reaching a leaf node
72
B-Tree in other text
[Figure: a homogeneous B-tree over 23 keys, with 23 pointers to 23 records in the data file]
73
Another Aspect (Node Structures)
Heterogeneous trees: the B+-tree in other texts
- Leaf nodes and interior nodes have different structures
74
B+-Tree in other text
[Figure: a heterogeneous B+-tree over the same keys, with 23 pointers to 23 records in the data file]
75
Comparison of B-Tree and B+-Tree in other text
76
Comparison of B-Tree and B+-Tree in other text
Historical Note
- B-tree: Bayer & McCreight
- B+-tree: Comer
- B*-tree: Knuth; B-trees with 67% minimum occupancy
- B÷-trees: B+-trees with 67% minimum occupancy
77
9.10 Formal Definition of B-Tree Properties
The properties of a B-tree of order m:
1. Every page has a maximum of m descendants
2. Every page, except for the root and the leaves, has at least $\lceil m/2 \rceil$ descendants
3. The root has at least two descendants (unless it is a leaf)
4. All the leaves appear on the same level
5. The leaf level forms a complete, ordered index of the associated data file
78
9.11 Worst-case Search Depth (1)
- Search depth: the depth of the tree
- Worst case: every page of the tree has only the minimum number of descendants, giving maximal height with minimum breadth
79
Worst-case Search Depth (2)
For a B-tree of order m, the minimum number of descendants at each level is:
- Level 1 (root): 2
- Level 2: $2 \times \lceil m/2 \rceil$
- Level 3: $2 \times \lceil m/2 \rceil^{2}$
- ...
- Level d: $2 \times \lceil m/2 \rceil^{d-1}$
For a tree with N keys in its leaves: $N \ge 2 \times \lceil m/2 \rceil^{d-1}$
Upper bound for the depth of a B-tree: $d \le 1 + \log_{\lceil m/2 \rceil}(N/2)$
e.g. for a B-tree of order 512 holding 1,000,000 keys, $d \le 3.37$, so the depth is at most 3 (3 disk I/Os)
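The depth bound is easy to evaluate numerically. The sketch below computes 1 + log base ceil(m/2) of N/2 with floating-point logarithms; `worstCaseDepthBound` is an illustrative name.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on the depth d of a B-tree of order m holding N keys in its
// leaves: d <= 1 + log_{ceil(m/2)}(N / 2).
double worstCaseDepthBound(int order, double keys) {
    assert(order >= 3 && keys >= 2.0);
    const int minDescendants = (order + 1) / 2;  // ceil(m/2)
    return 1.0 + std::log(keys / 2.0) / std::log(static_cast<double>(minDescendants));
}
```

For order 512 and 1,000,000 keys the bound evaluates to roughly 3.37, so the depth, being an integer, is at most 3.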
80
9.12 Deletion, Merging, and Redistribution
Deletion, Redistribution, and Concatenation
- Ensure that the B-tree properties are maintained after a deletion
- Algorithm (with redistribution and concatenation):
  1. If the key to be deleted is not in a leaf, swap it with its immediate successor, which is in a leaf (that leaf might then need redistribution or concatenation)
  2. Delete the key
81
Deletion Algorithm (Cont'd)
  3. If underflow occurs (the leaf now contains one too few keys):
     3.1 If the left or right sibling has more than the minimum number of keys, redistribute
     3.2 Otherwise, concatenate the two leaves and the median key from the parent into one leaf
     3.3 Apply step 3 to the parent, as if a key had been deleted from it
82
Redistribution
- Occurs when a sibling has more than the minimum number of keys
- Idea: move keys between siblings
- Results in a change in the key in the parent page
- Does not propagate: the effects are strictly local
- How many keys should be moved?
  - Not necessarily a fixed number
  - An even distribution is desired
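An even redistribution between two adjacent leaves can be sketched on plain key lists. This assumes leaves whose parent stores the largest key of each child, as in the insertion example; under that convention only the left leaf's separator changes. The function name `redistribute` and the choice to give the left leaf the extra key on odd counts are illustrative assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evenly redistributes the keys of two adjacent sorted leaves (every key in
// `left` precedes every key in `right`). Returns the new largest key of the
// left leaf, which the parent must store as its updated separator.
char redistribute(std::vector<char>& left, std::vector<char>& right) {
    std::vector<char> all(left);
    all.insert(all.end(), right.begin(), right.end());
    assert(all.size() >= 2);  // both leaves must end up non-empty
    const std::ptrdiff_t leftCount = static_cast<std::ptrdiff_t>((all.size() + 1) / 2);
    left.assign(all.begin(), all.begin() + leftCount);
    right.assign(all.begin() + leftCount, all.end());
    return left.back();
}
```

Because the right leaf's largest key is unchanged, the update stays local to one separator in the parent, which is the "does not propagate" property noted on the slide.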
83
Concatenation (Merge)
- Occurs in case of underflow
- Combines the two pages and the key from the parent page into a single full page
- Reverses the splitting
- Concatenation must involve demotion of keys, which may cause underflow in the parent page
- The effects propagate upward
84
e.g. Deletion (1): Figure A
[Figure A: root I P Z; second level D G I, M P, and T X Z; leaves A B C D, E F G, H I, J K L M, N O P, Q R S T, U V W X, and Y Z]
85
e.g. Deletion (2)
- Removal of key C from Figure A: the change occurs only in the leaf node (A B C D becomes A B D)
[Figure: the tree of Figure A with the first leaf changed to A B D]
86
e.g. Deletion (3)
- Result of deleting P from Figure A: P changes to O in the second level and in the root
[Figure: the resulting tree, with root I O Z and the leaf N O P reduced to N O]
87
e.g. Deletion (4)
- Result of deleting H from Figure A: the removal of H caused an underflow, and two leaf nodes were merged
[Figure: the resulting tree, with root I P Z, second level D I, M P, and T X Z, and the merged leaf E F G I]
88
9.13 Redistribution During Insertion
- A way to improve storage utilization
- A way of avoiding the creation of new pages
- Tends to make the B-tree more efficient in terms of space utilization:
  - Worst case: around 50%
  - Average case: 67-69%
  - With redistribution during insertion: over 85%
89
Deletion Example (1)
- DELETE J (no change)
- DELETE M (swap with N)
[Figure: the tree with root M after deleting J, and the tree with root N after deleting M]
90
Deletion Example (2)
- DELETE R (redistribution)
- DELETE A (concatenation), which leaves an underflow in the parent
[Figure: the tree after deleting R, and the tree after deleting A, where the leaves are concatenated into C D E F]
91
Deletion Example (3)
- The underflow now propagates upward
- The height of the tree decreases
[Figure: the final tree, with root H N Q W and leaves C D E F, I K, O P, S U V, and X Y Z]
92
9.14 B* Trees
- Knuth, 1973, Addison-Wesley
- Use the redistribution operation during insertion
- Perform a two-to-three split
  - When a split occurs, the page has at least one sibling that is also full
  - After the split, the pages are about 2/3 full
- Each page holds at least $\lceil (2m-1)/3 \rceil$ keys (cf. $\lceil m/2 \rceil - 1$ keys in a B-tree)
93
B* Tree (Cont'd)
- Insert B into the original tree, causing a two-to-three split
[Figure: the original tree and the tree after the two-to-three split]
94
9.15 Buffering of Pages: Virtual B-Trees
- In practice, B-tree size >> main memory, so B-tree pages need to be buffered
- It is better to keep the root page in main memory
- Buffer replacement algorithm: LRU plus a page-height weighting factor
- Keep the pages of the top few levels in main memory at all times
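The LRU part of the replacement policy can be sketched with a list and a hash map. This is plain LRU only: the page-height weighting factor mentioned on the slide is not modeled, pages are identified by an integer address, and `LruPageBuffer` is a hypothetical class that tracks residency without holding page contents.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

class LruPageBuffer {
public:
    explicit LruPageBuffer(std::size_t capacity) : capacity_(capacity) {
        assert(capacity >= 1);
    }

    // Records an access to a page. Returns true on a buffer hit and false on
    // a miss (where a real implementation would read the page from disk).
    bool access(int pageAddr) {
        auto it = index_.find(pageAddr);
        if (it != index_.end()) {
            // Hit: move the page to the most-recently-used position.
            order_.splice(order_.begin(), order_, it->second);
            return true;
        }
        if (order_.size() == capacity_) {
            // Buffer full: evict the least recently used page.
            index_.erase(order_.back());
            order_.pop_back();
        }
        order_.push_front(pageAddr);
        index_[pageAddr] = order_.begin();
        return false;
    }

    bool contains(int pageAddr) const { return index_.count(pageAddr) != 0; }

private:
    std::size_t capacity_;
    std::list<int> order_;  // most recently used page first
    std::unordered_map<int, std::list<int>::iterator> index_;
};
```

One way to approximate the slide's advice about the top levels would be to exempt those pages from eviction; that refinement is left out here.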
95
Placement of Information Associated with the Key
How to store the associated information:
- In a file that mingles data and index: once the key is found, no more disk accesses are required
- In a separate file: a larger number of keys per page, hence a higher order and a shallower tree
96
9.16 Variable-Length Records and Keys
A B-tree with variable-length keys:
- No single, fixed order
- A different criterion for the overflow/underflow condition: use the maximum/minimum number of bytes (cf. the maximum/minimum number of keys)
- Key promotion mechanism: the shortest variable-length keys are promoted in preference to longer ones, which places the pages with the largest numbers of descendants high in the tree
97
Let's Review (1)
- 9.1 Introduction
- 9.2 Statement of the Problem
- 9.3 Indexing with Binary Search Trees: AVL Trees, Paged Binary Trees, Problems with Paged Trees
- 9.4 Multilevel Indexing
- 9.5 B-Trees
- 9.6 Example of Creating a B-Tree
- 9.7 An Object-Oriented Representation of B-Trees: Class BTreeNode, Class BTree
98
Let's Review (2)
- 9.8 B-Tree Methods: Search, Insert, and Others
- 9.9 B-Tree Nomenclature
- 9.10 Formal Definition of B-Tree Properties
- 9.11 Worst-case Search Depth
- 9.12 Deletion, Merging, and Redistribution
- 9.13 Redistribution During Insertion
- 9.14 B* Trees
- 9.15 Buffering of Pages: Virtual B-Trees
- 9.16 Variable-Length Records and Keys