Chap 9. Multilevel Indexing and B-Trees


1 Chap 9. Multilevel Indexing and B-Trees
File Structures by Folk, Zoellick, and Riccardi
Seoul National University, School of Computer Science and Engineering, Object-Oriented Systems Laboratory (SNU-OOPSLA-LAB), Prof. Hyoung-Joo Kim (김형주)
File Structure SNU-OOPSLA Lab

2 Chapter Objectives(1)
- Place the development of B-trees in the historical context of the problems they were designed to solve
- Look briefly at other tree structures that might be used on secondary storage, such as paged AVL trees
- Introduce multirecord and multilevel indexes and evaluate the speed of the search operation
- Provide an understanding of the important properties possessed by B-trees, and show how these properties are especially well suited to secondary storage applications

3 Chapter Objectives(2)
- Present the object-oriented design of B-trees: define the classes BTreeNode and BTree
- Explain the implementation of the fundamental operations on B-trees
- Introduce the notion of page buffering and virtual B-trees
- Describe variations of the fundamental B-tree algorithms, such as those used to build B* trees and B-trees with variable-length records

4 Contents(1)
9.1 Introduction
9.2 Statement of the Problem
9.3 Indexing with Binary Search Trees: AVL Trees, Paged Binary Trees, Problems with Paged Trees
9.4 Multilevel Indexing
9.5 B-Trees
9.6 Example of Creating a B-Tree
9.7 An Object-Oriented Representation of B-Trees: Class BTreeNode, Class BTree

5 Contents(2)
9.8 B-Tree Methods Search, Insert, and Others
9.9 B-Tree Nomenclature
9.10 Formal Definition of B-Tree Properties
9.11 Worst-case Search Depth
9.12 Deletion, Merging, and Redistribution
9.13 Redistribution During Insertion
9.14 B* Trees
9.15 Buffering of Pages: Virtual B-Trees
9.16 Variable-Length Records and Keys

6 Introduction: The Invention of the B-tree
9.1 Introduction: Invention of the B-Tree
- 1972, Acta Informatica: R. Bayer and E. McCreight (at Boeing Corporation), "Organization and Maintenance of Large Ordered Indexes"
- 1979: 'de facto' standard for database indexes; D. Comer, "The Ubiquitous B-tree", ACM Computing Surveys
- Why the name B-tree? Balanced, Bushy, Broad, Boeing, Bayer
- Retrieval, insertion, and deletion time is proportional to $\log_K I$ (I: number of indexes in the file, K: number of indexes in a page)
- Excellent for dynamically changing random-access files

7 Statement of the Problem
9.2 Statement of the Problem
Problems in an index on secondary storage:
- Searching the index must be faster than binary searching. In binary search: 15 items take 4 seeks, 1,000 items take about 10 seeks
- Insertion and deletion must be as fast as search. In some file structures, inserting a key may involve moving many other keys
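The seek counts quoted for binary search can be checked with a few lines of C++. This is only a sketch: the function name `worstCaseBinarySearchProbes` is made up for illustration, and it assumes every probe costs one seek.

```cpp
#include <cassert>
#include <cstdint>

// Worst-case number of probes for a binary search over n sorted items:
// the smallest p such that 2^p - 1 >= n.
// Assumption: each probe is one disk seek, as in the slide's argument.
int worstCaseBinarySearchProbes(std::uint64_t n) {
    assert(n > 0);
    int probes = 0;
    std::uint64_t covered = 0;      // items that `probes` probes can distinguish
    while (covered < n) {
        covered = 2 * covered + 1;  // 1, 3, 7, 15, ...
        ++probes;
    }
    return probes;
}
```

For 15 items the function returns 4, and for 1,000 items it returns 10, which is why a plain binary search is too slow for an index kept on disk.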

8 Binary Search Tree(1)
9.3 Indexing with Binary Search Trees
Advantages
- Data need not be physically sorted
- Good performance on a balanced tree
- Insert cost = search cost
Disadvantages
- In an out-of-balance binary tree, more seeks are required

9 Binary Search Tree(2)
Sorted list of keys: AX, CL, DE, FB, FT, HN, JD, KF, NR, PA, RF, SD, TK, WS, YJ
Binary search tree representation (level by level):
  KF
  FB  SD
  CL  HN  PA  WS
  AX  DE  FT  JD  NR  RF  TK  YJ
At most 4 seeks per record

10 Internal Representation of Binary Tree
Each node is stored as a record with three fields (key, left child, right child); the child fields hold an RRN (for fixed-length records) or a pointer, and ROOT holds the RRN of the root record.
[Figure: record file holding the keys FB, JD, RF, SD, AX, YJ, PA, FT, HN, KF, CL, NR, DE, WS, TK with their left and right child fields]

11 Unbalanced Binary Tree
[Figure: the binary search tree after the additional keys LV, LA, NP, MB, ND, NK are inserted, leaving it out of balance]
- At most 9 seeks per record
- Worst case: sequential search

12 AVL Tree(1)
- A height-balanced k tree (HB(k) tree): the allowable difference in the height of any two subtrees is k
- AVL tree: an HB(1) tree, named after G.M. Adel'son-Vel'skii and E.M. Landis
- Maintenance overhead is needed
- Performance: given N keys, the worst-case search takes 1.44 log2(N+2) comparisons (cf. a completely balanced tree: worst-case search takes log2(N+1))

13 AVL Tree(2)
[Figure: (a) AVL trees, (b) non-AVL trees]

14 AVL Tree(3)
A binary tree structure that is balanced with respect to the heights of its subtrees.
Definition:
- An empty tree is height balanced
- If T is a nonempty binary tree with TL and TR as its left and right subtrees, then T is height balanced iff (1) TL and TR are height balanced and (2) |hL - hR| <= 1, where hL and hR are the heights of TL and TR, respectively

15 AVL Tree(4)
- The balance factor BF(T) of a node T in a binary tree is hL - hR, where hL and hR are the heights of the left and right subtrees of T
- For any node T in an AVL tree, BF(T) must be -1, 0, or 1
- If BF(T) becomes -2 or 2, a rotation is carried out to restore balance
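The definitions of height balance and balance factor translate almost directly into code. The sketch below is illustrative only: the `Node` struct and the function names are assumptions, and heights are recomputed recursively for clarity, whereas a real AVL tree stores the height or balance factor in each node.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdlib>

// Hypothetical in-memory node; not part of the textbook classes.
struct Node {
    int key;
    Node* left;
    Node* right;
    explicit Node(int k) : key(k), left(nullptr), right(nullptr) {}
};

// Height of a subtree; an empty tree has height 0.
int height(const Node* t) {
    return t == nullptr ? 0 : 1 + std::max(height(t->left), height(t->right));
}

// BF(T) = hL - hR, as defined on the slide.
int balanceFactor(const Node* t) {
    assert(t != nullptr);
    return height(t->left) - height(t->right);
}

// T is height balanced iff both subtrees are and |hL - hR| <= 1.
bool isHeightBalanced(const Node* t) {
    if (t == nullptr) return true;   // an empty tree is height balanced
    return isHeightBalanced(t->left) && isHeightBalanced(t->right)
        && std::abs(balanceFactor(t)) <= 1;
}
```

A chain of three nodes linked only through right children has a balance factor of -2 at its top node, which is the situation that triggers a rotation.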

16 AVL Tree(5)
- New identifier MARCH: no rebalancing needed
- New identifier MAY: no rebalancing needed (BF(MAR) = -1)
- New identifier NOVEMBER: BF(MAR) = -2 after insertion; an RR rotation makes MAY the root with children MAR and NOV

17 AVL Tree(6)
- New identifier AUGUST: no rebalancing needed (BF(MAY) = +1, BF(MAR) = +1)

18 AVL Tree(7)
- New identifier APRIL: BF(MAR) = +2 and BF(MAY) = +2 after insertion; an LL rotation makes AUG the parent of APR and MAR

19 AVL Tree(8)
- New identifier JANUARY: BF(MAY) = +2 after insertion; an LR rotation makes MAR the root, with AUG (children APR, JAN) on the left and MAY (child NOV) on the right

20 AVL Tree(9)
- New identifier DECEMBER: no rebalancing needed

21 AVL Tree(10)
- New identifier JULY: no rebalancing needed

22 AVL Tree(11)
- New identifier FEBRUARY: BF(AUG) = -2 after insertion; an RL rotation makes DEC the root of that subtree

23 AVL Tree(12)
- New identifier JUNE: BF(MAR) = +2 after insertion; an LR rotation makes JAN the root

24 AVL Tree(13)
- New identifier OCTOBER: BF(MAY) = -2 after insertion; an RR rotation is required

25 AVL Tree(14)
[Figure: tree after rebalancing for OCTOBER, level by level: JAN / DEC, MAR / AUG, FEB, JUL, NOV / APR, JUN, MAY, OCT]

26 AVL Tree(15)
- New identifier SEPTEMBER: no rebalancing needed

27 AVL Tree : Rebalancing(1)
Rebalancing is carried out using four different kinds of rotations, where A is the node that becomes unbalanced:
- LL: the new node Y is inserted in the left subtree of the left subtree of A
- LR: the new node Y is inserted in the right subtree of the left subtree of A
- RR: the new node Y is inserted in the right subtree of the right subtree of A
- RL: the new node Y is inserted in the left subtree of the right subtree of A

28 AVL Tree : Rebalancing(2)
[Figure: node A with the four positions (LL, LR, RL, RR) at which inserting Y unbalances it]

29 AVL Tree : Rebalancing(LL)
[Figure: rotation type LL. Before: balanced subtree rooted at A (BF +1) with left child B. The insertion raises the height of BL to h+1, so BF(A) becomes +2. After: B is the subtree root with A as its right child; BR becomes the left subtree of A.]
Key order: BL < B < BR < A < AR

30 AVL Tree : Rebalancing(RR)
[Figure: rotation type RR. Before: balanced subtree rooted at A (BF -1) with right child B. The insertion raises the height of BR to h+1, so BF(A) becomes -2. After: B is the subtree root with A as its left child; BL becomes the right subtree of A.]
Key order: AL < A < BL < B < BR

31 AVL Tree : Rebalancing(LR)
[Figure: rotation type LR(a). A (BF +1) has left child B; the new node C is inserted as the right child of B. After the rotation C is the subtree root with children B and A.]
Key order: B < C < A

32 AVL Tree : Rebalancing(LR)
[Figure: rotation type LR(b). The insertion goes into CL, the left subtree of C, where C is the right child of B and B is the left child of A. After the rotation C is the subtree root, B keeps BL and receives CL, and A receives CR and keeps AR.]
Key order: BL < B < CL < C < CR < A < AR

33 AVL Tree : Rebalancing(LR)
[Figure: rotation type LR(c). Same as LR(b), except that the insertion goes into CR, the right subtree of C.]
RL (a), (b), and (c) are symmetric to LR (a), (b), and (c)
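The four rotation types reduce to two single rotations plus their compositions. The following is a minimal sketch, assuming a hypothetical `AvlNode` struct; it only rearranges pointers and leaves out the balance-factor bookkeeping that a complete AVL implementation needs.

```cpp
#include <cassert>

// Hypothetical node type used only for this illustration.
struct AvlNode {
    int key;
    AvlNode* left;
    AvlNode* right;
    explicit AvlNode(int k) : key(k), left(nullptr), right(nullptr) {}
};

// LL case: B (left child of A) becomes the subtree root, BR moves under A.
AvlNode* rotateLL(AvlNode* a) {
    assert(a != nullptr && a->left != nullptr);
    AvlNode* b = a->left;
    a->left = b->right;
    b->right = a;
    return b;
}

// RR case: mirror image of LL.
AvlNode* rotateRR(AvlNode* a) {
    assert(a != nullptr && a->right != nullptr);
    AvlNode* b = a->right;
    a->right = b->left;
    b->left = a;
    return b;
}

// LR case: C (right child of B, B the left child of A) becomes the root.
AvlNode* rotateLR(AvlNode* a) {
    assert(a != nullptr && a->left != nullptr && a->left->right != nullptr);
    a->left = rotateRR(a->left);
    return rotateLL(a);
}

// RL case: symmetric to LR.
AvlNode* rotateRL(AvlNode* a) {
    assert(a != nullptr && a->right != nullptr && a->right->left != nullptr);
    a->right = rotateLL(a->right);
    return rotateRR(a);
}
```

Each function returns the new subtree root, which the caller must store in the parent's child pointer (or in the tree's root pointer).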

34 Paged Binary Tree(1)
Page
- A unit of disk I/O for handling seek and transfer of disk data
- Typically 4 KB, 8 KB, 16 KB, ...
Paged binary tree
- Divide a binary tree into pages, then store each page in a block of contiguous locations on disk
- If every page holds 7 keys, any of 511 nodes (keys) can be reached in only three seeks
Performance (number of seeks)
- A completely full balanced binary tree: $\log_2(N+1)$
- A completely full paged tree: $\log_{k+1}(N+1)$ (k: number of keys held in a single page)
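The two seek formulas can be compared numerically. The sketch below uses an illustrative function name, `pagedTreeSeeks`, and assumes a completely full tree; it computes the smallest number of page levels d with (k+1)^d - 1 >= N using integer arithmetic only.

```cpp
#include <cassert>
#include <cstdint>

// Seeks needed to reach any of n keys in a completely full paged tree that
// stores k keys per page. With k = 1 this is the plain binary tree case.
int pagedTreeSeeks(std::uint64_t n, std::uint64_t k) {
    assert(n > 0 && k > 0);
    int levels = 0;
    std::uint64_t capacity = 0;             // equals (k+1)^levels - 1
    while (capacity < n) {
        capacity = capacity * (k + 1) + k;  // add one more level of pages
        ++levels;
    }
    return levels;
}
```

With 7 keys per page, 511 keys need 3 seeks; the same 511 keys in an unpaged binary tree (k = 1) need 9.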

35 Paged Binary Tree(2)
[Figure: a binary tree divided into pages]

36 The Problem with Paged Trees
- Only valid when we have the entire set of keys in hand before the tree is built
- Problems caused by the tree going out of balance:
  - How to select a good separator
  - How to group keys
  - How to guarantee the maximum loading
- The B-tree provides a solution to the above problems!

37 Paged Binary Tree (Out of balance)
Random input sequence: C S D T A M P I B W N G U R K E H O L J Y Q Z F X V
[Figure: the out-of-balance paged binary tree produced by inserting the keys in this order]

38 Multilevel Indexing
9.4 Multilevel Indexing: A Better Approach to Tree Indexes
- Simple index record: limited in the number of keys allowed
- Multirecord index: consists of a sequence of simple index records; binary search is too expensive
- Multilevel index: reduces the number of records to be searched and speeds up the search
Example: an 80-Mbyte file of 8,000,000 records with 10-byte keys

39 Example of Multilevel Indexing
- 4th-level index: a single index record with 8 keys
- 3rd-level index: 8 index records that index the largest keys in the 800 second-level records
- 2nd-level index: 800 index records with 80,000 keys; one of the keys in each index record is chosen as the key of that whole record
- The lowest-level index is an index to the data file; its reference fields are record addresses in the data file
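The level sizes in this example follow from repeated division by the number of keys per index record. The sketch below assumes 100 keys per index record, a figure inferred from the slide (800 records holding 80,000 keys) rather than stated on it; the function name is illustrative.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Number of index records at each level, from the lowest level (which
// indexes the data file) up to the single top-level record.
// keysPerIndexRecord = 100 is an assumption inferred from the slide.
std::vector<std::uint64_t> indexRecordsPerLevel(std::uint64_t dataRecords,
                                                std::uint64_t keysPerIndexRecord) {
    assert(dataRecords > 0 && keysPerIndexRecord > 1);
    std::vector<std::uint64_t> levels;
    std::uint64_t entries = dataRecords;
    do {
        // Each index record covers up to keysPerIndexRecord entries below it.
        entries = (entries + keysPerIndexRecord - 1) / keysPerIndexRecord;
        levels.push_back(entries);
    } while (entries > 1);
    return levels;
}
```

For 8,000,000 data records this yields 80,000, 800, 8, and 1 index records, so a search reads four index records before reaching the data file.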

40 Multi-level Indexing(3)
How can we insert new keys into the multilevel index?
- The index records at some level might be full
- Several levels of the index might have to be rebuilt
- Overflow chains may help, but they are still ugly
- The multilevel index structure is not well suited to dynamic data processing applications
- The B-tree gives the right solution!

41 B-Trees: Working up from the bottom
- Bayer and McCreight, 1972, Acta Informatica
- Build trees upward from the bottom instead of downward from the top
- Each node of a B-tree is an index record that consists of "key-reference" pairs
- The order of a B-tree: the maximum number of key-reference pairs per node
- Every index record must hold at least half that many pairs

42 Sample B-Tree
[Figure: sample B-tree with root (D, P, T) over the leaves (A C D), (I M P), and (S T)]

43 Splitting & Promoting(1)
9.6 Example of Creating a B-Tree
Splitting
- Creation of two nodes out of one because the original node becomes overfull
- Results in the need to promote a key to a higher-level node to provide an index separating the two new nodes
Promotion of a key
- Movement of a key from one node into a higher-level node when a split occurs

44 Splitting & Promoting(2)
- Initial leaf of a B-tree with a page size of seven: A B C D E F G
- Insert key J: the leaf is split to accommodate the new J key (continued)

45 Splitting & Promoting(3)
- Promotion of the E key into a root node
[Figure: the new root above the two leaves produced by the split]

46 Insertion in B-tree(1)
Input sequence: C S D T A M P I B W N G U R K E H O L J Y Q Z F X V
- Insertion of C, S, D, T into the initial page: (C D S T)
- Insertion of A causes the node to split, and the largest key in each leaf node (D and T) is placed in the root node: root (D T) over leaves (A C D) and (S T)

47 Insertion in B-tree(2)
- M and P are inserted into the rightmost leaf node, then the insertion of I causes it to split: root (D P T) over leaves (A C D), (I M P), (S T)

48 Insertion in B-tree(3)
- Insertions of B, W, N, and G into leaf nodes cause another split, and the root is now full
[Figure: root (D M P W) over four leaves]

49 Insertion in B-tree(4)
- Insertion of U proceeds without incident, but R would have to be inserted into the rightmost leaf, which is full

50 Insertion in B-tree(5)
- Insertion of R causes the rightmost leaf node to split, the insertion into the root causes the root to split, and the tree grows to three levels

51 Insertion in B-tree(6)
- Insertions of K, E, H, O, L, J, Y, Q, and Z continue with another node split

52 Insertion in B-tree(7)
- Insertions of F, X, and V finish the insertion of the alphabet
[Figure: final tree with root (I P Z), second level (D G I), (M P), (T X Z), and leaves (A B C D), (E F G), (H I), (J K L M), (N O P), (Q R S T), (U V W X), (Y Z)]

53 Insertion in B-trees
Major components of insertion:
- Split the node
- Promote the middle key
- Increase the height of the B-tree
Insertion touches no more than 2 nodes per level, so the insertion cost is strictly linear in the height of the tree

54 Class BTreeNode(1)
9.7 An Object-Oriented Representation of B-Trees
- Represents B-tree nodes in memory
- The B-tree is an index file associated with a data file
- Specified in btnode.h of Appendix I
- The template class BTreeNode is based on the template class SimpleIndex
Public methods:
- SimpleIndex class: Insert, Remove, Clear, Search, Print, NumKeys
- BTreeNode class: Insert, Remove, LargestKey, Split, Pack, Unpack

55 Class BTreeNode(2)
Public methods:
- Insert: simply calls SimpleIndex::Insert and then checks for overflow
- Remove, Split, and Merge: remove a key, split nodes, and merge nodes
- Search: inherited from the SimpleIndex class (works perfectly well)
- Pack/Unpack: manage the difference between the memory and the disk representation of BTreeNode objects
Protected members:
- Store the file address of the node and the minimum and maximum number of keys

56 Class BTreeNode: declaration
    template <class keyType>
    class BTreeNode : public SimpleIndex<keyType>
    {
    public:
        BTreeNode(int maxKeys, int unique = 1);
        int Insert(const keyType key, int recAddr);      // add a key-reference pair
        int Remove(const keyType key, int recAddr = -1); // remove a key
        int LargestKey();                                // largest key in the node
        int Split(BTreeNode<keyType>* newNode);          // move keys into newNode
        int Pack(IOBuffer& buffer);                      // memory -> disk form
        int Unpack(IOBuffer& buffer);                    // disk -> memory form
    protected:
        int MaxBKeys;
        int Init();
        friend class BTree<keyType>;
    };

57 Class BTree
- Uses in-memory BTreeNode objects, adds the file access portion, and enforces a consistent node size
- Specified in btree.h of Appendix I
Methods:
- Create, Open, and Close a B-tree
- Search, Insert, and Remove key-reference pairs
Protected area:
- Fetch (transfers nodes from disk to memory) and Store (transfers nodes back to disk)
- Root node, height of the tree, file of index records
- BTNode** Nodes: keeps a collection of tree nodes in memory to reduce disk accesses

58 Class BTree: declaration
    template <class keyType>
    class BTree
    {
    public:
        BTree(int order, int keySize = sizeof(keyType), int unique = 1);
        int Open(char* name, int mode);
        int Create(char* name, int mode);
        int Close();
        int Insert(const keyType key, const int recAddr);
        int Remove(const keyType key, const int recAddr = -1);
        int Search(const keyType key, const int recAddr = -1);
    protected:
        typedef BTreeNode<keyType> BTNode;
        BTNode* FindLeaf(const keyType key);  // descend to the leaf for key
        BTNode* Fetch(const int recaddr);     // load a node from disk
        int Store(BTNode*);                   // write a node back to disk
        BTNode Root;
        int Height;
        int Order;
        BTNode** Nodes;                       // nodes currently held in memory
        RecordFile<BTNode> BTreeFile;
    };

59 Page Structure
9.8 B-Tree Methods Search, Insert, and Others
Each page stores a KEYCOUNT, a KEY array, and a CHILD array.
[Figure: contents of pages 2 and 3 of a sample tree. Page 2: KEYCOUNT 4, keys D M P W. Page 3: KEYCOUNT 3, keys G I M, child entries Nil.]

60 Algorithm for Search
The searching procedure is iterative and works in two stages, operating alternately on entire pages (class BTree) and then within pages (class BTreeNode):
- Step 1: Load a page into memory
- Step 2: Search through the page for the key, moving down the tree until the leaf level is reached

61 Search and FindLeaf method
Specifications of the Search and FindLeaf methods (Fig 9.18):
    template <class keyType>
    int BTree<keyType>::Search(const keyType key, const int recAddr)
    template <class keyType>
    BTreeNode<keyType>* BTree<keyType>::FindLeaf(const keyType key)
Search method, e.g. recAddr = btree.Search('L'):
- Calls FindLeaf('L')
- Searches for the key in the leaf node; if the key exists, returns the data file address of the record with key 'L', otherwise returns -1
FindLeaf method:
- Searches down to the leaf node, beginning at the root
- Returns the address of the leaf node
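The two-stage search can be sketched with plain in-memory nodes. This is not the textbook's code: `SketchNode`, `findLeaf`, and `search` are hypothetical names, children are reached through pointers rather than through Fetch calls on a file, and each interior key is assumed to be the largest key in the corresponding child, as in the insertion example earlier in the deck.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical in-memory stand-in for a B-tree page.
struct SketchNode {
    std::vector<char> keys;                   // sorted keys
    std::vector<int> recAddrs;                // leaf only: data-file addresses
    std::vector<const SketchNode*> children;  // interior only: children[i]
                                              // holds keys <= keys[i]
    bool isLeaf() const { return children.empty(); }
};

// Walk from the root down to the leaf that would contain `key`.
const SketchNode* findLeaf(const SketchNode* root, char key) {
    assert(root != nullptr);
    const SketchNode* node = root;
    while (!node->isLeaf()) {
        std::size_t i = 0;
        // First child whose largest key is >= key; otherwise the last child.
        while (i + 1 < node->keys.size() && key > node->keys[i]) ++i;
        node = node->children[i];
    }
    return node;
}

// Returns the record address stored with `key`, or -1 if it is absent.
int search(const SketchNode* root, char key) {
    const SketchNode* leaf = findLeaf(root, key);
    for (std::size_t i = 0; i < leaf->keys.size(); ++i)
        if (leaf->keys[i] == key) return leaf->recAddrs[i];
    return -1;
}
```

In the real classes each step down the tree would go through BTree::Fetch, so the number of loop iterations in `findLeaf` corresponds to the number of disk reads.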

62 Algorithm for Insertion(1)
Observations on insertion, splitting, and promotion:
- The search proceeds all the way down to the leaf level
- After the insertion location is found at the leaf level, the work proceeds upward from the bottom
An iterative procedure with three phases:
- Search to the leaf level, using the FindLeaf method
- Insertion, overflow detection, and splitting on the upward path
- Creation of a new root node, if the current root was split

63 Algorithm for Insertion(2)
With no redistribution:
(Step 1) Locate the node on the bottommost level in which to insert the record. The location is determined by a key search.
(Step 2) If a vacant record slot is available, insert the record so that key sequencing is maintained. Then update the pointer associated with the record (the pointer is null for level 0 records). Then stop.
(Step 3) If no vacant record slot exists, identify the median record. All records and pointers to the left of the median record are stored in one node (the original), and those to the right are stored in another node (the new node).

64 Algorithm for Insertion(3)
(Step 4) If the topmost node was split, create a new topmost node that contains the median record identified in Step 3, filled with pointers to the original and split nodes. Update the root to point to the new topmost node. Then stop.
(Step 5) If the topmost node was not split, prepare to insert the median record identified in Step 3 and a pointer to the new node (created in Step 3) into the parent. Then go to Step 2.
Notes:
- Step 4 makes the B-tree increase in height by 1 level
- B-trees have about 70% occupancy on average (like B+-trees)
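Steps 1 through 5 can be sketched as a small in-memory tree. The code below is an illustration under several assumptions: the names `Node` and `BTreeSketch` are hypothetical, keys are unique ints, recursion replaces the "go to Step 2" loop, and the median is promoted as the steps describe (which differs from the textbook's BTreeNode class, where a parent stores the largest key of each child).

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

struct Node {
    std::vector<int> keys;                        // sorted keys
    std::vector<std::unique_ptr<Node>> children;  // empty in a leaf
    bool isLeaf() const { return children.empty(); }
};

class BTreeSketch {
public:
    explicit BTreeSketch(std::size_t maxKeys)
        : maxKeys_(maxKeys), root_(new Node), height_(1) {
        assert(maxKeys >= 2);
    }

    void insert(int key) {
        int median = 0;
        std::unique_ptr<Node> right;
        if (insertInto(*root_, key, median, right)) {
            // Step 4: the topmost node split, so create a new root.
            std::unique_ptr<Node> newRoot(new Node);
            newRoot->keys.push_back(median);
            newRoot->children.push_back(std::move(root_));
            newRoot->children.push_back(std::move(right));
            root_ = std::move(newRoot);
            ++height_;
        }
    }

    int height() const { return height_; }
    const Node& root() const { return *root_; }

private:
    // Returns true if `node` was split; then `median` and `right` carry
    // the promoted key and the new right sibling up to the caller.
    bool insertInto(Node& node, int key, int& median,
                    std::unique_ptr<Node>& right) {
        // Step 1: position determined by key search.
        const std::size_t pos = static_cast<std::size_t>(
            std::lower_bound(node.keys.begin(), node.keys.end(), key) -
            node.keys.begin());
        if (node.isLeaf()) {
            node.keys.insert(node.keys.begin() + pos, key);       // Step 2
        } else {
            int childMedian = 0;
            std::unique_ptr<Node> childRight;
            if (!insertInto(*node.children[pos], key, childMedian, childRight))
                return false;
            // Step 5: insert the promoted median and the new node's pointer.
            node.keys.insert(node.keys.begin() + pos, childMedian);
            node.children.insert(node.children.begin() + pos + 1,
                                 std::move(childRight));
        }
        if (node.keys.size() <= maxKeys_) return false;           // it fits
        // Step 3: split around the median record.
        const std::size_t mid = node.keys.size() / 2;
        median = node.keys[mid];
        right.reset(new Node);
        right->keys.assign(node.keys.begin() + mid + 1, node.keys.end());
        node.keys.resize(mid);
        if (!node.isLeaf()) {
            for (std::size_t i = mid + 1; i < node.children.size(); ++i)
                right->children.push_back(std::move(node.children[i]));
            node.children.resize(mid + 1);
        }
        return true;
    }

    std::size_t maxKeys_;
    std::unique_ptr<Node> root_;
    int height_;
};
```

With `maxKeys` set to 4, inserting the keys 1 through 5 overflows the single leaf on the fifth insert; the median 3 is promoted into a new root and the height grows from 1 to 2.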

65 Insertion Example
[Figure: worked example. Inserting key 1 into a full node causes a split; key 9 is then inserted without a further split.]

66 Create, Open, and Close
Specified in btree.tc of Appendix I:
- Method Create writes the empty root node into the file BTreeFile so that its first record is reserved for the root node
- Method Open opens BTreeFile and loads the root node into memory from the first record in the file
- Method Close simply stores the root node into BTreeFile and closes it

67 B-Tree Nomenclature
9.9 B-Tree Nomenclature
- Be aware that terms are not uniform in the literature
- Definitions also differ considerably
- In fact, there are a number of B-tree variations
- This textbook uses "B-tree" for what other books call a B+ tree
- In this book, a "B+ tree" is a B+ tree with a linked list of sorted data blocks

68 Other book: B-Tree. Our book: N/A
[Figure: root (C G) over nodes (A B), (E F), (H I); every key, including those in the root, points to a data block]

69 Other book: B+-Tree. Our book: B-Tree
[Figure: root (C G I) over leaves (A B C), (E F G), (H I); the leaves point to the data blocks]

70 B+-Tree with Linked List
[Figure: root (C G I) over leaves (A B C), (E F G), (H I); the data blocks are connected in a linked list]

71 Another Aspect (node structures): Homogeneous Trees (B-Tree in other texts)
- Homogeneous trees: leaf nodes and interior nodes have the same structure; each contains both data pointers and tree pointers
- The average search length is shorter for homogeneous trees, because some searches conclude before reaching a leaf node

72 B-Tree in other text
[Figure: homogeneous B-tree whose nodes together hold 23 keys, with 23 pointers to 23 records in the data file]

73 Another Aspect (node structures): Heterogeneous Trees (B+-Tree in other texts)
- Heterogeneous trees: leaf nodes and interior nodes have different structures

74 B+-Tree in other text
[Figure: heterogeneous B+-tree; all 23 keys appear in the leaves, which hold 23 pointers to 23 records in the data file]

75 Comparison of B-Tree and B+-Tree in other text
[Table: comparison of the B-tree and the B+-tree; contents not captured in the transcript]

76 Comparison of B-Tree and B+-Tree in other text
Historical note:
- B-tree: Bayer & McCreight
- B+-tree: Comer
- B*-tree: Knuth; B-trees with 67% minimum occupancy
- B÷-trees: B+-trees with 67% minimum occupancy

77 Formal Definition of B-Tree Properties
The properties of a B-tree of order m:
1. Every page has a maximum of m descendants
2. Every page, except for the root and the leaves, has at least ceiling(m/2) descendants
3. The root has at least two descendants (unless it is a leaf)
4. All the leaves appear on the same level
5. The leaf level forms a complete, ordered index of the associated data file

78 Worst-case Search Depth(1)
- Search depth: the depth of the tree
- The worst case occurs when every page of the tree has only the minimum number of descendants: maximal height with minimal breadth

79 Worst-case Search Depth(2)
Minimum number of descendants per level for a B-tree of order $m$:
- Level 1 (root): $2$
- Level 2: $2 \times \lceil m/2 \rceil$
- Level 3: $2 \times \lceil m/2 \rceil^{2}$
- ...
- Level $d$: $2 \times \lceil m/2 \rceil^{d-1}$
For a tree with $N$ keys in its leaves: $N \ge 2 \times \lceil m/2 \rceil^{d-1}$
Upper bound for the depth of a B-tree: $d \le 1 + \log_{\lceil m/2 \rceil}(N/2)$
e.g., for a B-tree of order 512 holding 1,000,000 keys: $d \le 3.37$, so the depth is at most 3 (3 disk I/Os)
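The depth bound is easy to evaluate numerically. The sketch below uses an illustrative function name, `maxDepthBound`, and plain floating-point logarithms; it simply evaluates 1 + log base ceil(m/2) of N/2.

```cpp
#include <cassert>
#include <cmath>

// Upper bound on the depth d of a B-tree of the given order that holds
// numKeys keys in its leaves: d <= 1 + log_{ceil(m/2)}(N / 2).
double maxDepthBound(double order, double numKeys) {
    assert(order >= 3.0 && numKeys >= 2.0);
    const double minDescendants = std::ceil(order / 2.0);
    return 1.0 + std::log(numKeys / 2.0) / std::log(minDescendants);
}
```

For order 512 and 1,000,000 keys the bound evaluates to roughly 3.37, so the tree has at most 3 levels.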

80 Deletion, Redistribution, and Concatenation
9.12 Deletion, Merging, and Redistribution
Ensure that the B-tree properties are maintained after a deletion.
Algorithm (with redistribution and concatenation):
1. If the key to be deleted is not in a leaf, swap it with its immediate successor, which is in a leaf (that leaf might then need redistribution or concatenation)
2. Delete the key

81 Deletion Algorithm (Cont'd)
3. If underflow occurs (the leaf now contains one too few keys):
3.1 If the left or right sibling has more than the minimum number of keys, redistribute
3.2 Otherwise, concatenate the two leaves and the median key from the parent into one leaf
3.3 Apply step 3 to the parent, as if a key had been deleted from it

82 Redistribution
- Occurs when a sibling has more than the minimum number of keys
- Idea: move keys between siblings
- Results in a change of the key in the parent page
- Does not propagate: the effects are strictly local
- How many keys should be moved? The number is not necessarily fixed; an even distribution is desired
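An even redistribution between two sibling leaves can be sketched as follows. The function name `redistribute` is hypothetical, leaves are modeled as sorted vectors of ints, every key in the left leaf is assumed to be smaller than every key in the right leaf, and the parent is assumed to record the largest key of the left leaf.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Evenly redistributes the keys of two adjacent sibling leaves.
// Returns the new largest key of `left`, which the parent must record;
// no other page is touched, so the effect stays local.
int redistribute(std::vector<int>& left, std::vector<int>& right) {
    std::vector<int> all(left);
    all.insert(all.end(), right.begin(), right.end());
    assert(all.size() >= 2);
    const std::size_t half = (all.size() + 1) / 2;  // left gets the extra key
    left.assign(all.begin(), all.begin() + half);
    right.assign(all.begin() + half, all.end());
    return left.back();
}
```

For a left leaf holding only 1 and a right leaf holding 5, 6, 7, 8, the call leaves 1, 5, 6 on the left and 7, 8 on the right, and the parent's key for the left leaf changes to 6.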

83 Concatenation (merge)
- Occurs in case of underflow when no sibling can spare a key
- Combines the two pages and the key from the parent page into a single full page
- Reverses the splitting
- Concatenation must involve the demotion of keys, which may cause underflow in the parent page
- The effects can propagate upward

84 e.g. Deletion(1)
Figure A: tree with root (I P Z), second level (D G I), (M P), (T X Z), and leaves (A B C D), (E F G), (H I), (J K L M), (N O P), (Q R S T), (U V W X), (Y Z)

85 e.g. Deletion(2)
- Removal of key C from figure A: the change occurs only in the leaf node, which becomes (A B D)

86 e.g. Deletion(3)
- Result of deleting P from figure A: P changes to O in the second level and in the root

87 e.g. Deletion(4)
- Result of deleting H from figure A: the removal of H causes an underflow, and two leaf nodes are merged

88 Redistribution during Insertion
- A way to improve storage utilization
- A way of avoiding the creation of new pages
- Tends to produce a more space-efficient B-tree
- Storage utilization without redistribution: worst case around 50%, average case 67 ~ 69%
- With redistribution during insertion: over 85%

89 Deletion examples
9.13 Redistribution During Insertion
- DELETE J: no change to the tree structure
- DELETE M: M is in an interior node, so it is swapped with its successor N and then deleted from the leaf

90 Deletion examples (Cont'd)
- DELETE R: redistribution with a sibling
- DELETE A: underflow in the leaf; concatenation produces the leaf (C D E F)

91 Height of the tree decreased
- The underflow now propagates upward
- The height of the tree decreases: the final tree has root (H N Q W) over leaves (C D E F), (I K), (O P), (S U V), (X Y Z)

92 B* Trees
- Knuth, 1973, Addison-Wesley
- Use the redistribution operation during insertion
- Perform a two-to-three split: when a page splits, it has at least one sibling that is also full
- After the split, the pages are about 2/3 full
- Each page has at least ceiling((2m - 1)/3) keys (cf. ceiling(m/2) - 1 keys in an ordinary B-tree)

93 B* Tree (Cont'd)
9.14 B* Trees
[Figure: original tree with two full sibling leaves; inserting B triggers a two-to-three split that leaves three pages, each about 2/3 full]

94 Buffering of B-tree pages: Virtual B-Trees
9.15 Buffering of Pages: Virtual B-Trees
- In practice, the B-tree is much larger than main memory
- Pages of the B-tree therefore need to be buffered
- It is better to keep the root page in main memory
- Buffer replacement algorithm: LRU plus a page-height weighting factor
- Keep the pages of the top few levels in main memory all the time
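A page buffer along these lines can be sketched with the standard library. The class below is a simplified illustration: `PageBufferSketch` is a hypothetical name, pages are identified by int page numbers, the root page is treated as permanently resident, and plain LRU is used for all other pages (the page-height weighting factor mentioned on the slide is left out).

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <unordered_map>

class PageBufferSketch {
public:
    PageBufferSketch(std::size_t capacity, int rootPage)
        : capacity_(capacity), rootPage_(rootPage) {
        assert(capacity >= 1);
    }

    // Returns true on a buffer hit; false means the page had to be read
    // from disk and has now been placed in the buffer.
    bool access(int page) {
        if (page == rootPage_) return true;  // root is always in memory
        auto it = where_.find(page);
        if (it != where_.end()) {
            // Hit: move the page to the most-recently-used position.
            lru_.splice(lru_.begin(), lru_, it->second);
            return true;
        }
        if (lru_.size() == capacity_) {      // evict least recently used
            where_.erase(lru_.back());
            lru_.pop_back();
        }
        lru_.push_front(page);
        where_[page] = lru_.begin();
        return false;
    }

private:
    std::size_t capacity_;                   // buffer slots besides the root
    int rootPage_;
    std::list<int> lru_;                     // front = most recently used
    std::unordered_map<int, std::list<int>::iterator> where_;
};
```

Because every search starts at the root, keeping that one page resident already saves one disk read per search; the LRU list then tends to retain the upper levels, which are touched most often.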

95 Placement of Information associated with the Key
How to store the associated information:
- In a file where data and index are mingled: once the key is found, no more disk accesses are required
- In a separate file: a larger number of keys per page, hence a higher order and a shallower tree

96 Variable Length Records and Keys
A B-tree with variable-length keys:
- No single, fixed order
- A different criterion for the overflow and underflow conditions: the maximum and minimum number of bytes (cf. the maximum and minimum number of keys)
Key promotion mechanism:
- The shortest variable-length keys are promoted in preference to longer ones
- This places the pages with the largest numbers of descendants high in the tree

97 Let's Review !!!
9.1 Introduction
9.2 Statement of the Problem
9.3 Indexing with Binary Search Trees: AVL Trees, Paged Binary Trees, Problems with Paged Trees
9.4 Multilevel Indexing
9.5 B-Trees
9.6 Example of Creating a B-Tree
9.7 An Object-Oriented Representation of B-Trees: Class BTreeNode, Class BTree

98 Let's Review !!!
9.8 B-Tree Methods Search, Insert, and Others
9.9 B-Tree Nomenclature
9.10 Formal Definition of B-Tree Properties
9.11 Worst-case Search Depth
9.12 Deletion, Merging, and Redistribution
9.13 Redistribution During Insertion
9.14 B* Trees
9.15 Buffering of Pages: Virtual B-Trees
9.16 Variable-Length Records and Keys

