October 24, 2005 (L11.1)
Introduction to Algorithms
LECTURE (Chapter 18): B-Trees
‧ 18.1 Definition of B-Tree
‧ 18.2 Basic operations on B-Tree
‧ 18.3 Deleting a key from a B-Tree


Disk-Based Data Structures
So far, search trees were limited to main-memory structures.
– Assumption: the dataset organized in a search tree fits in main memory (including the tree overhead)
Counter-example: transaction data of a bank, > 1 GB per day
– increase main memory size by 1 GB per day (power failure?)
– use secondary storage media (punch cards, hard disks, magnetic tapes, etc.)
Consequence: make the search tree structure secondary-storage-enabled

Hard Disks I
In real systems, we need to cope with data that does not fit in main memory.
Reading a data element from the hard disk:
– Seek with the head
– Wait while the necessary sector rotates under the head
– Transfer the data

Hard Disks II
Large amounts of storage, but slow access!
Identifying a page takes a long time (seek time plus rotational delay, 5-10 ms); reading it is fast.
– It pays off to read or write data in pages (or blocks) of 2-16 KB in size.
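A rough calculation shows why reading whole pages pays off. This is only a sketch: the 8 ms positioning time is picked from the 5-10 ms range above, and the 100 MB/s transfer rate is an assumed figure that does not come from the lecture.

```python
# Illustrative numbers only: 8 ms lies inside the 5-10 ms range quoted above;
# the 100 MB/s transfer rate is an assumption, not a figure from the lecture.
POSITION_MS = 8.0
TRANSFER_MB_PER_S = 100.0


def read_time_ms(num_bytes, accesses):
    """Time to fetch num_bytes in total using `accesses` separate random accesses."""
    transfer_ms = num_bytes / (TRANSFER_MB_PER_S * 1_000_000) * 1000
    return accesses * POSITION_MS + transfer_ms


page = 8 * 1024                       # one 8 KB page
one_page = read_time_ms(page, 1)      # the whole page in a single access
per_key = read_time_ms(page, 1024)    # the same bytes as 1024 separate 8-byte reads
```

Under these assumptions the positioning cost dominates: fetching the page in one access costs roughly one positioning delay, while fetching it key by key costs about a thousand of them.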

Algorithm analysis
The running time of disk-based algorithms is measured in terms of
– computing time (CPU)
– number of disk accesses (sequential reads, random reads)
Regular main-memory algorithms that work on one data element at a time cannot be "ported" to secondary storage in a straightforward way.

Principles
Pointers in data structures are no longer addresses in main memory but locations in files.
If x is a pointer to an object:
– if x is in main memory, key[x] refers to it
– otherwise DiskRead(x) reads the object from disk into main memory (DiskWrite(x) writes it back to disk)

Principles (2)
A typical working pattern:
  …
  x ← a pointer to some object
  DiskRead(x)
  operations that access and/or modify x
  DiskWrite(x)    // omitted if nothing changed
  other operations that only access x and do not modify it
  …
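The working pattern can be imitated in Python with a dictionary standing in for the disk. This is a minimal sketch: `SimulatedDisk`, `disk_read`, `disk_write`, and `add_key` are hypothetical names chosen for illustration, and the read/write counters exist only so the number of I/Os can be observed.

```python
import copy


class SimulatedDisk:
    """Hypothetical stand-in for secondary storage: page id -> stored object."""

    def __init__(self):
        self.pages = {}
        self.reads = 0
        self.writes = 0

    def disk_read(self, page_id):
        self.reads += 1
        return copy.deepcopy(self.pages[page_id])   # main-memory copy of the page

    def disk_write(self, page_id, obj):
        self.writes += 1
        self.pages[page_id] = copy.deepcopy(obj)


def add_key(disk, page_id, key):
    """Read a node, modify it in memory, and write it back only if it changed."""
    node = disk.disk_read(page_id)
    changed = key not in node["keys"]
    if changed:
        node["keys"].append(key)
        node["keys"].sort()
        disk.disk_write(page_id, node)   # skipped when nothing changed
    return changed
```

Calling `add_key` twice with the same key performs two reads but only one write, which mirrors the "omitted if nothing changed" remark.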

B-tree Definitions
Node x has fields
– n[x]: the number of keys stored in node x
– key_1[x] ≤ … ≤ key_n[x][x]: the keys in ascending order
– leaf[x]: true if x is a leaf node, false if it is an internal node
– if x is an internal node, then c_1[x], …, c_n[x]+1[x]: pointers to its children
Keys separate the ranges of keys in the sub-trees: if k_i is an arbitrary key in the subtree c_i[x], then k_i ≤ key_i[x] ≤ k_i+1.

B-tree Definitions (2)
Every leaf has the same depth.
A B-tree has a minimum degree t:
– All nodes except the root have between t and 2t children (i.e., between t – 1 and 2t – 1 keys).
– The root has between 0 and 2t children (i.e., between 0 and 2t – 1 keys); it has 0 keys only while the tree is empty.
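These rules can be checked mechanically. The sketch below is one possible validator, not part of the lecture: `Node` and `check_btree` are hypothetical names, nodes live in memory, and a leaf is modelled as a node with an empty child list.

```python
class Node:
    """Hypothetical in-memory node: a leaf is a node with no children."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def check_btree(node, t, is_root=True, lo=None, hi=None):
    """Return the height of the subtree; raise ValueError if a rule is broken."""
    n = len(node.keys)
    min_keys = 0 if is_root else t - 1
    if not min_keys <= n <= 2 * t - 1:
        raise ValueError(f"node {node.keys} has {n} keys")
    if node.keys != sorted(node.keys):
        raise ValueError(f"keys {node.keys} are not in ascending order")
    if node.keys:
        below = lo is not None and node.keys[0] < lo
        above = hi is not None and node.keys[-1] > hi
        if below or above:
            raise ValueError(f"keys {node.keys} fall outside their separators")
    if not node.children:
        return 0                                   # a leaf has height 0
    if len(node.children) != n + 1:
        raise ValueError(f"node {node.keys} needs {n + 1} children")
    bounds = [lo] + node.keys + [hi]               # separators around each child
    heights = {check_btree(c, t, False, bounds[i], bounds[i + 1])
               for i, c in enumerate(node.children)}
    if len(heights) != 1:
        raise ValueError("leaves are at different depths")
    return heights.pop() + 1


# A tree with t = 3: root P, internal nodes C G M and T X, seven leaves.
example = Node(["P"], [
    Node(list("CGM"), [Node(list("AB")), Node(list("DEF")),
                       Node(list("JKL")), Node(list("NO"))]),
    Node(list("TX"), [Node(list("QRS")), Node(list("UV")), Node(list("YZ"))]),
])
```

For the example tree the function returns the height 2; a non-root node with fewer than t - 1 keys makes it raise.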

B-tree: Definition (3)
We are concerned only with keys.
A B-tree is a balanced tree.
The nodes have high fan-out (many children).
[Figure: example B-tree with root P; internal nodes C G M and T X; leaves A B, D E F, J K L, N O, Q R S, U V, Y Z.]

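Height of a B-tree
For a B-tree T of height h, containing n ≥ 1 keys and with minimum degree t ≥ 2, the following restriction on the height holds:
$h \le \log_t \frac{n+1}{2}$
[Figure: the sparsest B-tree of height h, tabulated by depth and number of nodes. The root holds 1 key and every other node holds t – 1 keys and has t children; depth 0 has 1 node, depth 1 has 2 nodes, depth 2 has 2t nodes, and depth h has 2t^(h-1) nodes.]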
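The bound follows from counting the keys of the sparsest tree of height h: n ≥ 1 + (t - 1)(2 + 2t + … + 2t^(h-1)) = 2t^h - 1, hence h ≤ log_t((n + 1)/2). A small sketch that evaluates the bound with integer arithmetic only (`max_height` is a hypothetical helper name):

```python
def max_height(n, t):
    """Largest h with 2 * t**h - 1 <= n, i.e. floor(log_t((n + 1) / 2))."""
    if n < 1 or t < 2:
        raise ValueError("need n >= 1 and t >= 2")
    h = 0
    while 2 * t ** (h + 1) - 1 <= n:   # can a tree of height h + 1 still hold n keys?
        h += 1
    return h
```

Integer exponentiation is used instead of `math.log` to avoid floating-point rounding right at the boundary values of n.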

Binary-trees vs. B-trees
[Figure: a B-tree in which every node holds 1000 keys]
– depth 0: 1 node, 1000 keys
– depth 1: 1001 nodes, 1,001,000 keys
– depth 2: 1,002,001 nodes, 1,002,001,000 keys
The size of B-tree nodes is determined by the page size. One page, one node.
A B-tree of height 2 containing > 1 billion keys!
Heights of binary trees and B-trees are logarithmic
– B-tree: logarithm of base, e.g., 1000
– Binary tree: logarithm of base 2
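The per-level counts follow from each node having one more child than it has keys. A short sketch that reproduces them (`level_capacity` is a hypothetical helper name, and every node is assumed to be completely full):

```python
def level_capacity(keys_per_node, height):
    """Return (depth, nodes, keys) for every level of a completely full tree."""
    rows = []
    nodes = 1
    for depth in range(height + 1):
        rows.append((depth, nodes, nodes * keys_per_node))
        nodes *= keys_per_node + 1      # each node has keys_per_node + 1 children
    return rows


rows = level_capacity(1000, 2)
total_keys = sum(keys for _, _, keys in rows)
```

Summing the three levels gives the total number of keys such a tree of height 2 can hold.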

Red-Black-trees and B-trees
Comparing RB-trees and B-trees
– both have a height of O(log n)
– for RB-trees the log is of base 2
– for B-trees the base is ~1000
The heights differ by a factor of lg t.
When t = 2, B-trees are 2-3-4 trees (which are representations of red-black trees)!

B-tree Operations
An implementation needs to support the following B-tree operations
– Searching (simple)
– Creating an empty tree (trivial)
– Insertion (complex)
– Deletion (complex)

Searching
Accessors:
– n():int, n(x): the number of keys in node x
– key_i(), key_i(x): the i-th key in x
– c_i(), c_i(x): the i-th child pointer in x
– leaf():bool, leaf(x): true if x is a leaf

BTreeSearch(x,k)
  i ← 1
  while i ≤ n[x] and k > key_i[x]
      i ← i+1
  if i ≤ n[x] and k = key_i[x] then
      return (x,i)
  if leaf[x] then
      return NIL
  else DiskRead(c_i[x])
      return BTreeSearch(c_i[x],k)
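The search procedure translates almost line by line into Python. This is a sketch under stated assumptions: `Node` is a hypothetical in-memory class, indices are 0-based instead of 1-based, and the disk read is reduced to a comment.

```python
class Node:
    """Hypothetical in-memory node; keys and children are 0-indexed lists."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []

    @property
    def leaf(self):
        return not self.children


def btree_search(x, k):
    """Return (node, index) of key k in the subtree rooted at x, or None."""
    i = 0
    while i < len(x.keys) and k > x.keys[i]:
        i += 1
    if i < len(x.keys) and k == x.keys[i]:
        return x, i
    if x.leaf:
        return None
    # A disk-based version would call DiskRead on x.children[i] here.
    return btree_search(x.children[i], k)


root = Node(["P"], [
    Node(list("CGM"), [Node(list("AB")), Node(list("DEF")),
                       Node(list("JKL")), Node(list("NO"))]),
    Node(list("TX"), [Node(list("QRS")), Node(list("UV")), Node(list("YZ"))]),
])
```

The linear scan inside a node could be replaced by a binary search, but the number of nodes visited (and therefore the number of disk reads) stays the same.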

Creating an Empty Tree
Empty B-tree = create a root and write it to disk!
BTreeCreate(T)
  x ← AllocateNode()
  leaf[x] ← TRUE
  n[x] ← 0
  DiskWrite(x)
  root[T] ← x

Splitting Nodes
Nodes fill up and reach their maximum capacity of 2t – 1 keys.
Before we can insert a new key, we have to "make room," i.e., split nodes.

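Splitting Nodes (2)
[Figure: splitting the full child y = c_i[x] of x (t = 4). Before: x holds … N W …, and y holds 7 keys with subtrees T1 … T8. After: x holds … N S W …, y = c_i[x] keeps the 3 keys below the median S, and the new node z = c_i+1[x] holds the 3 keys above it.]
Result: the median key of y moves up into the parent x, leaving 2 nodes with t – 1 keys each.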

Splitting Nodes (3)
x: parent node
y: node to be split, a child of x
i: index of y in x
z: new node

BTreeSplitChild(x,i,y)
  z ← AllocateNode()
  leaf[z] ← leaf[y]
  n[z] ← t-1
  for j ← 1 to t-1
      key_j[z] ← key_j+t[y]
  if not leaf[y] then
      for j ← 1 to t
          c_j[z] ← c_j+t[y]
  n[y] ← t-1
  for j ← n[x]+1 downto i+1
      c_j+1[x] ← c_j[x]
  c_i+1[x] ← z
  for j ← n[x] downto i
      key_j+1[x] ← key_j[x]
  key_i[x] ← key_t[y]
  n[x] ← n[x]+1
  DiskWrite(y)
  DiskWrite(z)
  DiskWrite(x)
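With Python lists the copy loops collapse into slices. The sketch below assumes a hypothetical in-memory `Node` class, 0-based indices (so the median of a full node is `y.keys[t - 1]`), and no disk writes.

```python
class Node:
    """Hypothetical in-memory node; a leaf has an empty children list."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def split_child(x, i, t):
    """Split the full child x.children[i]; its median key moves up into x."""
    y = x.children[i]
    assert len(y.keys) == 2 * t - 1, "only full nodes are split"
    z = Node(y.keys[t:], y.children[t:])   # upper t - 1 keys (and t children)
    median = y.keys[t - 1]
    y.keys = y.keys[:t - 1]                # lower t - 1 keys stay in y
    y.children = y.children[:t]
    x.keys.insert(i, median)               # shifts the later keys of x right
    x.children.insert(i + 1, z)            # z becomes the right neighbour of y
```

`list.insert` performs the shifting that the two `downto` loops do by hand, so the cost per split remains proportional to t.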

Split: Running Time
A local operation that does not traverse the tree
– Θ(t) CPU time, since two loops run t times
– 3 I/Os

Inserting Keys
Done recursively, by starting from the root and traversing down the tree to the leaf level.
Before descending to a lower level in the tree, make sure that the node contains < 2t – 1 keys.

Inserting Keys (2)
Special case: the root is full (BTreeInsert)
BTreeInsert(T,k)
  r ← root[T]
  if n[r] = 2t – 1 then
      s ← AllocateNode()
      root[T] ← s
      leaf[s] ← FALSE
      n[s] ← 0
      c_1[s] ← r
      BTreeSplitChild(s,1,r)
      BTreeInsertNonFull(s,k)
  else BTreeInsertNonFull(r,k)

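Splitting the Root
Splitting the root requires the creation of new nodes.
The tree grows at the top instead of the bottom.
[Figure: before the split, root[T] = r holds A D F H L N P with subtrees T1 … T8. After the split, root[T] = s holds H, and its two children hold A D F (the old node r) and L N P.]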

Inserting Keys
BTreeInsertNonFull tries to insert a key k into a node x, which is assumed to be nonfull when the procedure is called.
BTreeInsert and the recursion in BTreeInsertNonFull guarantee that this assumption is true!

Inserting Keys: Pseudo Code
BTreeInsertNonFull(x,k)
  i ← n[x]
  if leaf[x] then                        // leaf insertion
      while i ≥ 1 and k < key_i[x]
          key_i+1[x] ← key_i[x]
          i ← i-1
      key_i+1[x] ← k
      n[x] ← n[x]+1
      DiskWrite(x)
  else while i ≥ 1 and k < key_i[x]      // internal node: traversing the tree
          i ← i-1
      i ← i+1
      DiskRead(c_i[x])
      if n[c_i[x]] = 2t – 1 then
          BTreeSplitChild(x,i,c_i[x])
          if k > key_i[x] then
              i ← i+1
      BTreeInsertNonFull(c_i[x],k)
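The insertion procedures can be put together into a small in-memory sketch. The names `Node`, `split_child`, `insert_nonfull`, and `insert` are hypothetical, indices are 0-based, disk I/O is left out, and `insert` returns the root because splitting a full root creates a new one.

```python
class Node:
    """Hypothetical in-memory node; a leaf has an empty children list."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []

    @property
    def leaf(self):
        return not self.children


def split_child(x, i, t):
    """Split the full child x.children[i]; its median key moves up into x."""
    y = x.children[i]
    z = Node(y.keys[t:], y.children[t:])
    x.keys.insert(i, y.keys[t - 1])
    x.children.insert(i + 1, z)
    y.keys = y.keys[:t - 1]
    y.children = y.children[:t]


def insert_nonfull(x, k, t):
    """Insert k below x, which must hold fewer than 2t - 1 keys."""
    i = len(x.keys)
    while i > 0 and k < x.keys[i - 1]:
        i -= 1                              # i is the slot (or child) for k
    if x.leaf:
        x.keys.insert(i, k)
        return
    if len(x.children[i].keys) == 2 * t - 1:
        split_child(x, i, t)                # make room before descending
        if k > x.keys[i]:
            i += 1
    insert_nonfull(x.children[i], k, t)


def insert(root, k, t):
    """Insert k and return the (possibly new) root."""
    if len(root.keys) == 2 * t - 1:
        s = Node(children=[root])           # the tree grows at the top
        split_child(s, 0, t)
        root = s
    insert_nonfull(root, k, t)
    return root
```

Because full nodes are split on the way down, the insertion finishes in a single downward pass and never has to back up.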

Insertion: Example
[Figure: three trees, t = 3]
– Initial tree: root G M P X; leaves A C D E, J K, N O, R S T U V, Y Z
– B inserted: root G M P X; leaves A B C D E, J K, N O, R S T U V, Y Z
– Q inserted: root G M P T X; leaves A B C D E, J K, N O, Q R S, U V, Y Z

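Insertion: Example (2)
[Figure: two trees, t = 3]
– L inserted: root P; internal nodes G M and T X; leaves A B C D E, J K L, N O, Q R S, U V, Y Z
– F inserted: root P; internal nodes C G M and T X; leaves A B, D E F, J K L, N O, Q R S, U V, Y Z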

Insertion: Running Time
Disk I/O: O(h), since only O(1) disk accesses are performed during each recursive call of BTreeInsertNonFull
CPU: O(th) = O(t log_t n)
At any given time there are O(1) disk pages in main memory

Deleting Keys
Done recursively, by starting from the root and traversing down the tree to the leaf level.
Before descending to a lower level in the tree, make sure that the node contains ≥ t keys (cf. insertion: < 2t – 1 keys).
BTreeDelete distinguishes three different stages/scenarios for deletion
– Case 1: key k found in leaf node
– Case 2: key k found in internal node
– Case 3: key k suspected in lower level node

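Deleting Keys (2)
Case 1: If the key k is in node x, and x is a leaf, delete k from x.
[Figure: two trees, t = 3]
– Initial tree: root P; internal nodes C G M and T X; leaves A B, D E F, J K L, N O, Q R S, U V, Y Z
– F deleted (case 1): root P; internal nodes C G M and T X; leaves A B, D E (the node x), J K L, N O, Q R S, U V, Y Z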

Deleting Keys (3)
Case 2: If the key k is in node x, and x is not a leaf, delete k from x
– a) If the child y that precedes k in node x has at least t keys, then find the predecessor k' of k in the sub-tree rooted at y. Recursively delete k', and replace k with k' in x.
– b) Symmetrically for the successor node z
[Figure: M deleted (case 2a): root P; internal nodes C G L (the node x) and T X; leaves A B, D E, J K (the node y), N O, Q R S, U V, Y Z]

Deleting Keys (4)
– c) If both y and z have only t – 1 keys, merge k and the contents of z into y, so that x loses both k and the pointer to z, and y now contains 2t – 1 keys. Free z and recursively delete k from y.
[Figure: G deleted (case 2c): root P; internal nodes C L (x, which loses k) and T X; leaves A B, D E J K (y, which absorbs k and z before k is deleted from it), N O, Q R S, U V, Y Z]

Deleting Keys - Distribution
Descending down the tree: if k is not found in the current node x, find the sub-tree c_i[x] that has to contain k.
If c_i[x] has only t – 1 keys, take action to ensure that we descend to a node of size at least t. We can encounter two cases.
– Distribution: if c_i[x] has only t – 1 keys but has an immediate sibling with at least t keys, give c_i[x] an extra key by moving a key from x down to c_i[x], moving a key from c_i[x]'s immediate left or right sibling up into x, and moving the appropriate child from the sibling into c_i[x].

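Deleting Keys – Distribution (2)
[Figure: schematic of distribution. The key k' of x moves down into c_i[x], the sibling's key k moves up into x, and the sibling's child A moves over to c_i[x]; the sibling keeps its child B.]
[Figure: example, t = 3]
– Before "delete B": x = root C L P T X; leaves A B (c_i[x]), E J K (sibling), N O, Q R S, U V, Y Z
– B deleted: root E L P T X; leaves A C, J K, N O, Q R S, U V, Y Z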
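Distribution is a purely local rotation through the parent. A minimal sketch, assuming a hypothetical in-memory `Node` class with 0-based lists; `borrow_from_right` and `borrow_from_left` are illustrative names, not procedures from the lecture.

```python
class Node:
    """Hypothetical in-memory node; a leaf has an empty children list."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def borrow_from_right(x, i):
    """Give x.children[i] one extra key, taken via x from its right sibling."""
    child, sibling = x.children[i], x.children[i + 1]
    child.keys.append(x.keys[i])              # separator moves down
    x.keys[i] = sibling.keys.pop(0)           # sibling's smallest key moves up
    if sibling.children:                      # internal nodes: move one child too
        child.children.append(sibling.children.pop(0))


def borrow_from_left(x, i):
    """Mirror image: take the extra key via x from the left sibling."""
    child, sibling = x.children[i], x.children[i - 1]
    child.keys.insert(0, x.keys[i - 1])       # separator moves down
    x.keys[i - 1] = sibling.keys.pop()        # sibling's largest key moves up
    if sibling.children:
        child.children.insert(0, sibling.children.pop())
```

After the rotation the child holds t keys, so the deletion can descend into it without violating the minimum of t - 1 keys.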

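Deleting Keys - Merging
– Merging: if c_i[x] and both of c_i[x]'s immediate siblings have t – 1 keys, merge c_i[x] with one sibling, which involves moving a key from x down into the new merged node to become the median key for that node.
[Figure: schematic of merging. Before: x holds … l' k m' …, with c_i[x] = (… l) and its sibling (m …) on either side of k. After: x holds … l' m' …, and the merged node holds … l k m ….]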

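Deleting Keys – Merging (2)
[Figure: example, t = 3]
– Before "delete D": root P; internal nodes C L (c_i[x]) and T X (sibling); leaves A B, D E J K, N O, Q R S, U V, Y Z
– D deleted: root C L P T X; leaves A B, E J K, N O, Q R S, U V, Y Z
The tree shrinks in height.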
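Merging is the inverse of a split. The sketch below assumes a hypothetical in-memory `Node` class with 0-based lists; `merge_children` and `shrink_root` are illustrative names.

```python
class Node:
    """Hypothetical in-memory node; a leaf has an empty children list."""

    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []


def merge_children(x, i):
    """Merge x.children[i], the separator x.keys[i], and x.children[i + 1]."""
    y = x.children[i]
    z = x.children.pop(i + 1)                 # x loses the pointer to z
    y.keys.append(x.keys.pop(i))              # separator becomes the median of y
    y.keys.extend(z.keys)
    y.children.extend(z.children)
    return y


def shrink_root(root):
    """If merging emptied the root, its single child becomes the new root."""
    if not root.keys and root.children:
        return root.children[0]
    return root
```

When both merged nodes hold t - 1 keys, the result holds exactly 2t - 1 keys, the maximum a node may contain.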

Deletion: Running Time
Most of the keys are in the leaves, thus deletion most often occurs there!
In this case deletion happens in one downward pass to the leaf level of the tree.
Deletion from an internal node might require "backing up" (case 2).
Disk I/O: O(h), since only O(1) disk operations are produced during each recursive call
CPU: O(th) = O(t log_t n)

Two-pass Operations
Simpler, practical versions of the algorithms use two passes (down and up the tree):
– Down: find the node where the deletion or insertion should occur
– Up: if needed, split, merge, or distribute; propagate splits or merges up the tree
To avoid reading the same nodes twice, use a buffer of nodes.

Other Access Methods
B-tree variants: B+-trees, B*-trees
B+-trees are used in database management systems.
General scheme for access methods (used in B+-trees, too):
– Data keys are stored only in leaves
– Keys are grouped into leaf nodes
– Each entry in a non-leaf node stores a pointer to a sub-tree and a compact description of the set of keys stored in this sub-tree