CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.

Slides:



Advertisements
Similar presentations
Advanced Database Discussion B Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if.
Advertisements

CSE332: Data Abstractions Lecture 10: More B-Trees Tyler Robison Summer
CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
Other time considerations Source: Simon Garrett Modifications by Evan Korth.
CSE 326: Data Structures Lecture #11 B-Trees Alon Halevy Spring Quarter 2001.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CS 206 Introduction to Computer Science II 11 / 24 / 2008 Instructor: Michael Eckmann.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
(B+-Trees, that is) Steve Wolfman 2014W1
Homework #3 Due Thursday, April 17 Problems: –Chapter 11: 11.6, –Chapter 12: 12.1, 12.2, 12.3, 12.4, 12.5, 12.7.
Tirgul 6 B-Trees – Another kind of balanced trees.
Splay Trees and B-Trees
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
More Trees Multiway Trees and 2-4 Trees. Motivation of Multi-way Trees Main memory vs. disk ◦ Assumptions so far: ◦ We have assumed that we can store.
B+ Trees COMP
ALGORITHMS FOR ISNE DR. KENNETH COSH WEEK 6.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
CSE373: Data Structures & Algorithms Lecture 15: B-Trees Linda Shapiro Winter 2015.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
1 B-Trees & (a,b)-Trees CS 6310: Advanced Data Structures Western Michigan University Presented by: Lawrence Kalisz.
COSC 2007 Data Structures II Chapter 15 External Methods.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
B-Trees. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
Starting at Binary Trees
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
B+ Trees  What if you have A LOT of data that needs to be stored and accessed quickly  Won’t all fit in memory.  Means we have to access your hard.
© 2004 Goodrich, Tamassia Trees
 B-tree is a specialized multiway tree designed especially for use on disk  B-Tree consists of a root node, branch nodes and leaf nodes containing the.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
CSE 326: Data Structures Lecture #9 Big, Bad B-Trees Steve Wolfman Winter Quarter 2000.
COMP261 Lecture 23 B Trees.
Multiway Search Trees Data may not fit into main memory
Btrees Deletion.
CSE 332 Data Abstractions B-Trees
B+-Trees.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B+-Trees.
B+-Trees.
CSE373: Data Structures & Algorithms Lecture 15: B-Trees
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B-Tree.
Solution for Section Worksheet 4, #7b & #7c
CSE 373: Data Structures and Algorithms
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
B-Trees.
CSE 326: Data Structures Lecture #10 B-Trees
Presentation transcript:

CSE 326: Data Structures B-Trees Ben Lerner Summer 2007

2 Splay Tree Summary All operations are in amortized O(log n) time Splaying can be done top-down; this may be better because: –only one pass –no recursion or parent pointers necessary –we didn’t cover top-down in class Splay trees are very effective search trees –Relatively simple –No extra fields required –Excellent locality properties: frequently accessed keys are cheap to find

3 Trees so far BST AVL Splay

4 Something We Forgot: Disk Acesses Real world: constants matter! Distance

5 Trees on Disk Each pointer lookup means seeking the disk Want as shallow a tree as possible Balanced binary tree with N nodes has height ________? Balanced M-ary tree with N nodes has height ________?

6 M-ary Search Tree Maximum branching factor of M Complete tree has height = # disk accesses for find: Runtime of find:

7 Problems with M-ary trees? Problems come from each of our design goals: Structure? Order? Balance?

8 Solution: B-Trees specialized M-ary search trees Each node has (up to) M-1 keys: –subtree between two keys x and y contains leaves with values v such that x  v < y Pick branching factor M such that each node takes one full {page, block} of memory x<3 3  x<77  x<1212  x<2121  x

9 B-Trees What makes them disk-friendly? 1.Many keys stored in a node All brought to memory/cache in one access! 2.Internal nodes contain only keys; Only leaf nodes contain keys and actual data The tree structure can be loaded into memory irrespective of data object size Data actually resides in disk

10 Example 1KB pages Key is 8bytes, pointer is 4bytes 8*(M-1) + 4*(M) = M = 1032 M = floor(1032/12) = 86

11 B-Trees are multi-way search trees commonly used in database systems or other applications where data is stored externally on disks and keeping the tree shallow is important. A B-Tree of order M has the following properties: 1. The root is either a leaf or has between 2 and M children. 2. All nonleaf nodes (except the root) have between  M/2  and M children. 3. All leaves are at the same depth. All data records are stored at the leaves. Leaves store between  M/2  and M data records. Internal nodes only used for searching. B-Trees

12 B-Tree: Example B-Tree with M = 4 (# pointers in internal node) and L = 4 (# data items in leaf) 1 AB 2 xG Note: All leaves at the same depth! Data objects, that I’ll ignore in slides

13 B-Tree Details Each (non-leaf) internal node of a B-tree has: –Between  M/2  and M children. –up to M-1 keys k 1  k 2  k M-1 Keys are ordered so that: k 1 < k 2 <... < k M-1 k M-1... k i-1 kiki k1k1

14 B-Tree Details Each leaf node of a B-tree has: –Between  M/2  and M keys and pointers. Keys are ordered so that: k 1 < k 2 <... < k M-1 kMkM... k i-1 kiki k1k1 Keys point to data on other pages.

15 Properties of B-Trees Children of each internal node are "between" the items in that node. Suppose subtree T i is the i-th child of the node: all keys in T i must be between keys k i-1 and k i i.e. k i-1  T i  k i k i-1 is the smallest key in T i All keys in first subtree T 1  k 1 All keys in last subtree T M  k M-1 k 1 TT ii... kk i-1 kk ii TT M TT 11 kk M-1...

16 Inserting into B-Trees Insert X: Do a Find on X and find appropriate leaf node –If leaf node is not full, fill in empty slot with X E.g. Insert 5 –If leaf node is full, split leaf node and adjust parents up to root node E.g. Insert 9 13:- 6: :-

17 Insert Example 13:- 6: :- 8 9

18 Insert Example 13:- 6: : :-

19 Insert Example 8:13 6: : :-

20 Deleting From B-Trees Delete X : Do a find and remove from leaf –Leaf underflows – borrow from a neighbor E.g. 11 –Leaf underflows and can’t borrow – merge nodes, delete parent E.g :- 6: :-

21 Delete Example 13:- 6: :-

22 Delete Example 13:- 6: :-

23 Delete Example 11:- 6: :-

24 Run Time Analysis of B-Tree Operations For a B-Tree of order M –Each internal node has up to M-1 keys to search –Each internal node has between  M/2  and M children –Depth of B-Tree storing N items is O(log  M/2  N) Example: M = 86 –log 43 N = log 2 N / log 2 43 =.184 log 2 N –log 43 1,000,000,000 = 5.51

25 Summary of Search Trees Problem with Search Trees: Must keep tree balanced to allow –fast access to stored items AVL trees: Insert/Delete operations keep tree balanced Splay trees: Repeated Find operations produce balanced trees on average Multi-way search trees (e.g. B-Trees): More than two children –per node allows shallow trees; all leaves are at the same depth –keeping tree balanced at all times

26 B-Tree Properties ‡ –Data is only stored at the leaves –All leaves are at the same depth and contains between  L/2  and L data items –Internal nodes store up to M-1 keys –Internal nodes have between  M/2  and M children –Root (special case) has between 2 and M children (or root could be a leaf) ‡ These are technically B + -Trees

27 Example, Again B-Tree with M = 4 and L = (Only showing keys, but leaves also have data!)

28 B-trees vs. AVL trees Suppose we have 100 million items (100,000,000): Depth of AVL Tree Depth of B+ Tree with M = 128, L = 64

29 Building a B-Tree The empty B-Tree M = 3 L = 2 3 Insert(3) 314 Insert(14) Now, Insert(1)?

30 Splitting the Root And create a new root Insert(1) Too many keys in a leaf! So, split the leaf. M = 3 L = 2

31 Overflowing leaves Insert(59) Insert(26) And add a new child Too many keys in a leaf! So, split the leaf. M = 3 L = 2

32 Propagating Splits Insert(5) Add new child Create a new root Split the leaf, but no space in parent! So, split the node. M = 3 L = 2

33 Insertion Algorithm 1.Insert the key in its leaf 2.If the leaf ends up with L+1 items, overflow! –Split the leaf into two nodes: original with  (L+1)/2  items new one with  (L+1)/2  items –Add the new child to the parent –If the parent ends up with M+1 items, overflow! 3.If an internal node ends up with M+1 items, overflow! –Split the node into two nodes: original with  (M+1)/2  items new one with  (M+1)/2  items –Add the new child to the parent –If the parent ends up with M+1 items, overflow! 4.Split an overflowed root in two and hang the new nodes under a new root This makes the tree deeper!

34 After More Routine Inserts Insert(89) Insert(79) M = 3 L = 2

35 Deletion Delete(59) What could go wrong? 1.Delete item from leaf 2.Update keys of ancestors if necessary M = 3 L = 2

36 Deletion and Adoption Delete(5) ? A leaf has too few keys! So, borrow from a sibling M = 3 L = 2

37 Does Adoption Always Work? What if the sibling doesn’t have enough for you to borrow from? e.g. you have  L/2  -1 and sibling has  L/2  ?

38 Deletion and Merging Delete(3) ? A leaf has too few keys! And no sibling with surplus! So, delete the leaf But now an internal node has too few subtrees! M = 3 L = 2

39 Adopt a neighbor Deletion with Propagation (More Adoption) M = 3 L = 2

40 Delete(1) (adopt a sibling) A Bit More Adoption M = 3 L = 2

41 Delete(26) Pulling out the Root A leaf has too few keys! And no sibling with surplus! So, delete the leaf; merge A node has too few subtrees and no neighbor with surplus! Delete the node But now the root has just one subtree! M = 3 L = 2

42 Pulling out the Root (continued) The root has just one subtree! Simply make the one child the new root! M = 3 L = 2

43 Deletion Algorithm 1.Remove the key from its leaf 2.If the leaf ends up with fewer than  L/2  items, underflow! –Adopt data from a sibling; update the parent –If adopting won’t work, delete node and merge with neighbor –If the parent ends up with fewer than  M/2  items, underflow!

44 Deletion Slide Two 3.If an internal node ends up with fewer than  M/2  items, underflow! –Adopt from a neighbor; update the parent –If adoption won’t work, merge with neighbor –If the parent ends up with fewer than  M/2  items, underflow! 4.If the root ends up with only one child, make the child the new root of the tree This reduces the height of the tree!

45 Thinking about B-Trees B-Tree insertion can cause (expensive) splitting and propagation B-Tree deletion can cause (cheap) adoption or (expensive) deletion, merging and propagation Propagation is rare if M and L are large (Why?) If M = L = 128, then a B-Tree of height 4 will store at least 30,000,000 items

46 Tree Names You Might Encounter FYI: –B-Trees with M = 3, L = x are called 2-3 trees Nodes can have 2 or 3 keys –B-Trees with M = 4, L = x are called trees Nodes can have 2, 3, or 4 keys