CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010.

Slides:



Advertisements
Similar presentations
Chapter 4: Trees Part II - AVL Tree
Advertisements

B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Indexes. Primary Indexes Dense Indexes Pointer to every record of a sequential file, (ordered by search key). Can make sense because records may be much.
CSE332: Data Abstractions Lecture 10: More B-Trees Tyler Robison Summer
AVL Deletion: Case #1: Left-left due to right deletion Spring 2010CSE332: Data Abstractions1 h a Z Y b X h+1 h h+2 h+3 b ZY a h+1 h h+2 X h h+1 Same single.
CSE332: Data Abstractions Lecture 7: AVL Trees Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 10: More B Trees; Hashing Dan Grossman Spring 2010.
CSE332: Data Abstractions Lecture 9: BTrees Tyler Robison Summer
CSE332: Data Abstractions Lecture 8: AVL Delete; Memory Hierarchy Dan Grossman Spring 2010.
1 B trees Nodes have more than 2 children Each internal node has between k and 2k children and between k-1 and 2k-1 keys A leaf has between k-1 and 2k-1.
Tirgul 5 AVL trees.
CPSC 231 B-Trees (D.H.)1 LEARNING OBJECTIVES Problems with simple indexing. Multilevel indexing: B-Tree. –B-Tree creation: insertion and deletion of nodes.
CS 206 Introduction to Computer Science II 12 / 03 / 2008 Instructor: Michael Eckmann.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
CS 206 Introduction to Computer Science II 12 / 01 / 2008 Instructor: Michael Eckmann.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
Tirgul 6 B-Trees – Another kind of balanced trees Problem set 1 - some solutions.
B + -Trees (Part 1) COMP171. Slide 2 Main and secondary memories  Secondary storage device is much, much slower than the main RAM  Pages and blocks.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
CS 206 Introduction to Computer Science II 11 / 24 / 2008 Instructor: Michael Eckmann.
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
(B+-Trees, that is) Steve Wolfman 2014W1
B + -Trees COMP171 Fall AVL Trees / Slide 2 Dictionary for Secondary storage * The AVL tree is an excellent dictionary structure when the entire.
CS4432: Database Systems II
CSE373: Data Structures & Algorithms Lecture 6: Binary Search Trees Nicki Dell Spring 2014 CSE373: Data Structures & Algorithms1.
Splay Trees and B-Trees
1 B-Trees Section AVL (Adelson-Velskii and Landis) Trees AVL tree is binary search tree with balance condition –To ensure depth of the tree is.
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
ICS 220 – Data Structures and Algorithms Week 7 Dr. Ken Cosh.
B-trees (Balanced Trees) A B-tree is a special kind of tree, similar to a binary tree. However, It is not a binary search tree. It is not a binary tree.
More Trees Multiway Trees and 2-4 Trees. Motivation of Multi-way Trees Main memory vs. disk ◦ Assumptions so far: ◦ We have assumed that we can store.
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
Oct 29, 2001CSE 373, Autumn External Storage For large data sets, the computer will have to access the disk. Disk access can take 200,000 times longer.
B-Trees. CSM B-Trees 2 Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Starting at Binary Trees
CPSC 221: Algorithms and Data Structures Lecture #7 Sweet, Sweet Tree Hives (B+-Trees, that is) Steve Wolfman 2010W2.
CSE 326 Killer Bee-Trees David Kaplan Dept of Computer Science & Engineering Autumn 2001 Where was that root?
CS 206 Introduction to Computer Science II 04 / 22 / 2009 Instructor: Michael Eckmann.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
B+ Trees  What if you have A LOT of data that needs to be stored and accessed quickly  Won’t all fit in memory.  Means we have to access your hard.
 B-tree is a specialized multiway tree designed especially for use on disk  B-Tree consists of a root node, branch nodes and leaf nodes containing the.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Internal and External Sorting External Searching
Lecture 10COMPSCI.220.FS.T Binary Search Tree BST converts a static binary search into a dynamic binary search allowing to efficiently insert and.
B-Trees Katherine Gurdziel 252a-ba. Outline What are b-trees? How does the algorithm work? –Insertion –Deletion Complexity What are b-trees used for?
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
CSE 326: Data Structures Lecture #9 Big, Bad B-Trees Steve Wolfman Winter Quarter 2000.
CSE332: Data Abstractions Lecture 7: AVL Trees
COMP261 Lecture 23 B Trees.
B/B+ Trees 4.7.
CSE 332 Data Abstractions B-Trees
Wednesday, April 18, 2018 Announcements… For Today…
BTrees.
Data Structures and Algorithms
B-Tree.
CSE373: Data Structures & Algorithms Lecture 5: AVL Trees
CSE 332: Data Abstractions AVL Trees
CSE 373: Data Structures and Algorithms
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
Richard Anderson Spring 2016
B-Trees.
CSE 326: Data Structures Lecture #10 B-Trees
Presentation transcript:

CSE332: Data Abstractions Lecture 9: B Trees Dan Grossman Spring 2010

Our goal Problem: A dictionary with so much data most of it is on disk Desire: A balanced tree (logarithmic height) that is even shallower than AVL trees so that we can minimize disk accesses and exploit disk-block size A key idea: Increase the branching factor of our tree Spring 20102CSE332: Data Abstractions

M-ary Search Tree Perfect tree of height h has (M h+1 -1)/(M-1) nodes (textbook, page 4) # hops for find : If balanced, using log M n instead of log 2 n –If M=256, that’s an 8x improvement –Example: M = 256 and n = 2 40 that’s 5 instead of 40 Runtime of find if balanced: O( log 2 M log M n) (binary search children) Build some sort of search tree with branching factor M: –Have an array of sorted children ( Node[] ) –Choose M to fit snugly into a disk block (1 access for array) Spring 20103CSE332: Data Abstractions

Problems with M-ary search trees What should the order property be? How would you rebalance (ideally without more disk accesses)? Any “useful” data at the internal nodes takes up some disk-block space without being used by finds moving past it So let’s use the branching-factor idea, but for a different kind of balanced tree –Not a binary search tree –But still logarithmic height for any M > 2 Spring 20104CSE332: Data Abstractions

B+ Trees (we and the book say “B Trees”) Each internal node has room for up to M-1 keys and M children –No other data; all data at the leaves! Order property: Subtree between keys x and y contains only data that is  x and < y (notice the  ) Leaf nodes have up to L sorted data items As usual, we’ll ignore the “along for the ride” data in our examples –Remember no data at non-leaves Spring 20105CSE332: Data Abstractions  x12  x<217  x<123  x<7 x<3

Find This is a new kind of tree –We are used to data at internal nodes But find is still an easy root-to-leaf recursive algorithm –At each internal node do binary search on the  M-1 keys –At the leaf do binary search on the  L data items But to get logarithmic running time, we need a balance condition… Spring 20106CSE332: Data Abstractions  x12  x<217  x<123  x<7 x<3

Structure Properties Root (special case) –If tree has  L items, root is a leaf (very strange case) –Else has between 2 and M children Internal nodes –Have between  M/2  and M children, i.e., at least half full Leaf nodes –All leaves at the same depth –Have between  L/2  and L data items, i.e., at least half full (Any M > 2 and L will work; picked based on disk-block size) Spring 20107CSE332: Data Abstractions

Example Suppose M=4 (max children) and L=5 (max items at leaf) –All internal nodes have at least 2 children –All leaves have at least 3 data items (only showing keys) –All leaves at same depth Spring 20108CSE332: Data Abstractions

Balanced enough Not hard to show height h is logarithmic in number of data items n Let M > 2 (if M = 2, then a list tree is legal – no good!) Because all nodes are at least half full (except root may have only 2 children) and all leaves are at the same level, the minimum number of data items n for a height h>0 tree is… n  2  M/2  h-1  L/2  Spring 20109CSE332: Data Abstractions minimum number of leaves minimum data per leaf Exponential in height because  M/2  > 1

B-Tree vs. AVL Tree Suppose we have 100,000,000 items Maximum height of AVL tree? Maximum height of B tree with M=128 and L=64? Spring CSE332: Data Abstractions

B-Tree vs. AVL Tree Suppose we have 100,000,000 items Maximum height of AVL tree? –Recall S(h) = 1 + S(h-1) + S(h-2) –lecture7.xlsx reports: 47 Maximum height of B tree with M=128 and L=64? –Recall (2 +  M/2  h-1 )  L/2  –lecture9.xlsx reports: 5 (and 4 is more likely) –Also not difficult to compute via algebra Spring CSE332: Data Abstractions

Disk Friendliness What makes B trees so disk friendly? Many keys stored in one node –All brought into memory in one disk access –Pick M wisely. Example: block=1KB, then M=128 –Makes the binary search over M-1 keys totally worth it Internal nodes contain only keys –Any find wants only one data item –So only bring one leaf of data items into memory –Data-item size doesn’t affect what M is Spring CSE332: Data Abstractions

Maintaining balance So this seems like a great data structure (and it is) But we haven’t implemented the other dictionary operations yet –insert –delete As with AVL trees, the hard part is maintaining structure properties –Example: for insert, there might not be room at the correct leaf Spring CSE332: Data Abstractions

Building a B-Tree (insertions) Spring CSE332: Data Abstractions The empty B-Tree M = 3 L = 3 Insert(3)Insert(18) Insert(14)

Insert(30) M = 3 L = Spring CSE332: Data Abstractions 18

Insert(32) Insert(36) Insert(15) M = 3 L = Spring CSE332: Data Abstractions

Insert(16) M = 3 L = Spring CSE332: Data Abstractions

Insert(12,40,45,38) M = 3 L = 3 Spring CSE332: Data Abstractions

Insertion Algorithm 1.Insert the data in its leaf in sorted order 2.If the leaf now has L+1 items, overflow! –Split the leaf into two nodes: Original leaf with  (L+1)/2  smaller items New leaf with  (L+1)/2  =  L/2  larger items –Attach the new child to the parent Adding new key to parent in sorted order 3.If step (2) caused the parent to have M+1 children, overflow! –… Spring CSE332: Data Abstractions

Insertion algorithm continued 3.If an internal node has M+1 children –Split the node into two nodes Original node with  (M+1)/2  smaller items New node with  (M+1)/2  =  M/2  larger items –Attach the new child to the parent Adding new key to parent in sorted order Splitting at a node (step 3) could make the parent overflow too –So repeat step 3 up the tree until a node doesn’t overflow –If the root overflows, make a new root with two children This is the only case that increases the tree height Spring CSE332: Data Abstractions

Efficiency of insert Find correct leaf: O( log 2 M log M n) Insert in leaf: O(L) Split leaf: O(L) Split parents all the way up to root: O(M log M n) Total: O(L + M log M n) But it’s not that bad: –Splits are not that common (have to fill up nodes) –Remember disk accesses were the name of the game: O( log M n) Spring CSE332: Data Abstractions