COMP261 Lecture 23 B Trees.

Slides:



Advertisements
Similar presentations
B+-Trees (PART 1) What is a B+ tree? Why B+ trees? Searching a B+ tree
Advertisements

AA Trees another alternative to AVL trees. Balanced Binary Search Trees A Binary Search Tree (BST) of N nodes is balanced if height is in O(log N) A balanced.
B-Trees. Motivation for B-Trees Index structures for large datasets cannot be stored in main memory Storing it on disk requires different approach to.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
B + -Trees (Part 1) Lecture 20 COMP171 Fall 2006.
1 B-Trees Disk Storage What is a multiway tree? What is a B-tree? Why B-trees? Comparing B-trees and AVL-trees Searching a B-tree Insertion in a B-tree.
B + -Trees (Part 1). Motivation AVL tree with N nodes is an excellent data structure for searching, indexing, etc. –The Big-Oh analysis shows most operations.
CSE 326: Data Structures B-Trees Ben Lerner Summer 2007.
B-Trees and B+-Trees Disk Storage What is a multiway tree?
Balanced Trees. Binary Search tree with a balance condition Why? For every node in the tree, the height of its left and right subtrees must differ by.
Preliminaries Multiway trees have nodes with greater than two children. Multiway trees of order k have nodes with most k children Trees –For all.
Splay Trees and B-Trees
IntroductionIntroduction  Definition of B-trees  Properties  Specialization  Examples  2-3 trees  Insertion of B-tree  Remove items from B-tree.
B-Tree. B-Trees a specialized multi-way tree designed especially for use on disk In a B-tree each node may contain a large number of keys. The number.
B+ Trees COMP
1 B Trees - Motivation Recall our discussion on AVL-trees –The maximum height of an AVL-tree with n-nodes is log 2 (n) since the branching factor (degree,
B-Trees And B+-Trees Jay Yim CS 157B Dr. Lee.
B + -Trees. Motivation An AVL tree with N nodes is an excellent data structure for searching, indexing, etc. The Big-Oh analysis shows that most operations.
2-3 Tree. Slide 2 Outline  Balanced Search Trees 2-3 Trees Trees.
Starting at Binary Trees
School of Engineering and Computer Science Victoria University of Wellington Copyright: Xiaoying Gao, Peter Andreae, VUW B Trees and B+ Trees COMP 261.
IKI 10100: Data Structures & Algorithms Ruli Manurung (acknowledgments to Denny & Ade Azurat) 1 Fasilkom UI Ruli Manurung (Fasilkom UI)IKI10100: Lecture17.
Lecture 11COMPSCI.220.FS.T Balancing an AVLTree Two mirror-symmetric pairs of cases to rebalance the tree if after the insertion of a new key to.
Binary Search Trees (BSTs) 18 February Binary Search Tree (BST) An important special kind of binary tree is the BST Each node stores some information.
B-TREE. Motivation for B-Trees So far we have assumed that we can store an entire data structure in main memory What if we have so much data that it won’t.
Week 15 – Wednesday.  What did we talk about last time?  Review up to Exam 1.
B+ Trees.
CS422 Principles of Database Systems Indexes
Lecture 23 Red Black Tree Chapter 10 of textbook
Unit 9 Multi-Way Trees King Fahd University of Petroleum & Minerals
TCSS 342, Winter 2006 Lecture Notes
AA Trees.
File Organization and Processing Week 3
B-Trees B-Trees.
B/B+ Trees 4.7.
Multiway Search Trees Data may not fit into main memory
B-Trees B-Trees.
COMP261 Lecture 24 B and B+ Trees
B+-Trees.
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B+-Trees.
B+-Trees.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Lecture 25 Splay Tree Chapter 10 of textbook
(edited by Nadia Al-Ghreimil)
B Tree Adhiraj Goel 1RV07IS004.
Data Structures Balanced Trees CSCI
Wednesday, April 18, 2018 Announcements… For Today…
Lecture 26 Multiway Search Trees Chapter 11 of textbook
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
Height Balanced Trees 2-3 Trees.
Balanced-Trees This presentation shows you the potential problem of unbalanced tree and show two way to fix it This lecture introduces heaps, which are.
B+-Trees and Static Hashing
B-Tree.
B+Trees The slides for this text are organized into chapters. This lecture covers Chapter 9. Chapter 1: Introduction to Database Systems Chapter 2: The.
Balanced-Trees This presentation shows you the potential problem of unbalanced tree and show two way to fix it This lecture introduces heaps, which are.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
CSIT 402 Data Structures II With thanks to TK Prasad
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
(edited by Nadia Al-Ghreimil)
CSE 373: Data Structures and Algorithms
2-3 Trees Extended tree. Tree in which all empty subtrees are replaced by new nodes that are called external nodes. Original nodes are called internal.
CSE 373 Data Structures and Algorithms
CSE 373: Data Structures and Algorithms
Mark Redekopp David Kempe
B-Trees.
Tree-Structured Indexes
1 Lecture 13 CS2013.
Splay Trees Binary search trees.
Presentation transcript:

COMP261 Lecture 23 B Trees

Storing large amounts of data File systems: Lots of files, each file stored as lots of blocks. Databases: large tables of data each indexed by a key (or perhaps multiple alternative keys) How do we access the data efficiently: individual items (given key) sequence of all items (perhaps in order) assume data is stored in files on hard drives (slow access time) Use some kind of index structure Usually also stored in a file B Trees, B+ Trees: data structures and algorithms for large data stored on disk (ie, slow access)

COMP103 approaches: Efficient Set and Map implementations: Hash Tables – fast, but hard to list in order Binary Search Trees Intended for in-core data structures http://www.cs.usfca.edu/~galles/visualization Binary search tree Add M H T S R Q B A F D Z Search: Log(n) What if we add in alphabetical order?

Problems with Binary Search Trees Trees can become unbalanced ⇒ lookup times can become linear How can we keep the tree balanced? AVL trees: self-balanced by tree rotations Red-Black trees: node with a colour bit Splay trees: move recently accessed node to the root B/B+ trees: always add levels at the top! B C F H K P N M A G

Problems with Binary Search Trees Lots of pointer following ⇒ slow if each node is stored in a file! In this case, loading a node to read its value can take far longer than the comparisons we usually count! How can we reduce pointer following? More data in each node More children of each node ⇒ "bushier" trees, fewer steps to leaves B F C A C F H G K P N M

B Trees Like binary search trees, but: Designed for external data structures, stored in files Non-leaf nodes have up to m children, and m-1 data values (for B tree of order m, or m-1:m tree) Non-leaf, non-root nodes always have at least ⌈m/2⌉ children and ⌈m/2⌉-1 data values (ie, at least half full) Leaf nodes have up to m-1 data values and no children Adding is done "at the top" rather than "at the bottom”, so leaves are all at same depth

B Trees Like binary search trees, but Non-leaf nodes have up to m children, and m-1 data values Non-leaf, non-root nodes always have at least ⌈m/2⌉ children and ⌈m/2⌉-1 data values (ie, at least half full) Leaf nodes contain up to m-1 data values and no children Adding is done "at the top" rather than "at the bottom” – leaves are all at the same depth B tree example: m=3, ("2-3 tree") 3 children, 2 values 2 children, 1 value N V E J A C G K M R X P Q S T W Y Z

2-3 trees Every internal node is a 3-node or a 2-node 3-node: 3 children, 2 values 2-node: 2 children, 1 value All leaves have 1 or 2 values Data is kept in sorted order

B Trees: Search Data values might be single items (set of values) key:value pairs (map) Search(key, node): Just like binary search, but more comparisons at each node: if node is null return fail for i = 0 to k-1 (k is number of keys in node) if key < keys[i] return search(key, children[i]) if key == keys[i] return value[i] return search(key, children[k]) E J A C G K M

B Trees: Insert To insert item (eg key:value pair): Search for leaf where the item should be. If leaf is not full, add item to the leaf. If leaf is full: Identify the middle item (existing item or the new item) Create a new leaf node: retain items before middle key in original node put items after middle key in new leaf node, push item up to parent, along with pointer to new node To add new item to parent: if parent is not full: add new item to parent, and add new child pointer just right of new item else: split parent node into two nodes (like leaf) push middle item up to grandparent. add pointer to new child just right of pushed up item

2-3 B Tree: Inserting values Add M H T S R Q B A F D Z http://www.cs.usfca.edu/~galles/visualization/BTree.html

2-3 B Tree: Inserting values Add 8, 5, 1, 7, 3,12, 9, 6 http://www.cs.usfca.edu/~galles/visualization/BTree.html

2-3 BTree: Deletion Opposite of inserting: if a node becomes empty, if possible, rotate a value from a sibling through the parent to ensure minimum number of values per node. if two siblings, require >= 5 keys in parent and siblings if one sibling, require >=3 keys in parent and siblings else merge nodes: if two siblings, merge Easier at a leaf. Harder if at an internal node.

B Tree: Deletion To delete a value, x: Search for the node, say n, containing x. If n is a leaf, just delete x from n. If n is an internal node, replace x by the next smallest (largest in right subtree) or next largest (smallest in left subtree). E.g. if children are leaves: […|x|…| ] […|w|…| ]  […|w| ] […| ] or: […|x|…| ] […|y|…| ]  [y|…| ] […| ] Added later and shown in lecture 25.

B Tree: Deletion After deleting x, one leaf node has one fewer values. If this node now has fewer than the required minimum, we must rebalance the tree. Either: Rotate: Move a value from a sibling node to the parent node, in place of the separator, and move the separator to the under-full leaf; Or: Merge: Combine the under-full node with a sibling and the separator, and delete the empty node and the separator from the parent. Added later and shown in lecture 25. Does this cover all cases?

B Tree: Deletion - Rotate If a sibling node has more than the minimum number of values, move its smallest/largest value to the parent node, in place of the separator, and move the separator to the under-full leaf. If left sibling has a “spare” value: […|x|…| ] […|w|…| ]  […|w| ] […| ] […| ] [x|…| ] Exercise! Added later and shown in lecture 25.

B Tree: Deletion - Merge If there is not a sibling with a “spare” value: merge the under-full leaf with a sibling (which has minimum number of values) Move the separator and all of the values from the sibling into the under-full node. Delete the sibling and delete the separator from the parent. Merge […|s|…| ] […|…| ]  […| ] […| ] […|s|…] If left sibling has a “spare” value: Exercise! Added later and shown in lecture 25.

Deleting from leaves E J A C G K M E J A C G M C J A E M C A J J A C M Only reduce levels at the root A C

More deletions: When internal node becomes too empty propagate the deletion up the tree N V E J R X A C G K P S W Y Z

More deletions: Deletion from internal node ⇒ rotate key and child from sibling J V E N X A C G K R S W Y Z

Analysis B trees are balanced: A new level is introduced only at the top A level is removed only from the top Therefore: all leaves are at the same level. Cost of search/add/delete: O(log⌈m/2⌉(n)) (at worst) = depth of tree with all half full nodes O(logm(n)) (at best) = depth of tree with full nodes if 100 million items in a B tree with m = 20, log10(100,000,000) = ? 8 log20(100,000,000) = ? 6.14 if billion items in a B tree with m = 100, log50(1,000,000,000) = ? 5.3 log100(1,000,000,000) = ? 4.5