CSC212 Data Structure - Section AB

Presentation transcript:

CSC212 Data Structure - Section AB Lecture 17 B-Trees and the Set Class Instructor: Edgardo Molina Department of Computer Science City College of New York

Topics
- Why B-Trees: the problem of an unbalanced tree
- The B-Tree Rules
- The Set Class ADT with B-Trees
- Search for an Item in a B-Tree
- Insert an Item in a B-Tree (*)
- Remove an Item from a B-Tree (*)

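The problem of an unbalanced BST
Maximum depth of a BST with n entries: n-1.
An example: insert 1, 2, 3, 4, 5 in that order into a bag using a BST. Each new entry becomes the right child of the previous one, so the tree degenerates into a chain 1 → 2 → 3 → 4 → 5 of depth 4.
Run BagTest!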

Worst-Case Times for BSTs
- Adding, deleting or searching for an entry in a BST with n entries is O(d) in the worst case, where d is the depth of the BST.
- Since d is no more than n-1, an operation visits at most n nodes in the worst case.
- Conclusion: the worst-case time for the add, delete or search operation of a BST is O(n).
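The depth claim can be checked with a few lines of code. This is a minimal sketch, not the textbook's bag class: `BstNode`, `bst_insert`, `bst_depth` and `depth_after_sorted_inserts` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Hypothetical minimal BST node (not the textbook's bag class).
struct BstNode {
    int key;
    std::unique_ptr<BstNode> left, right;
    explicit BstNode(int k) : key(k) {}
};

// Plain, unbalanced BST insertion; equal keys go to the right, as in a bag.
void bst_insert(std::unique_ptr<BstNode>& root, int key) {
    if (!root) {
        root = std::make_unique<BstNode>(key);
        return;
    }
    if (key < root->key) bst_insert(root->left, key);
    else                 bst_insert(root->right, key);
}

// Depth = number of edges on the longest root-to-leaf path (-1 for an empty tree).
int bst_depth(const std::unique_ptr<BstNode>& root) {
    if (!root) return -1;
    const int l = bst_depth(root->left);
    const int r = bst_depth(root->right);
    return 1 + (l > r ? l : r);
}

// Inserts 1, 2, ..., n in increasing order and returns the resulting depth.
int depth_after_sorted_inserts(int n) {
    assert(n >= 0);
    std::unique_ptr<BstNode> root;
    for (int k = 1; k <= n; ++k) bst_insert(root, k);
    return bst_depth(root);
}
```

Under these assumptions, `depth_after_sorted_inserts(5)` gives 4, which is n-1, while inserting the same five keys in the order 3, 1, 4, 2, 5 gives a depth of only 2.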

Solutions to the problem
Solution 1: Periodically balance the search tree (Project 10.9, page 516).
Solution 2: Use a particular kind of tree, the B-tree, proposed by Bayer & McCreight in 1972. (B-Tree: named after Bayer?)

The B-Tree Basics
- Similar to a binary search tree (BST): the implementation requires the ability to compare two entries via a less-than operator (<).
- But a B-tree is NOT a BST. In fact, it is not even a binary tree: B-tree nodes have many (more than two) children.
- Another important property: each node contains more than just a single entry.
- Advantages: easy to search, and not too deep.
- A good place to compare a BST, a heap, and a B-tree.

Applications: bag and set
- The difference: equal entries can occur more than once in a bag, but not in a set.
- C++ STL: set and multiset (= bag).
- The B-Tree Rules for a Set: we will look at a “set formulation” of the B-tree rules, but keep in mind that a “bag formulation” is also possible.

The B-Tree Rules
The entries in a B-tree node:
- B-tree Rule 1: The root may have as few as one entry (or even zero entries if it has no children); every other node has at least MINIMUM entries.
- B-tree Rule 2: The maximum number of entries in a node is 2 * MINIMUM.
- B-tree Rule 3: The entries of each B-tree node are stored in a partially filled array, sorted from the smallest to the largest.

The B-Tree Rules (cont.)
The subtrees below a B-tree node:
- B-tree Rule 4: The number of subtrees below a non-leaf node with n entries is always n+1.
- B-tree Rule 5: For any non-leaf node:
  (a) An entry at index i is greater than all the entries in subtree number i of the node.
  (b) An entry at index i is less than all the entries in subtree number i+1 of the node.

An Example of B-Tree
Root node: data[0] = 93 and data[1] = 107.
- subtree number 0: each entry < 93
- subtree number 1: each entry in (93, 107)
- subtree number 2: each entry > 107
What kind of traversal can print a sorted list?
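One way to explore the traversal question is to interleave the subtrees and the entries of a node. This is a sketch under stated assumptions: `BNode` is a simplified, hypothetical node built on `std::vector` rather than the fixed-size arrays of the set class, and only the root values 93 and 107 come from the slide. The leaf values in `make_example` are invented for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified B-tree node (the set class uses fixed-size arrays instead).
struct BNode {
    std::vector<int>   data;    // entries, sorted ascending (Rule 3)
    std::vector<BNode> subset;  // children: empty for a leaf, else data.size() + 1 (Rule 4)
};

// In-order traversal: subtree 0, entry 0, subtree 1, entry 1, ..., last subtree.
void in_order(const BNode& node, std::vector<int>& out) {
    const bool leaf = node.subset.empty();
    assert(leaf || node.subset.size() == node.data.size() + 1);
    for (std::size_t i = 0; i < node.data.size(); ++i) {
        if (!leaf) in_order(node.subset[i], out);
        out.push_back(node.data[i]);
    }
    if (!leaf) in_order(node.subset.back(), out);
}

// Root [93, 107] as on the slide; the three leaves hold made-up values.
BNode make_example() {
    return BNode{{93, 107},
                 {BNode{{50, 80}, {}}, BNode{{100}, {}}, BNode{{120, 150}, {}}}};
}
```

Because Rule 5 places subtree i between entry i-1 and entry i, this visiting order appends the entries to `out` in ascending order.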

The B-Tree Rules (cont.)
A B-tree is balanced:
- B-tree Rule 6: Every leaf in a B-tree has the same depth.
- This rule ensures that a B-tree is balanced.

Another Example, MINIMUM = 1
- root: [6]
- children of the root: [2, 4] and [9]
- leaves below [2, 4]: [1], [3], [5]
- leaves below [9]: [7, 8], [10]
Can you verify that all 6 rules are satisfied?
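The verification can also be done mechanically. The following is a sketch of a rule checker, not code from the textbook: `BNode` is a hypothetical vector-based node, `check_rules` is an invented helper, and `example_tree` is a reconstruction of the slide's figure (root 6, children [2, 4] and [9]).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified B-tree node.
struct BNode {
    std::vector<int>   data;    // entries of the node
    std::vector<BNode> subset;  // children (empty for a leaf)
};

// Returns the common depth of the leaves below `node` if Rules 1-6 hold, else -1.
// lo and hi are exclusive bounds inherited from ancestors (nullptr = no bound).
int check_rules(const BNode& node, std::size_t minimum, bool is_root = true,
                const int* lo = nullptr, const int* hi = nullptr) {
    assert(minimum >= 1);
    const std::size_t n = node.data.size();
    const bool leaf = node.subset.empty();
    if (is_root) { if (n == 0 && !leaf) return -1; }      // Rule 1 (root)
    else if (n < minimum) return -1;                       // Rule 1 (other nodes)
    if (n > 2 * minimum) return -1;                        // Rule 2
    for (std::size_t i = 0; i < n; ++i) {
        if (i > 0 && !(node.data[i - 1] < node.data[i])) return -1;  // Rule 3
        if (lo && !(*lo < node.data[i])) return -1;                  // Rule 5
        if (hi && !(node.data[i] < *hi)) return -1;                  // Rule 5
    }
    if (leaf) return 0;
    if (node.subset.size() != n + 1) return -1;            // Rule 4
    int depth = -1;
    for (std::size_t i = 0; i <= n; ++i) {
        const int* child_lo = (i == 0) ? lo : &node.data[i - 1];
        const int* child_hi = (i == n) ? hi : &node.data[i];
        const int d = check_rules(node.subset[i], minimum, false, child_lo, child_hi);
        if (d < 0) return -1;
        if (i > 0 && d != depth) return -1;                // Rule 6
        depth = d;
    }
    return depth + 1;
}

// The slide's example with MINIMUM = 1, as reconstructed from the figure.
BNode example_tree() {
    BNode left{{2, 4}, {BNode{{1}, {}}, BNode{{3}, {}}, BNode{{5}, {}}}};
    BNode right{{9}, {BNode{{7, 8}, {}}, BNode{{10}, {}}}};
    return BNode{{6}, {left, right}};
}
```

With these definitions, `check_rules(example_tree(), 1)` returns 2, the common depth of the five leaves, and any violated rule makes it return -1.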

The set ADT with a B-Tree
set.h (p 528-529)

    template <class Item>
    class set
    {
    public:
        ... ...
        bool insert(const Item& entry);
        std::size_t erase(const Item& target);
        std::size_t count(const Item& target) const;
    private:
        // MEMBER CONSTANTS
        static const std::size_t MINIMUM = 200;
        static const std::size_t MAXIMUM = 2 * MINIMUM;
        // MEMBER VARIABLES
        std::size_t data_count;
        Item data[MAXIMUM+1];     // why +1? - for insert/erase
        std::size_t child_count;
        set *subset[MAXIMUM+2];   // why +2? - one more
    };

Combine a fixed-size array with linked nodes:
- data[]: the number of entries varies (data_count); with MINIMUM = 200 a node holds up to MAXIMUM = 400 entries.
- *subset[]: the number of children varies (child_count); child_count = data_count+1?

Invariant for the set Class
- The entries of a set are stored in a B-tree, satisfying the six B-tree rules.
- The number of entries in a node is stored in data_count, and the entries are stored in data[0] through data[data_count-1].
- The number of subtrees of a node is stored in child_count, and the subtrees are pointed to by the set pointers subset[0] through subset[child_count-1].

Search for an Item in a B-Tree
Prototype: std::size_t count(const Item& target) const;
Post-condition: Returns the number of items equal to the target (either 0 or 1 for a set).

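Searching for an Item: count
Search for 10: cout << count(10);

    Start at the root.
    Locate the first index i such that !(data[i] < target); if there is none, i = data_count.
    if (i < data_count and data[i] is target) return 1;
    else if (no children) return 0;
    else return subset[i]->count(target);

Example tree:
- root: [6, 17]
- children of the root: [4], [12], [19, 22]
- leaves below [4]: [2, 3], [5]
- leaves below [12]: [10], [16]
- leaves below [19, 22]: [18], [20], [25]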

Searching for an Item: count (trace of count(10) on the example tree)
- Step 1, root [6, 17]: data[0] = 6 is less than 10 but data[1] = 17 is not, so i = 1. data[1] is not the target and the root has children, so continue with subset[1].
- Step 2, node [12]: data[0] = 12 is not less than 10, so i = 0. data[0] is not the target, so continue with subset[0].
- Step 3, leaf [10]: i = 0 and data[i] is the target! Return 1.
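The search steps above can be sketched in C++ as follows. This is an illustration, not the textbook's member function: it uses a hypothetical vector-based `BNode` with `int` entries, a free function `bt_count` in place of `set::count`, and a helper `slide_tree` that rebuilds the tree from the slides.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical simplified B-tree node (the set class uses data[] and subset[] arrays).
struct BNode {
    std::vector<int>   data;    // sorted entries
    std::vector<BNode> subset;  // children (empty for a leaf)
};

// First index i with !(data[i] < target), or data.size() if every entry is smaller.
std::size_t first_not_less(const BNode& node, int target) {
    std::size_t i = 0;
    while (i < node.data.size() && node.data[i] < target) ++i;
    return i;
}

// Returns 1 if target is in the tree rooted at node, otherwise 0.
std::size_t bt_count(const BNode& node, int target) {
    const std::size_t i = first_not_less(node, target);
    if (i < node.data.size() && !(target < node.data[i])) return 1;  // found here
    if (node.subset.empty()) return 0;                               // leaf: not present
    return bt_count(node.subset[i], target);                         // search subtree i
}

// The tree used on the slides: root [6, 17] with children [4], [12], [19, 22].
BNode slide_tree() {
    BNode n4{{4}, {BNode{{2, 3}, {}}, BNode{{5}, {}}}};
    BNode n12{{12}, {BNode{{10}, {}}, BNode{{16}, {}}}};
    BNode n19{{19, 22}, {BNode{{18}, {}}, BNode{{20}, {}}, BNode{{25}, {}}}};
    return BNode{{6, 17}, {n4, n12, n19}};
}
```

The bound check `i < node.data.size()` matters when the target is larger than every entry of a node: in that case i equals the number of entries and the search must continue in the last subtree.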

Insert an Item into a B-Tree Prototype: bool insert(const Item& entry); Post-condition: If an equal entry was already in the set, the set is unchanged and the return value is false. Otherwise, entry was added to the set and the return value is true.

Insert an Item in a B-Tree

    Start at the root.
    Locate the first index i such that !(data[i] < entry).
    if (data[i] is entry) return false;   // no work!
    else if (no children) { insert entry at i; return true; }
    else return subset[i]->insert(entry);

The example tree is the same one used for searching: root [6, 17]; children [4], [12], [19, 22]; leaves [2, 3], [5], [10], [16], [18], [20], [25].

Insert an Item in a B-Tree: insert(11); // MIN = 1 -> MAX = 2
- Root [6, 17]: data[0] < entry, so i = 1; continue with subset[1].
- Node [12]: i = 0; continue with subset[0].
- Leaf [10]: data[0] < entry, so i = 1. The node has no children, so put the entry in data[1]. The leaf becomes [10, 11] and insert returns true.

Insert an Item in a B-Tree: insert(1); // MIN = 1 -> MAX = 2
- The search reaches the leaf [2, 3] with i = 0, so the entry is put in data[0]. The leaf becomes [1, 2, 3]: a node with MAX+1 = 3 entries!
- Observation: a loose insertion always places the new entry in a leaf node. A non-leaf node may change only when a node below it needs to be split.
- Fix the node with MAX+1 entries: split the node into two at the middle, and move the middle entry up. Here [1, 2, 3] splits into [1] and [3], and 2 moves up into the parent, which becomes [2, 4].
- Tree after the fix: root [6, 17]; children [2, 4], [12], [19, 22]; leaves [1], [3], [5], [10, 11], [16], [18], [20], [25].
- If the root node needs to be split, then a new level needs to be added for the new root, which does not happen in this example.
- Note: this is done recursively. Each fix hands the middle entry up to the root of the enclosing subset (the parent node).
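The loose insertion and the split can be sketched together. This is a minimal illustration under assumptions, not the textbook's implementation (p 551 – 557): `BNode` is a hypothetical vector-based node with `int` entries, and `fix_excess`, `loose_insert`, `bt_insert` and `slide_tree` are free functions written for this example with MINIMUM = 1.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

const std::size_t MINIMUM = 1;
const std::size_t MAXIMUM = 2 * MINIMUM;

// Hypothetical simplified B-tree node.
struct BNode {
    std::vector<int>   data;    // sorted entries
    std::vector<BNode> subset;  // children (empty for a leaf)
};

// Child i of parent holds MAXIMUM+1 entries: split it and move the middle entry up.
void fix_excess(BNode& parent, std::size_t i) {
    BNode& child = parent.subset[i];
    assert(child.data.size() == MAXIMUM + 1);
    BNode right;  // receives the upper MINIMUM entries (and upper MINIMUM+1 children)
    right.data.assign(child.data.begin() + MINIMUM + 1, child.data.end());
    if (!child.subset.empty()) {
        right.subset.assign(child.subset.begin() + MINIMUM + 1, child.subset.end());
        child.subset.erase(child.subset.begin() + MINIMUM + 1, child.subset.end());
    }
    const int middle = child.data[MINIMUM];
    child.data.resize(MINIMUM);                      // child keeps the lower half
    parent.data.insert(parent.data.begin() + i, middle);
    parent.subset.insert(parent.subset.begin() + i + 1, right);  // child is not used after this
}

// Inserts entry but may leave the root of this subtree with MAXIMUM+1 entries.
bool loose_insert(BNode& node, int entry) {
    std::size_t i = 0;
    while (i < node.data.size() && node.data[i] < entry) ++i;
    if (i < node.data.size() && !(entry < node.data[i])) return false;  // already present
    if (node.subset.empty()) {                       // leaf: insert at index i
        node.data.insert(node.data.begin() + i, entry);
        return true;
    }
    const bool added = loose_insert(node.subset[i], entry);
    if (node.subset[i].data.size() > MAXIMUM) fix_excess(node, i);
    return added;
}

// Full insert: loose insert, then grow a new root if the old root overflowed.
bool bt_insert(BNode& root, int entry) {
    if (!loose_insert(root, entry)) return false;
    if (root.data.size() > MAXIMUM) {
        BNode old_root = std::move(root);
        root.data.clear();
        root.subset.clear();
        root.subset.push_back(std::move(old_root));
        fix_excess(root, 0);
    }
    return true;
}

// The tree used on the slides before insert(11) and insert(1).
BNode slide_tree() {
    BNode n4{{4}, {BNode{{2, 3}, {}}, BNode{{5}, {}}}};
    BNode n12{{12}, {BNode{{10}, {}}, BNode{{16}, {}}}};
    BNode n19{{19, 22}, {BNode{{18}, {}}, BNode{{20}, {}}, BNode{{25}, {}}}};
    return BNode{{6, 17}, {n4, n12, n19}};
}
```

Calling `bt_insert` with 11 and then 1 on `slide_tree()` reproduces the slides: the leaf [10] becomes [10, 11], and the overfull leaf [1, 2, 3] is split so that its parent becomes [2, 4] with leaves [1], [3] and [5]. The tree grows in height only in `bt_insert`, when the root itself overflows, which is how all leaves stay at the same depth.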

Inserting an Item into a B-Tree
What if the node already has the MAXIMUM number of items?
Solution: loose insertion (p 551 – 557).
A loose insert may result in MAX+1 entries in the root of a subset.
Two steps to fix the problem:
1. Fix the subset, but the problem may move to the root of the set.
2. Fix the root of the set.
Note: I am going to show how to fix the MAX+1 nodes!

Erasing an Item from a B-Tree Prototype: std::size_t erase(const Item& target); Post-Condition: If target was in the set, then it has been removed from the set and the return value is 1. Otherwise the set is unchanged and the return value is zero.

Erasing an Item from a B-Tree
Similarly, after a “loose erase”, the root of a subset may have just MINIMUM-1 entries.
Solution (p 557 – 562):
1. Fix the shortage of the subset root, but this may move the problem to the root of the entire set.
2. Fix the root of the entire set (tree).

Summary
- A B-tree is a tree for storing entries that follows the six B-tree rules.
- A B-tree is balanced: every leaf in a B-tree has the same depth.
- Adding, erasing and searching for an item in a B-tree have worst-case time O(log n), where n is the number of entries.
- However, implementing the add and erase operations for a B-tree is not a trivial task.
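The O(log n) bound can be made plausible with a short counting argument based on the six rules. The derivation below is a sketch added for illustration: m stands for MINIMUM, d for the depth of the tree, and n for the number of entries, and it assumes a tree with d >= 1 whose root has at least one entry.

```latex
% By Rules 1 and 4 the root has at least 1 entry and 2 children; every other
% node has at least m entries and, if it is not a leaf, at least m+1 children.
% Hence there are at least 2(m+1)^{k-1} nodes at depth k, for k = 1, ..., d.
\begin{align*}
n &\ge 1 + m \sum_{k=1}^{d} 2\,(m+1)^{k-1}
   = 1 + 2m \cdot \frac{(m+1)^{d} - 1}{m}
   = 2\,(m+1)^{d} - 1, \\
d &\le \log_{m+1}\!\left(\frac{n+1}{2}\right) = O(\log n).
\end{align*}
% Example: with m = 200 and n = 10^9, d <= log_{201}(5 * 10^8), about 3.8, so d <= 3.
```

Since every node holds at most 2 * MINIMUM entries, the work done in each node is bounded by a constant for a fixed MINIMUM, so an operation that follows one root-to-leaf path takes O(log n) time.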