Multiway Search Tree (MST)

Slides:



Advertisements
Similar presentations
CS4432: Database Systems II
Advertisements

©Silberschatz, Korth and Sudarshan12.1Database System Concepts Chapter 12: Part B Part A:  Index Definition in SQL  Ordered Indices  Index Sequential.
1 Database indices Database Systems manage very large amounts of data. –Examples: student database for NWU Social Security database To facilitate queries,
E.G.M. Petrakissearching1 Searching  Find an element in a collection in the main memory or on the disk  collection: (K 1,I 1 ),(K 2,I 2 )…(K N,I N )
File Organizations March 2007R McFadyen ACS In SQL Server 2000 Tree terms root, internal, leaf, subtree parent, child, sibling balanced, unbalanced.
E.G.M. PetrakisHashing1 Hashing on the Disk  Keys are stored in “disk pages” (“buckets”)  several records fit within one page  Retrieval:  find address.
E.G.M. PetrakisB-trees1 Multiway Search Tree (MST)  Generalization of BSTs  Suitable for disk  MST of order n:  Each node has n or fewer sub-trees.
CS4432: Database Systems II
Indexing. Goals: Store large files Support multiple search keys Support efficient insert, delete, and range queries.
 B+ Tree Definition  B+ Tree Properties  B+ Tree Searching  B+ Tree Insertion  B+ Tree Deletion.
Multi-way Trees. M-way trees So far we have discussed binary trees only. In this lecture, we go over another type of tree called m- way trees or trees.
12.1 Chapter 12: Indexing and Hashing Spring 2009 Sections , , Problems , 12.7, 12.8, 12.13, 12.15,
DBMS 2001Notes 4.1: B-Trees1 Principles of Database Management Systems 4.1: B-Trees Pekka Kilpeläinen (after Stanford CS245 slide originals by Hector Garcia-Molina,
CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Starting at Binary Trees
1 Tree Indexing (1) Linear index is poor for insertion/deletion. Tree index can efficiently support all desired operations: –Insert/delete –Multiple search.
Arboles B External Search The algorithms we have seen so far are good when all data are stored in primary storage device (RAM). Its access is fast(er)
Marwan Al-Namari Hassan Al-Mathami. Indexing What is Indexing? Indexing is a mechanisms. Why we need to use Indexing? We used indexing to speed up access.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 B+-Tree Index Chapter 10 Modified by Donghui Zhang Nov 9, 2005.
Indexing Database Management Systems. Chapter 12: Indexing and Hashing Basic Concepts Ordered Indices B + -Tree Index Files File Organization 2.
1 Chapter 12: Indexing and Hashing Indexing Indexing Basic Concepts Basic Concepts Ordered Indices Ordered Indices B+-Tree Index Files B+-Tree Index Files.
8/3/2007CMSC 341 BTrees1 CMSC 341 B- Trees D. Frey with apologies to Tom Anastasio.
Tree-Structured Indexes. Introduction As for any index, 3 alternatives for data entries k*: – Data record with key value k –  Choice is orthogonal to.
COMP261 Lecture 23 B Trees.
Data Indexing Herbert A. Evans.
Unit 9 Multi-Way Trees King Fahd University of Petroleum & Minerals
TCSS 342, Winter 2006 Lecture Notes
B-Trees B-Trees.
Indexing Structures for Files and Physical Database Design
CS522 Advanced database Systems
Multiway Search Trees Data may not fit into main memory
Tree-Structured Indexes: Introduction
Azita Keshmiri CS 157B Ch 12 indexing and hashing
B-Trees B-Trees.
CS522 Advanced database Systems
Indexing ? Why ? Need to locate the actual records on disk without having to read the entire table into memory.
UNIT III TREES.
Database System Implementation CSE 507
Extra: B+ Trees CS1: Java Programming Colorado State University
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
Lecture 22 Binary Search Trees Chapter 10 of textbook
B+ Tree.
Hashing Exercises.
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B-Trees © Dave Bockus Acknowledgements to:
C. Faloutsos Indexing and Hashing – part I
B Tree Adhiraj Goel 1RV07IS004.
CMSC 341 Lecture 10 B-Trees Based on slides from Dr. Katherine Gibson.
Database Applications (15-415) DBMS Internals- Part III Lecture 15, March 11, 2018 Mohammad Hammoud.
Topics covered (since exam 1):
Lecture 26 Multiway Search Trees Chapter 11 of textbook
B+ Trees What are B+ Trees used for What is a B Tree What is a B+ Tree
B-Trees.
B- Trees D. Frey with apologies to Tom Anastasio
Indexing and Hashing Basic Concepts Ordered Indices
B- Trees D. Frey with apologies to Tom Anastasio
B-Tree.
Multiway Trees Searching and B-Trees Advanced Tree Structures
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
Topics covered (since exam 1, excluding PQ):
B-Trees Disk Storage What is a multiway tree? What is a B-tree?
B- Trees D. Frey with apologies to Tom Anastasio
Database Systems (資料庫系統)
Indexing 4/11/2019.
15-826: Multimedia Databases and Data Mining
CENG 351 Data Management and File Structures
ITCS6114 Algorithms and Data Structures
B-Trees.
Index Structures Chapter 13 of GUW September 16, 2019
Presentation transcript:

Multiway Search Tree (MST) Generalization of BSTs Suitable for disk MST of order n: Each node has n or fewer sub-trees S1 S2…. Sm , m <=n Each node has n-1 or fewer keys K1 Κ2 …Κm-1 : m-1 keys in ascending order K(Si) <= Κi <= K(Si+1) , K(Sm-1) < K(Sm) E.G.M. Petrakis B-trees

n=4 n=3 E.G.M. Petrakis B-trees

MSTs for Disk Nodes correspond to disk pages Pros: Cons: tree height is low for large n fewer disk accesses Cons: low space utilization if non-full MSTs are non-balanced in general! E.G.M. Petrakis B-trees

4000 keys, n=5 At least 4000/(5-1) nodes (pages) 1st level (root): 1 node, 4 keys, 5 sub-trees + 2nd level: 5 nodes, 20 keys, 25 sub-trees + 3rd level: 25 nodes, 100 keys, 125 sub-trees + 4th level: 125 nodes, 500 keys,525 sub-trees + 5th level: 525 nodes, 2100 keys, 2625 sub-tress + 6th level: 2625 nodes, 10500 keys, …. tree height = 6 (including root) If n = 11 at least 400 nodes and tree height = 3 E.G.M. Petrakis B-trees

Operations on MSTs Search: returns pointer to node containing the key and position of key in the node Insert new key if not already in the tree Delete existing key E.G.M. Petrakis B-trees

Important Issues Keep MST balanced after insertions or deletions Balanced MSTs: B-trees, B+-trees Reduce number of disk accesses Data storage: two alternatives inside nodes: less sub-trees, nodes pointers from the nodes to data pages E.G.M. Petrakis B-trees

Search Algorithm MST *search (MST *tree, int n, keytype key, int *position) { MST *p = tree; while (p != null) { // search a node i = nodesearch(p, key); if (i < numtrees(p) – 1 && key == key(p, i)) { *position = i; return p; } p = son(p, i); } position = -1; return(-1); E.G.M. Petrakis B-trees

E.G.M. Petrakis B-trees

Insertions Search & insert key it if not there the tree grows at the leafs => the tree becomes imbalanced => not equal number of disk accesses for every key the shape of the tree depends on the insertion sequence low storage utilization, many non-full nodes E.G.M. Petrakis B-trees

insert: 70 75 82 77 71 73 84 86 87 85 n = 4 max 3 keys/node 71 73 70 75 82 n = 4 max 3 keys/node 70 75 82 77 71 73 84 86 87 85 E.G.M. Petrakis B-trees

MST Deletions Find and delete a key from a node Delete key if it has empty left or right sub-tree If it is the only key in the node: free the node If the key has both left and right sub-trees => find its successor, put the successor in the position of the key and delete the successor The tree becomes imbalanced E.G.M. Petrakis B-trees

60 180 300 30 50 150 170 220 280 n=4 153 162 187 202 173 178 delete 150, 180 60 187 300 170 153 162 202 E.G.M. Petrakis B-trees

Definition of n-Order B-Tree Every path from the root to a leaf has the same length h >= 0 Each node except the root and the leafs has at least sub-trees and keys Each node has at most n sub-trees and n-1 keys The root has at least 2 sub-trees or it is a leaf E.G.M. Petrakis B-trees

22 9 17 30 40 4 8 12 13 18 12 13 n=3 13 25 70 4 8 9 12 17 18 22 27 30 32 n=5 4 8 9 12 13 17 22 25 27 30 32 18 50 n=7 E.G.M. Petrakis B-trees

B-Tree Insertions Create a new node and split contents of old node Find leaf to insert new node If not full, insert key in proper position else Create a new node and split contents of old node left and right node the n/2 lower keys go into the left node the n/2 larger keys go into the right node the separator key goes up to the father node If n is even: one more key in left (left bias) or right node (right bias) E.G.M. Petrakis B-trees

n=5 E.G.M. Petrakis B-trees

n=4 E.G.M. Petrakis B-trees

B-Tree Insertions (cont.) If the father node is full: split it and proceed the same way, the new father may split again if it is full … The splits may reach the root which may also split creating a new root The changes are propagated from the leafs to the root The tree remains balanced E.G.M. Petrakis B-trees

n=5 E.G.M. Petrakis B-trees

Advantages The tree remains balanced!! Good storage utilization: about 50% Low tree height, equal for all leafs => same number of disk accesses for all leafs B-trees: MSTs with additional properties special insertion & deletion algorithms E.G.M. Petrakis B-trees

B-Tree Deletions Delete key as usual, Two cases: Merging: If less than n/2 keys in left and right node then, sum of keys + separator key in father node < n the two nodes are merged, one node is freed Borrowing: if after deletion the node has less than n/2 keys and its left (or right) node has more than n/2 keys find successor of separator key put separator in the place of the deleted key put the successor as separator key E.G.M. Petrakis B-trees

n = 4 n = 5 E.G.M. Petrakis B-trees

n = 5 E.G.M. Petrakis B-trees

[logn (N+1) ]  h [log(n+1)/2((N+1)/2)] +1 Performance disc accesses n N 4 50 6 5 7 3 2 100 10 150 103 104 105 106 107 [logn (N+1) ]  h [log(n+1)/2((N+1)/2)] +1 Search: the cost increases with the lognN Tree height ~ lognN E.G.M. Petrakis B-trees

Performance (cont.) Insertions/Deletions: Cost proportional to 2h the changes propagate towards the root Observation: The cost drops with n larger size of track (page) but, in fact the cost increases with the amount of information which is transferred from the disk into the main memory E.G.M. Petrakis B-trees

Problem B-trees are good for random accesses E.g., search(30) But not for range queries: require multiple tree traversals search(20-30) E.G.M. Petrakis B-trees

B+-Trees At most n sub-trees and n-1 keys At least sub-trees and keys Root: at least 2 sub-trees and 1 key The keys can be repeated in non-leaf nodes Only the leafs point to data pages The leafs are linked together with pointers E.G.M. Petrakis B-trees

index data pages E.G.M. Petrakis B-trees

Performance Better than B-trees for range searching B-trees are better for random accesses The search must reach a leaf before it is confirmed internal keys may not correspond to actual record information (can be separator keys only) insertions: leave middle key in father node deletions: do not always delete key from internal node (if it is a separator key) E.G.M. Petrakis B-trees

Insertions Find appropriate leaf node to insert key If it is full: allocate new node split its contents and insert separator key in father node If the father is full allocate new node and split the same way continue upwards if necessary if the root is split create new root with two sub-trees E.G.M. Petrakis B-trees

Deletions Find and delete key from the leaf If the leaf has < n/2 keys borrowing if its neighbor leaf has more than n/2 keys update father node (the separator key may change) or merging with neighbor if both have < n keys causes deletion of separator in father node update father node Continue upwards if father node is not the root and has less than n/2 keys E.G.M. Petrakis B-trees

index 46 69 19 31 46 61 69 82 50 66 75 84 13 28 41 53 69 79 88 19 31 46 61 82 data pages data pages: min 2, max 3 records index pages: min 2, max 3 pointers B+ order: n = 4, n/2 =2 data pages don’t store pointers as non-leaf nodes do but they are kinked together E.G.M. Petrakis B-trees

insert 55 with page splitting 69 53 61 50 55 delete 84 with borrowing: 79 75 82 88 E.G.M. Petrakis B-trees

delete 31 with merging 19 46 28 41 free page E.G.M. Petrakis B-trees

B-trees and Variants Many variants: B-trees: Bayer and Mc Greight, 1971 B+-trees, B*-trees the most successful variants The most successful data organization for secondary storage (based on primary key) very good performance, comparable to hashing fully dynamic behavior good space utilization (69% on the average) E.G.M. Petrakis B-trees

B+/B-Trees Comparison B-trees: no key repetition, better for random accesses (do not always reach a leaf), data pages on any node B+-trees: key repetition, data page on leaf nodes only, better for range queries, easier implementation E.G.M. Petrakis B-trees