B-tree

Why B-Trees
When the data is too big to fit in main memory, we have to keep it on disk. In that case, we must take into account that a disk access takes much longer than ordinary in-memory instructions.

Disk Access Time
Access time = seek time + rotational delay (latency) + transfer time
– Seek time is slow because it depends on the mechanical movement of the disk head to the correct track of the disk.
– Latency is the time spent waiting for the correct block to rotate under the head; on average, it is one-half of a revolution.
Example: transferring 5 KB from a disk that needs 40 ms to locate a track, spins at 7200 RPM, and has a data transfer rate of 1000 KB per second takes about 40 ms + 4 ms + 5 ms = 49 ms. The 4 ms is half of one revolution (60,000 ms / 7200 is about 8.3 ms), and the 5 ms is 5 KB divided by 1000 KB per second.
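The arithmetic of the disk example can be reproduced with a few lines of Python. This is only a sketch of the formula on the slide; the function name and parameter names are made up for illustration.

```python
def disk_access_time_ms(seek_ms, rpm, transfer_kb, rate_kb_per_s):
    """Estimate one disk access as seek + average latency + transfer (in ms)."""
    revolution_ms = 60_000 / rpm          # time for one full revolution
    latency_ms = revolution_ms / 2        # on average, half a revolution
    transfer_ms = transfer_kb / rate_kb_per_s * 1000
    return seek_ms + latency_ms + transfer_ms


# Figures from the slide: 40 ms seek, 7200 RPM, 5 KB at 1000 KB per second.
t = disk_access_time_ms(seek_ms=40, rpm=7200, transfer_kb=5, rate_kb_per_s=1000)
print(f"{t:.1f} ms")
```

Without rounding the latency down to 4 ms, the result is slightly above the 49 ms quoted on the slide.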

Why B-Trees
Assume that we use a balanced binary tree to store all 20 million records.
– log2(20,000,000) is about 25, so we end up with a very deep tree.
– With roughly one disk access per level, reaching a record takes more than 1 second (about 25 accesses at 49 ms each).
– We cannot improve on log n for a binary tree.
The solution is to use more branches and thus less height: as branching increases, depth decreases.
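A rough sketch of how depth shrinks as branching grows. It assumes a simplified model in which every level multiplies the number of reachable records by the branching factor and costs one disk access of 49 ms (the figure from the disk example); `levels_needed` and `ACCESS_MS` are illustrative names.

```python
def levels_needed(n_records, branching):
    """Smallest number of levels L such that branching**L >= n_records."""
    levels, capacity = 0, 1
    while capacity < n_records:
        capacity *= branching
        levels += 1
    return levels


ACCESS_MS = 49  # assumed cost of one disk access, taken from the disk example

for branching in (2, 10, 100):
    levels = levels_needed(20_000_000, branching)
    seconds = levels * ACCESS_MS / 1000
    print(f"branching {branching:>3}: {levels:>2} levels, about {seconds:.2f} s")
```

Under these assumptions a binary tree needs 25 levels, while a branching factor of 100 needs only 4.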

Definition of a B-tree
A B-tree of order m is an m-way tree (i.e., a tree where each node may have up to m children) in which:
1. The root has at least two subtrees unless it is a leaf.
2. Each non-leaf, non-root node has k-1 keys and k pointers, where $\lceil m/2 \rceil \le k \le m$.
3. Each leaf node holds k-1 keys, where $\lceil m/2 \rceil \le k \le m$.
4. All leaves are on the same level.
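The four rules can be turned into a small checker. This is a minimal sketch, not a full validator: the `Node` class and function names are hypothetical, a leaf is modelled as a node with an empty `children` list, and the ordering of keys is not checked.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means leaf


def leaf_depths(node, depth=0):
    """Collect the set of depths at which leaves occur."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def satisfies_definition(root, m):
    """Check rules 1-4 for a tree of order m (key ordering is not checked)."""
    min_k = (m + 1) // 2  # ceil(m / 2) using integers only

    def node_ok(node, is_root):
        k = len(node.keys) + 1  # a node with k-1 keys has k pointers
        if node.children and len(node.children) != k:
            return False
        if is_root:
            # Rule 1: a non-leaf root needs at least two subtrees.
            if k > m or (node.children and k < 2):
                return False
        elif not (min_k <= k <= m):  # rules 2 and 3
            return False
        return all(node_ok(child, False) for child in node.children)

    return node_ok(root, True) and len(leaf_depths(root)) == 1  # rule 4


good = Node([10, 20], [Node([1, 2]), Node([12, 15]), Node([25, 30, 35, 40])])
bad = Node([10, 20], [Node([1, 2]), Node([12]), Node([25, 30])])
print(satisfies_definition(good, 5), satisfies_definition(bad, 5))
```

For order 5, every non-root node must hold between 2 and 4 keys, so the second tree fails because one leaf holds a single key.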

An example B-Tree
A B-tree of order 5 (figure not reproduced in this transcript)

B-Trees Example: 2-3 Tree
If we take m = 3, we get a 2-3 tree, in which non-leaf nodes have two or three children (i.e., one or two keys). Like every B-tree, it is always balanced, since the leaves are all at the same level.

Search B-Tree
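The slide gives no text for searching, so the following is only a sketch of the usual approach: look for the key inside the current node and, if it is absent, descend into the child between the two neighbouring keys. The `Node` class and `search` function are hypothetical names, and keys are assumed to be kept sorted within each node.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means leaf


def search(node, key):
    """Return (node, index) where key is stored, or None if it is absent."""
    while True:
        i = bisect_left(node.keys, key)      # first position with keys[i] >= key
        if i < len(node.keys) and node.keys[i] == key:
            return node, i
        if not node.children:                # reached a leaf without finding it
            return None
        node = node.children[i]              # descend; one node read per level


root = Node([10, 20], [Node([1, 2]), Node([12, 15]), Node([25, 30, 35, 40])])
print(search(root, 15) is not None, search(root, 13) is not None)
```

Each loop iteration touches one node, which is why the number of disk reads is bounded by the height of the tree.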

B-Tree: Insertion
– Insert the new key into a leaf.
– If the resulting leaf becomes too big, split the leaf into two, promoting the middle key to the leaf's parent.
– If the promotion results in the parent becoming too big, split the parent into two, promoting its middle key.
– This strategy might have to be repeated all the way to the top.
– If necessary, the root is split in two and the middle key is promoted to a new root, making the tree one level higher.
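The insertion steps above can be sketched as a short recursive routine. This is one possible illustration, not the presentation's own code: the `Node` class and function names are hypothetical, keys are assumed to be distinct, and a node of order m is treated as "too big" once it holds more than m-1 keys.

```python
from bisect import bisect_left
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list = field(default_factory=list)
    children: list = field(default_factory=list)  # empty list means leaf


def _insert(node, key, m):
    """Insert key below node. Return (middle_key, right_node) if node split."""
    i = bisect_left(node.keys, key)
    if not node.children:
        node.keys.insert(i, key)                 # new keys always land in a leaf
    else:
        promoted = _insert(node.children[i], key, m)
        if promoted is not None:                 # child split: absorb its middle key
            mid_key, right = promoted
            node.keys.insert(i, mid_key)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:
        return None                              # node still fits, nothing to promote
    mid = len(node.keys) // 2                    # split around the middle key
    mid_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return mid_key, right


def insert(root, key, m):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, m)
    if promoted is None:
        return root
    mid_key, right = promoted
    return Node([mid_key], [root, right])        # tree grows one level higher


root = Node()
for k in range(1, 8):                            # insert 1..7 into an order-3 tree
    root = insert(root, k, 3)
print(root.keys, [child.keys for child in root.children])
```

Note that the tree only grows in height when the root itself splits, which is what keeps all leaves on the same level.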

From Wiki B-tree insertion example (order 3)

B-tree: Deletion
During insertion, the key always goes into a leaf. For deletion, we likewise want the actual removal to happen in a leaf. The cases are:
1. If the key is already in a leaf node, and removing it doesn't cause that leaf node to have too few keys, then simply remove the key.
2. If the key is in a non-leaf node, then delete the key and promote its predecessor or successor key (which lies in a leaf) into the deleted key's position.

B-tree: Deletion
If case 1 or 2 leaves a leaf node with fewer than the minimum number of keys, then we either get help from a sibling or merge nodes.
3. If one of the siblings immediately adjacent to the lacking leaf has more than the minimum number of keys, promote one of its keys to the parent and move the parent key down into the lacking leaf.
4. If neither adjacent sibling has more than the minimum number of keys, merge the lacking leaf and one of its neighbours together with the key that separates them in their shared parent (the opposite of promoting a key). If the merge leaves the parent with too few keys, repeat the process up to the root, if required.
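Cases 3 and 4 can be sketched for a single underfull leaf. The code below is a partial illustration under stated assumptions: it handles only leaves directly under one parent, the `Node` class and `fix_underflow` name are hypothetical, and repeating the repair higher up the tree after a merge is left to the caller.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    keys: list
    children: list = field(default_factory=list)  # empty list means leaf


def fix_underflow(parent, i, min_keys):
    """Repair the leaf parent.children[i], which has fewer than min_keys keys."""
    leaf = parent.children[i]
    left = parent.children[i - 1] if i > 0 else None
    right = parent.children[i + 1] if i + 1 < len(parent.children) else None

    if left is not None and len(left.keys) > min_keys:
        # Case 3 (left sibling): parent key moves down, sibling's largest key moves up.
        leaf.keys.insert(0, parent.keys[i - 1])
        parent.keys[i - 1] = left.keys.pop()
    elif right is not None and len(right.keys) > min_keys:
        # Case 3 (right sibling): parent key moves down, sibling's smallest key moves up.
        leaf.keys.append(parent.keys[i])
        parent.keys[i] = right.keys.pop(0)
    else:
        # Case 4: merge two neighbouring leaves with the parent key between them.
        j = i - 1 if left is not None else i     # index of the separating key
        a, b = parent.children[j], parent.children[j + 1]
        a.keys = a.keys + [parent.keys.pop(j)] + b.keys
        del parent.children[j + 1]               # parent may now be underfull itself


# Order 5, so the minimum is 2 keys per non-root node.
p = Node([10, 20], [Node([1, 2]), Node([12]), Node([25, 30, 35])])
fix_underflow(p, 1, min_keys=2)                  # borrows through the parent
print(p.keys, [c.keys for c in p.children])

q = Node([10, 20], [Node([1, 2]), Node([12]), Node([25, 30])])
fix_underflow(q, 1, min_keys=2)                  # no rich sibling, so merge
print(q.keys, [c.keys for c in q.children])
```

In the merge case the parent loses one key and one child, which is why the process may have to continue towards the root.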

Analysis of B-Trees
The maximum number of items in a B-tree of order m and height h:
root: $m - 1$
level 1: $m(m - 1)$
level 2: $m^2(m - 1)$
...
level h: $m^h(m - 1)$
So the total number of items is
$(1 + m + m^2 + m^3 + \dots + m^h)(m - 1) = \frac{m^{h+1} - 1}{m - 1}(m - 1) = m^{h+1} - 1$
When m = 5 and h = 2 this gives $5^3 - 1 = 124$.
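The closed form can be checked against the level-by-level sum with a short script. The function name `max_items` is illustrative only.

```python
def max_items(m, h):
    """Sum the per-level maximum m**level * (m - 1) for levels 0 (root) to h."""
    return sum(m**level * (m - 1) for level in range(h + 1))


# The sum agrees with the closed form m**(h + 1) - 1 for a range of small cases.
for m in range(2, 8):
    for h in range(0, 5):
        assert max_items(m, h) == m ** (h + 1) - 1

print(max_items(5, 2))  # 4 + 20 + 100 = 124
```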

Demo

Revisit: B-Trees Motivation
When searching tables held on disk, the cost of each disk transfer is high.
– Suppose we use a B-tree of order 101 and can transfer each node in one disk read operation.
– A B-tree of order 101 and height 3 can hold $101^4 - 1$ items (approximately 100 million), and any item can be accessed with 3 disk reads (assuming we hold the root in memory).