CPSC 252 External Searching

Motivation: To this point in the course we have assumed that any data we search through can be stored in memory. In practice, this is not always a reasonable assumption: some large databases cannot be read into memory. We refer to such data as disk-bound.

Accessing data stored on a hard disk is extremely slow compared to accessing data in memory. The time taken to access data in memory is typically on the order of nanoseconds ($10^{-9}$ s), while the time taken to access data on a disk is on the order of milliseconds ($10^{-3}$ s). Given this roughly million-to-1 ratio of disk access time to memory access time, we want to minimize the number of times we access the disk.

We have already seen that an AVL tree lets us cut the amount of data to be searched approximately in half each time we compare the data we are searching for against the data stored in a node. Now suppose that we have a database with 30,000,000 records, which is not unreasonable: a database of Canadian citizens, for example, would be about that size. How many levels would there be in the AVL tree?

A complete binary tree whose last level is filled has $2^L - 1$ nodes, where $L$ is the number of levels in the tree. Hence a complete tree with 30,000,000 nodes will have $\lceil \log_2(30{,}000{,}001) \rceil = 25$ levels. So if we are searching for a record that happens to be stored in a leaf node, and each node visited costs one disk access, we will have to perform 25 disk accesses as we work down the tree. If we have to search the database frequently, this will be unacceptably slow.
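The level count above can be checked with a few lines of Python. This is only a sketch: the function name `binary_tree_levels` and the two timing constants are illustrative assumptions, with the constants taken from the orders of magnitude quoted earlier (milliseconds per disk access, nanoseconds per memory access), not from measurements.

```python
# Assumed order-of-magnitude access times, taken from the figures in the text.
DISK_ACCESS_SECONDS = 1e-3
MEMORY_ACCESS_SECONDS = 1e-9


def binary_tree_levels(n_nodes: int) -> int:
    """Return the smallest L such that a full binary tree with L levels
    (2**L - 1 nodes) can hold n_nodes nodes."""
    levels = 0
    while 2 ** levels - 1 < n_nodes:
        levels += 1
    return levels


levels = binary_tree_levels(30_000_000)
print(levels)  # 25

# Worst-case search cost if every node visited is one access.
disk_cost = levels * DISK_ACCESS_SECONDS
memory_cost = levels * MEMORY_ACCESS_SECONDS
print(f"on disk:   about {disk_cost:.3f} s per search")
print(f"in memory: about {memory_cost:.1e} s per search")
```

Under these assumed timings a single worst-case search costs tens of milliseconds on disk, which is why the number of levels matters far more than the number of comparisons.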

In order to minimize the number of disk accesses, we need to minimize the number of levels in our search tree.

Definition: a tree of order m is a tree in which each node has at most m children.

Definition: an m-way search tree T is a tree of order m such that:
- T is either empty, or
- each node has subtrees $T_0, T_1, \ldots, T_n$ and key values $K_1 < K_2 < \ldots < K_n$, where $1 \le n < m$
- for every key value V in subtree $T_i$: $V < K_1$ if $i = 0$; $K_i < V < K_{i+1}$ if $0 < i < n$; $V > K_n$ if $i = n$
- every subtree $T_i$ is also an m-way search tree.
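The definition translates fairly directly into a node type and a search routine. The following is a minimal in-memory sketch, not the course's implementation: the names `MWayNode` and `search` are hypothetical, keys are assumed to be integers, and the small 4-way tree built at the end is made up for illustration. The search also counts the nodes it visits, since each visited node would correspond to one disk access for disk-bound data.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple


class MWayNode:
    """One node of an m-way search tree (hypothetical sketch).

    keys holds K_1 < K_2 < ... < K_n in sorted order;
    children holds the n + 1 subtrees T_0 .. T_n (None means empty).
    """

    def __init__(self, keys: List[int],
                 children: Optional[List[Optional["MWayNode"]]] = None):
        self.keys = keys
        self.children = children if children is not None else [None] * (len(keys) + 1)
        assert len(self.children) == len(self.keys) + 1


def search(root: Optional[MWayNode], target: int) -> Tuple[bool, int]:
    """Return (found, nodes_visited) for target."""
    node, visited = root, 0
    while node is not None:
        visited += 1
        # i is the number of keys in this node that are smaller than target.
        i = bisect_left(node.keys, target)
        if i < len(node.keys) and node.keys[i] == target:
            return True, visited
        # Otherwise target can only lie in subtree T_i.
        node = node.children[i]
    return False, visited


# A made-up 4-way search tree with two levels.
root = MWayNode(
    [20, 40, 60],
    [MWayNode([5, 10, 15]), MWayNode([25, 30]),
     MWayNode([45, 50, 55]), MWayNode([70, 80])],
)
print(search(root, 50))  # (True, 2)
print(search(root, 65))  # (False, 2)
```

Using `bisect_left` to pick the subtree relies on the keys within a node being sorted, which the definition guarantees.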

Example: The following is a 4-way search tree: (figure not reproduced in this transcript)

Now suppose that we have a database of 30,000,000 records and that each node in our 4-way tree is full (i.e. contains 3 records). How many disk accesses will be required to retrieve a record at the bottom of the tree?

If a 4-way tree has every node filled (3 records per node) and every level filled, then it contains $(4^L - 1)/3$ nodes and therefore $4^L - 1$ records, where $L$ is the number of levels. Hence a database containing 30,000,000 records will have $\lceil \log_4(30{,}000{,}001) \rceil = 13$ levels, a definite improvement over the 25 levels of a binary search tree.

In practice, commercial databases use specialized versions of m-way search trees where m is of the order of 100. By increasing the value of m we decrease the number of levels in the tree for a fixed number of records. In the following sections we will examine some of these specialized m-way search trees.
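To see how quickly the level count drops as m grows, the same calculation can be run for several values of m. This sketch assumes the idealized case used above: every node holds m - 1 records and every level is filled, so a tree with L levels holds $m^L - 1$ records. The function name `levels_needed` is an illustrative choice.

```python
def levels_needed(n_records: int, m: int) -> int:
    """Smallest L such that a completely full m-way search tree with
    L levels (m**L - 1 records) can hold n_records records."""
    levels = 0
    while m ** levels - 1 < n_records:
        levels += 1
    return levels


for m in (2, 4, 100):
    print(m, levels_needed(30_000_000, m))
# 2 25
# 4 13
# 100 4
```

With m = 100, a worst-case search over 30,000,000 records touches only 4 nodes in this idealized setting, compared with 25 for the binary tree.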