CS 206 Introduction to Computer Science II 12 / 03 / 2008 Instructor: Michael Eckmann.



Michael Eckmann - Skidmore College - CS Fall 2008

Today's Topics
- Questions/comments?
- B-Trees, continued
- Quicksort

B-Trees
- A B-tree can guarantee only a few disk accesses.
- An M-ary B-tree has the following properties:
  - data items are stored in leaves only
  - interior (non-leaf) nodes store a maximum of M-1 keys
  - the root is either a leaf or has from 2 to M children
  - all non-leaf nodes (except the root) have between ceil(M/2) and M children
  - all leaves are at the same depth
  - all leaves hold between ceil(L/2) and L data items (what L means is covered on the next few slides)
- All non-leaf nodes have at least half of M children, so for large M this guarantees that the M-ary tree will not approach anything close to a binary tree (why is that a good thing?)


B-Trees
- Each node represents a disk block (the amount of data read in one disk access).
  - example: a disk block is 8k (= 8192 bytes)
  - each node holds M-1 keys and M branches (links)
  - let's say our keys are 32 bytes each
  - and a link is 4 bytes
- How would we determine M?
  - it is the largest value such that a node doesn't hold more than 8k
  - 32*(M-1) + 4*M <= 8192
  - M = 228 is the largest M that does not go over 8192
- Determine L by the number of records we can store in one block.
  - if our records are 256 bytes, how many can we store in one 8k block?
  - 8192/256 = 32
- From the rules of our B-tree, each leaf then has to have between 16 and 32 records, and each non-leaf node (except the root) has to have at least 114 children (up to 228 children).
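The arithmetic on this slide can be checked with a short Python sketch. The sizes are the slide's example values; the constant and function names are illustrative assumptions and do not come from the lecture.

```python
# Example sizes from the slide (bytes). Names are illustrative only.
BLOCK_SIZE = 8192
KEY_SIZE = 32
LINK_SIZE = 4
RECORD_SIZE = 256


def max_branching(block_size, key_size, link_size):
    """Largest M with key_size*(M-1) + link_size*M <= block_size."""
    # Rearranged: M <= (block_size + key_size) / (key_size + link_size)
    return (block_size + key_size) // (key_size + link_size)


def records_per_leaf(block_size, record_size):
    """Number of whole records that fit in one block."""
    return block_size // record_size


M = max_branching(BLOCK_SIZE, KEY_SIZE, LINK_SIZE)
L = records_per_leaf(BLOCK_SIZE, RECORD_SIZE)
min_children = -(-M // 2)  # ceil(M/2)
min_records = -(-L // 2)   # ceil(L/2)
print(M, L, min_children, min_records)  # 228 32 114 16
```

With M = 229 the node would need 32*228 + 4*229 = 8212 bytes, which is over the block size, so 228 is the largest value that fits.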

B-Trees
- Example:
  - From the rules of our B-tree, each leaf has to have between 16 and 32 records, and each non-leaf node (except the root) has to have at least 114 children (up to 228 children).
  - Picking the worst-case B-tree for our example, that is, the one with the fewest children per node, gives us the tallest possible B-tree for 10,000,000 records. With at least 16 records per leaf we'll have at most 625,000 leaves, which means the height of our B-tree is 4.
- The worst-case height of an M-ary B-tree is approximately log_(M/2) n. Why?
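One way to reproduce the numbers in this example is sketched below. It assumes the height is counted in levels with the root as level 1, and that the sparsest legal tree has a root with 2 children and every other interior node with ceil(M/2) children. The function name is hypothetical.

```python
import math


def worst_case_levels(num_records, M, L):
    """Return (max leaves, levels) for the sparsest legal B-tree.

    Assumption: levels are counted with the root as level 1.
    """
    min_records = -(-L // 2)    # ceil(L/2) records per leaf
    min_children = -(-M // 2)   # ceil(M/2) children per interior node
    max_leaves = num_records // min_records

    depth = 0                   # depth of the leaf level, root at depth 0
    fewest_leaves_one_deeper = 2  # the root may have as few as 2 children
    while fewest_leaves_one_deeper <= max_leaves:
        depth += 1
        fewest_leaves_one_deeper *= min_children
    return max_leaves, depth + 1


print(worst_case_levels(10_000_000, 228, 32))  # (625000, 4)

# The slide's approximation: log base M/2 of the number of leaves.
approx = math.log(625_000, 114)
print(round(approx, 2))
```

The logarithm comes out a little under 3, and adding the root level gives the 4 levels quoted on the slide.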

B-Trees
- Each disk read gets a block, which is a whole node.
- When we read an interior node's data from disk we get keys and links.
- When we read a leaf node's data from disk we get up to L data records.
- Insert examples
  - normal insert
  - may need to split a leaf into two (OR put a child up for adoption to a neighbor)
  - may need to split parents
  - may need to split the root (the root will then have 2 children)
- Splits are infrequent (for every split there will be approx. L/2 nonsplits).
- Heightening of the tree is even more infrequent (the only way the tree gets higher is when an insertion leads to splitting the root).
  - notice: for a tree with four levels, the root was only split 3 times during all those inserts. For M and L as set in the example, that is 3 times in 10,000,000 inserts.
- Splits and heightening cause slow processing when they happen (because there are extra disk reads and writes), but they don't happen often.

B-Trees
- Let's see some insertions into an existing B-tree.
  - a 5-ary B-tree, M=5 (with L=5 too), will be shown on the board.
- First verify visually that it is indeed a 5-ary B-tree with L=5.
- Insert 57 and rearrange the leaf (1 disk access).
- Insert 55. The leaf is full (it already has L=5 items). With the 55 it would have L+1 items. (L+1)/2 is > L/2, so we can split it into 2 leaves.
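The two insertions can be sketched on a single leaf in Python. The tree drawn on the board is not part of the transcript, so the starting leaf contents below are made up for illustration, and the function name is hypothetical.

```python
import bisect

L = 5  # maximum number of items per leaf, as in the board example


def insert_into_leaf(leaf, key, max_items=L):
    """Insert key into a sorted leaf (a Python list, modified in place).

    Returns (left, right). right is None when no split was needed.
    """
    bisect.insort(leaf, key)
    if len(leaf) <= max_items:
        return leaf, None
    mid = (len(leaf) + 1) // 2   # both halves keep at least ceil(L/2) items
    return leaf[:mid], leaf[mid:]


leaf = [50, 52, 54, 56]                     # hypothetical leaf with room for one more
left, right = insert_into_leaf(leaf, 57)
print(left, right)   # [50, 52, 54, 56, 57] None

left, right = insert_into_leaf(left, 55)    # leaf is full, so it splits
print(left, right)   # [50, 52, 54] [55, 56, 57]
```

After a split the parent gains a child, so it also needs a new key (here the smallest key of the right leaf, 55). If the parent is already full it has to split too, which is how a split can travel up to the root.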

B-Trees
- What would be the first step in deletion of a node?
- Do we have to check anything after the deletion of a node?

B-Trees
- Delete examples:
  - normal delete
  - may need to adopt from a neighbor if the leaf goes below the minimum
    - ok if the neighbor is not already at the minimum
    - if the neighbor is at the minimum too, join the two leaves together to get one full leaf
  - this implies that the parent loses a child
  - if the parent is now below its minimum, continue up the tree
  - if the root ever loses a child and is left with only 1 child, reduce the height of the tree by one and make that child the root
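The adopt-or-join decision for a leaf can be sketched as follows. This is a simplified illustration that assumes the neighbor is the right sibling (all of its keys are larger) and ignores the parent's keys; the function name and the sample values are hypothetical.

```python
def fix_underfull_leaf(leaf, neighbor, L=5):
    """Repair a leaf that has dropped below ceil(L/2) items.

    Assumes neighbor is the right sibling. Returns (leaf, neighbor),
    where neighbor is None if the two leaves were joined.
    """
    min_items = -(-L // 2)  # ceil(L/2)
    if len(neighbor) > min_items:
        # The neighbor can spare one: adopt its smallest item.
        return leaf + [neighbor[0]], neighbor[1:]
    # The neighbor is at the minimum too: join the two leaves.
    return leaf + neighbor, None


print(fix_underfull_leaf([10, 12], [20, 22, 24, 26]))  # ([10, 12, 20], [22, 24, 26])
print(fix_underfull_leaf([10, 12], [20, 22, 24]))      # ([10, 12, 20, 22, 24], None)
```

In the second call the parent loses a child, so the same minimum check has to be repeated one level up.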


Quicksort
- Anyone remember MergeSort? What kind of algorithm was that?
  - Divide and Conquer
- It divided the list in half, did MergeSort on each half, then combined the 2 halves. How did it do this?
- The divide part of MergeSort is simple. The conquer part of MergeSort is more time consuming.
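As a reminder of how the two sorted halves get combined, here is a minimal MergeSort sketch in Python. It returns a new list rather than sorting in place, which is a simplification chosen for this illustration.

```python
def merge(left, right):
    """Combine two already-sorted lists into one sorted list."""
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        if left[i] <= right[j]:
            merged.append(left[i])
            i += 1
        else:
            merged.append(right[j])
            j += 1
    # One side is exhausted; the rest of the other is already in order.
    merged.extend(left[i:])
    merged.extend(right[j:])
    return merged


def merge_sort(items):
    """Divide in half (cheap), sort each half, then merge (the real work)."""
    if len(items) <= 1:
        return list(items)
    mid = len(items) // 2
    return merge(merge_sort(items[:mid]), merge_sort(items[mid:]))


print(merge_sort([38, 27, 43, 3, 9, 82, 10]))  # [3, 9, 10, 27, 38, 43, 82]
```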

Quicksort
- Quicksort is a Divide and Conquer sort algorithm as well.
- But instead of dividing the list into equal-size halves, we divide it in some other way into two sublists.
- The DIVIDE part of Quicksort is more complex than in MergeSort, but the CONQUER part of Quicksort is much simpler than in MergeSort.

Quicksort
- Quicksort algorithm
  1) if the size of the list L is 0 or 1, return
  2) pick some element in the list as a pivot element
  3) divide the remaining elements (minus the pivot) of L into two groups: L1, the elements less than the pivot, and L2, the elements greater than or equal to the pivot
  4) return Quicksort(L1), followed by the pivot, followed by Quicksort(L2)
- Depending on which element is the pivot, the sizes of the two sides could differ greatly.
- Compared to MergeSort, Quicksort does not guarantee equal-size portions to sort (which is bad).
- But the divide stage can be done in place (without any additional space like another array).
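The four steps translate almost line for line into Python. This sketch builds new lists, so it does not show the in-place divide, and picking the first element as the pivot is an assumption made only to keep the example short.

```python
def quicksort(items):
    """Return a sorted copy of items, following the four steps literally."""
    # 1) a list of size 0 or 1 is already sorted
    if len(items) <= 1:
        return list(items)
    # 2) pick a pivot (first element here, purely for simplicity)
    pivot = items[0]
    rest = items[1:]
    # 3) divide the remaining elements into L1 (< pivot) and L2 (>= pivot)
    smaller = [x for x in rest if x < pivot]
    larger_or_equal = [x for x in rest if x >= pivot]
    # 4) Quicksort(L1), then the pivot, then Quicksort(L2)
    return quicksort(smaller) + [pivot] + quicksort(larger_or_equal)


print(quicksort([8, 1, 4, 9, 0, 3, 5, 2, 7, 6]))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```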

Quicksort
- To pick some element in the list as the pivot we can either
  - pick the first (bad if the list is almost sorted, why?)
  - pick a random one (random number generation is time consuming)
  - pick the median of 3 elements (say the median of the first, middle and last element), which is a good way
    - not much extra work
    - the almost-sorted case isn't a problem for this
- Divide strategy: how to divide our list into two sublists, less than the pivot and greater than the pivot (assume all elements are distinct for now).
- The strategy about to be described gives good results.
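A median-of-3 pivot choice might look like the sketch below. The function name and the use of inclusive `lo`/`hi` bounds are assumptions for this illustration.

```python
def median_of_three_index(a, lo, hi):
    """Index of the median of a[lo], a[middle] and a[hi] (bounds inclusive)."""
    mid = (lo + hi) // 2
    # Sort the three (value, index) pairs and take the middle one.
    candidates = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return candidates[1][1]


already_sorted = [1, 2, 3, 4, 5]
print(median_of_three_index(already_sorted, 0, 4))  # 2
```

On an already sorted list this picks the middle element, which splits the list evenly. Picking the first element would leave one side empty at every level.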

Quicksort
- Divide strategy
  1) swap the pivot with the last element in the list
  2) start index i pointing to the first element in the list and index j to the next-to-last element
  3) while (i <= j && element at i < pivot) increment i
  4) while (i <= j && element at j > pivot) decrement j
  5) if (i < j) then the element at i is > pivot and the element at j is < pivot, so we swap them and repeat from step 3
  6) when i > j, we swap the pivot that is in the last place with the element at i
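A possible in-place version of this divide strategy is sketched below. The function name and parameters are assumptions. The scans use `i <= j` so that the two indexes can cross, and, going beyond the slide, both indexes are advanced after a swap so that elements equal to the pivot cannot cause an endless loop.

```python
def partition(a, lo, hi, pivot_index):
    """Partition a[lo..hi] (inclusive) around a[pivot_index].

    Returns the final index of the pivot.
    """
    a[pivot_index], a[hi] = a[hi], a[pivot_index]   # step 1: park the pivot last
    pivot = a[hi]
    i, j = lo, hi - 1                               # step 2
    while True:
        while i <= j and a[i] < pivot:              # step 3
            i += 1
        while i <= j and a[j] > pivot:              # step 4
            j -= 1
        if i >= j:                                  # indexes met or crossed
            break
        a[i], a[j] = a[j], a[i]                     # step 5
        i += 1                                      # extra: keeps duplicates moving
        j -= 1
    a[i], a[hi] = a[hi], a[i]                       # step 6: pivot to its final place
    return i


data = [8, 1, 4, 9, 0, 3, 5, 2, 7, 6]
p = partition(data, 0, len(data) - 1, 0)   # use the first element (8) as pivot
print(p, data[p])  # 8 8
```

After the call, everything to the left of index `p` is no larger than the pivot and everything to the right is no smaller.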

Quicksort
- Notice that in the best case, if Quicksort could divide the list into equal portions at each level, we would have the fewest recursion levels: O(log_2 n).
- The work to be done on each level is on the order of n.
- So in the best case Quicksort is O(n log_2 n).
- Any ideas on what it'd be in the worst case?

Quicksort
- Let's write Quicksort
  - We can make quicksort a recursive method that takes in
    - an array
    - the starting index of the data to be sorted
    - the number of elements of the data to be sorted
  - quicksort will call a method to partition the elements
    - find a pivot and divide a (portion of a) list into elements less than the pivot, followed by the pivot, followed by elements greater than the pivot
  - This method will take in
    - an array
    - the starting index of the data to be sorted
    - the number of elements of the data to be sorted
  - and it will return the pivot index and alter the order of the elements of a subset of the array passed in
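The code written in class is not part of the transcript. The sketch below is one way to match the description: both functions take an array, a starting index and a number of elements, and `partition` returns the pivot index. Using the middle element as the pivot is an assumption of this sketch.

```python
def partition(a, start, n):
    """Partition a[start .. start+n-1] in place; return the pivot's final index."""
    last = start + n - 1
    mid = start + n // 2
    a[mid], a[last] = a[last], a[mid]      # assumed pivot choice: middle element
    pivot = a[last]
    i, j = start, last - 1
    while True:
        while i <= j and a[i] < pivot:
            i += 1
        while i <= j and a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]
        i += 1
        j -= 1
    a[i], a[last] = a[last], a[i]
    return i


def quicksort(a, start=0, n=None):
    """Sort n elements of a, beginning at index start, in place."""
    if n is None:
        n = len(a) - start
    if n <= 1:
        return
    p = partition(a, start, n)
    quicksort(a, start, p - start)               # elements before the pivot
    quicksort(a, p + 1, start + n - p - 1)       # elements after the pivot


values = [8, 1, 4, 9, 0, 3, 5, 2, 7, 6]
quicksort(values)
print(values)  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```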

Quicksort
- A typical speedup for Quicksort is to do the following:
  - when we get down to some small number of elements (say 10) in our list, instead of using quicksort on them, we do insertion sort.
  - Let's use that xSort applet to visualize insertion sort.
- How would we alter the code we just wrote to do insertion sort when the number of elements to sort is small?
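One possible answer is sketched below: the base case of the recursion hands small ranges to insertion sort. The `CUTOFF` name is hypothetical, and the `partition` here is a deliberately short stand-in (last element as pivot, one left-to-right scan), not the divide strategy from the earlier slide.

```python
CUTOFF = 10  # "some small number of elements (say 10)"


def insertion_sort(a, start, n):
    """Sort a[start .. start+n-1] in place by inserting one item at a time."""
    for k in range(start + 1, start + n):
        item = a[k]
        pos = k
        while pos > start and a[pos - 1] > item:
            a[pos] = a[pos - 1]    # shift larger items one slot right
            pos -= 1
        a[pos] = item


def partition(a, start, n):
    """Stand-in partition: last element as pivot, single scan."""
    last = start + n - 1
    pivot = a[last]
    boundary = start               # a[start..boundary-1] holds items < pivot
    for k in range(start, last):
        if a[k] < pivot:
            a[k], a[boundary] = a[boundary], a[k]
            boundary += 1
    a[boundary], a[last] = a[last], a[boundary]
    return boundary


def quicksort(a, start=0, n=None):
    if n is None:
        n = len(a) - start
    if n <= CUTOFF:                # the only change: small ranges skip the recursion
        insertion_sort(a, start, n)
        return
    p = partition(a, start, n)
    quicksort(a, start, p - start)
    quicksort(a, p + 1, start + n - p - 1)
```

Insertion sort does very little work per element on short ranges, which is why handing off to it can beat recursing all the way down to lists of size 1.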