Sorting (Gordon College) 13.1 Some O(n²) Sorting Schemes 13.2 Heaps, Heapsort, and Priority Queues 13.3 Quicksort 13.4 Mergesort 13.5 Radix Sort

Sorting Consider a list x1, x2, x3, …, xn. We seek to arrange the elements of the list in order, ascending or descending. Some O(n²) schemes are easy to understand and implement but inefficient for large data sets.

Categories of Sorting Algorithms 1. Selection sort Make passes through a list; on each pass, reposition one element correctly: look for the smallest element in the list and place it in the 1st position, then start the process over with the remainder of the list.

Selection Recursive Algorithm If the list has only 1 element (ANCHOR), stop: the list is sorted. Else do the following: a. Find the smallest element and place it in front. b. Sort the rest of the list. Complexity? The first pass compares n - 1 elements to find the smallest, the second pass compares n - 2, and so on. Total comparisons: (n-1) + (n-2) + … + 2 + 1 = n(n-1)/2 = (n² - n)/2, which is O(n²). (See the example called recursive_selection.cpp; a sketch follows.)
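
The slides refer to recursive_selection.cpp, which is not reproduced here; the following is a minimal C++ sketch of a recursive selection sort along the lines described above (the function name and the use of std::min_element are illustrative choices, not taken from that file).

#include <algorithm>  // std::swap, std::min_element
#include <cstddef>
#include <iostream>
#include <vector>

// Recursive selection sort: place the smallest remaining element at
// position `first`, then recursively sort the rest of the list.
void selectionSort(std::vector<int>& x, std::size_t first = 0) {
    if (first + 1 >= x.size()) return;                  // anchor: 0 or 1 elements left
    auto smallest = std::min_element(x.begin() + first, x.end());
    std::swap(x[first], *smallest);                     // put the smallest in front
    selectionSort(x, first + 1);                        // sort the remainder
}

int main() {
    std::vector<int> v{67, 33, 21, 84, 49, 50, 75};
    selectionSort(v);
    for (int n : v) std::cout << n << ' ';              // 21 33 49 50 67 75 84
    std::cout << '\n';
}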

Categories of Sorting Algorithms 2. Exchange sort Systematically interchange pairs of elements which are out of order; bubble sort does this. When you reach the end of a section without exchanging, the list is in order; you can also shorten the section based on the last pair to be exchanged. Out of order: exchange. In order: do not exchange.

Bubble Sort Algorithm
1. Initialize numCompares to n - 1
2. While numCompares != 0, do the following:
   a. Set last = 1 // location of last element involved in a swap
   b. For i = 1 to numCompares: if x[i] > x[i+1], swap x[i] and x[i+1] and set last = i
   c. Set numCompares = last - 1
End while

Bubble Sort Algorithm Trace on the list 45 67 12 34 25 39:
Pass 1: 45 67 12 34 25 39 → 45 12 67 34 25 39 → 45 12 34 67 25 39 → 45 12 34 25 67 39 → 45 12 34 25 39 67
Pass 2: 45 12 34 25 39 67 → 12 45 34 25 39 67 → 12 34 45 25 39 67 → 12 34 25 45 39 67 → 12 34 25 39 45 67
… and so on.
Tracking last allows the algorithm to quit early if the list is already in order (try: 23 12 34 45 67), and also lets us label the highest elements as sorted. A C++ sketch follows.
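
A minimal C++ sketch of the bubble sort described above, including the early-exit optimization based on the position of the last swap (the 0-based indexing and the function name are illustrative choices, not from the slides).

#include <algorithm>  // std::swap
#include <cstddef>
#include <vector>

// Bubble sort with early exit: after each pass, everything beyond the
// position of the last swap is already in its final place.
void bubbleSort(std::vector<int>& x) {
    std::size_t numCompares = x.empty() ? 0 : x.size() - 1;
    while (numCompares != 0) {
        std::size_t last = 0;                         // index of the last swap this pass
        for (std::size_t i = 0; i < numCompares; ++i) {
            if (x[i] > x[i + 1]) {
                std::swap(x[i], x[i + 1]);
                last = i;
            }
        }
        numCompares = last;                           // shrink the unsorted section
    }
}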

Categories of Sorting Algorithms 3. Insertion sort Repeatedly insert a new element into an already sorted list Note this works well with a linked list implementation

Example of Insertion Sort Given the list to be sorted: 67, 33, 21, 84, 49, 50, 75. Note the sequence of steps carried out: the current number is pulled out into a temporary location (position 0) so that numbers can be moved over if necessary. Position 0 is used to store the item that is being inserted; the result is a sorted array from x[1] to x[n]. A sketch follows.
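
A sketch of the insertion sort the example describes, using a 1-based array in which position 0 holds the element currently being inserted (this layout follows the slide's description; the function name is illustrative).

#include <cstddef>
#include <iostream>
#include <vector>

// Insertion sort on x[1..n]; x[0] is a temporary slot for the element
// currently being inserted, which also acts as a sentinel for the shift loop.
void insertionSort(std::vector<int>& x) {
    for (std::size_t i = 2; i < x.size(); ++i) {
        x[0] = x[i];                        // pull the current element out
        std::size_t j = i;
        while (x[j - 1] > x[0]) {           // shift larger elements right
            x[j] = x[j - 1];
            --j;
        }
        x[j] = x[0];                        // drop the element into place
    }
}

int main() {
    std::vector<int> x{0 /* temp slot */, 67, 33, 21, 84, 49, 50, 75};
    insertionSort(x);
    for (std::size_t i = 1; i < x.size(); ++i) std::cout << x[i] << ' ';
    std::cout << '\n';                      // 21 33 49 50 67 75 84
}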

Improved Schemes These three sorts have computing time O(n²). We seek improved computing times for sorts of large data sets. There are sorting schemes which can be proven to have average computing time O(n log₂ n). There is no universally good sorting scheme; results may depend on the order of the list.

Comparisons of Sorts Sort of a randomly generated list of 500 items (note: times are on 1970s hardware):
Algorithm           Type of Sort   Time (sec)
Simple selection    Selection      69
Heapsort            Selection      18
Bubble sort         Exchange       165
2-way bubble sort   Exchange       141
Quicksort           Exchange       6
Linear insertion    Insertion      66
Binary insertion    Insertion      37
Shell sort          Insertion      11

Indirect Sorts What happens if the items being sorted are large structures (like objects)? Data transfer/swapping time becomes unacceptable. The alternative is an indirect sort: use an index table to store the positions of the objects and manipulate the index table for ordering. A sketch follows.
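
A minimal sketch of an indirect sort: the records themselves never move, only an index table is sorted (the Record type, its fields, and the use of std::sort with a comparator are illustrative assumptions).

#include <algorithm>
#include <cstddef>
#include <iostream>
#include <numeric>   // std::iota
#include <string>
#include <vector>

struct Record {                  // stand-in for a large object
    int key;
    std::string payload;
};

int main() {
    std::vector<Record> items{{42, "b"}, {7, "a"}, {99, "c"}};

    // Index table: one entry per record, initially 0, 1, 2, ...
    std::vector<std::size_t> index(items.size());
    std::iota(index.begin(), index.end(), 0);

    // Sort the indices by comparing the records they refer to;
    // the (possibly large) records themselves are never swapped.
    std::sort(index.begin(), index.end(),
              [&](std::size_t a, std::size_t b) { return items[a].key < items[b].key; });

    for (std::size_t i : index)
        std::cout << items[i].key << ' ' << items[i].payload << '\n';
}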

Heaps A heap is a binary tree with two properties: 1. It is complete: each level of the tree is completely filled, except possibly the bottom level (whose nodes are in the leftmost positions). 2. It satisfies the heap-order property: the data in each node >= the data in its children. Heapsort is a type of selection sort that uses a new data structure, the heap, to organize the data in such a way as to make the sort efficient.

Heaps Which of the following trees (A, B, C in the figure) are heaps? Only tree A is a heap; B is not complete, and C is out of order.

Maximum and Minimum Heaps Example

Implementing a Heap Use an array or vector Number the nodes from top to bottom, then on each row – left to right. Store data in ith node in ith location of array (vector)

Implementing a Heap Note the placement of the nodes in the array.

Implementing a Heap In an array implementation, the children of the ith node are at myArray[2*i] and myArray[2*i+1], and the parent of the ith node is at myArray[i/2]. Therefore the parent at index 3 has children at indices 6 and 7.

Basic Heap Operations Constructor: set mySize to 0 and allocate the array (if a dynamic array is used). Empty: check the value of mySize. Retrieve max item: return the root of the binary tree, myArray[1].

Basic Heap Operations Delete max item (popHeap): the max item is the root; replace it with the last node in the tree. The result is called a semiheap. Then interchange the root with the larger of its two children, and continue this with the resulting subtree(s); the result is a new heap.

Exchanging elements when performing a popHeap()

Adjusting the heap for popHeap()

Percolate Down Algorithm Converts a semiheap to a heap. Given r = index of the current root node and n = number of nodes, with only the value at myArray[r] possibly failing the heap-order condition in myArray[r], …, myArray[n]:
1. Set c = 2 * r // location of left child
2. While c <= n do the following: // the root must have at least one child
   a. If c < n and myArray[c] < myArray[c + 1], increment c by 1 // find the larger child
   b. If myArray[r] < myArray[c]:
      i. Swap myArray[r] and myArray[c]
      ii. Set r = c
      iii. Set c = 2 * r
      Else terminate the repetition
End while
Recursive possibilities? A C++ sketch follows.
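
A C++ sketch of the percolate-down step on a 1-based array, assuming heap[0] is unused so that the children of node i sit at 2i and 2i+1 as in the slides (the function name is illustrative).

#include <algorithm>  // std::swap
#include <cstddef>
#include <vector>

// Convert the semiheap rooted at r (1-based indexing, heap[0] unused)
// into a heap; n is the index of the last node in use.
void percolateDown(std::vector<int>& heap, std::size_t r, std::size_t n) {
    std::size_t c = 2 * r;                          // left child
    while (c <= n) {
        if (c < n && heap[c] < heap[c + 1]) ++c;    // pick the larger child
        if (heap[r] < heap[c]) {
            std::swap(heap[r], heap[c]);
            r = c;
            c = 2 * r;
        } else {
            break;                                  // heap order restored
        }
    }
}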

Basic Heap Operations Insert an item (pushHeap) Amounts to a percolate up routine Place new item at end of array Interchange with parent so long as it is greater than its parent
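
A matching sketch of the percolate-up step behind pushHeap, again on a 1-based array with the parent of node i at i/2, assuming the vector already carries a dummy element at index 0 (names are illustrative).

#include <algorithm>  // std::swap
#include <cstddef>
#include <vector>

// Insert an item into a heap stored in heap[1..n] (heap[0] unused):
// place it at the end of the array, then swap it with its parent
// as long as it is greater than its parent.
void pushHeap(std::vector<int>& heap, int item) {
    heap.push_back(item);                       // new item at the end
    std::size_t i = heap.size() - 1;            // its 1-based position
    while (i > 1 && heap[i / 2] < heap[i]) {    // parent is smaller: percolate up
        std::swap(heap[i], heap[i / 2]);
        i /= 2;
    }
}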

Example of Heap Before and After Insertion of 50

Reordering the tree for the insertion

Heapsort Given a list of numbers in an array, stored as a complete binary tree: convert it to a heap (heapify). Begin at the last node that is not a leaf, apply "percolate down" to this subtree, and continue with each earlier node. Step 1 converts the array of numbers into a heap (heapifies the list). To find the last non-leaf node, go to the last item and calculate backwards (its parent is at index n/2).

Example of Heapifying a Vector

Example of Heapifying a Vector How are these nodes stored in an array? 9 12 17 30 50 20 60 65 4 19

Example of Heapifying a Vector Remember: 17 is compared to both children; the larger of the children gets brought up.

Example of Heapifying a Vector

Example of Heapifying a Vector

Example of Heapifying a Vector

Heapsort Algorithm for converting a complete binary tree to a heap, called "heapify":
For r = n/2 down to 1:
   Apply percolate_down to the subtree in myArray[r], …, myArray[n]
End for
This puts the largest element at the root. Here n is the index of the last node in the tree, so n/2 is the index of its parent, the last non-leaf node; that is what determines how the for loop runs.

Heapsort Now swap element 1 (root of tree) with last element This puts largest element in correct location Use percolate down on remaining sublist Converts from semi-heap to heap

Heapsort Again swap the root with the rightmost leaf. Continue this process with a shrinking sublist.

Heapsort Algorithm
1. Consider x as a complete binary tree and use heapify to convert this tree to a heap.
2. For i = n down to 2:
   a. Interchange x[1] and x[i] (puts the largest element at the end)
   b. Apply percolate_down to convert the binary tree corresponding to the sublist x[1] .. x[i-1]
Notice how the list shrinks with each loop… A C++ sketch follows.
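
Putting the pieces together, a heapsort sketch on a 1-based array (x[0] unused); the percolateDown helper from the earlier sketch is repeated so this compiles on its own. This is an illustrative composition, not the textbook's exact code.

#include <algorithm>  // std::swap
#include <cstddef>
#include <iostream>
#include <vector>

// Percolate-down on a 1-based array (x[0] unused), as sketched earlier.
void percolateDown(std::vector<int>& x, std::size_t r, std::size_t n) {
    for (std::size_t c = 2 * r; c <= n; c = 2 * r) {
        if (c < n && x[c] < x[c + 1]) ++c;          // larger child
        if (x[r] < x[c]) { std::swap(x[r], x[c]); r = c; }
        else break;
    }
}

// Heapsort on x[1..n].
void heapSort(std::vector<int>& x) {
    std::size_t n = x.size() - 1;
    for (std::size_t r = n / 2; r >= 1; --r)        // step 1: heapify
        percolateDown(x, r, n);
    for (std::size_t i = n; i >= 2; --i) {          // step 2: shrink and repair
        std::swap(x[1], x[i]);                      // largest element to the end
        percolateDown(x, 1, i - 1);
    }
}

int main() {
    std::vector<int> x{0 /* dummy slot */, 50, 20, 75, 35, 25};
    heapSort(x);
    for (std::size_t i = 1; i < x.size(); ++i) std::cout << x[i] << ' ';
    std::cout << '\n';                              // 20 25 35 50 75
}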

Example of Implementing heap sort int arr[] = {50, 20, 75, 35, 25}; vector<int> v(arr, arr + 5);

Example of Implementing heap sort

Example of Implementing heap sort

Heap Algorithms in STL Found in the <algorithm> library:
make_heap() : heapify
push_heap() : insert
pop_heap() : delete
sort_heap() : heapsort
See the example heapsortSTL in the folder "cs212" and heapsortSTLBugNumber in the folder "BigNumber". A usage sketch follows.
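
A short usage sketch of these four STL calls on a vector of ints (generic STL usage, not the referenced heapsortSTL example).

#include <algorithm>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v{50, 20, 75, 35, 25};

    std::make_heap(v.begin(), v.end());     // heapify: 75 is now at v.front()

    v.push_back(60);
    std::push_heap(v.begin(), v.end());     // insert 60 into the heap

    std::pop_heap(v.begin(), v.end());      // move the max to v.back()
    std::cout << "popped " << v.back() << '\n';   // 75
    v.pop_back();

    std::sort_heap(v.begin(), v.end());     // heapsort the remaining heap
    for (int n : v) std::cout << n << ' ';  // 20 25 35 50 60
    std::cout << '\n';
}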

Priority Queue A collection of data elements in which items are stored in order by priority; higher-priority items are removed ahead of lower-priority ones. Implementation?

Implementation possibilities
Unordered list (array, vector, linked list): insert O(1), remove max O(n)
Ordered list (linear insertion sort): insert O(n), remove max O(1)
Heap (best): basic operations have O(log₂ n) time

Priority Queue Basic Operations Constructor Insert Find, remove smallest/largest (priority) element Replace Change priority Delete an item Join two priority queues into a larger one

Priority Queue The STL priority_queue adapter uses a heap.

priority_queue<BigNumber, vector<BigNumber> > v;
BigNumber a;                      // value read from the user
cout << "BIG NUMBER DEMONSTRATION" << endl;
for (int k = 0; k < 6; k++)
{
    cout << "Enter BigNumber: ";
    cin >> a;
    v.push(a);
}
cout << "POP IN ORDER" << endl;
while (!v.empty())
{
    cout << v.top() << endl;
    v.pop();                      // braces keep pop() inside the loop
}

Basic operations - page 751

Quicksort A more efficient exchange sorting scheme than bubble sort (bubble sort is an exchange sort). A typical exchange involves elements that are far apart, so fewer interchanges are required to correctly position an element. Quicksort uses a divide-and-conquer strategy, a recursive approach: the original problem is partitioned into simpler subproblems, and each subproblem is considered independently. Subdivision continues until the subproblems obtained are simple enough to be solved directly.

Quicksort Basic Algorithm Choose an element as the pivot P. Perform a sequence of exchanges so that the list becomes <elements less than P> <P> <elements greater than P>: all elements less than the pivot are to its left, and all elements greater than the pivot are to its right. This divides the (sub)list into two smaller sublists, each of which may then be sorted independently in the same way.

Quicksort recursive If the list has 0 or 1 elements, ANCHOR return. // the list is sorted Else do: Pick an element in the list to use as the pivot.   Split the remaining elements into two disjoint groups: SmallerThanPivot = {all elements < pivot} LargerThanPivot = {all elements > pivot}    Return the list rearranged as: Quicksort(SmallerThanPivot), pivot, Quicksort(LargerThanPivot).

Quicksort Example Given to sort: 75, 70, 65, 84, 98, 78, 100, 93, 55, 61, 81, 68. Arbitrarily select the pivot: the first element, 75. Search from the right for elements <= 75, stopping at the first match; search from the left for elements > 75, stopping at the first match. Swap these two elements, and then repeat this process. When can you stop?

Quicksort Example When done, swap with the pivot: 75, 70, 65, 68, 61, 55, 100, 93, 78, 98, 81, 84. This SPLIT operation places pivot 75 so that all elements to its left are <= 75 and all elements to its right are > 75; 75 is in place. We now need to sort the sublists on either side of 75. "Swap with pivot" means swapping the pivot with the last of the low numbers, so we must keep track of that position.

Quicksort Example Need to sort (independently): 55, 70, 65, 68, 61 and 100, 93, 78, 98, 81, 84 Let pivot be 55, look from each end for values larger/smaller than 55, swap Same for 2nd list, pivot is 100 Sort the resulting sublists in the same manner until sublist is trivial (size 0 or 1)

QuickSort Recursive Function

template <typename ElementType>
void quicksort(ElementType x[], int first, int last)
{
    int pos;                            // pivot's final position
    if (first < last)                   // list size is > 1
    {
        split(x, first, last, pos);     // split into 2 sublists
        quicksort(x, first, pos - 1);   // sort left sublist
        quicksort(x, pos + 1, last);    // sort right sublist
    }
}

Code of the split function:

template <typename ElementType>
void split(ElementType x[], int first, int last, int & pos)
{
    ElementType pivot = x[first];       // pivot element
    int left = first,                   // index for left search
        right = last;                   // index for right search
    while (left < right)
    {
        while (pivot < x[right])        // search from right for
            right--;                    //   element <= pivot
        while (left < right &&          // search from left for
               x[left] <= pivot)        //   element > pivot
            left++;
        if (left < right)               // if searches haven't met,
            swap(x[left], x[right]);    //   interchange elements
    }
    // End of searches; place pivot in correct position
    pos = right;
    x[first] = x[pos];
    x[pos] = pivot;
}

Quicksort Visual example of a quicksort on an array etc. …

QuickSort Example v = {800, 150, 300, 650, 550, 500, 400, 350, 450, 900}. Note the use of v[0]: the item indexed by 0 holds the pivot. Step 1: the pivot is selected (at random).

QuickSort Example

QuickSort Example

QuickSort Example

QuickSort Example quicksort(x, 0, 4); quicksort(x, 6, 9);

QuickSort Example quicksort(x, 0, 0); quicksort(x, 2, 4);

QuickSort Example quicksort(x, 6, 6); quicksort(x, 8, 9);

QuickSort Example

Quicksort Performance O(n log₂ n) is the average-case computing time, when the pivot results in sublists of approximately the same size. O(n²) is the worst case: a list that is already ordered or in reverse order, where split() repeatedly creates a sublist with one element (the pivot is always the smallest or largest value). Examples: 12 34 45 56 78 88 90 100, and 99 45 12 67 32 56 90 2 15 – what 2 pivots would result in an empty sublist?

Improvements to Quicksort An arbitrary pivot gives a poor partition for nearly sorted lists (or lists in reverse), such as 12 34 45 56 78 88 90 100: virtually all the elements go into either SmallerThanPivot or LargerThanPivot through all the recursive calls, and quicksort takes quadratic time to do essentially nothing at all.

Improvements to Quicksort A better method for selecting the pivot is the median-of-three rule: select the median (middle value) of the first, middle, and last elements in each sublist as the pivot. For example, the median of (4, 10, 6) is 6; for 34, 66, 21, select 34 as the pivot. Often the list to be sorted is already partially ordered, and the median-of-three rule will select a pivot closer to the middle of the sublist than the "first-element" rule will. A sketch follows.
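
A small sketch of the median-of-three selection described above; it returns the index of the chosen pivot, which a caller could swap into the first position before invoking the split() shown earlier (the function name and return convention are illustrative).

#include <algorithm>  // std::swap
#include <cstddef>
#include <iostream>

// Return the index of the median of x[first], x[middle], and x[last].
template <typename T>
std::size_t medianOfThree(const T x[], std::size_t first, std::size_t last) {
    std::size_t a = first, b = first + (last - first) / 2, c = last;
    if (x[b] < x[a]) std::swap(a, b);
    if (x[c] < x[b]) std::swap(b, c);
    if (x[b] < x[a]) std::swap(a, b);   // now x[a] <= x[b] <= x[c]
    return b;                           // b indexes the median value
}

int main() {
    int x[] = {4, 10, 6};
    std::cout << x[medianOfThree(x, 0, 2)] << '\n';   // 6
    int y[] = {34, 66, 21};
    std::cout << y[medianOfThree(y, 0, 2)] << '\n';   // 34
}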

Improvements to Quicksort Quicksort is a recursive function stack of activation records must be maintained by system to manage recursion. The deeper the recursion is, the larger this stack will become. (major overhead) The depth of the recursion and the corresponding overhead can be reduced sort the smaller sublist at each stage

Improvements to Quicksort Another improvement aimed at reducing the overhead of recursion is to use an iterative version of Quicksort() Implementation: use a stack to store the first and last positions of the sublists sorted "recursively". In other words – create your own low-overhead execution stack.

Improvements to Quicksort For small files (n <= 20), quicksort is worse than insertion sort; small files occur often because of recursion. Use an efficient sort (e.g., insertion sort) for small files. Better yet, use Quicksort() until sublists are of a small size and then apply an efficient sort like insertion sort.

Mergesort Sorting schemes are either internal (designed for data items stored in main memory) or external (designed for data items stored in secondary memory). The previous sorting schemes were all internal sorting algorithms: they required direct access to list elements (not possible for sequential files) and made many passes through the list (not practical for files).

Mergesort Mergesort can be used both as an internal and an external sort. The basic operation in mergesort is merging: combining two lists that have previously been sorted so that the resulting list is also sorted.

Merge Algorithm
1. Open File1 and File2 for input, File3 for output.
2. Read the first element x from File1 and the first element y from File2.
3. While neither eof of File1 nor eof of File2 has been reached:
   If x < y then
      a. Write x to File3
      b. Read a new x value from File1
   Otherwise
      a. Write y to File3
      b. Read a new y from File2
   End while
4. If eof of File1 was encountered, copy the rest of File2 into File3. If eof of File2 was encountered, copy the rest of File1 into File3.
A C++ sketch follows.
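
A sketch of this file-based merge using C++ streams of whitespace-separated ints; the file names and the int element type are assumptions for illustration.

#include <fstream>

// Merge two sorted files of ints into a third file.
int main() {
    std::ifstream file1("file1.txt"), file2("file2.txt");   // assumed input names
    std::ofstream file3("file3.txt");

    int x, y;
    bool have1 = static_cast<bool>(file1 >> x);   // first element from File1
    bool have2 = static_cast<bool>(file2 >> y);   // first element from File2

    while (have1 && have2) {
        if (x < y) { file3 << x << '\n'; have1 = static_cast<bool>(file1 >> x); }
        else       { file3 << y << '\n'; have2 = static_cast<bool>(file2 >> y); }
    }
    while (have1) { file3 << x << '\n'; have1 = static_cast<bool>(file1 >> x); }  // rest of File1
    while (have2) { file3 << y << '\n'; have2 = static_cast<bool>(file2 >> y); }  // rest of File2
}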

Binary Merge Sort Given a single file F, split it into two files by alternately taking a value from F and placing it into F1 and F2.

Binary Merge Sort Merge first one-element "subfile" of F1 with first one-element subfile of F2 Gives a sorted two-element subfile of F Continue with rest of one-element subfiles If either F1 or F2 contains a remaining sublist – simply copy that to F

Binary Merge Sort Split again Merge again as before Each time, the size of the sorted subgroups doubles

Binary Merge Sort The last splitting gives two files, each in order; the last merging yields a single file, entirely in order. Note we are always limited to subfiles of some power of 2. Criticism: the method restricts itself to subfiles of size 1, 2, 4, 8, …, 2^k, where 2^k >= size of F, and so must go through a series of k split-merge phases. Allowing other sizes reduces the number of phases when files contain longer "runs" of ordered elements.

Natural Merge Sort Allows sorted subfiles of other sizes Number of phases can be reduced when file contains longer "runs" of ordered elements Consider file to be sorted, note in order groups Takes advantage of natural longer “runs”

Natural Merge Sort Copy alternate groupings into two files Use the sub-groupings, not a power of 2 Look for possible larger groupings

Natural Merge Sort Merge the corresponding subfiles. On EOF for F2, copy the remaining groups from F1.

Natural Merge Sort Split again, alternating groups Merge again, now two subgroups One more split, one more merge gives sort

Natural Merge Sort Split algorithm for natural merge sort Open F for input and F1 and F2 for output While the end of F has not been reached: Copy a sorted subfile of F into F1 as follows: repeatedly read an element of F and write it to F1 until the next element in F is smaller than this copied item or the end of F is reached. If the end of F has not been reached, copy the next sorted subfile of F into F2 using the method above. End while.

Natural Merge Sort Merge algorithm for natural merge sort:
Open F1 and F2 for input, F for output. Initialize numSubfiles to 0.
While neither the end of F1 nor the end of F2 has been reached:
   While no end of subfile in F1 or F2 has been reached:
      If the next element in F1 is less than the next element in F2,
         copy the next element from F1 into F;
      else
         copy the next element from F2 into F.
   End while
   If the end of the subfile in F1 has been reached, copy the rest of the subfile in F2 to F;
   else copy the rest of the subfile in F1 to F.
   Increment numSubfiles by 1.
End while
Copy any remaining subfiles to F, incrementing numSubfiles by 1 for each.

Natural Merge Sort Mergesort algorithm: repeat the following until numSubfiles is equal to 1: (1) call the Split algorithm to split F into files F1 and F2; (2) call the Merge algorithm to merge corresponding subfiles in F1 and F2 back into F. The worst case for natural merge sort is O(n log₂ n), which occurs when the items are in reverse order. This algorithm can easily be modified to become an internal sort, using arrays or vectors.

Natural MergeSort Example The merge algorithm takes a sequence of elements in a vector v having index range [first, last). The sequence consists of two ordered sublists separated by an intermediate index, called mid.
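
A sketch of that in-memory merge: two adjacent sorted sublists v[first, mid) and v[mid, last) are combined through a temporary vector and then copied back (the signature mirrors the description above; the int element type is an assumption).

#include <cstddef>
#include <vector>

// Merge the sorted ranges v[first, mid) and v[mid, last) into sorted order,
// using a temporary vector as the work area.
void merge(std::vector<int>& v, std::size_t first, std::size_t mid, std::size_t last) {
    std::vector<int> temp;
    temp.reserve(last - first);
    std::size_t a = first, b = mid;
    while (a < mid && b < last)                     // take the smaller front element
        temp.push_back(v[a] <= v[b] ? v[a++] : v[b++]);
    while (a < mid)  temp.push_back(v[a++]);        // copy the rest of sublist A
    while (b < last) temp.push_back(v[b++]);        // copy the rest of sublist B
    for (std::size_t i = 0; i < temp.size(); ++i)   // copy back to the original vector
        v[first + i] = temp[i];
}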

Natural MergeSort Review

Natural MergeSort Review So forth and so on…

Natural MergeSort Review Finish with the few remaining nums from the sublist B and then copy all of the numbers from the tempVector to the original vector.

Recursive Natural MergeSort Recursively work this sort down to the lowest level (anchor condition: only one element left) - work our way back to the top merging as we go.
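
A recursive mergesort sketch matching that description: split down to single elements, then merge the sorted halves on the way back up. The merge helper from the previous sketch is repeated here so this compiles on its own; this is an illustrative composition, not the textbook's code.

#include <cstddef>
#include <iostream>
#include <vector>

// Merge the sorted ranges v[first, mid) and v[mid, last), as sketched earlier.
void merge(std::vector<int>& v, std::size_t first, std::size_t mid, std::size_t last) {
    std::vector<int> temp;
    std::size_t a = first, b = mid;
    while (a < mid && b < last) temp.push_back(v[a] <= v[b] ? v[a++] : v[b++]);
    while (a < mid)  temp.push_back(v[a++]);
    while (b < last) temp.push_back(v[b++]);
    for (std::size_t i = 0; i < temp.size(); ++i) v[first + i] = temp[i];
}

// Recursive mergesort on v[first, last): split until one element remains,
// then merge the sorted halves on the way back up.
void mergeSort(std::vector<int>& v, std::size_t first, std::size_t last) {
    if (last - first <= 1) return;                  // anchor: 0 or 1 elements
    std::size_t mid = first + (last - first) / 2;
    mergeSort(v, first, mid);
    mergeSort(v, mid, last);
    merge(v, first, mid, last);
}

int main() {
    std::vector<int> v{75, 55, 15, 20, 85, 30, 35, 10, 60, 40, 50, 25};
    mergeSort(v, 0, v.size());
    for (int n : v) std::cout << n << ' ';
    std::cout << '\n';                              // 10 15 20 25 30 35 40 50 55 60 75 85
}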

Recursive Natural MergeSort Show the simulation for recursive mergesort (http://en.wikipedia.org/wiki/Mergesort) or (http://www.geocities.com/SiliconValley/Program/2864/File/Merge1/mergesort.html). Any algorithm which performs sorting using comparisons cannot have a worst-case performance better than O(n log n); a sorting algorithm based on comparisons cannot be O(n), even in its average runtime. To obtain O(n) you must use a sort without comparisons, like radix sort.

Sorting Fact Any algorithm which performs sorting using comparisons cannot have a worst-case performance better than O(n log n); a sorting algorithm based on comparisons cannot be O(n), even for its average runtime.

Radix Sort Based on examining digits in some base-b numeric representation of the items. Least significant digit (LSD) radix sort processes digits from right to left: create groupings of items with the same value in the specified digit, then collect them in order and create groupings for the next more significant digit. The base b may be 10, 16, 2, and so on. Used in early punched-card sorting machines (drop your cards and then sort them using hoppers).

Radix Sort Order ten 2-digit numbers in 10 bins from smallest number to largest number. This requires 2 calls to the sort algorithm. Initial sequence: 91 6 85 15 92 35 30 22 39. Pass 0: distribute the cards into bins according to the 1's digit (10^0). TIME: Each pass through the array takes O(n) time. If the maximum magnitude of a number in the array is v, and we are treating entries as base-b numbers, then 1 + floor(log_b(v)) passes are needed. If v is a constant, radix sort takes linear time, O(n). Note, however, that if all of the numbers in the array are different, then v is at least O(n), so O(log n) passes are needed and the overall time is O(n log n). EXTRA SPACE: If a temporary array is used, the extra workspace is O(n). It is possible to do the sorting on each digit position in situ, and then only O(log n) space is needed to keep track of the array sections yet to be processed, either recursively or on an explicit stack.

Radix Sort Final sequence: 91 6 85 15 92 35 30 22 39. Pass 1: take the new sequence and distribute the cards into bins determined by the 10's digit (10^1). Show the program for this one (radix sort demo). The inner loop (the instructions that distribute values into the containers, encountered most often) runs n times each time the loop is entered; this process is executed NUM_DIGITS * n times, which is O(n). A sketch follows.
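
A minimal LSD radix sort sketch in base 10, along the lines of the bin-distribution process described above (the function name and the choice of 2 passes for 2-digit numbers are illustrative).

#include <array>
#include <iostream>
#include <vector>

// LSD radix sort for non-negative integers, base 10.
void radixSort(std::vector<int>& v, int numDigits) {
    int divisor = 1;                                    // 10^0, 10^1, ...
    for (int pass = 0; pass < numDigits; ++pass) {
        std::array<std::vector<int>, 10> bins;          // one bin per digit 0..9
        for (int x : v)
            bins[(x / divisor) % 10].push_back(x);      // distribute by the current digit
        v.clear();
        for (const auto& bin : bins)                    // collect the bins in order
            v.insert(v.end(), bin.begin(), bin.end());
        divisor *= 10;
    }
}

int main() {
    std::vector<int> v{91, 6, 85, 15, 92, 35, 30, 22, 39};
    radixSort(v, 2);                                    // two passes for 2-digit numbers
    for (int n : v) std::cout << n << ' ';              // 6 15 22 30 35 39 85 91 92
    std::cout << '\n';
}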

Sort Algorithm Analysis Selection Sort (uses a swap) Worst and average case O(n^2) can be used with linked-list (doesn’t require random-access data) Can be done in place not at all fast for nearly sorted data

Sort Algorithm Analysis Bubble Sort (uses an exchange) Worst and average case O(n^2) Since it is using localized exchanges - can be used with linked-list Can be done in place O(n^2) - even if only one item is out of place

Sort Algorithm Analysis sorts actually used Insertion Sort (uses an insert) Worst and average case O(n^2) Does not require random-access data Can be done in place It is fast (linear time) for nearly sorted data It is fast for small lists Most good sorting methods call Insertion Sort for small lists

Sort Algorithm Analysis sorts actually used Merge Sort Worst and average case O(n log n) Does not require random-access data For linked-list - can be done in place For an array - need to use a buffer It is not significantly faster on nearly sorted data (but it is still log-linear time)

Sort Algorithm Analysis sorts actually used QuickSort Worst O(n^2) Average case O(n log n) [good time] can be done in place Additional space for recursion O(log n) Can be slow O(n^2) for nearly sorted or reverse data. Sort used for STL sort()

End of sorting