Spring 2015 Lecture 5: QuickSort & Selection


2IL50 Data Structures, Spring 2015. Lecture 5: QuickSort & Selection

QuickSort: one more sorting algorithm …

Sorting algorithms

Input: a sequence of n numbers ‹a1, a2, …, an›
Output: a permutation ‹ai1, …, ain› of the input such that ai1 ≤ … ≤ ain

Important properties of sorting algorithms:
- running time: how fast is the algorithm in the worst case
- in place: only a constant number of input elements are ever stored outside the input array

The sorting algorithms seen so far compare as follows:

                worst-case running time   in place
InsertionSort   Θ(n²)                     yes
MergeSort       Θ(n log n)                no
HeapSort        Θ(n log n)                yes
QuickSort       Θ(n²)                     yes

Why QuickSort, despite its Θ(n²) worst case?
- The expected running time of randomized QuickSort is Θ(n log n).
- The constants hidden in Θ(n log n) are small.
- Using linear-time median finding to guarantee a good pivot gives a worst-case Θ(n log n) variant.

QuickSort

QuickSort is a divide-and-conquer algorithm. To sort the subarray A[p..r]:

Divide: partition A[p..r] into two subarrays A[p..q-1] and A[q+1..r], such that each element of A[p..q-1] is ≤ A[q] and A[q] is < each element of A[q+1..r]. The dividing is done by a procedure Partition, which returns q.
Conquer: sort the two subarrays by recursive calls to QuickSort.
Combine: no work is needed, since the subarrays are sorted in place.

QuickSort

QuickSort(A, p, r)
  if p < r
    then q = Partition(A, p, r)
         QuickSort(A, p, q-1)
         QuickSort(A, q+1, r)

Partition(A, p, r)
  x = A[r]
  i = p-1
  for j = p to r-1
    do if A[j] ≤ x
         then i = i+1
              exchange A[i] ↔ A[j]
  exchange A[i+1] ↔ A[r]
  return i+1

Initial call: QuickSort(A, 1, n). Partition always selects A[r] as the pivot (the element around which to partition).
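The pseudocode above translates almost line for line into Python. The sketch below is illustrative only; it uses Python's 0-based indexing, so the initial call becomes quicksort(A, 0, len(A) - 1) instead of QuickSort(A, 1, n).

```python
def partition(A, p, r):
    """Partition A[p..r] in place around pivot A[r]; return the pivot's final index."""
    x = A[r]                          # pivot: always the last element
    i = p - 1                         # right end of the "<= pivot" region
    for j in range(p, r):             # j scans the not-yet-examined region
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]   # grow the "<= pivot" region
    A[i + 1], A[r] = A[r], A[i + 1]   # put the pivot between the two regions
    return i + 1

def quicksort(A, p, r):
    if p < r:
        q = partition(A, p, r)        # divide
        quicksort(A, p, q - 1)        # conquer left
        quicksort(A, q + 1, r)        # conquer right

data = [8, 1, 6, 4, 3, 9, 5]
quicksort(data, 0, len(data) - 1)
# data is now [1, 3, 4, 5, 6, 8, 9]
```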

Partition

As Partition executes, the array is divided into four regions (some may be empty):

A[p..i] ≤ x | A[i+1..j-1] > x | A[j..r-1] not yet examined | A[r] = x (pivot)

Loop invariant
1. all entries in A[p..i] are ≤ pivot
2. all entries in A[i+1..j-1] are > pivot
3. A[r] = pivot

Partition: example

Running Partition on A = [8, 1, 6, 4, 3, 9, 5] with pivot x = A[r] = 5:

8 1 6 4 3 9 5   (initial: i before p, j = p; 8 > 5, so only j advances)
1 8 6 4 3 9 5   (1 ≤ 5: swap into the ≤ x region; 6 > 5, only j advances)
1 4 6 8 3 9 5   (4 ≤ 5: swap into the ≤ x region)
1 4 3 8 6 9 5   (3 ≤ 5: swap into the ≤ x region; 9 > 5, only j advances)
1 4 3 5 6 9 8   (final exchange A[i+1] ↔ A[r]; return i+1 = 4)

Partition: correctness

Loop invariant
1. all entries in A[p..i] are ≤ pivot
2. all entries in A[i+1..j-1] are > pivot
3. A[r] = pivot

Initialization: before the loop starts, all conditions are satisfied, since A[r] is the pivot and the two subarrays A[p..i] and A[i+1..j-1] are empty.

Maintenance: in each iteration, if A[j] ≤ pivot, then i is incremented and A[i] and A[j] are swapped before j advances ➨ 1. and 2. hold. If A[j] > pivot, then only j advances ➨ 1. and 2. hold.

Termination: when the loop terminates, j = r, so every element of A falls into one of three sets: A[p..i] ≤ pivot, A[i+1..r-1] > pivot, and A[r] = pivot. The final two lines (exchange A[i+1] ↔ A[r]; return i+1) move the pivot between the two subarrays.

Running time: Θ(n) for an n-element subarray.

QuickSort: running time

The running time depends on the partitioning of the subarrays:
- if they are balanced, QuickSort is asymptotically as fast as MergeSort
- if they are unbalanced, QuickSort can be as slow as InsertionSort

Worst case: the subarrays are completely unbalanced: 0 elements in one, n-1 in the other (this happens, for example, on a sorted input array):
T(n) = T(n-1) + T(0) + Θ(n) = T(n-1) + Θ(n) = Θ(n²)

Best case: the subarrays are completely balanced: each has ≤ n/2 elements:
T(n) = 2T(n/2) + Θ(n) = Θ(n log n)

What about the average case?

QuickSort: running time (average case)

The average running time is much closer to the best case than to the worst case.

Intuition: imagine that Partition always produces a 9-to-1 split:
T(n) = T(9n/10) + T(n/10) + Θ(n)

Recall Section 4.2 (or Lecture 2): the recursion tree for T(n) = T(9n/10) + T(n/10) + Θ(n) has log_10 n full levels and log_{10/9} n non-empty levels. The base of the logarithm does not matter in asymptotic notation (as long as it is constant).

Hence T(n) = T(9n/10) + T(n/10) + Θ(n) = Θ(n log n). In fact, any split of constant proportionality yields a recursion tree of depth Θ(log n). But splits will not always have constant proportionality; there will be a mix of good and bad splits …

More intuition: mixing good and bad splits does not affect the asymptotic running time. Assume the levels of the recursion tree alternate between worst-case splits (n into 0 and n-1) and best-case splits (n-1 into two parts of about (n-1)/2 each): each such pair of levels still costs Θ(n), just as a single best-case level would. The extra levels add only to the hidden constant; in both cases the running time is O(n log n).

Randomized QuickSort: pick the pivot at random

RandomizedPartition(A, p, r)
  i = Random(p, r)
  exchange A[r] ↔ A[i]
  return Partition(A, p, r)

A random pivot results in a reasonably balanced split on average ➨ expected running time Θ(n log n) (see the book for the detailed analysis).

Alternative: use linear-time median finding to find a good pivot ➨ worst-case running time Θ(n log n). Price to pay: added complexity.
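RandomizedPartition only adds one swap in front of the ordinary Partition. A self-contained Python sketch (0-based indexing, with Random(p, r) taken as Python's random.randint, which is inclusive on both ends, matching the pseudocode):

```python
import random

def partition(A, p, r):
    """Standard Partition: pivot is A[r]; returns the pivot's final index."""
    x = A[r]
    i = p - 1
    for j in range(p, r):
        if A[j] <= x:
            i += 1
            A[i], A[j] = A[j], A[i]
    A[i + 1], A[r] = A[r], A[i + 1]
    return i + 1

def randomized_partition(A, p, r):
    """Swap a uniformly random element into position r, then partition as usual."""
    i = random.randint(p, r)          # inclusive bounds, like Random(p, r)
    A[r], A[i] = A[i], A[r]
    return partition(A, p, r)

def randomized_quicksort(A, p, r):
    if p < r:
        q = randomized_partition(A, p, r)
        randomized_quicksort(A, p, q - 1)
        randomized_quicksort(A, q + 1, r)
```

Whatever pivots the random choices produce, the output is the same sorted array; only the running time varies between Θ(n log n) (expected) and Θ(n²) (worst case).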

Selection: Medians and Order Statistics

Definitions

ith order statistic: the ith smallest of a set of n elements
- minimum: 1st order statistic
- maximum: nth order statistic
- median: the “halfway point”
  - n odd ➨ unique median at i = (n+1)/2
  - n even ➨ lower median at i = n/2, upper median at i = n/2 + 1
  - here, median means the lower median

The selection problem

Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
Output: the element x ∈ A that is larger than exactly i-1 other elements of A (the ith smallest element of A).

Easy solution: sort the input in Θ(n log n) time and return the ith element of the sorted array. This can be done faster … we start with the minimum and maximum.
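The easy Θ(n log n) solution is a one-liner; a sketch in Python (1-indexed i, as in the problem statement):

```python
def select_by_sorting(A, i):
    """Return the ith smallest element (1-indexed) by sorting first: Theta(n log n)."""
    return sorted(A)[i - 1]

select_by_sorting([7, 2, 9, 4, 1], 3)   # -> 4, the 3rd smallest
```

The rest of the lecture shows how to beat this bound.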

Minimum and maximum

Find the minimum with n-1 comparisons: examine each element in turn and keep track of the smallest one.

Minimum(A, n)
  min = A[1]
  for i = 2 to n
    do if min > A[i]
         then min = A[i]
  return min

Find the maximum by replacing > with <.

Is this the best we can do? Yes: each element except the minimum must be compared to a smaller element at least once, so n-1 comparisons are necessary.
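The Minimum pseudocode in runnable form (0-based indexing; `min` is renamed `m` to avoid shadowing Python's built-in):

```python
def minimum(A):
    """Find the minimum with exactly n-1 comparisons."""
    m = A[0]                 # smallest element seen so far
    for x in A[1:]:          # n-1 remaining elements, one comparison each
        if m > x:
            m = x
    return m

minimum([4, 2, 7, 1, 9])    # -> 1
```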

Simultaneous minimum and maximum

Suppose we need to find both the minimum and the maximum.
Easy solution: find both separately ➨ 2n-2 comparisons ➨ Θ(n) time.

But only 3⌊n/2⌋ comparisons are needed:
- maintain the minimum and maximum seen so far
- do not compare elements to the minimum and maximum separately; process the elements in pairs
- compare the elements of each pair to each other, then compare the larger to the maximum and the smaller to the minimum
➨ 3 comparisons for every 2 elements
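The pairing trick above can be sketched as follows (an illustrative version: odd n seeds both extremes with the first element, even n with the first pair, then the remaining elements are consumed two at a time with 3 comparisons per pair):

```python
def min_max(A):
    """Find (minimum, maximum) with about 3*floor(n/2) comparisons."""
    n = len(A)
    if n % 2 == 1:                     # odd n: first element seeds both extremes
        lo = hi = A[0]
        start = 1
    else:                              # even n: first pair seeds the extremes
        lo, hi = (A[0], A[1]) if A[0] < A[1] else (A[1], A[0])
        start = 2
    for k in range(start, n, 2):       # remaining elements, two at a time
        a, b = A[k], A[k + 1]
        small, large = (a, b) if a < b else (b, a)   # comparison 1
        if small < lo:                                # comparison 2
            lo = small
        if large > hi:                                # comparison 3
            hi = large
    return lo, hi

min_max([3, 1, 4, 1, 5, 9, 2, 6])      # -> (1, 9)
```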

The selection problem

Input: a set A of n distinct numbers and a number i, with 1 ≤ i ≤ n.
Output: the element x ∈ A that is larger than exactly i-1 other elements of A (the ith smallest element of A).

Theorem
The ith smallest element of A can be found in O(n) time in the worst case.

Idea: partition the input array and recurse on only one side of the split. To guarantee a good split, run Partition with a carefully chosen (designated) pivot element.

Selection in worst-case linear time

Example: A = [20, 3, 10, 8, 14, 6, 12, 9, 11, 18, 7, 4, 5, 17, 15, 1, 2, 13] (n = 18), with i = 12.

Divide the n elements into groups of 5 ➨ ⌈n/5⌉ groups.

Selection in worst-case linear time

1. Divide the n elements into groups of 5 ➨ ⌈n/5⌉ groups.
2. Find the median of each of the ⌈n/5⌉ groups (sort each group of 5 elements in constant time and simply pick the median).
3. Find the median x of the ⌈n/5⌉ medians recursively.
4. Partition the array around x ➨ x is the kth element after partitioning, with k-1 elements on the low side and n-k on the high side.
5. If i = k, return x. If i < k, recursively find the ith smallest element on the low side. If i > k, recursively find the (i-k)th smallest element on the high side.

Analysis

How many elements are guaranteed to be larger than x? At least half of the ⌈n/5⌉ medians found in step 2 are ≥ x, and each of their groups contains 3 elements that are > x (discounting x’s own group and the last, possibly incomplete, group) ➨ at least 3(⌈⌈n/5⌉/2⌉ - 2) ≥ 3n/10 - 6 elements are > x.

Symmetrically, at least 3n/10 - 6 elements are < x ➨ the algorithm recurses on at most 7n/10 + 6 elements.

Analysis: running time

Steps 1, 2 and 4 (dividing into groups, sorting the groups of 5, and partitioning) take O(n) time in total. Step 3 (finding the median of the medians) costs T(⌈n/5⌉), and step 5 recurses on at most 7n/10 + 6 elements, costing at most T(7n/10 + 6). For small n (n < 140) we simply take T(n) = O(1). Hence

T(n) = O(1)                                if n < 140
T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n)      if n ≥ 140

Solving the recurrence

T(n) = O(1)                                if n < 140
T(n) = T(⌈n/5⌉) + T(7n/10 + 6) + O(n)      if n ≥ 140

Solve by substitution. Inductive hypothesis: T(n) ≤ cn for some constant c and all n > 0.
- Assume c is large enough that T(n) ≤ cn for all n < 140.
- Pick a constant a such that the O(n) term is ≤ an for all n > 0.

T(n) ≤ c⌈n/5⌉ + c(7n/10 + 6) + an
     ≤ cn/5 + c + 7cn/10 + 6c + an
     = 9cn/10 + 7c + an
     = cn + (-cn/10 + 7c + an)

It remains to show: -cn/10 + 7c + an ≤ 0.

Solving the recurrence (continued)

It remains to show that -cn/10 + 7c + an ≤ 0:

-cn/10 + 7c + an ≤ 0
⇔ cn/10 - 7c ≥ an
⇔ cn - 70c ≥ 10an
⇔ c(n - 70) ≥ 10an
⇔ c ≥ 10a · n/(n-70)

Since n ≥ 140 ➨ n/(n-70) ≤ 2 ➨ 20a ≥ 10a · n/(n-70), so choosing c ≥ 20a suffices ➨ T(n) = O(n). ■

Why 140? Nothing special: any integer > 70 would have worked.
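The bound c = 20a can also be checked numerically. The sketch below evaluates the recurrence with the O(n) term taken to be exactly n (i.e. a = 1, an assumption purely for illustration) and verifies T(n) ≤ 20n over a range of n:

```python
from functools import lru_cache
import math

@lru_cache(maxsize=None)
def T(n):
    """Recurrence from the analysis, with the O(n) term taken as exactly n (a = 1)."""
    if n < 140:
        return 1                       # O(1) base case
    return T(math.ceil(n / 5)) + T(7 * n // 10 + 6) + n

# With a = 1, the proof's choice c = 20a gives the bound T(n) <= 20n.
assert all(T(n) <= 20 * n for n in range(1, 5001))
```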

Selection

Theorem
The ith smallest element of A can be found in O(n) time in the worst case.
- Does not require any assumptions on the input.
- Is not in conflict with the Ω(n log n) lower bound for sorting, since it does not use sorting.

Randomized Selection: pick a pivot at random.

Theorem
The ith smallest element of A can be found in O(n) expected time.