Chapter 9: Selection of Order Statistics. What is an order statistic? The min, max, median, i-th smallest, etc. Selection means finding a particular order statistic.

Presentation transcript:

Chapter 9: Selection of Order Statistics
What is an order statistic? The min, max, median, i-th smallest, etc.
Selection means finding a particular order statistic.
Selection by sorting: $T(n) = \Theta(n \lg n)$
Selection in linear time: best case, worst case, average case

Min, Max and Median order statistics
Given a set of n elements, the i-th order statistic is the i-th smallest element.
The min is the 1st order statistic; the max is the n-th order statistic.
The parity of a set is whether n is even or odd.
The median is roughly halfway between the min and the max.
For a set of odd parity the median is unique: the i-th smallest with i = (n+1)/2.
Regardless of parity, the lower median is the i-th smallest with $i = \lfloor (n+1)/2 \rfloor$ and the upper median is the i-th smallest with $i = \lceil (n+1)/2 \rceil$.
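The median ranks above can be computed with integer arithmetic. This is a minimal sketch; the function name `median_ranks` is an assumption made for illustration and does not come from the slides.

```python
def median_ranks(n: int) -> tuple[int, int]:
    """Return the 1-based ranks (lower, upper) of the medians of n elements."""
    lower = (n + 1) // 2   # floor((n + 1) / 2)
    upper = (n + 2) // 2   # ceil((n + 1) / 2), written as floor((n + 2) / 2)
    return lower, upper
```

For odd n the two ranks coincide; for even n they differ by one.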

The selection problem
Find the i-th order statistic in a set A of n (distinct) elements, i.e. find the $x \in A$ that is larger than exactly i - 1 other elements of A.
The selection problem can be solved in O(n lg n) time by sorting.
Since min and max can be found in linear time, we expect that any order statistic can be found in linear time.
Analyze the deterministic algorithm SELECT, which finds the i-th order statistic with a linear worst-case runtime.
Analyze RANDOMIZED-SELECT, which finds the i-th order statistic by randomized partition and has an expected runtime of O(n).
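As a baseline for the linear-time algorithms, selection by sorting can be sketched in a few lines. The helper name `select_by_sorting` is hypothetical, and Python's built-in `sorted` stands in for whichever O(n lg n) sort the course has in mind.

```python
def select_by_sorting(a, i):
    """Return the i-th smallest element (1-based) of a by sorting a copy."""
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= n")
    return sorted(a)[i - 1]   # the sort dominates the cost: O(n lg n)
```

The special cases i = 1 and i = n need only one pass over the input, which is what the built-ins `min(a)` and `max(a)` do.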

Select by partition pseudocode
Select-by-Partition(A, p, r, i)      % argument i specifies which order statistic
    if p = r then return A[p]        % a single element is the i-th smallest by default
    q ← Partition(A, p, r)           % get upper and lower sub-arrays
    k ← q - p + 1                    % number of elements in lower sub-array, including pivot
    if i = k then
        return A[q]                  % pivot is the i-th smallest element
    else if i < k then
        return Select-by-Partition(A, p, q-1, i)
    else
        return Select-by-Partition(A, q+1, r, i - k)
Note: the index of the i-th order statistic changes in the upper sub-array (i becomes i - k).
With favorable splits, T(n) = O(n). Why not O(n lg n) as in quicksort?
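The pseudocode above translates almost line for line into Python. This is a sketch under stated assumptions: the slides do not say which partition scheme is used, so a Lomuto partition with the last element as pivot is assumed here; array indices are 0-based while the rank i stays 1-based.

```python
def partition(a, p, r):
    """Lomuto partition of a[p..r] around a[r] (an assumed scheme).

    Returns the final index of the pivot.
    """
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def select_by_partition(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a."""
    if p == r:
        return a[p]
    q = partition(a, p, r)
    k = q - p + 1                      # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return select_by_partition(a, p, q - 1, i)
    return select_by_partition(a, q + 1, r, i - k)
```

Each call recurses into only one of the two sub-arrays, whereas quicksort recurses into both.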

Selection algorithm with worst-case runtime = O(n)
It is possible to design a deterministic selection algorithm that has a linear worst-case runtime.
By making the pivot an input parameter of partition, we can guarantee a good split when partition is called.
Processing before the call to partition determines a good choice of pivot.

SELECT by partition with preprocessing: T(n) = O(n)
Step 1: Divide the n-element sequence into floor(n/5) groups of 5 elements and at most one group with fewer than 5: cost = $\Theta(n)$.
Step 2: Use insertion sort to find the median of each subgroup: cost = constant (cost of sorting 5 elements) x number of subgroups = $\Theta(n)$.
Step 3: Use SELECT to find the median of the medians: cost = T(ceiling(n/5)). The median of the group that may contain fewer than 5 elements is included.
Step 4: Partition the input array with pivot = median of medians. Calculate k, the number of elements $\le$ pivot (the rank of the pivot): cost = $\Theta(n)$ + constant. If k = i, return the pivot.
Step 5: If the pivot is not the i-th smallest element, bound the runtime by assuming the i-th smallest element is in the larger sub-array: cost $\le$ T(7n/10 + 6) (explained below).
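The five steps can be sketched compactly in Python. This is an illustrative version, not the in-place algorithm from the text: it assumes distinct elements, builds new lists instead of partitioning in place, and uses the built-in `sorted` on each group of at most 5 where the slides use insertion sort. The name `select` and the base case of 5 elements are choices made for this sketch.

```python
def select(a, i):
    """Return the i-th smallest (1-based) of the distinct values in list a."""
    if len(a) <= 5:
        return sorted(a)[i - 1]
    # Steps 1-2: groups of 5 (the last may be smaller), lower median of each.
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = [g[(len(g) - 1) // 2] for g in groups]
    # Step 3: lower median of the medians, found recursively.
    x = select(medians, (len(medians) + 1) // 2)
    # Step 4: partition around x; k is the rank of x.
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1
    if i == k:
        return x
    # Step 5: recurse into the side that holds the i-th smallest element.
    if i < k:
        return select(lower, i)
    return select(upper, i - k)
```

Copying into `lower` and `upper` costs extra memory but keeps the linear partition cost; an in-place partition around x would avoid the copies.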

Diagram to help explain cost of Step 5
Dots represent elements of the input. Subgroups of 5 occupy columns.
Arrows point from larger to smaller elements.
Medians are white; x marks the median of medians.
The shaded area shows elements greater than x: 3 out of 5 are shaded if a subgroup is full and does not contain x.

Rationale for this diagram
An odd number of elements in the full groups, so that each median is unique.
A total of 28 elements, so that the partial group has a unique median.
The lower median of medians, so that no elements > x are in groups with median < x.

Simple formula for lower bound on number of elements > x
$\mathrm{LB}(A(k) > x) = 3\left(\frac{n/5}{2} - 2\right) = \frac{3n}{10} - 6$
The values of the constants in the formula are easy to rationalize:
3 is the number of elements > x in each full group with median > x;
(n/5)/2 - 2 is the approximate number of such full groups.
The approximation is conservative because the partial group may also contain elements with A(k) > x, and these are not counted.

Simple formula for upper bound on number of elements < x
$\mathrm{LB}(A(k) > x) = 3\left(\frac{n/5}{2} - 2\right) = \frac{3n}{10} - 6$
$\mathrm{UB}(A(k) < x) = n - \left(\frac{3n}{10} - 6\right) = \frac{7n}{10} + 6$
By a similar set of arguments, $\mathrm{UB}(A(k) > x) = \frac{7n}{10} + 6$.
7n/10 + 6 is a conservative estimate of the size of the larger sub-array when partition is called with the pivot equal to the median of medians.
The worst case is described by $T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + \Theta(n)$.
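The bounds above can be checked empirically on random inputs. The sketch below is a spot check, not a proof: it assumes distinct elements and lower medians throughout, and the helper names `median_of_medians` and `check_bounds` are hypothetical.

```python
import random


def median_of_medians(a):
    """Lower median of the lower medians of consecutive groups of 5."""
    groups = [sorted(a[j:j + 5]) for j in range(0, len(a), 5)]
    medians = sorted(g[(len(g) - 1) // 2] for g in groups)
    return medians[(len(medians) - 1) // 2]


def check_bounds(n, trials=50, seed=0):
    """Return True if 3n/10 - 6 <= count <= 7n/10 + 6 holds on every trial."""
    rng = random.Random(seed)
    for _ in range(trials):
        a = rng.sample(range(10 * n), n)      # n distinct values
        x = median_of_medians(a)
        greater = sum(v > x for v in a)
        smaller = sum(v < x for v in a)
        for count in (greater, smaller):
            if count < 3 * n / 10 - 6 or count > 7 * n / 10 + 6:
                return False
    return True
```

Sorting the medians here is only a convenience for the check; SELECT itself finds the median of medians recursively.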

By similar arguments, 7n/10+6 is shown to be an upper bound on the number of elements of A with value less than x

Show by substitution that $T(n) \le T(\lceil n/5 \rceil) + T(7n/10 + 6) + \Theta(n)$ has asymptotic solution T(n) = O(n).
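One way the substitution can go is sketched below, following the standard textbook argument. Two assumptions are introduced for the sketch: the $\Theta(n)$ term is bounded above by $an$ for some constant $a > 0$, and the inductive hypothesis is $T(m) \le cm$ for all smaller $m$.

```latex
\begin{align*}
T(n) &\le c\lceil n/5 \rceil + c\,(7n/10 + 6) + an \\
     &\le cn/5 + c + 7cn/10 + 6c + an \\
     &= 9cn/10 + 7c + an \\
     &= cn + (-cn/10 + 7c + an).
\end{align*}
% The last expression is at most cn when -cn/10 + 7c + an <= 0, that is,
% when c >= 10a * n/(n - 70) for n > 70.
% For n >= 140 we have n/(n - 70) <= 2, so any c >= 20a suffices.
```

Inputs with fewer than 140 elements are covered by choosing c large enough for a constant-size base case.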

CS 350 Spring 2016 [All problems are from Cormen et al., 3rd Edition]
Homework Assignment 15: due 4/6/16
Ex p 223:
(a) Show that SELECT with groups of 7 has a linear worst-case runtime.
(b) Show that SELECT with groups of 3 does not run in linear time.

CS 350 Spring 2016 [All problems are from Cormen et al., 3rd Edition]
Homework Assignment 16: due 4/8/16
1. ex p ex p 223
Write pseudocode (a variation of the code in the text).
Explain how the code works.
Analyze its run time.

Randomized-Select lets us analyze the runtime for the average case
Randomized-Select(A, p, r, i)
    if p = r then return A[p]
    q ← Randomized-Partition(A, p, r)
    k ← q - p + 1
    if i = k then
        return A[q]                  (pivot is the i-th smallest element)
    else if i < k then
        return Randomized-Select(A, p, q-1, i)
    else
        return Randomized-Select(A, q+1, r, i - k)
As in Randomized-Quicksort, Randomized-Partition chooses a pivot at random from the array elements between p and r.
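A Python sketch of Randomized-Select follows. It assumes that Randomized-Partition swaps a uniformly chosen element into the last position and then runs a Lomuto partition; the slides do not fix the partition scheme. The `rng` parameter is an addition for this sketch so that runs can be made reproducible.

```python
import random


def randomized_partition(a, p, r, rng=random):
    """Partition a[p..r] around a pivot chosen uniformly at random."""
    s = rng.randint(p, r)              # inclusive on both ends
    a[s], a[r] = a[r], a[s]
    pivot = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_select(a, p, r, i, rng=random):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a."""
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r, rng)
    k = q - p + 1                      # rank of the pivot within a[p..r]
    if i == k:
        return a[q]
    if i < k:
        return randomized_select(a, p, q - 1, i, rng)
    return randomized_select(a, q + 1, r, i - k, rng)
```

Passing a seeded `random.Random` instance as `rng` makes the pivot sequence repeatable, which helps when counting comparisons experimentally.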

Upper bound on the expected value of T(n) for Randomized-Select
The call to Randomized-Partition creates upper and lower sub-arrays.
Include the pivot in the lower sub-array A[p..q].
Define indicator random variables $X_k = I\{\text{sub-array } A[p..q] \text{ has exactly } k \text{ elements}\}$, $1 \le k \le n$.
All possible values of k are equally likely, so $E[X_k] = 1/n$.

Assume that the i-th smallest element always falls in the larger partition.
This assumption ensures an upper bound on E[T(n)].
$T(n) \le \sum_{k=1}^{n} X_k \, T(\max(k-1, n-k)) + O(n)$ (randomized recurrence)
T(n) = T(n-1) + O(n) when the lower sub-array has 1 element
T(n) = T(n-2) + O(n) when the lower sub-array has 2 elements
...
T(n) = T(n-2) + O(n) when the lower sub-array has n-1 elements
T(n) = T(n-1) + O(n) when the lower sub-array has n elements

$E[T(n)] \le \sum_{k=1}^{n} E[X_k \, T(\max(k-1, n-k))] + O(n)$ (linearity of expectation)
$E[T(n)] \le \sum_{k=1}^{n} E[X_k] \, E[T(\max(k-1, n-k))] + O(n)$ (expected value of a product of independent random variables)
$E[T(n)] \le \frac{1}{n} \sum_{k=1}^{n} E[T(\max(k-1, n-k))] + O(n)$ (using $E[X_k] = 1/n$)

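$E[T(n)] \le \frac{1}{n} \sum_{k=1}^{n} E[T(\max(k-1, n-k))] + O(n)$
If $k > \lceil n/2 \rceil$, then max(k-1, n-k) = k-1.
If $k \le \lceil n/2 \rceil$, then max(k-1, n-k) = n-k.
For even n, each term from T(n/2) to T(n-1) occurs exactly twice. A similar argument applies for odd n.
$E[T(n)] \le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + O(n)$ (using the redundancy of the T's)
$E[T(n)] \le \frac{2}{n} \left\{ \sum_{k=1}^{n-1} E[T(k)] - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} E[T(k)] \right\} + O(n)$ (setting up to use the arithmetic sum)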

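Apply the substitution method: assume E[T(k)] = O(k).
Then there exists c > 0 such that $E[T(k)] \le ck$, so
$E[T(n)] \le \frac{2c}{n} \left\{ \sum_{k=1}^{n-1} k - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k \right\} + dn$, with d > 0.
Now use the arithmetic sum. After much algebra (text p219):
$E[T(n)] \le cn - (cn/4 - c/2 - dn)$
Find c and $n_0$ such that $cn/4 - c/2 - dn \ge 0$ for all $n \ge n_0$.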
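The algebra referred to above can be sketched as follows, along the lines of the textbook derivation. It assumes the O(n) term is bounded by $dn$ for a constant $d > 0$ and uses the arithmetic sum $\sum_{k=1}^{m} k = m(m+1)/2$; the particular choice $c = 8d$ at the end is only one convenient example.

```latex
\begin{align*}
E[T(n)] &\le \frac{2c}{n}\left(\sum_{k=1}^{n-1} k
            - \sum_{k=1}^{\lfloor n/2 \rfloor - 1} k\right) + dn \\
        &= \frac{2c}{n}\left(\frac{(n-1)n}{2}
            - \frac{(\lfloor n/2 \rfloor - 1)\lfloor n/2 \rfloor}{2}\right) + dn \\
        &\le \frac{2c}{n}\left(\frac{(n-1)n}{2}
            - \frac{(n/2 - 2)(n/2 - 1)}{2}\right) + dn \\
        &= c\left(\frac{3n}{4} + \frac{1}{2} - \frac{2}{n}\right) + dn \\
        &\le cn - \left(\frac{cn}{4} - \frac{c}{2} - dn\right).
\end{align*}
% The bracket is nonnegative when n(c/4 - d) >= c/2.
% Choosing c > 4d, this holds for all n >= n_0 = 2c/(c - 4d).
% Example: c = 8d gives n_0 = 4.
```

For n below $n_0$ the bound $E[T(n)] \le cn$ is obtained by taking c large enough to cover the constant-size cases.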