Algorithms CSCI 235, Spring 2019 Lecture 20 Order Statistics II



Finding the Median
Last time, we showed that, for a fixed k, we can find the kth order statistic (i.e. the kth smallest element) in Θ(n) time by repeatedly finding the minimum and discarding it. How long will it take to find the median using this strategy? Note that the position of the median (n/2) increases as n increases.
T(n) = ?
Conclusion: This method does not work as well for finding the median. Larger values of k take longer to find (although for any fixed k the order of growth is the same). Can we do better?
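The repeated-minimum strategy can be sketched in Python to see how the cost grows with k. This is a minimal sketch, not the lecture's code: the function name `select_by_repeated_min` and the comparison counter are assumptions added for illustration.

```python
def select_by_repeated_min(items, k):
    """Return (k-th smallest element, comparisons used); k is 1-based."""
    remaining = list(items)          # work on a copy
    comparisons = 0
    smallest = None
    for _ in range(k):
        best = 0
        # One pass to find the current minimum.
        for j in range(1, len(remaining)):
            comparisons += 1
            if remaining[j] < remaining[best]:
                best = j
        smallest = remaining.pop(best)   # discard the minimum
    return smallest, comparisons


data = list(range(100, 0, -1))            # the values 1..100 in reverse order
print(select_by_repeated_min(data, 1))    # (1, 99)
print(select_by_repeated_min(data, 50))   # (50, 3725)
```

With n = 100, finding the minimum takes 99 comparisons, while finding the median (k = 50) takes 99 + 98 + ... + 50 = 3725. In general the k passes cost k(n-1) - k(k-1)/2 comparisons, which grows quadratically in n when k = n/2.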

Randomized-Select
Idea: Partition the array as in Quick-sort. Recursively search the appropriate partition for the kth element.
Randomized-Select(A, lo, hi, i)
    // Find the ith order statistic between lo and hi
    if lo = hi
        then return A[lo]
    split = Randomized-Partition(A, lo, hi)
    length = (split - lo) + 1
    if i <= length
        then return Randomized-Select(A, lo, split, i)
        else return Randomized-Select(A, split+1, hi, i-length)

Example
Index:   1   2   3   4   5   6   7   8   9  10
A:      17   6  34  18   9   5  11  22  28   2
Find the 3rd order statistic: Randomized-Select(A, 1, 10, 3)
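A runnable Python version of Randomized-Select on this example might look as follows. It is a sketch under assumptions: indices are 0-based rather than 1-based, and `randomized_partition` is written here as a Hoare-style partition with a random pivot, chosen because the pseudocode recurses on A[lo..split] including the split position. The slides do not spell out Randomized-Partition, so that choice is a guess.

```python
import random


def randomized_partition(a, lo, hi):
    """Hoare-style partition of a[lo..hi] around a random pivot.

    Returns j with lo <= j < hi such that every element of a[lo..j]
    is <= every element of a[j+1..hi].
    """
    k = random.randint(lo, hi)        # random pivot position (inclusive bounds)
    a[lo], a[k] = a[k], a[lo]
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        j -= 1
        while a[j] > pivot:
            j -= 1
        i += 1
        while a[i] < pivot:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def randomized_select(a, lo, hi, i):
    """Return the i-th smallest (1-based) element of a[lo..hi]; reorders a."""
    if lo == hi:
        return a[lo]
    split = randomized_partition(a, lo, hi)
    length = split - lo + 1           # size of the low part
    if i <= length:
        return randomized_select(a, lo, split, i)
    return randomized_select(a, split + 1, hi, i - length)


A = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]
print(randomized_select(list(A), 0, len(A) - 1, 3))   # 6
```

The result is 6 whatever pivots are drawn, since the sorted order of A begins 2, 5, 6. Only the running time depends on the random choices.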

Running time of Randomized-Select
Worst case: As with QuickSort, we can get unlucky and partition the array into two pieces of size 1 and n-1, with the ith statistic in the larger side.
    T(n) = T(n-1) + n = Θ(n^2), where the n term is the cost of the partition.
A good case: Partition into two equal parts:
    T(n) = T(n/2) + n (We will work this one out in class).
Average case: Can show that T(n) <= cn, so T(n) = O(n).
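One way to work out the two recurrences is to unroll them. The sketch below assumes that n is a power of 2 in the good case and that T(1) is a constant; neither assumption is stated on the slide.

```latex
\begin{align*}
\text{Worst case:}\quad T(n) &= T(n-1) + n \\
  &= T(1) + \sum_{k=2}^{n} k \;=\; T(1) + \frac{n(n+1)}{2} - 1 \;=\; \Theta(n^2) \\[6pt]
\text{Good case:}\quad T(n) &= T(n/2) + n \\
  &= n + \frac{n}{2} + \frac{n}{4} + \cdots + 2 + T(1) \\
  &= (2n - 2) + T(1) \;=\; \Theta(n)
\end{align*}
```

The good case is a geometric series: each level costs half as much as the one before, so the total stays below 2n.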

Selection in Worst case linear time
To make a selection in worst case linear time, we want to use an algorithm that guarantees a good split when we partition. To do this, we use the "median of median of c" algorithm.
To start, we pick c, an integer constant >= 1.
We write our input array, A, as a 2-D array with c rows and n/c columns. (If n/c is not an integer, we can pad the array with large numbers that won't change the result.)
Sort the columns of this new 2-D array.

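Example
A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51, 73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
n = 25. Choose c = 5, so B[1..c, 1..n/c] = B[1..5, 1..5], with each consecutive run of c elements of A forming one column:
    43  42   7  73  24
     5  19  15   6  26
    17  72   0  30  25
    91  37  63  62  28
     2   3  51  10  29
Sort each column:
     2   3   0   6  24
     5  19   7  10  25
    17  37  15  30  26    (median row)
    43  42  51  62  28
    91  72  63  73  29
After sorting, the median row contains the median of each column.
Sorting the columns takes Θ(c^2 (n/c)) = Θ(cn) = Θ(n) time, since c is a constant.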
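The column step can be reproduced in a few lines of Python. This is a sketch only: the helper names `sorted_columns` and `median_row` are made up for illustration, each column is stored as a Python list, and the sketch assumes c divides n rather than padding.

```python
def sorted_columns(a, c):
    """Split a into consecutive columns of c elements and sort each column."""
    if len(a) % c != 0:
        raise ValueError("this sketch assumes c divides len(a); pad the input first")
    return [sorted(a[j:j + c]) for j in range(0, len(a), c)]


def median_row(columns):
    """Return the middle element of every sorted column."""
    c = len(columns[0])
    return [col[(c - 1) // 2] for col in columns]


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
columns = sorted_columns(A, 5)
print(columns[0])            # [2, 5, 17, 43, 91]
print(median_row(columns))   # [17, 37, 15, 30, 26]
```

Python's built-in `sorted` is used for brevity. Since every column has constant size c, any sorting method gives Θ(n) total work for this step.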

Median-of-median-of-c continued
We now call the Median-of-median-of-c algorithm again, on the single median row of B, with the same value of c as before.
Write median row as B' = [17, 37, 15, 30, 26]
Write B' as a 2D array with c = 5 rows and 1 column (B' has 5 elements, so 5/c = 1).
Sort columns: 15, 17, 26, 30, 37
The value at the middle row is mm, the median of medians; here mm = 26. We use this as our pivot for the partition.

Showing that it gives a good split
We can show that at least 1/4 of the elements are less than mm and at least 1/4 of the elements are greater than mm by imagining that the columns of B are sorted by the value of each median. (Note: we only imagine it, we don't actually do it.)
In the example: at least 1/4 of the elements are less than 26, and at least 1/4 are greater than 26.
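The counting behind the 1/4 bound can be sketched as follows. The sketch assumes c is odd and divides n, and it counts elements that are at most mm (so mm itself is included); the slide states the bound more loosely as "less than".

```latex
\begin{align*}
\#\{\text{columns whose median is} \le mm\} &\ge \left\lceil \frac{1}{2}\cdot\frac{n}{c} \right\rceil \\
\#\{\text{elements} \le \text{the median, in each such column}\} &= \frac{c+1}{2} \\
\#\{x \in A : x \le mm\} &\ge \left\lceil \frac{n}{2c} \right\rceil \cdot \frac{c+1}{2}
  \;\ge\; \frac{n}{2c}\cdot\frac{c}{2} \;=\; \frac{n}{4}
\end{align*}
```

The same argument with the inequalities reversed bounds the number of elements that are at least mm. In the example (n = 25, c = 5) the columns with medians 15, 17 and 26 each contribute 3 elements, giving 9 elements that are at most 26, which is more than 25/4.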

Partitioning
Partition A using mm = 26 as the pivot. Use a partition that keeps mm in the high part of the partition:
"low" = 2, 5, 17, 3, 19, 0, 7, 15, 6, 10, 24, 25 (12 items)
"high" = 26, 43, 91, 37, 42, 72, 51, 63, 30, 62, 73, 28, 29 (13 items)
If the number of items in the low part of the partition is k, and the order statistic we are looking for is given by i, then:
if i <= k, iterate the entire procedure on the lower partition;
if i > k, iterate on the higher partition (looking for the (i - k)th element).
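Putting the steps together, the whole procedure might be sketched in Python as below. It departs from the slides in three labeled ways, all assumptions of this sketch: a short last column is sorted as-is instead of padding, the partition is three-way (less than, equal to, greater than mm) so that repeated values cannot stall the recursion, and new lists are built rather than partitioning in place. The name `mom_select` is hypothetical.

```python
def mom_select(items, i, c=5):
    """Return the i-th smallest (1-based) element of items."""
    a = list(items)
    if not 1 <= i <= len(a):
        raise ValueError("i must be between 1 and len(items)")
    if c < 2:
        raise ValueError("this sketch needs c >= 2 so the median row shrinks")
    if len(a) <= c:
        return sorted(a)[i - 1]           # small input: sort directly

    # Sort each column of c elements and collect the median row.
    medians = []
    for j in range(0, len(a), c):
        column = sorted(a[j:j + c])
        medians.append(column[(len(column) - 1) // 2])

    # Median of medians, found by the same procedure.
    mm = mom_select(medians, (len(medians) + 1) // 2, c)

    # Three-way partition around mm (assumption: handles duplicates).
    low = [x for x in a if x < mm]
    equal = [x for x in a if x == mm]
    high = [x for x in a if x > mm]

    if i <= len(low):
        return mom_select(low, i, c)
    if i <= len(low) + len(equal):
        return mm
    return mom_select(high, i - len(low) - len(equal), c)


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
print(mom_select(A, 13))   # 26
```

For the example array the first call computes the median row [17, 37, 15, 30, 26], picks mm = 26, and finds 12 smaller elements, so the 13th order statistic is mm itself.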

Running time
T(n) = Θ(n) + T(n/c) + T(3n/4) + Θ(n)
where, term by term:
    Θ(n): cost of sorting columns
    T(n/c): cost of finding m-of-m-of-c on median row of B
    T(3n/4): worst case split
    Θ(n): cost of partition
Combining the two Θ(n) terms:
T(n) = T(n/c) + T(3n/4) + Θ(n)
We can show that T(n) = Θ(n) for c >= 5.
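The claim for c >= 5 can be checked by substitution. The sketch below bounds the Θ(n) term by an for some constant a and guesses T(m) <= dm for all m < n; the constants a and d are names introduced here, not taken from the slides.

```latex
\begin{align*}
T(n) &\le T\!\left(\frac{n}{c}\right) + T\!\left(\frac{3n}{4}\right) + an \\
     &\le \frac{dn}{c} + \frac{3dn}{4} + an \\
     &= dn\left(\frac{1}{c} + \frac{3}{4}\right) + an \\
     &\le dn \qquad \text{whenever } d \ge \frac{a}{\tfrac{1}{4} - \tfrac{1}{c}}
\end{align*}
```

The last step needs 1/4 - 1/c > 0, that is c > 4, which is why the smallest integer that works is c = 5 (there 1/c + 3/4 = 19/20, and any d >= 20a suffices). Since partitioning alone takes linear time, T(n) is also Ω(n), which gives T(n) = Θ(n).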

Benefits of M-of-M-of-c
Good order statistic algorithm.
Can use this with other algorithms. For example, we can use it with QuickSort to guarantee a good split and an n lg n order of growth.
The linear time is not the result of constraining the problem (as we did with counting-sort). It is a comparison-based method!