Algorithms CSCI 235, Spring 2019 Lecture 20 Order Statistics II

2 Finding the Median
Last time, we showed that we can find the kth order statistic (i.e. the kth smallest element) in $\Theta(n)$ time for any constant k, by repeatedly finding the minimum and discarding it: k passes, each costing $\Theta(n)$, give $\Theta(kn)$. How long will it take to find the median using this strategy? Note that the position of the median (n/2) increases as n increases, so k is no longer a constant: $T(n) = (n/2) \cdot \Theta(n) = \Theta(n^2)$. Conclusion: This method does not work as well for finding the median. Larger values of k take longer to find (for any constant k the order of growth is still $\Theta(n)$, but for k proportional to n it becomes quadratic). Can we do better?
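As a rough illustration of why this strategy degrades for the median, here is a minimal Python sketch (the name `select_by_repeated_min` is hypothetical, not from the lecture) that finds the kth smallest element by k rounds of "scan for the minimum, then discard it", counting comparisons along the way. A scan over m remaining items costs m - 1 comparisons, so the total is $\Theta(kn)$, which is $\Theta(n^2)$ when k = n/2.

```python
def select_by_repeated_min(a, k):
    """Return (kth smallest element of a, comparisons used), with k 1-indexed."""
    if not 1 <= k <= len(a):
        raise ValueError("k must satisfy 1 <= k <= len(a)")
    items = list(a)                      # copy, so the caller's list is untouched
    comparisons = 0
    for _ in range(k):
        min_idx = 0
        for j in range(1, len(items)):   # one linear scan for the current minimum
            comparisons += 1
            if items[j] < items[min_idx]:
                min_idx = j
        smallest = items.pop(min_idx)    # discard the minimum found in this pass
    return smallest, comparisons
```

On the 25-element array used in the example later in this lecture, asking for the median (k = 13) costs 24 + 23 + ... + 12 = 234 comparisons, while asking for the minimum (k = 1) costs only 24.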

3 Randomized-Select
Idea: Partition the array as in Quicksort. Recursively search the appropriate partition for the kth element.
Randomized-Select(A, lo, hi, i)
    // Find the ith order statistic between lo and hi
    if lo = hi then
        return A[lo]
    split = Randomized-Partition(A, lo, hi)
    length = (split - lo) + 1
    if i <= length then
        return Randomized-Select(A, lo, split, i)
    else
        return Randomized-Select(A, split+1, hi, i-length)
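A minimal runnable Python sketch of Randomized-Select follows. It assumes that Randomized-Partition is a Hoare-style partition with a random pivot swapped into A[lo]; the slide does not spell the partition out, but this choice matches the split convention used above (A[lo..split] <= A[split+1..hi] with lo <= split < hi). Indices are 0-based, so the slide's call Randomized-Select(A, 1, 10, 3) becomes `randomized_select(a, 0, 9, 3)`; i remains a 1-based rank within the current subarray. The helper names are hypothetical.

```python
import random

def hoare_partition(a, lo, hi):
    """Partition a[lo..hi] around x = a[lo]; return j with a[lo..j] <= x <= a[j+1..hi]."""
    x = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        j -= 1
        while a[j] > x:          # move j left until a[j] <= x
            j -= 1
        i += 1
        while a[i] < x:          # move i right until a[i] >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j             # with the pivot at a[lo], j < hi, so both sides are nonempty

def randomized_partition(a, lo, hi):
    r = random.randint(lo, hi)   # random pivot, moved to the front
    a[lo], a[r] = a[r], a[lo]
    return hoare_partition(a, lo, hi)

def randomized_select(a, lo, hi, i):
    """Return the ith smallest (1-indexed) element of a[lo..hi]; reorders a in place."""
    if lo == hi:
        return a[lo]
    split = randomized_partition(a, lo, hi)
    length = split - lo + 1      # size of the low side
    if i <= length:
        return randomized_select(a, lo, split, i)
    return randomized_select(a, split + 1, hi, i - length)
```

Because the partition never returns hi, both recursive ranges are strictly smaller than [lo, hi], so the recursion always terminates, even when the array contains repeated values.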

4 Example
Find the 3rd order statistic of the 10-element array A: Randomized-Select(A, 1, 10, 3)

5 Running time of Randomized-Select
Worst case: As with Quicksort, we can get unlucky and partition the array into two pieces of size 1 and n-1, with the ith statistic in the larger side: $T(n) = T(n-1) + n = \Theta(n^2)$, where the n term is the cost of the partition. A good case: Partition into two equal parts: $T(n) = T(n/2) + n$ (We will work this one out in class). Average case: One can show that $T(n) \le cn$ for some constant c, so $T(n) = O(n)$.
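Both recurrences can be worked out by unrolling them. This sketch assumes T(1) = 1 and, for the good case, that n is a power of 2:

```latex
\begin{aligned}
\text{Worst case: } T(n) &= T(n-1) + n = n + (n-1) + \cdots + 2 + 1 = \frac{n(n+1)}{2} = \Theta(n^2) \\
\text{Good case: } T(n) &= T(n/2) + n = n + \frac{n}{2} + \frac{n}{4} + \cdots + 2 + 1 = 2n - 1 = \Theta(n)
\end{aligned}
```

The geometric series in the good case is dominated by its first term, so shrinking the problem by a constant fraction each time costs only a constant factor more than the first partition.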

6 Selection in worst-case linear time
To make a selection in worst-case linear time, we want to use an algorithm that guarantees a good split when we partition. To do this, we use the "median of median of c" algorithm. To start, we pick c, an integer constant $c \ge 1$. We write our input array, A, as a 2D array with c rows and n/c columns. (If n/c is not an integer, we can pad the array with values larger than every element of A, e.g. $+\infty$; these do not change the ith smallest element for any $i \le n$.) Sort the columns of this new 2D array.

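7 Example
A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51, 73, 6, 30, 62, 10, 24, 26, 25, 28, 29], n = 25. Choose c = 5 and write A column by column into B[1..c, 1..n/c] = B[1..5, 1..5], so that each column holds 5 consecutive elements of A. Sort each column. After sorting, the median row contains the median of each column. Sorting the columns takes $\Theta(c^2 \cdot n/c) = \Theta(cn) = \Theta(n)$ time, since each column of length c can be sorted in $O(c^2)$ time and c is a constant.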
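The layout and column sort can be sketched in a few lines of Python. The function names `sorted_columns` and `median_row` are hypothetical; the sketch fills B column by column (which reproduces the medians used in this example), pads with `math.inf` as the previous slide allows, and uses Python's built-in sort on each column rather than an $O(c^2)$ method, which is fine because c is a constant.

```python
import math

def sorted_columns(a, c):
    """Lay out a as a c-row array, filling column by column (column j holds
    a[j*c:(j+1)*c]), pad the last column with +inf, and sort every column."""
    padded = list(a) + [math.inf] * (-len(a) % c)
    return [sorted(padded[j:j + c]) for j in range(0, len(padded), c)]

def median_row(a, c):
    """Return the middle row of the column-sorted array: one median per column."""
    return [col[(c - 1) // 2] for col in sorted_columns(a, c)]

A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51, 73, 6, 30, 62, 10,
     24, 26, 25, 28, 29]
for col in sorted_columns(A, 5):   # each printed list is one column of B
    print(col)
print(median_row(A, 5))
```

The five sorted columns print as [2, 5, 17, 43, 91], [3, 19, 37, 42, 72], [0, 7, 15, 51, 63], [6, 10, 30, 62, 73] and [24, 25, 26, 28, 29], and the median row is [17, 37, 15, 30, 26]. Applying `median_row` once more to that row with c = 5 gives [26], the median of medians.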

8 Median-of-median-of-c continued
We now call the Median-of-median-of-c algorithm again, on the single median row of B, with the same value of c as before. The median row is B' = [17, 37, 15, 30, 26]. Write B' as a 2D array with c = 5 rows and 5/c = 1 column, and sort the column: [15, 17, 26, 30, 37]. The value at the middle row, 26, is mm, the median of medians. We use mm as our pivot for the partition.

9 Showing that it gives a good split
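We can show that at least 1/4 of the elements are less than mm and at least 1/4 of the elements are greater than mm by imagining that the columns of B are sorted by the value of each median. (Note: we only imagine it; we don't actually do it.) In that arrangement, about half of the columns have a median $\le$ mm, and in each of those columns the smaller half of the elements (up to and including the median) is $\le$ mm. That gives about $\frac{n/c}{2} \cdot \frac{c}{2} = \frac{n}{4}$ elements $\le$ mm; the symmetric argument for the columns whose median is $\ge$ mm gives about n/4 elements $\ge$ mm. In the example, mm = 26: the columns with medians 15, 17 and 26 contribute 3 elements each, so at least 9 elements are $\le 26$, and likewise the columns with medians 26, 30 and 37 give at least 9 elements $\ge 26$. So at least 1/4 are less than 26 and at least 1/4 are greater than 26.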
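The counting argument can be checked numerically with a small Python sketch. The function `quadrant_bounds` is hypothetical; it assumes n is a multiple of c (no padding) and, as a shortcut, finds mm by sorting the median row directly instead of making the recursive call. It counts only the elements the "imagined sorting" argument guarantees, not every element that happens to be on each side.

```python
def quadrant_bounds(a, c=5):
    """Return (mm, guaranteed count <= mm, guaranteed count >= mm).

    Assumes len(a) is a multiple of c, as in the lecture's example.
    """
    cols = [sorted(a[j:j + c]) for j in range(0, len(a), c)]
    mid = (c - 1) // 2
    medians = [col[mid] for col in cols]
    mm = sorted(medians)[(len(medians) - 1) // 2]   # median of the median row
    # In a column whose median is <= mm, col[0..mid] are all <= mm.
    at_most = sum(mid + 1 for col in cols if col[mid] <= mm)
    # In a column whose median is >= mm, col[mid..c-1] are all >= mm.
    at_least = sum(c - mid for col in cols if col[mid] >= mm)
    return mm, at_most, at_least
```

For the example array, `quadrant_bounds(A)` returns (26, 9, 9): 9 of the 25 elements are guaranteed to be $\le 26$ and 9 are guaranteed to be $\ge 26$, both above 25/4.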

10 Partitioning
Partition A using mm = 26 as the pivot. Use a partition that keeps mm in the high part of the partition:
"low" = 2, 5, 17, 3, 19, 0, 7, 15, 6, 10, 24, 25 (12 items)
"high" = 26, 43, 91, 37, 42, 72, 51, 63, 30, 62, 73, 28, 29 (13 items)
Let k be the number of items in the low part and i the order statistic we are looking for. If i <= k, repeat the entire procedure on the low part; if i > k, repeat it on the high part, now looking for the (i - k)th element.
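Putting the steps together, here is a minimal Python sketch of the whole median-of-median-of-c selection (the name `mom_select` is hypothetical). It departs from the slides in two labeled ways: the last column is simply allowed to be shorter instead of being padded with large values, and elements equal to mm are split off into their own group rather than kept in the high part, so that repeated values cannot stall the recursion. Otherwise it follows the slides: sort columns of c, take the median row, recursively select its median mm, partition, and recurse on one side.

```python
def mom_select(a, i, c=5):
    """Return the ith smallest element (1-indexed) of the list a.

    Sketch of the median-of-median-of-c scheme; the lecture claims linear
    worst-case time for c >= 5.
    """
    if c < 2:
        raise ValueError("c must be at least 2 for the recursion to shrink")
    if not 1 <= i <= len(a):
        raise ValueError("i must satisfy 1 <= i <= len(a)")
    if len(a) <= c:
        return sorted(a)[i - 1]          # base case: a single (short) column
    # Columns of c consecutive elements, each sorted; the last may be shorter
    # (this sketch skips the slide's padding with large values).
    columns = [sorted(a[j:j + c]) for j in range(0, len(a), c)]
    median_row = [col[(len(col) - 1) // 2] for col in columns]
    # Recursive call on the median row gives mm, the median of medians.
    mm = mom_select(median_row, (len(median_row) + 1) // 2, c)
    # Three-way split around mm (a departure from the slide's two-way split).
    low = [x for x in a if x < mm]
    equal = [x for x in a if x == mm]
    high = [x for x in a if x > mm]
    k = len(low)
    if i <= k:
        return mom_select(low, i, c)
    if i <= k + len(equal):
        return mm
    return mom_select(high, i - k - len(equal), c)
```

For the slide's array, `mom_select(A, 13)` (the median, i = 13 of 25) returns 26: the low part has k = 12 items, so 26 itself is the 13th smallest element.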

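11 Running time
$T(n) = \Theta(n) + T(n/c) + T(3n/4) + \Theta(n)$
The terms are, in order: the cost of sorting the columns; the cost of finding m-of-m-of-c on the median row of B, which has n/c elements; the recursion on the larger side of a worst-case split, which has at most 3n/4 elements because at least n/4 elements lie on each side of mm; and the cost of the partition. Combining the linear terms:
$T(n) = T(n/c) + T(3n/4) + \Theta(n)$
We can show that $T(n) = \Theta(n)$ for $c \ge 5$: then $1/c + 3/4 \le 19/20 < 1$, so the two recursive calls together work on at most a constant fraction (less than 1) of the n elements.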
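One way to carry out the "we can show" step is by substitution. This sketch ignores floors, ceilings, and base cases, assumes the $\Theta(n)$ term is at most $bn$ for some constant $b > 0$, and guesses $T(m) \le am$ for all $m < n$:

```latex
\begin{aligned}
T(n) &\le a\frac{n}{c} + a\frac{3n}{4} + bn
      = an\left(\frac{1}{c} + \frac{3}{4}\right) + bn \\
     &\le an \quad\text{whenever}\quad a\left(\frac{1}{4} - \frac{1}{c}\right) \ge b .
\end{aligned}
```

A positive constant a satisfying this exists exactly when $1/4 - 1/c > 0$, i.e. $c > 4$. For c = 5 the bracket is 1/20, so a = 20b works; with c = 4 the bracket is 0 and no constant a works, which is why the bound needs $c \ge 5$. Since the partition alone costs $\Theta(n)$, this gives $T(n) = \Theta(n)$.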

12 Benefits of M-of-M-of-c
It is a good order-statistic algorithm in its own right, and it can be used inside other algorithms. For example, using it in Quicksort to choose the median as the pivot guarantees a good split and a worst-case $\Theta(n \lg n)$ order of growth. The linear time is not the result of constraining the problem (as we did with counting sort, which assumes the keys are integers in a small range). It is a comparison-based method!

