1 Algorithms CSCI 235, Fall 2015 Lecture 19 Order Statistics II.



2 Finding the Median
Last time, we showed that we can find the kth order statistic (i.e. the kth smallest element) in Θ(n) time for a constant k, by repeatedly finding the minimum and discarding it.
How long will it take to find the median using this strategy? Note that the position of the median (n/2) increases as n increases.
T(n) = ?
Conclusion: This method does not work as well for finding the median. Larger values of k take longer to find (although for any constant k the order of growth is the same). Can we do better?
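The repeated-minimum strategy can be sketched in Python to see how the work grows with k. This is only an illustration: the function name `select_by_repeated_min` and the comparison counter are assumptions made for this sketch, not part of the lecture.

```python
def select_by_repeated_min(values, k):
    """Return (k-th smallest value, number of comparisons used).

    k is 1-based. Each pass scans the remaining items for the minimum
    and discards it, as described on the slide.
    """
    remaining = list(values)  # work on a copy
    comparisons = 0
    found = None
    for _ in range(k):
        smallest = 0
        for j in range(1, len(remaining)):
            comparisons += 1
            if remaining[j] < remaining[smallest]:
                smallest = j
        found = remaining.pop(smallest)  # discard the current minimum
    return found, comparisons


data = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]
third, cost_k3 = select_by_repeated_min(data, 3)
fifth, cost_k5 = select_by_repeated_min(data, len(data) // 2)
```

Each pass costs about n comparisons, so k passes cost roughly kn. When k is tied to n, as with k = n/2 for the median, the comparison count is no longer linear in n.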

3 Randomized-Select
Randomized-Select(A, lo, hi, i)    {Find the ith order statistic between lo and hi}
    if lo = hi
        then return A[lo]
    split ← Randomized-Partition(A, lo, hi)
    length ← (split - lo) + 1
    if i <= length
        then return Randomized-Select(A, lo, split, i)
        else return Randomized-Select(A, split+1, hi, i-length)
Idea: Partition the array as in Quick-sort. Recursively search the appropriate partition for the ith element.

4 Example
Index:  1   2   3   4   5   6   7   8   9   10
A:      17  6   34  18  9   5   11  22  28  2
Find the 3rd order statistic: Randomized-Select(A, 1, 10, 3)
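A minimal Python sketch of Randomized-Select, run on the example array. The slides do not show Randomized-Partition, so the version here is an assumption: a Hoare-style partition with a random pivot, chosen because it returns a split with A[lo..split] <= A[split+1..hi], which is what the pseudocode relies on. Indices are 0-based in the code, while the rank i stays 1-based.

```python
import random


def randomized_partition(a, lo, hi):
    """Hoare-style partition around a random pivot (assumed variant).

    Returns j with lo <= j < hi such that every item in a[lo..j]
    is <= every item in a[j+1..hi].
    """
    r = random.randint(lo, hi)
    a[lo], a[r] = a[r], a[lo]  # move the random pivot to the front
    pivot = a[lo]
    i, j = lo - 1, hi + 1
    while True:
        j -= 1
        while a[j] > pivot:
            j -= 1
        i += 1
        while a[i] < pivot:
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def randomized_select(a, lo, hi, i):
    """Return the i-th smallest (1-based) item of a[lo..hi]; reorders a."""
    if lo == hi:
        return a[lo]
    split = randomized_partition(a, lo, hi)
    length = split - lo + 1  # size of the low side
    if i <= length:
        return randomized_select(a, lo, split, i)
    return randomized_select(a, split + 1, hi, i - length)


A = [17, 6, 34, 18, 9, 5, 11, 22, 28, 2]
third = randomized_select(list(A), 0, len(A) - 1, 3)
```

The call passes a copy of A because the partition reorders its input in place.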

5 Running time of Randomized-Select
Worst case: As with QuickSort, we can get unlucky and partition the array into two pieces of size 1 and n-1, with the ith statistic in the larger side.
    T(n) = T(n-1) + n = Θ(n^2), where the n term is the cost of the partition.
A good case: Partition into two equal parts:
    T(n) = T(n/2) + n (We will work this one out in class).
Average case: Can show that T(n) <= cn, so T(n) = O(n).
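One way to work out the good case is to unroll the recurrence. The sketch below assumes n is a power of 2 and T(1) = 1; both are assumptions made here for convenience, not stated on the slide.

```latex
\begin{align*}
T(n) &= n + T(n/2) \\
     &= n + \frac{n}{2} + T(n/4) \\
     &= n + \frac{n}{2} + \frac{n}{4} + \cdots + 2 + T(1) \\
     &= (2n - 2) + 1 = 2n - 1 = \Theta(n).
\end{align*}
```

The partition costs form a geometric series dominated by the first term, which is why halving the problem each time gives linear rather than n lg n work.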

6 Selection in worst-case linear time
To make a selection in worst-case linear time, we want to use an algorithm that guarantees a good split when we partition. To do this, we use the "median of median of c" algorithm.
To start, we pick c, an integer constant >= 1.
We write our input array, A, as a 2-D array with c rows and n/c columns. (If n/c is not an integer, we can pad the array with large numbers that won't change the result.)
Sort the columns of this new 2-D array.

7 Example
A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51, 73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
n = 25. Choose c = 5.
Sort each column: B[1..c, 1..n/c] = B[1..5, 1..5]
After sorting, the median row contains the median of each column.
Sorting the columns takes Θ(c^2 (n/c)) = Θ(cn) = Θ(n) time, since c is a constant.
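The column step can be sketched in Python on the example array. The figure showing B did not survive, so the layout is an assumption: each column is taken to be a consecutive run of c elements of A, which reproduces the median row used later in the lecture. The function name `sorted_columns_and_medians` is made up for this sketch.

```python
def sorted_columns_and_medians(a, c):
    """Split a into columns of c items, sort each, return (columns, medians).

    Assumption: column j holds a[j*c : (j+1)*c]. A short last column is
    left unpadded here, unlike the padding described in the lecture.
    """
    columns = [sorted(a[j:j + c]) for j in range(0, len(a), c)]
    medians = [col[len(col) // 2] for col in columns]  # the middle row
    return columns, medians


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
columns, medians = sorted_columns_and_medians(A, 5)
for row in zip(*columns):  # print B row by row, one column per chunk
    print(row)
```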

8 Median-of-median-of-c continued
We now call the Median-of-median-of-c algorithm again, on the single median row of B, with the same value of c as before.
Write the median row as B' = [17, 37, 15, 30, 26].
Write B' as a 2-D array with c = 5 rows and 1 column (B' has 5 elements, so 5/c = 1).
Sort the column: [15, 17, 26, 30, 37].
The value at the middle row is mm, the median of medians; here mm = 26. We use this as our pivot for the partition.

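9 Showing that it gives a good split
We can show that at least 1/4 of the elements are less than mm and at least 1/4 of the elements are greater than mm by imagining that the columns of B are sorted by the value of each median. (Note: we only imagine it, we don't actually do it.)
About half of the columns have a median no larger than mm, and in each of those columns about half of the elements are no larger than the column's median, so roughly 1/2 * 1/2 = 1/4 of all elements are no larger than mm. The same argument applies on the other side.
In the example: at least 1/4 are less than 26, and at least 1/4 are greater than 26.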
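The claim can be checked directly on the example array with a few lines of Python. This only confirms the bound for this one input; it is not a proof.

```python
A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
mm = 26  # median of medians from the example

less = sum(1 for x in A if x < mm)
greater = sum(1 for x in A if x > mm)
quarter = len(A) / 4

print(less, greater, quarter)
```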

10 Partitioning
Partition A using mm = 26 as the pivot. Use a partition that keeps mm in the high part of the partition:
    "low" = 2, 5, 17, 3, 19, 0, 7, 15, 6, 10, 24, 25 (12 items)
    "high" = 26, 43, 91, 37, 42, 72, 51, 63, 30, 62, 73, 28, 29 (13 items)
If the number of items in the low part of the partition is k, and the order statistic we are looking for is given by i, then:
    if i <= k, iterate the entire procedure on the lower partition;
    if i > k, iterate on the higher partition (looking for the (i - k)th element).
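Putting the steps together, the whole procedure might look like the Python sketch below. It departs from the slides in three labeled ways, all assumptions made for this sketch: it builds new lists instead of partitioning in place, it leaves a short last column unpadded, and it splits three ways (less than, equal to, greater than mm) instead of keeping mm in the high part, so that inputs with repeated values still shrink on every call. The name `mom_select` is hypothetical.

```python
def mom_select(a, i, c=5):
    """Return the i-th smallest (1-based) item of a via median of medians."""
    if c < 2:
        raise ValueError("this sketch needs c >= 2 to make progress")
    if len(a) <= c:
        return sorted(a)[i - 1]  # small input: sort directly

    # Sort each column of c items and collect the median row.
    columns = [sorted(a[j:j + c]) for j in range(0, len(a), c)]
    medians = [col[len(col) // 2] for col in columns]

    # Recursive call on the median row gives the pivot mm.
    mm = mom_select(medians, (len(medians) + 1) // 2, c)

    # Three-way split (assumed variant; the slide keeps mm in "high").
    low = [x for x in a if x < mm]
    equal = [x for x in a if x == mm]
    high = [x for x in a if x > mm]

    if i <= len(low):
        return mom_select(low, i, c)
    if i <= len(low) + len(equal):
        return mm
    return mom_select(high, i - len(low) - len(equal), c)


A = [43, 5, 17, 91, 2, 42, 19, 72, 37, 3, 7, 15, 0, 63, 51,
     73, 6, 30, 62, 10, 24, 26, 25, 28, 29]
median = mom_select(A, 13)
```

On the example, the first call picks mm = 26 and finds 12 items below it, so a request for the 13th element returns at once.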

11 Running time
T(n) = Θ(n) + T(n/c) + T(3n/4) + Θ(n)
where the terms are, in order:
    Θ(n): cost of sorting columns
    T(n/c): cost of finding m-of-m-of-c on the median row of B
    T(3n/4): worst-case split
    Θ(n): cost of partition
So T(n) = T(n/c) + T(3n/4) + Θ(n).
We can show that T(n) = Θ(n) for c >= 5.
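A sketch of why c >= 5 is enough, using the substitution method. The constants a and b are introduced here for the argument: b bounds the Θ(n) term, and a is the constant in the guess T(m) <= am for all m < n.

```latex
\begin{align*}
T(n) &\le a\,\frac{n}{c} + a\,\frac{3n}{4} + bn \\
     &= an - \left( a\left(\frac{1}{4} - \frac{1}{c}\right) - b \right) n \\
     &\le an
     \quad\text{whenever}\quad
     a \ge \frac{b}{\frac{1}{4} - \frac{1}{c}} .
\end{align*}
```

Such an a exists only when 1/c < 1/4, that is, when c >= 5. For c = 5 the condition becomes a >= 20b. Since the partition alone already costs Ω(n), this gives T(n) = Θ(n).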

12 Benefits of M-of-M-of-c
Good order statistic algorithm.
Can use this with other algorithms. For example, we can use it with QuickSort to guarantee a good split and an n lg n order of growth.
The linear time is not the result of constraining the problem (as we did with counting-sort). It is a comparison-based method!

