
SELECTION CS16: Introduction to Data Structures & Algorithms Tuesday, March 3, 2015 1.


1 SELECTION
CS16: Introduction to Data Structures & Algorithms
Tuesday, March 3, 2015

2 Outline
- Medians
- Selection
- Randomized Selection (Hoare’s)
- Median-of-Medians Selection

3 Medians
- The median of a collection of numbers is its “middle” element: half the numbers are smaller, half the numbers are bigger.
- A median is used to summarize a set of numbers by a single, typical value. A mean (average) is also used for this purpose, but means can be corrupted by outliers, whereas the median is less affected.
- What are the median and mean of this list? Which is more helpful? [9, 5, 4, 6, 5, 7, 10000, 6, 4, 8]
- Finding the median of a list is easy: sort the list and pick the middle element! This is O(n log n)… can we do better?
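
The question above can be checked with a few lines of Python. This is a minimal sketch; the helper names `mean` and `median_by_sorting` are made up for illustration, and averaging the two middle elements of an even-length list is an assumption (the slide only says to "pick the middle element").

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def median_by_sorting(values):
    """Median via sorting, which costs O(n log n)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    # Assumption: for an even-length list, average the two middle elements.
    return (ordered[mid - 1] + ordered[mid]) / 2


data = [9, 5, 4, 6, 5, 7, 10000, 6, 4, 8]
print(mean(data))               # 1005.4
print(median_by_sorting(data))  # 6.0
```

The single outlier 10000 drags the mean above 1000, while the median stays at 6, close to the typical value in the list.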

4 Medians (2)
- If we had a fast way of finding a median, we could also use it in quick sort!
- Remember, quick sort picks a random pivot in order to subdivide the list.
- Since the pivot is random, it’s still possible that we’ll choose the minimal element each time, and the sort will take O(n^2).
- If we were able to choose the median as our pivot each time in O(n), quick sort would have the same worst-case runtime as merge sort: O(n log n).

5 Selection
- To find a faster algorithm for medians, let’s consider the more general problem of selection:
    select(list, k):
        // Input: a list of numbers, an integer k
        // Output: the kth smallest element in the list
- To find the median using select, we’d just call select(list, n/2).
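
As a baseline for what select should return, here is a sketch of the sort-based version. The name `select_by_sorting` is hypothetical, and k is assumed to be 1-indexed (k = 1 is the smallest element), matching the conditions on the next slide.

```python
def select_by_sorting(values, k):
    """Return the k-th smallest element (k is 1-indexed) in O(n log n)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    return sorted(values)[k - 1]


data = [9, 5, 4, 6, 5, 7, 10000, 6, 4, 8]
print(select_by_sorting(data, 1))               # 4, the minimum
print(select_by_sorting(data, len(data) // 2))  # 6, the median as select(list, n/2)
```

Any faster select should agree with this function on every valid k, which makes it a convenient reference when testing.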

6 Selection (2)
Divide-and-conquer strategy for selection:
- Divide: Pick a random pivot, x, and partition the list into 3 sublists:
  - L: elements smaller than x
  - E: elements equal to x
  - G: elements greater than x
- Recur: Because we know the size of the lists, we know which sublist the kth element will be in:
  - select(L, k) if k ≤ |L|
  - return x if |L| < k ≤ |L| + |E|
  - select(G, k – (|L| + |E|)) if k > |L| + |E|
- Conquer: Return!

7 Selection Pseudocode
    select(list, k):
        // k is 1-indexed: k = 1 selects the smallest element
        // Base case omitted
        pivot = list[rand(0, list.size)]   // random index in [0, list.size)
        L = []
        E = []
        G = []
        for x in list:
            if x < pivot: L.append(x)
            if x == pivot: E.append(x)
            if x > pivot: G.append(x)
        if k <= L.size:
            return select(L, k)
        else if k <= (L.size + E.size):
            return pivot
        else:
            return select(G, k - (L.size + E.size))
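
A runnable Python sketch of the random-pivot selection above. It assumes k is 1-indexed, following the conditions k ≤ |L| and k ≤ |L| + |E| from the Selection (2) slide. No separate base case is needed in this version, because the pivot is always a member of the list, so E is never empty and a one-element list returns its pivot.

```python
import random


def select(values, k):
    """Return the k-th smallest element of values (k is 1-indexed)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    pivot = random.choice(values)
    smaller = [x for x in values if x < pivot]   # L
    equal = [x for x in values if x == pivot]    # E (contains the pivot)
    greater = [x for x in values if x > pivot]   # G
    if k <= len(smaller):
        return select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return select(greater, k - len(smaller) - len(equal))


data = [9, 5, 4, 6, 5, 7, 10000, 6, 4, 8]
print(select(data, len(data) // 2))  # 6
```

The list comprehensions copy the data at each level, which keeps the code close to the pseudocode; an in-place partition would avoid the extra memory.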

8 Selection Analysis
- When selection randomly picks a pivot like this, it’s known as Hoare’s Selection.
- How fast is this select? Just like in quick sort, we run the risk of always choosing a horrible pivot, resulting in an O(n^2) runtime.
- As with any randomized algorithm, we are more interested in the expected runtime.
- In the worst case, the input list will contain all distinct elements, since duplicate elements could only shrink the sublists (if they are equal to the pivot) and improve the runtime.
- For our analysis, we’ll assume this worst-case situation and find the expected runtime.
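
To see where the O(n^2) comes from, here is a sketch of the unlucky case. It assumes all elements are distinct, that partitioning a list of m elements costs about m operations, and that every pivot happens to be the smallest remaining element while we are looking for the largest, so each recursive call removes only the pivot:

```latex
n + (n-1) + (n-2) + \dots + 1 \;=\; \sum_{m=1}^{n} m \;=\; \frac{n(n+1)}{2} \;=\; O(n^2)
```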

9 Selection Analysis (2)
- Every pivot has equal probability of being picked, so the recurrence relation averages the cost over all n possible pivots.
- Any given pivot splits the list into two sublists of size i and (n–1–i).
  - The pivot itself is not included in either sublist.
- The algorithm could recur on either sublist. Since the kth smallest element could be anywhere, it will recur on the first sublist with probability i/(n–1) and the second sublist with probability (n–1–i)/(n–1).
- Weighting the cost of each sublist by these probabilities gives a recurrence relation that solves to O(n). (See Wocjan’s reading for the full solution.)
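
One way to write the recurrence described above, as a sketch rather than the slide's exact formula. It assumes the partition step costs cn for a hypothetical constant c, that all n elements are distinct (n ≥ 2), and it takes the stated probabilities i/(n–1) and (n–1–i)/(n–1) at face value. The second equality substitutes j = n–1–i in the second term, which shows both terms contribute the same sum.

```latex
T(n) = cn + \frac{1}{n} \sum_{i=0}^{n-1}
         \left[ \frac{i}{n-1}\, T(i) + \frac{n-1-i}{n-1}\, T(n-1-i) \right]
     = cn + \frac{2}{n(n-1)} \sum_{i=0}^{n-1} i\, T(i)

% Inductive guess: T(i) \le 3ci for all i < n.
T(n) \le cn + \frac{2}{n(n-1)} \cdot 3c \sum_{i=0}^{n-1} i^2
     = cn + \frac{6c}{n(n-1)} \cdot \frac{(n-1)\,n\,(2n-1)}{6}
     = cn + c\,(2n-1)
     \le 3cn
```

Under these assumptions the guess T(n) ≤ 3cn holds by induction, which is consistent with the O(n) expected runtime.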

10 Well… great.
- The expected runtime of select is O(n).
- But the worst case is still O(n^2).
- This means if we use our select() function in quick sort, the expected runtime is still O(n log n), and the worst case is still O(n^2).
- Turns out, it’s actually possible to do selection in worst case O(n), and it’s all contingent on picking a good pivot.
- Median of Medians Select assures that the worst case for select is O(n)…

11 Median-of-Medians Select
- General strategy: Pick a pivot that is always “good” (i.e. between the 25th and 75th percentiles).
- We do this by picking the “median of medians” as our pivot:
  - Partition the input list into n/5 lists of size 5.
  - Sort each of these lists in constant time.
    - There are always 5 elements in each list, which is a constant.
  - Take the median element of each list and recursively call momSelect() on the list of n/5 medians to find the median of that list.
  - Using the median of medians as the pivot, continue with the rest of the selection algorithm as before…

12 Median-of-Medians Select (2)
    momSelect(list, k):
        // Base case omitted
        miniLists = divide list into n/5 lists of 5
        medians = []
        for miniList in miniLists:
            sort5(miniList)   // in O(1), because miniList is always size 5
            medians.append(miniList[2])
        pivot = momSelect(medians, medians.size/2)
        L = []
        E = []
        G = []
        for x in list:
            if x < pivot: L.append(x)
            if x == pivot: E.append(x)
            if x > pivot: G.append(x)
        if k <= L.size:
            return momSelect(L, k)
        else if k <= (L.size + E.size):
            return pivot
        else:
            return momSelect(G, k - (L.size + E.size))
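
A runnable Python sketch of momSelect. Several details are assumptions not given on the slide: the base case (lists of at most 5 elements are sorted directly), allowing the last group to hold fewer than 5 elements, and asking for the lower median of the medians via `(len(medians) + 1) // 2` instead of `medians.size/2`. As before, k is 1-indexed.

```python
def mom_select(values, k):
    """Return the k-th smallest element of values (k is 1-indexed)."""
    if not 1 <= k <= len(values):
        raise ValueError("k must be between 1 and len(values)")
    # Assumed base case: small lists are cheap to sort directly.
    if len(values) <= 5:
        return sorted(values)[k - 1]
    # Split into groups of 5; the last group may be shorter.
    groups = [values[i:i + 5] for i in range(0, len(values), 5)]
    # Sorting a group of at most 5 elements takes constant time.
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    # Recursively find the median of the medians and use it as the pivot.
    pivot = mom_select(medians, (len(medians) + 1) // 2)
    smaller = [x for x in values if x < pivot]   # L
    equal = [x for x in values if x == pivot]    # E
    greater = [x for x in values if x > pivot]   # G
    if k <= len(smaller):
        return mom_select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return mom_select(greater, k - len(smaller) - len(equal))


numbers = list(range(50, 0, -1))
print(mom_select(numbers, 25))  # 25
```

The only difference from the random-pivot version is how the pivot is chosen; the partition and the three-way case analysis are unchanged.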

13 Median-of-Medians Select (3)
Example: the numbers from 1 to 50, split into 10 groups of 5. Each column is one sorted group, and the columns are ordered by their medians.
     1  12   2   4   3   9  13  14   5   8
    10  16   7  15  22  17  25  20   6  34
    11  18  19  21  23  24  28  31  36  43    <- list of medians
    26  37  35  32  29  30  33  40  45  47
    27  50  42  49  44  38  39  41  46  48
- Median of medians: guaranteed to be greater than all the red numbers (lower quarter; the upper-left block of the grid) and less than all the blue numbers (upper quarter; the lower-right block of the grid), so it happily lies between the 25th and 75th percentiles. GREAT PIVOT CHOICE.
- How many elements will this pivot eliminate, actually? Well, what’s the area of the red or blue region?
  - Height: 3
  - Width: (n/5)/2
  - Area = 3n/10
- This leaves a problem of at most size 7n/10.
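
The area argument can be checked numerically on this example. The sketch below assumes the figure's grid reads as ten sorted columns of five numbers each, and that the pivot is the 5th smallest of the ten medians (following `medians.size/2` with a 1-indexed k); both readings are assumptions.

```python
# Each inner list is one sorted group of 5 (one column of the figure).
columns = [
    [1, 10, 11, 26, 27],
    [12, 16, 18, 37, 50],
    [2, 7, 19, 35, 42],
    [4, 15, 21, 32, 49],
    [3, 22, 23, 29, 44],
    [9, 17, 24, 30, 38],
    [13, 25, 28, 33, 39],
    [14, 20, 31, 40, 41],
    [5, 6, 36, 45, 46],
    [8, 34, 43, 47, 48],
]
n = sum(len(col) for col in columns)
medians = [col[2] for col in columns]
pivot = sorted(medians)[len(medians) // 2 - 1]  # 5th smallest median

# Groups whose median is <= pivot: their three smallest elements are <= pivot.
guaranteed_small = [x for col in columns if col[2] <= pivot for x in col[:3]]
# Groups whose median is >= pivot: their three largest elements are >= pivot.
guaranteed_large = [x for col in columns if col[2] >= pivot for x in col[2:]]

everything = [x for col in columns for x in col]
smaller = [x for x in everything if x < pivot]
greater = [x for x in everything if x > pivot]

print(pivot)                              # 23
print(len(guaranteed_small), 3 * n // 10)  # 15 15
print(max(len(smaller), len(greater)))    # 27
```

Here 15 elements (3n/10, counting the pivot itself) are guaranteed to be no larger than the pivot, and the bigger remaining sublist has 27 elements, below the 7n/10 = 35 bound.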

14 Median-of-Medians Select (4)
- Using the median of medians as our pivot, the recurrence relation for select becomes:
    T(n) = T(n/5) + T(7n/10) + O(n)
  - T(n/5): from recurring on the list of n/5 medians to find the median of medians
  - T(7n/10): from recursively calling select() on a list that’s guaranteed to be at most 7/10 the size of the original
- This is O(n)! (See Eppstein readings for proof)
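
A short substitution sketch of why this recurrence is linear. It assumes the O(n) term is at most cn for a hypothetical constant c and ignores floors, ceilings, and base cases; the Eppstein reading handles those details.

```latex
% Inductive guess: T(m) \le 10cm for all m < n.
T(n) \le 10c \cdot \frac{n}{5} + 10c \cdot \frac{7n}{10} + cn
     = 2cn + 7cn + cn
     = 10cn
```

The guess closes because the two recursive calls together work on only 1/5 + 7/10 = 9/10 of the input, which is strictly less than the whole list.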

15 In Conclusion
- We can now perform selection in worst case O(n)
- …which means we can find medians in worst case O(n)
- …which means we can run quick sort in worst case O(n log n)!!
- In practical use, the random-pivot select runs faster. Median of medians is better when worst-case O(n) performance is essential, and is a good algorithm for teaching runtime analysis.
- We don’t expect you to implement the median of medians algorithm when implementing quicksort.
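
Purely as an illustration of the third point, here is a sketch of quick sort that always pivots on the true median. The function names are hypothetical, the `mom_select` helper uses an assumed base case of 5 elements with a 1-indexed k, and the sort builds new lists instead of partitioning in place.

```python
def mom_select(values, k):
    """k-th smallest element (1-indexed) using a median-of-medians pivot."""
    if len(values) <= 5:  # assumed base case
        return sorted(values)[k - 1]
    groups = [values[i:i + 5] for i in range(0, len(values), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]
    pivot = mom_select(medians, (len(medians) + 1) // 2)
    smaller = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    greater = [x for x in values if x > pivot]
    if k <= len(smaller):
        return mom_select(smaller, k)
    if k <= len(smaller) + len(equal):
        return pivot
    return mom_select(greater, k - len(smaller) - len(equal))


def quicksort_median_pivot(values):
    """Quick sort whose pivot is always the median of the current list."""
    if len(values) <= 1:
        return list(values)
    pivot = mom_select(values, (len(values) + 1) // 2)
    smaller = [x for x in values if x < pivot]
    equal = [x for x in values if x == pivot]
    greater = [x for x in values if x > pivot]
    # Each side holds at most half the elements, so the depth is O(log n).
    return quicksort_median_pivot(smaller) + equal + quicksort_median_pivot(greater)


print(quicksort_median_pivot([9, 5, 4, 6, 5, 7, 10000, 6, 4, 8]))
# [4, 4, 5, 5, 6, 6, 7, 8, 9, 10000]
```

Because the pivot is the median, neither sublist can exceed half the input, and each level does O(n) work for selection and partitioning.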

16 Readings
- Dasgupta Section 2.4: analysis of median-finding algorithms
- Wocjan’s analysis of selection with a random pivot: http://www.eecs.ucf.edu/courses/cot5405/fall2010/chapter1_2/QuickSelAvgCase.pdf
- Proof that Median-of-Medians is O(n): http://www.ics.uci.edu/~eppstein/161/960130.html

