# CS 3343: Analysis of Algorithms Lecture 14: Order Statistics.


## Order statistics

The $i$th order statistic of a set of $n$ elements is the $i$th smallest element.

- The minimum is the 1st order statistic.
- The maximum is the $n$th order statistic.
- The median is the $((n+1)/2)$th order statistic when $n$ is odd. If $n$ is even, there are 2 medians: the $(n/2)$th and the $(n/2 + 1)$th order statistics.

How can we calculate order statistics? What is the running time?

## Order statistics: the selection problem

Select the $i$th smallest of $n$ elements.

Naive algorithm: sort, then return the element in position $i$. The worst-case running time is $\Theta(n \log n)$ using merge sort or heapsort (not quicksort).

We will show:

- A practical randomized algorithm with $\Theta(n)$ expected running time.
- A cool algorithm, of theoretical interest only, with $\Theta(n)$ worst-case running time.
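The naive approach can be sketched in a few lines. This is only an illustration: the function name `select_by_sorting` is hypothetical, and it relies on Python's built-in `sorted` rather than merge sort or heapsort (the built-in sort also runs in $O(n \log n)$ time in the worst case).

```python
def select_by_sorting(items, i):
    """Return the i-th smallest element (1-indexed) by sorting a copy."""
    if not 1 <= i <= len(items):
        raise ValueError("i must satisfy 1 <= i <= n")
    # sorted() returns a new list, so the caller's data is left untouched.
    return sorted(items)[i - 1]


print(select_by_sorting([7, 10, 5, 8, 11, 3, 2, 13], 6))  # prints 10
```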

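## Recall: Quicksort

The function Partition gives us the rank $k$ of the pivot.

- If we are lucky, $k = i$ and we are done.
- If not, we at least get a smaller subarray to work with:
  - $k > i$: the $i$th smallest is in the left subarray.
  - $k < i$: the $i$th smallest is in the right subarray (as its $(i - k)$th smallest).

Divide and conquer:

- If we are lucky, $k$ is close to $n/2$, or the desired element is in the smaller subarray.
- If we are unlucky, the desired element is in the larger subarray (possible size $n - 1$).

[Figure: the array $A[p..q]$ after partitioning around the pivot $x$ at index $r$; elements $\le x$ lie to its left, elements $\ge x$ to its right, and $k$ is the rank of $x$.]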

## Randomized divide-and-conquer algorithm

    RAND-SELECT(A, p, q, i)            ⊳ i-th smallest of A[p..q]
        if p = q and i > 1 then error!
        r ← RAND-PARTITION(A, p, q)
        k ← r − p + 1                  ⊳ k = rank(A[r])
        if i = k then return A[r]
        if i < k
            then return RAND-SELECT(A, p, r − 1, i)
            else return RAND-SELECT(A, r + 1, q, i − k)

[Figure: $A[p..q]$ after partitioning; elements $\le A[r]$ occupy positions $p..r-1$, the pivot $A[r]$ has rank $k$, and elements $\ge A[r]$ occupy positions $r+1..q$.]

## Randomized Partition

Randomly choose an element as the pivot.

- Every time we need to partition, roll a die to decide which element to use as the pivot.
- Each of the $n$ elements of the subarray is selected with probability $1/n$.

In pseudocode:

    Rand-Partition(A, p, q) {
        d = random();                        // random number in [0, 1)
        index = p + floor((q - p + 1) * d);  // p <= index <= q
        swap(A[p], A[index]);
        return Partition(A, p, q);           // now uses A[p] as pivot; returns its index
    }
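The pieces so far (a partition routine, a random pivot choice, and the three-way decision on the pivot's rank) can be put together as a minimal sketch. The names `partition`, `rand_partition`, and `rand_select` are illustrative, and the particular partition scheme used here (a single left-to-right scan with the pivot in the first position) is an assumption; any correct partition routine would do.

```python
import random


def partition(a, p, q):
    """Partition a[p..q] around the pivot a[p]; return the pivot's final index."""
    pivot = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]  # move the pivot between the two parts
    return i


def rand_partition(a, p, q):
    """Swap a uniformly random element of a[p..q] into a[p], then partition."""
    index = random.randint(p, q)  # inclusive on both ends
    a[p], a[index] = a[index], a[p]
    return partition(a, p, q)


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-indexed) element of a[p..q]; reorders a."""
    if not 1 <= i <= q - p + 1:
        raise ValueError("i out of range for a[p..q]")
    r = rand_partition(a, p, q)
    k = r - p + 1  # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [7, 10, 5, 8, 11, 3, 2, 13]
print(rand_select(list(data), 0, len(data) - 1, 6))  # prints 10
```

The call passes a copy of `data` because the selection reorders its input in place. The pivot choice is random, but the returned value is not: it is always the $i$th smallest element.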

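## Example

Select the $i = 6$th smallest element of [7, 10, 5, 8, 11, 3, 2, 13], using the first element, 7, as the pivot.

Partition: [3, 2, 5, 7, 11, 8, 10, 13]. The pivot has rank $k = 4$, so select the $6 - 4 = 2$nd smallest element of the upper part recursively.

## Complete example: select the 6th smallest element

1. Partition [7, 10, 5, 8, 11, 3, 2, 13] around 7 to get [3, 2, 5, 7, 11, 8, 10, 13]. Here $i = 6$ and $k = 4$, so continue in the upper part [11, 8, 10, 13] with $i = 6 - 4 = 2$.
2. Partition [11, 8, 10, 13] around 11 to get [10, 8, 11, 13]. Here $k = 3$ and $i = 2 < k$, so continue in the lower part [10, 8] with $i = 2$.
3. Partition [10, 8] around 10 to get [8, 10]. Here $k = 2$ and $i = 2 = k$, so the answer is 10.

Note: here we always used the first element as the pivot to do the partition (instead of Rand-Partition).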

## Intuition for analysis

(All our analyses today assume that all elements are distinct.)

- Lucky: $T(n) = T(9n/10) + \Theta(n) = \Theta(n)$, by Case 3 of the master theorem.
- Unlucky: $T(n) = T(n - 1) + \Theta(n) = \Theta(n^2)$, an arithmetic series. Worse than sorting!
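To see where the two bounds come from, both recurrences can be unrolled directly. The sketch below replaces the $\Theta(n)$ term with exactly $n$ and takes $T(0) = 0$; both are simplifying assumptions that do not change the asymptotics.

$$
\begin{aligned}
\text{Lucky:}\quad T(n) &\le n + \tfrac{9}{10}n + \left(\tfrac{9}{10}\right)^{2} n + \cdots \le n \sum_{j=0}^{\infty} \left(\tfrac{9}{10}\right)^{j} = 10n, \\
\text{Unlucky:}\quad T(n) &= n + (n-1) + \cdots + 1 = \frac{n(n+1)}{2}.
\end{aligned}
$$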

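## Running time of randomized selection

For an upper bound, assume the $i$th element always falls in the larger side of the partition:

$$
T(n) \le
\begin{cases}
T(\max(0, n-1)) + n & \text{if } 0 : n-1 \text{ split}, \\
T(\max(1, n-2)) + n & \text{if } 1 : n-2 \text{ split}, \\
\quad\vdots \\
T(\max(n-1, 0)) + n & \text{if } n-1 : 0 \text{ split}.
\end{cases}
$$

The expected running time is an average over all cases, each of which occurs with probability $1/n$:

$$
T(n) \le \frac{1}{n} \sum_{k=0}^{n-1} T(\max(k, n-k-1)) + n .
$$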

## Substitution method

Want to show $T(n) = O(n)$, so we need to prove $T(n) \le cn$ for $n > n_0$.

Assume: $T(k) \le ck$ for all $k < n$.

Substituting this assumption into the recurrence gives $T(n) \le cn$ if $c \ge 4$. Therefore, $T(n) = O(n)$.
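The substitution step can be spelled out as follows. This is a sketch of the standard argument, not a transcription of the slide. It assumes the averaged recurrence $T(n) \le \frac{1}{n}\sum_{k=0}^{n-1} T(\max(k, n-k-1)) + n$ and uses the bound $\sum_{k=\lfloor n/2 \rfloor}^{n-1} k \le 3n^2/8$, which follows from the arithmetic series formula.

$$
\begin{aligned}
T(n) &\le \frac{1}{n}\sum_{k=0}^{n-1} T(\max(k, n-k-1)) + n \\
     &\le \frac{2}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} T(k) + n \\
     &\le \frac{2c}{n}\sum_{k=\lfloor n/2 \rfloor}^{n-1} k + n \\
     &\le \frac{2c}{n}\cdot\frac{3n^{2}}{8} + n \\
     &= cn - \left(\frac{cn}{4} - n\right) \le cn \quad \text{if } c \ge 4 .
\end{aligned}
$$

The second line holds because each value of $\max(k, n-k-1)$ lies between $\lfloor n/2 \rfloor$ and $n - 1$ and appears at most twice in the sum.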

## Summary of randomized selection

- Works fast: linear expected time.
- Excellent algorithm in practice.
- But the worst case is very bad: $\Theta(n^2)$.

Q. Is there an algorithm that runs in linear time in the worst case?

A. Yes, due to Blum, Floyd, Pratt, Rivest, and Tarjan [1973].

IDEA: Generate a good pivot recursively.

## Worst-case linear-time selection

SELECT(i, n)

1. Divide the $n$ elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median $x$ of the $\lfloor n/5 \rfloor$ group medians to be the pivot.
3. Partition around the pivot $x$. Let $k = \operatorname{rank}(x)$.
4. If $i = k$, return $x$. Else if $i < k$, recursively SELECT the $i$th smallest element in the lower part; otherwise recursively SELECT the $(i - k)$th smallest element in the upper part. (Same as RAND-SELECT.)
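The four steps can be sketched as follows. This is an illustrative version, not the lecture's code: the function name `select` is hypothetical, it builds new lists instead of partitioning in place (simpler to read, but it uses extra memory), and it assumes all elements are distinct, as the analysis does.

```python
def select(items, i):
    """Return the i-th smallest (1-indexed) element of items, assumed distinct."""
    if not 1 <= i <= len(items):
        raise ValueError("i must satisfy 1 <= i <= n")
    if len(items) <= 5:
        return sorted(items)[i - 1]  # small base case, solved by rote

    # Step 1: median of each full group of 5. Leftover elements are ignored
    # when choosing the pivot, which gives floor(n/5) group medians.
    groups = [items[j:j + 5] for j in range(0, len(items) - 4, 5)]
    medians = [sorted(g)[2] for g in groups]

    # Step 2: recursively select the median x of the group medians.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around x (by copying, for clarity rather than speed).
    lower = [v for v in items if v < x]
    upper = [v for v in items if v > x]
    k = len(lower) + 1  # rank of x

    # Step 4: the same three-way decision as in RAND-SELECT.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


print(select([7, 10, 5, 8, 11, 3, 2, 13], 6))  # prints 10
```

With duplicate elements this sketch would miscount ranks, because values equal to `x` are dropped from both parts; handling duplicates would need a three-way partition.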

## Choosing the pivot

1. Divide the $n$ elements into groups of 5. Find the median of each 5-element group by rote.
2. Recursively SELECT the median $x$ of the $\lfloor n/5 \rfloor$ group medians to be the pivot.

[Figure: the 5-element groups with their medians, ordered from lesser to greater; $x$ marks the median of the group medians.]

## Analysis

(Assume all elements are distinct.)

- At least half the group medians are $\le x$, which is at least $\lfloor \lfloor n/5 \rfloor / 2 \rfloor = \lfloor n/10 \rfloor$ group medians.
- Therefore, at least $3 \lfloor n/10 \rfloor$ elements are $\le x$: each such median, together with the two smaller elements of its group.
- Similarly, at least $3 \lfloor n/10 \rfloor$ elements are $\ge x$.

[Figure: the 5-element groups ordered from lesser to greater, with $x$ marked as the median of the group medians.]

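- At least $3 \lfloor n/10 \rfloor$ elements are $\le x$, so at most $n - 3 \lfloor n/10 \rfloor$ elements are $> x$.
- At least $3 \lfloor n/10 \rfloor$ elements are $\ge x$, so at most $n - 3 \lfloor n/10 \rfloor$ elements are $< x$.

The recursive call to SELECT in Step 4 is therefore executed on at most $n - 3 \lfloor n/10 \rfloor$ elements. We need an "at most" bound because we are analyzing the worst-case running time.

[Figure: the possible positions for the pivot lie at least $3 \lfloor n/10 \rfloor$ elements away from either end of the sorted order.]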

Use the fact that $\lfloor a/b \rfloor > a/b - 1$:

$$
n - 3 \lfloor n/10 \rfloor < n - 3(n/10 - 1) = 7n/10 + 3 \le 3n/4 \quad \text{if } n \ge 60 .
$$

The recursive call to SELECT in Step 4 is executed on at most $7n/10 + 3$ elements.
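The chain of inequalities is easy to sanity-check numerically. The snippet below is only a spot check over a finite range, not a proof, and the helper name `max_recursive_size` is hypothetical. It uses integer arithmetic throughout to avoid floating-point rounding.

```python
def max_recursive_size(n):
    """Upper bound n - 3*floor(n/10) on the part kept after partitioning."""
    return n - 3 * (n // 10)


for n in range(60, 10_000):
    size = max_recursive_size(n)
    assert 10 * size < 7 * n + 30  # size < 7n/10 + 3
    assert 4 * size <= 3 * n       # size <= 3n/4

print(max_recursive_size(60))  # prints 42
```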

## Developing the recurrence

SELECT(i, n), with running time $T(n)$:

1. Divide the $n$ elements into groups of 5. Find the median of each 5-element group by rote. Cost: $\Theta(n)$.
2. Recursively SELECT the median $x$ of the $\lfloor n/5 \rfloor$ group medians to be the pivot. Cost: $T(n/5)$.
3. Partition around the pivot $x$. Let $k = \operatorname{rank}(x)$. Cost: $\Theta(n)$.
4. If $i = k$, return $x$. Else if $i < k$, recursively SELECT the $i$th smallest element in the lower part; otherwise recursively SELECT the $(i - k)$th smallest element in the upper part. Cost: $T(7n/10 + 3)$.

Altogether, $T(n) \le T(n/5) + T(7n/10 + 3) + \Theta(n)$.

## Solving the recurrence

Assumption: $T(k) \le ck$ for all $k < n$.

Substituting this assumption into the recurrence gives $T(n) \le cn$ if $c \ge 20$ and $n \ge 60$.
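The substitution can be written out step by step. This sketch starts from the recurrence $T(n) \le T(n/5) + T(7n/10 + 3) + \Theta(n)$, replaces the $\Theta(n)$ term with exactly $n$ (a simplifying assumption), and uses $7n/10 + 3 \le 3n/4$ for $n \ge 60$.

$$
\begin{aligned}
T(n) &\le T(n/5) + T(7n/10 + 3) + n \\
     &\le \frac{cn}{5} + c\left(\frac{7n}{10} + 3\right) + n \\
     &\le \frac{cn}{5} + \frac{3cn}{4} + n && \text{if } n \ge 60 \\
     &= \frac{19cn}{20} + n \\
     &= cn - \left(\frac{cn}{20} - n\right) \le cn && \text{if } c \ge 20 .
\end{aligned}
$$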

## Conclusions

- Since the work at each level of recursion is at most a constant fraction (19/20) of the work at the level above, the work per level forms a geometric series dominated by the linear work at the root.
- In practice, this algorithm runs slowly, because the constant in front of $n$ is large.
- The randomized algorithm is far more practical.

Exercise: Try dividing into groups of 3 or 7.

Exercise: Think about an application in sorting.
