 # Medians and Order Statistics

Medians and Order Statistics
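- i-th order statistic: the i-th smallest element.
- Median of n elements: for n odd, the element at position (n+1)/2; for n even, the element at position n/2 (lower median) or n/2+1 (upper median).
- Assume the numbers are distinct.
- Input: A, n, and i with 1 ≤ i ≤ n.
- Output: the element x of A that is larger than exactly i-1 other elements of A.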

Solutions
- O(n log n) time, based on …
- O(n) time on average.
- O(n) time in the worst case.

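Minimum and Maximum
- How many comparisons? n-1 comparisons suffice, and n-1 are also necessary.
- Sufficient: examine each element in turn and keep track of the smallest one seen so far.
- Necessary: the algorithm is comparison based, so each element must be compared, and each element must lose at least once (except the winner).
- What about finding the minimum and the maximum simultaneously?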

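Min & Max
- Can be done with 2n-2 comparisons by finding the minimum and the maximum independently. We can do better.
- Form pairs of elements and compare the two elements within each pair.
- For a pair (ai, ai+1) with ai < ai+1, compare (min, ai) and (ai+1, max).
- That is 3 comparisons for each pair instead of 4, so about 3n/2 comparisons in total.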
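The pairing idea above can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function name `min_max` and the explicit comparison counter are assumptions added so the 3-comparisons-per-pair claim can be checked.

```python
def min_max(values):
    """Return (minimum, maximum, comparisons) using the pairing scheme.

    Hypothetical helper: the comparison counter exists only to
    illustrate the cost of the scheme.
    """
    n = len(values)
    if n == 0:
        raise ValueError("min_max() needs at least one element")
    comparisons = 0
    if n % 2 == 1:
        # Odd length: seed both extremes with the first element.
        lo = hi = values[0]
        start = 1
    else:
        # Even length: seed with the first pair (1 comparison).
        comparisons += 1
        if values[0] < values[1]:
            lo, hi = values[0], values[1]
        else:
            lo, hi = values[1], values[0]
        start = 2
    for k in range(start, n, 2):
        a, b = values[k], values[k + 1]
        # Comparison 1: order the two elements of the pair.
        comparisons += 1
        if a < b:
            small, large = a, b
        else:
            small, large = b, a
        # Comparison 2: smaller element against the running minimum.
        comparisons += 1
        if small < lo:
            lo = small
        # Comparison 3: larger element against the running maximum.
        comparisons += 1
        if large > hi:
            hi = large
    return lo, hi, comparisons
```

For the 8-element list `[3, 1, 4, 8, 5, 9, 2, 6]` this sketch uses 10 comparisons, compared with 2n-2 = 14 for two independent scans.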

Average Time Median Selection
- Divide-and-conquer (prune-and-search).
- Randomized: behavior is determined by the output of a random number generator.
- Based on QuickSort: partition the input array recursively, but work on only one side of the partition.

Randomized Selection: QuickSort(A,p,r) and RandSelect(A,p,r,i)

QuickSort(A,p,r):

    if p < r then
        q = Partition(A,p,r)
        QuickSort(A,p,q)
        QuickSort(A,q+1,r)

First call: QuickSort(A,1,n). After Partition(A,p,r): A[i] < A[j] for all p ≤ i ≤ q < j ≤ r (the elements are distinct).

RandSelect(A,p,r,i):

    if p == r then return A[p]
    q = RandPartition(A,p,r)
    k = q-p+1                /* size of A[p..q] */
    if i ≤ k then return RandSelect(A,p,q,i)
    else return RandSelect(A,q+1,r,i-k)

First call: RandSelect(A,1,n,i). Returns the i-th smallest element in A[p..r].
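The RandSelect pseudocode above can be turned into a runnable Python sketch. Some assumptions are needed, since the slides do not spell out RandPartition: here it is taken to be a Hoare-style partition around a randomly chosen pivot, which is consistent with recursing on A[p..q] and A[q+1..r]. Indices are 0-based, while the rank `i` stays 1-based as in the slides. The names `rand_partition` and `rand_select` are illustrative.

```python
import random


def rand_partition(a, p, r):
    """Hoare-style partition of a[p..r] around a random pivot (assumed variant).

    Returns q with p <= q < r such that every element of a[p..q]
    is <= every element of a[q+1..r]. Requires p < r.
    """
    s = random.randint(p, r)       # random pivot position, inclusive bounds
    a[p], a[s] = a[s], a[p]
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:            # scan from the right for an element <= x
            j -= 1
        i += 1
        while a[i] < x:            # scan from the left for an element >= x
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]
        else:
            return j


def rand_select(a, p, r, i):
    """Return the i-th smallest (1-based) element of a[p..r]; reorders a."""
    if p == r:
        return a[p]
    q = rand_partition(a, p, r)
    k = q - p + 1                  # size of a[p..q]
    if i <= k:
        return rand_select(a, p, q, i)
    return rand_select(a, q + 1, r, i - k)
```

A call such as `rand_select(list(data), 0, len(data) - 1, i)` passes a copy, because the sketch reorders its input in place.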

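Selection (cont.)
- RandPartition (see 8.3, 8.4 in the textbook) gives a partition whose low side has 1 element with probability 2/n, and j elements with probability 1/n for each j = 2, 3, …, n-1.
- Assume the i-th element always falls on the larger side:

$$
\begin{aligned}
T(n) &\le \frac{1}{n}\Bigl(T(\max(1,n-1)) + \sum_{k=1}^{n-1} T(\max(k,n-k))\Bigr) + O(n) \\
     &\le \frac{1}{n}\Bigl(T(n-1) + 2\sum_{k=\lceil n/2\rceil}^{n-1} T(k)\Bigr) + O(n) \\
     &= \frac{2}{n}\sum_{k=\lceil n/2\rceil}^{n-1} T(k) + O(n),
\end{aligned}
$$

  since T(n-1) = O(n^2) in the worst case, so the term T(n-1)/n is absorbed into the O(n) term.
- Then T(n) = O(n) (proof by substitution).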
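The substitution proof mentioned above can be written out along the following lines. This is a sketch under assumptions: the constant a (bounding the O(n) term by an) and the constant c (in the guess T(k) ≤ ck for all k < n) are names introduced here, not taken from the slides.

```latex
\begin{aligned}
T(n) &\le \frac{2}{n}\sum_{k=\lceil n/2\rceil}^{n-1} ck + an \\
     &=   \frac{2c}{n}\left(\sum_{k=1}^{n-1} k \;-\; \sum_{k=1}^{\lceil n/2\rceil-1} k\right) + an \\
     &\le \frac{2c}{n}\left(\frac{n(n-1)}{2} - \frac{1}{2}\cdot\frac{n}{2}\left(\frac{n}{2}-1\right)\right) + an \\
     &=   \frac{2c}{n}\left(\frac{3n^2}{8} - \frac{n}{4}\right) + an \\
     &\le \frac{3cn}{4} + an \\
     &\le cn \qquad \text{whenever } c \ge 4a .
\end{aligned}
```

So any c ≥ 4a that also covers the base cases gives T(n) ≤ cn, that is, T(n) = O(n) on average.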

Worst Case Linear Time Selection
- O(n) worst-case algorithm.
- Works in a similar way: recursively partition the input array.
- Idea: guarantee a good split. E.g., if QuickSort had T(n) = T(9n/10) + T(n/10) + O(n) at each recursion level, then T(n) = O(n log n).
- Use deterministic partitioning: compute the element to partition around.

Steps to find i-th smallest element: Algorithm Select

1. Divide the n elements into ⌊n/5⌋ groups of 5 elements, plus at most one group with the remaining (n mod 5) elements.
2. Find the median of each group: insertion sort takes O(1) time per group (at most 5 elements); take the middle element (the larger one if there are two medians).
3. Use Select recursively to find the median x of the ⌈n/5⌉ medians.

Algorithm Select (cont.)

4. Partition the input array around the median-of-medians x. Let k be the number of elements on the low side and n-k the number on the high side: a1, a2, …, ak | ak+1, ak+2, …, an, with ai < aj for 1 ≤ i ≤ k and k+1 ≤ j ≤ n.
5. Use Select recursively to find the i-th smallest element on the low side if i ≤ k, or the (i-k)-th smallest element on the high side if i > k.
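The five steps of Select can be sketched in Python as shown below. This is an illustrative version under several assumptions rather than the slides' exact procedure: it builds new lists instead of partitioning in place, uses Python's `sorted` on each group in place of insertion sort, takes the lower median of the medians in step 3, and splits three ways in step 4 so that x is returned directly when it is the answer. The name `select` is hypothetical.

```python
def select(a, i):
    """Return the i-th smallest (1-based) element of the list a.

    Median-of-medians sketch; does not modify a.
    """
    n = len(a)
    if not 1 <= i <= n:
        raise ValueError("rank i must satisfy 1 <= i <= len(a)")
    if n <= 5:
        # Small input: sorting at most 5 elements is O(1) work.
        return sorted(a)[i - 1]

    # Step 1: groups of 5, plus at most one smaller group at the end.
    groups = [a[s:s + 5] for s in range(0, n, 5)]

    # Step 2: median of each group (the larger one if there are two).
    medians = [sorted(g)[len(g) // 2] for g in groups]

    # Step 3: median x of the medians (lower median, an assumption here).
    x = select(medians, (len(medians) + 1) // 2)

    # Step 4: partition around x (three-way, so x itself is set aside).
    low = [v for v in a if v < x]
    high = [v for v in a if v > x]

    # Step 5: recurse on the side that contains the i-th smallest element.
    if i <= len(low):
        return select(low, i)
    if i > n - len(high):
        return select(high, i - (n - len(high)))
    return x
```

Calling `select(data, (len(data) + 1) // 2)` returns the lower median of `data`. Setting x aside in step 4 keeps both recursive calls strictly smaller than the input, which is one simple way to guarantee termination.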

Analysis
- Find a lower bound on the number of elements greater than x.
- At least half of the medians found in step 2 are greater than or equal to x.
- So at least half of the groups contribute 3 elements each that are greater than x, except the last group (if it has fewer than 5 elements) and x's own group.
- Discarding those two groups, the number of elements greater than x is ≥ 3(⌈(1/2)⌈n/5⌉⌉ - 2) ≥ 3n/10 - 6.
- Similarly, the number of elements smaller than x is ≥ 3n/10 - 6.
- Then, in the worst case, Select is called recursively in step 5 on at most 7n/10 + 6 elements (upper bound).

Analysis (cont.)
- Steps 1, 2 and 4: O(n) time.
- Step 3: T(⌈n/5⌉).
- Step 5: at most T(7n/10 + 6); note that 7n/10 + 6 < n for n > 20.
- T(n) ≤ T(⌈n/5⌉) + T(7n/10 + 6) + O(n) for n > n1.
- Use substitution to solve: assume T(n) ≤ cn for n > n1; find n1 and c.

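Analysis (cont.)

$$
\begin{aligned}
T(n) &\le c\lceil n/5\rceil + c(7n/10 + 6) + O(n) \\
     &\le cn/5 + c + 7cn/10 + 6c + O(n) \\
     &= 9cn/10 + 7c + O(n).
\end{aligned}
$$

Want T(n) ≤ cn: pick c such that c(n/10 - 7) ≥ c1·n, where c1 is the constant from the O(n) term above. This requires c ≥ 10·c1·n/(n-70); with n1 = 80, any c ≥ 80·c1 satisfies it for all n > 80.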

Questions
- Why not groups of 7 elements?
- Why not groups of 3 elements? T(n) = O(?)