1
Order Statistics The ith order statistic of a set of n elements is the ith smallest element. The minimum is thus the 1st order statistic and the maximum is the nth order statistic. The median is the "halfway" order statistic:
– If n is odd, the median is unique and occurs at i = (n + 1)/2.
– If n is even, there are two medians: the lower median at i = ⌊(n + 1)/2⌋ and the upper median at i = ⌈(n + 1)/2⌉. For example, with n = 6 the lower median is the 3rd smallest element and the upper median is the 4th.
How can we calculate order statistics? What is the running time?
2
Order Statistics The selection problem can be specified formally as follows:
Input: A set A of n (distinct) numbers and a number i, with 1 ≤ i ≤ n.
Output: The element x ∈ A that is larger than exactly i − 1 other elements of A.
The selection problem can be solved in O(n lg n) time, since we can sort the numbers using heapsort or merge sort and then simply index the ith element in the output array. There are faster algorithms, however.
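A minimal C sketch of this sort-and-index approach (the function name select_by_sorting and the use of the standard library qsort are illustrative assumptions, not part of the slides):

#include <stdlib.h>
#include <string.h>

/* Comparator for qsort: ascending order of ints. */
static int cmp_int(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Return the ith smallest element of A[0..n-1] (1 <= i <= n) by sorting a copy.
 * The running time is dominated by the O(n lg n) sort. */
int select_by_sorting(const int *A, int n, int i)
{
    int *copy = malloc(n * sizeof *copy);
    memcpy(copy, A, n * sizeof *copy);
    qsort(copy, n, sizeof *copy, cmp_int);
    int result = copy[i - 1];   /* arrays are 0-based, ranks are 1-based */
    free(copy);
    return result;
}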
3
Minimum and Maximum
MINIMUM(A)
    min ← A[1]
    for i ← 2 to length[A]
        do if min > A[i]
               then min ← A[i]
    return min
The algorithm performs n − 1 comparisons. Finding the maximum can, of course, be accomplished with n − 1 comparisons as well.
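A direct C translation of MINIMUM (a sketch; the 0-based signature is an assumption) makes the n − 1 comparisons explicit:

/* Return the minimum of A[0..n-1] using exactly n - 1 comparisons. */
int minimum(const int *A, int n)
{
    int min = A[0];
    for (int i = 1; i < n; i++)   /* n - 1 iterations, one comparison each */
        if (min > A[i])
            min = A[i];
    return min;
}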
4
Finding Order Statistics: The Selection Problem A more interesting problem is selection: finding the ith smallest element of a set. We will show:
– A practical randomized algorithm with O(n) expected running time
– A cool algorithm of theoretical interest only with O(n) worst-case running time
5
Randomized Selection Key idea: use partition() from quicksort
– But we only need to examine one subarray
– This saving shows up in the running time: O(n)
We will again use a slightly different partition: q = RandomizedPartition(A, p, r)
[Figure: the array A[p..r] after partitioning: elements ≤ A[q] lie to the left of index q and elements ≥ A[q] to the right.]
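For reference, here is a C sketch of such a randomized partition (the Lomuto partitioning scheme and the function names are assumptions made for illustration):

#include <stdlib.h>

/* Swap two array elements. */
static void swap(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Lomuto partition of A[p..r] around the pivot A[r]; returns the pivot's final
 * index q, so that A[p..q-1] <= A[q] <= A[q+1..r]. */
static int partition(int *A, int p, int r)
{
    int x = A[r];              /* pivot value */
    int i = p - 1;
    for (int j = p; j < r; j++)
        if (A[j] <= x)
            swap(&A[++i], &A[j]);
    swap(&A[i + 1], &A[r]);
    return i + 1;
}

/* Randomized partition: first exchange A[r] with a uniformly random element. */
int randomized_partition(int *A, int p, int r)
{
    int k = p + rand() % (r - p + 1);
    swap(&A[k], &A[r]);
    return partition(A, p, r);
}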
6
Randomized Selection
RandomizedSelect(A, p, r, i)
    if (p == r) then return A[p];
    q = RandomizedPartition(A, p, r);
    k = q - p + 1;
    if (i == k) then return A[q];
    if (i < k)
        then return RandomizedSelect(A, p, q-1, i);
        else return RandomizedSelect(A, q+1, r, i-k);
[Figure: the partitioned array A[p..r]; A[q] is the kth smallest element of the subarray, where k = q − p + 1.]
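Carrying the pseudocode into C on top of a randomized_partition like the one sketched earlier (function names and the 0-based array / 1-based rank conventions are assumptions):

/* Partition routine sketched on the earlier slide. */
int randomized_partition(int *A, int p, int r);

/* Return the ith smallest element of A[p..r], where i is a 1-based rank
 * within that subarray. Expected running time is O(n). */
int randomized_select(int *A, int p, int r, int i)
{
    if (p == r)
        return A[p];
    int q = randomized_partition(A, p, r);
    int k = q - p + 1;                  /* number of elements in A[p..q] */
    if (i == k)                         /* the pivot value is the answer */
        return A[q];
    else if (i < k)                     /* answer lies in the low side */
        return randomized_select(A, p, q - 1, i);
    else                                /* answer lies in the high side */
        return randomized_select(A, q + 1, r, i - k);
}

For example, randomized_select(A, 0, n - 1, (n + 1) / 2) would return the lower median of an n-element array under these conventions.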
7
Randomized Selection Analyzing RandomizedSelect():
– Worst case: the partition is always 0 : n−1
  T(n) = T(n − 1) + O(n) = O(n²) (arithmetic series)
  No better than sorting!
– "Best" case: suppose a 9:1 partition
  T(n) = T(9n/10) + O(n) = O(n) (Master Theorem, case 3)
  Better than sorting! What if this had been a 99:1 split?
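As a rough sketch of why any constant-fraction split gives O(n) (this unrolling is not from the slides; c denotes a constant bounding the per-call partitioning cost): unrolling T(n) ≤ T(9n/10) + cn gives a geometric series,
T(n) ≤ cn + (9/10)cn + (9/10)²cn + ⋯ ≤ cn · Σ_{j≥0} (9/10)^j = 10cn = O(n).
With a 99:1 split the ratio becomes 99/100, the series sums to 100, and the bound is still O(n); only the constant factor grows.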
8
Randomized Selection The time required by RANDOMIZED-SELECT on an input array A[p..r] of n elements is a random variable that we denote by T(n). We obtain an upper bound on E[T(n)] as follows: RANDOMIZED-PARTITION is equally likely to return any element as the pivot, so for each k such that 1 ≤ k ≤ n, the subarray A[p..q] has exactly k elements with probability 1/n. For k = 1, 2, ..., n, we define indicator random variables X_k where
X_k = I{the subarray A[p..q] has exactly k elements},
and so we have E[X_k] = 1/n.
9
Randomized Selection When we call RANDOMIZED-SELECT and choose A[q] as the pivot element, we do not know in advance whether we will:
– terminate immediately with the correct answer,
– recurse on the subarray A[p..q − 1], or
– recurse on the subarray A[q + 1..r].
The outcome depends on where the ith smallest element falls relative to A[q]. Assuming that T(n) is monotonically increasing, we can bound the time needed for the recursive call by the time needed for the recursive call on the largest possible input.
10
Randomized Selection We assume, to obtain an upper bound, that the ith element is always on the side of the partition with the greater number of elements. For a given call of RANDOMIZED-SELECT, the indicator random variable X k has the value 1 for exactly one value of k, and it is 0 for all other k. When X k = 1, the two subarrays on which we might recurse have sizes k − 1 and n − k.
11
Randomized Selection Hence, for the average case (to obtain an upper bound, assume the ith element always falls in the larger side of the partition), we have the recurrence
T(n) ≤ Σ_{k=1}^{n} X_k · ( T(max(k − 1, n − k)) + O(n) )
     = Σ_{k=1}^{n} X_k · T(max(k − 1, n − k)) + O(n).
We will show that T(n) = O(n) by substitution.
12
Randomized Selection Taking expected values, we have
E[T(n)] ≤ E[ Σ_{k=1}^{n} X_k · T(max(k − 1, n − k)) ] + O(n)
        = Σ_{k=1}^{n} E[ X_k · T(max(k − 1, n − k)) ] + O(n)    (linearity of expectation)
        = Σ_{k=1}^{n} E[X_k] · E[ T(max(k − 1, n − k)) ] + O(n)    (X_k is independent of T(max(k − 1, n − k)))
        = Σ_{k=1}^{n} (1/n) · E[ T(max(k − 1, n − k)) ] + O(n).
13
Randomized Selection Let us consider the expression max(k − 1, n − k). We have
max(k − 1, n − k) = k − 1 if k > ⌈n/2⌉, and n − k if k ≤ ⌈n/2⌉.
If n is even, each term from T(⌈n/2⌉) up to T(n − 1) appears exactly twice in the summation, and if n is odd, all these terms appear twice and T(⌊n/2⌋) appears once. Thus, we have
E[T(n)] ≤ (2/n) · Σ_{k=⌊n/2⌋}^{n−1} E[T(k)] + O(n).
14
Randomized Selection We solve the recurrence by substitution. Assume that T(n) ≤ cn for some constant c that satisfies the initial conditions of the recurrence; that is, we assume T(n) = O(1) for n less than some constant, to be fixed later. We also pick a constant a such that the function described by the O(n) term above (which describes the non-recursive component of the running time of the algorithm) is bounded from above by an for all n > 0.
15
Randomized Selection Using this inductive hypothesis, we have
E[T(n)] ≤ (2/n) · Σ_{k=⌊n/2⌋}^{n−1} ck + an
        = (2c/n) · ( Σ_{k=1}^{n−1} k − Σ_{k=1}^{⌊n/2⌋−1} k ) + an
        = (2c/n) · ( (n − 1)n/2 − (⌊n/2⌋ − 1)⌊n/2⌋/2 ) + an
        ≤ (2c/n) · ( (n − 1)n/2 − (n/2 − 2)(n/2 − 1)/2 ) + an
        ≤ 3cn/4 + c/2 + an
        = cn − (cn/4 − c/2 − an).
16
Randomized Selection In order to complete the proof, we need to show that for sufficiently large n, this last expression is at most cn or, equivalently, that cn/4 − c/2 − an ≥ 0. If we add c/2 to both sides and factor out n, we get n(c/4 − a) ≥ c/2. As long as we choose the constant c so that c/4 − a > 0, i.e., c > 4a, we can divide both sides by c/4 − a, giving
n ≥ (c/2) / (c/4 − a) = 2c/(c − 4a).
17
Randomized Selection Thus, if we assume that T (n) = O(1) for n < 2c/(c−4a), we have T (n) = O(n). We conclude that any order statistic, and in particular the median, can be determined on average in linear time.
18
The End