1/19 Order Statistics Sorted Find the key that is smaller than exactly k of the n keys.


1/19 Order Statistics Sorted Find the key that is smaller than exactly k of the n keys

2/19 Order Statistics. Statistics: methods for combining a large amount of data (such as the scores of the whole class on a homework) into a single number or a small set of numbers that gives a representative value of the data. The phrase "order statistics" refers to statistical methods that depend only on the ordering of the data and not on its numerical values. The average of the data, while easy to compute and very important as an estimate of a central value, is NOT an order statistic.

3/19 Order Statistics. The mode (the most commonly occurring value) also does not depend on ordering, although the most efficient methods for computing it in a comparison-based model involve sorting. Median: the most commonly used order statistic, the value in the middle position of the sorted order of the values. The median can be obtained easily in O(n log n) time via sorting. Is it possible to do better? Related concept: robustness of estimation.

4/19 Randomized Algorithms. A randomized algorithm uses random "bits" to guide its choices so as to achieve good "average case" performance. Formally, the algorithm's performance is a random variable. The "worst case" is typically so unlikely to occur that it can be ignored.

5/19 Randomized Algorithms. The algorithm has access to a source of independent, unbiased random bits (in practice, pseudo-random numbers) and is allowed to use these bits to influence its computation. (Diagram labels: Input, Random bits, Algorithm, Output.)

6/19 Randomized Algorithms. Las Vegas algorithms: randomized algorithms that always output the correct answer but have a small probability of taking a long time to execute. Monte Carlo algorithms: sometimes we want the algorithm to always complete quickly and instead allow a small probability of error. Any Las Vegas algorithm can be converted into a Monte Carlo algorithm by outputting an arbitrary, possibly incorrect answer if it fails to complete within a specified time.

7/19 Randomized Quick Sort. In traditional Quick Sort, we always pick the first element as the pivot for partitioning. The worst-case runtime is O(n^2), while the expected runtime, averaged over all inputs, is O(n log n). Some inputs are therefore bound to run slowly, e.g., a reverse-sorted list.

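8/19 Randomized Quick Sort. In randomized Quick Sort, we pick an element uniformly at random as the pivot for partitioning. The expected runtime on any input is O(n log n). The bound survives badly unbalanced pivots: even if every split were as lopsided as 10%/90%, the recursion depth would still be O(log n).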
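
As an illustration of the random pivot choice, here is a minimal Python sketch. It is not taken from the slides: the function name `randomized_quicksort`, the optional `rng` parameter, and the three-way split into smaller, equal, and larger values are assumptions made for this example, and it builds new lists rather than partitioning in place.

```python
import random


def randomized_quicksort(items, rng=random):
    """Return a new sorted list, choosing each pivot uniformly at random."""
    if len(items) <= 1:
        return list(items)
    pivot = rng.choice(items)  # uniform random pivot instead of items[0]
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    return (randomized_quicksort(smaller, rng)
            + equal
            + randomized_quicksort(larger, rng))
```

Because the pivot no longer depends on the input order, a reverse-sorted list is treated like any other input.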

9/19 Randomized Algorithms: Motivating Example. Problem: find an 'a' in an array of n elements, given that half are 'a's and the other half are 'b's. Solution: look at each element of the array in order. This requires about n/2 operations if the array holds all the 'b's first, followed by the 'a's. Checking in the reverse order, or checking every second element, has a similar drawback.

10/19 Randomized Algorithms: Motivating Example. With any strategy that checks elements in a fixed order, i.e., a deterministic algorithm, we cannot guarantee that the algorithm will complete quickly for all possible inputs. On the other hand, if we check array elements at random, we will quickly find an 'a' with high probability, whatever the input.
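
The random-probing idea can be sketched in Python as follows. The function names and the `rng` and `max_probes` parameters are hypothetical, chosen for this example. The first function is a Las Vegas version (always correct, running time random). The second caps the number of probes, in the spirit of the Las Vegas to Monte Carlo conversion, and returns `None` as its "possibly incorrect" answer when the budget runs out.

```python
import random


def find_a_las_vegas(arr, rng=random):
    """Probe random positions until an 'a' is found.

    Assumes at least one 'a' exists. Returns (index, number_of_probes).
    """
    probes = 0
    while True:
        i = rng.randrange(len(arr))
        probes += 1
        if arr[i] == 'a':
            return i, probes


def find_a_monte_carlo(arr, max_probes, rng=random):
    """Probe at most max_probes random positions; return an index or None."""
    for _ in range(max_probes):
        i = rng.randrange(len(arr))
        if arr[i] == 'a':
            return i
    return None  # budget exhausted: the answer may be wrong
```

When half the entries are 'a', each probe succeeds with probability 1/2, so the capped version fails with probability (1/2)^max_probes regardless of how the array is arranged.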

11/19 Order Statistics. The ith order statistic of a set of n elements is the ith smallest element. The minimum is thus the 1st order statistic. The maximum is the nth order statistic. The median is the (n/2)th order statistic; if n is even, there are 2 medians. How can we calculate order statistics? What is the running time?

12/19 Selection problem. Given a list of n items and a number k between 1 and n, find the item that would be kth if we sorted the list. The median is the special case k = n/2. We'll see two algorithms: a randomized one based on quicksort ("quickselect") and a deterministic one. The randomized one is easier to understand and better in practice, so we'll do it first. Let's warm up with some cases of selection that don't have much to do with medians (because k is very far from n/2).

13/19 Selection problem: 2nd best search. If k=1, the selection problem is trivial: just select the minimum element. As usual, we maintain a value x that is the minimum seen so far and compare it against each successive value, updating it when something smaller is seen.

    min(L) {
        x = L[1]
        for (i = 2; i <= n; i++)
            if (L[i] < x) x = L[i]
        return x
    }

What if you want to select the second best?

14/19 Selection problem: 2nd best search. One possibility: follow the same general strategy, but modify min(L) to keep two values, the best and the second best seen so far. Compare each new value against the second best to tell whether it is in the top two. If it is, a second comparison against the best tells us whether the new value is the best or the second best.
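
A minimal Python sketch of this two-value strategy follows. The name `two_smallest` and the returned comparison counter are assumptions added for illustration, and "best" is taken to mean smallest, as in min(L).

```python
def two_smallest(values):
    """Return (best, second_best, comparisons) for at least two values."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    comparisons = 1  # one comparison orders the first two values
    if values[0] <= values[1]:
        best, second = values[0], values[1]
    else:
        best, second = values[1], values[0]
    for v in values[2:]:
        comparisons += 1          # first comparison: is v in the top two?
        if v < second:
            comparisons += 1      # second comparison: best or second best?
            if v < best:
                best, second = v, best
            else:
                second = v
    return best, second, comparisons
```

For example, `two_smallest([5, 4, 3, 2, 1])` needs the second comparison in every iteration and returns `(1, 2, 7)`, while `two_smallest([1, 2, 3, 4, 5])` never needs it and returns `(1, 2, 4)`.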

15/19 Selection problem: 2nd best search. Some interesting behavior shows up when we try to analyze it. Worst case: the list may be sorted in decreasing order, so each of the n-2 iterations of the loop performs 2 comparisons. Together with the one comparison that orders the first two values, the total is 2n-3 comparisons. Average case (assuming any permutation of L is equally likely): the first comparison in each iteration still always happens, but the second only happens when L[i] is one of the two smallest values among the first i. Each of the first i values is equally likely to be one of these two, so this is true with probability 2/i. The total expected number of times we make the second comparison is $\sum_{i=3}^{n} 2/i$.

16/19 Selection problem: 2nd best search, conclusion. The sum (for i from 1 to n) of 1/i, known as the harmonic series, is ln n + O(1) (this can be proved using calculus, by comparing the sum to a similar integral). Therefore the total expected number of comparisons overall is n + O(log n). This small increase over the n-1 comparisons needed to find the minimum gives us hope that we can perform selection faster than sorting.
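
The count can be written out as a short derivation. This is a sketch under the assumptions of the analysis above: one comparison orders the first two values, each of the remaining n-2 iterations always makes one comparison, and iteration i makes a second one with probability 2/i. The symbol H_n for the nth harmonic number is introduced here and does not appear on the slides.

```latex
\begin{align*}
E[\text{comparisons}]
  &= 1 + (n-2) + \sum_{i=3}^{n} \frac{2}{i} \\
  &= (n-1) + 2\left(H_n - 1 - \tfrac{1}{2}\right)
     \qquad \text{where } H_n = \sum_{i=1}^{n} \frac{1}{i} \\
  &= n + 2H_n - 4 \\
  &= n + 2\ln n + O(1) = n + O(\log n).
\end{align*}
```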

17/19 Linear-Time Median Selection. Random-Select(S, i), where n = |S|:
1. If |S| = 1 then return the single element of S.
2. Choose a random element y uniformly from S.
3. Compare all elements of S to y. Let S1 = {x ≤ y} and S2 = {x > y}.
4. If |S1| = n then
   4.1 If i = n return y, else S1 = S1 - {y}.
5. If |S1| ≥ i then return Random-Select(S1, i), else return Random-Select(S2, i - |S1|).
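
The steps above can be sketched in Python roughly as follows. This is an illustrative translation rather than the presenter's code: the name `random_select`, the `rng` parameter, the input check, and the use of a loop in place of recursion are all assumptions. Ranks are 1-indexed, as on the slide.

```python
import random


def random_select(items, i, rng=random):
    """Return the i-th smallest element of items (1-indexed)."""
    if not 1 <= i <= len(items):
        raise ValueError("i must be between 1 and len(items)")
    s = list(items)
    while True:
        if len(s) == 1:                      # step 1
            return s[0]
        y = rng.choice(s)                    # step 2: uniform random pivot
        s1 = [x for x in s if x <= y]        # step 3
        s2 = [x for x in s if x > y]
        if len(s1) == len(s):                # step 4: y is a maximum of s
            if i == len(s):
                return y
            s1.remove(y)                     # drop one copy of y so s shrinks
        if len(s1) >= i:                     # step 5
            s = s1
        else:
            i -= len(s1)
            s = s2
```

Step 4 matters when the pivot is the largest element: without removing y, S1 would equal S and the problem would not shrink.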

18/19 Linear-Time Median Selection. Given a "black box" O(n) median algorithm, what can we do? ith order statistic: find the median x; partition the input around x; if i ≤ (n+1)/2, recursively find the ith element of the first half, else find the (i - (n+1)/2)th element in the second half. T(n) = T(n/2) + O(n) = O(n). Can you think of an application to sorting?
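
A hedged Python sketch of selection driven by a black-box median is shown below. The black box is passed in as the `median` argument. Python's `statistics.median_low` is used as a stand-in only so the example runs; it sorts internally and is therefore NOT an O(n) black box, so the T(n) = O(n) bound applies only if a genuinely linear-time median routine is supplied. The three-way split (lower, equal, upper) is also an assumption, chosen so that duplicate keys are handled; the slide splits into two halves.

```python
import statistics


def select_with_median(items, i, median=statistics.median_low):
    """Return the i-th smallest element (1-indexed) using a median black box."""
    if not 1 <= i <= len(items):
        raise ValueError("i must be between 1 and len(items)")
    s = list(items)
    while len(s) > 1:
        x = median(s)                        # black-box call (stand-in here)
        lower = [v for v in s if v < x]
        equal = [v for v in s if v == x]
        upper = [v for v in s if v > x]
        if i <= len(lower):
            s = lower                        # answer lies below the median
        elif i <= len(lower) + len(equal):
            return x                         # answer is the median itself
        else:
            i -= len(lower) + len(equal)     # skip everything <= x
            s = upper
    return s[0]
```

Each round discards at least about half of the remaining elements, which is what produces the T(n) = T(n/2) + O(n) recurrence.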

19/19 Linear-Time Median Selection. Worst-case O(n lg n) quicksort: find the median x and partition around it; recursively quicksort the two halves; T(n) = 2T(n/2) + O(n) = O(n lg n).

