 # Introduction to Algorithms Jiafen Liu Sept. 2013.


Introduction to Algorithms Jiafen Liu Sept. 2013

Today’s Tasks

Order Statistics
- Randomized divide and conquer
- Analysis of expected time
- Worst-case linear-time order statistics
- Analysis

Order statistics

Given n elements in an array, select the ith smallest of the n elements (the element with rank i). This has various applications:
- i = 1: find the minimum element.
- i = n: find the maximum element.
- Find the median: i = (n+1)/2 when n is odd; i = n/2 and i = n/2 + 1 (the lower and upper medians) when n is even.
- This is useful in statistics.

How to find the ith element?

Naïve algorithm:
- Sort array A, then return the element A[i].
- With merge sort, the worst-case running time is Θ(n lg n) + Θ(1) = Θ(n lg n). Randomized quicksort reaches Θ(n lg n) only in expectation; its worst case is Θ(n²).

Can we do better than that?
- Selection is related to sorting, but it is a different problem.
- Our goal is Θ(n) expected time.
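The naïve approach can be sketched in a few lines of Python. The function name `select_by_sorting`, the 1-indexed rank convention, and the sample array are assumptions made for illustration; they are not part of the slides.

```python
def select_by_sorting(a, i):
    """Return the element of rank i (1-indexed) by sorting a copy of a."""
    if not 1 <= i <= len(a):
        raise ValueError("rank i must satisfy 1 <= i <= n")
    # sorted() runs in O(n log n) time in the worst case; the lookup is O(1).
    return sorted(a)[i - 1]


data = [6, 10, 13, 5, 8, 3, 2, 11]          # made-up sample input
print(select_by_sorting(data, 1))            # minimum
print(select_by_sorting(data, len(data)))    # maximum
print(select_by_sorting(data, 7))            # 7th smallest
```

Sorting does more work than selection needs, which is why the rest of the lecture looks for a linear-time method.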

Randomized divide-and-conquer algorithm

The algorithm relies on Rand-Partition(A, p, q). Does that subroutine seem familiar?

Partitioning subroutine

    PARTITION(A, p, q)              // partitions A[p..q]
        x ← A[p]                    // pivot = A[p]
        i ← p
        for j ← p+1 to q
            do if A[j] ≤ x
                then i ← i+1
                     exchange A[i] ↔ A[j]
        exchange A[p] ↔ A[i]
        return i
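A direct Python transcription of the partitioning pseudocode may help when tracing it by hand. It assumes 0-indexed lists with inclusive bounds `p` and `q`; the sample array is made up for illustration.

```python
def partition(a, p, q):
    """Partition a[p..q] (inclusive) around the pivot a[p].

    Returns the final index of the pivot. Afterwards every element left of
    that index is <= pivot and every element right of it is > pivot.
    """
    x = a[p]                      # pivot
    i = p                         # a[p+1..i] holds the elements <= x seen so far
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]       # move the pivot between the two regions
    return i


a = [6, 10, 13, 5, 8, 3, 2, 11]   # made-up sample input
r = partition(a, 0, len(a) - 1)
print(r, a)
```

The loop touches each element once, so one call costs Θ(n) on a subarray of n elements.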

Randomized divide-and-conquer algorithm
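The pseudocode shown on this slide did not survive in the transcript. The sketch below follows the usual formulation of randomized selection built on the partitioning subroutine; the names `rand_partition` and `rand_select`, the 0-indexed bounds, and the 1-indexed rank `i` are assumptions, not the slide's exact text.

```python
import random


def partition(a, p, q):
    """Partition a[p..q] (inclusive) around a[p]; return the pivot's final index."""
    x = a[p]
    i = p
    for j in range(p + 1, q + 1):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[p], a[i] = a[i], a[p]
    return i


def rand_partition(a, p, q):
    """Swap a uniformly random element of a[p..q] into position p, then partition."""
    r = random.randint(p, q)      # randint is inclusive on both ends
    a[p], a[r] = a[r], a[p]
    return partition(a, p, q)


def rand_select(a, p, q, i):
    """Return the i-th smallest (1-indexed) element of a[p..q]; reorders a in place."""
    if p == q:
        return a[p]
    r = rand_partition(a, p, q)
    k = r - p + 1                 # rank of the pivot within a[p..q]
    if i == k:
        return a[r]
    if i < k:
        return rand_select(a, p, r - 1, i)
    return rand_select(a, r + 1, q, i - k)


data = [6, 10, 13, 5, 8, 3, 2, 11]   # made-up sample input
print(rand_select(list(data), 0, len(data) - 1, 7))
```

Unlike quicksort, only one side of the partition is ever recursed into, which is what makes a linear expected time possible.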

Example

Select the i = 7th smallest element, starting with Partition.

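Algorithm Analysis

(All our analyses today assume that all elements are distinct.)

Like Quicksort, the running time of our algorithm depends on how well Partition splits the array. Recall: what is the lucky case of Partition?
- The pivot is the median?
- A 1/10 : 9/10 split?
- Any split that leaves a constant fraction on each side is lucky; the unlucky extremes are 0 : n−1 and n−1 : 0.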

Lucky or Unlucky?

Lucky:
- Take the 1/10 : 9/10 split as an example.
- T(n) = T(9n/10) + Θ(n)
- How do we solve it? With the Master Method.
- T(n) = Θ(n)
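One way to fill in the Master Method step, writing the Θ(n) term as f(n) = n for illustration (an assumption; any linear f gives the same result):

$$
\begin{aligned}
T(n) &= a\,T(n/b) + f(n), \qquad a = 1,\quad b = 10/9,\quad f(n) = n,\\
n^{\log_b a} &= n^{\log_{10/9} 1} = n^{0} = 1,\\
f(n) &= \Omega\!\left(n^{0+\varepsilon}\right) \text{ with } \varepsilon = 1,
\qquad a\,f(n/b) = \tfrac{9}{10}\,n \le \tfrac{9}{10}\,f(n).
\end{aligned}
$$

The regularity condition holds with constant 9/10 < 1, so case 3 of the Master Method applies and T(n) = Θ(f(n)) = Θ(n).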

Lucky or Unlucky?

Unlucky:
- 0 : n−1 or n−1 : 0 split
- T(n) = T(n−1) + Θ(n)
- T(n) = Θ(n²), because the costs add up like an arithmetic series.
- That is even worse than sorting first and then selecting!
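The arithmetic series can be made explicit by unrolling the unlucky recurrence. Here the Θ(n) term is written as cn for a constant c > 0, which is an assumption made to keep the algebra concrete:

$$
\begin{aligned}
T(n) &= T(n-1) + cn \\
     &= T(n-2) + c(n-1) + cn \\
     &= T(0) + c\,(1 + 2 + \cdots + n) \\
     &= \Theta(1) + c\,\frac{n(n+1)}{2} = \Theta(n^2).
\end{aligned}
$$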

Analysis of Expected Time

We dealt with the expected running time of the Quicksort algorithm in Lecture 4.
- Recall how we handled that.
- Partition has n possible outcomes; how do we express them all in one expression?
- With indicator random variables.

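Analysis of Expected Time

Let T(n) = the running time of RAND-SELECT on an input of size n, assuming the random numbers are independent. To obtain an upper bound, assume that the ith element always falls in the larger side of the partition:

$$T(n) = T(\max\{k,\; n-k-1\}) + \Theta(n) \quad \text{if Partition produces a } k : n-k-1 \text{ split}, \qquad k = 0, 1, \dots, n-1.$$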

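Analysis of Expected Time

For k = 0, 1, …, n–1, define the indicator random variable:

$$X_k = \begin{cases} 1 & \text{if Partition generates a } k : n-k-1 \text{ split,} \\ 0 & \text{otherwise.} \end{cases}$$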

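Computing the Expected Time

Take expectations of both sides. The key step is independence: the indicator of a given split is independent of the running time of the recursive call that follows it.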
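The derivation on this slide was lost in the transcript. A reconstruction along the standard lines is sketched here, writing X_k for the indicator that Partition yields a k : n−k−1 split and assuming distinct elements, so that E[X_k] = 1/n:

$$
\begin{aligned}
E[T(n)] &= E\!\left[\sum_{k=0}^{n-1} X_k \big(T(\max\{k,\, n-k-1\}) + \Theta(n)\big)\right] \\
&= \sum_{k=0}^{n-1} E[X_k]\; E\big[T(\max\{k,\, n-k-1\}) + \Theta(n)\big]
   && \text{(linearity, then independence)} \\
&= \frac{1}{n} \sum_{k=0}^{n-1} E\big[T(\max\{k,\, n-k-1\})\big] + \Theta(n) \\
&\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} E[T(k)] + \Theta(n).
\end{aligned}
$$

The last line holds because max{k, n−k−1} is always at least ⌊n/2⌋ and each value in that upper range appears at most twice in the sum.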

Computing the Expected Time

How do we solve the recurrence? By the substitution method.
- We guess that the answer is Θ(n).
- Prove: E[T(n)] ≤ cn for some constant c > 0.
- Try to do the rest by yourself.

The induction step goes through if c is chosen large enough that cn/4 dominates the Θ(n) term. Is that the end of the proof? Not yet: we still need the base case.
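For readers who want to check their work, the induction step can be sketched as follows. It starts from the bound E[T(n)] ≤ (2/n) Σ E[T(k)] + Θ(n), with k running from ⌊n/2⌋ to n−1, and uses the fact that the sum of k over that range is at most 3n²/8. This is a reconstruction under those assumptions, not the slide's own text:

$$
\begin{aligned}
E[T(n)] &\le \frac{2}{n} \sum_{k=\lfloor n/2 \rfloor}^{n-1} ck + \Theta(n)
   && \text{(inductive hypothesis } E[T(k)] \le ck\text{)} \\
&\le \frac{2c}{n} \cdot \frac{3}{8}\,n^2 + \Theta(n) \\
&= \frac{3}{4}\,cn + \Theta(n) \\
&= cn - \left(\frac{cn}{4} - \Theta(n)\right) \\
&\le cn.
\end{aligned}
$$

The final inequality is exactly where c must be large enough for cn/4 to dominate the Θ(n) term.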

Summary of randomized order-statistic selection
- Works fast: linear expected time.
- The worst case is bad: Θ(n²).
- Still an excellent algorithm in practice.

Questions: Is there an algorithm that runs in linear time even in the worst case? Picking the pivot randomly is simple, but it gives no worst-case guarantee.

Improvement of randomized selection

Due to Blum, Floyd, Pratt, Rivest, and Tarjan.

IDEA: Generate a really good pivot recursively. How can we keep the cost of that recursion low enough that the total is still Θ(n)?

Worst-case linear-time order statistics

SELECT(i, n)
1. Divide the n elements into ⌊n/5⌋ groups of 5 elements. Find the median of each 5-element group by rote.
2. Recursively select the median x of the ⌊n/5⌋ group medians to be the pivot.
3. Partition around the pivot x. Let k = rank(x).
4. If i = k, then return x. Else if i < k, then recursively select the ith smallest element in the lower part. Else recursively select the (i–k)th smallest element in the upper part.
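The four steps can be sketched in Python as below. Several details are assumptions for illustration: the function takes a list and a 1-indexed rank instead of the slide's SELECT(i, n) signature, any leftover group of fewer than 5 elements contributes its own median, partitioning builds new lists instead of working in place, and elements are assumed distinct as in the lecture's analysis.

```python
def select(a, i):
    """Return the i-th smallest (1-indexed) element of a; elements must be distinct."""
    if len(a) <= 5:
        return sorted(a)[i - 1]           # small inputs: solve "by rote"

    # Step 1: split into groups of 5 and take each group's median.
    groups = [a[j:j + 5] for j in range(0, len(a), 5)]
    medians = [sorted(g)[(len(g) - 1) // 2] for g in groups]

    # Step 2: recursively select the median of the group medians as the pivot.
    x = select(medians, (len(medians) + 1) // 2)

    # Step 3: partition around x (out of place, for clarity).
    lower = [v for v in a if v < x]
    upper = [v for v in a if v > x]
    k = len(lower) + 1                    # rank of the pivot x

    # Step 4: return the pivot or recurse into exactly one side.
    if i == k:
        return x
    if i < k:
        return select(lower, i)
    return select(upper, i - k)


data = [(37 * j) % 101 for j in range(101)]   # a made-up permutation of 0..100
print(select(data, 51))                        # the median of 0..100
```

An in-place version would reuse the partitioning subroutine from earlier in the lecture; the list-building form is chosen here only because it is shorter to read.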

Choosing the pivot

Divide the n elements into ⌊n/5⌋ groups of 5 elements. Reorganize the five elements in each group so that
- the middle one is the median,
- the upper two are less than the median,
- the lower two are bigger.

Choosing the pivot

How much time does this take? Θ(n): each group of 5 takes constant time, and there are ⌊n/5⌋ groups.

Choosing the pivot

Recursively select the median x of the ⌊n/5⌋ group medians to be the pivot. Rearrange the groups by their medians.

Choosing the pivot

Suppose that the whole SELECT(i, n) algorithm takes time T(n). What is the running time of this step? T(⌊n/5⌋), which we treat as T(n/5). Now what do we know about all these elements?

Analysis

Rest of the algorithm:
- Partition around the pivot x. Let k = rank(x).
- If i = k, then return x. Else if i < k, then recursively select the ith smallest element in the lower part. Else recursively select the (i–k)th smallest element in the upper part.

We want the total cost to be Θ(n), so the remaining recursive call must be on strictly fewer than 4n/5 elements. Why?
- We have already made a recursive call costing T(n/5), and a linear bound requires the two subproblem sizes to add up to less than n.


Analysis

Look at this figure carefully:
- The directed paths give us more information than we had before.
- All the elements in the block are ≤ x.
- How many elements are there?

Analysis

At least half the group medians are ≤ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians. Therefore, at least 3⌊n/10⌋ elements are ≤ x.

Analysis

Look at this figure carefully:
- The directed paths give us more information than we had before.
- Now all the elements in the block are ≥ x.
- How many elements are there?

Analysis

At least half the group medians are ≥ x, which is at least ⌊⌊n/5⌋/2⌋ = ⌊n/10⌋ group medians. Therefore, at least 3⌊n/10⌋ elements are ≥ x. Similarly, at least 3⌊n/10⌋ elements are ≤ x.

Analysis

Then, what is the cost of the recursive call in the three-way case analysis?
- One side has at least 3⌊n/10⌋ elements.
- So the other side has at most n − 3⌊n/10⌋ elements.
- The cost is therefore at most T(n − 3⌊n/10⌋).

For n ≥ 50, we have 3⌊n/10⌋ ≥ n/4.
- So for n ≥ 50 we have n − 3⌊n/10⌋ ≤ 3n/4.
- T(3n/4) is even better than the 4n/5 we were hoping for.

For n < 50, we have T(n) = Θ(1).
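The threshold n ≥ 50 can be checked numerically. The short script below is only a sanity check over a finite range (the upper limit 10,000 is an arbitrary assumption), not a proof; the helper name `larger_side_bound` is made up for illustration.

```python
def larger_side_bound(n):
    """Upper bound on the side we recurse into: n minus the 3*floor(n/10) guaranteed elements."""
    return n - 3 * (n // 10)


# Integer arithmetic only: 3*floor(n/10) >= n/4  <=>  12*floor(n/10) >= n.
holds_lower = all(12 * (n // 10) >= n for n in range(50, 10_000))
# n - 3*floor(n/10) <= 3n/4  <=>  4*(n - 3*floor(n/10)) <= 3n.
holds_upper = all(4 * larger_side_bound(n) <= 3 * n for n in range(50, 10_000))
print(holds_lower, holds_upper)

# Just below the threshold the inequality can fail, e.g. n = 49:
print(3 * (49 // 10), 49 / 4)
```

For n = 49 the guaranteed count is 12 while n/4 is 12.25, which is why the small cases are handled separately as constant-time work.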

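Total Running Time

$$T(n) = T(n/5) + T(3n/4) + \Theta(n)$$

The three terms are the recursive selection of the pivot, the recursive selection in one side of the partition, and the linear work of grouping and partitioning.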

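Solving the recurrence

How? By the substitution method. For T(n) = T(n/5) + T(3n/4) + Θ(n), assume T(k) ≤ ck for all k < n:

$$
\begin{aligned}
T(n) &\le \tfrac{1}{5}\,cn + \tfrac{3}{4}\,cn + \Theta(n) \\
&= \tfrac{19}{20}\,cn + \Theta(n) \\
&= \underbrace{cn}_{\text{desired}} - \underbrace{\left(\tfrac{1}{20}\,cn - \Theta(n)\right)}_{\text{residual}} \\
&\le cn,
\end{aligned}
$$

if c is chosen large enough to handle the Θ(n) term.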

Conclusions
- Since the work at each level of recursion is a constant fraction (19/20) of the level above, the work per level forms a decreasing geometric series, dominated by the linear work at the root.
- In practice, this algorithm runs slowly, because the constant in front of n is large.
- The randomized algorithm is far more practical.

Further Thought Why did we use groups of five? Why not groups of three? How about 7?