# QuickSort Average Case Analysis An Incompressibility Approach Brendan Lucier August 2, 2005.


QuickSort Average Case Analysis An Incompressibility Approach Brendan Lucier August 2, 2005

Outline Introduction to QuickSort Overview of Argument The Main Result


The QuickSort Algorithm

    QuickSort( Array A )
        If |A| = 0, return
        Let p = A[1]
        Let B = ( x in A, x < p ) in stable order
        Let C = ( x in A, x > p ) in stable order
        QuickSort(B)
        QuickSort(C)
        A = B p C

QuickSort is a sorting algorithm. We assume the input is a permutation. We use deterministic QuickSort: the pivot is always selected as the first element of the permutation.
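The pseudocode above can be sketched in Python as follows. This is a minimal illustration, not the presenter's code: it returns a new list instead of overwriting `A`, and the function name `quicksort` is chosen here for convenience. It assumes the input values are distinct, as they are in a permutation.

```python
def quicksort(a):
    """Deterministic QuickSort on distinct values; the pivot is the first element."""
    if len(a) == 0:
        return []
    p = a[0]                          # pivot = first element of the range
    b = [x for x in a if x < p]       # smaller values, in stable order
    c = [x for x in a if x > p]       # larger values, in stable order
    return quicksort(b) + [p] + quicksort(c)


print(quicksort([3, 6, 5, 7, 1, 2, 8, 4]))  # prints [1, 2, 3, 4, 5, 6, 7, 8]
```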

QuickSort -- An Example

Input permutation: (3,6,5,7,1,2,8,4).

[Figure: the QuickSort tree built step by step. Level by level, the pivots are 3; then 1 and 6; then 2, 5 and 7; then 4 and 8.]

We call this the QuickSort Tree for the permutation (3,6,5,7,1,2,8,4).

Properties of QuickSort Trees

Each node in the tree represents an element as it is chosen to be a pivot. Let $T(y)$ be the subtree of descendants of $y$, and let $R(y) = |T(y)|$. QuickSort uses fewer than $\sum_y R(y)$ comparisons.

We note that $\sum_y R(y)$ is less than $n$ times the height of the QuickSort tree, because each element is counted once for each of its ancestors and has fewer ancestors than the height of the tree.

To show that QuickSort runs in $O(n\log n)$ time on average, we just need to show that QuickSort trees have average height $O(\log n)$.
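As a concrete check of these properties, the sketch below builds the QuickSort tree for the example permutation and compares the sum of the $R(y)$ with $n$ times the height. The representation (nested tuples) and the helper names are assumptions made for this illustration. It also assumes that $R(y)$ counts the strict descendants of $y$ and that height is measured in nodes along the longest root-to-leaf path.

```python
def quicksort_tree(a):
    """QuickSort tree as nested tuples (pivot, left, right); None for an empty range."""
    if not a:
        return None
    p = a[0]
    return (p,
            quicksort_tree([x for x in a if x < p]),
            quicksort_tree([x for x in a if x > p]))


def height(t):
    """Number of nodes on the longest root-to-leaf path."""
    return 0 if t is None else 1 + max(height(t[1]), height(t[2]))


def size(t):
    """Number of nodes in the subtree rooted at t."""
    return 0 if t is None else 1 + size(t[1]) + size(t[2])


def sum_descendants(t):
    """Sum of R(y) over all nodes y, where R(y) counts strict descendants of y."""
    if t is None:
        return 0
    return (size(t) - 1) + sum_descendants(t[1]) + sum_descendants(t[2])


perm = [3, 6, 5, 7, 1, 2, 8, 4]
tree = quicksort_tree(perm)
print(height(tree))            # prints 4
print(sum_descendants(tree))   # prints 14, which is below 8 * 4 = 32
```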

Outline Introduction to QuickSort Overview of Argument The Main Result

Heuristic Argument

Each pivot is equally likely to be any element in its range. Call a pivot balanced if it occurs in the middle half of its range. We would expect half of all pivots to be balanced.

Take a root-to-leaf path in a QuickSort tree. We expect half of the nodes on the path to be balanced, and a balanced node splits its range in the ratio (1/4, 3/4) at worst. Thus the range of nodes along our path is reduced to at most 3/4 of its previous size whenever a balanced node occurs. This means that at most $\log_{4/3} n = d\log n$ balanced nodes can be on a path, where $d = 1/\log(4/3)$. Since half of the nodes are expected to be balanced, we expect a path to have length $2d\log n = O(\log n)$.

NOTE: This argument does NOT give us the result we want, since it only talks about expected path length. We need an argument that shows that the average maximum path length is $O(\log n)$.
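The constant $d$ and the heuristic path length $2d\log n$ can be compared with tree heights observed on random permutations. The sketch below is an empirical illustration only, not part of the proof. It assumes logarithms are base 2, and the helper names, the trial count and the seed are arbitrary choices made here.

```python
import math
import random


def tree_height(a):
    """Height (in nodes) of the QuickSort tree of a, with the first element as pivot."""
    if not a:
        return 0
    p = a[0]
    return 1 + max(tree_height([x for x in a if x < p]),
                   tree_height([x for x in a if x > p]))


# d is defined by log_{4/3} n = d * log2 n, so d = 1 / log2(4/3).
d = 1 / math.log2(4 / 3)


def average_height(n, trials, seed=0):
    """Average QuickSort tree height over random permutations of 1..n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        perm = list(range(1, n + 1))
        rng.shuffle(perm)
        total += tree_height(perm)
    return total / trials


n = 256
print(round(d, 4))                 # prints 2.4094
print(average_height(n, 20))       # observed average height
print(2 * d * math.log2(n))        # heuristic path length 2 d log n
```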

Main Idea

Suppose we have a QuickSort tree with a long path, say of length $k$. At most $d\log n$ of the nodes are balanced.

Half of the values for a pivot are balanced, the other half are not. Thus, knowing whether a pivot is balanced or unbalanced is worth one bit of information. We could encode balanced/unbalanced information for each pivot on our path; this is worth $k$ bits of information. Say we encode it as a binary string, with 1 meaning balanced and 0 meaning unbalanced, for example 010010100.

But we know that at most $d\log n$ nodes are balanced, so our string has at most $d\log n$ 1's. If we assume that our path is quite long ($k \gg 2d\log n$), then this string has far fewer 1's than 0's, and is therefore very compressible.

Specifying a Path

We want to compress a path, but we need to somehow encode which path we are compressing. We could use a bit for each left/right choice (0 = left, 1 = right), but this adds too much extra information. Instead, we will use the same trick as before: arrange our choices so that a sufficiently long path must have very few instances of one choice.

Encode the path as: 0 = follow the larger subtree, 1 = follow the smaller subtree.

- Then each 1 causes the range to fall by at least 1/2, so a path can have at most $\log n$ 1's.
- If our path has length $k \gg 2\log n$, this path encoding will have far more 0's than 1's, and can therefore be very heavily compressed.

We can choose $k$ large enough so that specifying the path AND its balanced/unbalanced information can still be done in fewer than $k$ bits, so we save bits overall.
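The two bit strings described so far (balanced/unbalanced and larger/smaller subtree) can be computed for a given path as in the sketch below. Several details are assumptions made for this illustration, since the slides do not fix them: a pivot is treated as balanced when neither side of its split exceeds 3/4 of the range, a tie between subtree sizes is recorded as 1, and the function name `path_bits` is hypothetical.

```python
def path_bits(perm, path):
    """Return (x, z) for a root-to-node path of pivots in the QuickSort tree of perm.

    x[i] = 1 iff path[i] is balanced (assumed: both sides <= 3/4 of its range).
    z[i] = 1 iff path[i+1] lies in the smaller subtree of path[i] (ties count as 1).
    """
    x, z = [], []
    rng = list(perm)
    for i, y in enumerate(path):
        if not rng or rng[0] != y:
            raise ValueError("path does not follow pivots of the QuickSort tree")
        left = [v for v in rng if v < y]
        right = [v for v in rng if v > y]
        m = len(rng)
        x.append(1 if 4 * max(len(left), len(right)) <= 3 * m else 0)
        if i + 1 < len(path):
            chosen, other = (left, right) if path[i + 1] < y else (right, left)
            z.append(1 if len(chosen) <= len(other) else 0)
            rng = chosen
    return x, z


# Longest path 3 -> 6 -> 5 -> 4 in the tree for (3,6,5,7,1,2,8,4).
print(path_bits([3, 6, 5, 7, 1, 2, 8, 4], [3, 6, 5, 4]))
# prints ([1, 1, 1, 1], [0, 1, 0])
```

On such a short path every pivot happens to be balanced; the compression argument only applies to paths much longer than $2d\log n$, where most bits of both strings must be 0.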

Outline Introduction to QuickSort Overview of Argument The Main Result

Lemma: There is a constant $c$ such that if $\pi$ is $\log n$-incompressible, then the QuickSort tree for $\pi$ has height less than $c\log n$.

- Proof: The rest of this presentation.

Corollary: The average height of QuickSort trees is $O(\log n)$.

- Proof: We know that only a $2^{-\log n} = 1/n$ fraction of all permutations are $\log n$-compressible. These could all have height as high as $n$. The rest must have height less than $c\log n$ by the lemma. So the average height is bounded by $(1 - 1/n)\,c\log n + (1/n)(n) < c\log n + 1 = O(\log n)$, as required.
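The averaging step in the corollary can be written out as follows. This is a sketch under the assumptions of the slide; $h(\pi)$ (the height of the QuickSort tree for $\pi$) and $S_n$ (the set of permutations of length $n$) are notation introduced here for illustration.

```latex
\begin{align*}
\frac{1}{n!}\sum_{\pi \in S_n} h(\pi)
  &\le \Pr[\pi \text{ is } \log n\text{-incompressible}] \cdot c\log n
     + \Pr[\pi \text{ is } \log n\text{-compressible}] \cdot n \\
  &\le 1 \cdot c\log n + \frac{1}{n} \cdot n \\
  &= c\log n + 1 = O(\log n).
\end{align*}
```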

Encoding Permutations

We want to encode permutations in a way that uses the recursive structure of QuickSort trees. Here is a recursive encoding scheme for a permutation of length $n$:

- Specify the pivot, $p$.
- Specify the locations of all values less than the pivot.
- Encode the two sub-permutations.

There are $n$ options for the first item, $\binom{n-1}{p-1}$ for the second, and $(p-1)!$ and $(n-p)!$ choices for the two sub-permutations. The total encoding length is then

$$\log n + \log\binom{n-1}{p-1} + \log (p-1)! + \log (n-p)! = \log\left[ n \binom{n-1}{p-1} (p-1)!\,(n-p)! \right] = \log(n!).$$

If we encode the sub-permutations recursively, we get the same result by induction.
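The counting identity behind this encoding length is easy to check numerically. The sketch below multiplies the number of options at each step and compares the product with $n!$ for every pivot value; the function name `encoding_options` is introduced here for illustration.

```python
import math


def encoding_options(n, p):
    """Number of distinct (pivot, locations, sub-permutations) descriptions for pivot p."""
    pivot_choices = n                               # which value is the pivot
    location_choices = math.comb(n - 1, p - 1)      # where the p-1 smaller values sit
    left_perms = math.factorial(p - 1)              # order of the smaller values
    right_perms = math.factorial(n - p)             # order of the larger values
    return pivot_choices * location_choices * left_perms * right_perms


# The product equals n! regardless of the pivot p.
print(all(encoding_options(8, p) == math.factorial(8) for p in range(1, 9)))  # prints True
```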

Encoding Permutations (cont'd)

Take a path $Y = (y_1, \dots, y_{c\log n})$ in the QuickSort tree. Suppose we know whether each $y_i$ is balanced or not. Now modify $E(\pi)$ so that whenever the pivot is $y_i$ for some $i$, we use one fewer bit to represent $y_i$.

- If $y_i$ is balanced, we index $y_i$ among the balanced values.
- Otherwise we index $y_i$ among the unbalanced values.

Then the total length of $E(\pi)$ is $\log(n!) - c\log n$. All that is left to do is specify the path $Y$ and specify the balanced/unbalanced information.
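To see where the saved bit comes from, consider a pivot whose range contains $m$ values. The sketch below assumes, as the slides do, that half of the candidate values are balanced and half are unbalanced; $m$ is notation introduced here.

```latex
\begin{align*}
\text{bits to index the pivot among all } m \text{ values} &= \log m, \\
\text{bits to index it among the } m/2 \text{ balanced (or unbalanced) values}
  &= \log\frac{m}{2} = \log m - 1, \\
\text{total saving over the } c\log n \text{ nodes of } Y &= c\log n \text{ bits}.
\end{align*}
```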

Compressing Sparse Strings

Lemma: Suppose a binary string $x$ of length $n$ has at most $tn$ 1's, where $t < 1/2$. Then $x$ can be represented in $H(t)n + O(\log n)$ bits, where $H$ is the binary entropy function.

Sketch Proof: Encode the number of 1's in $x$, then encode the locations of those 1's. Some manipulation with Stirling's Approximation yields the desired result. QED.

In particular, if $|x| = c\log n$ and $n_1(x) \le d\log n$, where $n_1(x)$ is the number of 1's in $x$ and $c > 2d$, then $x$ can be encoded in $H(d/c)\,c\log n + O(\log\log n)$ bits. Note that this encoding is self-delimiting if we know $n$.
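The two-part code in the sketch proof can be sized directly and compared with the entropy bound. The code below is an illustrative calculation rather than an actual encoder: it only counts bits, it takes the number of 1's to be exactly `k`, and the function names are chosen here.

```python
import math


def binary_entropy(t):
    """H(t) = -t log2 t - (1 - t) log2 (1 - t), with H(0) = H(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)


def sparse_code_length(n, k):
    """Bits used to describe a length-n binary string with exactly k ones."""
    count_bits = n.bit_length()                          # write k, a number in 0..n
    location_bits = (math.comb(n, k) - 1).bit_length()   # index one of C(n, k) placements
    return count_bits + location_bits


n, k = 1000, 100
print(sparse_code_length(n, k))        # bits used by the two-part code
print(binary_entropy(k / n) * n)       # entropy term H(t) n with t = k / n
```

The first part of the code costs $O(\log n)$ bits, and the second part is at most $H(k/n)\,n + 1$ bits because $\binom{n}{k} \le 2^{H(k/n)n}$.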

Encoding Balance Information

Our encoding of $\pi$ requires that we provide balance information about $Y$, in addition to $E(\pi)$. Let $x$ be the string whose $i$th bit is 1 iff $y_i$ is balanced. Let $z$ be the string whose $i$th bit is 1 iff $y_{i+1}$ is in the smaller range of $y_i$.

- Then $|xz| = 2c\log n$.
- Recall that $n_1(z) \le \log n$ and $n_1(x) \le d\log n$, so $n_1(xz) \le (d+1)\log n$.
- Therefore $|E(xz)| \le H\left(\frac{d+1}{2c}\right) 2c\log n + O(\log\log n)$.

Fully Specifying a Permutation

We now give a full encoding of a permutation. Encode the permutation as $E(xz)E(\pi)$, which has length at most

$$\log(n!) - c\left[1 - 2H\left(\frac{d+1}{2c}\right)\right]\log n + O(\log\log n).$$

Simply take $c$ large enough that $c\left[1 - 2H\left(\frac{d+1}{2c}\right)\right] > 1$. Then we have $C(\pi \mid n, p) \le |E(xz)E(\pi)| \le \log(n!) - \log n$.

The program $p$ extracts $\pi$ by decoding $E(xz)$, retrieving $x$ and $z$, then decoding $E(\pi)$ by using $z$ to find the values $y_i$ and using $x$ to interpret the encodings of the $y_i$'s.

Thus $\pi$ is $\log n$-compressible if the QuickSort tree for $\pi$ has height at least $c\log n$ for sufficiently large $c$. We conclude that QuickSort trees have average height $O(\log n)$, and hence the QuickSort algorithm runs in average time $O(n\log n)$.
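How large $c$ must be can be explored numerically. The sketch below searches for the smallest integer $c$ that satisfies the condition on this slide, ignoring the $O(\log\log n)$ term. It assumes base-2 logarithms and $d = 1/\log_2(4/3)$ from the heuristic argument; the function names and the search limit are hypothetical choices made here.

```python
import math


def binary_entropy(t):
    """H(t) = -t log2 t - (1 - t) log2 (1 - t), with H(0) = H(1) = 0."""
    if t <= 0.0 or t >= 1.0:
        return 0.0
    return -t * math.log2(t) - (1 - t) * math.log2(1 - t)


def saving(c, d):
    """Coefficient of log n saved by the encoding: c * (1 - 2 H((d + 1) / (2c)))."""
    return c * (1 - 2 * binary_entropy((d + 1) / (2 * c)))


def smallest_c(d, limit=10_000):
    """Smallest integer c with (d + 1) / (2c) < 1/2 and saving(c, d) > 1, or None."""
    for c in range(1, limit):
        if (d + 1) / (2 * c) < 0.5 and saving(c, d) > 1:
            return c
    return None


d = 1 / math.log2(4 / 3)
c = smallest_c(d)
print(c, saving(c, d))
```

Any larger $c$ found this way also leaves room to absorb the lower-order term for sufficiently large $n$.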

Summary

We proved an average-case upper bound for QuickSort by analyzing the average height of a QuickSort tree. Our approach was to separate the balance information for a long path from the rest of the encoding, then heavily compress the balance information. The compression works because a long path must have fewer balanced than unbalanced nodes.

Fin Thank You
