Sorting and Lower Bounds 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 19, 2004.


1 Sorting and Lower Bounds 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 19, 2004

2 Announcements - Quiz available today (open for 36 hours) - Homework #4 is out; you should finish Part 1 this week! - Reading: Chapter 8

3 Today - Stability - Solving recurrence equations - A lower bound for sorting - Breaking through the lower bound

4 Total Recall: Sorting Algorithms

5 Stable Sorting Algorithms An important notion is stability: a sorting algorithm is stable if it does not change the relative order of equal elements. Writing f for the permutation the sort applies: a[i] = a[j], i < j ⇒ f(i) < f(j). Stability is useful when sorting with respect to multiple keys. item: (name, year, … ) Suppose we want to sort by year, and lexicographically within each year.

6 Multiple Keys We could use a special comparator function (but this would require a special function for each combination of keys). It is often easier to - first sort by name - then stable sort by year Done!
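The two-pass trick above can be sketched in Python, whose built-in sort is guaranteed stable (the records and field names here are hypothetical):

```python
# Sort records by year, breaking ties lexicographically by name.
# Python's sorted() is stable, so sorting by name first and then
# by year preserves the name order within each year.
items = [("carol", 2003), ("alice", 2004), ("bob", 2003)]

by_name = sorted(items, key=lambda item: item[0])    # lexicographic pass
by_year = sorted(by_name, key=lambda item: item[1])  # stable pass by year

print(by_year)  # [('bob', 2003), ('carol', 2003), ('alice', 2004)]
```

Note that within 2003, bob still precedes carol, exactly because the second pass did not disturb the order established by the first.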

7 Pop Quiz Which of our algorithms are stable? (Assuming a reasonable implementation; one can always make a mess.) - Bubble Sort - Selection Sort - Insertion Sort - Heap Sort - Merge Sort - Quick Sort

8 Fred’s Stabilizer F.H. claims that stability is a no-brainer: given any sorting algorithm, a minor modification will turn it into a stable algorithm, so who cares? What modification does Fred have in mind? Can the issue really be ignored?

9 Divide-and-Conquer Theorem - Theorem: Let a, b ≥ 1 and c > 0. The recurrence relation T(1) = c, T(N) = a T(N/b) + c N, for any N that is a power of b, has upper-bound solutions - T(N) = O(N) if a < b - T(N) = O(N log N) if a = b - T(N) = O(N^(log_b a)) if a > b (a = 2, b = 2, c = 1 for recursive sorting.)
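The a = b = 2, c = 1 case (the recursive-sorting recurrence) can be checked numerically; this small sketch evaluates the recurrence directly and compares it to the closed form N log₂ N + N:

```python
# T(1) = 1, T(N) = 2 T(N/2) + N: the a = b = 2, c = 1 case,
# i.e. the recurrence for recursive sorting (merge sort).
def T(n):
    if n == 1:
        return 1
    return 2 * T(n // 2) + n

# Closed form for N = 2^k: T(N) = N log2 N + N, i.e. Theta(N log N).
for k in range(8):
    n = 2 ** k
    assert T(n) == n * k + n
```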

10 Lower Bound for Comparison Based Sorting

11 Comparison Based Sorting Recall that we are only allowed to make comparisons of the form if( a[i] < a[j] ) … For the time being, assume all items are distinct, so this is essentially the only query we can make. How many comparisons will a sorting algorithm take?

12 Comparison Based Sorting There is no easy answer: even Insertion Sort sometimes takes only n-1 comparisons. In fact, we could prepend a pre-computation to any sorting algorithm: check whether the input is already sorted, and if so do nothing. Or reverse sorted, or unimodal, or …. All of this can be done in linear time.

13 Lower Bound There are n! permutations of {1,2,…,n}. Hence there are n! leaves in the decision tree. But then the tree must have height log₂ n! = Ω(n log n). Even the average branch length must be Ω(n log n).
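The bound log₂ n! = Ω(n log n) is easy to sanity-check numerically, since any comparison sort needs at least ⌈log₂ n!⌉ comparisons in the worst case (the function name below is illustrative):

```python
import math

def comparisons_lower_bound(n):
    # A decision tree with n! leaves must have height >= ceil(log2 n!).
    return math.ceil(math.log2(math.factorial(n)))

# log2(n!) grows like n log2 n (Stirling); compare the two:
for n in (10, 100, 1000):
    print(n, comparisons_lower_bound(n), round(n * math.log2(n)))
```

For n = 5 the bound gives 7, which is in fact achievable: five elements can always be sorted in 7 comparisons.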

14 Decision tree for sorting [Figure: the decision tree for sorting three elements a, b, c. The root compares a and b; each internal node compares two elements; each of the 3! = 6 leaves is one ordering: a<b<c, a<c<b, b<a<c, b<c<a, c<a<b, c<b<a.]

15 Non-Comparison Based Sorting

16 Bucket Sort Suppose all keys are numbers between 1 and 100. We could simply set up 100 containers C[k] and place items with key k into C[k]. In the end, join the containers together. If we use, say, linked lists as buckets, the join operation is trivial.
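A minimal bucket sort sketch in Python for integer keys in a known range (the 1..100 range is from the slide; the sample data is illustrative):

```python
def bucket_sort(items, key, max_key=100):
    # One list per possible key value 0..max_key.
    buckets = [[] for _ in range(max_key + 1)]
    for item in items:
        buckets[key(item)].append(item)  # append keeps insertion order
    # Join the buckets; since appending preserves the relative order
    # of equal keys, this sort is stable.
    return [item for bucket in buckets for item in bucket]

data = [(3, "a"), (1, "b"), (3, "c"), (2, "d")]
print(bucket_sort(data, key=lambda p: p[0], max_key=3))
# [(1, 'b'), (2, 'd'), (3, 'a'), (3, 'c')]
```

The two items with key 3 come out in their original order, which answers the stability question on the next slide.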

17 Bucket Sort This is clearly linear in n (plus the size of the key range). So we can sort in linear time under very special circumstances. Question: Is this algorithm stable?

18 Breaking up Keys Think about having to sort a deck of index cards, each with a 6-digit number on it. There are a thousand cards. Fred would simulate Bubble Sort by hand. How would a human really do this?

19 Digits and Letters Suppose the items are numbers or strings. It clearly is helpful to have access to the single digits or letters making up these items. This is perfectly practical, but it is not allowed under the comparison-based standard. So how does one use these digits/letters to sort?

20 Radix sort - Another sorting algorithm that goes beyond comparison is radix sort. [Figure: the 3-bit numbers 2, 0, 5, 1, 7, 3, 4, 6 shown in binary and sorted in three stable passes, one bit per pass from least significant to most significant, ending in the order 0 1 2 3 4 5 6 7.] Each sorting step must be stable.

21 Radix sort characteristics - Each sorting step can be performed via bucket sort, and is thus O(N). - If the numbers are all b bits long, then there are b sorting steps. - Hence, radix sort is O(bN). - Also, radix sort can be implemented in-place (just like quicksort).
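A least-significant-digit radix sort sketch for non-negative b-bit integers, using one stable two-bucket pass per bit as described above:

```python
def radix_sort(nums, bits):
    # One pass per bit, least significant first. Each pass is a
    # stable 2-bucket sort, so after pass i the numbers are in
    # order on their low i+1 bits.
    for bit in range(bits):
        zeros, ones = [], []
        for x in nums:
            (ones if (x >> bit) & 1 else zeros).append(x)
        nums = zeros + ones  # stable join, zeros before ones
    return nums

print(radix_sort([2, 0, 5, 1, 7, 3, 4, 6], bits=3))
# [0, 1, 2, 3, 4, 5, 6, 7]
```

This version allocates fresh buckets each pass rather than sorting in place, which keeps the stability argument obvious.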

22 Not just for binary numbers - Radix sort can be used for decimal numbers and alphanumeric strings. [Figure: 032 224 016 015 031 169 123 252 → (stable sort by ones digit) 031 032 252 123 224 015 016 169 → (stable sort by tens digit) 015 016 123 224 031 032 252 169 → (stable sort by hundreds digit) 015 016 031 032 123 169 224 252.]

23 Why comparison-based? - Bucket and radix sort are much faster than any comparison-based sorting algorithm - Unfortunately, we can’t always live with the restrictions imposed by these algorithms - In such cases, comparison-based sorting algorithms give us general solutions

24 Back to Quick Sort

25 Review: Quicksort algorithm - If array A has 1 (or 0) elements, then done. - Choose a pivot element x from A. - Divide A − {x} into two arrays: B = {y ∈ A − {x} | y ≤ x} and C = {y ∈ A − {x} | y ≥ x} - Quicksort arrays B and C. - Result is B + {x} + C.
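The recap translates almost line for line into a Python sketch (not the in-place version; here ties are sent to B so the two halves are disjoint):

```python
def quicksort(a):
    # Base case: 0 or 1 elements are already sorted.
    if len(a) <= 1:
        return a
    x = a[0]                         # naive pivot choice, for brevity
    rest = a[1:]                     # A - {x}
    b = [y for y in rest if y <= x]  # B: elements at most the pivot
    c = [y for y in rest if y > x]   # C: elements above the pivot
    return quicksort(b) + [x] + quicksort(c)

print(quicksort([85, 24, 63, 50, 17, 31, 96, 45]))
# [17, 24, 31, 45, 50, 63, 85, 96]
```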

26 Implementation issues - Quick sort can be very fast in practice, but this depends on careful coding - Three major issues: 1. doing quicksort in-place 2. picking the right pivot 3. avoiding quicksort on small arrays

27 1. Doing quicksort in place [Figure: array 85 24 63 50 17 31 96 45; the pivot 50 is swapped to the end, giving 85 24 63 45 17 31 96 50; L scans from the left and R from the right, and the out-of-place pair 85/31 is swapped, giving 31 24 63 45 17 85 96 50.]

28 1. Doing quicksort in place [Figure: continuing from 31 24 63 45 17 85 96 50, the pair 63/17 is swapped, giving 31 24 17 45 63 85 96 50; the scans cross, and the pivot is swapped into place, giving 31 24 17 45 50 85 96 63.]
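The L/R scanning shown above can be sketched as follows (a hypothetical rendering that assumes the pivot has already been moved to the last position; exact details vary between implementations):

```python
def partition(a, lo, hi):
    # a[hi] is the pivot. L scans right for elements >= pivot,
    # R scans left for elements <= pivot; out-of-place pairs are
    # swapped until the scans cross.
    pivot = a[hi]
    L, R = lo, hi - 1
    while True:
        while L <= R and a[L] < pivot:
            L += 1
        while R >= L and a[R] > pivot:
            R -= 1
        if L >= R:
            break
        a[L], a[R] = a[R], a[L]
        L += 1
        R -= 1
    a[L], a[hi] = a[hi], a[L]  # put the pivot between the halves
    return L                   # final position of the pivot

a = [85, 24, 63, 45, 17, 31, 96, 50]
p = partition(a, 0, len(a) - 1)
print(a, p)  # [31, 24, 17, 45, 50, 85, 96, 63] 4
```

On the slide's example this performs exactly the swaps pictured, ending with the pivot 50 at index 4.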

29 2. Picking the pivot - In real life, inputs to a sorting routine are often partially sorted - why does this happen? - So, picking the first or last element to be the pivot is usually a bad choice - One common strategy is to pick the middle element - this is an OK strategy

30 2. Picking the pivot - A more sophisticated approach is to use random sampling - think about opinion polls - For example, the median-of-three strategy: take the median of the first, middle, and last elements to be the pivot

31 3. Avoiding small arrays - While quicksort is extremely fast for large arrays, experimentation shows that it performs less well on small arrays - For small enough arrays, a simpler method such as insertion sort works better - The exact cutoff depends on the language and machine, but usually is somewhere between 10 and 30 elements
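Median-of-three pivoting and the insertion-sort cutoff can be combined into one routine; a minimal sketch (the function names and the cutoff value 16 are illustrative choices within the 10-30 range mentioned above, and a simple Lomuto-style partition is used for brevity):

```python
CUTOFF = 16  # illustrative value in the 10-30 range

def insertion_sort(a, lo, hi):
    # Standard insertion sort on a[lo..hi], inclusive.
    for i in range(lo + 1, hi + 1):
        x, j = a[i], i
        while j > lo and a[j - 1] > x:
            a[j] = a[j - 1]
            j -= 1
        a[j] = x

def median_of_three(a, lo, hi):
    # Index of the median of the first, middle, and last elements.
    mid = (lo + hi) // 2
    trio = sorted([(a[lo], lo), (a[mid], mid), (a[hi], hi)])
    return trio[1][1]

def quicksort_tuned(a, lo=0, hi=None):
    if hi is None:
        hi = len(a) - 1
    if hi - lo + 1 <= CUTOFF:
        insertion_sort(a, lo, hi)  # small arrays: simpler is faster
        return
    p = median_of_three(a, lo, hi)
    a[p], a[hi] = a[hi], a[p]      # move the pivot out of the way
    pivot = a[hi]
    i = lo                          # partition: a[lo..i-1] <= pivot
    for j in range(lo, hi):
        if a[j] <= pivot:
            a[i], a[j] = a[j], a[i]
            i += 1
    a[i], a[hi] = a[hi], a[i]       # pivot into its final slot
    quicksort_tuned(a, lo, i - 1)
    quicksort_tuned(a, i + 1, hi)
```

Median-of-three keeps reverse-sorted and already-sorted inputs (the common "partially sorted" case) away from the quadratic worst case.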

32 Putting it all together [Figure: the in-place partitioning example from slide 27, repeated: 85 24 63 50 17 31 96 45 → 85 24 63 45 17 31 96 50 → 31 24 63 45 17 85 96 50.]

33 Putting it all together [Figure: the partitioning example from slide 28, repeated: 31 24 63 45 17 85 96 50 → 31 24 17 45 63 85 96 50 → 31 24 17 45 50 85 96 63.]

34 A complication! - What should happen if we encounter an element that is equal to the pivot? - Four possibilities: - L stops, R keeps going - R stops, L keeps going - L and R stop - L and R keep going

35 Quiz Break

36 Red-green quiz - What should happen if we encounter an element that is equal to the pivot? - Four possibilities: - L stops, R keeps going - R stops, L keeps going - L and R stop - L and R keep going - Explain why your choice is the only reasonable one

37 Quick Sort Analysis

38 Worst-case behavior [Figure: quicksort tree for 105 47 13 17 30 222 5 19 when the pivot is always the smallest remaining element (5, then 13, then 17, then 19, …): each level peels off only the pivot, so there are N levels and O(N²) comparisons in total.]

39 Best-case analysis - In the best case, the pivot is always the median element. - In that case, the splits are always “down the middle”. - Hence, same behavior as mergesort. - That is, O(N log N).

40 Average-case analysis - Consider the quicksort tree: [Figure: quicksort tree for 105 47 13 17 30 222 5 19 with pivot 19 at the root; each node splits its elements into those below and those above its pivot.]

41 Average-case analysis - The time spent at each level of the tree is O(N). - So, on average, how many levels? - That is, what is the expected height of the tree? - If on average there are O(log N) levels, then quicksort is O(N log N) on average.

42 Expected height of qsort tree - Assume that the pivot is chosen randomly. - When is a pivot “good”? “Bad”? [Figure: the sorted values 5 13 17 19 30 47 105 222; a pivot from the middle half (17, 19, 30, 47) is “good”.] Probability of a good pivot is 0.5. After a good pivot, each child is at most 3/4 the size of its parent.

43 Expected height of qsort tree - So, if we descend k levels in the tree, each time being lucky enough to pick a “good” pivot, the maximum size of the kth child is - N(3/4)(3/4) … (3/4) (k times) = N(3/4)^k - But on average only half of the pivots will be good, so after k levels the size is about N(3/4)^(k/2); setting N(3/4)^(k/2) = 1 gives k = 2 log_{4/3} N = O(log N)
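The O(log N) claim can be sanity-checked empirically; this sketch measures the recursion depth of quicksort with uniformly random pivots (seeded for reproducibility; the function name is illustrative):

```python
import random

def qsort_depth(n_items, rng):
    # Recursion depth of quicksort on a random permutation of
    # n_items distinct values, with pivots chosen at random.
    def depth(a):
        if len(a) <= 1:
            return len(a)
        x = a[rng.randrange(len(a))]
        below = [y for y in a if y < x]
        above = [y for y in a if y > x]
        return 1 + max(depth(below), depth(above))
    items = list(range(n_items))
    rng.shuffle(items)
    return depth(items)

rng = random.Random(42)
d = qsort_depth(1024, rng)
# log2(1024) = 10; random pivots give a depth within a small
# constant factor of that, nowhere near the worst case of 1024.
print(d)
```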

44 Summary of quicksort - A fast sorting algorithm in practice. - Can be implemented in-place. - But is O(N²) in the worst case. - O(N log N) average-case performance.

45 World’s Fastest Sorters

46 Sorting competitions - There are several world-wide sorting competitions - Unix CoSort has achieved 1GB in under one minute, on a single Alpha - Berkeley’s NOW-sort sorted 8.4GB of disk data in under one minute, using a network of 95 workstations - Sandia Labs was able to sort 1TB of data in under 50 minutes, using a 144-node multiprocessor machine

