
1 Sorting 15-211 Fundamental Data Structures and Algorithms Klaus Sutner February 17, 2004

2 Announcements
• Homework 5 is out
• Reading: Chapter 8 in MAW
• Quiz 1 available on Thursday

3 Introduction to Sorting

4 Boring … Sorting is admittedly not very sexy; everybody already knows some algorithms. But: good sorting algorithms are needed absolutely everywhere, sorting is fairly well understood theoretically, and it provides a good way to introduce some important ideas.

5 The Problem We are given a sequence of items a_1 a_2 a_3 … a_(n-1) a_n. We want to rearrange them so that they are in non-decreasing order. More precisely, we need a permutation f such that a_f(1) ≤ a_f(2) ≤ a_f(3) ≤ … ≤ a_f(n-1) ≤ a_f(n).

6 A Constraint: Comparison-Based Sorting While we are rearranging the items, we will only use queries of the form a_i ≤ a_j, or variants thereof (<, ≥, and so forth).

7 Say What? The important point here is that the algorithm can only make comparisons such as if( a[i] < a[j] ) … We are not allowed to look inside the elements a[i] and a[j]. For example, if these elements are numbers, we are not allowed to compare their most significant digits.
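
To make the constraint concrete, here is a small Java illustration (my sketch, not from the slides): a comparison-based routine touches elements only through a Comparator, so it has no way to peek at digits or bits.

import java.util.Comparator;

public class ComparisonOnly {
    // A comparison-based routine sees elements only through compare();
    // it can never inspect digits, bits, or other internals of a[i].
    static <T> boolean isSorted(T[] a, Comparator<? super T> cmp) {
        for (int i = 0; i + 1 < a.length; i++)
            if (cmp.compare(a[i], a[i + 1]) > 0)   // the only allowed query
                return false;
        return true;
    }

    public static void main(String[] args) {
        Integer[] a = { 13, 30, 47, 99, 105, 222 };
        System.out.println(isSorted(a, Comparator.naturalOrder()));  // prints true
    }
}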

8 An Easy Upper Bound Here is a simple idea to sort an array: a flip is a position i in the array where two adjacent elements are out of order, i.e., a[i] > a[i+1]. Let's look for a flip and correct it by swapping the two elements.

9 A Prototype Algorithm

// FlipSort
while( there is a flip )
    pick one, fix it

Is this algorithm guaranteed to terminate? If so, what can we say about its running time? Is it correct, i.e., is the array sorted?
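
In runnable Java the prototype might look as follows — a sketch of mine, not the course's code. Scanning left to right for the next flip makes this essentially the Bubble Sort of slide 14.

// FlipSort: repeatedly find a flip (an adjacent out-of-order pair) and fix it.
// Each swap removes exactly one inversion, which bounds the number of swaps.
static void flipSort(int[] a) {
    boolean hadFlip = true;
    while (hadFlip) {                      // while( there is a flip )
        hadFlip = false;
        for (int i = 0; i + 1 < a.length; i++)
            if (a[i] > a[i + 1]) {         // pick one, fix it
                int t = a[i]; a[i] = a[i + 1]; a[i + 1] = t;
                hadFlip = true;
            }
    }
}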

10 Termination while( there is a flip ) pick one, fix it — It's tempting to do induction on the number of flips, but beware: 10 15 5 10 → 10 5 15 10 turns one flip into two. We need to talk about inversions instead.

11 Flips and Inversions Example: 24 47 13 99 105 222. The adjacent pair (47, 13) is a flip; the non-adjacent pair (24, 13) is an inversion but not a flip. Every flip is an inversion, but not conversely.

12 Running Time The total number of inversions is clearly at most quadratic: n(n-1)/2. So we can sort in quadratic time if we can manage to find and fix a flip in constant time. We need to organize the search somehow, and we should probably try to avoid recomputation.

13 Naïve sorting algorithms Bubble Sort, Selection Sort, and Insertion Sort (this one is actually important) are all quadratic in the worst case and on average.

14 Bubble Sort Scan through the array, fixing flips as you go along. Repeat until the array is sorted.

for( i = 2; i <= n; i++ )
    for( j = n; j >= i; j-- )
        if( A[j-1] > A[j] )
            swap A[j-1] and A[j];
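
The same pseudocode as a runnable, 0-indexed Java method (my hedged transcription):

static void bubbleSort(int[] A) {
    int n = A.length;
    for (int i = 1; i < n; i++)            // after pass i, A[0..i-1] is final
        for (int j = n - 1; j >= i; j--)   // bubble the smallest element of the
            if (A[j - 1] > A[j]) {         // suffix A[i-1..n-1] down to position i-1
                int t = A[j - 1]; A[j - 1] = A[j]; A[j] = t;
            }
}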

15 Selection Sort For k = n, n-1, …, find the smallest element among the last k elements of the array and swap it to the front.

for( i = 1; i <= n-1; i++ )
    find A[j] minimal in A[i..n]
    swap with A[i]
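
Filled in as runnable Java (0-indexed; a sketch, not the official solution):

static void selectionSort(int[] A) {
    for (int i = 0; i < A.length - 1; i++) {
        int min = i;                       // find A[min] minimal in A[i..n-1]
        for (int j = i + 1; j < A.length; j++)
            if (A[j] < A[min]) min = j;
        int t = A[i]; A[i] = A[min]; A[min] = t;   // swap it to the front
    }
}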

16 Insertion Sort Place the i-th element in its proper position within the already sorted list of the first i-1 elements.

for i = 2 to n do
    order-insert a[i] in a[1:i-1]

Can be implemented nicely.

17 Insertion Sort Using a sentinel: copying x = A[i] into A[0] guarantees that the inner loop stops, so no bounds check is needed.

for( i = 2; i <= n; i++ )
    x = A[i]; A[0] = x;
    for( j = i; x < A[j-1]; j-- )
        A[j] = A[j-1];
    A[j] = x;

18 Insertion sort Example (the sorted sublist grows from the left):

105 | 47 13 99 30 222
47 105 | 13 99 30 222
13 47 105 | 99 30 222
13 47 99 105 | 30 222
13 30 47 99 105 | 222
13 30 47 99 105 222

19 How fast is insertion sort? Takes O(n + #inversions) steps, which is very fast if the array is nearly sorted to begin with: 3 2 1 6 5 4 9 8 7 …

20 How long does it take to sort?
• Can we do better than O(n^2)?
• In the worst case?
• In the average case?

21 Sorting in O(n log n) O(n log n) turns out to be a Magic Wall: it is hard to reach, and exceedingly hard to break through. In fact, it’s impossible in a sense to do better than O(n log n). We already know that Heapsort will give us this bound: - build the heap in linear time, - destroy it in O(n log n).
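
For comparison, one compact way to realize the heapsort bound in Java — a sketch using java.util.PriorityQueue rather than an in-place binary heap. (Caveat: the library builds by repeated insertion, which is O(n log n) rather than the linear bottom-up build mentioned above; the overall bound is unchanged.)

import java.util.PriorityQueue;

static void heapSort(int[] a) {
    PriorityQueue<Integer> heap = new PriorityQueue<>();
    for (int x : a) heap.add(x);           // build the heap
    for (int i = 0; i < a.length; i++)
        a[i] = heap.poll();                // destroy it: n extractions, O(log n) each
}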

22 Heapsort in practice
• The average-case analysis for heapsort is somewhat complex.
• In practice, heapsort consistently tends to use nearly n log n comparisons.
• So, while the worst case is better than n^2, other algorithms sometimes work better.

23 Shellsort Shellsort, like insertion sort, is based on swapping inverted pairs. With a suitable gap sequence it achieves O(n^(4/3)) running time. [See your book for details.]

24 Shellsort Example with gap sequence 3, 1:

105 47 13 99 30 222
99 47 13 105 30 222    (gap 3: swap 105 and 99)
99 30 13 105 47 222    (gap 3: swap 47 and 30)
99 30 13 105 47 222    (gap 3: 13 and 222 already in order)
30 99 13 105 47 222    (gap 1: swap 99 and 30)
30 13 99 105 47 222    (gap 1: swap 99 and 13)
...

Several inverted pairs are fixed in one exchange.
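
A minimal Java sketch of Shellsort with the toy gap sequence 3, 1 from the example (real implementations use longer, carefully chosen sequences, and the O(n^(4/3)) bound depends on that choice):

static void shellSort(int[] a) {
    int[] gaps = { 3, 1 };                 // toy sequence; the final gap must be 1
    for (int gap : gaps)
        for (int i = gap; i < a.length; i++) {
            int x = a[i];                  // insertion sort on each gap-strided sublist
            int j = i;
            while (j >= gap && a[j - gap] > x) {
                a[j] = a[j - gap];         // one move can fix several inversions
                j -= gap;
            }
            a[j] = x;
        }
}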

25 Recursive Sorting

26 Recursive sorting
• Intuitively, divide the problem into pieces and then recombine the results.
• If the array has length 1, then done.
• If the array has length N > 1, then split it in half and sort each half.
• Then combine the results.
• An example of a divide-and-conquer algorithm.

27 Divide-and-conquer

29 Why divide-and-conquer works
• Suppose the amount of work required to divide and recombine is linear, that is, O(n).
• Suppose also that solving the problem in one piece would cost more than linear time.
• Then each dividing step trades one expensive problem for two much cheaper ones, while requiring only linear work to do so.

30 Divide-and-conquer is big
• We will see several examples of divide-and-conquer in this course.

31 Recursive Sorting
• If the array has length 1, then done.
• Otherwise, split it into two smaller pieces.
• Sort each piece.
• Combine the sorted pieces.

32 Two Major Approaches
1. Make the split trivial, but perform some work when the pieces are combined → Merge Sort.
2. Work during the split, but then do nothing in the combination step → Quick Sort.
In either case, the overhead should be linear with small constants.

33 Analysis The analysis is relatively easy if the two pieces have (approximately) the same size. This is the case for Merge Sort, but not for Quick Sort. Let’s ignore the second case for the time being.

34 Recurrence Equations We need to deal with equations of the form

T(1) = 1
T(n) = 2 T(n/2) + f(n)

Here f(n) is the non-recursive overhead. There are two recursive calls, each to a sub-instance of the same size n/2. Of course, there are other cases to consider.
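
For instance, with f(n) = n — the Merge Sort case — the equation unrolls nicely, assuming n = 2^k:

T(n) = 2 T(n/2) + n
     = 4 T(n/4) + 2n
     = 8 T(n/8) + 3n
     = …
     = 2^k T(1) + k·n
     = n + n log n = O(n log n)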

35 Recurrence Equations A slight generalization is

T(1) = 1
T(n) = a T(n/b) + f(n)

Here f(n) is again the non-recursive overhead. There are a recursive calls, each to a sub-instance of size n/b.

36 Recurrence Equations Of course, we're cheating:

T(1) = 1
T(n) = a T(n/b) + f(n)

makes no sense unless b divides n at every level. Let's just ignore this. In reality there are ceilings and floors and continuity arguments everywhere.

37 Mergesort

38 The Algorithm Merging the two sorted parts is responsible for the overhead.

merge( nil, B ) = B;
merge( A, nil ) = A;
merge( a A, b B ) =
    if( a <= b ) prepend( merge( A, b B ), a )
    else prepend( merge( a A, B ), b )

39 The Algorithm The main function.

List MergeSort( List L ) {
    if( length(L) <= 1 ) return L;
    A = first half of L;
    B = second half of L;
    return merge( MergeSort(A), MergeSort(B) );
}

40 Harsh Reality In reality, the items are always given in an array. The first and second halves can be found by index arithmetic. (Diagram: an array split by an index into a left half L and a right half R.)
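
An array-based sketch in Java — halves by index arithmetic, with the scratch array the next slide calls for (my hedged transcription, not the course's reference code):

// Sort a[lo..hi) using tmp as scratch space for the merge.
static void mergeSort(int[] a, int[] tmp, int lo, int hi) {
    if (hi - lo <= 1) return;              // length 0 or 1: already sorted
    int mid = (lo + hi) / 2;               // halves via index arithmetic
    mergeSort(a, tmp, lo, mid);
    mergeSort(a, tmp, mid, hi);
    int i = lo, j = mid, k = lo;
    while (i < mid && j < hi)              // merge the two sorted halves into tmp
        tmp[k++] = (a[i] <= a[j]) ? a[i++] : a[j++];
    while (i < mid) tmp[k++] = a[i++];
    while (j < hi)  tmp[k++] = a[j++];
    for (k = lo; k < hi; k++) a[k] = tmp[k];   // copy back
}
// call as: mergeSort(a, new int[a.length], 0, a.length);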

41 But Note … We cannot perform the merge operation in place. Rather, we need another array as scratch space. The total space requirement for Merge Sort is therefore 2n + O(log n), the O(log n) term being the stack of the recursive implementation.

42 Running Time Solving the recurrence equation for Merge Sort, one sees that the running time is O(n log n). Since Merge Sort reads the data strictly sequentially, it is sometimes useful when the data reside on slow external media. But overall it is no match for Quick Sort.

43 Quicksort

44
• Quicksort was invented in 1960 by Tony Hoare.
• Although it has O(N^2) worst-case performance, on average it is O(N log N).
• More importantly, it is the fastest known comparison-based sorting algorithm in practice.

45 Quicksort idea
• Choose a pivot.

46 Quicksort idea
• Choose a pivot.
• Rearrange so that the pivot is in the “right” spot.

47 Quicksort idea
• Choose a pivot.
• Rearrange so that the pivot is in the “right” spot.
• Recurse on each half and conquer!

48 Quicksort algorithm
• If array A has 1 (or 0) elements, then done.
• Choose a pivot element x from A.
• Divide A-{x} into two arrays: B = { y ∈ A-{x} | y ≤ x } and C = { y ∈ A-{x} | y ≥ x }.
• Quicksort arrays B and C.
• Result is B + {x} + C.
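
A direct, hedged Java rendering of this scheme — it allocates B and C rather than partitioning in place, and simply takes the first element as pivot since the slide leaves the choice open:

import java.util.ArrayList;
import java.util.List;

static List<Integer> quickSort(List<Integer> a) {
    if (a.size() <= 1) return a;           // 1 (or 0) elements: done
    int x = a.get(0);                      // pivot; choice is unspecified here
    List<Integer> b = new ArrayList<>();   // B = { y in A-{x} | y <= x }
    List<Integer> c = new ArrayList<>();   // C = { y in A-{x} | y >= x }
    for (int i = 1; i < a.size(); i++)
        if (a.get(i) <= x) b.add(a.get(i)); else c.add(a.get(i));
    List<Integer> result = new ArrayList<>(quickSort(b));
    result.add(x);                         // result is B + {x} + C
    result.addAll(quickSort(c));
    return result;
}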

49 Quicksort algorithm Example:

105 47 13 17 30 222 5 19
→ pivot 19: [5 17 13] 19 [47 30 222 105]
→ pivot 13: [5] 13 [17];  pivot 47: [30] 47 [222 105]
→ pivot 105: 105 [222]

50 Quicksort algorithm (Same tree as on the previous slide.) In practice, insertion sort is used once the arrays get “small enough”.

51 Doing quicksort in place

85 24 63 50 17 31 96 45    (pivot 50; swap it to the end)
85 24 63 45 17 31 96 50    (L scans right, stops at 85; R scans left, stops at 31)
31 24 63 45 17 85 96 50    (swap a[L] and a[R])

52 Doing quicksort in place

31 24 63 45 17 85 96 50    (L stops at 63, R stops at 17)
31 24 17 45 63 85 96 50    (swap; now L and R cross)
31 24 17 45 50 85 96 63    (swap the pivot into place: partitioning done)
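
The pictures correspond roughly to the following Java partition sketch (pivot moved to the end, two scan pointers L and R, swap when both stop, pivot restored when they cross; pivot selection is left to the caller, and details vary between implementations):

// Partition a[lo..hi] around the pivot a[p]; returns the pivot's final index.
static int partition(int[] a, int lo, int hi, int p) {
    swap(a, p, hi);                        // move the pivot out of the way
    int pivot = a[hi], L = lo, R = hi - 1;
    while (true) {
        while (L <= R && a[L] < pivot) L++;   // L stops at an element >= pivot
        while (R >= L && a[R] > pivot) R--;   // R stops at an element <= pivot
        if (L >= R) break;                 // pointers crossed
        swap(a, L++, R--);
    }
    swap(a, L, hi);                        // pivot into its final spot
    return L;
}

static void swap(int[] a, int i, int j) { int t = a[i]; a[i] = a[j]; a[j] = t; }

Quicksort then recurses on a[lo..L-1] and a[L+1..hi].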

53 Quicksort is fast but hard to do
• Quicksort, in the early 1960's, was famous for being incorrectly implemented many times.
• More about invariants next time.
• Quicksort is very fast in practice.
• Faster than mergesort because Quicksort can be done “in place”.

54 Informal analysis
• If there are duplicate elements, then the algorithm does not specify which subarray B or C should get them.
• Ideally, they split down the middle.
• Also, it is not specified how to choose the pivot.
• Ideally the median value of the array, but this would be expensive to compute.
• As a result, it is possible that Quicksort will show O(N^2) behavior.

55 Worst-case behavior Example where the pivot is always the smallest element:

105 47 13 17 30 222 5 19
→ pivot 5:  5 [47 13 17 30 222 19 105]
→ pivot 13: 13 [47 17 30 222 19 105]
→ pivot 17: 17 [47 105 19 30 222]
→ pivot 19: 19 [...]

Each level strips off only one element, so there are N levels of linear work.

56 Analysis of quicksort
• Assume a random pivot.
• T(0) = 1, T(1) = 1
• T(N) = T(i) + T(N-i-1) + cN, for N > 1, where i is the size of the left subarray.

57 Worst-case analysis
• If the pivot is always the smallest element, then:
• T(0) = 1, T(1) = 1
• T(N) = T(0) + T(N-1) + cN ≈ T(N-1) + cN, for N > 1
• Unrolling gives cN + c(N-1) + … + 2c = O(N^2).
• See the book for details on this solution.

58 Best-case analysis
• In the best case, the pivot is always the median element.
• In that case, the splits are always “down the middle”.
• Hence, the same behavior as mergesort.
• That is, O(N log N).

59 Average-case analysis
• Consider the quicksort tree (the same example as before: 105 47 13 17 30 222 5 19, with pivot 19 at the root).

60 Average-case analysis
• The time spent at each level of the tree is O(N).
• So, on average, how many levels?
• That is, what is the expected height of the tree?
• If on average there are O(log N) levels, then quicksort is O(N log N) on average.

61 Average-case analysis
• We'll answer this question next time…

62 Summary of quicksort
• A fast sorting algorithm in practice.
• Can be implemented in-place.
• But is O(N^2) in the worst case.
• Average-case performance?

