2 Sorting 15-211 Fundamental Data Structures and Algorithms Aleks Nanevski February 17, 2004

3 Announcements
- Reading: Chapter 8
- Quiz #1 available on Thursday
- The quiz will be much easier to complete if you solved Homework 5.

4 Introduction to Sorting

5 The Problem
We are given a sequence of items a_1 a_2 a_3 ... a_{n-1} a_n. We want to rearrange them so that they are in non-decreasing order. More precisely, we need a permutation f such that a_f(1) ≤ a_f(2) ≤ a_f(3) ≤ ... ≤ a_f(n-1) ≤ a_f(n).

6 A Constraint: Comparison-Based Sorting
While we are rearranging the items, we will only use queries of the form a_i ≤ a_j, or variants thereof (<, >, ≥, and so forth).

7 Comparison-based sorting
The important point here is that the algorithm can only make comparisons such as

    if( a[i] < a[j] ) ...

We are not allowed to look at pieces of the elements a[i] and a[j]. For example, if these elements are numbers, we are not allowed to compare the most significant digits.

8 Flips and inversions
An unsorted array: 24 47 13 99 105 222. Two elements are inverted if A[i] > A[j] for i < j; a flip is an inversion of two consecutive elements. In this array, (24, 13) is an inversion and (47, 13) is a flip.

9 Example
    3 2 1 6 5 4 9 8 7
In the above list, there are 9 inversions:
    (3, 2), (2, 1), (3, 1), (6, 5), (5, 4), (6, 4), (9, 8), (8, 7), (9, 7)
Out of these, 6 are flips:
    (3, 2), (2, 1), (6, 5), (5, 4), (9, 8), (8, 7)
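For concreteness, here is a small brute-force sketch (not from the slides; the class and method names are made up for illustration) that counts inversions and flips:

    // Count inversions and flips by brute force in O(n^2) time.
    public class InversionCount {
        // Inversions: pairs (i, j) with i < j and a[i] > a[j].
        static int inversions(int[] a) {
            int count = 0;
            for (int i = 0; i < a.length; i++)
                for (int j = i + 1; j < a.length; j++)
                    if (a[i] > a[j]) count++;
            return count;
        }

        // Flips: inversions of two consecutive elements.
        static int flips(int[] a) {
            int count = 0;
            for (int i = 0; i + 1 < a.length; i++)
                if (a[i] > a[i + 1]) count++;
            return count;
        }

        public static void main(String[] args) {
            int[] a = {3, 2, 1, 6, 5, 4, 9, 8, 7};
            // Prints "9 inversions, 6 flips" for the list above.
            System.out.println(inversions(a) + " inversions, " + flips(a) + " flips");
        }
    }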

10 Naïve sorting algorithms
- Bubble sort: scan for flips, until all are fixed
What is the running time?
    3 2 1 6 5 4
    2 3 1 6 5 4
    2 1 3 6 5 4
    2 1 3 5 6 4
    2 1 3 5 4 6
    Etc...

11 Naïve sorting algorithms
- Bubble sort: scan for flips, until all are fixed
What is the running time? O(n^2)
    3 2 1 6 5 4
    2 3 1 6 5 4
    2 1 3 6 5 4
    2 1 3 5 6 4
    2 1 3 5 4 6
    Etc...
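A minimal bubble sort sketch matching the description above: keep scanning for flips and swapping them until a full pass finds none. Since each swap fixes exactly one flip, and hence one inversion, the worst case does O(n^2) work.

    // Bubble sort: repeatedly scan for flips until none remain.
    static void bubbleSort(int[] a) {
        boolean swapped = true;
        while (swapped) {
            swapped = false;
            for (int i = 0; i + 1 < a.length; i++) {
                if (a[i] > a[i + 1]) {              // found a flip
                    int tmp = a[i]; a[i] = a[i + 1]; a[i + 1] = tmp;
                    swapped = true;
                }
            }
        }
    }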

12 Insertion sort
    for i = 1 to n-1 do
        insert a[i] in the proper place in a[0:i-1]
Invariant: after step i, the sub-array a[0:i] is sorted.

13 Insertion sort
    105 47 13 99 30 222    (initial array)
    47 105 13 99 30 222
    13 47 105 99 30 222
    13 47 99 105 30 222
    13 30 47 99 105 222
The sorted subarray grows from the left.

14 How fast is insertion sort?
To insert a[i] into a[0:i-1], slide all elements larger than a[i] to the right:

    tmp = a[i];
    for (j = i; j > 0 && a[j-1] > tmp; j--)
        a[j] = a[j-1];
    a[j] = tmp;

15 How fast is insertion sort?
To insert a[i] into a[0:i-1], slide all elements larger than a[i] to the right:

    tmp = a[i];
    for (j = i; j > 0 && a[j-1] > tmp; j--)
        a[j] = a[j-1];
    a[j] = tmp;

This takes O(#inversions) sliding steps: very fast if the array is nearly sorted to begin with.
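Putting the two slides together, a complete insertion sort is a direct combination of the outer loop from slide 12 and the inner loop above:

    // Insertion sort: for each i, slide a[i] left into the sorted prefix.
    static void insertionSort(int[] a) {
        for (int i = 1; i < a.length; i++) {
            int tmp = a[i];
            int j;
            for (j = i; j > 0 && a[j - 1] > tmp; j--)
                a[j] = a[j - 1];   // slide larger elements right
            a[j] = tmp;            // drop a[i] into its proper place
        }
    }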

16 How many inversions?
Consider the worst case, the array in reverse order:
    n n-1 ... 3 2 1 0
In this case, there are
    0 + 1 + ... + (n-3) + (n-2) + (n-1) + n = n(n+1)/2
inversions.

17 How many inversions?
What about the average case? Consider a permutation p = x_1 x_2 x_3 ... x_{n-1} x_n. For any such p, let rev(p) be its reversal. Then each pair (x_i, x_j) is an inversion in exactly one of p or rev(p). There are n(n-1)/2 pairs (x_i, x_j), hence the average number of inversions in a permutation is n(n-1)/4, which is O(n^2).

18 How long does it take to sort?
- Can we do better than O(n^2)?
- In the worst case?
- In the average case?

19 Heapsort
- Remember heaps:
  - A complete binary tree, each parent smaller than its children.
  - buildHeap has O(n) worst-case running time.
  - deleteMin has O(log n) worst-case running time.
- Heapsort:
  - Build heap: O(n)
  - DeleteMin until empty: O(n log n)
  - Total worst case: O(n log n)
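A compact heapsort sketch. The slides describe a min-heap with deleteMin; the variant below uses a max-heap so the sort can happen in place, but the cost components are the same: an O(n) buildHeap followed by n deletions of O(log n) each.

    // Heapsort: buildHeap in O(n), then n sift-downs of O(log n) each.
    static void heapSort(int[] a) {
        int n = a.length;
        for (int i = n / 2 - 1; i >= 0; i--)      // buildHeap: O(n)
            siftDown(a, i, n);
        for (int end = n - 1; end > 0; end--) {   // n deletions: O(n log n)
            int tmp = a[0]; a[0] = a[end]; a[end] = tmp;  // max goes to the end
            siftDown(a, 0, end);
        }
    }

    // Restore the max-heap property below index i in a[0..n-1].
    static void siftDown(int[] a, int i, int n) {
        while (2 * i + 1 < n) {
            int child = 2 * i + 1;
            if (child + 1 < n && a[child + 1] > a[child]) child++;
            if (a[i] >= a[child]) break;
            int tmp = a[i]; a[i] = a[child]; a[child] = tmp;
            i = child;
        }
    }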

20 n^2 vs. n log n

21 Sorting in O(n log n)
- Heapsort establishes the fact that sorting can be accomplished in O(n log n) worst-case running time.

22 Heapsort in practice
- The average-case analysis for heapsort is somewhat complex.
- In practice, heapsort consistently uses nearly n log n comparisons.
- So, while its worst case is better than n^2, other algorithms sometimes work better in practice.

23 Shellsort
- Shellsort, like insertion sort, is based on swapping inverted pairs.
- It achieves O(n^(3/2)) running time (see the book for details).
- Idea: sort in phases, with an insertion sort at the end.
- Remember: insertion sort is good when the array is nearly sorted.

24 Shellsort
- Definition: a k-sort is an insertion sort of the items that are k positions apart.
  - Therefore, a 1-sort is a simple insertion sort.
- Now, perform a series of k-sorts for k = m1 > m2 > ... > 1.
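A shellsort sketch. The simple halving gap sequence used here matches the 4-sort, 2-sort, 1-sort example on the next slide; note that the O(n^(3/2)) bound quoted above requires a better gap sequence (such as Hibbard's 2^k - 1), so this version is illustrative rather than optimal.

    // Shellsort: a series of k-sorts with a shrinking gap k.
    static void shellSort(int[] a) {
        for (int k = a.length / 2; k >= 1; k /= 2) {  // k = m1 > m2 > ... > 1
            for (int i = k; i < a.length; i++) {      // insertion sort with gap k
                int tmp = a[i];
                int j;
                for (j = i; j >= k && a[j - k] > tmp; j -= k)
                    a[j] = a[j - k];
                a[j] = tmp;
            }
        }
    }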

25 Shellsort algorithm
    105 47 13 17 30 222 5 19
    30 47 5 17 105 222 13 19     (after 4-sort)
    5 17 13 19 30 47 105 222     (after 2-sort)
    5 13 17 19 30 47 105 222     (after 1-sort)
Note: many inversions fixed in one exchange.

26 Recursive Sorting

27 Recursive sorting
- Intuitively, divide the problem into pieces and then recombine the results.
- If the array has length 1, then done.
- If the array has length N > 1, then split it in half and sort each half.
- Then combine the results.
- An example of a divide-and-conquer algorithm.

28 Divide-and-conquer

30 Divide-and-conquer is important
- We will see several examples of divide-and-conquer in this course.

31 Recursive sorting
- If the array has length 1, then done.
- Otherwise, split it into two smaller pieces.
- Sort each piece.
- Combine the sorted pieces.

32 Analysis of recursive sorting
- Suppose it takes time T(N) to sort N elements.
- Suppose also it takes time N to combine the two sorted arrays.
- Then:
    T(1) = 1
    T(N) = 2T(N/2) + N, for N > 1
- Solving for T gives the running time of the recursive sorting algorithm.

33 Remember recurrence relations?
- Systems of equations such as
    T(1) = 1
    T(N) = 2T(N/2) + N, for N > 1
  are called recurrence relations (or sometimes recurrence equations).

34 A solution
- A solution for
    T(1) = 1
    T(N) = 2T(N/2) + N
  is given by
    T(N) = N log N + N,
  which is O(N log N).
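To see where this comes from, unroll the recurrence (assuming N is a power of 2):

    \begin{aligned}
    T(N) &= 2T(N/2) + N \\
         &= 4T(N/4) + 2N \\
         &= 8T(N/8) + 3N \\
         &\;\vdots \\
         &= 2^k \, T(N/2^k) + kN \\
         &= N \, T(1) + N \log_2 N \qquad (k = \log_2 N) \\
         &= N + N \log N.
    \end{aligned}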

35 Divide-and-Conquer Theorem
- Theorem: Let a, b, c ≥ 0. The recurrence relation
    T(1) = b
    T(N) = aT(N/c) + bN, for any N which is a power of c
  has upper-bound solutions
    T(N) = O(N)            if a < c
    T(N) = O(N log N)      if a = c
    T(N) = O(N^(log_c a))  if a > c
- For recursive sorting: a = 2, b = 1, c = 2.

36 Upper-bounds
- Corollary: dividing a problem into p pieces, each of size N/p, using only a linear amount of work, results in an O(N log N) algorithm.

37 Upper-bounds
- The proof of this theorem is left for later.

38 Mergesort

39
- Mergesort is the most basic recursive sorting algorithm.
  - Divide the array in halves A and B.
  - Recursively mergesort each half.
  - Combine A and B by successively looking at the first elements of A and B and moving the smaller one to the result array.
- Note: be careful to avoid creating lots of intermediate arrays.

40 Mergesort

41 But we don't actually want to create all of these arrays!

42 Mergesort
- Use simple indexes (L and R) to perform the split.
- Use a single extra array to hold each intermediate result.
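A mergesort sketch along the lines of this slide: index ranges stand in for the split, and one shared temporary array holds each intermediate merge result.

    // Mergesort with index-based splits and a single scratch array.
    static void mergeSort(int[] a) {
        mergeSort(a, new int[a.length], 0, a.length - 1);
    }

    static void mergeSort(int[] a, int[] tmp, int lo, int hi) {
        if (lo >= hi) return;                // 0 or 1 elements: done
        int mid = (lo + hi) / 2;
        mergeSort(a, tmp, lo, mid);          // sort left half
        mergeSort(a, tmp, mid + 1, hi);      // sort right half
        int l = lo, r = mid + 1, k = lo;
        while (l <= mid && r <= hi)          // move the smaller front element
            tmp[k++] = (a[l] <= a[r]) ? a[l++] : a[r++];
        while (l <= mid) tmp[k++] = a[l++];  // copy any leftovers
        while (r <= hi)  tmp[k++] = a[r++];
        for (k = lo; k <= hi; k++) a[k] = tmp[k];
    }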

43 Analysis of mergesort
- Mergesort generates almost exactly the same recurrence relations shown before:
    T(1) = 1
    T(N) = 2T(N/2) + N - 1, for N > 1
- Thus, mergesort is O(N log N).

44 Quicksort

45
- Quicksort was invented in 1960 by Tony Hoare.
- Although it has O(N^2) worst-case performance, on average it is O(N log N).
- More importantly, it is the fastest known comparison-based sorting algorithm in practice.

46 Quicksort idea
- Choose a pivot.

47 Quicksort idea
- Choose a pivot.
- Rearrange so that the pivot is in the "right" spot.

48 Quicksort idea
- Choose a pivot.
- Rearrange so that the pivot is in the "right" spot.
- Recurse on each half and conquer!

49 Quicksort algorithm
- If array A has 1 (or 0) elements, then done.
- Choose a pivot element x from A.
- Divide A - {x} into two arrays:
    B = {y ∈ A | y ≤ x}
    C = {y ∈ A | y ≥ x}
- Quicksort arrays B and C.
- Result is B + {x} + C.
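A direct, not-in-place rendering of this algorithm, using lists for B and C. Two choices the slides leave open are made concrete here purely for illustration: duplicates of the pivot go to B, and the pivot is simply the first element.

    import java.util.ArrayList;
    import java.util.List;

    // Quicksort as on this slide: partition into B and C, recurse, concatenate.
    static List<Integer> quicksort(List<Integer> a) {
        if (a.size() <= 1) return a;          // 1 (or 0) elements: done
        int x = a.get(0);                     // pivot (choice unspecified above)
        List<Integer> b = new ArrayList<>();  // elements <= x
        List<Integer> c = new ArrayList<>();  // elements > x
        for (int y : a.subList(1, a.size()))
            if (y <= x) b.add(y); else c.add(y);
        List<Integer> result = new ArrayList<>(quicksort(b));
        result.add(x);                        // B + {x} + C
        result.addAll(quicksort(c));
        return result;
    }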

50 Quicksort algorithm
    105 47 13 17 30 222 5 19
    pivot 19:  [5 17 13]   19   [47 30 222 105]
    pivot 13:  [5] 13 [17]      pivot 47:  [30] 47 [222 105]
                                           [105 222]

51 Quicksort algorithm
The same recursion tree as on the previous slide. In practice, insertion sort is used once the arrays get "small enough".

52 Rearranging in place
    85 24 63 50 17 31 96 45    (pivot 50)
    85 24 63 45 17 31 96 50    (pivot swapped to the end; L and R scan inward)
    31 24 63 45 17 85 96 50    (L found 85, R found 31: swapped)

53 Rearranging in place
    31 24 63 45 17 85 96 50
    31 24 17 45 63 85 96 50    (L found 63, R found 17: swapped)
    31 24 17 45 50 85 96 63    (L and R crossed: pivot swapped into place)
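An in-place quicksort sketch following the L/R walkthrough above: the pivot is parked at the right end, L and R scan inward swapping out-of-place pairs, and the pivot is finally swapped into position. Taking the rightmost element as pivot is a simplification for illustration; pivot selection is discussed on the following slides.

    // In-place quicksort with the two-index partition shown above.
    static void quickSort(int[] a, int lo, int hi) {
        if (lo >= hi) return;                     // 0 or 1 elements: done
        int pivot = a[hi];                        // pivot parked at the end
        int l = lo, r = hi - 1;
        while (l <= r) {
            while (l <= r && a[l] < pivot) l++;   // find an element >= pivot
            while (l <= r && a[r] > pivot) r--;   // find an element <= pivot
            if (l <= r) {                         // swap the pair, move inward
                int tmp = a[l]; a[l] = a[r]; a[r] = tmp;
                l++; r--;
            }
        }
        int tmp = a[l]; a[l] = a[hi]; a[hi] = tmp;  // pivot into place
        quickSort(a, lo, l - 1);
        quickSort(a, l + 1, hi);
    }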

54 Quicksort is fast but hard to do
- Quicksort, in the early 1960s, was famous for being incorrectly implemented many times.
  - More about invariants next time.
- Quicksort is very fast in practice.
  - Faster than mergesort, because quicksort can be done "in place".

55 Informal analysis
- If there are duplicate elements, then the algorithm does not specify which subarray, B or C, should get them. Ideally, they split down the middle.
- Also, it is not specified how to choose the pivot. Ideally the median value of the array, but this would be expensive to compute.
- As a result, it is possible for quicksort to show O(N^2) behavior.

56 Worst-case behavior
    105 47 13 17 30 222 5 19
    pivot 5:   []  5  [105 47 13 17 30 222 19]
    pivot 13:  []  13 [105 47 17 30 222 19]
    pivot 17:  []  17 [105 47 30 222 19]
    pivot 19:  []  19 [105 47 30 222]
    ...
If we always pick the smallest (or largest) possible pivot, then O(n^2) steps.

57 Analysis of quicksort
- Assume a random pivot.
    T(0) = 1
    T(1) = 1
    T(N) = T(i) + T(N-i-1) + cN, for N > 1
  where i is the size of the left subarray.

58 Worst-case analysis
- If the pivot is always the smallest element, then:
    T(0) = 1
    T(1) = 1
    T(N) = T(0) + T(N-1) + cN, for N > 1
         ≥ T(N-1) + cN
         = O(N^2)
- See the book for details on this solution.
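Unrolling this recurrence makes the quadratic bound explicit:

    \begin{aligned}
    T(N) &\ge T(N-1) + cN \\
         &\ge T(N-2) + c(N-1) + cN \\
         &\;\vdots \\
         &\ge T(1) + c\sum_{k=2}^{N} k = O(N^2).
    \end{aligned}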

59 Best-case analysis
- In the best case, the pivot is always the median element.
- In that case, the splits are always "down the middle".
- Hence, same behavior as mergesort.
- That is, O(N log N).

60 Average-case analysis
- Consider the quicksort tree:
    105 47 13 17 30 222 5 19
    pivot 19:  [5 17 13]   19   [47 30 222 105]
    pivot 13:  [5] 13 [17]      pivot 47:  [30] 47 [222 105]
                                           [105 222]

61 Average-case analysis
- At each level of the tree, there are fewer than N nodes.
- So, the time spent at each level is O(N).
- On average, how many levels are there? That is, what is the expected height of the tree?
- If on average there are O(log N) levels, then quicksort is O(N log N) on average.

62 Average-case analysis
- We'll answer this question next time…

63 Summary of quicksort
- A fast sorting algorithm in practice.
- Can be implemented in-place.
- But is O(N^2) in the worst case.
- Average-case performance?

