
1 CSC – 332 Data Structures Sorting
Dr. Curry Guinn

2 Today's Class
Don't forget Homework 7
All the sorts:
Insertion sort, Shell sort, Heap sort, Merge sort, Quick sort
What's the best we can do?
In-class quiz on Tuesday (see last slide for questions)

3 Sorting
Most "easy" algorithms are O(N²).
Any algorithm that sorts by swapping adjacent elements requires Ω(N²) time on average.

4 Insertion sort
Take an array of numbers of size n.
1) Initially p = 1.
2) Let the first p elements be sorted.
3) Insert the (p+1)th element into its proper place so that now p+1 elements are sorted.
4) Increment p and go to step (3).
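A minimal Java sketch of these steps (illustrative; names are not from the slides):

    // Insertion sort: before each pass, positions 0..p-1 are already sorted.
    public static void insertionSort(int[] a) {
        for (int p = 1; p < a.length; p++) {
            int tmp = a[p];                  // the (p+1)th element to insert
            int j = p;
            while (j > 0 && a[j - 1] > tmp) {
                a[j] = a[j - 1];             // shift larger elements right
                j--;
            }
            a[j] = tmp;                      // insert into its correct place
        }
    }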

5 Insertion Sort...
Consists of N - 1 passes.
Pass p (for p = 1 through N - 1) ensures that the elements in positions 0 through p are in sorted order:
elements in positions 0 through p - 1 are already sorted
move the element in position p left until its correct place is found among the first p + 1 elements

6 Analysis: worst-case running time
The inner loop is executed at most p times, for each p = 1..N-1.
Overall: 1 + 2 + … + (N - 1) = N(N - 1)/2 = O(N²)

7 Analysis
The bound is tight: Θ(N²).
That is, there exists some input (e.g., an array in reverse order) that actually uses Ω(N²) time.

8 Average case analysis
Given an array A of elements, an inversion is an ordered pair (i, j) such that i < j but A[i] > A[j] (out-of-order elements). Assume no duplicate elements.
Theorem: The average number of inversions in an array of n distinct elements is n(n - 1)/4.
Corollary: Any algorithm that sorts by exchanging adjacent elements requires Ω(n²) time on average, since each exchange removes exactly one inversion.

9 Shellsort
What's the concept? Insertion sort on gap-separated subsequences, with a shrinking increment sequence that ends at 1.
Shell's sequence {1, 2, 4, 8, …}: O(N²)
Hibbard's sequence {1, 3, 7, 15, …}: O(N^(3/2))
Best sequence known {1, 5, 19, 41, 109, …}: O(N^(7/6)) (terms of the form 9·4^i - 9·2^i + 1 or 4^i - 3·2^i + 1)

10 Shell Sort
Example: an array of int, length n = 9.
Gap sequence: gap = n/2 = 4, then gap = max(1, (int) gap/2) = 2, and so on down to 1.
(figure: trace of the array contents after each gap pass)
The gap-insertion pass:

    for (int i = gap; i < A.length; i++) {
        int temp = A[i];
        int j;
        for (j = i; j >= gap && temp < A[j - gap]; j -= gap)
            A[j] = A[j - gap];
        A[j] = temp;
    }
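Wrapping that inner pass in the outer gap loop gives a complete method; a sketch using Shell's halving increments (method name is illustrative):

    public static void shellSort(int[] A) {
        // Shell's original increments: n/2, n/4, ..., 1.
        for (int gap = A.length / 2; gap >= 1; gap /= 2) {
            // Gap-insertion sort: sorts every gap-th subsequence.
            for (int i = gap; i < A.length; i++) {
                int temp = A[i];
                int j;
                for (j = i; j >= gap && temp < A[j - gap]; j -= gap)
                    A[j] = A[j - gap];
                A[j] = temp;
            }
        }
    }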

11 (figure-only slide)

12 Heapsort
What's the concept? Build a max-heap from the array, then repeatedly deleteMax: swap the maximum with the last element of the heap and shrink the heap by one, sorting the array in place.
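One way to realize this in Java (a sketch for a 0-indexed max-heap; percDown is the usual percolate-down helper, not code from the slides):

    public static void heapSort(int[] a) {
        for (int i = a.length / 2 - 1; i >= 0; i--)   // build max-heap, O(N)
            percDown(a, i, a.length);
        for (int i = a.length - 1; i > 0; i--) {
            int tmp = a[0]; a[0] = a[i]; a[i] = tmp;  // move max to the end
            percDown(a, 0, i);                        // re-heapify a[0..i-1]
        }
    }

    // Percolate a[i] down within the first n elements of the heap.
    private static void percDown(int[] a, int i, int n) {
        int tmp = a[i];
        for (int child; 2 * i + 1 < n; i = child) {
            child = 2 * i + 1;
            if (child + 1 < n && a[child + 1] > a[child])
                child++;                              // take the larger child
            if (a[child] > tmp) a[i] = a[child];      // pull child up
            else break;
        }
        a[i] = tmp;
    }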

13 Performance of Heapsort
Building the heap takes O(N); each of the N - 1 deleteMax operations takes O(log N). So heapsort runs in O(N log N) time in the worst case.

14 Mergesort
Based on the divide-and-conquer strategy:
Divide the list into two smaller lists of about equal size
Sort each smaller list recursively
Merge the two sorted lists to get one sorted list
How do we divide the list? How much time is needed?
How do we merge the two sorted lists? How much time is needed?

15 Dividing
If the input list is a linked list, dividing takes Θ(N) time:
we scan the linked list, stop at the N/2 th entry, and cut the link.
If the input list is an array A[0..N-1], dividing takes O(1) time:
we can represent a sublist by two integers, left and right; to divide A[left..right], we compute center = (left + right)/2 and obtain A[left..center] and A[center+1..right].
Try left = 0, right = 50: center = (0 + 50)/2 = 25.

16 Mergesort
Divide-and-conquer strategy:
recursively mergesort the first half and the second half
merge the two sorted halves together

17 (figure-only slide)

18 How to merge?
Input: two sorted arrays A and B
Output: an output sorted array C
Three counters: Actr, Bctr, and Cctr, initially set to the beginning of their respective arrays.
(1) The smaller of A[Actr] and B[Bctr] is copied to the next entry in C, and the appropriate counters are advanced.
(2) When either input list is exhausted, the remainder of the other list is copied to C.

19 Example: Merge

20 Example: Merge...
Running time analysis: clearly, merge takes O(m1 + m2), where m1 and m2 are the sizes of the two sublists.
Space requirement: merging two sorted lists requires linear extra memory, plus additional work to copy to the temporary array and back.

21 Mergesort

    void MergeSort(int arr[], int temp[], int left, int right) {
        if (left < right) {
            int center = (left + right) / 2;
            MergeSort(arr, temp, left, center);        // sort left half
            MergeSort(arr, temp, center + 1, right);   // sort right half
            Merge(arr, temp, left, center + 1, right); // merge the two halves
        }
    }

22 Merge

    void Merge(int arr[], int temp[], int curLeft, int curRight, int endRight) {
        int endLeft = curRight - 1;
        int curTemp = curLeft;
        int numElems = endRight - curLeft + 1;
        // Main loop: copy the smaller of the two front elements into temp.
        while (curLeft <= endLeft && curRight <= endRight)
            if (arr[curLeft] <= arr[curRight])
                temp[curTemp++] = arr[curLeft++];
            else
                temp[curTemp++] = arr[curRight++];
        while (curLeft <= endLeft)              // finish left half
            temp[curTemp++] = arr[curLeft++];
        while (curRight <= endRight)            // finish right half
            temp[curTemp++] = arr[curRight++];
        // Copy temp back to arr (endRight is unchanged by the loops above).
        for (int i = 0; i < numElems; i++, endRight--)
            arr[endRight] = temp[endRight];
    }
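A hypothetical driver, just to show how the two routines fit together (the temp array is allocated once and shared by every recursive call; the sample values are illustrative):

    int[] a = { 24, 13, 26, 1, 2, 27, 38, 15 };
    int[] temp = new int[a.length];
    MergeSort(a, temp, 0, a.length - 1);   // a is now sorted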

23 Analysis of mergesort
Let T(N) denote the worst-case running time of mergesort to sort N numbers. Assume that N is a power of 2.
Divide step: O(1) time
Conquer step: 2T(N/2) time
Combine step: O(N) time
Recurrence equation:
T(1) = 1
T(N) = 2T(N/2) + N


25 Solving the Recurrence Relation
See board.
Other important recurrence relations:
T(n) = T(n - 1) + 1  →  O(n)
T(n) = T(n/2) + 1  →  O(log n)
T(n) = T(n - 1) + n  →  O(n²)
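For the mergesort recurrence itself, telescoping gives the bound (a sketch, assuming N = 2^k):
T(N)/N = T(N/2)/(N/2) + 1 = T(N/4)/(N/4) + 1 + 1 = … = T(1)/1 + log₂ N
so T(N) = N log₂ N + N = O(N log N).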

26 Master Theorem
Let T(n) be a monotonically increasing function that satisfies
T(n) = a·T(n/b) + f(n), T(1) = c
where a ≥ 1, b ≥ 2, c > 0. If f(n) is Θ(n^d) where d ≥ 0, then
T(n) = Θ(n^d)          if a < b^d
T(n) = Θ(n^d log n)    if a = b^d
T(n) = Θ(n^(log_b a))  if a > b^d
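For example, mergesort has a = 2, b = 2, and f(n) = Θ(n), so d = 1 and a = b^d, giving T(n) = Θ(n log n). Likewise T(n) = T(n/2) + 1 has a = 1, b = 2, d = 0, so a = b^d and T(n) = Θ(log n), matching the table on the previous slide.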

27 Quicksort
Fastest known sorting algorithm in practice for many data types.
Average case: O(N log N)
Worst case: O(N²), but the worst case seldom happens.
Another divide-and-conquer recursive algorithm, like mergesort.

28 Quicksort
Divide step:
Pick any element (pivot) v in S.
Partition S - {v} into two disjoint groups:
S1 = {x ∈ S - {v} | x ≤ v}
S2 = {x ∈ S - {v} | x ≥ v}
Conquer step: recursively sort S1 and S2.
Combine step: combine the sorted S1, followed by v, followed by the sorted S2.

29 Example: Quicksort

30 Example: Quicksort...

31 Pseudocode
Input: an array A[p..r]

    Quicksort(A, p, r) {
        if (p < r) {
            q = Partition(A, p, r)    // q is the position of the pivot element
            Quicksort(A, p, q - 1)
            Quicksort(A, q + 1, r)
        }
    }

32 Partitioning
Key step of the quicksort algorithm.
Goal: given the picked pivot, partition the remaining elements into two smaller sets.
Many ways to implement; we will learn an easy and efficient partitioning strategy here.
How to pick a pivot will be discussed later.

33 Partitioning Strategy
Want to partition an array A[left .. right].
First, get the pivot element out of the way by swapping it with the last element (swap pivot and A[right]).
Let i start at the first element and j start at the next-to-last element (i = left, j = right - 1).
(figure: the array before and after the pivot swap)

34 Partitioning Strategy
Want to have:
A[p] <= pivot, for p < i
A[p] >= pivot, for p > j
While i < j:
Move i right, skipping over elements smaller than the pivot.
Move j left, skipping over elements greater than the pivot.
When both i and j have stopped: A[i] >= pivot and A[j] <= pivot.
(figure: i and j after their first stops)

35 Partitioning Strategy
When i and j have stopped and i is to the left of j, swap A[i] and A[j]:
the large element is pushed to the right and the small element is pushed to the left.
After swapping: A[i] <= pivot, A[j] >= pivot.
Repeat the process until i and j cross.
(figure: the array before and after the swap)

36 Partitioning Strategy
When i and j have crossed, swap A[i] and the pivot.
Result:
A[p] <= pivot, for p < i
A[p] >= pivot, for p > i
(figure: the final swap placing the pivot between the two partitions)

37 Small arrays
For very small arrays, quicksort does not perform as well as insertion sort.
How small depends on many factors, such as the time spent making a recursive call, the compiler, etc.
Do not use quicksort recursively for small arrays; instead, use a sorting algorithm that is efficient for small arrays, such as insertion sort.

38 Picking the Pivot
Use the first element as pivot:
If the input is random, OK.
If the input is presorted (or in reverse order), all the elements go into S2 (or S1); this happens consistently throughout the recursive calls and results in O(n²) behavior. (We analyze this case later.)
Choose the pivot randomly:
Generally safe, but random number generation can be expensive.

39 Picking the Pivot
Use the median of the array:
Partitioning always cuts the array roughly in half.
An optimal quicksort (O(N log N)).
However, it is hard to find the exact median cheaply (e.g., sorting the array just to pick the middle value defeats the purpose).

40 Pivot: median of three
We will use the median of three:
Compare just three elements: the leftmost, rightmost, and center.
Swap these elements if necessary so that:
A[left] = smallest
A[right] = largest
A[center] = median of the three
Pick A[center] as the pivot.
Swap A[center] and A[right - 1] so that the pivot is at the second-to-last position. (Why?)

41 Pivot: median of three
Example: A[left] = 2, A[center] = 13, A[right] = 6.
Swap A[center] and A[right], so the median 6 is in the center.
Choose A[center] = 6 as the pivot.
Swap the pivot and A[right - 1].
(figure: the array at each step)
Note we only need to partition A[left + 1, …, right - 2]. Why?

42 Main Quicksort Routine
The pieces, in order: choose the pivot, partition, recurse on both halves, and fall back to a simple sort for small arrays. (A full sketch appears after the next slide.)

43 Partitioning Part
Works only if the pivot is picked as median-of-three:
A[left] <= pivot and A[right] >= pivot.
Thus, we only need to partition A[left + 1, …, right - 2].
j will not run past the left end, because A[left] <= pivot.
i will not run past the right end, because A[right - 1] = pivot.
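Putting slides 40 through 43 together, a sketch of the full routine (a common textbook formulation; the CUTOFF value and helper names are assumptions, not code from the slides):

    private static final int CUTOFF = 10;   // assumed small-array threshold

    public static void quickSort(int[] a, int left, int right) {
        if (left + CUTOFF <= right) {
            int pivot = median3(a, left, right);  // pivot ends up at a[right-1]
            int i = left, j = right - 1;
            for (;;) {
                while (a[++i] < pivot) { }        // i stops at an element >= pivot
                while (a[--j] > pivot) { }        // j stops at an element <= pivot
                if (i < j) swap(a, i, j);
                else break;
            }
            swap(a, i, right - 1);                // restore the pivot
            quickSort(a, left, i - 1);
            quickSort(a, i + 1, right);
        } else {
            // Small array: plain insertion sort on a[left..right].
            for (int p = left + 1; p <= right; p++) {
                int tmp = a[p];
                int k = p;
                while (k > left && a[k - 1] > tmp) { a[k] = a[k - 1]; k--; }
                a[k] = tmp;
            }
        }
    }

    // Order a[left], a[center], a[right], then hide the median at a[right-1].
    private static int median3(int[] a, int left, int right) {
        int center = (left + right) / 2;
        if (a[center] < a[left])  swap(a, left, center);
        if (a[right] < a[left])   swap(a, left, right);
        if (a[right] < a[center]) swap(a, center, right);
        swap(a, center, right - 1);
        return a[right - 1];
    }

    private static void swap(int[] a, int i, int j) {
        int t = a[i]; a[i] = a[j]; a[j] = t;
    }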

44 Analysis
Assumptions: a random pivot (no median-of-three partitioning) and no cutoff for small arrays.
Running time:
pivot selection: constant time, O(1)
partitioning: linear time, O(N)
plus the running time of the two recursive calls:
T(N) = T(i) + T(N - i - 1) + cN, where c is a constant and i is the number of elements in S1.

45 Worst-Case Analysis
What will be the worst case?
The pivot is the smallest element, all the time.
The partition is always unbalanced.
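Plugging i = 0 into the recurrence from slide 44 gives:
T(N) = T(N - 1) + cN = T(N - 2) + c(N - 1) + cN = … = T(1) + c(2 + 3 + … + N) = O(N²)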

46 Best-case Analysis
What will be the best case?
The partition is perfectly balanced: the pivot is always in the middle (the median of the array).
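With i = N/2, the recurrence matches mergesort's:
T(N) = 2T(N/2) + cN = O(N log N)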

47 Average-Case Analysis
Assume each of the sizes for S1 is equally likely.
This assumption is valid for our pivoting (median-of-three) and partitioning strategy.
On average, the running time is O(N log N).

48 Quicksort Faster than Mergesort
Both quicksort and mergesort take O(N log N) time in the average case. Why is quicksort faster in practice?
Quicksort's inner loop consists of an increment/decrement (by 1, which is fast), a test, and a jump.
There is no extra juggling as in mergesort (no copying to and from a temporary array).

49 Is N log N the best we can do?
For comparison-based sorting, yes.

50 Lower Bound for Sorting
Mergesort and heapsort have worst-case running time O(N log N). Are there better algorithms?
Goal: prove that any sorting algorithm based only on comparisons takes Ω(N log N) comparisons in the worst case (worst-case input) to sort N elements.

51 Lower Bound for Sorting
Suppose we want to sort N distinct elements.
How many possible orderings do we have for N elements?
We can have N! possible orderings (e.g., the possible orderings of a, b, c are a b c, b a c, a c b, c a b, c b a, b c a).

52 Lower Bound for Sorting
Any comparison-based sorting process can be represented as a binary decision tree:
each node represents the set of possible orderings consistent with all the comparisons that have been made
the tree edges are the results of the comparisons

53 Decision tree for Algorithm X for sorting three elements a, b, c

54 Lower Bound for Sorting
The worst-case number of comparisons used by the sorting algorithm is equal to the depth of the deepest leaf.
The average number of comparisons used is equal to the average depth of the leaves.
A decision tree to sort N elements must have N! leaves.
A binary tree of depth d has at most 2^d leaves, so the tree must have depth at least log₂(N!).
Therefore, any sorting algorithm based only on comparisons between elements requires at least ⌈log₂(N!)⌉ comparisons in the worst case.
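To see that log₂(N!) is Ω(N log N), note that the largest N/2 factors of N! are each at least N/2:
log₂(N!) ≥ log₂((N/2)^(N/2)) = (N/2) log₂(N/2) = Ω(N log N)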

55 Lower Bound for Sorting
Any sorting algorithm based on comparisons between elements requires Ω(N log N) comparisons.

56 Linear time sorting
Can we do better (a linear-time algorithm) if the input has special structure (e.g., uniformly distributed, or every number can be represented by d digits)?
Yes: Bucket Sort (Counting Sort).

57 Bucket Sort
Assume N integers to be sorted, each in the range 1 to M.
Define an array B[1..M], initialize all entries to 0 → O(M)
Scan through the input list: insert each A[i] into B[A[i]] → O(N)
Scan B once, reading out the nonzero entries → O(M)
Total time: O(M + N); if M is O(N), the total time is O(N).
Can be bad if the range is very big, e.g. M = O(N²).
Example: N = 7, M = 9; output: 1 2 3 5 6 8 9.
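A sketch of this in Java (counting-sort style, so duplicates are allowed; names are illustrative):

    // Values are assumed to lie in 1..M.
    public static void bucketSort(int[] a, int M) {
        int[] count = new int[M + 1];   // B[1..M], initialized to 0: O(M)
        for (int x : a)                 // tally each input value: O(N)
            count[x]++;
        int k = 0;
        for (int v = 1; v <= M; v++)    // read out in sorted order: O(M)
            while (count[v]-- > 0)
                a[k++] = v;
    }

For the slide's example (N = 7, M = 9), a call like bucketSort(a, 9) sorts the seven values in place.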

58 For Next Class, Tuesday
Homework 7 due Monday, 03/24.
For Tuesday: more sorting; quiz (see next slide).

59 Quiz Questions
Weiss, Chapter 7:
Shell's original increments result in an algorithm that is O(N²). What is the problem with Shell's increments, as described at the bottom of page 252 (2nd Edition) or page 276 (3rd Edition) (7.4.1)?
What is the worst-case analysis of heapsort, given on page 256 (2nd Edition) or page (3rd Edition) (7.5.1)?
Mergesort uses the minimum number of comparisons of any sort, but it does use extra memory. Quicksort might use more comparisons, but it does not use extra memory. For objects that are large, which sort is better, according to the discussion on page 263 (2nd Edition) or page 288 (3rd Edition) (7.6.1)? Does it depend on the language of implementation?
Why is choosing the first element as the pivot "an absolutely horrible idea," according to Weiss?
GEB, Chapter 6:
Definition of genotype and phenotype
Definition of aleatoric music
What were the 3 languages encoded on the Rosetta Stone?
The quiz is open notes and must be taken in class. No electronic submission.

