1 Lecture 8 Sorting

2 Sorting (Chapter 7) We have a list of real numbers and need to sort them in increasing order (smallest first). Important points to remember: There are O(N^2) algorithms (insertion sort) and O(N log N) algorithms (heap sort). In general, any comparison-based sorting algorithm must take Ω(N log N) time (will prove it). In special cases, we can have O(N) sorting algorithms.

3 Insertion Sort Section 7.2 (Weiss) (1) Initially P = 1. (2) Let the first P elements be sorted. (3) Then the (P+1)th element is inserted into its proper place in the list, so that now the first P+1 elements are sorted. (4) Now increment P and go to step (3).

4 The (P+1)th element is inserted as follows: Store the (P+1)th element in a temporary variable, temp. If the Pth element is greater than temp, the (P+1)th element is set equal to the Pth one. If the (P-1)th element is greater than temp, the Pth element is set equal to the (P-1)th one, and so on. Continue like this until some element is less than or equal to temp, or the first element has been shifted. Let k be the position vacated by the last shift (k can be 1). Set the kth element equal to temp.

5 Extended Example Need to sort: 34 8 64 51 32 21. P = 1: Looking at the first element only, we change nothing. P = 2: Temp = 8; 34 > Temp, so the second element is set to 34. We have reached the beginning of the list, so we stop there. Thus the first position is set equal to Temp.

6 After the second pass: 8 34 64 51 32 21. Now the first two elements are sorted. Set P = 3. Note down the rest from the board.

7 Pseudo Code Assume that the list is stored in an array, A (can do with a linked list as well).
InsertionSort(A[], int N) {
    for (P = 1; P < N; P++) {
        Temp = A[P];
        /* shift larger elements one place right to open a slot for Temp */
        for (j = P; j > 0 and A[j-1] > Temp; j--)
            A[j] = A[j-1];
        A[j] = Temp;
    }
}
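The pseudocode above can be sketched in Python as follows. This is a minimal illustration rather than the lecture's own code; the function name `insertion_sort_passes` and the list of per-pass snapshots it returns are assumptions added here to make the passes visible.

```python
def insertion_sort_passes(a):
    """Sort list a in place by insertion sort.

    Returns a list of snapshots, one per pass of the outer loop
    (the snapshots are an illustrative addition, not part of the algorithm).
    """
    passes = []
    for p in range(1, len(a)):
        temp = a[p]
        j = p
        # Shift elements greater than temp one place to the right.
        while j > 0 and a[j - 1] > temp:
            a[j] = a[j - 1]
            j -= 1
        a[j] = temp
        passes.append(list(a))
    return passes


data = [34, 8, 64, 51, 32, 21]
snapshots = insertion_sort_passes(data)
```

For the input of the extended example, the first snapshot is 8 34 64 51 32 21 and the last is 8 21 32 34 51 64.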

8 Complexity Analysis The inner loop is executed at most P times for each P. Because of the outer loop, P changes from 1 to N-1. Thus overall we have at most 1 + 2 + ... + (N-1) = N(N-1)/2 steps. This is O(N^2). Space requirement: O(N) for the array itself, with only O(1) extra.

9 Heap Sort Section 7.5 Complexity: O(N log N). Suppose we need to store the sorted elements; in this case we need an additional array, so the total space is 2N. This is O(N), but can we manage with a space of N positions only? Whenever we do a deletemin, the heap size shrinks by one, so we can store the retrieved element in the position that has just been freed.

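10 We have a heap of size N. After the first deletemin, the last element of the heap is at position N-1, so the retrieved element is stored at position N. After the second deletemin, the last element is at position N-2, so the retrieved element is stored at N-1, and so on. Note that with a min-heap this leaves the array in decreasing order; to get increasing order, use a max-heap with deletemax instead.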
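The space-saving idea above can be sketched as an in-place heap sort. This sketch assumes a max-heap stored in a 0-based array (so the result is in increasing order); the helper name `sift_down` is illustrative and does not come from the slides.

```python
def sift_down(a, root, size):
    """Restore the max-heap property for the subtree at root within a[:size]."""
    while True:
        child = 2 * root + 1              # left child in a 0-based array
        if child >= size:
            return
        if child + 1 < size and a[child + 1] > a[child]:
            child += 1                    # pick the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child


def heap_sort(a):
    """Sort list a in place in increasing order using only the array itself."""
    n = len(a)
    # Build a max-heap bottom-up.
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeated deletemax: the removed maximum goes into the slot the heap frees.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```

Each deletemax shrinks the heap by one position, and that position is exactly where the removed element is stored, so no second array is needed.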

11 Merge Sort Divide and Conquer Divide the list into two smaller lists: Sort the smaller lists. Merge the sorted lists to get an overall sorted list. How do we merge 2 sorted lists?

12 Consider 2 sorted arrays A and B. We will merge them into a sorted array C. (1) Have three variables, posA, posB, posC. (2) Initially, all three are at the respective first positions. (3) Compare the elements at posA and posB. Whichever is smaller goes into the posC position of C, and the position is advanced in the corresponding array (e.g., if the element of A goes into C, then posA is advanced). posC is also advanced. (4) Go to step (3). (5) After all the elements are exhausted in one array, the remaining elements of the other are copied into C.

13 Suppose we want to merge 1 13 24 26 and 2 15 27 38. Copy the example from the board. Complexity of the merging?

14 Pseudocode for merging
Merge(A, B, C) {
    posA = 0; posB = 0; posC = 0;
    while (posA < sizeA && posB < sizeB) {
        if (A[posA] < B[posB]) { C[posC] = A[posA]; posA = posA + 1; }

15 (Pseudocode for merging, continued)
        else { C[posC] = B[posB]; posB = posB + 1; }
        posC = posC + 1;
    }
    /* one array is exhausted: copy what is left of the other */
    while (posB < sizeB) { C[posC] = B[posB]; increment posC and posB; }
    while (posA < sizeA) { C[posC] = A[posA]; increment posC and posA; }
}
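The merging procedure can be sketched in Python as below. This is an illustrative version, not the lecture's code: it returns a new list instead of filling a preallocated C, and it compares with `<=` rather than `<`, an assumption made here so that on ties the element from A is taken first.

```python
def merge(a, b):
    """Merge two sorted lists a and b into a new sorted list."""
    c = []
    pos_a = pos_b = 0
    while pos_a < len(a) and pos_b < len(b):
        if a[pos_a] <= b[pos_b]:          # on ties, take from a first
            c.append(a[pos_a])
            pos_a += 1
        else:
            c.append(b[pos_b])
            pos_b += 1
    # One input is exhausted; copy the remainder of the other.
    c.extend(a[pos_a:])
    c.extend(b[pos_b:])
    return c


merged = merge([1, 13, 24, 26], [2, 15, 27, 38])
# merged is [1, 2, 13, 15, 24, 26, 27, 38]
```

Every element is copied exactly once, so the merge takes time proportional to the total number of elements in A and B.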

16 Overall approach Divide an array into 2 parts. Recursively sort the 2 parts (redivide, recursively sort; sorting one element is easy). Merge the 2 parts.
Mergesort(A, left, right) {
    if (left >= right) return;        /* zero or one element: already sorted */
    center = ⌊(left + right)/2⌋;
    Mergesort(A, left, center);
    Mergesort(A, center+1, right);
    B = A[left..center];              /* left half of A */
    C = A[center+1..right];           /* right half of A */
    Merge(B, C, D);
    A[left..right] = D;
}
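A minimal Python sketch of the recursive procedure, under the assumption that the array is a Python list indexed from 0 and that `left` and `right` are inclusive bounds. The temporary lists `b`, `c`, `d` mirror B, C, D on the slide; the default arguments are a convenience added here.

```python
def merge_sort(a, left=0, right=None):
    """Sort a[left..right] (inclusive) in place by merge sort."""
    if right is None:
        right = len(a) - 1
    if left >= right:                     # zero or one element
        return
    center = (left + right) // 2
    merge_sort(a, left, center)
    merge_sort(a, center + 1, right)

    b = a[left:center + 1]                # sorted left half
    c = a[center + 1:right + 1]           # sorted right half
    d = []
    i = j = 0
    while i < len(b) and j < len(c):
        if b[i] <= c[j]:
            d.append(b[i])
            i += 1
        else:
            d.append(c[j])
            j += 1
    d.extend(b[i:])
    d.extend(c[j:])
    a[left:right + 1] = d                 # copy the merged run back


data = [34, 8, 64, 51, 32, 21]
merge_sort(data)
```

The copies `b`, `c` and `d` are the extra linear storage that merge sort needs.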

17 Note down the extended example from the board. T(n) = 2T(n/2) + cn, T(1) = 1. Using the master theorem, we get T(n) = O(n log n). Any problem with merge sort?
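The recurrence on this slide can also be solved by telescoping instead of the master theorem. A sketch, assuming n is a power of 2 and logarithms are base 2:

```latex
\begin{align*}
T(n) &= 2\,T(n/2) + cn, \qquad T(1) = 1 \\
\frac{T(n)}{n} &= \frac{T(n/2)}{n/2} + c \\
               &= \frac{T(n/4)}{n/4} + 2c \\
               &= \cdots = \frac{T(1)}{1} + c\log_2 n \\
T(n) &= n + c\,n\log_2 n = O(n \log n)
\end{align*}
```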

18 Quick Sort Section 7.7 We will choose a pivot element in the list. Partition the array into two parts: one contains elements smaller than this pivot, the other contains elements larger than this pivot. Recursively sort the two partitions and merge them. How do we merge? The two partitions already sit next to each other in the array, in the right order, so we do not need any additional space for merging. Thus storage is O(n).

19 Copy the example from the board. We would like the partitions to be as equal as possible. This depends on the choice of the pivot. Equal partitions are assured if we can choose as pivot the element which is in the middle of the sorted list (the median), but it is not easy to find that. Pivots are chosen randomly. In the worst case, the pivot choice can make one partition have n-1 elements and the other 1 element. On average this will not happen. How can the approach become bad?

20 The pivot can be chosen as a random element in the list. Or choose the first, the last and the center elements and take the median of the three (the middle value). We will do the latter. We discuss the partitioning now.

21 Interchange the pivot with the last element. Have a pointer at the first element (P1), and one at the second last element (P2). Move P1 to the right, skipping elements which are less than the pivot. Move P2 to the left, skipping elements which are greater than the pivot. Stop P1 when we encounter an element greater than or equal to the pivot. Stop P2 when we encounter an element less than or equal to the pivot.

22 If P1 is still to the left of P2, interchange the elements pointed to by P1 and P2, then move P1 and P2 as before till we stop again. If P1 has met P2 or is to the right of it, stop without interchanging. Note down the example from the board. 8 1 4 9 6 3 5 2 7 0 When we stop, swap the element at P1 with the last element, which is the pivot.

23 At any time, can you say anything about the elements to the left of P1? Also, to the right of P2? Elements to the left of P1 are less than or equal to the pivot. Elements to the right of P2 are greater than or equal to the pivot. When P1 and P2 cross, what can you say about the elements in between P1 and P2? They are all equal to the pivot.

24 Suppose P1 and P2 have crossed and stopped, and the pivot is interchanged with the element at P1. How do we form the partition? The pivot at P1 is now in its final position. Everything to the right of P1 is in one partition (greater than or equal to the pivot). Everything to the left of P1 is in the left partition. We can also do some local optimizations in length. Complexity? Space?

25 Procedure Summary Partition the array Sort the partition recursively. Note down the example from the board. 8 1 4 9 6 3 5 2 7 0

26 Pseudocode
Quicksort(A, left, right) {
    if (left >= right) return;              /* zero or one element: already sorted */
    Find pivot;
    Interchange pivot and A[right];
    P1 = left; P2 = right - 1;
    newP1 = Partition(A, P1, P2, pivot);
    Interchange A[newP1] and A[right];      /* pivot is now in its final position */
    Quicksort(A, left, newP1 - 1);
    Quicksort(A, newP1 + 1, right);
}

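27 Partition(A, P1, P2, pivot) {
    while (true) {
        while (A[P1] < pivot) increment P1;               /* the pivot stored after P2's start stops P1 */
        while (P2 > P1 and A[P2] > pivot) decrement P2;
        if (P1 >= P2) break;                              /* pointers met or crossed: do not swap */
        swap A[P1] and A[P2];
        increment P1; decrement P2;
    }
    newP1 = P1;
    return (newP1);
}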
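The procedure of the last few slides can be sketched in Python as follows. This is one possible reading of the slides, not a definitive implementation: the helper `median_of_three_index` is a hypothetical name, the pointers only swap while P1 is strictly left of P2, and the recursion skips the pivot position because the pivot is already in its final place.

```python
def median_of_three_index(a, left, right):
    """Index of the median of a[left], a[center], a[right] (hypothetical helper)."""
    center = (left + right) // 2
    candidates = sorted([(a[left], left), (a[center], center), (a[right], right)])
    return candidates[1][1]


def partition(a, p1, p2, pivot):
    """Partition a[p1..p2]; the caller has stored the pivot at a[p2 + 1]."""
    while True:
        while a[p1] < pivot:              # stops at the stored pivot at the latest
            p1 += 1
        while p2 > p1 and a[p2] > pivot:
            p2 -= 1
        if p1 >= p2:                      # pointers met or crossed
            break
        a[p1], a[p2] = a[p2], a[p1]
        p1 += 1
        p2 -= 1
    return p1


def quicksort(a, left=0, right=None):
    """Sort a[left..right] (inclusive) in place."""
    if right is None:
        right = len(a) - 1
    if left >= right:
        return
    m = median_of_three_index(a, left, right)
    a[m], a[right] = a[right], a[m]       # park the pivot at the right end
    pivot = a[right]
    new_p1 = partition(a, left, right - 1, pivot)
    a[new_p1], a[right] = a[right], a[new_p1]   # pivot into its final position
    quicksort(a, left, new_p1 - 1)
    quicksort(a, new_p1 + 1, right)


data = [8, 1, 4, 9, 6, 3, 5, 2, 7, 0]
quicksort(data)
```

Stopping both pointers on elements equal to the pivot keeps the partitions balanced when the input contains many duplicates.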

28 Worst Case Analysis T(n) = T(n1) + T(n2) + cn, T(1) = 1, n1 + n2 = n. In the good case, n1 = n2 = n/2 always. Thus T(n) = O(n log n).

29 In the bad case, n1 = 1 and n2 = n-1 always. Ignoring the constant cost of the 1-element partition, T(n) = cn + T(n-1) = cn + c(n-1) + T(n-2) = ... = c(n + (n-1) + ... + 1) = cn(n+1)/2. Thus T(n) is O(n^2) in the worst case. Average case complexity is O(n log n).

30 Quicksort performs well for large inputs, but is less efficient for small inputs. When the divisions become small, we can use insertion sort to sort the small divisions instead.

31 General Lower Bound For Sorting Section 7.9 Suppose a list of size n must be sorted. How many orderings can we have for n members? Depending on the values of the n numbers, we can have n! possible orders, e.g., the sorted output for a, b, c can be: a,b,c; b,a,c; a,c,b; c,a,b; c,b,a; b,c,a.

32 Any comparison based sorting process can be represented as a binary decision tree. A node represents a comparison, and a branch represents the outcome of a comparison. The possible orders must be the leaves of the decision tree, i.e., depending on the order of the input we will follow a certain sequence of comparisons and reach a certain leaf. Note down the example. Different sorting algorithms will have different comparison orders, but the leaves are the same for all sorting trees.

33 Thus any sorting algorithm for sorting n inputs can be represented as a binary decision tree with n! leaves. Such a tree has depth at least log n!. This means that any comparison based sorting algorithm needs to perform at least log n! comparisons in the worst case.
log n! = log n + log(n-1) + ... + log 2 + log 1
       ≥ log n + log(n-1) + ... + log(n/2)
       ≥ (n/2) log(n/2)
       = (n/2) log n - (n/2)
       = Ω(n log n)
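The bound can be checked numerically with a short Python sketch. The function names `min_comparisons` and `half_bound` are illustrative assumptions, and logarithms are taken base 2, matching a binary decision tree.

```python
import math


def min_comparisons(n):
    """Decision-tree lower bound: ceil(log2(n!)) comparisons in the worst case."""
    return math.ceil(math.log2(math.factorial(n)))


def half_bound(n):
    """The simpler lower bound (n/2) * log2(n/2) used in the derivation."""
    return (n / 2) * math.log2(n / 2)


# The exact bound always dominates the simplified one.
for n in (4, 16, 64, 256):
    assert math.log2(math.factorial(n)) >= half_bound(n)
```

For three elements there are 3! = 6 leaves, so `min_comparisons(3)` is 3: no comparison based algorithm can sort every 3-element input with only 2 comparisons.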

34 Special Case Sorting Now we will present some linear time sorting algorithms. These apply only when the input has a special structure, e.g., inputs are integers. Counting sort Radix sort Bucket sort Please follow class notes

35 Counting Sort Suppose we know that the list to be sorted consists of n integers in the range 1 to M We will declare an array B with M positions. Initially all positions of B contain 0. Scan through list A A[j] is inserted in B[A[j]] B is scanned once again, and the nonzero elements are read out.

36 Note down example from the board. What do we do with equal elements? B is an array of pointers. Each position in the array has 2 pointers, head and tail. Tail points to the end of a linked list, and head points to the beginning. A[j] is inserted at the end of the list B[A[j]] Again, Array B is sequentially traversed and each nonempty list is printed out.
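The list-per-position variant described above can be sketched in Python. This is an illustration under stated assumptions: Python lists stand in for the linked lists with head and tail pointers (appending at the end plays the role of inserting at the tail), and the `key` parameter is a hypothetical addition for records that carry an integer field.

```python
def counting_sort(items, m, key=lambda item: item):
    """Sort items whose integer keys lie in the range 1..m.

    Items with equal keys keep their input order, because each one is
    appended at the end of the list for its key.
    """
    b = [[] for _ in range(m + 1)]        # position 0 is unused
    for item in items:
        b[key(item)].append(item)         # insert at the tail of list B[A[j]]
    output = []
    for bucket in b:                      # traverse B sequentially
        output.extend(bucket)
    return output


records = [(3, "a"), (1, "b"), (3, "c"), (2, "d")]
ordered = counting_sort(records, 3, key=lambda record: record[0])
# ordered is [(1, "b"), (2, "d"), (3, "a"), (3, "c")]
```

The two records with key 3 come out in the same relative order as in the input.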

37 Complexity? O(n + M). If M is O(n), then the complexity is O(n). Storage? O(n + M). Storage could be large for large ranges. Suppose we have a list of elements such that every element has 2 fields, one an integer and the other something else. During sorting we just look at the integer field. Suppose element a occurs before element b in the input list, and a and b have the same integer components. Can their positions be reversed in the output if counting sort is used?

38 Stability Property The relative position of 2 equal elements remains the same in the output as in the input. Is merge sort stable? Quick sort? Insertion sort? The stability property of counting sort will be used in designing a new sorting algorithm, radix sort.

39 Radix Sort Every integer is represented by at most d digits (e.g., decimal representation) Every digit has k values (k = 10 for decimal, k = 2 for binary) There are n integers

40 We sort the integers by their least significant digits first, using counting sort. Next we sort them by their second least significant digits, and so on. Note down the example from the board. Should it work?

41 Note that if the most significant digit of one number a is less than that of another number b, then a comes before b. However, if the most significant digits are the same for a and b, and the difference is in the second most significant digits (the second most significant digit of a is less than that of b), then a still comes before b. Why? The earlier pass on the second most significant digit placed a before b, and counting sort is stable, so the pass on the most significant digit keeps that order. Complexity?

42 Complexity Analysis There are d counting sorts. Each counting sort works on n numbers with the range being 0 to k-1. Complexity of each counting sort is O(n + k). Overall: O(d(n + k)). Storage? O(n + k).
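A minimal Python sketch of radix sort built from d stable counting-sort passes. It assumes non-negative integers with at most d digits in base k; the parameter names follow the slides, while the use of Python lists as the per-digit lists is an assumption of this sketch.

```python
def radix_sort(numbers, d, k=10):
    """Sort non-negative integers that have at most d digits in base k."""
    for digit in range(d):                # least significant digit first
        divisor = k ** digit
        buckets = [[] for _ in range(k)]  # one list per digit value 0..k-1
        for x in numbers:
            buckets[(x // divisor) % k].append(x)   # stable: keeps arrival order
        numbers = [x for bucket in buckets for x in bucket]
    return numbers


result = radix_sort([329, 457, 657, 839, 436, 720, 355], d=3)
# result is [329, 355, 436, 457, 657, 720, 839]
```

Each pass costs O(n + k), and there are d passes, which gives the O(d(n + k)) total stated above.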

43 If just counting sort were used for these numbers, what is the worst case complexity? O(k^d + n) for storage and running time.

44 Bucket Sort We have sorted integers in worst case linear time. Can we do anything similar for real numbers? Real numbers can be sorted in average case linear complexity under a certain assumption: that the real numbers are uniformly distributed. If we know that the real numbers belong to a certain range, then uniform distribution means that any two equal size subranges contain roughly the same number of real numbers in the list.

45 Bucket sort can be used to sort such real numbers in average case linear complexity. The idea is similar to direct sequence hashing and counting sort. We have a list of n real numbers and an array of n pointers. Each position has a pointer pointing to a linked list. We have a function which generates an integer from a real number, and the real number is added to the linked list at that position.

46 The function is chosen such that the elements at position j of the array are less than those at position k, if j < k. The function must be such that the number of elements in each linked list is roughly the same. Thus we just sort the linked lists at every position. Next we start from position 0, print out the sorted linked list, go to position 1, print out the sorted linked list (assuming it is not empty), and so on. The output is a sorted list.

47 So if there are n real numbers and n positions in the array, then each linked list has roughly a constant number c of elements. Thus the sorting complexity for a linked list is a constant. What is the printing complexity? Thus the overall complexity is O(n).

48 How do we find such a function? We can if the numbers are uniformly distributed. Suppose we know the numbers are in an interval of size M. We divide the interval M into n equal size subintervals, and name them 0, 1,….n-1. Any real number in subinterval j is added to the list at position j in the array. Given any real number, the function actually finds the number of the subinterval where it belongs.

49 Firstly, why should such a function put roughly equal number of elements in each list? Second, evaluation time for the function must be a constant. That is, the function must find the interval number in constant time. Will give an example function which applies for a specific case, but in general such functions can be designed for more general cases as well.

50 Suppose the real numbers are in the interval [0, 1]. The function ⌊nx⌋ gets the correct interval number for any real number x in the interval [0, 1] (x = 1 itself gives n, so it is placed in the last subinterval, n-1). What is ⌊nx⌋ for any number x in the interval [0, 1/n)? For any number x in the interval [1/n, 2/n)? For any number x in the interval [2/n, 3/n)? ... Evaluation complexity for the function?

51 Thus we have divided the interval [0, 1] into equal sized subintervals, and we have a function which returns the subinterval number of a real number in constant time.

52 Procedure Summary We have a list of n real numbers. Given any number x, we add it to the list at A[⌊nx⌋]. We sort all of the individual linked lists. We output the sorted linked list at A[0], then at A[1], and so on. The output is sorted.
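The summary above can be sketched in Python. This is a minimal illustration assuming the inputs lie in [0, 1]; Python lists stand in for the linked lists, the built-in `list.sort` is used for the small per-bucket sorts, and clamping the index to n-1 (so that x = 1 lands in the last bucket) is an assumption of this sketch.

```python
def bucket_sort(values):
    """Sort real numbers from the interval [0, 1] using n buckets for n values."""
    n = len(values)
    a = [[] for _ in range(n)]            # array A of n (initially empty) lists
    for x in values:
        index = min(int(n * x), n - 1)    # floor(n * x), clamped for x == 1.0
        a[index].append(x)
    output = []
    for bucket in a:                      # A[0], then A[1], and so on
        bucket.sort()                     # each list is short on average
        output.extend(bucket)
    return output


sample = [0.78, 0.17, 0.39, 0.26, 0.72, 0.94, 0.21, 0.12, 0.23, 0.68]
ordered = bucket_sort(sample)
```

Computing the bucket index takes one multiplication and one truncation, so the function is evaluated in constant time per element.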

53 On average every linked list contains a constant number of elements (as a linked list contains elements from a subinterval, and all subintervals are equal sized, and hence they have roughly equal numbers of elements because of the uniform distribution). Thus the sorting complexity is constant for each linked list. Thus the overall sorting and printing complexity is O(n).

