
1 17CS1102 (Data Structures) Topic: Implementation and Analysis of Internal Sorting Indicator: 3
At the end of this session, the student will be able to: understand what internal sorting is; compare the time complexities of different internal sorting algorithms; and decide which internal sorting algorithm is best suited to a given input.

2 Internal Sorting Techniques
List of sorting techniques:
Insertion Sort
Shell Sort
Merge Sort
Quick Sort

3 Insertion Sort Insertion sort consists of n - 1 passes.
For pass p = 2 through n, insertion sort ensures that the elements in positions 1 through p are in sorted order. The following figure shows a sample file after each pass of insertion sort.

4 Example
Original:      34   8  64  51  32  21    Positions Moved
After p = 2:    8  34  64  51  32  21    1
After p = 3:    8  34  64  51  32  21    0
After p = 4:    8  34  51  64  32  21    1
After p = 5:    8  32  34  51  64  21    3
After p = 6:    8  21  32  34  51  64    4

5 Insertion sort after each pass
In pass p, we move the pth element left until its correct place is found among the first p elements. The code in Figure 7.2 implements this strategy. The sentinel in a[0] terminates the inner loop in the event that, in some pass, an element is moved all the way to the front. The element in position p is saved in tmp, and all larger elements (prior to position p) are moved one spot to the right. Then tmp is placed in the correct spot.

6 Routine for Insertion sort
void insertion_sort( int a[ ], unsigned int n )
{
    unsigned int j, p;
    int tmp;
/*1*/   a[0] = MIN_DATA;    /* sentinel */
/*2*/   for( p = 2; p <= n; p++ )
    {
/*3*/       tmp = a[p];
/*4*/       for( j = p; tmp < a[j-1]; j-- )
/*5*/           a[j] = a[j-1];
/*6*/       a[j] = tmp;
    }
}

7 Analysis of Insertion Sort
Worst case: Because of the nested loops, each of which can take n iterations, insertion sort is O(n²). A more precise calculation shows that the test at line 4 can be executed at most p times for each value of p. Summing over all p gives a total of Σ_{p=2}^{n} p = 2 + 3 + ... + n = Θ(n²).
Best case: If the input is presorted, the running time is O(n), because the test in the inner for loop always fails immediately.

8 Complete C program for insertion sort
#include <stdio.h>
int main()
{
    int data[100], n, temp, i, j;
    printf("Enter number of terms (should be less than 100): ");
    scanf("%d", &n);
    printf("Enter elements: ");
    for( i = 0; i < n; i++ )
        scanf("%d", &data[i]);
    for( i = 1; i < n; i++ )
    {
        temp = data[i];
        j = i - 1;
        while( j >= 0 && temp < data[j] )

9 /* To sort the elements in descending order, change temp < data[j] to temp > data[j] in the while condition above. */
        {
            data[j+1] = data[j];
            --j;
        }
        data[j+1] = temp;
    }
    printf("In ascending order: ");
    for( i = 0; i < n; i++ )
        printf("%d\t", data[i]);
    return 0;
}

10 Shellsort
Shellsort, named after its inventor Donald Shell, was one of the first algorithms to break the quadratic time barrier. It works by comparing elements that are distant; the distance between comparisons decreases as the algorithm runs, until the last phase, in which adjacent elements are compared. For this reason, Shellsort is sometimes referred to as diminishing increment sort. Shellsort uses a sequence h1, h2, ..., ht, called the increment sequence. Any increment sequence will do as long as h1 = 1.

11 After a phase, using some increment hk, for every i we have a[i] ≤ a[i + hk]; all elements spaced hk apart are sorted. The file is then said to be hk-sorted. A popular (but poor) choice of increment sequence is the one suggested by Shell: ht = ⌊n/2⌋, and hk = ⌊hk+1 / 2⌋.

12 Example (increments 5, 3, 1)
Original:      81 94 11 96 12 35 17 95 28 58 41 75 15
After 5-sort:  35 17 11 28 12 41 75 15 96 58 81 94 95
After 3-sort:  28 12 11 35 15 41 58 17 94 75 81 96 95
After 1-sort:  11 12 15 17 28 35 41 58 75 81 94 95 96

13 Routine for Shell Sort
#include <stdio.h>
void shell_sort( int a[], int n )
{
    int gap, i, j, temp;
    for( gap = n/2; gap > 0; gap /= 2 )
        for( i = gap; i < n; i++ )
        {
            temp = a[i];
            for( j = i; j >= gap && a[j-gap] > temp; j -= gap )
                a[j] = a[j-gap];
            a[j] = temp;
        }
}

14 Analysis of Shell Sort
There are increment sequences that give a significant improvement in the algorithm's running time. The worst-case running time of Shellsort, using Shell's increments, is Θ(n²). The worst-case running time of Shellsort using Hibbard's increments is Θ(n^(3/2)). The proof requires showing an upper and a lower bound on the worst-case running time.

15 Worst-case proof for Shellsort using Shell's increments: lower bound
To prove the lower bound, we construct a bad case. First, we choose n to be a power of 2. The input has the n/2 largest numbers in the even positions and the n/2 smallest numbers in the odd positions. As all the increments except the last are even, when we come to the last pass the n/2 largest numbers are still all in even positions and the n/2 smallest numbers are still all in odd positions. The ith smallest number (i ≤ n/2) is thus in position 2i - 1 before the beginning of the last pass. Restoring the ith element to its correct place requires moving it i - 1 spaces in the array. Thus, merely placing the n/2 smallest elements in the correct place requires at least Σ_{i=1}^{n/2} (i - 1) = Ω(n²) work.

16 Example for Bad Case
Start:         1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
After 8-sort:  1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
After 4-sort:  1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
After 2-sort:  1  9  2 10  3 11  4 12  5 13  6 14  7 15  8 16
After 1-sort:  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
The figure above shows a bad (but not the worst) input when n = 16. The number of inversions remaining after the 2-sort is exactly 1 + 2 + 3 + 4 + 5 + 6 + 7 = 28; thus, the last pass will take considerable time.

17 Worst-case proof for Shellsort using Shell's increments: upper bound
The upper bound is O(n²). A pass with increment hk consists of hk insertion sorts of about n/hk elements. Since insertion sort is quadratic, the total cost of a pass is O(hk (n/hk)²) = O(n²/hk). Summing over all passes gives a total bound of O(Σ_{k=1}^{t} n²/hk) = O(n² Σ_{k=1}^{t} 1/hk). Because the increments form a geometric series with common ratio 2, and the largest term in the series is h1 = 1, we have Σ_{k=1}^{t} 1/hk < 2. Thus we obtain a total bound of O(n²).

18 Mergesort The fundamental operation in this algorithm is merging two sorted lists. The basic merging algorithm takes two input arrays a and b, an output array c, and three counters, aptr, bptr, and cptr, which are initially set to the beginning of their respective arrays. The smaller of a[aptr] and b[bptr] is copied to the next entry in c, and the appropriate counters are advanced. When either input list is exhausted, the remainder of the other list is copied to c.

19 Example

20

21 Explanation The Mergesort algorithm is therefore easy to describe. If n = 1, there is only one element to sort, and the answer is at hand. Otherwise, recursively Mergesort the first half and the second half. This gives two sorted halves, which can then be merged together using the merging algorithm described above. For instance, to sort the eight-element array 24, 13, 26, 1, 2, 27, 38, 15, we recursively sort the first four and last four elements, obtaining 1, 13, 24, 26, 2, 15, 27, 38. Then we merge the two halves as above, obtaining the final list 1, 2, 13, 15, 24, 26, 27, 38.

22 Routine for Merge sort
void mergesort( int a[], unsigned int n )
{
    int *tmp_array;
    tmp_array = (int *) malloc( (n+1) * sizeof(int) );
    if( tmp_array != NULL )
    {
        m_sort( a, tmp_array, 1, n );
        free( tmp_array );
    }
    else
        fatal_error("No space for tmp array!!!");
}

23 Routine for m_sort
void m_sort( int a[], int tmp_array[ ], int left, int right )
{
    int center;
    if( left < right )
    {
        center = (left + right) / 2;
        m_sort( a, tmp_array, left, center );
        m_sort( a, tmp_array, center+1, right );
        merge( a, tmp_array, left, center+1, right );
    }
}

24 Routine for merge
/* l_pos = start of left half, r_pos = start of right half */
void merge( int a[ ], int tmp_array[ ], int l_pos, int r_pos, int right_end )
{
    int i, left_end, num_elements, tmp_pos;
    left_end = r_pos - 1;
    tmp_pos = l_pos;
    num_elements = right_end - l_pos + 1;
    /* main loop */
    while( ( l_pos <= left_end ) && ( r_pos <= right_end ) )
        if( a[l_pos] <= a[r_pos] )
            tmp_array[tmp_pos++] = a[l_pos++];
        else
            tmp_array[tmp_pos++] = a[r_pos++];
    while( l_pos <= left_end )    /* copy rest of first half */
        tmp_array[tmp_pos++] = a[l_pos++];
    while( r_pos <= right_end )   /* copy rest of second half */
        tmp_array[tmp_pos++] = a[r_pos++];
    /* copy tmp_array back */
    for( i = 1; i <= num_elements; i++, right_end-- )
        a[right_end] = tmp_array[right_end];
}

25 Analysis of Mergesort We will assume that n is a power of 2, so that we always split into even halves. For n = 1, the time to mergesort is constant, which we will denote by 1. Otherwise, the time to mergesort n numbers is equal to the time to do two recursive mergesorts of size n/2, plus the time to merge, which is linear. The equations below say this exactly: T(1) = 1 T(n) = 2T(n/2) + n This is a standard recurrence relation, which can be solved several ways.

26 Proof with the telescoping method
Divide the recurrence through by n:
T(n)/n = T(n/2)/(n/2) + 1
This equation is valid for any n that is a power of 2, so we may also write
T(n/2)/(n/2) = T(n/4)/(n/4) + 1
T(n/4)/(n/4) = T(n/8)/(n/8) + 1
...
T(2)/2 = T(1)/1 + 1
Adding these equations, virtually all the terms on the left-hand sides cancel the leading terms on the right-hand sides (they telescope), leaving
T(n)/n = T(1)/1 + log n
Multiplying through by n gives T(n) = n log n + n = O(n log n).

27 Proof with an alternative method
An alternative method is to substitute the recurrence relation continually on the right-hand side. We have
T(n) = 2T(n/2) + n
Since we can substitute n/2 into the main equation,
2T(n/2) = 2(2T(n/4) + n/2) = 4T(n/4) + n
we have
T(n) = 4T(n/4) + 2n
Again, by substituting n/4 into the main equation, we see that
4T(n/4) = 4(2T(n/8) + n/4) = 8T(n/8) + n
So we have
T(n) = 8T(n/8) + 3n
Continuing in this manner, we obtain
T(n) = 2^k T(n/2^k) + k·n
Using k = log n, we obtain
T(n) = nT(1) + n log n = n log n + n
Although mergesort's running time is O(n log n), it is hardly ever used for main memory sorts, because the merging step adds the overhead of copying to and from the temporary array.

28 Quicksort Quicksort is the fastest known sorting algorithm in practice. Its average running time is O(n log n). It is very fast, mainly due to a very tight and highly optimized inner loop. It has O(n²) worst-case performance, but this can be made exponentially unlikely with a little effort. Quicksort is a divide-and-conquer recursive algorithm.

29 Basic steps
1. If the number of elements in S is 0 or 1, then return.
2. Pick any element v in S. This is called the pivot.
3. Partition S - {v} (the remaining elements in S) into two disjoint groups: S1 = {x ∈ S - {v} | x ≤ v} and S2 = {x ∈ S - {v} | x ≥ v}.
4. Return {quicksort(S1) followed by v followed by quicksort(S2)}.

30 Figure 7.11 shows the action of quicksort on a set of numbers.
The pivot is chosen (by chance) to be 65. The remaining elements in the set are partitioned into two smaller sets. Recursively sorting the set of smaller numbers yields 0, 13, 26, 31, 43, 57 (by rule 3 of recursion). The set of large numbers is similarly sorted. The sorted arrangement of the entire set is then trivially obtained.

31 Efficient way to implement quicksort
Picking the Pivot Partitioning Strategy Small Files Actual Quicksort Routines Analysis of Quicksort A Linear-Expected-Time Algorithm for Selection

32 Picking the Pivot
Selecting the pivot is very important, and it plays a major role in performance.
1. A wrong way: choosing the first element as the pivot.
2. A safe maneuver: choose the pivot element at random.
3. Median-of-three partitioning: pick three elements and use the median of these three as the pivot.

33 Routine for Quick Sort
void quick_sort( int a[ ], unsigned int n )
{
    q_sort( a, 1, n );
    insertion_sort( a, n );
}

34 Routine for median of three
/* Return median of left, center, and right. */
/* Order these and hide the pivot. */
int median3( int a[], int left, int right )
{
    int center;
    center = (left + right) / 2;
    if( a[left] > a[center] )
        swap( &a[left], &a[center] );
    if( a[left] > a[right] )
        swap( &a[left], &a[right] );
    if( a[center] > a[right] )
        swap( &a[center], &a[right] );
    /* invariant: a[left] <= a[center] <= a[right] */
    swap( &a[center], &a[right-1] );   /* hide pivot */
    return a[right-1];                 /* return pivot */
}

35 Routine for Main Quick sort
void q_sort( int a[], int left, int right )
{
    int i, j;
    int pivot;
/*1*/   if( left + CUTOFF <= right )
    {
/*2*/       pivot = median3( a, left, right );
/*3*/       i = left; j = right-1;
/*4*/       for( ; ; )
        {
/*5*/           while( a[++i] < pivot );
/*6*/           while( a[--j] > pivot );
/*7*/           if( i < j )
/*8*/               swap( &a[i], &a[j] );
            else
/*9*/               break;
        }
/*10*/      swap( &a[i], &a[right-1] );   /* restore pivot */
/*11*/      q_sort( a, left, i-1 );
/*12*/      q_sort( a, i+1, right );
    }
}

36 Analysis of Quicksort Quicksort is recursive, and hence, its analysis requires solving a recurrence formula. We will do the analysis for a quicksort, assuming a random pivot (no median-of-three partitioning) and no cutoff for small files. We will take T(0) = T(1) = 1 The running time of quicksort is equal to the running time of the two recursive calls plus the linear time spent in the partition (the pivot selection takes only constant time). This gives the basic quicksort relation T(n) = T(i) + T(n - i - 1) + cn (7.1) where i = |S1| is the number of elements in S1.

37 Worst-Case Analysis of Quick sort
The pivot is the smallest element, all the time. Then i = 0, and if we ignore T(0) = 1, which is insignificant, the recurrence is
T(n) = T(n-1) + cn,  n > 1   (7.2)
We telescope, using Equation (7.2) repeatedly. Thus
T(n-1) = T(n-2) + c(n-1)   (7.3)
T(n-2) = T(n-3) + c(n-2)   (7.4)
...
T(2) = T(1) + c(2)   (7.5)
Adding up all these equations yields
T(n) = T(1) + c Σ_{i=2}^{n} i = O(n²)   (7.6)
as claimed earlier.

38 Best-Case Analysis In the best case, the pivot is in the middle. To simplify the math, we assume that the two subfiles are each exactly half the size of the original, giving
T(n) = 2T(n/2) + cn   (7.7)
Dividing both sides of Equation (7.7) by n and telescoping, exactly as in the mergesort analysis, yields T(n) = cn log n + n = O(n log n).


