1 Sorting CS 110: Data Structures and Algorithms First Semester, 2010-2011

2 Learning Objectives ► Explain and implement sorting algorithms ► Explain pros and cons of each implementation

3 The Sorting Problem ► Input: a collection S of n elements that can be ordered ► Output: the same collection of elements, arranged in increasing (or non-decreasing) order ► Typically, S is stored in an array, and the problem is to rearrange the elements in that array ► For now, assume we are sorting a collection of integers

4 Example
Input:  53 10 2 62 128 93 28 18
Output: 2 10 18 28 53 62 93 128

5 Some Sorting Algorithms
► O( n² ): Insertion sort, Selection sort, Bubble sort
► O( n log n ): Quick sort, Merge sort, Heap sort
► O( n ): Bucket sort, Radix sort

6 Insertion Sort ► Strategy: treat each s[i] as an incoming element that you will insert into the already sorted sequence s[0], s[1], …, s[i-1] ► Requires locating the proper position of the incoming element and adjusting elements to the right ► Best case: the array is already sorted, so no "insertions" are carried out → O( n ) ► Worst case: the array is in decreasing order; incoming elements are always inserted at the beginning → O( n² )

7 Insertion Sort
for i ← 1 to n-1 do
  temp ← s[i]                        // incoming element
  j ← i                              // adjust elements to the right
  while ( j > 0 && s[j-1] > temp )
    s[j] ← s[j-1]
    j--
  s[j] ← temp                        // insert incoming element
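
Below is a minimal Java rendering of this pseudocode, for reference; the class and method names are our own, not from the slides, and main reruns the example from slide 4.

class SortDemo {
    static void insertionSort(int[] s) {
        for (int i = 1; i < s.length; i++) {
            int temp = s[i];                  // incoming element
            int j = i;                        // adjust elements to the right
            while (j > 0 && s[j - 1] > temp) {
                s[j] = s[j - 1];              // shift larger elements one slot right
                j--;
            }
            s[j] = temp;                      // insert incoming element
        }
    }

    public static void main(String[] args) {
        int[] a = {53, 10, 2, 62, 128, 93, 28, 18};       // example array from slide 4
        insertionSort(a);
        System.out.println(java.util.Arrays.toString(a)); // [2, 10, 18, 28, 53, 62, 93, 128]
    }
}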

8 Insertion Sort Example (initial array: 42 20 17 13 28 23 14 15)
i=1: 20 42 17 13 28 23 14 15
i=2: 17 20 42 13 28 23 14 15
i=3: 13 17 20 42 28 23 14 15
i=4: 13 17 20 28 42 23 14 15
i=5: 13 17 20 23 28 42 14 15
i=6: 13 14 17 20 23 28 42 15
i=7: 13 14 15 17 20 23 28 42

9 Selection Sort ► Strategy: locate the minimum element, place it at the first position, locate the next minimum and place it at the second position … ► Requires a scan ( O( n ) ) for each of the n elements → O( n² ) best and worst case ► Variation: can repeatedly select the maximum instead and place it at the last position

10 Selection Sort
for i ← 0 to n-2 do                  // Why not n-1?
  lowIndex ← i                       // determine minimum
  for j ← i+1 to n-1 do
    if ( s[j] < s[lowIndex] )
      lowIndex ← j
  swap( s[i], s[lowIndex] )          // place minimum in proper place
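
A corresponding Java sketch of our own, which can sit alongside insertionSort in the SortDemo class above; the comment on the outer loop answers the slide's question.

static void selectionSort(int[] s) {
    for (int i = 0; i <= s.length - 2; i++) {  // why not n-1? the last element falls into place by itself
        int lowIndex = i;                      // index of the minimum of s[i..n-1]
        for (int j = i + 1; j < s.length; j++)
            if (s[j] < s[lowIndex])
                lowIndex = j;
        int t = s[i]; s[i] = s[lowIndex]; s[lowIndex] = t;  // place minimum in proper place
    }
}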

11 Selection Sort Example (initial array: 42 20 17 13 28 23 14 15)
i=0: 13 20 17 42 28 23 14 15
i=1: 13 14 17 42 28 23 20 15
i=2: 13 14 15 42 28 23 20 17
i=3: 13 14 15 17 28 23 20 42
i=4: 13 14 15 17 20 23 28 42
i=5: 13 14 15 17 20 23 28 42
i=6: 13 14 15 17 20 23 28 42

12 Selection Sort Variation
► Repeatedly select maximum instead
for i ← n-1 downto 1 do
  highIndex ← i                      // determine maximum
  for j ← 0 to i-1 do
    if ( s[j] > s[highIndex] )
      highIndex ← j
  swap( s[i], s[highIndex] )         // place maximum in proper place

13 Bubble Sort ► Essentially selection sort, but the sort is carried out by swapping adjacent elements only ► Minimum elements are repeatedly "bubbled up" toward the front of the array, or maximum elements repeatedly "bubbled down" toward the end ► O( n² ) because of the comparisons (actual swaps are carried out only when elements are out of place)

14 Bubble Sort
for i ← n-1 downto 1 do              // puts the ith element in its proper place
  for j ← 0 to i-1 do                // repeatedly positions the maximum element
    if ( s[j] > s[j+1] )
      swap( s[j], s[j+1] )
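
Again a minimal Java sketch of our own, matching the pseudocode above and usable inside the SortDemo class.

static void bubbleSort(int[] s) {
    for (int i = s.length - 1; i >= 1; i--)    // pass i puts the maximum of s[0..i] at index i
        for (int j = 0; j < i; j++)
            if (s[j] > s[j + 1]) {             // swap only when adjacent elements are out of place
                int t = s[j]; s[j] = s[j + 1]; s[j + 1] = t;
            }
}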

15 Exercise: Bubble Sort ► Perform a trace for this array: 53 10 2 62 128 93 28 18

16 Bubble Sort Variation
for i ← 0 to n-2 do                  // puts the ith element in its proper place
  for j ← n-1 downto i+1 do          // repeatedly positions the MINIMUM element
    if ( s[j] < s[j-1] )
      swap( s[j], s[j-1] )

17 Time Complexity Summary
Algorithm        Best Case    Worst Case
Insertion Sort   O( n )       O( n² )
Selection Sort   O( n² )      O( n² )
Bubble Sort      O( n² )      O( n² )

18 Improved Sorting Strategy ► Divide-and-Conquer ► Given the collection of n elements to sort: perform the sort in three steps ► Divide step: split the collection S into two subsets, S1 and S2 ► Recursion step: sort S1 and S2 separately ► Conquer step: combine the two lists into one sorted list

19 Quick Sort and Merge Sort ► Two algorithms adopt this divide-and-conquer strategy ► Quick sort ► Work is carried out in the divide step using a pivot element ► Conquer step is trivial ► Merge sort ► Divide step is trivial – just split the list into two equal parts ► Work is carried out in the conquer step by merging two sorted lists

20 Quick Sort: Divide Step ► In the divide step, select a pivot from the array (say, the last element) ► Split the list/array S using the pivot: S1 consists of all elements less than the pivot, and S2 consists of all elements greater than the pivot
Example: 85 24 63 45 17 31 96 50, pivot = 50 → S1 = 24 45 17 31, S2 = 85 63 96

21 Quick Sort: Conquer Step ► After sorting S1 and S2, combine the sorted lists so that S1 is on the left, the pivot is in the middle, and S2 is on the right
S1 sorted = 17 24 31 45, pivot = 50, S2 sorted = 63 85 96 → 17 24 31 45 50 63 85 96

22 Quick Sort with Recur Step
Divide:  85 24 63 45 17 31 96 50 → S1 = 24 45 17 31, S2 = 85 63 96 (pivot = 50)
Recur:   S1 sorted = 17 24 31 45, S2 sorted = 63 85 96
Conquer: 17 24 31 45 50 63 85 96

23 Implementing Quick Sort ► It is preferable if we can perform quick-sort in-place; i.e., we sort by swapping elements in the array, perhaps using some temporary variables ► Plan: algorithm QSort( S, a, b ) sorts the sublist in the array S from index a to index b ► QSort( L, 0, n-1 ) will sort an array L of length n ► Within the QSort( S, a, b ) algorithm, there will be recursive calls to QSort on smaller ranges within the range a…b.

24 Algorithm QSort
Algorithm QSort( S, a, b )
  if ( a < b )                       // otherwise the range a…b contains 0 or 1 element (base case)
    p ← S[b]
    rearrange S so that:
      S[a]…S[x-1] are elements < p
      S[x] = p
      S[x+1]…S[b] are elements > p
    QSort( S, a, x-1 )
    QSort( S, x+1, b )

25 Rearranging a Sublist in S
p ← S[b], l ← a, r ← b-1
while l <= r do
  while l <= r and S[l] <= p do      // find an element larger than the pivot
    l ← l + 1
  while r >= l and S[r] >= p do      // find an element smaller than the pivot
    r ← r - 1
  if l < r then
    swap( S[l], S[r] )               // swap the two elements
swap( S[l], S[b] )                   // place pivot in proper place
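
Putting slides 24 and 25 together, here is one possible Java version of the in-place quick sort; the method name qSort and the explicit swaps are our own phrasing of the pseudocode. Calling qSort(a, 0, a.length - 1) sorts the whole array.

static void qSort(int[] s, int a, int b) {
    if (a >= b) return;                    // base case: range contains 0 or 1 element
    int p = s[b];                          // pivot = last element of the range
    int l = a, r = b - 1;
    while (l <= r) {
        while (l <= r && s[l] <= p) l++;   // find an element larger than the pivot
        while (r >= l && s[r] >= p) r--;   // find an element smaller than the pivot
        if (l < r) { int t = s[l]; s[l] = s[r]; s[r] = t; }
    }
    int t = s[l]; s[l] = s[b]; s[b] = t;   // place pivot in its proper place
    qSort(s, a, l - 1);                    // recur on the elements left of the pivot
    qSort(s, l + 1, b);                    // recur on the elements right of the pivot
}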

26 Time Complexity of Quick Sort ► First, note that rearranging a sublist takes O( n ) time, where n is the length of the sublist ► Requires scanning the list from both ends until the l and r pointers meet ► Still O( n ) even though the scans are loops nested within a loop: together, l and r traverse the sublist only once ► Rearranging sublists is all that the quick sort algorithm does ► Need to find out how often the sort performs the rearrange operation

27 Time Complexity of Quick Sort
Suppose the pivots always split the lists into two lists of roughly equal size:
  1 list of length n
  2 lists of length n/2
  4 lists of length n/4
  …
  n lists of length 1

28 Time Complexity of Quick Sort ► Each level takes O( n ) time for sublist rearranging ► Assuming an even split caused by each pivot, there will be around log n levels ► Therefore, quick sort takes O( n log n ) time ► But…

29 Time Complexity of Quick Sort ► In the worst case, the pivot might split the lists such that there is only 1 element in one partition (not an even split) ► There will be n levels ► Each level requires O( n ) time for sublist rearranging ► Quick sort takes O( n 2 ) time in the worst case

30 Merge Sort ► Another sorting algorithm using the divide-and-conquer paradigm ► This time, the hard work is carried out in the conquer phase instead of the divide phase ► Divide: split the list S[0..n-1] by taking the middle index m ( = (0 + n-1) / 2 ) ► Recursion: recursively sort S[0..m] and S[m+1..n-1] ► Conquer: merge the two sorted lists (how?)

31 Merge Sort
Divide:  85 24 63 45 17 31 96 50 → S1 = 85 24 63 45, S2 = 17 31 96 50
Recur:   S1 sorted = 24 45 63 85, S2 sorted = 17 31 50 96
Conquer: 17 24 31 45 50 63 85 96

32 Merge Sort Time Complexity ► Divide step ensures that the sublist split is done evenly ► O( log n ) levels ► Conquer/merge step takes O( n ) time per level ► Time complexity is O( n log n ), guaranteed ► Disadvantage: hard to carry out the merge step in-place; temporary array/list is necessary if we want a simple implementation
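
A simple (not in-place) Java sketch along these lines, using a temporary array for each merge as the slide suggests; names are ours. Calling mergeSort(a, 0, a.length - 1) sorts the whole array.

static void mergeSort(int[] s, int lo, int hi) {
    if (lo >= hi) return;                  // base case: 0 or 1 element
    int m = (lo + hi) / 2;                 // trivial divide step: split at the middle index
    mergeSort(s, lo, m);
    mergeSort(s, m + 1, hi);
    int[] tmp = new int[hi - lo + 1];      // temporary array for the merge
    int i = lo, j = m + 1, k = 0;
    while (i <= m && j <= hi)              // repeatedly take the smaller head element
        tmp[k++] = (s[i] <= s[j]) ? s[i++] : s[j++];
    while (i <= m)  tmp[k++] = s[i++];     // copy any leftovers from the left half
    while (j <= hi) tmp[k++] = s[j++];     // copy any leftovers from the right half
    System.arraycopy(tmp, 0, s, lo, tmp.length);
}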

33 Time Complexity Summary
Algorithm    Best Case      Worst Case
Quick sort   O( n log n )   O( n² )
Merge sort   O( n log n )   O( n log n )

34 Time Complexity of Sorting ► Several sorting algorithms have been discussed; the best ones so far: ► Merge sort: O( n log n ) ► Quick sort (best one in practice): O( n log n ) on average, O( n² ) worst case ► Can we do better than O( n log n )? ► No. It can be proven that any comparison-based sorting algorithm must carry out at least on the order of n log n comparisons in the worst case

35 Restrictions on the Problem ► Suppose the values in the list to be sorted can repeat but the values have a limit (e.g., values are digits from 0 to 9) ► Sorting, in this case, appears easier ► Is it possible to come up with an algorithm better than O( n log n )? ► Yes ► Strategy will not involve comparisons

36 Bucket Sort ► Idea: suppose the values are in the range 0..m-1; start with m empty buckets numbered 0 to m-1, scan the list and place element s[i] in bucket s[i], and then output the buckets in order ► Will need an array of buckets, and the values in the list to be sorted will be the indexes to the buckets ► No comparisons will be necessary

37 Bucket Sort: Example
Input:   4 2 1 2 0 3 2 1 4 0 2 3 0
Buckets: 0 → 0 0 0 | 1 → 1 1 | 2 → 2 2 2 2 | 3 → 3 3 | 4 → 4 4
Output:  0 0 0 1 1 2 2 2 2 3 3 4 4

38 Bucket Sort Algorithm
Algorithm BucketSort( S )            // values in S are between 0 and m-1
  for j ← 0 to m-1 do                // initialize m buckets
    b[j] ← 0
  for i ← 0 to n-1 do                // place elements in their appropriate buckets
    b[S[i]] ← b[S[i]] + 1
  i ← 0
  for j ← 0 to m-1 do                // place elements in buckets back in S
    for r ← 1 to b[j] do
      S[i] ← j
      i ← i + 1
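
One possible Java version of this counting-based bucket sort (our naming; values are assumed to lie in 0..m-1, as the slide states):

static void bucketSort(int[] s, int m) {   // values assumed to lie in 0..m-1
    int[] b = new int[m];                  // one counter per bucket, all initially 0
    for (int v : s) b[v]++;                // place each element in its bucket
    int i = 0;
    for (int j = 0; j < m; j++)            // write the buckets back into s, in order
        for (int r = 0; r < b[j]; r++)
            s[i++] = j;
}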

39 Time Complexity ► Bucket initialization: O( m ) ► From array to buckets: O( n ) ► From buckets to array: O( n ) ► Even though this stage is a nested loop, the sum of the number of elements in each bucket is n ► Since m will likely be small compared to n, Bucket sort is O( n ) ► Strictly speaking, time complexity is O ( n + m )

40 Sorting Integers ► Can we perform bucket sort on any array of (non-negative) integers? ► Yes, but note that the number of buckets will depend on the maximum integer value ► If you are sorting 1000 integers and the maximum value is 999999, you will need 1 million buckets! ► Time complexity is not really O( n ) in that case, because m is much greater than n; the actual time complexity O( n + m ) reduces to O( m ) ► Can we do better?

41 Radix Sort ► Idea: repeatedly sort by digit—perform multiple bucket sorts on S starting with the rightmost digit ► If maximum value is 999999, only ten buckets (not 1 million) will be necessary ► Use this strategy when the keys are integers, and there is a reasonable limit on their values ► Number of passes (bucket sort stages) will depend on the number of digits in the maximum value

42 Radix Sort Example: first pass (sort by rightmost digit)
Before: 12 58 37 64 52 36 99 63 18 9 20 88 47
After:  20 12 52 63 64 36 37 47 58 18 88 99 9

43 Radix Sort Example: second pass (sort by the tens digit)
Before: 20 12 52 63 64 36 37 47 58 18 88 99 9
After:  9 12 18 20 36 37 47 52 58 63 64 88 99

44 Radix Sort Example: 1st & 2nd passes
12 58 37 64 52 36 99 63 18 9 20 88 47
  → sort by rightmost digit →
20 12 52 63 64 36 37 47 58 18 88 99 9
  → sort by leftmost digit →
9 12 18 20 36 37 47 52 58 63 64 88 99

45 Radix Sort and Stability ► Radix sort works as long as the bucket sort stages are stable sorts ► Stable sort: in case of ties, the relative order of elements is preserved in the resulting array ► Suppose two elements have the same tens digit, for example 52 and 58 ► If 52 occurs before 58 in the array prior to this sorting stage, 52 should still occur before 58 in the resulting array ► This way, the work carried out in the previous bucket sort stages is preserved
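
Here is one way to code a stable radix sort in Java: each pass is a counting-style bucket sort on one decimal digit, filled from right to left so that ties keep their earlier order. The method name and the cumulative-count formulation are our own choices, not from the slides; radixSort(a, 2) reproduces the two passes of slides 42-43.

static void radixSort(int[] s, int p) {    // p = number of decimal digits to process
    int[] out = new int[s.length];
    for (int d = 0, div = 1; d < p; d++, div *= 10) {
        int[] count = new int[10];          // ten buckets, reused at every stage
        for (int v : s) count[(v / div) % 10]++;
        for (int j = 1; j < 10; j++)        // turn counts into bucket end positions
            count[j] += count[j - 1];
        for (int i = s.length - 1; i >= 0; i--)   // backward scan keeps the pass stable
            out[--count[(s[i] / div) % 10]] = s[i];
        System.arraycopy(out, 0, s, 0, s.length);
    }
}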

46 Time Complexity ► If there is a fixed number p of bucket sort stages (six stages in the case where the maximum value is 999999), then radix sort is O( n ) ► There are p bucket sort stages, each taking O( n ) time ► Strictly speaking, time complexity is O( pn ), where p is the number of digits (note that p ≈ log₁₀ m, where m is the maximum value in the list)

47 About Radix Sort ► Note that only 10 buckets are needed regardless of number of stages since the buckets are reused at each stage ► Radix sort can apply to words ► Set a limit to the number of letters in a word ► Use 27 buckets (or more, depending on the letters/characters allowed), one for each letter plus a “blank” character ► The word-length limit is exactly the number of bucket sort stages needed

48 Summary ► The O( n² ) algorithms (insertion, selection, and bubble sort) are easy to code but inefficient ► Quick sort has an O( n² ) worst case but works very well in practice; O( n log n ) on average ► Merge sort is difficult to implement in-place, but its O( n log n ) complexity is guaranteed; note, however, that it tends not to perform as well as quick sort in practice

49 Summary ► Bucket sort and Radix sort are O( n ) algorithms only because we have imposed restrictions on the input list to be sorted ► Sorting, in general, can be done in O( n log n ) time ► Later this semester ► A guaranteed O( n log n ) algorithm called Heap sort that is a reasonable alternative to quick sort

