CSE 326: Data Structures Lecture 23 Spring Quarter 2001 Sorting, Part 1 David Kaplan
Outline Sorting: The Problem Space Sorting by Comparison –Lower bound for comparison sorts –Insertion Sort –Heap Sort –Merge Sort –Quick Sort External Sorting Comparison of Sorting by Comparison Outline
Sorting: The Problem Space General problem Given a set of N orderable items, put them in order Without (significant) loss of generality, assume: –Items are integers –Ordering is (Most sorting problems map to the above in linear time.)
Lower Bound for Sorting by Comparison Sorting by Comparison –Only information available to us is the set of N items to be sorted –Only operation available to us is pairwise comparison between 2 items What is the best running time we can possibly achieve?
Decision Tree Analysis of Sorting by Comparison
Max depth of decision tree How many permutations are there of N numbers? How many leaves does the tree have? What’s the shallowest tree with a given number of leaves? What is therefore the worst running time (number of comparisons) by the best possible sorting algorithm?
Lower Bound for log(n!) Stirling’s approximation:
Insertion Sort Basic idea After k th pass, ensure that first k+1 elements are sorted On k th pass, swap (k+1) th element to left as necessary Start After Pass 1 After Pass After Pass What if array is initially sorted? What if array is initially reverse sorted?
Why Insertion Sort is Slow Inversion: a pair (i,j) such that i Array[j] Array of size N can have (N 2 ) inversions –average number of inversions in a random set of elements is N(N-1)/4 Insertion Sort only swaps adjacent elements –only removes 1 inversion!
HeapSort Sorting via Priority Queue (Heap) Basic idea: Shove items into a priority queue, take them out smallest to largest. Worst Case: Best Case:
MergeSort Merging Cars by key [Aggressiveness of driver]. Most aggressive goes first. MergeSort (Table [1..n]) Split Table in half Recursively sort each half Merge two sorted halves together Merge (T1[1..n],T2[1..n]) i1=1, i2=1 While i1<n, i2<n If T1[i1] < T2[i2] Next is T1[i1] i1++ Else Next is T2[i2] i2++ End If End While
MergeSort Analysis Running Time –Worst case? –Best case? –Average case? Other considerations besides running time?
QuickSort Basic idea: Pick a “pivot”. Divide into less-than & greater-than pivot. Sort each side recursively. Picture from PhotoDisc.com
QuickSort Partition Pick pivot: Partition with cursors <> <> 2 goes to less-than
QuickSort Partition (cont’d) <> 6, 8 swap less/greater-than ,5 less-than 9 greater-than Partition done. Recursively sort each side.
Analyzing QuickSort Picking pivot: constant time Partitioning: linear time Recursion: time for sorting left partition (say of size i) + time for right (size N-i-1) T(1) = b T(N) = T(i) + T(N-i-1) + cN where i is the number of elements smaller than the pivot
QuickSort Worst case Pivot is always smallest element. T(N) = T(i) + T(N-i-1) + cN T(N)= T(N-1) + cN = T(N-2) + c(N-1) + cN = T(N-k) + = O(N 2 )
Optimizing QuickSort Choosing the Pivot –Randomly choose pivot Good theoretically and practically, but call to random number generator can be expensive –Pick pivot cleverly “Median-of-3” rule takes Median(first, middle, last). Works well in practice. Cutoff –Use simpler sorting technique below a certain problem size (Weiss suggests using insertion sort, with a cutoff limit of 5-20)
QuickSort Best Case Pivot is always middle element. T(N) = T(i) + T(N-i-1) + cN T(N)= 2T(N/2 - 1) + cN
QuickSort Average Case Assume all size partitions equally likely, with probability 1/N details: Weiss pg
External Sorting When you just ain’t got enough RAM … –e.g. Sort 10 billion numbers with 1 MB of RAM. –Databases need to be very good at this MergeSort Good for Something! –Basis for most external sorting routines –Can sort any number of records using a tiny amount of main memory in extreme case, only need to keep 2 records in memory at any one time!
External MergeSort Split input into two tapes Each group of 1 records is sorted by definition, so merge groups of 1 to groups of 2, again split between two tapes Merge groups of 2 into groups of 4 Repeat until data entirely sorted log N passes
Better External MergeSort Suppose main memory can hold M records. Initially read in groups of M records and sort them (e.g. with QuickSort). Number of passes reduced to log(N/M) k-way mergesort reduces number of passes to log k (N/M) –Requires 2k output devices (e.g. mag tapes) But wait, there’s more … Polyphase merge does a k-way mergesort using only k+1 output devices (plus k th -order Fibonacci numbers!)
Sorting by Comparison Summary Sorting algorithms that only compare adjacent elements are (N 2 ) worst case – but may be (N) best case HeapSort - (N log N) both best and worst case –Suffers from two test-ops per data move MergeSort - (N log N) running time –Suffers from extra-memory problem QuickSort - (N 2 ) worst case, (N log N) best and average case –In practice, median-of-3 almost always gets us (N log N) –Big win comes from {sorting in place, one test-op, few swaps}! Any comparison-based sorting algorithm is (N log N) External sorting: MergeSort with (log N/M) passes