Cpt S 223 – Advanced Data Structures
Sorting (Chapter 7)
Nirmalya Roy
School of Electrical Engineering and Computer Science
Washington State University

Sorting Problem
- Given an array A[0…N – 1], modify A such that A[i] ≤ A[i + 1] for 0 ≤ i < N – 1
- Internal vs. external sorting
- Stable vs. unstable sorting: equal elements retain their original relative order
- In-place sorting: O(1) extra memory
- Comparison sorting vs. non-comparison sorting: besides the assignment operator, only the “<” and “>” operators are allowed on the input data
- Function template sort with comparator cmp:
    void sort(Iterator begin, Iterator end, Comparator cmp)
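
This interface has the same shape as std::sort from the C++ standard library; a minimal usage sketch (the sample values are illustrative only):

  #include <algorithm>
  #include <iostream>
  #include <vector>

  int main() {
      std::vector<int> v{2, 3, 1, 15, 11, 23, 1};
      // Same shape as: void sort(Iterator begin, Iterator end, Comparator cmp)
      std::sort(v.begin(), v.end(),
                [](int a, int b) { return a < b; });  // comparison via "<" only
      for (int x : v) std::cout << x << ' ';          // prints: 1 1 2 3 11 15 23
      std::cout << '\n';
  }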

Sorting Algorithms
- Insertion sort
- Selection sort
- Shell sort
- Heap sort
- Merge sort
- Quick sort
- …
Simple data structures; the focus is on analysis.

Insertion Sort
Algorithm:
- Start with an empty list S and an unsorted list A of N items
- For each item x in A: insert x into S, in sorted order
Example (S | A):
    | 7 3 9 5
  7 | 3 9 5
  3 7 | 9 5
  3 7 9 | 5
  3 5 7 9 |

Insertion Sort (cont’d)
- In-place, stable
- Best case: O(N) (input already sorted)
- Worst case: O(N²) (input in reverse order)
- Average case: O(N²)
Pseudocode:
  InsertionSort(A) {
    for p = 1 to N – 1 {
      tmp = A[p]
      j = p
      while (j > 0) and (tmp < A[j – 1]) {
        A[j] = A[j – 1]
        j = j – 1
      }
      A[j] = tmp
    }
  }
- Consists of N – 1 passes
- For pass p = 1 to N – 1: positions 0 through p – 1 are already in sorted order; move the element in position p left until its correct place is found among the first p + 1 elements
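
A runnable C++ rendering of the pseudocode (a sketch, assuming int keys in a vector):

  #include <vector>

  // For each position p, slide A[p] left to its correct place
  // among the already-sorted prefix A[0..p-1].
  void insertionSort(std::vector<int>& A) {
      for (std::size_t p = 1; p < A.size(); ++p) {
          int tmp = A[p];
          std::size_t j = p;
          while (j > 0 && tmp < A[j - 1]) {
              A[j] = A[j - 1];   // shift larger elements right
              --j;
          }
          A[j] = tmp;
      }
  }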


Selection Sort
Algorithm:
- Start with an empty list S and an unsorted list A of N items
- for (i = 0; i < N; i++): let x be the item in A with the smallest key; remove x from A and append x to the end of S
Example (S | A):
    | 7 3 9 5
  3 | 7 9 5
  3 5 | 9 7
  3 5 7 | 9
  3 5 7 9 |

Selection Sort (cont’d)
- In-place, unstable
- Best case: O(N²)
- Worst case: O(N²)
- Average case: O(N²)
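
A matching C++ sketch; it is in-place, so S is the sorted prefix of the array and the append is realized by a swap:

  #include <utility>
  #include <vector>

  // Repeatedly swap the smallest remaining element to the front
  // of the unsorted suffix A[i..N-1].
  void selectionSort(std::vector<int>& A) {
      for (std::size_t i = 0; i + 1 < A.size(); ++i) {
          std::size_t min = i;
          for (std::size_t j = i + 1; j < A.size(); ++j)
              if (A[j] < A[min]) min = j;
          std::swap(A[i], A[min]);
      }
  }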

Shell Sort
- Shell sort is a multi-pass algorithm
- Each pass is an insertion sort of the sequences consisting of every h-th element for a fixed gap h, known as the increment; this is referred to as h-sorting
- Consider shell sort with gaps 5, 3, and 1 on the input array a1, a2, a3, …, a12:
  - The first pass, 5-sorting, performs insertion sort on the separate sub-arrays (a1, a6, a11), (a2, a7, a12), (a3, a8), (a4, a9), (a5, a10)
  - The next pass, 3-sorting, performs insertion sort on the sub-arrays (a1, a4, a7, a10), (a2, a5, a8, a11), (a3, a6, a9, a12)
  - The last pass, 1-sorting, is an ordinary insertion sort of the entire array (a1, …, a12)

Shell Sort (cont’d)
Pseudocode:
  ShellSort(A) {
    gap = N
    while (gap > 0) {
      gap = gap / 2
      B = <A[0], A[gap], A[2*gap], …>
      InsertionSort(B)
    }
  }
- In-place, unstable
- Best case (input already sorted): Θ(N log₂ N)
- Worst case: Θ(N²) with Shell’s increments (divide by 2: N/2, N/4, …); Θ(N^(3/2)) with Hibbard’s increments (2^k – 1)
- Average case: Θ(N^(7/6))
- Later sorts do not undo the work done in previous sorts: if an array is 5-sorted and then 3-sorted, it is not only 3-sorted, but both 5- and 3-sorted
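
A C++ sketch of the halving-gap scheme; each pass performs the gap-spaced insertion sorts for all residue classes in place, which is how the subsequences B of the pseudocode are realized:

  #include <vector>

  void shellSort(std::vector<int>& A) {
      for (std::size_t gap = A.size() / 2; gap > 0; gap /= 2) {
          // Gap-spaced insertion sort: A[p] is slid left in steps of `gap`.
          for (std::size_t p = gap; p < A.size(); ++p) {
              int tmp = A[p];
              std::size_t j = p;
              while (j >= gap && tmp < A[j - gap]) {
                  A[j] = A[j - gap];
                  j -= gap;
              }
              A[j] = tmp;
          }
      }
  }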

Shell Sort (cont’d)
- Works by comparing elements that are distant; the distance between comparisons decreases as the algorithm runs, until its last phase
- 5-sorting (sorting each column):
  Before: 13 14 94 33 82 25 59 94 65 23 45 27 73 25 39 10
  After:  10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45

Shell Sort (cont’d)
- 3-sorting (sorting each column):
  Before: 10 14 73 25 23 13 27 94 33 39 25 59 94 65 82 45
  After:  10 14 13 25 23 33 27 25 59 39 65 73 45 94 82 94

Shell Sort (cont’d)
- 1-sorting is an ordinary insertion sort of the entire array:
  Before: 10 14 13 25 23 33 27 25 59 39 65 73 45 94 82 94
  After:  10 13 14 23 25 25 27 33 39 45 59 65 73 82 94 94
- Recall insertion sort: start with an empty list S and an unsorted list A of N items; for each item x in A, insert x into S in sorted order

Heap Sort
- Selection sort where A is a heap
- Initially, toss all items in A onto heap h (ignoring heap order), then call buildHeap() on h
Algorithm:
- Start with an empty list S and the heap h of N items
- for (i = 0; i < N; i++): x = h.deleteMin(); append x to the end of S

Heap Sort (cont’d)
- In-place, unstable
- All cases: Θ(N log₂ N)
- Maintain the heap backward at the end of the array: as items are added to the sorted list at the beginning of the array, the heap shrinks toward the end of the array, with the root at the last position
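
One way to realize this in C++ is with the standard library’s heap primitives; this sketch uses a max-heap at the front of the array, so the sorted region grows from the back — the mirror image of the backward min-heap just described:

  #include <algorithm>
  #include <vector>

  void heapSort(std::vector<int>& A) {
      std::make_heap(A.begin(), A.end());       // buildHeap: O(N)
      for (auto end = A.end(); end != A.begin(); --end)
          std::pop_heap(A.begin(), end);        // move max to *(end-1): O(log N)
  }   // A is now sorted ascending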

Heap Sort (cont’d)
- Example (figure): the items 7, 3, 9, 5 are tossed onto the heap in arbitrary order and buildHeap() establishes heap order; four successive deleteMin() calls then move 3, 5, 7, 9, in that order, from the shrinking heap A (maintained backward at the end of the array) to the sorted list S

Merge Sort
- Idea: we can merge 2 sorted lists into 1 sorted list in linear time
- Let Q1 and Q2 be 2 sorted queues, and let Q be an empty queue
- Algorithm for merging Q1 and Q2 into Q:
  - While neither Q1 nor Q2 is empty:
      item1 = Q1.front()
      item2 = Q2.front()
      Move the smaller of item1, item2 from its queue to the end of Q
  - Concatenate the remaining non-empty queue (Q1 or Q2) to the end of Q

Merge Sort (cont’d)
- Recursive divide-and-conquer algorithm
Algorithm:
- Start with an unsorted list A of N items
- Break A into halves A1 and A2, having ⌈N/2⌉ and ⌊N/2⌋ items
- Sort A1 recursively, yielding S1
- Sort A2 recursively, yielding S2
- Merge S1 and S2 into one sorted list S

Merge Sort (cont’d)
- Dividing is trivial; merging is non-trivial
- Divide with O(log N) steps (1 + log₂ N levels), e.g. for A = 7 3 9 5 4 8 1:
    7 3 9 5 | 4 8 1
    7 3 | 9 5 | 4 8 | 1
    7 | 3 | 9 | 5 | 4 | 8 | 1
- Conquer with O(log N) steps, merging level by level:
    3 7 | 5 9 | 4 8 | 1
    3 5 7 9 | 1 4 8
    1 3 4 5 7 8 9

Merging Two Sorted Arrays
  A: 1 13 24 26   (pointer i)
  B: 2 15 27 38   (pointer j)
  C: temporary array to hold the output (pointer k)
Repeat until A and B are exhausted:
  1. C[k++] = min{ A[i], B[j] }
  2. Advance the pointer that contributed the minimum
Result: C = 1 2 13 15 24 26 27 38, computed in Θ(N) time

Merge Sort (cont’d)
- Not in-place; stable
- Analysis (all cases): T(N) = 2T(N/2) + Θ(N), which solves to T(N) = Θ(N log₂ N)
Pseudocode:
  MergeSort(A)
    MergeSort2(A, 0, N – 1)
  MergeSort2(A, i, j)
    if (i < j)
      k = (i + j) / 2
      MergeSort2(A, i, k)
      MergeSort2(A, k + 1, j)
      Merge(A, i, k, j)
  Merge(A, i, k, j)
    Create auxiliary array B
    Copy elements of sorted A[i…k] and sorted A[k+1…j] into B (in order)
    A = B
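
A C++ sketch following the pseudocode; the auxiliary array lives inside the merge step:

  #include <vector>

  static void merge(std::vector<int>& A, std::size_t lo, std::size_t mid, std::size_t hi) {
      std::vector<int> B;
      B.reserve(hi - lo + 1);
      std::size_t i = lo, j = mid + 1;
      while (i <= mid && j <= hi)
          B.push_back(A[i] <= A[j] ? A[i++] : A[j++]);  // "<=" keeps the sort stable
      while (i <= mid) B.push_back(A[i++]);
      while (j <= hi)  B.push_back(A[j++]);
      for (std::size_t k = 0; k < B.size(); ++k) A[lo + k] = B[k];
  }

  static void mergeSort2(std::vector<int>& A, std::size_t i, std::size_t j) {
      if (i >= j) return;
      std::size_t k = i + (j - i) / 2;
      mergeSort2(A, i, k);
      mergeSort2(A, k + 1, j);
      merge(A, i, k, j);
  }

  void mergeSort(std::vector<int>& A) {
      if (!A.empty()) mergeSort2(A, 0, A.size() - 1);
  }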

Quick Sort
- Like merge sort, quick sort is a divide-and-conquer algorithm, except:
  - Don’t divide the array in half
  - Partition the array based on elements being less than or greater than some element of the array (the pivot)
- In-place, unstable
- Worst-case running time: O(N²)
- Average-case running time: O(N log₂ N)
- Fastest generic sorting algorithm in practice

Quick Sort (cont’d)
Algorithm:
- Start with list A of N items
- Choose a pivot item v from A
- Partition A into 2 unsorted lists A1 and A2:
  - A1: all items with keys smaller than v’s key
  - A2: all items with keys larger than v’s key
  - Items with the same key as v can go into either list; the pivot v itself goes into neither
- Sort A1 recursively, yielding sorted list S1
- Sort A2 recursively, yielding sorted list S2
- Concatenate S1, v, and S2, yielding sorted list S
How should we choose the pivot?

Quick Sort (cont’d)
- Dividing (“partitioning”) is non-trivial; merging is trivial
- For now, let the pivot v be the first item in the list
Example on 4 7 1 5 9 3:
  pivot v = 4: A1 = 1 3, A2 = 7 5 9
  recursing: S1 = 1 3 (pivot 1), S2 = 5 7 9 (pivot 7)
  concatenate S1, v, S2: 1 3 4 5 7 9
With balanced partitions, this runs in O(N log₂ N)

Quick Sort Algorithm
quicksort(array: S)
  1. If the size of S is 0 or 1, return
  2. Pick a pivot element v in S
  3. Partition S – {v} into two disjoint groups:
       S1 = {x ∈ (S – {v}) | x < v}
       S2 = {x ∈ (S – {v}) | x > v}
  4. Return {quicksort(S1), followed by v, followed by quicksort(S2)}
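
A direct, not-in-place C++ transcription of this recursive definition (the middle-element pivot and keeping duplicates beside the pivot are both illustrative choices):

  #include <vector>

  std::vector<int> quicksort(const std::vector<int>& S) {
      if (S.size() <= 1) return S;               // size 0 or 1: already sorted
      int v = S[S.size() / 2];                   // pick a pivot
      std::vector<int> S1, Sv, S2;
      for (int x : S) {
          if (x < v)      S1.push_back(x);
          else if (x > v) S2.push_back(x);
          else            Sv.push_back(x);       // pivot and its duplicates
      }
      std::vector<int> out = quicksort(S1);
      out.insert(out.end(), Sv.begin(), Sv.end());
      std::vector<int> rhs = quicksort(S2);
      out.insert(out.end(), rhs.begin(), rhs.end());
      return out;
  }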

Quick Sort (cont’d)
- For now, the pivot v is still the first item in the list
- What if the list is already sorted, e.g. 1 3 4 5 7 9?
  - pivot v = 1: S1 is empty and S2 = 3 4 5 7 9, so each level of recursion peels off only one element
- When the input is already sorted, choosing the first item as pivot is disastrous: O(N²)

Quick Sort (cont’d) We need a better pivot-choosing strategy.

Quick Sort (cont’d)
- Merge sort always divides the array in half; quick sort might divide the array into subproblems of size 1 and N – 1
  - When? When the pivot is the smallest (or largest) remaining element, as with first-item pivots on sorted input, leading to O(N²) performance
  - Need to choose the pivot wisely (but efficiently)
- Merge sort requires a temporary array for the merge step; quick sort can partition the array in place
  - This more than makes up for bad pivot choices

Quick Sort (cont’d)
Choosing the pivot:
- The first element: good for a random array, but what if the array is already or nearly sorted?
- A random pivot: good in practice if truly random
  - Still possible to get some bad choices
  - Requires execution of a random number generator
  - On average, generates a ¼–¾ split

Quick Sort (cont’d)
Choosing the pivot:
- Best choice of pivot? The median of the array, but the median is expensive to calculate
- Estimate the median as the median of three elements (the median-of-three strategy): choose the first, middle, and last elements
  - E.g., for <8, 1, 4, 9, 6, 3, 5, 2, 7, 0> the candidates are 8, 6, and 0, so the pivot is 6
- Has been shown to reduce running time (comparisons) by 14%

Quick Sort (cont’d)
Partitioning strategy:
- Partitioning is conceptually straightforward, but easy to do inefficiently
- A good strategy:
  1. Swap the pivot with the last element A[right]
  2. Set i = left and j = right – 1
  3. While (i < j):
     - Increment i until A[i] > pivot
     - Decrement j until A[j] < pivot
     - If (i < j), swap A[i] and A[j]
  4. Swap the pivot and A[i]

Partitioning Example
  8 1 4 9 6 3 5 2 7 0   initial array (pivot 6, the median of 8, 6, 0)
  8 1 4 9 0 3 5 2 7 6   swap pivot with last element; initialize i and j
  8 1 4 9 0 3 5 2 7 6   position i and j (i stops at 8, j stops at 2)
  2 1 4 9 0 3 5 8 7 6   after first swap
  2 1 4 9 0 3 5 8 7 6   before second swap (i at 9, j at 5)
  2 1 4 5 0 3 9 8 7 6   after second swap
  2 1 4 5 0 3 9 8 7 6   before third swap (i and j have crossed; no swap)
  2 1 4 5 0 3 6 8 7 9   after swapping the pivot with A[i]
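
A C++ sketch of the partition loop; it assumes the caller has already swapped the pivot into A[right], as in the second row above, and running it on that example reproduces the trace:

  #include <iostream>
  #include <utility>
  #include <vector>

  // Partition A[left..right] around the pivot stored in A[right].
  // Returns the pivot's final index.
  int partition(std::vector<int>& A, int left, int right) {
      int pivot = A[right];
      int i = left, j = right - 1;
      while (true) {
          while (A[i] < pivot) ++i;                // stops at A[right] at worst
          while (j > left && A[j] > pivot) --j;
          if (i >= j) break;                       // pointers crossed
          std::swap(A[i++], A[j--]);
      }
      std::swap(A[i], A[right]);                   // restore pivot
      return i;
  }

  int main() {
      std::vector<int> A{8, 1, 4, 9, 0, 3, 5, 2, 7, 6};  // pivot 6 already last
      int p = partition(A, 0, 9);
      for (int x : A) std::cout << x << ' ';       // 2 1 4 5 0 3 6 8 7 9
      std::cout << "(pivot index " << p << ")\n";  // pivot index 6
  }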

Quick Sort (cont’d)
Partitioning strategy: how to handle duplicates?
- Consider the case where all elements are equal
- Current approach: skip over elements equal to the pivot
  - No swaps (good)
  - But then i ends at (right – 1), and the array is partitioned into pieces of N – 1 and 1 elements
  - Worst-case performance: O(N²)

Quick Sort (cont’d)
Partitioning strategy: how to handle duplicates?
- Alternate approach: don’t skip elements equal to the pivot
  - Increment i while A[i] < pivot
  - Decrement j while A[j] > pivot
- Adds some unnecessary swaps, but results in perfect partitioning for an array of identical elements
  - Unlikely for the input array, but more likely for the recursive calls to quick sort

Which Sort to Use?
- When array A is small, generating lots of recursive calls on small sub-arrays is expensive
- General strategy: when N < threshold, use a sort that is more efficient for small arrays (e.g. insertion sort), as in the sketch below
  - Good thresholds range from 5 to 20
  - Also avoids the issue of finding a median-of-three pivot for an array of size 2 or less
  - Has been shown to reduce running time by 15%
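
A hybrid sketch combining the pieces above: median-of-three pivot selection, partitioning that does not skip duplicates, and an insertion-sort cutoff (the threshold 10 is one illustrative value from the 5–20 range):

  #include <utility>
  #include <vector>

  static void insertionSortRange(std::vector<int>& A, int left, int right) {
      for (int p = left + 1; p <= right; ++p) {
          int tmp = A[p], j = p;
          while (j > left && tmp < A[j - 1]) { A[j] = A[j - 1]; --j; }
          A[j] = tmp;
      }
  }

  // Order A[left], A[mid], A[right]; park the median (the pivot) in A[right].
  static int medianOfThree(std::vector<int>& A, int left, int right) {
      int mid = left + (right - left) / 2;
      if (A[mid] < A[left])   std::swap(A[mid], A[left]);
      if (A[right] < A[left]) std::swap(A[right], A[left]);
      if (A[right] < A[mid])  std::swap(A[right], A[mid]);
      std::swap(A[mid], A[right]);
      return A[right];
  }

  void quickSort(std::vector<int>& A, int left, int right) {
      if (right - left + 1 < 10) {                 // small range: insertion sort
          insertionSortRange(A, left, right);
          return;
      }
      int pivot = medianOfThree(A, left, right);
      int i = left, j = right - 1;
      while (true) {                               // partition; duplicates not skipped
          while (A[i] < pivot) ++i;
          while (j > left && A[j] > pivot) --j;
          if (i >= j) break;
          std::swap(A[i++], A[j--]);
      }
      std::swap(A[i], A[right]);
      quickSort(A, left, i - 1);
      quickSort(A, i + 1, right);
  }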

Analysis of Quick Sort
- Let m be the number of elements sent to the left partition
- Compute the running time T(N) for an array of size N:
    T(0) = T(1) = O(1)
    T(N) = T(m) + T(N – m – 1) + O(N)
  where pivot selection takes constant time, T(m) is the time to sort the left partition, T(N – m – 1) is the time to sort the right partition, and linear time is spent in the partition step

Analysis of Quick Sort (cont’d)
Recurrence formula: T(N) = T(m) + T(N – m – 1) + O(N)
Worst-case analysis: the pivot is the smallest element (m = 0)
  T(N) = T(0) + T(N – 1) + O(N)
       = T(N – 1) + O(N)                        since T(0) = O(1)
       = T(N – 2) + O(N – 1) + O(N)             since T(N – 1) = T(N – 2) + O(N – 1)
       = T(N – 3) + O(N – 2) + O(N – 1) + O(N)
       = …
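
Continuing the expansion down to T(1), the terms telescope into an arithmetic series (worked completion, in LaTeX):

  T(N) = T(1) + \sum_{k=2}^{N} O(k)
       = O\!\left(\sum_{k=1}^{N} k\right)
       = O\!\left(\frac{N(N+1)}{2}\right)
       = O(N^2)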

Analysis of Quick Sort (cont’d)
Recurrence formula: T(N) = T(m) + T(N – m – 1) + O(N)
Best-case analysis: the pivot lands in the middle (m = N/2)
  T(N) = T(N/2) + T(N/2) + O(N)
       = 2T(N/2) + O(N)
       = O(N log N)
Average-case analysis: assuming each partition size is equally likely, the result is again O(N log N)

Comparison of Sorting Algorithms
  Algorithm        Best           Average        Worst           In-place  Stable
  Insertion sort   O(N)           O(N²)          O(N²)           yes       yes
  Selection sort   O(N²)          O(N²)          O(N²)           yes       no
  Shell sort       Θ(N log₂ N)    Θ(N^(7/6))     Θ(N^(3/2))*     yes       no
  Heap sort        Θ(N log₂ N)    Θ(N log₂ N)    Θ(N log₂ N)     yes       no
  Merge sort       Θ(N log₂ N)    Θ(N log₂ N)    Θ(N log₂ N)     no        yes
  Quick sort       O(N log₂ N)    O(N log₂ N)    O(N²)           yes       no
  * with Hibbard’s increments; Θ(N²) with Shell’s increments

Comparison of Sorting Algorithms (cont’d)
Good sorting applets:
- http://www.sorting-algorithms.com
- http://math.hws.edu/TMCM/java/xSortLab/

Lower Bound on Sorting
- The best worst-case sorting algorithm so far is O(N log N)
- Can we do better? Can we prove a lower bound on the sorting problem?
- Preview: for comparison-based sorting, we can’t do better; we can show a lower bound of Ω(N log N)

Decision Trees
- A decision tree is a binary tree
- Each node represents a set of possible orderings of the array elements
- Each branch represents an outcome of a particular comparison
- Each leaf represents a particular ordering of the original array elements

Decision Trees (cont’d) Decision tree for sorting three elements

Decision Trees (cont’d)
- The logic of every sorting algorithm that uses comparisons can be represented by a decision tree
- In the worst case, the number of comparisons used by the algorithm equals the depth of the deepest leaf
- In the average case, the number of comparisons is the average of the depths of all leaves
- There are N! different orderings of N elements, so the decision tree must have at least N! leaves

Lower Bound for Comparison-based Sorting
- Lemma 7.1: A binary tree of depth d has at most 2^d leaves
- Lemma 7.2: A binary tree with L leaves must have depth at least ⌈log L⌉
- Theorem 7.6: Any comparison-based sort requires at least ⌈log(N!)⌉ comparisons in the worst case
- Theorem 7.7: Any comparison-based sort requires Ω(N log N) comparisons
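
Theorem 7.7 follows from Theorem 7.6 by bounding log(N!) from below (worked step, in LaTeX):

  \log(N!) = \sum_{k=1}^{N} \log k
           \ge \sum_{k=\lceil N/2 \rceil}^{N} \log\frac{N}{2}
           \ge \frac{N}{2} \log\frac{N}{2}
           = \Omega(N \log N)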

Linear Sorting
- Some constraints on the input array allow sorting faster than Ω(N log N) (no comparisons)
- Counting Sort¹:
  - Given array A of N integer elements, each less than M
  - Create array C of size M, where C[i] is the number of i’s in A
  - Use C to place the elements into a new sorted array B
  - Running time Θ(N + M) = Θ(N) if M = Θ(N)
¹ Weiss incorrectly calls this Bucket Sort.
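
A C++ sketch of counting sort as described (integer keys only; a stable, record-carrying variant would take prefix sums over C instead):

  #include <vector>

  // A holds N integers, each in [0, M).
  std::vector<int> countingSort(const std::vector<int>& A, int M) {
      std::vector<int> C(M, 0);
      for (int x : A) ++C[x];              // C[i] = number of i's in A
      std::vector<int> B;
      B.reserve(A.size());
      for (int i = 0; i < M; ++i)
          B.insert(B.end(), C[i], i);      // emit value i, C[i] times
      return B;
  }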

Linear Sorting (cont’d)
- Bucket Sort:
  - Assume the N elements of A are uniformly distributed over the range [0, 1)
  - Create N equal-size buckets over [0, 1)
  - Add each element of A into the appropriate bucket
  - Sort each bucket (e.g. with insertion sort)
  - Return the concatenation of the buckets
- Average-case running time Θ(N), assuming each bucket contains Θ(1) elements
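
A C++ sketch under the stated uniformity assumption (std::sort stands in for the per-bucket insertion sort):

  #include <algorithm>
  #include <vector>

  void bucketSort(std::vector<double>& A) {     // elements assumed in [0, 1)
      std::size_t n = A.size();
      if (n == 0) return;
      std::vector<std::vector<double>> buckets(n);
      for (double x : A)
          buckets[static_cast<std::size_t>(x * n)].push_back(x);
      A.clear();
      for (auto& b : buckets) {
          std::sort(b.begin(), b.end());        // expected O(1) elements each
          A.insert(A.end(), b.begin(), b.end());
      }
  }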

External Sorting
- What if the number of elements N we wish to sort does not fit in main memory?
- Our existing sorting algorithms then become inefficient: each comparison potentially requires a disk access
- Once again, we want to minimize disk accesses

External Merge Sort
- N = number of elements in array A to be sorted
- M = number of elements that fit in main memory
- K = ⌈N / M⌉
Approach:
1. Read M elements of A into memory, sort them using quick sort, and write the sorted run back to disk: O(M log M)
2. Repeat step 1 K times until all of A is processed
3. Create K input buffers and 1 output buffer, each of size M / (K + 1)
4. Perform a K-way merge: O(N)
   - Refill the input buffers one disk page at a time
   - Write the output buffer one disk page at a time
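
The heart of step 4 is the K-way merge; an in-memory C++ sketch using a min-heap of (value, run, position) entries (real external sorting would refill each run one disk page at a time):

  #include <functional>
  #include <queue>
  #include <utility>
  #include <vector>

  std::vector<int> kWayMerge(const std::vector<std::vector<int>>& runs) {
      // (value, (run index, position within run)); ordered by value.
      using Entry = std::pair<int, std::pair<std::size_t, std::size_t>>;
      std::priority_queue<Entry, std::vector<Entry>, std::greater<Entry>> pq;
      for (std::size_t r = 0; r < runs.size(); ++r)
          if (!runs[r].empty()) pq.push({runs[r][0], {r, 0}});
      std::vector<int> out;
      while (!pq.empty()) {
          auto [v, idx] = pq.top();
          pq.pop();
          out.push_back(v);
          auto [r, p] = idx;
          if (p + 1 < runs[r].size())              // advance within that run
              pq.push({runs[r][p + 1], {r, p + 1}});
      }
      return out;
  }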

Multiway Merge (3-way) Example
- Input on tape Ta1: 81 94 11 96 12 35 17 99 28 58 41 75 15
- After constructing sorted runs of length 3 and distributing them across tapes Tb1, Tb2, Tb3:
    Tb1: 11 81 94 | 41 58 75
    Tb2: 12 35 96 | 15
    Tb3: 17 28 99
- After the first 3-way merge pass:
    Ta1: 11 12 17 28 35 81 94 96 99
    Ta2: 15 41 58 75
- After the second 3-way merge pass:
    Tb1: 11 12 15 17 28 35 41 58 75 81 94 96 99

Sorting: Summary
- The need for sorting is ubiquitous in software
- Optimizing the sorting algorithm to the domain is essential
- Good general-purpose algorithms are available: quick sort
- Optimizations continue…
- Sort benchmark: http://www.hpl.hp.com/hosted/sortbenchmark