Course notes CS2606: Data Structures and Object-Oriented Development Chapter 7: Sorting Department of Computer Science Virginia Tech Spring 2008 (The following notes were derived from Cliff Shaffer’s textbook and notes)

Algorithm Analysis (Review) See Chapter 3 of the Shaffer text. The running time of an algorithm is a function T(n), where n typically represents the input size. T(n) measures the amount of time needed to carry out the basic operations. Example: T(n) = c1·n + c2 for an algorithm involving a single for-loop.
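As an illustration of such a single-loop algorithm (the function and names are ours, not from the slides), consider summing an array:

// Illustrative only: a single for-loop whose running time is T(n) = c1*n + c2,
// where c1 is the cost of one iteration and c2 is the fixed setup cost.
long sumArray(const int A[], int n) {
  long total = 0;              // constant-time setup (part of c2)
  for (int i = 0; i < n; i++)  // loop body executes n times
    total += A[i];             // constant work per iteration (c1)
  return total;
}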

Big-Oh Notation Big-Oh notation allows one to make a statement regarding an upper bound for the running time of an algorithm (up to a constant factor). Single for-loop example: T(n) is in O(n). Big-Oh describes how “bad” things can get (the tightest upper bound is what is usually sought).

Best, Worst, Average Cases Not all inputs of a given size take the same time to run. Sequential search for K in an array of n integers: begin at the first element in the array and look at each element in turn until K is found. Best case: 1 comparison (K is the first element). Worst case: n comparisons (K is last, or not present). Average case: (n+1)/2 comparisons for a successful search.
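A minimal C++ sketch of the sequential search being analyzed (the function name is ours):

// Sequential search: returns the index of K in A[0..n-1], or -1 if absent.
// Best case: 1 comparison; worst case: n; average successful search: (n+1)/2.
int seqSearch(const int A[], int n, int K) {
  for (int i = 0; i < n; i++)
    if (A[i] == K) return i;  // found after i+1 comparisons
  return -1;                  // K not present: n comparisons
}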

Which Analysis to Use? While average time appears to be the fairest measure, it may be difficult to determine. When is the worst case time important?

A Common Misunderstanding “The best case for my algorithm is n=1 because that is the fastest.” WRONG! Big-Oh refers to a growth rate as n grows to infinity. Best case is defined as the cheapest input of size n among all inputs of size n.

Sorting Sorting: arranging a set of records in increasing (non-decreasing) order. The sorting problem defined: given a set of records r1, r2, …, rn with key values k1, k2, …, kn, arrange the records into an ordering s such that the records rs1, rs2, …, rsn have keys obeying the property ks1 <= ks2 <= … <= ksn.

Sorting Each record contains a field called the key. –The keys can be placed in a linear order, defined by comparison. Measures of cost (basic operations): –Comparisons –Swaps

Comparator Class How do we generalize comparison? Use ==, <=, >=: Disastrous Overload ==, <=, >=: Disastrous Define a function with a standard name –Implied obligation –Breaks down with multiple key fields/indices for the same object Pass in a function (Strategy design pattern) –Explicit obligation –Function parameter –Template parameter

Comparator Example
// Compare two ints
class intintCompare {
public:
  static bool lt(int x, int y) { return x < y; }
  static bool eq(int x, int y) { return x == y; }
  static bool gt(int x, int y) { return x > y; }
};

// Compare two character strings
class CCCompare {
public:
  static bool lt(char* x, char* y) { return strcmp(x, y) < 0; }
  static bool eq(char* x, char* y) { return strcmp(x, y) == 0; }
  static bool gt(char* x, char* y) { return strcmp(x, y) > 0; }
};

Getting Keys: Payroll Entry
class Payroll {
private:
  int ID;
  char* name;
  char* address;
public:
  int getID() { return ID; }
  char* getname() { return name; }
  char* getaddr() { return address; }
};

Payroll getKey Functions
class getID {    // Use the ID field as a key
public:
  static int key(Payroll* e) { return e->getID(); }
};
class getname {  // Use the name field as a key
public:
  static char* key(Payroll* e) { return e->getname(); }
};
class getaddr {  // Use the addr field as a key
public:
  static char* key(Payroll* e) { return e->getaddr(); }
};

Dictionary ADT
// The Dictionary abstract class. Class Compare compares
// two keys. Class getKey gets a key from an element.
template <class Key, class Elem, class Compare, class getKey>
class Dictionary {
public:
  virtual void clear() = 0;
  virtual bool insert(const Elem&) = 0;
  virtual bool remove(const Key&, Elem&) = 0;
  virtual bool removeAny(Elem&) = 0;
  virtual bool find(const Key&, Elem&) const = 0;
  virtual int size() = 0;
};

Using the Comparator
int main(int argc, char** argv) {
  UALdict<int, Payroll*, intintCompare, getID> IDdict;
  UALdict<char*, Payroll*, CCCompare, getname> namedict;
  Payroll *foo1, *foo2, *findfoo1, *findfoo2;
  foo1 = new Payroll(5, "Joe", "Anytown");
  foo2 = new Payroll(10, "John", "Mytown");
  IDdict.insert(foo1);
  IDdict.insert(foo2);
  namedict.insert(foo1);
  namedict.insert(foo2);
  if (IDdict.find(5, findfoo1)) cout << findfoo1;
  if (namedict.find("John", findfoo2)) cout << findfoo2;
}

Sorting Algorithms Insertion Sort Bubble Sort Selection Sort Shell Sort Quick Sort Merge Sort Heap Sort Bin Sort Radix Sort

Insertion Sort (1)

Insertion Sort (2)
template <class Elem, class Comp>
void inssort(Elem A[], int n) {
  for (int i=1; i<n; i++)
    for (int j=i; (j>0) && (Comp::lt(A[j], A[j-1])); j--)
      swap(A, j, j-1);
}
Best Case: 0 swaps, n-1 comparisons. Worst Case: n²/2 swaps and comparisons. Average Case: n²/4 swaps and comparisons.
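The worst-case figure comes from summing the inner-loop work over all values of i (a standard step, not spelled out on the slide):

$$\sum_{i=1}^{n-1} i = \frac{n(n-1)}{2} \approx \frac{n^2}{2}.$$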

Bubble Sort (1)

Bubble Sort (2)
template <class Elem, class Comp>
void bubsort(Elem A[], int n) {
  for (int i=0; i<n-1; i++)
    for (int j=n-1; j>i; j--)
      if (Comp::lt(A[j], A[j-1]))
        swap(A, j, j-1);
}
Best Case: 0 swaps, n²/2 comparisons. Worst Case: n²/2 swaps and comparisons. Average Case: n²/4 swaps, n²/2 comparisons.

Selection Sort (1)

Selection Sort (2)
template <class Elem, class Comp>
void selsort(Elem A[], int n) {
  for (int i=0; i<n-1; i++) {      // Select the i'th smallest
    int lowindex = i;              // Remember its index
    for (int j=n-1; j>i; j--)      // Find the least
      if (Comp::lt(A[j], A[lowindex]))
        lowindex = j;
    swap(A, i, lowindex);          // Put it in place
  }
}
Best Case: 0 swaps (n-1 as written), n²/2 comparisons. Worst Case: n-1 swaps, n²/2 comparisons. Average Case: O(n) swaps and n²/2 comparisons.

Pointer Swapping

Summary
                 Insertion   Bubble    Selection
Comparisons:
  Best Case      Θ(n)        Θ(n²)     Θ(n²)
  Average Case   Θ(n²)       Θ(n²)     Θ(n²)
  Worst Case     Θ(n²)       Θ(n²)     Θ(n²)
Swaps:
  Best Case      0           0         Θ(n)
  Average Case   Θ(n²)       Θ(n²)     Θ(n)
  Worst Case     Θ(n²)       Θ(n²)     Θ(n)

Exchange Sorting All of the sorts so far rely on exchanges of adjacent records. What is the average number of exchanges required? –There are n! permutations –Consider a permutation X and its reverse, X’ –Together, X and X’ contain n(n-1)/2 inversions, each requiring one exchange, so on average a permutation needs n(n-1)/4 = Θ(n²) exchanges.
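Writing out the averaging step implicit in that argument:

$$\mathrm{inv}(X) + \mathrm{inv}(X') = \binom{n}{2} = \frac{n(n-1)}{2}
\;\;\Longrightarrow\;\;
\text{average exchanges} \ge \frac{n(n-1)}{4} = \Theta(n^2).$$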

Shellsort

// Modified version of Insertion Sort
template <class Elem, class Comp>
void inssort2(Elem A[], int n, int incr) {
  for (int i=incr; i<n; i+=incr)
    for (int j=i; (j>=incr) && (Comp::lt(A[j], A[j-incr])); j-=incr)
      swap(A, j, j-incr);
}

template <class Elem, class Comp>
void shellsort(Elem A[], int n) {      // Shellsort
  for (int i=n/2; i>2; i/=2)           // For each increment
    for (int j=0; j<i; j++)            // Sort sublists
      inssort2<Elem,Comp>(&A[j], n-j, i);
  inssort2<Elem,Comp>(A, n, 1);
}

Quicksort
template <class Elem, class Comp>
void qsort(Elem A[], int i, int j) {
  if (j <= i) return;                   // List too small
  int pivotindex = findpivot(A, i, j);
  swap(A, pivotindex, j);               // Put pivot at end
  // k will be first position on right side
  int k = partition<Elem,Comp>(A, i-1, j, A[j]);
  swap(A, k, j);                        // Put pivot in place
  qsort<Elem,Comp>(A, i, k-1);
  qsort<Elem,Comp>(A, k+1, j);
}

template <class Elem>
int findpivot(Elem A[], int i, int j) { return (i+j)/2; }

Quicksort Partition
template <class Elem, class Comp>
int partition(Elem A[], int l, int r, Elem& pivot) {
  do {                        // Move bounds inward until they meet
    while (Comp::lt(A[++l], pivot));
    while ((r != 0) && Comp::gt(A[--r], pivot));
    swap(A, l, r);            // Swap out-of-place values
  } while (l < r);            // Stop when they cross
  swap(A, l, r);              // Reverse last swap
  return l;                   // Return first pos on right
}
The cost for partition is Θ(n).

Partition Example

Quicksort Example

Cost of Quicksort Best case: always partition in half, giving Θ(n log n). Worst case: bad partition (one side nearly empty each time), giving Θ(n²). Average case: T(n) = n + (1/(n-1)) Σ_{k=1}^{n-1} [T(k) + T(n-k)], which solves to Θ(n log n). Optimizations for Quicksort: –Better pivot (e.g., median-of-three; see the sketch below) –Better algorithm for small sublists –Eliminate recursion
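A minimal sketch of the median-of-three idea behind the “better pivot” optimization, written against the course's Comp comparator convention; the function name and details are ours, not the course code:

// Illustrative median-of-three pivot selection (not the course's findpivot).
// Returns the index of the median of A[i], A[mid], and A[j], so that sorted
// or reverse-sorted input no longer produces quicksort's worst-case splits.
template <class Elem, class Comp>
int findpivot3(Elem A[], int i, int j) {
  int mid = (i + j) / 2;
  int a = i, b = mid, c = j;
  if (Comp::lt(A[b], A[a])) { int t = a; a = b; b = t; }  // ensure A[a] <= A[b]
  if (Comp::lt(A[c], A[b])) b = c;  // b now indexes the smaller of the old A[b], A[c]
  if (Comp::lt(A[b], A[a])) b = a;  // median is the larger of A[a], A[b]
  return b;
}

Swapping findpivot3 for findpivot in qsort requires no other changes, since the pivot is moved to position j before partitioning.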

Mergesort
List mergesort(List inlist) {
  if (inlist.length() <= 1) return inlist;
  List l1 = half of the items from inlist;
  List l2 = other half of the items from inlist;
  return merge(mergesort(l1), mergesort(l2));
}

Mergesort Implementation
template <class Elem, class Comp>
void mergesort(Elem A[], Elem temp[], int left, int right) {
  if (left == right) return;
  int mid = (left+right)/2;
  mergesort<Elem,Comp>(A, temp, left, mid);
  mergesort<Elem,Comp>(A, temp, mid+1, right);
  for (int i=left; i<=right; i++)   // Copy
    temp[i] = A[i];
  int i1 = left; int i2 = mid + 1;
  for (int curr=left; curr<=right; curr++) {
    if (i1 == mid+1)                // Left exhausted
      A[curr] = temp[i2++];
    else if (i2 > right)            // Right exhausted
      A[curr] = temp[i1++];
    else if (Comp::lt(temp[i1], temp[i2]))
      A[curr] = temp[i1++];
    else A[curr] = temp[i2++];
  }
}

Optimized Mergesort
template <class Elem, class Comp>
void mergesort(Elem A[], Elem temp[], int left, int right) {
  if ((right-left) <= THRESHOLD) {
    inssort<Elem,Comp>(&A[left], right-left+1);
    return;
  }
  int i, j, k, mid = (left+right)/2;
  mergesort<Elem,Comp>(A, temp, left, mid);
  mergesort<Elem,Comp>(A, temp, mid+1, right);
  // Copy left sublist, then right sublist in reverse order
  for (i=mid; i>=left; i--) temp[i] = A[i];
  for (j=1; j<=right-mid; j++) temp[right-j+1] = A[j+mid];
  // Merge sublists back to A
  for (i=left, j=right, k=left; k<=right; k++)
    if (Comp::lt(temp[i], temp[j])) A[k] = temp[i++];
    else A[k] = temp[j--];
}

Mergesort Cost Mergesort cost: Θ(n log n) in the best, average, and worst cases. Mergesort is also good for sorting linked lists.
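For completeness, the standard divide-and-conquer recurrence behind that cost (a routine derivation, not shown on the slide):

$$T(n) = 2\,T(n/2) + cn,\qquad T(1) = c_0 \;\;\Longrightarrow\;\; T(n) = cn\log_2 n + c_0 n = \Theta(n\log n).$$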

Sorting using a heap Heap (max-heap): a complete binary tree with the property that every node stores a value greater than or equal to the values of both its children. A max-heap implements a priority queue. Most common implementation of a heap: the array implementation (see Section 5.5 of the text).

Array Implementation (1) [Figure: a complete binary tree stored in an array, with a table listing each node's Position, Parent, Left Child, Right Child, Left Sibling, and Right Sibling]

Array Implementation (2) For the node in array position r (0-based, n nodes total): Parent(r) = (r-1)/2 if r > 0. Leftchild(r) = 2r+1 if 2r+1 < n. Rightchild(r) = 2r+2 if 2r+2 < n. Leftsibling(r) = r-1 if r is even and r > 0. Rightsibling(r) = r+1 if r is odd and r+1 < n.
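A minimal C++ rendering of these index formulas (the helper names are ours); -1 signals that the requested relative does not exist:

// Index arithmetic for a complete binary tree stored in an array of n elements.
inline int parent(int r)              { return (r > 0) ? (r - 1) / 2 : -1; }
inline int leftchild(int r, int n)    { return (2*r + 1 < n) ? 2*r + 1 : -1; }
inline int rightchild(int r, int n)   { return (2*r + 2 < n) ? 2*r + 2 : -1; }
inline int leftsibling(int r)         { return (r % 2 == 0 && r > 0) ? r - 1 : -1; }
inline int rightsibling(int r, int n) { return (r % 2 == 1 && r + 1 < n) ? r + 1 : -1; }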

Heapsort
template <class Elem, class Comp>
void heapsort(Elem A[], int n) {  // Heapsort
  Elem mval;
  maxheap<Elem,Comp> H(A, n, n);
  for (int i=0; i<n; i++)         // Now sort
    H.removemax(mval);            // Put max at end
}
Use a max-heap, so that elements end up sorted within the array. Cost of heapsort: Θ(n log n). Cost of finding the K largest elements: Θ(n + K log n), since the heap is built in Θ(n) and each removemax costs O(log n) (a sketch follows).
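The K-largest cost can be illustrated with the standard library's heap routines instead of the course's maxheap class; this sketch (names and setup are ours) builds the heap in Θ(n) and then pops K times at O(log n) each:

#include <algorithm>
#include <vector>

// Return the K largest values of A (K <= A.size()) in decreasing order.
// make_heap is Theta(n); each pop_heap is O(log n): Theta(n + K log n) total.
std::vector<int> kLargest(std::vector<int> A, int K) {
  std::make_heap(A.begin(), A.end());        // build a max-heap in place
  std::vector<int> out;
  for (int i = 0; i < K; i++) {
    std::pop_heap(A.begin(), A.end() - i);   // move current max to the back
    out.push_back(A[A.size() - 1 - i]);
  }
  return out;
}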

Heapsort Example (1)

Heapsort Example (2)

Binsort (1) A simple, efficient sort: for (i=0; i<n; i++) B[A[i]] = A[i]; Ways to generalize: –Make each bin the head of a list. –Allow more keys than records.

Binsort (2)
template <class Elem>
void binsort(Elem A[], int n) {
  List<Elem> B[MaxKeyValue];
  Elem item;
  for (int i=0; i<n; i++) B[A[i]].append(A[i]);
  for (int i=0; i<MaxKeyValue; i++)
    for (B[i].setStart(); B[i].getValue(item); B[i].next())
      output(item);
}
Cost: Θ(n + MaxKeyValue).

Radix Sort (1)

Radix Sort (2)
template <class Elem, class Comp>
void radix(Elem A[], Elem B[], int n, int k, int r, int cnt[]) {
  // cnt[i] stores # of records in bin[i]
  int j;
  for (int i=0, rtok=1; i<k; i++, rtok*=r) {
    for (j=0; j<r; j++) cnt[j] = 0;
    // Count # of records for each bin
    for (j=0; j<n; j++) cnt[(A[j]/rtok)%r]++;
    // cnt[j] will be the last slot of bin j
    for (j=1; j<r; j++) cnt[j] = cnt[j-1] + cnt[j];
    for (j=n-1; j>=0; j--) B[--cnt[(A[j]/rtok)%r]] = A[j];
    for (j=0; j<n; j++) A[j] = B[j];
  }
}

Radix Sort Example

Radix Sort Cost Cost: Θ(nk + rk). How do n, k, and r relate? If the key range is small, then this can be Θ(n). If there are n distinct keys, then the length of a key must be at least log n. –Thus, Radix Sort is Ω(n log n) in the general case.
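The arithmetic behind that last claim, with the radix r treated as a constant (standard reasoning, not on the slide):

$$k \ge \log_r n \;\;\Longrightarrow\;\; \Theta(nk + rk) = \Omega(n\log_r n) = \Omega(n\log n).$$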

Empirical Comparison (1)

Empirical Comparison (2)

Big-Omega Definition: For T(n) a non-negatively valued function, T(n) is in the set Ω(g(n)) if there exist two positive constants c and n0 such that T(n) >= c·g(n) for all n > n0. Meaning: For all data sets big enough (i.e., n > n0), the algorithm always executes in more than c·g(n) steps. Lower bound.

Big-Omega Example T(n) = c1·n² + c2·n. c1·n² + c2·n >= c1·n² for all n > 1. T(n) >= c·n² for c = c1 and n0 = 1. Therefore, T(n) is in Ω(n²) by the definition. We want the greatest lower bound.

Theta Notation When big-Oh and Ω meet, we indicate this by using Θ (big-Theta) notation. Definition: An algorithm is said to be Θ(h(n)) if it is in O(h(n)) and it is in Ω(h(n)).

A Common Misunderstanding Confusing worst case with upper bound. Upper bound refers to a growth rate. Worst case refers to the worst input from among the choices for possible inputs of a given size.

Analyzing Problems Upper bound: Upper bound of best known algorithm. Lower bound: Lower bound for every possible algorithm.

Analyzing Problems: Example Common misunderstanding: no distinction between upper/lower bound when you know the exact running time. Example of imperfect knowledge: Sorting 1. Cost of I/O: Ω(n). 2. Bubble or insertion sort: O(n²). 3. A better sort (Quicksort, Mergesort, Heapsort, etc.): O(n log n). 4. We prove next that sorting is Ω(n log n).

Problems and Algorithms Problem: a task to be performed. –Best thought of as inputs and matching outputs Algorithm: a method or a process followed to solve a problem. –Takes an input to the problem and transforms it to the corresponding output A problem can have many algorithms, e.g., Sorting → bubble sort, quicksort, others…

Sorting Lower Bound We would like to know a lower bound for all possible sorting algorithms. Sorting is O(n log n) (average, worst cases) because we know of algorithms with this upper bound. Sorting I/O takes Ω(n) time. We will now prove an Ω(n log n) lower bound for sorting.

Decision Trees

Lower Bound Proof There are n! permutations. A sorting algorithm can be viewed as determining which permutation has been input. Each leaf node of the decision tree corresponds to one permutation. A tree with n nodes has Ω(log n) levels, so the tree with n! leaves has Ω(log n!) = Ω(n log n) levels. Which node in the decision tree corresponds to the worst case?
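The step Ω(log n!) = Ω(n log n) can be justified without Stirling's formula (a standard argument, not spelled out on the slide): the largest n/2 factors of n! are each at least n/2, so

$$\log(n!) \;\ge\; \log\!\left(\left(\tfrac{n}{2}\right)^{n/2}\right) \;=\; \tfrac{n}{2}\,\log\tfrac{n}{2} \;=\; \Omega(n\log n).$$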