May 26th –non-comparison sorting

Slides:



Advertisements
Similar presentations
Sorting in Linear Time Introduction to Algorithms Sorting in Linear Time CSE 680 Prof. Roger Crawfis.
Advertisements

Analysis of Algorithms
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2010.
Quick Sort, Shell Sort, Counting Sort, Radix Sort AND Bucket Sort
CSCE 3110 Data Structures & Algorithm Analysis
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Tyler Robison Summer
Quicksort.
CSE 326: Data Structures Sorting Ben Lerner Summer 2007.
Sorting in Linear Time Lower bound for comparison-based sorting
CSE 373 Data Structures Lecture 15
CSE373: Data Structure & Algorithms Lecture 20: More Sorting Lauren Milne Summer 2015.
HKOI 2006 Intermediate Training Searching and Sorting 1/4/2006.
CSC 41/513: Intro to Algorithms Linear-Time Sorting Algorithms.
CSE373: Data Structures & Algorithms Lecture 20: Beyond Comparison Sorting Dan Grossman Fall 2013.
Introduction to Algorithms Jiafen Liu Sept
CSE332: Data Abstractions Lecture 14: Beyond Comparison Sorting Dan Grossman Spring 2012.
CS 61B Data Structures and Programming Methodology July 28, 2008 David Sun.
1 Joe Meehean.  Problem arrange comparable items in list into sorted order  Most sorting algorithms involve comparing item values  We assume items.
CS 361 – Chapters 8-9 Sorting algorithms –Selection, insertion, bubble, “swap” –Merge, quick, stooge –Counting, bucket, radix How to select the n-th largest/smallest.
CSE373: Data Structure & Algorithms Lecture 22: More Sorting Linda Shapiro Winter 2015.
CSE373: Data Structure & Algorithms Lecture 21: More Sorting and Other Classes of Algorithms Lauren Milne Summer 2015.
CSE373: Data Structure & Algorithms Lecture 22: More Sorting Nicki Dell Spring 2014.
Sorting Lower Bounds n Beating Them. Recap Divide and Conquer –Know how to break a problem into smaller problems, such that –Given a solution to the smaller.
CS6045: Advanced Algorithms Sorting Algorithms. Sorting So Far Insertion sort: –Easy to code –Fast on small inputs (less than ~50 elements) –Fast on nearly-sorted.
FURQAN MAJEED ALGORITHMS. A computer algorithm is a detailed step-by-step method for solving a problem by using a computer. An algorithm is a sequence.
Advanced Sorting 7 2  9 4   2   4   7
CSE373: Data Structures & Algorithms
Lower Bounds & Sorting in Linear Time
Chapter 11 Sorting Acknowledgement: These slides are adapted from slides provided with Data Structures and Algorithms in C++, Goodrich, Tamassia and Mount.
Sorting.
Advanced Algorithms Analysis and Design
CSC317 Selection problem q p r Randomized‐Select(A,p,r,i)
Week 9 - Monday CS 113.
CSE373: Data Structure & Algorithms Lecture 23: More Sorting and Other Classes of Algorithms Catie Baker Spring 2015.
CSCI 104 Sorting Algorithms
May 17th – Comparison Sorts
CSE373: Data Structures & Algorithms
Introduction to Algorithms
Lecture 22 Complexity and Reductions
Cse 373 April 12th – TreEs.
Week 12 - Wednesday CS221.
Algorithms Furqan Majeed.
Quicksort 1.
Algorithm Design and Analysis (ADA)
Instructor: Lilian de Greef Quarter: Summer 2017
Quick-Sort 11/14/2018 2:17 PM Chapter 4: Sorting    7 9
Linear Sorting Sections 10.4
Quick-Sort 11/19/ :46 AM Chapter 4: Sorting    7 9
Objective of This Course
CSE373: Data Structure & Algorithms Lecture 22: More Sorting
Bin Sort, Radix Sort, Sparse Arrays, and Stack-based Depth-First Search CSE 373, Copyright S. Tanimoto, 2002 Bin Sort, Radix.
8/04/2009 Many thanks to David Sun for some of the included slides!
Lower Bounds & Sorting in Linear Time
CSE373: Data Structure & Algorithms Lecture 21: Comparison Sorting
Linear-Time Sorting Algorithms
Quick-Sort 2/23/2019 1:48 AM Chapter 4: Sorting    7 9
Bin Sort, Radix Sort, Sparse Arrays, and Stack-based Depth-First Search CSE 373, Copyright S. Tanimoto, 2001 Bin Sort, Radix.
Analysis of Algorithms
Quicksort.
Analysis of Algorithms
CSE 373 Data Structures and Algorithms
Analysis of Algorithms
CSE 326: Data Structures: Sorting
The Selection Problem.
CSE 332: Sorting II Spring 2016.
CSE 326: Data Structures Lecture #14
Analysis of Algorithms
Algorithm Course Algorithms Lecture 3 Sorting Algorithm-1
Quicksort.
Lecture 22 Complexity and Reductions
Presentation transcript:

May 26th –non-comparison sorting Cse 373 May 26th –non-comparison sorting

Assorted Minutiae HW6 Out – Due next Wednesday No Java Libraries

Assorted Minutiae HW6 Out – Due next Wednesday No Java Libraries Two exam review sessions Wednesday: 1:00 – 2:20 – CMU 120 Friday: 4:30 – 6:20 – EEB 105

Today Non-comparison sorts

Sorting “Slow” sorts

Sorting “Slow” sorts Insertion Selection

Sorting “Slow” sorts Insertion Selection “Fast” sorts

Sorting “Slow” sorts Insertion Selection “Fast” sorts Quick Merge Heap

Sorting “Slow” sorts Insertion Selection “Fast” sorts Quick Merge Heap These are all comparison sorts, can’t do better than O(n log n)

Sorting Non-comparison sorts

Sorting Non-comparison sorts If we know something about the data, we don’t strictly need to compare objects to each other

Sorting Non-comparison sorts If we know something about the data, we don’t strictly need to compare objects to each other If there are only a few possible values and we know what they are, we can just sort by identifying the value

Sorting Non-comparison sorts If we know something about the data, we don’t strictly need to compare objects to each other If there are only a few possible values and we know what they are, we can just sort by identifying the value If the data are strings and ints of finite length, then we can take advantage of their sorted order.

Sorting Two sorting techniques we use to this end

Sorting Two sorting techniques we use to this end Bucket sort

Sorting Two sorting techniques we use to this end Bucket sort Radix sort

Sorting Two sorting techniques we use to this end Bucket sort Radix sort If the data is sufficiently structured, we can get O(n) runtimes

BucketSort If all values to be sorted are known to be integers between 1 and K (or any small range): Create an array of size K Put each element in its proper bucket (a.k.a. bin) If data is only integers, no need to store more than a count of how times that bucket has been used Output result via linear pass through array of buckets count array 1 3 2 4 5 Example: K=5 input (5,1,3,4,3,2,1,1,5,4,5) output: 1,1,1,2,3,3,4,4,5,5,5

Analyzing Bucket Sort Overall: O(n+K) Linear in n, but also linear in K Good when K is smaller (or not much larger) than n We don’t spend time doing comparisons of duplicates Bad when K is much larger than n Wasted space; wasted time during linear O(K) pass For data in addition to integer keys, use list at each bucket

Bucket Sort Most real lists aren’t just keys; we have data Each bucket is a list (say, linked list) To add to a bucket, insert in O(1) (at beginning, or keep pointer to last element) Example: Movie ratings; scale 1-5 Input: 5: Casablanca 3: Harry Potter movies 5: Star Wars Original Trilogy 1: Rocky V count array 1 2 3 4 5 Rocky V Harry Potter Casablanca Star Wars Result: 1: Rocky V, 3: Harry Potter, 5: Casablanca, 5: Star Wars Easy to keep ‘stable’; Casablanca still before Star Wars

Radix sort Radix = “the base of a number system” Examples will use base 10 because we are used to that In implementations use larger numbers For example, for ASCII strings, might use 128 Idea: Bucket sort on one digit at a time Number of buckets = radix Starting with least significant digit Keeping sort stable Do one pass per digit Invariant: After k passes (digits), the last k digits are sorted

Radix Sort Example Radix = 10 Input: 478, 537, 9, 721, 3, 38, 143, 67 3 passes (input is 3 digits at max), on each pass, stable sort the input highlighted in yellow 4 7 8 5 3 7 0 0 9 7 2 1 0 0 3 0 3 8 1 4 3 0 6 7

Analysis Input size: n Number of buckets = Radix: B Number of passes = “Digits”: P Work per pass is 1 bucket sort: O(B+n) Total work is O(P(B+n)) Compared to comparison sorts, sometimes a win, but often not Example: Strings of English letters up to length 15 Run-time proportional to: 15*(52 + n) This is less than n log n only if n > 33,000 Of course, cross-over point depends on constant factors of the implementations

Sorting Takeaways Simple O(n2) sorts can be fastest for small n Selection sort, Insertion sort (latter linear for mostly-sorted) Good for “below a cut-off” to help divide-and-conquer sorts

Sorting Takeaways Simple O(n2) sorts can be fastest for small n Selection sort, Insertion sort (latter linear for mostly-sorted) Good for “below a cut-off” to help divide-and-conquer sorts O(n log n) sorts Heap sort, in-place but not stable nor parallelizable Merge sort, not in place but stable and works as external sort Quick sort, in place but not stable and O(n2) in worst-case Often fastest, but depends on costs of comparisons/copies

Sorting Takeaways Simple O(n2) sorts can be fastest for small n Selection sort, Insertion sort (latter linear for mostly-sorted) Good for “below a cut-off” to help divide-and-conquer sorts O(n log n) sorts Heap sort, in-place but not stable nor parallelizable Merge sort, not in place but stable and works as external sort Quick sort, in place but not stable and O(n2) in worst-case Often fastest, but depends on costs of comparisons/copies  (n log n) is worst-case and average lower-bound for sorting by comparisons

Sorting Takeaways Simple O(n2) sorts can be fastest for small n Selection sort, Insertion sort (latter linear for mostly-sorted) Good for “below a cut-off” to help divide-and-conquer sorts O(n log n) sorts Heap sort, in-place but not stable nor parallelizable Merge sort, not in place but stable and works as external sort Quick sort, in place but not stable and O(n2) in worst-case Often fastest, but depends on costs of comparisons/copies  (n log n) is worst-case and average lower-bound for sorting by comparisons Non-comparison sorts Bucket sort good for small number of possible key values Radix sort uses fewer buckets and more phases Best way to sort? It depends!

Sorting Takeaways Simple O(n2) sorts can be fastest for small n Selection sort, Insertion sort (latter linear for mostly-sorted) Good for “below a cut-off” to help divide-and-conquer sorts O(n log n) sorts Heap sort, in-place but not stable nor parallelizable Merge sort, not in place but stable and works as external sort Quick sort, in place but not stable and O(n2) in worst-case Often fastest, but depends on costs of comparisons/copies  (n log n) is worst-case and average lower-bound for sorting by comparisons Non-comparison sorts Bucket sort good for small number of possible key values Radix sort uses fewer buckets and more phases Best way to sort? It depends!

Algorithm Design Solving well known problems is great, but how can we use these lessons to approach new problems?

Algorithm Design Solving well known problems is great, but how can we use these lessons to approach new problems? Guess and Check

Algorithm Design Solving well known problems is great, but how can we use these lessons to approach new problems? Guess and Check (Brute Force)

Algorithm Design Solving well known problems is great, but how can we use these lessons to approach new problems? Guess and Check (Brute Force) Linear Solving

Algorithm Design Solving well known problems is great, but how can we use these lessons to approach new problems? Guess and Check (Brute Force) Linear Solving Divide and Conquer

Algorithm Design Solving well known problems is great, but how can we use these lessons to approach new problems? Guess and Check (Brute Force) Linear Solving Divide and Conquer Randomization and Approximation

Algorithm Design Solving well known problems is great, but how can we use these lessons to approach new problems? Guess and Check (Brute Force) Linear Solving Divide and Conquer Randomization and Approximation Dynamic Programming

Linear solving Basic linear approach to problem solving

Linear solving Basic linear approach to problem solving If the decider creates a set of correct answers, find one at a time

Linear solving Basic linear approach to problem solving If the decider creates a set of correct answers, find one at a time Selection sort: find the lowest element at each run through Sometimes, the best solution Find the smallest element of an unsorted array

Algorithm Design Which approach should be used comes down to how difficult the problem is

Algorithm Design Which approach should be used comes down to how difficult the problem is How do we describe problem difficulty? P : Set of problems that can be solved in polynomial time

Algorithm Design Which approach should be used comes down to how difficult the problem is How do we describe problem difficulty? P : Set of problems that can be solved in polynomial time NP : Set of problems that can be verified in polynomial time

Algorithm Design Which approach should be used comes down to how difficult the problem is How do we describe problem difficulty? P : Set of problems that can be solved in polynomial time NP : Set of problems that can be verified in polynomial time EXP: Set of problems that can be solved in exponential time

Algorithm Design Some problems are provably difficult

Algorithm Design Some problems are provably difficult Humans haven’t beaten a computer in chess in years, but computers are still far away from “solving” chess

Algorithm Design Some problems are provably difficult Humans haven’t beaten a computer in chess in years, but computers are still far away from “solving” chess At each move, the computer needs to approximate the best move

Algorithm Design Some problems are provably difficult Humans haven’t beaten a computer in chess in years, but computers are still far away from “solving” chess At each move, the computer needs to approximate the best move Certainty always comes at a price

Approximation design What is approximated in the chess game?

Approximation design What is approximated in the chess game? Board quality – If you could easily rank which board layout in order of quality, chess is simply choosing the best board

Approximation design What is approximated in the chess game? Board quality – If you could easily rank which board layout in order of quality, chess is simply choosing the best board It is very difficult, branching factor for chess is ~35

Approximation design What is approximated in the chess game? Board quality – If you could easily rank which board layout in order of quality, chess is simply choosing the best board It is very difficult, branching factor for chess is ~35 Look as many moves into the future as time allows to see which move yields the best outcome

Approximation design Recognize what piece of information is costly and useful for your algorithm

Approximation design Recognize what piece of information is costly and useful for your algorithm Consider if there is a cheap way to estimate that information

Approximation design Recognize what piece of information is costly and useful for your algorithm Consider if there is a cheap way to estimate that information Does your client have a tolerance for error? Can you map this problem to a similar problem? “Greedy” algorithms are often approximators

Randomization design Randomization is also another approach

Randomization design Randomization is also another approach Selecting a random pivot in quicksort gives us more certainty in the runtime

Randomization design Randomization is also another approach Selecting a random pivot in quicksort gives us more certainty in the runtime This doesn’t impact correctness, a randomized quicksort still returns a sorted list

Randomization design Randomization is also another approach Selecting a random pivot in quicksort gives us more certainty in the runtime This doesn’t impact correctness, a randomized quicksort still returns a sorted list Two types of randomized algorithms Las Vegas – correct result in random time

Randomization design Randomization is also another approach Selecting a random pivot in quicksort gives us more certainty in the runtime This doesn’t impact correctness, a randomized quicksort still returns a sorted list Two types of randomized algorithms Las Vegas – correct result in random time Montecarlo – estimated result in deterministic time

Randomization design Can we make a Montecarlo quicksort?

Randomization design Can we make a Montecarlo quicksort? Runs O(n log n) time, but not guaranteed to be correct

Randomization design Can we make a Montecarlo quicksort? Runs O(n log n) time, but not guaranteed to be correct Terminate a random quicksort early!

Randomization design Can we make a Montecarlo quicksort? Runs O(n log n) time, but not guaranteed to be correct Terminate a random quicksort early! If you haven’t gotten the problem in some constrained time, just return what you have.

Randomization design How close is a sort? If we say a list is 90% sorted, what do we mean?

Randomization design How close is a sort? If we say a list is 90% sorted, what do we mean? 90% of elements are smaller than the object to the right of it?

Randomization design How close is a sort? If we say a list is 90% sorted, what do we mean? 90% of elements are smaller than the object to the right of it? The longest sorted subsequence is 90% of the length?

Randomization design How close is a sort? If we say a list is 90% sorted, what do we mean? 90% of elements are smaller than the object to the right of it? The longest sorted subsequence is 90% of the length? Analysis for these problems can be very tricky, but it’s an important approach

Algorithms There aren’t many easy problems left! Understand the tools for problem solving

Algorithms There aren’t many easy problems left! Understand the tools for problem solving Eliminate as many non-feasible solutions as possible

Algorithms There aren’t many easy problems left! Understand the tools for problem solving Eliminate as many non-feasible solutions as possible Understand, that some problems are too difficult for a fast, elegant solution

Algorithms There aren’t many easy problems left! Understand the tools for problem solving Eliminate as many non-feasible solutions as possible Understand, that some problems are too difficult for a fast, elegant solution Academics are great for providing ideas, but sometimes better asymptotic runtimes don’t become apparent until n > 1010

Hardware constraints So far, we’ve taken for granted that memory access in the computer is constant and easily accessible

Hardware constraints So far, we’ve taken for granted that memory access in the computer is constant and easily accessible This isn’t always true!

Hardware constraints So far, we’ve taken for granted that memory access in the computer is constant and easily accessible This isn’t always true! At any given time, some memory might be cheaper and easier to access than others

Hardware constraints So far, we’ve taken for granted that memory access in the computer is constant and easily accessible This isn’t always true! At any given time, some memory might be cheaper and easier to access than others Memory can’t always be accessed easily

Hardware constraints So far, we’ve taken for granted that memory access in the computer is constant and easily accessible This isn’t always true! At any given time, some memory might be cheaper and easier to access than others Memory can’t always be accessed easily Sometimes the OS lies, and says an object is “in memory” when it’s actually on the disk

Hardware constraints Back on 32-bit machines, each program had access to 4GB of memory

Hardware constraints Back on 32-bit machines, each program had access to 4GB of memory This isn’t feasible to provide!

Hardware constraints Back on 32-bit machines, each program had access to 4GB of memory This isn’t feasible to provide! Sometimes there isn’t enough available, and so memory that hasn’t been used in a while gets pushed to the disk

Hardware constraints Back on 32-bit machines, each program had access to 4GB of memory This isn’t feasible to provide! Sometimes there isn’t enough available, and so memory that hasn’t been used in a while gets pushed to the disk Memory that is frequently accessed goes to the cache, which is even faster than RAM

Next week No class on Monday – Happy Memorial Day!

Next week No class on Monday – Happy Memorial Day! Formalize discussion of the “memory mountain” and how this should impact your design decisions