Sorting & Lower Bounds Jeff Edmonds York University COSC 3101 Lecture 5
Assignment 2
Now available on website. DUE: February 13th
CORRECTION: Problem 1: Replace... while x!=0 AND y!=0 AND z!=0 do with... while x!=0 OR y!=0 OR z!=0 do
Review of Sorting Algorithms
Four Recursive Sorts
Size of Sublists | Minimal effort splitting, lots of effort recombining | Lots of effort splitting, minimal effort recombining
n/2, n/2 | Merge Sort | Quick Sort
n-1, 1 | Insertion Sort | Selection Sort
Selection Sort
25, 9, 14, 31, 5, 8, 18   Search array for minimum item
5, 9, 14, 31, 25, 8, 18   Swap with first item
5, 9, 14, 31, 25, 8, 18   Search remaining array for min item
5, 8, 14, 31, 25, 9, 18   Swap with first item
… Loop …
5, 8, 9, 14, 18, 25, 31
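The selection sort steps above can be sketched in Java as follows. This is a minimal illustration, not the lecture's own code; the class name `SelectionSortSketch` and the method `sort` are names assumed here.

```java
class SelectionSortSketch {
    // Sorts a in place: repeatedly search the unsorted suffix for its
    // minimum item and swap it with the first item of that suffix.
    static void sort(int[] a) {
        for (int i = 0; i < a.length - 1; i++) {
            int min = i;
            for (int j = i + 1; j < a.length; j++) {
                if (a[j] < a[min]) {
                    min = j;
                }
            }
            int tmp = a[i];
            a[i] = a[min];
            a[min] = tmp;
        }
    }
}
```

Calling `SelectionSortSketch.sort` on the slide's list 25, 9, 14, 31, 5, 8, 18 leaves it as 5, 8, 9, 14, 18, 25, 31. The two nested loops are where the quadratic running time comes from.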
Insertion Sort
25, 9, 14, 31, 5, 8, 18   Insert 1st item in correct position
9, 25, 14, 31, 5, 8, 18   Insert 2nd item in correct position
9, 14, 25, 31, 5, 8, 18   Insert 3rd item in correct position
… Loop …
5, 8, 9, 14, 18, 25, 31
Selection Sort Average & Worst Time: $\Theta(n^2)$
Insertion Sort Average & Worst Time: $\Theta(n^2)$
Merge Sort Divide and Conquer
Merge Sort
Split Set into Two (no real work)
Get one friend to sort the first half: 25,31,52,88,98
Get one friend to sort the second half: 14,23,30,62,79
Merge Sort
Merge two sorted lists into one
25,31,52,88,98 and 14,23,30,62,79 become 14,23,25,30,31,52,62,79,88,98
Merge Sort Time: $T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n)$
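A minimal Java sketch of merge sort, assuming integer input and returning a new array rather than sorting in place. The class name `MergeSortSketch` and its method names are illustrative assumptions. The two recursive calls are the "friends", and `merge` is where the linear work per level happens.

```java
import java.util.Arrays;

class MergeSortSketch {
    // Split in two (no real work), let each half be sorted recursively,
    // then merge the two sorted halves.
    static int[] sort(int[] a) {
        if (a.length <= 1) {
            return a.clone();
        }
        int mid = a.length / 2;
        int[] left = sort(Arrays.copyOfRange(a, 0, mid));
        int[] right = sort(Arrays.copyOfRange(a, mid, a.length));
        return merge(left, right);
    }

    // Merges two sorted arrays into one sorted array in linear time.
    static int[] merge(int[] x, int[] y) {
        int[] out = new int[x.length + y.length];
        int i = 0, j = 0, k = 0;
        while (i < x.length && j < y.length) {
            out[k++] = (x[i] <= y[j]) ? x[i++] : y[j++];
        }
        while (i < x.length) {
            out[k++] = x[i++];
        }
        while (j < y.length) {
            out[k++] = y[j++];
        }
        return out;
    }
}
```

Merging the slide's two sorted halves, 25,31,52,88,98 and 14,23,30,62,79, with `merge` gives 14,23,25,30,31,52,62,79,88,98.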
Quick Sort Divide and Conquer
Quick Sort
Partition set into two using randomly chosen pivot
(smaller elements) ≤ 52 ≤ (larger elements)
Quick Sort
(smaller elements) ≤ 52 ≤ (larger elements)
Get one friend to sort the first half: 14,23,25,30,31
Get one friend to sort the second half: 62,79,98,88
Quick Sort
14,23,25,30,31   52   62,79,88,98
Glue pieces together. (No real work)
14,23,25,30,31,52,62,79,88,98
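The partition, recurse, and glue steps can be sketched in Java as below. This is one possible in-place version under stated assumptions: it uses a Lomuto-style partition (a choice made here, since the slides do not name a partitioning scheme), and `QuickSortSketch` and its method names are illustrative.

```java
import java.util.Random;

class QuickSortSketch {
    private static final Random RNG = new Random();

    static void sort(int[] a) {
        sort(a, 0, a.length - 1);
    }

    // Sorts a[lo..hi] in place.
    static void sort(int[] a, int lo, int hi) {
        if (lo >= hi) {
            return;
        }
        int p = partition(a, lo, hi);
        sort(a, lo, p - 1);   // friend sorts the elements <= pivot
        sort(a, p + 1, hi);   // friend sorts the elements > pivot
        // Gluing is free: the pieces are already in place around a[p].
    }

    // Picks a random pivot, moves elements <= pivot to its left and the
    // rest to its right, and returns the pivot's final index.
    static int partition(int[] a, int lo, int hi) {
        int r = lo + RNG.nextInt(hi - lo + 1);
        swap(a, r, hi);
        int pivot = a[hi];
        int store = lo;
        for (int i = lo; i < hi; i++) {
            if (a[i] <= pivot) {
                swap(a, i, store++);
            }
        }
        swap(a, store, hi);
        return store;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Because the pivot is chosen at random, no fixed input (such as an already sorted list) is guaranteed to cause unbalanced splits.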
Quick Sort Let pivot be the first element in the list? ≤ 31 ≤ 52
Quick Sort
14,23,25,30,31,52,62,79,88,98
(nothing) ≤ 14 ≤ 23,25,30,31,52,62,79,88,98
If the list is already sorted, then the partition is worst case unbalanced.
Quick Sort
Best Time: $T(n) = 2T(n/2) + \Theta(n) = \Theta(n \log n)$
Worst Time: $T(n) = T(0) + T(n-1) + \Theta(n) = \Theta(n^2)$
Expected Time: $T(n) = T(\tfrac{1}{3}n) + T(\tfrac{2}{3}n) + \Theta(n) = \Theta(n \log n)$
Heaps, Heap Sort, & Priority Queues
Heap Definition
Completely Balanced Binary Tree
The value of each node $\ge$ each of the node's children.
Left or right child could be larger.
Maximum is at root.
Where can 1 go? Where can 8 go? Where can 9 go?
Heap Data Structure Completely Balanced Binary Tree Implemented by an Array
Make Heap Get help from friends
Heapify ? Maximum is at root. Where is the maximum?
Find the maximum. Put it in place ? Repeat Heapify
Heap Running Time: Heapify
Make Heap
Get help from friends
Running time: $T(n) = 2T(n/2) + \log(n) = \Theta(n)$
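A minimal Java sketch of Heapify and the recursive Make Heap, assuming a max-heap stored in a 0-indexed array where the children of index i sit at 2i+1 and 2i+2. The class `MakeHeapSketch` and the helper `isHeap` are hypothetical names added for illustration.

```java
class MakeHeapSketch {
    // Heapify: assuming both subtrees of i are already heaps, sift a[i]
    // down until the subtree rooted at i is a heap. Cost: O(log n).
    static void heapify(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            int largest = i;
            if (left < size && a[left] > a[largest]) {
                largest = left;
            }
            if (right < size && a[right] > a[largest]) {
                largest = right;
            }
            if (largest == i) {
                return;
            }
            int tmp = a[i];
            a[i] = a[largest];
            a[largest] = tmp;
            i = largest;
        }
    }

    // Make Heap with help from friends: one friend heaps the left
    // subtree, another the right subtree, then heapify fixes the root.
    static void makeHeap(int[] a, int i, int size) {
        if (i >= size) {
            return;
        }
        makeHeap(a, 2 * i + 1, size);
        makeHeap(a, 2 * i + 2, size);
        heapify(a, i, size);
    }

    // Checks the heap property: every node is >= each of its children.
    static boolean isHeap(int[] a, int size) {
        for (int i = 1; i < size; i++) {
            if (a[(i - 1) / 2] < a[i]) {
                return false;
            }
        }
        return true;
    }
}
```

The two recursive calls plus one `heapify` mirror the recurrence with two half-size subproblems and a logarithmic combining step.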
Heaps
Running Time: a node at height $i$ (depth $\log(n) - i$) costs $O(i)$ to heapify, and there are $2^{\log(n) - i}$ nodes at that height.
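Reading the running-time fragment above as "$2^{\log(n) - i}$ nodes at height $i$, each costing about $i$ steps" (an interpretation of the slide, not something it states in full), the total work of building the heap bottom-up can be summed as a sketch:

```latex
\[
T(n) \;=\; \sum_{i=1}^{\log n} i \cdot 2^{\log(n) - i}
     \;=\; n \sum_{i=1}^{\log n} \frac{i}{2^{i}}
     \;\le\; n \sum_{i=1}^{\infty} \frac{i}{2^{i}}
     \;=\; 2n .
\]
% The i = 1 term alone contributes n/2, so n/2 <= T(n) <= 2n,
% which gives T(n) = \Theta(n).
```

Most nodes sit near the bottom of the tree, where heapifying is cheap, which is why the total is linear rather than $n \log n$.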
Selection Sort
Largest i values are sorted on side (e.g. 6,7,8,9). Remaining values, all smaller, are off to side.
Max is easier to find if a heap.
Heap Sort Largest i values are sorted on side. Remaining values are in a heap.
Heap Data Structure
[Figure: a heap and the array that stores it]
Heap Sort
Largest i values are sorted on side. Remaining values are in a heap.
Put next value where it belongs.
Heap Sort Running Time: $O(n \log n)$
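A minimal in-place heap sort sketch in Java, assuming a 0-indexed max-heap. The class name `HeapSortSketch` is an assumption for illustration. The array is split exactly as the slides describe: a heap of remaining values at the front, and the largest values sorted at the back.

```java
class HeapSortSketch {
    static void sort(int[] a) {
        int n = a.length;
        // Make heap: heapify every internal node, bottom up.
        for (int i = n / 2 - 1; i >= 0; i--) {
            heapify(a, i, n);
        }
        // Invariant: a[0..end] is a heap, a[end+1..n-1] holds the
        // largest values in sorted order.
        for (int end = n - 1; end > 0; end--) {
            swap(a, 0, end);      // put the maximum where it belongs
            heapify(a, 0, end);   // restore the heap on the remaining values
        }
    }

    // Sifts a[i] down within a[0..size-1]; subtrees of i must be heaps.
    static void heapify(int[] a, int i, int size) {
        while (true) {
            int left = 2 * i + 1;
            int right = 2 * i + 2;
            int largest = i;
            if (left < size && a[left] > a[largest]) {
                largest = left;
            }
            if (right < size && a[right] > a[largest]) {
                largest = right;
            }
            if (largest == i) {
                return;
            }
            swap(a, i, largest);
            i = largest;
        }
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

Each of the n extractions costs one `heapify` of logarithmic depth, which is where the $n \log n$ bound comes from.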
Priority Queues
Maintains dynamic set, S, of elements, each with a key.
Max-priority queue supports: INSERT(S,x), MAXIMUM(S), EXTRACT-MAX(S), INCREASE-KEY(S,x,k)
Application: Schedule jobs on a shared computer.
Priority Queues cont’d... MAXIMUM(S): EXTRACT-MAX(S):
Priority Queues cont’d... INSERT(S,x): INCREASE-KEY(S,x,k):
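The four operations can be sketched on top of an array-based max-heap. This is a hedged illustration only: the slides' own pseudocode did not survive extraction, so the class `MaxPriorityQueueSketch`, its integer keys, and the choice to identify an element x by its current heap index in `increaseKey` are all assumptions made here.

```java
import java.util.ArrayList;

class MaxPriorityQueueSketch {
    private final ArrayList<Integer> heap = new ArrayList<>();

    int size() {
        return heap.size();
    }

    // MAXIMUM(S): the root holds the largest key. O(1).
    int maximum() {
        return heap.get(0);
    }

    // EXTRACT-MAX(S): remove the root, move the last leaf up, sift down.
    int extractMax() {
        int max = heap.get(0);
        int last = heap.remove(heap.size() - 1);
        if (!heap.isEmpty()) {
            heap.set(0, last);
            siftDown(0);
        }
        return max;
    }

    // INSERT(S,x): add a new leaf and sift it up. O(log n).
    void insert(int key) {
        heap.add(key);
        siftUp(heap.size() - 1);
    }

    // INCREASE-KEY(S,x,k): raise the key at heap index i to k, sift up.
    void increaseKey(int i, int k) {
        if (k < heap.get(i)) {
            throw new IllegalArgumentException("new key is smaller than current key");
        }
        heap.set(i, k);
        siftUp(i);
    }

    private void siftUp(int i) {
        while (i > 0 && heap.get((i - 1) / 2) < heap.get(i)) {
            swap(i, (i - 1) / 2);
            i = (i - 1) / 2;
        }
    }

    private void siftDown(int i) {
        while (true) {
            int left = 2 * i + 1, right = 2 * i + 2, largest = i;
            if (left < heap.size() && heap.get(left) > heap.get(largest)) largest = left;
            if (right < heap.size() && heap.get(right) > heap.get(largest)) largest = right;
            if (largest == i) return;
            swap(i, largest);
            i = largest;
        }
    }

    private void swap(int i, int j) {
        int tmp = heap.get(i);
        heap.set(i, heap.get(j));
        heap.set(j, tmp);
    }
}
```

In the job-scheduling application, keys would be job priorities: `insert` adds a job, `extractMax` picks the next job to run, and `increaseKey` bumps a waiting job's priority.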
Other Sorting Algorithms: Counting Sort Radix Sort Bucket Sort
Counting Sort
Input: Array A[1,…,n], with all elements in {1,…,k}.
Output: Sorted array B[1,…,n].
Auxiliary Storage: C[1,…,k]
Counting Sort Example
[Figure: arrays A, B, and C shown after each step of the example]
Counting Sort Analysis
Loops over C: $\Theta(k)$
Loops over A: $\Theta(n)$
Total: $\Theta(n + k) = \Theta(n)$ if $k = O(n)$
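A minimal counting sort sketch in Java, following the A, B, C naming of the slides but with 0-indexed Java arrays (so `c[0]` is unused). The class name `CountingSortSketch` is an assumption. The comments mark which loop contributes which term of the analysis.

```java
class CountingSortSketch {
    // Sorts a, whose elements all lie in {1,...,k}, into a new array b.
    static int[] sort(int[] a, int k) {
        int[] c = new int[k + 1];              // Theta(k): c starts as all zeros

        for (int v : a) {                      // Theta(n): c[v] = # of elements equal to v
            c[v]++;
        }
        for (int v = 2; v <= k; v++) {         // Theta(k): c[v] = # of elements <= v
            c[v] += c[v - 1];
        }

        int[] b = new int[a.length];
        for (int j = a.length - 1; j >= 0; j--) {   // Theta(n): place each element
            b[c[a[j]] - 1] = a[j];             // c[a[j]] is its 1-based final position
            c[a[j]]--;
        }
        return b;
    }
}
```

Scanning A from right to left in the last loop keeps equal elements in their original relative order, so the sort is stable. That property is what lets radix sort use it as its per-digit pass.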
Radix Sort Sort numbers by columns, starting with least significant digit (radix):
Radix Sort Analysis
Assume use of Counting Sort ($\Theta(n)$) in each pass.
For constant d digits, overall $\Theta(dn)$.
Counting sort is not in place, therefore one may prefer a comparison sort algorithm.
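A minimal least-significant-digit radix sort sketch in Java, assuming non-negative integers with at most d decimal digits (d small enough that $10^{d-1}$ fits in an `int`). The class `RadixSortSketch` and the helper `countingSortByDigit` are names assumed here. Each pass is a stable counting sort keyed on one digit.

```java
class RadixSortSketch {
    // Sorts non-negative integers that have at most d decimal digits.
    static int[] sort(int[] a, int d) {
        int[] cur = a.clone();
        int exp = 1;                            // 1, 10, 100, ... selects the digit
        for (int pass = 0; pass < d; pass++) {
            cur = countingSortByDigit(cur, exp);
            exp *= 10;
        }
        return cur;
    }

    // Stable counting sort of a, keyed on the digit (v / exp) % 10.
    static int[] countingSortByDigit(int[] a, int exp) {
        int[] c = new int[10];
        for (int v : a) {
            c[(v / exp) % 10]++;
        }
        for (int i = 1; i < 10; i++) {
            c[i] += c[i - 1];
        }
        int[] b = new int[a.length];
        for (int j = a.length - 1; j >= 0; j--) {   // right to left keeps it stable
            int digit = (a[j] / exp) % 10;
            b[--c[digit]] = a[j];
        }
        return b;
    }
}
```

Stability of each pass is essential: it is what preserves the order established by the less significant digits when two numbers share the current digit.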
Bucket Sort
Assumes input elements distributed uniformly on [0,1).
Divide [0,1) into equal sized buckets.
Distribute n input values into the buckets.
Sort each bucket.
List elements from buckets in order.
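A minimal bucket sort sketch in Java, assuming inputs are doubles in [0,1) and using n buckets for n values (the number of buckets is an assumption; the slide only says "equal sized"). The class name `BucketSortSketch` is illustrative. Each value is inserted into its bucket in sorted position, which plays the role of an insertion sort per bucket.

```java
import java.util.ArrayList;
import java.util.List;

class BucketSortSketch {
    static double[] sort(double[] a) {
        int n = a.length;
        // Divide [0,1) into n equal sized buckets: bucket i covers [i/n, (i+1)/n).
        List<List<Double>> buckets = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        // Distribute the values, keeping each bucket sorted as we go.
        for (double x : a) {
            int index = Math.min(n - 1, (int) (x * n));  // guard against rounding up to n
            List<Double> b = buckets.get(index);
            int pos = b.size();
            while (pos > 0 && b.get(pos - 1) > x) {
                pos--;
            }
            b.add(pos, x);
        }
        // List elements from buckets in order.
        double[] out = new double[n];
        int k = 0;
        for (List<Double> b : buckets) {
            for (double x : b) {
                out[k++] = x;
            }
        }
        return out;
    }
}
```

When the inputs really are spread uniformly, each bucket holds only a few values, so the per-bucket insertion work stays small.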
Bucket Sort Example
Bucket Sort Analysis
Division into buckets & concatenation: $\Theta(n)$
Insertion sort: $\Theta(n^2)$ in the worst case
Define $n_i$ = number of elements in bucket B[i].
$T(n) = \Theta(n) + \sum_{i=1}^{n} O(n_i^2)$
Take expectation of both sides:
$E[T(n)] = \Theta(n) + \sum_{i=1}^{n} E[O(n_i^2)] = \Theta(n) + n \cdot O(2 - 1/n) = \Theta(n)$
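The $2 - 1/n$ term in the analysis above can be justified as follows, assuming n buckets and n independent, uniformly distributed inputs, so that each input lands in bucket i with probability $1/n$ and $n_i$ is binomially distributed. This is a sketch of the standard argument rather than the slide's own derivation:

```latex
\[
E[n_i] = n \cdot \tfrac{1}{n} = 1,
\qquad
\mathrm{Var}[n_i] = n \cdot \tfrac{1}{n}\left(1 - \tfrac{1}{n}\right) = 1 - \tfrac{1}{n},
\]
\[
E[n_i^2] \;=\; \mathrm{Var}[n_i] + \left(E[n_i]\right)^2
         \;=\; \left(1 - \tfrac{1}{n}\right) + 1
         \;=\; 2 - \tfrac{1}{n}.
\]
```

Summing this constant-size expectation over the n buckets gives the linear expected total.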
Lower Bounds for Sorting using Information Theory
The Time Complexity of a Problem P Merge, Quick, and Heap Sort can sort N numbers using O(N log N) comparisons between the values. Theorem: No algorithm can sort faster. The Time Complexity of a Problem P: The minimum time needed by an algorithm to solve it.
The Time Complexity of a Problem P
The minimum time needed by an algorithm to solve it.
Upper Bound: Problem P is computable in time $T_{upper}(n)$ if there is an algorithm A which outputs the correct answer in this much time.
$\exists A, \forall I,\ A(I) = P(I)$ and $Time(A,I) \le T_{upper}(|I|)$
Eg: Sorting computable in $T_{upper}(n) = O(n^2)$ time.
Understand Quantifiers!!!
$\exists$ before $\forall$: One girl.
$\forall$ before $\exists$: Could be a separate girl for each boy.
[Figure: boys Sam, Bob, John, Fred paired with girls Mary, Beth, Marilyn Monroe, Ann]
$\exists A, \forall I,\ A(I) = P(I)$ and $Time(A,I) \le T_{upper}(|I|)$
The Time Complexity of a Problem P
The minimum time needed by an algorithm to solve it.
$\exists A, \forall I,\ A(I) = P(I)$ and $Time(A,I) \le T_{upper}(|I|)$
$\forall I, \exists A,\ A(I) = P(I)$ and $Time(A,I) \le T_{upper}(|I|)$
What does the second statement say? It is true for any problem P and time $T_{upper}$: given fixed I, the algorithm's output is simply P(I).
The Time Complexity of a Problem P
The minimum time needed by an algorithm to solve it.
Lower Bound: Time $T_{lower}(n)$ is a lower bound for problem P if no algorithm solves the problem faster.
There may be algorithms that give the correct answer or run quickly on some input instances.
The Time Complexity of a Problem P
The minimum time needed by an algorithm to solve it.
Lower Bound: Time $T_{lower}(n)$ is a lower bound for problem P if no algorithm solves the problem faster.
But for every algorithm, there is at least one instance I for which either the algorithm gives the wrong answer or it runs in too much time.
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge T_{lower}(|I|)$
Eg: No algorithm can sort N values in $T_{lower} = \sqrt{N}$ time.
The Time Complexity of a Problem P
The minimum time needed by an algorithm to solve it.
Upper Bound: $\exists A, \forall I,\ A(I) = P(I)$ and $Time(A,I) \le T_{upper}(|I|)$
Lower Bound: $\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge T_{lower}(|I|)$
“There is” and “there isn’t a faster algorithm” are almost negations of each other.
Upper Bound: Prover-Adversary Game
$\exists A, \forall I,\ A(I) = P(I)$ and $Time(A,I) \le T_{upper}(|I|)$
Prover: I have an algorithm A that I claim works and is fast.
Adversary: Oh yeah, I have an input I for which it does not.
Prover: I win if A on input I gives the correct output in the allotted time.
What we have been doing all along.
Lower Bound: Prover-Adversary Game
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge T_{lower}(|I|)$
Adversary: I have an algorithm A that I claim works and is fast.
Prover: Oh yeah, I have an input I for which it does not.
Prover: I win if A on input I gives the wrong output or runs slow.
Proof by contradiction.
Lower Bound: Prover-Adversary Game
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge T_{lower}(|I|)$
Adversary: I have an algorithm A that I claim works and is fast.
Lower bounds are very hard to prove, because I must consider every algorithm no matter how strange.
The Yes/No Questions Game
I choose a number [1..N].
I ask you yes/no questions.
I answer.
I determine the number.
Time is the number of questions asked in the worst case.
The Yes/No Questions Game Upper Bound 6 Great!
The Yes/No Questions Game Upper Bound
N leaves. Time = # of questions = height = $\log_2(N)$
The Yes/No Questions Game Lower Bound?
N leaves. Time = # of questions = height = $\log_2(N)$
Is there a faster algorithm? If different questions? (e.g. Is the third bit 1?)
The Yes/No Questions Game Lower Bound?
N leaves. Time = # of questions = height = $\log_2(N)$
Is there a faster algorithm? If it has a different structure?
[Figure: an unbalanced question tree with its best case and worst case leaves marked]
The Yes/No Questions Game Lower Bound
Theorem: For every question strategy A, with the worst case object I to ask about, $\log_2 N$ questions need to be asked to determine one of N objects.
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge \log_2 N$
Proof: Prover/Adversary Game
Lower Bound Proof
Adversary: I have an algorithm A that I claim works and is fast.
Prover: Oh yeah, I have an input I for which it does not.
Two cases: # output leaves < N; # output leaves $\ge$ N
Lower Bound Proof: case 1
Adversary: I have an algorithm A that I claim works and is fast.
Prover: Man, your algorithm does not have N output leaves. I give an input I = 4 with missing output.
Prover: I win if A on input I gives the wrong output or requires log N questions.
Lower Bound Proof: case 2
Adversary: I have an algorithm A that I claim works and is fast.
Prover: Good, now your algorithm has all N outputs as leaves. It must have height $\ge \log N$. I give an input I = 5 at a deep leaf.
Prover: I win if A on input I gives the wrong output or requires log N questions.
The Yes/No Questions Game Lower Bound
Theorem: For every question strategy A, with the worst case object I to ask about, $\log_2 N$ questions need to be asked to determine one of N objects.
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge \log_2 N$
Proof: Prover/Adversary Game
End of Proof.
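For the matching upper bound, one concrete question strategy is binary search with questions of the form "is your number at most mid?". The Java sketch below simulates the game and counts questions; the class `YesNoGameSketch` and its methods are hypothetical names, and the secret number is passed in only so the simulation can answer the questions.

```java
class YesNoGameSketch {
    // Plays the game for a secret in [1..n].
    // Returns {determined number, number of questions asked}.
    static int[] play(int n, int secret) {
        int lo = 1;
        int hi = n;
        int questions = 0;
        while (lo < hi) {
            int mid = lo + (hi - lo) / 2;
            questions++;                 // ask: "is your number <= mid?"
            if (secret <= mid) {
                hi = mid;                // answer was yes
            } else {
                lo = mid + 1;            // answer was no
            }
        }
        return new int[] { lo, questions };
    }

    // Worst case over all N possible secrets: the height of the tree.
    static int worstCaseQuestions(int n) {
        int worst = 0;
        for (int secret = 1; secret <= n; secret++) {
            worst = Math.max(worst, play(n, secret)[1]);
        }
        return worst;
    }
}
```

For N = 64 this strategy always asks exactly 6 questions, and for N = 100 its worst case is 7, which is $\log_2 N$ rounded up. The theorem says no other question strategy can beat that in the worst case.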
Communication Complexity
I choose a number [1..N] (or an object from a set of N objects).
I send you a stream of bits.
I determine the object.
Time is the number of bits sent in the worst case.
Communication Complexity Upper Bound 6 Great!
Communication Complexity Lower Bound
Theorem: For every communication strategy A, with the worst case object I to communicate, $\log_2 N$ bits need to be transmitted to communicate one of N objects.
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge \log_2 N$
The Sorting Game
I choose a permutation of {1,2,3,…,N}.
I ask you yes/no questions.
I answer.
I determine the permutation.
Time is the number of questions asked in the worst case.
Sorting Upper Bound b,c,a Great!
Sorting Upper Bound
N! leaves. Time = # of questions = height = $\log_2(N!)$
Bounding log(N!)
$N! = 1 \times 2 \times 3 \times \dots \times N/2 \times \dots \times N$
N factors each at most N. N/2 factors each at least N/2.
$(N/2)^{N/2} \le N! \le N^N$
$(N/2) \log(N/2) \le \log(N!) \le N \log(N)$, so $\log(N!) = \Theta(N \log(N))$.
Sorting Lower Bound
N! leaves. Time = # of questions = height = $\log_2(N!) = \Theta(N \log(N))$.
Is there a faster algorithm? If different questions? (e.g. Is the third bit of c 1?)
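Sorting - Lower Bound
Is there a faster algorithm? What if a different model of computation?
Input: A = [a, b, c]. Output in increasing order.
class InsertionSortAlgorithm {
    // Sorts A in place into increasing order.
    static void sort(int[] A) {
        for (int i = 1; i < A.length; i++) {
            int j = i;
            int B = A[i];  // the value being inserted
            // Compare against B, not A[i]: A[i] is overwritten by the first shift.
            while ((j > 0) && (A[j-1] > B)) {
                A[j] = A[j-1];
                j--;
            }
            A[j] = B;
        }
    }
}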
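Sorting with Alice & Bob
I choose a permutation of {1,2,3,…,N}.
I run my algorithm until I need to know something about the input permutation.
I ask you this yes/no question.
I answer.
I determine the permutation.
class InsertionSortAlgorithm {
    // Sorts A in place into increasing order.
    static void sort(int[] A) {
        for (int i = 1; i < A.length; i++) {
            int j = i;
            int B = A[i];  // the value being inserted
            // Each comparison below is one yes/no question about the input.
            // Compare against B, not A[i]: A[i] is overwritten by the first shift.
            while ((j > 0) && (A[j-1] > B)) {
                A[j] = A[j-1];
                j--;
            }
            A[j] = B;
        }
    }
}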
Sorting - Lower Bound
Theorem: For every sorting algorithm A, with the worst case input instance I, $\Omega(N \log_2 N)$ comparisons (or other bit operations) need to be executed to sort N objects.
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge \log_2(N!) = \Theta(N \log_2 N)$
Proof: Prover/Adversary Game
Lower Bound Proof
Adversary: I have an algorithm A that I claim works and is fast.
Prover: Oh yeah, I have an input I for which it does not.
Two cases: # output leaves < N!; # output leaves $\ge$ N!
Lower Bound Proof: case 1
Adversary: I have an algorithm A that I claim works and is fast.
Prover: Man, your algorithm does not have N! output leaves. I give an input I = b,a,c with missing output.
Prover: I win if A on input I gives the wrong output or requires $\log(N!)$ questions.
Lower Bound Proof: case 2
Adversary: I have an algorithm A that I claim works and is fast.
Prover: Good, now your algorithm has all N! outputs as leaves. It must have height $\ge \log(N!) = \Theta(N \log(N))$. I give an input I at a deep leaf.
Prover: I win if A on input I gives the wrong output or requires $\log(N!)$ questions.
Sorting Lower Bound
Theorem: For every sorting algorithm A, on the worst case input instance I, $\Omega(N \log_2 N)$ comparisons (or other bit operations) need to be executed to sort N objects.
$\forall A, \exists I,\ A(I) \ne P(I)$ or $Time(A,I) \ge \log_2(N!) = \Theta(N \log_2 N)$
Proof: Prover/Adversary Game
End of Proof.
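The bound can be checked numerically for small N. The Java sketch below computes the minimum height of a binary decision tree with N! leaves and compares it with the worst-case number of comparisons insertion sort makes over all N! permutations. Everything here is an illustrative assumption: the class `SortingLowerBoundSketch`, its method names, and the choice of insertion sort as the algorithm to test. It is practical only for small N, since it enumerates every permutation.

```java
class SortingLowerBoundSketch {
    // Smallest h with 2^h >= N!, i.e. log2(N!) rounded up: the minimum
    // height of a binary tree with N! leaves. Valid for n <= 20.
    static int minQuestions(int n) {
        long factorial = 1;
        for (int i = 2; i <= n; i++) {
            factorial *= i;
        }
        int h = 0;
        while ((1L << h) < factorial) {
            h++;
        }
        return h;
    }

    // Number of comparisons insertion sort makes while sorting a in place.
    static int insertionSortComparisons(int[] a) {
        int comparisons = 0;
        for (int i = 1; i < a.length; i++) {
            int b = a[i];
            int j = i;
            while (j > 0) {
                comparisons++;           // one yes/no question: a[j-1] > b ?
                if (a[j - 1] <= b) {
                    break;
                }
                a[j] = a[j - 1];
                j--;
            }
            a[j] = b;
        }
        return comparisons;
    }

    // Worst case of insertion sort over all N! permutations of {1,...,N}.
    static int worstCase(int n) {
        int[] perm = new int[n];
        for (int i = 0; i < n; i++) {
            perm[i] = i + 1;
        }
        return worstCase(perm, 0);
    }

    // Enumerates permutations by swapping each candidate into position k.
    private static int worstCase(int[] perm, int k) {
        if (k == perm.length) {
            return insertionSortComparisons(perm.clone());
        }
        int worst = 0;
        for (int i = k; i < perm.length; i++) {
            swap(perm, k, i);
            worst = Math.max(worst, worstCase(perm, k + 1));
            swap(perm, k, i);
        }
        return worst;
    }

    private static void swap(int[] a, int i, int j) {
        int tmp = a[i];
        a[i] = a[j];
        a[j] = tmp;
    }
}
```

For N = 4 the tree needs height at least 5, while insertion sort's worst case is 6 comparisons. The theorem guarantees that no comparison-based algorithm can bring that worst case below 5.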
End