Sorting. We have already seen two efficient ways to sort:

A kind of “insertion” sort: insert the elements into a red-black tree one by one, then traverse the tree in-order and collect the keys. Takes O(n log n) time.
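A minimal runnable sketch of this tree sort in Python (illustrative, not from the slides): for brevity it uses a plain unbalanced BST, so an insertion is O(log n) only on average; the red-black tree of the slides guarantees it in the worst case.

class Node:
    def __init__(self, key):
        self.key, self.left, self.right = key, None, None

def insert(root, key):
    # Standard BST insertion; a red-black tree would also rebalance here.
    if root is None:
        return Node(key)
    if key < root.key:
        root.left = insert(root.left, key)
    else:
        root.right = insert(root.right, key)
    return root

def tree_sort(items):
    root = None
    for x in items:                  # n insertions
        root = insert(root, x)
    out = []
    def inorder(v):                  # in-order traversal collects sorted keys
        if v is not None:
            inorder(v.left)
            out.append(v.key)
            inorder(v.right)
    inorder(root)
    return out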

Heapsort (Williams, Floyd, 1964): put the elements in an array, make the array into a heap, then repeatedly do a deletemin and put each deleted element into the last free position of the array.

Put the elements in the array Q = [79, 65, 26, 24, 19, 15, 29, 23, 33, 40, 7]. (Each of the following slides shows the same data both as a tree and as the array Q; only the array is reproduced here.)

Make the elements into a heap by calling Heapify-down on every internal node, from the last one up to the root. The array after each call completes (a single call may take several swaps):

Heapify-down(Q,4): [79, 65, 26, 24, 7, 15, 29, 23, 33, 40, 19]
Heapify-down(Q,3): [79, 65, 26, 23, 7, 15, 29, 24, 33, 40, 19]
Heapify-down(Q,2): [79, 65, 15, 23, 7, 26, 29, 24, 33, 40, 19]
Heapify-down(Q,1): [79, 7, 15, 23, 19, 26, 29, 24, 33, 40, 65]
Heapify-down(Q,0): [7, 19, 15, 23, 40, 26, 29, 24, 33, 79, 65]

Summary: we can build the heap in linear time (we already did this analysis), but we still have to deletemin the elements one by one in order to sort, and that takes O(n log n).
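A minimal Python sketch of the min-heap heapsort traced above (illustrative; the names are my own): build the heap bottom-up in linear time, then repeatedly deletemin into the last free slot of the array.

def heapify_down(q, i, n):
    # Sift q[i] down within q[0:n] to restore the min-heap property.
    while True:
        left, right = 2 * i + 1, 2 * i + 2
        smallest = i
        if left < n and q[left] < q[smallest]:
            smallest = left
        if right < n and q[right] < q[smallest]:
            smallest = right
        if smallest == i:
            return
        q[i], q[smallest] = q[smallest], q[i]
        i = smallest

def heapsort(q):
    n = len(q)
    # Build the heap in linear time: heapify-down every internal node,
    # from the last internal node (index n//2 - 1) up to the root.
    for i in range(n // 2 - 1, -1, -1):
        heapify_down(q, i, n)
    # n deletemins: swap the current minimum into the last free position.
    for end in range(n - 1, 0, -1):
        q[0], q[end] = q[end], q[0]
        heapify_down(q, 0, end)
    q.reverse()    # the deletemins leave the array in descending order

For example, heapsort([79, 65, 26, 24, 19, 15, 29, 23, 33, 40, 7]) reproduces the build-heap trace above and leaves the array sorted.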

Quicksort (Hoare 1961)

quicksort
Input: an array A[p..r]

Quicksort(A, p, r)
    if p < r then
        q ← Partition(A, p, r)    // q is the position of the pivot element
        Quicksort(A, p, q-1)
        Quicksort(A, q+1, r)

Trace of Partition on A = [2, 8, 7, 1, 3, 5, 6, 4] with pivot x = A[r] = 4, as i and j sweep left to right:

[2, 8, 7, 1, 3, 5, 6, 4] → [2, 1, 7, 8, 3, 5, 6, 4] → [2, 1, 3, 8, 7, 5, 6, 4] → (final exchange with the pivot) → [2, 1, 3, 4, 7, 5, 6, 8], and the pivot's position is returned.

Partition(A, p, r)
    x ← A[r]
    i ← p-1
    for j ← p to r-1 do
        if A[j] ≤ x then
            i ← i+1
            exchange A[i] ↔ A[j]
    exchange A[i+1] ↔ A[r]
    return i+1
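The pseudocode transcribed into runnable Python (0-based indices, sorting in place); this is a direct rewrite with no changes to the algorithm:

def partition(a, p, r):
    # Partition a[p..r] around the pivot a[r]; return the pivot's final index.
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1

def quicksort(a, p=0, r=None):
    if r is None:
        r = len(a) - 1
    if p < r:
        q = partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)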

Analysis: the running time is proportional to the number of comparisons. Each pair is compared at most once ⇒ O(n²). In fact, for each n there is an input of size n on which quicksort takes cn² ⇒ Ω(n²).

But: assume instead that the split is even in each recursive call.

T(n) = 2T(n/2) + bn. How do we solve recurrences like this? (read Chapter 4)

Recurrence tree: the root costs bn and has two children, each a T(n/2); these expand into nodes costing bn/2 with children T(n/4), and so on. The tree has log n levels, and in every level we do bn comparisons in total, so the total number of comparisons is O(n log n).

Observations: we can't guarantee good splits, but intuitively, on random inputs we will get good splits.

Randomized quicksort: use Randomized-partition rather than Partition.

Randomized-partition(A, p, r)
    i ← random(p, r)
    exchange A[r] ↔ A[i]
    return Partition(A, p, r)
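In Python, reusing partition from the sketch above; random.randint is inclusive on both ends, matching random(p, r):

import random

def randomized_partition(a, p, r):
    i = random.randint(p, r)     # uniform random pivot index in p..r
    a[r], a[i] = a[i], a[r]      # move the random pivot into the last slot
    return partition(a, p, r)    # then partition exactly as before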

On the same input we will get a different running time in each run! We look at the average, over the random choices, of all these running times for one particular input.

Expected # of comparisons: let X be the # of comparisons. This is a random variable; we want to know E(X).

Expected # of comparisons: let z_1, z_2, ..., z_n be the elements in sorted order. Let X_ij = 1 if z_i is compared to z_j, and 0 otherwise. So X = ∑_{i=1}^{n-1} ∑_{j=i+1}^{n} X_ij.

By linearity of expectation, E(X) = E(∑_{i<j} X_ij) = ∑_{i<j} E(X_ij) = ∑_{i<j} Pr{z_i is compared to z_j}.

Consider Z_ij ≡ {z_i, z_i+1, ..., z_j}.
Claim: z_i and z_j are compared ⟺ either z_i or z_j is the first pivot chosen from Z_ij.
Proof, by cases on the first pivot chosen from Z_ij:
- z_i: it is compared to z_j on this partition, and never again.
- z_j: the same.
- some z_k with i < k < j: z_i and z_j are not compared on this partition, and the partition separates them, so no future partition involves both.

Pr{z_i is compared to z_j}
= Pr{z_i or z_j is the first pivot chosen from Z_ij}    (just explained)
= Pr{z_i is first pivot chosen from Z_ij} + Pr{z_j is first pivot chosen from Z_ij}    (mutually exclusive possibilities)
= 1/(j-i+1) + 1/(j-i+1) = 2/(j-i+1)

So E(X) = ∑_{i=1}^{n-1} ∑_{j=i+1}^{n} 2/(j-i+1). Simplify with a change of variable, k = j-i+1: E(X) = ∑_{i=1}^{n-1} ∑_{k=2}^{n-i+1} 2/k. Simplify and overestimate, by adding terms: E(X) ≤ ∑_{i=1}^{n-1} ∑_{k=1}^{n} 2/k = ∑_{i=1}^{n-1} O(log n) = O(n log n).
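An illustrative way (not from the slides) to check this empirically: count the pivot comparisons randomized quicksort makes and compare the average against 2n ln n, which is roughly what the harmonic sums above add up to.

import math
import random

def count_comparisons(a):
    # Comparisons made by randomized quicksort on a copy of a.
    a = list(a)
    count = 0
    def qsort(p, r):
        nonlocal count
        if p >= r:
            return
        i = random.randint(p, r)
        a[r], a[i] = a[i], a[r]
        x, m = a[r], p - 1
        for j in range(p, r):
            count += 1                 # each A[j] ≤ x test is one comparison
            if a[j] <= x:
                m += 1
                a[m], a[j] = a[j], a[m]
        a[m + 1], a[r] = a[r], a[m + 1]
        qsort(p, m)                    # left of the pivot
        qsort(m + 2, r)                # right of the pivot
    qsort(0, len(a) - 1)
    return count

n = 10_000
trials = [count_comparisons(range(n)) for _ in range(20)]
print(sum(trials) / len(trials), 2 * n * math.log(n))  # same ballpark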

Lower bound for sorting in the comparison model

A lower bound. Comparison model: we assume that the only operations from which we deduce order among keys are comparisons. Then we prove that we need Ω(n log n) comparisons in the worst case.

Model the algorithm as a decision tree. (Figure: the decision tree for sorting three elements; each internal node is a comparison such as 1:2, 2:3, or 1:3, and each leaf is one of the 3! orderings, e.g. 1 2 3 or 2 1 3.)

Important observations: every comparison-based algorithm can be represented as a (binary) tree like this. Each path corresponds to a run on some input. The worst-case # of comparisons corresponds to the longest path.

The lower bound: let d be the length of the longest path. Then n! ≤ #leaves ≤ 2^d, so d ≥ log₂(n!) = Ω(n log n) (for example, n! ≥ (n/2)^(n/2), so log₂(n!) ≥ (n/2)·log₂(n/2)).

Lower Bound for Sorting: any sorting algorithm based on comparisons between elements requires Ω(n log n) comparisons.

Beating the lower bound: we can beat the lower bound if we can deduce order relations between keys not by comparisons. Examples: counting sort, radix sort.
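A minimal counting sort sketch (illustrative; this simplified version sorts bare integer keys, while the textbook version is stable and carries satellite data):

def counting_sort(a, max_key):
    # Sort integers in 0..max_key with no comparisons between keys.
    count = [0] * (max_key + 1)
    for x in a:
        count[x] += 1                  # tally each key
    out = []
    for key, c in enumerate(count):
        out.extend([key] * c)          # emit each key as often as it appeared
    return out

print(counting_sort([4, 1, 3, 4, 0, 2], max_key=4))  # [0, 1, 2, 3, 4, 4]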

Linear time sorting Or assume something about the input: random, “almost sorted”

Sorting an almost sorted input: suppose we know that the input is “almost” sorted. Let I be the number of “inversions” in the input: the number of pairs (a_i, a_j) such that i < j and a_i > a_j.

Example: 1, 4, 5, 8, 3 has I = 3; 8, 7, 5, 3, 1 has I = 10.

Think of “insertion sort” using a list. When we insert the next item a_k, how deep does it get into the list? As deep as the number of inversions (a_i, a_k) with i < k; let's call this I_k.

Analysis: the running time is O(n + ∑_{k=1}^{n} (1 + I_k)) = O(n + I).
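A sketch making this concrete (illustrative, not from the slides): the number of shift operations array-based insertion sort performs is exactly I, so the running time is O(n + I).

def insertion_sort_with_shifts(a):
    # Insertion sort; returns the sorted list and the number of shifts,
    # which equals the number of inversions I in the input.
    a = list(a)
    shifts = 0
    for k in range(1, len(a)):
        x, i = a[k], k - 1
        while i >= 0 and a[i] > x:     # each shift fixes exactly one inversion
            a[i + 1] = a[i]
            i -= 1
            shifts += 1
        a[i + 1] = x
    return a, shifts

print(insertion_sort_with_shifts([1, 4, 5, 8, 3]))  # ([1, 3, 4, 5, 8], 3)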

Thoughts: when I = Ω(n²) the running time is Ω(n²). But we would like it to be O(n log n) for any input, and faster when I is small.

Finger red-black trees

Finger tree: take a regular search tree and reverse the direction of the pointers on the rightmost spine. To search, we go up from the last leaf until we find the subtree containing the item, and then descend into it.

Finger trees: say we search for a position at distance d from the end. Then we go up to height O(log d), so the search for the d-th position from the end takes O(log d) time. Insertions and deletions still take O(log n) worst-case time, but only O(log d) amortized time.

Back to sorting: suppose we implement the insertion sort using a finger search tree. When we insert item k, then d = O(I_k) and the insertion takes O(log(I_k)) amortized time.

Analysis: the running time is O(n + ∑_k log(I_k)). Since ∑_k I_k = I, the concavity of log gives ∑_k log(I_k) ≤ n·log(I/n), so this is at most O(n + n·log(I/n)).

Selection: find the k-th smallest element.

Randomized selection

Randomized-select(A, p, r, k)
    if p = r then return A[p]
    q ← Randomized-partition(A, p, r)
    j ← q - p + 1    // # of elements in A[p..q]
    if k = j then return A[q]
    else if k < j then return Randomized-select(A, p, q-1, k)
    else return Randomized-select(A, q+1, r, k-j)
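The same procedure in Python (1-based k within a[p..r]), assuming the partition and randomized_partition functions from the quicksort sketches above are in scope:

def randomized_select(a, p, r, k):
    # k-th smallest element of a[p..r], expected O(n) time.
    if p == r:
        return a[p]
    q = randomized_partition(a, p, r)
    j = q - p + 1                      # the pivot's rank within a[p..r]
    if k == j:
        return a[q]
    if k < j:
        return randomized_select(a, p, q - 1, k)
    return randomized_select(a, q + 1, r, k - j)

a = [33, 7, 12, 45, 2, 19]
print(randomized_select(a, 0, len(a) - 1, 3))  # 12, the 3rd smallest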

Expected running time: with probability 1/n, A[p..q] contains exactly k elements, for each k = 1, 2, ..., n.

Assume n is even: pessimistically, we always recurse into the larger side, so T(n) ≤ (2/n) ∑_{k=n/2}^{n-1} T(k) + an.

In general: T(n) ≤ (2/n) ∑_{k=⌊n/2⌋}^{n-1} T(k) + an.

Solve by “substitution” Assume T(k) ≤ ck for k < n, and prove T(n) ≤ cn

Solve by “substitution”: T(n) ≤ (2/n) ∑_{k=n/2}^{n-1} ck + an = (2c/n)·(∑_{k=1}^{n-1} k − ∑_{k=1}^{n/2-1} k) + an = (2c/n)·((n-1)n/2 − (n/2-1)(n/2)/2) + an ≤ (3c/4)n + an.

Choose c ≥ 4a: then (3c/4)n + an ≤ (3c/4)n + (c/4)n = cn, so T(n) ≤ cn and the expected running time is O(n).

Selection in linear worst case time Blum, Floyd, Pratt, Rivest, and Tarjan (1973)

5-tuples: divide the input into groups of 5, e.g. the tuple (6, 2, 9, 5, 1).

Sort each tuple: (6, 2, 9, 5, 1) becomes (9, 6, 5, 2, 1).

Recursively find the median of the medians. (Figure: the sorted tuples drawn as columns; the middle row holds the tuple medians, and their median, 5 in the example, is found by a recursive call.)

Partition around the median of the medians (5 in the example), then continue recursively with the side that contains the k-th element.

Neither side can be large: each side of the partition contains at most ¾n elements.

The reason: in the figure's grid of sorted tuples, the tuples whose median is ≥ the median of medians contribute a block of elements that are all ≥ 5, about ¼ of the input, so the side below the pivot has at most ¾n elements; symmetrically, about ¼ of the input is ≤ 5, so the side above the pivot also has at most ¾n elements.

Analysis: T(n) ≤ T(n/5) + T(3n/4) + O(n) — find the median of the n/5 medians recursively, recurse into a side of size at most ¾n, plus linear work to sort the tuples and partition. Since 1/5 + 3/4 < 1, this solves to T(n) = O(n).
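A compact Python sketch of the whole algorithm (illustrative, not in-place; the three-way partition around the pivot is a simplification that also handles duplicate keys):

def select(a, k):
    # k-th smallest element of a (1-based), worst-case O(n).
    if len(a) <= 5:
        return sorted(a)[k - 1]
    # Median of each 5-tuple, then recursively their median: a good pivot.
    medians = [sorted(a[i:i + 5])[len(a[i:i + 5]) // 2]
               for i in range(0, len(a), 5)]
    pivot = select(medians, (len(medians) + 1) // 2)
    # Partition around the pivot and recurse into the side holding the k-th.
    less = [x for x in a if x < pivot]
    equal_count = sum(1 for x in a if x == pivot)
    if k <= len(less):
        return select(less, k)
    if k <= len(less) + equal_count:
        return pivot
    greater = [x for x in a if x > pivot]
    return select(greater, k - len(less) - equal_count)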

Order statistics, a dynamic version: rank and select.

The dictionary ADT:
Insert(x, D)
Delete(x, D)
Find(x, D): returns a pointer to x if x ∊ D, and a pointer to the successor or predecessor of x if x is not in D

Suppose we want to add to the dictionary ADT: Select(k, D): returns the k-th element in the dictionary, i.e. an element x such that k-1 elements are smaller than x.

Select(5, D) should return 26. (Figure: the keys 4, 19, 20, 21, 26, 34, 67, 70, 73, 77, 89, 90 stored in a search tree; 26 is the 5th smallest.)

Can we still use a red-black tree? (Figure: a red-black tree storing the keys 4, 19, 20, 21, 26, 34, 67, 70, 73, 77, 89, 90 at its leaves.)

For each node v, store the # of leaves in the subtree of v. (Figure: the same tree annotated with subtree sizes: 12 at the root, 4 and 8 at its children, then 2, 2, 4, 4, and so on down to the leaves.)

Select(7, T) walkthrough on that tree: at the root, the left subtree has size 4 < 7, so recurse with Select(7-4, right) = Select(3, right). At that node the left subtree has size 4 ≥ 3, so recurse with Select(3, left). There the left subtree has size 2 < 3, so recurse with Select(1, right), whose children are the leaves 67 and 70; it returns 67, the 7th smallest key.

Select(i, T): return Select(i, root(T))

Select(k, v)
    if v's children are leaves then    // here k is 1 or 2
        if k = 1 then return v.left
        else return v.right
    if k ≤ (v.left).size then return Select(k, v.left)
    else return Select(k - (v.left).size, v.right)

O(log n) worst-case time.

Rank(x,T) Return the index of x in T

Rank(x, T): (Figure: the tree with x marked as its 9th leaf; Rank needs to return 9.)

Sum up the sizes of the subtrees hanging to the left of the path from the root down to x (counting x itself). (Figure: the size-annotated tree from before, with the path to x highlighted.)

Rank(x,T) Write the p-code
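One possible answer to the exercise (a sketch under my own assumptions: leaf-oriented nodes carrying size and parent fields, as in the figures; the Node class here is illustrative):

class Node:
    def __init__(self, left=None, right=None, parent=None):
        self.left, self.right, self.parent = left, right, parent
        self.size = 1    # number of leaves in this subtree (1 for a leaf)

def select(k, v):
    # k-th smallest leaf (1-based) in the subtree of v.
    if v.left is None:    # v is a leaf
        return v
    if k <= v.left.size:
        return select(k, v.left)
    return select(k - v.left.size, v.right)

def rank(x, root):
    # 1-based rank of leaf x: walk up to the root, adding the size of
    # every subtree that hangs off the path to the left.
    r = 1                 # counts x itself
    v = x
    while v is not root:
        if v is v.parent.right:    # left sibling's leaves all precede x
            r += v.parent.left.size
        v = v.parent
    return r

On a balanced tree both walks follow one root-to-leaf path, so both run in O(log n).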

Insertions and deletions: consider insertion; deletion is similar.

Insert: search for the insertion point. (Figure: the search path from the root, with subtree sizes 12, 8, 4, 2 along it.)

Insert (cont): add the new leaf and increment the size of every node on the path. (Figure: the sizes along the path become 13, 9, 5, 3, and a new internal node of size 2 replaces the old leaf.)

Sizes are easy to maintain through rotations. (Figure: a rotation turning x with children (y, C), where y has children (A, B), into y with children (A, x), where x has children (B, C).) Only two sizes change:
size(x) ← size(B) + size(C)
size(y) ← size(A) + size(x)

Summary: insertion, deletion, and the other dictionary operations still take O(log n) time.