Data Structures: Sorting. Haim Kaplan & Uri Zwick, December 2013



Comparison based sorting
Input: An array containing n items, each consisting of a key and associated info (a_1, a_2, …, a_n denote the keys). Keys belong to a totally ordered domain. Two keys can be compared in O(1) time.
Output: The array with the items reordered so that a_1 ≤ a_2 ≤ … ≤ a_n (“in-place sorting”). The info may contain the initial position.

Comparison based sorting
Insertion sort, Bubble sort: O(n^2)
Balanced search trees, Heapsort, Merge sort: O(n log n)
Quicksort: O(n log n) expected time

Warm-up: Insertion sort. Worst case O(n^2). Best case O(n). Efficient for small values of n.
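A minimal sketch of the basic insertion sort, assuming a swap-based formulation (the function name `insertion_sort` is illustrative, not taken from the slides). Each new key is swapped leftwards until it reaches its place, which gives the O(n) best case on sorted input and the O(n^2) worst case on reversed input.

```python
def insertion_sort(a):
    """Sort the list a in place and return it (swap-based sketch)."""
    for i in range(1, len(a)):
        j = i
        # Swap a[j] towards the front while its left neighbour is larger.
        while j > 0 and a[j - 1] > a[j]:
            a[j - 1], a[j] = a[j], a[j - 1]
            j -= 1
    return a
```

On an already sorted list the inner loop never runs, so only n-1 comparisons are made.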

Warm-up: Insertion sort, slightly optimized. Worst case still O(n^2). Even more efficient for small values of n.

Warm-up: Insertion sort (Adapted from Bentley’s Programming Pearls, Second Edition, p. 116.)
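One common way to optimize insertion sort slightly, shown here as an assumption about what the optimization is, replaces the swaps by shifts: the key being inserted is held in a temporary variable and larger keys are moved one cell to the right. The name `insertion_sort_shift` is hypothetical.

```python
def insertion_sort_shift(a):
    """Sort the list a in place and return it (shift-based sketch)."""
    for i in range(1, len(a)):
        key = a[i]          # key to insert into the sorted prefix a[0..i-1]
        j = i
        # Shift larger keys one position right instead of swapping.
        while j > 0 and a[j - 1] > key:
            a[j] = a[j - 1]
            j -= 1
        a[j] = key          # a single write puts the key in place
    return a
```

Each inner-loop step now performs one assignment instead of a full swap; the number of comparisons is unchanged.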

AlgoRythmics: Insertion sort, Bubble sort, Select sort, Shell sort, Merge sort, Quicksort

Quicksort [Hoare (1961)]. Winner of the 1980 Turing award. “One of the 10 algorithms with the greatest influence on the development and practice of science and engineering in the 20th century.”

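Quicksort: partition the array around a pivot key A[p], so that keys < A[p] come before it and keys ≥ A[p] come after it.

partition: if A[j] ≥ A[r], the key stays where it is and the region of keys ≥ A[r] grows by one.

partition: if A[j] < A[r], the key is swapped to the end of the region of keys < A[r], which grows by one.

Lomuto’s partition: the sub-array A[p..r] is rearranged into keys < A[r] followed by keys ≥ A[r].

partition: Use the last key A[r] as pivot (is it a good choice?). i: index of the last key < A[r]. j: index of the next key to inspect.

partition (example run): i and j advance through the array; at the end, move the pivot into position.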
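The partition steps above can be sketched as follows. This is an illustrative version of Lomuto's partition with the last key as pivot; the names `lomuto_partition` and `quicksort_lomuto` are assumptions, and indices are 0-based.

```python
def lomuto_partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    pivot = a[r]
    i = p - 1                      # i: index of the last key < pivot
    for j in range(p, r):          # j: next key to inspect
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]   # move pivot into position
    return i + 1


def quicksort_lomuto(a, p=0, r=None):
    """Sort a[p..r] in place using Lomuto's partition and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = lomuto_partition(a, p, r)
        quicksort_lomuto(a, p, q - 1)
        quicksort_lomuto(a, q + 1, r)
    return a
```

With the last key as pivot, an already sorted array produces the most unbalanced split at every level.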

Hoare’s partition: keys ≤ A[r] are gathered on the left and keys ≥ A[r] on the right. Performs fewer swaps than Lomuto’s partition. Produces a more balanced partition when keys contain repetitions. Used in practice.

Hoare’s partition: if A[i] < A[r], advance i.

Hoare’s partition: if A[j] > A[r], move j one step to the left.

Hoare’s partition: if A[i] ≥ A[r] and A[j] ≤ A[r], swap A[i] and A[j], then continue scanning.
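A hedged sketch of a Hoare-style partition with A[r] as pivot. This particular variant (two scans that stop on keys equal to the pivot, then a final swap of the pivot into place) is one common formulation and is not necessarily the exact code on the slides; `hoare_partition` and `quicksort_hoare` are illustrative names.

```python
def hoare_partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    pivot = a[r]
    i, j = p - 1, r
    while True:
        i += 1
        while a[i] < pivot:            # stops at r at the latest
            i += 1
        j -= 1
        while j > p and a[j] > pivot:
            j -= 1
        if i >= j:
            break
        a[i], a[j] = a[j], a[i]        # a[i] >= pivot and a[j] <= pivot: swap
    a[i], a[r] = a[r], a[i]            # move pivot into position
    return i


def quicksort_hoare(a, p=0, r=None):
    """Sort a[p..r] in place using the partition above and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = hoare_partition(a, p, r)
        quicksort_hoare(a, p, q - 1)
        quicksort_hoare(a, q + 1, r)
    return a
```

Because both scans stop on keys equal to the pivot, an array of identical keys is split near the middle rather than at one end.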

Analysis of quicksort
Best case: n → (n−1)/2, 1, (n−1)/2
Worst case: n → n−1, 1, 0
Average case: n → i−1, 1, n−i, where i is chosen randomly from {1,2,…,n}
Worst case obtained when array is sorted… Average case obtained when array is in random order.
Let C_n be the number of comparisons performed.

Best case of quicksort: by easy induction.

Best case of quicksort …

“Fairly good” case of quicksort …

Worst case of quicksort: by easy induction.

Worst case of quicksort … Obtained when array is sorted… Worst case is really bad.
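As a hedged sketch of the recurrences behind the best and worst cases, assume that partitioning n keys costs exactly n−1 comparisons and that C_0 = C_1 = 0 (both are assumptions made here for illustration). The splits listed in the analysis then give:

```latex
% Best case: split n -> (n-1)/2, 1, (n-1)/2, for n = 2^k - 1
C_n = (n-1) + 2\,C_{(n-1)/2}
\quad\Longrightarrow\quad
C_n = (n+1)\log_2(n+1) - 2n = O(n \log n)

% Worst case: split n -> n-1, 1, 0
C_n = (n-1) + C_{n-1}
\quad\Longrightarrow\quad
C_n = \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2}
```

For example, n = 7 gives 10 comparisons in the best case and 21 in the worst case under these assumptions.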

How do we avoid the worst case? Use a random item as pivot. Running time is now a random variable. For any input, bad behavior is extremely unlikely. For simplicity, we consider the expected running time, or more precisely, the expected number of comparisons. “Average case” now obtained for any input.

Randomized quicksort (How do we generate random numbers?)
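A minimal sketch of randomized quicksort, assuming Python's standard `random` module as the source of random numbers and a Lomuto-style partition; the function name `randomized_quicksort` is illustrative. A uniformly random key is swapped into the last position and then used as the pivot.

```python
import random


def randomized_quicksort(a, p=0, r=None):
    """Sort a[p..r] in place with a uniformly random pivot and return a."""
    if r is None:
        r = len(a) - 1
    if p < r:
        k = random.randint(p, r)       # random index in p..r, both ends inclusive
        a[k], a[r] = a[r], a[k]        # use the random key as the pivot
        pivot = a[r]
        i = p - 1
        for j in range(p, r):
            if a[j] < pivot:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        q = i + 1
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)
    return a
```

The running time now depends on the random choices rather than on the input order, so a sorted input is no longer a bad case in expectation.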

Analysis of (rand-)quicksort using recurrence relations: P2C2E (Actually, not that complicated)

Analysis of (rand-)quicksort

Analysis of (rand-)quicksort: Proof by induction on the size of the array. Let the input keys be z_1 < z_2 < … < z_n. Basis: If n=2, then i=1 and j=2, and the probability that z_1 and z_2 are compared is indeed 1.

Analysis of (rand-)quicksort: Let z_k be the chosen pivot key. Induction step: Suppose the result holds for all arrays of size < n. Consider the probability that z_i and z_j are compared, given that z_k is the pivot element.

Analysis of (rand-)quicksort: Let z_k be the chosen pivot key.
If k<i, both z_i and z_j will be in the right sub-array, without being compared during the partition. In the right sub-array they are now z’_{i−k} and z’_{j−k}.
If k>j, both z_i and z_j will be in the left sub-array, without being compared during the partition. In the left sub-array they are now z’_i and z’_j.
If k=i or k=j, then z_i and z_j are compared.
If i<k<j, then z_i and z_j are not compared.

Analysis of (rand-)quicksort (by induction)

Analysis of (rand-)quicksort

Analysis of (rand-)quicksort: exact version
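The following sketch checks the expected number of comparisons numerically. It assumes the standard result that, for i < j, keys z_i and z_j are compared with probability 2/(j−i+1), which agrees with the basis case above (probability 1 when n = 2), and that the exact version is the well-known closed form 2(n+1)H_n − 4n, where H_n is the n-th harmonic number. Both function names are hypothetical.

```python
from fractions import Fraction


def expected_comparisons(n):
    """Sum of 2/(j-i+1) over all pairs i < j, grouped by the gap d = j - i."""
    # There are n - d pairs with gap d, each compared with probability 2/(d+1).
    return sum(Fraction(2, d + 1) * (n - d) for d in range(1, n))


def closed_form(n):
    """Closed form 2(n+1)H_n - 4n, with H_n the n-th harmonic number."""
    h = sum(Fraction(1, k) for k in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n
```

Exact rational arithmetic is used so that the two expressions can be compared for equality rather than approximately.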

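Lower bound for comparison-based sorting algorithms

The comparison model: the items to be sorted, a_1, a_2, …, a_n, are given to the sorting algorithm. The only access that the algorithm has to the input is via comparisons of the form i : j (is a_i < a_j?).

comparison-based sorting algorithm → comparison tree

[Figure: comparison tree of insertion sort on three items x, y, z. The root compares x:y, the other internal nodes compare y:z and x:z, and the leaves are the orderings of x, y, z.]

[Figure: comparison tree of quicksort on three items x, y, z. The root compares x:z, the other internal nodes compare y:z and x:y, and the leaves are the orderings of x, y, z.]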

Comparison trees: Every comparison-based sorting algorithm can be converted into a comparison tree. Comparison trees are binary trees. The comparison tree of a (correct) sorting algorithm has n! leaves. (Note: the size of a comparison tree is huge. We are only using comparison trees in proofs.)

Comparison trees: A run of the sorting algorithm corresponds to a root-leaf path in the comparison tree. The maximum number of comparisons is therefore the height of the tree. The average number of comparisons, over all input orders, is the average depth of the leaves.

Depth and average depth: Height = 3 (maximal depth of a leaf). Average depth of leaves = (sum of the four leaf depths)/4 = 9/4.

Maximum and average depth of trees: Lemma 2, of course, implies Lemma 1. Lemma 1 is obvious: a tree of depth k contains at most 2^k leaves.

Average depth of trees: Proof by induction (by induction) (by convexity of x log x)

Convexity

Lower bounds
Theorem 1: Any comparison-based sorting algorithm must perform at least log_2(n!) comparisons on some input.
Theorem 2: The average number of comparisons, over all input orders, performed by any comparison-based sorting algorithm is at least log_2(n!).
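To get a feel for the bound in Theorem 1, the sketch below computes ⌈log_2(n!)⌉ exactly with integer arithmetic. The function name is illustrative; the bit-length trick relies on the fact that for an integer m ≥ 1, `(m - 1).bit_length()` equals ⌈log_2 m⌉.

```python
import math


def comparison_lower_bound(n):
    """Smallest integer that is at least log2(n!), computed without floats."""
    m = math.factorial(n)
    return (m - 1).bit_length()
```

For instance, any comparison-based algorithm needs at least 3 comparisons in the worst case to sort 3 items, since 3! = 6 > 2^2.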

Stirling formula

Approximating sums by integrals (f increasing)
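As a worked instance of bounding a sum by an integral, take the increasing function f(x) = ln x. This is a sketch of the standard estimate of log_2(n!) and is not claimed to be the exact derivation used on the slides.

```latex
% f(x) = \ln x is increasing, so \ln k \ge \int_{k-1}^{k} \ln x \, dx for k \ge 2
\ln(n!) = \sum_{k=2}^{n} \ln k \;\ge\; \int_{1}^{n} \ln x \, dx = n \ln n - n + 1

% Converting to base 2 (\log_2 e \approx 1.4427):
\log_2(n!) \;\ge\; n \log_2 n - (n-1)\log_2 e \;\ge\; n \log_2 n - 1.443\,n
```

So the log_2(n!) lower bound is n log_2 n minus a linear term, that is, Ω(n log n).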

Randomized algorithms: The lower bounds we proved so far apply only to deterministic algorithms. Maybe there is a randomized comparison-based algorithm that performs an expected number of o(n log n) comparisons on any input?

Randomized algorithms: A randomized algorithm R may be viewed as a probability distribution over deterministic algorithms (perform all the random choices in advance). R: Run D_i with probability p_i, for 1 ≤ i ≤ N.

Notation. R: Run D_i with probability p_i, for 1 ≤ i ≤ N. R(x): number of comparisons performed by R on input x (a random variable). D_i(x): number of comparisons performed by D_i on input x (a number).

More notation + important observation. R: Run D_i with probability p_i, for 1 ≤ i ≤ N.

Randomized algorithms: If the expected number of comparisons performed by R is at most f(n) for every input x, then the expected number of comparisons performed by R on a random input is also at most f(n). That means that there is also a deterministic algorithm D_i whose expected number of comparisons on a random input is at most f(n). Thus f(n) = Ω(n log n).
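The averaging argument can be written out as one chain of inequalities. This is a sketch under the notation introduced above, with x ranging over the n! input orders and E denoting expectation over the random choices of R.

```latex
\max_{x} \mathrm{E}[R(x)]
  \;\ge\; \frac{1}{n!} \sum_{x} \mathrm{E}[R(x)]
  \;=\; \frac{1}{n!} \sum_{x} \sum_{i=1}^{N} p_i \, D_i(x)
  \;=\; \sum_{i=1}^{N} p_i \left( \frac{1}{n!} \sum_{x} D_i(x) \right)
  \;\ge\; \min_{1 \le i \le N} \frac{1}{n!} \sum_{x} D_i(x)
  \;\ge\; \log_2(n!)
```

The last step applies the average-case lower bound for deterministic algorithms to the best D_i.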

Randomized algorithms

Lower bounds
Theorem 1: Any comparison-based sorting algorithm must perform at least log_2(n!) comparisons on some input.
Theorem 2: The average number of comparisons, over all input orders, performed by any comparison-based sorting algorithm is at least log_2(n!).
Theorem 3: Any randomized comparison-based sorting algorithm must perform an expected number of at least log_2(n!) comparisons on some input.

Beating the lower bound: We can beat the lower bound if we can deduce order relations between keys not by comparisons. Examples: Count sort, Radix sort.

Count sort: Assume that keys are integers between 0 and R−1.

Count sort: Allocate a temporary array C of size R: cell i counts the # of keys = i.

Count sort: Compute the prefix sums of C: cell i now holds the # of keys ≤ i.

Count sort: Move items from A to the output array B.

[Animation frames showing the arrays A, C and B at each step are omitted.]

(Adapted from Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms, Third Edition, 2009, p. 195)

Count sort: Complexity: O(n+R). In particular, we can sort n integers in the range {0,1,…,cn} in O(cn) time. Count sort is stable. No comparisons performed.
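The count sort steps can be sketched as follows. The function name `count_sort` and the `key` parameter are illustrative assumptions; keys are assumed to be integers in the range 0..R−1. Scanning the input from right to left while filling the output is what makes the sort stable.

```python
def count_sort(items, R, key=lambda x: x):
    """Return a new, stably sorted list; key(item) must be an int in 0..R-1."""
    count = [0] * R
    for item in items:                 # count[i] = number of keys equal to i
        count[key(item)] += 1
    for i in range(1, R):              # prefix sums: count[i] = number of keys <= i
        count[i] += count[i - 1]
    out = [None] * len(items)
    for item in reversed(items):       # right-to-left scan keeps equal keys in order
        k = key(item)
        count[k] -= 1
        out[count[k]] = item
    return out
```

For example, `count_sort([2, 5, 3, 0, 2, 3, 0, 3], 6)` returns `[0, 0, 2, 2, 3, 3, 3, 5]` using three linear passes and no key comparisons.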

Stable sorting algorithms: The order of items with the same key should be preserved (for example, three items with key a and info x, y, z must remain in the order x, y, z). Is quicksort stable? No.

Radix sort: Want to sort numbers with d digits, each between 0 and R−1.

LSD Radix sort: Use a stable sort, e.g. count sort, to sort by the Least Significant Digit.

[Animation frames omitted: the stable sort is then repeated on each more significant digit in turn.]

LSD Radix sort: Complexity: O(d(n+R)). In particular, we can sort n integers in the range {0,1,…,n^d − 1} in O(dn) time. (View each number as a d-digit number in base n.) In practice, choose R to be a power of two. Each digit is extracted using simple bit operations.
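A hedged sketch of LSD radix sort with R = 2^r, using a stable count sort on each digit from least to most significant. The function name and parameters (`d` digits of `r` bits each) are assumptions for illustration, and inputs are assumed to be non-negative integers smaller than 2^(d·r).

```python
def lsd_radix_sort(nums, d, r):
    """Return a sorted copy of nums, treating each number as d digits of r bits."""
    R = 1 << r                          # radix, a power of two
    mask = R - 1
    for pass_no in range(d):            # least significant digit first
        shift = pass_no * r
        count = [0] * R
        for x in nums:
            count[(x >> shift) & mask] += 1
        for i in range(1, R):           # prefix sums
            count[i] += count[i - 1]
        out = [0] * len(nums)
        for x in reversed(nums):        # stable placement into the output
            digit = (x >> shift) & mask
            count[digit] -= 1
            out[count[digit]] = x
        nums = out
    return nums
```

Each of the d passes costs O(n+R), which matches the O(d(n+R)) total stated above.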

Extracting digits: If R = 2^r, the operation is especially efficient: each digit consists of r bits.

Word-RAM model: Each machine word holds w bits. In constant time, we can perform any “usual” operation on two machine words, e.g., addition, multiplication, logical operations, shifts, etc. Open problem: Can we sort n words in O(n) time?
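A small sketch of digit extraction, assuming digits are numbered from 0 at the least significant end; the function names are hypothetical. The general version uses division and remainder, while the R = 2^r version uses only a shift and a mask.

```python
def digit_general(x, i, R):
    """i-th base-R digit of x (digit 0 is the least significant)."""
    return (x // R ** i) % R


def digit_pow2(x, i, r):
    """i-th digit of x in base 2**r, using a shift and a bit mask."""
    return (x >> (i * r)) & ((1 << r) - 1)
```

For example, with r = 4 the digits are hexadecimal: `digit_pow2(0xABCD, 2, 4)` is `0xB`.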