Data Structures Haim Kaplan & Uri Zwick December 2013 Sorting 1.

1 Data Structures Haim Kaplan & Uri Zwick December 2013 Sorting 1

2 Comparison based sorting Input: An array containing n items a_1, a_2, …, a_n, each with a key and associated info (the info may contain the item's initial position). Keys belong to a totally ordered domain; two keys can be compared in O(1) time. Output: The array with the items reordered so that a_1 ≤ a_2 ≤ … ≤ a_n ("in-place sorting").

3 Comparison based sorting Insertion sort, Bubble sort: O(n^2). Balanced search trees, Heapsort, Merge sort: O(n log n). Quicksort: O(n log n) expected time.

4 Warm-up: Insertion sort Worst case O(n^2). Best case O(n). Efficient for small values of n.

5 Warm-up: Insertion sort Slightly optimized. Worst case still O(n^2). Even more efficient for small values of n.

6 Warm-up: Insertion sort (Adapted from Bentley's Programming Pearls, Second Edition, p. 116.)
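The slides' code is not reproduced in this transcript; here is a minimal Python sketch of insertion sort consistent with the description above (the function and variable names are mine, not from the slides):

```python
def insertion_sort(a):
    """Sort list a in place; O(n^2) worst case, O(n) on an already sorted input."""
    for i in range(1, len(a)):
        x = a[i]                      # next item to insert into the sorted prefix
        j = i - 1
        while j >= 0 and a[j] > x:    # shift larger items one slot to the right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = x
    return a
```

The inner loop is the "slightly optimized" variant from slide 5: the item being inserted is held in a temporary variable, so each shift is one assignment rather than a swap.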

7 Insertion sort Bubble sort Selection sort Shell sort Merge sort Quicksort AlgoRythmics

8 Quicksort [Hoare (1961)] Winner of the 1980 Turing award. "One of the 10 algorithms with the greatest influence on the development and practice of science and engineering in the 20th century."

9 Quicksort Pick a pivot A[p]; partition the remaining items into those < A[p] and those ≥ A[p]; recurse on the two parts.

10 Partition invariant: a region of keys < A[r], then a region of keys ≥ A[r], then the unscanned keys. If A[j] ≥ A[r], simply advance j; the ≥-region grows.

11 If A[j] < A[r], swap A[j] into position i+1 (the end of the <-region) and advance both i and j; the <-region grows.

12 Lomuto's partition of A[p..r]: keys < A[r], then keys ≥ A[r], then the unscanned part.

13 Partition example on the array 2 8 7 1 3 5 6 4. Use the last key as pivot (is it a good choice?). i – index of the last key < A[r]; j – next key to inspect.

14 The scan continues, swapping keys smaller than the pivot into the prefix; finally the pivot is moved into its position: 2 1 3 4 7 5 6 8.
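The partition frames above can be sketched as runnable Python. This is a standard Lomuto partition with the last key as pivot, as the slides describe; the function names are mine:

```python
def lomuto_partition(a, p, r):
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    pivot = a[r]
    i = p - 1                          # i: index of the last key < pivot
    for j in range(p, r):              # j: next key to inspect
        if a[j] < pivot:
            i += 1
            a[i], a[j] = a[j], a[i]    # grow the <-region
    a[i + 1], a[r] = a[r], a[i + 1]    # move pivot into position
    return i + 1

def quicksort(a, p=0, r=None):
    """Quicksort a[p..r] in place using Lomuto's partition."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = lomuto_partition(a, p, r)
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
    return a
```

On the slides' example 2 8 7 1 3 5 6 4, one partition call produces 2 1 3 4 7 5 6 8 with the pivot 4 landing at index 3, matching the final frame above.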

15 Hoare's partition Performs fewer swaps than Lomuto's partition. Produces a more balanced partition when keys contain repetitions. Used in practice.

16 Hoare's partition: while A[i] < A[r], advance i; the left region of keys ≤ A[r] grows.

17 Hoare's partition: while A[j] > A[r], decrease j; the right region of keys ≥ A[r] grows.

18 Hoare's partition: when A[i] ≥ A[r] and A[j] ≤ A[r], swap A[i] with A[j] and advance both indices; both regions grow.
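A runnable sketch of a Hoare-style partition. One caveat: the slides pivot on A[r], but the classical formulation (as in CLRS) pivots on the first element to sidestep a subtle non-termination case; this sketch follows the classical variant, so it is not line-for-line the slides' version:

```python
def hoare_partition(a, p, r):
    """Hoare partition of a[p..r] with pivot x = a[p].

    Returns an index q with p <= q < r such that every key in a[p..q] is <= x
    or every key in a[q+1..r] is >= x (both regions straddle the pivot value).
    """
    x = a[p]
    i, j = p - 1, r + 1
    while True:
        j -= 1
        while a[j] > x:                # shrink from the right
            j -= 1
        i += 1
        while a[i] < x:                # shrink from the left
            i += 1
        if i < j:
            a[i], a[j] = a[j], a[i]    # both pointers stopped: swap
        else:
            return j

def quicksort_hoare(a, p=0, r=None):
    """Quicksort using Hoare's partition; note the recursion on a[p..q], not a[p..q-1]."""
    if r is None:
        r = len(a) - 1
    if p < r:
        q = hoare_partition(a, p, r)
        quicksort_hoare(a, p, q)
        quicksort_hoare(a, q + 1, r)
    return a
```

Unlike Lomuto's partition, the pivot does not necessarily end up at its final position, which is why the recursion includes index q on the left side.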

19 Analysis of quicksort Let C_n be the number of comparisons performed. Best case: n → (n−1)/2, 1, (n−1)/2. Worst case: n → n−1, 1, 0. Average case: n → i−1, 1, n−i, where i is chosen uniformly at random from {1,2,…,n}. The worst case is obtained when the array is sorted… The average case is obtained when the array is in random order.

20 Best case of quicksort By easy induction

21 Best case of quicksort …

22 "Fairly good" case of quicksort …

23 Worst case of quicksort By easy induction

24 Worst case of quicksort Obtained when the array is sorted… The worst case is really bad.
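The formulas behind these slides did not survive the transcript; a reconstruction of the two recurrences for the comparison count C_n, following the case analysis on slide 19 (each call compares the n−1 non-pivot keys with the pivot):

```latex
% Worst case: the pivot is an extreme element, n -> (n-1), 1, 0
C_n = (n-1) + C_{n-1},\quad C_1 = 0
\;\Longrightarrow\;
C_n = \sum_{k=1}^{n-1} k = \frac{n(n-1)}{2} = \Theta(n^2).

% Best case: the pivot is the median, n -> (n-1)/2, 1, (n-1)/2
C_n = (n-1) + 2\,C_{(n-1)/2}
\;\Longrightarrow\;
C_n \le n\log_2 n.
```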

25 How do we avoid the worst case? Use a random item as pivot. The running time is now a random variable. For any input, bad behavior is extremely unlikely. For simplicity, we consider the expected running time or, more precisely, the expected number of comparisons. The "average case" is now obtained for any input.

26 Randomized quicksort (How do we generate random numbers?)

27 Analysis of (rand-)quicksort using recurrence relations P2C2E (Actually, not that complicated)

28 Analysis of (rand-)quicksort

29 Analysis of (rand-)quicksort Proof by induction on the size of the array. Let the input keys be z_1 < z_2 < … < z_n. Basis: If n=2, then i=1 and j=2, and the probability that z_1 and z_2 are compared is indeed 1.

30 Analysis of (rand-)quicksort Induction step: Suppose the result holds for all arrays of size < n. Let z_k be the chosen pivot key. Consider the probability that z_i and z_j are compared, given that z_k is the pivot element.

31 Analysis of (rand-)quicksort Let z_k be the chosen pivot key. If k < i, both z_i and z_j go to the right sub-array, without being compared during the partition; in the right sub-array they become z′_{i−k} and z′_{j−k}. If k > j, both z_i and z_j go to the left sub-array, without being compared during the partition; in the left sub-array they remain z′_i and z′_j. If k = i or k = j, then z_i and z_j are compared. If i < k < j, then z_i and z_j are not compared.

32 Analysis of (rand-)quicksort (by induction)

33 Analysis of (rand-)quicksort

34 Analysis of (rand-)quicksort Exact version
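The formulas on these analysis slides are missing from the transcript; the standard calculation they walk through can be reconstructed as follows. Keys z_i and z_j (i < j) are compared iff one of them is the first pivot chosen among {z_i, z_{i+1}, …, z_j}, giving

```latex
\Pr\bigl[z_i \text{ and } z_j \text{ are compared}\bigr] = \frac{2}{j-i+1},
\qquad
\mathbb{E}[C_n] \;=\; \sum_{1 \le i < j \le n} \frac{2}{j-i+1}
\;\le\; 2n\sum_{k=2}^{n}\frac{1}{k}
\;\le\; 2n\ln n \;\approx\; 1.39\,n\log_2 n .
```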

35 Lower bound for comparison-based sorting algorithms

36 The comparison model Items to be sorted: a_1, a_2, …, a_n. The only access that the algorithm has to the input is via comparisons (a node "i : j" branches on whether a_i < a_j).

37 A comparison-based sorting algorithm corresponds to a comparison tree.

38 Insertion sort on three items x, y, z as a comparison tree: the root compares x:y, then y:z and x:z as needed; the six leaves are the permutations xyz, xzy, yxz, yzx, zxy, zyx.

39 Quicksort on three items x, y, z as a comparison tree: the root compares x:z, then y:z, then x:y where needed; again the six leaves correspond to the six permutations of x, y, z.

40 Comparison trees Every comparison-based sorting algorithm can be converted into a comparison tree. Comparison trees are binary trees. The comparison tree of a (correct) sorting algorithm has at least n! leaves. (Note: the size of a comparison tree is huge. We are only using comparison trees in proofs.)

41 Comparison trees A run of the sorting algorithm corresponds to a root-to-leaf path in the comparison tree. The maximum number of comparisons is therefore the height of the tree. The average number of comparisons, over all input orders, is the average depth of the leaves.

42 Depth and average depth Example: a tree with leaves at depths 1, 2, 3, 3. Height = 3 (maximal depth of a leaf). Average depth of leaves = (1+2+3+3)/4 = 9/4.

43 Maximum and average depth of trees Lemma 1: a binary tree with m leaves has height at least log_2 m. Lemma 2: in a binary tree with m leaves, the average depth of the leaves is at least log_2 m. Lemma 2, of course, implies Lemma 1. Lemma 1 is obvious: a tree of depth k contains at most 2^k leaves.

44 Average depth of trees Proof by induction on the structure of the tree, using the convexity of x log x.

45 Convexity

46 Lower bounds Theorem 1: Any comparison-based sorting algorithm must perform at least log_2(n!) comparisons on some input. Theorem 2: The average number of comparisons, over all input orders, performed by any comparison-based sorting algorithm is at least log_2(n!).

47 Stirling's formula
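The formula itself is missing from the transcript; the standard statement, and the consequence it is used for here, is

```latex
n! \sim \sqrt{2\pi n}\left(\frac{n}{e}\right)^{n}
\;\Longrightarrow\;
\log_2 n! = n\log_2 n - n\log_2 e + O(\log n)
\approx n\log_2 n - 1.44\,n ,
```

so the log_2(n!) lower bound of Theorems 1 and 2 is n log_2 n − O(n).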

48 Approximating sums by integrals For an increasing function f, ∫_0^n f(x) dx ≤ Σ_{i=1}^n f(i) ≤ ∫_1^{n+1} f(x) dx.

49 Randomized algorithms The lower bounds we proved so far apply only to deterministic algorithms. Maybe there is a randomized comparison-based algorithm that performs an expected number of o(n log n) comparisons on any input?

50 Randomized algorithms A randomized algorithm R may be viewed as a probability distribution over deterministic algorithms (perform all the random choices in advance). R: run D_i with probability p_i, for 1 ≤ i ≤ N.

51 Notation R: run D_i with probability p_i, for 1 ≤ i ≤ N. R(x) – the number of comparisons performed by R on input x (a random variable). D_i(x) – the number of comparisons performed by D_i on input x (a number).

52 More notation + important observation R: run D_i with probability p_i, for 1 ≤ i ≤ N.
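The "important observation" on this slide did not survive the transcript; it can be reconstructed as follows: for any fixed input x, and for any distribution μ on inputs,

```latex
\mathbb{E}[R(x)] = \sum_{i=1}^{N} p_i\, D_i(x),
\qquad
\mathbb{E}_{x\sim\mu}\bigl[\mathbb{E}[R(x)]\bigr]
= \sum_{i=1}^{N} p_i\, \mathbb{E}_{x\sim\mu}[D_i(x)]
\;\ge\; \min_{1\le i\le N}\ \mathbb{E}_{x\sim\mu}[D_i(x)] .
```

In words: the average over inputs of R's expected cost is a weighted average of the deterministic algorithms' average costs, so some D_i does at least as well on a random input as R does.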

53 Randomized algorithms If the expected number of comparisons performed by R is at most f(n) for every input x, then the expected number of comparisons performed by R on a random input is also at most f(n). That means that there is also a deterministic algorithm D_i whose expected number of comparisons on a random input is at most f(n). Thus f(n) = Ω(n log n).

54 Randomized algorithms

55 Lower bounds Theorem 1: Any comparison-based sorting algorithm must perform at least log_2(n!) comparisons on some input. Theorem 2: The average number of comparisons, over all input orders, performed by any comparison-based sorting algorithm is at least log_2(n!). Theorem 3: Any randomized comparison-based sorting algorithm must perform an expected number of at least log_2(n!) comparisons on some input.

56 Beating the lower bound We can beat the lower bound if we can deduce order relations between keys by means other than comparisons. Examples: count sort, radix sort.

57 Count sort Assume that keys are integers between 0 and R−1. Example input array (indices 0–8): A = [2, 3, 0, 5, 3, 5, 0, 2, 5]

58–62 Allocate a temporary array C of size R; cell i counts the number of keys equal to i. Scanning A fills C (indices 0–5): C = [2, 0, 2, 2, 0, 3]

63–64 Compute the prefix sums of C: cell i now holds the number of keys ≤ i: C = [2, 2, 4, 6, 6, 9]

65–72 Move the items to the output array B, scanning A from right to left: each key k is placed at position C[k]−1 and C[k] is decremented, so equal keys keep their relative order. Final result: B = [0, 0, 2, 2, 3, 3, 5, 5, 5]

73 (Adapted from Cormen, Leiserson, Rivest and Stein, Introduction to Algorithms, Third Edition, 2009, p. 195)

74 Count sort Complexity: O(n+R). In particular, we can sort n integers in the range {0,1,…,cn} in O(cn) time. Count sort is stable. No comparisons are performed.
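A Python sketch of count sort following the walkthrough above, keeping the slides' array names A, B, C (the function name is mine):

```python
def count_sort(A, R):
    """Stable sort of integers in {0, ..., R-1}; O(n + R) time, no comparisons."""
    C = [0] * R
    for key in A:                  # C[i] = number of keys equal to i
        C[key] += 1
    for i in range(1, R):          # prefix sums: C[i] = number of keys <= i
        C[i] += C[i - 1]
    B = [0] * len(A)
    for key in reversed(A):        # right-to-left scan keeps the sort stable
        C[key] -= 1
        B[C[key]] = key
    return B
```

On the slides' example, count_sort([2, 3, 0, 5, 3, 5, 0, 2, 5], 6) reproduces the final array B = [0, 0, 2, 2, 3, 3, 5, 5, 5].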

75 Stable sorting algorithms The order of items with the same key should be preserved (items with equal keys may carry different info). Is quicksort stable? No.

76 Radix sort We want to sort numbers with d digits, each between 0 and R−1. Example input: 2871 4591 6572 1301 2472 3555 7022 8394 4844 3536

77 LSD Radix sort Use a stable sort, e.g. count sort, to sort by the Least Significant Digit, then by the next digit, and so on.

78–79 After the pass on the units digit: 2871 4591 1301 6572 2472 7022 8394 4844 3555 3536

80–81 After the pass on the tens digit: 1301 7022 3536 4844 3555 2871 6572 2472 4591 8394

82–83 After the pass on the hundreds digit: 7022 1301 8394 2472 3536 3555 6572 4591 4844 2871

84 After the pass on the thousands digit the array is sorted: 1301 2472 2871 3536 3555 4591 4844 6572 7022 8394

85 LSD Radix sort Complexity: O(d(n+R)). In particular, we can sort n integers in the range {0,1,…,n^d − 1} in O(dn) time (view each number as a d-digit number in base n). In practice, choose R to be a power of two; each digit is then extracted using simple bit operations.
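A Python sketch of LSD radix sort, running one stable count-sort pass per digit as the slides describe (names are mine; for clarity this sketch extracts digits with division and modulus, while the slides suggest a power-of-two R and bit operations in practice):

```python
def radix_sort(a, d, R):
    """LSD radix sort of d-digit base-R non-negative integers; O(d(n + R)) time."""
    for pos in range(d):                     # least significant digit first
        digit = lambda x: (x // R ** pos) % R
        C = [0] * R
        for x in a:                          # count occurrences of each digit
            C[digit(x)] += 1
        for i in range(1, R):                # prefix sums
            C[i] += C[i - 1]
        B = [0] * len(a)
        for x in reversed(a):                # stable per-digit placement
            C[digit(x)] -= 1
            B[C[digit(x)]] = x
        a = B
    return a
```

Stability of each pass is what makes the algorithm correct: after pass k, the array is sorted by the k least significant digits.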

86 Extracting digits If R = 2^r, the operation is especially efficient: digit i of x is (x >> (i·r)) & (2^r − 1), i.e., r bits extracted with one shift and one mask.

87 Word-RAM model Each machine word holds w bits. In constant time, we can perform any "usual" operation on two machine words, e.g., addition, multiplication, logical operations, shifts, etc. Open problem: can we sort n words in O(n) time?

