1 Sorting We have actually already seen two efficient ways to sort:

2 A kind of “insertion” sort: insert the elements into a red-black tree one by one, then traverse the tree in-order and collect the keys. Takes O(n log n) time.
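A minimal sketch of this insert-then-traverse idea in Python. Assumption: to stay short it swaps the red-black tree for a plain, unbalanced binary search tree, so it keeps the structure of the method but not its O(n log n) worst-case guarantee (an already sorted input degrades it to O(n^2)). The names `Node` and `tree_sort` are hypothetical.

```python
from typing import List, Optional


class Node:
    """A node of an unbalanced binary search tree (stand-in for a red-black tree)."""

    def __init__(self, key: int) -> None:
        self.key = key
        self.left: Optional["Node"] = None
        self.right: Optional["Node"] = None


def tree_sort(items: List[int]) -> List[int]:
    root: Optional[Node] = None
    # Insert the elements one by one (iteratively, to avoid deep recursion).
    for key in items:
        new = Node(key)
        if root is None:
            root = new
            continue
        cur = root
        while True:
            if key < cur.key:
                if cur.left is None:
                    cur.left = new
                    break
                cur = cur.left
            else:  # equal keys go to the right subtree
                if cur.right is None:
                    cur.right = new
                    break
                cur = cur.right
    # In-order traversal with an explicit stack collects the keys in sorted order.
    out: List[int] = []
    stack: List[Node] = []
    cur = root
    while stack or cur is not None:
        while cur is not None:
            stack.append(cur)
            cur = cur.left
        cur = stack.pop()
        out.append(cur.key)
        cur = cur.right
    return out
```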

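3 Heapsort (Williams, Floyd, 1964): Put the elements in an array. Make the array into a heap. Repeatedly do a deletemin and put the deleted element at the last position of the array (the slot freed as the heap shrinks); with a min-heap this leaves the array in decreasing order.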
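A sketch of the slide's procedure in Python, following it literally with a min-heap: Floyd's bottom-up construction builds the heap in O(n), and each deletemin swaps the current minimum into the last slot of the shrinking heap region, so the array ends up in decreasing order (a max-heap would give increasing order). The names `sift_down` and `heapsort_desc` are illustrative, not from the slides.

```python
from typing import List


def sift_down(a: List[int], i: int, size: int) -> None:
    """Restore the min-heap property for the subtree rooted at i within a[:size]."""
    while True:
        smallest = i
        left, right = 2 * i + 1, 2 * i + 2
        if left < size and a[left] < a[smallest]:
            smallest = left
        if right < size and a[right] < a[smallest]:
            smallest = right
        if smallest == i:
            return
        a[i], a[smallest] = a[smallest], a[i]
        i = smallest


def heapsort_desc(a: List[int]) -> List[int]:
    n = len(a)
    # Floyd's bottom-up heap construction: sift down every internal node, O(n).
    for i in range(n // 2 - 1, -1, -1):
        sift_down(a, i, n)
    # Repeated deletemin: move the minimum into the last slot of the heap region.
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]
        sift_down(a, 0, end)
    return a
```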

4 Quicksort (Hoare 1961)

5 Quicksort
Input: an array A[p..r]
Quicksort(A, p, r)
  if p < r then
    q = Partition(A, p, r)   // q is the position of the pivot element
    Quicksort(A, p, q-1)
    Quicksort(A, q+1, r)

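6 Partition example: A[p..r] = 2 8 7 1 3 5 6 4; the last element is the pivot, x = 4. Positions up to and including i hold elements ≤ pivot; positions between i and j hold elements > pivot; j explores to the right, and an element ≤ pivot found by j is exchanged into the left region.
  start:            2 8 7 1 3 5 6 4
  j at 2 (≤ 4):     2 8 7 1 3 5 6 4   (i advances; 2 is exchanged with itself)
  j at 8, then 7:   2 8 7 1 3 5 6 4   (> 4, no exchange)
  j at 1 (≤ 4):     2 1 7 8 3 5 6 4   (exchange 8 ↔ 1)

7 Partition example (continued):
  j at 3 (≤ 4):     2 1 3 8 7 5 6 4   (exchange 7 ↔ 3)
  j at 5, then 6:   2 1 3 8 7 5 6 4   (> 4, no exchange)
  final exchange:   2 1 3 4 7 5 6 8   (the pivot moves to position i+1)
  Result: elements ≤ pivot, then the pivot, then elements > pivot.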

8 Example input: A[p..r] = 2 8 7 1 3 5 6 4
Partition(A, p, r)
  x ← A[r]
  i ← p-1
  for j ← p to r-1 do
    if A[j] ≤ x then
      i ← i+1
      exchange A[i] ↔ A[j]
  exchange A[i+1] ↔ A[r]
  return i+1   // the partition point
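A direct Python transcription of Partition and Quicksort, as a sketch with 0-based inclusive indices (the slides use the same inclusive A[p..r] convention). Running `partition` on the example array reproduces the trace on slides 6 and 7.

```python
from typing import List


def partition(a: List[int], p: int, r: int) -> int:
    """Partition a[p..r] (inclusive) around the pivot x = a[r]."""
    x = a[r]
    i = p - 1  # a[p..i] holds elements <= x
    for j in range(p, r):  # j = p .. r-1
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]  # put the pivot at its final position
    return i + 1


def quicksort(a: List[int], p: int, r: int) -> None:
    if p < r:
        q = partition(a, p, r)  # q is the position of the pivot element
        quicksort(a, p, q - 1)
        quicksort(a, q + 1, r)
```

For example, with `b = [2, 8, 7, 1, 3, 5, 6, 4]`, `partition(b, 0, 7)` returns 3 and leaves `b` as `[2, 1, 3, 4, 7, 5, 6, 8]`.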

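9 Analysis: The running time is proportional to the number of comparisons. Each pair is compared at most once ⇒ O(n^2). In fact, for each n there is an input of size n on which quicksort takes cn^2 time (for example, an already sorted array, where every pivot is the maximum) ⇒ Ω(n^2) in the worst case.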
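To see the Ω(n^2) case concretely, this sketch counts the comparisons `A[j] ≤ x` made by the last-element-pivot quicksort. On a sorted input every partition of m elements costs m - 1 comparisons and splits off only the pivot, giving n(n-1)/2 in total. `count_comparisons` is a hypothetical helper; it uses an explicit stack because a sorted input would otherwise recurse n levels deep.

```python
from typing import List


def count_comparisons(a: List[int]) -> int:
    """Deterministic quicksort (last element as pivot); returns # of key comparisons."""
    comparisons = 0

    def partition(p: int, r: int) -> int:
        nonlocal comparisons
        x = a[r]
        i = p - 1
        for j in range(p, r):
            comparisons += 1  # one comparison A[j] <= x per loop iteration
            if a[j] <= x:
                i += 1
                a[i], a[j] = a[j], a[i]
        a[i + 1], a[r] = a[r], a[i + 1]
        return i + 1

    # Explicit stack of subarrays (p, r) instead of recursion.
    stack = [(0, len(a) - 1)]
    while stack:
        p, r = stack.pop()
        if p < r:
            q = partition(p, r)
            stack.append((p, q - 1))
            stack.append((q + 1, r))
    return comparisons
```

For instance, `count_comparisons(list(range(10)))` gives 45 = 10·9/2.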

10 But assume that the split is even in every iteration.

11 T(n) = 2T(n/2) + n. How do we solve recurrences like this? (Read Chapter 4.)

12 Recurrence tree: the root costs n and has two children, each T(n/2).

13 Recurrence tree: the root costs n; its two children cost n/2 each, and each of them has two children T(n/4).

14 Recurrence tree: the tree has log n levels. In every level we do bn comparisons, so the total number of comparisons is O(n log n).
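A quick numeric check of the recurrence tree, assuming T(1) = 0, b = 1, and n a power of two (`T` is a hypothetical helper): each of the log2 n levels contributes exactly n, so T(n) = n log2 n.

```python
def T(n: int) -> int:
    """Cost of T(n) = 2T(n/2) + n with T(1) = 0, for n a power of two."""
    if n == 1:
        return 0
    return 2 * T(n // 2) + n
```

For example, T(8) = 2·8 + 8 = 24 = 8·log2 8.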

15 Analysis of 1:9 split

16 Analysis of 1:9 split
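The figures for these two slides did not survive. The standard recurrence-tree calculation for this case is sketched below, assuming the split is exactly 1:9 at every level and partitioning n elements costs cn: the subproblems on any one level are disjoint parts of the input, so each level costs at most cn, and the deepest path follows the 9n/10 branch.

```latex
\begin{aligned}
T(n) &= T\!\left(\tfrac{n}{10}\right) + T\!\left(\tfrac{9n}{10}\right) + cn \\
\text{cost of each level} &\le cn \\
\text{depth} &= \log_{10/9} n = \frac{\log_2 n}{\log_2 (10/9)} \\
T(n) &\le cn \log_{10/9} n + O(n) = O(n \log n)
\end{aligned}
```

So even a lopsided constant-proportion split gives O(n log n); only the constant grows.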

17 Observations: We can’t guarantee good splits, but intuitively, on random inputs we will get good splits.

18 Randomized quicksort: use Randomized-Partition rather than Partition.
Randomized-Partition(A, p, r)
  i ← random(p, r)
  exchange A[r] ↔ A[i]
  return Partition(A, p, r)
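A Python sketch of randomized quicksort as the slide describes it: pick a uniformly random index in p..r, swap that element into position r, then run the ordinary last-element Partition. `random.randint(p, r)` includes both endpoints, matching random(p, r) on the slide.

```python
import random
from typing import List


def partition(a: List[int], p: int, r: int) -> int:
    """Partition a[p..r] around the pivot a[r]; return the pivot's final index."""
    x = a[r]
    i = p - 1
    for j in range(p, r):
        if a[j] <= x:
            i += 1
            a[i], a[j] = a[j], a[i]
    a[i + 1], a[r] = a[r], a[i + 1]
    return i + 1


def randomized_partition(a: List[int], p: int, r: int) -> int:
    i = random.randint(p, r)  # uniform over p..r inclusive
    a[r], a[i] = a[i], a[r]   # move the random pivot to the last position
    return partition(a, p, r)


def randomized_quicksort(a: List[int], p: int, r: int) -> None:
    if p < r:
        q = randomized_partition(a, p, r)
        randomized_quicksort(a, p, q - 1)
        randomized_quicksort(a, q + 1, r)
```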

19 On the same input we will get a different running time in each run! We look at the average of all these running times for one particular input.

20 Expected # of comparisons Let X be the # of comparisons This is a random variable Want to know E(X)

21 Expected # of comparisons: Let $z_1, z_2, \dots, z_n$ be the elements in sorted order. Let $X_{ij} = 1$ if $z_i$ is compared to $z_j$ and 0 otherwise. So $X = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} X_{ij}$. All elements are compared to the pivot; at the end of the phase the partition puts them on the proper sides, so they will not be compared with that pivot again. Hence each pair is compared at most once.

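22 $E[X] = E\left[\sum_{i=1}^{n-1}\sum_{j=i+1}^{n} X_{ij}\right] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} E[X_{ij}]$ by linearity of expectation

23 $E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \Pr\{z_i \text{ is compared to } z_j\}$, by linearity of expectation and since $E[X_{ij}] = \Pr\{X_{ij} = 1\}$ for an indicator variable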

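24 Consider $z_i, z_{i+1}, \dots, z_j \equiv Z_{ij}$. Claim: $z_i$ and $z_j$ are compared $\iff$ either $z_i$ or $z_j$ is the first element of $Z_{ij}$ chosen as a pivot. Proof: until some element of $Z_{ij}$ is chosen as a pivot, all of $Z_{ij}$ stays in the same subarray. 3 cases for the first pivot chosen from $Z_{ij}$:
– $z_i$: compared in this partition, and never again.
– $z_j$: the same.
– some $z_k$ with $i < k < j$: not compared in this partition; the partition separates them, so no future partition uses both.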

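25 $\Pr\{z_i \text{ is compared to } z_j\} = \Pr\{z_i \text{ or } z_j \text{ is the first pivot chosen from } Z_{ij}\}$ (just explained)
$= \Pr\{z_i \text{ is the first pivot chosen from } Z_{ij}\} + \Pr\{z_j \text{ is the first pivot chosen from } Z_{ij}\}$ (mutually exclusive possibilities)
$= \frac{1}{j-i+1} + \frac{1}{j-i+1} = \frac{2}{j-i+1}$, since each of the $j-i+1$ elements of $Z_{ij}$ is equally likely to be the first one chosen as a pivot.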

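26 Simplify with a change of variable, $k = j-i+1$; then simplify and overestimate by adding terms:
$E[X] = \sum_{i=1}^{n-1}\sum_{j=i+1}^{n} \frac{2}{j-i+1} = \sum_{i=1}^{n-1}\sum_{k=2}^{n-i+1} \frac{2}{k} < \sum_{i=1}^{n}\sum_{k=1}^{n} \frac{2}{k} = 2n\sum_{k=1}^{n} \frac{1}{k}$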

27 Sum 1/k: $\sum_{k=1}^{n} \frac{1}{k} = H_n \le \ln n + 1$, so $E[X] < 2n(\ln n + 1) = O(n \log n)$.
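A small exact check of the derivation, as a sketch: `expected_comparisons` (a hypothetical helper) evaluates $\sum_{i<j} 2/(j-i+1)$ with exact fractions and compares it against the bound $2nH_n$. The case n = 3 can be verified by hand: with probability 1/3 the first pivot is the middle element (2 comparisons), otherwise 3 comparisons are needed, giving 8/3.

```python
from fractions import Fraction


def expected_comparisons(n: int) -> Fraction:
    """E[X] = sum over pairs i < j of 2/(j - i + 1), computed exactly."""
    total = Fraction(0)
    for i in range(1, n + 1):
        for j in range(i + 1, n + 1):
            total += Fraction(2, j - i + 1)
    return total


def harmonic(n: int) -> Fraction:
    """H_n = 1 + 1/2 + ... + 1/n as an exact fraction."""
    return sum((Fraction(1, k) for k in range(1, n + 1)), Fraction(0))
```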

28 Lower bound for sorting in the comparison model: we cannot deal with one particular algorithm; we must deal with the PROBLEM.

29 A lower bound. Comparison model: we assume that the only operations from which we deduce order among keys are comparisons. Then we prove that we need Ω(n log n) comparisons in the worst case.

Model the algorithm as a decision tree. Example: insertion sort. In iteration i of insertion sort we make sure, by swaps, that the elements A[1], ..., A[i] are in correct relative order.

Insertion sort decision tree (n = 3; a node i:j compares A[i] with A[j]):
1:2
  <  2:3
       <  A[1] < A[2] < A[3]
       >  1:3
            <  A[1] < A[3] < A[2]
            >  A[3] < A[1] < A[2]
  >  1:3
       <  A[2] < A[1] < A[3]
       >  2:3
            <  A[2] < A[3] < A[1]
            >  A[3] < A[2] < A[1]
Each leaf is the order the algorithm finds from the comparisons on its path.

Quicksort decision tree (n = 3, last element as pivot):
1:3
  <  2:3
       <  1:2
            <  A[1] < A[2] < A[3]
            >  A[2] < A[1] < A[3]
       >  A[1] < A[3] < A[2]
  >  2:3
       <  A[2] < A[3] < A[1]
       >  1:2
            <  A[3] < A[1] < A[2]
            >  A[3] < A[2] < A[1]

33 Important Observations: Every comparison algorithm can be represented as a (binary) tree like this. Assume that for every node v there is an input on which the algorithm reaches v. Then the # of leaves is n!.

34 Important Observations: Each path corresponds to a run on some input. The worst-case # of comparisons corresponds to the longest path.
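These observations can be checked mechanically for insertion sort, as a sketch assuming distinct keys: run it on every permutation of 1..n, record the sequence of comparisons (the root-to-leaf path, labelled like the nodes in the trees above), then count the distinct paths (leaves) and the longest one. `insertion_sort_path` and `decision_tree_stats` are hypothetical helpers.

```python
from itertools import permutations
from typing import List, Tuple


def insertion_sort_path(a: List[int]) -> Tuple[str, ...]:
    """Run insertion sort on a and record its comparisons as labels like "1:2<",
    where the numbers are 1-based positions in the ORIGINAL input."""
    order = list(range(len(a)))  # original positions, kept in insertion-sort order
    path: List[str] = []
    for k in range(1, len(a)):
        j = k
        while j > 0:
            left, right = order[j - 1], order[j]
            i1, i2 = min(left, right), max(left, right)
            path.append(f"{i1 + 1}:{i2 + 1}{'<' if a[i1] < a[i2] else '>'}")
            if a[left] <= a[right]:
                break  # the element being inserted has reached its place
            order[j - 1], order[j] = right, left
            j -= 1
    return tuple(path)


def decision_tree_stats(n: int) -> Tuple[int, int]:
    """(number of distinct root-to-leaf paths, longest path) over all n! inputs."""
    paths = [insertion_sort_path(list(p)) for p in permutations(range(1, n + 1))]
    return len(set(paths)), max(len(p) for p in paths)
```

For n = 3 this reports 6 leaves and a longest path of 3, as in the insertion sort tree.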

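35 The lower bound: Let d be the length of the longest path. Then $n! \le \#\text{leaves} \le 2^d$ (perhaps some orders are represented more than once), so $\log_2(n!) \le d$. Since the n/2 largest factors of n! are each at least n/2, $\log_2(n!) \ge \frac{n}{2}\log_2\frac{n}{2} = \Omega(n \log n)$.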

36 Lower Bound for Sorting: Any sorting algorithm based on comparisons between elements requires Ω(N log N) comparisons.
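The bound can be tabulated for small n, as a sketch: `comparison_lower_bound` (hypothetical) returns the smallest d with 2^d ≥ n!, computed exactly on integers via `int.bit_length` to avoid floating-point rounding, and `simple_bound` is the (n/2) log2(n/2) estimate from the previous slide.

```python
from math import factorial, log2


def comparison_lower_bound(n: int) -> int:
    """Smallest d with 2**d >= n!, i.e. ceil(log2(n!)), computed exactly with integers."""
    return (factorial(n) - 1).bit_length()


def simple_bound(n: int) -> float:
    """(n/2) * log2(n/2): the n/2 largest factors of n! are each at least n/2."""
    return (n / 2) * log2(n / 2)
```

For n = 3 the bound is 3, the depth of both 3-element trees above; for n = 10 it is 22.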

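Lower bound for the average time: One can show that the average number of comparisons is also Ω(n log n). Proof outline: show that in a binary tree with k leaves, the average depth of a leaf is at least log k. Proof by contradiction: let T be the smallest tree for which this does not hold. Then the root n of T has either one child or two children:
a) A single child $n_1$: the subtree rooted at $n_1$ has the same k leaves and an even smaller average depth, so also less than log k, contradicting the minimality of T.
b) Two children $n_1, n_2$: let the numbers of leaves in their subtrees be $k_1$ and $k_2$, where $k_1 < k$ and $k_2 = k - k_1 < k$.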

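Then the average length of the paths to the leaves of T is
$\frac{k_1(1 + \bar d_1) + k_2(1 + \bar d_2)}{k} \ge 1 + \frac{k_1 \log k_1 + k_2 \log k_2}{k}$,
where $\bar d_1 \ge \log k_1$ and $\bar d_2 \ge \log k_2$ are the average leaf depths of the two (smaller) subtrees (logarithms base 2). We find a lower bound for this expression by finding its minimum under the constraint $k_1 + k_2 = k$: solving the problem gives the minimum at $k_1 = k_2 = k/2$, where the bound equals $1 + \log(k/2) = \log k$. This contradicts the choice of T.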

39 Beating the lower bound: We can beat the lower bound if we can deduce order relations between keys by means other than comparisons. Examples: Count sort, Radix sort.

BIN/RADIX SORTING
The sorts we have seen so far take O(n log n). Can we sort in less than O(n log n)? We saw a bound of Ω(n log n) when we know nothing about the numbers. If we do know something, we can go lower.
Example 1: If we know that the array A[1..n] holds exactly the keys 1, ..., n, then we can sort into an array B in O(n): B[A[i].key] = A[i].
Example 2: Count Sort. An array A[1], ..., A[n] whose elements come from 1, ..., k, where each element may appear several times, e.g. 3 2 3 1 4 2 2 5. Sort the array by: 1. counting the elements of each kind; 2. writing them into a result array. Details: Cormen 175-177.
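Both examples as short Python sketches. `place_by_key` implements Example 1 with a hypothetical (key, payload) record type. `counting_sort` follows the stable prefix-sum variant in Cormen rather than simply writing each value count[v] times; for bare integers both give the same output, but the prefix-sum form also carries attached data along and is stable, which radix sort relies on later.

```python
from typing import List, Optional, Tuple

Record = Tuple[int, str]  # hypothetical (key, payload) record


def place_by_key(a: List[Record]) -> List[Optional[Record]]:
    """Example 1: the keys are exactly 1..n, so B[key] = record, in O(n)."""
    b: List[Optional[Record]] = [None] * len(a)
    for rec in a:
        b[rec[0] - 1] = rec  # key k goes to slot k (0-based index k - 1)
    return b


def counting_sort(a: List[int], k: int) -> List[int]:
    """Stable counting sort of integers in 1..k, in O(n + k)."""
    count = [0] * (k + 1)
    for x in a:                  # 1. count the elements of each kind
        count[x] += 1
    for v in range(2, k + 1):    # now count[v] = number of elements <= v
        count[v] += count[v - 1]
    out = [0] * len(a)
    for x in reversed(a):        # 2. write each element to its final slot
        count[x] -= 1
        out[count[x]] = x
    return out
```

On the slide's example, `counting_sort([3, 2, 3, 1, 4, 2, 2, 5], 5)` gives `[1, 2, 2, 2, 3, 3, 4, 5]`.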

Under the conditions of Example 1, sorting inside A: if A[i].key = j, exchange A[i] with A[j]:
for i = 1 to n do
  while A[i].key ≠ i do swap(A[i], A[A[i].key])
Operations: O(n) steps, O(n) swaps (an element that has landed in its place is never moved again!).
BIN SORTING means sorting the elements into cells (BINS) and finally concatenating the bins.
- Example 1 is a simple BIN SORT with a constant BIN size (1); in the general case the size varies.
Operations we want: a) insert an element into a BIN; b) concatenate two BINs.
Example 3
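The in-place loop translated to Python, as a sketch with 0-based indices (key k belongs at index k - 1); `sort_permutation_in_place` is a hypothetical name. The target index is computed before the swap: writing `keys[i], keys[keys[i] - 1] = ...` directly would evaluate the second target after `keys[i]` has already changed.

```python
from typing import List


def sort_permutation_in_place(keys: List[int]) -> int:
    """Sort a list holding exactly the keys 1..n in place; return the number of swaps."""
    swaps = 0
    for i in range(len(keys)):
        while keys[i] != i + 1:
            j = keys[i] - 1                      # home slot of the key currently at i
            keys[i], keys[j] = keys[j], keys[i]  # that key now sits at home for good
            swaps += 1
    return swaps
```

Each swap puts one key into its home slot permanently, so there are at most n swaps; e.g. `[3, 1, 2, 5, 4]` is sorted with 3 swaps.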

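Solution: 1) Each BIN is a linked list. 2) HEADERS point to the beginning of each list (H1, E1 and H2, E2 in the figure label the header and end pointers of two lists). Insertion: O(1). Concatenation: O(1). Now an arbitrary number of lists can be concatenated into n bins.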
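A sketch of bins as singly linked lists with head and end pointers, assuming that is what H and E denote in the figure. Inserting at the end and linking one list after another are both O(1), so `bin_sort` on keys in 0..m-1 costs O(n) for the inserts plus O(m) for the concatenations (plus a final O(n) walk to return a Python list). `Cell`, `Bin`, and `bin_sort` are hypothetical names.

```python
from typing import Iterable, List, Optional


class Cell:
    def __init__(self, value: int) -> None:
        self.value = value
        self.next: Optional["Cell"] = None


class Bin:
    """A bin as a singly linked list with head and end (tail) pointers."""

    def __init__(self) -> None:
        self.head: Optional[Cell] = None
        self.tail: Optional[Cell] = None

    def insert(self, value: int) -> None:
        """Append at the end in O(1); equal keys stay in input order."""
        cell = Cell(value)
        if self.tail is None:
            self.head = self.tail = cell
        else:
            self.tail.next = cell
            self.tail = cell

    def concat(self, other: "Bin") -> None:
        """Link other's list after this one in O(1); other becomes empty."""
        if other.head is None:
            return
        if self.tail is None:
            self.head = other.head
        else:
            self.tail.next = other.head
        self.tail = other.tail
        other.head = other.tail = None

    def to_list(self) -> List[int]:
        out, cur = [], self.head
        while cur is not None:
            out.append(cur.value)
            cur = cur.next
        return out


def bin_sort(keys: Iterable[int], m: int) -> List[int]:
    """Bin sort of keys in 0..m-1: n O(1) inserts plus m O(1) concatenations."""
    bins = [Bin() for _ in range(m)]
    for k in keys:
        bins[k].insert(k)
    result = Bin()
    for b in bins:
        result.concat(b)
    return result.to_list()
```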

Analysis: m = number of possible values (number of bins); n = number of keys. Cost of the insertions = O(n); cost of the concatenations = O(m); in total O(m + n). This is O(n) if the number of keys is larger than the number of bins (m < n), but if the number of keys is smaller than the number of bins (m > n), e.g. m = n^2, it is O(n^2).
Example: sort the numbers i^2 for i = 0, 1, ..., 9, i.e. sort 0, 1, 4, ..., 81.
Hanoch: Sort exams of class with grades xx.yy

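Solution: prepare n bins; sort by the less significant digit; then sort by the more significant digit.
First pass (units digit):
  Bin 0: 0 | Bin 1: 1, 81 | Bin 2: - | Bin 3: - | Bin 4: 64, 4 | Bin 5: 25 | Bin 6: 36, 16 | Bin 7: - | Bin 8: - | Bin 9: 9, 49
  Concatenation: 0, 1, 81, 64, 4, 25, 36, 16, 9, 49
Second pass (tens digit):
  Bin 0: 0, 1, 4, 9 | Bin 1: 16 | Bin 2: 25 | Bin 3: 36 | Bin 4: 49 | Bin 5: - | Bin 6: 64 | Bin 7: - | Bin 8: 81 | Bin 9: -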
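The two passes as a Python sketch, using plain Python lists as bins instead of linked lists. The slide does not show the input order, so the order used below is an assumption, chosen so that the first pass reproduces the slide's concatenation exactly. `bin_pass` and `radix_sort_two_digits` are hypothetical names.

```python
from typing import Callable, List


def bin_pass(keys: List[int], digit: Callable[[int], int], m: int = 10) -> List[int]:
    """One bin-sort pass: distribute by digit(key) into m bins, then concatenate.
    Appending to each bin keeps keys with equal digits in their current order."""
    bins: List[List[int]] = [[] for _ in range(m)]
    for k in keys:
        bins[digit(k)].append(k)
    return [k for b in bins for k in b]


def radix_sort_two_digits(keys: List[int]) -> List[int]:
    """Sort keys in 0..99: first by the units digit, then by the tens digit."""
    first = bin_pass(keys, lambda k: k % 10)
    return bin_pass(first, lambda k: k // 10)
```

With the assumed input `[64, 0, 1, 36, 9, 81, 4, 25, 16, 49]`, the first pass gives `[0, 1, 81, 64, 4, 25, 36, 16, 9, 49]` and the second gives the sorted squares.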

Why does it work? Suppose i = 10a + b and j = 10c + d, with i < j.
- If a < c, the second pass puts them into the appropriate bins, and the order is correct.
- If a = c, then b < d, so the first pass puts them in order; hence i enters bin a before j in the second pass.
Why is BIN SORT good? For domains about which information is known, such as:
- 1, ..., n^k (k constant)
- strings of length k

Is it always good? No, not if k is very large!! Example: n = 100, k = 100. BIN SORT: nk (100 passes of 100 operations). Another sort: n log n, and nk > n log n. But... be careful with comparisons! In a regular sort a comparison costs O(k), so the computational model matters!!!

RADIX SORT
- We are given k keys $f_1, \dots, f_k$.
- We want to sort in lexicographic order, i.e. $(a_1, \dots, a_k) < (b_1, \dots, b_k)$ iff:
  1) $a_1 < b_1$, or
  2) $a_1 = b_1$ and $a_2 < b_2$, or
  ...
  k) $a_1 = b_1, \dots, a_{k-1} = b_{k-1}$ and $a_k < b_k$.
This is similar to a generalized BIN SORT, except that each kind of key needs its own range of bins.
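A sketch of this generalized bin sort for k-tuples: one stable bin pass per field, from the last (least significant) field to the first, with `ranges[t]` bins for field t. This is the least-significant-first order used in the two-digit example; the function name is hypothetical. The cost is O(kn + Σ ranges[t]).

```python
from typing import List, Sequence, Tuple


def radix_sort_tuples(items: List[Tuple[int, ...]], ranges: Sequence[int]) -> List[Tuple[int, ...]]:
    """Sort k-tuples into lexicographic order.
    ranges[t] is the number of bins for field t (values 0..ranges[t]-1)."""
    for t in reversed(range(len(ranges))):  # least significant field first
        bins: List[List[Tuple[int, ...]]] = [[] for _ in range(ranges[t])]
        for item in items:                   # stable distribution by field t
            bins[item[t]].append(item)
        items = [item for b in bins for item in b]
    return items
```

Python compares tuples lexicographically, so the result should equal `sorted(items)`.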

48 Linear time sorting Or assume something about the input: random, “almost sorted”

49 Sorting an almost sorted input: Suppose we know that the input is “almost” sorted. Let I be the number of “inversions” in the input: the number of pairs $a_i, a_j$ such that $i < j$ and $a_i > a_j$.

50 Example: 1, 4, 5, 8, 3 has I = 3; 8, 7, 5, 3, 1 has I = 10.
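A direct O(n^2) count of inversions, as a sketch that checks the two examples; `count_inversions` is a hypothetical helper.

```python
from typing import List


def count_inversions(a: List[int]) -> int:
    """Number of pairs i < j with a[i] > a[j] (direct O(n^2) count)."""
    n = len(a)
    return sum(1 for i in range(n) for j in range(i + 1, n) if a[i] > a[j])
```

In 1, 4, 5, 8, 3 the inversions are (4, 3), (5, 3) and (8, 3).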

51 Think of “insertion sort” using a list. When we insert the next item $a_k$, how deep does it get into the list (counting from its end)? As deep as the number of inversions $(a_i, a_k)$ with $i < k$; let's call this $I_k$.

52 Analysis: The running time is $O\left(\sum_{k=1}^{n} (1 + I_k)\right) = O(n + I)$.
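A sketch of insertion sort that scans from the right end and counts how far each item travels. It uses a Python list with shifting rather than a linked list, but the number of steps is the same: item $a_k$ passes exactly the $I_k$ earlier, larger elements, so the total is I and the work is O(n + I). The name is hypothetical.

```python
from typing import List, Tuple


def insertion_sort_counting(a: List[int]) -> Tuple[List[int], int]:
    """Insertion sort scanning from the right; returns (sorted list, total shifts)."""
    out = list(a)
    shifts = 0
    for k in range(1, len(out)):
        x = out[k]
        j = k - 1
        while j >= 0 and out[j] > x:  # each step passes one earlier, larger element
            out[j + 1] = out[j]
            j -= 1
            shifts += 1
        out[j + 1] = x
    return out, shifts
```

On the examples above, the shift counts equal I = 3 and I = 10.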

53 Thoughts: When I = Ω(n^2) the running time is Ω(n^2). But we would like it to be O(n log n) for any input, and faster when I is small.

54 Finger red black trees

55 Finger tree: Take a regular search tree and reverse the direction of the pointers on the rightmost spine. We go up from the last leaf until we find the subtree containing the item, and then we descend into it.

56 Finger trees: Say we search for a position at distance d from the end. Then we go up only to height O(log d). Insertions and deletions still take O(log n) worst-case time, but amortized, tree modification (rebalancing) costs O(1), and the search contributes O(log d) to each operation. So a search for the d-th position from the end takes O(log d) time.

57 Back to sorting: Suppose we implement insertion sort using a finger search tree, inserting the elements one by one in input order. If most elements are already in sorted order, they enter near the right corner. When we insert item k, $d = O(I_k)$, and the search takes $O(1 + \log(1 + I_k))$ time.
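A finger tree does not fit in a few lines, so as a hedged stand-in this sketch keeps a sorted Python list and searches for each insertion point from its right end with galloping (exponential) search: probe 1, 2, 4, ... positions from the end, then binary-search the window found. Locating a position at distance d from the end takes O(1 + log(1 + d)) comparisons, mirroring the finger search. Only the comparison count models the finger tree; `list.insert` still shifts elements in O(n). The name `finger_insertion_sort` is hypothetical.

```python
from typing import List, Tuple


def finger_insertion_sort(a: List[int]) -> Tuple[List[int], int]:
    """Insertion sort whose search for each position starts at the right end.
    Returns (sorted list, number of key comparisons)."""
    out: List[int] = []
    comparisons = 0
    for x in a:
        n = len(out)
        # Galloping phase: probe out[n-1], out[n-2], out[n-4], ... while they exceed x.
        step = 1
        while step <= n:
            comparisons += 1
            if out[n - step] <= x:
                break
            step *= 2
        # The insertion point (first index with out[p] > x, or n) lies in [lo, hi].
        lo = max(0, n - step + 1)
        hi = n - step // 2
        # Binary search inside the window found by galloping.
        while lo < hi:
            mid = (lo + hi) // 2
            comparisons += 1
            if out[mid] <= x:
                lo = mid + 1
            else:
                hi = mid
        out.insert(lo, x)  # O(n) on a Python list; the finger tree avoids this cost
    return out, comparisons
```

On an already sorted input of 100 elements every insertion after the first needs a single comparison (99 in total), matching the intuition that sorted elements enter at the right corner.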

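58 Overall cost: since $d = O(I_k)$, inserting item k costs O(1) amortized for the modifications plus $O(1 + \log(1 + I_k))$ for the search, so the total is $\sum_{k=1}^{n} O(1 + \log(1 + I_k))$ (modifications + search).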

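59 Analysis: The running time is $O\left(n + \sum_{k=1}^{n} \log(1 + I_k)\right)$. Since $\sum_j I_j = I$ and log is concave, this is at most $O\left(n + n \log\left(1 + \frac{I}{n}\right)\right)$: O(n) when I = O(n), and still O(n log n) when I = Θ(n^2).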
