
1 Motivation of Sorting The term list here is a collection of records. Each record has one or more fields. Each record has a key to distinguish one record from another. For example, the phone directory is a list. Name, phone number, and even address can be the key, depending on the application or need.

2 Sorting Two ways to store a collection of records: –Sequential –Non-sequential Assume a sequential list f. To retrieve a record with key f[i].key from such a list, we can examine the keys in the order f[n].key, f[n-1].key, …, f[1].key => sequential search.

3 Example of An Element of A Search List
class Element {
public:
    int getKey() const { return key; }
    void setKey(int k) { key = k; }
private:
    int key;
    // other fields …
};

4 Sequential Search
int SeqSearch(Element* f, const int n, const int k)
// Search a list f with key values f[1].key, …, f[n].key. Return i such that
// f[i].key == k. If there is no such record, return 0.
{
    int i = n;
    f[0].setKey(k);                 // sentinel
    while (f[i].getKey() != k) i--;
    return i;
}

5 Sequential Search The number of comparisons to find the record with key f[i].key is n − i + 1. The average number of comparisons for a successful search is (n + 1)/2. For the phone directory lookup, there should be a better way than this.
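
A short worked version of that average (an added aside, not on the slide), assuming each of the n records is equally likely to be the one sought:

\frac{1}{n}\sum_{i=1}^{n}(n-i+1) \;=\; \frac{1}{n}\sum_{j=1}^{n} j \;=\; \frac{n(n+1)}{2n} \;=\; \frac{n+1}{2}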

6 Search A binary search only takes O(log n) time to search a sequential list with n records. In fact, if we look up a name starting with W in the phone directory, we start the search toward the end of the directory rather than in the middle. That kind of search is based on an interpolation scheme. An interpolation scheme relies on an ordered list.
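
As a concrete illustration (a sketch added here, not from the slides), a minimal binary search over a sorted list, using the Element class and the 1-based, return-0-on-failure conventions of the earlier slides:

// Binary search on a sorted list list[1..n]. Returns the index of the record
// with key k, or 0 if there is no such record.
int BinarySearch(Element* list, const int n, const int k)
{
    int left = 1, right = n;
    while (left <= right) {
        int mid = (left + right) / 2;                       // middle position
        if (k < list[mid].getKey())      right = mid - 1;   // continue in the left half
        else if (k > list[mid].getKey()) left = mid + 1;    // continue in the right half
        else return mid;                                    // found
    }
    return 0;   // not found
}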

7 IRS Tax Verification Example Now, let's look at another searching problem: tax verification. The IRS gets salary reports from employers. It also gets tax filing reports from employees about their salary. It needs to verify that the two numbers match for each individual employee.

8 Verifying Two Lists With Sequential Search
void Verify1(Element* F1, Element* F2, const int n, const int m)
// Compare two unordered lists F1 and F2 of size n and m, respectively.
{
    Boolean* marked = new Boolean[m + 1];
    for (int i = 1; i <= m; i++) marked[i] = FALSE;
    for (int i = 1; i <= n; i++) {
        int j = SeqSearch(F2, m, F1[i].getKey());
        if (j == 0) cout << F1[i].getKey() << " not in F2" << endl;
        else {
            if (F1[i].other != F2[j].other)
                cout << "Discrepancy in " << F1[i].getKey() << ": "
                     << F1[i].other << " and " << F2[j].other << endl;
            marked[j] = TRUE;   // mark the record F2[j] as seen
        }
    }
    for (int i = 1; i <= m; i++)
        if (!marked[i]) cout << F2[i].getKey() << " not in F1." << endl;
    delete[] marked;
}
// Time: O(mn)

9 Fast Verification of Two Lists
void Verify2(Element* F1, Element* F2, const int n, const int m)
// Same task as Verify1, but sort F1 and F2 so that the keys are in increasing
// order. Assume the keys within each list are distinct.
{
    sort(F1, n); sort(F2, m);
    int i = 1, j = 1;
    while ((i <= n) && (j <= m))
        switch (compare(F1[i].getKey(), F2[j].getKey())) {
        case '<': cout << F1[i].getKey() << " not in F2" << endl; i++; break;
        case '=': if (F1[i].other != F2[j].other)
                      cout << "Discrepancy in " << F1[i].getKey() << ": "
                           << F1[i].other << " and " << F2[j].other << endl;
                  i++; j++; break;
        case '>': cout << F2[j].getKey() << " not in F1" << endl; j++;
        }
    if (i <= n) PrintRest(F1, i, n, 1);        // print records i through n of F1
    else if (j <= m) PrintRest(F2, j, m, 2);   // print records j through m of F2
}
// Time: O(max{n log n, m log m})

10 Sorting Application So far we have seen two uses of sorting –An aid in searching –A means for matching entries in lists Sorting is used in other applications. –Estimated 25% of all computing time is spent on sorting No one sorting method is the best for all initial orderings of the list being sorted.

11 Formal Description of Sorting Given a list of records (R_1, R_2, …, R_n), where each record R_i has a key K_i, the sorting problem is that of finding a permutation σ such that K_σ(i) ≤ K_σ(i+1), 1 ≤ i ≤ n − 1. The desired ordering is (R_σ(1), R_σ(2), …, R_σ(n)).

12 Formal Description of Sorting (Cont.) If a list has several key values that are identical, the permutation σ is not unique. Let σ_s be the permutation with the following properties: (1) K_σs(i) ≤ K_σs(i+1), 1 ≤ i ≤ n − 1. (2) If i < j and K_i == K_j in the input list, then R_i precedes R_j in the sorted list. A sorting method that generates σ_s is said to be stable.
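
As a small added illustration (not from the slides), the standard library's std::stable_sort preserves the input order of records with equal keys, which is exactly the σ_s property; plain std::sort need not. The struct and data below are made up for the example.

#include <algorithm>
#include <iostream>
#include <vector>

// Records with equal keys keep their original relative order under a stable sort.
struct Rec { int key; char tag; };

int main()
{
    std::vector<Rec> v = {{5, 'a'}, {3, 'b'}, {5, 'c'}, {3, 'd'}, {1, 'e'}};

    std::stable_sort(v.begin(), v.end(),
                     [](const Rec& x, const Rec& y) { return x.key < y.key; });

    // Prints 1e 3b 3d 5a 5c: the two 3s and the two 5s appear in input order.
    for (const Rec& r : v) std::cout << r.key << r.tag << ' ';
    std::cout << '\n';
    return 0;
}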

13 Categories of Sorting Method Internal methods: methods to be used when the list to be sorted is small enough that the entire sort can be carried out in main memory. –Insertion sort, quick sort, merge sort, heap sort and radix sort. External methods: methods to be used on larger lists.

14 Insert Into A Sorted List
void insert(const Element e, Element* list, int i)
// Insert element e with key e.key into the ordered sequence list[0], …, list[i]
// such that the resulting sequence is also ordered. Assume that e.key ≥ list[0].key.
// The array list must have space allocated for at least i + 2 elements.
{
    while (e.getKey() < list[i].getKey()) {
        list[i + 1] = list[i];
        i--;
    }
    list[i + 1] = e;
}
// Time: O(i)

15 Insertion Sort
void InsertSort(Element* list, const int n)
// Sort list in nondecreasing order of key.
{
    list[0].setKey(MININT);
    for (int j = 2; j <= n; j++)
        insert(list[j], list, j - 1);
}

16 Insertion Sort Example 1 Record R_i is left out of order (LOO) iff R_i < max{R_j | 1 ≤ j < i}. Example 7.1: Assume n = 5 and the input key sequence is 5, 4, 3, 2, 1.
 j   [1] [2] [3] [4] [5]
 -    5   4   3   2   1
 2    4   5   3   2   1
 3    3   4   5   2   1
 4    2   3   4   5   1
 5    1   2   3   4   5

17 Insertion Sort Example 2 Example 7.2: Assume n = 5 and the input key sequence is 2, 3, 4, 5, 1.
 j   [1] [2] [3] [4] [5]
 -    2   3   4   5   1
 2    2   3   4   5   1
 3    2   3   4   5   1
 4    2   3   4   5   1
 5    1   2   3   4   5
(Each of the passes j = 2, 3, 4 takes O(1) time; the final pass j = 5 takes O(n).)

18 Insertion Sort Analysis If there are k LOO records in a list, the computing time for sorting the list via insertion sort is O((k+1)n) = O(kn). Therefore, if k << n, insertion sort might be a good sorting choice.

19 Insertion Sort Variations Binary insertion sort: the number of comparisons in an insertion sort can be reduced if we replace the sequential search by a binary search. The number of record moves remains the same. List insertion sort: the elements of the list are represented as a linked list rather than an array. The number of record moves becomes zero because only the link fields require adjustment. However, we must retain the sequential search.
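
A possible rendering of the first variation (an illustrative sketch, not the textbook's code), reusing the Element interface from the earlier slides: binary search locates the insertion point, but the records from that point on must still be shifted, so the number of moves is unchanged.

// Binary insertion sort on list[1..n]. Comparisons drop to O(log j) per insertion,
// but the shifting loop performs the same record moves as ordinary insertion sort.
void BinaryInsertSort(Element* list, const int n)
{
    for (int j = 2; j <= n; j++) {
        Element e = list[j];
        int key = e.getKey();
        int left = 1, right = j - 1;
        while (left <= right) {                 // binary search for the insertion position
            int mid = (left + right) / 2;
            if (key < list[mid].getKey()) right = mid - 1;
            else left = mid + 1;                // equal keys go to the right (stable)
        }
        for (int i = j - 1; i >= left; i--)     // shift records to make room
            list[i + 1] = list[i];
        list[left] = e;
    }
}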

20 Quick Sort Quick sort was developed by C. A. R. Hoare. The quick sort scheme has the best average behavior among the sorting methods. Quick sort differs from insertion sort in that the pivot key K_i is placed at its correct spot s(i) with respect to the whole list: K_j ≤ K_s(i) for j < s(i) and K_j ≥ K_s(i) for j > s(i). Therefore, the sublist to the left of s(i) and the sublist to the right of s(i) can be sorted independently.

21 Quick Sort
void QuickSort(Element* list, const int left, const int right)
// Sort records list[left], …, list[right] into nondecreasing order on field key.
// Key pivot = list[left].key is arbitrarily chosen as the pivot key. Pointers i and j
// are used to partition the sublist so that at any time list[m].key ≤ pivot for m < i,
// and list[m].key ≥ pivot for m > j. It is assumed that list[left].key ≤ list[right+1].key.
{
    if (left < right) {
        int i = left, j = right + 1, pivot = list[left].getKey();
        do {
            do i++; while (list[i].getKey() < pivot);
            do j--; while (list[j].getKey() > pivot);
            if (i < j) InterChange(list, i, j);
        } while (i < j);
        InterChange(list, left, j);     // place the pivot at its final position j
        QuickSort(list, left, j - 1);
        QuickSort(list, j + 1, right);
    }
}

22 Quick Sort Example Example 7.3: The input list has 10 records with keys (26, 5, 37, 1, 61, 11, 59, 15, 48, 19).
  R1  R2  R3  R4  R5  R6  R7  R8  R9 R10   left right
 [26   5  37   1  61  11  59  15  48  19]    1   10
 [11   5  19   1  15] 26 [59  61  48  37]    1    5
 [ 1   5] 11 [19  15] 26 [59  61  48  37]    1    2
   1   5  11 [19  15] 26 [59  61  48  37]    4    5
   1   5  11  15  19  26 [59  61  48  37]    7   10
   1   5  11  15  19  26 [48  37] 59 [61]    7    8
   1   5  11  15  19  26  37  48  59 [61]   10   10
   1   5  11  15  19  26  37  48  59  61

23 Quick Sort (Cont.) In QuickSort(), list[n+1] has been set to have a key at least as large as the remaining keys. Analysis of QuickSort: –The worst case is O(n²). –If each time a record is correctly positioned, the sublist to its left is of the same size as the sublist to its right. Assume T(n) is the time taken to sort a list of size n:
T(n) ≤ cn + 2T(n/2), for some constant c
     ≤ cn + 2(cn/2 + 2T(n/4)) = 2cn + 4T(n/4)
     ⋮
     ≤ cn log₂ n + nT(1) = O(n log n)

24 Lemma 7.1 Lemma 7.1: Let T_avg(n) be the expected time for function QuickSort to sort a list with n records. Then there exists a constant k such that T_avg(n) ≤ k n log_e n for n ≥ 2.

25 Analysis of Quick Sort Unlike insertion sort (which only needs additional space for one record), quick sort needs stack space to implement the recursion. If the lists split evenly, the maximum recursion depth is log n and the stack space is O(log n). The worst case is when the list splits into a left sublist of size n − 1 and a right sublist of size 0 at each level of recursion; in this case the recursion depth is n and the stack space is O(n). The worst-case stack space can be reduced by a factor of 4 by realizing that right sublists of size less than 2 need not be stacked. An asymptotic reduction in stack space can be achieved by sorting the smaller sublist first; in this case the additional stack space is at most O(log n).
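
One hedged sketch of that smaller-sublist-first idea (not the textbook's program): recurse only on the smaller side of each partition and iterate on the larger side, so at most O(log n) invocations are ever pending. Plain ints are used here instead of the Element class for brevity.

#include <utility>   // std::swap

// Quick sort that recurses on the smaller sublist and loops on the larger one,
// keeping the recursion depth (and hence the extra space) O(log n).
void QuickSortSmallFirst(int* a, int left, int right)
{
    while (left < right) {
        int pivot = a[left];
        int i = left, j = right + 1;
        do {                                               // partition around pivot
            do i++; while (i <= right && a[i] < pivot);
            do j--; while (a[j] > pivot);
            if (i < j) std::swap(a[i], a[j]);
        } while (i < j);
        std::swap(a[left], a[j]);                          // pivot into final position j

        if (j - left < right - j) {                        // recurse on the smaller side...
            QuickSortSmallFirst(a, left, j - 1);
            left = j + 1;                                  // ...and iterate on the larger side
        } else {
            QuickSortSmallFirst(a, j + 1, right);
            right = j - 1;
        }
    }
}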

26 Quick Sort Variations Quick sort using a median of three: pick the median of the first, middle, and last keys in the current sublist as the pivot. Thus, pivot = median{K_l, K_(l+r)/2, K_r}.
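
A possible helper for that pivot choice (an illustrative sketch, not the textbook's code), reusing the Element interface from the earlier slides; the returned index can be swapped into position l before partitioning.

// Return the index (l, mid, or r) of the median of list[l], list[(l+r)/2], list[r].
int MedianOfThree(Element* list, const int l, const int r)
{
    int mid = (l + r) / 2;
    int a = list[l].getKey(), b = list[mid].getKey(), c = list[r].getKey();
    if ((a <= b && b <= c) || (c <= b && b <= a)) return mid;   // b is the median
    if ((b <= a && a <= c) || (c <= a && a <= b)) return l;     // a is the median
    return r;                                                   // otherwise c is
}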

27 Decision Tree So far both insertion sort and quick sort have a worst-case complexity of O(n²). If we restrict ourselves to sorting algorithms in which the only operations permitted on keys are comparisons and interchanges, then O(n log n) is the best possible worst-case time. This is shown by using a tree that describes the sorting process: each internal vertex of the tree represents a key comparison, and the branches indicate the result. Such a tree is called a decision tree.

28 Decision Tree for Insertion Sort The slide shows the decision tree for insertion sort on three records: the root compares K_1 ≤ K_2, the internal nodes below it compare K_2 ≤ K_3 and K_1 ≤ K_3, each branch is labelled Yes or No, and the six leaves I through VI (the "stop" nodes) correspond to the six possible orderings of three distinct keys.

29 Decision Tree (Cont.) Theorem 7.1: Any decision tree that sorts n distinct elements has a height of at least ⌈log₂(n!)⌉ + 1. Corollary: Any algorithm that sorts only by comparisons must have a worst-case computing time of Ω(n log n).
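
As an added aside (not on the slide), a quick way to see that log₂(n!) is itself Ω(n log n), which is the step the corollary relies on:

\log_2(n!) \;=\; \sum_{i=1}^{n}\log_2 i \;\ge\; \sum_{i=\lceil n/2\rceil}^{n}\log_2\frac{n}{2} \;\ge\; \frac{n}{2}\,\log_2\frac{n}{2} \;=\; \Omega(n\log n)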

30 Simple Merge
void merge(Element* initList, Element* mergeList, const int l, const int m, const int n)
// Merge the sorted sublists initList[l:m] and initList[m+1:n] into mergeList[l:n].
{
    int i1 = l, iResult = l, i2 = m + 1;
    for (; i1 <= m && i2 <= n; iResult++) {
        if (initList[i1].getKey() <= initList[i2].getKey()) {
            mergeList[iResult] = initList[i1];
            i1++;
        }
        else {
            mergeList[iResult] = initList[i2];
            i2++;
        }
    }
    // copy the remaining records of whichever sublist is not yet exhausted
    if (i1 > m)
        for (int t = i2; t <= n; t++) mergeList[iResult + t - i2] = initList[t];
    else
        for (int t = i1; t <= m; t++) mergeList[iResult + t - i1] = initList[t];
}
// Time: O(n - l + 1)

31 Analysis of Simple Merge If an array is used, additional space for n − l + 1 records is needed. If a linked list is used instead, then additional space for n − l + 1 links is needed.

32 O(1) Space Merge A second merge algorithm only requires O(1) additional space. Assume a total of n records are to be merged into one list, where n is a perfect square, and that the numbers of records in the left sublist and the right sublist are multiples of √n.

33 O(1) Space Merge Steps
Step 1: Identify the √n records with the largest keys. This is done by following right to left along the two lists to be merged.
Step 2: Exchange the records of the second list that were identified in Step 1 with those just to the left of those identified from the first list, so that the records with the largest keys form a contiguous block.
Step 3: Swap the block of largest records with the leftmost block (unless it is already the leftmost block). Sort the rightmost block.
Step 4: Reorder the blocks, excluding the block of largest records, into nondecreasing order of the last key in each block.
Step 5: Perform as many merge substeps as needed to merge the blocks, other than the block with the largest keys.
Step 6: Sort the block with the largest keys.

34 O(1) Space Merge Example (First 8 Lines)
0 2 4 6 8 a c e g i j k l m n t w z|1 3 5 7 9 b d f h o p q r s u v x y
0 2 4 6 8 a c e g i j k l m n t w z 1 3 5 7 9 b d f h o p q r s u v x y
0 2 4 6 8 a|c e g i j k|u v x y w z|1 3 5 7 9 b|d f h o p q|r s l m n t
u v x y w z|c e g i j k|0 2 4 6 8 a|1 3 5 7 9 b|d f h o p q|l m n r s t
u v x y w z 0 2 4 6 8 a|1 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t
0 v x y w z u 2 4 6 8 a|1 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t
0 1 x y w z u 2 4 6 8 a|v 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t
0 1 2 y w z u x 4 6 8 a|v 3 5 7 9 b|c e g i j k|d f h o p q|l m n r s t

35 O(1) Space Merge Example (Last 8 Lines)
0 1 2 3 4 5 u x w 6 8 a|v y z 7 9 b|c e g i j k|d f h o p q|l m n r s t
0 1 2 3 4 5 6 7 8 u w a|v y z x 9 b|c e g i j k|d f h o p q|l m n r s t
0 1 2 3 4 5 6 7 8 9 a w|v y z x u b|c e g i j k|d f h o p q|l m n r s t
0 1 2 3 4 5 6 7 8 9 a w v y z x u b c e g i j k|d f h o p q|l m n r s t
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u|y x w o p q|l m n r s t
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k v z u y x w o p q|l m n r s t
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q y x w|v z u r s t
0 1 2 3 4 5 6 7 8 9 a b c d e f g h i j k l m n o p q r s t|v z u y x w

36 Analysis of O(1) Space Merge Steps 1 and 2 and the swapping of Step 3 each take O(√n) time and O(1) space. The sort of Step 3 can be done in O(n) time and O(1) space using an insertion sort. Step 4 can be done in O(n) time and O(1) space using a selection sort. (Selection sort sorts m records using O(m²) key comparisons and O(m) record moves, so it needs O(n) comparisons here, and the time to move blocks is O(n). If insertion sort is used in Step 4 instead, then the time becomes O(n^1.5), since insertion sort needs O(m²) record moves: √n records per block times O(n) block moves.)

37 Analysis of O(1) Space Merge (Cont.) The total number of merge substeps is at most O(√n), and the total time for Step 5 is O(n). The sort of Step 6 can be done in O(n) time by using either a selection sort or an insertion sort on the √n largest records. Therefore, the total time is O(n) and the additional space used is O(1).

38 Iterative Merge Sort Treat the input as n sorted lists, each of length 1. Lists are merged by pairs to obtain n/2 lists, each of size 2 (if n is odd, one list of length 1 remains). The n/2 lists are then merged by pairs, and so on, until we are left with only one list.

39 Merge Tree
26 5 77 1 61 11 59 15 48 19
[5 26] [1 77] [11 61] [15 59] [19 48]
[1 5 26 77] [11 15 59 61] [19 48]
[1 5 11 15 26 59 61 77] [19 48]
[1 5 11 15 19 26 48 59 61 77]

40 Iterative Merge Sort
void MergeSort(Element* list, const int n)
// Sort list into nondecreasing order of the keys list[1].key, …, list[n].key.
{
    Element* tempList = new Element[n + 1];
    // l is the length of the sublists currently being merged.
    for (int l = 1; l < n; l *= 2) {
        MergePass(list, tempList, n, l);
        l *= 2;
        MergePass(tempList, list, n, l);   // interchange the roles of list and tempList
    }
    delete[] tempList;
}

41 Merge Pass
void MergePass(Element* initList, Element* resultList, const int n, const int l)
// One pass of merge sort. Adjacent pairs of sublists of length l are merged
// from list initList to list resultList. n is the number of records in initList.
{
    int i;
    for (i = 1;
         i <= n - 2*l + 1;   // are enough elements remaining to form two sublists of length l?
         i += 2*l)
        merge(initList, resultList, i, i + l - 1, i + 2*l - 1);
    // merge remaining list of length < 2*l
    if ((i + l - 1) < n) merge(initList, resultList, i, i + l - 1, n);
    else
        for (int t = i; t <= n; t++) resultList[t] = initList[t];
}

42 Analysis of MergeSort A total of ⌈log₂ n⌉ passes are made over the data. Each pass of merge sort takes O(n) time. The total computing time is therefore O(n log n).

43 Recursive Merge Sort Recursive merge sort divides the list to be sorted into two roughly equal parts: –the left sublist [left : (left+right)/2] –the right sublist [(left+right)/2 + 1 : right] These sublists are sorted recursively, and the sorted sublists are merged. To avoid copying records, using a linked representation for the sublists (with integer links within the array rather than pointers) is desirable.

44 Sublist Partitioning For Recursive Merge Sort
26 5 77 1 61 11 59 15 48 19
[5 26] [1 61] [11 59] [19 48]
[5 26 77] [11 15 59]
[1 5 26 61 77] [11 15 19 48 59]
[1 5 11 15 19 26 48 59 61 77]

45 Program 7.11 (Recursive Merge Sort)
class Element {
private:
    int key;
    Field other;
    int link;
public:
    Element() { link = 0; }
};

int rMergeSort(Element* list, const int left, const int right)
// List (list[left], …, list[right]) is to be sorted on the field key.
// link is a link field in each record that is initially 0.
// list[0] is a record for intermediate results used only in ListMerge.
{
    if (left >= right) return left;
    int mid = (left + right) / 2;
    return ListMerge(list, rMergeSort(list, left, mid), rMergeSort(list, mid + 1, right));
}
// Time: O(n log n)

46 Program 7.12 (Merging Linked Lists)
int ListMerge(Element* list, const int start1, const int start2)
{
    int iResult = 0;   // last record of the result chain; list[0] is a dummy header
    int i1, i2;
    for (i1 = start1, i2 = start2; i1 && i2; ) {
        if (list[i1].key <= list[i2].key) {
            list[iResult].link = i1;
            iResult = i1; i1 = list[i1].link;
        }
        else {
            list[iResult].link = i2;
            iResult = i2; i2 = list[i2].link;
        }
    }
    // attach the remainder of whichever chain is not yet exhausted
    if (i1 == 0) list[iResult].link = i2;
    else list[iResult].link = i1;
    return list[0].link;
}

47 Natural Merge Sort Natural merge sort takes advantage of the prevailing order within the list before performing merge sort. It runs an initial pass over the data to determine the sublists of records that are in order. Then it uses the sublists for the merge sort.
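
As an illustrative sketch of that initial pass (the slides give no code for it; the function name is made up here), the boundaries of the naturally occurring nondecreasing runs can be collected as follows, reusing the Element interface from earlier slides:

#include <vector>

// Find the boundaries of the natural nondecreasing runs in list[1..n].
// runStart[k] is the first index of the k-th run; a final sentinel entry n+1
// marks the end of the last run. These runs seed the merge passes.
std::vector<int> FindRuns(Element* list, const int n)
{
    std::vector<int> runStart;
    int i = 1;
    while (i <= n) {
        runStart.push_back(i);
        while (i < n && list[i].getKey() <= list[i + 1].getKey())
            i++;                     // extend the current nondecreasing run
        i++;                         // first element of the next run
    }
    runStart.push_back(n + 1);       // sentinel: one past the last record
    return runStart;
}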

48 Natural Merge Sort Example
26 | 5 77 | 1 61 | 11 59 | 15 48 | 19
[5 26 77] [1 11 59 61] [15 19 48]
[1 5 11 26 59 61 77] [15 19 48]
[1 5 11 15 19 26 48 59 61 77]

49 Heap Sort Merge sort needs additional storage space proportional to the number of records in the file being sorted, even though its computing time is O(n log n). The O(1)-space merge needs only O(1) space, but the resulting sorting algorithm is much slower. We will see that heap sort requires only a fixed amount of additional storage and achieves worst-case and average computing time of O(n log n). Heap sort uses the max-heap structure.

50 Heap Sort (Cont.) For heap sort, first of all, the n records are inserted into an empty heap. Next, the records are extracted from the heap one at a time. With the use of a special function adjust(), we can create the initial heap of n records faster than by inserting them one at a time.

51 Program 7.13 (Adjusting A Max Heap)
void adjust(Element* tree, const int root, const int n)
// Adjust the binary tree with root root to satisfy the heap property. The left and right
// subtrees of root already satisfy the heap property. No node has index greater than n.
{
    Element e = tree[root];
    int k = e.getKey();
    int j;
    for (j = 2*root; j <= n; j *= 2) {
        if (j < n)
            if (tree[j].getKey() < tree[j+1].getKey()) j++;   // j points to the larger child
        // compare the larger child with k. If k is larger, then done.
        if (k >= tree[j].getKey()) break;
        tree[j/2] = tree[j];   // move the jth record up the tree
    }
    tree[j/2] = e;
}

52 Program 7.14 (Heap Sort)
void HeapSort(Element* list, const int n)
// The list (list[1], …, list[n]) is sorted into nondecreasing order of the field key.
{
    for (int i = n/2; i >= 1; i--)     // convert list into a heap
        adjust(list, i, n);
    for (int i = n-1; i >= 1; i--) {   // sort list
        Element t = list[i+1];         // interchange list[1] and list[i+1]
        list[i+1] = list[1];
        list[1] = t;
        adjust(list, 1, i);
    }
}
// Time: O(n log n)

53 Converting An Array Into A Max Heap
(a) Input array: [26, 5, 77, 1, 61, 11, 59, 15, 48, 19]
(b) Initial max heap: [77, 61, 59, 48, 19, 11, 26, 15, 1, 5]

54 Heap Sort Example
(a) Heap size = 9, sorted = [77]: heap is [61, 48, 59, 15, 19, 11, 26, 5, 1]
(b) Heap size = 8, sorted = [61, 77]: heap is [59, 48, 26, 15, 19, 11, 1, 5]

55 Sorting Several Keys A list of records is said to be sorted with respect to the keys K^1, K^2, …, K^r iff for every pair of records i and j with i < j, (K^1_i, K^2_i, …, K^r_i) ≤ (K^1_j, K^2_j, …, K^r_j). The r-tuple (x_1, x_2, …, x_r) is less than or equal to the r-tuple (y_1, y_2, …, y_r) iff either x_i = y_i for 1 ≤ i ≤ j and x_(j+1) < y_(j+1) for some j < r, or x_i = y_i for 1 ≤ i ≤ r. Two popular ways to sort on multiple keys: –Sort on the most significant key into multiple piles; for each pile, sort on the second most significant key, and so on, then combine the piles. This method is called sort on most-significant-digit-first (MSD). –The other way is to sort on the least significant digit first (LSD). Example: sorting a deck of cards on suit and face value. –Spade > Heart > Diamond > Club

56 Sorting Several Keys (Cont.) LSD and MSD only define the order in which the keys are to be sorted; they do not specify how each key is sorted. Bin sort can be used for sorting on each key; the complexity of bin sort is O(n). LSD and MSD can be used even when there is only one key. –E.g., if the keys are numeric, then each decimal digit may be regarded as a subkey => radix sort.
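
A minimal bin (counting) sort sketch for integer keys in the range 0..r−1 (an added illustration, not from the slides); its stability is what makes it usable as the per-digit sort inside LSD radix sort, and with r bins the collection step costs O(n + r).

#include <vector>

// Stable bin sort of integer keys in the range 0..r-1.
void BinSort(std::vector<int>& keys, const int r)
{
    const int n = static_cast<int>(keys.size());
    std::vector<std::vector<int>> bins(r);     // one queue per possible key value
    for (int i = 0; i < n; i++)
        bins[keys[i]].push_back(keys[i]);      // distribute: O(n)
    int out = 0;
    for (int b = 0; b < r; b++)                // collect: O(n + r)
        for (int k : bins[b])
            keys[out++] = k;
}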

57 Radix Sort In a radix sort, we decompose the sort key using some radix r. –The number of bins needed is r. Assume the records R1, R2, …, Rn are to be sorted using radix r, with each key having d digits in the range 0 to r−1. Assume each record has a link field. The records in the same bin are then linked together into a chain: –f[i], 0 ≤ i < r (the pointer to the first record in bin i) –e[i] (the pointer to the last record in bin i) –Each chain operates as a queue. –Each record is assumed to have an array key[d], with 0 ≤ key[i] < r for 0 ≤ i < d.

58 Program 7.15 (LSD Radix Sort)
void RadixSort(Element* list, const int d, const int n)
{
    int e[radix], f[radix];                  // queue front (f) and end (e) pointers
    for (int i = 1; i <= n; i++) list[i].link = i + 1;   // link into a chain starting at current
    list[n].link = 0;
    int current = 1;
    for (int i = d - 1; i >= 0; i--)         // sort on key key[i]; d passes
    {
        for (int j = 0; j < radix; j++) f[j] = 0;        // initialize bins to empty queues: O(r)
        for (; current; current = list[current].link) {  // put records into queues: O(n)
            int k = list[current].key[i];
            if (f[k] == 0) f[k] = current;
            else list[e[k]].link = current;
            e[k] = current;
        }
        int j;
        for (j = 0; f[j] == 0; j++);         // find the first non-empty queue
        current = f[j];
        int last = e[j];
        for (int k = j + 1; k < radix; k++) {            // concatenate remaining queues
            if (f[k]) { list[last].link = f[k]; last = e[k]; }
        }
        list[last].link = 0;
    }
}
// Total time: O(d(n + r))

59 Radix Sort Example
Input chain (d = 3, radix r = 10): 179, 208, 306, 93, 859, 984, 55, 9, 271, 33
Pass on the ones digit: bin 1: 271; bin 3: 93, 33; bin 4: 984; bin 5: 55; bin 6: 306; bin 8: 208; bin 9: 179, 859, 9
Chain after the first pass: 271, 93, 33, 984, 55, 306, 208, 179, 859, 9

60 Radix Sort Example (Cont.)
Pass on the tens digit: bin 0: 306, 208, 9; bin 3: 33; bin 5: 55, 859; bin 7: 271, 179; bin 8: 984; bin 9: 93
Chain after the second pass: 306, 208, 9, 33, 55, 859, 271, 179, 984, 93

61 Radix Sort Example (Cont.)
Pass on the hundreds digit: bin 0: 9, 33, 55, 93; bin 1: 179; bin 2: 208, 271; bin 3: 306; bin 8: 859; bin 9: 984
Chain after the third pass (sorted): 9, 33, 55, 93, 179, 208, 271, 306, 859, 984

62 List And Table Sorts Apart from radix sort and recursive merge sort, all the sorting methods we have looked at so far require excessive data movement. When the amount of data to be sorted is large, data movement tends to slow down the process. It is desirable to modify sorting algorithms to minimize the data movement. Methods such as insertion sort or merge sort can be modified to work with a linked list rather than a sequential list. Instead of physical movement, an additional link field is used to reflect the change in the position of the record in the list.

63 Program 7.16 (Rearranging Records Using A Doubly Linked List)
void list1(Element* list, const int n, int first)
{
    int prev = 0;
    for (int current = first; current; current = list[current].link)
    // convert the chain into a doubly linked list: O(n)
    {
        list[current].linkb = prev;
        prev = current;
    }
    for (int i = 1; i < n; i++)
    // move list[first] to position i while maintaining the list
    {
        if (first != i) {
            if (list[i].link) list[list[i].link].linkb = first;
            list[list[i].linkb].link = first;
            Element a = list[first];
            list[first] = list[i];
            list[i] = a;
        }
        first = list[i].link;
    }
}
// Time: O(nm), assuming each record is m words

64 Example 7.9
Input keys and the sorted chain (first = 4):
 i      R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    26   5  77   1  61  11  59  15  48   19
 link    9   6   0   2   3   8   5  10   7    1

After the first for loop of list1 has added the backward links:
 i      R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    26   5  77   1  61  11  59  15  48   19
 link    9   6   0   2   3   8   5  10   7    1
 linkb  10   4   5   0   7   2   9   6   1    8

65 Example 7.9 (Cont.)
Configuration after the first iteration of the for loop of list1, first = 2:
 i      R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key     1   5  77  26  61  11  59  15  48   19
 link    2   6   0   9   3   8   5  10   7    4
 linkb   0   4   5  10   7   2   9   6   4    8

Configuration after the second iteration of the for loop of list1, first = 6:
 i      R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key     1   5  77  26  61  11  59  15  48   19
 link    2   6   0   9   3   8   5  10   7    1
 linkb   0   4   5  10   7   2   9   6   1    8

66 Example 7.9 (Cont.)
Configuration after the third iteration of the for loop of list1, first = 8:
 i      R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key     1   5  11  26  61  77  59  15  48   19
 link    2   6   8   9   6   0   5  10   7    4
 linkb   0   4   2  10   7   5   9   6   4    8

Configuration after the fourth iteration of the for loop of list1, first = 10:
 i      R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key     1   5  11  15  61  77  59  26  48   19
 link    2   6   8  10   6   0   5   9   7    8
 linkb   0   4   2   6   7   5   9  10   8    8

67 Example 7.10 (Rearranging Records Using Only One Link)
void list2(Element* list, const int n, int first)
// Same function as list1 except that a second link field linkb is not required.
{
    for (int i = 1; i < n; i++) {
        while (first < i) first = list[first].link;
        int q = list[first].link;   // q is the next record in nondecreasing order
        if (first != i)
        // Interchange list[i] and list[first], moving list[first] to its correct spot,
        // as list[first] has the i-th smallest key. Also set the link from the old
        // position of list[i] to the new one.
        {
            Element t = list[i];
            list[i] = list[first];
            list[first] = t;
            list[i].link = first;
        }
        first = q;
    }
}
// Time: O(nm), assuming each record is m words

68 Example 7.10 (Rearranging Records Using Only One Link)
Configuration after the first iteration of the for loop of list2, first = 2:
 i     R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    1   5  77  26  61  11  59  15  48   19
 link   4   6   0   9   3   8   5  10   7    1

Configuration after the second iteration of the for loop of list2, first = 6:
 i     R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    1   5  77  26  61  11  59  15  48   19
 link   4   6   0   9   3   8   5  10   7    1

69 Example 7.10 (Rearranging Records Using Only One Link)
Configuration after the third iteration of the for loop of list2, first = 8:
 i     R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    1   5  11  26  61  77  59  15  48   19
 link   4   6   6   9   3   0   5  10   7    1

Configuration after the fourth iteration of the for loop of list2, first = 10:
 i     R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    1   5  11  15  61  77  59  26  48   19
 link   4   6   6   8   3   0   5   9   7    1

70 Example 7.10 (Rearranging Records Using Only One Link)
Configuration after the fifth iteration of the for loop of list2, first = 1:
 i     R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    1   5  11  15  19  77  59  26  48   61
 link   4   6   6   8  10   0   5   9   7    3

Configuration after the sixth iteration of the for loop of list2, first = 9:
 i     R1  R2  R3  R4  R5  R6  R7  R8  R9  R10
 key    1   5  11  15  19  26  59  77  48   61
 link   4   6   6   8  10   8   5   0   7    3

71 Table Sort The list-sort technique is not well suited for quick sort and heap sort. Instead, one can maintain an auxiliary table, t, with one entry per record; the entries serve as an indirect reference to the records. Initially, t[i] = i. When interchanges are required, only the table entries are exchanged. It may sometimes be necessary to physically rearrange the records according to the permutation specified by t.

72 Table Sort (Cont.) The function to rearrange records corresponding to the permutation t[1], t[2], …, t[n] can be considered an application of a theorem from mathematics: –Every permutation is made up of disjoint cycles. The cycle for any element i is made up of i, t[i], t^2[i], …, t^k[i], where t^j[i] = t[t^(j-1)[i]], t^0[i] = i, and t^k[i] = i.

73 Program 7.18 (Table Sort)
void table(Element* list, const int n, int* t)
{
    for (int i = 1; i < n; i++) {
        if (t[i] != i)
        // rearrange the nontrivial cycle starting at i
        {
            Element p = list[i];
            int j = i;
            do {
                int k = t[j];
                list[j] = list[k];
                t[j] = j;
                j = k;
            } while (t[j] != i);
            list[j] = p;
            t[j] = j;
        }
    }
}

74 Table Sort Example
Initial configuration:
      R1  R2  R3  R4  R5  R6  R7  R8
 key  35  14  12  42  26  50  31  18
 t     3   2   8   5   7   1   4   6

Configuration after rearrangement of the first cycle:
 key  12  14  18  42  26  35  31  50
 t     1   2   3   5   7   6   4   8

Configuration after rearrangement of the second cycle:
 key  12  14  18  26  31  35  42  50
 t     1   2   3   4   5   6   7   8

75 Analysis of Table Sort To rearrange a nontrivial cycle containing k_l distinct records, k_l + 1 record moves are needed, so the total number of record moves is the sum of (k_l + 1) over the nontrivial cycles. Since the records on all nontrivial cycles must be different, Σ k_l ≤ n. The total number of record moves is maximum when Σ k_l = n and there are ⌊n/2⌋ cycles of two records each. Assuming that one record move costs O(m) time, the total computing time is O(mn).

76 Summary Of Internal Sorting No one method is best for all conditions. –Insertion sort is good when the list is already partially ordered, and it is best for a small number of records n. –Merge sort has the best worst-case behavior but needs more storage than heap sort. –Quick sort has the best average behavior, but its worst-case behavior is O(n²). –The behavior of radix sort depends on the size of the keys and the choice of r.

77 External Sorting Some lists are too large to fit in the memory of a computer, so internal sorting is not possible. The records are stored on external storage (disk, tape, etc.). The system retrieves one block of data from the disk at a time; a block contains multiple records. The most popular method for sorting on external storage devices is merge sort. –Segments of the input list are sorted. –Sorted segments (called runs) are written onto external storage. –Runs are merged until only one run is left.

78 Example 7.12 Consider a computer that is capable of internally sorting 750 records at a time being used to sort 4500 records. Six runs are generated, each by sorting 750 records. Allocate three 250-record blocks of internal memory for merging the runs: two blocks buffer input from the two runs being merged, and the third buffers the output. Three factors contribute to the read/write time of the disk: –seek time –latency time –transmission time

79 Example 7.12 (Cont.) run1 (1 – 750) run2 (751 – 1500) run3 (1501 – 2250) run4 (2251 – 3000) run5 (3001 – 3750) run6 (3751 – 4500)

80 Example 7.12 (Cont.)
t_IO = t_s + t_l + t_rw
t_IS = time to internally sort 750 records
n·t_m = time to merge n records from the input buffers to the output buffer
t_s = maximum seek time
t_l = maximum latency time
t_rw = time to read or write one block of 250 records

81 Example 7.12 (Cont.)
Operation / Time
(1) Read 18 blocks of input (18 t_IO), internally sort (6 t_IS), write 18 blocks (18 t_IO): 36 t_IO + 6 t_IS
(2) Merge runs 1 to 6 in pairs: 36 t_IO + 4500 t_m
(3) Merge two runs of 1500 records each, 12 blocks: 24 t_IO + 3000 t_m
(4) Merge one run of 3000 records with one run of 1500 records: 36 t_IO + 4500 t_m
Total time: 132 t_IO + 12000 t_m + 6 t_IS

82 K-Way Merging To merge m runs via 2-way merging, ⌈log₂ m⌉ passes over the data are needed. If we use a higher-order merge, the number of passes over the data is reduced: with a k-way merge on m runs, we need only ⌈log_k m⌉ passes. But is it always true that the higher the order of merging, the less computing time we need? –Not necessarily! –k − 1 comparisons are needed to determine the next record for output. –If a loser tree is used to reduce the number of comparisons, we can achieve a complexity of O(n log₂ m) for the merging. –The data block size is reduced as k increases, and a reduced block size increases the number of disk accesses.
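
As a rough stand-in for the loser tree (an added sketch, not the textbook's buffer-based code), the k-way selection step can also be done with a min-heap, so that each output record costs O(log k) comparisons and merging n records takes O(n log k) time. The runs are held in memory here purely for illustration.

#include <queue>
#include <vector>

// Merge k sorted runs using a min-heap keyed on the current front record of each run.
std::vector<int> KWayMerge(const std::vector<std::vector<int>>& runs)
{
    using Item = std::pair<int, std::pair<int, int>>;     // (key, (run, index within run))
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;

    for (int r = 0; r < static_cast<int>(runs.size()); r++)
        if (!runs[r].empty())
            heap.push({runs[r][0], {r, 0}});

    std::vector<int> out;
    while (!heap.empty()) {
        auto [key, pos] = heap.top();                     // smallest front record
        heap.pop();
        out.push_back(key);
        int r = pos.first, i = pos.second + 1;
        if (i < static_cast<int>(runs[r].size()))
            heap.push({runs[r][i], {r, i}});              // advance within run r
    }
    return out;
}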

83 Buffer Handling for Parallel Operation To achieve better performance, multiple input buffers and two output buffers are used to avoid idle time. Evenly distributing the input buffers among all runs may still leave idle time. Buffers should instead be assigned dynamically to whichever run needs to retrieve more data, to avoid halting the computing process. We should take advantage of overlapping tasks to keep the computing process busy and avoid idle time.

84 Buffering Algorithm Step 1: Input the first block of each of the k runs, setting up k linked queues, each having one block of data. Step 2: Let LastKey[i] be the last key input from run i. Let NextRun be the run for which LastKey is minimum. Step 3: Use a function kwaymerge to merge records from the k input queues into the output buffer. Step 4: Wait for any ongoing disk I/O to complete. Step 5: If an input buffer has been read, add it to the queue for the appropriate run. Step 6: If LastKey[NextRun] != +infinity, then initiate reading the next block from run NextRun into a free input buffer. Step 7: Initiate the writing of output buffer. Step 8: If a record with key +infinity has not been merged into the output buffer, go back to Step 3. Otherwise, wait for the ongoing write to complete and then terminate.

85 Optimal Merging of Runs The slide shows two merge trees for runs of sizes 2, 4, 5, and 15. For the skewed tree, weighted external path length = 2*3 + 4*3 + 5*2 + 15*1 = 43; for the balanced tree, weighted external path length = 2*2 + 4*2 + 5*2 + 15*2 = 52.

86 Huffman Code Assume we want to obtain an optimal set of codes for messages M_1, M_2, …, M_(n+1). Each code is a binary string that will be used for transmission of the corresponding message. At the receiving end, a decode tree is used to decode the binary string and get back the message. A zero is interpreted as a left branch and a one as a right branch. These codes are called Huffman codes. The cost of decoding a code word is proportional to the number of bits in the code; this number is equal to the distance of the corresponding external node from the root node. If q_i is the relative frequency with which message M_i will be transmitted, then the expected decoding time is Σ (over 1 ≤ i ≤ n+1) of q_i d_i, where d_i is the distance of the external node for message M_i from the root node.

87 Huffman Codes (Cont.) The expected decoding time is minimized by choosing code words resulting in a decode tree with minimal weighted external path length. (The slide shows a decode tree whose external nodes are the messages M_1, M_2, M_3, M_4, with a 0 labelling each left branch and a 1 labelling each right branch.)

88 Huffman Function
class BinaryTree {
public:
    BinaryTree(BinaryTree bt1, BinaryTree bt2)
    {
        root = new BinaryTreeNode;   // allocate the new root (omitted on the slide)
        root->LeftChild = bt1.root;
        root->RightChild = bt2.root;
        root->weight = bt1.root->weight + bt2.root->weight;
    }
private:
    BinaryTreeNode* root;
};

void huffman(List l)
// l is a list of single-node binary trees as described above.
{
    int n = l.Size();                        // number of binary trees in l
    for (int i = 0; i < n - 1; i++) {        // loop n - 1 times
        BinaryTree first = l.DeleteMinWeight();
        BinaryTree second = l.DeleteMinWeight();
        BinaryTree* bt = new BinaryTree(first, second);
        l.Insert(bt);
    }
}
// Time: O(n log n)

89 Huffman Tree Example The slide builds the Huffman tree for the weights 2, 3, 5, 7, 9, 13:
(a) combine 2 and 3 into a tree of weight 5
(b) combine that 5 with the leaf 5 into a tree of weight 10
(c) combine 7 and 9 into a tree of weight 16
(d) combine 10 and 13 into a tree of weight 23
(e) combine 16 and 23 into the final tree of weight 39

