1 Instructors: C. Y. Tang and J. S. Roger Jang
Chapter 7 Sorting. Instructors: C. Y. Tang and J. S. Roger Jang. All the material is integrated from the textbook "Fundamentals of Data Structures in C", with some supplements from the slides of Prof. Hsin-Hsi Chen (NTU).

2 Sequential Search
/* maximum size of list plus one */
#define MAX_SIZE 1000

typedef struct {
  int key;
  /* other fields */
} element;

element list[MAX_SIZE];

3 Sequential Search
int seqsearch(element list[], int searchnum, int n)
{
  /* search the array list, which holds n records; the slot list[n] is free.
     Return i if list[i].key == searchnum; return -1 if searchnum is not in the list. */
  int i;
  list[n].key = searchnum;   /* sentinel that signals the end of the list */
  for (i = 0; list[i].key != searchnum; i++)
    ;
  return (i < n) ? i : -1;
}

4 Analysis of Seq Search
Example: 44, 55, 12, 42, 94, 18, 06, 67
An unsuccessful search makes n+1 comparisons (the sentinel is the last key examined).
The average number of comparisons for a successful search is (n+1)/2.
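A one-line derivation (assuming each of the n keys is equally likely to be the one searched for; finding the key in position i costs i+1 comparisons):

\bar{C}_{succ} = \frac{1}{n}\sum_{i=0}^{n-1}(i+1) = \frac{1}{n}\cdot\frac{n(n+1)}{2} = \frac{n+1}{2}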

5 Binary Search
Input: a sorted list (i.e., list[0].key ≦ list[1].key ≦ ... ≦ list[n-1].key).
Compare searchnum with list[middle].key, where middle = (n-1)/2. There are three possible outcomes:
searchnum < list[middle].key: search list[0] through list[middle-1]
searchnum = list[middle].key: the search terminates successfully
searchnum > list[middle].key: search list[middle+1] through list[n-1]

6 Binary Search
Input: 4, 15, 17, 26, 30, 46, 48, 56, 58, 82, 90, 95
It is easy to see that binary search makes no more than O(log n) comparisons (the height of the decision tree).
[Figure: decision tree for binary search on the 12 keys above; root [5]=46, second level [2]=17 and [8]=58, third level [0]=4, [3]=26, [6]=48, [10]=90, leaves [1]=15, [4]=30, [7]=56, [9]=82, [11]=95.]

7 Binary Search
int binsearch(element list[], int searchnum, int n)
{
  int left = 0, right = n - 1, middle;
  while (left <= right) {
    middle = (left + right) / 2;
    switch (COMPARE(list[middle].key, searchnum)) {
      case -1: left = middle + 1;
               break;
      case  0: return middle;
      case  1: right = middle - 1;
    }
  }
  return -1;
}

8 List Verification
Compare two lists to verify that they are identical, or to identify the discrepancies.
List verification is an instance of repeatedly searching one list, using each key in the other list as the search key.
Example: the Internal Revenue Service matching employee records against employer reports.
Complexities: random order: O(mn); ordered lists: O(tsort(n) + tsort(m) + m + n)

9 Verifying using a sequential search
/* compare two unordered lists list1 and list2 */
void verify1(element list1[], element list2[], int n, int m)
{
  int i, j;
  int marked[MAX_SIZE];
  for (i = 0; i < m; i++)
    marked[i] = FALSE;
  for (i = 0; i < n; i++)
    if ((j = seqsearch(list2, list1[i].key, m)) < 0)
      printf("%d is not in list 2\n", list1[i].key);
    else
      marked[j] = TRUE;
  for (i = 0; i < m; i++)
    if (!marked[i])
      printf("%d is not in list 1\n", list2[i].key);
}

10 Fast verification of two lists
/* same task as verify1, but list1 and list2 are sorted */
void verify2(element list1[], element list2[], int n, int m)
{
  int i, j;
  sort(list1, n);
  sort(list2, m);
  i = j = 0;
  while (i < n && j < m)
    if (list1[i].key < list2[j].key) {
      printf("%d is not in list 2\n", list1[i].key);
      i++;
    }

11 Fast verification of two lists
    else if (list1[i].key == list2[j].key) {
      i++;
      j++;
    }
    else {
      printf("%d is not in list 1\n", list2[j].key);
      j++;
    }
  for (; i < n; i++)
    printf("%d is not in list 2\n", list1[i].key);
  for (; j < m; j++)
    printf("%d is not in list 1\n", list2[j].key);
}

12 Complexities
verify1: O(mn) (randomly arranged lists)
verify2: O(tsort(n) + tsort(m) + m + n) (sorted lists)

13 Sorting Problem Definitions
We are given a list of records (R0, R1, ..., Rn-1); each record Ri has a key value Ki.
Find a permutation σ such that:
sorted: Kσ(i-1) ≦ Kσ(i), for 0 < i ≦ n-1
stable: if i < j and Ki = Kj in the input list, then Ri precedes Rj in the sorted list
Criteria: number of key comparisons, number of data movements

14 Insertion sort
Insert a record Ri into a sequence of ordered records R0, R1, ..., Ri-1 (K0 ≦ K1 ≦ ... ≦ Ki-1).
Time complexity: O(n²)
Stable
The fastest sort when n is small (n ≦ 20)

15 Insertion sort

16 Insertion sort
/* perform an insertion sort on the list */
void insertion_sort(element list[], int n)
{
  int i, j;
  element next;
  for (i = 1; i < n; i++) {
    next = list[i];
    for (j = i - 1; j >= 0 && next.key < list[j].key; j--)
      list[j + 1] = list[j];
    list[j + 1] = next;
  }
}

17 Insertion sort
Worst case (the insertion at pass i moves i records):
 i   [0] [1] [2] [3] [4]
 -    5   4   3   2   1
 1    4   5   3   2   1
 2    3   4   5   2   1
 3    2   3   4   5   1
 4    1   2   3   4   5

18 Insertion sort
Best case: O(n), when the list is already in order. More generally, the cost depends on how many records are left out of order (LOO).

19 Insertion sort
Ri is left out of order (LOO) if Ri < max{Rj : 0 ≦ j < i}.
k: number of records that are LOO
Computing time: O((k+1)n)

20 Variation
Binary insertion sort: replace the sequential search with a binary search. This reduces the number of comparisons; the number of moves is unchanged. A sketch is given below.
List insertion sort: replace the array with a linked list. The search remains sequential, but the number of record moves drops to 0.
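A minimal C sketch of the binary-insertion-sort variation. This is not the textbook's code: the function name binary_insertion_sort and the use of a bare int array are illustrative assumptions.

void binary_insertion_sort(int a[], int n)
{
  int i, j, key, left, right, mid;
  for (i = 1; i < n; i++) {
    key = a[i];
    left = 0;                      /* insertion point lies in [left, right) */
    right = i;
    while (left < right) {
      mid = (left + right) / 2;
      if (a[mid] <= key)           /* <= keeps equal keys in order (stable) */
        left = mid + 1;
      else
        right = mid;
    }
    for (j = i; j > left; j--)     /* shifting still costs O(n) moves per pass */
      a[j] = a[j - 1];
    a[left] = key;
  }
}

Only the comparisons drop to O(n log n); the record moves remain O(n²), as the slide notes.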

21 Quick Sort (C. A. R. Hoare)
Given (R0, R1, ..., Rn-1), pick a pivot key Ki. If Ki is placed in position s(i), then:
Kj ≦ Ks(i) for j < s(i)
Kj ≧ Ks(i) for j > s(i)
R0, ..., Rs(i)-1 | Rs(i) | Rs(i)+1, ..., Rn-1: the pivot splits the list into two partitions.

22 Quick sort example Given 10 records with keys (26, 5, 37, 1, 61, 11, 59, 15, 48, 19)

23 Quick sort
void quicksort(element list[], int left, int right)
{
  int pivot, i, j;
  element temp;
  if (left < right) {
    i = left;
    j = right + 1;
    pivot = list[left].key;
    do {
      do i++; while (list[i].key < pivot);
      do j--; while (list[j].key > pivot);
      if (i < j) SWAP(list[i], list[j], temp);
    } while (i < j);
    SWAP(list[left], list[j], temp);
    quicksort(list, left, j - 1);
    quicksort(list, j + 1, right);
  }
}

24 Analysis of quick sort
Assume that each time a record is positioned, the list is divided into two parts of roughly the same size.
Positioning the pivot in a list of n elements takes O(n) time.
Let T(n) be the time taken to sort n elements. Then, for some constant c:
T(n) ≦ cn + 2T(n/2)
     ≦ cn + 2(cn/2 + 2T(n/4)) = 2cn + 4T(n/4)
     ≦ ...
     ≦ cn log n + nT(1) = O(n log n)

25 Analysis of quick sort
Time complexity:
  worst case: O(n²)
  best case: O(n log n)
  average case: O(n log n)
Space complexity (recursion stack):
  worst case: O(n)
  best case: O(log n)
  average case: O(log n)
Unstable

26 Variation Our version of quick sort always picked the key of the first record in the current sublist as the pivot. A better choice for this pivot is the median of the first, middle, and last keys in the current sublist.
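A hedged sketch of the median-of-three choice. The helper name median_of_three is an illustrative assumption; it reuses the element type and SWAP macro from the earlier slides and leaves the chosen pivot in list[left], so the quicksort on the previous slide can be used unchanged.

void median_of_three(element list[], int left, int right)
{
  element temp;
  int mid = (left + right) / 2;
  /* order the three candidate keys pairwise ... */
  if (list[mid].key   < list[left].key) SWAP(list[left], list[mid],   temp);
  if (list[right].key < list[left].key) SWAP(list[left], list[right], temp);
  if (list[right].key < list[mid].key)  SWAP(list[mid],  list[right], temp);
  /* ... list[mid] now holds the median; move it to the pivot position */
  SWAP(list[left], list[mid], temp);
}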

27 Optimal sorting time
How quickly can we hope to sort a list of n objects?
If we restrict ourselves to algorithms that permit only comparisons and interchanges of keys, then the answer is O(n log2n): this many comparisons are sufficient (e.g., merge sort), and the next slides show they are also necessary in the worst case.

28 Decision tree for insertion sort
[Figure: the decision tree for insertion sort on three records. The root compares K0 and K1; each internal node is a key comparison with Yes/No branches, and each of the six leaves is one of the 3! possible orderings, e.g., [0,1,2], [1,0,2], [1,2,0], [2,1,0].]

29 Optimal sorting time
Theorem 7.1: Any decision tree that sorts n distinct elements has a height of at least log2(n!) + 1.
Corollary: Any algorithm that sorts by comparisons only must have a worst-case computing time of Ω(n log2n).
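A quick way to see that the theorem implies the corollary (using only that the top half of the factors of n! are each at least n/2):

\log_2(n!) = \sum_{i=2}^{n}\log_2 i \;\ge\; \sum_{i=\lceil n/2\rceil}^{n}\log_2 i \;\ge\; \frac{n}{2}\log_2\frac{n}{2} = \Omega(n\log_2 n)

so a tree of height at least log2(n!) + 1 forces Ω(n log2n) comparisons in the worst case.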

30 Merge sort
How do we merge two sorted lists, (list[i], ..., list[m]) and (list[m+1], ..., list[n]), into a single sorted list (sorted[i], ..., sorted[n])?
There are two algorithms for this: one needs O(n) additional space, the other requires only O(1).

31 Merge sort (O(n) space)
void merge(element list[], element sorted[], int i, int m, int n)
{
  int j, k, t;
  j = m + 1;
  k = i;
  while (i <= m && j <= n) {
    if (list[i].key <= list[j].key)
      sorted[k++] = list[i++];
    else
      sorted[k++] = list[j++];
  }
  if (i > m)
    for (t = j; t <= n; t++)
      sorted[k + t - j] = list[t];
  else
    for (t = i; t <= m; t++)
      sorted[k + t - i] = list[t];
}

32 Analysis Time complexity: O(n) Space complexity: O(n)

33 Sort by the rightmost records of the √n - 1 blocks
[Figure 7.5: first eight lines of the O(1) space merge example (p. 337): file 1 (sorted) and file 2 (sorted); a preprocessing step discovers and exchanges the largest keys, after which the √n - 1 blocks are sorted by their rightmost records using compare/exchange steps.]

34 Example records for the O(1) space merge (four successive lines):
y w z u x a v  b c e g i j k  d f h  o p q  l m n  r s t
w z u x a v y  b c e g i j k  d f h  o p q  l m n  r s t
z u x w a v y  b c e g i j k  d f h  o p q  l m n  r s t
u x w a v y z  b c e g i j k  d f h  o p q  l m n  r s t

35 Segment one is merged (i.e., 0, 2, 4, 6, 8, a)
[Figure 7.6: last eight lines of the O(1) space merge example (p. 338).]
6, 7, 8 are merged.
Segment one is merged (i.e., 0, 2, 4, 6, 8, a); change the place marker (the longest sorted sequence of records).
Segment one is merged (i.e., b, c, e, g, i, j, k); change the place marker.
Segment one is merged (i.e., o, p, q); no other segment remains, so sort the largest keys.

36 O(1) Space merge sort

37 Iterative Merge Sort
We treat the input sequence as n sorted lists, each of length 1. We merge these lists pairwise to obtain ⌈n/2⌉ lists of size 2 (the last list may be shorter). We then merge the ⌈n/2⌉ lists pairwise, and so on, until a single list remains.

38 Iterative Merge Sort example

39 merge_pass function
void merge_pass(element list[], element sorted[], int n, int length)
{
  /* one pass: merge adjacent sublist pairs of size "length" from list into sorted */
  int i, j;
  for (i = 0; i < n - 2*length; i += 2*length)
    merge(list, sorted, i, i + length - 1, i + 2*length - 1);
  if (i + length < n)
    /* one complete segment and one partial segment remain */
    merge(list, sorted, i, i + length - 1, n - 1);
  else
    /* only one segment remains: copy it over */
    for (j = i; j < n; j++)
      sorted[j] = list[j];
}

40 merge_sort function
void merge_sort(element list[], int n)
{
  int length = 1;
  element extra[MAX_SIZE];
  while (length < n) {
    merge_pass(list, extra, n, length);   /* runs of length l -> 2l, into extra */
    length *= 2;
    merge_pass(extra, list, n, length);   /* runs of length 2l -> 4l, back into list */
    length *= 2;
  }
}

41 Recursive Formulation of Merge Sort
[Figure: the recursion splits the file at (1+10)/2, then at (1+5)/2 and (6+10)/2, copying the subfiles at each level.]
Data structure: array (subfiles must be copied) vs. linked list (no copying needed).

42 Simulation of merge sort
[Figure: simulation of the linked-list merge sort; the final sorted chain begins at start = 3.]

43 Recursive Merge Sort
/* rmerge sorts list[lower..upper] and returns an integer pointing to the start of the sorted chain */
int rmerge(element list[], int lower, int upper)
{
  int middle;
  if (lower >= upper)
    return lower;
  else {
    middle = (lower + upper) / 2;
    return listmerge(list, rmerge(list, lower, middle),
                           rmerge(list, middle + 1, upper));
  }
}

44 List Merge
int listmerge(element list[], int first, int second)
{
  /* list[n] serves as a dummy head node for the merged chain;
     n is assumed to be visible here (e.g., a global) */
  int start = n;
  while (first != -1 && second != -1) {
    if (list[first].key <= list[second].key) {
      /* key in the first list is lower; link this element to start
         and change start to point to first */
      list[start].link = first;
      start = first;
      first = list[first].link;
    }

45 List Merge
    else {
      /* key in the second list is lower; link this element into the
         partially sorted list */
      list[start].link = second;
      start = second;
      second = list[second].link;
    }
  }
  if (first == -1)
    list[start].link = second;   /* first is exhausted */
  else
    list[start].link = first;    /* second is exhausted */
  return list[n].link;
}
The total time for the linked-list merge sort is O(n log2n).

46 Natural Merge Sort
A natural merge sort begins with the sorted runs that already exist in the input, rather than with runs of length 1.

47 Heap Sort The heap sort algorithm will require only a fixed amount of additional storage and at the same time will have as its worst case and average computing time O(n log n). While heap sort is slightly slower than merge sort using O(n) additional space, it is faster than merge sort using O(1) additional space.

48 Heap Sort Example
Input array, interpreted as a complete binary tree (1-based indexing):
[1] 26  [2] 5  [3] 77  [4] 1  [5] 61  [6] 11  [7] 59  [8] 15  [9] 48  [10] 19

49 Heap Sort Example
Max heap following the first for loop of heapsort (the next step exchanges list[1] with list[10]):
[1] 77  [2] 61  [3] 59  [4] 48  [5] 19  [6] 11  [7] 26  [8] 15  [9] 1  [10] 5

50 Heap Sort Example
Heap of size 9 after the first exchange and adjust (77 is in its final position):
[1] 61  [2] 48  [3] 59  [4] 15  [5] 19  [6] 11  [7] 26  [8] 5  [9] 1  [10] 77

51 Heap Sort Example
Heap of size 8 after the second exchange and adjust (61 and 77 are in place):
[1] 59  [2] 48  [3] 26  [4] 15  [5] 19  [6] 11  [7] 1  [8] 5  [9] 61  [10] 77

52 Heap Sort Example
Heap of size 7 after the third exchange and adjust (59, 61, 77 are in place):
[1] 48  [2] 19  [3] 26  [4] 15  [5] 5  [6] 11  [7] 1  [8] 59  [9] 61  [10] 77

53 Heap Sort Example
Heap of size 6 after the fourth exchange and adjust (48, 59, 61, 77 are in place):
[1] 26  [2] 19  [3] 11  [4] 15  [5] 5  [6] 1  [7] 48  [8] 59  [9] 61  [10] 77

54 Heap Sort
/* adjust the binary tree rooted at "root" into a max heap; the subtrees
   rooted at 2*root and 2*root+1 already satisfy the heap property */
void adjust(element list[], int root, int n)
{
  int child, rootkey;
  element temp;
  temp = list[root];
  rootkey = list[root].key;
  child = 2 * root;                  /* left child */
  while (child <= n) {
    if ((child < n) && (list[child].key < list[child+1].key))
      child++;                       /* move to the larger child */
    if (rootkey > list[child].key)   /* root is already larger: done */
      break;
    else {
      list[child/2] = list[child];   /* move the child up */
      child *= 2;
    }
  }
  list[child/2] = temp;
}

55 Heap Sort
/* sort list[1..n] into ascending order using a max heap */
void heapsort(element list[], int n)
{
  int i;
  element temp;
  for (i = n/2; i > 0; i--)           /* build the heap bottom-up */
    adjust(list, i, n);
  for (i = n-1; i > 0; i--) {         /* n-1 cycles */
    SWAP(list[1], list[i+1], temp);   /* move the current maximum to the end */
    adjust(list, 1, i);               /* re-adjust the heap top-down */
  }
}

56 Radix Sort
We now examine the problem of sorting records that have several keys. The keys are labeled K0, K1, ..., Kr-1, with K0 being the most significant key and Kr-1 the least significant.
A list of records R0, ..., Rn-1 is lexically sorted with respect to the keys K0, K1, ..., Kr-1 iff (K0 of Ri, K1 of Ri, ..., Kr-1 of Ri) ≦ (K0 of Ri+1, K1 of Ri+1, ..., Kr-1 of Ri+1), for 0 ≦ i < n-1.

57 Radix Sort
Two ways to sort on several keys:
Most significant digit (MSD) first: sort on K0, then K1, ...
Least significant digit (LSD) first: sort on Kr-1, then Kr-2, ...

58 Arrangement of cards after first pass of an MSD sort
Suits: Clubs < Diamonds < Hearts < Spades. Face values: 2 < 3 < 4 < ... < J < Q < K < A.
(1) MSD sort first: bin sort into four suit bins, then LSD sort within each bin, e.g., with an insertion sort.
(2) LSD sort first: bin sort into 13 face-value bins (2, 3, 4, ..., 10, J, Q, K, A), then an MSD bin sort into four suit bins.

59 Arrangement of cards after first pass of an LSD sort

60 LSD radix sort example

61 LSD radix sort example (Cont.)

62 LSD radix sort example (Cont.)

63 LSD Radix Sort
In an LSD radix-r sort, the records R0, R1, ..., Rn-1 have keys that are d-tuples (x0, x1, ..., xd-1), where 0 ≦ xi < r.
#define MAX_DIGIT 3
#define RADIX_SIZE 10

typedef struct list_node *list_pointer;
typedef struct list_node {
  int key[MAX_DIGIT];
  list_pointer link;
};

64 LSD Radix Sort
list_pointer radix_sort(list_pointer ptr)
{
  list_pointer front[RADIX_SIZE], rear[RADIX_SIZE];
  int i, j, digit;
  for (i = MAX_DIGIT - 1; i >= 0; i--) {
    /* initialize the bins to empty queues */
    for (j = 0; j < RADIX_SIZE; j++)
      front[j] = rear[j] = NULL;
    /* put the records into the queues, keyed on digit i */
    while (ptr) {
      digit = ptr->key[i];
      if (!front[digit])
        front[digit] = ptr;
      else
        rear[digit]->link = ptr;

65 LSD Radix Sort
      rear[digit] = ptr;
      ptr = ptr->link;               /* get the next record */
    }
    /* reestablish the linked list for the next pass */
    ptr = NULL;
    for (j = RADIX_SIZE - 1; j >= 0; j--)
      if (front[j]) {
        rear[j]->link = ptr;
        ptr = front[j];
      }
  }
  return ptr;
}
Distributing the records takes O(n) per pass and collecting the bins takes O(r), so the total time is O(d(n + r)).

66 List and Table Sort Most sorting methods require excessive data movement since we must physically move records following some comparisons. If the records are large, this slows down the sorting process. We can reduce data movement by using a linked list representation.

67 List and Table Sort However, in some applications we must physically rearrange the records so that they are in the required order. We can achieve considerable savings by first performing a linked list sort and then physically rearranging the records according to the order specified in the list.

68 List and Table Sort In this section, we examine three methods for rearranging the lists into sorted order. list_sort1 and list_sort2 require a linked list representation. table_sort uses an auxiliary table that indirectly references the list's records.

69 In Place Sorting
[Figure: records R0 through R9 with i / key / link columns; the first step swaps R0 with R3.]
After a swap such as R0 <--> R3, how do we know where the predecessor of the moved record is? Its link field must be modified.

70 list_sort 1
void list_sort1(element list[], int n, int start)
{
  int i, last, current;
  element temp;
  last = -1;
  /* convert the chain into a doubly linked list using the linkb field */
  for (current = start; current != -1; current = list[current].link) {
    list[current].linkb = last;
    last = current;
  }
  for (i = 0; i < n-1; i++) {
    /* move the record with the i-th smallest key into position i,
       while maintaining the list */
    if (start != i) {
      if (list[i].link + 1)                 /* i.e., list[i].link != -1 */
        list[list[i].link].linkb = start;
      list[list[i].linkb].link = start;
      SWAP(list[start], list[i], temp);
    }
    start = list[i].link;
  }
}

71 Example of list_sort 1

72 Example of list_sort 1

73 Example of list_sort 1

74 list_sort 1
list_sort1 requires converting the singly linked list into a doubly linked one. We can avoid this conversion with list_sort2.

75 list_sort 2
/* list sort with only one link field */
void list_sort2(element list[], int n, int start)
{
  int i, next;
  element temp;
  for (i = 0; i < n-1; i++) {
    while (start < i)
      start = list[start].link;     /* follow the chain out of the already-sorted prefix */
    next = list[start].link;        /* save the index of the next largest key */
    if (start != i) {
      SWAP(list[i], list[start], temp);
      list[i].link = start;         /* remember where the displaced record went */
    }
    start = next;
  }
}

76 Example of list_sort 2

77 Example of list_sort 2

78 Example of list_sort 2

79 Table sort
In the previous section we used a linked list to represent the sorted order. Now we use an auxiliary one-dimensional table instead (i.e., Rtable[i] is the record with the i-th smallest key). How do we physically rearrange the records according to this table?

80 table_sort
void table_sort(element list[], int n, int table[])
{
  /* rearrange list[0], ..., list[n-1] to correspond to the sequence
     list[table[0]], ..., list[table[n-1]] */
  int i, current, next;
  element temp;
  for (i = 0; i < n-1; i++)
    if (table[i] != i) {
      /* nontrivial cycle starting at i */
      temp = list[i];
      current = i;

81 table_sort
      do {
        next = table[current];
        list[current] = list[next];
        table[current] = current;
        current = next;
      } while (table[current] != i);
      list[current] = temp;
      table[current] = current;   /* close the cycle so it is not processed again */
    }
}

82 table_sort
Before the sort: R0 = 50, R1 = 9, R2 = 11, R3 = 8, R4 = 3.
Auxiliary table T before the sort: (0, 1, 2, 3, 4); after a linked-list style sort, T = (4, 3, 1, 2, 0). The data are not moved.
T is a permutation: the 0th smallest key is at position 4, the 1st at position 3, ..., the 4th at position 0.
After table_sort physically rearranges the records, T becomes (0, 1, 2, 3, 4) again.

83 table_sort
Every permutation is made of disjoint cycles.
Example with records R0, ..., R7: two nontrivial cycles, R0 → R2 → R7 → R5 → R0 and R3 → R4 → R6 → R3, plus the trivial cycle R1 → R1.

84 table_sort
[Figure: rearranging the records cycle by cycle. (1) The cycle R0 → R2 → R7 → R5 → R0 is rotated into place. (2) For i = 1, 2, t[i] = i, so nothing moves. (3) The cycle R3 → R4 → R6 → R3 is rotated into place.]

85 Summary of internal sorting
Of the several sorting methods we have studied, no single method is best; some are good for small n, others for large n.
An insertion sort works well when the list is already partially ordered. It is also the best sorting method for small n.
Merge sort has the best worst-case behavior, but it requires more storage than a heap sort and has slightly more overhead than quick sort.
Quick sort has the best average behavior, but its worst-case behavior is O(n²).
The behavior of radix sort depends on the size of the keys and the choice of the radix.

86 Practical Considerations for Internal Sorting
Data movement slows down the sorting process; insertion sort and merge sort can instead operate on a linked representation of the file.
A practical combination: perform a linked-list sort, then physically rearrange the records.

87 Complexity of Sort

88 Comparison
n < 20: insertion sort
20 ≦ n < 45: quick sort
n ≧ 45: merge sort
Hybrid methods: merge sort + quick sort, or merge sort + insertion sort (a sketch of the merge sort + insertion sort combination follows).
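A hedged C sketch of the merge sort + insertion sort hybrid. The cutoff of 20 and the name hybrid_merge_sort are illustrative assumptions; merge() is the O(n)-space merge from slide 31, and sorted[] is a scratch array of the same size as list[].

#define HYBRID_CUTOFF 20   /* illustrative threshold; tune empirically */

void hybrid_merge_sort(element list[], element sorted[], int low, int high)
{
  int i, j, mid, t;
  element next;
  if (high - low + 1 <= HYBRID_CUTOFF) {
    /* small subfile: insertion sort in place */
    for (i = low + 1; i <= high; i++) {
      next = list[i];
      for (j = i - 1; j >= low && next.key < list[j].key; j--)
        list[j+1] = list[j];
      list[j+1] = next;
    }
    return;
  }
  mid = (low + high) / 2;
  hybrid_merge_sort(list, sorted, low, mid);
  hybrid_merge_sort(list, sorted, mid + 1, high);
  merge(list, sorted, low, mid, high);   /* merge list[low..mid] with list[mid+1..high] */
  for (t = low; t <= high; t++)          /* copy the merged run back into list */
    list[t] = sorted[t];
}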

89 External Sort
Used to sort files too large to fit in internal memory. The following overheads apply: seek time, latency time, transmission time.
The most popular method for sorting on external storage devices is merge sort:
Phase 1: segment the input file and sort the segments (runs).
Phase 2: merge the runs.

90 File: 4500 records A1, ..., A4500; internal memory: 750 records (3 blocks); block length: 250 records; input disk vs. scratch pad (disk).
(1) Sort three blocks (750 records) at a time and write each sorted run to the scratch pad, giving six runs of 750 records.
(2) Merge the runs, using the three in-memory blocks as two input buffers and one output buffer: the six 750-record runs become three 1500-record runs, then one 3000-record run (and one remaining 1500-record run), and finally one 4500-record run.
This merge phase makes 2 2/3 passes over the data.
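The 2 2/3 figure counts the records processed during the merge passes relative to the file size (a back-of-the-envelope check):

\frac{4500 + 3000 + 4500}{4500} = \frac{12000}{4500} = 2\tfrac{2}{3}\ \text{passes}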

91 Time Complexity of External Sort
Input/output time:
  ts = maximum seek time
  tl = maximum latency time
  trw = time to read/write one block of 250 records
  tIO = ts + tl + trw
CPU processing time:
  tIS = time to internally sort 750 records
  n tm = time to merge n records from the input buffers to the output buffer

92 Time Complexity of External Sort

93 Consider Parallelism Carry out the CPU operation and I/O operation in parallel 132 tIO = tm + 6 tIS Two disks: 132 tIO is reduced to 66 tIO

94 k-way Merging
[Figure: a 4-way merge on 16 runs; each of four groups of four runs is merged, and the four resulting runs are then merged into one.]
2 passes (4-way) vs. 4 passes (2-way)

95 k-way Merging
A k-way merge on m runs requires at most ⌈logk m⌉ passes over the data.
To begin with, k runs of size S1, S2, ..., Sk can no longer be merged internally in O(ΣSi) time: the most direct way to merge k runs makes k-1 comparisons to determine the next record to output, so the total number of key comparisons is n(k-1)⌈logk m⌉.
We can achieve a significant reduction in the number of comparisons by using a loser tree with k leaves (Chapter 5).
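With a loser tree, selecting each output record costs about ⌈log2 k⌉ comparisons, so a rough (hedged) count of the total comparison work over all passes is

n\lceil\log_2 k\rceil \cdot \lceil\log_k m\rceil \;\approx\; n\log_2 m,

which is essentially independent of k; increasing k therefore cuts the number of passes (and the I/O time) without increasing the comparison time.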

96 Optimal Merging Of Runs
[Figure: two binary merge trees for runs of lengths 2, 4, 5, 15. In the skewed tree the weighted external path length is 2*3 + 4*3 + 5*2 + 15*1 = 43; in the balanced tree it is 2*2 + 4*2 + 5*2 + 15*2 = 52.]

97 Optimal Merging Of Runs
When we merge using the first merge tree, some records are merged only once, while others may be merged up to three times. In the second merge tree, every record is merged exactly twice. Constructing a Huffman tree (always combining the two shortest remaining runs) solves this problem; a code sketch follows, and the next slides build the tree step by step.
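A minimal C sketch of the greedy construction: repeatedly combine the two shortest remaining runs. The function name optimal_merge_cost and the linear scan for the two minima are illustrative assumptions (a min-heap would be used in practice); the value returned is the weighted external path length, i.e., the total number of record moves over all merges.

long optimal_merge_cost(long runs[], int k)
{
  long cost = 0, merged;
  int a, b, i;
  while (k > 1) {
    /* find the indices a, b of the two smallest run lengths */
    a = 0; b = 1;
    if (runs[b] < runs[a]) { a = 1; b = 0; }
    for (i = 2; i < k; i++) {
      if (runs[i] < runs[a])      { b = a; a = i; }
      else if (runs[i] < runs[b]) { b = i; }
    }
    merged = runs[a] + runs[b];   /* one merge of the two shortest runs */
    cost += merged;
    runs[a] = merged;             /* keep the merged run ... */
    runs[b] = runs[k-1];          /* ... and delete the other one */
    k--;
  }
  return cost;
}

For the run lengths 2, 3, 5, 7, 9, 13 used on the next slides, the loop combines 2+3, 5+5, 7+9, 10+13, and 16+23, exactly the steps shown below.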

98 Construction of a Huffman tree
Runs of length: 2, 3, 5, 7, 9, 13. Step 1: combine the two shortest runs, 2 and 3, into an internal node of weight 5.

99 Construction of a Huffman tree
Runs of length: 2, 3, 5, 7, 9, 13. Step 2: combine 5 and 5 (the run of length 5 and the new node) into a node of weight 10.

100 Construction of a Huffman tree
Runs of length: 2, 3, 5, 7, 9, 13. Step 3: combine 7 and 9 into a node of weight 16.

101 Construction of a Huffman tree
Runs of length: 2, 3, 5, 7, 9, 13. Step 4: combine 10 and 13 into a node of weight 23.

102 Construction of a Huffman tree
Runs of length: 2, 3, 5, 7, 9, 13. Step 5: combine 16 and 23 into the root of weight 39.

