Page 1 Data Structures in C for Non-Computer Science Majors Kirs and Pflughoeft Searching and Sorting Chapter 9
Searching

Assume that we have an array of integers and wish to find a particular element in the array (e.g., 10):

    #include <stdio.h>

    int main(void)
    {
        int iarray[10] = {7,2,6,9,4,3,8,10,1,5}, index, search = 10;

        /* Step through the list until the value is found or the list ends */
        for (index = 0; index < 10 && iarray[index] != search; index++)
            ;   /* empty body: the loop condition does all the work */

        if (index == 10)
            printf("The Integer is NOT on the list\n");
        else
            printf("The Integer %d was found in position %d\n",
                   iarray[index], index);
        return 0;
    }
Following the program during the for loop:

    for (index = 0; index < 10 && iarray[index] != search; index++);

Variable values (search set to 10):

    index   iarray[index]   Condition check: index < 10 && iarray[index] != search
      0          7          TRUE
      1          2          TRUE
      2          6          TRUE
      3          9          TRUE
      4          4          TRUE
      5          3          TRUE
      6          8          TRUE
      7         10          FALSE: exit loop

The program then executes

    printf("The Integer %d was found in position %d\n", iarray[index], index);

and prints: The Integer 10 was found in position 7
Since the list of integers is not in any order, we must perform a sequential search. Each element in the list must be checked until:
– The element is found, or
– The end of the list is reached
The procedure is adequate if each element is to be considered (e.g., in a transaction listing).
The procedure is inadequate if specific elements are sought.
In a sequential search:
– The MAXIMUM number of searches required is n + 1 (where n = the number of elements on the list)
– The AVERAGE number of searches required is (n + 1)/2
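The n + 1 figure can be checked with a small function. This is a minimal sketch, not code from the slides: the name `sequential_search` and the `checks` counter are assumptions added for illustration, and the counter tallies how many times the loop condition is tested.

```c
/* Sequential search: returns the offset of key in list[0..n-1],
   or -1 if key is not on the list.
   *checks (a hypothetical counter, added for illustration) receives
   the number of times the loop condition was tested: at most n + 1. */
int sequential_search(const int list[], int n, int key, int *checks)
{
    int index = 0;

    *checks = 1;                          /* first test of the condition */
    while (index < n && list[index] != key) {
        index++;
        (*checks)++;                      /* one more test per element passed */
    }
    return (index == n) ? -1 : index;
}
```

With the ten-element array used earlier, looking for 10 returns offset 7 after 8 tests, while looking for a value that is absent takes all 11 (n + 1) tests.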
The number of searches required is dependent upon the number of elements in the list:

    Number of elements (n)   Maximum searches (n + 1)   Average searches (n + 1)/2
    1,000                    1,001                      500.5
    10,000                   10,001                     5,000.5
    100,000                  100,001                    50,000.5
    1,000,000                1,000,001                  500,000.5
    10,000,000               10,000,001                 5,000,000.5
    100,000,000              100,000,001                50,000,000.5
    1,000,000,000            1,000,000,001              500,000,000.5
IF the list were sorted, we could perform a Binary Search on it:
1. Determine the bottom and top of the list
2. If the bottom offset > top offset: STOP. The number is NOT in the list
3. Find the midpoint = (bottom + top)/2
4. If the element at the midpoint is the search number: STOP. The number has been found
5. If the element is greater than the search number: top = midpoint - 1
   Else (the element is less than the search number): bottom = midpoint + 1
   Go to step 2
Let's consider the procedure, step by step (assume we are trying to find the integer 6 on the list):
1. Determine the bottom and top of the list. Offsets: bottom = 0, top = 9
2. Is the bottom offset > top offset? No
3. Find the midpoint = (bottom + top)/2 = (0 + 9)/2 = 4
4. Is the element at the midpoint the search element? No
5. Is the element greater than the search number? No: bottom = midpoint + 1 = 4 + 1 = 5 (top is unchanged)
2. Is the bottom offset > top offset? Offsets: bottom = 5, top = 9. No
3. Find the midpoint = (bottom + top)/2 = (5 + 9)/2 = 7
4. Is the element at the midpoint the search element? No
5. Is the element greater than the search number? Yes: top = midpoint - 1 = 7 - 1 = 6 (bottom is unchanged)
2. Is the bottom offset > top offset? Offsets: bottom = 5, top = 6. No
3. Find the midpoint = (bottom + top)/2 = (5 + 6)/2 = 5
2. Is the bottom offset > top offset? Offsets: bottom = 5, top = 5. No
3. Find the midpoint = (bottom + top)/2 = (5 + 5)/2 = 5
4. Is the element at the midpoint the search element? Yes: STOP. The search number was found
This does NOT seem like a savings over a sequential search. In fact, it seems like much more work. In this case, because the list is short (and because we intentionally chose the worst-case scenario), it probably is not a savings.
For a binary search:
– The MAXIMUM number of searches required is $\log_2 n$ (where n = the number of elements on the list)
– The AVERAGE number of searches required is $(\log_2 n) - 1$ (for n > 30)

    No. elements     Ave. sequential searches   Max. binary searches   Ave. binary searches
                                                ($\log_2 n$, rounded up)
    1,000            500.5                      10                     9
    10,000           5,000.5                    14                     13
    100,000          50,000.5                   17                     16
    1,000,000        500,000.5                  20                     19
    10,000,000       5,000,000.5                24                     23
    100,000,000      50,000,000.5               27                     26
    1,000,000,000    500,000,000.5              30                     29
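The maximum for a binary search can be worked out without a logarithm function by counting how many times a list can be halved. The sketch below is an illustration only; `max_binary_probes` is a hypothetical name, and the result is $\log_2 n$ rounded up to a whole number of searches.

```c
/* Counts how many times a list of n elements can be halved before
   nothing is left. For n >= 1 this equals floor(log2 n) + 1, the most
   midpoints a binary search can examine.
   (Hypothetical helper, not part of the original slides.) */
int max_binary_probes(unsigned long n)
{
    int probes = 0;

    while (n > 0) {
        n /= 2;          /* each search discards half of the list */
        probes++;
    }
    return probes;
}
```

For example, `max_binary_probes(1000)` gives 10 and `max_binary_probes(1000000000UL)` gives 30, against an average of roughly 500 and 500,000,000 sequential searches for lists of those sizes.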
Is a binary search always preferred to a sequential search? NO. It depends:
– If all elements are to be examined, a sequential search is preferred
– A binary search is programmatically more complex and requires more comparisons per element examined
– As a general rule of thumb, a binary search is preferred if the list contains more than a certain number of elements
How does a binary search work if an element is NOT on the list? Consider the array, and suppose we were to search it for the value 9 (which is NOT on the list).
[Figure: Searches #1 through #4 on the array, marking the bottom, top, and midpoint offsets at each search.]
Since the bottom offset is > top offset: STOP
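The offsets for a search that fails can be printed step by step. This is a sketch under assumptions: the function name `traced_binary_search` is hypothetical, and the trace described below assumes the ten-element array from the C program later in this chapter, since the array for this example is not reproduced in the text.

```c
#include <stdio.h>

/* Binary search that prints bottom, top and midpoint on every search.
   Returns the offset of key, or -1 once bottom has passed top.
   (Illustrative helper; the name is an assumption.) */
int traced_binary_search(const int list[], int n, int key)
{
    int bottom = 0, top = n - 1, search_no = 0;

    while (bottom <= top) {
        int midpoint = (bottom + top) / 2;

        printf("Search #%d: bottom=%d top=%d midpoint=%d (element %d)\n",
               ++search_no, bottom, top, midpoint, list[midpoint]);
        if (list[midpoint] == key)
            return midpoint;
        if (list[midpoint] > key)
            top = midpoint - 1;       /* keep the lower half */
        else
            bottom = midpoint + 1;    /* keep the upper half */
    }
    printf("bottom=%d > top=%d: STOP, %d is NOT on the list\n",
           bottom, top, key);
    return -1;
}
```

Called on `{1,2,6,10,12,14,15,21,22,29}` with a key of 9, it examines midpoints 4, 1, 2 and 3, then stops with bottom = 3 and top = 2.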
What would the C code for a binary search look like?

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        int iarray[10] = {1,2,6,10,12,14,15,21,22,29},
            search, bottom = 0, top = 9, found = 0, midpt = 9/2;
        char temp[10];

        printf("\nEnter the number to search for: ");
        if (fgets(temp, sizeof temp, stdin) == NULL)  /* fgets replaces the unsafe gets */
            return 1;
        search = atoi(temp);

        /* top >= bottom: a sublist of one element must still be checked */
        while ((top >= bottom) && (found == 0))
            if (iarray[midpt] == search)
                found = 1;
            else
            {
                if (search > iarray[midpt])
                    bottom = midpt + 1;
                else
                    top = midpt - 1;
                midpt = (bottom + top)/2;
            }

        if (found == 0)
            printf("The Integer is NOT on the list\n");
        else
            printf("The Integer %d was found in position %d\n",
                   iarray[midpt], midpt);
        return 0;
    }
Sorting

Why?
– Displaying in order
– Faster searching

Categories
Internal
– List elements manipulated in RAM
– Faster
– Limited by amount of RAM
External
– External (secondary) storage areas used
– Slower
– Used for larger lists
– Limited by secondary storage (disk space)
Basic Internal Sort Types

Exchange (e.g., bubble sort)
– Single list
– Incorrectly ordered pairs swapped as found
Selection
– Two lists (generally); selection with exchange uses one list
– Largest/smallest selected in each pass and moved into position
Insertion
– One or two lists (two more common)
– Each item from the original list inserted into the correct position in the new list
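The selection and insertion types listed above can be sketched briefly. Both functions below are illustrations rather than code from the slides: the names are assumptions, and both use the one-list variants (selection with exchange, and insertion within the same array) to keep the example short.

```c
/* Selection with exchange: one list. Each pass selects the smallest
   remaining element and swaps it into position.
   (Illustrative sketch; function names are assumptions.) */
void selection_sort(int list[], int n)
{
    for (int i = 0; i < n - 1; i++) {
        int smallest = i;

        for (int j = i + 1; j < n; j++)
            if (list[j] < list[smallest])
                smallest = j;
        if (smallest != i) {                 /* move it into position */
            int temp = list[i];
            list[i] = list[smallest];
            list[smallest] = temp;
        }
    }
}

/* Insertion, one-list variant: list[0..i-1] is already in order, and
   each new item is inserted into its correct position within it. */
void insertion_sort(int list[], int n)
{
    for (int i = 1; i < n; i++) {
        int item = list[i];
        int j = i - 1;

        while (j >= 0 && list[j] > item) {   /* shift larger items up */
            list[j + 1] = list[j];
            j--;
        }
        list[j + 1] = item;
    }
}
```

Either function can be called as `selection_sort(iarray, 10)` or `insertion_sort(iarray, 10)` on a ten-element array.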
Exchange Sorts

Bubble Sort 1: The largest element 'bubbles' up
Given: 7 2 6 1 3 4 8 10 9 5
– Point to the bottom element
– Compare with the element above: if the element is greater, swap positions (in this case, swap); if the element is smaller, reset the bottom pointer (not here)
– Continue the process until the largest element is at the end
This will require n-1 comparisons (9 for our example), where n = the length of the unsorted list.
At the end of the pass:
– The largest number is in the last position
– The length of the unsorted list has been shortened by 1
How does this work?
Pass #1, comparisons 1 to 9: Swap, Swap, Swap, Swap, Swap, Don't Swap, Don't Swap, Swap, Swap
    The new list appears as: 2 6 1 3 4 7 8 9 5 10
    Note: 9 (n - 1) comparisons were required. We know that the largest element is at the end of the list.
Pass #2, comparisons 1 to 8 (10 to 17 overall): Don't Swap, Swap, Swap, Swap, Don't Swap, Don't Swap, Don't Swap, Swap
    The new list appears as: 2 1 3 4 6 7 8 5 9 10
Pass #3, comparisons 1 to 7 (18 to 24 overall): Swap, Don't Swap, Don't Swap, Don't Swap, Don't Swap, Don't Swap, Swap
    The new list appears as: 1 2 3 4 6 7 5 8 9 10
Pass #4, comparisons 1 to 6 (25 to 30 overall): Don't Swap, Don't Swap, Don't Swap, Don't Swap, Don't Swap, Swap
    The new list appears as: 1 2 3 4 6 5 7 8 9 10
Pass #5, comparisons 1 to 5 (31 to 35 overall): Don't Swap, Don't Swap, Don't Swap, Don't Swap, Swap
    The new list appears as: 1 2 3 4 5 6 7 8 9 10
And the new list is in order, so we can stop. Right???
NO. In the WORST case scenario (numbers in reverse order), a bubble sort would yield:
[Table: the order of the list after each pass and the comparisons each pass requires, ending with the maximum comparisons necessary.]
If we want to be sure, given an array of n elements, we need a maximum of n-1 passes to sort the array, and a total of:

    $(n-1) + (n-2) + \dots + 1 = (n^2 - n)/2$ comparisons.

What does this imply?

    No. items    Max. passes (n - 1)   Max. compares $(n^2 - n)/2$
    10           9                     45
    100          99                    4,950
    1,000        999                   499,500
    10,000       9,999                 49,995,000
    100,000      99,999                4,999,950,000
    1,000,000    999,999               499,999,500,000

The C code necessary?
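The agreement between the pass-by-pass sum and the closed formula can be checked numerically. This is a small sketch with hypothetical function names; `unsigned long long` is used because the larger values do not fit in a 32-bit integer.

```c
/* Closed form: (n^2 - n)/2, written as n(n - 1)/2.
   (Hypothetical helper names, added for illustration.) */
unsigned long long max_bubble_compares(unsigned long long n)
{
    return n * (n - 1) / 2;
}

/* The same total, added up one pass at a time:
   (n-1) + (n-2) + ... + 1 */
unsigned long long sum_of_passes(unsigned long long n)
{
    unsigned long long total = 0;

    for (unsigned long long k = 1; k < n; k++)
        total += k;          /* the pass that makes k comparisons */
    return total;
}
```

Both functions give 45 for n = 10 and 4,950 for n = 100; the closed form gives 499,999,500,000 for n = 1,000,000.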
    #include <stdio.h>

    int main(void)
    {
        int pass = 0, compare = 0, swaps = 0, top = 9, i, j, temp,
            iarray[10] = {7,2,6,1,3,4,8,10,9,5};

        while (top > 0)                             // check end
        {
            pass++;                                 // increment ctr
            for (i = 0; i < top; i++)               // begin pass
            {
                compare++;                          // increment ctr
                if (iarray[i] > iarray[i+1])        // ?? out of order
                {
                    swaps++;                        // increment ctr
                    temp = iarray[i];               // temp. storage
                    iarray[i] = iarray[i+1];        // swap
                    iarray[i+1] = temp;
                }
                printf("%3d %3d %3d: ", pass, compare, swaps);
                for (j = 0; j < 10; j++)
                    printf("%3d", iarray[j]);       // print element
                printf("\n");
            }
            top--;
        }
        return 0;
    }
The output (modified slightly) would appear as:
[Output listing with the columns: Pass, Comparison, Swap, Order]
Since the list IS sorted after 5 passes (35 comparisons), why can't we stop?
We could, IF we knew the list was sorted:
– If we make a pass without swapping any elements, we know the list is sorted (one extra pass is needed)
– We need a flag, which we set to 0 (zero) before each pass
– If we make any swaps in the pass, we set the flag to 1
– If we exit the loop and the flag = 0, the list is sorted
For our example, we could stop after Pass 6 (39 comparisons).
How would the C code appear?
    #include <stdio.h>

    int main(void)
    {
        /* Note: sorted == 1 means "a swap was made in the last pass",
           i.e. the list may NOT be in order yet */
        int pass = 0, compare = 0, swaps = 0, top = 9, i, j, temp, sorted = 1,
            iarray[10] = {7,2,6,1,3,4,8,10,9,5};

        while ((top > 0) && (sorted == 1))          // check end AND if NOT sorted
        {
            pass++;                                 // increment ctr
            sorted = 0;                             // reset our flag
            for (i = 0; i < top; i++)               // begin pass
            {
                compare++;                          // increment ctr
                if (iarray[i] > iarray[i+1])        // ?? out of order
                {
                    swaps++;                        // increment ctr
                    sorted = 1;                     // set the flag
                    temp = iarray[i];               // temp. storage
                    iarray[i] = iarray[i+1];        // swap
                    iarray[i+1] = temp;
                }
                printf("%3d %3d %3d: ", pass, compare, swaps);
                for (j = 0; j < 10; j++)
                    printf("%3d", iarray[j]);       // print element
                printf("\n");
            }
            top--;
        }
        return 0;
    }
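The comparison counts quoted for the flagged bubble sort can be confirmed with a function that returns its own tally. This is a sketch, not the program from the slides: the name `bubble_sort` and the return value are assumptions, and the flag is called `swapped` here so that its name matches what it records.

```c
/* Bubble sort with an early exit: stops after the first pass that makes
   no swaps. Returns the number of comparisons made.
   (Illustrative helper; name and return value are assumptions.) */
int bubble_sort(int list[], int n)
{
    int compares = 0, top = n - 1, swapped = 1;

    while (top > 0 && swapped) {
        swapped = 0;                         /* reset the flag */
        for (int i = 0; i < top; i++) {
            compares++;
            if (list[i] > list[i + 1]) {     /* out of order: swap */
                int temp = list[i];
                list[i] = list[i + 1];
                list[i + 1] = temp;
                swapped = 1;                 /* set the flag */
            }
        }
        top--;                               /* largest is now in place */
    }
    return compares;
}
```

On the example list `{7,2,6,1,3,4,8,10,9,5}` the function returns 39, and on ten numbers in reverse order it returns 45, the maximum of $(n^2 - n)/2$.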
Could we refine the bubble sort? We could 'bubble-up' in one pass (as we did before) AND 'bubble-down' in the next pass.
Consider our list after our first pass (9th comparison): 2 6 1 3 4 7 8 9 5 10
Starting at the top of the unsorted list, we now 'bubble-down' the smallest element ('1' will end up at the bottom of the list):
Pass #2 (bubble-down), comparisons 1 to 8 (10 to 17 overall): Swap, Swap, Swap, Don't Swap, Don't Swap, Don't Swap, Swap, Swap
    The new list appears as: 1 2 6 3 4 5 7 8 9 10
Pass #3 (bubble-up), comparisons 1 to 7 (18 to 24 overall): Don't Swap, Swap, Swap, Swap, Don't Swap, Don't Swap, Don't Swap
    The new list appears as: 1 2 3 4 5 6 7 8 9 10
Since the list is in order, can we stop?
Pass #4 (bubble-down), comparisons 1 to 6 (25 to 30 overall): Don't Swap, Don't Swap, Don't Swap, Don't Swap, Don't Swap, Don't Swap
NO: Remember, we need one pass WITHOUT a swap
    #include <stdio.h>

    void swap(int *swaparray, int a, int b);

    int sorted = 1;     /* 1 = a swap was made, so the list may NOT be sorted */

    int main(void)
    {
        int bottom = 0, top = 9, i,
            iarray[10] = {7,2,6,1,3,4,8,10,9,5};

        while ((top > bottom) && (sorted == 1))     // check end AND if NOT sorted
        {
            sorted = 0;                             // reset our flag
            for (i = bottom; i < top; i++)          // begin bubble-up pass
                if (iarray[i] > iarray[i+1])        // ?? out of order
                    swap(iarray, i, i+1);           // swap the elements
            top--;
            if ((top > bottom) && (sorted == 1))    // check end AND if NOT sorted
            {
                sorted = 0;                         // reset our flag
                for (i = top; i > bottom; i--)      // begin bubble-down pass
                    if (iarray[i] < iarray[i-1])    // ?? out of order
                        swap(iarray, i, i-1);       // swap the elements
                bottom++;
            }
        }
        return 0;
    }

    void swap(int *swaparray, int a, int b)
    {
        int temp;

        sorted = 1;                                 // set the flag
        temp = swaparray[a];                        // temp. storage
        swaparray[a] = swaparray[b];                // swap
        swaparray[b] = temp;
    }
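To see how much work the two-way version does, the same logic can be written with counters. This is a sketch with assumed names (`two_way_bubble_sort`, `compares`, `swaps`); it replaces the global flag with a local one but follows the same bubble-up and bubble-down passes.

```c
/* Two-way bubble sort with counters.
   *compares and *swaps receive the totals for the whole sort.
   (Illustrative variant; names are assumptions.) */
void two_way_bubble_sort(int list[], int n, int *compares, int *swaps)
{
    int bottom = 0, top = n - 1, swapped = 1, temp;

    *compares = 0;
    *swaps = 0;
    while (top > bottom && swapped) {
        swapped = 0;
        for (int i = bottom; i < top; i++) {         /* bubble-up pass */
            (*compares)++;
            if (list[i] > list[i + 1]) {
                temp = list[i]; list[i] = list[i + 1]; list[i + 1] = temp;
                (*swaps)++;
                swapped = 1;
            }
        }
        top--;
        if (top > bottom && swapped) {
            swapped = 0;
            for (int i = top; i > bottom; i--) {     /* bubble-down pass */
                (*compares)++;
                if (list[i] < list[i - 1]) {
                    temp = list[i]; list[i] = list[i - 1]; list[i - 1] = temp;
                    (*swaps)++;
                    swapped = 1;
                }
            }
            bottom++;
        }
    }
}
```

For the example list `{7,2,6,1,3,4,8,10,9,5}` this reports 30 comparisons and 15 swaps over four passes, compared with 39 comparisons for the one-way bubble sort with a flag.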
Are there better sorting methods? YES: generally speaking, bubble sorts are very slow.
The Quicksort Method:
– Generally the fastest internal sorting method
– Intended for longer lists
How does a quicksort work?
– As we have seen, the shorter the list, the faster the sort
– Quicksort recursively partitions the list into smaller sublists, gradually moving the elements into their correct position
Step 1: Choose a pivot element from the list
– Optimal pivot: the median element, which will divide the list in half
– One alternative: the element at the midpoint of the list
Step 2: Partition the list
– Move numbers larger than the pivot to the right, smaller numbers to the left
– Compare the leftmost element with the rightmost until a swap is needed
[Figure: pairs of elements compared across the pivot. Elements out of order: swap needed. Elements in order: no swap.]
Continue with the remaining elements (No Swap, Swap, No Swap, Swap, Swap). The left and right partitions are partially sorted: in the new list, the smaller elements are on the left and the larger elements are on the right.
Put the LEFT partition in order (even though already sorted). Step 1: Select the pivot for the list 1 3 2 4 (array offsets 0 to 3): midpoint = (bottom + top)/2 = (0 + 3)/2 = 1. Repeat Step 2 with the partitioned list (No Swap, Swap). We made 1 swap. Our new partitioned list appears as 1 2 3 4, with the smaller elements on the left and the larger elements on the right.
OK, so the list is in order. We can stop, right? Not really. The only way to be sure that the complete list is in order is to keep breaking the list down until no swaps are made or there is only one element on each sublist. Looking at the left sublist (1 2), all we know is that the elements on it are smaller than the elements on the right sublist. The order could have been 2 1. Assume that it was. We have to continue making sublists: the list midpoint is (0 + 1)/2 = 0, and one swap is made. NOW we are done, since each sublist contains only one element.
Now put the RIGHT partition in order. Step 1: Select the pivot: midpoint = (bottom + top)/2 = (4 + 9)/2 = 6. Repeat Step 2 with the partitioned list (Swap, No Swap, Swap). The new partitioned list again has the smaller elements on the left and the larger elements on the right.
Put the new LEFT partition in order (already sorted). Step 1: Select the pivot: midpoint = (bottom + top)/2 = (4 + 6)/2 = 5. Repeat Step 2 with the partitioned list (No Swap). Since no swaps were made, the partition is in order.
Once again, put the new RIGHT partition in order. Step 1: Select the pivot: midpoint = (bottom + top)/2 = (7 + 9)/2 = 8. Repeat Step 2 with the partitioned list (Swap, No Swap). Note that since the (new) left partition contains only 1 (one) element, it MUST be in order. Step 1: Find the new right pivot for offsets 8 and 9: pivot = (8 + 9)/2 = 8. Step 2: Check order (Swap). The new right list, 9 10, is sorted (as is the whole list).
This seems very complicated. Is it worth it? Maybe: for our list, we needed 22 comparisons and 7 swaps (vs. 30 comparisons and 15 swaps for our two-way bubble sort with checks).
The number of comparisons a quicksort typically requires is about $\log_2 n!$, which is on the order of $n \log_2 n$. (With consistently poor pivot choices, the true worst case grows like $n^2$, as it does for a bubble sort.)
Comparing a quicksort with a bubble sort:

    Elements   Max. bubble sort   Quicksort ($n \log_2 n$)
    1,000      499,500            9,965
    10,000     49,995,000         132,877

What about the C code necessary? It's pretty simple, but it involves a new procedure: RECURSION. Recursion is when a function calls itself.
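Before applying recursion to sorting, it may help to see it on a familiar problem. The sketch below rewrites the binary search from earlier in the chapter so that the function calls itself on the half of the list that remains; the name `recursive_binary_search` is an assumption made for this illustration.

```c
/* Recursive binary search on list[bottom..top] (list must be sorted).
   Returns the offset of key, or -1 if it is not on the list.
   (Illustrative sketch; the function name is an assumption.) */
int recursive_binary_search(const int list[], int bottom, int top, int key)
{
    int midpoint;

    if (bottom > top)                 /* base case: nothing left to search */
        return -1;
    midpoint = (bottom + top) / 2;
    if (list[midpoint] == key)        /* base case: found */
        return midpoint;
    if (list[midpoint] > key)         /* the function calls itself ... */
        return recursive_binary_search(list, bottom, midpoint - 1, key);
    return recursive_binary_search(list, midpoint + 1, top, key);
}
```

Each call either stops (a base case) or hands a strictly shorter sublist to the next call, which is what guarantees that the recursion ends. A first call for a ten-element array would be `recursive_binary_search(iarray, 0, 9, search)`.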
    #include <stdio.h>

    void quicksort(int list[], int first, int last);
    void swap(int *a, int *b);

    int main(void)
    {
        int iarray[10] = {7,2,6,9,4,3,8,10,1,5};

        quicksort(iarray, 0, 9);
        return 0;
    }

    void quicksort(int list[], int first, int last)
    {
        int lower = first, upper = last,
            bound = list[(first + last)/2];         /* pivot: midpoint element */

        while (lower <= upper)
        {
            while (list[lower] < bound) lower++;    /* skip elements already on the left */
            while (bound < list[upper]) upper--;    /* skip elements already on the right */
            if (lower < upper)
                swap(&list[lower++], &list[upper--]);
            else
                lower++;
        }
        if (first < upper)                          /* sort the left partition */
            quicksort(list, first, upper);
        if (upper + 1 < last)                       /* sort the right partition */
            quicksort(list, upper + 1, last);
    }

    void swap(int *a, int *b)
    {
        int temp;

        temp = *a;
        *a = *b;
        *b = temp;
    }
Why is sorting important? This illustrates a major trade-off in programming:
– Finding elements in a list is much quicker if the list is sorted (as we have seen, a binary search is exponentially faster than a sequential search)
– Sorting is a difficult and time-consuming task (as is maintaining a sorted list)
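In practice the trade-off is often handled with the C standard library, which supplies `qsort` and `bsearch` in `<stdlib.h>`. The sketch below pays the sorting cost once and then searches the sorted list; `compare_ints` and `sort_then_find` are hypothetical names chosen for this example.

```c
#include <stdlib.h>

/* Comparison function in the form qsort and bsearch expect:
   negative, zero or positive for a < b, a == b, a > b. */
static int compare_ints(const void *a, const void *b)
{
    int x = *(const int *)a, y = *(const int *)b;

    return (x > y) - (x < y);
}

/* Sorts list in place, then uses a binary search to find key.
   Returns the offset of key in the SORTED list, or -1 if absent.
   (Illustrative helper; the name is an assumption.) */
int sort_then_find(int list[], size_t n, int key)
{
    int *hit;

    qsort(list, n, sizeof list[0], compare_ints);
    hit = bsearch(&key, list, n, sizeof list[0], compare_ints);
    return hit ? (int)(hit - list) : -1;
}
```

If many searches follow, the list should be sorted once and `bsearch` called repeatedly; sorting before every search would give back the time the binary search saves.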