Intro to Algorithms: Searching and Sorting
Searching
A common task for a computer is to find a block of data.
We’re going to look at the two most common and basic searching algorithms:
–Linear Search
–Binary Search
Linear Search
Linear search is performed in the order in which the data appears.
Ex. In the animation of a linear search for the value 7, each index is checked in turn: found at index 4, search stopped.
Linear Search Algorithm
Here is the pseudo code:
For each index in the array, compare the object/data to the item you are looking for.
–If the item is found, return
–Else, go to the next index
Issues
Linear search is great if the data you want is at the front of the array.
It’s fairly robust in that it works on any data in any order.
If the data you’re trying to find is in the last position, you have to look at every other one first.
What do you do if the data is not in the array?
–Usually you return a null object or the index -1
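The linear search described above can be sketched in Java roughly as follows. This is a minimal illustration, not the course's official solution: the class name `LinearSearchSketch`, the use of an `int[]`, and the sample data are assumptions made for the example. It returns -1 as the "not found" sentinel mentioned on the slide.

```java
class LinearSearchSketch {
    // Returns the index of the first element equal to target, or -1 if it is absent.
    static int linearSearch(int[] data, int target) {
        for (int i = 0; i < data.length; i++) {
            if (data[i] == target) {
                return i; // found: stop searching
            }
        }
        return -1; // sentinel: the target is not in the array
    }

    public static void main(String[] args) {
        int[] data = {3, 9, 1, 4, 7, 2}; // hypothetical sample data
        System.out.println(linearSearch(data, 7)); // prints 4
        System.out.println(linearSearch(data, 8)); // prints -1
    }
}
```

Note that the data does not need to be sorted, which is what makes linear search work "on any data in any order".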
Linear Searching the Dictionary
Suppose you’re going to find Zebra in the dictionary using linear search.
You have to look at each word, one at a time, until you arrive at Zebra.
In real life you wouldn’t do this (I hope).
How do you search the dictionary in real life?
Binary Search Algorithm
Much like the way you search a dictionary in real life, we can improve our linear searching algorithm for sorted data.
Open the dictionary in the middle and check which word you’re looking at (probably something like M).
Zebra is after, so throw away the first half of the dictionary (no need to look there).
Now with the remaining half, look in the middle…
Repeat until you find the word you’re looking for.
Binary Search Example
Suppose you have the values 1 to 100 in ascending order.
If you were looking for the number 71, what numbers would you encounter along the way?
–Interval (1,100), half is (1 + 100)/2 = 50 (integer division)
–50 is not the target, the target is larger
–Interval (51,100), half is (51 + 100)/2 = 75
–75 is not the target, the target is smaller
–Interval (51,74), half is (51 + 74)/2 = 62
–62 is not the target, the target is larger
–Interval (63,74), half is (63 + 74)/2 = 68
–68 is not the target, the target is larger
–Interval (69,74), half is (69 + 74)/2 = 71
–71 is the target, stop
In this case, it took us 5 comparisons vs. 71 with linear search.
Binary Search Algorithm
Here is the pseudo code for the algorithm:
Calculate the size of the array (number of elements)
first = 0, last = size – 1
While first is less than or equal to last
–mid = (first + last)/2 //middle of the current interval
–If the target is at index mid
   Stop, return object (or index mid)
–Otherwise if the target is greater than the object at index mid
   first = mid + 1 //shift the interval over
–Otherwise (target is less than the object at mid)
   last = mid – 1
At this point the only way out of the loop is if the target was not found, so return a sentinel value.
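A possible Java version of binary search is sketched below, assuming a sorted `int[]` and -1 as the sentinel. The class and method names are hypothetical. The loop runs while `first <= last` so that an interval holding a single element is still checked, and `mid` is recomputed at the top of every pass.

```java
class BinarySearchSketch {
    // Returns the index of target in the ascending array, or -1 if it is absent.
    static int binarySearch(int[] sorted, int target) {
        int first = 0;
        int last = sorted.length - 1;
        while (first <= last) {
            // Same value as (first + last) / 2, written to avoid int overflow.
            int mid = first + (last - first) / 2;
            if (sorted[mid] == target) {
                return mid;           // found
            } else if (target > sorted[mid]) {
                first = mid + 1;      // discard the lower half
            } else {
                last = mid - 1;       // discard the upper half
            }
        }
        return -1; // interval is empty: target not found
    }

    public static void main(String[] args) {
        int[] values = new int[100];
        for (int i = 0; i < values.length; i++) {
            values[i] = i + 1; // the values 1 to 100 in ascending order
        }
        System.out.println(binarySearch(values, 71)); // prints 70
    }
}
```

Searching for 71 in the values 1 to 100 inspects 50, 75, 62, 68 and 71, the same five values as the worked example, and finds the target at index 70 because the array is indexed from 0.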
Sorting
How do you sort a collection of values?
This is another common task for a computer.
There are many, many algorithms for sorting data. Here are some of the most popular:
–Selection Sort
–Insertion Sort
–Heap Sort
–Merge Sort
–Quick Sort
–Radix Sort
–Bubble Sort
–…
We’re focusing on the first two, Selection and Insertion Sort.
Selection Sort
Like the name implies, the idea is to ‘select’ the smallest value in the collection and move it to the front.
The pseudo code is:
–Start at index n = 0
–While n is less than the last index
   Find the smallest value from index n to the end of the collection
   Swap it with the value at index n
   n = n + 1 //go to the next value
Swapping values (one, two):
–copy = one
–one = two
–two = copy
Selection Sort Sample
Here is a sample run of the selection sort algorithm:
–Find the smallest item
–Swap it with the item at the front
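The selection sort idea might look like this in Java. It is a sketch on an `int[]` with hypothetical names. The inner loop only scans the unsorted part of the array (index `n` to the end), and the swap uses the three-step copy shown in the pseudo code.

```java
class SelectionSortSketch {
    // Sorts the array in ascending order, in place.
    static void selectionSort(int[] a) {
        for (int n = 0; n < a.length - 1; n++) {
            // Find the index of the smallest value in a[n..end].
            int smallest = n;
            for (int i = n + 1; i < a.length; i++) {
                if (a[i] < a[smallest]) {
                    smallest = i;
                }
            }
            // Swap it with the value at index n.
            int copy = a[n];
            a[n] = a[smallest];
            a[smallest] = copy;
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 7, 3, 4, 2, 9, 8, 1, 6};
        selectionSort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```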
Insertion Sort
Insertion sort is another simple but inefficient algorithm.
The implementation is a bit tougher since we’ll have to insert elements rather than swap them.
Here’s the idea:
–Start with a sorted list
–Take in a new value by inserting it at the proper location
Implementing Insertion Sort
How do you start with a sorted list when you’re trying to sort the list?
Start with a list with one element.
For the sample run I’ll put the sorted list in {} style brackets and the remainder of the items will follow.
Sample Run
Sorting
–5,7,3,4,2,9,8,1,6
–{5},7,3,4,2,9,8,1,6 insert 7 into the list
–{5,7},3,4,2,9,8,1,6 insert 3 into the list
–{3,5,7},4,2,9,8,1,6 insert 4 into the list
–{3,4,5,7},2,9,8,1,6 insert 2 into the list
–{2,3,4,5,7},9,8,1,6 insert 9 into the list
–{2,3,4,5,7,9},8,1,6 insert 8 into the list
–{2,3,4,5,7,8,9},1,6 insert 1 into the list
–{1,2,3,4,5,7,8,9},6 insert 6 into the list
–{1,2,3,4,5,6,7,8,9} the list is sorted
Insertion Sort Algorithm
Here is the pseudo code:
–index = 1
–While index < size of collection
   insert = Array[index] //object to insert
   current = index – 1 //end of sorted list
   While current >= 0 and insert < Array[current]
   –Array[current+1] = Array[current] //shift the larger value right
   –current = current – 1
   Array[current+1] = insert
   index = index + 1
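One way to write insertion sort in Java is sketched below, again on an `int[]` with hypothetical names. The inner loop shifts each larger value one position to the right (`a[current + 1] = a[current]`) and must be allowed to reach index 0, hence the `current >= 0` test.

```java
class InsertionSortSketch {
    // Sorts the array in ascending order, in place.
    static void insertionSort(int[] a) {
        for (int index = 1; index < a.length; index++) {
            int insert = a[index];   // value to insert into the sorted part
            int current = index - 1; // end of the sorted part
            // Shift larger values one slot to the right to open a gap.
            while (current >= 0 && insert < a[current]) {
                a[current + 1] = a[current];
                current--;
            }
            a[current + 1] = insert; // drop the value into the gap
        }
    }

    public static void main(String[] args) {
        int[] data = {5, 7, 3, 4, 2, 9, 8, 1, 6}; // same values as the sample run
        insertionSort(data);
        System.out.println(java.util.Arrays.toString(data));
    }
}
```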
Big-O notation
How do we compare algorithms?
There are usually two considerations:
–Space requirements (memory)
–Time requirements (speed of execution)
We’re going to take a short introduction to how computer scientists compare the speed of algorithms.
Classes of Functions
Which function is larger?
We say they are all related, that they “behave” the same way, since the most significant part of them is x^2.
As computer scientists, we’d say they are: O(x^2)
Complexity of Searching
Linear search:
–Since the worst case is that we have to search through each element in a collection, linear search is O(n)
–This means that in the worst case it requires approximately as many comparisons as there are values in the collection
Binary search:
–Since binary search is a divide and conquer algorithm, it executes much faster
–It’s easiest to find the relationship by trying some examples
–If there were 8 values, then if worst came to worst:
   8 → 4 → 2 → 1: the problem is reduced to a singleton in 3 steps
   32 → 16 → 8 → 4 → 2 → 1: reduced in 5 steps
   256 → 128 → 64 → 32 → 16 → 8 → 4 → 2 → 1: reduced in 8 steps
–2^3 = 8, 2^5 = 32, 2^8 = 256
–The relationship is O(log2 n)
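The halving argument can be checked with a few lines of Java. The helper below is purely illustrative (the name `halvingSteps` is an assumption); it counts how many times a collection of size n can be cut in half before a single value remains.

```java
class HalvingSketch {
    // Counts how many halvings (integer division by 2) reduce n to 1.
    static int halvingSteps(int n) {
        int steps = 0;
        while (n > 1) {
            n = n / 2;
            steps++;
        }
        return steps;
    }

    public static void main(String[] args) {
        System.out.println(halvingSteps(8));   // prints 3
        System.out.println(halvingSteps(32));  // prints 5
        System.out.println(halvingSteps(256)); // prints 8
    }
}
```

For powers of two the count is exactly log2 n, which is where the O(log2 n) relationship comes from.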
Complexity of Sorting
The best sorting algorithms run in O(n log n).
So far all of the sorting algorithms we’ve seen run in O(n^2).
How do we do better?
We’ll only learn one sorting algorithm that is O(n log n), but there are many others.
Mergesort
Here’s the idea:
Take two sorted lists and merge them together.
–Ex. {1,3,5,7} {2,4,6,8} becomes {1,2,3,4,5,6,7,8}
This turns out to be MUCH faster in practice than both Insertion Sort and Selection Sort.
The next slides illustrate how to sort values with any initial order so they can be merged together.
Mergesort: Illustration The initial list of values:
Mergesort: Illustration Split the values in two: Divide and conquer
Mergesort: Illustration Continue splitting the lists until they are sorted. The base case is a pair of lists with a single element each, since a one-element list is already sorted
Mergesort: Illustration First pair of sorted lists:
Mergesort: Illustration Compare the elements to merge them:
Mergesort: Illustration Move the winner to the front of the next list
Mergesort: Illustration There is nothing to compare 85 with, so it wins by default
Mergesort: Illustration Move 85 to the next spot in the list
Mergesort: Illustration The list has been “merged”, the values are sorted
Web Animations
Sorting out Sorting (Circa 1970!) from University of Toronto –
Programming Exercises
Create a program for each of the algorithms in the slides:
–Linear Search
–Binary Search
–Selection Sort
–Insertion Sort
–Merge Sort (you will write the merge method, I will provide the rest of the code; see slides below)
Programming Exercises
Your program will do something like the following.
How many values would you like to test? 5
Here are 5 random values: {9,3,5,1,4}
Using Insertion Sort it took seconds and 29 comparisons were made
Using Selection Sort it took seconds and 41 comparisons were made
Using Merge Sort it took seconds and 17 comparisons were made
Something similar will be done to compare linear and binary searching.
Tips
The System class has a static method:
–long currentTimeMillis()
–Get the time before you run your algorithm and immediately after in order to see how long it took.
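The timing tip could be applied along these lines. This is only a sketch: the helper `timeMillis` and the class name are hypothetical, and `java.util.Arrays.sort` stands in for whichever of your own sorting methods you want to measure.

```java
class TimingSketch {
    // Runs the task once and returns the elapsed wall-clock time in milliseconds.
    static long timeMillis(Runnable task) {
        long before = System.currentTimeMillis(); // time just before the algorithm
        task.run();
        long after = System.currentTimeMillis();  // time immediately after
        return after - before;
    }

    public static void main(String[] args) {
        // 100,000 random values between 0 and 999,999 (sizes chosen arbitrarily).
        int[] data = new java.util.Random().ints(100_000, 0, 1_000_000).toArray();
        // Stand-in for your own sort; replace with selectionSort(data), etc.
        long elapsed = timeMillis(() -> java.util.Arrays.sort(data));
        System.out.println("Sorting took " + elapsed + " ms");
    }
}
```

For very small arrays the measured time will often be 0 ms, since a millisecond clock is too coarse, so larger test sizes give more meaningful comparisons.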
Implementing Mergesort
Mergesort can be done iteratively (using loops), but in practice it’s usually done with recursion as the code is much easier to implement.
Base case:
–A list of length 1 is sorted
Recursive case:
–Split the list in half
–Merge the halves back together once sorted
Merge
As discussed previously, inserting values into an array requires extra overhead.
An efficient way to merge values in an array is to use a temporary copy.
For example:
–without extra array (have to shuffle values over):
   {1,3,5}{2,4,6}
   {1,3,5,5}{2,4,6}
   {1,3,3,5}{2,4,6}
   {1,2,3,5}{4,6}
   {1,2,3,5,5}{4,6}
   {1,2,3,4,5}{4,6}
   {1,2,3,4,5}{6}
   {1,2,3,4,5,6}
–with extra array:
   {1,3,5}{2,4,6} Check which value at the front of the arrays is smallest
   {1, _, _, _, _, _} Copy it to the next position in the extra array
   {1, 2, _, _, _, _}
   {1, 2, 3, _, _, _}
   {1, 2, 3, 4, _, _}
   {1, 2, 3, 4, 5, _}
   {1, 2, 3, 4, 5, 6}
Merge
When sorting data, there is only one array that we want to sort; we are not taking many separate pieces and merging them together.
We can merge values in the same array together by defining boundaries for where the first and second array start and end.
{4,7,8,1,2,3,0,9,5}
The first sorted array is {4,7,8} (index 0 to 2) and the second is {1,2,3} (index 3 to 5). Merging them within the same array we would get: {1,2,3,4,7,8,0,9,5}
Merge
Your method will behave as follows:
merge( array, start, mid, end )
merge( {1,5,7,2,3,0,9}, 0, 3, 4 )
merges the values at index 0 to 2 with the values at index 3 to 4; all values beyond index 4 are ignored.
The result in the array is now {1,2,3,5,7,0,9}
Some additional clarification may be needed at this point in class.
You should create a copy array to merge the values, as it is more efficient than inserting in place.
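A merge with a temporary copy array could be sketched as below. It is one possible approach, not the required solution. It assumes an `int[]` instead of the `Comparable[]` used in the course code, and it follows the convention of the example above, where `end` is the last index included in the merge. The class name is hypothetical.

```java
class MergeSketch {
    // Merges the sorted run a[start..mid-1] with the sorted run a[mid..end].
    // Assumption: end is inclusive, as in merge({1,5,7,2,3,0,9}, 0, 3, 4).
    static void merge(int[] a, int start, int mid, int end) {
        int[] copy = new int[end - start + 1]; // temporary array for the merged values
        int left = start; // next unread index in the first run
        int right = mid;  // next unread index in the second run
        int next = 0;     // next free slot in copy

        // Repeatedly take the smaller front value of the two runs.
        while (left < mid && right <= end) {
            if (a[left] <= a[right]) {
                copy[next++] = a[left++];
            } else {
                copy[next++] = a[right++];
            }
        }
        // One run is used up; the rest of the other wins by default.
        while (left < mid) {
            copy[next++] = a[left++];
        }
        while (right <= end) {
            copy[next++] = a[right++];
        }
        // Copy the merged values back into the original array.
        System.arraycopy(copy, 0, a, start, copy.length);
    }

    public static void main(String[] args) {
        int[] data = {1, 5, 7, 2, 3, 0, 9};
        merge(data, 0, 3, 4);
        System.out.println(java.util.Arrays.toString(data)); // prints [1, 2, 3, 5, 7, 0, 9]
    }
}
```

If your calling code passes an exclusive upper bound instead, the `right <= end` tests and the size of `copy` would need to change accordingly.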
Mergesort code
private void mergesortHelper(Comparable[] a, int lo, int hi) {
    // base case: the range from lo up to (but not including) hi holds at most one element
    if (hi - lo <= 1)
        return;
    // sort each half, recursively
    int mid = (lo + hi) / 2;
    mergesortHelper(a, lo, mid);
    mergesortHelper(a, mid, hi);
    // merge back together
    merge(a, lo, mid, hi);
}

public void mergesort(Comparable[] a) {
    int n = a.length;
    mergesortHelper(a, 0, n);
}
Algorithms Pt II (ICTP12)
–Bubble Sort
–Quicksort
Bubble Sort
BubbleSort is another O(n^2) sorting algorithm that makes an improvement over selection sort by shuffling the values as it finds them.
{4,7,3,5,8,6,2,1}
–4 vs. 7, 7 is larger so keep track of 7
–7 vs. 3, 3 is smaller so swap them
{4,3,7,5,8,6,2,1}
–7 vs. 5, 5 is smaller so swap them
{4,3,5,7,8,6,2,1}
–7 vs. 8, 8 is larger so keep track of 8
{4,3,5,7,8,6,2,1}
–8 vs. 6, 6 is smaller so swap them
{4,3,5,7,6,8,2,1}
–8 vs. 2, 2 is smaller so swap them
{4,3,5,7,6,2,8,1}
–8 vs. 1, 1 is smaller so swap them
{4,3,5,7,6,2,1,8}
This is similar to Selection Sort in that we have selected 8 to be put in the last position, but along the way we moved all of the smaller values closer to their proper position, whereas Selection Sort would not do any additional moves.
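A possible Java sketch of bubble sort follows, assuming an `int[]`. The helper name `bubblePass` is hypothetical; it performs the single left-to-right pass traced above, comparing neighbours and swapping when the left one is larger, so the largest value ends up at index `end`.

```java
class BubbleSortSketch {
    // One pass over a[0..end]: the largest value "bubbles" to index end.
    static void bubblePass(int[] a, int end) {
        for (int i = 0; i < end; i++) {
            if (a[i] > a[i + 1]) {
                int copy = a[i];
                a[i] = a[i + 1];
                a[i + 1] = copy;
            }
        }
    }

    // Repeats the pass on a shrinking range until the whole array is sorted.
    static void bubbleSort(int[] a) {
        for (int end = a.length - 1; end > 0; end--) {
            bubblePass(a, end);
        }
    }

    public static void main(String[] args) {
        int[] data = {4, 7, 3, 5, 8, 6, 2, 1};
        bubblePass(data, data.length - 1);
        System.out.println(java.util.Arrays.toString(data)); // prints [4, 3, 5, 7, 6, 2, 1, 8]
        bubbleSort(data);
        System.out.println(java.util.Arrays.toString(data)); // prints [1, 2, 3, 4, 5, 6, 7, 8]
    }
}
```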
Quicksort
Quicksort is a very fast sorting algorithm, hence the name. Caution is needed, though: its worst case is O(n^2), while it typically runs in O(n log n).
Here is the idea:
–Choose a value, call this the pivot
–Process the array so that at the end of your method you will have put every value smaller than the pivot on its left and every value larger on its right.
–This process of placing the pivot is called partitioning the array
–Recursively call quicksort on the left and right parts.
Quicksort
Here’s the basic idea:
{6,8,1,7,3,5,2,4}
Let’s say we choose 6 as the pivot. We’d like (something like) this when we are done:
–{1,3,5,2,4} 6 {8,7}
Now we quicksort {1,3,5,2,4} and {8,7}.
The actual process of partitioning is a little trickier (to do well)…
Pivoting… efficiently
Here is one way you can pivot an array without (much) extra memory (in place):
–Choose a pivot, in this case I’ll take the first item in the array.
–Maintain three sections of the array
   Less than the pivot
   Greater than the pivot
   Unexplored
–When you are finished processing the array, place the pivot between the sections.
Pivot Walk Through
array = {6,8,1,7,3,5,2,4}
Variables:
–pivot = 6, low = 0, high = 7
–low and high bound the region which is unexplored
–All values at indices below low are smaller than the pivot
–All values at indices above high are larger than the pivot
–When low == high, you are finished and can now place the pivot
Pivot Walk Through
Let’s take out the pivot and put it in a temporary variable.
Now we have a home to move the first item smaller than the pivot to (left side).
That means I’ll start looking on the larger side.
–If the item really is bigger, great, I can leave it alone (proper place)
–If it is smaller though, I can swap it to the free space
Initially,
–low = 0, high = 7, target = high = 7
Pivot Walk Through
array[target] = 4
This is smaller than 6 so I will swap it to the left/smaller side.
–low = low + 1
–low = 1, high = 7
Now my empty space is on the right side, so I will be ready for the first time I find a larger value.
That means I want to continue working on the smaller side.
–target = low = 1
–If the value is still smaller, leave it alone (proper place)
–If it is larger, then I have a place to swap it to
Pivot Walk Through
array[target] = 8
This is larger than 6 so I will swap it to the right/larger side.
–high = high – 1
–low = 1, high = 6
Now my empty space is on the left side, so I will be ready for the next time I find a smaller value.
That means I want to continue working on the larger side.
–target = high = 6
–If the value is still larger, leave it alone (proper place)
–If it is smaller, then I have a place to swap it to
Pivot Walk Through
array[target] = 2
This is smaller than 6 so I will swap it to the left/smaller side.
–low = low + 1
–low = 2, high = 6
Now my empty space is on the right side, so I will be ready for the next time I find a larger value.
That means I want to continue working on the smaller side.
–target = low = 2
–If the value is still smaller, leave it alone (proper place)
–If it is larger, then I have a place to swap it to
Pivot Walk Through
array[target] = 1
This is smaller than 6 so I can leave it alone (one more item resolved on the left side).
–low = low + 1
–low = 3, high = 6
My empty space is still on the right side, so I will be ready for the next time I find a larger value.
That means I want to continue working on the smaller side.
–target = low = 3
–If the value is still smaller, leave it alone (proper place)
–If it is larger, then I have a place to swap it to
Pivot Walk Through
array[target] = 7
This is larger than 6 so I can swap it to the larger side.
–high = high – 1
–low = 3, high = 5
Now my empty space is on the left side, so I will be ready for the next time I find a smaller value.
That means I want to continue working on the larger side.
–target = high = 5
–If the value is still larger, leave it alone (proper place)
–If it is smaller, then I have a place to swap it to
Pivot Walk Through
array[target] = 5
This is smaller than 6 so I can swap it to the smaller side.
–low = low + 1
–low = 4, high = 5
Now my empty space is on the right side, so I will be ready for the next time I find a larger value.
That means I want to continue working on the smaller side.
–target = low = 4
–If the value is still smaller, leave it alone (proper place)
–If it is larger, then I have a place to swap it to
Pivot Walk Through
array[target] = 3
This is smaller than 6 so I can leave it alone.
–low = low + 1
–low = 5, high = 5
Now that low == high I have finished the partition.
All that’s left to do is place the pivot in the empty space at index 5, giving {4,2,1,5,3,6,7,8}.
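The walk-through can be turned into Java roughly as follows. This is a sketch under a few assumptions: it works on an `int[]`, it always takes the first item of the range as the pivot, and the flag `holeOnLeft` is a name invented here to track which side the empty space is on. It is one way to implement the partition, not the only one.

```java
class QuicksortSketch {
    // Partitions a[low..high] around the pivot a[low]; returns the pivot's final index.
    static int partition(int[] a, int low, int high) {
        int pivot = a[low];        // take the pivot out; index low is now the empty space
        boolean holeOnLeft = true; // which side currently holds the empty space
        while (low < high) {
            if (holeOnLeft) {
                // Look on the larger (right) side.
                if (a[high] < pivot) {
                    a[low] = a[high]; // move the smaller value into the empty space
                    low++;
                    holeOnLeft = false;
                } else {
                    high--;           // already in its proper place
                }
            } else {
                // Look on the smaller (left) side.
                if (a[low] > pivot) {
                    a[high] = a[low]; // move the larger value into the empty space
                    high--;
                    holeOnLeft = true;
                } else {
                    low++;            // already in its proper place
                }
            }
        }
        a[low] = pivot; // low == high: place the pivot in the empty space
        return low;
    }

    // Sorts a[low..high] (both inclusive) by partitioning, then recursing on each part.
    static void quicksort(int[] a, int low, int high) {
        if (low >= high) {
            return; // zero or one element: already sorted
        }
        int p = partition(a, low, high);
        quicksort(a, low, p - 1);
        quicksort(a, p + 1, high);
    }

    public static void main(String[] args) {
        int[] data = {6, 8, 1, 7, 3, 5, 2, 4};
        System.out.println(partition(data, 0, data.length - 1)); // prints 5
        System.out.println(java.util.Arrays.toString(data));     // prints [4, 2, 1, 5, 3, 6, 7, 8]
    }
}
```

Calling `quicksort(data, 0, data.length - 1)` sorts the whole array. Always choosing the first item as the pivot is what leads to the O(n^2) worst case on data that is already sorted.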
Exercises Cont’d
Implement the sorting algorithms for:
–Bubble sort
–Quicksort
Add them to the main program to pit the algorithms against each other.
In your main program you should input the size of your test array to fill with random data, and then report how long each sorting algorithm took to finish the same data set (give each a copy of the test data to sort).