Data Structures Sorted Arrays

Data Structures Sorted Arrays Phil Tayco Slide version 1.1 Feb 3, 2019

Sorted Arrays Definition
As the name implies, a sorted array is an array whose elements maintain some sense of order. The order can be anything that makes sense within the context of the data elements, for example:
- numerical order of a key value, such as social security numbers
- reverse alphabetical order of strings, such as student last names
- function call order for managing code execution
- numeric and mathematical symbol order for evaluating equations
- character sequences representing compressed or encoded text
The main reason to maintain the order is to improve the efficiency of the search function.

Sorted Arrays Binary Search
A linear search in an unsorted array performs at O(n) in the worst case. Once an order is established, the binary search algorithm can be applied, which significantly improves worst-case performance. The algorithm works on a range of indices, starting with the whole array [0…n-1]:
1. Go to the middle index of the range, (low + high) / 2.
2. If the record there is the one you want, you are done.
3. If the record value there is smaller than your search value, that record and all records before it can be ignored: set the range to the elements after the middle index and return to step 1.
4. Otherwise, set the range to the elements before the middle index and return to step 1.
Repeat this loop until the record is found or the range has 0 elements (the record is not found).

Sorted Arrays
int binarySearch(int searchValue) {
    int lowIndex = 0;
    int highIndex = currentSize - 1;
    int currentIndex;
    while (highIndex >= lowIndex) {
        currentIndex = (lowIndex + highIndex) / 2;   // middle of the current range
        if (numbers[currentIndex] == searchValue)
            return currentIndex;
        else if (numbers[currentIndex] > searchValue)
            highIndex = currentIndex - 1;             // keep the left half
        else
            lowIndex = currentIndex + 1;              // keep the right half
    }
    return -1;                                        // range is empty: not found
}

Sorted Arrays
Tracing binarySearch(2) on the array {1, 2, 4, 5, 8, 9} (indices 0 to 5):
First pass: lo = 0, hi = 5, cur = 2 (numbers[2] is 4, which is greater than 2, so hi becomes 1)
Second pass: lo = 0, hi = 1, cur = 0 (numbers[0] is 1, which is less than 2, so lo becomes 1)
Third pass: lo = 1, hi = 1, cur = 1 (numbers[1] is 2: found, return 1)
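To reproduce the trace of binarySearch(2), here is a minimal standalone sketch of the binary search method. As an assumption of this example, the array and its logical size are passed in as parameters (the slides read them from the fields numbers and currentSize); the class name BinarySearchTrace and the trace flag are hypothetical additions used only to print each pass.

```java
// Hypothetical standalone version of the slides' binarySearch: the array and
// its logical size are parameters instead of the fields numbers/currentSize.
class BinarySearchTrace {
    static int binarySearch(int[] numbers, int currentSize, int searchValue, boolean trace) {
        int lowIndex = 0;
        int highIndex = currentSize - 1;
        int pass = 1;
        while (highIndex >= lowIndex) {
            int currentIndex = (lowIndex + highIndex) / 2;   // integer division rounds down
            if (trace) {
                System.out.println("pass " + pass + ": lo = " + lowIndex
                        + ", hi = " + highIndex + ", cur = " + currentIndex);
            }
            pass++;
            if (numbers[currentIndex] == searchValue) {
                return currentIndex;
            } else if (numbers[currentIndex] > searchValue) {
                highIndex = currentIndex - 1;   // discard the right half
            } else {
                lowIndex = currentIndex + 1;    // discard the left half
            }
        }
        return -1;   // the range is empty, so the value is not present
    }

    public static void main(String[] args) {
        int[] numbers = {1, 2, 4, 5, 8, 9};
        int index = binarySearch(numbers, numbers.length, 2, true);
        System.out.println("found at index " + index);
    }
}
```

Running main prints three passes (lo/hi/cur of 0/5/2, then 0/1/0, then 1/1/1) followed by "found at index 1".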

Sorted Arrays Binary search analysis
Using the comparison as the unit of measure, each iteration performs at most 3 comparisons (the loop condition, the equality test and the greater-than test). The worst case scenario is an element that is not found, which costs one extra comparison when the loop condition finally fails:
10 elements: 3 * (4 iterations) + 1 = 13 comparisons
100 elements: 3 * (7 iterations) + 1 = 22 comparisons
1000 elements: 3 * (10 iterations) + 1 = 31 comparisons
What formula captures the relationship between the size of the list and the number of iterations?

Sorted Arrays Exponential formulas
Consider the exponent needed to reach a given result from a base of 2:
$8 = 2^3$
$16 = 8 \times 2 = 2^4$
$32 = 16 \times 2 = 2^5$
$64 = 32 \times 2 = 2^6$
…
With a base of 2, each time the exponent increases by 1, the result doubles.

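Sorted Arrays Captain’s log
These formulas can be restated as logarithms: $\log_2 8 = 3$, $\log_2 16 = 4$, $\log_2 32 = 5$.
The key pattern is that each time n doubles, the exponent increases by only one. Binary search shows the same pattern: each time the array size doubles, the number of iterations (and so the number of comparisons) grows by one. The number of iterations is therefore about $\log_2 n$; more precisely, the worst case is $\lfloor \log_2 n \rfloor + 1$ iterations, which gives the 4, 7 and 10 iterations counted for 10, 100 and 1000 elements.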
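One way to check the formula is to count iterations directly. The sketch below (hypothetical class BinarySearchIterations) copies the binary search loop, tries every missing value in an array of n even numbers to find the worst case, and compares the result with floor(log2 n) + 1, computed here from the bit length of n via Integer.numberOfLeadingZeros.

```java
class BinarySearchIterations {
    // Counts loop iterations of the binary search for one lookup.
    static int iterations(int[] numbers, int searchValue) {
        int lowIndex = 0, highIndex = numbers.length - 1, count = 0;
        while (highIndex >= lowIndex) {
            count++;
            int currentIndex = (lowIndex + highIndex) / 2;
            if (numbers[currentIndex] == searchValue) break;
            else if (numbers[currentIndex] > searchValue) highIndex = currentIndex - 1;
            else lowIndex = currentIndex + 1;
        }
        return count;
    }

    // Worst case over every "not found" position: the array holds the even
    // numbers 0, 2, ..., 2(n-1), so every odd value from -1 to 2n-1 is missing.
    static int worstCaseIterations(int n) {
        int[] numbers = new int[n];
        for (int i = 0; i < n; i++) numbers[i] = 2 * i;
        int worst = 0;
        for (int missing = -1; missing <= 2 * n - 1; missing += 2) {
            worst = Math.max(worst, iterations(numbers, missing));
        }
        return worst;
    }

    // floor(log2 n) + 1 for n >= 1, i.e. the number of bits needed to write n.
    static int predicted(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000}) {
            System.out.println(n + " elements: " + worstCaseIterations(n)
                    + " iterations (floor(log2 n) + 1 = " + predicted(n) + ")");
        }
    }
}
```

For 10, 100 and 1000 elements it reports 4, 7 and 10 iterations, matching both the formula and the counts in the analysis.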

Sorted Arrays O(log n)
The mathematical pattern this relationship matches is therefore a logarithm, and O(log n) is used to represent this category of efficiency. It is significantly faster than O(n) but still not as efficient as O(1). This category often appears when an algorithm uses a “divide and conquer” approach: here, the array is divided in half on each iteration until the search is complete, and that repeated halving is modeled by $\log_2$.

Sorted Arrays So sorted arrays are better, right?
Recall the worst case scenarios for unsorted arrays: Insert is O(1); Update, Search and Delete are O(n). For sorted arrays, Search improves significantly to O(log n), but what about Insert, Update and Delete? The instinct is to say that because update and delete use the search function, and search is O(log n), they must be O(log n) too, which would be an improvement!

Sorted Arrays Shifty results
Remember the maintenance factor: the array must stay ordered, with no holes, after each insert, update and delete.
Insert must search for the correct location for the new element and then shift elements to make room, so its performance degrades from O(1).
Update can improve to O(log n) for finding the element, but the new key value may require moving the element to a new location and shifting the elements in between.
Delete can also improve to O(log n) for finding the element, but the elements after it must be shifted to keep a hole from forming.
“Shift” is the common theme, and a shift is O(n) in the worst case.
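The cost of a shift depends only on how many elements sit on the far side of the position being opened or closed. A small sketch (hypothetical class ShiftCost, with hypothetical helpers shiftRight for inserts and shiftLeft for deletes) counts the moves; an update combines the two directions over the span between the old and new positions.

```java
class ShiftCost {
    // Opens a gap at index by moving numbers[index..size-1] one slot right.
    // Assumes size < numbers.length. Returns how many elements were moved.
    static int shiftRight(int[] numbers, int size, int index) {
        int moves = 0;
        for (int n = size; n > index; n--) {
            numbers[n] = numbers[n - 1];
            moves++;
        }
        return moves;
    }

    // Closes the gap at index by moving numbers[index+1..size-1] one slot left.
    // Returns how many elements were moved.
    static int shiftLeft(int[] numbers, int size, int index) {
        int moves = 0;
        for (int n = index; n < size - 1; n++) {
            numbers[n] = numbers[n + 1];
            moves++;
        }
        return moves;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] numbers = new int[n + 1];   // one spare slot so an insert fits
        System.out.println("insert at front moves " + shiftRight(numbers, n, 0));     // 1000
        System.out.println("insert at end moves   " + shiftRight(numbers, n, n));     // 0
        System.out.println("delete at front moves " + shiftLeft(numbers, n, 0));      // 999
        System.out.println("delete at end moves   " + shiftLeft(numbers, n, n - 1));  // 0
    }
}
```

Working at the front of the array touches every element, which is where the O(n) worst case comes from; working at the end touches none.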

Sorted Arrays How do we stand so far:

Sorted Arrays What do we do with the other 3 functions?
Search is significantly improved, which makes keeping read-only data in sorted order very worthwhile. If we also have to update, insert and delete, there are 2 schools of thought:
1. Maintain the order as these functions are performed.
2. Do not maintain the order during these functions, and only sort when needed (such as before a search takes place).
Let’s next look at maintaining the order as the functions are performed.

Sorted Arrays Insert
The algorithm first searches for the location in the array where the new element belongs. Once it is found, the elements to its right are shifted one position to make room for the new record. The shift visits elements one at a time, which is O(n) in the worst case. One approach is to perform a linear search for the insert spot first and then complete the O(n) pass with the shift: the search examines the elements before the insert spot and the shift moves the elements after it.

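Sorted Arrays
Linear search in insert(4): the array {1, 2, 5, 6, 9} has currentSize = 5; the search stops at index 2, the first element not less than 4.
Shift at end of insert(4): 5, 6 and 9 move one slot right and 4 is written at index 2, giving {1, 2, 4, 5, 6, 9} with currentSize = 6.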

Sorted Arrays
boolean insertLinearSearchAndShift(int element) {
    if (currentSize == numbers.length)
        return false;                                  // array is full
    int currentIndex = 0;
    // check the bound first so unused slots are never compared
    while (currentIndex < currentSize && numbers[currentIndex] < element)
        currentIndex++;
    int insertIndex = currentIndex;
    for (int n = currentSize; n > insertIndex; n--)    // shift right to make room
        numbers[n] = numbers[n-1];
    currentSize++;
    numbers[insertIndex] = element;
    return true;
}
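The slides assume a class with an int[] field numbers and an int field currentSize. A minimal runnable wrapper around insertLinearSearchAndShift might look like this; the class name LinearInsertArray, its constructor and its toString are hypothetical additions. The loop checks currentIndex < currentSize before reading numbers[currentIndex].

```java
import java.util.Arrays;

// Hypothetical wrapper around the fields the slides assume.
class LinearInsertArray {
    int[] numbers;
    int currentSize = 0;

    LinearInsertArray(int capacity) {
        numbers = new int[capacity];
    }

    boolean insertLinearSearchAndShift(int element) {
        if (currentSize == numbers.length) return false;   // array is full
        int currentIndex = 0;
        // Linear search: stop at the first element not less than element.
        while (currentIndex < currentSize && numbers[currentIndex] < element)
            currentIndex++;
        int insertIndex = currentIndex;
        // Shift everything from insertIndex onward one slot to the right.
        for (int n = currentSize; n > insertIndex; n--)
            numbers[n] = numbers[n - 1];
        currentSize++;
        numbers[insertIndex] = element;
        return true;
    }

    // Shows only the occupied part of the array.
    @Override
    public String toString() {
        return Arrays.toString(Arrays.copyOf(numbers, currentSize));
    }

    public static void main(String[] args) {
        LinearInsertArray list = new LinearInsertArray(10);
        for (int value : new int[] {9, 1, 6, 2, 5}) list.insertLinearSearchAndShift(value);
        System.out.println(list);                                 // [1, 2, 5, 6, 9]
        list.insertLinearSearchAndShift(4);
        System.out.println(list + " size " + list.currentSize);  // [1, 2, 4, 5, 6, 9] size 6
    }
}
```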

Sorted Arrays Insert analysis
Together, the two loops always walk through the entire list once, no matter where the new element goes: the search covers the elements before the insert spot and the shift covers the ones after it. This is good for the worst case scenario, but it also means O(n) performance every time. For very large lists O(n) may be too slow, and if the worst case is not expected, using a binary search to find the insert location first may be a better choice.

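Sorted Arrays
Binary search in insert(4): on {1, 2, 5, 6, 9} with currentSize = 5, the first probe (index 2) finds 5, which is greater than 4 while its left neighbour 2 is not, so the insert index is 2.
Shift at end of insert(4): 5, 6 and 9 move one slot right and 4 is written at index 2, giving {1, 2, 4, 5, 6, 9} with currentSize = 6.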

Sorted Arrays
boolean insertBinarySearchAndShift(int element) {
    if (currentSize == numbers.length)
        return false;                                  // array is full
    int lowIndex = 0;
    int highIndex = currentSize - 1;
    int currentIndex = 0;
    while (highIndex >= lowIndex) {
        currentIndex = (lowIndex + highIndex) / 2;
        // stop at the first element greater than the new one
        // (index 0 has no left neighbour to check)
        if (numbers[currentIndex] > element
                && (currentIndex == 0 || numbers[currentIndex - 1] <= element))
            break;
        else if (numbers[currentIndex] > element)
            highIndex = currentIndex - 1;
        else
            lowIndex = currentIndex + 1;
    }

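Sorted Arrays
    int insertIndex = currentIndex;
    // if the loop ran out, the spot may be just right of the last probe
    // (the size check avoids reading an unused slot when the array is empty)
    if (currentSize > 0 && element > numbers[insertIndex])
        insertIndex++;
    for (int n = currentSize; n > insertIndex; n--)    // shift right to make room
        numbers[n] = numbers[n-1];
    currentSize++;
    numbers[insertIndex] = element;
    return true;
}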
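Here is a runnable sketch of the binary-search insert in a hypothetical class BinaryInsertArray. It stops at the first element greater than the new one, treating index 0 as having no left neighbour, so inserting 4 into {1, 3} correctly lands at the end; it also skips the final adjustment when the array is empty. matchesSortedReference is a hypothetical test helper that inserts random values and compares the result with Arrays.sort applied to the same values.

```java
import java.util.Arrays;
import java.util.Random;

// Hypothetical wrapper: the slides assume the fields numbers and currentSize.
class BinaryInsertArray {
    int[] numbers;
    int currentSize = 0;

    BinaryInsertArray(int capacity) {
        numbers = new int[capacity];
    }

    boolean insertBinarySearchAndShift(int element) {
        if (currentSize == numbers.length) return false;   // array is full
        int lowIndex = 0;
        int highIndex = currentSize - 1;
        int currentIndex = 0;
        while (highIndex >= lowIndex) {
            currentIndex = (lowIndex + highIndex) / 2;
            // Stop at the first element greater than the new one; index 0
            // has no left neighbour, so it qualifies on its own.
            if (numbers[currentIndex] > element
                    && (currentIndex == 0 || numbers[currentIndex - 1] <= element)) {
                break;
            } else if (numbers[currentIndex] > element) {
                highIndex = currentIndex - 1;
            } else {
                lowIndex = currentIndex + 1;
            }
        }
        int insertIndex = currentIndex;
        // If the loop ran out, the last probe was not greater than element,
        // so the spot may be one to its right. Skip this when empty.
        if (currentSize > 0 && element > numbers[insertIndex]) insertIndex++;
        for (int n = currentSize; n > insertIndex; n--) numbers[n] = numbers[n - 1];
        currentSize++;
        numbers[insertIndex] = element;
        return true;
    }

    // Hypothetical test helper: insert random values (a small range forces
    // duplicates) and compare against a plain sorted copy of the same values.
    static boolean matchesSortedReference(long seed, int count) {
        Random random = new Random(seed);
        BinaryInsertArray list = new BinaryInsertArray(count);
        int[] inserted = new int[count];
        for (int i = 0; i < count; i++) {
            inserted[i] = random.nextInt(50);
            list.insertBinarySearchAndShift(inserted[i]);
        }
        Arrays.sort(inserted);
        return Arrays.equals(inserted, Arrays.copyOf(list.numbers, list.currentSize));
    }

    public static void main(String[] args) {
        BinaryInsertArray list = new BinaryInsertArray(10);
        for (int value : new int[] {1, 2, 5, 6, 9}) list.insertBinarySearchAndShift(value);
        list.insertBinarySearchAndShift(4);
        System.out.println(Arrays.toString(Arrays.copyOf(list.numbers, list.currentSize)));  // [1, 2, 4, 5, 6, 9]
        System.out.println(matchesSortedReference(42, 100));                                 // true
    }
}
```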

Sorted Arrays Insert binary analysis
Finding the correct insert index with a binary search needs more comparisons and logic, but the search is still O(log n). In average cases, the O(log n) search plus a partial shift (about n/2 moves for a random value) costs less than a full pass of n. In the worst case, the algorithm takes longer than the first insert algorithm: the new element belongs at the front of the array, so an O(log n) search is followed by a full shift. The best case is an insert index at the end of the array, which needs no shift and costs only the O(log n) search.

Sorted Arrays Algorithm analysis
Algorithm 1 is always O(n), while algorithm 2 ranges from O(log n) to O(log n) + O(n). Categorically, both algorithms are O(n) in the worst case. Which one should you use?
Inserting random values suggests algorithm 2, since its worst case only occurs when values are added at the front of the array.
Smaller array sizes suggest algorithm 1, since its O(n) is consistent.
The expected frequency of insertions is also a factor.
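To make the trade-off concrete, the sketch below counts work under a simple assumed cost model: one unit per element probed by the search plus one unit per element moved by the shift. linearCost follows algorithm 1; binaryCost uses a compact upper-bound binary search rather than the slide's exact loop, so its probe counts are close to, but not identical to, the slide's code. The class name and both methods are hypothetical, and they only count, without modifying the array.

```java
class InsertCostComparison {
    // Algorithm 1 cost model: one probe per element examined by the linear
    // scan, plus one move per element that would be shifted right.
    static int linearCost(int[] sorted, int element) {
        int probes = 0;
        int index = 0;
        while (index < sorted.length) {
            probes++;
            if (sorted[index] >= element) break;   // first element not smaller
            index++;
        }
        return probes + (sorted.length - index);
    }

    // Algorithm 2 cost model: probes made by a binary search for the first
    // value greater than element, plus one move per element shifted right.
    static int binaryCost(int[] sorted, int element) {
        int probes = 0;
        int low = 0;
        int high = sorted.length;   // the insert position lies in [low, high]
        while (low < high) {
            probes++;
            int mid = (low + high) / 2;
            if (sorted[mid] > element) high = mid;
            else low = mid + 1;
        }
        return probes + (sorted.length - low);
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] sorted = new int[n];
        for (int i = 0; i < n; i++) sorted[i] = 2 * i;   // 0, 2, ..., 1998
        int[] targets = {-1, 999, 2001};                   // front, middle, end
        String[] labels = {"front", "middle", "end"};
        for (int t = 0; t < targets.length; t++) {
            System.out.println(labels[t] + ": algorithm 1 = " + linearCost(sorted, targets[t])
                    + ", algorithm 2 = " + binaryCost(sorted, targets[t]));
        }
    }
}
```

Under this model algorithm 1 costs about n at every position, while algorithm 2 is slightly more expensive at the front, roughly half as expensive in the middle, and only a handful of probes at the end.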

Sorted Arrays Update
This algorithm uses the search to find the value to change. Once the value is changed, it may need to move to a new spot, and the elements between the old and new spots must be shifted to make room. Because a shift is needed anyway, we can combine the search for the new spot with the shift:
If the updated value belongs to the left, move left one element at a time, shifting each element right, until you reach the right spot.
If the updated value belongs to the right, move right in the same way, shifting each element left.
This leads to a binary search followed by a linear shift.

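Sorted Arrays
Binary search in update(5, 0): on {1, 2, 5, 6, 9}, the binary search finds 5 at index 2.
Shift at end of update(5, 0): since 0 belongs to the left, 2 and 1 each move one slot right and 0 is written at index 0, giving {0, 1, 2, 6, 9}.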

Sorted Arrays Code for update (uses binary search function):
boolean updateBinarySearchMoveAndShift(int oldValue, int newValue) {
    int recordIndex = binarySearch(oldValue);
    if (recordIndex == -1)
        return false;                                  // old value not in the array
    int nextIndex = 0;
    if (newValue > oldValue) {
        // move right, sliding smaller elements one slot left
        nextIndex = recordIndex + 1;
        while (nextIndex < currentSize && newValue > numbers[nextIndex]) {
            numbers[nextIndex-1] = numbers[nextIndex];
            nextIndex++;
        }
        nextIndex--;
    }

Sorted Arrays Code for update (uses binary search function):
    else {
        // move left, sliding larger elements one slot right
        nextIndex = recordIndex - 1;
        while (nextIndex >= 0 && newValue < numbers[nextIndex]) {
            numbers[nextIndex+1] = numbers[nextIndex];
            nextIndex--;
        }
        nextIndex++;
    }
    numbers[nextIndex] = newValue;
    return true;
}
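A runnable sketch of the update, in a hypothetical class UpdateArray whose varargs constructor copies values that are assumed to be sorted already. It includes a copy of the binarySearch method so it is self-contained. No special case is needed when the record is already at the last or first index: the loop does not run and the new value is written in place.

```java
import java.util.Arrays;

// Hypothetical wrapper around the fields numbers and currentSize.
class UpdateArray {
    int[] numbers;
    int currentSize;

    // Hypothetical constructor: assumes the values are already sorted.
    UpdateArray(int... values) {
        numbers = Arrays.copyOf(values, values.length);
        currentSize = values.length;
    }

    int binarySearch(int searchValue) {
        int lowIndex = 0;
        int highIndex = currentSize - 1;
        while (highIndex >= lowIndex) {
            int currentIndex = (lowIndex + highIndex) / 2;
            if (numbers[currentIndex] == searchValue) return currentIndex;
            else if (numbers[currentIndex] > searchValue) highIndex = currentIndex - 1;
            else lowIndex = currentIndex + 1;
        }
        return -1;
    }

    boolean updateBinarySearchMoveAndShift(int oldValue, int newValue) {
        int recordIndex = binarySearch(oldValue);
        if (recordIndex == -1) return false;   // old value not present
        int nextIndex;
        if (newValue > oldValue) {
            // Walk right, sliding smaller neighbours one slot left.
            nextIndex = recordIndex + 1;
            while (nextIndex < currentSize && newValue > numbers[nextIndex]) {
                numbers[nextIndex - 1] = numbers[nextIndex];
                nextIndex++;
            }
            nextIndex--;
        } else {
            // Walk left, sliding larger neighbours one slot right.
            nextIndex = recordIndex - 1;
            while (nextIndex >= 0 && newValue < numbers[nextIndex]) {
                numbers[nextIndex + 1] = numbers[nextIndex];
                nextIndex--;
            }
            nextIndex++;
        }
        numbers[nextIndex] = newValue;
        return true;
    }

    public static void main(String[] args) {
        UpdateArray list = new UpdateArray(1, 2, 5, 6, 9);
        list.updateBinarySearchMoveAndShift(5, 0);
        System.out.println(Arrays.toString(list.numbers));   // [0, 1, 2, 6, 9]
        list.updateBinarySearchMoveAndShift(9, 12);           // last element, stays in place
        System.out.println(Arrays.toString(list.numbers));   // [0, 1, 2, 6, 12]
    }
}
```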

Sorted Arrays Update analysis
The binary search gives the algorithm an O(log n) search step. The worst case scenario is updating a value so that it moves from one end of the array to the other: the shift is unavoidably linear, adding a worst case O(n). As with insert, O(n) is the worst case; average cases shift only part of the array (around n/2 elements), and the full range runs up to O(log n) + O(n). Also like insert, the challenge is the linear shift. Perhaps a different data structure can improve on that…
Question: What is the best case scenario for update?
Meanwhile, what about delete?

Sorted Arrays Delete
One algorithm uses a linear search to find the value to remove. Once the value is found, the elements to its right must be shifted left so no hole is left behind. This is another situation like insert, where a linear search followed by a shift gives a consistent O(n).

Sorted Arrays
boolean deleteLinear(int targetValue) {
    int targetIndex = 0;
    while (targetIndex < currentSize) {
        if (numbers[targetIndex] != targetValue)
            targetIndex++;
        else
            break;
    }
    if (targetIndex == currentSize)
        return false;                                  // not found
    for (int n = targetIndex; n < currentSize - 1; n++) // shift left to close the hole
        numbers[n] = numbers[n+1];
    numbers[--currentSize] = -1;   // -1 value representing a blank value
    return true;
}

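Sorted Arrays
Linear search in delete(5): on {1, 2, 5, 6, 9}, the scan checks 1 and 2 and finds 5 at index 2.
Shift at end of delete(5): 6 and 9 move one slot left and the last slot is blanked, giving {1, 2, 6, 9} with currentSize = 4.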
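A runnable sketch of deleteLinear, in a hypothetical class DeleteLinearArray with a varargs constructor for values that are assumed to be sorted already. Like the slide, it writes -1 into the freed slot as a blank marker.

```java
import java.util.Arrays;

// Hypothetical wrapper around the fields numbers and currentSize.
class DeleteLinearArray {
    int[] numbers;
    int currentSize;

    // Hypothetical constructor: assumes the values are already sorted.
    DeleteLinearArray(int... values) {
        numbers = Arrays.copyOf(values, values.length);
        currentSize = values.length;
    }

    boolean deleteLinear(int targetValue) {
        int targetIndex = 0;
        while (targetIndex < currentSize) {
            if (numbers[targetIndex] != targetValue) targetIndex++;
            else break;
        }
        if (targetIndex == currentSize) return false;   // not found
        // Close the hole by sliding everything after the target one slot left.
        for (int n = targetIndex; n < currentSize - 1; n++)
            numbers[n] = numbers[n + 1];
        numbers[--currentSize] = -1;   // -1 marks the now-unused last slot
        return true;
    }

    public static void main(String[] args) {
        DeleteLinearArray list = new DeleteLinearArray(1, 2, 5, 6, 9);
        list.deleteLinear(5);
        System.out.println(Arrays.toString(list.numbers));   // [1, 2, 6, 9, -1]
        System.out.println(list.currentSize);                // 4
    }
}
```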

Sorted Arrays Delete O(n) analysis
This delete is a guaranteed O(n) solution: no matter the scenario (best, average or worst), its performance is O(n). This can be useful in specific situations with smaller ranges. As the range gets larger, and considering other factors, the search part of the algorithm can use a binary search instead.

Sorted Arrays Code for delete using binary search:
boolean deleteBinary(int targetValue) {
    int targetIndex = binarySearch(targetValue);
    if (targetIndex == -1)
        return false;
    for (int n = targetIndex; n < currentSize - 1; n++)
        numbers[n] = numbers[n+1];
    numbers[--currentSize] = -1;
    return true;
}

Sorted Arrays
Binary search in delete(5): on {1, 2, 5, 6, 9}, the first probe (index 2) finds 5.
Shift at end of delete(5): 6 and 9 move one slot left, giving {1, 2, 6, 9} with currentSize = 4.

Sorted Arrays Delete using Binary Search analysis
With larger data ranges, the faster search is very helpful. The worst case, however, is worse than the consistent O(n) solution: if the target value is the first element in the array, a complete O(log n) search is followed by a complete O(n) shift. The best case is O(log n): the target value is at the end of the array, so no elements are shifted. Also note that if the element does not exist, this algorithm performs at O(log n), while the linear delete algorithm is still O(n).
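To see the missing-element difference, this sketch (hypothetical class DeleteSearchCost) counts how many elements each search probes before giving up on a value that is not in a 1000-element array. linearProbes mirrors the scan in deleteLinear, and binaryProbes mirrors the binarySearch loop used by deleteBinary.

```java
class DeleteSearchCost {
    // Probes made by deleteLinear's scan before it finds the target or gives up.
    static int linearProbes(int[] numbers, int size, int target) {
        int probes = 0;
        for (int i = 0; i < size; i++) {
            probes++;
            if (numbers[i] == target) break;
        }
        return probes;
    }

    // Probes made by the binary search before it finds the target or gives up.
    static int binaryProbes(int[] numbers, int size, int target) {
        int low = 0, high = size - 1, probes = 0;
        while (high >= low) {
            probes++;
            int mid = (low + high) / 2;
            if (numbers[mid] == target) break;
            else if (numbers[mid] > target) high = mid - 1;
            else low = mid + 1;
        }
        return probes;
    }

    public static void main(String[] args) {
        int n = 1000;
        int[] numbers = new int[n];
        for (int i = 0; i < n; i++) numbers[i] = 2 * i;   // even values only
        int missing = 501;                                 // odd, so never present
        System.out.println("linear: " + linearProbes(numbers, n, missing));   // 1000
        System.out.println("binary: " + binaryProbes(numbers, n, missing));
    }
}
```

The linear scan probes all 1000 elements, while the binary search gives up after at most floor(log2 1000) + 1 = 10 probes.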

Sorted Arrays (Linear Search) Summary (Unsorted and Sorted, Linear), worst case:
Operation   Unsorted Arrays   Sorted Arrays (Linear Search)
Search      O(n)              O(log n) - no need to do linear
Insert      O(1)              O(n)
Update      O(n)              O(n)
Delete      O(n)              O(n)
For unsorted arrays, O(n) here is the worst case; for sorted arrays using linear search, O(n) here is guaranteed.

Sorted Arrays (Binary Search) Summary (Sorted using Binary Search):
Operation   Worst Case          Best Case
Search      O(log n)            O(1)
Insert      O(log n) + O(n)     O(log n)
Update      O(log n) + O(n)     O(log n)
Delete      O(log n) + O(n)     O(log n)

Sorted Arrays Summary
For smaller lists, the linear search based maintenance algorithms for insert, update and delete offer a guaranteed O(n) performance. The worst case of the binary search based maintenance algorithms exceeds that of the linear search based ones. The best case, though, is as low as O(log n), which is significantly better than the guaranteed O(n), so the average performance is often better as well. The main cost in all of the maintenance algorithms is the shift. Let’s now look at a data structure that addresses this, along with memory management.