Data Structures Arrays Phil Tayco Slide version 1.0 Feb 02, 2015.


Arrays Our first traditional data structure. Arrays in modern programming languages take different forms (ArrayList, dynamically allocated arrays, dictionaries, etc.). Depending on the language, many of the constraints we discuss may appear to be already addressed. We look at the more traditional view of an array and its design intentions rather than a specific programming language's implementation.

Arrays Definition This structure begins with reserving a specified amount of space for n number of elements Each element is the same data type Direct random access of any element is possible Element location is referred to as an “index” with the first index starting at a value of 0
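As a minimal Java sketch of this definition: space is reserved up front, every element has the same type, and any index can be read or written directly. The class and variable names here are illustrative and do not come from the slides.

```java
class ArrayBasicsDemo {
    // Reserve space for n elements of the same type and return the new array.
    static int[] reserve(int n) {
        return new int[n]; // Java fills a new int array with 0
    }

    public static void main(String[] args) {
        int[] scores = reserve(5);          // capacity is fixed at creation
        scores[0] = 90;                     // the first element lives at index 0
        scores[4] = 75;                     // the last valid index is length - 1
        System.out.println(scores[4]);      // direct access, no traversal needed
        System.out.println(scores.length);  // the capacity, not the number of slots in use
    }
}
```

Note that `length` reports the reserved capacity only, which is why a separate "current size" counter is usually kept alongside the array.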

Arrays Considerations A specified amount of space for n number of elements must be reserved. This means you must consider maximum capacity at creation and implies some space may not be used in the program Direct random access of any element is possible suggesting fast performance getting to any record

Arrays Functional usage: Insert Adding an element into an array requires 2 essential steps –Ensure there is enough space in the array to add the new record –If necessary, adding the record while preserving the intended order of the array Many times, an additional variable representing current size is maintained along with the array (e.g. the capacity of an array may be set to 100 but the current number of existing elements may be 5)

Arrays Insert unordered If the order is insignificant, the key step after ensuring there is enough space is to find the next available space and adding the new record there An effective way to maintain this is to use the current size variable as the index for the next available space to add a record Before insert, if the current size is already equal to the array capacity, the insert cannot be performed If insert is performed, the current size must be incremented

Arrays Sample code for insert unordered:
// Given Type[] array = new Type[CAPACITY];
// Given int currentSize = 0;
boolean insert(Type element) {
    if (currentSize == array.length)
        return false; // Array is at maximum capacity
    array[currentSize++] = element; // Store, then advance to the next free slot
    return true;
}

Arrays Insert unordered analysis Using the comparison operation type and a worst case scenario of all elements filled, the Big O for this algorithm is O(1): There will always be one comparison performed The performance is also the same in all other cases (if the list is empty or partially full) If the list is unordered, this algorithm is most effective given that its performance is constant Question: would it even be possible to eliminate the need to perform a comparison?

Arrays Insert unordered algorithm 2 Given this ideal performance, it may seem unnecessary to explore other algorithms As analysts on the never ending quest for something better, we should take a look at other ideas “Always keep an open mind and a compassionate heart” – Phil Jackson

Arrays Insert unordered algorithm 2 In the previous algorithm, the new element gets added to the end of the array What if we added it to the beginning of the array? –All elements “in front” of the new record must be shifted over to the next spot –The need to check for capacity must still be performed

Arrays Sample code for insert unordered 2:
// Given Type[] array = new Type[CAPACITY];
// Given int currentSize = 0;
boolean insert(Type element) {
    if (currentSize == array.length)
        return false; // Array is at maximum capacity
    // Shift every existing element one spot to the right
    for (int n = currentSize; n > 0; n--)
        array[n] = array[n-1];
    array[0] = element;
    currentSize++;
    return true;
}

Arrays Insert unordered 2 analysis The code is the same as algorithm 1 with the addition of a loop which means there will at least be one comparison Can the loop add to the performance time? Definitely in the worst case! –Worst case is when only one space is available –The loop must perform (n-1) comparisons to shift everything before adding the new record In the worst case, performance is O(n) while best case is O(1) – can also be viewed as the more the array grows in used spaces, the more the performance goes toward O(n)
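To make the growth visible, here is a hedged Java sketch that counts how many elements are moved by inserting at the front. The class name, the `int` element type and the `shifts` counter are assumptions added for illustration.

```java
class FrontInsertCounter {
    private final int[] array;
    private int currentSize = 0;
    long shifts = 0; // total element moves across all inserts (illustrative counter)

    FrontInsertCounter(int capacity) {
        array = new int[capacity];
    }

    boolean insertAtFront(int element) {
        if (currentSize == array.length)
            return false; // no space left
        for (int n = currentSize; n > 0; n--) {
            array[n] = array[n - 1]; // move each existing element one spot right
            shifts++;
        }
        array[0] = element;
        currentSize++;
        return true;
    }

    int first() {
        return array[0];
    }

    public static void main(String[] args) {
        FrontInsertCounter list = new FrontInsertCounter(1000);
        for (int i = 0; i < 1000; i++)
            list.insertAtFront(i);
        // Each insert moves as many elements as were already stored:
        // 0 + 1 + ... + 999 = 499500 moves in total.
        System.out.println(list.shifts);
    }
}
```

The first insert moves nothing, which matches the O(1) best case, while each later insert moves every element already stored, which is the drift toward O(n) described above.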

Arrays Insert unordered algorithms comparison Clearly, the first algorithm is preferred. Does that mean the second algorithm has no application? Perhaps there are situations where the order of the contents of the array doesn't matter, but it is particularly important to know which element was added to the list last. Remember the saying at the beginning: “It depends” – in this case, how a record is deleted could be a factor. Before examining the delete algorithms, we should look at the Search and Update functions.

Arrays Search As discussed in the introduction, the search in an unordered list is an O(n) operation in the worst case using comparisons as the operation type Worst case situation is the record is found in the last location or does not exist in a full array Best case is the record is found on the first try, but how much can that be viewed as a reliable way to measure algorithm effectiveness in this case? Average case is going to be somewhere in between best and worst, which at this point is “good to know”

Arrays Linear Search Since the algorithm performs at O(n), an unordered array search is often referred to as a “Linear search” Sample code that follows returns the index location of the record if found or -1 representing not found

Arrays Sample code for linear search:
// Given Type[] array = new Type[CAPACITY];
// Given int currentSize = 0;
int search(Type element) {
    // Only the slots in use (0 to currentSize - 1) are examined
    for (int n = 0; n < currentSize; n++)
        if (array[n] == element)
            return n;
    return -1; // Not found
}

Arrays Linear search analysis Clearly, this is O(n) performance. However, notice the actual number of comparisons –1 comparison to control the loop –1 comparison to check between element and array index The actual number of comparisons in the worst case is 2 * n. Why isn’t this referred to as O(2n)? –Our initial analysis is to identify the algorithm category –If we want to compare between two O(n) algorithms, the 2n becomes more significant
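The two comparisons per element can be counted directly. The following Java sketch instruments the linear search with a counter. The class name, the static `comparisons` field and the `int` element type are assumptions made for the illustration.

```java
class LinearSearchCounter {
    static long comparisons; // reset at the start of every search

    static int search(int[] array, int currentSize, int element) {
        comparisons = 0;
        int n = 0;
        while (true) {
            comparisons++;              // loop-control comparison: n < currentSize
            if (!(n < currentSize))
                break;
            comparisons++;              // comparison between the element and array[n]
            if (array[n] == element)
                return n;
            n++;
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] data = {1, 2, 3, 4};
        search(data, 4, 4);
        System.out.println(comparisons); // found in the last slot: 2 * 4 = 8
        search(data, 4, 9);
        System.out.println(comparisons); // not found: 2 * 4 + 1 = 9
    }
}
```

Finding the element in the last slot costs exactly 2n comparisons. A miss costs one more because the loop check has to fail once before the loop ends. Either way the category is O(n).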

Arrays Can we do better? The inclination is to see if we can develop an algorithm that performs better than this. The challenges are: –2 comparisons are always needed to control going through the array and checking if we found the element –Worst case requires going through every single element in the array –Arbitrary start points (end of array or middle of array) and random hopping around the array do not improve the performance in the worst case and actually make things more complicated Think of a word search puzzle. If your algorithm is to look for the first letter of your search word in the puzzle, no matter how you jump around, the worst case is still ending up checking every letter

Arrays Update Now that we’ve looked at the search algorithm, update (and delete) can take advantage of this Update is a search for an element and if found, modifying it while maintaining the design intent of the structure In this case, the design intent is an unordered list making a modification of a record simple (there’s no need to check if the array order needs to be maintained because there is no order) We can use the search algorithm in our update

Arrays Sample code for update:
// Given Type[] array = new Type[CAPACITY];
// Given int currentSize = 0;
boolean update(Type oldElement, Type newElement) {
    int searchIndex = search(oldElement);
    if (searchIndex == -1)
        return false; // Element to update is not found
    array[searchIndex] = newElement; // No order to maintain, so overwrite in place
    return true;
}

Arrays Update analysis Because the algorithm uses the search algorithm, the performance initially depends on the Big O of the search Since the search is linear in the worst case, update performance will at least be O(n) as well Are there additional comparisons to consider? –Notice after search is performed, there is one more comparison done to check if the record was found –Technically, this algorithm is O(2n + 1), but as far as a category is concerned, this is simply O(n) Thus, the update performance is also linear in the worst case

Arrays Delete Now the delete. In most data structures, the delete function is usually the most complex and is often treated last among the 4 functions. The algorithm chosen for it also tends to affect the performance of the other functions, so choosing between delete algorithms means considering the impact on insert, search and update. The design intent of this structure is still unordered, so maintaining order after removing an element is not a concern.

Arrays Delete However, another design intent to question is whether or not to have “holes” in the array In this case, we don’t want holes because that impacts the use of the “currentSize” variable as the location of the next available element To account for this, the algorithm requires the following steps: –Use the search algorithm to find the element to remove –Shift over “to the left” any records after the record –Reduce the current size by 1

Arrays Sample code for delete:
// Given Type[] array = new Type[CAPACITY];
// Given int currentSize = 0;
boolean delete(Type element) {
    int searchIndex = search(element);
    if (searchIndex == -1)
        return false; // Element to delete is not found
    // “Remove” element at array[searchIndex] by shifting later records left
    for (int n = searchIndex; n < currentSize - 1; n++)
        array[n] = array[n+1];
    currentSize--;
    return true;
}

Arrays Delete analysis Because the algorithm uses the search algorithm, the performance initially depends on the Big O of the search Question: What is the performance of the shift that occurs after the search is performed? Answer: In the case of all elements requiring a shift, it will be O(n) Question 2: What does this make the performance of delete?

Arrays Big O of delete in the worst case The worst case of the shift is O(n), implying that the overall order is technically O(2n) + O(n) or O(3n). If this is true, the algorithm should still be considered as being in the linear category of O(n), but it would be good to note that it still has a higher cost than search and update. However, it is not really O(3n): –In the worst case of the shift, the search performs in its best case (because the element is found on the first try!) –In the worst case of the search, the shift of the entire array does not need to be performed –The effective order is still O(n), but it is a lot closer to O(2n) than the O(3n) the code implies
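The trade-off between the search and the shift can be checked with a small Java sketch. It counts elements visited by the search and elements moved by the shift, not individual comparisons, which is a simplification. The names and the `int` element type are assumptions for the example.

```java
class DeleteCostDemo {
    // Deletes target from the first currentSize slots of array.
    // Returns {elementsVisitedBySearch, elementsShifted}, or null if target is absent.
    static int[] deleteCost(int[] array, int currentSize, int target) {
        int visited = 0;
        int index = -1;
        for (int n = 0; n < currentSize; n++) {
            visited++;
            if (array[n] == target) {
                index = n;
                break;
            }
        }
        if (index == -1)
            return null;
        int shifted = 0;
        for (int n = index; n < currentSize - 1; n++) {
            array[n] = array[n + 1]; // close the hole
            shifted++;
        }
        return new int[] { visited, shifted };
    }

    public static void main(String[] args) {
        int[] first = deleteCost(new int[] {10, 20, 30, 40, 50}, 5, 10);
        int[] last = deleteCost(new int[] {10, 20, 30, 40, 50}, 5, 50);
        System.out.println(first[0] + " visited, " + first[1] + " shifted"); // 1 visited, 4 shifted
        System.out.println(last[0] + " visited, " + last[1] + " shifted");   // 5 visited, 0 shifted
    }
}
```

Whenever the element is found, the visited and shifted counts add up to the number of stored elements, so the search and the shift never hit their worst cases together.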

Arrays Unordered Array Summary Worst case scenarios show the performances as: –Insert: O(1) –Update, Search and Delete: O(n) Holes are not intended in this data structure to ensure the currentSize variable is utilized properly Question: how are these performances impacted (if any) if duplicate record values are allowed? Update and Delete are dependent on Search – if Search were somehow improved, Update and Delete could be positively impacted Search is improved dramatically if a sense of order is maintained. This is our next topic: Sorted arrays

Sorted Arrays Definition As the name implies, a sorted array is an array where the elements maintain some sense of order This order can be anything that makes sense within the context of the data elements used: –Numerical sequential order of a key value like social security numbers –Reverse alphabetical order of strings like student last names –Function call order managing code execution –Numeric and mathematical symbol order for calculating equations –Character sequences representing compressed or encoded text –More and more… The main goal of maintaining any order is that its use is designed to improve the efficiency of any, if not all, supporting functions such as search
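As one concrete instance of "any order that makes sense", this Java sketch keeps last names in reverse alphabetical order with a standard library comparator. The sample names and the helper method are made up for illustration.

```java
import java.util.Arrays;
import java.util.Comparator;

class OrderingDemo {
    // Returns a copy of names sorted in reverse alphabetical order.
    static String[] reverseAlphabetical(String[] names) {
        String[] copy = Arrays.copyOf(names, names.length);
        Arrays.sort(copy, Comparator.reverseOrder());
        return copy;
    }

    public static void main(String[] args) {
        String[] lastNames = {"Adams", "Tayco", "Jackson"};
        System.out.println(Arrays.toString(reverseAlphabetical(lastNames)));
        // [Tayco, Jackson, Adams]
    }
}
```

Whatever ordering rule is chosen, the supporting functions must then compare elements with that same rule.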

Sorted Arrays Binary Search A linear search in a non sorted array performs at O(n) in the worst case. By establishing an order, the binary search algorithm can be applied, which significantly improves performance in the worst case. The algorithm:
–In an array of n elements, go to the middle index of the current range (initially index [n/2])
–If the record there is the one you want, you are done
–If the record value there is smaller than your search value, that record and all records before it can be ignored – set your range of elements to [n/2+1…n-1] and return to step 1
–Otherwise, set your range of elements to [0…(n/2)-1] and return to step 1
–Repeat this loop until you have 0 elements (record is not found) or the record is found

Sorted Arrays Code for binary search:
// Function returns -1 if element is not found or index of found element
int binarySearch(Type searchValue) {
    int lowIndex = 0;
    int highIndex = currentSize - 1;
    int currentIndex;
    while (highIndex >= lowIndex) {
        currentIndex = (lowIndex + highIndex) / 2;
        if (array[currentIndex] == searchValue)
            return currentIndex;
        else if (array[currentIndex] > searchValue)
            highIndex = currentIndex - 1;
        else
            lowIndex = currentIndex + 1;
    }
    return -1;
}

Sorted Arrays Binary search analysis Using the comparison operation as a unit of measure, each iteration can be seen at worst as 3 comparisons performed Worst case scenario is an element not found: –10 elements: 3 * (4 iterations) + 1 –100 elements: 3 * (7 iterations) + 1 –1000 elements: 3 * (10 iterations) + 1 What is the formula that captures the relationship between the size of the list and the number of iterations? (Take the “3 *” and the “+ 1” out of the equation)


Sorted Arrays Exponential growth The formula is based on the fact that the algorithm cuts the range of the search size in half with each iteration. On the growing side of things mathematically, numbers that double in size are exponentially growing by a factor of 2: –8 = 2^3 –16 = 2^4 –32 = 2^5

Sorted Arrays Captain’s log The inverse of an exponent is a logarithm, which can be similarly applied to these numbers: –log_2 8 = 3 –log_2 16 = 4 –log_2 32 = 5 Since the binary search algorithm cuts the array size in half with each iteration, the number of iterations can be related to the size of the list in a similar way: –log_2 n = number of iterations

Sorted Arrays To be precise… The size of a list is often not a power of 2. Truncate the result of log_2 n and add 1: –number of iterations = floor(log_2 n) + 1 For example, with 100 elements, log_2 100 equals about 6.64. Truncate it to 6 and add 1 and you get 7 iterations. This makes for a fun game with kids. Think of a number between 1 and 1000 and you can find it in 10 tries!
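The iteration counts quoted for 10, 100 and 1000 elements can be reproduced with a short Java sketch. It simulates a search that misses because the target is larger than every element, and compares the loop count with floor(log_2 n) + 1. The class and method names are assumptions for this example.

```java
class BinarySearchIterations {
    // Counts loop iterations of a binary search over n sorted elements when the
    // target is larger than all of them, so the search always moves right.
    static int worstCaseIterations(int n) {
        int low = 0;
        int high = n - 1;
        int iterations = 0;
        while (high >= low) {
            int mid = (low + high) / 2;
            iterations++;
            low = mid + 1; // discard the left half and the middle element
        }
        return iterations;
    }

    // floor(log_2 n) + 1 for n >= 1, using integer arithmetic only.
    static int formula(int n) {
        return 32 - Integer.numberOfLeadingZeros(n);
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 100, 1000})
            System.out.println(n + ": " + worstCaseIterations(n) + " iterations, formula " + formula(n));
    }
}
```

For 10, 100 and 1000 elements both methods give 4, 7 and 10, the same figures used in the binary search analysis.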

Sorted Arrays Back to Big O Now we have a way to categorize the performance of binary search As the list grows in size, the performance grows logarithmically. We can be precise if we need to, but as a category, this is O(log n) O(log n) is much better than O(n) O(log n) is usually found with algorithms that cut search ranges in half (known as the “divide and conquer” method which we will see more of later) You could say with smaller n values, O(n) is better than O(log n), but remember that the purpose of the categories is to characterize the performance as the list grows in size

Sorted Arrays So sorted arrays are better, right? Recall worst case scenarios for unsorted arrays: –Insert: O(1) –Update, Search and Delete: O(n) For sorted arrays, the Search improves to O(log n), but what about Insert, Update and Delete? All functions must now consider keeping the order of the array intact: –Insert needs to find the right location to add the correct element making the performance degrade to O(log n) plus whatever work is necessary to shift elements –Update can improve to O(log n), but the new key value may require moving the element to a new location and shifting other elements –Delete can also improve to O(log n), but the elements must also be shifted to keep holes from forming

Sorted Arrays Okay, well how do we stand so far:

Sorted Arrays What do we do with the other 3 functions? Search is dramatically improved, which makes keeping data in a sorted state very significant when that data is read only. If we have to update, insert and delete, then there are 2 schools of thought: –Maintain the order as you perform these functions –Do not maintain order as you perform these functions and only perform a sort when you need to (such as before a search takes place) Let’s next take a look at maintaining the order as the functions are performed
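The second school of thought (sort only when needed) is not developed further in these slides, so the following Java sketch is only one possible reading of it. The class name, the `sorted` flag and the use of the standard library's `Arrays.sort` and `Arrays.binarySearch` are all assumptions.

```java
import java.util.Arrays;

class LazySortedArray {
    private final int[] array;
    private int currentSize = 0;
    private boolean sorted = true; // an empty array is trivially sorted

    LazySortedArray(int capacity) {
        array = new int[capacity];
    }

    // O(1) insert at the end; the order is simply marked as broken.
    boolean insert(int element) {
        if (currentSize == array.length)
            return false;
        array[currentSize++] = element;
        sorted = false;
        return true;
    }

    // Sorts the used portion only if something changed, then binary searches it.
    // Returns the index in sorted order, or -1 if the element is absent.
    int search(int element) {
        if (!sorted) {
            Arrays.sort(array, 0, currentSize);
            sorted = true;
        }
        int index = Arrays.binarySearch(array, 0, currentSize, element);
        return index >= 0 ? index : -1;
    }
}
```

This keeps insert cheap, but the first search after any change pays for a full sort. It therefore suits workloads with many inserts followed by many searches, not ones that alternate between the two.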

Sorted Arrays Insert The algorithm requires searching for the correct location in the array for where the new element needs to be placed Once found, the elements to the right are shifted to make room for the new record Searching for the correct spot can use a similar approach as binary search for good performance However, the shift requires looking at elements linearly which results in an O(n) performance in the worst case Because of this, one approach is to perform a linear search for the insert spot first and then complete the O(n) with the shift

Sorted Arrays Code for insert:
boolean insert(Type newElement) {
    if (currentSize == array.length)
        return false;
    // Linear search for the first element that is not smaller than the new one
    int insertIndex = 0;
    while (insertIndex < currentSize && array[insertIndex] < newElement)
        insertIndex++;
    // Shift everything from the insert spot one place to the right
    for (int n = currentSize; n > insertIndex; n--)
        array[n] = array[n-1];
    currentSize++;
    array[insertIndex] = newElement;
    return true;
}

Sorted Arrays Insert analysis The combination of the two loops together end up using comparisons that go through the entire list linearly no matter where the new element goes This is good for worst case scenario but also means at least an O(n) performance every time For very large size lists, O(n) may not be good and if worst case scenario is not expected, a binary search for the insert location first may be a better choice
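A hedged Java sketch of why the two loops together behave like one pass over the list: it counts the element comparisons made while looking for the insert spot and the elements moved by the shift. The names, the `int` element type and the assumption that the caller has already checked for spare capacity are all illustrative.

```java
class SortedInsertCost {
    // Inserts newElement into the sorted slots array[0..currentSize-1].
    // Returns {elementComparisons, elementsShifted}. Assumes spare capacity exists.
    static int[] insertCost(int[] array, int currentSize, int newElement) {
        int comparisons = 0;
        int insertIndex = 0;
        while (insertIndex < currentSize) {
            comparisons++;
            if (array[insertIndex] >= newElement)
                break; // found the first element that is not smaller
            insertIndex++;
        }
        int shifted = 0;
        for (int n = currentSize; n > insertIndex; n--) {
            array[n] = array[n - 1]; // make room by moving elements right
            shifted++;
        }
        array[insertIndex] = newElement;
        return new int[] { comparisons, shifted };
    }

    public static void main(String[] args) {
        int[] cost = insertCost(new int[] {10, 20, 30, 40, 0}, 4, 25);
        System.out.println(cost[0] + " comparisons, " + cost[1] + " shifts"); // 3 comparisons, 2 shifts
    }
}
```

With 4 stored elements the two counts add up to 4 or 5 wherever the new value lands: the elements the search skips are the ones the shift does not have to move.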

Sorted Arrays Code for insert using binary search:
boolean binaryInsert(Type newElement) {
    if (currentSize == array.length)
        return false;
    int lowIndex = 0;
    int highIndex = currentSize - 1;
    int currentIndex = 0; // Initialized so an empty array inserts at index 0
    while (highIndex >= lowIndex) {
        currentIndex = (lowIndex + highIndex) / 2;
        // Stop at the first element larger than the new one
        if (array[currentIndex] > newElement && (currentIndex == 0 || array[currentIndex - 1] <= newElement))
            break;
        else if (array[currentIndex] > newElement)
            highIndex = currentIndex - 1;
        else
            lowIndex = currentIndex + 1;
    }
    int insertIndex = currentIndex;
    // If the search ended on a smaller element, insert just after it
    if (insertIndex < currentSize && newElement > array[insertIndex])
        insertIndex++;
    for (int n = currentSize; n > insertIndex; n--)
        array[n] = array[n-1];
    currentSize++;
    array[insertIndex] = newElement;
    return true;
}

Sorted Arrays Insert binary analysis More comparisons and logic are needed to find the correct insert index using binary search, but the search portion is still O(log n). In the worst case, the algorithm will take longer than the first insert algorithm. However, in average cases, the O(log n) search plus the O(n) shift will be less than a full n. Best case is finding the insert index on the first iteration (the middle of the array), making the shift an n/2 performance.

Sorted Arrays Algorithm analysis Algorithm 1 is always O(n) while algorithm 2 ranges from O(n/2) to O(log n) + O(n). Which to use? Remember “it depends”? –Insertion of random values suggests algorithm 2 –Smaller array sizes suggest algorithm 1 –Frequency of expected insertions is also a factor

Sorted Arrays Update This algorithm uses the search to find the value to change Once the value is changed, the new value needs to be moved to the correct spot Once the correct spot is found, elements must be shifted to make room for the updated value Because of the shift, we can take advantage of this by combining the search with the shift –If the updated value is to the left, linearly move in that direction shifting elements at the same time until you hit the right spot –If the updated value is to the right, shift in that direction in a similar way This leads to a binary search followed by a linear shift

Sorted Arrays Code for update (uses binary search function):
boolean update(Type oldValue, Type newValue) {
    int recordIndex = binarySearch(oldValue);
    if (recordIndex == -1)
        return false;
    int nextIndex;
    if (newValue > oldValue) {
        // Shift smaller neighbors left until the new value's spot is reached
        nextIndex = recordIndex + 1;
        while (nextIndex < currentSize && newValue > array[nextIndex]) {
            array[nextIndex-1] = array[nextIndex];
            nextIndex++;
        }
        nextIndex--;
    }
    else {
        // Shift larger neighbors right until the new value's spot is reached
        nextIndex = recordIndex - 1;
        while (nextIndex >= 0 && newValue < array[nextIndex]) {
            array[nextIndex+1] = array[nextIndex];
            nextIndex--;
        }
        nextIndex++;
    }
    // Also covers the first and last positions, where no shift takes place
    array[nextIndex] = newValue;
    return true;
}

Sorted Arrays Update analysis The use of the binary search makes the algorithm perform at least at O(log n) The worst case scenario here is the update of a value from one end of the array to the other The shift is unavoidably performed in a linear way adding a worst case O(n) Since the O(log n) binary search is an “addition to” (not a multiplication with…) the O(n) shift, the O(n) performance dominates the analysis, making this algorithm O(n) Like the insert, the O(n) represents worst case, but a similar range of O(n/2) to O(n) + O(log n) applies for average and best case scenarios Also like the insert, the challenge is the linear shift. Perhaps there is a better data structure to make that better… Meanwhile, what about delete?

Sorted Arrays Delete This algorithm also uses the search to find the value to remove Once the value is found, elements from the right must be shifted to ensure there are no holes This leads to another situation similar to insert where we can stick to a consistent O(n) and do a linear search followed by a shift or Go with the range of O(n/2) to O(log n) + O(n) and do a binary search followed by a linear shift

Sorted Arrays Code for O(n) delete:
boolean delete(Type targetValue) {
    int targetIndex = 0;
    while (targetIndex < currentSize) {
        if (array[targetIndex] != targetValue)
            targetIndex++;
        else
            break;
    }
    if (targetIndex == currentSize)
        return false;
    for (int n = targetIndex; n < currentSize - 1; n++)
        array[n] = array[n+1];
    array[--currentSize] = -1; // -1 value representing a blank value
    return true;
}

Sorted Arrays Delete O(n) analysis The O(n) delete is a guaranteed O(n) solution. No matter the scenario (best, average or worst), the performance for comparisons is O(n). This can be useful in specific smaller range situations. As the range gets larger, the search part of the algorithm can be improved using a binary search.

Sorted Arrays Code for delete using binary search:
boolean deleteBinary(Type targetValue) {
    int targetIndex = binarySearch(targetValue);
    if (targetIndex == -1)
        return false;
    for (int n = targetIndex; n < currentSize - 1; n++)
        array[n] = array[n+1];
    array[--currentSize] = -1; // -1 value representing a blank value
    return true;
}

Sorted Arrays Delete using Binary Search analysis In larger data range situations, the search improvement is very helpful The worst case situation is worse than the consistent O(n) solution. If the target value is the first element in the array, the search is a complete O(log n) followed by a complete O(n) shift However, the best case is O(n/2) in which the target value is in the middle of the array followed by a shift of half the elements Also note, if the element does not exist, this algorithm performs at O(log n) while the previous delete algorithm is still O(n)

Sorted Arrays Summary (Unsorted and Sorted, Linear):
Worst Case    Unsorted Arrays    Sorted Arrays (Linear Search)
Search        O(n)               O(log n) - no need to do linear
Insert        O(1)               O(n)
Update        O(n)               (not discussed, but O(2n) in worst case)
Delete        O(n)               O(n)
Notes, in column order: O(n) here is worst case; O(n) here is guaranteed

Sorted Arrays Summary (Sorted using Binary Search):
Sorted Arrays (Binary Search)    Worst Case         Best Case
Search                           O(log n)           (not discussed, but O(1))
Insert                           O(log n) + O(n)    O(n/2)
Update                           O(log n) + O(n)    (not discussed, but O(1))
Delete                           O(log n) + O(n)    O(n/2)

Sorted Arrays Summary For smaller sized lists, the linear search based maintenance algorithms of insert, update and delete can take advantage of a guaranteed O(n) performance The worst case for using a binary search based maintenance algorithm exceeds the guaranteed linear search based ones The best case, though is as low as O(n/2), which is significantly better than the guaranteed O(n) –Note that we can do this comparison between O(n) and O(n/2) because both are in the linear category This makes the average performance sometimes better than the guaranteed O(n) The big deal with the maintenance algorithms is the shift. Let’s now look at a data structure that addresses this along with the memory allocations necessary with arrays
