Today
Data cleanup algorithms
  – Copy-over
  – Shuffle-left
  – Converging pointers
Algorithm efficiency
Efficiency of data cleanup algorithms
Reading:
  – Chapter 3
Comparing Algorithms
Algorithm
  – Design
  – Correctness
  – Efficiency
  – Also clarity, elegance, ease of understanding
There are many ways to solve a problem
  – Conceptually
  – Also different ways to write pseudocode for the same conceptual idea
How to compare algorithms?
Efficiency of Algorithms
Efficiency: the amount of resources used by an algorithm
  – Space (number of variables)
  – Time (number of instructions)
When designing an algorithm, one must be aware of its use of resources
If there is a choice, pick the more efficient algorithm!
Efficiency of Algorithms
Does efficiency matter? Computers are so fast these days…
Yes, efficiency matters a lot!
  – There are problems (actually a lot of them) for which all known algorithms are so inefficient that they are impractical
  – Remember the shortest-path-through-all-cities problem from Lab1…
Data Cleanup Algorithms
What are they? A systematic strategy for removing errors from data.
Why are they important? Errors occur in all real computing situations.
How are they related to the search algorithm? To remove errors from a series of values, each value must be examined to determine whether it is an error.
E.g., suppose we have a list d of data values from which we want to remove all the zeroes (they mark errors) and pack the good values to the left. Legit is the number of good values remaining when we are done.

  d1  d2  d3  d4  d5  d6  d7  d8
   5   3   4   0   6   2   4   0
  Legit
Data Cleanup: Copy-Over Algorithm
Idea: Scan the list from left to right and copy the non-zero values to a new list

Copy-Over Algorithm (Fig 3.2)
Get values for n and the list of n values A1, A2, …, An
Set left to 1
Set newposition to 1
While left <= n do
    If A[left] is non-zero
        Copy A[left] into B[newposition] (copy it into position newposition in the new list)
        Increase left by 1
        Increase newposition by 1
    Else
        Increase left by 1
Stop
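The copy-over idea can be sketched in Python as follows. This is a minimal illustration, not the textbook's code: the function name `copy_over` is hypothetical, indices are 0-based (the slides count from 1), and the new list B grows by appending instead of tracking `newposition` explicitly.

```python
def copy_over(a):
    """Return a new list holding the non-zero values of a, in their original order."""
    b = []                     # the new list B from the slide
    left = 0                   # 0-based; the slide starts left at 1
    while left < len(a):
        if a[left] != 0:
            b.append(a[left])  # copy into the next free position of the new list
        left += 1              # left advances whether or not we copied
    return b


print(copy_over([5, 3, 4, 0, 6, 2, 4, 0]))  # [5, 3, 4, 6, 2, 4]
```

The input list is left untouched, which is why this approach needs extra space proportional to the number of good values.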
Data Cleanup: The Shuffle-Left Algorithm
Idea:
  – Go over the list from left to right. Every time we see a zero, shift all subsequent elements one position to the left.
  – Keep track of the number of legitimate (non-zero) entries
How does this work?
How many loops do we need?
Shuffle-Left Algorithm (Fig 3.1)
1   Get values for n and the list of n values A1, A2, …, An
2   Set legit to n
3   Set left to 1
4   Set right to 2
5   Repeat steps 6-14 until left > legit
6       If A[left] ≠ 0
7           Increase left by 1
8           Increase right by 1
9       Else
10          Reduce legit by 1
11          Repeat steps 12-13 until right > n
12              Copy A[right] into A[right-1]
13              Increase right by 1
14          Set right to left + 1
15  Stop
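A possible Python rendering of the shuffle-left steps is sketched below. The function name `shuffle_left` is hypothetical and indices are 0-based, so `left` starts at 0 and `right` at 1 where the pseudocode uses 1 and 2. The list is modified in place and the function returns legit.

```python
def shuffle_left(a):
    """Pack the non-zero values of a to the left, in place; return legit."""
    n = len(a)
    legit = n
    left = 0                   # 0-based counterpart of "Set left to 1"
    right = 1                  # 0-based counterpart of "Set right to 2"
    while left < legit:        # same condition as "until left > legit" with 1-based indices
        if a[left] != 0:
            left += 1
            right += 1
        else:
            legit -= 1
            while right < n:   # shift everything right of the zero one slot left
                a[right - 1] = a[right]
                right += 1
            right = left + 1   # reset right next to left for the next pass
    return legit


data = [5, 3, 4, 0, 6, 2, 4, 0]
legit = shuffle_left(data)
print(legit, data[:legit])     # 6 [5, 3, 4, 6, 2, 4]
```

Only the first legit positions are meaningful afterwards; the tail of the list still holds leftover values.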
Exercising the Shuffle-Left Algorithm

  d1  d2  d3  d4  d5  d6  d7  d8
   5   3   4   0   6   2   4   0
  legit
Data Cleanup: The Converging-Pointers Algorithm
Idea:
  – One finger moves left to right, one moves right to left
  – Move the left finger over non-zero values
  – If it encounters a zero value, then
      Copy the element at the right finger into this position
      Shift the right finger to the left
Converging Pointers Algorithm (Fig 3.3)
1   Get values for n and the list of n values A1, A2, …, An
2   Set legit to n
3   Set left to 1
4   Set right to n
5   Repeat steps 6-10 until left ≥ right
6       If the value of A[left] ≠ 0 then increase left by 1
7       Else
8           Reduce legit by 1
9           Copy the value of A[right] to A[left]
10          Reduce right by 1
11  If A[left] = 0 then reduce legit by 1
12  Stop
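The converging-pointers steps might look like this in Python. This is a sketch under a few assumptions: the name `converging_pointers` is hypothetical, indices are 0-based, and the early return for an empty list is an addition that the pseudocode does not spell out.

```python
def converging_pointers(a):
    """Pack the non-zero values of a to the left, in place; return legit.

    The relative order of the good values is not preserved.
    """
    n = len(a)
    if n == 0:                 # guard added for this sketch; not in the slide
        return 0
    legit = n
    left = 0                   # 0-based counterpart of "Set left to 1"
    right = n - 1              # 0-based counterpart of "Set right to n"
    while left < right:
        if a[left] != 0:
            left += 1
        else:
            legit -= 1
            a[left] = a[right]  # overwrite the zero with the value at the right finger
            right -= 1
    if a[left] == 0:           # step 11: the cell where the fingers met is unchecked
        legit -= 1
    return legit


data = [5, 3, 4, 0, 6, 2, 4, 0]
legit = converging_pointers(data)
print(legit, data[:legit])     # 6 [5, 3, 4, 4, 6, 2]
```

Unlike copy-over and shuffle-left, the good values can end up in a different order, as the output for the sample list shows.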
Exercising the Converging Pointers Algorithm

  d1  d2  d3  d4  d5  d6  d7  d8
   5   3   4   0   6   2   4   0
  legit
Efficiency of Algorithms
How to measure time efficiency?
Running time: let it run and see how long it takes
  – On what machine?
  – On what inputs?
We want a measure of time efficiency that is independent of machine, speed, etc.
  – Look at an algorithm's pseudocode and estimate its running time
  – Look at the pseudocode of two algorithms and compare them
Time Efficiency
(Time) efficiency of an algorithm:
  – The number of pseudocode instructions (steps) executed
Is this accurate?
  – Not all instructions take the same amount of time…
  – But it is a good approximation of running time in most cases
Time efficiency depends on the input
  – Example: the sequential search algorithm
      In the best case, how fast can the algorithm halt?
      In the worst case, how fast can the algorithm halt?
Efficiency of an Algorithm
Worst-case efficiency is the maximum number of steps that an algorithm can take for any collection of data values.
Best-case efficiency is the minimum number of steps that an algorithm can take for any collection of data values.
Average-case efficiency is the efficiency averaged over all possible inputs
  – Must assume a distribution of the input
  – We normally assume a uniform distribution (all keys are equally probable)
If the input has size n, efficiency will be a function of n
Worst Case Efficiency for Sequential Search

Step                                                          Times executed
1   Get the value of target, n, and the list of n values      1
2   Set index to 1                                            1
3   Set found to false                                        1
4   Repeat steps 5-8 until found = true or index > n          n
5       If the value of list[index] = target then             n
6           Output the index                                  0
7           Set found to true                                 0
8       Else increment the index by 1                         n
9   If not found then                                         1
10      Print a message that target was not found             0
11  Stop                                                      1
Total                                                         3n+5
Analysis of Sequential Search
Time efficiency
  – Best case: 1 comparison
      Target is found immediately
  – Worst case: 3n + 5 steps
      Target is not found
  – Average case: 3n/2 + 4 steps
      Target is found in the middle
Space efficiency
  – How much space is used in addition to the input?
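To see the best and worst cases concretely, here is a small sketch of sequential search that counts only the comparisons against the target, not every pseudocode step, so its counts are n in the worst case and not 3n + 5. The function name and the `(index, comparisons)` return shape are assumptions made for illustration, and indices are 0-based.

```python
def sequential_search(values, target):
    """Return (index, comparisons); index is None when target is absent."""
    comparisons = 0
    for index, value in enumerate(values):
        comparisons += 1               # one comparison of a list value with target
        if value == target:
            return index, comparisons
    return None, comparisons


data = [5, 3, 4, 6, 2, 4]
print(sequential_search(data, 5))      # best case, found at once: (0, 1)
print(sequential_search(data, 9))      # worst case, not found: (None, 6)
```

The worst case touches every one of the n values, which is where the linear growth comes from.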
Order of Magnitude
Worst case of sequential search:
  – 3n+5 steps
  – Are these constants accurate? Can we ignore them?
Simplification:
  – Ignore the constants, look only at the order of magnitude
  – n, 0.5n, 2n, 4n, 3n+5, 2n+100, 0.1n+3, … are all linear
  – We say that their order of magnitude is n
      3n+5 is order of magnitude n: 3n+5 = Θ(n)
      2n+100 is order of magnitude n: 2n+100 = Θ(n)
      0.1n+3 is order of magnitude n: 0.1n+3 = Θ(n)
      …
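One way to see why the constants can be dropped, sketched for 3n+5: the bounding constants 3 and 4 and the threshold n ≥ 5 are illustrative choices, not taken from the slides.

```latex
3n \;\le\; 3n + 5 \;\le\; 4n \qquad \text{for all } n \ge 5
```

For large enough n, 3n+5 is squeezed between two constant multiples of n, which is what "order of magnitude n" expresses.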
Efficiency of Copy-Over
Best case:
  – All values are zero: no copying, no extra space
Worst case:
  – No zero values: n elements copied, n extra space
  – Time: Θ(n)
  – Extra space: n
Efficiency of Shuffle-Left
Space:
  – No extra space (except a few variables)
Time
  – Best case
      No zero values: no copying ==> order of n = Θ(n)
  – Worst case
      All values are zero:
      every element thus requires copying n-1 values one position to the left
      n × (n-1) = n² - n = order of n² = Θ(n²) (why?)
  – Average case
      Half of the values are zero
      n/2 × (n-1) = (n² - n)/2 = order of n² = Θ(n²)
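The n × (n-1) worst-case figure can be checked by instrumenting shuffle-left with a copy counter. The sketch below is self-contained; the function name `shuffle_left_copies` is hypothetical, indices are 0-based, and it returns only the number of copy operations performed.

```python
def shuffle_left_copies(a):
    """Run shuffle-left on a (in place) and return how many copies it performed."""
    n = len(a)
    legit, left, right = n, 0, 1
    copies = 0
    while left < legit:
        if a[left] != 0:
            left += 1
            right += 1
        else:
            legit -= 1
            while right < n:
                a[right - 1] = a[right]
                right += 1
                copies += 1    # one copy per shifted element
            right = left + 1
    return copies


for n in (10, 20, 40):
    # all-zero input is the worst case; the count matches n * (n - 1)
    print(n, shuffle_left_copies([0] * n), n * (n - 1))

print(shuffle_left_copies([1, 2, 3, 4]))  # best case, no zeros: 0 copies
```

Doubling n roughly quadruples the number of copies, which is the signature of quadratic growth.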
Efficiency of Converging Pointers Algorithm
Space
  – No extra space used (except a few variables)
Time
  – Best case
      No zero values
      No copying => order of n = Θ(n)
  – Worst case
      All values are zero:
      One copy at each step => n-1 copies
      order of n = Θ(n)
  – Average case
      Half of the values are zero: n/2 copies
      order of n = Θ(n)
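The n-1 copies in the worst case can be checked the same way, by counting copies in a converging-pointers run. This sketch is self-contained; the name `converging_pointers_copies` is hypothetical, indices are 0-based, and the legit bookkeeping is omitted because only copies are being counted.

```python
def converging_pointers_copies(a):
    """Run converging pointers on a (in place) and return the number of copies."""
    if not a:                  # guard added for this sketch
        return 0
    left, right = 0, len(a) - 1
    copies = 0
    while left < right:
        if a[left] != 0:
            left += 1
        else:
            a[left] = a[right]  # one copy for each zero met by the left finger
            right -= 1
            copies += 1
    return copies


for n in (10, 20, 40):
    print(n, converging_pointers_copies([0] * n))  # n - 1 copies: 9, 19, 39

print(converging_pointers_copies([1, 2, 3, 4]))    # no zeros: 0 copies
```

Each loop pass moves one finger by one position, so the two fingers meet after at most n-1 passes regardless of the input.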
Data Cleanup Algorithms
Copy-Over
  – Worst case: time Θ(n), extra space n
  – Best case: time Θ(n), no extra space
Shuffle-left
  – Worst case: time Θ(n²), no extra space
  – Best case: time Θ(n), no extra space
Converging pointers
  – Worst case: time Θ(n), no extra space
  – Best case: time Θ(n), no extra space