2 Last time / Today
Last time
- Algorithms for:
  - Find all occurrences of target
  - Find number of occurrences of target
  - Find number of values larger than target
  - Find largest/smallest, sum, average
  - Pattern matching
Today
- Pattern matching algorithm
- Efficiency of algorithms
- Data cleanup algorithms
Reading: start on Chapter 3, textbook
3 Pattern Matching
Problem: Suppose we have a gene (text) T = TCAGGCTAATCGTAGG and a probe (pattern) P = TA. Design an algorithm that searches T to find the position of every instance of P that appears in T.
E.g., for this text, the algorithm should return the answer:
- There is a match at position 7
- There is a match at position 13
Algorithm: What is the idea?
- Check if the pattern matches starting at position 1, then check if it matches starting at position 2, … and so on
How to check if the pattern matches the text starting at position k?
- Check that every character of the pattern matches the corresponding character of the text
4 Pattern Matching
Input:
- Gene (text) of n characters T1, T2, …, Tn
- Probe (pattern) of m (m < n) characters P1, P2, …, Pm
Output:
- Location (index) of every occurrence of the pattern within the text
Algorithm idea
- Get input (text and pattern)
- Set starting location k to 1
- Repeat until the end of the text is reached
  - Attempt to match every character in the pattern beginning at position k in the text
  - If there was a match, print k
  - Add 1 to k
- Stop
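The algorithm idea above can be sketched in Python. This is a minimal sketch, not the textbook's pseudocode: the function name find_matches is an assumption made for illustration, positions are reported 1-based as on the slides, and "end of text" is taken to mean the last position where the whole pattern still fits (k = n - m + 1).

```python
def find_matches(text, pattern):
    """Return the 1-based positions at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    positions = []
    k = 1  # starting location, 1-based as on the slides
    # Assumption: stop at k = n - m + 1, the last place the pattern can fit.
    while k <= n - m + 1:
        i = 1
        mismatch = False
        # Compare pattern character i with text character k + (i - 1).
        while i <= m and not mismatch:
            if text[(k - 1) + (i - 1)] != pattern[i - 1]:
                mismatch = True
            else:
                i += 1
        if not mismatch:
            positions.append(k)
        k += 1
    return positions


print(find_matches("TCAGGCTAATCGTAGG", "TA"))  # [7, 13]
```

The outer loop runs about n times and the inner loop at most m times per position, so the work grows with both the text length and the pattern length.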
5 Comparing Algorithms
Algorithm
- Design
- Correctness
- Efficiency
- Also, clarity, elegance, ease of understanding
There are many ways to solve a problem
- Conceptually
- Also different ways to write pseudocode for the same conceptual idea
How to compare algorithms?
6 Efficiency of Algorithms
Efficiency: amount of resources used by an algorithm
- Space (number of variables)
- Time (number of instructions)
When designing an algorithm, we must be aware of its use of resources.
If there is a choice, pick the more efficient algorithm!
7 Efficiency of Algorithms
Does efficiency matter?
- Computers are so fast these days…
Yes, efficiency matters a lot!
- There are problems (actually a lot of them) for which all known algorithms are so inefficient that they are impractical
- Remember the shortest-path-through-all-cities problem from Lab 1…
8 Efficiency of Algorithms
How to measure time efficiency?
Running time: let it run and see how long it takes
- On what machine?
- On what inputs?
Time efficiency depends on the input
- Example: the sequential search algorithm
  - In the best case, how fast can the algorithm halt?
  - In the worst case, how fast can the algorithm halt?
9 Time Efficiency
We want a measure of time efficiency that is independent of machine, speed, etc.
- Look at an algorithm's pseudocode and estimate its running time
- Look at two algorithms' pseudocode and compare them
(Time) efficiency of an algorithm: the number of pseudocode instructions (steps) executed
Is this accurate?
- Not all instructions take the same amount of time…
- But it is a good approximation of running time in most cases
10 (Time) Efficiency of an algorithm
Worst case efficiency
- the maximum number of steps that an algorithm can take for any input data values
Best case efficiency
- the minimum number of steps that an algorithm can take for any input data values
Average case efficiency
- the efficiency averaged over all possible inputs
- must assume a distribution of the input
- we normally assume a uniform distribution (all keys are equally probable)
If the input has size n, efficiency will be a function of n
11 Analysis of Sequential Search
Time efficiency
- Best-case: 1 comparison
  - Target is found immediately
- Worst-case: 3n + 5 steps
  - Target is not found
- Average-case: 3n/2 + 4 steps
  - Target is found in the middle
Space efficiency
- How much space is used in addition to the input?
12 Worst Case Efficiency for Sequential Search
Step                                                           Count
1   Get the value of target, n, and the list of n values       1
2   Set index to 1                                             1
3   Set found to false                                         1
4   Repeat steps 5-8 until found = true or index > n           n
5     if the value of list[index] = target then                n
6       Output the index                                       0
7       Set found to true                                      0
8     else Increment the index by 1                            n
9   if not found then                                          1
10    Print a message that target was not found                0
11  Stop                                                       1
Total                                                          3n + 5
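To see the best and worst cases concretely, here is a minimal Python sketch of sequential search. It is a simplification of the tally above: it counts only the comparisons against the target, not every pseudocode step, so the constants differ from 3n + 5 even though the growth is still linear. The function name and the returned (index, comparisons) pair are assumptions made for illustration.

```python
def sequential_search(values, target):
    """Return (index, comparisons): the 1-based index of target, or None if
    it is absent, plus how many list values were compared with target."""
    n = len(values)
    index = 1
    found = False
    comparisons = 0
    while not found and index <= n:
        comparisons += 1
        if values[index - 1] == target:
            found = True
        else:
            index += 1
    return (index if found else None), comparisons


data = [5, 3, 8, 1]
print(sequential_search(data, 5))  # (1, 1)    best case: found immediately
print(sequential_search(data, 9))  # (None, 4) worst case: not found, n comparisons
```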
13 Order of Magnitude
Worst case of sequential search: 3n + 5 steps
- Are these constants accurate? Can we ignore them?
Simplification: ignore the constants, look only at the order of magnitude
- n, 0.5n, 2n, 4n, 3n + 5, 2n + 100, 0.1n + 3, … are all linear
- we say that their order of magnitude is n
  - 3n + 5 is order of magnitude n: 3n + 5 = Θ(n)
  - 2n + 100 is order of magnitude n: 2n + 100 = Θ(n)
  - 0.1n + 3 is order of magnitude n: 0.1n + 3 = Θ(n)
  - …
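A quick numeric check suggests why the constants can be ignored. The sketch below assumes the worst-case count 3n + 5 from the analysis of sequential search; the function name worst_case_steps is hypothetical. As n grows, the ratio of the count to n settles near the constant 3, so the count grows in proportion to n.

```python
def worst_case_steps(n):
    """Worst-case step count for sequential search, as tallied on the slides."""
    return 3 * n + 5


for n in (10, 100, 1000, 10000):
    # The ratio steps / n approaches 3: the "+ 5" matters less and less.
    print(n, worst_case_steps(n), worst_case_steps(n) / n)
```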
14 Data Cleanup Algorithms
What are they?
- A systematic strategy for removing errors from data.
Why are they important?
- Errors occur in all real computing situations.
How are they related to the search algorithm?
- To remove errors from a series of values, each value must be examined to determine if it is an error.
- E.g., suppose we have a list d of data values, from which we want to remove all the zeroes (they mark errors), and pack the good values to the left. Legit is the number of good values remaining when we are done.
[Figure: list d1 … d8 with Legit]
15 Data Cleanup: Copy-Over algorithm
Idea: Scan the list from left to right and copy non-zero values to a new list
Copy-Over Algorithm (Fig 3.2)
Variables: n, A1, …, An, newposition, left, B1, …, Bn
Get values for n and the list of n values A1, A2, …, An
Set left to 1
Set newposition to 1
While left <= n do
  If A[left] is non-zero
    Copy A[left] into B[newposition] (copy it into position newposition in the new list)
    Increase left by 1
    Increase newposition by 1
  Else increase left by 1
Stop
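The copy-over idea can be sketched in Python as follows. This is an illustrative sketch rather than the textbook's Fig 3.2: the function name copy_over is an assumption, and the new list B grows with append instead of being pre-allocated, so newposition is implicit (it is always len(b) + 1).

```python
def copy_over(a):
    """Return a new list holding the non-zero values of a, in order.
    The input list is left unchanged."""
    b = []       # the new list B
    left = 1     # 1-based scan position, as in the pseudocode
    while left <= len(a):
        if a[left - 1] != 0:
            b.append(a[left - 1])  # copy into the next free position of B
        left += 1
    return b


data = [0, 24, 16, 0, 36, 42, 23, 21, 0, 27]
print(copy_over(data))  # [24, 16, 36, 42, 23, 21, 27]
```

Each value is examined once, but the second list can need as much extra space as the input itself.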
16 Data Cleanup: The Shuffle-Left Algorithm
Idea:
- Go over the list from left to right. Every time we see a zero, shift all subsequent elements one position to the left.
- Keep track of the number of legitimate (non-zero) entries
How does this work?
How many loops do we need?
17 Shuffle-Left Algorithm (Fig 3.1)
Variables: n, A1, …, An, legit, left, right
1   Get values for n and the list of n values A1, A2, …, An
2   Set legit to n
3   Set left to 1
4   Set right to 2
5   Repeat steps 6-14 until left > legit
6     if A[left] ≠ 0
7       Increase left by 1
8       Increase right by 1
9     else
10      Reduce legit by 1
11      Repeat steps 12-13 until right > n
12        Copy A[right] into A[right-1]
13        Increase right by 1
14      Set right to left + 1
15  Stop
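A Python sketch of the shuffle-left pseudocode, keeping the slide's 1-based left and right pointers and translating to 0-based list indices only at each access. The function name shuffle_left and the choice to return legit are assumptions made for illustration; the list is modified in place, and only its first legit entries are meaningful afterwards.

```python
def shuffle_left(a):
    """Pack the non-zero values of a to the left, in place, preserving order.
    Returns legit, the number of good values."""
    n = len(a)
    legit = n
    left = 1
    right = 2
    while left <= legit:
        if a[left - 1] != 0:
            left += 1
            right += 1
        else:
            legit -= 1
            # Shift everything to the right of the zero one position left.
            while right <= n:
                a[right - 2] = a[right - 1]
                right += 1
            right = left + 1
    return legit


data = [0, 24, 16, 0, 36, 42, 23, 21, 0, 27]
legit = shuffle_left(data)
print(legit, data[:legit])  # 7 [24, 16, 36, 42, 23, 21, 27]
```

There are two nested loops: every zero triggers a shift of up to n elements, so a list with many zeroes costs far more than one pass, although no second list is needed.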
18 Exercising the Shuffle-Left Algorithm
[Figure: list d1 … d8 with legit]
19 Data Cleanup: The Converging-Pointers Algorithm
Idea:
- One finger moving left to right, one moving right to left
- Move the left finger over non-zero values
- If we encounter a zero value, then
  - Copy the element at the right finger into this position
  - Shift the right finger to the left
20 Converging Pointers Algorithm (Fig 3.3)
Variables: n, A1, …, An, legit, left, right
1   Get values for n and the list of n values A1, A2, …, An
2   Set legit to n
3   Set left to 1
4   Set right to n
5   Repeat steps 6-10 until left ≥ right
6     If the value of A[left] ≠ 0 then increase left by 1
7     Else
8       Reduce legit by 1
9       Copy the value of A[right] to A[left]
10      Reduce right by 1
11  If A[left] = 0 then reduce legit by 1
12  Stop
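The converging-pointers pseudocode can be sketched in Python like this. The function name converging_pointers and the returned legit value are assumptions made for illustration. The sketch also adds a guard for an empty list, which the pseudocode does not address. Unlike shuffle-left, this approach does not preserve the original order of the good values.

```python
def converging_pointers(a):
    """Pack the non-zero values of a into its first legit positions, in place.
    Order is not preserved. Returns legit, the number of good values."""
    n = len(a)
    legit = n
    left = 1   # 1-based, moves left to right
    right = n  # 1-based, moves right to left
    while left < right:
        if a[left - 1] != 0:
            left += 1
        else:
            legit -= 1
            a[left - 1] = a[right - 1]  # overwrite the zero with the right value
            right -= 1
    # Final check on the position where the pointers met.
    # Assumption: skip it for an empty list, where no such position exists.
    if n > 0 and a[left - 1] == 0:
        legit -= 1
    return legit


data = [0, 24, 16, 0, 36, 42, 23, 21, 0, 27]
legit = converging_pointers(data)
print(legit, data[:legit])  # 7 [27, 24, 16, 21, 36, 42, 23]
```

Each iteration moves one of the two pointers toward the other, so the loop makes at most n - 1 passes and needs no extra list.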
21 Exercising the Converging Pointers Algorithm
[Figure: list d1 … d8 with legit]