UMass Lowell Computer Science 91.503 Analysis of Algorithms Prof. Karen Daniels Fall, 2006 Wednesday, 12/6/06 String Matching Algorithms Chapter 32.

1 UMass Lowell Computer Science 91.503 Analysis of Algorithms Prof. Karen Daniels Fall, 2006 Wednesday, 12/6/06 String Matching Algorithms Chapter 32

2 Chapter Dependencies Ch 32 String Matching Automata You’re responsible for material in Sections 32.1-32.4 of this chapter.

3 String Matching Algorithms Motivation & Basics

4 String Matching Problem (source: 91.503 textbook, Cormen et al.) 32.1
Motivations: text editing, pattern matching in DNA sequences
- Text: array T[1..n]
- Pattern: array P[1..m]
- Array element: character from finite alphabet Σ
- Pattern P occurs with shift s in T if P[1..m] = T[s+1..s+m]

5 String Matching Algorithms
- Naive Algorithm
  - Worst-case running time in O((n-m+1)m)
- Rabin-Karp
  - Worst-case running time in O((n-m+1)m)
  - Better than this on average and in practice
- Finite Automaton-Based
  - Worst-case running time in O(n + m|Σ|)
- Knuth-Morris-Pratt
  - Worst-case running time in O(n + m)

6 Notation & Terminology
- Σ* = set of all finite-length strings formed using characters from alphabet Σ
- Empty string: ε
- |x| = length of string x
- w is a prefix of x: w ⊏ x (example: ab ⊏ abcca)
- w is a suffix of x: w ⊐ x (example: cca ⊐ abcca)
- prefix, suffix are transitive
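
The prefix and suffix relations can be checked directly with Python's built-in string methods. This is a minimal sketch; the function names `is_prefix` and `is_suffix` are illustrative and do not come from the slides.

```python
def is_prefix(w: str, x: str) -> bool:
    """Return True when w is a prefix of x (the empty string is a prefix of every string)."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """Return True when w is a suffix of x (the empty string is a suffix of every string)."""
    return x.endswith(w)


if __name__ == "__main__":
    # The two examples from the slide.
    print(is_prefix("ab", "abcca"))
    print(is_suffix("cca", "abcca"))
```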

7 Overlapping Suffix Lemma source: 91.503 textbook Cormen et al. 32.1 32.3 32.1

8 String Matching Algorithms Naive Algorithm

9 Naive String Matching (source: 91.503 textbook, Cormen et al.) worst-case running time is in Θ((n-m+1)m) 32.4
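
A minimal sketch of the naive matcher, assuming 0-indexed Python strings. The slice `text[s:s + m]` plays the role of T[s+1..s+m], so a reported shift s means the same thing as on the slides. The name `naive_string_matcher` is chosen here for illustration.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try all n - m + 1 candidate shifts; each comparison costs up to m steps.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


if __name__ == "__main__":
    print(naive_string_matcher("abababa", "aba"))
```

Overlapping occurrences are all reported, because every shift is tested independently of the others.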

10 String Matching Algorithms Rabin-Karp

11 Rabin-Karp Algorithm (source: 91.503 textbook, Cormen et al.)
- Assume each character is a digit in radix-d notation (e.g. d = 10)
- p = value of the pattern read as a radix-d number (its decimal value when d = 10)
- t_s = value of substring T[s+1..s+m], read the same way, for s = 0, 1, ..., n-m
- Strategy:
  - compute p in O(m) time (which is in O(n))
  - compute all t_s values in a total of O(n) time
  - find all valid shifts s in O(n) time by comparing p with each t_s
- Compute p in O(m) time using Horner's rule:
  - p = P[m] + d(P[m-1] + d(P[m-2] + ... + d(P[2] + d P[1])))
- Compute t_0 similarly from T[1..m] in O(m) time
- Compute the remaining t_s values in O(n-m) time:
  - t_{s+1} = d(t_s - d^{m-1} T[s+1]) + T[s+m+1]
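
Horner's rule and the constant-time window update can be sketched as follows. This assumes the characters are decimal digits (so `int(ch)` yields the digit value) and m >= 1; the names `horner_value` and `window_values` are illustrative only. No modulus is applied here, so the values grow with m.

```python
def horner_value(digits: str, d: int = 10) -> int:
    """Evaluate a digit string as a radix-d number using Horner's rule."""
    value = 0
    for ch in digits:
        value = d * value + int(ch)
    return value


def window_values(text: str, m: int, d: int = 10) -> list[int]:
    """Return t_0, ..., t_{n-m}: the value of every length-m window of text (m >= 1)."""
    n = len(text)
    if m > n:
        return []
    high = d ** (m - 1)  # weight of the high-order digit in an m-digit window
    t = horner_value(text[:m], d)
    values = [t]
    for s in range(n - m):
        # Drop the high-order digit text[s], shift left, append text[s + m].
        t = d * (t - high * int(text[s])) + int(text[s + m])
        values.append(t)
    return values


if __name__ == "__main__":
    print(window_values("31415926", 5))
```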

12 Rabin-Karp Algorithm (source: 91.503 textbook, Cormen et al.) p and t_s may be large, so compute them mod q 32.5

13 Rabin-Karp Algorithm (continued) (source: 91.503 textbook, Cormen et al.) Figure: example with p = 31415, showing a valid match and a spurious hit. Update rule: t_{s+1} = d(t_s - d^{m-1} T[s+1]) + T[s+m+1]

14 Rabin-Karp Algorithm (continued) source: 91.503 textbook Cormen et al.

15 Rabin-Karp Algorithm (continued) (source: 91.503 textbook, Cormen et al.)
worst-case running time is in Θ((n-m+1)m)
Annotations on the matcher pseudocode:
- d is radix, q is modulus
- high-order digit position for m-digit window
- Preprocessing: Θ(m), which is in O(n)
- Matching: try all possible shifts
- Matching loop invariant: when line 10 is executed, t_s = T[s+1..s+m] mod q
- rule out spurious hit
- Running-time labels on the remaining lines: Θ(m), Θ((n-m+1)m)

16 Rabin-Karp Algorithm (continued) (source: 91.503 textbook, Cormen et al.)
average-case running time is in O(n+m)
- Assume reducing mod q is like a random mapping from Σ* to Z_q
- Estimate: (chance that t_s ≡ p (mod q)) = 1/q
- # spurious hits is in O(n/q)
- Expected matching time = O(n) + O(m(v + n/q)), where v = # valid shifts
- If v is in O(1) and q >= m, the expected matching time is in O(n)
Annotations on the matcher pseudocode: d is radix, q is modulus; high-order digit position for m-digit window; preprocessing is Θ(m), which is in O(n); matching tries all possible shifts; loop invariant: when line 10 is executed, t_s = T[s+1..s+m] mod q; rule out spurious hit.
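
A minimal sketch of Rabin-Karp with a modulus, assuming decimal-digit characters and 0-indexed shifts. The function name, the return shape (valid shifts plus a spurious-hit count), and the default q = 13 are choices made here for illustration. The digit text in the usage line is likewise an illustrative input, paired with the pattern 31415 from the slides.

```python
def rabin_karp(text: str, pattern: str, d: int = 10, q: int = 13) -> tuple[list[int], int]:
    """Return (valid shifts, number of spurious hits) for a digit pattern in a digit text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], 0
    h = pow(d, m - 1, q)  # d^(m-1) mod q: weight of the high-order digit
    p = t = 0
    # Preprocessing: Horner's rule mod q for the pattern and the first window.
    for i in range(m):
        p = (d * p + int(pattern[i])) % q
        t = (d * t + int(text[i])) % q
    valid, spurious = [], 0
    for s in range(n - m + 1):
        if p == t:
            # Equal residues are only a hit; compare characters to confirm.
            if text[s:s + m] == pattern:
                valid.append(s)
            else:
                spurious += 1
        if s < n - m:
            # Rolling update; Python's % keeps the result in 0..q-1.
            t = (d * (t - int(text[s]) * h) + int(text[s + m])) % q
    return valid, spurious


if __name__ == "__main__":
    print(rabin_karp("2359023141526739921", "31415"))
```

With q = 13 the window 67399 has the same residue as 31415 (both are 7 mod 13), so it is counted as a spurious hit and rejected by the explicit comparison.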

17 String Matching Algorithms Finite Automata

18 source: 91.503 textbook, Cormen et al. Strategy: build automaton for pattern, then examine each text character once. Worst-case running time is in Θ(n) + automaton creation time. 32.6

19 Finite Automata source: 91.503 textbook Cormen et al.

20 String-Matching Automaton source: 91.503 textbook Cormen et al. Pattern = P = ababaca Automaton accepts strings ending in P 32.7

21 String-Matching Automaton (source: 91.503 textbook, Cormen et al.) 32.3 32.4
- Suffix function for P: σ(x) = length of longest prefix of P that is a suffix of x
- Automaton's operational invariant at each step: keeps track of longest pattern prefix that is a suffix of what has been read so far
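
The suffix function can be computed directly from its definition by trying prefix lengths from longest to shortest. This is a brute-force sketch meant to make the definition concrete, not an efficient routine; the name `suffix_function` is illustrative.

```python
def suffix_function(pattern: str, x: str) -> int:
    """Return the length of the longest prefix of pattern that is a suffix of x."""
    # Try the longest candidate first; k = 0 (the empty prefix) always succeeds.
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0


if __name__ == "__main__":
    print(suffix_function("ababaca", "ababab"))
```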

22 String-Matching Automaton (source: 91.503 textbook, Cormen et al.) Simulate behavior of string-matching automaton that finds occurrences of pattern P of length m in T[1..n]. Worst-case running time of matching is in Θ(n), assuming automaton has already been created...
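
A sketch of the matching loop, assuming the transition function is supplied as a list of per-state dictionaries (a representation chosen here, not taken from the slides). The table `DELTA_AB` is written by hand for the toy pattern "ab" over the alphabet {a, b} so that the example is self-contained.

```python
def automaton_matcher(text: str, delta: list[dict[str, int]], m: int) -> list[int]:
    """Scan text once with transition table delta; return 0-indexed shifts of matches."""
    q = 0  # current state = number of pattern characters currently matched
    shifts = []
    for i, ch in enumerate(text):
        q = delta[q][ch]
        if q == m:  # accepting state: the pattern ends at position i
            shifts.append(i - m + 1)
    return shifts


# Hand-written transition table for pattern "ab" (states 0, 1, 2).
DELTA_AB = [
    {"a": 1, "b": 0},
    {"a": 1, "b": 2},
    {"a": 1, "b": 0},
]

if __name__ == "__main__":
    print(automaton_matcher("abaabab", DELTA_AB, 2))
```

Each text character triggers exactly one table lookup, which is where the linear matching time comes from.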

23 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al. Correctness of matching procedure... 32.4 32.3 to be proved next…

24 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al. Correctness of matching procedure... 32.2 32.8 32.2 32.8

25 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al. Correctness of matching procedure... 32.3 32.9 32.3 32.9 32.2 32.1

26 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al. Correctness of matching procedure... 32.4 32.3

27 String-Matching Automaton (continued) (source: 91.503 textbook, Cormen et al.)
- worst-case running time of automaton creation is in O(m^3 |Σ|); can be improved to O(m |Σ|)
- worst-case running time of entire string-matching strategy is in O(m |Σ|) (automaton creation time) + Θ(n) (pattern matching time)
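
The straightforward construction behind the O(m^3 |Σ|) bound can be sketched as below: for every state q and character a, try prefix lengths k from largest to smallest until P_k is a suffix of P_q a. The list-of-dictionaries representation and the function name are assumptions made for this sketch.

```python
def compute_transition_function(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Build delta[q][a] = length of the longest prefix of pattern that is a suffix of P_q + a."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):          # m + 1 states
        row = {}
        for a in alphabet:          # |alphabet| characters per state
            k = min(m, q + 1)       # longest prefix that could possibly match
            # Up to m candidate lengths, each checked in up to m steps.
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


if __name__ == "__main__":
    delta = compute_transition_function("ababaca", "abc")
    print(delta[5])
```

The loop always terminates because the empty prefix (k = 0) is a suffix of every string.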

28 String Matching Algorithms Knuth-Morris-Pratt

29 Knuth-Morris-Pratt Overview
- Achieve Θ(n+m) time by shortening automaton preprocessing time below O(m |Σ|)
- Approach:
  - don't precompute automaton's transition function
  - calculate enough transition data "on-the-fly"
  - obtain data via "alphabet-independent" pattern preprocessing
  - pattern preprocessing compares pattern against shifts of itself

30 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. determine how pattern matches against itself 32.10

31 Knuth-Morris-Pratt Algorithm (source: 91.503 textbook, Cormen et al.)
- Prefix function π shows how pattern matches against itself
- π(q) is length of longest prefix of P that is a proper suffix of P_q
- Equivalently, what is largest k < q such that P_k ⊐ P_q?
- Example: 32.5
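
A minimal sketch of the prefix-function computation, assuming 0-indexed Python lists, so `pi[q]` here holds the slide's π(q+1). The function name is illustrative.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """Return pi where pi[q] is the length of the longest proper prefix-suffix of pattern[:q + 1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # length of the current matched prefix
    for q in range(1, m):
        # Fall back to shorter prefixes until one can be extended by pattern[q].
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


if __name__ == "__main__":
    print(compute_prefix_function("ababaca"))
```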

32 Knuth-Morris-Pratt Algorithm (source: 91.503 textbook, Cormen et al.)
Annotations on the matcher pseudocode:
- Prefix-function computation: Θ(m), which is in O(n), using amortized analysis
- q = # characters matched
- scan text left-to-right
- next character does not match
- next character matches
- Is all of P matched? Look for next match
- Matching loop: Θ(n) using amortized analysis
- Total: Θ(m+n)
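
The annotated steps can be sketched as a complete matcher. This assumes 0-indexed strings and repeats the prefix-function computation so the block runs on its own; the function names are chosen here for illustration.

```python
def prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest proper prefix of pattern[:q + 1] that is also its suffix."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return every 0-indexed shift at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pi = prefix_function(pattern)
    shifts = []
    q = 0  # number of characters matched
    for i in range(n):  # scan text left-to-right
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]  # next character does not match
        if pattern[q] == text[i]:
            q += 1  # next character matches
        if q == m:  # is all of P matched?
            shifts.append(i - m + 1)
            q = pi[q - 1]  # look for next match
    return shifts


if __name__ == "__main__":
    print(kmp_matcher("abababacaba", "ababaca"))
```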

33 Knuth-Morris-Pratt Algorithm: Amortized Analysis (source: 91.503 textbook, Cormen et al.)
- Potential method: the potential is k, the current state of the algorithm
- Initial potential value is 0
- Potential decreases in each while-loop iteration, since π(k) < k
- Potential is never negative, since π(k) >= 0 for all k
- Potential increases by <= 1 in each execution of the for-loop body
- Amortized cost of the loop body is in O(1)
- Θ(m) loop iterations, so the prefix-function computation is Θ(m), which is in O(n)
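
The potential argument can be checked empirically by counting how often the inner while loop runs. In this sketch (instrumentation and names are added here for illustration), k rises at most once per for-loop iteration and every while-loop iteration lowers it, so the total count cannot exceed m - 1.

```python
def prefix_function_with_count(pattern: str) -> tuple[list[int], int]:
    """Compute the prefix function and count the total number of while-loop iterations."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # the potential
    while_iterations = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]  # potential strictly decreases
            while_iterations += 1
        if pattern[k] == pattern[q]:
            k += 1  # potential increases by at most 1 per for-loop iteration
        pi[q] = k
    return pi, while_iterations


if __name__ == "__main__":
    for p in ["ababaca", "aaaaab", "abcabcabd"]:
        pi, count = prefix_function_with_count(p)
        print(p, pi, count, count <= len(p) - 1)
```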

34 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness...

35 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness... 32.5 32.1 32.6

36 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness... 32.11 32.5

37 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness... 32.6 32.5 32.7 32.6

