UMass Lowell Computer Science 91.503 Analysis of Algorithms Prof. Karen Daniels Fall, 2001 Lecture 8 Tuesday, 11/13/01 String Matching Algorithms Chapter.


1 UMass Lowell Computer Science 91.503 Analysis of Algorithms Prof. Karen Daniels Fall, 2001 Lecture 8 Tuesday, 11/13/01 String Matching Algorithms Chapter 34

2 Chapter Dependencies
- Ch 33 Number-Theoretic Algorithms (RSA; math: number theory): You’re responsible for material in this chapter that we discussed in lecture. (Note that this does not include Sections 33.8 or 33.9.)
- Ch 34 String Matching (automata): You’re responsible for material in Sections 34.1-34.4 of this chapter. (Note that this does not include Section 34.5.)

3 Overview
- Number-Theoretic Algorithms Follow-up:
  - Chinese Remainder Theorem
  - Powers of an Element
  - RSA Details
- String Matching Algorithms
  - Motivation
  - Naive Algorithm
  - Rabin-Karp
  - Finite Automata
  - Knuth-Morris-Pratt
- Homework #6
- Midterm Exam Follow-up

4 Number-Theoretic Algorithms Follow-up Chinese Remainder Theorem

5 Solving Modular Linear Eq source: 91.503 textbook Cormen et al.

6 Chinese Remainder Theorem source: 91.503 textbook Cormen et al.

7 Number-Theoretic Algorithms Follow-up RSA Details

8 RSA Encryption source: 91.503 textbook Cormen et al.

9 RSA Cryptosystem source: 91.503 textbook Cormen et al.

10 Modular Exponentiation source: 91.503 textbook Cormen et al.

11 RSA Correctness source: 91.503 textbook Cormen et al.

12 String Matching Algorithms Motivation & Basics

13 String Matching Problem source: 91.503 textbook Cormen et al.
- Motivations: text editing, pattern matching in DNA sequences
- Text: array T[1..n]
- Pattern: array P[1..m]
- Array element: character from a finite alphabet Σ
- Pattern P occurs with shift s in T if P[1..m] = T[s+1..s+m]

14 String Matching Algorithms
- Naive Algorithm
  - Worst-case running time in O((n-m+1) m)
- Rabin-Karp
  - Worst-case running time in O((n-m+1) m)
  - Better than this on average and in practice
- Finite Automaton-Based
  - Worst-case running time in O(n + m|Σ|)
- Knuth-Morris-Pratt
  - Worst-case running time in O(n + m)

15 Notation & Terminology
- Σ* = set of all finite-length strings formed using characters from alphabet Σ
- Empty string: ε
- |x| = length of string x
- w is a prefix of x: w ⊏ x
- w is a suffix of x: w ⊐ x
- prefix, suffix are transitive
- Examples: ab ⊏ abcca, cca ⊐ abcca
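
The prefix and suffix relations above map directly onto string operations. A minimal Python sketch follows; the helper names `is_prefix` and `is_suffix` are illustrative and do not come from the lecture.

```python
def is_prefix(w: str, x: str) -> bool:
    """Return True when w is a prefix of x (w ⊏ x)."""
    return x.startswith(w)


def is_suffix(w: str, x: str) -> bool:
    """Return True when w is a suffix of x (w ⊐ x)."""
    return x.endswith(w)


# The two examples from the slide.
print(is_prefix("ab", "abcca"))   # True
print(is_suffix("cca", "abcca"))  # True

# The empty string is both a prefix and a suffix of every string.
print(is_prefix("", "abcca"), is_suffix("", "abcca"))  # True True
```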

16 Overlapping Suffix Lemma source: 91.503 textbook Cormen et al.

17 String Matching Algorithms Naive Algorithm

18 Naive String Matching source: 91.503 textbook Cormen et al. worst-case running time is in Θ((n-m+1)m)
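
The textbook pseudocode is not reproduced in this transcript. As a rough sketch of the naive strategy (try every shift and compare the whole window), assuming 0-based Python strings so that shift s means text[s:s+m] equals the pattern:

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return every shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # There are n - m + 1 candidate shifts; each comparison costs up to m steps.
    for s in range(n - m + 1):
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_string_matcher("acaabc", "aab"))  # [2]
print(naive_string_matcher("aaaa", "aa"))     # [0, 1, 2]
```

The second call shows the worst-case flavor: overlapping occurrences force a full window comparison at nearly every shift.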

19 String Matching Algorithms Rabin-Karp

20 Rabin-Karp Algorithm source: 91.503 textbook Cormen et al.
- Assume each character is a digit in radix-d notation (e.g. d = 10)
- p = decimal value of pattern
- t_s = decimal value of substring T[s+1..s+m] for s = 0, 1, ..., n-m
- Strategy:
  - compute p in O(m) time (which is in O(n))
  - compute all t_i values in a total of O(n) time
  - find all valid shifts s in O(n) time by comparing p with each t_s
- Compute p in O(m) time using Horner’s rule:
  - p = P[m] + d(P[m-1] + d(P[m-2] + ... + d(P[2] + d P[1]) ...))
- Compute t_0 similarly from T[1..m] in O(m) time
- Compute remaining t_i’s in O(n-m) time:
  - t_(s+1) = d(t_s - d^(m-1) T[s+1]) + T[s+m+1]
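
To make Horner's rule and the sliding-window recurrence concrete, here is a small Python sketch without any modulus. The function names and the sample digit string are illustrative assumptions; indices are 0-based, so T[s+1] on the slide corresponds to `text_digits[s]`.

```python
def horner_value(digits: list[int], d: int = 10) -> int:
    """Evaluate a digit sequence in radix d with Horner's rule, in O(m) time."""
    value = 0
    for digit in digits:
        value = d * value + digit
    return value


def window_values(text_digits: list[int], m: int, d: int = 10) -> list[int]:
    """Return t_0, ..., t_(n-m) using the recurrence from the slide."""
    n = len(text_digits)
    if m > n:
        return []
    high = d ** (m - 1)  # weight of the high-order digit in an m-digit window
    t = horner_value(text_digits[:m], d)
    values = [t]
    for s in range(n - m):
        # Drop the high-order digit, shift left by one digit, bring in the next digit.
        t = d * (t - high * text_digits[s]) + text_digits[s + m]
        values.append(t)
    return values


digits = [3, 1, 4, 1, 5, 9, 2, 6]
print(horner_value(digits[:5]))   # 31415
print(window_values(digits, 5))   # [31415, 14159, 41592, 15926]
```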

21 Rabin-Karp Algorithm source: 91.503 textbook Cormen et al. p and t_s may be too large to handle conveniently, so compute them modulo a suitable modulus q

22 Rabin-Karp Algorithm (continued) source: 91.503 textbook Cormen et al. Figure: worked example with p = 31415, with a spurious hit marked; window update t_(s+1) = d(t_s - d^(m-1) T[s+1]) + T[s+m+1]

23 Rabin-Karp Algorithm (continued) source: 91.503 textbook Cormen et al.

24 Rabin-Karp Algorithm (continued) source: 91.503 textbook Cormen et al. worst-case running time is in Θ((n-m+1)m)
Annotations on the pseudocode:
- d is radix, q is modulus
- high-order digit position for m-digit window
- Preprocessing: Θ(m), which is in O(n)
- Matching: try all possible shifts, Θ((n-m+1)m)
- Matching loop invariant: when line 10 is executed, t_s = T[s+1..s+m] mod q
- Rule out spurious hit: Θ(m) per check

25 Rabin-Karp Algorithm (continued) source: 91.503 textbook Cormen et al. average-case running time is in Θ(n+m)
- Assume reducing mod q is like a random mapping from Σ* to Z_q
- Estimate: (chance that t_s ≡ p (mod q)) = 1/q, so # spurious hits is in O(n/q)
- Expected matching time = O(n) + O(m(v + n/q)), where v = # valid shifts
- If v is in O(1) and q >= m, the expected matching time is in O(n)
Annotations on the pseudocode (repeated from the previous slide):
- d is radix, q is modulus
- high-order digit position for m-digit window
- Preprocessing: Θ(m), which is in O(n)
- Matching: try all possible shifts, Θ((n-m+1)m) in the worst case
- Matching loop invariant: when line 10 is executed, t_s = T[s+1..s+m] mod q
- Rule out spurious hit: Θ(m) per check
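
The pseudocode itself is not in the transcript, so the following Python sketch is one possible rendering of the modular version, not the textbook's exact procedure. Assumptions: characters are mapped to integers with `ord`, the defaults d = 256 and q = 1_000_003 are arbitrary illustrative choices, shifts are 0-based, and an empty pattern simply returns no shifts.

```python
def rabin_karp_matcher(text: str, pattern: str, d: int = 256, q: int = 1_000_003) -> list[int]:
    """Return all shifts where pattern occurs in text, using rolling values mod q."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    h = pow(d, m - 1, q)  # weight of the high-order digit, reduced mod q
    p = 0                 # value of the pattern mod q
    t = 0                 # value of the current text window mod q
    for i in range(m):    # preprocessing with Horner's rule, O(m)
        p = (d * p + ord(pattern[i])) % q
        t = (d * t + ord(text[i])) % q
    shifts = []
    for s in range(n - m + 1):  # try all possible shifts
        # Equal residues are only a candidate; compare explicitly to rule out a spurious hit.
        if p == t and text[s:s + m] == pattern:
            shifts.append(s)
        if s < n - m:
            # Slide the window: remove text[s], append text[s + m].
            t = (d * (t - ord(text[s]) * h) + ord(text[s + m])) % q
    return shifts


print(rabin_karp_matcher("2359023141526739921", "31415", d=10, q=13))  # [6]
```

With a small modulus such as q = 13, many windows share a residue with the pattern, which is why the explicit comparison is needed; a larger q makes spurious hits rarer, in line with the O(n/q) estimate.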

26 String Matching Algorithms Finite Automata

27 source: 91.503 textbook Cormen et al. Strategy: Build automaton for pattern, then examine each text character once. worst-case running time is in Θ(n) + automaton creation time

28 Finite Automata source: 91.503 textbook Cormen et al.

29 String-Matching Automaton source: 91.503 textbook Cormen et al. Pattern = P = ababaca Automaton accepts strings ending in P

30 String-Matching Automaton source: 91.503 textbook Cormen et al. Suffix function for P: σ(x) = length of longest prefix of P that is a suffix of x. Automaton’s operational invariant at each step: keeps track of longest pattern prefix that is a suffix of what has been read so far
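
The suffix function can be computed directly from its definition. This brute-force Python sketch is meant only to illustrate the definition, not to be efficient; the name `suffix_function` is an illustrative choice.

```python
def suffix_function(pattern: str, x: str) -> int:
    """Length of the longest prefix of pattern that is a suffix of x."""
    # Try the longest candidate first; k = 0 (the empty prefix) always succeeds.
    for k in range(min(len(pattern), len(x)), -1, -1):
        if x.endswith(pattern[:k]):
            return k
    return 0  # not reached, kept for clarity


print(suffix_function("ab", ""))            # 0
print(suffix_function("ab", "ccaca"))       # 1
print(suffix_function("ab", "ccab"))        # 2
print(suffix_function("ababaca", "ababab")) # 4
```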

31 String-Matching Automaton source: 91.503 textbook Cormen et al. Simulate behavior of string-matching automaton that finds occurrences of pattern P of length m in T[1..n]. Worst-case running time of matching is in Θ(n), assuming automaton has already been created.

32 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al. Correctness of matching procedure...

33 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al. Correctness of matching procedure...

34 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al. Correctness of matching procedure...

35 String-Matching Automaton (continued) source: 91.503 textbook Cormen et al.
- worst-case running time of automaton creation is in O(m^3 |Σ|); can be improved to O(m |Σ|)
- worst-case running time of entire string-matching strategy is in O(m |Σ|) + Θ(n), where O(m |Σ|) is automaton creation time and Θ(n) is pattern matching time
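
As a hedged sketch of the whole strategy, the Python below builds the transition function by brute force (state q on character a moves to the longest prefix of P that is a suffix of P_q a), which matches the cubic bound rather than the improved one, and then scans the text once. The function names, the dictionary-per-state table layout, and the choice to send characters outside the given alphabet to state 0 are assumptions made for this example.

```python
def compute_transition_function(pattern: str, alphabet: str) -> list[dict[str, int]]:
    """Build delta[q][a] for states q = 0..m by brute force, O(m^3 |alphabet|)."""
    m = len(pattern)
    delta = []
    for q in range(m + 1):
        row = {}
        for a in alphabet:
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of pattern[:q] + a.
            while not (pattern[:q] + a).endswith(pattern[:k]):
                k -= 1
            row[a] = k
        delta.append(row)
    return delta


def finite_automaton_matcher(text: str, delta: list[dict[str, int]], m: int) -> list[int]:
    """Scan text once; report 0-based shifts at which the accepting state m is reached."""
    q = 0
    shifts = []
    for i, ch in enumerate(text):
        q = delta[q].get(ch, 0)  # characters outside the alphabet reset to state 0
        if q == m:
            shifts.append(i - m + 1)
    return shifts


pattern = "ababaca"
delta = compute_transition_function(pattern, "abc")
print(delta[5]["b"])                                                 # 4
print(finite_automaton_matcher("abababacaba", delta, len(pattern)))  # [2]
```

Each text character costs one table lookup, which is where the Θ(n) matching time comes from once the table exists.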

36 String Matching Algorithms Knuth-Morris-Pratt

37 Knuth-Morris-Pratt Overview
- Achieve Θ(n+m) time by shortening automaton preprocessing time below O(m |Σ|)
- Approach:
  - don’t precompute automaton’s transition function
  - calculate enough transition data “on-the-fly”
  - obtain data via “alphabet-independent” pattern preprocessing
  - pattern preprocessing compares pattern against shifts of itself

38 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. determine how pattern matches against itself

39 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Prefix function π shows how pattern matches against itself. π(q) is length of longest prefix of P that is a proper suffix of P_q. Equivalently, what is the largest k < q such that P_k ⊐ P_q? Example:
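
The slide's example figure did not survive in this transcript. As a stand-in, here is a Python sketch of the prefix function, assuming 0-based indexing so that `pi[q]` in the code corresponds to π(q+1) on the slide; the pattern "ababaca" is borrowed from the automaton slides.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest prefix of pattern that is a proper suffix of pattern[:q + 1]."""
    m = len(pattern)
    pi = [0] * m
    k = 0  # number of pattern characters currently matched against a shift of itself
    for q in range(1, m):
        # Fall back to shorter borders until the next character can extend one.
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


print(compute_prefix_function("ababaca"))  # [0, 0, 1, 2, 3, 0, 1]
```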

40 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Θ(m+n) using amortized analysis
Annotations on the pseudocode:
- Prefix function computation: Θ(m), which is in O(n), using amortized analysis
- Matching loop: Θ(n) using amortized analysis
- q = # characters matched
- scan text left-to-right
- next character does not match
- next character matches
- Is all of P matched?
- Look for next match
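
The annotated steps above (scan left to right, fall back on a mismatch, advance on a match, report when all of P is matched, then look for the next match) can be sketched in Python as follows. This is an illustrative 0-based rendering, not the textbook pseudocode; an empty pattern is assumed to return no shifts.

```python
def compute_prefix_function(pattern: str) -> list[int]:
    """pi[q] = length of the longest prefix of pattern that is a proper suffix of pattern[:q + 1]."""
    pi = [0] * len(pattern)
    k = 0
    for q in range(1, len(pattern)):
        while k > 0 and pattern[k] != pattern[q]:
            k = pi[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        pi[q] = k
    return pi


def kmp_matcher(text: str, pattern: str) -> list[int]:
    """Return all 0-based shifts at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []
    pi = compute_prefix_function(pattern)
    shifts = []
    q = 0  # number of characters matched
    for i in range(n):  # scan text left-to-right
        while q > 0 and pattern[q] != text[i]:
            q = pi[q - 1]          # next character does not match
        if pattern[q] == text[i]:
            q += 1                 # next character matches
        if q == m:                 # is all of P matched?
            shifts.append(i - m + 1)
            q = pi[q - 1]          # look for next match
    return shifts


print(kmp_matcher("abababacaba", "ababaca"))  # [2]
print(kmp_matcher("aaaa", "aa"))              # [0, 1, 2]
```

The variable q only increases by one per text character and every fallback strictly decreases it, which is the intuition behind the amortized linear bound.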

41 Knuth-Morris-Pratt Algorithm: Amortized Analysis, Potential Method source: 91.503 textbook Cormen et al. k = current state of algorithm
Annotations on the pseudocode:
- initial potential value
- potential decreases
- Potential is never negative since π(k) >= 0 for all k
- potential increases by <= 1 in each execution of for loop body
- amortized cost of loop body is in O(1)
- Θ(m) loop iterations
- Total: Θ(m), which is in O(n)

42 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness...

43 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness...

44 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness...

45 Knuth-Morris-Pratt Algorithm source: 91.503 textbook Cormen et al. Correctness...

46 Homework #6

47 Midterm Exam Follow-up

