String Matching


1 String Matching
Input: Strings P (pattern) and T (text); |P| = m, |T| = n.
Output: Indices of all occurrences of P in T.
Example: T = discombobulate
  P        output
  combo    4 (i.e., with shift 3)
  ate      12
  later    15 > |T| (no occurrence of P)

2 Applications
Text retrieval
Computational biology
- DNA is a one-dimensional (1-D) string of characters: A’s, G’s, C’s, T’s.
    Searching for DNA patterns
    Comparing two or more DNA strings for similarities
    Reconstructing DNA strings from overlapping fragments
- All information for 3-D protein folding is contained in the protein sequence itself and is independent of the environment.

3 Sliding the Pattern Template
T = b i o l o g y, P = l o g i c, n = 7, m = 5
  Pattern aligned at T[1]: T[1] ≠ P[1]
  Pattern aligned at T[2]: T[2] ≠ P[1]
  Pattern aligned at T[3]: T[3] ≠ P[1]
  Pattern aligned at T[4]: T[4] = P[1], T[5] = P[2], T[6] = P[3], but T[7] ≠ P[4]
No match!

4 Another Example
T = b i o l o g i c a l, P = l o g i c, n = 10, m = 5
  Pattern aligned at T[4]: T[4..8] = l o g i c = P[1..5]
Match found! Return 4.

5 The Naive Matcher
Pattern: P[1..m]
Text: T[1..n]
Naive-String-Matcher(T, P)
  // find all occurrences of P in T.
  for s = 1 to n − m + 1 do
    if P[1..m] = T[s..s+m−1]
      then print “Pattern occurs at index” s
[Figure: P[1..m] aligned under T[s..s+m−1]]
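A minimal Python sketch of the naive matcher above. The function name and the choice to return a list rather than print are assumptions for illustration. Python strings are 0-based, so the code adds 1 to report the same 1-based indices the slides use.

```python
def naive_string_matcher(text: str, pattern: str) -> list[int]:
    """Return the 1-based indices of every occurrence of pattern in text."""
    n, m = len(text), len(pattern)
    matches = []
    # s is a 0-based shift; it corresponds to index s + 1 on the slides.
    for s in range(n - m + 1):
        # Compare the whole pattern against the window text[s .. s+m-1].
        if text[s:s + m] == pattern:
            matches.append(s + 1)
    return matches


print(naive_string_matcher("discombobulate", "combo"))  # [4]
print(naive_string_matcher("discombobulate", "ate"))    # [12]
print(naive_string_matcher("biology", "logic"))         # []
```

When the pattern is longer than the text, the range is empty and the function returns an empty list.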

6 Time Complexity
m(n − m + 1) comparisons in the worst case: there are n − m + 1 blocks of m chars (starting at positions 1, 2, 3, ..., n − m + 1 of T), each requiring m comparisons against P.
Time complexity is O(mn)!
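To see the worst-case count concretely, the sketch below counts character comparisons one at a time, stopping at the first mismatch in each block. The helper name `count_comparisons` and the test strings are assumptions chosen for illustration; a text and pattern made of a single repeated character force all m comparisons in every block.

```python
def count_comparisons(text: str, pattern: str) -> int:
    """Count the character comparisons the naive matcher performs."""
    n, m = len(text), len(pattern)
    comparisons = 0
    for s in range(n - m + 1):
        for j in range(m):
            comparisons += 1
            if text[s + j] != pattern[j]:
                break  # mismatch: move on to the next shift
    return comparisons


n, m = 10, 3
print(count_comparisons("a" * n, "a" * m))   # 24
print(m * (n - m + 1))                       # 24
print(count_comparisons("biology", "logic")) # 3
```

The last call shows the other extreme: when every block fails on its first character, only n − m + 1 comparisons are made.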

7 Finite Automaton
A finite automaton consists of
  a finite set Q of states
  a start state
  a set A of accepting states
  a finite input alphabet Σ
  a transition function δ : Q × Σ → Q.
Example 1: states 0 (start state) and 1 (accepting state), with transition function
  state   input a   input b
  0       1         0
  1       0         0

8 Accepting a String
  input    state sequence   accepts?
  aabba    0 1 0 0 0 1      Yes
  bbabb    0 0 0 1 0 0      No
Always begins at the start state. Accepts a string if it ends at an accepting state after reading all string chars. Otherwise, it rejects the string.
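The two runs above can be reproduced with a short simulation. This is a sketch that assumes the two-state automaton of Example 1, with state 0 as the start state and state 1 as the only accepting state (as the listed state sequences imply). The names `DELTA`, `run`, and `accepts` are illustrative.

```python
# Transition function of Example 1: DELTA[state][char] -> next state.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 0, "b": 0},
}
START = 0
ACCEPTING = {1}


def run(word: str) -> list[int]:
    """Return the sequence of states visited, beginning with the start state."""
    states = [START]
    for ch in word:
        states.append(DELTA[states[-1]][ch])
    return states


def accepts(word: str) -> bool:
    """A word is accepted if the automaton ends in an accepting state."""
    return run(word)[-1] in ACCEPTING


print(run("aabba"), accepts("aabba"))  # [0, 1, 0, 0, 0, 1] True
print(run("bbabb"), accepts("bbabb"))  # [0, 0, 0, 1, 0, 0] False
```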

9 A String Matching Automaton
Pattern P = a a b a
Transition function (column P gives the pattern char that advances each state):
  state   input a   input b   P
  0       1         0         a
  1       2         0         a
  2       2         3         b
  3       4         0         a
  4       2         0
Ex. T = a b b a a a b a a b a
  state sequence: 0 1 0 0 1 2 2 3 4 2 3 4
Pattern occurs at indices 5 and 8!
aba not rescanned due to transition 4 → 2
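The state sequence for this example can be checked by stepping through the transition table directly. The sketch below hard-codes the table for P = aaba as read from the slide. The names `DELTA`, `M`, and `trace` are assumptions for illustration.

```python
# Transition table for the pattern "aaba": DELTA[state][char] -> next state.
DELTA = {
    0: {"a": 1, "b": 0},
    1: {"a": 2, "b": 0},
    2: {"a": 2, "b": 3},
    3: {"a": 4, "b": 0},
    4: {"a": 2, "b": 0},
}
M = 4  # pattern length; state M is the accepting state


def trace(text: str) -> tuple[list[int], list[int]]:
    """Return (state sequence, 1-based match indices) for the given text."""
    states = [0]
    matches = []
    for i, ch in enumerate(text, start=1):
        q = DELTA[states[-1]][ch]
        states.append(q)
        if q == M:
            matches.append(i - M + 1)  # the match ends at i, so it starts at i - M + 1
    return states, matches


states, matches = trace("abbaaabaaba")
print(states)   # [0, 1, 0, 0, 1, 2, 2, 3, 4, 2, 3, 4]
print(matches)  # [5, 8]
```

Each text character is read exactly once. After the first match the automaton moves from state 4 to state 2 instead of restarting from state 0.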

10 Key Ideas of Automaton Matching Do not rescan chars of T that have already been examined. Slide pattern forward by more than one position if possible.

11 The Automaton Matcher
Finite-Automaton-Matcher(T, δ, m)
  n = length[T]
  q = 0  // current state
  for i = 1 to n do
    q = δ(q, T[i])  // δ function precomputed
    if q = m  // match succeeds
      then print “Pattern occurs at index” i − m + 1
O(n) if the state transition function δ is available. But computing δ requires O(m³ |Σ|)! // details omitted.
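The slides omit how the transition function is built, so the following Python sketch fills that in with one straightforward construction, which may not be the one the author had in mind. For each state q and character c, it takes the longest prefix of P that is a suffix of P[1..q] followed by c. Done this directly, the construction takes O(m³ |Σ|) time. The function names, the dictionary representation of the transition function, and the explicit `alphabet` argument are all assumptions for illustration.

```python
def compute_transition_function(pattern: str, alphabet: str) -> dict[tuple[int, str], int]:
    """Build delta[(state, char)] -> next state for a string-matching automaton."""
    m = len(pattern)
    delta = {}
    for q in range(m + 1):
        for ch in alphabet:
            seen = pattern[:q] + ch  # chars matched so far plus the new one
            k = min(m, q + 1)
            # Shrink k until pattern[:k] is a suffix of seen (k = 0 always works).
            while not seen.endswith(pattern[:k]):
                k -= 1
            delta[(q, ch)] = k
    return delta


def finite_automaton_matcher(text: str, delta: dict[tuple[int, str], int], m: int) -> list[int]:
    """Return 1-based indices of all pattern occurrences, reading each char once."""
    q = 0  # current state
    matches = []
    for i, ch in enumerate(text, start=1):
        q = delta[(q, ch)]
        if q == m:  # match succeeds, ending at position i
            matches.append(i - m + 1)
    return matches


pattern = "aaba"
delta = compute_transition_function(pattern, "ab")
print(finite_automaton_matcher("abbaaabaaba", delta, len(pattern)))  # [5, 8]
```

This sketch assumes every character of the text belongs to `alphabet`; a character outside it would raise a `KeyError`.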

