1 Morris-Pratt Algorithm
Advisor: Prof. R. C. T. Lee
Speaker: C. W. Lu
Morris (Jr), J. H., Pratt, V. R., A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley, 1970.

2 Morris-Pratt Algorithm
We are given a text T and a pattern P. We want to find all occurrences of P in T, performing the comparisons from left to right.
n: the length of T
m: the length of P
Example:
T = AAAAAATCACATTAGCAAAA (positions 1 to 20)
P = ATCACAGTATCA (positions 1 to 12)

3 The basic principle of the MP algorithm is step-by-step comparison. Initially, we compare T(1) with P(1). If T(1) ≠ P(1), we move the pattern one step to the right.
Example:
T = AAAAAATCACATTAGCAAAA (positions 1 to 20)
P = CTCACAGTATCA (positions 1 to 12)
Since T(1) = A ≠ P(1) = C, P is moved one step to the right, and P(1) is next compared with T(2).
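Taken on its own, the step-by-step principle is the brute-force matcher that MP improves on. A minimal Python sketch, assuming 0-based positions and a hypothetical function name `naive_search`; after every window it moves the pattern exactly one step:

```python
from typing import List


def naive_search(T: str, P: str) -> List[int]:
    """Brute-force matching: compare P with each window of T from left to
    right and always move the pattern one step to the right afterwards.
    Returns the 0-based start positions of all occurrences of P in T."""
    n, m = len(T), len(P)
    matches = []
    for s in range(n - m + 1):            # s = position of P(0) under T
        j = 0
        while j < m and T[s + j] == P[j]:
            j += 1
        if j == m:                        # the whole pattern matched
            matches.append(s)
    return matches


print(naive_search("AAAAAATCACATTAGCAAAA", "CTCACAGTATCA"))  # []
print(naive_search("abcabc", "abc"))                         # [0, 3]
```

In the worst case this takes O(nm) comparisons, because characters already matched are compared again after every one-step move.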

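4 Suppose the following condition occurs: should we move the pattern P only one step to the right? The answer is no in this case, because we may use Rule 1, the suffix of T to prefix of P rule.
[Figure: T (positions 1 to n) with a window from j to j+m-1 and P (positions 1 to m) placed under it; the text character b at position i+j-1 differs from the pattern character a at position i.]
Example:
T = AAAAAATCACATTAGCAAAA (positions 1 to 20)
P = ATCACAGTATCA (positions 1 to 12)
With P placed under T(6), P(1, 6) = ATCACA matches T(6, 11), but T(12) = T ≠ P(7) = G.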

5 Rule 1: The Suffix of T to Prefix of P Rule
For a moved pattern to have any chance of matching, the part of the current window that it still overlaps, which is a suffix of the window, must be equal to a prefix of the pattern.

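6 The Implication of Rule 1
Find the longest suffix v of the window that is equal to a prefix of P. Then skip the pattern so that its prefix v is aligned with the suffix v of the window.
[Figure: v at the end of the window in T; P moved so that its prefix v lies under it.]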

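7 Now suppose we know that a prefix U of the window in T is equal to the prefix U of P, and that the next characters differ: b in T and a in P. Since U is also a prefix of P, instead of finding the longest suffix of the window that is equal to a prefix of P, we may simply find the longest proper suffix v of U that is equal to a prefix of P.
[Figure: T contains Ub, P starts with Ua, and v is a suffix of U.]
Example:
T = AAAAACAGACATTAGCAAAA (positions 1 to 20)
P = CAGACAGTATCA (positions 1 to 12)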

8 Example
T = AAAAACAGACATTAGCAAAA (positions 1 to 20)
P = CAGACAGTATCA (positions 1 to 12)
With P placed under T(6), U = CAGACA = T(6, 11) = P(1, 6), and T(12) = T ≠ P(7) = G. In this case, the longest proper suffix of U that is equal to a prefix of P is CA. Thus, we may apply Rule 1 and move P four steps to the right, so that P(1, 2) = CA lies under T(10, 11).
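The computation on this slide can be checked with a direct, quadratic-time sketch. It only restates Rule 1 for a known U and is not the MP preprocessing itself; `longest_suffix_prefix` is a hypothetical name, and Python strings are 0-based:

```python
def longest_suffix_prefix(U: str, P: str) -> int:
    """Length of the longest proper suffix of U that is also a prefix of P
    (0 if there is none). Longer candidates are tried first."""
    for k in range(len(U) - 1, 0, -1):
        if U[len(U) - k:] == P[:k]:
            return k
    return 0


P = "CAGACAGTATCA"
U = "CAGACA"                     # the matched part T(6, 11) = P(1, 6)
v = longest_suffix_prefix(U, P)
print(v, U[len(U) - v:], len(U) - v)   # 2 CA 4  (length, suffix, steps moved)
```

The pattern moves |U| - |v| = 6 - 2 = 4 steps, which is the move described on the slide.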

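9 The MP Algorithm
Assume that we have already found the longest prefix U of the window in T that is equal to a prefix of P, and that the next characters differ: b in T and a in P.
[Figure: U followed by b in T; U followed by a in P.]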

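10 The MP Algorithm
Skip the pattern by using Rule 1: P is moved so that its prefix v lies under the suffix v of U, and the comparison resumes with b in T against the character c that follows v in P.
[Figure: v at the end of U in T, followed by b; P moved so that its prefix v, followed by c, lies under it.]
Given a substring U of T that is equal to a prefix of P, how do we find the longest proper suffix of U that is equal to a prefix of P? We do this by preprocessing P.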

11 Prefix Function
In the preprocessing phase, we construct a table named Prefix. From here on, positions start at 0, and P(a, b) denotes P(a)P(a+1)...P(b).
Definition:
– For location i, let j be the largest j ≤ i, if it exists, such that P(0, j-1) is a suffix of P(0, i); then Prefix(i) = j.
– If no proper prefix of P(0, i) is equal to a suffix of P(0, i), Prefix(i) = 0.
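The definition can be read literally as a brute-force procedure. This is a sketch for checking tables by hand, not an efficient construction; `prefix_by_definition` is a hypothetical name, and the bound j ≤ i is what keeps the prefix proper:

```python
from typing import List


def prefix_by_definition(P: str) -> List[int]:
    """Prefix(i) = largest j <= i such that P(0, j-1) is a suffix of P(0, i),
    or 0 if no such j exists. A literal, cubic-time reading of the
    definition, useful only for checking faster constructions."""
    table = []
    for i in range(len(P)):
        best = 0
        for j in range(1, i + 1):          # j <= i keeps the prefix proper
            if P[:j] == P[i - j + 1:i + 1]:
                best = j
        table.append(best)
    return table


print(prefix_by_definition("cgcgagcgcgc"))  # [0, 0, 1, 2, 0, 0, 1, 2, 3, 4, 3]
```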

12 Example
Note that we move the pattern i - Prefix(i-1) steps when a mismatch occurs at location i (i ≥ 1).
i:        0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P:        b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
Prefix:   0  0  1  0  1  2  3  4  0  1  2  3  4  5  6  7  8

13 How can we construct the Prefix table efficiently?
To compute Prefix(i), we look at Prefix(i-1). In the following example, since Prefix(11) = 4, we know that there exists a prefix of length 4 that is equal to the suffix of length 4 of P(0, 11). Besides, P(4) = P(12). We may conclude that Prefix(12) = Prefix(11) + 1 = 4 + 1 = 5.
i:        0  1  2  3  4  5  6  7  8  9 10 11 12
P:        a  g  c  a  a  c  c  g  a  g  c  a  a
Prefix:   0  0  0  1  1  0  0  0  1  2  3  4  5

14 Another Case
Consider the following example. Prefix(9) = 4, but P(4) ≠ P(10). Can we conclude that Prefix(10) = 0? No, we cannot.
i:        0  1  2  3  4  5  6  7  8  9 10
P:        c  g  c  g  a  g  c  g  c  g  c
Prefix:   0  0  1  2  0  0  1  2  3  4  ?

15 There exists a shorter prefix, of length 2, that is equal to a suffix of P(0, 9), and P(10) = P(2). We should conclude that Prefix(10) = 2 + 1 = 3.
i:        0  1  2  3  4  5  6  7  8  9 10
P:        c  g  c  g  a  g  c  g  c  g  c
Prefix:   0  0  1  2  0  0  1  2  3  4  3

16 In other words, we may follow pointers. Let j = Prefix(i-1). If P(i) ≠ P(j), it may be necessary to examine P(0, j-1) to see whether a prefix of P(0, j-1) is equal to a suffix of P(0, j-1); the longest such one has length Prefix(j-1), which is already known. Thus the Prefix function can be found recursively.

17 Construct the Prefix Function f
    f[0] = 0;
    for (i = 1; i < m; i++) {
        t = f[i-1];                      /* t is the value of f(i-1) */
        while (t >= 0) {
            if (P[i] == P[t]) {          /* extend the border of length t */
                f[i] = t + 1;
                break;
            } else if (t != 0) {
                t = f[t-1];              /* recursive: next shorter border */
            } else {
                f[i] = 0;                /* no border can be extended */
                break;
            }
        }
    }
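A direct Python transcription of the loop above, offered as a sketch. `build_prefix` is a hypothetical name, and `while (t >= 0)` becomes `while True` because every path through the loop body ends in `break` or in a smaller t. Since t can only decrease as often as it has previously been increased, the whole construction runs in O(m) time:

```python
from typing import List


def build_prefix(P: str) -> List[int]:
    """Prefix table built as on the slide: try to extend the border of
    P(0, i-1); if P(i) does not fit, fall back to shorter borders."""
    m = len(P)
    f = [0] * m
    for i in range(1, m):
        t = f[i - 1]                 # t is the value of f(i-1)
        while True:
            if P[i] == P[t]:         # the border of length t extends by P(i)
                f[i] = t + 1
                break
            if t != 0:
                t = f[t - 1]         # recursive step: next shorter border
            else:
                f[i] = 0             # not even P(0) matches P(i)
                break
    return f


print(build_prefix("agcaaccgagcaa"))  # [0, 0, 0, 1, 1, 0, 0, 0, 1, 2, 3, 4, 5]
print(build_prefix("cgcgagcgcgc"))    # [0, 0, 1, 2, 0, 0, 1, 2, 3, 4, 3]
```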

18 Example with P = cgcgagcgcgc (positions 0 to 10):
i = 1: t = f[i-1] = f[0] = 0; ∵ P[1] = g ≠ P[t] = P[0] = c ∴ f[1] = 0.
Prefix so far: 0 0

19 i = 2: t = f[i-1] = f[1] = 0; ∵ P[2] = c = P[t] = P[0] = c ∴ f[2] = t + 1 = 1.
Prefix so far: 0 0 1
i = 4: t = f[i-1] = f[3] = 2; P[4] = a ≠ P[t] = P[2] = c, and t ≠ 0, so t = f[t-1] = f[1] = 0; ∵ P[4] = a ≠ P[t] = P[0] = c ∴ f[4] = 0.
Prefix so far: 0 0 1 2 0

20 i = 9: t = f[i-1] = f[8] = 3; ∵ P[9] = g = P[t] = P[3] = g ∴ f[9] = t + 1 = 4.
Prefix so far: 0 0 1 2 0 0 1 2 3 4
i = 10: t = f[i-1] = f[9] = 4; P[10] = c ≠ P[t] = P[4] = a, and t ≠ 0, so t = f[t-1] = f[3] = 2; ∵ P[10] = c = P[t] = P[2] = c ∴ f[10] = t + 1 = 3.
Final Prefix table: 0 0 1 2 0 0 1 2 3 4 3
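To check traces like these mechanically, the construction can record every value of t it compares. A sketch in Python with the hypothetical name `prefix_with_trace`, using 0-based positions as in the slides:

```python
from typing import List, Tuple


def prefix_with_trace(P: str) -> Tuple[List[int], List[List[int]]]:
    """Same construction as the Prefix loop, but also records, for each i,
    every value of t for which P[i] was compared with P[t]."""
    m = len(P)
    f = [0] * m
    tried: List[List[int]] = [[] for _ in range(m)]
    for i in range(1, m):
        t = f[i - 1]
        while True:
            tried[i].append(t)
            if P[i] == P[t]:
                f[i] = t + 1
                break
            if t == 0:
                f[i] = 0
                break
            t = f[t - 1]             # next shorter border
    return f, tried


f, tried = prefix_with_trace("cgcgagcgcgc")
for i in (1, 2, 4, 9, 10):
    print(i, tried[i], f[i])
# 1 [0] 0
# 2 [0] 1
# 4 [2, 0] 0
# 9 [3] 4
# 10 [4, 2] 3
```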

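21 Example (MP search, positions start at 0)
T = acgccgcgagcgcgctcaaa (positions 0 to 19)
P = cgcgagcgcgc (positions 0 to 10)
Number of steps P moves after a mismatch at location i of P (i = 11 stands for a complete match; the value for i = 0 is 1 by convention):
i:                0  1  2  3  4  5  6  7  8  9 10 11
i - Prefix(i-1):  1  1  2  2  2  5  6  6  6  6  6  8
P(0) starts under T(0). T(0) = a ≠ P(0) = c, a mismatch at i = 0: shift by 1.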
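The shift row of this table can be generated from the Prefix table. A sketch assuming 0-based positions; `prefix_table` and `mp_shifts` are hypothetical names, and `prefix_table` is a common compact rewrite of the slide-17 loop that produces the same values with different control flow:

```python
from typing import List


def prefix_table(P: str) -> List[int]:
    """Prefix(i) for i = 0..m-1."""
    f = [0] * len(P)
    k = 0                             # length of the current border
    for i in range(1, len(P)):
        while k > 0 and P[i] != P[k]:
            k = f[k - 1]              # fall back to a shorter border
        if P[i] == P[k]:
            k += 1
        f[i] = k
    return f


def mp_shifts(P: str) -> List[int]:
    """shift[i] = i - Prefix(i-1) for a mismatch at location i, with
    shift[0] = 1; shift[m] is the move made after a complete match."""
    f = prefix_table(P)
    return [1] + [i - f[i - 1] for i in range(1, len(P) + 1)]


print(mp_shifts("cgcgagcgcgc"))  # [1, 1, 2, 2, 2, 5, 6, 6, 6, 6, 6, 8]
```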

22 Example (continued; T, P, and the shift table as in slide 21)
P(0) now starts under T(1). P(0, 2) = cgc matches T(1, 3), but T(4) = c ≠ P(3) = g, a mismatch at i = 3: shift by 2.

23 Example (continued)
P(0) now starts under T(3). P(0) = c matches T(3), but T(4) = c ≠ P(1) = g, a mismatch at i = 1: shift by 1.

24 Example (continued)
P(0) now starts under T(4), and P(0, 10) = T(4, 14): Match! After the complete match (i = 11), P is shifted by 8.
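The whole run in slides 21 to 24 can be reproduced with a short search loop. A sketch under these assumptions: 0-based positions, hypothetical names `prefix_table` and `mp_search`, and a search that, after each move, keeps the Prefix(i-1) characters already known to match instead of comparing them again:

```python
from typing import List, Tuple


def prefix_table(P: str) -> List[int]:
    """Prefix(i) for i = 0..m-1."""
    f = [0] * len(P)
    k = 0
    for i in range(1, len(P)):
        while k > 0 and P[i] != P[k]:
            k = f[k - 1]
        if P[i] == P[k]:
            k += 1
        f[i] = k
    return f


def mp_search(T: str, P: str) -> Tuple[List[int], List[int]]:
    """Morris-Pratt search. Returns (0-based match positions, shifts made).
    After stopping at location j >= 1, P moves j - Prefix(j-1) steps and the
    first Prefix(j-1) characters are not compared again; after a mismatch
    at location 0, P moves one step."""
    f = prefix_table(P)
    n, m = len(T), len(P)
    matches: List[int] = []
    shifts: List[int] = []
    s = j = 0            # s: position of P(0) in T; j: characters matched
    while s + m <= n:
        while j < m and T[s + j] == P[j]:
            j += 1
        if j == m:
            matches.append(s)
        shift = 1 if j == 0 else j - f[j - 1]
        shifts.append(shift)
        s += shift
        j = 0 if j == 0 else f[j - 1]
    return matches, shifts


print(mp_search("acgccgcgagcgcgctcaaa", "cgcgagcgcgc"))  # ([4], [1, 2, 1, 8])
```

The shifts 1, 2, 1 and 8 are the ones shown on the slides, and the single match starts at T(4).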

25 Time Complexity
Preprocessing phase: O(m) time and space.
Searching phase: O(n + m) time.
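The linear bound for the search can be observed by counting comparisons. Each successful comparison advances the text position s + j, which never moves backwards, and each failed one is followed by a move of at least one step, so the search phase makes at most 2n comparisons. A sketch with hypothetical names and 0-based positions:

```python
from typing import List


def prefix_table(P: str) -> List[int]:
    """Prefix(i) for i = 0..m-1."""
    f = [0] * len(P)
    k = 0
    for i in range(1, len(P)):
        while k > 0 and P[i] != P[k]:
            k = f[k - 1]
        if P[i] == P[k]:
            k += 1
        f[i] = k
    return f


def mp_comparisons(T: str, P: str) -> int:
    """Number of character comparisons made by the MP search phase."""
    f = prefix_table(P)
    n, m = len(T), len(P)
    count = 0
    s = j = 0
    while s + m <= n:
        while j < m:
            count += 1
            if T[s + j] != P[j]:
                break
            j += 1
        shift = 1 if j == 0 else j - f[j - 1]
        s += shift
        j = 0 if j == 0 else f[j - 1]
    return count


for T, P in [("a" * 50, "aaa"), ("ab" * 25, "aba"), ("a" * 49 + "b", "aab")]:
    print(len(T), mp_comparisons(T, P))  # the count never exceeds 2 * len(T)
```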

26 References
AHO, A.V., HOPCROFT, J.E., ULLMAN, J.D., 1974, The design and analysis of computer algorithms, 2nd Edition, Chapter 9, pp. 317-361, Addison-Wesley Publishing Company.
BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp. 337-377, Masson, Paris.
CROCHEMORE, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp. 1-53, Oxford University Press.
HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, pp. 99-110.
HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.
MORRIS (Jr), J.H., PRATT, V.R., 1970, A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley.

27 Knuth-Morris-Pratt Algorithm
KNUTH, D.E., MORRIS (Jr), J.H., PRATT, V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1), pp. 323-350.

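28 In the MP algorithm, there are two cases when moving the pattern from left to right. Suppose that a prefix w of P has been matched against T, and that the next characters differ: y in T and x in P.
Case 1: There exists a nonempty proper suffix of w that is equal to a prefix of w. P is moved so that the longest such prefix u lies under the suffix u of w.
Case 2: No nonempty proper suffix of w is equal to a prefix of w. P is moved past w.
[Figure: T reads wy and P reads wx; the pattern is shown before and after the move in each case.]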

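29 The KMP algorithm improves the MP algorithm, and it also has two cases. As in the MP algorithm, w is the matched prefix of P, y is the next text character, and x ≠ y is the next pattern character; z denotes the character of P that follows a prefix u.
Case 1: There exists a nonempty proper suffix u of w that is equal to a prefix of w and such that z ≠ x. P is moved so that the longest such prefix u lies under the suffix u of w.
Case 2: No nonempty proper suffix u of w is equal to a prefix of w with z ≠ x. P is moved past w.
[Figure: T reads wy and P reads wx; the moved pattern starts with uz.]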

30 KMP example with P = bcbabcbaebcbabcba:
i:          0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16
P:          b  c  b  a  b  c  b  a  e  b  c  b  a  b  c  b  a
KMPtable:  -1  0 -1  1 -1  0 -1  1  4 -1  0 -1  1 -1  0 -1  1
[Figure: a text T aligned with P.]
A mismatch occurs at location 4 of P. Move P (4 - KMPtable[4]) = 4 - (-1) = 5 steps.

31 (Same P and KMPtable as in slide 30.)
[Figure: a text T aligned with P.]
A mismatch occurs at location 5 of P. Move P (5 - KMPtable[5]) = 5 - 0 = 5 steps.

32 (Same P and KMPtable as in slide 30.)
[Figure: a text T aligned with P.]
A mismatch occurs at location 8 of P. Move P (8 - KMPtable[8]) = 8 - 4 = 4 steps.
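The KMP table used in these slides can be computed from the MP border lengths. A Python sketch under stated assumptions: 0-based locations, -1 (as in the slides) meaning that no usable border exists, and the hypothetical name `kmp_table`. For each location it starts from the longest border and skips any border whose next character z equals x = P[i], since that border would fail again:

```python
from typing import List


def kmp_table(P: str) -> List[int]:
    """KMPtable[i] for a mismatch at location i (0-based); -1 means that P
    is moved past the mismatched text character."""
    m = len(P)
    mp = [-1] * (m + 1)   # mp[i] = longest proper border of P(0, i-1), mp[0] = -1
    k = -1
    for i in range(m):
        while k >= 0 and P[k] != P[i]:
            k = mp[k]
        k += 1
        mp[i + 1] = k
    table = [-1] * m
    for i in range(m):
        b = mp[i]
        if b >= 0 and P[b] == P[i]:   # z == x: reuse the answer for location b
            table[i] = table[b]
        else:                         # z != x, or no border at all
            table[i] = b
    return table


t = kmp_table("bcbabcbaebcbabcba")
for i in (4, 5, 8):
    print(i, t[i], i - t[i])   # location, KMPtable[i], steps moved
# 4 -1 5
# 5 0 5
# 8 4 4
```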

33 Example
T = acacgagcaccgcgctcaaa (positions 0 to 19)
P = cacgagcacgc (positions 0 to 10)
With P(0) under T(1), P(0, 8) matches T(1, 9), but T(10) = c ≠ P(9) = g.
[Figure: the next position of P under the MP algorithm and under the KMP algorithm.]
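Both pictures can be reproduced with one table-driven search loop. In this sketch (hypothetical names `mp_table`, `kmp_table`, `search`; 0-based positions), MP uses the plain border table and KMP the improved table; after stopping at location j, P moves j - table[j] steps. For the mismatch at P(9), MP moves P 6 steps and KMP moves it 8 steps:

```python
from typing import List, Tuple


def mp_table(P: str) -> List[int]:
    """mp[i] = length of the longest proper border of P(0, i-1); mp[0] = -1."""
    mp = [-1] * (len(P) + 1)
    k = -1
    for i in range(len(P)):
        while k >= 0 and P[k] != P[i]:
            k = mp[k]
        k += 1
        mp[i + 1] = k
    return mp


def kmp_table(P: str) -> List[int]:
    """Like mp_table, but skips borders followed by the same character as P(i)."""
    mp = mp_table(P)
    nxt = mp[:]                      # nxt[m] (after a full match) stays mp[m]
    for i in range(len(P)):
        b = mp[i]
        if b >= 0 and P[b] == P[i]:
            nxt[i] = nxt[b]
    return nxt


def search(T: str, P: str, nxt: List[int]) -> Tuple[List[int], List[int]]:
    """After stopping at location j, P moves j - nxt[j] steps and the first
    max(nxt[j], 0) characters are known to match and are not compared again."""
    n, m = len(T), len(P)
    matches, shifts = [], []
    s = j = 0
    while s + m <= n:
        while j < m and T[s + j] == P[j]:
            j += 1
        if j == m:
            matches.append(s)
        shift = j - nxt[j]
        shifts.append(shift)
        s += shift
        j = max(nxt[j], 0)
    return matches, shifts


T, P = "acacgagcaccgcgctcaaa", "cacgagcacgc"
print(search(T, P, mp_table(P)))   # ([], [1, 6, 2, 1])
print(search(T, P, kmp_table(P)))  # ([], [1, 8, 1])
```

The KMP table skips the border cac, because it is followed by g, the same character that just failed.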

34 Time Complexity
Preprocessing phase: O(m) time and space.
Searching phase: O(n + m) time.

35 References
AHO, A.V., 1990, Algorithms for finding patterns in strings, in Handbook of Theoretical Computer Science, Volume A, Algorithms and complexity, J. van Leeuwen ed., Chapter 5, pp. 255-300, Elsevier, Amsterdam.
AOE, J.-I., 1994, Computer algorithms: string pattern matching strategies, IEEE Computer Society Press.
BAASE, S., VAN GELDER, A., 1999, Computer Algorithms: Introduction to Design and Analysis, 3rd Edition, Chapter 11, Addison-Wesley Publishing Company.
BAEZA-YATES, R., NAVARRO, G., RIBEIRO-NETO, B., 1999, Indexing and Searching, in Modern Information Retrieval, Chapter 8, pp. 191-228, Addison-Wesley.
BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp. 337-377, Masson, Paris.
CORMEN, T.H., LEISERSON, C.E., RIVEST, R.L., 1990, Introduction to Algorithms, Chapter 34, pp. 853-885, MIT Press.
CROCHEMORE, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp. 1-53, Oxford University Press.
CROCHEMORE, M., HANCART, C., 1999, Pattern Matching in Strings, in Algorithms and Theory of Computation Handbook, M.J. Atallah ed., Chapter 11, pp. 11-1 to 11-28, CRC Press Inc., Boca Raton, FL.
CROCHEMORE, M., LECROQ, T., 1996, Pattern matching and text compression algorithms, in CRC Computer Science and Engineering Handbook, A. Tucker ed., Chapter 8, pp. 162-202, CRC Press Inc., Boca Raton, FL.
CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.
GONNET, G.H., BAEZA-YATES, R.A., 1991, Handbook of Algorithms and Data Structures in Pascal and C, 2nd Edition, Chapter 7, pp. 251-288, Addison-Wesley Publishing Company.

36 References (continued)
GOODRICH, M.T., TAMASSIA, R., 1998, Data Structures and Algorithms in JAVA, Chapter 11, pp. 441-467, John Wiley & Sons.
GUSFIELD, D., 1997, Algorithms on strings, trees, and sequences: Computer Science and Computational Biology, Cambridge University Press.
HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, pp. 99-110.
HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.
KNUTH, D.E., MORRIS (Jr), J.H., PRATT, V.R., 1977, Fast pattern matching in strings, SIAM Journal on Computing 6(1), pp. 323-350.
SEDGEWICK, R., 1988, Algorithms, Chapter 19, pp. 277-292, Addison-Wesley Publishing Company.
SEDGEWICK, R., 1988, Algorithms in C, Chapter 19, Addison-Wesley Publishing Company.
SEDGEWICK, R., FLAJOLET, P., 1996, An Introduction to the Analysis of Algorithms, Chapter ?, pp. ??-??, Addison-Wesley Publishing Company.
STEPHEN, G.A., 1994, String Searching Algorithms, World Scientific.
WATSON, B.W., 1995, Taxonomies and Toolkits of Regular Language Algorithms, Ph. D. Thesis, Eindhoven University of Technology, The Netherlands.
WIRTH, N., 1986, Algorithms & Data Structures, Chapter 1, pp. 17-72, Prentice-Hall.

37 Simon Algorithm
SIMON, I., 1993, String matching algorithms and automata, 1st American Workshop on String Processing, pp. 151-157.

38 (Recap of slide 29: the two cases of the KMP algorithm.)

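39 The Simon algorithm improves the KMP algorithm, and it also has two cases. Here w is the matched prefix of P, y is the text character that follows w, x ≠ y is the pattern character that follows w, and z denotes the character of P that follows a prefix u.
Case 1: There exists a nonempty proper suffix u of w that is equal to a prefix of w and such that z = y. P is moved so that the longest such prefix u lies under the suffix u of w; z then already matches y.
Case 2: No nonempty proper suffix u of w is equal to a prefix of w with z = y. P is moved past w.
[Figure: T reads wy and P reads wx; the moved pattern starts with uz.]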

40 Example
T = acacgagcaccgcgctcaaa (positions 0 to 19)
P = cacgagcacgc (positions 0 to 10)
[Figure: the next position of P after the mismatch T(10) = c ≠ P(9) = g under the MP algorithm, the KMP algorithm, and the Simon algorithm.]
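For the mismatch in this example (P(0) under T(1), T(10) = c ≠ P(9) = g), the Simon rule can be sketched as a walk down the chain of borders of w = P(0, 8) until the character after the border equals the text character y. This is a sketch with the hypothetical names `mp_table` and `simon_shift`, and it treats the empty border as a valid u, which is an assumption of this sketch:

```python
from typing import List


def mp_table(P: str) -> List[int]:
    """mp[i] = length of the longest proper border of P(0, i-1); mp[0] = -1."""
    mp = [-1] * (len(P) + 1)
    k = -1
    for i in range(len(P)):
        while k >= 0 and P[k] != P[i]:
            k = mp[k]
        k += 1
        mp[i + 1] = k
    return mp


def simon_shift(P: str, i: int, y: str) -> int:
    """Steps P moves after P(i) mismatches the text character y: choose the
    longest border u of w = P(0, i-1), the empty border included, whose next
    character z = P(|u|) equals y. Returns i + 1 (P moves past y) if none does."""
    mp = mp_table(P)
    k = mp[i]                 # longest proper border of w (-1 when i == 0)
    while k >= 0:
        if P[k] == y:         # z == y: the new alignment matches k + 1 characters
            return i - k
        k = mp[k]             # next shorter border
    return i + 1


print(simon_shift("cacgagcacgc", 9, "c"))   # 9
```

Here the borders of w = cacgagcac have lengths 3, 1 and 0 and are followed by g, a and c. Only the empty border is followed by c, so P moves 9 steps and P(0) is already known to match T(10), compared with 6 steps for MP and 8 for KMP.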

41 References
BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp. 337-377, Masson, Paris.
CROCHEMORE, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp. 1-53, Oxford University Press.
CROCHEMORE, M., HANCART, C., 1997, Automata for Matching Patterns, in Handbook of Formal Languages, Volume 2, Linear Modeling: Background and Application, G. Rozenberg and A. Salomaa ed., Chapter 9, pp. 399-462, Springer-Verlag, Berlin.
CROCHEMORE, M., RYTTER, W., 1994, Text Algorithms, Oxford University Press.
HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, pp. 99-110.
HANCART, C., 1993, On Simon's string searching algorithm, Inf. Process. Lett. 47(2), pp. 95-99.
HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph. D. Thesis, University Paris 7, France.
SIMON, I., 1993, String matching algorithms and automata, in Proceedings of 1st American Workshop on String Processing, R.A. Baeza-Yates and N. Ziviani ed., pp. 151-157, Universidade Federal de Minas Gerais, Brazil.
SIMON, I., 1994, String matching algorithms and automata, in Results and Trends in Theoretical Computer Science, Graz, Austria, Karhumäki, Maurer and Rozenberg ed., pp. 386-395, Lecture Notes in Computer Science 814, Springer Verlag.

42 Thanks for your attention.

