

1 Morris-Pratt algorithm
Advisor: Prof. R. C. T. Lee
Reporter: C. S. Ou
Morris (Jr) J. H., Pratt V. R., A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley, 1970.

2 Morris-Pratt algorithm
We are given a text T and a pattern P, and we want to find all occurrences of P in T, performing the comparisons from left to right.
n: the length of T
m: the length of P
Example (positions are numbered from 1):
T = AAAAAATCACATTAGCAAAA (n = 20)
P = ATCACAGTATCA (m = 12)

3 Rule 1: The Partial Window Rule
Instead of a complete window, whose length is equal to the length of the pattern, we may use a prefix of a complete window (a partial window) to match a prefix of the pattern.
[Figure: text T and pattern P, with a complete window of T marked.]
How do we get the partial window?

4 The basic principle of the MP algorithm is still step-by-step comparison. Initially, the length of the partial window is 1: we compare T(1) with P(1). If T(1) ≠ P(1), we move the pattern one step to the right.
Example:
T = AAAAAATCACATTAGCAAAA
P = CTCACAGTATCA
Since T(1) = A ≠ P(1) = C, P is moved one step to the right:
T = AAAAAATCACATTAGCAAAA
P =  CTCACAGTATCA
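The step-by-step comparison above, where P always moves one step to the right, can be sketched in Python as a baseline. This is a minimal sketch and not part of the original slides: the name `naive_search` is hypothetical, strings are indexed from 0 internally, and reported positions are converted to the slides' 1-indexed convention.

```python
def naive_search(t: str, p: str) -> list[int]:
    """Step-by-step comparison: after every attempt, P moves one step right.

    Returns the 1-indexed starting positions of all occurrences of p in t.
    """
    n, m = len(t), len(p)
    occurrences = []
    for start in range(n - m + 1):           # 0-indexed alignment of P(1)
        k = 0
        while k < m and t[start + k] == p[k]:
            k += 1                            # extend the partial window
        if k == m:
            occurrences.append(start + 1)     # convert to 1-indexed position
        # whether or not P matched, the next attempt starts one step right
    return occurrences


print(naive_search("ACACGTACACACAGTATCAA", "CACACAGTATCA"))  # [8]
```

Each attempt may re-read text characters that were already compared in an earlier attempt; the two rules that follow are what let MP avoid this.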

5 If T(1) = P(1), we extend the partial window until a mismatch is found.
Example:
T = ATCACAGCACATTAGCAAAA
P = ATCACAGTATCA
Here T(1..7) = P(1..7) = ATCACAG, and the first mismatch is T(8) = C ≠ P(8) = T.
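A small helper makes the partial-window extension concrete. This sketch assumes a hypothetical name `extend_partial_window` and 1-indexed alignment positions, as on the slides. It returns how many leading characters of P agree with T from the given alignment, so the first mismatch (if any) is at P(length + 1).

```python
def extend_partial_window(t: str, p: str, pos: int) -> int:
    """Align P(1) with T(pos) (1-indexed) and extend the partial window
    until a mismatch is found or all of P has matched.

    Returns the length of the partial window, i.e. the number of leading
    characters of P that agree with T starting at T(pos).
    """
    start = pos - 1                               # 0-indexed position in t
    k = 0
    while k < len(p) and start + k < len(t) and t[start + k] == p[k]:
        k += 1
    return k


# Example above: T(1..7) = P(1..7) = ATCACAG, then T(8) = C differs from P(8) = T.
print(extend_partial_window("ATCACAGCACATTAGCAAAA", "ATCACAGTATCA", 1))  # 7
```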

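6 Suppose the following condition occurs. Should we move pattern P only one step to the right? In this case the answer is no, because we may use Rule 2, the Suffix of T to Prefix of P Rule.
[Figure: P aligned with T starting at position j, so that P(m) lies under T(j+m-1); the partial window ends with a mismatch between T(i+j-1) = b and P(i) = a.]
Example:
T = AAAAAATCACATTAGCAAAA
P =      ATCACAGTATCA
For example, with P aligned at T(6), P(1..6) = ATCACA matches and T(12) = T ≠ P(7) = G.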

7 Rule 2: The Suffix of T to Prefix of P Rule
For a window to have any chance of matching the pattern, there must be a suffix of the window which is equal to a prefix of the pattern.
[Figure: a window of T and the pattern P.]

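8 The Implication of Rule 2: Find the longest suffix v of the window which is equal to some prefix of P, and skip the pattern so that this prefix v of P is aligned with the suffix v of the window.
[Figure: the suffix v of the window in T, and P shifted so that its prefix v lies under it.]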

9 Now, we know that a prefix U of the window in T is equal to the prefix U of P. Thus, instead of finding the longest suffix of the window in T that is equal to a prefix of P, we may simply find the longest suffix v of U (taken from P) that is equal to a prefix of P.
[Figure: the window in T reads Ub and P reads Ua, with a ≠ b; v is a suffix of U.]
Example:
T = AAAAACACACATTAGCAAAA
P = CACACAGTATCA

10 Example:
T = AAAAACACACATTAGCAAAA
P = CACACAGTATCA
In this case, the longest suffix of U which is equal to a prefix of P is CA. Thus, we may apply Rule 2 and move P to the right so that its prefix CA is aligned with the suffix CA of U in T.
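Rule 2 applied to U can be checked by brute force. The sketch below is only an illustration (the name `rule2_shift` is hypothetical): it tries every proper suffix of U from longest to shortest and returns the length of the longest one that is also a prefix of U, together with the shift |U| - |v|. Because U is itself a prefix of P, a prefix of U is also a prefix of P. For instance, for U = CACA the longest such suffix is CA, and P moves 2 positions.

```python
def rule2_shift(u: str) -> tuple[int, int]:
    """Brute-force Rule 2 on the matched part U (a prefix of P).

    Returns (|v|, shift), where v is the longest proper suffix of U that is
    also a prefix of U, and shift = |U| - |v| is how far P moves right.
    """
    for length in range(len(u) - 1, 0, -1):      # longest candidates first
        if u[len(u) - length:] == u[:length]:
            return length, len(u) - length
    # No such suffix: P moves past U, and always by at least one step.
    return 0, max(len(u), 1)


print(rule2_shift("CACA"))     # (2, 2): v = CA
print(rule2_shift("CACACA"))   # (4, 2): v = CACA
```

Doing this search from scratch at every mismatch costs time quadratic in |U|, which is why the answer is computed once per prefix of P in a preprocessing phase.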

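11 The MP Algorithm
Assume that we have already found the longest prefix U of the window in T which is equal to a prefix of P, so that T continues with b where P continues with a (a ≠ b).
[Figure: the window in T reads Ub and P reads Ua.]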

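12 The MP Algorithm
Skip the pattern by using Rule 1 and Rule 2: P is shifted so that its prefix v lies under the suffix v of U, and the comparison resumes between b and the character c of P that follows v.
[Figure: before and after the shift; v is a suffix of U in T and a prefix of P, followed by a and b before the shift and by c and b after it.]
Given a prefix U of the window in T which is equal to a prefix of P, how do we know the longest proper suffix of U which is equal to some prefix of U? Because U is also a prefix of P, the answer depends only on P, so we compute it in advance by preprocessing.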

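13 Preprocessing phase
The prefix function f(j), 1 ≤ j ≤ m, is the length of the longest proper suffix of P(1..j) that is also a prefix of P. It can be written as follows:
$$f(1) = 0, \qquad f(j) = \begin{cases} f^{s}(j-1) + 1, & \text{where } s \ge 1 \text{ is the smallest integer with } P\big(f^{s}(j-1) + 1\big) = P(j),\\ 0, & \text{if no such } s \text{ exists,} \end{cases} \qquad 2 \le j \le m,$$
where $f^{1}(j) = f(j)$ and $f^{x}(j) = f\big(f^{x-1}(j)\big)$ for $x > 1$; the chain stops once $f^{s}(j-1) = 0$ has been tried.
Let
$$g(1) = -1, \qquad g(j) = f(j-1) \quad \text{for } 2 \le j \le m+1.$$
When a mismatch occurs at P(j) (with j = m + 1 after a complete match), the MP algorithm shifts P to the right along T by j - g(j) - 1 positions.
Example: P = ATCACATCATCA
j             1  2  3  4  5  6  7  8  9 10 11 12 13
P(j)          A  T  C  A  C  A  T  C  A  T  C  A
f(j)          0  0  0  1  0  1  2  3  4  2  3  4
g(j)         -1  0  0  0  1  0  1  2  3  4  2  3  4
j - g(j) - 1  1  1  2  3  3  5  5  5  5  5  8  8  8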
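The prefix function f, together with g(1) = -1, g(j) = f(j - 1) and the shift j - g(j) - 1, can be computed with a short Python sketch. It follows the slide's conventions under one assumption about storage: j is 1-indexed as in the table, while Python lists are 0-indexed, so f(j) is stored in `f[j - 1]` and g(j) in `g[j - 1]`. The names `prefix_function` and `mp_tables` are hypothetical.

```python
def prefix_function(p: str) -> list[int]:
    """Return [f(1), ..., f(m)] using the recurrence on f^s(j - 1)."""
    m = len(p)
    f = [0] * m                      # f(1) = 0
    for j in range(2, m + 1):        # 1-indexed j, as on the slide
        k = f[j - 2]                 # k = f^1(j - 1)
        while True:
            if p[k] == p[j - 1]:     # P(k + 1) == P(j)
                f[j - 1] = k + 1     # f(j) = f^s(j - 1) + 1
                break
            if k == 0:               # the chain reached 0: no such s exists
                f[j - 1] = 0
                break
            k = f[k - 1]             # k = f^{s+1}(j - 1) = f(f^s(j - 1))
    return f


def mp_tables(p: str) -> tuple[list[int], list[int], list[int]]:
    """Return (f, g, shift) with g(1) = -1, g(j) = f(j - 1), shift(j) = j - g(j) - 1."""
    f = prefix_function(p)
    g = [-1] + f                                     # g(1), ..., g(m + 1)
    shift = [j - g[j - 1] - 1 for j in range(1, len(p) + 2)]
    return f, g, shift


f, g, shift = mp_tables("ATCACATCATCA")
print(f)      # [0, 0, 0, 1, 0, 1, 2, 3, 4, 2, 3, 4]
print(shift)  # [1, 1, 2, 3, 3, 5, 5, 5, 5, 5, 8, 8, 8]
```

Although the inner loop can run several times for one j, each iteration strictly decreases k, and k grows by at most one per j, so the total work stays O(m).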

14 Prefix function, example P = ATCACATCATCA
j = 1: f(1) = 0
j = 2: P(2) = 'T' ≠ P(f^1(1)+1) = P(1) = 'A', so f(2) = 0
j = 3: P(3) = 'C' ≠ P(f^1(2)+1) = P(1) = 'A', so f(3) = 0
j = 4: P(4) = 'A' = P(f^1(3)+1) = P(1) = 'A', so f(4) = 0 + 1 = 1
So far: f(1..4) = 0 0 0 1

15 Prefix function, example P = ATCACATCATCA (continued)
j = 5: P(5) = 'C' ≠ P(f^1(4)+1) = P(1+1) = P(2) = 'T', and P(5) ≠ P(f^2(4)+1) = P(f(1)+1) = P(1) = 'A', so f(5) = 0
j = 6: P(6) = 'A' = P(f^1(5)+1) = P(1) = 'A', so f(6) = 0 + 1 = 1
j = 7: P(7) = 'T' = P(f^1(6)+1) = P(1+1) = P(2) = 'T', so f(7) = 1 + 1 = 2
j = 8: P(8) = 'C' = P(f^1(7)+1) = P(2+1) = P(3) = 'C', so f(8) = 2 + 1 = 3
j = 9: P(9) = 'A' = P(f^1(8)+1) = P(3+1) = P(4) = 'A', so f(9) = 3 + 1 = 4
So far: f(1..9) = 0 0 0 1 0 1 2 3 4

16 We have found that f(9) = 4. We now check whether P(10) = P(5). The answer is no. Does this mean that we should set f(10) to 0? No: we follow the chain to f^2(9) = f(4) = 1 and compare P(10) with P(2).
j = 10: P(10) = 'T' ≠ P(f^1(9)+1) = P(5) = 'C', but P(10) = P(f^2(9)+1) = P(f(4)+1) = P(1+1) = P(2) = 'T', so f(10) = 1 + 1 = 2
j = 11: P(11) = 'C' = P(f^1(10)+1) = P(2+1) = P(3) = 'C', so f(11) = 2 + 1 = 3
j = 12: P(12) = 'A' = P(f^1(11)+1) = P(3+1) = P(4) = 'A', so f(12) = 3 + 1 = 4
Result: f(1..12) = 0 0 0 1 0 1 2 3 4 2 3 4
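To see the chain f^1(j - 1), f^2(j - 1), ... at work, the following sketch (hypothetical name `trace_step`) replays the comparisons made when computing a single f(j). It takes the f values listed for this example as input, so it only illustrates one step of the recurrence and does not recompute the table.

```python
P = "ATCACATCATCA"
F = [0, 0, 0, 1, 0, 1, 2, 3, 4, 2, 3, 4]   # f(1..12) for this P, from the table


def trace_step(p: str, f: list[int], j: int) -> list[str]:
    """Replay the comparisons made while computing f(j), for 2 <= j <= len(p)."""
    steps = []
    k, s = f[j - 2], 1                      # k = f^1(j - 1)
    while True:
        same = p[j - 1] == p[k]             # compare P(j) with P(k + 1)
        op = "=" if same else "!="
        steps.append(f"s={s}: P({j})='{p[j - 1]}' {op} P({k + 1})='{p[k]}'")
        if same:
            steps.append(f"f({j}) = {k} + 1 = {k + 1}")
            return steps
        if k == 0:                          # P(1) was the last candidate
            steps.append(f"f({j}) = 0")
            return steps
        k, s = f[k - 1], s + 1              # k = f^{s+1}(j - 1)


for line in trace_step(P, F, 10):
    print(line)
# s=1: P(10)='T' != P(5)='C'
# s=2: P(10)='T' = P(2)='T'
# f(10) = 1 + 1 = 2
```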

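17 Then, after a shift, the comparisons can resume between the characters c = P(g(i)+1) = P(f(i-1)+1) and T(i+j-1) = b (for i ≥ 2; when i = 1, P simply moves one step to the right) without missing any occurrence of P in T, and without backtracking on the text.
[Figure: the window in T reads Ub and P reads Ua, with the mismatch between T(i+j-1) = b and P(i) = a; after the shift, the prefix v of P lies under the suffix v of U, and c follows v in P.]
Example:
T = AAAAACACACATTAGCAAAA
P = CACACAGTATCA (shown before and after the shift)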
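Putting the pieces together, the searching phase can be sketched as follows. This is an illustrative Python version, not the authors' code: the names are hypothetical, `f[k - 1]` holds f(k), and `k` counts how many characters of P currently match (the length of U). On a mismatch, P is shifted by keeping only the longest border of U, and the text index never moves backwards.

```python
def prefix_function(p: str) -> list[int]:
    """f[k - 1] = length of the longest proper suffix of P(1..k) that is a prefix of P."""
    f = [0] * len(p)
    k = 0
    for j in range(1, len(p)):
        while k > 0 and p[j] != p[k]:
            k = f[k - 1]
        if p[j] == p[k]:
            k += 1
        f[j] = k
    return f


def mp_search(t: str, p: str) -> list[int]:
    """Return the 1-indexed starting positions of all occurrences of p in t."""
    if not p:
        return []
    f = prefix_function(p)
    occurrences = []
    k = 0                                 # |U|: characters of P matched so far
    for i, b in enumerate(t):             # the text is scanned once, left to right
        while k > 0 and b != p[k]:
            k = f[k - 1]                  # shift P; next compare b with P(f(|U|) + 1)
        if b == p[k]:
            k += 1
        if k == len(p):                   # complete match ending at T(i + 1)
            occurrences.append(i - len(p) + 2)
            k = f[k - 1]                  # shift by m - f(m), i.e. j - g(j) - 1 with j = m + 1
    return occurrences


print(mp_search("ACACGTACACACAGTATCAA", "CACACAGTATCA"))  # [8]
```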

18 Example
T = ACACGTACACACAGTATCAA
P = CACACAGTATCA
Shift table for P:
j             1  2  3  4  5  6  7  8  9 10 11 12 13
j - g(j) - 1  1  1  2  2  2  2  2  7  8  9 10 10 10
P aligned at T(1): T(1) = A ≠ P(1) = C, a mismatch at j = 1. Shift by 1.

19 Example (continued): P aligned at T(2): T(2..4) = P(1..3) = CAC, then T(5) = G ≠ P(4) = A, a mismatch at j = 4. Shift by 4 - g(4) - 1 = 2.

20 Example (continued): P aligned at T(4). After the shift, P(1) = C is already known to match T(4), so the comparison resumes with T(5) = G ≠ P(2) = A, a mismatch at j = 2. Shift by 1.

21 Example (continued): P aligned at T(5): T(5) = G ≠ P(1) = C, a mismatch at j = 1. Shift by 1.

22 Example (continued): P aligned at T(6): T(6) = T ≠ P(1) = C, a mismatch at j = 1. Shift by 1.

23 Example (continued): P aligned at T(7): T(7) = A ≠ P(1) = C, a mismatch at j = 1. Shift by 1.

24 Example (continued): P aligned at T(8): T(8..19) = P(1..12) = CACACAGTATCA. MATCH. With j = 13, P is then shifted by 13 - g(13) - 1 = 10.
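The sequence of shifts in this example can be reproduced with a short trace. In this sketch (hypothetical names, not from the slides), `pos` is the 0-indexed start of P in T and `k` is the number of characters already known to match after a shift, which is g(j) when g(j) ≥ 0. Each step records the 1-indexed alignment, the mismatch position j (or m + 1 after a match), the shift j - g(j) - 1, and whether a match was found.

```python
def prefix_function(p: str) -> list[int]:
    """f[k - 1] = length of the longest proper suffix of P(1..k) that is a prefix of P."""
    f = [0] * len(p)
    k = 0
    for j in range(1, len(p)):
        while k > 0 and p[j] != p[k]:
            k = f[k - 1]
        if p[j] == p[k]:
            k += 1
        f[j] = k
    return f


def mp_trace(t: str, p: str) -> list[tuple[int, int, int, bool]]:
    n, m = len(t), len(p)
    g = [-1] + prefix_function(p)            # g(1), ..., g(m + 1)
    steps = []
    pos, k = 0, 0
    while pos + m <= n:                      # P still fits inside T
        while k < m and t[pos + k] == p[k]:
            k += 1                           # extend the partial window
        j = k + 1                            # mismatch at P(j); j = m + 1 on a match
        shift = j - g[j - 1] - 1
        steps.append((pos + 1, j, shift, k == m))
        pos += shift
        k = max(g[j - 1], 0)                 # prefix of P already known to match
    return steps


for step in mp_trace("ACACGTACACACAGTATCAA", "CACACAGTATCA"):
    print(step)
# (1, 1, 1, False)
# (2, 4, 2, False)
# (4, 2, 1, False)
# (5, 1, 1, False)
# (6, 1, 1, False)
# (7, 1, 1, False)
# (8, 13, 10, True)
```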

25 Time Complexity
The preprocessing phase takes O(m) time and O(m) space.
The searching phase takes O(n + m) time.
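One way to see the linear bound of the searching phase is to count character comparisons. In the sketch below (hypothetical names, one comparison counted per character test, non-empty pattern assumed), every comparison either advances the text position or shifts P to the right by at least one position, so MP makes at most 2n comparisons, whereas the shift-by-one search can need about (n - m + 1) · m. The input is an assumed worst case for the naive method, not one from the slides.

```python
def prefix_function(p: str) -> list[int]:
    f = [0] * len(p)
    k = 0
    for j in range(1, len(p)):
        while k > 0 and p[j] != p[k]:
            k = f[k - 1]
        if p[j] == p[k]:
            k += 1
        f[j] = k
    return f


def mp_comparisons(t: str, p: str) -> int:
    """Count character comparisons made by MP (assumes a non-empty pattern)."""
    f = prefix_function(p)
    count, k = 0, 0
    for b in t:
        while True:
            count += 1                 # one comparison of b with P(k + 1)
            if b == p[k]:
                k += 1
                break
            if k == 0:
                break                  # P(1) mismatched: move on to the next text character
            k = f[k - 1]               # shift P and compare b again
        if k == len(p):
            k = f[k - 1]               # after a match, keep the border of P
    return count


def naive_comparisons(t: str, p: str) -> int:
    """Count character comparisons made by the shift-by-one search."""
    count = 0
    for start in range(len(t) - len(p) + 1):
        for k in range(len(p)):
            count += 1
            if t[start + k] != p[k]:
                break
    return count


t, p = "A" * 20, "AAAB"
print(naive_comparisons(t, p), mp_comparisons(t, p))  # 68 37
```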

26 References
AHO, A.V., HOPCROFT, J.E., ULLMAN, J.D., 1974, The design and analysis of computer algorithms, 2nd Edition, Chapter 9, pp. 317-361, Addison-Wesley Publishing Company.
BEAUQUIER, D., BERSTEL, J., CHRÉTIENNE, P., 1992, Éléments d'algorithmique, Chapter 10, pp. 337-377, Masson, Paris.
CROCHEMORE, M., 1997, Off-line serial exact string searching, in Pattern Matching Algorithms, ed. A. Apostolico and Z. Galil, Chapter 1, pp. 1-53, Oxford University Press.
HANCART, C., 1992, Une analyse en moyenne de l'algorithme de Morris et Pratt et de ses raffinements, in Théorie des Automates et Applications, Actes des 2e Journées Franco-Belges, D. Krob ed., Rouen, France, 1991, PUR 176, Rouen, France, pp. 99-110.
HANCART, C., 1993, Analyse exacte et en moyenne d'algorithmes de recherche d'un motif dans un texte, Ph.D. Thesis, University Paris 7, France.
MORRIS (Jr) J.H., PRATT V.R., 1970, A linear pattern-matching algorithm, Technical Report 40, University of California, Berkeley.

27 Thanks for your attention.

