1 Parameterized Pattern Matching by Boyer-Moore-type Algorithms
Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp.
Brenda S. Baker
Advisor: Prof. R. C. T. Lee
Speaker: Kuei-hao Chen
2 Let us consider two strings: A = a1a2a3a4a5 = xaxby and B = b1b2b3b4b5 = bacbc. If the edit distance concept is used, A can be transformed into B by replacing a1 with b1, a3 with b3, and a5 with b5; the two strings already agree at positions 2 and 4.
3 In this paper, we define a new transformation in which a character may be replaced by another character, but the replacement is global: if an x in A is replaced by a, then every x in A must be replaced by a.
4 A = a1a2a3a4a5 = xaxby, B = b1b2b3b4b5 = bacbc.
Consider the above example again. To transform A into B, the first x must be replaced by b. Because the replacement is global, the second x also becomes b, giving A′ = babby. But b3 = c, so a3 = x would also have to become c. Hence, under this kind of substitution, A = xaxby cannot be transformed into B.
5 For A = xaxby and B = babbc, however, A can be transformed into B by replacing every x with b and every y with c.
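The global substitution of slides 3 to 5 can be checked mechanically. The following is a minimal Python sketch (the function name `global_substitution` is hypothetical, not from the paper): it scans A and B in parallel, records the character each character of A must become, and fails as soon as some character would need two different images. At this stage the map only has to be consistent; it is not yet required to be one-to-one.

```python
from typing import Optional


def global_substitution(a: str, b: str) -> Optional[dict[str, str]]:
    """Return the character map that globally transforms a into b, or None.

    Every occurrence of a character of `a` must become the same character of `b`.
    Injectivity is deliberately NOT checked here.
    """
    if len(a) != len(b):
        return None
    mapping: dict[str, str] = {}
    for x, y in zip(a, b):
        # setdefault stores x -> y the first time x is seen and returns the stored image
        if mapping.setdefault(x, y) != y:
            return None  # x is already forced to become a different character
    return mapping


print(global_substitution("xaxby", "bacbc"))  # None: x would have to become both b and c
print(global_substitution("xaxby", "babbc"))  # {'x': 'b', 'a': 'a', 'b': 'b', 'y': 'c'}
```

The second result maps both x and b to b, so it is a global substitution but not a bijection.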
6 We define a bijection to be a global substitution that maps a set of distinct characters one-to-one onto another set of distinct characters, so no two characters are replaced by the same character. A string P p-matches a string Q if P can be transformed into Q by a bijection.
7 Let A = ababc and B = bcbcd. Then A p-matches B because the bijection a → b, b → c, c → d transforms A into B.
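8 On the other hand, for A = ababc and B = bcbdc, A does not p-match B. It is actually easy to determine whether A p-matches B. Given A = a1a2…aN and B = b1b2…bN, A p-matches B if and only if, for all i and j with ai = x and bi = y, aj = x implies bj = y, and bj = y implies aj = x. The first condition makes the substitution consistent; the second makes it one-to-one.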
9 For A = ababc and B = bcbcd, every a in A is matched with b, every b with c, and the c with d, and no two characters of A are matched with the same character of B. This is not true for A = ababc and B = bcbdc, where b is matched with both c and d. Thus, given two strings A and B of the same length, whether A p-matches B can be determined in a single left-to-right scan.
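The condition on slide 8 translates directly into a single scan. A minimal Python sketch, with the hypothetical name `p_matches`, keeps the substitution in both directions: the forward map enforces consistency and the backward map enforces that the substitution is one-to-one. Checking only the forward direction would wrongly accept, for example, A = ab and B = cc.

```python
def p_matches(a: str, b: str) -> bool:
    """Return True if a can be transformed into b by a bijection on characters."""
    if len(a) != len(b):
        return False
    forward: dict[str, str] = {}   # character of a -> character of b
    backward: dict[str, str] = {}  # character of b -> character of a
    for x, y in zip(a, b):
        # every x must always be matched with the same y (consistency) ...
        if forward.setdefault(x, y) != y:
            return False
        # ... and every y must come from a single x (one-to-one)
        if backward.setdefault(y, x) != x:
            return False
    return True


print(p_matches("ababc", "bcbcd"))  # True: a -> b, b -> c, c -> d
print(p_matches("ababc", "bcbdc"))  # False: b would be matched with both c and d
print(p_matches("ab", "cc"))        # False: a and b would both become c
```

With hash maps this runs in expected O(N) time for two strings of length N.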
10 There is another important property: if A p-matches B and B p-matches C, then A p-matches C. This holds because applying the bijection that transforms A into B and then the bijection that transforms B into C is again a bijection, and it transforms A into C.
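One way to see why p-matching is transitive, and in fact an equivalence relation, is the prev encoding from Baker's work on parameterized matching: each character is replaced by the distance to its previous occurrence, or by 0 if it has not occurred before. When every character may be renamed (as on these slides), two equal-length strings p-match exactly when their prev encodings are equal, so transitivity reduces to transitivity of equality. A minimal Python sketch (`prev_encoding` is a hypothetical name):

```python
def prev_encoding(s: str) -> list[int]:
    """Replace each character by the distance to its previous occurrence (0 if none)."""
    last_seen: dict[str, int] = {}
    encoded: list[int] = []
    for i, ch in enumerate(s):
        encoded.append(i - last_seen[ch] if ch in last_seen else 0)
        last_seen[ch] = i  # remember the most recent position of ch
    return encoded


# Equal encodings <=> the strings p-match.
print(prev_encoding("abaec"))  # [0, 0, 2, 0, 0]
print(prev_encoding("cacbd"))  # [0, 0, 2, 0, 0]
print(prev_encoding("bcbda"))  # [0, 0, 2, 0, 0]
```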
11 This paper considers the following problem: given a text T and a pattern P, find all positions where P p-matches a substring of T. For example, let P = abaec and let T contain the substrings S1 = bcbda and S2 = cacbd; then P p-matches both of these substrings of T.
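As a baseline for the problem above, here is a naive O(nm) search in Python that tests every alignment of P against T. This is not the algorithm of the paper. It canonicalizes strings by renaming characters in order of first appearance (a choice made for this sketch; the prev encoding would work equally well), so two equal-length strings p-match exactly when their canonical forms are equal. The names `canon` and `naive_p_search` and the text `cacbdbcbda` are hypothetical; the text was made up to contain S2 = cacbd and S1 = bcbda.

```python
def canon(s: str) -> tuple[int, ...]:
    """Rename characters in order of first appearance, e.g. 'abaec' -> (0, 1, 0, 2, 3)."""
    ids: dict[str, int] = {}
    return tuple(ids.setdefault(ch, len(ids)) for ch in s)


def naive_p_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based start i such that pattern p-matches text[i:i+m]."""
    m = len(pattern)
    target = canon(pattern)
    return [i for i in range(len(text) - m + 1) if canon(text[i:i + m]) == target]


print(naive_p_search("cacbdbcbda", "abaec"))  # [0, 5]: cacbd and bcbda
```

Each window costs O(m), so the whole search is O(nm); Boyer-Moore-type shifts aim to skip most windows.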
12 For P = abaec and S2 = cacbd, the substitution a → c, b → a, e → b, c → d transforms P into S2. For S2 = cacbd and S1 = bcbda, the substitution c → b, a → c, b → d, d → a transforms S2 into S1. Composing the two, P = abaec is transformed into S1 = bcbda by a → b, b → c, e → d, c → a.
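The substitutions on slide 12 can be recovered from the strings themselves and composed, which is exactly the argument behind the transitivity property. A minimal Python sketch; `bijection`, `f`, `g` and `h` are hypothetical names chosen for illustration:

```python
def bijection(a: str, b: str) -> dict[str, str]:
    """Return the character bijection that transforms a into b."""
    mapping = dict(zip(a, b))  # later pairs overwrite earlier ones, so verify below
    if len(a) != len(b) or "".join(mapping[ch] for ch in a) != b:
        raise ValueError("no consistent substitution transforms a into b")
    if len(set(mapping.values())) != len(mapping):
        raise ValueError("substitution is not one-to-one")
    return mapping


f = bijection("abaec", "cacbd")  # P -> S2
g = bijection("cacbd", "bcbda")  # S2 -> S1
h = {x: g[f[x]] for x in f}      # composition: P -> S1

print(f)  # {'a': 'c', 'b': 'a', 'e': 'b', 'c': 'd'}
print(g)  # {'c': 'b', 'a': 'c', 'b': 'd', 'd': 'a'}
print(h)  # {'a': 'b', 'b': 'c', 'e': 'd', 'c': 'a'}
print("".join(h[ch] for ch in "abaec"))  # bcbda
```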
14 This paper is based on good suffix rule 1 and good suffix rule 2 of the Boyer-Moore algorithm [BM77], adapted to p-matching.
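15 Good Suffix Rule 1 for p-match
Let T1 be the longest suffix of the text window that p-matches a suffix P1 of P, and let y be the character of P immediately to the left of P1, where the p-mismatch occurred. If P contains a substring zP2 such that P2 p-matches P1 but zP2 does not p-match yP1, take the rightmost such substring and move P so that P2 is aligned with T1, as follows: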
16 Example
[Figure: text T (positions 1 to 15) with segments v, x, w; P′ with segments u, x, v; pattern P (positions 1 to 10) with segments u, v, w, shown before and after the shift; labels: p-mismatch, Transform, Shift.]
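17 [Figure: text T (positions 1 to 15) with segments v, x, w; P′ with segments v, x, w; pattern P (positions 1 to 10) with segments u, v, w; label: Transform.]
After the shift, we compare T and P from right to left and find that T[6..15] ≡ P[1..10], that is, T[6..15] p-matches P[1..10].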
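Rule 1 can be illustrated with a brute-force search in Python. This sketch assumes the following reading of the rule: take the rightmost substring P2 of P, other than P1 itself, that p-matches P1 and whose extension zP2 does not p-match yP1, or that starts at the beginning of P so that no z exists. A placement where zP2 does p-match yP1 is skipped because the text window would then have to p-match yP1, which the p-mismatch has already ruled out. Strings are compared with a hypothetical helper `canon` (renaming characters in order of first appearance). The names and the quadratic search are assumptions of this sketch, not Baker's preprocessing.

```python
from typing import Optional


def canon(s: str) -> tuple[int, ...]:
    """Rename characters in order of first appearance; equal results <=> p-match."""
    ids: dict[str, int] = {}
    return tuple(ids.setdefault(ch, len(ids)) for ch in s)


def good_suffix_rule1_shift(p: str, k: int) -> Optional[int]:
    """Brute-force shift for good suffix rule 1 (as read in this sketch), or None.

    k is the length of the suffix P1 = p[m-k:] that p-matched the text window;
    the right-to-left comparison then failed at p[m-k-1], which is y.
    """
    m = len(p)
    if not 0 < k < m:
        raise ValueError("need 0 < k < len(p)")
    p1 = canon(p[m - k:])       # P1
    yp1 = canon(p[m - k - 1:])  # yP1
    # Candidate P2 = p[j-k:j]; scan right to left, skipping j = m (P1 itself).
    for j in range(m - 1, k - 1, -1):
        if canon(p[j - k:j]) != p1:
            continue  # P2 does not p-match P1
        # Accept if there is no z (P2 is a prefix) or zP2 does not p-match yP1.
        if j == k or canon(p[j - k - 1:j]) != yp1:
            return m - j  # shift that puts P2 under T1
    return None


print(good_suffix_rule1_shift("aaabxbb", 2))  # 4
print(good_suffix_rule1_shift("caabxbb", 2))  # None
```

For P = aaabxbb and k = 2 the sketch aligns P2 = aa, preceded by z = a, under T1, since aaa does not p-match xbb. For P = caabxbb the same P2 is rejected because caa p-matches xbb, no other candidate exists, and rule 2 would then be used.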
18 Good Suffix Rule 2 for p-match
Let T1 be the longest suffix of the text window aligned with P that p-matches a suffix P1 of P. Consider the longest suffix of P1 that p-matches a prefix P2 of P. If such a suffix exists, we move P so that P2 is aligned with the corresponding suffix of T1, as follows:
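19 Example
[Figure: text T (positions 1 to 13) with segments x, v, w; P′ with segments u, x, v; pattern P (positions 1 to 8) with segments u, v, w, shown before the shift and after it at text positions 3 to 10; labels: p-mismatch, Transform, Shift.]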
20 [Figure: text T (positions 1 to 13) with segments x, v, w; P′ with segments u, x, v; pattern P with segments u, v, w; label: Transform; positions 3, 4, 5 marked.]
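Rule 2 admits the same kind of brute-force sketch in Python (hypothetical names; a quadratic search for illustration only, not the paper's preprocessing). It looks for the longest suffix of P1 that p-matches a proper prefix P2 of P. Because T1 p-matches P1, the corresponding suffix of T1 p-matches P2 by transitivity, which is what makes the alignment consistent with the characters already read.

```python
def canon(s: str) -> tuple[int, ...]:
    """Rename characters in order of first appearance; equal results <=> p-match."""
    ids: dict[str, int] = {}
    return tuple(ids.setdefault(ch, len(ids)) for ch in s)


def good_suffix_rule2_shift(p: str, k: int) -> int:
    """Brute-force shift for good suffix rule 2.

    k is the length of the matched suffix P1 = p[m-k:]. Finds the longest suffix
    of P1 that p-matches a proper prefix P2 of p and returns the shift that puts
    P2 under it; if none exists, the pattern moves past the window (shift m).
    """
    m = len(p)
    for length in range(min(k, m - 1), 0, -1):  # longest candidate first
        if canon(p[m - length:]) == canon(p[:length]):
            return m - length
    return m


print(good_suffix_rule2_shift("aabxbb", 2))   # 4: suffix bb p-matches prefix aa
print(good_suffix_rule2_shift("caabxbb", 2))  # 6: only a single character fits
```

Under p-matching any single character p-matches any other, so whenever k ≥ 1 and m ≥ 2 this rule never shifts by more than m - 1, unlike exact matching.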
22 Example
[Figure: text T (positions 1 to 22) with segments G, A, C; P′ with segments C, A, T; pattern P (positions 1 to 12) with segments A, T, C, shown before and after the shift; markers j′ = 7 and j = 9; labels: p-mismatch, Transform, Shift.]
24 [Figure: text T (positions 1 to 22) with segments G, A, C; P′ with segments T, C, A; pattern P (positions 1 to 12) with segments A, T, C; label: Transform.]
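25 Time Complexity
In the average case, the preprocessing phase takes O(m log min(m, |Π|)) time, the space complexity is O(n), and the searching phase takes O(n log min(m, |Π|)) time. Here n is the length of T, m is the length of P, and Π is the alphabet of parameter symbols, that is, the symbols a bijection may rename.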
26 References
[AFM94] Amihood Amir, Martin Farach, and S. Muthukrishnan. Alphabet dependence in parameterized matching. Info. Proc. Letters, Vol. 49, pp., 1994.
[Bak] Brenda S. Baker. Parameterized pattern matching: algorithms and applications. J. Comput. Syst. Sci., to appear.
[Bak92] Brenda S. Baker. A program for identifying duplicated code. In Computing Science and Statistics, Vol. 24: Proceedings of the 24th Symposium on the Interface, pp. 49-57, 1992.
[Bak93a] Brenda S. Baker. Parameterized duplication in strings: algorithms and an application to software maintenance. Submitted for publication, 1993.
[Bak93b] Brenda S. Baker. A theory of parameterized pattern matching: algorithms and applications. In Proceedings of the 25th Annual Symposium on Theory of Computing, pp. 71-80, 1993.
[BM77] Robert S. Boyer and J. Strother Moore. A fast string searching algorithm. Commun. ACM, Vol. 20, No. 10, pp., 1977.
27 References
[BYGR90] Ricardo A. Baeza-Yates, Gaston H. Gonnet, and Mireille Regnier. Analysis of Boyer-Moore-type string searching algorithms. In Proc. of the First Annual ACM-SIAM Symposium on Discrete Algorithms, pp., 1990.
[BYR92] Ricardo A. Baeza-Yates and Mireille Regnier. Average running time of the Boyer-Moore-Horspool algorithm. Theoretical Computer Sci., Vol. 92, pp. 19-31, 1992.
[CLC+92] Maxime Crochemore, Thierry Lecroq, Artur Czumaj, Leszek Gasieniec, S. Jarominek, and W. Plandowski. Speeding up two string-matching algorithms. In 9th Annual Symposium on Theoretical Aspects of Computer Science, LNCS Vol. 577, pp., 1992.
[Col91] Richard Cole. Tight bounds of the complexity of the Boyer-Moore string matching algorithm. In Proceedings of the Second Annual ACM-SIAM Symposium on Discrete Algorithms, pp., 1991.
[Hor80] R. Nigel Horspool. Practical fast searching in strings. Soft. Pract. and Exp., Vol. 10, pp., 1980.
28 References
[HS91] Andrew Hume and Daniel Sunday. Fast string search. Soft. Pract. and Exp., Vol. 21, No. 11, pp., 1991.
[IS94] Ramana M. Idury and Alejandro A. Schaffer. Multiple matching of parameterized patterns. In Proc. of the 5th Symposium on Combinatorial Pattern Matching, pp., 1994.
[KMP77] D. E. Knuth, J. H. Morris, and V. R. Pratt. Fast pattern matching in strings. SIAM J. Comput., Vol. 6, No. 2, pp., 1977.
[Ryt80] Wojciech Rytter. A correct preprocessing algorithm for Boyer-Moore string-searching. SIAM J. Comput., Vol. 9, No. 3, pp., 1980.
[Sch88] R. Schaback. On the expected sublinearity of the Boyer-Moore algorithm. SIAM J. Comput., Vol. 17, No. 4, pp., 1988.
[Sun90] Daniel M. Sunday. A very fast substring search algorithm. Commun. ACM, Vol. 33, No. 8, pp., 1990.