# Parameterized Pattern Matching by Boyer-Moore-type Algorithms

Brenda S. Baker

Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp

Advisor: Prof. R. C. T. Lee

Speaker: Kuei-hao Chen

Let us consider two strings:
A = a1a2a3a4a5 = xaxby

B = b1b2b3b4b5 = bacbc

If the edit distance concept is used, A may be transformed to B by substituting a1 by b1, a3 by b3 and a5 by b5.
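The positions that need a substitution in this example are exactly the positions where the two strings differ. A minimal Python sketch of that check (the function name `differing_positions` is illustrative and not taken from the paper):

```python
def differing_positions(a: str, b: str) -> list:
    """Return the 1-based positions where two equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


print(differing_positions("xaxby", "bacbc"))  # [1, 3, 5]
```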

This paper defines a new transformation in which a character may be substituted by another character, but the substitution is global: if x in A is substituted by a, then every x in A is substituted by a.

A = a1a2a3a4a5 = xaxby

B = b1b2b3b4b5 = bacbc

Consider the above example again. To transform A to B, the first x must be substituted by b. But this substitution is global, so we obtain A' = babby. It can easily be seen that, with this kind of substitution, A=xaxby cannot be transformed to B.

For A=xaxby and B=babbc, A can be transformed to B by substituting x by b and y by c.
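The two substitutions discussed above can be reproduced with a few lines of Python. This is only a sketch; the helper name `substitute_globally` and the dictionary representation of a substitution are assumptions made for illustration.

```python
def substitute_globally(s: str, mapping: dict) -> str:
    """Replace every occurrence of each mapped character; others stay as they are."""
    return "".join(mapping.get(ch, ch) for ch in s)


# Substituting x by b everywhere gives babby, which is not bacbc.
print(substitute_globally("xaxby", {"x": "b"}))            # babby
# Substituting x by b and y by c gives babbc.
print(substitute_globally("xaxby", {"x": "b", "y": "c"}))  # babbc
```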

We define a bijection to be a global substitution that maps a set of distinct characters one-to-one onto another set of characters. A string P p-matches a string Q if P can be transformed to Q by a bijection.

Let A=ababc and B=bcbcd. Then A p-matches B because there is a bijection, namely a→b, b→c, c→d, which transforms A to B.

On the other hand, for A=ababc and B=bcbdc, A does not p-match B.
It is actually easy to determine whether A p-matches B. Given A=a1a2…aN and B=b1b2…bN, A p-matches B if and only if, for every i with ai=x and bi=y, every j with aj=x also has bj=y.

For A=ababc and B=bcbcc, it can be seen that every a in A is matched with b and every b is matched with c. This is not true for A=ababc and B=bcbdc. Thus, given two strings A and B of the same length, it is trivial to determine whether A p-matches B.
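The test described above can be sketched in Python as a single left-to-right scan. The function name `p_matches` and the `require_bijection` flag are assumptions for illustration. The condition as stated on the slide checks one direction only (each character of A always meets the same character of B); a strict bijection additionally requires that two different characters of A never meet the same character of B, so the sketch exposes both variants.

```python
def p_matches(a: str, b: str, require_bijection: bool = True) -> bool:
    """Check whether one global substitution turns a into b."""
    if len(a) != len(b):
        return False
    forward = {}   # character of a -> character of b
    backward = {}  # character of b -> character of a
    for x, y in zip(a, b):
        # x must always be paired with the same y.
        if forward.setdefault(x, y) != y:
            return False
        # For a one-to-one substitution, y must always come from the same x.
        if require_bijection and backward.setdefault(y, x) != x:
            return False
    return True


print(p_matches("ababc", "bcbcd"))  # True
print(p_matches("ababc", "bcbdc"))  # False
print(p_matches("ababc", "bcbcc", require_bijection=False))  # True
print(p_matches("ababc", "bcbcc"))  # False: b and c would both become c
```

The scan visits each position once, so the check takes time linear in the string length when dictionary operations are treated as constant time.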

There is another important property: if A p-matches B and B p-matches C, then A p-matches C. This holds because applying one global substitution after another is again a global substitution.

This paper considers the following problem:
Given a text T and a pattern P, find all occurrences where P p-matches a substring of T. For example: let T = [text not preserved] and P = [pattern not preserved]. We can see that P p-matches substrings of T.
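Before turning to the Boyer-Moore-type shifts, the problem statement can be made concrete with a brute-force baseline that tests every window of T. This is a sketch and not the paper's algorithm; the names `p_matches` and `naive_p_search` and the sample text are hypothetical, and positions are 0-based.

```python
def p_matches(a: str, b: str) -> bool:
    """True if a one-to-one global substitution turns a into b."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def naive_p_search(text: str, pattern: str) -> list:
    """Return every 0-based start position where pattern p-matches a window of text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_matches(pattern, text[i:i + m])]


# Hypothetical text containing two windows that p-match the pattern abaec.
print(naive_p_search("bcbdacacbd", "abaec"))  # [0, 5]
```

This baseline costs O(nm) window comparisons in the worst case, which is the cost the shift rules are meant to reduce.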

For P=abaec and S2=cacbd, the substitution a→c, b→a, e→b, c→d will transform P to S2.
For S2=cacbd and S1=bcbda, the substitution c→b, a→c, b→d, d→a transforms S2 to S1. It can be seen that P=abaec will be transformed to S1=bcbda by a→b, b→c, e→d, c→a.

The substitution can be visualized as follows, reading each chain from P through S2 to S1: a→c→b, b→a→c, e→b→d, c→d→a.
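The composition of the two substitutions for P, S2 and S1 can be checked mechanically. The following Python sketch uses illustrative helper names (`mapping_between`, `compose`) that do not come from the paper.

```python
from typing import Dict, Optional


def mapping_between(a: str, b: str) -> Optional[Dict[str, str]]:
    """Return the one-to-one substitution turning a into b, or None if there is none."""
    if len(a) != len(b):
        return None
    forward, backward = {}, {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return None
    return forward


def compose(first: Dict[str, str], second: Dict[str, str]) -> Dict[str, str]:
    """Apply first, then second."""
    return {src: second[mid] for src, mid in first.items()}


p_to_s2 = mapping_between("abaec", "cacbd")
s2_to_s1 = mapping_between("cacbd", "bcbda")
assert p_to_s2 is not None and s2_to_s1 is not None

p_to_s1 = compose(p_to_s2, s2_to_s1)
print(p_to_s1)  # {'a': 'b', 'b': 'c', 'e': 'd', 'c': 'a'}
print(p_to_s1 == mapping_between("abaec", "bcbda"))  # True
```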

The algorithm in this paper is based upon Good Suffix Rule 1 and Good Suffix Rule 2 of the Boyer-Moore algorithm, adapted to p-matching.

Good Suffix Rule 1 for p-match
Let T1 be the longest suffix of the window of T which p-matches a suffix P1 of P. If there is a substring zP2 of P which is the rightmost one and p-matches yP1, with z≠y, we can move P as follows:

Example (figure): the text T occupies positions 1 to 15 and the pattern P positions 1 to 10; the surviving labels are T: v x w, P′: u x v, P: u v w. At the p-mismatch, P is transformed into P′ and then shifted.

After moving, we compare T and P from right to left. We found out T6,15≡P1,10.
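The right-to-left comparison that yields the matched suffix T1 can be sketched as follows. A useful observation is that if a suffix of some length p-matches, every shorter suffix p-matches too, so the scan can stop at the first inconsistency. The function name and the sample strings below are hypothetical.

```python
def longest_p_matching_suffix(window: str, pattern: str) -> int:
    """Length of the longest suffix of window that p-matches the
    same-length suffix of pattern, found by scanning right to left."""
    forward, backward = {}, {}
    length = 0
    for x, y in zip(reversed(pattern), reversed(window)):
        # Stop as soon as the pairing is no longer a one-to-one substitution.
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            break
        length += 1
    return length


# The last three characters p-match (c->z, a->y, b->x); extending by one
# more character would require b to map to both x and y.
print(longest_p_matching_suffix("xyzyx", "abcab"))  # 3
# A full p-match of the window.
print(longest_p_matching_suffix("bcbda", "abaec"))  # 5
```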

Good Suffix Rule 2 for p-match
Let T1 be the longest suffix of the window of T which p-matches a suffix P1 of P. Suppose some suffix of P1 p-matches a prefix P2 of P. If such a suffix exists, we move P as follows:

Example (figure): the text T occupies positions 1 to 13 and the pattern P positions 1 to 8; the surviving labels are T: x v w, P′: u x v, P: u v w. At the p-mismatch, P is transformed into P′ and then shifted.
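A brute-force version of Rule 2 can be sketched in Python. Two assumptions are made here: the longest qualifying suffix of P1 is used, and the shift amount is m minus the length of the matched prefix, as in the classical Boyer-Moore rule; neither detail survives in this transcript. Because T1 p-matches P1 and p-matching is transitive, the test can be carried out on P alone. The names `p_matches` and `rule2_shift` are illustrative.

```python
from typing import Optional


def p_matches(a: str, b: str) -> bool:
    """True if a one-to-one global substitution turns a into b."""
    if len(a) != len(b):
        return False
    forward, backward = {}, {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def rule2_shift(pattern: str, matched: int) -> Optional[int]:
    """matched is the length of P1, the already p-matched suffix of pattern.
    Return the shift aligning the longest prefix of pattern that p-matches
    a suffix of P1, or None when P1 is empty."""
    m = len(pattern)
    p1 = pattern[m - matched:]
    # Try the longest candidate first; cap at m - 1 so the shift is positive.
    for t in range(min(matched, m - 1), 0, -1):
        if p_matches(p1[-t:], pattern[:t]):
            return m - t
    return None


print(rule2_shift("abcxyz", 3))  # 3: xyz p-matches the prefix abc
print(rule2_shift("aabxyy", 3))  # 4: xyy fails against aab, yy p-matches aa
```

Under the definition used in these slides, any single character p-matches any other single character, so the loop always succeeds once P1 is non-empty.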

The shift function ∆ is

Example (figure): the text T occupies positions 1 to 22 and the pattern P positions 1 to 12, over the characters G, A, C and T. At the p-mismatch (j’=7, j=9), P is transformed into P′ and then shifted.

Time Complexity: the preprocessing phase takes O(m log min(m, Π)) time and space; the average case takes O(n) time; the searching phase takes O(n log min(m, Π)) time.


THANK YOU
