1 Parameterized Pattern Matching by Boyer-Moore-type Algorithms. Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp. 541. Brenda S. Baker. Advisor: Prof. R. C. T. Lee. Speaker: Kuei-hao Chen.
2 Let us consider two strings: A = a_1 a_2 a_3 a_4 a_5 = xaxby and B = b_1 b_2 b_3 b_4 b_5 = bacbc. If the edit distance concept is used, A may be transformed to B by substituting a_1 by b_1, a_3 by b_3, and a_5 by b_5.
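As a small illustration of slide 2 (a sketch, not part of the original slides), the positions that need a substitution are exactly the positions where the two strings differ. The helper name `substitution_positions` is hypothetical.

```python
def substitution_positions(a, b):
    """Return the 1-based positions i where a_i has to be replaced by b_i."""
    if len(a) != len(b):
        raise ValueError("this sketch only handles strings of equal length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


print(substitution_positions("xaxby", "bacbc"))  # [1, 3, 5]
```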
3 In this paper, we define a new transformation in which a character may be substituted by another character. But the substitution is global. That is, if x in A is substituted by a, then every x in A is substituted by a.
4 A = a_1 a_2 a_3 a_4 a_5 = xaxby, B = b_1 b_2 b_3 b_4 b_5 = bacbc. Consider the above example again. To transform A to B, the first x must be substituted by b. But this substitution is global, so A becomes babby. The third character is now b, whereas B has c in that position. Thus, if this kind of substitution is used, A = xaxby cannot be transformed to B.
5 For A=xaxby and B=babbc, A can be transformed to B by substituting x by b and y by c.
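The global substitution of slides 3 to 5 can be sketched in a few lines. This is only an illustration: the function name `substitute` and the use of a dictionary for the substitution are assumptions, not notation from the paper.

```python
def substitute(s, mapping):
    """Apply a global substitution: every occurrence of a key is replaced.

    Characters that are not keys of `mapping` are left unchanged.
    """
    return "".join(mapping.get(ch, ch) for ch in s)


A = "xaxby"
print(substitute(A, {"x": "b"}))            # babby (cannot become bacbc)
print(substitute(A, {"x": "b", "y": "c"}))  # babbc
```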
6 We define a bijection to be a global substitution of a set of distinct characters into another set of characters. A string P p-matches a string Q if P can be transformed to Q by a bijection.
7 Let A = ababc and B = bcbcd. Then A p-matches B because there is a bijection, namely a→b, b→c, c→d, which transforms A to B.
8 On the other hand, for A = ababc and B = bcbdc, A does not p-match B. It is actually easy to determine whether A p-matches B. Given A = a_1 a_2 … a_N and B = b_1 b_2 … b_N, A p-matches B if and only if, for every i, if a_i = x and b_i = y, then for every j with a_j = x, b_j must be y.
9 For A = ababc and B = bcbcc, it can be seen that every a in A is matched with b and every b is matched with c. This is not true for A = ababc and B = bcbdc, where the first b is matched with c and the second b with d. Thus, given a string A and a string B which are of the same length, it is trivial to determine whether A p-matches B.
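The test described on slides 8 and 9 can be written as a single left-to-right pass. The sketch below implements the condition exactly as stated there (each character of A must always meet the same character of B). The optional `bijective` flag is an assumption added here: it additionally rejects two different characters of A meeting the same character of B, which a strict one-to-one reading of slide 6 would require.

```python
def p_match(a, b, bijective=False):
    """Check the slide-8 condition between two strings of equal length."""
    if len(a) != len(b):
        return False
    forward = {}   # character of a -> character of b it was matched with
    backward = {}  # character of b -> character of a (only used if bijective)
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y:
            return False
        if bijective and backward.setdefault(y, x) != x:
            return False
    return True


print(p_match("ababc", "bcbcd"))                  # True
print(p_match("ababc", "bcbdc"))                  # False
print(p_match("ababc", "bcbcc"))                  # True
print(p_match("ababc", "bcbcc", bijective=True))  # False
```

With the flag off, the function accepts the pair on slide 9; with the flag on, it rejects that pair because both b and c are matched with c.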
10 There is another important property: if A p-matches B and B p-matches C, then A p-matches C. This holds because applying the substitution from A to B and then the substitution from B to C is itself a global substitution from A to C.
11 This paper considers the following problem: Given a text T and a pattern P, find all occurrences where P p-matches a substring of T. For example: [Figure: a text T and a pattern P.] We can see that P p-matches strings in T.
12 For P = abaec and S_2 = cacbd, the substitution a→c, b→a, e→b, c→d will transform P to S_2. For S_2 = cacbd and S_1 = bcbda, the substitution c→b, a→c, b→d, d→a transforms S_2 to S_1. It can be seen that P = abaec will be transformed to S_1 = bcbda by a→b, b→c, e→d, c→a.
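The strings of slide 12 can be used to check the transitivity property of slide 10. The following sketch recovers each substitution from a pair of strings and composes them. The helper names `mapping_between` and `compose` are hypothetical.

```python
def mapping_between(a, b):
    """Return the substitution that sends a to b, or None if there is none."""
    if len(a) != len(b):
        return None
    mapping = {}
    for x, y in zip(a, b):
        if mapping.setdefault(x, y) != y:
            return None
    return mapping


def compose(f, g):
    """Substitution obtained by applying f first and then g."""
    return {x: g[y] for x, y in f.items()}


P, S2, S1 = "abaec", "cacbd", "bcbda"
p_to_s2 = mapping_between(P, S2)
s2_to_s1 = mapping_between(S2, S1)

# Composing the two substitutions gives the direct substitution from P to S1.
print(compose(p_to_s2, s2_to_s1) == mapping_between(P, S1))  # True
```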
13 The substitution can be visualized as follows:
14 This paper is based upon Good Suffix Rule 1 and Good Suffix Rule 2 of the Boyer-Moore algorithm.
15 Good Suffix Rule 1 for p-match Let T_1 be the largest suffix of the window which p-matches with a suffix P_1 of P. If there is a substring zP_2 which is the rightmost one and p-matches with yP_1, and z ≠ y, we can move P as follows:
16 Example: T = vvxxvvvvxxwvvww, P = uuuvvvwwvv. A p-mismatch occurs; transforming P gives uuuxxxvvxx, and P is then shifted.
17 Example (continued): T = vvxxvvvvxxwvvww, P = uuuvvvwwvv. The part of T now aligned with P is vvvxxwvvww. After moving, we compare T and P from right to left, that is, T_{6,15} with P_{1,10}.
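The right-to-left comparison and the "Transform" row of the example can be reproduced with a short sketch. It assumes the one-directional matching condition of slide 8 and a window of the same length as P. It does not compute the shift itself, which in the paper comes from preprocessing P. All function names are hypothetical.

```python
def is_consistent(src, dst):
    """True if each character of src always meets the same character of dst."""
    seen = {}
    return len(src) == len(dst) and all(
        seen.setdefault(x, y) == y for x, y in zip(src, dst)
    )


def longest_p_matching_suffix(window, pattern):
    """Length of the longest suffix of pattern that p-matches the window's suffix."""
    k = 0
    while k < len(pattern) and is_consistent(pattern[-(k + 1):], window[-(k + 1):]):
        k += 1
    return k


def transform(pattern, window, k):
    """Rewrite pattern with the substitution taking its last k characters to the window's."""
    mapping = dict(zip(pattern[-k:], window[-k:])) if k else {}
    return "".join(mapping.get(ch, ch) for ch in pattern)


T = "vvxxvvvvxxwvvww"
P = "uuuvvvwwvv"
window = T[:len(P)]                       # P aligned with the start of T
k = longest_p_matching_suffix(window, P)
print(k)                                  # 4 (wwvv p-matches vvxx)
print(transform(P, window, k))            # uuuxxxvvxx
```

Extending a suffix that already fails can never repair the conflict, which is why the loop may stop at the first failure.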
18 Good Suffix Rule 2 for p-match Let T_1 be the largest suffix of the window of P which p-matches with a suffix P_1 of P. Consider a suffix of P_1 which p-matches with a prefix P_2 of P. If such a suffix exists, we move P as follows:
19 Example: T = xxvvvvxxwvvww, P = uvvvwwvv. A p-mismatch occurs; transforming P gives uxxxvvxx, and P is then shifted.
20 Example (continued): T = xxvvvvxxwvvww, P = uvvvwwvv, transformed P = uxxxvvxx.
21 The shift function is
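22 Example: T = GATCGATCAATCATATCATCAT, P = ATCACATCATCA. A p-mismatch occurs (marked positions j = 7 and j = 9); transforming P gives CATCTCATCATC, and P is then shifted.
23 Example (continued): T = GATCGATCAATCATATCATCAT, P = ATCACATCATCA, p-mismatch (j = 7, j = 9), transformed P = CATCTCATCATC, followed by the shift.
24 Example (continued): T = GATCGATCAATCATATCATCAT, P = ATCACATCATCA, transformed P = TCATATCATCAT.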
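As a cross-check on this example, a brute-force scan can list every position where P p-matches a substring of T. This is deliberately not the Boyer-Moore-type algorithm of the paper: it tests every alignment, using the one-directional condition of slide 8. The function names are hypothetical.

```python
def p_match(a, b):
    """Slide-8 condition: each character of a always meets the same character of b."""
    if len(a) != len(b):
        return False
    seen = {}
    return all(seen.setdefault(x, y) == y for x, y in zip(a, b))


def find_p_matches(text, pattern):
    """Return all 0-based start positions where pattern p-matches a substring of text."""
    m = len(pattern)
    return [s for s in range(len(text) - m + 1) if p_match(pattern, text[s:s + m])]


T = "GATCGATCAATCATATCATCAT"
P = "ATCACATCATCA"
hits = find_p_matches(T, P)
print(hits)                           # [10]
print(T[hits[0]:hits[0] + len(P)])    # TCATATCATCAT
```

The single hit is the final alignment of the example: the matched substring equals the last transformed pattern, TCATATCATCAT.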
25 Time Complexity In the average case: preprocessing phase in O(m log min(m, Π)) time, space complexity O(n), and searching phase in O(n log min(m, Π)) time.