Published by Makayla Lee. Modified over 4 years ago.

1
**Parameterized Pattern Matching by Boyer-Moore-type Algorithms**

Brenda S. Baker

Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp

Advisor: Prof. R. C. T. Lee

Speaker: Kuei-hao Chen

2
**Let us consider two strings:**

A = a1a2a3a4a5 = xaxby

B = b1b2b3b4b5 = bacbc

If the edit distance concept is used, A may be transformed to B by substituting a1 by b1, a3 by b3 and a5 by b5.
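The three substitutions can be checked mechanically. The snippet below is a minimal sketch that lists the positions where two equal-length strings differ; the function name `substitution_positions` is an assumption made for illustration and does not come from the paper.

```python
def substitution_positions(a: str, b: str) -> list[int]:
    """Return the 1-based positions where equal-length strings a and b differ."""
    if len(a) != len(b):
        raise ValueError("strings must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b), start=1) if x != y]


# A = xaxby and B = bacbc differ at a1, a3 and a5.
print(substitution_positions("xaxby", "bacbc"))  # [1, 3, 5]
```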

3
In this paper, a new transformation is defined in which a character may be substituted by another character, but the substitution is global: if x in A is substituted by a, then every x in A is substituted by a.

4
A = a1a2a3a4a5 = xaxby

B = b1b2b3b4b5 = bacbc

Consider the above example again. To transform A to B, the first x must be substituted by b. Because the substitution is global, every x becomes b, which gives A′ = babby. The third character of A′ is b, but the third character of B is c, so A = xaxby cannot be transformed to B by this kind of substitution.

5
**For A=xaxby and B=babbc, A can be transformed to B by substituting x by b and y by c.**
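A global substitution replaces every occurrence of a character at once. The sketch below uses Python's `str.translate` for this; the helper name `global_substitute` is an assumption for illustration.

```python
def global_substitute(s: str, mapping: dict[str, str]) -> str:
    """Replace every occurrence of each key character by its image.

    Characters that are not keys of `mapping` are left unchanged.
    """
    return s.translate(str.maketrans(mapping))


# Substituting x by b alone cannot produce bacbc ...
print(global_substitute("xaxby", {"x": "b"}))            # babby
# ... but substituting x by b and y by c produces babbc.
print(global_substitute("xaxby", {"x": "b", "y": "c"}))  # babbc
```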

6
We define a bijection to be a global substitution of a set of distinct characters into another set of characters. A string P p-matches a string Q if P can be transformed to Q by a bijection.

7
Let A=ababc and B=bcbcd. Then A p-matches B because there is a bijection, namely a→b, b→c, c→d, which transforms A to B.

8
**On the other hand, for A=ababc and B=bcbdc, A does not p-match B.**

It is actually easy to determine whether A p-matches B. Let A=a1a2…aN and B=b1b2…bN. A p-matches B if and only if, for every i and j, ai=aj implies bi=bj; that is, whenever ai=x and bi=y, every aj=x must have bj=y.

9
**For A=ababc and B=bcbcc**

It can be seen that every a in A is matched with b and every b is matched with c. This is not true for A=ababc and B=bcbdc, where the second b is matched with d. Thus, given two strings A and B of the same length, it is trivial to determine whether A p-matches B.
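The test can be written as a single left-to-right pass. The sketch below implements the criterion exactly as stated on these slides: each character of A must always meet the same character of B. The flag `require_injective` is an assumption added for illustration. It additionally demands that no two characters of A share an image, which a strict bijection would need and which the slide criterion does not check.

```python
from typing import Optional


def p_match_mapping(a: str, b: str,
                    require_injective: bool = False) -> Optional[dict[str, str]]:
    """Return the character mapping that turns a into b, or None if none exists."""
    if len(a) != len(b):
        return None
    forward: dict[str, str] = {}   # character of a -> character of b
    backward: dict[str, str] = {}  # character of b -> character of a
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y:
            return None  # x was already matched with a different character
        if require_injective and backward.setdefault(y, x) != x:
            return None  # y is already the image of a different character
    return forward


print(p_match_mapping("ababc", "bcbcd"))  # {'a': 'b', 'b': 'c', 'c': 'd'}
print(p_match_mapping("ababc", "bcbdc"))  # None
print(p_match_mapping("ababc", "bcbcc"))  # {'a': 'b', 'b': 'c', 'c': 'c'}
```

With `require_injective=True` the pair ababc and bcbcc is rejected, because both b and c would be sent to c.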

10
**There is another property which is important**

If A p-matches B and B p-matches C, then A p-matches C. This follows by composing the two substitutions.

11
**This paper considers the following problem:**

Given a text T and a pattern P, find all occurrences where P p-matches a substring of T. For example, P=abaec p-matches the strings S1=bcbda and S2=cacbd in T.
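As a baseline, the problem can be solved by testing every window of T, which takes O(nm) time. This is a brute-force sketch, not the Boyer-Moore-type algorithm of the paper. The text `cacbdxbcbda` is made up for illustration, and the names `p_matches` and `naive_p_search` are assumptions.

```python
def p_matches(a: str, b: str) -> bool:
    """Slide criterion: every character of a always meets the same character of b."""
    if len(a) != len(b):
        return False
    mapping: dict[str, str] = {}
    return all(mapping.setdefault(x, y) == y for x, y in zip(a, b))


def naive_p_search(text: str, pattern: str) -> list[int]:
    """Return the 0-based start positions where pattern p-matches a window of text."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_matches(pattern, text[i:i + m])]


# Hypothetical text containing cacbd at position 0 and bcbda at position 6.
print(naive_p_search("cacbdxbcbda", "abaec"))  # [0, 6]
```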

12
**For P=abaec and S2=cacbd, the substitution a→c, b→a, e→b, c→d will transform P to S2.**

For S2=cacbd and S1=bcbda, the substitution c→b, a→c, b→d, d→a transforms S2 to S1. It can be seen that P=abaec will be transformed to S1=bcbda by a→b, b→c, e→d, c→a.
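The third substitution is the composition of the first two, which is the transitivity property at work. In the sketch below the mappings are read off directly from the strings P, S2 and S1; the helper name `compose` is an assumption for illustration.

```python
def compose(first: dict[str, str], second: dict[str, str]) -> dict[str, str]:
    """Apply `first`, then `second`; characters missing from `second` stay unchanged."""
    return {src: second.get(mid, mid) for src, mid in first.items()}


p_to_s2 = {"a": "c", "b": "a", "e": "b", "c": "d"}   # abaec -> cacbd
s2_to_s1 = {"c": "b", "a": "c", "b": "d", "d": "a"}  # cacbd -> bcbda
p_to_s1 = compose(p_to_s2, s2_to_s1)

print(p_to_s1)                                    # {'a': 'b', 'b': 'c', 'e': 'd', 'c': 'a'}
print("abaec".translate(str.maketrans(p_to_s1)))  # bcbda
```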

13
**The substitution can be visualized as follows:**

14
This paper is based upon Good Suffix Rule 1 and Good Suffix Rule 2 of the Boyer-Moore algorithm.

15
**Good Suffix Rule 1 for p-match**

Let T1 be the longest suffix of the text window aligned with P which p-matches a suffix P1 of P. If there is a rightmost substring zP2 of P which p-matches yP1, with z≠y, we can move P as follows:

16
**Example**

[Figure: T (positions 1 to 15) aligned with P (positions 1 to 10), with segments labeled v, x, w in T, u, x, v in P′ and u, v, w in P. A p-mismatch occurs; P is transformed to P′ and then shifted.]

17
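[Figure: T (positions 1 to 15) and P (positions 1 to 10) after the shift; P is transformed to P′, whose segments v, x, w line up with those of T.]

After moving, we compare T and P from right to left and find that T6,15 ≡ P1,10, that is, positions 6 to 15 of T p-match positions 1 to 10 of P.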

18
**Good Suffix Rule 2 for p-match**

Let T1 be the longest suffix of the text window aligned with P which p-matches a suffix P1 of P. Consider a suffix of P1 which p-matches a prefix P2 of P. If such a suffix exists, we move P as follows:

19
**Example**

[Figure: T (positions 1 to 13) aligned with P (positions 1 to 8), with segments labeled x, v, w in T, u, x, v in P′ and u, v, w in P. A p-mismatch occurs; P is transformed to P′ and shifted so that it covers positions 3 to 10 of T.]

20
[Figure: T (positions 1 to 13) and P after the shift; P is transformed to P′.]
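Both rules can be folded into one brute-force shift table. The sketch below is a simplified illustration under stated assumptions, not the paper's method: it uses a strict one-to-one p-match so that transitivity can be applied in both directions, it ignores the z≠y refinement of Rule 1, and it builds the table in quadratic time rather than in O(m log min(m, Π)). All function names are assumptions. A shift s is kept only if the part of P that would land on the already matched suffix p-matches that suffix.

```python
def p_matches(a: str, b: str) -> bool:
    """Strict p-match: some one-to-one character mapping turns a into b."""
    if len(a) != len(b):
        return False
    fwd: dict[str, str] = {}
    bwd: dict[str, str] = {}
    return all(fwd.setdefault(x, y) == y and bwd.setdefault(y, x) == x
               for x, y in zip(a, b))


def good_suffix_shift(pattern: str, k: int) -> int:
    """Smallest shift s >= 1 consistent with a p-matched suffix of length k."""
    m = len(pattern)
    for s in range(1, m + 1):
        lo = max(0, m - k - s)  # first pattern index landing on the matched suffix
        if p_matches(pattern[lo:m - s], pattern[lo + s:m]):
            return s
    return m  # not reached for m >= 1: s = m compares two empty strings


def bm_p_search(text: str, pattern: str) -> list[int]:
    """Return 0-based positions where pattern p-matches a window of text."""
    m = len(pattern)
    if m == 0:
        raise ValueError("pattern must not be empty")
    shifts = [good_suffix_shift(pattern, k) for k in range(m + 1)]
    hits, i = [], 0
    while i + m <= len(text):
        window = text[i:i + m]
        k = 0  # length of the longest p-matching suffix, grown right to left
        while k < m and p_matches(pattern[m - k - 1:], window[m - k - 1:]):
            k += 1
        if k == m:
            hits.append(i)
        i += shifts[k]
    return hits


print(bm_p_search("cacbdxbcbda", "abaec"))  # [0, 6]
```

For the pattern abaec the table is [1, 1, 1, 1, 3, 3], so after the full match at position 0 the pattern moves three positions instead of one.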

21
The shift function ∆ is

22
**Example**

[Figure: T (positions 1 to 22) aligned with P (positions 1 to 12), with segments labeled G, A, C in T, C, A, T in P′ and A, T, C in P. A p-mismatch occurs, with j′=7 and j=9 marked on P; P is transformed to P′ and then shifted.]

24
[Figure: T (positions 1 to 22) and P (positions 1 to 12) after the shift; P is transformed to P′, with segments labeled G, A, C in T, T, C, A in P′ and A, T, C in P.]

25
**Time Complexity**

In the average case, the preprocessing phase takes O(m log min(m, Π)) time and O(n) space, and the searching phase takes O(n log min(m, Π)) time.
