Parameterized Pattern Matching by Boyer-Moore-type Algorithms


Parameterized Pattern Matching by Boyer-Moore-type Algorithms
Brenda S. Baker
Proceedings of the 6th Annual ACM-SIAM Symposium on Discrete Algorithms, 1995, pp. 541-550
Advisor: Prof. R. C. T. Lee
Speaker: Kuei-hao Chen

Let us consider two strings:
A = a1a2a3a4a5 = xaxby
B = b1b2b3b4b5 = bacbc
If the edit distance concept is used, A may be transformed to B by substituting a1 by b1, a3 by b3 and a5 by b5.

In this paper, we define a new transformation in which a character may be substituted by another character, but the substitution is global. That is, if x in A is substituted by a, then every x in A is substituted by a.

Consider the above example again, with A = xaxby and B = bacbc. To transform A to B, the first x must be substituted by b. But the substitution is global, so every x becomes b and we obtain A' = babby. The third character of A' is b, whereas the third character of B is c. Hence, if this kind of substitution is used, A = xaxby cannot be transformed to B.

For A=xaxby and B=babbc, A can be transformed to B by substituting x by b and y by c.
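The two examples above can be reproduced with a few lines of Python. This is only a minimal sketch of a global substitution; the helper name `substitute` and the dictionary representation of the substitution are assumptions made for illustration and do not come from the slides.

```python
def substitute(s: str, mapping: dict[str, str]) -> str:
    """Apply a global substitution: every occurrence of a mapped
    character is replaced; unmapped characters are left unchanged."""
    return "".join(mapping.get(ch, ch) for ch in s)


A = "xaxby"
print(substitute(A, {"x": "b"}))            # babby (every x becomes b)
print(substitute(A, {"x": "b", "y": "c"}))  # babbc
```

All replacements are applied simultaneously, so a character produced by the substitution is not substituted again.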

We define a bijection to be a global substitution of a set of distinct characters by another set of characters. A string P p-matches a string Q if P can be transformed to Q by a bijection.

Let A = ababc and B = bcbcd. Then A p-matches B because there is a bijection, namely a → b, b → c, c → d, which transforms A to B.

On the other hand, for A = ababc and B = bcbdc, A does not p-match B: the b in position 2 of A is matched with c, but the b in position 4 is matched with d. It is actually easy to determine whether A p-matches B. Given A = a1a2…aN and B = b1b2…bN, A p-matches B if and only if, for every i, if ai = x and bi = y, then bj = y for every j with aj = x.

For A = ababc and B = bcbcc, it can be seen that every a in A is matched with b and every b is matched with c. This is not true for A = ababc and B = bcbdc. Thus, given two strings A and B of the same length, it is trivial to determine whether A p-matches B.
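The test described above can be sketched in Python as a single left-to-right pass that records which character each character of A is matched with. The function name `p_matches` and the `strict` flag are assumptions introduced here. The criterion as stated only requires each character of A to be matched consistently; a bijection in the strict sense would also require distinct characters of A to be matched with distinct characters of B, which is what `strict=True` adds.

```python
def p_matches(a: str, b: str, strict: bool = False) -> bool:
    """Check the consistency criterion: every occurrence of a character
    of `a` must be matched with the same character of `b`.

    With strict=True (an addition for illustration), two different
    characters of `a` may not be matched with the same character of `b`.
    """
    if len(a) != len(b):
        return False
    forward: dict[str, str] = {}   # character of a -> character of b
    backward: dict[str, str] = {}  # character of b -> character of a
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y:
            return False
        if strict and backward.setdefault(y, x) != x:
            return False
    return True


print(p_matches("ababc", "bcbcd"))  # True
print(p_matches("ababc", "bcbdc"))  # False: b is matched with c and with d
print(p_matches("xaxby", "bacbc"))  # False: x is matched with b and with c
```

The check takes time linear in the length of the strings, since each position is inspected once.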

There is another important property: if A p-matches B and B p-matches C, then A p-matches C. This holds because the substitution from A to B followed by the substitution from B to C is again a global substitution.

This paper considers the following problem: Given a text T and a pattern P, find all occurrences where P p-matches a substring of T. In the example on the slide (T and P not reproduced in this transcript), P p-matches several substrings of T.
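As a baseline for the problem statement, the following Python sketch tries every window of T and tests it against P directly. It is a naive quadratic search, not the algorithm of the paper. The names `p_matches` and `naive_p_search` and the sample text are hypothetical (the text from the slide is not available), and the check assumes a one-to-one substitution over all characters.

```python
def p_matches(a: str, b: str) -> bool:
    """True if a can be turned into b by a one-to-one global substitution."""
    if len(a) != len(b):
        return False
    forward: dict[str, str] = {}
    backward: dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def naive_p_search(text: str, pattern: str) -> list[int]:
    """Return all 0-based start positions where pattern p-matches a window."""
    m = len(pattern)
    return [i for i in range(len(text) - m + 1)
            if p_matches(pattern, text[i:i + m])]


# Hypothetical text containing the strings cacbd and bcbda.
print(naive_p_search("cacbdxxbcbda", "abaec"))  # [0, 7]
```

Positions are 0-based here, whereas the slides number positions from 1.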

For P = abaec and S2 = cacbd, the substitution a → c, b → a, e → b, c → d transforms P to S2. For S2 = cacbd and S1 = bcbda, the substitution c → b, a → c, b → d, d → a transforms S2 to S1. It can be seen that P = abaec is transformed to S1 = bcbda by a → b, b → c, e → d, c → a.

The substitution can be visualized as the following chains from P through S2 to S1: a → c → b, b → a → c, e → b → d, c → d → a.

The algorithm in this paper is based upon Good Suffix Rule 1 and Good Suffix Rule 2 of the Boyer-Moore algorithm.

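Good Suffix Rule 1 for p-match
Let T1 be the longest suffix of the current window of T which p-matches a suffix P1 of P, and let y be the character of P immediately to the left of P1. If P contains a substring zP2 such that zP2 is the rightmost one which p-matches yP1 and z ≠ y, we can shift P to the right so that P2 is aligned with T1.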

Example (figure not reproduced): T occupies positions 1 to 15 and P occupies positions 1 to 10. P is transformed to P' by a substitution, a p-mismatch is found, and P is shifted to the right.

(Figure not reproduced: P after the shift, transformed to P' and aligned with positions 6 to 15 of T.) After moving, we compare T and P from right to left. We find that T(6,15) ≡ P(1,10).

Good Suffix Rule 2 for p-match
Let T1 be the longest suffix of the current window of T which p-matches a suffix P1 of P. If some suffix of P1 p-matches a prefix P2 of P, we take the longest such suffix and shift P to the right so that P2 is aligned with it.
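The effect of the two rules can be illustrated with a brute-force Python sketch. It is not Baker's preprocessing: the shift is recomputed on the fly by trying every candidate, the z ≠ y refinement of Rule 1 is left out (so the shift may be smaller than the rule allows, but it never skips an occurrence), and all characters are treated as substitutable under a one-to-one substitution. The names `p_matches`, `suffix_shift` and `bm_p_search` are hypothetical.

```python
def p_matches(a: str, b: str) -> bool:
    """True if a can be turned into b by a one-to-one global substitution."""
    if len(a) != len(b):
        return False
    forward: dict[str, str] = {}
    backward: dict[str, str] = {}
    for x, y in zip(a, b):
        if forward.setdefault(x, y) != y or backward.setdefault(y, x) != x:
            return False
    return True


def suffix_shift(p: str, k: int) -> int:
    """Smallest shift s >= 1 such that the part of p that would lie under
    the matched suffix (the last k characters) p-matches that suffix.
    Shifts with lo > 0 correspond to Rule 1, shifts with lo == 0 to Rule 2."""
    m = len(p)
    for s in range(1, m):
        lo = max(m - k - s, 0)
        if p_matches(p[lo:m - s], p[lo + s:m]):
            return s
    return m


def bm_p_search(text: str, p: str) -> list[int]:
    """Boyer-Moore-style scan: compare right to left, then shift."""
    m, n = len(p), len(text)
    if m == 0:
        return []
    hits, i = [], 0
    while i + m <= n:
        k = 0  # length of the longest window suffix that p-matches a suffix of p
        while k < m and p_matches(p[m - k - 1:], text[i + m - k - 1:i + m]):
            k += 1
        if k == m:
            hits.append(i)
        i += suffix_shift(p, k)
    return hits


print(bm_p_search("cacbdxxbcbda", "abaec"))  # [0, 7]
```

The shift is safe because, by transitivity, any later occurrence of P must place under T1 a part of P that p-matches the corresponding part of P1. A practical implementation would precompute the shifts for all k instead of recomputing them at every window.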

Example (figures not reproduced): T occupies positions 1 to 13 and P occupies positions 1 to 8. P is transformed to P' by a substitution and a p-mismatch is found. P is then shifted so that it occupies positions 3 to 10 of T.

The shift function ∆ is

Example (figures not reproduced): T occupies positions 1 to 22 and P occupies positions 1 to 12, over the characters G, A, C and T. P is transformed to P' by a substitution and a p-mismatch is found, with j' = 7 and j = 9. P is then shifted and compared with T again under a new transformation P'.

Time Complexity
In the average case, the preprocessing phase takes O(m log min(m, |Π|)) time and the searching phase takes O(n log min(m, |Π|)) time, where m is the length of P and n is the length of T. The space complexity is stated as O(n).


THANK YOU