The Zhu-Takaoka Algorithm


1 The Zhu-Takaoka Algorithm
On improving the average case of the Boyer-Moore string matching algorithm, Journal of Information Processing 10(3), R. F. ZHU and T. TAKAOKA. Advisor: Prof. R. C. T. Lee. Speaker: S. Y. Tang.

2 The Zhu-Takaoka Algorithm is an algorithm which solves the string matching problem.
Input: a text string T of length n and a pattern string P of length m. Output: all occurrences of P in T.
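As a reference point for what "all occurrences" means, here is a minimal brute-force sketch in Python. The function name `naive_occurrences` is hypothetical and not from the slides, and positions are 0-based here, whereas the slides number text positions from 1.

```python
def naive_occurrences(text, pattern):
    """Return every 0-based start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern against the text.
    return [j for j in range(n - m + 1) if text[j:j + m] == pattern]


# Text and pattern taken from the 2-substring example later in the slides.
positions = naive_occurrences("ACTGCTAAGTA", "CTAAG")
print(positions)
```

This checks all n - m + 1 alignments, which is the baseline the Zhu-Takaoka shifts are meant to beat on average.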

3 The Zhu-Takaoka Algorithm is a variant of the Boyer-Moore (BM) Algorithm. It changes only the bad character rule of the BM Algorithm: Zhu and Takaoka replaced the bad character rule with a 2-substring rule. The good suffix rule is still used.

4 The 2-Substring Rule Consider text = ACTGCTAAGTA and pattern = CTAAG.
With the pattern aligned at text position 1, the last 2-substring of the text window is GC. No GC appears in P.

Position:  1  2  3  4  5  6  7  8  9 10 11
Text:      A  C  T  G  C  T  A  A  G  T  A
Pattern:   C  T  A  A  G
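The observation on this slide can be checked directly with a small Python sketch. The helper name `last_pair` is hypothetical, and the plain substring test is only for illustration; the algorithm itself answers this question with a precomputed table.

```python
text = "ACTGCTAAGTA"
pattern = "CTAAG"
m = len(pattern)


def last_pair(text, start, m):
    """Return the last 2-substring of the length-m text window starting at `start` (0-based)."""
    return text[start + m - 2:start + m]


pair = last_pair(text, 0, m)   # the window is text[0:5] == "ACTGC"
occurs = pair in pattern       # plain substring test, no table involved
print(pair, occurs)
```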

5 How can we know whether a specified 2-substring appears in P or not?

6 Whenever a mismatch or a complete match occurs, we select the last 2-substring of the current text window in T and look up the rightmost location of this 2-substring in P, if it exists. This is done by constructing a ztBc table.
Example: ztBc[C,A] = 5 means that the rightmost occurrence of CA in P lies 5 positions from the right end, so if CA is the 2-substring to be matched, we can shift by 5. ztBc[G,A] = 1 means that the rightmost occurrence of GA lies 1 position from the right end, so if GA is the 2-substring to be matched, we shift by 1.
[Figure: alignment of the text (positions 1-23) and the pattern, showing a shift by 5 and a shift by 1]
ztBc entries shown on the slide (row a, column b):

a\b  A  C  G  *
A       8  2
C    5     7
G    1  6
*

7 ztBc[a,b] The preprocessing phase of the algorithm consists of computing, for each pair of characters (a, b) with a, b ∈ Σ (the alphabet), the rightmost occurrence of ab in x[0..m-2].

8 Preprocessing phase Consider text = ATTGCCTAAGTA and pattern = CTAAG.
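The alphabet of the pattern is {A, C, G, T}. The sign "*" denotes any character of the text that never appears in the pattern. First, we fill every entry of the table with the length m of the pattern. Example (m = 5):

a\b  A  C  G  T  *
A    5  5  5  5  5
C    5  5  5  5  5
G    5  5  5  5  5
T    5  5  5  5  5
*    5  5  5  5  5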

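9 Preprocessing phase Next, suppose the 2-substring ab does not occur in P[0..m-2]. If P[0] = b, we set ztBc[a, b] = m-1 for every a. Example (P[0] = C, so every entry in column b = C becomes 4):

a\b  A  C  G  T  *
A    5  4  5  5  5
C    5  4  5  5  5
G    5  4  5  5  5
T    5  4  5  5  5
*    5  4  5  5  5

The last 2-substring of the first window, GC, does not occur in P, and P[0] = C, so the pattern shifts by m-1 = 4:

T: ATTGCCTAAGTA
P: CTAAG
P:     CTAAG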

10 Preprocessing phase Finally, we set ztBc[a,b] = k if k ≤ m-2, P[m-k-2..m-k-1] = ab, and ab does not occur in P[m-k-1..m-2]. Example (P = CTAAG, so ztBc[C,T] = 3, ztBc[T,A] = 2, and ztBc[A,A] = 1):

a\b  A  C  G  T  *
A    1  4  5  5  5
C    5  4  5  3  5
G    5  4  5  5  5
T    2  4  5  5  5
*    5  4  5  5  5
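The three preprocessing steps from slides 8 to 10 can be sketched in Python as follows. This is a minimal illustration, not the authors' code: the function name `build_ztbc`, the nested-dictionary layout, and the use of the string "*" as a stand-in for any character outside the pattern's alphabet are all assumptions made for this example.

```python
def build_ztbc(pattern, alphabet):
    """Build the ztBc table as table[a][b]; '*' stands for any other character."""
    m = len(pattern)
    symbols = list(alphabet) + ["*"]

    # Step 1 (slide 8): fill every entry with the pattern length m.
    table = {a: {b: m for b in symbols} for a in symbols}

    # Step 2 (slide 9): if b equals the first pattern character, the shift is m-1.
    for a in symbols:
        table[a][pattern[0]] = m - 1

    # Step 3 (slide 10): for the pair ending at index i (1 <= i <= m-2), the shift is m-1-i.
    # Scanning left to right lets the rightmost occurrence overwrite earlier ones.
    for i in range(1, m - 1):
        table[pattern[i - 1]][pattern[i]] = m - 1 - i

    return table


table = build_ztbc("CTAAG", "ACGT")
for a in "ACGT*":
    print(a, [table[a][b] for b in "ACGT*"])
```

Each printed row lists the entries for b = A, C, G, T, * in that order. The table has (σ+1)² entries for an alphabet of σ symbols, which is where the quadratic term in the preprocessing cost comes from.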

11 Case 1: ztBc[a,b] = k with k ≤ m-2. Example (m = 8): ztBc[C,A] = 5, because x[m-k-2..m-k-1] = ab (x[1..2] = CA) and "CA" does not occur in x[m-k-1..m-2] (x[2..6]). The pattern shifts by 5.
[Figure: text/pattern alignment with the ztBc table, showing the shift by 5]

12 Case 2: ztBc[a,b] = k with k = m-1. Example (m = 8): ztBc[C,G] = 7, because x[0] = b (G = G) and "CG" does not occur in x[0..m-2] (x[0..6]). The pattern shifts by 7.
[Figure: text/pattern alignment with the ztBc table, showing the shift by 7]

13 Case 3: ztBc[a,b] = k with k = m. Example (m = 8): ztBc[A,C] = 8, because x[0] ≠ b (G ≠ C) and "AC" does not occur in x[0..m-2] (x[0..6]). The pattern shifts by 8, its full length.
[Figure: text/pattern alignment with the ztBc table]
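The three cases can be read as a single lookup, sketched below in Python. The function name `ztbc_case` is hypothetical. The pattern GCAGAGAG is an assumption: the slide figures are incomplete, and this pattern was chosen because it has length 8, starts with GCA, and reproduces the values quoted on slides 11 to 13.

```python
def ztbc_case(pattern, a, b):
    """Return (shift, case number) for the 2-substring ab."""
    m = len(pattern)
    # Case 1: rightmost occurrence of ab ending at index i, with 1 <= i <= m-2.
    for i in range(m - 2, 0, -1):
        if pattern[i - 1] == a and pattern[i] == b:
            return m - 1 - i, 1
    # Case 2: ab does not occur, but b equals the first pattern character.
    if pattern[0] == b:
        return m - 1, 2
    # Case 3: ab does not occur and b differs from the first pattern character.
    return m, 3


x = "GCAGAGAG"  # assumed pattern, see the note above
for a, b in [("C", "A"), ("C", "G"), ("A", "C")]:
    print(a + b, ztbc_case(x, a, b))
```

Scanning from the right on every lookup is only for clarity; the table built during preprocessing gives the same values in constant time.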

14 Full Example In this step, we shift by the ztBc value because ztBc[C,A] = 5 > bmGs[7] = 1, where CA is the last 2-substring of the text window. The pattern shifts 5 positions to the right, by case 1.
[Figure: text/pattern alignment with the ztBc and bmGs tables, showing the shift by 5]

15 Full Example The pattern matches the text window exactly. In this step, we shift by the bmGs value because ztBc[A,G] = 2 < bmGs[0] = 7. The pattern shifts 7 positions to the right.
[Figure: text/pattern alignment with the ztBc and bmGs tables, showing the shift by 7]

16 Full Example In this step, we shift by the bmGs value because ztBc[A,G] = 2 < bmGs[5] = 4. The pattern shifts 4 positions to the right.
[Figure: text/pattern alignment with the ztBc and bmGs tables, showing the shift by 4]

17 Full Example In this step, we can shift by either the ztBc value or the bmGs value because ztBc[C,G] = 7 = bmGs[6]. The pattern shifts 7 positions to the right.
[Figure: text/pattern alignment with the ztBc and bmGs tables]
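The four steps of the full example can be replayed with the following Python sketch of the searching phase. Several things here are assumptions rather than content from the slides: the text and pattern strings (the figures are incomplete, so these were chosen to reproduce the quoted shifts 5, 7, 4, 7), all function names, the 0-based indexing, and the deliberately simple quadratic computation of the suffix lengths behind bmGs. Following slide 6, the larger of the two shifts is taken after a complete match as well as after a mismatch.

```python
def good_suffix_table(x):
    """Boyer-Moore good suffix shifts (bmGs), using a simple quadratic suffix scan."""
    m = len(x)
    suff = [0] * m  # suff[i]: length of the longest suffix of x that ends at index i
    for i in range(m):
        k = 0
        while k <= i and x[i - k] == x[m - 1 - k]:
            k += 1
        suff[i] = k
    bmGs = [m] * m
    j = 0
    for i in range(m - 1, -1, -1):
        if suff[i] == i + 1:  # x[0..i] is also a suffix of x
            while j < m - 1 - i:
                if bmGs[j] == m:
                    bmGs[j] = m - 1 - i
                j += 1
    for i in range(m - 1):
        bmGs[m - 1 - suff[i]] = m - 1 - i
    return bmGs


def zt_search(text, pattern):
    """Return (match positions, list of shifts taken), both 0-based."""
    n, m = len(text), len(pattern)
    if m < 2:
        raise ValueError("this sketch needs a pattern of length >= 2")
    bmGs = good_suffix_table(pattern)
    # Case 1 entries of ztBc; later (rightmost) pairs overwrite earlier ones.
    pairs = {(pattern[i - 1], pattern[i]): m - 1 - i for i in range(1, m - 1)}

    def ztbc(a, b):
        if (a, b) in pairs:
            return pairs[(a, b)]
        return m - 1 if b == pattern[0] else m  # cases 2 and 3

    matches, shifts = [], []
    j = 0
    while j <= n - m:
        i = m - 1
        while i >= 0 and pattern[i] == text[j + i]:
            i -= 1  # compare right to left
        zt = ztbc(text[j + m - 2], text[j + m - 1])
        if i < 0:
            matches.append(j)
            shift = max(bmGs[0], zt)
        else:
            shift = max(bmGs[i], zt)
        shifts.append(shift)
        j += shift
    return matches, shifts


# Assumed text and pattern, see the note above.
matches, shifts = zt_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG")
print(matches, shifts)
```

With these assumed inputs the shifts come out as 5, 7, 4, 7, in the same order as slides 14 to 17.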

18 Time complexity The preprocessing phase takes O(m + σ²) time and space, where σ is the number of characters in the alphabet of the text. The searching phase takes O(m × n) time.

19 References ZHU, R.F. and TAKAOKA, T., 1987, On improving the average case of the Boyer-Moore string matching algorithm, Journal of Information Processing 10(3).

20 Thank you for your attention.

