Adviser: R. C. T. Lee Speaker: C. W. Cheng National Chi Nan University


1 Tuned Boyer Moore Algorithm, Raita Algorithm, Horspool Algorithm, Quick Search Algorithm, Smith Algorithm, Zhu-Takaoka Algorithm

2 Problem Definition Input: a text string T with length n and a pattern string P with length m. Output: all occurrences of P in T.

3 Definition
Ts : the character of the string T that is aligned with the first character of the pattern P.
Pl : the first character of the pattern P, which is aligned with Ts.
Tj : the character at the jth position of the string T.
Pi : the character at the ith position of the pattern P.
Pf : the last character of the pattern P.
n : the length of T.
m : the length of P.

4 Rule 2-2: 1-Suffix Rule (A Special Version of Rule 2)
Consider the 1-suffix x. We may apply Rule 2-2 now.

5 Tuned Boyer Moore Algorithm
Fast string searching , HUME A. and SUNDAY D.M., Software - Practice & Experience 21(11), 1991, pp

6 Introduction
simplification of the Boyer-Moore algorithm.
uses only the bad-character shift.
easy to implement.
very fast in practice.
uses Rule 2-2: 1-Suffix Rule.

7 Tuned Boyer Moore Algorithm
In this algorithm, we always focus on the last character of the current window of T and slide the pattern until its last character matches that character of T.

8 Tuned Boyer Moore Algorithm Rule
Since Ts+m-1 ≠ Pf, we move the pattern P to the right so that Pi, the rightmost character among P1 to Pm-1 that equals Ts+m-1, is aligned with Ts+m-1. This shifts the pattern (m-i) positions to the right. We repeat such shifts until Ts+m-1 = Pf.
[Figure: T with the window from position s to s+m-1, and P (positions 1, i and f marked) shifted to the right twice.]

9 Tuned Boyer Moore Preprocessing Table
In this algorithm, we construct a table as follows. Let x be a character in the alphabet. If x occurs in P1 to Pm-1, we take its rightmost occurrence Pi among those positions and record m-i, the distance from that occurrence to the last position of P. If x does not occur in P1 to Pm-1, we record m.

10 Tuned Boyer Moore Preprocessing Table
Example: P=AGCAGAC
         A  C  G  T
tbmBC    1  4  2  7
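The table construction described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names `build_shift_table` and `lookup` are hypothetical, positions are 1-based as in the slides, and characters absent from P1 to Pm-1 are handled by a default value of m instead of being stored.

```python
def build_shift_table(pattern):
    """Shift table over the characters of P1..Pm-1 (the last character is skipped).

    For each character x found there, store m - i, where i is the rightmost
    1-based position of x among P1..Pm-1.
    """
    m = len(pattern)
    table = {}
    for i, ch in enumerate(pattern[:-1], start=1):
        # A later (more rightward) occurrence overwrites an earlier one.
        table[ch] = m - i
    return table


def lookup(table, ch, m):
    # Characters that do not occur in P1..Pm-1 get the full shift m.
    return table.get(ch, m)


P = "AGCAGAC"
tbmBC = build_shift_table(P)
print({x: lookup(tbmBC, x, len(P)) for x in "ACGT"})
# prints {'A': 1, 'C': 4, 'G': 2, 'T': 7}
```

Using a dictionary with a default keeps the sketch independent of the alphabet size; an array indexed by character code would match the O(σ) space bound more literally.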

11 Example Text string T=GCGAGCAGACGTGCGAGTACG Pattern string P=AGCAGAC
tbmBC: A=1, C=4, G=2, T=7
[Figure: P aligned under the first window of T.]

12 Example (T, P and tbmBC as on slide 11): tbmBC[A]=1, shift=1

13 Example (T, P and tbmBC as on slide 11) [Figure: P after the shift.]

14 Example (T, P and tbmBC as on slide 11): tbmBC[G]=2, shift=2

15 Example (T, P and tbmBC as on slide 11) [Figure: P after the shift.]

16 Example (T, P and tbmBC as on slide 11): match

17 Example (T, P and tbmBC as on slide 11): exact match, tbmBC[C]=4, shift=4

18 Example (T, P and tbmBC as on slide 11) [Figure: P after the shift.]

19 Example (T, P and tbmBC as on slide 11): match

20 Example (T, P and tbmBC as on slide 11): mismatch, tbmBC[C]=4, shift=4

21 Example (T, P and tbmBC as on slide 11) [Figure: P after the shift.]

22 Example (T, P and tbmBC as on slide 11): tbmBC[T]=7, shift=7

23 Example (T, P and tbmBC as on slide 11) [Figure: P after the shift.]

24 Time complexity
Preprocessing phase: O(m+σ) time and O(σ) space, where σ is the number of characters in the alphabet.
Searching phase: O(mn) time.
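The search walked through in the example slides can be sketched as follows. This is a simplified illustration, not the published implementation: the function name `tuned_bm_search` is hypothetical, indices are 0-based, and low-level tuning such as loop unrolling is left out. It keeps only the idea from the slides: skip on the last character of the window until it equals Pf, then verify the window.

```python
def tuned_bm_search(text, pattern):
    """Return the 0-based start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Shift table over pattern[:-1]; absent characters shift by m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    last = pattern[-1]
    hits = []
    s = 0
    while s <= n - m:
        c = text[s + m - 1]          # last character of the current window
        if c != last:
            s += shift.get(c, m)     # skip step: only the last character is examined
            continue
        if text[s:s + m] == pattern:  # last characters agree, so check the whole window
            hits.append(s)
        s += shift.get(last, m)
    return hits


print(tuned_bm_search("GCGAGCAGACGTGCGAGTACG", "AGCAGAC"))
# prints [3]
```

The reported position 3 is 0-based; it corresponds to the window starting at the 4th character of T.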

25 Raita Algorithm Tuning the Boyer-Moore-Horspool string searching algorithm, T. RAITA, Software - Practice & Experience, 22(10), 1992, pp

26 Introduction
simplification of the Boyer-Moore algorithm.
uses only the bad-character shift.
easy to implement.
very fast in practice.
uses Rule 2-2: 1-Suffix Rule.

27 Raita Algorithm In this algorithm, we first compare the last character of the window of T with the last character of the pattern, then we compare the first character and the middle character of the window with those of the pattern. If they all match, we compare the other characters from left to right. Whenever a mismatch occurs, we slide the window by the amount given in the preprocessing table.

28 Raita Preprocessing Table
The preprocessing table of the Raita algorithm is the same as that of the Tuned Boyer-Moore algorithm.

29 Raita Preprocessing Table
Example: P=AGCAGAC
        A  C  G  T
raBC    1  4  2  7
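The comparison order described for the Raita algorithm (last, first, middle, then the rest) can be sketched like this. The function name `raita_search` is hypothetical, indices are 0-based, and the middle position is taken as m // 2, which is an assumption; the slides do not say which position counts as the middle for patterns of even length.

```python
def raita_search(text, pattern):
    """Return the 0-based start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # Same table as for Tuned Boyer-Moore: built over pattern[:-1], default m.
    shift = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    first, middle, last = pattern[0], pattern[m // 2], pattern[-1]
    hits = []
    s = 0
    while s <= n - m:
        c = text[s + m - 1]
        # Cheap checks first: last, first and middle character of the window.
        if (c == last
                and text[s] == first
                and text[s + m // 2] == middle
                and text[s + 1:s + m - 1] == pattern[1:m - 1]):
            hits.append(s)
        # The shift always depends on the last character of the window.
        s += shift.get(c, m)
    return hits


print(raita_search("GCGAGCAGACGTGCGAGTACG", "AGCAGAC"))
# prints [3]
```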

30 Example Text string T=GCGAGCAGACGTGCGAGTACG Pattern string P=AGCAGAC
raBC: A=1, C=4, G=2, T=7
[Figure: P aligned under the first window of T.]

31 Example (T, P and raBC as on slide 30): raBC[A]=1, shift=1

32 Example (T, P and raBC as on slide 30) [Figure: P after the shift.]

33 Example (T, P and raBC as on slide 30): raBC[G]=2, shift=2

34 Example (T, P and raBC as on slide 30) [Figure: P after the shift.]

35 Example (T, P and raBC as on slide 30): match

36 Example (T, P and raBC as on slide 30): match

37 Example (T, P and raBC as on slide 30): match

38 Example (T, P and raBC as on slide 30): exact match, raBC[C]=4, shift=4

39 Example (T, P and raBC as on slide 30) [Figure: P after the shift.]

40 Example (T, P and raBC as on slide 30): match

41 Example (T, P and raBC as on slide 30): mismatch, raBC[C]=4, shift=4

42 Example (T, P and raBC as on slide 30) [Figure: P after the shift.]

43 Example (T, P and raBC as on slide 30): raBC[T]=7, shift=7

44 Example (T, P and raBC as on slide 30) [Figure: P after the shift.]

45 Time complexity
Preprocessing phase: O(m+σ) time and O(σ) space, where σ is the number of characters in the alphabet.
Searching phase: O(mn) time.

46 Horspool Algorithm
Practical fast searching in strings, R. NIGEL HORSPOOL, SOFTWARE-PRACTICE AND EXPERIENCE, VOL. 10, 1980, pp

47 Introduction
simplification of the Boyer-Moore algorithm.
uses only the bad-character shift.
easy to implement.
very fast in practice.
uses Rule 2-2: 1-Suffix Rule.

48 Horspool Algorithm In this algorithm, we always compare the window of T with the pattern from right to left, and then slide the pattern so that it matches the last character of the window of T.

49 Horspool Preprocessing Table
The preprocessing table of the Horspool algorithm is the same as that of the Tuned Boyer-Moore algorithm.

50 Horspool Preprocessing Table
Example: P=AGCAGAC
        A  C  G  T
hpBC    1  4  2  7
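A minimal sketch of the Horspool search, assuming the usual formulation in which the shift is always looked up for the last character of the window, wherever the mismatch occurred. The function name `horspool_search` is hypothetical and indices are 0-based.

```python
def horspool_search(text, pattern):
    """Return the 0-based start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    # hpBC: built over pattern[:-1]; characters not found there shift by m.
    hp = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}
    hits = []
    s = 0
    while s <= n - m:
        j = m - 1
        # Compare the window with the pattern from right to left.
        while j >= 0 and text[s + j] == pattern[j]:
            j -= 1
        if j < 0:
            hits.append(s)
        # Shift by the table entry of the last character of the window.
        s += hp.get(text[s + m - 1], m)
    return hits


print(horspool_search("GCGAGCAGACGTGCGAGTACG", "AGCAGAC"))
# prints [3]
```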

51 Example Text string T=GCGAGCAGACGTGCGAGTACG Pattern string P=AGCAGAC
hpBC: A=1, C=4, G=2, T=7
[Figure: P aligned under the first window of T.]

52 Example (T, P and hpBC as on slide 51): hpBC[A]=1, shift=1

53 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

54 Example (T, P and hpBC as on slide 51): hpBC[G]=2, shift=2

55 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

56 Example (T, P and hpBC as on slide 51): match

57 Example (T, P and hpBC as on slide 51): match

58 Example (T, P and hpBC as on slide 51): match

59 Example (T, P and hpBC as on slide 51): exact match, hpBC[C]=4, shift=4

60 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

61 Example (T, P and hpBC as on slide 51): match

62 Example (T, P and hpBC as on slide 51): mismatch, hpBC[G]=2, shift=2

63 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

64 Example (T, P and hpBC as on slide 51): mismatch, hpBC[A]=1, shift=1

65 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

66 Example (T, P and hpBC as on slide 51): mismatch, hpBC[G]=2, shift=2

67 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

68 Example (T, P and hpBC as on slide 51): mismatch, hpBC[A]=1, shift=1

69 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

70 Example (T, P and hpBC as on slide 51): match

71 Example (T, P and hpBC as on slide 51): match

72 Example (T, P and hpBC as on slide 51): mismatch, hpBC[C]=4, shift=4

73 Example (T, P and hpBC as on slide 51) [Figure: P after the shift.]

74 Time complexity
Preprocessing phase: O(m+σ) time and O(σ) space, where σ is the number of characters in the alphabet.
Searching phase: O(mn) time.

75 Quick Search Algorithm
A very fast substring search algorithm, SUNDAY D.M., Communications of the ACM . 33(8),1990, pp

76 Introduction
simplification of the Boyer-Moore algorithm.
uses only the bad-character shift.
easy to implement.
uses Rule 2-2: 1-Suffix Rule.

77 Quick Search Rule Suppose that P1 is aligned to Ts now, and we perform a pair-wise comparison between text T and pattern P from left to right. Assume that the first mismatch occurs when comparing Tq with Pp. Since Tq ≠ Pp, we move the pattern P to the right so that Pi, the rightmost character of P that equals Ts+m, is aligned with Ts+m. This shifts the pattern (m-i+1) positions to the right.
[Figure: T with positions s, q and s+m marked and a mismatch at Tq; P (positions 1, p and i marked) before and after the shift.]

78 Quick Search Preprocessing Table
The only thing we need to do is to construct a table as follows. Let x be a character in the alphabet. If x occurs in P, we record the position of its last occurrence, counted from the right end of P (the last character of P has position 1). If x does not occur in P, we record m+1.

79 Quick Search Preprocessing Table
Example: P=CAGAGAG
        A  C  G  T
qsBC    2  7  1  8
With this table, the number of positions to move the pattern is found by a single lookup. After the movement, we compare the pattern and the text from left to right until a mismatch occurs; if no mismatch occurs, we output the position of the first character in T that aligns with the pattern P.
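The Quick Search table and search loop can be sketched as below. The names `build_qs_table` and `quick_search` are hypothetical, indices are 0-based, and the left-to-right comparison is written as a slice comparison for brevity. The shift is looked up for the character just to the right of the window (Ts+m in the slides' notation).

```python
def build_qs_table(pattern):
    """qsBC: distance of the last occurrence of each character from the right
    end of the pattern, counting the last character as 1."""
    m = len(pattern)
    # enumerate is 0-based, so m - i equals m - (1-based position) + 1.
    return {ch: m - i for i, ch in enumerate(pattern)}


def quick_search(text, pattern):
    """Return the 0-based start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    qs = build_qs_table(pattern)
    hits = []
    s = 0
    while s <= n - m:
        if text[s:s + m] == pattern:
            hits.append(s)
        if s + m >= n:
            break  # no character to the right of the window is left
        # Characters that do not occur in the pattern shift by m + 1.
        s += qs.get(text[s + m], m + 1)
    return hits


print(build_qs_table("CAGAGAG"))
# prints {'C': 7, 'A': 2, 'G': 1}
print(quick_search("GCGCAGAGAGTAGAGAGTACG", "CAGAGAG"))
# prints [3]
```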

80 Example Text string T=GCGCAGAGAGTAGAGAGTACG Pattern string P=CAGAGAG
qsBC: A=2, C=7, G=1, T=8
[Figure: P aligned under the first window of T.]

81 Example (T, P and qsBC as on slide 80): mismatch

82 Example (T, P and qsBC as on slide 80): mismatch, qsBC[G]=1, shift=1

83 Example (T, P and qsBC as on slide 80) [Figure: P after the shift.]

84 Example (T, P and qsBC as on slide 80): mismatch

85 Example (T, P and qsBC as on slide 80): mismatch, qsBC[A]=2, shift=2

86 Example (T, P and qsBC as on slide 80) [Figure: P after the shift.]

87 Example (T, P and qsBC as on slide 80): exact match

88 Example (T, P and qsBC as on slide 80): exact match, qsBC[T]=8, shift=8

89 Example (T, P and qsBC as on slide 80) [Figure: P after the shift.]

90 Example (T, P and qsBC as on slide 80): mismatch

91 Example (T, P and qsBC as on slide 80): mismatch, qsBC[A]=2, shift=2

92 Example (T, P and qsBC as on slide 80) [Figure: P after the shift.]

93 Example (T, P and qsBC as on slide 80): mismatch

94 Example (T, P and qsBC as on slide 80): mismatch, qsBC[G]=1, shift=1

95 Example (T, P and qsBC as on slide 80) [Figure: P after the shift.]

96 Example (T, P and qsBC as on slide 80): mismatch

97 Time complexity
Preprocessing phase: O(m+σ) time and O(σ) space, where σ is the number of characters in the alphabet.
Searching phase: O(mn) time.

98 Smith Algorithm Experiments with a very fast substring search algorithm, SMITH P.D., Software - Practice & Experience 21(10), 1991, pp

99 Introduction
takes the maximum of the Horspool shift function and the Quick Search shift function.
uses Rule 2-2: 1-Suffix Rule.

100 Smith Algorithm This algorithm is almost the same as the Quick Search Algorithm, except that the last character of the window is also considered. If the Horspool shift for that character gives a larger movement than the Quick Search shift, it is used; otherwise the Quick Search shift is used.
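The rule "take the larger of the two shifts" can be sketched as follows. The function name `smith_search` is hypothetical and indices are 0-based. To make the behaviour easy to compare with the example in this deck, the sketch also returns the list of shifts it took, which is an addition for illustration only.

```python
def smith_search(text, pattern):
    """Return (hits, shifts): 0-based match positions and the shifts taken."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return [], []
    hp = {ch: m - 1 - i for i, ch in enumerate(pattern[:-1])}  # Horspool table, default m
    qs = {ch: m - i for i, ch in enumerate(pattern)}           # Quick Search table, default m + 1
    hits, shifts = [], []
    s = 0
    while s <= n - m:
        if text[s:s + m] == pattern:
            hits.append(s)
        if s + m >= n:
            break  # last possible window; nothing to the right to look up
        hp_shift = hp.get(text[s + m - 1], m)   # last character of the window
        qs_shift = qs.get(text[s + m], m + 1)   # character right after the window
        step = max(hp_shift, qs_shift)
        shifts.append(step)
        s += step
    return hits, shifts


print(smith_search("GCGCAGAGAGTAGAGAGTACG", "CAGAGAG"))
# prints ([3], [1, 2, 8, 7])
```

On this text and pattern the shifts 1, 2, 8 and 7 agree with the shift values given in the example slides for the Smith algorithm.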

101 Example Text string T=GCGCAGAGAGTAGAGAGTACG Pattern string P=CAGAGAG
hpBC: A=1, C=6, G=2, T=7
qsBC: A=2, C=7, G=1, T=8
[Figure: P aligned under the first window of T.]

102 Example (T, P, hpBC and qsBC as on slide 101): mismatch

103 Example (T, P, hpBC and qsBC as on slide 101): mismatch, hpBC[A]=1, qsBC[G]=1, shift=1

104 Example (T, P, hpBC and qsBC as on slide 101) [Figure: P after the shift.]

105 Example (T, P, hpBC and qsBC as on slide 101): mismatch

106 Example (T, P, hpBC and qsBC as on slide 101): mismatch, hpBC[G]=2, qsBC[A]=2, shift=2

107 Example (T, P, hpBC and qsBC as on slide 101) [Figure: P after the shift.]

108 Example (T, P, hpBC and qsBC as on slide 101): exact match

109 Example (T, P, hpBC and qsBC as on slide 101): exact match, hpBC[G]=2, qsBC[T]=8, shift=8

110 Example (T, P, hpBC and qsBC as on slide 101) [Figure: P after the shift.]

111 Example (T, P, hpBC and qsBC as on slide 101): mismatch

112 Example (T, P, hpBC and qsBC as on slide 101): mismatch, hpBC[T]=7, qsBC[A]=2, shift=7

113 Example (T, P, hpBC and qsBC as on slide 101) [Figure: P after the shift.]

114 Time complexity
Preprocessing phase: O(m+σ) time and O(σ) space, where σ is the number of characters in the alphabet.
Searching phase: O(mn) time.

115 Zhu-Takaoka Algorithm
On improving the average case of the Boyer-Moore string matching algorithm, R. F. ZHU and T. TAKAOKA, Journal of Information Processing 10(3), 1987, pp

116 The Zhu-Takaoka Algorithm is a variant of the Boyer-Moore Algorithm. It changes only the bad character rule of the Boyer-Moore Algorithm: Zhu and Takaoka replaced the bad character rule by a 2-substring rule. The good suffix rules are still used.

117 Rule 2-3: The 2-Substring Rule (A Special Version of Rule 2)
Consider the 2-substring Tk and Tk+1. We may apply Rule 2-3 now.
[Figure: the 2-substring ux at Tk Tk+1 in T, and P containing ux at Pi Pi+1, before and after the shift.]

118 Zhu-Takaoka Preprocessing Table
The preprocessing phase of the algorithm consists of computing, for each pair of characters (a, b) with a, b in the alphabet, the rightmost occurrence of ab in x[0..m-2].

119 Case 1: ztBc[a,b] = k with k ≤ m-2
Example (m = 8): ztBc[C,A] = 5, ∵ x[m-k-2..m-k-1] = ab (x[1..2] = CA) and “CA” does not occur in x[m-k-1..m-2] (x[2..6]). Shift by 5.
ztBc (rows a, columns b):
a\b  A  C  G  *
A    8  8  2  8
C    5  8  7  8
G    1  6  7  8
*    8  8  7  8
[Figure: the text with its position indices and the pattern x aligned beneath it, before and after the shift.]

120 Case 2: ztBc[a,b] = k with k = m-1
Example (same ztBc table as in Case 1): ztBc[C,G] = 7, ∵ x[0] = b (G = G) and “CG” does not occur in x[0..8-2] (x[0..6]). Shift by 7.

121 Case 3: ztBc[a,b] = k with k = m
Example (same ztBc table as in Case 1): ztBc[A,C] = 8, ∵ x[0] ≠ b (G ≠ C) and “AC” does not occur in x[0..8-2] (x[0..6]).

122 preprocessing phase Consider text= ATTGCCTAATA and pattern=CTAAG
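The alphabet of the pattern is {A, C, G, T}. The sign “*” denotes any character of the text that never appears in the pattern. First, we fill every entry of the table with the length m of the pattern.
Example (rows a, columns b):
a\b  A  C  G  T  *
A    5  5  5  5  5
C    5  5  5  5  5
G    5  5  5  5  5
T    5  5  5  5  5
*    5  5  5  5  5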

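123 preprocessing phase Then, we consider the 2-substrings ab that do not occur in P[0..m-2]. If P0 = b, we set ztBc[a, b] = m-1 for all a.
Example (P0 = C, so every entry in column C becomes 4):
a\b  A  C  G  T  *
A    5  4  5  5  5
C    5  4  5  5  5
G    5  4  5  5  5
T    5  4  5  5  5
*    5  4  5  5  5
[Figure: T: ATTGCCTAAGTA with P: CTAAG shown at two alignments.]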

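124 preprocessing phase Finally, we set ztBc[a,b] = k if k ≤ m-2 and P[m-k-2..m-k-1] = ab and ab does not occur in P[m-k-1..m-2].
Example (rows a, columns b):
a\b  A  C  G  T  *
A    1  4  5  5  5
C    5  4  5  3  5
G    5  4  5  5  5
T    2  4  5  5  5
*    5  4  5  5  5
[Figure: P: CTAAG with its 2-substrings AA, TA and CT giving the values 1, 2 and 3.]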
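The three preprocessing steps above (fill with m, set the column of P0 to m-1, then record the 2-substrings of P[0..m-2]) can be sketched as follows. The function name `build_zt_table` and the use of a dictionary keyed by character pairs are assumptions made for this illustration; "*" stands for any character that does not appear in the pattern, and the pattern is assumed to have at least two characters.

```python
def build_zt_table(pattern, alphabet="ACGT*"):
    """Return ztBc as a dict mapping (a, b) to the shift for the 2-substring ab."""
    m = len(pattern)
    # Step 1: every entry starts at the pattern length m.
    table = {(a, b): m for a in alphabet for b in alphabet}
    # Step 2: if b equals the first pattern character, the shift is m - 1.
    for a in alphabet:
        table[(a, pattern[0])] = m - 1
    # Step 3: each 2-substring pattern[i-1..i] with i <= m-2 gets m - 1 - i.
    # Going left to right lets the rightmost occurrence win.
    for i in range(1, m - 1):
        table[(pattern[i - 1], pattern[i])] = m - 1 - i
    return table


zt = build_zt_table("CTAAG")
print(zt[("A", "A")], zt[("T", "A")], zt[("C", "T")], zt[("G", "C")], zt[("A", "G")])
# prints 1 2 3 4 5
```

The pair AG gets the default value 5 because its only occurrence in CTAAG ends at the last pattern position, which lies outside P[0..m-2].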

125 Full Example
In this step, we select the ztBc function to shift because ztBc[C,A] = 5 (the text characters aligned with P6P7 are CA) > bmGs[7] = 1. The pattern shifts 5 positions to the right by Case 1.
ztBc (rows a, columns b):
a\b  A  C  G  *
A    8  8  2  8
C    5  8  7  8
G    1  6  7  8
*    8  8  7  8
[Figure: the text with its position indices, the pattern x with the bmGs values under x[i], and the pattern before and after the shift.]

126 Full Example (exact matching)
In this step, we select the bmGs function to shift because ztBc[A,G] = 2 < bmGs[0] = 7. Shift by 7.

127 Full Example
In this step, we select the bmGs function to shift because ztBc[A,G] = 2 < bmGs[5] = 4. Shift by 4.

128 Full Example
We can select either the ztBc function or the bmGs function to shift because ztBc[C,G] = 7 = bmGs[6].

129 Time complexity
Preprocessing phase: O(m+σ²) time and space (σ = the number of characters in the alphabet of the text).
Searching phase: O(m × n) time.

130 Reference
[KMP77] Fast pattern matching in strings, D. E. Knuth, J. H. Morris, Jr. and V. R. Pratt, SIAM J. Computing, 6, 1977, pp. 323–350.
[BM77] A fast string search algorithm, R. S. Boyer and J. S. Moore, Comm. ACM, 20, 1977, pp. 762–772.
[S90] A very fast substring search algorithm, D. M. Sunday, Comm. ACM, 33, 1990, pp. 132–142.
[RR89] The Rand MH Message Handling System: User’s Manual (UCI Version), M. T. Rose and J. L. Romine, University of California, Irvine, 1989.
[S82] A comparison of three string matching algorithms, G. De V. Smith, Software - Practice & Experience, 12, 1982, pp. 57–66.
[HS91] Fast string searching, A. Hume and D. M. Sunday, Software - Practice & Experience, 21(11), 1991, pp.
[S94] String Searching Algorithms, G. A. Stephen, World Scientific, 1994.
[ZT87] On improving the average case of the Boyer-Moore string matching algorithm, R. F. Zhu and T. Takaoka, Journal of Information Processing, 10(3), 1987, pp
[R92] Tuning the Boyer-Moore-Horspool string searching algorithm, T. Raita, Software - Practice & Experience, 22(10), 1992, pp
[S94] On tuning the Boyer-Moore-Horspool string searching algorithms, P. D. Smith, Software - Practice & Experience, 24(4), 1994, pp
[BR92] Average running time of the Boyer-Moore-Horspool algorithm, R. A. Baeza-Yates and M. Régnier, Theoretical Computer Science, 92(1), 1992, pp
[H80] Practical fast searching in strings, R. N. Horspool, Software - Practice & Experience, 10(6), 1980, pp.
[L95] Experimental results on string matching algorithms, T. Lecroq, Software - Practice & Experience, 25(7), 1995, pp

131 Thank you for listening

