Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan


1 Alpha Skip Search Algorithm
A Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Charras, C., Lecroq, T. and Pehoushek, J. D., Lecture Notes in Computer Science, Vol. 1448, 1998, pp

2 The Exact String Matching Problem:
We are given a text string T of length n and a pattern string P of length m, and we want to find all occurrences of P in T. Example: Input: There are two occurrences of P in T as shown below: Output: 2, 10
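As a baseline for the problem statement above, here is a minimal brute-force sketch. It is not one of the algorithms in these slides; the function name `naive_search` and the 0-based position convention are assumptions made for illustration. The sample strings are the text and pattern used in the later Alpha Skip Search example.

```python
def naive_search(text, pattern):
    """Return every 0-based position at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern and compare it in full.
    return [i for i in range(n - m + 1) if text[i:i + m] == pattern]


print(naive_search("aaaababbababbbbbbaabababababba", "ababbaba"))
```

This checks all n-m+1 alignments, which is the work the rules on the following slides try to avoid.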

3 The Alpha Skip Search Algorithm is an improvement of the Skip Search Algorithm.
The Skip Search Algorithm uses Rule 2, the substring matching rule, and Rule 4, the two-window rule.

4 Rule 2: The Substring Matching Rule
For any substring u in T, find the nearest u in P which is to the left of it. If such a u exists in P, move P so that the two u's match; otherwise, we may define a new partial window.

5 Rule 2-2: 1-Suffix Rule (A Special Version of Rule 2)
Consider the 1-suffix x. We may apply Rule 2-2 now.

6 Rule 4: Two Window Rule T = C G C A C G G T A C C T T A C G G T P = C
No prefix of P = a suffix of W1. No suffix of P = a prefix of W2. C G C A C G G T w3 w4 A C C T T A C G C T T A Matched!

7 The Skip Search Algorithm
The Skip Search Algorithm uses Rule 2-2 together with Rule 4 in a very clever way.
Example:
T : G C A T C G C A G A G A G T A T A C A G T A C G
P : G C A G A G A G
The length of the pattern is m. The length of the two-window, which is a wide window, is 2m-1.

8 The Skip Search Algorithm: Example (first step)
T : G C A T C G C A G A G A G T A T A C A G T A C G
P : G C A G A G A G
The length of the two-window is 2m-1.
Positions of each character in P, in decreasing order:
A : (6,4,2)
C : (1)
G : (7,5,3,0)
T : φ
Counting positions from 0, the character at position 7 of T is A, whose positions in P are (6,4,2), so P is aligned at positions 1, 3 and 5 of T. The alignment at position 5 is an occurrence of P.

9 The Skip Search Algorithm: Example (second step)
The next character examined, at position 15 of T, is T. Its list of positions is φ, so no alignment of P is tried.

10 The Skip Search Algorithm: Example (third step)
The next character examined, at position 23 of T, is G, whose positions in P are (7,5,3,0). Only the alignment at position 16 fits within T, and it is not an occurrence of P.
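The example above can be sketched in Python as follows. This is an illustrative sketch rather than the presenters' code: the names `build_buckets` and `skip_search` are assumptions, positions are 0-based, and a plain dictionary stands in for the per-character position lists.

```python
def build_buckets(pattern):
    """Map each character of pattern to its positions, in decreasing order."""
    buckets = {}
    for i in range(len(pattern) - 1, -1, -1):
        buckets.setdefault(pattern[i], []).append(i)
    return buckets


def skip_search(text, pattern):
    """Return the 0-based positions of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    buckets = build_buckets(pattern)
    hits = []
    # Examine only every m-th character of the text, starting at m - 1.
    for j in range(m - 1, n, m):
        # Each position i of text[j] in the pattern suggests one alignment.
        for i in buckets.get(text[j], ()):
            start = j - i
            if start + m <= n and text[start:start + m] == pattern:
                hits.append(start)
    return hits


print(build_buckets("GCAGAGAG"))
print(skip_search("GCATCGCAGAGAGTATACAGTACG", "GCAGAGAG"))
```

Every occurrence of P covers exactly one of the examined text positions, so no occurrence is missed and none is reported twice.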

11 The Skip Search Algorithm uses a very special version of Rule 2
In this version, the substring is limited to one character. The Alpha Skip Search Algorithm, presented next, uses a substring of length L, which may be longer than 1, and a wide window of length 2m-L.

12 We assume that the size of the alphabet Σ of the text and pattern is σ
In the preprocessing phase, we first use a formula to determine L and then find all substrings of pattern P whose length is L. The positions at which these substrings occur in P are stored in a trie. In the searching phase, we use the information stored in the trie to compare text T with pattern P.

13 Preprocessing phase
If log_σ m > 1, L = ⌊log_σ m⌋, where σ is the size of the alphabet and m is the length of pattern P; otherwise L = 1.
Example:
T = aaaababbababbbbbbaabababababbac
P = ababbaba
σ = 3, m = 8, L = ⌊log_σ m⌋ = ⌊log_3 8⌋ = 1
Trie: a → [7,5,2,0], b → [6,4,3,1]
In this case σ is 3 and the length of the pattern is 8, so L is 1; that is, the substrings stored in the trie have length 1.
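The choice of L can be sketched as below. The function name `substring_length` is an assumption, and the rounding down is inferred from the slide's own example, where log_3 8 gives L = 1. The sketch uses integer arithmetic instead of a floating-point logarithm, since rounding a computed logarithm can be off by one.

```python
def substring_length(m, sigma):
    """Return L = floor(log_sigma(m)), but never less than 1."""
    if sigma < 2:
        raise ValueError("alphabet size must be at least 2")
    L, power = 0, sigma
    # Count how many times sigma can be multiplied while staying <= m.
    while power <= m:
        L += 1
        power *= sigma
    return max(L, 1)


print(substring_length(8, 3))  # the slide's example with sigma = 3
print(substring_length(8, 2))  # the later example with sigma = 2
```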

14 Every leaf of the trie stores the positions of its substring in pattern P, in decreasing order.
Example:
T : a a a a b a b b a b a b b b b b b a a b a b a b a b a b b a
P : a b a b b a b a
σ = 2, m = 8, L = log_σ m = log_2 8 = 3
Leaves: aba → [5,0], abb → [2], bab → [4,1], bba → [3]

15 Trie Example:
P : a b a b b a b a
root
  a - b - a : [5,0]
  a - b - b : [2]
  b - a - b : [4,1]
  b - b - a : [3]

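16 Building the trie
P : a b a b b a b a
After inserting the substring at position 0: aba → [0]
After inserting the substring at position 1: aba → [0], bab → [1]
After inserting the substring at position 2: aba → [0], abb → [2], bab → [1]
After inserting the substring at position 3: aba → [0], abb → [2], bab → [1], bba → [3]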

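17 Building the trie (continued)
P : a b a b b a b a
After inserting the substring at position 4: aba → [0], abb → [2], bab → [4,1], bba → [3]
After inserting the substring at position 5: aba → [5,0], abb → [2], bab → [4,1], bba → [3]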
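The construction steps above can be sketched with nested dictionaries standing in for trie nodes. The names `build_trie` and `lookup` are assumptions for illustration, and `lookup` assumes its argument has exactly length L.

```python
def build_trie(pattern, L):
    """Build a trie of all length-L substrings of pattern.

    Inner nodes are dicts keyed by character; each leaf is the list of
    positions of that substring in pattern, in decreasing order.
    """
    root = {}
    for i in range(len(pattern) - L + 1):
        gram = pattern[i:i + L]
        node = root
        for ch in gram[:-1]:
            node = node.setdefault(ch, {})
        # Insert at the front so later (larger) positions come first.
        node.setdefault(gram[-1], []).insert(0, i)
    return root


def lookup(trie, gram):
    """Return the position list for a length-L substring, or [] if absent."""
    node = trie
    for ch in gram:
        node = node.get(ch)
        if node is None:
            return []
    return node


trie = build_trie("ababbaba", 3)
for gram in ("aba", "abb", "bab", "bba", "bbb"):
    print(gram, lookup(trie, gram))
```

With L = 1 the same function reproduces the earlier σ = 3 example, where the root maps a and b directly to their position lists.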

18 We use a wide window with length 2m-L.
Example:
T : a a a a b a b b a b a b b b b b b a a b a b a b a b a b b a
P : a b a b b a b a
σ = 2, m = 8, L = log_σ m = log_2 8 = 3
The wide window has length 2m-L = 2*8-3 = 13.

19 Searching phase: Example
T = aaaababbababbbbbbaabababababba
P = ababbaba
Trie leaves: aba → [5,0], bab → [4,1], abb → [2], bba → [3]
Positions are counted from 0, and each step moves m-L+1 = 6 positions to the right.
The substring abb at position 5 of T is in the trie with position [2], so P is aligned at position 3 of T. Match!

20 Searching phase: Example (continued)
The next substring, bbb at position 11 of T, is not in the trie: no bbb in P.
The next substring, aab at position 17 of T, is not in the trie: no aab in P.

21 Searching phase: Example (continued)
The next substring, bab at position 23 of T, is in the trie with positions [4,1], so P is aligned at positions 19 and 22 of T. Neither alignment is an occurrence of P.
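The whole search can be sketched as follows. This is a hedged illustration, not the authors' implementation: the name `alpha_skip_search` and the explicit `sigma` parameter are assumptions, positions are 0-based, and a dictionary keyed by length-L substrings is swapped in for the trie, since it answers the same lookup.

```python
def alpha_skip_search(text, pattern, sigma):
    """Return the 0-based positions of all occurrences of pattern in text."""
    n, m = len(text), len(pattern)
    if sigma < 2:
        raise ValueError("alphabet size must be at least 2")
    if m == 0 or m > n:
        return []

    # Preprocessing: L = floor(log_sigma(m)), at least 1.
    L, power = 0, sigma
    while power <= m:
        L += 1
        power *= sigma
    L = max(L, 1)

    # Positions of every length-L substring of the pattern, decreasing.
    table = {}
    for i in range(m - L, -1, -1):
        table.setdefault(pattern[i:i + L], []).append(i)

    # Searching: read one length-L substring of the text every m-L+1 steps.
    hits = []
    for j in range(m - L, n - L + 1, m - L + 1):
        for i in table.get(text[j:j + L], ()):
            start = j - i
            if start + m <= n and text[start:start + m] == pattern:
                hits.append(start)
    return hits


print(alpha_skip_search("aaaababbababbbbbbaabababababba", "ababbaba", 2))
```

With σ = 2 and m = 8 this reads the substrings at positions 5, 11, 17 and 23 of T, the same four steps as in the example.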

22 Time complexity: the preprocessing phase takes O(m) time and space; the searching phase takes O(mn) time.


