
1 KMP Skip Search Algorithm
Advisor: Prof. R. C. T. Lee
Speaker: Z. H. Pan
A Very Fast String Matching Algorithm for Small Alphabets and Long Patterns, Charras, C., Lecroq, T. and Pehoushek, J. D., Lecture Notes in Computer Science, Vol. 1448, 1998, pp. 55-64

2 Definition
String Matching Problem:
Input: a text string T of length n and a pattern string P of length m.
Output: all occurrences of P in T.
Example: [figure: a text T and a pattern P, with the positions where P occurs in T marked]
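As a baseline for the problem statement above, here is a minimal brute-force sketch in Python. The function name `naive_search` is an illustrative assumption; the sample text and pattern are the ones used in the worked example later in this presentation.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every start position at which pattern occurs in text (brute force)."""
    n, m = len(text), len(pattern)
    # Try every alignment of the pattern and compare the whole window.
    return [s for s in range(n - m + 1) if text[s:s + m] == pattern]


T = "ACTACATATAGGACTACGTACCAGCATTACTACGTT"
P = "ACTACGT"
print(naive_search(T, P))  # [12, 28]
```

This takes O(nm) time in the worst case, which is the cost the KMP Skip Search algorithm is designed to avoid.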

3 The KMP Skip Search algorithm consists of two phases: preprocessing and searching. It uses the KMP table to improve the Skip Search algorithm.

4 Preprocessing
The preprocessing phase computes the buckets for all characters of the alphabet (table Z), the list table, the MP table (mpNext) and the KMP table (kmpNext).
Example:
Text string T = GCATCGCAGAGAGTATACAGTACG
Pattern string P = GCAGAGAG
c        A   C   G   T
Z[c]     6   1   7   (none)
i         0   1   2   3   4   5   6   7   8
P[i]      G   C   A   G   A   G   A   G
List[i]  -1  -1  -1   0   2   3   4   5
mpNext   -1   0   0   0   1   0   1   0   1
kmpNext  -1   0   0  -1   1  -1   1  -1   1
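The four tables of the preprocessing slide can be reproduced with a short Python sketch. The function names are illustrative assumptions, and the sketch assumes that Z[c] holds the last position of c in P and List[i] the previous position of the character P[i], with -1 meaning "no position" (so the empty entry for T becomes -1).

```python
def build_buckets(pattern: str, alphabet: str):
    """Return (Z, List): Z[c] is the last position of c in pattern,
    List[i] is the previous position holding the same character as pattern[i]."""
    z = {c: -1 for c in alphabet}
    lst = [-1] * len(pattern)
    for i, c in enumerate(pattern):
        lst[i] = z.get(c, -1)  # previous occurrence of c, if any
        z[c] = i               # c was last seen at position i
    return z, lst


def build_mp_next(pattern: str) -> list[int]:
    """Morris-Pratt table: mp[i] is the length of the longest border of pattern[:i]."""
    m = len(pattern)
    mp = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and pattern[i] != pattern[j]:
            j = mp[j]
        i += 1
        j += 1
        mp[i] = j
    return mp


def build_kmp_next(pattern: str) -> list[int]:
    """Knuth-Morris-Pratt table: like mp, but skips borders that are
    followed by the same character that just mismatched."""
    m = len(pattern)
    kmp = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and pattern[i] != pattern[j]:
            j = kmp[j]
        i += 1
        j += 1
        if i < m and pattern[i] == pattern[j]:
            kmp[i] = kmp[j]
        else:
            kmp[i] = j
    return kmp


z, lst = build_buckets("GCAGAGAG", "ACGT")
print(z)                           # {'A': 6, 'C': 1, 'G': 7, 'T': -1}
print(lst)                         # [-1, -1, -1, 0, 2, 3, 4, 5]
print(build_mp_next("GCAGAGAG"))   # [-1, 0, 0, 0, 1, 0, 1, 0, 1]
print(build_kmp_next("GCAGAGAG"))  # [-1, 0, 0, -1, 1, -1, 1, -1, 1]
```

Each table is filled in a single left-to-right pass over the pattern, plus one pass over the alphabet to initialise Z.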

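5 A general situation for the search phase
The search first uses the Skip Search algorithm to find an alignment of P with T in which T[i] = P[j].
start is the position in T of the first character of P in the current alignment.
wall is the position in T of the first mismatch in the current alignment.
k is the length of the prefix of P that matches the text in the current alignment.
kmpStart is the start position of the next alignment given by the KMP shift.
skipStart is the start position of the next alignment given by the Skip Search shift.
[figure: T aligned with P, showing start, wall, the matched prefix of length k and the mismatch X]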
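The relationship between start, k and wall can be illustrated with a small Python helper. This is only a sketch: the name `attempt` and its signature are assumptions, and T and P are the strings of the worked example in this presentation.

```python
def attempt(text: str, pattern: str, start: int, wall: int) -> int:
    """Compare pattern against text aligned at start and return k,
    the length of the matched prefix. Text positions before wall are
    assumed to be already known to match and are not compared again."""
    k = max(wall - start, 0)
    while (k < len(pattern) and start + k < len(text)
           and text[start + k] == pattern[k]):
        k += 1
    return k


T = "ACTACATATAGGACTACGTACCAGCATTACTACGTT"
P = "ACTACGT"
k = attempt(T, P, start=0, wall=0)
print(k, 0 + k)  # 5 5  (k = 5, so the new wall is start + k = 5)
```

After an attempt, the new wall is start + k; k equal to the pattern length means an occurrence was found at start.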

6 If k = 0, no prefix of P matches the text at the current position, so the Skip Search algorithm is used. Otherwise, when k > 0, a prefix of P of length k matches the text, and kmpStart, wall and skipStart must be compared. There are four cases:
Case 1. skipStart < kmpStart: a shift according to the skip algorithm is applied, which gives a new value for skipStart, and skipStart and kmpStart are compared again.
Case 2. kmpStart < skipStart < wall: a shift according to the shift table of Morris-Pratt is applied, which gives a new value for kmpStart, and skipStart and kmpStart are compared again.
Case 3. skipStart = kmpStart: another attempt can be performed with start = skipStart.
Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart.
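The four cases reduce to a three-way decision, which can be sketched in Python as follows. The function name `next_action` and its string results are illustrative assumptions. The sketch also assumes that boundary situations such as kmpStart = wall fall into the "new attempt" branch, as they do in the worked example of this presentation.

```python
def next_action(kmp_start: int, skip_start: int, wall: int) -> str:
    """Decide what to do after a failed or successful attempt."""
    if skip_start < kmp_start:
        return "skip shift"   # Case 1: advance skipStart, compare again
    if kmp_start < skip_start < wall:
        return "mp shift"     # Case 2: advance kmpStart, compare again
    return "attempt"          # Cases 3 and 4: new attempt at start = skipStart


print(next_action(kmp_start=3, skip_start=4, wall=5))     # mp shift
print(next_action(kmp_start=5, skip_start=4, wall=5))     # skip shift
print(next_action(kmp_start=21, skip_start=21, wall=21))  # attempt
```

The first two branches loop until neither applies, so every new attempt starts at a position that both shift rules accept.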

7 Example: step 1
T = ACTACATATAGGACTACGTACCAGCATTACTACGTT (positions 0 to 35)
P = ACTACGT (positions 0 to 6)
First the Skip Search algorithm is used to align T and P, which gives start = 0.
The attempt gives k = 5 and wall = 5.
KMP shift: kmpStart = 3. Skip shift: skipStart = 4.
Case 2. kmpStart < skipStart < wall: a shift according to the shift table of Morris-Pratt is applied. This gives a new value for kmpStart, and skipStart and kmpStart are compared again.

8 Example: step 1-1
start = 0, k = 2, wall = 5.
KMP shift: kmpStart = 5. Skip shift: skipStart = 4.
Case 1. skipStart < kmpStart: a shift according to the skip algorithm is applied, which gives a new value for skipStart, and skipStart and kmpStart are compared again.
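The kmpStart values 3 and 5 in steps 1 and 1-1 can be reproduced with the usual KMP and Morris-Pratt shift formulas. The slides do not state these formulas, so they are an assumption here, as are the function names; the two tables for P = ACTACGT were worked out by hand for this sketch.

```python
# Shift tables for P = ACTACGT (indices 0..7), computed by hand for this sketch.
KMP_NEXT = [-1, 0, 0, -1, 0, 2, 0, 0]
MP_NEXT = [-1, 0, 0, 0, 1, 2, 0, 0]


def kmp_shift(start: int, k: int) -> tuple[int, int]:
    """After an attempt at start that matched k characters, return
    (kmpStart, length of the prefix still matched at kmpStart)."""
    return start + k - KMP_NEXT[k], KMP_NEXT[k]


def mp_shift(kmp_start: int, k: int) -> tuple[int, int]:
    """Apply one Morris-Pratt shift to kmpStart (Case 2)."""
    return kmp_start + k - MP_NEXT[k], MP_NEXT[k]


kmp_start, k = kmp_shift(start=0, k=5)
print(kmp_start, k)  # 3 2
kmp_start, k = mp_shift(kmp_start, k)
print(kmp_start, k)  # 5 0
```

Both shifts keep kmpStart + k equal to wall, which is why the matched prefix never has to be compared again.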

9 Example: step 1-2
start = 0, k = 0, so the Skip Search algorithm is used.
The skip shift moves the pattern to start = 9.

10 Example: step 2
start = 9, k = 1, wall = 10.
KMP shift: kmpStart = 10. Skip shift: skipStart = 12.
Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart.
The next attempt is at start = 12.

11 Example: step 3
start = 12. The attempt is a match, k = 7, wall = 19.
KMP shift: kmpStart = 19. Skip shift: skipStart = 16.
Case 1. skipStart < kmpStart: a shift according to the skip algorithm is applied, which gives a new value for skipStart, and skipStart and kmpStart are compared again.

12 Example: step 3-1
start = 12, k = 0, so the Skip Search algorithm is used.
The skip shift moves the pattern to start = 19.

13 Example: step 4
start = 19, k = 2, wall = 21.
KMP shift: kmpStart = 21. Skip shift: skipStart = 21.
Case 3. skipStart = kmpStart: another attempt can be performed with start = skipStart.
The next attempt is at start = 21.

14 Example: step 5
start = 21, k = 0, so the Skip Search algorithm is used.
The skip shift moves the pattern to start = 25.

15 Example: step 6
start = 25, k = 1, wall = 26.
KMP shift: kmpStart = 26. Skip shift: skipStart = 28.
Case 4. kmpStart < wall < skipStart: another attempt can be performed with start = skipStart.
The next attempt is at start = 28.

16 Example: step 7
start = 28. The attempt is a match, k = 7.
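The whole search traced in steps 1 to 7 can be sketched in Python as follows. This is an illustrative rendering of the algorithm in its commonly published form, not code taken from the slides, and every function and variable name is an assumption. It assumes the bucket and list layout of the preprocessing slide, with -1 meaning "no position".

```python
def _next_table(p: str, kmp: bool) -> list[int]:
    """Build mpNext (kmp=False) or kmpNext (kmp=True) for pattern p."""
    m = len(p)
    nxt = [-1] * (m + 1)
    i, j = 0, -1
    while i < m:
        while j > -1 and p[i] != p[j]:
            j = nxt[j]
        i += 1
        j += 1
        nxt[i] = nxt[j] if kmp and i < m and p[i] == p[j] else j
    return nxt


def kmp_skip_search(text: str, pattern: str) -> list[int]:
    """Return all start positions of pattern in text."""
    n, m = len(text), len(pattern)
    if m == 0 or m > n:
        return []
    mp_next = _next_table(pattern, kmp=False)
    kmp_next = _next_table(pattern, kmp=True)
    z, lst = {}, [-1] * m          # buckets (last position) and list table
    for pos, c in enumerate(pattern):
        lst[pos] = z.get(c, -1)
        z[c] = pos
    per = m - kmp_next[m]          # period of the pattern
    found = []
    j = -1                         # text position examined by Skip Search

    def next_bucket() -> int:
        """Advance j in steps of m to a text character that occurs in the
        pattern; return its last pattern position, or -1 at end of text."""
        nonlocal j
        j += m
        while j < n and text[j] not in z:
            j += m
        return z[text[j]] if j < n else -1

    i = next_bucket()
    if i < 0:
        return found
    start, wall = j - i, 0
    while start <= n - m:
        wall = max(wall, start)
        k = wall - start           # prefix already known to match
        while k < m and pattern[k] == text[start + k]:
            k += 1
        wall = start + k
        if k == m:
            found.append(start)
            i -= per
        else:
            i = lst[i]
        if i < 0:
            i = next_bucket()
            if i < 0:
                return found
        kmp_start, k = start + k - kmp_next[k], kmp_next[k]
        start = j - i              # skipStart
        while start < kmp_start or kmp_start < start < wall:
            if start < kmp_start:  # Case 1: skip shift
                i = lst[i]
                if i < 0:
                    i = next_bucket()
                    if i < 0:
                        return found
                start = j - i
            else:                  # Case 2: Morris-Pratt shift
                kmp_start, k = kmp_start + k - mp_next[k], mp_next[k]
    return found


print(kmp_skip_search("ACTACATATAGGACTACGTACCAGCATTACTACGTT", "ACTACGT"))  # [12, 28]
```

On the example text the attempts are made at start = 0, 9, 12, 19, 21, 25 and 28, matching the steps above, and the two occurrences are reported at positions 12 and 28.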

17 Time Complexity
The preprocessing phase of the KMP Skip Search algorithm runs in O(m + σ) time, where σ is the size of the alphabet.
The searching phase of the KMP Skip Search algorithm runs in O(n) time.


