The Galil-Giancarlo algorithm


1 The Galil-Giancarlo algorithm
Galil, Z. and Giancarlo, R., On the exact complexity of string matching: upper bounds, SIAM Journal on Computing, Vol. 21, No. 3, 1992, pp.
Advisor: Prof. R. C. T. Lee
Speaker: S. Y. Tang

2 String matching problem:
The Galil-Giancarlo algorithm solves the string matching problem.
Input: a text string T of length n and a pattern string P of length m.
Output: all occurrences of P in T.
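The Output line fixes what any correct matcher must return. As a reference point for checking faster algorithms, the sketch below is the naive method, not the GG algorithm: it tests every window and reports 0-based start positions (the slides count positions from 1). The function name naive_occurrences is hypothetical.

```python
def naive_occurrences(text: str, pattern: str) -> list[int]:
    """Return every 0-based start j such that text[j:j+m] == pattern."""
    n, m = len(text), len(pattern)
    if m == 0:
        return []  # assumption: the empty pattern is excluded
    # Try every window of length m and keep the ones equal to the pattern.
    return [j for j in range(n - m + 1) if text[j:j + m] == pattern]


print(naive_occurrences("GGAGGAGGAG", "GGAG"))  # [0, 3, 6]
```

Any faster matcher, Colussi or GG included, should produce exactly this list.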

3 The Galil-Giancarlo algorithm (GG algorithm for short) improves the worst-case number of text character comparisons of the Colussi algorithm, from (3/2)n to (4/3)n. The GG algorithm has two phases: preprocessing and searching. The preprocessing phase is the same as in the Colussi algorithm. The difference lies in the searching phase, where the GG algorithm adds five cases that determine how far to shift.

4 The cases under which the GG algorithm is not used.
Case 1: The pattern has only one period (its length m), so the entire window is skipped. Nothing is known about whether the new window begins with a prefix of the pattern.
Example: T = GCAGCGGGAC, P = GGAGC.
[Figure: P is shifted after a mismatch; table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P, with Rmin = 5.]
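Whether Case 1 can occur is a property of the pattern alone. Reading "only one period" as "the smallest period of P equals its length m" (equivalently, P has no nonempty border), the test is easy to sketch with the Knuth-Morris-Pratt border table. The function names smallest_period and has_only_one_period are hypothetical, and the sketch assumes m >= 1.

```python
def smallest_period(pattern: str) -> int:
    """Smallest p >= 1 with pattern[p:] == pattern[:m-p], via the KMP border table."""
    m = len(pattern)
    # border[i] = length of the longest proper border of pattern[:i]; border[0] = -1 is a sentinel.
    border = [0] * (m + 1)
    border[0] = -1
    k = -1
    for i in range(m):
        # Shorten the current border until pattern[i] can extend it.
        while k >= 0 and pattern[k] != pattern[i]:
            k = border[k]
        k += 1
        border[i + 1] = k
    return m - border[m]


def has_only_one_period(pattern: str) -> bool:
    """True when the only period is m itself, i.e. the pattern has no nonempty border."""
    return smallest_period(pattern) == len(pattern)


print(smallest_period("GGAGC"), has_only_one_period("GGAGC"))      # 5 True
print(smallest_period("GGAGGAG"), has_only_one_period("GGAGGAG"))  # 3 False
```

For the slide's pattern GGAGC the smallest period is 5 = m, which fits the full-window skip described above.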

5 Case 2: A prefix of the pattern is already known to be equal to a prefix of the window.
Examples: T = GGACGGAACGCA, P = GGAGGGA; T = GCAGGAGCAGCA, P = GGAGGAG.
[Figure: for each example, P is shifted after a mismatch; table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P, with Rmin = 7 in the second example.]
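Case 2 rests on a simple fact: if the text under P[s..m-1] has already been matched and the window is then shifted right by a period s of P, the new window starts with the known prefix P[0..m-s-1] of length m - s. For P = GGAGGAG, whose periods are 3, 6 and 7, a shift by 3 leaves the prefix GGAG known. A minimal sketch with hypothetical function names, 0-based indices and brute-force period tests:

```python
def periods(pattern: str) -> list[int]:
    """All periods p (1 <= p <= m) with pattern[p:] == pattern[:m-p]; m always qualifies."""
    m = len(pattern)
    return [p for p in range(1, m + 1) if pattern[p:] == pattern[:m - p]]


def known_prefix_after_shift(pattern: str, s: int) -> str:
    """Prefix of the pattern already known to match after shifting right by s.

    Assumes the text under pattern[s:] was matched before the shift. If s is
    not a period, that text cannot agree with the shifted pattern, so reject it.
    """
    if s not in periods(pattern):
        raise ValueError(f"shift {s} is not a period of {pattern!r}")
    return pattern[: len(pattern) - s]


print(periods("GGAGGAG"))                      # [3, 6, 7]
print(known_prefix_after_shift("GGAGGAG", 3))  # GGAG
print(known_prefix_after_shift("GGAGGGA", 4))  # GGA
```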

6 Cases 1 to 3: the pattern is shifted
Case 1: l > k. Example: k = 2, l = 5; the pattern is shifted.
Case 2: l = k and p[l+1] ≠ t[j+k]. Example: k = 3, l = 3; the pattern is shifted.
Case 3: l < k and p[l+1] ≠ t[j+k]. Example: k = 5, l = 2; the pattern is shifted.
[Figure: for each case, the text, the pattern, and the pattern after the shift.]

7 Cases 4 and 5: p[l+1] = t[j+k]
Case 4: l = k and p[l+1] = t[j+k]. Example: k = 3, l = 3; the pattern does not need to be shifted.
Case 5: l < k and p[l+1] = t[j+k]. Example: k = 5, l = 3; the pattern is shifted.
[Figure: for each case, the text, the pattern, and for Case 5 the pattern after the shift.]
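The five cases depend only on l, k and whether p[l+1] = t[j+k]. The sketch below classifies one window under two assumptions the slides do not spell out: l is the number of leading copies of p[1] in the pattern, and t[j+k] is the first character, counting from the window start t[j], that differs from p[1]. Python indices are 0-based, so p[l+1] becomes pattern[l]. The shift amounts are our reading rather than something stated on the slides: in Cases 1 to 3 no occurrence can start at or before t[j+k], so the shift is k + 1; in Case 5 a shift of k - l lines p[l+1] up with t[j+k]; Case 4 needs no shift. The helper gg_case is hypothetical, and the resumed Colussi scan that follows each case is omitted.

```python
def gg_case(pattern: str, text: str, j: int) -> tuple[int, int]:
    """Return (case number, shift) for the window starting at text[j].

    Assumes l = number of leading copies of pattern[0] and k = number of copies
    of pattern[0] in text starting at j, so text[j + k] is t[j+k] on the slides.
    """
    c = pattern[0]
    l = 0
    while l < len(pattern) and pattern[l] == c:
        l += 1
    if l == len(pattern):
        raise ValueError("pattern is a power of a single character")
    k = 0
    while j + k < len(text) and text[j + k] == c:
        k += 1
    if j + k == len(text):
        raise ValueError("no character different from pattern[0] in the rest of the text")
    same = pattern[l] == text[j + k]  # p[l+1] = t[j+k] on the slides
    if l > k:
        return 1, k + 1  # too few copies of c: skip past t[j+k]
    if l == k:
        return (4, 0) if same else (2, k + 1)
    return (5, k - l) if same else (3, k + 1)  # here l < k


for t in ["GACGGAC", "GGACX", "GGCAT", "GGGGCA", "GGGGAC"]:
    print(t, gg_case("GGAC", t, 0))
# GACGGAC (1, 2)
# GGACX (4, 0)
# GGCAT (2, 3)
# GGGGCA (3, 5)
# GGGGAC (5, 2)
```

In the last line, Case 5 shifts by 2 and the window GGAC at position 2 is indeed an occurrence.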

8 Example (1/7)
[Figure: text T (positions 1 to 25) and pattern P; a mismatch occurs and P is shifted by Shift[4] = 4. Table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P.]
We first compare the noholes using phase 1 of the Colussi algorithm and shift using Shift[i].
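Phase 1 needs to know which pattern positions are noholes. In the usual formulation of the Colussi algorithm, a position i is a nohole when the optimized Knuth-Morris-Pratt table has kmp_next[i] > -1 and a hole otherwise; noholes are compared left to right, then holes right to left. The sketch assumes that definition and 0-based indices. The helper names kmp_next and colussi_order are hypothetical, and the pattern GCAGAGAG is our own example, since the slide's pattern cannot be recovered.

```python
def kmp_next(x: str) -> list[int]:
    """Optimized KMP table of length m + 1.

    For i < m, entry i is the longest border b of x[:i] with x[b] != x[i], or -1.
    Entry m is the longest proper border of x.
    """
    m = len(x)
    nxt = [0] * (m + 1)
    nxt[0] = -1
    i, j = 0, -1
    while i < m:
        # Fall back along borders until x[j] matches x[i].
        while j > -1 and x[i] != x[j]:
            j = nxt[j]
        i += 1
        j += 1
        # Skip a border whose next character would repeat the same mismatch.
        if i < m and x[i] == x[j]:
            nxt[i] = nxt[j]
        else:
            nxt[i] = j
    return nxt


def colussi_order(x: str) -> tuple[list[int], list[int], list[int]]:
    """Split positions into noholes and holes and return the comparison order h."""
    nxt = kmp_next(x)
    noholes = [i for i in range(len(x)) if nxt[i] > -1]
    holes = [i for i in range(len(x)) if nxt[i] == -1]
    h = noholes + holes[::-1]  # noholes left to right, then holes right to left
    return noholes, holes, h


noholes, holes, h = colussi_order("GCAGAGAG")
print(noholes)  # [1, 2, 4, 6]
print(holes)    # [0, 3, 5, 7]
print(h)        # [1, 2, 4, 6, 7, 5, 3, 0]
```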

9 Example (2/7)
[Figure: text T (positions 1 to 25) and pattern P; the compared characters match. Table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P.]

10 Example (3/7)
[Figure: text T (positions 1 to 25) and pattern P; a mismatch occurs and P is shifted by Shift[0] = 5. Table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P.]
After all noholes are matched, we compare the holes using phase 2 of the Colussi algorithm and shift using Shift[i].
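The Shift[i] values used in both phases can be computed straight from their definitions, assuming the usual formulation: for a nohole i, Kmin[i] is the smallest shift k with P[k..i-1] = P[0..i-k-1] and P[i] ≠ P[i-k]; for a hole i, Rmin[i] is the smallest period of P greater than i; Shift is indexed by the comparison order h, with one extra entry for a complete match. This is a cubic-time sketch for checking small tables by hand, not the O(m) preprocessing of the slides; colussi_shift is a hypothetical name and the pattern GCAGAGAG is our own example.

```python
def colussi_shift(x: str) -> tuple[list[int], list[int]]:
    """Return (h, shift) for pattern x, computed directly from the definitions."""
    m = len(x)
    # kmin[i]: smallest k (1 <= k <= i) with x[k:i] == x[:i-k] and x[i] != x[i-k]; 0 marks a hole.
    kmin = [0] * m
    for i in range(m):
        for k in range(1, i + 1):
            if x[k:i] == x[:i - k] and x[i] != x[i - k]:
                kmin[i] = k
                break
    # Periods p of x (x[p:] == x[:m-p]); p = m always qualifies.
    periods = [p for p in range(1, m + 1) if x[p:] == x[:m - p]]
    # rmin[i] for a hole i: the smallest period of x greater than i.
    rmin = [min(p for p in periods if p > i) if kmin[i] == 0 else 0 for i in range(m)]
    noholes = [i for i in range(m) if kmin[i] > 0]
    holes = [i for i in range(m) if kmin[i] == 0]
    h = noholes + holes[::-1]
    # shift[t] applies when the comparison at position h[t] fails; shift[m] after a full match.
    shift = [kmin[i] for i in noholes] + [rmin[i] for i in holes[::-1]]
    shift.append(rmin[0])
    return h, shift


h, shift = colussi_shift("GCAGAGAG")
print(h)      # [1, 2, 4, 6, 7, 5, 3, 0]
print(shift)  # [1, 2, 3, 5, 8, 7, 7, 7, 7]
```

The last entry, 7, is the smallest period of GCAGAGAG, the shift taken after an exact match.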

11 Example (4/7)
[Figure: text T (positions 1 to 25) and pattern P with k = 2 and l = 3; P is shifted. Table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P.]
In this case, we use Case 1 of the GG algorithm to shift, because the condition overlay < l for using the GG algorithm holds and l > k.

12 Example (5/7)
[Figure: text T (positions 1 to 25) and pattern P; all noholes match, then a mismatch occurs and P is shifted by Shift[2] = 5. Table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P.]
After checking the cases of the GG algorithm, we return to the Colussi algorithm.

13 Example (6/7)
[Figure: text T (positions 1 to 25) and pattern P; P is shifted. Table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P.]
In this case, we use Case 5 of the GG algorithm to shift, because the condition for using the GG algorithm holds and l < k.

14 Example (7/7)
[Figure: text T (positions 1 to 25) and pattern P; the compared characters match and an exact occurrence of P is found. Table of X[i], Kmp[i], Kmin[i], Rmin[i] and Shift[i] for P.]
After checking the cases of the GG algorithm, we return to the Colussi algorithm.

15 Time complexity
Preprocessing phase: O(m) time and space.
Searching phase: O(n) time, with at most (4/3)n text character comparisons in the worst case.

16 Conclusion
The Galil-Giancarlo algorithm is very similar to the Colussi algorithm. The Colussi algorithm performs very badly when the pattern starts and ends with a sequence of repetitions of the same symbol: for such patterns it shifts by a single position, and (3/2)n comparisons are actually performed. Galil and Giancarlo devised a way to avoid these single-position shifts.

17 References
[B92] Breslauer, D., Efficient String Algorithmics, Ph.D. Thesis, Report CU, Computer Science Department, Columbia University, New York, NY, 1992.
[GG92] Galil, Z. and Giancarlo, R., On the exact complexity of string matching: upper bounds, SIAM Journal on Computing, Vol. 21, No. 3, 1992, pp.

