A Fast String Searching Algorithm. Robert S. Boyer and J Strother Moore. Communications of the ACM, vol. 20, no. 10, Oct. 1977.


1 A Fast String Searching Algorithm. Robert S. Boyer and J Strother Moore. Communications of the ACM, vol. 20, no. 10, Oct. 1977.

2 Outline:
- Introduction
- The Knuth-Morris-Pratt algorithm
- The Boyer-Moore algorithm
  - Bad Character heuristic
  - Good Suffix heuristic
  - Matching Algorithm
- Experimental Result
- Conclusion

3 Introduction
- String matching: searching for a pattern in a text (a longer string). If the pattern occurs in the string, return the position of the first character of the substring that matches the pattern.
[Figure: the pattern aligned against the string at shift s]

4 Introduction (cont.)
- Some definitions:
    m: the length of the pattern.
    n: the length of the string (or text).
    s (shift): the distance from the start of the string to the first character of the matched substring.
    w ⊏ x: the string w is a prefix of the string x.
    w ⊐ x: the string w is a suffix of the string x.

5 Introduction (cont.)
- The naive string-matching algorithm:
    for s ← 0 to n-m do
        if pattern[1..m] = string[s+1..s+m]
            print "Pattern occurs with shift" s
- Time complexity: Θ((n-m+1)m) in the worst case, which is Θ(n²) if m = ⌊n/2⌋.
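The naive matcher above can be sketched in Python as follows. This is a minimal illustration, not code from the slides: the function name `naive_search` is hypothetical, and shifts are reported 0-based, as in the pseudocode where s runs from 0 to n-m.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every shift s at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try each of the n - m + 1 candidate shifts in turn.
    for s in range(n - m + 1):
        # Each slice comparison costs up to m character comparisons,
        # which gives the (n - m + 1) * m worst case.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_search("ABABACABABA", "ABA"))
```

If the pattern is longer than the text, the loop body never runs and the result is an empty list.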

6 Knuth-Morris-Pratt Algorithm
[Figure: the pattern ACBAABA aligned against the string BABCBAABABABCAB at shift s with q characters matched, and again at a later shift s' with k characters matched, where s + q = s' + k]

7 Knuth-Morris-Pratt Algorithm (cont.)
- Prefix function: f(j) = the largest i < j such that P[1..i] = P[j-i+1..j], or 0 if no such i exists.
- Example: with P_q = ABABA and P_k = ABA, P_k ⊐ P_q (P_k is a suffix of P_q), so f(5) = 3.

8 Knuth-Morris-Pratt Algorithm (cont.)
- Prefix function algorithm:
    f[1] ← 0
    k ← 0
    for q ← 2 to m do
        while k > 0 and P[k+1] ≠ P[q] do
            k ← f[k]
        if P[k+1] = P[q] then
            k ← k+1
        f[q] ← k
    return f[1..m]
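The prefix function algorithm above translates almost line for line into Python. The sketch below assumes 0-based indexing, so `f[q]` here corresponds to f[q+1] in the slide's 1-based notation; the function name `prefix_function` is illustrative only.

```python
def prefix_function(pattern: str) -> list[int]:
    """f[q] = length of the longest proper prefix of pattern[:q+1]
    that is also a suffix of pattern[:q+1]."""
    m = len(pattern)
    f = [0] * m
    k = 0  # length of the currently matched prefix
    for q in range(1, m):
        # Fall back through shorter borders until one can be extended.
        while k > 0 and pattern[k] != pattern[q]:
            k = f[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        f[q] = k
    return f


print(prefix_function("ABABACABABA"))
```

The inner `while` loop can only undo increments made by earlier iterations, which is the amortized argument behind the O(m) bound.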

9 Knuth-Morris-Pratt Algorithm (cont.)
- Example:
    k      1  2  3  4  5  6  7  8  9 10 11
    P[k]   A  B  A  B  A  C  A  B  A  B  A
    f[k]   0  0  1  2  3  0  1  2  3  4  5
- Time complexity:
    Prefix function: O(m), by amortized analysis
    Matching function: O(n)
    Total: O(m+n), i.e. linear complexity
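The matching function itself does not appear in the transcript. As a hedged sketch of how the prefix function is typically used for matching, the following Python reports all 0-based shifts; the name `kmp_search` and the choice to return a list are assumptions made for illustration.

```python
def kmp_search(text: str, pattern: str) -> list[int]:
    """Return every 0-based shift at which pattern occurs in text."""
    m = len(pattern)
    if m == 0:
        return []
    # Build the prefix function (0-based) for the pattern.
    f = [0] * m
    k = 0
    for q in range(1, m):
        while k > 0 and pattern[k] != pattern[q]:
            k = f[k - 1]
        if pattern[k] == pattern[q]:
            k += 1
        f[q] = k
    # Scan the text once; q is the number of pattern characters matched.
    shifts = []
    q = 0
    for i, c in enumerate(text):
        while q > 0 and pattern[q] != c:
            q = f[q - 1]  # reuse the longest border instead of restarting
        if pattern[q] == c:
            q += 1
        if q == m:
            shifts.append(i - m + 1)
            q = f[q - 1]  # keep going to find overlapping occurrences
    return shifts


print(kmp_search("ABABACABABA", "ABA"))
```

The text pointer never moves backwards, which is where the O(n) matching bound comes from.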

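10 The Boyer-Moore Algorithm
- Symbols used:
    Σ: the alphabet (the set of characters)
    patlen: the length of the pattern
    m: the number of characters at the end of the pattern that have already matched
    char: the mismatched character in the string
[Figure: the pattern aligned under the string, with its last m characters matched and char marking the mismatch]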

11 Characteristic
- The pattern is matched from its rightmost character to its leftmost character.
- When the pattern is relatively long and Σ is reasonably large, this algorithm is likely to be the most efficient string-matching algorithm.

12 Bad Character heuristic
- Observation 1: if char does not occur in pat:
    Pattern shift: j characters
    String pointer shift: patlen characters
- Example: string ADCABCABA, pattern CBA

13 Bad Character heuristic (cont.)
- Observation 2: char occurs in the pattern.
- Let δ1[char] be the position of the rightmost occurrence of char in the pattern, and let j be the current position in the pattern.
- If j < δ1[char], shift the pattern right by 1.
- If j > δ1[char], shift the pattern right by j - δ1[char].
- δ1[] is an array with one entry for each character of Σ.

14 Bad Character heuristic (cont.)
- Example: string ACBBACABCA, pattern ABC
    j = 3 and δ1[B] = 2
    Pattern shift: j - δ1[B] = 1
    String pointer shift: m + pattern shift = 0 + 1 = 1
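The two observations can be combined into one small shift function. The sketch below follows the slides' convention that δ1[char] is the 1-based position of the rightmost occurrence of char in the pattern; the names `rightmost_positions` and `bad_character_shift` are hypothetical, and a dictionary stands in for the array over Σ.

```python
def rightmost_positions(pattern: str) -> dict[str, int]:
    """Map each character to the 1-based position of its rightmost occurrence."""
    # Later occurrences overwrite earlier ones, leaving the rightmost.
    return {c: pos for pos, c in enumerate(pattern, start=1)}


def bad_character_shift(pattern: str, j: int, char: str) -> int:
    """Pattern shift after `char` mismatches pattern position j (1-based)."""
    delta1 = rightmost_positions(pattern)
    if char not in delta1:
        return j  # Observation 1: char does not occur in the pattern
    if j < delta1[char]:
        return 1  # rightmost occurrence is to the right of j
    # Observation 2: align the rightmost occurrence with the mismatch.
    return j - delta1[char]


print(bad_character_shift("ABC", 3, "B"))
```

The case j = δ1[char] cannot arise in practice, because char is by definition different from the pattern character at position j.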

15 Good Suffix heuristic
- Two sequences [c_1 .. c_n] and [d_1 .. d_n] unify if, for every i from 1 to n, either c_i = d_i, c_i = $, or d_i = $, where $ is a character that does not occur in pat.
- The rightmost plausible reoccurrence rpr(j) is the greatest k ≤ patlen such that [pat(j+1) .. pat(patlen)] and [pat(k) .. pat(k+patlen-j-1)] unify and either k ≤ 1 or pat(k-1) ≠ pat(j). Here pat(i) is taken to be $ for i < 1.

16 Good Suffix heuristic (cont.)
- Example:
    j        -7 -6 -5 -4 -3 -2 -1  0  1  2  3  4  5  6  7  8  9
    pat       $  $  $  $  $  $  $  $  A  B  X  Y  C  D  E  X  Y
    rpr(j)                            -7 -6 -5 -4 -3 -2  3  0  9
- Pattern shift: j + 1 - rpr(j)
- String pointer shift: m + j + 1 - rpr(j) = patlen - j + j + 1 - rpr(j) = patlen + 1 - rpr(j) = δ2[j]
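To make the definition of rpr(j) concrete, here is a brute-force Python sketch that evaluates it directly. It is not the preprocessing algorithm from the slides, and it runs in polynomial rather than linear time; the function names `rpr` and `delta2` are illustrative, and `None` plays the role of the $ character for positions to the left of the pattern.

```python
def rpr(pattern: str, j: int) -> int:
    """Rightmost plausible reoccurrence of pat(j+1..patlen), 1-based j."""
    patlen = len(pattern)

    def pat(i):
        # Positions left of the pattern behave like the wildcard $.
        return pattern[i - 1] if i >= 1 else None

    suffix_len = patlen - j
    # Scan candidates from the right; the window lying entirely in the
    # $ region (k = j + 1 - patlen) always qualifies, so the loop returns.
    for k in range(min(patlen, j + 1), j - patlen, -1):
        unifies = all(
            pat(k + t) is None or pat(k + t) == pat(j + 1 + t)
            for t in range(suffix_len)
        )
        if unifies and (k <= 1 or pat(k - 1) != pat(j)):
            return k
    raise AssertionError("unreachable for 1 <= j <= patlen")


def delta2(pattern: str, j: int) -> int:
    """String pointer shift: patlen + 1 - rpr(j)."""
    return len(pattern) + 1 - rpr(pattern, j)


print([rpr("ABXYCDEXY", j) for j in range(1, 10)])
```

For the pattern ABXYCDEXY this evaluates rpr(7) = 3, because the suffix XY reoccurs at position 3 preceded by B rather than E.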

17 Good Suffix heuristic (cont.)
- Algorithm: [figure not reproduced in the transcript]

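18 Boyer-Moore Matching Algorithm
    i = patlen
    if n < patlen then return false
    j = patlen
    while j > 0 do {
        if string(i) = pat(j) then
            j = j - 1
            i = i - 1
        else
            i = i + max(δ1(string(i)), δ2(j))
            if i > n then return false
            j = patlen
    }
    return i + 1
Note: in this loop δ1(char) is a string pointer shift, i.e. patlen minus the position of the rightmost occurrence of char in pat, or patlen if char does not occur in pat.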
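The matching loop can be sketched end to end in Python as below. Several things here are assumptions rather than content of the slides: δ1 is taken as the string pointer shift (patlen minus the rightmost position of the character, or patlen if it is absent), δ2 is built by brute force from the rpr definition instead of a linear-time preprocessing step, and the names `build_delta2` and `boyer_moore_search` are hypothetical. The function returns the 0-based index of the first occurrence, or -1.

```python
def build_delta2(pattern: str) -> list[int]:
    """Brute-force delta2[j] = patlen + 1 - rpr(j), j = 1..patlen (index 0 unused)."""
    patlen = len(pattern)

    def pat(i):
        # Positions left of the pattern act as the wildcard $ (None here).
        return pattern[i - 1] if i >= 1 else None

    delta2 = [0] * (patlen + 1)
    for j in range(1, patlen + 1):
        suffix_len = patlen - j
        # Scan candidate k from the right; the all-$ window always qualifies.
        for k in range(min(patlen, j + 1), j - patlen, -1):
            unifies = all(
                pat(k + t) in (None, pat(j + 1 + t)) for t in range(suffix_len)
            )
            if unifies and (k <= 1 or pat(k - 1) != pat(j)):
                delta2[j] = patlen + 1 - k
                break
    return delta2


def boyer_moore_search(text: str, pattern: str) -> int:
    """Return the 0-based index of the first occurrence of pattern, or -1."""
    n, patlen = len(text), len(pattern)
    if patlen == 0:
        return 0
    rightmost = {c: pos for pos, c in enumerate(pattern, start=1)}
    delta2 = build_delta2(pattern)
    i = patlen  # 1-based string pointer, starts under the pattern's last char
    while i <= n:
        j = patlen
        # Compare right to left while characters agree.
        while j > 0 and text[i - 1] == pattern[j - 1]:
            i -= 1
            j -= 1
        if j == 0:
            return i  # 1-based position i + 1, i.e. 0-based index i
        # delta1 as a string pointer shift; absent characters give patlen.
        delta1 = patlen - rightmost.get(text[i - 1], 0)
        i += max(delta1, delta2[j])
    return -1


print(boyer_moore_search("ACBBACABCA", "ABC"))
```

Taking the maximum of the two shifts is safe because each one on its own never skips a possible occurrence, and δ2 always moves the string pointer past its starting point for the current alignment.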

19 Boyer-Moore Matching Algorithm
- Time complexity:
    Bad Character heuristic: O(patlen)
    Good Suffix heuristic: O(patlen)
    Matching: O(n)
    Total: O(n + patlen)

20 Experimental Result

21 Conclusion
- The Boyer-Moore algorithm is sublinear on average, in that it usually inspects fewer than n characters of the string; its worst-case time is O(n+m).
- Boyer-Moore is the most efficient string-matching algorithm when the pattern is long and the alphabet is reasonably large.

