1
A Fast String Searching Algorithm
Robert S. Boyer and J Strother Moore. Communications of the ACM, vol. 20, no. 10, Oct. 1977
2
Outline:
- Introduction
- The Knuth-Morris-Pratt algorithm
- The Boyer-Moore algorithm
  - Bad Character heuristic
  - Good Suffix heuristic
  - Matching Algorithm
- Experimental Result
- Conclusion
3
Introduction
String matching: searching for a pattern in a text or a longer string. If the pattern occurs in the string, return the position of the first character of the substring that matches the pattern.
[Figure: the pattern aligned with the string at shift s]
4
Introduction (cont.)
Some definitions:
m: the length of the pattern.
n: the length of the string (or text).
s (shift): the offset of the first character of the matched substring from the start of the string.
w ⊏ x: a string w is a prefix of a string x.
w ⊐ x: a string w is a suffix of a string x.
5
Introduction (cont.)
The naive string-matching algorithm:
Time complexity: Θ((n − m + 1)m) in the worst case, which is Θ(n²) if m = ⌊n/2⌋.
for s ← 0 to n − m do
    if pattern[1..m] = string[s+1..s+m] then
        print "Pattern occurs with shift" s
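A minimal Python sketch of the naive loop, assuming 0-based Python strings (so the slide's string[s+1..s+m] becomes the slice text[s:s+m]) and collecting every shift instead of printing it. The function name naive_search is our own.

```python
def naive_search(text: str, pattern: str) -> list[int]:
    """Return every shift s at which pattern occurs in text (0-based, like the slide's s)."""
    n, m = len(text), len(pattern)
    shifts = []
    # Try all n - m + 1 shifts; each comparison may inspect up to m characters.
    for s in range(n - m + 1):
        # text[s:s + m] is the slide's string[s+1..s+m] in 0-based slicing.
        if text[s:s + m] == pattern:
            shifts.append(s)
    return shifts


print(naive_search("abcabcab", "cab"))  # [2, 5]
```

Slicing hides the inner character loop, but each slice comparison still costs up to m steps, which is where the (n − m + 1)m bound comes from.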
6
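Knuth-Morris-Pratt Algorithm
[Figure: the pattern is aligned with the string at shift s and its first q characters match. KMP then moves to the next plausible shift s′ at which the first k characters of the pattern still match the same string characters, so s + q = s′ + k.]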
7
Knuth-Morris-Pratt Algorithm (cont.)
Prefix function:
f(j) = the largest i < j such that P[1..i] = P[j−i+1..j], or 0 if no such i exists.
[Figure: P_q = ABABA; its longest proper prefix that is also a suffix is P_k = ABA]
8
Knuth-Morris-Pratt Algorithm (cont.)
Prefix function algorithm:
f[1] ← 0
k ← 0
for q ← 2 to m do
    while k > 0 and P[k+1] ≠ P[q] do
        k ← f[k]
    if P[k+1] = P[q] then
        k ← k + 1
    f[q] ← k
return f[1..m]
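The prefix-function pseudocode translates to Python roughly as follows. This is a sketch: Python indexes from 0, so the list entry f[q] here holds the slide's f[q+1], and P[k+1] becomes p[k]. The name prefix_function is our own.

```python
def prefix_function(p: str) -> list[int]:
    """f[q] = length of the longest proper prefix of p[:q+1] that is also its suffix."""
    m = len(p)
    f = [0] * m
    k = 0  # length of the prefix matched so far
    for q in range(1, m):
        # Fall back through shorter borders until p[k] can extend one.
        while k > 0 and p[k] != p[q]:
            k = f[k - 1]
        if p[k] == p[q]:
            k += 1
        f[q] = k
    return f


print(prefix_function("ABABACABABA"))  # [0, 0, 1, 2, 3, 0, 1, 2, 3, 4, 5]
```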
9
Knuth-Morris-Pratt Algorithm (cont.)
Example (P = ABABACABABA):
k      1  2  3  4  5  6  7  8  9  10  11
P[k]   A  B  A  B  A  C  A  B  A  B   A
f[k]   0  0  1  2  3  0  1  2  3  4   5
Time complexity:
Prefix function: O(m) by amortized analysis
Matching function: O(n)
Total: O(m + n), linear complexity
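The slides give the cost of the matching function but not its code. Below is a sketch of one standard KMP matcher (the CLRS-style loop), assuming 0-based indices, a non-empty pattern, and that all match shifts are wanted. The names kmp_search and prefix_function are our own; the prefix function is repeated so the block runs on its own.

```python
def prefix_function(p: str) -> list[int]:
    """0-based prefix function: f[q] = longest proper border of p[:q+1]."""
    f = [0] * len(p)
    k = 0
    for q in range(1, len(p)):
        while k > 0 and p[k] != p[q]:
            k = f[k - 1]
        if p[k] == p[q]:
            k += 1
        f[q] = k
    return f


def kmp_search(text: str, pattern: str) -> list[int]:
    """Return all 0-based shifts at which a non-empty pattern occurs in text."""
    f = prefix_function(pattern)
    m = len(pattern)
    shifts = []
    q = 0  # number of pattern characters currently matched
    for i, c in enumerate(text):
        # On a mismatch, jump to the next plausible shift (s' + k = s + q).
        while q > 0 and pattern[q] != c:
            q = f[q - 1]
        if pattern[q] == c:
            q += 1
        if q == m:
            shifts.append(i - m + 1)
            q = f[q - 1]  # continue, allowing overlapping matches
    return shifts


print(kmp_search("abababa", "aba"))  # [0, 2, 4]
```

Each text character is read once, and the total number of fallbacks is bounded by the number of increments of q, which gives the O(n) matching cost.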
10
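The Boyer-Moore Algorithm
Symbols used:
Σ: the alphabet (the set of characters)
patlen: the length of the pattern
m: the number of characters at the right end of the pattern that have already matched
char: the mismatched character of the string
[Figure: the last m characters of the pattern match the string; char is the string character that mismatches]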
11
Characteristics
The pattern is compared against the string from its rightmost character to its leftmost character. When the pattern is relatively long and Σ is reasonably large, this algorithm is likely to be the most efficient string-matching algorithm.
12
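Bad Character heuristic
Observation 1: if char does not occur in pat, no alignment that places a pattern character over char can match, so:
Pattern shift: j characters
String pointer shift: patlen characters (m + j, since m = patlen − j)
Example (figure): string ADCABCABA, pattern CBA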
13
Bad Character heuristic (cont.)
Observation 2: char occurs in the pattern. Let δ1[char] be the position of the rightmost occurrence of char in the pattern, and let j be the current position in the pattern.
If j < δ1[char], shift the pattern right by 1.
If j > δ1[char], shift the pattern right by j − δ1[char].
δ1[] is an array whose size is |Σ|.
14
Bad Character heuristic (cont.)
Example: string ACBBACABCA, pattern ABC. The mismatch occurs at j = 3 (so m = 0) against char = B, and δ1[B] = 2.
Pattern shift: j − δ1[B] = 1
String pointer shift: m + pattern shift = 1
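A minimal sketch of the bad-character rule as the slides state it, assuming 1-based pattern positions and that δ1[char] = 0 for a character absent from pat; with that convention Observation 1's pattern shift of j falls out of the same formula j − δ1[char]. A Python dict stands in for the |Σ|-sized array, and both function names are hypothetical.

```python
def bad_char_table(pat: str) -> dict[str, int]:
    """delta1[char] = rightmost 1-based position of char in pat (slide convention)."""
    table: dict[str, int] = {}
    for pos, ch in enumerate(pat, start=1):
        table[ch] = pos  # later occurrences overwrite earlier ones
    return table


def bad_char_pattern_shift(pat: str, j: int, char: str) -> int:
    """Pattern shift after pat(j) mismatches the string character char."""
    rightmost = bad_char_table(pat).get(char, 0)  # 0: char absent (Observation 1)
    if j > rightmost:
        return j - rightmost  # line up the rightmost char, or move past it
    return 1  # rightmost occurrence lies to the right of j


print(bad_char_pattern_shift("ABC", 3, "B"))  # 1
```

The call above reproduces the slide's example (pattern ABC, mismatch at j = 3 against B).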
15
Good Suffix heuristic
Two sequences [c1..cn] and [d1..dn] unify if, for every i from 1 to n, either ci = di, ci = $, or di = $, where $ is a character that does not occur in pat (positions of pat at or before 0 are taken to hold $).
The position of the rightmost plausible reoccurrence, rpr(j), is the largest k such that [pat(j+1)..pat(patlen)] and [pat(k)..pat(k+patlen−j−1)] unify, and either k ≤ 1 or pat(k−1) ≠ pat(j).
16
Good Suffix heuristic (cont.)
Pattern shift: j + 1 − rpr(j)
String pointer shift: m + j + 1 − rpr(j) = (patlen − j) + j + 1 − rpr(j) = patlen + 1 − rpr(j) = δ2[j]
Example: pat = ABXYCDEXY, padded with $ at positions −7 to 0: $$$$$$$$ABXYCDEXY
j        1  2  3  4  5  6  7  8  9
pat(j)   A  B  X  Y  C  D  E  X  Y
rpr(j)  -7 -6 -5 -4 -3 -2  3  0  9
17
Good Suffix heuristic (cont.) Algorithm:
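As a hedged illustration, rpr(j) and δ2[j] = patlen + 1 − rpr(j) can be computed straight from the definitions. This is a deliberately naive search over k (roughly cubic in patlen overall), not the efficient table construction from the paper. Positions are 1-based as on the slides, None stands in for $, and the names rpr and delta2 are our own.

```python
from typing import Optional


def rpr(pat: str, j: int) -> int:
    """Rightmost plausible reoccurrence of pat(j+1..patlen), by direct search."""
    def at(i: int) -> Optional[str]:
        return pat[i - 1] if i >= 1 else None  # positions <= 0 hold '$'

    def unify(a: Optional[str], b: Optional[str]) -> bool:
        return a is None or b is None or a == b

    length = len(pat) - j
    # Try the largest k first; k = 1 - length (all '$') always qualifies.
    for k in range(j, -length, -1):
        if all(unify(at(j + 1 + t), at(k + t)) for t in range(length)) and (
            k <= 1 or at(k - 1) != at(j)
        ):
            return k
    raise RuntimeError("unreachable")


def delta2(pat: str) -> list[int]:
    """String-pointer shifts: list index j - 1 holds delta2[j] = patlen + 1 - rpr(j)."""
    patlen = len(pat)
    return [patlen + 1 - rpr(pat, j) for j in range(1, patlen + 1)]


print([rpr("ABXYCDEXY", j) for j in range(1, 10)])  # [-7, -6, -5, -4, -3, -2, 3, 0, 9]
print(delta2("ABXYCDEXY"))  # [17, 16, 15, 14, 13, 12, 7, 10, 1]
```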
18
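Boyer-Moore Matching Algorithm
i ← patlen
if n < patlen then return false
j ← patlen
while j > 0 do
    if string(i) = pat(j) then
        j ← j − 1
        i ← i − 1
    else
        // patlen − δ1[char] is the bad-character string-pointer shift, m + (j − δ1[char]);
        // δ1[char] is taken as 0 when char does not occur in pat
        i ← i + max(patlen − δ1[string(i)], δ2[j])
        if i > n then return false
        j ← patlen
return i + 1    // position of the first character of the match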
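A runnable sketch of the matching loop, assuming 1-based i and j as on the slides. The slides' δ1[char] (rightmost position, 0 when char is absent) is converted to the string-pointer shift patlen − δ1[char] before taking the max with δ2, and δ2 is computed naively from the rpr definition. It returns a 0-based index or None instead of false; boyer_moore and _rpr are hypothetical names.

```python
from typing import Optional


def _rpr(pat: str, j: int) -> int:
    """rpr(j) from the definition (1-based positions; positions <= 0 hold '$')."""
    def at(i: int) -> Optional[str]:
        return pat[i - 1] if i >= 1 else None  # None stands in for '$'

    length = len(pat) - j
    for k in range(j, -length, -1):  # largest k first
        unifies = all(at(k + t) in (None, at(j + 1 + t)) for t in range(length))
        if unifies and (k <= 1 or at(k - 1) != at(j)):
            return k
    raise RuntimeError("unreachable: k = 1 - length always qualifies")


def boyer_moore(string: str, pat: str) -> Optional[int]:
    """0-based index of the first occurrence of pat in string, or None."""
    n, patlen = len(string), len(pat)
    if n < patlen:
        return None
    # Slides' delta1[char]: rightmost 1-based position of char in pat (0 if absent).
    rightmost = {ch: pos for pos, ch in enumerate(pat, start=1)}
    # delta2[j] = patlen + 1 - rpr(j), stored at list index j - 1.
    delta2 = [patlen + 1 - _rpr(pat, j) for j in range(1, patlen + 1)]

    i = j = patlen  # 1-based: string(i) is compared with pat(j)
    while j > 0:
        if string[i - 1] == pat[j - 1]:
            i -= 1
            j -= 1
        else:
            bad_char = patlen - rightmost.get(string[i - 1], 0)
            i += max(bad_char, delta2[j - 1])
            if i > n:
                return None
            j = patlen
    # The match starts at 1-based position i + 1, which is 0-based index i.
    return i


print(boyer_moore("WHICH-FINALLY-HALTS.--AT-THAT-POINT", "AT-THAT"))  # 22
```

Because δ2[j] ≥ patlen − j + 1, every mismatch moves the pattern right by at least one position, so the loop always terminates.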
19
Boyer-Moore Matching Algorithm
Time complexity:
Bad Character table (δ1): O(patlen + |Σ|)
Good Suffix table (δ2): O(patlen)
Matching: O(n)
Total: O(n + patlen), treating |Σ| as a constant
20
Experimental Result
21
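Conclusion
The Boyer-Moore algorithm runs in O(n + m) time, and in practice it is often sublinear: because comparisons start at the right end of the pattern and a mismatch can shift the pattern by as much as patlen characters, it typically examines far fewer than n characters of the string.
Boyer-Moore is likely to be the most efficient string-matching algorithm when the pattern is long and the alphabet is reasonably large.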