CSC 212 – Data Structures Lecture 36: Pattern Matching.

1 CSC 212 – Data Structures Lecture 36: Pattern Matching

2 Suffixes and Prefixes
Example string: "I am the Lizard King!"

Prefixes                    Suffixes
"I"                         "!"
"I "                        "g!"
"I a"                       "ng!"
"I am"                      "ing!"
…                           …
"I am the Lizard Kin"       "am the Lizard King!"
"I am the Lizard King"      " am the Lizard King!"
"I am the Lizard King!"     "I am the Lizard King!"
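The prefix and suffix lists on this slide can be generated mechanically. A minimal sketch in Python; the helper names `prefixes` and `suffixes` are illustrative and not part of the lecture:

```python
def prefixes(s):
    """Return every non-empty prefix of s, shortest first."""
    return [s[:k] for k in range(1, len(s) + 1)]


def suffixes(s):
    """Return every non-empty suffix of s, shortest first."""
    return [s[-k:] for k in range(1, len(s) + 1)]


text = "I am the Lizard King!"
print(prefixes(text)[:4])  # ['I', 'I ', 'I a', 'I am']
print(suffixes(text)[:4])  # ['!', 'g!', 'ng!', 'ing!']
```

The last entry of both lists is the whole string; a proper prefix or suffix excludes that final entry.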

3 KMP Algorithm
Asymptotically optimal algorithm
  - Means we cannot do better in big-Oh terms
Compares from left to right
  - So like BruteForce, not Boyer-Moore
  - But shifts pattern intelligently
Relies on a Key Insight™
  - Preprocess pattern to avoid redundant comparisons
  - Always go forward; never, ever look back

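4 The KMP Algorithm
[Figure: text T containing ". . a b a a b x . . . . ." with pattern P = abaaba aligned beneath it and a mismatch at rank j; P is drawn a second time, shifted right. Annotations: "Do not repeat these comparisons"; "Need to resume comparing here"; "Shifting P here ensures these two entries match".]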

5 KMP Failure Function
Assume P[j] ≠ T[k]. Need the rank in P to compare next against T[k]
  - E.g., how should we shift P after a miss?
Uses the failure function value F(j-1)
  - One value defined for each rank in P
  - Specifies the rank in P at which comparisons must restart

6 Computing Failure Function
For rank j, find the longest proper prefix of P[0...j] that is also a suffix of P[0...j]
  - For speed, store failure function in an array
  - Unlike Boyer-Moore, works with infinite alphabets
Takes at most 2m loop iterations, so O(2m) = O(m) time
The algorithm that computes the failure function is similar to KMP matching itself
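To make the definition concrete, the failure function can be computed directly from it by trying every candidate length. This is only a slow reference sketch for checking answers, not the O(m) algorithm from the lecture; the function name `failure_by_definition` is made up for illustration.

```python
def failure_by_definition(p):
    """For each rank j, return the length of the longest proper prefix
    of p[0..j] that is also a suffix of p[0..j] (brute force)."""
    f = []
    for j in range(len(p)):
        sub = p[: j + 1]
        best = 0
        # "Proper" means the prefix must be shorter than sub itself.
        for k in range(1, len(sub)):
            if sub[:k] == sub[-k:]:
                best = k
        f.append(best)
    return f


print(failure_by_definition("abaaba"))  # [0, 0, 1, 1, 2, 3]
```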

7 Computing Failure Function
Algorithm KMPFailureFunction(String P)
    F[0] ← 0
    i ← 1
    j ← 0
    while i < P.length()
        if P[i] = P[j]          // So, P[0…j] = P[i - j…i]
            F[i] ← j + 1        // Record the length of this prefix/suffix
            i ← i + 1           // Advance a character and see if still matches
            j ← j + 1
        else if j > 0           // No match, need to restart our computation
            j ← F[j - 1]        // Skip over longest prefix that is also a suffix
        else
            F[i] ← 0            // No proper prefix of P[0…i] is a suffix of P[0…i]
            i ← i + 1           // Move to the next character
    return F
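The pseudocode on this slide translates almost line for line into Python. A minimal sketch; the name `kmp_failure_function` and the handling of an empty pattern (returning an empty list) are assumptions, not something the slide specifies:

```python
def kmp_failure_function(p):
    """Return the list F where F[i] is the length of the longest proper
    prefix of p[0..i] that is also a suffix of p[0..i]."""
    f = [0] * len(p)
    i, j = 1, 0
    while i < len(p):
        if p[i] == p[j]:
            # p[0..j] equals p[i-j..i], so record its length.
            f[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            # Mismatch: fall back to the next shorter prefix/suffix.
            j = f[j - 1]
        else:
            # No proper prefix of p[0..i] is also a suffix.
            f[i] = 0
            i += 1
    return f


print(kmp_failure_function("abaaba"))  # [0, 0, 1, 1, 2, 3]
```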

8 KMP Failure Function
j      0  1  2  3  4  5
P[j]   a  b  a  a  b  a
F(j)   0  0  1  1  2  3

9 The KMP Algorithm
Algorithm KMPMatch(String T, String P)
    F ← KMPFailureFunction(P)
    i ← 0
    j ← 0
    while i < T.length()
        if P[j] = T[i]          // So, P[0…j] = T[i - j…i]
            if j = P.length() - 1
                return i - j    // Whole pattern matched; report where it starts
            i ← i + 1           // Advance and see if still a match
            j ← j + 1
        else if j > 0           // No match, but a prefix of P[0…j-1] matches
            j ← F[j - 1]        // So skip past longest prefix that is a suffix
        else
            i ← i + 1           // Nothing to reuse, move to the next character
    return -1                   // No substring of T matches P
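A runnable version of the matcher, sketched in Python. The function names are illustrative, and two behaviors are assumptions rather than part of the slide: returning -1 when P does not occur in T, and returning 0 for an empty pattern.

```python
def kmp_failure_function(p):
    """Failure function of p, computed as on the earlier slide."""
    f = [0] * len(p)
    i, j = 1, 0
    while i < len(p):
        if p[i] == p[j]:
            f[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]
        else:
            i += 1
    return f


def kmp_match(t, p):
    """Return the rank in t where p first occurs, or -1 if it never does."""
    if not p:
        return 0  # assumption: the empty pattern matches at rank 0
    f = kmp_failure_function(p)
    i, j = 0, 0
    while i < len(t):
        if p[j] == t[i]:
            if j == len(p) - 1:
                return i - j  # whole pattern matched
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]  # reuse the prefix that is known to match
        else:
            i += 1  # nothing to reuse, move on in the text
    return -1


print(kmp_match("abacaabaccabacabaabb", "abacab"))  # 10
```

Note that i never decreases: the text is scanned strictly left to right, and only j falls back after a mismatch.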

10 Example
j      0  1  2  3  4  5
P[j]   a  b  a  c  a  b
F(j)   0  0  1  0  1  2

11 The KMP Algorithm
In each pass of KMPMatch, either:
  - P[j] = T[i]: i increases by one, or
  - P[j] ≠ T[i] & j > 0: P is shifted right by at least 1, or
  - P[j] ≠ T[i] & j = 0: i increases by 1
So at most 2n iterations of the loop
KMPMatch takes O(2n) = O(n) time
KMPFailureFunction needs O(m) time
Thus, the algorithm runs in O(m + n) time
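The 2n bound can be checked empirically by counting loop iterations. The sketch below is an instrumented copy of the matcher; the name `kmp_match_counted` and the returned pair are made up for this illustration, and it again assumes -1 signals "no match".

```python
def kmp_match_counted(t, p):
    """Return (rank of first match or -1, number of main-loop iterations)."""
    if not p:
        return 0, 0
    # Failure function, as in KMPFailureFunction.
    f = [0] * len(p)
    i, j = 1, 0
    while i < len(p):
        if p[i] == p[j]:
            f[i] = j + 1
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]
        else:
            i += 1
    # Matching loop, counting every pass.
    i, j, steps = 0, 0, 0
    while i < len(t):
        steps += 1
        if p[j] == t[i]:
            if j == len(p) - 1:
                return i - j, steps
            i += 1
            j += 1
        elif j > 0:
            j = f[j - 1]  # pattern shifts right; i stays put
        else:
            i += 1
    return -1, steps


text = "a" * 20 + "b"
index, steps = kmp_match_counted(text, "aaab")
print(index, steps, steps <= 2 * len(text))
```

Each pass either advances i or shifts the pattern right (increases i - j), and both quantities are at most n, which is where the 2n limit comes from.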

12 Your Turn Get back into groups and do activity

13 Before Next Lecture… Finish up assignments Start thinking about questions for Final
