Knuth-Morris-Pratt Algorithm

1 Knuth-Morris-Pratt Algorithm
  – Left-to-right scan, like the naïve algorithm
  – One main improvement: on a mismatch, calculate the maximum possible shift to the right for the pattern

2 Basic Idea
Definition
  – For each position i in pattern P, define sp_i(P) to be the length of the longest proper suffix of P[1..i] that matches a prefix of P
  – Define sp_i'(P) to have the added condition that P(i+1) is not equal to P(sp_i'(P) + 1)
  – These may be written sp_i and sp_i' when P is clear from context
Usage
  – A mismatch occurs between P(i+1) and T(k)
  – Shift P to the right so that P(sp_i' + 1) aligns with T(k); this shifts P by i - sp_i' places in total
  – If P is found, shift by n - sp_n' places
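The shift rule above can be sketched in Python as follows. This is a minimal illustration, not the reference implementation the slides point to: the function names `sp_values`, `sp_prime_values` and `kmp_search` are hypothetical, the sp_i values are computed with the classical prefix-function recurrence (an assumption, since the slides use Z-values instead), and Python strings are 0-indexed, so P(i) in the slides is `P[i - 1]` here.

```python
def sp_values(P):
    """Return sp[0..n]; sp[i] is sp_i from the slides (sp[0] is unused)."""
    n = len(P)
    sp = [0] * (n + 1)
    k = 0  # length of the currently matched prefix
    for i in range(2, n + 1):
        while k > 0 and P[k] != P[i - 1]:
            k = sp[k]  # fall back to the next shorter matching prefix
        if P[k] == P[i - 1]:
            k += 1
        sp[i] = k
    return sp


def sp_prime_values(P):
    """Return spp[0..n]; spp[i] is sp_i' (sp_i with P(i+1) != P(sp_i' + 1))."""
    n = len(P)
    sp = sp_values(P)
    spp = [0] * (n + 1)
    for i in range(1, n + 1):
        k = sp[i]
        # At i = n there is no P(n+1), so the extra condition holds trivially.
        if i == n or k == 0 or P[k] != P[i]:
            spp[i] = k
        else:
            spp[i] = spp[k]  # next shorter candidate that satisfies the condition
    return spp


def kmp_search(P, T):
    """Return the 1-based start positions of every occurrence of P in T."""
    n = len(P)
    if n == 0:
        return []
    spp = sp_prime_values(P)
    hits = []
    p = 0  # number of pattern characters matched so far (i in the slides)
    for k, ch in enumerate(T):
        while p > 0 and P[p] != ch:
            p = spp[p]  # shift P right by p - sp_p' places
        if P[p] == ch:
            p += 1
        if p == n:
            hits.append(k - n + 2)  # convert 0-based end index to 1-based start
            p = spp[n]  # shift by n - sp_n' after an occurrence
    return hits


print(kmp_search("aba", "ababa"))
```

Setting `p = spp[p]` is the code-level equivalent of sliding the pattern: the text pointer `k` never moves backwards, only the number of pattern characters considered matched changes.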

3 Illustration of sp and sp'

    i       1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    P       a  b  c  d  a  b  c  e  a  b  c  d  a  b  c  e  f
    sp_i    0  0  0  0  1  2  3  0  1  2  3  4  5  6  7  8  0
    sp_i'   0  0  0  0  0  0  3  0  0  0  0  0  0  0  3  8  0
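To check values like these by hand, the two definitions can be applied directly. The sketch below is a brute-force reading of the definitions (quadratic or worse, so only suitable for small patterns); the function name `sp_by_definition` is hypothetical and P(i) in the slides corresponds to `P[i - 1]` in Python.

```python
def sp_by_definition(P):
    """Compute sp_i and sp_i' for i = 1..n straight from the definitions."""
    n = len(P)
    sp = [0] * (n + 1)
    spp = [0] * (n + 1)
    for i in range(1, n + 1):
        # Try proper suffixes of P[1..i] from longest to shortest.
        for length in range(i - 1, 0, -1):
            if P[i - length:i] != P[:length]:
                continue
            if sp[i] == 0:
                sp[i] = length  # first hit is the longest: this is sp_i
            # sp_i' also needs P(i+1) != P(length+1); at i = n there is no P(n+1).
            if i == n or P[i] != P[length]:
                spp[i] = length
                break
    return sp[1:], spp[1:]


pattern = "abcdabceabcdabcef"
sp, spp = sp_by_definition(pattern)
print("P    ", " ".join(pattern))
print("sp_i ", " ".join(str(v) for v in sp))
print("sp_i'", " ".join(str(v) for v in spp))
```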

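4 Illustration 1 of KMP shift

    pos            1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    T              x  y  a  b  c  a  b  d  a  b  c  f  q  f  e  a  b
    P                    a  b  c  a  b  d  a  b  d
    P after shift                          a  b  c  a  b  d  a  b  d

P(9) = d mismatches T(11) = c. Here i = 8 and sp_8' = 2, so P shifts 8 - 2 = 6 places and P(3) = c is aligned with T(11).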

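5 Illustration 2 of KMP shift

    pos            1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17
    T              x  y  a  b  x  a  b  d  a  b  c  f  q  f  e  a  b
    P                    a  b  x  a  b  d  a  b  d
    P after shift                          a  b  x  a  b  d  a  b  d

P(9) = d mismatches T(11) = c. Again i = 8 and sp_8' = 2, so P shifts 6 places, but this time P(3) = x is aligned with T(11) = c and mismatches immediately.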

6 sp_i' and Z-boxes
Definitions
  – Position j > 1 maps to i if i is the right end of a Z-box that starts at j
  – Note that i = j + Z_j - 1 in this case
Observation
  – For any i > 1, sp_i' = 0 if no j maps to i
  – Otherwise, sp_i' is the maximum of Z_j over all j that map to i
  – Choosing the smallest j that maps to i leads to the maximum possible Z_j value

7 Z-based computation of sp_i'

    for (i = 1; i <= n; i++)
        sp_i' = 0;
    for (j = n; j >= 2; j--) {
        i = j + Z_j - 1;
        sp_i' = Z_j;
    }
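A runnable version of the loop above needs the Z-values first. The sketch below pairs a standard linear-time Z-algorithm with the slide's loop; the names `z_values` and `sp_prime_from_z` are hypothetical, and the arrays are padded at index 0 so that the indices match the 1-based notation of the slides.

```python
def z_values(P):
    """Return Z[0..n] with 1-based indexing; Z[0] and Z[1] are unused (0)."""
    n = len(P)
    Z = [0] * (n + 1)
    left = right = 0  # rightmost Z-box found so far, as 1-based [left, right]
    for j in range(2, n + 1):
        if j <= right:
            # Reuse the value of the mirrored position inside the current Z-box.
            Z[j] = min(right - j + 1, Z[j - left + 1])
        # Extend by explicit comparison: P(Z_j + 1) against P(j + Z_j).
        while j + Z[j] <= n and P[Z[j]] == P[j + Z[j] - 1]:
            Z[j] += 1
        if j + Z[j] - 1 > right:
            left, right = j, j + Z[j] - 1
    return Z


def sp_prime_from_z(P):
    """Return [sp_1', ..., sp_n'] using the loop from the slide."""
    n = len(P)
    Z = z_values(P)
    spp = [0] * (n + 1)
    # Scanning j downwards means the smallest j that maps to i is written last,
    # and the smallest j carries the largest Z_j.
    for j in range(n, 1, -1):
        i = j + Z[j] - 1
        spp[i] = Z[j]
    return spp[1:]


print(sp_prime_from_z("abcabdabd"))
```

When Z_j = 0 the loop writes a 0 into position j - 1. That is harmless, because any j that genuinely maps to position j - 1 is smaller and is therefore processed later.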

8 Observations
Original KMP is defined in terms of failure functions F(i) and F'(i)
  – F'(i) = sp_{i-1}' and F(i) = sp_{i-1} for i = 1 to n+1
2m upper bound on the number of comparisons, where m is the length of T
  – Once a position in T matches, it is never compared again to any position in P
  – There may be cases where positions in T that mismatch are compared against multiple positions in P, but this can happen at most m times in total
Full implementation of KMP is on page 27

9 FSA KMP algorithm
Definition
  – For each position i in pattern P and each character x in the alphabet Σ, define sp_(i,x)(P) to be the length of the longest proper suffix of P[1..i] that matches a prefix of P, with the added condition that P(sp_(i,x) + 1) = x
Observation
  – Now each position in T will be compared exactly once, even on a mismatch

10 Z-based computation of sp_(i,x)

    for (i = 1; i <= n; i++)
        for (all x in Σ)
            sp_(i,x) = 0;
    for (j = n; j >= 2; j--) {
        i = j + Z_j - 1;
        x = P(Z_j + 1);
        sp_(i,x) = Z_j;
    }
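The table construction above, together with one way of using it during the scan, might look like this in Python. It is a sketch under stated assumptions: `z_values`, `sp_table_from_z` and `fsa_search` are hypothetical names, the table is stored as one dictionary per position with missing entries standing for 0 (rather than a full n × |Σ| array), and the matcher's handling of the sp_(i,x) = 0 case is an illustrative choice that is not taken from the slides.

```python
def z_values(P):
    """Return Z[0..n] with 1-based indexing; Z[0] and Z[1] are unused (0)."""
    n = len(P)
    Z = [0] * (n + 1)
    left = right = 0
    for j in range(2, n + 1):
        if j <= right:
            Z[j] = min(right - j + 1, Z[j - left + 1])
        while j + Z[j] <= n and P[Z[j]] == P[j + Z[j] - 1]:
            Z[j] += 1
        if j + Z[j] - 1 > right:
            left, right = j, j + Z[j] - 1
    return Z


def sp_table_from_z(P):
    """table[i][x] is sp_(i,x); characters missing from table[i] mean 0."""
    n = len(P)
    Z = z_values(P)
    table = [dict() for _ in range(n + 1)]
    for j in range(n, 1, -1):
        i = j + Z[j] - 1
        x = P[Z[j]]  # P(Z_j + 1) in the 1-based notation of the slides
        table[i][x] = Z[j]
    return table


def fsa_search(P, T):
    """Return 1-based start positions of P in T, reading each T(k) once."""
    n = len(P)
    if n == 0:
        return []
    table = sp_table_from_z(P)
    hits = []
    p = 0  # number of pattern characters currently matched
    for k, x in enumerate(T):
        if p < n and P[p] == x:
            p += 1
        else:
            matched = table[p].get(x, 0)
            if matched > 0:
                p = matched + 1  # P(matched + 1) = x is already known to match
            else:
                # Assumption: with no usable suffix, x can still start a match.
                p = 1 if P[0] == x else 0
        if p == n:
            hits.append(k - n + 2)
    return hits


print(fsa_search("abcabdabd", "xyabcabdabcabdabd"))
```

Unlike the sp_i' version, the matcher never loops on a mismatch: the character that caused the mismatch selects the next state directly, which is where the extra table space goes.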

