
Published by Kristin Farmer. Modified over 2 years ago.

1
CSE 30331 Lecture 23 – String Matching
Simple (Brute-Force) Approach
Knuth-Morris-Pratt Algorithm
Boyer-Moore Algorithm

2
The Problem
Find the first occurrence of the pattern P in the text T.
The number of characters in P is m; the number of characters in T is n.

3
The Simple Approach
For each position j in the text:
  if T[j..j+m) matches P[0..m), stop: pattern found at position j.
Advantage: simple to implement.
Disadvantage: may require the ability to push previously read characters back into the input stream.
Worst-case efficiency: O(m*n). The pattern is moved forward only one position each time a mismatch is found, no matter how much of the pattern matched before the mismatched character.
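A minimal C++ sketch of the simple approach, assuming the whole text is already held in a std::string (so nothing ever has to be pushed back into an input stream). The name bruteForceScan is illustrative and not taken from the lecture's demo program.

```cpp
#include <string>

// Brute-force scan: try every alignment j of P against T and compare
// P[0..m) with T[j..j+m) left to right. Returns the first match position,
// or -1 if P does not occur in T. Worst case O(m*n) comparisons.
int bruteForceScan(const std::string& P, const std::string& T) {
    const int m = static_cast<int>(P.size());
    const int n = static_cast<int>(T.size());
    for (int j = 0; j + m <= n; ++j) {      // each candidate alignment
        int k = 0;
        while (k < m && T[j + k] == P[k])   // extend the match
            ++k;
        if (k == m)
            return j;                       // all m characters matched
        // mismatch: shift the pattern forward by exactly one position,
        // discarding everything learned about T[j..j+k]
    }
    return -1;
}
```

For example, bruteForceScan("ABABDD", "ABABABDD") returns 2 after the alignments at 0 and 1 fail; the alignment at 0 re-reads four text characters that the next attempts will examine again.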

4
Knuth-Morris-Pratt (KMP)
Based on a finite-state automaton (FSA) that recognizes the pattern P.
The FSA is represented by a KMP flowchart: states are letters in the pattern P, and arcs are SUCCESS or FAIL.
On success (T[j] == P[k]), move forward with the match (j++ and k++).
On failure (T[j] != P[k]), move backward in the pattern (or, equivalently, shift the pattern forward over the text) to k = fail[k], so that P[fail[k]] is the next character compared with T[j] and the longest matching prefix, P[0..fail[k]), is preserved.

5
KMP Fail Links: hubbahubba
Example pattern: hubbahubba
P:       H  U  B  B  A  H  U  B  B  A
k:       0  1  2  3  4  5  6  7  8  9
fail[k]: -1 0  0  0  0  0  1  2  3  4
Match to text: hubbahubbletelescope...
Last A != L, so k = fail[9] = 4.
First A != L, so k = fail[4] = 0.
H != L, so k = fail[0] = -1: read the next text character and restart the pattern just past the L.
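One way to check a fail table like the one above is to compute it straight from the definition: fail[0] = -1, and for k ≥ 1, fail[k] is the length of the longest proper prefix of P[0..k) that is also a suffix of P[0..k). With 0-based arrays that length is also the index of the next pattern character to compare. The sketch below is deliberately brute force (O(m^3)) and the name failLinksByDefinition is hypothetical; it is meant for checking tables, not for use in a scanner.

```cpp
#include <string>
#include <vector>

// fail[k] computed directly from its definition, for checking only.
std::vector<int> failLinksByDefinition(const std::string& P) {
    const int m = static_cast<int>(P.size());
    std::vector<int> fail(m);
    if (m == 0) return fail;
    fail[0] = -1;                               // mismatch at P[0]: skip the text char
    for (int k = 1; k < m; ++k) {
        int best = 0;
        for (int len = k - 1; len > 0; --len) { // try the longest border first
            // does the prefix P[0..len) equal the suffix P[k-len..k)?
            if (P.compare(0, len, P, k - len, len) == 0) {
                best = len;
                break;
            }
        }
        fail[k] = best;
    }
    return fail;
}
```

On "hubbahubba" this returns -1 0 0 0 0 0 1 2 3 4, the row shown on the slide; for instance fail[9] = 4 because "hubb" is both a prefix and a suffix of "hubbahubb".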

6
KMP – Building Fail Links
Pattern: ABABDD
If P[k] != T[j], then k_new = fail[k] is the position of the pattern character that follows the longest prefix of P matching the text just before the mismatched character T[j].
Finding fail[k]:
Go to P[k-1] and find its fail[k-1] (the prefix that matches up to P[k-2]).
If P[fail[k-1]] matches P[k-1], then fail[k] becomes fail[k-1] + 1.
Else follow the next fail arrow, fail[fail[k-1]], and repeat (if the chain reaches -1, fail[k] = 0).
(Figure: KMP flowchart for ABABDD, with a "Read char" node, states 0 to 5, and a final * node.)

7
KMP – Building Fail Links
void kmpSetup(char P[], int m, int fail[]) {
  int k, s;
  fail[0] = -1; // ch != P[0], read another ch
  for (k=1; k
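The loop on this slide is cut off after `for (k=1; k`. The sketch below is one plausible completion that follows the "Finding fail[k]" steps and reproduces the fail values traced on the next slides (-1 0 0 1 2 0 for ABABDD). It is a reconstruction under that assumption, not the lecture's verbatim code.

```cpp
// Build KMP fail links for pattern P[0..m), 0-indexed, with fail[0] = -1.
// fail[k] is the pattern position to resume at when P[k] mismatches.
void kmpSetup(const char P[], int m, int fail[]) {
    if (m <= 0) return;
    fail[0] = -1;                      // ch != P[0], read another ch
    for (int k = 1; k < m; ++k) {
        // Start from the prefix that matched up to P[k-2] and try to
        // extend it with P[k-1].
        int s = fail[k - 1];
        while (s >= 0 && P[s] != P[k - 1])
            s = fail[s];               // follow the next fail arrow
        fail[k] = s + 1;               // chain reached -1: fail[k] = 0
    }
}
```

For example, kmpSetup("ABABDD", 6, fail) fills fail with -1 0 0 1 2 0. Each step of the while loop strictly lowers s, which is why the whole setup runs in O(m).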

8
KMP – Building Fail Links
Pattern: A B A B D D
Fail:    -1 0
(kmpSetup code as on slide 7)

9
KMP – Building Fail Links
Pattern: A B A B D D
Fail:    -1 0 0
(kmpSetup code as on slide 7)

10
KMP – Building Fail Links
Pattern: A B A B D D
Fail:    -1 0 0 1
(kmpSetup code as on slide 7)

11
KMP – Building Fail Links
Pattern: A B A B D D
Fail:    -1 0 0 1 2
(kmpSetup code as on slide 7)

12
KMP – Building Fail Links
Pattern: A B A B D D
Fail:    -1 0 0 1 2 0
(kmpSetup code as on slide 7)

13
KMP Fail Links: on mismatch, new k = fail[k]
Example pattern: ABABDD   fail: -1 0 0 1 2 0
Text X?????: A != X, so fail[0] = -1; skip X & k=0
Text AX????: B != X, so fail[1] = 0; k=0 (shifts pattern 1)
Text ABX???: 2nd A != X, so fail[2] = 0; k=0 (shifts pattern 2)
Text ABAX??: 2nd B != X, so fail[3] = 1; k=1 (shifts pattern 2)

14
KMP Fail Links: on mismatch, new k = fail[k]
Example pattern: ABABDD   fail: -1 0 0 1 2 0
Text ABABX?: D != X, so fail[4] = 2; k=2 (shifts pattern 2)
Text ABABDX: 2nd D != X, so fail[5] = 0; k=0 (shifts pattern 5)
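The shift amounts in these examples follow one rule: a mismatch at P[k] moves the pattern forward by k - fail[k] positions, and fail[0] = -1 gives a shift of 1 (skip the text character, restart at k = 0). A small sketch of that arithmetic; the function name kmpShifts is hypothetical.

```cpp
#include <vector>

// How far the pattern moves over the text when a mismatch at P[k]
// sets k = fail[k]: the shift is k - fail[k].
std::vector<int> kmpShifts(const std::vector<int>& fail) {
    const int m = static_cast<int>(fail.size());
    std::vector<int> shift(m);
    for (int k = 0; k < m; ++k)
        shift[k] = k - fail[k];   // fail[0] = -1 gives a shift of 1
    return shift;
}
```

For ABABDD (fail = -1 0 0 1 2 0) this yields 1 1 2 2 2 5, the shifts listed for each mismatch position.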

15
KMP Scan Algorithm
int kmpScan (char P[], char T[], int m, int fail[]) {
  int match = -1; // position of match in text
  int j = 0, k = 0;
  // stop when the whole pattern has matched or the text runs out
  while (k < m && ! atEndOfText(T,j)) {
    if (k == -1) { // nothing in pattern matched last text char, so
      j++;   // get next text character
      k = 0; // start pattern over
    } else if (T[j] == P[k]) {
      j++; k++; // move forward one character in pattern and text
    } else {
      k = fail[k]; // follow fail link to best restart in pattern
    }
  }
  if (k == m)
    match = j - m; // matched entire pattern
  return match;
}
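A self-contained C++ sketch of setup plus scan, assuming std::string input in place of the slide's char arrays and atEndOfText helper (whose definition is not shown). The names buildFail and kmpFind are illustrative. Testing k < m in the loop condition means a match that ends on the last text character is still reported.

```cpp
#include <string>
#include <vector>

// Fail links as described for kmpSetup (0-indexed, fail[0] = -1).
static std::vector<int> buildFail(const std::string& P) {
    const int m = static_cast<int>(P.size());
    std::vector<int> fail(m);
    if (m == 0) return fail;
    fail[0] = -1;
    for (int k = 1; k < m; ++k) {
        int s = fail[k - 1];
        while (s >= 0 && P[s] != P[k - 1]) s = fail[s];
        fail[k] = s + 1;
    }
    return fail;
}

// KMP scan: returns the first position of P in T, or -1 if absent.
int kmpFind(const std::string& P, const std::string& T) {
    const int m = static_cast<int>(P.size());
    const int n = static_cast<int>(T.size());
    if (m == 0) return 0;
    const std::vector<int> fail = buildFail(P);
    int j = 0, k = 0;
    while (k < m && j < n) {
        if (k == -1) {            // nothing matched: take next text char
            ++j;
            k = 0;
        } else if (T[j] == P[k]) {
            ++j;                  // advance in text and pattern together
            ++k;
        } else {
            k = fail[k];          // j stays put; the text is never re-read
        }
    }
    return k == m ? j - m : -1;
}
```

For example, kmpFind("ABABDD", "ABABABDD") returns 2: after ABAB matches, the mismatch at P[4] sets k = fail[4] = 2 and matching continues from the same text character.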

16
KMP - Efficiency
Building fail links – O(m)
Scanning text – O(n)
Overall – O(m+n), which is O(n) since m ≤ n
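A sketch of why the scan is linear, based on the kmpScan loop (the counts M, R and X are names introduced here for illustration). Let M be the number of matches (j++, k++), R the number of restarts from k = -1 (j++, k = 0), and X the number of mismatches (k = fail[k]); every character comparison is either a match or a mismatch. Each match or restart advances j, so there are at most n of them. k starts at 0, rises by exactly 1 per match or restart, falls by at least 1 per mismatch (fail[k] < k), and never goes below -1, which bounds X:

$$
\begin{aligned}
M + R &\le n \\
X &\le M + R + 1 \\
\text{comparisons} = M + X &\le 2(M + R) + 1 \le 2n + 1
\end{aligned}
$$

The same argument applied to kmpSetup, with the pattern playing the role of the text, gives its O(m) bound.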

17
Boyer-Moore (BM) Heuristic #1
Match the pattern right-to-left.
Create a charJump[ch] array with an entry for each character in the alphabet (ASCII code).
If T[j] != P[k], then:
  if T[j] appears in P[0..k), its rightmost occurrence is aligned with T[j];
  else the pattern P is aligned beginning at T[j+1].
j_new = j + charJump[T[j]]; matching resumes with T[j_new] and P[m-1].
This skips multiple text characters WITHOUT ever examining them.
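A sketch of the charJump table for a 0-indexed pattern, assuming 8-bit characters (alpha = 256). If ch's rightmost occurrence in P is at index r, charJump[ch] = m - 1 - r, the distance from that occurrence to the pattern's right end; characters not in P get m. The name computeCharJumps is illustrative; the slide's own computeJumps is cut off.

```cpp
#include <string>
#include <vector>

// charJump[ch] = m - 1 - (index of rightmost ch in P), or m if ch is not in P.
std::vector<int> computeCharJumps(const std::string& P, int alpha = 256) {
    const int m = static_cast<int>(P.size());
    std::vector<int> charJump(alpha, m);       // not in P: jump a full pattern length
    for (int k = 0; k < m; ++k)                // later (rightward) k overwrites earlier
        charJump[static_cast<unsigned char>(P[k])] = m - 1 - k;
    return charJump;
}
```

For WOWWOW this gives 'W' = 0, 'O' = 1 and 6 for every other character, matching the cJump row in the WOWWOW example. On a mismatch at text index j, j + charJump[T[j]] is where the pattern's right end lands once the rightmost occurrence of T[j] sits under T[j].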

18
Boyer-Moore Algorithm Heuristic #2
matchJump[k] = slide[k] + m – k
slide[k] is the amount of slide needed to align the substrings.
m – k is the length of the suffix (substring) being realigned, with 1-based k; with the 0-based arrays of the C++ code it is m – 1 – k.
Similar to KMP fail links, but calculated right to left.
If a suffix has matched in P and T, and that same substring appears elsewhere in P, then upon a mismatch the pattern P is "slid" to align the rightmost such matching substring with the suffix in T.
Matching resumes at the new end of the pattern, determined by matchJump[k].
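A brute-force sketch of matchJump for a 0-indexed pattern, computed straight from the description above rather than with the linear-time sufx method on the following slides. It assumes the stronger form of the rule, where the pattern character that lands under the mismatched text character must differ from P[k]; with that assumption it reproduces the mJump row (8 7 6 7 3 1) of the WOWWOW example. The name matchJumpsByDefinition is hypothetical, and the O(m^3) cost makes it suitable only for checking.

```cpp
#include <string>
#include <vector>

// A mismatch at P[k] means the suffix P[k+1..m) has already matched.
// slide is the smallest shift >= 1 such that every pattern character landing
// under that suffix agrees with it and the character landing under the
// mismatch differs from P[k]; positions falling off the left end of P impose
// no condition. matchJump[k] = slide + (number of characters already matched).
std::vector<int> matchJumpsByDefinition(const std::string& P) {
    const int m = static_cast<int>(P.size());
    std::vector<int> matchJump(m);
    for (int k = 0; k < m; ++k) {
        int slide = 1;
        for (; slide < m; ++slide) {
            bool ok = true;
            for (int i = k + 1; i < m && ok; ++i)          // realign the suffix
                if (i - slide >= 0 && P[i - slide] != P[i]) ok = false;
            if (ok && k - slide >= 0 && P[k - slide] == P[k])
                ok = false;                                // same char would mismatch again
            if (ok) break;
        }
        // If the loop ran out, slide == m: the pattern moves completely
        // past the text characters already examined, which is always safe.
        matchJump[k] = slide + (m - 1 - k);
    }
    return matchJump;
}
```

For WOWWOW at k = 3 the suffix OW also occurs at P[1..2], but the W in front of it would face the same mismatch, so the slide grows to 5 and matchJump[3] = 5 + 2 = 7.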

19
BM - Example
Pattern: BATSANDCATS
Text: TWOOLDGNATSCANBELIKEBATSANDCATS
First pattern alignment: BATSANDCATS over TWOOLDGNATS; S, T and A match, then C != N.
charJump[T[j]] aligns the N's; matchJump[k] aligns the ATS's.
The new j (where matching resumes) is at the end of pattern P, but which one: (S =?= A) or (S =?= I)?
Use MAX(charJump[T[j]], matchJump[k]).
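Working the numbers for this first mismatch with 0-based indices (m = 11; the mismatch is at k = 7 with text index j = 7, T[7] = N). The rightmost N in the pattern is P[5], and the other ATS sits at P[1..3], seven places left of the matched suffix P[8..10]; slide[7] = 7 is derived here from the pattern, not stated on the slide:

$$
\begin{aligned}
\text{charJump}[\mathtt{N}] &= m - 1 - 5 = 5 &&\Rightarrow\ j = 7 + 5 = 12 \quad (T[12] = \mathtt{A})\\
\text{matchJump}[7] &= \text{slide}[7] + (m - 1 - 7) = 7 + 3 = 10 &&\Rightarrow\ j = 7 + 10 = 17 \quad (T[17] = \mathtt{I})\\
j_{\text{new}} &= 7 + \max(5,\ 10) = 17
\end{aligned}
$$

So matching resumes by comparing P[10] = S with T[17] = I.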

20
Computing individual charJumps
// find cJ[ch] for each character ch in pattern P
void computeJumps (char P[], int m, int alpha, int charJump[]) {
  // assume jump distance is entire pattern length for all
  // characters that do not match a pattern letter.
  for (int ch=0; ch

21
Computing substring matchJumps
void computeMatchJumps (char P[], int m, int matchJump[]) {
  int k, s, low, shift, *sufx = new int[m+1];
  // note: sufx[0] tells what suffix matches a prefix of P
  for (k=0;k

22
Computing substring matchJumps
  // if no suffix match at k+1, compute slide based on prefix that
  // matches suffix. Prefix length = (m - shift).
  low = 1;
  shift = sufx[0];
  while (shift <= m) {
    for (k=low; k<=shift; k++) {
      if (shift < matchJump[k-1])
        matchJump[k-1] = shift;
    }
    low = shift + 1;
    shift = sufx[shift];
  }
  // Add number of matched characters to slide amount
  for (k=0; k

23
BM Scan Algorithm
int boyerMooreScan (char P[], char T[], int m, int charJump[], int matchJump[]) {
  int match = -1, j = m-1, k = m-1, jump;
  // endOfText(T,j) must be true for any j >= n, since jumps may skip past the end
  while (k >= 0 && ! endOfText(T,j)) {
    if (T[j] == P[k]) {
      j--; k--; // continue match right-to-left
    } else {
      jump = matchJump[k];
      if (charJump[(unsigned char)T[j]] > jump)
        jump = charJump[(unsigned char)T[j]];
      j += jump; // jump forward & restart matching at right
      k = m-1;
    }
  }
  if (k < 0)
    match = j + 1; // entire pattern matches
  return match;
}
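A self-contained C++ sketch of the whole scan, assuming std::string input, 8-bit characters, and a comparison counter added for illustration. Both tables are built inline: charJump as described for heuristic #1, and matchJump by brute force from its definition (O(m^3), with the stronger rule that the character landing under the mismatch must differ), not with the slides' linear sufx method. The name boyerMooreFind is hypothetical.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Returns the first position of P in T (or -1); comparisons receives the
// number of character comparisons made during the scan.
int boyerMooreFind(const std::string& P, const std::string& T, int& comparisons) {
    const int m = static_cast<int>(P.size());
    const int n = static_cast<int>(T.size());
    comparisons = 0;
    if (m == 0) return 0;

    std::vector<int> charJump(256, m);            // heuristic #1
    for (int k = 0; k < m; ++k)
        charJump[static_cast<unsigned char>(P[k])] = m - 1 - k;

    std::vector<int> matchJump(m);                // heuristic #2, by definition
    for (int k = 0; k < m; ++k) {
        int slide = 1;
        for (; slide < m; ++slide) {
            bool ok = true;
            for (int i = k + 1; i < m && ok; ++i)
                if (i - slide >= 0 && P[i - slide] != P[i]) ok = false;
            if (ok && k - slide >= 0 && P[k - slide] == P[k]) ok = false;
            if (ok) break;
        }
        matchJump[k] = slide + (m - 1 - k);
    }

    int j = m - 1, k = m - 1;
    while (j < n) {
        if (k < 0) return j + 1;                  // entire pattern matched
        ++comparisons;
        if (T[j] == P[k]) {
            --j; --k;                             // continue right-to-left
        } else {                                  // take the larger jump
            j += std::max(charJump[static_cast<unsigned char>(T[j])], matchJump[k]);
            k = m - 1;                            // restart at the pattern's right end
        }
    }
    return -1;
}
```

On the WOWWOW example it returns 15 after 15 comparisons, the count shown in that example's trace. Since matchJump[k] ≥ m - k, every jump moves the pattern's right end at least one position past where it was.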

24
BM - Example
Pattern: WOWWOW   mJump: 8 7 6 7 3 1   cJump: ‘W’=0, ‘O’=1, others=6
Text (21 chars): WOWTHISISWOWXOWWOWWOW   (15 comparisons in total)
W != I: cJ[I]=6, mJ[5]=1
W != S: cJ[S]=6, mJ[2]=6
W != X: cJ[X]=6, mJ[3]=7
W != O: cJ[O]=1, mJ[5]=1
WOWWOW: match
Note: cJump[‘W’]=0 means simply that if the TEXT character is ‘W’, the pattern realignment that places the rightmost pattern ‘W’ over the text ‘W’ is achieved by not moving the pattern.
Note: the algorithm will NOT work using only cJump.

25
BM Algorithm Efficiency
Building charJump[ ] – O(alpha + m), where alpha is the alphabet size
Building matchJump[ ] – O(m)
Scanning text – O(n)
In practice, only every 3rd or 4th text character is examined, so BM is quite fast.
Overall – O(n)

26
String Matching Program
Program to demonstrate all three approaches to string matching: demos\strScan.cpp
