
A Fast String Matching Algorithm The Boyer Moore Algorithm.


1 A Fast String Matching Algorithm The Boyer Moore Algorithm

2 The obvious search algorithm
Considers each character position of str and determines whether the successive patlen characters of str starting there match pat. In the worst case, the number of comparisons is on the order of i*patlen, where i is the position in str at which pat is found. Ex. pat: aab ; str: ..aaa aac.
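The obvious algorithm can be sketched in C roughly as follows. The function name naive_search and the 0-based return convention are assumptions made for this illustration; they do not come from the slides.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the 0-based index of the first
 * occurrence of pat in str, or -1 if pat does not occur. */
long naive_search(const char *pat, const char *str)
{
    size_t patlen = strlen(pat);
    size_t stringlen = strlen(str);

    if (patlen > stringlen)
        return -1;

    /* Try every character position of str as a candidate start. */
    for (size_t i = 0; i + patlen <= stringlen; i++) {
        size_t k = 0;
        /* Compare the successive patlen characters, left to right. */
        while (k < patlen && str[i + k] == pat[k])
            k++;
        if (k == patlen)
            return (long)i;
    }
    return -1;
}
```

With pat = aab and a long run of a's in str, almost every position costs close to patlen comparisons before it is rejected, which is the i*patlen worst case.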

3 Knuth-Pratt-Morris Algorithm
Linear search algorithm. Preprocesses pat in time linear in patlen and searches str in time linear in i+patlen.
Ex. pat: EXAMPLE ; str: HERE IS A SIMPLE EXAMPLE …

4 Characteristics of Boyer Moore Algorithm
Basic idea: match the pattern against the string from the right end of the pattern rather than from the left.
Expected number of characters inspected: c*(i+patlen), with c<1.
Preprocess pat to compute two tables, delta1 and delta2, which give the amounts by which to shift pat and the pointer into str.
Ex. pat: AT-THAT ; str: … WHICH-FINALLY-HALTS.--AT-THAT-POINT

5 Informal Description
Compare the last char of pat with the patlen-th char of str:
pat: AT-THAT
str: WHICH-FINALLY-HALTS.--AT-THAT-POINT
Observation 1: if char is known not to occur in pat, skip patlen (= delta1(F) in this example) chars of str:
pat:        AT-THAT
str: WHICH-FINALLY-HALTS.--AT-THAT-POINT

6 Informal Description
Observation 2: if char occurs in pat, slide pat down delta1(char) positions so that char is aligned with its rightmost occurrence in pat. In the example char is "-", so pat slides down delta1(-) = 4 positions.
delta1(char) = if char does not occur in pat, then patlen; else patlen - j, where j is the maximum integer such that pat(j) = char.
pat:        AT-THAT
str: WHICH-FINALLY-HALTS.--AT-THAT-POINT
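The delta1 definition above translates almost directly into a table-building loop. This is a minimal sketch assuming an 8-bit alphabet; the name build_delta1 and the ALPHABET_SIZE constant are introduced here for illustration only.

```c
#include <string.h>

#define ALPHABET_SIZE 256   /* assumption: 8-bit characters */

/* Hypothetical helper: fills delta1[c] with patlen if c does not occur
 * in pat, else patlen - j, where j is the largest 1-based position
 * such that pat(j) == c. */
void build_delta1(const char *pat, int delta1[ALPHABET_SIZE])
{
    int patlen = (int)strlen(pat);

    /* Default: the character does not occur in pat. */
    for (int c = 0; c < ALPHABET_SIZE; c++)
        delta1[c] = patlen;

    /* Scan left to right so that the rightmost occurrence is stored last. */
    for (int j = 1; j <= patlen; j++)
        delta1[(unsigned char)pat[j - 1]] = patlen - j;
}
```

For pat = AT-THAT this gives delta1(A) = 1, delta1(T) = 0, delta1(-) = 4, delta1(H) = 2, and 7 for every other character, such as F or L.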

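7 Informal Description
Observation 3a: str matches the last m chars of pat, and then a mismatch occurs at some new char. Move strptr by delta1(char); pat is thereby shifted by delta1(char) - m. In the example T matches (m = 1) and L mismatches, so strptr moves by delta1(L) = 7 and pat shifts by 6.
pat:          AT-THAT
str: … FINALLY-HALTS.--AT-THAT-POINT
pat:                AT-THAT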

8 Informal Description
Observation 3b: the final m chars of pat (a subpat) have been matched; find the rightmost plausible reoccurrence of the subpat in pat and align it with the matched m chars of str. This moves strptr by delta2(j), where j is the position in pat at which the mismatch occurred.
pat:                AT-THAT
str: … FINALLY-HALTS.--AT-THAT-POINT
pat:                     AT-THAT

9 The delta1 & delta2 tables
The delta1 table has as many entries as there are chars in the alphabet.
Ex. pat:    a b c d e              delta1: 4 3 2 1 0 ; else 5
    pat:    a t - t h a t          delta1: 1 0 4 0 2 1 0 ; else 7
The delta2 table has as many entries as there are chars in pat.
delta2(j) = (j + 1 - rpr(j)) + (patlen - j) = patlen + 1 - rpr(j), where rpr(j) is the position of the rightmost plausible reoccurrence of the subpat pat(j+1) ... pat(patlen).
Ex. pat:    a b c d e              delta2: 9 8 7 6 1
    pat:    a t - t h a t          delta2: 11 10 9 8 7 4 1
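The formula delta2(j) = patlen + 1 - rpr(j) can be checked with a brute-force sketch. It assumes the usual definition of rpr(j): the greatest k such that pat(k) ... pat(k+patlen-j-1) agrees with pat(j+1) ... pat(patlen), where positions before the start of pat match anything, and either k <= 1 or pat(k-1) differs from pat(j). The function names are hypothetical, and the quadratic search is chosen for clarity, not speed.

```c
#include <string.h>

/* Does pat(k..k+patlen-j-1) agree with pat(j+1..patlen)?  Positions are
 * 1-based; positions < 1 are treated as wildcards. */
static int unifies(const char *pat, int patlen, int j, int k)
{
    for (int t = 0; t < patlen - j; t++) {
        int a = j + 1 + t;          /* position inside the matched subpat */
        int b = k + t;              /* position inside the candidate      */
        if (b >= 1 && pat[b - 1] != pat[a - 1])
            return 0;
    }
    return 1;
}

/* Rightmost plausible reoccurrence of pat(j+1..patlen); may be <= 0. */
int rpr(const char *pat, int j)
{
    int patlen = (int)strlen(pat);

    /* Terminates: once k <= j + 1 - patlen every position is a wildcard. */
    for (int k = j; ; k--) {
        if (unifies(pat, patlen, j, k) &&
            (k <= 1 || pat[k - 2] != pat[j - 1]))
            return k;
    }
}

/* delta2[j-1] = patlen + 1 - rpr(j) for j = 1..patlen. */
void build_delta2(const char *pat, int *delta2)
{
    int patlen = (int)strlen(pat);
    for (int j = 1; j <= patlen; j++)
        delta2[j - 1] = patlen + 1 - rpr(pat, j);
}
```

For abcde this yields 9 8 7 6 1. For at-that it yields 11 10 9 8 7 4 1: at j = 6 the matched t reoccurs at position 4, preceded by - rather than a, so rpr(6) = 4.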

10 The algorithm
      stringlen ← length of string.
      i ← patlen.
top:  if i > stringlen then return false.
      j ← patlen.
loop: if j = 0 then return i+1.
      if string(i) = pat(j) then
          j ← j-1.
          i ← i-1.
          goto loop.
      close;
      i ← i + max(delta1(string(i)), delta2(j)).
      goto top.
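The goto-based pseudocode maps onto two nested loops in C. This is a sketch under a few assumptions: the name bm_search is hypothetical, the tables are passed in by the caller (delta1 indexed by character code, delta2 indexed by j-1), patlen is at least 1, and 0 stands in for "return false" because valid results are 1-based positions.

```c
#include <stddef.h>

/* Hypothetical sketch of the search loop.  Returns the 1-based position
 * in string where pat begins, or 0 if pat does not occur. */
long bm_search(const char *string, long stringlen,
               const char *pat, long patlen,
               const int delta1[256], const int *delta2)
{
    long i = patlen;                       /* 1-based pointer into string */

    while (i <= stringlen) {               /* top:  */
        long j = patlen;                   /* 1-based pointer into pat    */

        /* loop: compare right to left while characters agree. */
        while (j > 0 && string[i - 1] == pat[j - 1]) {
            j--;
            i--;
        }
        if (j == 0)
            return i + 1;                  /* whole pattern matched */

        /* Mismatch at pat(j): advance by the larger of the two shifts. */
        int d1 = delta1[(unsigned char)string[i - 1]];
        int d2 = delta2[j - 1];
        i += (d1 > d2) ? d1 : d2;
    }
    return 0;
}
```

Because delta2(j) is always at least patlen - j + 1, each pass through the outer loop leaves i strictly larger than it was at the start of the pass, so the search terminates.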


12 Performance (empirical evidence)

13 The Implementation in mstring.c
Function: make_skip(char*, int)
–Purpose: create the skip (delta1) table
–Function inputs: char *ptrn, int plen
–Local variables: int *skip, *sptr
–Return: int *skip
Function: make_shift(char*, int)
–Purpose: create the shift (delta2) table
–Function inputs: char *ptrn, int plen
–Local variables: int *shift, *sptr; char *pptr, c
–Return: int *shift

14 Flowchart of make_skip()
1. Allocate memory to skip.
2. Set every entry: *skip++ = plen+1.
3. plen == 0?
   true: Return skip.
   false: skip[*ptrn++] = plen--, then repeat step 3.

15 make_skip()
int *make_skip(char *ptrn, int plen)
{
    int *skip = (int *) malloc(256 * sizeof(int));
    int *sptr = &skip[256];

    if (skip == NULL)
        FatalPrintError("malloc");

    /* Default entry for characters that do not occur in ptrn. */
    while (sptr-- != skip)
        *sptr = plen + 1;

    /* Later (rightmost) occurrences overwrite earlier ones. */
    while (plen != 0)
        skip[(unsigned char) *ptrn++] = plen--;

    return skip;
}
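To see what make_skip() stores, here is a self-contained variant that can be compiled on its own. It is only a sketch: the name make_skip_sketch is hypothetical, and since FatalPrintError is presumably defined elsewhere in the original code base, this version returns NULL on allocation failure instead.

```c
#include <stdlib.h>

/* Hypothetical stand-alone variant of make_skip(): same table contents,
 * but reports allocation failure by returning NULL. */
int *make_skip_sketch(const char *ptrn, int plen)
{
    int *skip = malloc(256 * sizeof *skip);
    if (skip == NULL)
        return NULL;

    /* Characters that do not occur in ptrn get plen + 1. */
    for (int c = 0; c < 256; c++)
        skip[c] = plen + 1;

    /* First char gets plen, last char gets 1; the rightmost
     * occurrence of a repeated char wins. */
    while (plen != 0)
        skip[(unsigned char)*ptrn++] = plen--;

    return skip;
}
```

For AT-THAT the table holds A = 2, T = 1, - = 5, H = 3 and 8 for everything else, so each entry is one larger than the corresponding delta1 value from the earlier slides.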

16 Procedures of make_shift():
Allocate memory to shift
c = ptrn[plen-1];
Look for rpr of c
Look for two identical subpat
Assign values to shift
Return shift

17 make_shift()
int *shift = (int *) malloc(plen * sizeof(int));
int *sptr = shift + plen - 1;   /* last entry of shift */
char *pptr = ptrn + plen - 1;   /* last char of ptrn */
char c;

if (shift == NULL)
    FatalPrintError("malloc");

c = ptrn[plen - 1];
*sptr = 1;

18 make_shift()
while (sptr-- != shift) {
    char *p1 = ptrn + plen - 2, *p2, *p3;

    do {
        /* Look for an earlier occurrence of c, the last char of ptrn. */
        while (p1 >= ptrn && *p1-- != c);

        p2 = ptrn + plen - 2;
        p3 = p1;

        /* Extend the candidate reoccurrence leftwards. */
        while (p3 >= ptrn && *p3-- == *p2-- && p2 >= pptr);
    } while (p3 >= ptrn && p2 >= pptr);   // p2>=j, p3>=1

    *sptr = shift + plen - sptr + p2 - p3;
    pptr--;
}
return shift;

19 Ex: j = 5
j =   1 2 3 4 5 6 7
pat:  e d b c a b c
step1: p1
step2: p3 p2
step3: p3 p2
∴ delta2(j) = (p2 - p3) + (plen - j) = 5

