# Tuned Boyer-Moore Algorithm

*Fast string searching*, Hume A. and Sunday D. M., Software - Practice & Experience 21(11), 1991, pp.

Adviser: R. C. T. Lee; Speaker: C. W. Cheng; National Chi Nan University

## Problem Definition

- Input: a text string T of length n and a pattern string P of length m.
- Output: all occurrences of P in T.
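To pin down the expected output, here is a minimal brute-force sketch that reports every occurrence as a 1-based starting position, matching the indexing used in the definitions. The name `naive_search` is hypothetical. It is only a reference point for checking results, not part of the Tuned Boyer-Moore method.

```python
def naive_search(t: str, p: str) -> list[int]:
    """Return every 1-based position s such that Ts..Ts+m-1 equals P."""
    n, m = len(t), len(p)
    # Try every window start; slicing compares all m characters at once.
    return [s + 1 for s in range(n - m + 1) if t[s:s + m] == p]
```

For the example used later in these slides, `naive_search("GCGAGCAGACGTGCGAGTACG", "AGCAGAC")` returns `[4]`.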

## Definitions

- Ts: the character of T aligned with the first character of P; the current window of T is Ts to Ts+m-1.
- P1: the first character of P, aligned with Ts.
- Tj: the character at the jth position of T.
- Pi: the character at the ith position of P.
- Pf: the last character of P (that is, Pm).
- n: the length of T.
- m: the length of P.

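## Rule 2-2: 1-Suffix Rule (a Special Version of Rule 2)

Consider the 1-suffix x, i.e., the character Ts+m-1 of T that is aligned with the last character Pf of P. We may now apply Rule 2-2: shift P to the right so that the rightmost occurrence of x in P1 to Pm-1 is aligned with this x. If x does not occur in P1 to Pm-1, P is shifted completely past it.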
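The shift prescribed by the 1-suffix rule can be computed directly from P. The following is a small sketch using 1-based positions; the function name `one_suffix_shift` is hypothetical. It returns m-i, where i is the largest position in P1 to Pm-1 holding x, or m when there is no such position.

```python
def one_suffix_shift(p: str, x: str) -> int:
    """Shift that aligns the rightmost x in P1..Pm-1 with the 1-suffix x."""
    m = len(p)
    # str.rfind with end = m-1 searches only P1..Pm-1 (Pf is excluded) and
    # returns a 0-based index, or -1 if x is absent.
    idx = p.rfind(x, 0, m - 1)
    if idx < 0:
        return m  # x does not occur in P1..Pm-1: move P completely past it
    i = idx + 1  # convert to the 1-based position i used in the slides
    return m - i
```

For P = AGCAGAC this gives shifts 1, 4, 2 and 7 for A, C, G and T.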

## Introduction

The Tuned Boyer-Moore algorithm is a simplification of the Boyer-Moore algorithm:

- It uses only the bad-character shift.
- It is easy to implement.
- It is very fast in practice.
- It uses Rule 2-2, the 1-Suffix Rule.

In this algorithm, we always focus on the last character Ts+m-1 of the current window of T and slide the pattern until its last character Pf matches that character.

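## Tuned Boyer-Moore Algorithm Rule

If Ts+m-1 ≠ Pf, we move P to the right so that the rightmost position i in P1 to Pm-1 with Pi = Ts+m-1 is aligned with Ts+m-1. This shifts P by m-i positions, or by m positions if no such i exists. The shift is repeated, each time on the new window, until Ts+m-1 = Pf.

[Figure: the window Ts to Ts+m-1 of T ends in x; P is shifted right, possibly several times, until a character of P equal to x lines up with it.]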
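The sliding step in this rule can be sketched as a loop that keeps applying the 1-suffix shift until the last character of the window equals Pf. This is a minimal illustration assuming 1-based window starts as in the slides; `slide_to_last_match` is a hypothetical name, and the full algorithm would also compare the remaining characters once this loop stops.

```python
from typing import Optional


def slide_to_last_match(t: str, p: str, s: int) -> Optional[int]:
    """Apply the 1-suffix shift from window start s until Ts+m-1 == Pf.

    Returns the first window start at which the last characters agree, or
    None if the window runs past the end of T before that happens.
    """
    n, m = len(t), len(p)
    while s + m - 1 <= n:
        last = t[s + m - 2]  # Ts+m-1 in the slides' 1-based notation
        if last == p[-1]:
            return s  # last character of the window equals Pf: stop sliding
        # Largest 1-based i <= m-1 with Pi == last; rfind's -1 becomes i = 0.
        i = p.rfind(last, 0, m - 1) + 1
        s += m - i  # aligns Pi with Ts+m-1; shifts by m when i == 0
    return None
```

Starting from s = 1 on the example text below, the loop shifts by 1 and then by 2 and stops at s = 4.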

## Tuned Boyer-Moore Preprocessing Table

In this algorithm, we construct a table tbmBC as follows. Let x be a character of the alphabet. If x occurs in P1 to Pm-1, let i be the position of its last occurrence there, found by scanning P leftward from the second-to-last position Pm-1. We record tbmBC[x] = m-i, the distance from that x to the last position of P. If x does not occur in P1 to Pm-1, we record tbmBC[x] = m.

Example: P = AGCAGAC (m = 7)

| x | A | C | G | T |
|---|---|---|---|---|
| tbmBC[x] | 1 | 4 | 2 | 7 |
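The table construction described above can be sketched in a few lines. This assumes the alphabet is passed explicitly as a string; the names `build_tbmbc` and `alphabet` are hypothetical. Scanning P1 to Pm-1 from left to right and overwriting entries leaves the rightmost occurrence in the table, which gives O(m + σ) time and O(σ) space.

```python
def build_tbmbc(p: str, alphabet: str) -> dict[str, int]:
    """Build tbmBC: m - i for the last i <= m-1 with Pi == x, else m."""
    m = len(p)
    table = {x: m for x in alphabet}  # default: x does not occur in P1..Pm-1
    for i in range(1, m):  # 1-based i = 1 .. m-1, so Pf is skipped
        table[p[i - 1]] = m - i  # a later (larger) i overwrites an earlier one
    return table
```

For example, `build_tbmbc("AGCAGAC", "ACGT")` returns `{'A': 1, 'C': 4, 'G': 2, 'T': 7}`, the same values as the table for P = AGCAGAC.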

## Example

Text string T = GCGAGCAGACGTGCGAGTACG (n = 21), pattern string P = AGCAGAC (m = 7, Pf = C), with tbmBC[A] = 1, tbmBC[C] = 4, tbmBC[G] = 2, tbmBC[T] = 7.

| s | Window Ts to Ts+6 | Ts+6 | Comparison | Shift |
|---|---|---|---|---|
| 1 | GCGAGCA | A | Ts+6 ≠ Pf | tbmBC[A] = 1 |
| 2 | CGAGCAG | G | Ts+6 ≠ Pf | tbmBC[G] = 2 |
| 4 | AGCAGAC | C | Ts+6 = Pf (match), and P1 to P6 also match: exact match | tbmBC[C] = 4 |
| 8 | GACGTGC | C | Ts+6 = Pf (match), but P1 to P6 mismatch | tbmBC[C] = 4 |
| 12 | TGCGAGT | T | Ts+6 ≠ Pf | tbmBC[T] = 7 |
| 19 | (none) | | s+m-1 = 25 > n, so the search stops | |

The only occurrence of P in T starts at position 4.
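The search in this example can be reproduced with a straightforward per-window sketch. It assumes 1-based positions and m ≥ 1, and it uses tbmBC[Pf] as the shift after the last characters agree, as in the example. The function name `tuned_bm_search` is hypothetical, and the sketch does not attempt low-level optimizations such as loop unrolling. It returns the match positions together with a trace of (s, shift) pairs.

```python
def tuned_bm_search(t: str, p: str) -> tuple[list[int], list[tuple[int, int]]]:
    """Return (1-based match positions, trace of (window start s, shift))."""
    n, m = len(t), len(p)
    # Preprocessing: tbmBC[x] = m - i for the rightmost i <= m-1 with Pi == x.
    tbmbc: dict[str, int] = {}
    for i in range(1, m):
        tbmbc[p[i - 1]] = m - i
    shift_on_match = tbmbc.get(p[-1], m)  # used whenever Ts+m-1 == Pf
    matches: list[int] = []
    trace: list[tuple[int, int]] = []
    s = 1
    while s + m - 1 <= n:
        last = t[s + m - 2]  # Ts+m-1
        if last != p[-1]:
            shift = tbmbc.get(last, m)  # 1-suffix rule; m if absent from P1..Pm-1
        else:
            # Last characters agree: compare P1..Pm-1 with Ts..Ts+m-2.
            if t[s - 1:s + m - 2] == p[:-1]:
                matches.append(s)
            shift = shift_on_match
        trace.append((s, shift))
        s += shift
    return matches, trace
```

On the example, `tuned_bm_search("GCGAGCAGACGTGCGAGTACG", "AGCAGAC")` returns `([4], [(1, 1), (2, 2), (4, 4), (8, 4), (12, 7)])`, which is the same sequence of windows and shifts as in the table.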

## Time Complexity

- Preprocessing phase: O(m + σ) time and O(σ) space, where σ is the size of the alphabet.
- Searching phase: O(mn) time in the worst case.
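The O(mn) worst case can be seen by counting character comparisons for T = a^n and P = a^m. Here every window matches, tbmBC[a] = 1, and each of the n-m+1 windows costs m comparisons, giving m(n-m+1) in total. The counter below is a hypothetical instrumented version of a per-window sketch, assuming the remaining characters P1 to Pm-1 are checked left to right after Ts+m-1 = Pf.

```python
def tbm_comparisons(t: str, p: str) -> int:
    """Count character comparisons made by a per-window Tuned Boyer-Moore sketch."""
    n, m = len(t), len(p)
    # tbmBC: later (larger) i overwrite earlier ones, keeping the rightmost Pi.
    tbmbc = {p[i - 1]: m - i for i in range(1, m)}
    shift_on_match = tbmbc.get(p[-1], m)
    count, s = 0, 1
    while s + m - 1 <= n:
        count += 1  # compare Ts+m-1 with Pf
        if t[s + m - 2] != p[-1]:
            s += tbmbc.get(t[s + m - 2], m)
            continue
        for k in range(m - 1):  # check P1..Pm-1 against Ts..Ts+m-2
            count += 1
            if t[s - 1 + k] != p[k]:
                break
        s += shift_on_match
    return count
```

For instance, `tbm_comparisons("a" * 1000, "a" * 10)` returns 9910, that is 10 × 991, which grows as mn.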

