Pattern Matching in String


Dao Thanh Tinh, 2018-12-31

Problem
Given: an alphabet Σ; a string P (the pattern) with |P| = m; a text T with |T| = n, where n >> m.
Question: does P occur in T (P ⊆ T)? If it does, what is the position i0 of the first occurrence of P in T?

Example 1:
    T = ABABABDEAA
          ║║║║║║
    P =   ABABDE
    i0 = 3

A Straightforward String Matching: Brute Force Algorithm
Input: P[1..m], T[1..n]
Output: i0 (if P = T[i0..i0+m-1] then i0 ≥ 1, otherwise i0 = 0)
a) i = 1; j = i; k = 1;
b) while (j ≤ n) & (k ≤ m)
       if (T(j) = P(k)) { j++; k++; }
       else { i++; j = i; k = 1; }
c) if (k > m) i0 := i else i0 := 0;
Complexity: O(mn).

A Straightforward String Matching: Brute Force Algorithm (*)
Input: P[1..m], T[1..n]
Output: i0 (if P = T[i0..i0+m-1] then i0 ≥ 1, otherwise i0 = 0)
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
       if (T(j) = P(k)) { j++; k++; }
       else { j = j - k + 2; k = 1; }
c) if (k > m) i0 := j - m else i0 := 0;
Complexity: O(mn).
This variant drops the index i: on a mismatch between P(k) and T(j) the current window starts at j - k + 1, so the next window starts at j - k + 2.
[Figure: P aligned under T with a mismatch at k = 5, j = 7.]
On the next step: k = 1, j = 7 - 5 + 2 = 4.
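The (*) variant above can be sketched in Python as follows. This is a minimal sketch, not the author's code: the function name `brute_force` is an assumption, indices are 0-based internally, and the return value is converted back to the 1-based i0 used on the slides (0 means "not found").

```python
def brute_force(P: str, T: str) -> int:
    """Return the 1-based position i0 of the first occurrence of P in T, or 0."""
    m, n = len(P), len(T)
    j = k = 0                      # 0-based cursors into T and P
    while j < n and k < m:
        if T[j] == P[k]:
            j += 1
            k += 1
        else:
            # The window started at j - k; restart one position to its right.
            # (The slide's 1-based form of this step is j = j - k + 2.)
            j = j - k + 1
            k = 0
    # On success j sits just past the match, so the match starts at j - m.
    return j - m + 1 if k == m else 0


print(brute_force("ABABDE", "ABABABDEAA"))   # Example 1 from the slides
```

On Example 3 (P = UUUUUUX) the inner comparisons repeat six matches before every mismatch, which is the O(mn) behaviour noted on the slide.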

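A Straightforward String Matching
Example 2: P = ABABDE, T = ABABABDEAA
    T = ABABABDEAA
        ║║║║
    P = ABABDE          (i = 1: four characters match, then D ≠ A)
    T = ABABABDEAA
    P =  ABABDE         (i = 2: mismatch on the first character)
    T = ABABABDEAA
          ║║║║║║
    P =   ABABDE        (i = 3: successful match, i0 = 3)
Example 3: P = UUUUUUX, T = UUUUUUUUUUUU
    T = UUUUUUUUUUUU
        ║║║║║║
    P = UUUUUUX         (six characters match before the mismatch at X, at every position)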

The Morris-Pratt Algorithm (1)
Assume that the first mismatch occurs between P(k) and T(j), with 1 < k ≤ m. Then
    P(1..k-1) = T(j-k+1..j-1) = u   and   P(k) ≠ T(j).
[Figure: P aligned under T; the matched portion u covers P1..Pk-1 and Tj-k+1..Tj-1, and the mismatch is at Pk, Tj.]
Idea: when P is shifted to the right, we can expect a prefix v of P to match some suffix of the matched portion u. The longest such prefix v is called the border of u. If v has length r (r < k - 1), then
    P1…Pr = Pk-r…Pk-1.
[Figure: the shifted P, with its prefix v = P1..Pr aligned under the end of u.]

The Morris-Pratt Algorithm (2)
The Brute Force Algorithm:
[Figure: steps (1) to (7) of the brute-force scan; after each mismatch the pattern moves one position to the right under the text.]

The Morris-Pratt Algorithm (3)
[Figure: steps (8) to (13) of the brute-force scan.]
The Brute Force Algorithm takes 13 steps.

The Morris-Pratt Algorithm (4)
[Figure: the search repeated with Morris-Pratt shifts.]
The pattern is found on the 6th step.

The Morris-Pratt Algorithm (5)
[Figure: the shifted P under T, with P1…Pr = Pk-r…Pk-1 and Pr+1 now facing Tj.]
Set mp(k) = r + 1. Then, after a shift, the comparisons can resume between characters P(mp(k)) and T(j); j does not move.
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
       if (T(j) = P(k)) { j++; k++; }
       else { k = mp(k); }
c) if (k > m) i0 := j - m; else i0 := 0;
The case k = 1 needs separate handling; see slide (10).

The Morris-Pratt Algorithm (6)
The value of mp(1) is set to 0.
For k > 1:
    r = k - 2;
    while (r > 0) && (P1..Pr ≠ Pk-r..Pk-1) do r--;
    mp(k) = r + 1;
Example (P begins with ABAB; mismatch at k = 5, j = 7):
    r = k - 2 = 3: P[1..3] ? P[2..4]: ABA ≠ BAB
    r = 2:         P[1..2] ? P[3..4]: AB = AB
    mp(5) = r + 1 = 3
On the next step: k = mp(5) = 3, j = 7 (unchanged).
[Figure: P under T before and after the shift.]

The Morris-Pratt Algorithm (7)
For k > 1:
    r = k - 2;
    while (r > 0) && (P1..Pr ≠ Pk-r..Pk-1) do r--;
    mp(k) = r + 1;
Example: P = UUUUUUX, k = 7:
    r = k - 2 = 5: P[1..5] ? P[2..6]: UUUUU = UUUUU
    mp(7) = r + 1 = 6
On the next step: k = mp(7) = 6, j = 7 (unchanged).
[Figure: P under T before and after the shift.]

The Morris-Pratt Algorithm (8)
For k > 1:
    r = k - 2;
    while (r > 0) && (P1..Pr ≠ Pk-r..Pk-1) do r--;
    mp(k) = r + 1;
Example: P = THỦ_THƯ (m = 7; "_" stands for a space), k = 7:
    r = k - 2 = 5: P[1..5] ? P[2..6]: THỦ_T ≠ HỦ_TH
    r = 4:         P[1..4] ? P[3..6]: THỦ_ ≠ Ủ_TH
    r = 3:         P[1..3] ? P[4..6]: THỦ ≠ _TH
    r = 2:         P[1..2] ? P[5..6]: TH = TH
    mp(7) = r + 1 = 3
On the next step: k = mp(7) = 3, j = 7 (unchanged).
[Figure: P under T before and after the shift.]

The Morris-Pratt Algorithm (9)
For k > 1:
    r = k - 2;
    while (r > 0) && (P1..Pr ≠ Pk-r..Pk-1) do r--;
    mp(k) = r + 1;
Example: P = THỦ_THƯ, k = 5:
    r = k - 2 = 3: P[1..3] ? P[2..4]: THỦ ≠ HỦ_
    r = 2:         P[1..2] ? P[3..4]: TH ≠ Ủ_
    r = 1:         P[1..1] ? P[4..4]: T ≠ _
    r = 0
    mp(5) = r + 1 = 1
On the next step: k = mp(5) = 1, j = 5 (unchanged).
[Figure: P under T before and after the shift.]

The Morris-Pratt Algorithm (10)
k = 1: the rule gives r = -1 and mp(1) = 0, so the comparisons would resume between P(mp(1)) = P(0) and T(j); but P(0) does not exist.
In this case the comparisons resume between P(1) and T(j+1). So set mp(1) = 1 and j = j + 1.
[Figure: P1..Pm moved one position to the right, with P1 under Tj+1.]

The Morris-Pratt Algorithm (11)
mp table:
    k = 1: mp(1) = 1.
    k = 2..m: r = k - 2; while (r > 0) && (P1..Pr ≠ Pk-r..Pk-1) do r--; mp(k) = r + 1;
Search:
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
       if (T(j) = P(k)) { j++; k++; }
       else { if (k = 1) j++; k = mp(k); }
c) if (k > m) i0 := j - m; else i0 := 0;
For P = THỦ_THƯ (m = 7):
    k     1  2  3  4  5  6  7
    P     T  H  Ủ  _  T  H  Ư
    mp    1  1  1  1  1  2  3
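The table rule and the search loop of slide (11) can be put together as below. This is a sketch under stated assumptions: the names `mp_table` and `mp_search` are made up for illustration, the table is built with the same direct (quadratic) comparison the slides use rather than the usual linear-time construction, and lists are padded so that `mp[k]` keeps the slides' 1-based index.

```python
def mp_table(P: str) -> list:
    """mp[1..m] as defined on the slides (mp[0] is unused padding)."""
    m = len(P)
    mp = [0] * (m + 1)
    if m >= 1:
        mp[1] = 1                                   # special case from slide (10)
    for k in range(2, m + 1):
        r = k - 2
        # 1-based P[1..r] vs P[k-r..k-1] is P[:r] vs P[k-1-r:k-1] in Python slices.
        while r > 0 and P[:r] != P[k - 1 - r:k - 1]:
            r -= 1
        mp[k] = r + 1
    return mp


def mp_search(P: str, T: str) -> int:
    """Return the 1-based position i0 of the first occurrence of P in T, or 0."""
    m, n = len(P), len(T)
    mp = mp_table(P)
    j = k = 1                                       # 1-based, as on the slides
    while j <= n and k <= m:
        if T[j - 1] == P[k - 1]:
            j += 1
            k += 1
        else:
            if k == 1:
                j += 1                              # nothing matched: advance in T
            k = mp[k]                               # otherwise j stays, only P shifts
    return j - m if k > m else 0


print(mp_table("ABABABAB")[1:])
print(mp_search("ABABDE", "ABABABDEAA"))
```

Unlike the brute-force scan, j never moves backwards here, so each text character is read at most a bounded number of times per shift.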

The Morris-Pratt Algorithm (12)
[Figure: Morris-Pratt trace of the pattern against the text, using the mp table (last entries 1, 2, 3).]
The pattern is found on the 6th step.

The Morris-Pratt Algorithm (13)
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
       if (T(j) = P(k)) { j++; k++; }
       else { if (k = 1) j++; k = mp(k); }
c) if (k > m) i0 := j - m; else i0 := 0;
[Figure: Morris-Pratt trace of the pattern against a second text, using the mp table (last entries 1, 2, 3).]

The Knuth-Morris-Pratt Algorithm (1)
Look more closely at the Morris-Pratt algorithm:
Input: P[1..m], T[1..n]; Output: i0
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
       if (T(j) = P(k)) { j++; k++; }
       else { if (k = 1) j++; k = mp(k); }
c) if (k > m) i0 = j - m; else i0 = 0;
mp table:
    k = 1: mp(1) = 1.
    k = 2..m: r = k - 2; while (r > 0) && (P1..Pr ≠ Pk-r..Pk-1) do r--; mp(k) = r + 1;
[Figure: T holds u followed by b = T(j); P holds u followed by a = P(k), with a ≠ b. After the shift, the prefix v of P lies under the end of u, and c = P(mp(k)) faces b.]

The Knuth-Morris-Pratt Algorithm (2)
Let 1 < k ≤ m. If c = a, then c ≠ b: a mismatch between P(mp(k)) and T(j) is certain. To avoid this immediate second mismatch, the character P(mp(k)) = P(r+1) must be different from a = P(k).
kmp table:
    k = 1: kmp(1) = 1.
    k = 2..m: r = k - 2; while (r > 0) && ((P1..Pr ≠ Pk-r..Pk-1) OR (Pr+1 = Pk)) do r--; kmp(k) = r + 1;
[Figure: T holds u followed by b; P holds u followed by a; the shifted P holds v followed by c.]

The Knuth-Morris-Pratt Algorithm (3)
For P = THỦ_THƯ (m = 7):
    k     1  2  3  4  5  6  7
    P     T  H  Ủ  _  T  H  Ư
    mp    1  1  1  1  1  2  3    (Morris-Pratt)
    kmp   1  1  1  1  1  1  3    (Knuth-Morris-Pratt)
k = 6:
    r = k - 2 = 4: P[1..4] ? P[2..5]: THỦ_ ≠ HỦ_T
    r = 3:         P[1..3] ? P[3..5]: THỦ ≠ Ủ_T
    r = 2:         P[1..2] ? P[4..5]: TH ≠ _T
    r = 1:         P[1..1] ? P[5..5]: T = T, but P[r+1] ? P[k]: H = H, so r = 0
    kmp(6) = r + 1 = 1

The Knuth-Morris-Pratt Algorithm (4)
Example: P = "ABABABAB"
    k     1  2  3  4  5  6  7  8
    P     A  B  A  B  A  B  A  B
    mp    1  1  1  2  3  4  5  6
    kmp   1  1  1  1  1  1  1  1

The Knuth-Morris-Pratt Algorithm (5)
The Knuth-Morris-Pratt algorithm is the Morris-Pratt algorithm with kmp in place of mp:
Input: P[1..m], T[1..n]; Output: i0
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
       if (T(j) = P(k)) { j++; k++; }
       else { if (k = 1) j++; k = kmp(k); }
c) if (k > m) i0 = j - m; else i0 = 0;
kmp table:
    k = 1: kmp(1) = 1.
    k = 2..m: r = k - 2; while (r > 0) && ((P1..Pr ≠ Pk-r..Pk-1) OR (Pr+1 = Pk)) do r--; kmp(k) = r + 1;
[Figure: T holds u followed by b; P holds u followed by a; the shifted P holds v followed by c, with c ≠ a.]
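A small Python sketch of the kmp table and the search loop, for comparison with the mp values on slide (4). It assumes the extra test compares P(r+1) with P(k), which is what the worked example on slide (3) does ("H = H"); the names `kmp_table` and `kmp_search` are illustrative only, and the table is again built by direct comparison rather than in linear time.

```python
def kmp_table(P: str) -> list:
    """kmp[1..m] (kmp[0] is unused padding), built by direct comparison."""
    m = len(P)
    kmp = [0] * (m + 1)
    if m >= 1:
        kmp[1] = 1
    for k in range(2, m + 1):
        r = k - 2
        # Reject r when P[1..r] is not a border of P[1..k-1], or when the
        # character after the border, P(r+1), repeats the mismatching P(k).
        while r > 0 and (P[:r] != P[k - 1 - r:k - 1] or P[r] == P[k - 1]):
            r -= 1
        kmp[k] = r + 1
    return kmp


def kmp_search(P: str, T: str) -> int:
    """Return the 1-based position i0 of the first occurrence of P in T, or 0."""
    m, n = len(P), len(T)
    kmp = kmp_table(P)
    j = k = 1
    while j <= n and k <= m:
        if T[j - 1] == P[k - 1]:
            j += 1
            k += 1
        else:
            if k == 1:
                j += 1
            k = kmp[k]
    return j - m if k > m else 0


print(kmp_table("ABABABAB")[1:])
print(kmp_search("ABABABAB", "ABABABACABABABAB"))
```

For P = "ABABABAB" every kmp entry is 1: after any mismatch the scan restarts at P(1) instead of retrying a character that is already known to fail.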

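The Brute Force Algorithm 2
Input: P[1..m], T[1..n]
Output: i0 (if P = T[i0..i0+m-1] then i0 ≥ 1, otherwise i0 = 0)
The window is compared from right to left; i is the position in T of the right end of the window.
a) i = m; j = i; k = m;
b) while (j ≤ n) & (k > 0)
       if (T(j) = P(k)) { j--; k--; }
       else { i++; j = i; k = m; }
c) if (k = 0) i0 := i - m + 1 else i0 := 0;
Complexity: O(mn).
Example: P = DEABAB, T = ABDEABABAA
    T = ABDEABABAA
            ║║
    P = DEABAB          (AB matches from the right, then E ≠ B)
    T = ABDEABABAA
    P =  DEABAB         (mismatch on the last character)
    T = ABDEABABAA
          ║║║║║║
    P =   DEABAB        (successful match, i0 = 3)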

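The Brute Force Algorithm 2*
Input: P[1..m], T[1..n]
Output: i0 (if P = T[i0..i0+m-1] then i0 ≥ 1, otherwise i0 = 0)
a) j = m; k = m;
b) while (j ≤ n) & (k > 0)
       if (T(j) = P(k)) { j--; k--; }
       else { j = j + m - k + 1; k = m; }
c) if (k = 0) i0 := j + 1 else i0 := 0;
Complexity: O(mn).
On a mismatch at P(k), T(j) the right end of the window is at j + m - k, so the next window ends at j + m - k + 1. On success j has moved one position to the left of the match, hence i0 = j + 1.
Example: P = DEABAB, T = ABDEABABAA
    T = ABDEABABAA
            ║║
    P = DEABAB          (AB matches from the right, then E ≠ B)
    T = ABDEABABAA
    P =  DEABAB         (mismatch on the last character)
    T = ABDEABABAA
          ║║║║║║
    P =   DEABAB        (successful match, i0 = 3)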
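The right-to-left scan can be sketched in Python as below. The function name `brute_force_rtl` is an assumption; j and k keep the slides' 1-based meaning, and a successful scan reports i0 = j + 1, since j has stepped one position to the left of the match by then.

```python
def brute_force_rtl(P: str, T: str) -> int:
    """Right-to-left window comparison; returns 1-based i0, or 0 if P is absent."""
    m, n = len(P), len(T)
    j = k = m                      # start at the right end of the first window
    while j <= n and k > 0:
        if T[j - 1] == P[k - 1]:
            j -= 1
            k -= 1
        else:
            # Right end of the current window is j + m - k; move it one step right.
            j = j + m - k + 1
            k = m
    return j + 1 if k == 0 else 0


print(brute_force_rtl("DEABAB", "ABDEABABAA"))   # example from the slide
```

This loop is the skeleton that Boyer-Moore keeps; only the fixed one-position shift is replaced by a computed one.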

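The Boyer-Moore Algorithm (1)
The window is scanned from right to left. Assume that the first mismatch is T[j] ≠ P[k]; then
    T[j+1..j+m-k] = P[k+1..m] = u.
[Figure: P aligned under T, with the matched suffix u (length m - k) to the right of the mismatch at Pk, Tj.]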

The Boyer-Moore Algorithm (2)
The good-suffix shift consists in aligning the segment u with its rightmost occurrence in P that is preceded by a character different from P(k).
a) Find the largest t ∈ [1..m-1] such that, with q = t - m + k + 1:
       u = P[k+1..m] = P[q..t] and Pq-1 ≠ Pk   (q > 1), or
       u = P[k+1..m] = P[q..t]                 (q = 1).
   Then j-new = j + m - q + 1 = j + 2m - t - k.
[Figure: good-suffix shift; the occurrence Pq..Pt of u in the shifted P lies under Tj+1..Tj+m-k, and Tj-new marks the new right end of the window.]

The Boyer-Moore Algorithm (3)
b) If there is no t ∈ [1..m-1] such that u = P[k+1..m] = P[q..t], the shift consists in aligning the longest suffix v of u with a matching prefix of P.
   Find the largest t ∈ [1..m-k-1] such that v = P[m-t+1..m] = P[1..t].
   Then j-new = j + 2m - t - k.
[Figure: good-suffix shift; the prefix P1..Pt of the shifted P lies under the suffix v = Pm-t+1..Pm of the matched portion, and Tj-new marks the new right end of the window.]

The Boyer-Moore Algorithm (4)
c) If there is no t ∈ [1..m-k-1] such that v = P[m-t+1..m] = P[1..t], then
       j-new = j + 2m - k,
   that is, j-new = j + 2m - k - t with t = 0.
[Figure: good-suffix shift; the shifted P starts immediately after Tj+m-k.]

The Boyer-Moore Algorithm (5)
d) If Tj ∉ {P1, ..., Pm}, then j-new = j + m.
[Figure: the shifted P starts immediately after Tj.]

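The Boyer-Moore Algorithm (6)
a) j = m; k = m;
b) while (j ≤ n) & (k > 0)
       if (T(j) = P(k)) { j--; k--; }
       else { j = jnew; k = m; }
c) if (k = 0) i0 = j + 1; else i0 = 0;
Complexity: O(nm).
Remark: compared with the right-to-left brute-force scan, the computed position jnew replaces the fixed one-position shift. jnew is computed from the values of j and k at the mismatch, before k is reset. On success j has moved one position to the left of the match, so i0 = j + 1.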

The Boyer-Moore Algorithm (7)
Computing jnew?

The Boyer-Moore Algorithm (8)
a) Find the largest t ∈ [1..m-1] such that, with q = t - m + k + 1:
       u = P[k+1..m] = P[q..t] and Pq-1 ≠ Pk   (q > 1), or
       u = P[k+1..m] = P[q..t]                 (q = 1).
   Then j-new = j + m - q + 1 = j + 2m - t - k.
Computing bmg(k) = 2m - k - t for case a):
    t = m - 1;
    while (t > m-k) & ((P[k+1..m] ≠ P[t-m+k+1..t]) OR (Pt-m+k = Pk)) t = t - 1;
    if (t = m-k) & (P[k+1..m] ≠ P[1..t]) t = 0;
Remark: when t = 0, bmg(k) = 2m - k.
[Figure: good-suffix shift, case a).]

The Boyer-Moore Algorithm (9)
b) If there is no t ∈ [1..m-1] such that u = P[k+1..m] = P[q..t], the shift consists in aligning the longest suffix v of u with a matching prefix of P.
   Find the largest t ∈ [1..m-k-1] such that v = P[m-t+1..m] = P[1..t].
   Then j-new = j + 2m - k - t.
Computing bmg(k) = 2m - k - t for case b):
    t = m - k - 1;
    while (t > 0) & (P[m-t+1..m] ≠ P[1..t]) t = t - 1;
Remark: when t = 0, bmg(k) = 2m - k.
[Figure: good-suffix shift, case b).]

The Boyer-Moore Algorithm (10)
c) If there is no t ∈ [1..m-k-1] such that v = P[m-t+1..m] = P[1..t], then j-new = j + 2m - k, that is, j-new = j + 2m - k - t with t = 0.
   bmg(k) = 2m - k - t, where t = 0.
[Figure: good-suffix shift, case c).]

The Boyer-Moore Algorithm (11)
Cases a), b) and c) combined, for each k = 1..m:
    t = m - 1;
    while (t > m-k) & ((P[k+1..m] ≠ P[t-m+k+1..t]) OR (Pt-m+k = Pk)) t = t - 1;
    if (t = m-k) & (P[k+1..m] ≠ P[1..t]) t = 0;
    if (t > 0) bmg(k) = 2m - k - t;
    else {
        t = m - k - 1;
        while (t > 0) & (P[m-t+1..m] ≠ P[1..t]) t = t - 1;
        if (t > 0) bmg(k) = 2m - k - t else bmg(k) = 2m - k;
    }
[Figures: the three good-suffix shifts, cases a), b) and c).]
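The combined rule can be transcribed into Python as follows. This is a sketch, not the author's code: `bmg_table` and the helper `seg` are hypothetical names, and the table is built by direct substring comparison exactly as on the slides, so it is slow but easy to check by hand.

```python
def bmg_table(P: str) -> list:
    """bmg[1..m]: how far j moves on a mismatch at P(k) (bmg[0] is padding)."""
    m = len(P)

    def seg(a: int, b: int) -> str:
        # 1-based inclusive segment P[a..b]; empty when a > b.
        return P[a - 1:b]

    bmg = [0] * (m + 1)
    for k in range(1, m + 1):
        u = seg(k + 1, m)                            # matched suffix, length m - k
        # Case a): rightmost other occurrence of u, preceded by a char != P(k).
        t = m - 1
        while t > m - k and (seg(t - m + k + 1, t) != u
                             or P[t - m + k - 1] == P[k - 1]):
            t -= 1
        if t == m - k and seg(1, t) != u:            # occurrence at q = 1 fails too
            t = 0
        if t == 0:
            # Case b): longest proper suffix of u that is a prefix of P.
            t = m - k - 1
            while t > 0 and seg(m - t + 1, m) != seg(1, t):
                t -= 1
        # Case c) is t <= 0: no reuse possible, shift the whole pattern past u.
        bmg[k] = 2 * m - k - t if t > 0 else 2 * m - k
    return bmg


print(bmg_table("THU THV")[1:])   # ASCII stand-in with the same shape as THỦ_THƯ
print(bmg_table("SENSE")[1:])
```

With the condition Pq-1 ≠ Pk enforced, this sketch gives bmg(4) = 6 for P = "SENSE"; a value of 4 at k = 4 corresponds to ignoring that condition, which yields a shorter but still safe shift.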

The Boyer-Moore Algorithm (12)
d) If Tj ∉ {P1, ..., Pm}, then the left end of the window is aligned with the character immediately after Tj, namely Tj+1:
       j-new = j + m, so bmS(Tj) = m.
   But what if Tj ∈ {P1, ..., Pm}?
[Figure: the shifted P starts at Tj+1.]

The Boyer-Moore Algorithm (13)
Define: bmS(c) = m for all c ∉ {P1, ..., Pm}.
[Figure: the text character c = Tj does not occur in P; the shifted P starts immediately after it.]

The Boyer-Moore Algorithm (14)
Let b = Tj occur in P. Find Px = b, where Px is the rightmost occurrence of the character b in P1..Pm-1, so that Px+1..Pm-1 contains no b.
Then jnew = j + m - x.
[Figure: the shifted P, with Px = b aligned under Tj = b.]

The Boyer-Moore Algorithm (15)
    for k = 1 to m-1 {
        t = k;
        for i = k+1 to m-1
            if (P(t) = P(i)) t = i;
        bmS(P(k)) = m - t;
    }
    bmS(P(m)) = 1;
Example:
    P     T  H  Ủ  _  T  H  Ư
    bmS   2  1  4  3  2  1  1
[Figure: the shifted P, with Px = b aligned under Tj = b; Px+1..Pm-1 contains no b.]
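The same table can be built in one pass, since a later position simply overwrites an earlier one. The sketch below is an illustration with an assumed name (`bms_table`) and assumes m ≥ 1; it keeps the slide's convention bmS(P(m)) = 1 and leaves characters that do not occur in P to a default of m at lookup time.

```python
def bms_table(P: str) -> dict:
    """bmS for every character of P; absent characters default to m on lookup."""
    m = len(P)
    bms = {}
    for x in range(1, m):          # x = 1..m-1, so the rightmost x wins
        bms[P[x - 1]] = m - x
    bms[P[m - 1]] = 1              # convention from the slide for the last character
    return bms


table = bms_table("SENSE")
print([table[c] for c in "SENSE"])     # per-position view, as on the slides
print(table.get("X", len("SENSE")))    # a character that is not in P
```

A dictionary keeps the table small for large alphabets; an array indexed by character code would be the usual choice for a fixed small alphabet.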

The Boyer-Moore Algorithm (16)
For P = THỦ_THƯ (m = 7):
    k     1   2   3   4   5   6   7
    P     T   H   Ủ   _   T   H   Ư
    bmg   13  12  11  10  9   8   1
    bmS   2   1   4   3   2   1   1
[Figure: Boyer-Moore trace of P against a text of 22 characters, once with bmg and once with bmS.]

The Boyer-Moore Algorithm (17)
[Figure: the Morris-Pratt trace (mp table, last entries 1, 2, 3) and the Boyer-Moore trace (bmS table) on a text of 22 characters.]

The Boyer-Moore Algorithm (18)
P = "SENSE"
    k     1  2  3  4  5
    P     S  E  N  S  E
    bmS   1  1  2  1  1
    bmg   7  6  5  4  1
On a mismatch, j moves by max{bmS, bmg}.
[Figure: three successive alignments of P under a text of 20 characters.]
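The search loop of slide (6) with the max{bmS, bmg} rule can be sketched as below. The function name `bm_search` and the way the tables are passed in are assumptions made for illustration: `bmg` is a list with `bmg[k]` at the slides' 1-based index k (entry 0 unused), and `bms` is a dictionary whose missing characters default to m. The tables in the usage lines are the SENSE values read from the slide.

```python
def bm_search(P: str, T: str, bmg: list, bms: dict) -> int:
    """Boyer-Moore scan with precomputed tables; returns 1-based i0, or 0."""
    m, n = len(P), len(T)
    j = k = m
    while j <= n and k > 0:
        if T[j - 1] == P[k - 1]:
            j -= 1
            k -= 1
        else:
            # Use the old k for the good-suffix value before resetting it.
            j += max(bms.get(T[j - 1], m), bmg[k])
            k = m
    return j + 1 if k == 0 else 0


bmg = [None, 7, 6, 5, 4, 1]            # index 0 unused
bms = {"S": 1, "E": 1, "N": 2}
print(bm_search("SENSE", "NONSENSE", bmg, bms))   # 4
```

Taking the maximum matters: bmS alone can move j by less than m - k + 1, which would drag the window backwards, while every bmg entry is at least m - k + 1.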

The Karp-Rabin Algorithm
Assume Σ = {1, 2, ..., 9}. Read P as the m-digit decimal number p, and each window T[s..s+m-1] as the number t_s.
Question: p = t_s? That is, is there an s ∈ {1, ..., n-m+1} such that t_s = p?

The Karp-Rabin Algorithm (2)
Compute p by Horner's scheme:
    p = P(m) + 10*(P(m-1) + 10*(P(m-2) + ... + 10*(P(2) + 10*P(1))...))
    p = P(1);
    for i = 2 to m do p = P(i) + 10*p;
Running time: O(m).

The Karp-Rabin Algorithm (3)
Computing t_s:
    t_s     = 10^{m-1} T(s) + 10^{m-2} T(s+1) + 10^{m-3} T(s+2) + ... + 10 T(s+m-2) + T(s+m-1)
    t_{s+1} = 10^{m-1} T(s+1) + 10^{m-2} T(s+2) + ... + 10^2 T(s+m-2) + 10 T(s+m-1) + T(s+m)
            = 10 {10^{m-2} T(s+1) + 10^{m-3} T(s+2) + ... + 10 T(s+m-2) + T(s+m-1)} + T(s+m)
            = 10 {t_s - 10^{m-1} T(s)} + T(s+m)
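The recurrence can be checked numerically with a few lines of Python. This is only an illustration: `window_value` and `roll` are made-up helper names, T is assumed to be a string of decimal digits, and s is 1-based as on the slides.

```python
def window_value(T: str, s: int, m: int) -> int:
    """t_s: the window T[s..s+m-1] read as a decimal number (s is 1-based)."""
    return int(T[s - 1:s - 1 + m])


def roll(t_s: int, T: str, s: int, m: int) -> int:
    """t_{s+1} from t_s: drop the leading digit T(s), append T(s+m)."""
    return 10 * (t_s - 10 ** (m - 1) * int(T[s - 1])) + int(T[s - 1 + m])


T, m = "314159", 3
t = window_value(T, 1, m)                      # 314
for s in range(1, len(T) - m + 1):
    t = roll(t, T, s, m)
    print(s + 1, t, window_value(T, s + 1, m))  # rolled and direct values agree
```

Each update costs a constant number of arithmetic operations, which is where the O(m + n) bound of the next slide comes from.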

The Karp-Rabin Algorithm (4)
    p = P(1); t = T(1); a = 1;
    for i = 2 to m { p = P(i) + 10*p; t = T(i) + 10*t; a = a*10; }
Now a = 10^{m-1} and t = t_1.
1. s = 1;
2. while (s < n-m+1) & (t ≠ p)
       a) t = 10*(t - a*T(s)) + T(s+m);
       b) s = s + 1;
3. if (t = p) return s else return 0;
Complexity: O(m+n).

The Karp-Rabin Algorithm (5)
To keep the numbers small, all values are reduced modulo q:
    p = P(1) mod q; t = T(1) mod q; a = 1;
    for i = 2 to m { p = (P(i) + 10*p) mod q; t = (T(i) + 10*t) mod q; a = (a*10) mod q; }
Define: a(q) = a mod q, t(q) = t mod q, p(q) = p mod q. The update becomes
    t(q) = (10*(t(q) - a(q)*T(s)) + T(s+m)) mod q
Then:
    t = p ⇒ t(q) = p(q)
    t(q) ≠ p(q) ⇒ t ≠ p

The Karp-Rabin Algorithm (6)
    p = P(1) mod q; t = T(1) mod q; a = 1;
    for i = 2 to m { p = (P(i) + 10*p) mod q; t = (T(i) + 10*t) mod q; a = (a*10) mod q; }
    s = 1;
    while (s ≤ n-m+1) {
        if (t(q) = p(q)) & (P = T[s..s+m-1]) return s;
        if (s < n-m+1) t(q) = (10*(t(q) - a(q)*T(s)) + T(s+m)) mod q;
        s = s + 1;
    }
    return 0;
Equal residues do not prove a match, so every hit t(q) = p(q) is verified by comparing P with the window directly.
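A compact Python sketch of the modular version follows. Assumptions, none of which come from the slides: the function name `karp_rabin`, the default modulus q = 101, strings of decimal digits as input, and a return value of 0 whenever m = 0 or m > n.

```python
def karp_rabin(P: str, T: str, q: int = 101) -> int:
    """P and T are strings of decimal digits; returns 1-based i0, or 0."""
    m, n = len(P), len(T)
    if m == 0 or m > n:
        return 0
    p = t = 0
    a = 1
    for i in range(m):
        p = (10 * p + int(P[i])) % q          # Horner's scheme modulo q
        t = (10 * t + int(T[i])) % q
        if i > 0:
            a = (a * 10) % q                  # ends as 10^(m-1) mod q
    for s in range(n - m + 1):                # s is the 0-based window start
        # A residue hit is only a candidate; confirm it on the characters.
        if t == p and T[s:s + m] == P:
            return s + 1
        if s < n - m:
            # Python's % already returns a value in [0, q) for negative operands.
            t = (10 * (t - a * int(T[s])) + int(T[s + m])) % q
    return 0


print(karp_rabin("31415", "2359023141526739921"))
print(karp_rabin("99", "1299", q=1))   # q = 1 makes every window a residue hit
```

A larger prime q makes false hits rarer; with q = 1 the method degenerates into checking every window directly, but the result stays correct because of the verification step.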

Conclusion
Brute Force Algorithm 1: Straightforward Matching
    The Morris-Pratt Algorithm
    The Knuth-Morris-Pratt Algorithm
Brute Force Algorithm 2: Backing
    The Boyer-Moore Algorithm
The Karp-Rabin Algorithm