Pattern Matching in String

1 Pattern Matching in String

2 Pattern Matching in String
Problem. Given: an alphabet Σ, a pattern string P with |P| = m, and a text T with |T| = n, where n >> m.
Question: does P occur in T? If it does, what is the position i0 of the first occurrence of P in T?
Example 1: P = ABABDE, T = ABABABDEAA, i0 = 3.
Dao Thanh Tinh

3 A Straightforward String Matching
Brute Force Algorithm
Input: P[1..m], T[1..n]; Output: i0 (i0 ≥ 1 if P = T[i0..i0+m-1], otherwise i0 = 0)
a) i = 1; j = i; k = 1;
b) while (j ≤ n) & (k ≤ m)
     if (T(j) = P(k)) { j++; k++; } else { i++; j = i; k = 1; }
c) if (k > m) i0 := i else i0 := 0;
Complexity: O(mn).
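The three steps of the brute force search can be sketched in Python as follows. This is only an illustration: the function name `brute_force` is an assumption, and the 1-based indices of the slide are kept by subtracting 1 whenever a Python string is accessed.

```python
def brute_force(p: str, t: str) -> int:
    """Return the 1-based position of the first occurrence of p in t, or 0."""
    m, n = len(p), len(t)
    i = j = k = 1  # i: window start, j: text index, k: pattern index (1-based)
    while j <= n and k <= m:
        if t[j - 1] == p[k - 1]:
            j += 1
            k += 1
        else:
            # Mismatch: move the window one position right and restart.
            i += 1
            j = i
            k = 1
    return i if k > m else 0


if __name__ == "__main__":
    print(brute_force("ABABDE", "ABABABDEAA"))
```

With the pattern and text of Example 1 the call returns 3, the position given on the slide.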

4 A Straightforward String Matching
Brute Force Algorithm (*), without the index i
Input: P[1..m], T[1..n]; Output: i0 (i0 ≥ 1 if P = T[i0..i0+m-1], otherwise i0 = 0)
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
     if (T(j) = P(k)) { j++; k++; } else { j = j - k + 2; k = 1; }
c) if (k > m) i0 := j - m else i0 := 0;
Complexity: O(mn).
Example (figure): a mismatch occurs at k = 5, j = 7. On the next step: k = 1, j = 7 - 5 + 2 = 4.

5 A Straightforward String Matching
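Example 2: P = ABABDE, T = ABABABDEAA.
  Alignment at position 1: four characters (ABAB) match, then D ≠ A.
  Alignment at position 2: mismatch on the first character.
  Alignment at position 3: all six characters match; successful match, i0 = 3.
Example 3: P = UUUUUUX, T = UUUUUUUUUUUU. At every alignment six characters match before X fails.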

6 The Morris-Pratt Algorithm (1)
Assume that the first mismatch occurs between P(k) and T(j), with 1 < k ≤ m. Then P(1..k-1) = T(j-k+1..j-1) = u.
Idea: shift P to the right so that a prefix v of P matches a suffix of the portion u. The longest such prefix v (with v ≠ u) is called the border of u: P(1..r) = P(k-r..k-1), where r = |v|.

7 The Morris-Pratt Algorithm (2)
The Brute Force Algorithm (figure): the pattern is aligned against the text step by step; alignments (1) to (7) are shown.

8 The Morris-Pratt Algorithm (3)
(Figure, continued: alignments (8) to (13).) The Brute Force Algorithm performs 13 steps.

9 The Morris-Pratt Algorithm (4)
(Figure: alignments of the pattern against the text.) The pattern is found on the 6th step.

10 The Morris-Pratt Algorithm (5)
Let r be the length of the border: P(1..r) = P(k-r..k-1). Set mp(k) = r+1. Then, after a shift, the comparisons can resume between characters P(mp(k)) and T(j); j does not move back.
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
     if (T(j) = P(k)) { j++; k++; } else { k = mp(k); }
c) if (k > m) i0 := j-m; else i0 := 0;

11 The Morris-Pratt Algorithm (6)
The value of mp(1) is set to 0. For k > 1:
  r = k-2;
  while (r > 0) && (P(1..r) ≠ P(k-r..k-1)) do r--;
  mp(k) = r+1;
Example (pattern beginning A B A B, mismatch at k = 5, j = 7):
  r = k-2 = 3: P[1..3] ? P[2..4]: ABA ≠ BAB
  r = 2: P[1..2] ? P[3..4]: AB = AB
  mp(5) = r+1 = 3
On the next step: k = mp(5) = 3, j = 7 (unchanged).
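The rule for mp(k) can be tried out directly. The sketch below follows the slide's quadratic definition literally rather than the usual linear-time construction; the name `mp_table` and the padding trick that keeps the 1-based indices are assumptions made for the illustration.

```python
def mp_table(p: str) -> list[int]:
    """mp(1..m) by the slide's rule, with mp(1) = 0 as on this slide."""
    m = len(p)
    P = " " + p            # pad so that P[1..m] uses the slide's 1-based indices
    mp = [0] * (m + 1)     # mp[0] is unused
    for k in range(2, m + 1):
        r = k - 2
        # Shrink r until P(1..r) equals P(k-r..k-1), or r reaches 0.
        while r > 0 and P[1:r + 1] != P[k - r:k]:
            r -= 1
        mp[k] = r + 1
    return mp[1:]


if __name__ == "__main__":
    print(mp_table("ABABE"))
```

For a pattern beginning ABAB the fifth entry is 3, as in the worked example.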

12 The Morris-Pratt Algorithm (7)
For k > 1: r = k-2; while (r > 0) && (P(1..r) ≠ P(k-r..k-1)) do r--; mp(k) = r+1;
Example: P = UUUUUUX, mismatch at k = 7, j = 7.
  r = k-2 = 5: P[1..5] ? P[2..6]: UUUUU = UUUUU
  mp(7) = r+1 = 6
On the next step: k = mp(7) = 6, j = 7 (unchanged).

13 The Morris-Pratt Algorithm (8)
For k > 1: r = k-2; while (r > 0) && (P(1..r) ≠ P(k-r..k-1)) do r--; mp(k) = r+1;
Example: P = T H Ủ _ T H Ư (the underscore stands for a space), mismatch at k = 7, j = 7.
  r = k-2 = 5: P[1..5] ? P[2..6]: THỦ_T ≠ HỦ_TH
  r = 4: P[1..4] ? P[3..6]: THỦ_ ≠ Ủ_TH
  r = 3: P[1..3] ? P[4..6]: THỦ ≠ _TH
  r = 2: P[1..2] ? P[5..6]: TH = TH
  mp(7) = r+1 = 3
On the next step: k = mp(7) = 3, j = 7 (unchanged).

14 The Morris-Pratt Algorithm (9)
For k > 1: r = k-2; while (r > 0) && (P(1..r) ≠ P(k-r..k-1)) do r--; mp(k) = r+1;
Example: P = T H Ủ _ T H Ư, mismatch at k = 5, j = 5.
  r = k-2 = 3: P[1..3] ? P[2..4]: THỦ ≠ HỦ_
  r = 2: P[1..2] ? P[3..4]: TH ≠ Ủ_
  r = 1: P[1..1] ? P[4..4]: T ≠ _
  r = 0
  mp(5) = r+1 = 1
On the next step: k = mp(5) = 1, j = 5 (unchanged).

15 The Morris-Pratt Algorithm (10)
For k = 1 the rule gives r = -1 and mp(1) = 0. The comparisons would then resume between P(mp(1)) = P(0) and T(j), but P(0) does not exist. In this case the comparisons resume between P(1) and T(j+1). So set mp(1) = 1 and advance j = j+1.

16 The Morris-Pratt Algorithm (11)
Computing mp:
  k = 1: mp(1) = 1.
  k = 2..m: r = k-2; while (r > 0) && (P(1..r) ≠ P(k-r..k-1)) do r--; mp(k) = r+1;
Search:
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
     if (T(j) = P(k)) { j++; k++; } else { if (k = 1) j++; k = mp(k); }
c) if (k > m) i0 := j-m; else i0 := 0;
Example, m = 7:
  k    1  2  3  4  5  6  7
  P    T  H  Ủ  _  T  H  Ư
  mp   1  1  1  1  1  2  3
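Putting the table and the search loop together gives the following sketch. It keeps the slide's conventions (mp(1) = 1, and j advances only when the mismatch happens at k = 1). The function names are assumptions, and the table is built with the slide's simple quadratic rule rather than an optimized construction.

```python
def mp_table(p: str) -> list[int]:
    """Return mp with 1-based indexing (mp[0] unused), mp(1) = 1."""
    m = len(p)
    P = " " + p
    mp = [0] * (m + 2)
    mp[1] = 1
    for k in range(2, m + 1):
        r = k - 2
        while r > 0 and P[1:r + 1] != P[k - r:k]:
            r -= 1
        mp[k] = r + 1
    return mp


def mp_search(p: str, t: str) -> int:
    """1-based position of the first occurrence of p in t, or 0."""
    m, n = len(p), len(t)
    mp = mp_table(p)
    j = k = 1
    while j <= n and k <= m:
        if t[j - 1] == p[k - 1]:
            j += 1
            k += 1
        else:
            if k == 1:
                j += 1      # nothing matched: move on in the text
            k = mp[k]       # otherwise j stays and the pattern is shifted
    return j - m if k > m else 0


if __name__ == "__main__":
    print(mp_search("ABABDE", "ABABABDEAA"))
```

The text index j never decreases, which is the difference from the brute force loop.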

17 The Morris-Pratt Algorithm (12)
(Figure: the Morris-Pratt search traced on the example text.) The pattern is found on the 6th step.

18 The Morris-Pratt Algorithm (13)
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
     if (T(j) = P(k)) { j++; k++; } else { if (k = 1) j++; k = mp(k); }
c) if (k > m) i0 := j-m; else i0 := 0;
(Figure: trace of this search on a second example text.)

19 The Knuth-Morris-Pratt Algorithm (1)
Look more closely at the Morris-Pratt algorithm:
Input: P[1..m], T[1..n]; Output: i0
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
     if (T(j) = P(k)) { j++; k++; } else { if (k = 1) j++; k = mp(k); }
c) if (k > m) i0 = j-m; else i0 = 0;
Computing mp: mp(1) = 1; for k = 2..m: r = k-2; while (r > 0) && (P(1..r) ≠ P(k-r..k-1)) do r--; mp(k) = r+1;
(Figure: after the matched portion u, T(j) = b is compared with P(k) = a. After the shift the border v is aligned with the end of u, and T(j) is compared with P(mp(k)) = c.)

20 The Knuth-Morris-Pratt Algorithm (2)
Let 1 < k ≤ m, with a = P(k), b = T(j) and c = P(mp(k)). If c = a then c ≠ b, so a mismatch between P(mp(k)) and T(j) is certain. To avoid this immediate second mismatch, the character P(mp(k)) must be different from a = P(k).
kmp(1) = 1; for k = 2..m:
  r = k-2;
  while (r > 0) && ((P(1..r) ≠ P(k-r..k-1)) OR (P(r+1) = P(k))) do r--;
  kmp(k) = r+1;
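The refined table can be computed with one extra test in the loop. In the sketch below the extra test compares P(r+1), which is the candidate P(kmp(k)), with P(k); this is what the argument on this slide requires. The function name `kmp_table` is an assumption, and kmp(1) = 1 is taken over from the Morris-Pratt convention.

```python
def kmp_table(p: str) -> list[int]:
    """kmp(1..m): like mp, but a border is rejected when P(r+1) = P(k)."""
    m = len(p)
    P = " " + p                  # 1-based access: P[1..m]
    kmp = [0] * (m + 1)
    if m >= 1:
        kmp[1] = 1
    for k in range(2, m + 1):
        r = k - 2
        while r > 0 and (P[1:r + 1] != P[k - r:k] or P[r + 1] == P[k]):
            r -= 1
        kmp[k] = r + 1
    return kmp[1:]


if __name__ == "__main__":
    print(kmp_table("ABABE"))
    print(kmp_table("ABABABAB"))
```

For a strictly periodic pattern such as ABABABAB every border is rejected by the extra test, so all entries fall back to 1, whereas the plain mp values keep growing with k.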

21 The Knuth-Morris-Pratt Algorithm (3)
The Morris-Pratt versus the Knuth-Morris-Pratt (tables of mp and kmp for k = 1..7, pattern T H Ủ _ T H Ư).
Example, k = 6:
  r = k-2 = 4: P[1..4] ? P[2..5]: THỦ_ ≠ HỦ_T
  r = 3: P[1..3] ? P[3..5]: THỦ ≠ Ủ_T
  r = 2: P[1..2] ? P[4..5]: TH ≠ _T
  r = 1: P[1..1] ? P[5..5]: T = T, but P[r+1] ? P[k]: H = H, so r = 1 is rejected
  r = 0
  kmp(6) = r+1 = 1

22 The Knuth-Morris-Pratt Algorithm (4)
Example (table): mp and kmp for P = "ABABABAB", k = 1..8.

23 The Knuth-Morris-Pratt Algorithm (5)
The Knuth-Morris-Pratt search is the Morris-Pratt search with kmp in place of mp:
Input: P[1..m], T[1..n]; Output: i0
a) j = 1; k = 1;
b) while (j ≤ n) & (k ≤ m)
     if (T(j) = P(k)) { j++; k++; } else { if (k = 1) j++; k = kmp(k); }
c) if (k > m) i0 = j-m; else i0 = 0;
Computing kmp: kmp(1) = 1; for k = 2..m: r = k-2; while (r > 0) && ((P(1..r) ≠ P(k-r..k-1)) OR (P(r+1) = P(k))) do r--; kmp(k) = r+1;

24 The Brute Force Algorithm 2
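Characters are compared from right to left inside the window; i is the position of the window's right end.
Input: P[1..m], T[1..n]; Output: i0 (i0 ≥ 1 if P = T[i0..i0+m-1], otherwise i0 = 0)
a) i = m; j = i; k = m;
b) while (j ≤ n) & (k > 0)
     if (T(j) = P(k)) { j--; k--; } else { i++; j = i; k = m; }
c) if (k = 0) i0 := i-m+1 else i0 := 0;
Complexity: O(mn).
Example: P = DEABAB, T = ABDEABABAA.
  Alignment at position 1: the last two characters (AB) match, then B ≠ E.
  Alignment at position 2: mismatch on the last character.
  Alignment at position 3: all six characters match; successful match, i0 = 3.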

25 The Brute Force Algorithm 2*
The same algorithm without the index i:
Input: P[1..m], T[1..n]; Output: i0 (i0 ≥ 1 if P = T[i0..i0+m-1], otherwise i0 = 0)
a) j = m; k = m;
b) while (j ≤ n) & (k > 0)
     if (T(j) = P(k)) { j--; k--; } else { j = j+m-k+1; k = m; }
c) if (k = 0) i0 := j+1 else i0 := 0;
Complexity: O(mn).
Example: P = DEABAB, T = ABDEABABAA; successful match, i0 = 3.
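A sketch of the right-to-left variant follows. The name `brute_force_backward` is an assumption. When k reaches 0 the text index j sits one position to the left of the match, so the position returned is j + 1.

```python
def brute_force_backward(p: str, t: str) -> int:
    """Compare right to left inside each window; return the 1-based match position or 0."""
    m, n = len(p), len(t)
    j = k = m                      # start at the right end of the first window
    while j <= n and k > 0:
        if t[j - 1] == p[k - 1]:
            j -= 1
            k -= 1
        else:
            # The current window ends at j + (m - k); the next one ends one further.
            j = j + m - k + 1
            k = m
    return j + 1 if k == 0 else 0


if __name__ == "__main__":
    print(brute_force_backward("DEABAB", "ABDEABABAA"))
```

The only difference from a Boyer-Moore style search is the size of the jump after a mismatch, which here always moves the window by exactly one position.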

26 The Boyer-Moore Algorithm (1)
Suppose the mismatch is T[j] ≠ P[k], after the suffix has matched: T[j+1..j+m-k] = P[k+1..m] = u.

27 The Boyer-Moore Algorithm (2)
The good-suffix shift consists in aligning the segment u with its rightmost occurrence in P that is preceded by a character different from P(k).
a) Find the largest t ∈ [1..m-1] such that
     u = P[k+1..m] = P[q..t] and P(q-1) ≠ P(k)   (q > 1), or
     u = P[k+1..m] = P[q..t]                     (q = 1),
   where q = t-m+k+1.
   Then j-new = j + m-q+1 = j + 2m-t-k.

28 The Boyer-Moore Algorithm (3)
b) If there is no t ∈ [1..m-1] such that u = P[k+1..m] = P[q..t], the shift consists in aligning the longest suffix v of u (hence of P) with a matching prefix of P.
   Find the largest t ∈ [1..m-k-1] such that v = P[m-t+1..m] = P[1..t].
   Then j-new = j + 2m-t-k.

29 The Boyer-Moore Algorithm (4)
c) If there is no t such that P[m-t+1..m] = P[1..t], then j-new = j + 2m-k, which is j-new = j + 2m-k-t with t = 0.

30 The Boyer-Moore Algorithm (5)
d) If T(j) ∉ {P1, ..., Pm}, then j-new = j + m.

31 The Boyer-Moore Algorithm (6)
a) j = m; k = m;
b) while (j ≤ n) & (k > 0)
     if (T(j) = P(k)) { j--; k--; } else { j = jnew; k = m; }
c) if (k = 0) i0 := j+1; else i0 := 0;
Complexity: O(nm).
Remark: compared with the Brute Force Algorithm 2*, the new component is the shift jnew.

32 The Boyer-Moore Algorithm (7)
Computing jnew? (2018-12-31)

33 The Boyer-Moore Algorithm (8)
a) bmg(k) = 2m-k-t, so that j-new = j + bmg(k). Here t is the largest value in [1..m-1] with u = P[k+1..m] = P[q..t] and, if q > 1, P(q-1) ≠ P(k), where q = t-m+k+1:
   t = m-1;
   while (t > m-k) & ((P[k+1..m] ≠ P[t-m+k+1..t]) OR (P(t-m+k) = P(k))) t = t-1;
   if (t = m-k) & (P[k+1..m] ≠ P[1..t]) t = 0;
Remark: when t = 0, bmg(k) = 2m-k.

34 The Boyer-Moore Algorithm (9)
b) If case a) finds no t, align the longest suffix v of u with a matching prefix of P: find the largest t with v = P[m-t+1..m] = P[1..t]; then j-new = j + 2m-k-t.
   bmg(k) = 2m-k-t:
   t = m-k-1;
   while (t > 0) & (P[m-t+1..m] ≠ P[1..t]) t = t-1;
Remark: when t = 0, bmg(k) = 2m-k.

35 The Boyer-Moore Algorithm (10)
c) If there is no t such that P[m-t+1..m] = P[1..t], then j-new = j + 2m-k; that is, bmg(k) = 2m-k-t with t = 0.

36 The Boyer-Moore Algorithm (11)
Computing bmg(k), cases a) to c) combined:
   t = m-1;
   while (t > m-k) & ((P[k+1..m] ≠ P[t-m+k+1..t]) OR (P(t-m+k) = P(k))) t = t-1;
   if (t = m-k) & (P[k+1..m] ≠ P[1..t]) t = 0;
   if (t > 0) bmg(k) = 2m-k-t;
   else {
     t = m-k-1;
     while (t > 0) & (P[m-t+1..m] ≠ P[1..t]) t = t-1;
     if (t > 0) bmg(k) = 2m-k-t else bmg(k) = 2m-k;
   }
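The combined procedure can be checked with a short sketch. It follows the slide's direct (cubic-time) definition, not the linear-time preprocessing used in production implementations. The name `bmg_table` is an assumption, and so is the `max(..., 0)` guard, which covers the case k = m where m-k-1 would be negative.

```python
def bmg_table(p: str) -> list[int]:
    """bmg(1..m) = 2m - k - t, following cases a) to c)."""
    m = len(p)
    P = " " + p                      # 1-based access: P[1..m]
    bmg = []
    for k in range(1, m + 1):
        u = P[k + 1:]                # matched suffix P[k+1..m]
        # Case a: rightmost reoccurrence of u preceded by a character != P(k).
        t = m - 1
        while t > m - k and (P[t - m + k + 1:t + 1] != u or P[t - m + k] == P[k]):
            t -= 1
        if t == m - k and P[1:t + 1] != u:
            t = 0
        if t == 0:
            # Case b: longest proper suffix of u that is also a prefix of P.
            t = max(m - k - 1, 0)
            while t > 0 and P[m - t + 1:] != P[1:t + 1]:
                t -= 1
        bmg.append(2 * m - k - t)    # case c is simply t = 0
    return bmg


if __name__ == "__main__":
    print(bmg_table("SENSE"))
```

Since t ≤ m-1, every entry is at least m-k+1, so a shift by bmg(k) always moves the window at least one position to the right.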

37 The Boyer-Moore Algorithm (12)
d) If T(j) ∉ {P1, ..., Pm}, the left end of the window is aligned with the character immediately after T(j), namely T(j+1): j-new = j + m, that is, bmS(T(j)) = m. But what if T(j) ∈ {P1, ..., Pm}?

38 The Boyer-Moore Algorithm (13)
Define: bmS(c) = m for all c ∉ {P1, ..., Pm}.

39 The Boyer-Moore Algorithm (14)
Let b = T(j). Find P(x) = b, where x is the rightmost occurrence of the character b in {P1, ..., Pm-1}, so that P[x+1..m-1] contains no b. Aligning P(x) with T(j) gives jnew = j + m-x, that is, bmS(b) = m-x.

40 The Boyer-Moore Algorithm (15)
for k = 1 to m-1 {
   t = k;
   for i = k+1 to m-1
     if (P(t) = P(i)) t = i;
   bmS(P(k)) = m-t;
}
bmS(P(m)) = 1;
Example: P = T H Ủ _ T H Ư (table of bmS values).
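The bmS table can be sketched with a dictionary. The sketch mirrors the loop above, including the final assignment bmS(P(m)) = 1. The name `bms_table` and the helper `bms_lookup`, which supplies the default value m for characters that do not occur in P, are assumptions.

```python
def bms_table(p: str) -> dict[str, int]:
    """bmS for the characters of a non-empty pattern p, as in the slide's loop."""
    m = len(p)
    # For each character of P[1..m-1], keep m - x for its rightmost position x;
    # later positions overwrite earlier ones.
    bms = {p[x - 1]: m - x for x in range(1, m)}
    bms[p[m - 1]] = 1                # final assignment from the slide
    return bms


def bms_lookup(bms: dict[str, int], c: str, m: int) -> int:
    """bmS(c) = m for every character c that has no entry."""
    return bms.get(c, m)


if __name__ == "__main__":
    table = bms_table("SENSE")
    print(table, bms_lookup(table, "X", 5))
```

On its own this shift can be smaller than the distance already covered inside the window, which is why the search combines it with bmg by taking the maximum.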

41 The Boyer-Moore Algorithm (16)
Example: P = T H Ủ _ T H Ư.
  bmg(k), k = 1..7:      13 12 11 10 9 8 1
  bmS for T, H, Ủ, _:    2 1 4 3
(Figure: search traces on a text of 22 characters.)

42 The Boyer-Moore Algorithm (17)
(Figure: traces of the Morris-Pratt search, with mp, and of the Boyer-Moore search, with bmS = 2 1 4 3 for T, H, Ủ, _, on the same text of 22 characters.)

43 The Boyer-Moore Algorithm (18)
Example: P = "SENSE", text of 20 characters (figure). bms = [ ], bmg = [ ]. At each mismatch the shift is max{bms, bmg}.
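A complete search that takes the larger of the two shifts at each mismatch might look like the sketch below. All function names are assumptions, both tables are built by the slides' direct definitions rather than by linear-time preprocessing, and the text in the usage line is an invented example, not the text from the figure.

```python
def good_suffix(p: str) -> list[int]:
    """bmg with 1-based indexing (entry 0 unused)."""
    m = len(p)
    P = " " + p
    bmg = [0] * (m + 1)
    for k in range(1, m + 1):
        u = P[k + 1:]
        t = m - 1
        while t > m - k and (P[t - m + k + 1:t + 1] != u or P[t - m + k] == P[k]):
            t -= 1
        if t == m - k and P[1:t + 1] != u:
            t = 0
        if t == 0:
            t = max(m - k - 1, 0)    # guard for k = m (assumption of this sketch)
            while t > 0 and P[m - t + 1:] != P[1:t + 1]:
                t -= 1
        bmg[k] = 2 * m - k - t
    return bmg


def bad_char(p: str) -> dict[str, int]:
    """bmS for the characters of p; other characters default to m."""
    m = len(p)
    bms = {p[x - 1]: m - x for x in range(1, m)}
    bms[p[m - 1]] = 1
    return bms


def bm_search(p: str, t: str) -> int:
    """1-based position of the first occurrence of p in t, or 0."""
    m, n = len(p), len(t)
    if m == 0:
        return 0                     # empty pattern: not covered by the slides
    bmg, bms = good_suffix(p), bad_char(p)
    j = k = m
    while j <= n and k > 0:
        if t[j - 1] == p[k - 1]:
            j -= 1
            k -= 1
        else:
            j += max(bmg[k], bms.get(t[j - 1], m))   # jnew = j + max{bmg, bms}
            k = m
    return j + 1 if k == 0 else 0


if __name__ == "__main__":
    print(bm_search("SENSE", "COMMON SENSE"))
```

Taking the maximum matters because bmg(k) is always at least m-k+1 and so guarantees progress, while the bmS shift alone may not move the window forward.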

44 The Karp-Rabin Algorithm
Assume Σ = {1, 2, ..., 9}. Read P as a decimal number p, and each window T[s..s+m-1] as a decimal number t_s. Question: is there an s ∈ {1, ..., n-m+1} with t_s = p?

45 The Karp-Rabin Algorithm (2)
Compute p by Horner's scheme:
  p = P(m) + 10*{P(m-1) + 10*{P(m-2) + ... + 10*{P(2) + 10*P(1)}...}}
  p = P(1); for i = 2 to m do p = P(i) + 10*p;
Running time: O(m).

46 The Karp-Rabin Algorithm (3)
Computing t_s:
  t_s     = 10^(m-1) T(s) + 10^(m-2) T(s+1) + 10^(m-3) T(s+2) + ... + 10 T(s+m-2) + T(s+m-1)
  t_(s+1) = 10^(m-1) T(s+1) + 10^(m-2) T(s+2) + ... + 10 T(s+m-1) + T(s+m)
          = 10 {10^(m-2) T(s+1) + 10^(m-3) T(s+2) + ... + T(s+m-1)} + T(s+m)
          = 10 {t_s - 10^(m-1) T(s)} + T(s+m)
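The recurrence can be checked numerically. The sketch below computes the first window value by Horner's scheme and every following one by the update rule just derived; the name `window_values` and the sample digits are assumptions.

```python
def window_values(digits: list[int], m: int) -> list[int]:
    """Decimal values of all windows of length m, each derived from the previous one."""
    a = 10 ** (m - 1)                 # weight of the leading digit
    t = 0
    for d in digits[:m]:              # Horner's scheme for the first window
        t = 10 * t + d
    values = [t]
    for s in range(len(digits) - m):  # s is 0-based here
        t = 10 * (t - a * digits[s]) + digits[s + m]
        values.append(t)
    return values


if __name__ == "__main__":
    print(window_values([3, 1, 4, 1, 5, 9], 3))
```

For the digits 3 1 4 1 5 9 and m = 3 the windows are 314, 141, 415 and 159, each obtained from its predecessor in constant time.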

47 The Karp-Rabin Algorithm (4)
p = P(1); t = T(1); a = 1;
for i = 2 to m { p = P(i) + 10*p; t = T(i) + 10*t; a = a*10; }     (now a = 10^(m-1) and t = t_1)
1. s = 1;
2. while (s < n-m+1) & (t ≠ p)
   a) t = 10*(t - a*T(s)) + T(s+m);
   b) s = s+1;
3. if (t = p) return s else return 0;
Running time: O(m+n).

48 The Karp-Rabin Algorithm (5)
p = P(1) mod q; t = T(1) mod q; a = 1;
for i = 2 to m { p = (P(i) + 10*p) mod q; t = (T(i) + 10*t) mod q; a = (a*10) mod q; }
Define: a(q) = a mod q, t(q) = t mod q, p(q) = p mod q. Then
  t(q) = (10*(t(q) - a(q)*T(s)) + T(s+m)) mod q
  t = p ⇒ t(q) = p(q)
  t(q) ≠ p(q) ⇒ t ≠ p
Equal residues do not prove a match, so a hit t(q) = p(q) must still be checked character by character.

49 The Karp-Rabin Algorithm (6)
p = P(1) mod q; t = T(1) mod q; a = 1;
for i = 2 to m { p = (P(i) + 10*p) mod q; t = (T(i) + 10*t) mod q; a = (a*10) mod q; }
After the loop, p, t and a hold p(q), t(q) and a(q).
s = 1;
while (s ≤ n-m+1) {
  if (t(q) = p(q)) & (P = T[s..s+m-1]) return s;
  if (s < n-m+1) t(q) = (10*(t(q) - a(q)*T(s)) + T(s+m)) mod q;
  s = s+1;
}
return 0;
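A sketch of the modular version follows, working on lists of decimal digits. The name `karp_rabin` and the default modulus q = 13 are assumptions. Whenever the residues agree the window is compared with the pattern directly, so a spurious hit cannot produce a wrong answer.

```python
def karp_rabin(p: list[int], t: list[int], q: int = 13) -> int:
    """1-based position of the first occurrence of digit list p in t, or 0."""
    m, n = len(p), len(t)
    if m == 0 or m > n:
        return 0
    a = pow(10, m - 1, q)                 # 10^(m-1) mod q
    pq = tq = 0
    for i in range(m):                    # Horner's scheme modulo q
        pq = (10 * pq + p[i]) % q
        tq = (10 * tq + t[i]) % q
    for s in range(n - m + 1):            # s is 0-based here
        if pq == tq and t[s:s + m] == p:  # verify to rule out a spurious hit
            return s + 1
        if s < n - m:
            tq = (10 * (tq - a * t[s]) + t[s + m]) % q
    return 0


if __name__ == "__main__":
    text = [2, 3, 5, 9, 0, 2, 3, 1, 4, 1, 5, 2, 6, 7, 3, 9, 9, 2, 1]
    print(karp_rabin([3, 1, 4, 1, 5], text))
```

Python's `%` operator returns a non-negative result for a positive modulus, so the subtraction inside the update needs no extra correction; in languages where the remainder can be negative, q would have to be added back before reducing.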

50 Conclusion
Brute Force Algorithm 1: Straightforward Matching
The Morris-Pratt Algorithm
The Knuth-Morris-Pratt Algorithm
Brute Force Algorithm 2: Backward Matching
The Boyer-Moore Algorithm
The Karp-Rabin Algorithm

