Published by Ashton Erickson. Modified over 4 years ago.

1
Rules for Approximate String Matching, R.C.T. Lee

2
Rule 1 Consider two strings A_1 and A_2, where A_1 = P_1 S_1 and A_2 = P_2 S_2 (a prefix followed by a suffix). If ed(A_1, A_2) ≤ k and S_1 = S_2, then ed(P_1, P_2) ≤ k.

3
Rule 1: [AKLLLR2000], [H2005], [HHLS2006], [JB2000], [LV89], [NB99], [NB2000], [S80], [TU93] and [WM92].
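Rule 1 can be checked on small inputs. The following is a minimal sketch, assuming ed is the unit-cost Levenshtein distance; the function names `edit_distance` and `strip_common_suffix` are illustrative and do not come from the slides. It strips the longest common suffix, which is one valid choice of S_1 = S_2.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between a and b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # delete ca
                            curr[j - 1] + 1,             # insert cb
                            prev[j - 1] + (ca != cb)))   # match or substitute
        prev = curr
    return prev[-1]


def strip_common_suffix(a1: str, a2: str):
    """Return (P_1, P_2) after removing the longest common suffix."""
    n = 0
    while n < len(a1) and n < len(a2) and a1[-1 - n] == a2[-1 - n]:
        n += 1
    return a1[:len(a1) - n], a2[:len(a2) - n]


a1, a2 = "kittenxyz", "sittingxyz"
p1, p2 = strip_common_suffix(a1, a2)
# The distance between the prefixes never exceeds the distance
# between the full strings, which is what Rule 1 states.
print(p1, p2, edit_distance(a1, a2), edit_distance(p1, p2))
```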

4
Rule 2 Let m be the length of B. If ed(A, B) ≤ k, then the length of A must be between m-k and m+k.

5
Rule 2: [FN2004], [NB99], [NB2000] and [TU93].
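Rule 2 is often used as a cheap pre-filter before computing any edit distance. A minimal sketch follows; the names `passes_length_filter` and `filter_candidates` are hypothetical helpers, not taken from the slides.

```python
def passes_length_filter(a: str, b: str, k: int) -> bool:
    """Necessary condition for ed(a, b) <= k: lengths differ by at most k."""
    return abs(len(a) - len(b)) <= k


def filter_candidates(candidates, b: str, k: int):
    """Keep only candidates that could still be within k edits of b."""
    return [a for a in candidates if passes_length_filter(a, b, k)]


# Only "abcd" has a length within 1 of len("abcde").
print(filter_candidates(["ab", "abcd", "abcdefg"], "abcde", 1))
```

Passing the filter does not imply ed(A, B) ≤ k; it only rules out candidates that cannot qualify.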

6
Rule 3 If a string S contains S_1 completely and the distance between S_1 and any substring of P is larger than k, then ed(S, P) > k.

7
Rule 3: [ALP2004].
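The quantity used in Rule 3, the smallest distance between S_1 and any substring of P, can be computed with a dynamic program whose first row is all zeros so that a match may start anywhere in P. The sketch below assumes unit-cost edit distance and reads the rule as "S contains S_1"; `min_substring_distance` and `edit_distance` are illustrative names.

```python
def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


def min_substring_distance(s: str, p: str) -> int:
    """Smallest ed(s, x) over all substrings x of p (including the empty one)."""
    prev = [0] * (len(p) + 1)  # zero row: x may start at any position of p
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, cp in enumerate(p, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != cp)))
        prev = curr
    return min(prev)  # x may end at any position of p


p, k = "abcde", 2
s1 = "zzq"
s = "ab" + s1 + "de"  # S contains S_1 completely
# S_1 is more than k edits away from every substring of P,
# so Rule 3 predicts ed(S, P) > k.
print(min_substring_distance(s1, p), edit_distance(s, p))
```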

8
Rule 4 For any substring S_1 in T, if there exists a substring S_2 in P to the left of S_1 such that ed(S_1, S_2) ≤ k, and S_2 is the rightmost such substring, then move P to align S_1 and S_2.
[Figure: T containing S_1, with P shown before and after the shift that aligns S_2 with S_1.]

9
Rule 4: [ALP2004].

10
Based upon Rule 3 and Rule 2, we have Rule 5: If the window size is m-k and there exists a substring S_1 in the window such that the distance between S_1 and any substring of P is larger than k, then we can safely move P.
[Figure: T with a window of size m-k containing S_1, and P shown before and after the move.]

11
If Rule 5 is not satisfied, it means the following: For every substring S_1 in T, there exists a substring S_2 in P such that ed(S_1, S_2) ≤ k.

12
Rule 5-1 If Rule 5 is not satisfied, we can only move P one step.
[Figure: T with a window of size m-k containing S_1, and P shown before and after the one-step move.]

13
Rule 5: [HN2005].
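The test in Rule 5 can be sketched as a right-to-left scan of the window that looks for a suffix S_1 whose distance to every substring of P exceeds k. This is only an illustration under stated assumptions: unit-cost edit distance, a quadratic scan rather than the bit-parallel technique of the cited paper, and hypothetical names (`min_substring_distance`, `shortest_bad_suffix_start`). How far P moves once S_1 is found is given by the slide's figure and is not reproduced here.

```python
def min_substring_distance(s: str, p: str) -> int:
    """Smallest ed(s, x) over all substrings x of p (including the empty one)."""
    prev = [0] * (len(p) + 1)  # a match may start anywhere in p
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, cp in enumerate(p, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != cp)))
        prev = curr
    return min(prev)


def shortest_bad_suffix_start(window: str, p: str, k: int):
    """Return the start index of the shortest window suffix S_1 that is more
    than k edits away from every substring of p, or None if there is none."""
    for start in range(len(window) - 1, -1, -1):
        if min_substring_distance(window[start:], p) > k:
            return start
    return None


p, k = "abcde", 1
# The window has size m - k = 4. The suffix "zz" is 2 edits from every
# substring of p, so Rule 5 applies; for "abcd" it does not (Rule 5-1).
print(shortest_bad_suffix_start("abzz", p, k))
print(shortest_bad_suffix_start("abcd", p, k))
```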

14
Rule 6 Hamming Distance(A, B) ≥ Edit Distance(A, B).

15
Rule 6: [AKLLLR2000], [FN2004] and [TU93].
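Rule 6 holds because every substitution-only alignment is also a valid edit script. A small sketch, assuming equal-length strings (the Hamming distance is undefined otherwise) and unit-cost edit distance; the function names are illustrative.

```python
def hamming_distance(a: str, b: str) -> int:
    """Number of positions at which equal-length strings differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(x != y for x, y in zip(a, b))


def edit_distance(a: str, b: str) -> int:
    """Unit-cost Levenshtein distance between a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]


a, b = "abcdef", "bcdefa"
# A one-position rotation: every position mismatches, yet one deletion
# plus one insertion is enough to turn a into b.
print(hamming_distance(a, b), edit_distance(a, b))
```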

16
Rule 7 For strings A and B, if there are k+1 characters in A which do not appear in B, then ed(A, B) > k.
Rule 7-1 Let A and B be two strings. Let there be k+1 characters a_1, a_2, …, a_{k+1} in A, where a_i is aligned with b_i in B. If no a_i appears in B[i-k, i+k], then ed(A, B) > k.

17
Rule 7: [TU93].
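Both forms of Rule 7 reduce to counting positions of A that cannot be matched. The sketch below assumes 0-based indexing, clips the range B[i-k, i+k] at the ends of B, and aligns position i of A with position i of B; the names `rule7_rejects` and `rule7_1_rejects` are hypothetical.

```python
def rule7_rejects(a: str, b: str, k: int) -> bool:
    """True if at least k+1 positions of a hold a character absent from b."""
    chars_of_b = set(b)
    unmatched = sum(ch not in chars_of_b for ch in a)
    return unmatched >= k + 1


def rule7_1_rejects(a: str, b: str, k: int) -> bool:
    """True if at least k+1 positions i of a hold a character that does not
    occur in b[i-k .. i+k] (inclusive, clipped to the bounds of b)."""
    unmatched = 0
    for i, ch in enumerate(a):
        if ch not in b[max(0, i - k): i + k + 1]:
            unmatched += 1
    return unmatched >= k + 1


# Rule 7: 'x', 'y' and 'z' never occur in "abc", so ed > 2.
print(rule7_rejects("axbycz", "abc", 2))
# Rule 7-1 is stronger: both strings use the same characters, but four
# positions of A have no partner within distance k = 1 in B.
print(rule7_rejects("aaabbb", "bbbaaa", 1), rule7_1_rejects("aaabbb", "bbbaaa", 1))
```

A `True` result means ed(A, B) > k is guaranteed; `False` is inconclusive.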

18
Rule 8 Let there be two strings A and B. Let B be divided into j pieces B_1, B_2, …, B_j. If ed(A, B) ≤ k, there is at least one piece B_i and a substring A_i in A such that ed(A_i, B_i) ≤ ⌊k/j⌋.

19
Rule 8-1 Let A and B be two strings. Let B be divided into j pieces B_1, B_2, …, B_j. If, for every B_i and every substring S of A, ed(S, B_i) > ⌊k/j⌋, then ed(A, B) > k.

20
Rule 8-2 Let A and B be two strings. Let the lengths of A and B be m+k and m respectively. Let B be divided into j pieces B_1, B_2, …, B_j. Let AP be a prefix of A. If, for every B_i and every substring S of A, ed(S, B_i) > ⌊k/j⌋, then ed(AP, B) > k.

21
Rule 8: [NB99] and [NB2000].
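Rule 8-1 can be turned into a filter: split B into j pieces and reject A if no piece occurs in A within the per-piece error budget. The sketch below assumes that budget is ⌊k/j⌋, as in the usual partitioning lemma for approximate matching; the threshold is not legible in the slide text, so treat it as an assumption. The helper names `split_into_pieces`, `min_substring_distance` and `partition_filter_rejects` are hypothetical.

```python
def min_substring_distance(s: str, p: str) -> int:
    """Smallest ed(s, x) over all substrings x of p (including the empty one)."""
    prev = [0] * (len(p) + 1)  # a match may start anywhere in p
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, cp in enumerate(p, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cs != cp)))
        prev = curr
    return min(prev)


def split_into_pieces(b: str, j: int):
    """Split b into j consecutive pieces whose lengths differ by at most 1."""
    base, extra = divmod(len(b), j)
    pieces, start = [], 0
    for i in range(j):
        end = start + base + (1 if i < extra else 0)
        pieces.append(b[start:end])
        start = end
    return pieces


def partition_filter_rejects(a: str, b: str, k: int, j: int) -> bool:
    """True if every piece of b is more than k // j edits away from every
    substring of a (assumed threshold), which implies ed(a, b) > k."""
    threshold = k // j
    return all(min_substring_distance(piece, a) > threshold
               for piece in split_into_pieces(b, j))


b = "abcdefgh"
# With k = 1 and j = 2 the budget per piece is 0, so the filter asks
# whether "abcd" or "efgh" occurs exactly somewhere in A.
print(partition_filter_rejects("abxdefxh", b, 1, 2))  # neither piece occurs
print(partition_filter_rejects("abcdefxh", b, 1, 2))  # "abcd" occurs
```

Choosing j = k+1 makes the budget zero, so the pieces can be located with exact string matching before any verification.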

22
Rule 9 Let A and B be two strings with lengths m+k and m respectively. Let A' be the prefix of A with length m-k. Let there be j characters a_1, a_2, …, a_j in A'. Let the number of times that a_i appears in A' and B be N(A', a_i) and N(B, a_i) respectively. Let C_i = N(A', a_i) - N(B, a_i). Let AP be any prefix of A. If [condition missing in the source], then ed(AP, B) > k.

23
Rule 9-1 Let A and B be two strings with lengths m+k and m respectively. Let there be j characters a_1, a_2, …, a_j in A. Let the number of times that a_i appears in A and B be N(A, a_i) and N(B, a_i) respectively. Let C_i = N(B, a_i) - N(A, a_i). Let AP be any prefix of A. If [condition missing in the source], then ed(AP, B) > k.

24
Rule 10 Let P and T be two strings with lengths m and n respectively. If P matches a substring P' of T at position i, then any substring S of T[i-k, i+m+k] may satisfy ed(S, P) ≤ k.
[Figure: P aligned with P' at position i in T; the region from i-k to i+m+k has length m+2k.]

25
Rule 10: [NB99].
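Rule 10 limits verification to a region of length m+2k around a candidate position. A minimal sketch of that verification step follows, assuming 0-based positions, a half-open region clipped to the text, and a Sellers-style dynamic program; `approximate_end_positions` and `verify_candidate` are illustrative names, and the candidate position i is supplied by some earlier filter.

```python
def approximate_end_positions(pattern: str, text: str, k: int):
    """End indices e (exclusive) such that some substring of text ending at e
    is within k edits of pattern."""
    m = len(pattern)
    col = list(range(m + 1))  # column for the empty text prefix
    ends = [0] if col[m] <= k else []
    for e, ct in enumerate(text, start=1):
        new = [0]  # a match may start at any text position
        for r in range(1, m + 1):
            new.append(min(col[r] + 1,                        # skip text char
                           new[r - 1] + 1,                    # skip pattern char
                           col[r - 1] + (pattern[r - 1] != ct)))
        col = new
        if col[m] <= k:
            ends.append(e)
    return ends


def verify_candidate(text: str, pattern: str, i: int, k: int):
    """Search only text[i-k : i+m+k] and report end positions in text."""
    lo = max(0, i - k)
    hi = min(len(text), i + len(pattern) + k)
    return [lo + e for e in approximate_end_positions(pattern, text[lo:hi], k)]


text, pattern = "xxxxabcdefxxxx", "abcdef"
# Candidate position 4 with k = 1: only text[3:11] is examined.
print(verify_candidate(text, pattern, 4, 1))
```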

26
Rule 11 Let P and Q be two strings. Let P be divided into pieces P_1, P_2, …, P_n. Let Q_i be the substring in Q for which ed(P_i, Q_i) is the smallest.
[Figure: the pieces P_1, P_2, …, P_n of P aligned with the substrings Q_1, Q_2, …, Q_n of Q.]
If [condition missing in the source]

27
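Application of Rule 11
[Figure: a window W in T containing substrings t_1, t_2, …, t_n, aligned with the pieces P_1, P_2, …, P_n.]
ed(t_i, P_i) is the smallest. If for some n, [condition missing in the source]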

28
[AKLLLR2000] Text Indexing and Dictionary Matching with One Error, Amir, A., Keselman, D., Landau, G. M., Lewenstein, M., Lewenstein, N. and Rodeh, M., Journal of Algorithms, Vol. 37, 2000, pp. 309-325.
[ALP2004] Faster Algorithms for String Matching with k Mismatches, Amir, A., Lewenstein, M. and Porat, E., Journal of Algorithms, Vol. 50, 2004, pp. 257-275.
[FN2004] Average-Optimal Multiple Approximate String Matching, Fredriksson, K. and Navarro, G., ACM Journal of Experimental Algorithmics, Vol. 9, Article No. 1.4, 2004, pp. 1-47.

29
[GG86] Improved String Matching with k Mismatches, Galil, Z. and Giancarlo, R., SIGACT News, Vol. 17, No. 4, 1986, pp. 52-54.
[H2005] Bit-parallel Approximate String Matching Algorithms with Transposition, Hyyrö, H., Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229.
[HHLS2006] Approximate String Matching Using Compressed Suffix Arrays, Huynh, T. N. D., Hon, W. K., Lam, T. W. and Sung, W. K., Theoretical Computer Science, Vol. 352, 2006, pp. 240-249.

30
[HN2005] Bit-parallel Witnesses and their Applications to Approximate String Matching, Hyyrö, H. and Navarro, G., Algorithmica, Vol. 41, No. 3, 2005, pp. 203-231.
[JB2000] Approximate String Matching Using Factor Automata, Holub, J. and Melichar, B., Theoretical Computer Science, Vol. 249, 2000, pp. 305-311.
[LV86] String Matching with k Mismatches by Using Kangaroo Method, Landau, G. M. and Vishkin, U., Theoretical Computer Science, Vol. 43, 1986, pp. 239-249.

31
[LV89] Fast Parallel and Serial Approximate String Matching, Landau, G. M. and Vishkin, U., Journal of Algorithms, Vol. 10, 1989, pp. 157-169.
[NB99] Very Fast and Simple Approximate String Matching, Navarro, G. and Baeza-Yates, R., Information Processing Letters, Vol. 72, 1999, pp. 65-70.
[NB2000] A Hybrid Indexing Method for Approximate String Matching, Navarro, G. and Baeza-Yates, R., Journal of Discrete Algorithms, Vol. 1, No. 1, 2000, pp. 205-239.

32
[S80] String Matching with Errors, Sellers, P. H., Journal of Algorithms, Vol. 20, No. 1, 1980, pp. 359-373.
[TU93] Approximate Boyer-Moore String Matching, Tarhio, J. and Ukkonen, E., SIAM Journal on Computing, Vol. 22, No. 2, 1993, pp. 243-260.
[WM92] Fast Text Searching: Allowing Errors, Wu, S. and Manber, U., Communications of the ACM, Vol. 35, 1992, pp. 83-91.
