Presentation is loading. Please wait.

Presentation is loading. Please wait.

1 Bit-parallel approximate string matching algorithms with transposition Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229 Advisor:

Similar presentations


Presentation on theme: "1 Bit-parallel approximate string matching algorithms with transposition Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229 Advisor:"— Presentation transcript:

1 1 Bit-parallel approximate string matching algorithms with transposition Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229 Advisor: Prof. R. C. T. Lee Reporter: Z. H. Pan

2 2 This paper is based on Fast text searching allowing errors which there are three operation which are insertion, deletion and substitution. This paper uses a new operation transposition. We will use those operations to solve the approximate string matching problem under bit-parallel. Transposition: It is transposing two adjacent characters in the string. Example: Insertion: cat cast Deletion: cat at Substitution:cat car Transposition:cta cat

3 3 But now, first we will use the three operations insertion, deletion and substitution to apply in the approximate string matching problem. Then the transposition will be introduced after the three operations.

4 4 Our approximate string matching problem is defined as follows: We are given a text T 1,n, a pattern P 1,m and an error bound k. We want to find all locations T i when 1 i n such that there exists a suffix A of T 1,i and d(A,P) k where d(x,y) is the edit distance (or difference) between x and y. Example: T=deaabeg, P=aabac and k=2. For i=5. T 1,5 = deaab. We note that there exists a suffix A=aab of T 1,5 such that d(A,P)=d(aab,aabac)=2.

5 5 T P S2S2 Let S be a substring of T. If there exists a suffix S 2 of S and a suffix P 2 of P such that d(S 2, P 2 ) = 0, and d(S 1, P 1 ) k, we have d(S, P) k. S1S1 S P1P1 P2P2 Our approach is based upon the following observation: Case 1 :

6 6 Example: A=addcd and B=abcd. k=2. We may decompose A and B as follows: A=add+cd. B=ab+cd. d(add,ab)=2. Thus d(A,B)=2.

7 7 T P S2S2 Let S be a substring of T. If there exists a suffix S 2 of S and a suffix P 2 of P such that d(S 2, P 2 ) = 1, and d(S 1, P 1 ) k-1, we have d(S, P) k. S1S1 S P1P1 P2P2 Case 2 :

8 8 Example: A=addb and B=dc. k=3. We may decompose A and B as follows: A=add+b. B=d+c. d(b,c)=1 d(add,d)=2. Thus d(A,B)=3.

9 9 To solve our approximate string matching problem, we use a table, called. where 1 i n and 1 j m. 1100011000 a a b a a c a a b a c a b 1110011100 1111011110 1111111111 1111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 1110111101 1101011010 1110011100 1111011110 1111111111 1101111011 1100111001 1110011100 Example: T:aabaacaabacab, P:aabac and k=1. Consider i=9, j=4. T 1,9 =aabaacaab P 1,4 =aaba A=T 7,9 =aab d(A,P 1,4 )=d(aab,aaba)=1 = 1 if there exists a suffix A of T 1,i such that d(A,P 1,j ) k. = 0 otherwise.

10 10 It will be shown later that is obtained from. Thus, it is important for us to know how to find. will be obtained by Shift-And algorithm later under bit-parallel. But now, we will explain it in dynamic programming [S80](String Matching with Errors). can be found by dynamic programming. for all i. for all j. = 1 ifand t i =p j. = 0 if otherwise. From the above, we can see that the ith column of can be obtained from the (i-1)th column of, t i and p j. 111111111 1 2 3 4 5 6 7 8 9 aabacaabac 1234512345 a a b a a c a a b 100000100000 Example: Text = aabaacaab pattern = aabac

11 11 For instance, the 9th-colume is 1234512345 Example: T : aabaacaabacab, P : aabac and k=0. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 The 8th-colume is 1234512345 because and t 9 p 3b. because and t 9p 2. But, we do not use dynamic programming approach because we want to use the bit-parallel approach.

12 12 Let us denote the ith column of by whose length equals to m. For instance =( 1,0,0,1,0 ) =( 1,1,0,0,0 ) To find we consider two cases: Case 1: j=1 If t i =p j, otherwise, Example: T : aabaacaabacab, P : aabac and k=0. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 Case 2: j>1 If and t i =p j, otherwise,

13 13 We now define a right shift operation >>i on a vector A=(a 1,a 2,…,a m ) as follows: That is, we delete the last i elements and add i 0s at the front. For instance, A=(1,0,0,1,1,1). Then A>>1=(0,1,0,0,1,1). We use & and v to represent AND and OR operation. We further use to denote For example

14 14 Consider, m=5. =( 1,0,0,1,0 ) Example: T : aabaacaabacab, P : aabac and k=0. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345

15 15 Given an alphabet α and a pattern P, we need to know where α appears in P. This information is contained in a vector Σ(α) which is defined as follows: Σ(α)[i]=1 if p i =α. Σ(α)[i]=0 if otherwise. Example: P : aabac Σ(a) =( 1,1,0,1,0 ) Σ(b) =( 0,0,1,0,0 ) Σ(c) =( 0,0,0,0,1 )

16 16 Σ(a) =( 1,1,0,1,0 ) and p 4 =t 10 =a Since and p 4 =t 10, we have. Since, we have. Using the following rule: Example: T : aabaacaabacab, P : aabac and k=0. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 Case 1: j=1 If t i =p j, otherwise, Case 2: j>1 If and t i =p j, otherwise,

17 17 Similarly, we can conclude that. Example: T : aabaacaabacab, P : aabac and k=0. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 =((0,0,0,1,0)v(1,0,0,0,0))&(1,1,0,1,0) =(1,0,0,1,0) Σ[t 10 ] Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000

18 18 Shift-And In general we will update the i-th column of, which is, by using the formula for each new character of i. If, report a match at position i.

19 19 j=4 aabaaca 1234567 abac 89 ab... a aa aaba aabac j=1 j=2 j=3 j=5 aab Example: Text = aabaacaabacab pattern = aabac when i=1 10000 11010 10000 & Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 110000110000 111111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 100000100000 The Shift-And formula Σ(a) The operation & will examine whether the Σ(t i ) equals to the p j or not currently.

20 20 j=4 aabaaca 1234567 abac 89 ab... a aa aaba aabac j=1 j=2 j=3 j=5 aab 110000110000 11111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 100000100000 111000111000 11000 11010 11000 & The Shift-And formula Example: Text = aabaacaabacab pattern = aabac when i=2 Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(a)

21 21 j=4 aabaaca 1234567 abac 89 ab... a aa aaba aabac j=1 j=2 j=3 j=5 aab 111000111000 1111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 100000100000 100100100100 11100 00100 & 110000110000 The Shift-And formula Example: Text = aabaacaabacab pattern = aabac when i=3 Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(b)

22 22 j=4 aabaaca 1234567 abac 89 ab... a aa aaba aabac j=1 j=2 j=3 j=5 aab 100100100100 111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 100000100000 110010110010 10010 11010 10010 & 110000110000 111000111000 The Shift-And formula Example: Text = aabaacaabacab pattern = aabac when i=4 Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(a)

23 23 j=4 aabaaca 1234567 abac 89 ab... a aa aaba aabac j=1 j=2 j=3 j=5 aab 110010110010 11111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 100000100000 111000111000 11001 11010 11000 & 110000110000 111000111000 100100100100 The Shift-And formula Example: Text = aabaacaabacab pattern = aabac when i=5 Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(a)

24 24 Example: Text = aabaacaabacab pattern = aabac 110000110000 111000111000 100100100100 110010110010 111000111000 100000100000 110000110000 111000111000 100100100100 110010110010 100001100001 110000110000 100000100000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b aabac 100000100000 Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000

25 25 Approximate String Matching The approximate string matching problem is exact string matching problem with errors. We can use like Shift-And to achieve the table. Here, we will introduce three operation which are insertion, deletion and substitution in approximate string matching. Let, and denote therelated to insertion, deletion and substitution respectively., and indicate that we can perform an insertion, deletion and substitution respectively without violating the error bound which is k. where 1 i n and 1 j m. = 1 if there exists a suffix A of T 1,i such that d(A,P 1,j ) k. = 0 otherwise.

26 26 If t i =p j, and Otherwise, Example: T:aabaacaabacab, P:aabac and k=1. Consider i=6 and j=5. S=T 1,6 =aabaac and P 1,5 =aabac t 6 =p 5 =c and R 1 (5,4)=1. There exists a suffix A of S such that d(A,P) 1. T:P:T:P: aabaa aaba i j i-1 c c j-1

27 27 if t ip j and otherwise. T:P:T:P: aabac b i j i-1 Example: when k=1. T:P:T:P: aabac b b insertion i j i-1 Insertion

28 28 Example: Text = aabaacaabacab. Pattern = aabac. k=1. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 (1) When i=6 and j=4. t 6 =cp 4 =a, (2) When i=11 and j=4. t 11 =cp 4 =a, 1000010000 a a b a a c a a b a c a b 1100011000 1110011100 1111011110 1101011010 1100111001 1100011000 1100011000 1110011100 1111011110 1001110011 1100111001 1010010100 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 if t ip j and otherwise.

29 29 Example: Text = aabaacaabacab, Pattern = aabac and k=1 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 The insertion formula: 11101 00001 11000 11001 Σ(c) &v&v aabaacaabacab aabaac insertion 11110 11010 00100 11110 Σ(a) &v&v aabaacaabacab aaba insertion 1000010000 a a b a a c a a b a c a b 1100011000 1110011100 1111011110 1101011010 1100111001 1100011000 1100011000 1110011100 1111011110 1001110011 1100111001 1010010100 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000

30 30 T:P:T:P: aabac b deletion i jj-1 Example: when k=1. T:P:T:P: aabac b i jj-1 Deletion if t ip j and otherwise.

31 31 Example: Text = aabaacaabacab. Pattern = aabac. k=1. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 (1) When i=6 and j=4. t 6 =cp 4 =a, (2) When i=3 and j=4. t 3 =bp 4 =a, 1100011000 a a b a a c a a b a c a b 1110011100 1011010110 1101111011 1110011100 1000010000 1100011000 1110011100 1011010110 1101111011 1000110001 1100011000 1010010100 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 if t ip j and otherwise.

32 32 Example: Text = aabaacaabacab, Pattern = aabac and k=1. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 The deletion formula: 1100011000 a a b a a c a a b a c a b 1110011100 1011010110 1101111011 1110011100 1000010000 1100011000 1110011100 1011010110 1101111011 1000110001 1100011000 1010010100 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 11100 00100 10000 10100 Σ(b) &v&v aabaacaabacab aab deletion 11110 00100 10010 10110 Σ(b)&v&v aabaacaabacab aaba deletion Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000

33 33 T:P:T:P: aabac a b T:P:T:P: b substitution b i j i-1 j-1 i j i-1 j-1 Example: when k=1. Substitution if t ip j and otherwise.

34 34 Example: Text = aabaacaabacab. Pattern = aabac. k=1. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 (1) When i=6 and j=4. t 6 =cp 4 =a, (2) When i=5 and j=5. t 3 =bp 4 =a, 1000010000 a a b a a c a a b a c a b 1100011000 1110011100 1101011010 1100111001 1110011100 1101011010 1100011000 1110011100 1101011010 1100111001 1100011000 1110011100 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 if t ip j and otherwise.

35 35 Example: Text = aabaacaabacab, Pattern = aabac and k=1. 1000010000 a a b a a c a a b a c a b 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 The substitution formula: 1000010000 a a b a a c a a b a c a b 1100011000 1110011100 1101011010 1100111001 1110011100 1101011010 1100011000 1110011100 1101011010 1100111001 1100011000 1110011100 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 11100 00100 11000 11100 &v&v aabaacaabacab aab aabaacaabacab cab 11101 11010 11000 11001 &v&v aabaacaabacab aabac aabaacaabacab aabaa Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(b) Σ(a)

36 36 The algorithm which contains insertion, deletion and substitution allows k error. In the physical meaning, we just combine the formulas of insertion, deletion and substitution. So now, we combine the three formula using the or operation as follows: Initially, insertion deletion substitution The insertion : The deletion : The substitution : t i =p j and

37 37 Example: Text = aabaacaabacab pattern = aabac k=1 1000010000 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 01111 11010 01010 00100 00010 01001 10000 11111 &v&v 1100011000 a a b a a c a a b a c a b 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 1111111111 aabaacaabacab aabac for deletion Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(a)

38 38 Example: Text = aabaacaabacab pattern = aabac k=1 1000010000 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 01111 11010 01010 10010 01001 01100 10000 11111 &v&v aabaacaabacab aabaa 1100011000 a a b a a c a a b a c a b 1110011100 1111011110 1111111111 1111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 for insertion Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(a)

39 39 Example: Text = aabaacaabacab pattern = aabac k=1 1000010000 1100011000 0010000100 1001010010 1100011000 0000000000 1000010000 1100011000 0010000100 1001010010 0000100001 1000010000 0000000000 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 a a b a a c a a b a c a b 01111 00001 11000 01100 00000 10000 11101 &v&v aabaacaabacab aab 1100011000 a a b a a c a a b a c a b 1110011100 1111011110 1111111111 1111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 1110111101 aabaacaabacab aac for substitution Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000 Σ(c)

40 40 1100011000 a a b a a c a a b a c a b 1110011100 1111011110 1111111111 1111111111 1 2 3 4 5 6 7 8 9 10111213 aabacaabac 1234512345 1110111101 1101011010 1110011100 1111011110 1111111111 1101111011 1100111001 1110011100 Example: Text = aabaacaabacab pattern = aabac k=1 Σ(a) Σ(b) Σ(c) * 11010 00100 00001 00000

41 41 Transposition Example: A=badec and B=bdace. If transposition is allowed, we may transpose B as follow: B=bdace A=badec Define for transposition: if& t i =p j-1 & t i-1 =p j otherwise.

42 42 Initially, We combine the operations of insertion, deletion, substitution and transposition for k error. The formula is as follows: insertion deletion substitution t i =p j and transposition V V V V V

43 43 Example: T=abcbcaccbb, P=acbcb and k=1. 1000010000 0000000000 0000000000 0000000000 0000000000 1000010000 0100001000 0000000000 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 0000000000 0000000000 1100011000 1110011100 1111011110 1010110101 1101011010 1100011000 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1011110111 1000110001 01110 01010 00000 10000 00100 11110 & v the original partial text: abc acb transpose bc to be cb b a c 00100 00101 00100 & a bc bcaccbb a cb a bc bcaccbb a bc k 1k 0 So we can tackle this within k 1. + Σ(c) Σ(b) Σ(c)>>1 Σ(a) Σ(b) Σ(c) * 10000 00101 01010 00000

44 44 Example: T=abcbcaccbb, P=acbcb and k=1. 1000010000 0000000000 0000000000 0000000000 0000000000 1000010000 0100001000 0000000000 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 0000000000 0000000000 1100011000 1110011100 1111011110 1010110101 1101011010 1100011000 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1011110111 1000110001 01111 00101 00000 10000 00010 10111 & v the original partial text: transpose cb to be bc 00010 01010 00010 & ccb cbc c c b abcbcac cb b ac bc a a a k 1k 0 + So we can tackle this within k 1. abcbcac cb b ac cb Σ(b) Σ(c) Σ(b)>>1 Σ(a) Σ(b) Σ(c) * 10000 00101 01010 00000

45 45 Example: T=abcbcaccbb, P=acbcb and k=2. 1100011000 1110011100 1111011110 1010110101 1101011010 1100011000 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1011110111 1000110001 1110011100 1111011110 1111111111 1111111111 1111111111 1111111111 1111011110 1111111111 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1111111111 1111111111 01111 01010 11100 01110 01111 10000 00100 11111 & v 00110 00101 00100 & a bc bcaccbb a cb the original partial text: abc acb transpose bc to be cb b a c k 1 So we can tackle this within k 2. + a bc bcaccbb a bc Σ(c) Σ(b) Σ(c)>>1 Σ(a) Σ(b) Σ(c) * 10000 00101 01010 00000

46 46 Example: T=abcbcaccbb, P=acbcb and k=2. 1100011000 1110011100 1111011110 1010110101 1101011010 1100011000 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1011110111 1000110001 1110011100 1111011110 1111111111 1111111111 1111111111 1111111111 1111011110 1111111111 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1111111111 1111111111 01111 00101 11110 01111 01010 10000 00010 11111 & v the original partial text: bcb bbc transpose cb to be bc c b b 00111 01010 00010 & a a a ab cb caccbb ac bc ab cb caccbb ab bc k 1 + So we can tackle this within k 2. Σ(b) Σ(c) Σ(b)>>1 Σ(a) Σ(b) Σ(c) * 10000 00101 01010 00000

47 47 Example: T=abcbcaccbb, P=acbcb and k=2. 1100011000 1110011100 1111011110 1010110101 1101011010 1100011000 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1011110111 1000110001 1110011100 1111011110 1111111111 1111111111 1111111111 1111111111 1111011110 1111111111 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1111111111 1111111111 01111 01010 10101 01010 01101 10000 00101 11111 & v 00111 00101 & the original partial text: cbc ccb transpose cb to be bc b c c abc bc accbb c bc abc bc accbb a cb k 1 + So we can tackle this within k 2. Σ(c) Σ(b) Σ(c)>>1 Σ(a) Σ(b) Σ(c) * 10000 00101 01010 00000

48 48 Example: T=abcbcaccbb, P=acbcb and k=2. 1100011000 1110011100 1111011110 1010110101 1101011010 1100011000 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1011110111 1000110001 1110011100 1111011110 1111111111 1111111111 1111111111 1111111111 1111011110 1111111111 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1111111111 1111111111 01111 01010 10101 01010 01101 10000 00101 11111 & v 00111 00101 & the original partial text: cbc ccb transpose cb to be bc b c c abc bc accbb abc bc abc bc accbb acb cb k 1 + So we can tackle this within k 2. b b b a a a Σ(c) Σ(b) Σ(c)>>1 Σ(a) Σ(b) Σ(c) * 10000 00101 01010 00000

49 49 Example: T=abcbcaccbb, P=acbcb and k=2. 1100011000 1110011100 1111011110 1010110101 1101011010 1100011000 1110011100 1111011110 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1011110111 1000110001 1110011100 1111011110 1111111111 1111111111 1111111111 1111111111 1111011110 1111111111 1 2 3 4 5 6 7 8 9 10 acbcbacbcb 1234512345 a b c b c a c c b b 1111111111 1111111111 01111 00101 10101 01010 01101 10000 00010 11111 & v 00111 01010 00010 & the original partial text: cbc ccb transpose cb to be bc b c c abcbcac cb b ac bc k 1 + So we can tackle this within k 2. b b b a a a Σ(b) Σ(c) Σ(b)>>1 Σ(a) Σ(b) Σ(c) * 10000 00101 01010 00000

50 50 Time complexity k: error value m: the length of pattern n : the length of text w : the size of word of computer

51 51 Reference [WM92] Fast text searching: allowing errors, Wu and Udi Manber, Communications of the ACM, Vol. 35, 1992, pp. 83-91

52 52 ~ Thanks for your attention ~


Download ppt "1 Bit-parallel approximate string matching algorithms with transposition Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229 Advisor:"

Similar presentations


Ads by Google