1 Bit-parallel approximate string matching algorithms with transposition Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229 Advisor:


1 Bit-parallel approximate string matching algorithms with transposition. Heikki Hyyrö, Journal of Discrete Algorithms, Vol. 3, 2005, pp. 215-229. Advisor: Prof. R. C. T. Lee. Reporter: Z. H. Pan.

2 This paper builds on "Fast text searching allowing errors" [WM92], which supports three edit operations: insertion, deletion and substitution. This paper adds a new operation, transposition. We will use these operations to solve the approximate string matching problem with bit-parallelism.
Transposition: swapping two adjacent characters in the string.
Example:
Insertion: cat → cast
Deletion: cat → at
Substitution: cat → car
Transposition: cta → cat

3 We first apply the three operations insertion, deletion and substitution to the approximate string matching problem. Transposition is introduced afterwards.

4 Our approximate string matching problem is defined as follows: we are given a text $T_{1,n}$, a pattern $P_{1,m}$ and an error bound $k$. We want to find all positions $i$, $1 \le i \le n$, such that there exists a suffix $A$ of $T_{1,i}$ with $d(A,P) \le k$, where $d(x,y)$ is the edit distance (or difference) between $x$ and $y$.
Example: T = deaabeg, P = aabac and k = 2. For i = 5, $T_{1,5}$ = deaab. There exists a suffix A = aab of $T_{1,5}$ such that d(A,P) = d(aab, aabac) = 2.
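The problem definition above can be checked with a plain dynamic-programming reference, which is useful later for validating the bit-parallel versions. This is a minimal sketch, not the method of the paper: the function name `approx_end_positions` is an assumption made for illustration, and it uses the classical column-by-column edit-distance recurrence in which the empty pattern prefix matches at every text position.

```python
def approx_end_positions(text, pattern, k):
    """Return the 1-based positions i such that some suffix of text[:i]
    is within edit distance k of pattern (insertion, deletion, substitution)."""
    m = len(pattern)
    # col[j] = smallest edit distance between pattern[:j] and any suffix of text[:i]
    col = list(range(m + 1))
    hits = []
    for i, t in enumerate(text, start=1):
        prev_diag = col[0]   # value for (i-1, j-1)
        col[0] = 0           # the empty prefix matches the empty suffix
        for j in range(1, m + 1):
            cost = 0 if pattern[j - 1] == t else 1
            cur = min(prev_diag + cost,   # match or substitution
                      col[j] + 1,         # extra text character
                      col[j - 1] + 1)     # missing pattern character
            prev_diag = col[j]
            col[j] = cur
        if col[m] <= k:
            hits.append(i)
    return hits


print(approx_end_positions("deaabeg", "aabac", 2))
```

For the slide's example, position 5 is among the reported positions, because the suffix aab of deaab is at distance 2 from aabac.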

5 Our approach is based upon the following observation.
Case 1: Let S be a substring of T, with S = $S_1 S_2$ and P = $P_1 P_2$. If there exists a suffix $S_2$ of S and a suffix $P_2$ of P such that $d(S_2, P_2) = 0$ and $d(S_1, P_1) \le k$, we have $d(S, P) \le k$.
[Figure: T and P, with S split into $S_1$, $S_2$ and P split into $P_1$, $P_2$.]

6 Example: A = addcd and B = abcd, k = 2. We may decompose A and B as follows: A = add + cd, B = ab + cd. d(add, ab) = 2 and d(cd, cd) = 0. Thus d(A,B) = 2.

7 Case 2: Let S be a substring of T. If there exists a suffix $S_2$ of S and a suffix $P_2$ of P such that $d(S_2, P_2) = 1$ and $d(S_1, P_1) \le k-1$, we have $d(S, P) \le k$.
[Figure: T and P, with S split into $S_1$, $S_2$ and P split into $P_1$, $P_2$.]

8 Example: A = addb and B = dc, k = 3. We may decompose A and B as follows: A = add + b, B = d + c. d(b, c) = 1 and d(add, d) = 2. Thus d(A,B) = 3.

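9 To solve our approximate string matching problem, we use a table called $R^k(i,j)$, where $1 \le i \le n$ and $1 \le j \le m$.
$R^k(i,j) = 1$ if there exists a suffix A of $T_{1,i}$ such that $d(A, P_{1,j}) \le k$.
$R^k(i,j) = 0$ otherwise.
Example: T = aabaacaabacab, P = aabac and k = 1. Consider i = 9, j = 4. $T_{1,9}$ = aabaacaab and $P_{1,4}$ = aaba. With A = $T_{7,9}$ = aab we have $d(A, P_{1,4})$ = d(aab, aaba) = 1, so $R^1(9,4) = 1$.
[Figure: the table for this example, with the text along one axis and the pattern along the other.]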

10 It will be shown later that $R^k$ is obtained from $R^{k-1}$. Thus, it is important for us to know how to find $R^0$. $R^0$ will be obtained by the Shift-And algorithm later, using bit-parallelism. For now, we explain it with dynamic programming [S80] (String Matching with Errors).
$R^0(i,j)$ can be found by dynamic programming:
$R^0(i,0) = 1$ for all i.
$R^0(0,j) = 0$ for all j.
$R^0(i,j) = 1$ if $R^0(i-1,j-1) = 1$ and $t_i = p_j$.
$R^0(i,j) = 0$ otherwise.
From the above, we can see that the ith column of $R^0$ can be obtained from the (i-1)th column of $R^0$, $t_i$ and $p_j$.
Example: Text = aabaacaab, pattern = aabac.
[Figure: the $R^0$ table for this example.]
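The dynamic-programming definition of the exact-matching table can be written out directly. The following is a small sketch under the assumption that row 0 and column 0 hold the boundary values (1 for the empty pattern prefix, 0 for a non-empty prefix against the empty text); the name `r0_table` is hypothetical.

```python
def r0_table(text, pattern):
    """R[i][j] == 1 iff pattern[:j] is a suffix of text[:i] (exact matching)."""
    n, m = len(text), len(pattern)
    R = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        R[i][0] = 1                      # the empty prefix always matches
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # extend a match of pattern[:j-1] ending at i-1 by one equal character
            if R[i - 1][j - 1] == 1 and text[i - 1] == pattern[j - 1]:
                R[i][j] = 1
    return R


R = r0_table("aabaacaab", "aabac")
print(R[8][1:], R[9][1:])   # columns 8 and 9, entries j = 1..m
```

Each column depends only on the previous column and the current text character, which is what makes the bit-vector formulation possible.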

11 Example: T = aabaacaabacab, P = aabac.
For instance, the 9th column is (0,0,1,0,0) and the 8th column is (1,1,0,0,0).
$R^0(9,3) = 1$ because $R^0(8,2) = 1$ and $t_9 = p_3$ = b.
$R^0(9,2) = 0$ because $R^0(8,1) = 1$ and $t_9 \ne p_2$.
However, we do not use the dynamic programming approach because we want to use the bit-parallel approach.
[Figure: the $R^0$ table with columns 8 and 9 marked.]

12 Let us denote the ith column of $R^0$ by $R^0_i$, a vector of length m. For instance, in the example T = aabaacaabacab, P = aabac, $R^0_{10}$ = (1,0,0,1,0) and $R^0_8$ = (1,1,0,0,0).
To find $R^0_i$ we consider two cases:
Case 1: j = 1. If $t_i = p_j$, $R^0_i[1] = 1$; otherwise, $R^0_i[1] = 0$.
Case 2: j > 1. If $R^0_{i-1}[j-1] = 1$ and $t_i = p_j$, $R^0_i[j] = 1$; otherwise, $R^0_i[j] = 0$.
[Figure: the $R^0$ table for this example.]

13 We now define a right shift operation >>i on a vector A = $(a_1, a_2, \ldots, a_m)$ as follows: $A \gg i = (0, \ldots, 0, a_1, \ldots, a_{m-i})$. That is, we delete the last i elements and add i 0s at the front. For instance, A = (1,0,0,1,1,1). Then A>>1 = (0,1,0,0,1,1).
We use $\&$ and $\lor$ to represent the AND and OR operations. We further use $10^{m-1}$ to denote the vector (1,0,…,0) of length m. For example, $10^4$ = (1,0,0,0,0).
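The vector operations just defined can be sketched on plain tuples, which keeps the element order identical to the slides (position 1 first). The helper names `shift_right`, `vec_and` and `vec_or` are assumptions made for this illustration.

```python
def shift_right(vec, i=1):
    """Drop the last i elements and add i zeros at the front."""
    i = min(i, len(vec))
    return (0,) * i + tuple(vec[:len(vec) - i])


def vec_and(a, b):
    """Element-wise AND of two equal-length 0/1 vectors."""
    return tuple(x & y for x, y in zip(a, b))


def vec_or(a, b):
    """Element-wise OR of two equal-length 0/1 vectors."""
    return tuple(x | y for x, y in zip(a, b))


print(shift_right((1, 0, 0, 1, 1, 1)))   # the slide's example A>>1
```

In a machine word the same shift moves bits toward higher pattern positions; whether that is a left or right shift in code depends on which end of the word holds position 1.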

14 Consider m = 5 and the column (1,0,0,1,0) of $R^0$.
Example: T = aabaacaabacab, P = aabac.
[Figure: the $R^0$ table for this example.]

15 Given a character α of the alphabet and a pattern P, we need to know where α appears in P. This information is contained in a vector Σ(α), which is defined as follows:
Σ(α)[i] = 1 if $p_i$ = α.
Σ(α)[i] = 0 otherwise.
Example: P = aabac
Σ(a) = (1,1,0,1,0)
Σ(b) = (0,0,1,0,0)
Σ(c) = (0,0,0,0,1)
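The occurrence vectors can be built in one pass over the pattern. This is a minimal sketch using tuples in the same left-to-right order as the slide; the function name `sigma` is an assumption, and characters that do not occur in the pattern are simply absent from the returned dictionary (their vector would be all zeros).

```python
def sigma(pattern):
    """Map each character of pattern to its 0/1 occurrence vector."""
    return {ch: tuple(1 if p == ch else 0 for p in pattern)
            for ch in set(pattern)}


masks = sigma("aabac")
print(masks["a"], masks["b"], masks["c"])
```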

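16 Example: T = aabaacaabacab, P = aabac.
Σ(a) = (1,1,0,1,0) and $p_4 = t_{10}$ = a.
Since $R^0(9,3) = 1$ and $p_4 = t_{10}$, we have $R^0(10,4) = 1$.
Since $t_{10} = p_1$, we have $R^0(10,1) = 1$.
These follow from the rule:
Case 1: j = 1. If $t_i = p_j$, $R^0_i[1] = 1$; otherwise, $R^0_i[1] = 0$.
Case 2: j > 1. If $R^0_{i-1}[j-1] = 1$ and $t_i = p_j$, $R^0_i[j] = 1$; otherwise, $R^0_i[j] = 0$.
[Figure: the $R^0$ table with columns 9 and 10 marked.]

17 Similarly, we can conclude that $R^0_i = ((R^0_{i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)$.
Example: T = aabaacaabacab, P = aabac.
$R^0_{10} = ((R^0_9 \gg 1) \lor 10^4) \mathbin{\&} \Sigma(t_{10})$ = ((0,0,0,1,0) ∨ (1,0,0,0,0)) & (1,1,0,1,0) = (1,0,0,1,0), where Σ($t_{10}$) = Σ(a).
[Figure: the $R^0$ table together with Σ(a), Σ(b), Σ(c) and Σ(*).]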

18 Shift-And. In general, we update the i-th column of $R^0$, which is $R^0_i$, by using the formula $R^0_i = ((R^0_{i-1} \gg 1) \lor 10^{m-1}) \mathbin{\&} \Sigma(t_i)$ for each new text character $t_i$. If $R^0_i[m] = 1$, report a match at position i.
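The Shift-And update can be sketched with Python integers as bit vectors. One convention here is an assumption of this sketch and differs from the slides' drawing: pattern position j is stored in bit j-1 (least significant bit first), so the slides' right shift followed by OR with the leading 1 becomes `(r << 1) | 1`. The function name `shift_and` is also hypothetical.

```python
def shift_and(text, pattern):
    """Return the 1-based end positions of exact occurrences of pattern in text."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    masks = {}                                # occurrence masks, bit j-1 <-> p_j
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    accept = 1 << (m - 1)                     # bit for the full pattern
    r = 0                                     # column for the empty text
    hits = []
    for i, ch in enumerate(text, start=1):
        # extend every matching prefix by one character, start a new match,
        # then keep only the positions whose pattern character equals ch
        r = ((r << 1) | 1) & masks.get(ch, 0)
        if r & accept:
            hits.append(i)
    return hits


print(shift_and("aabaacaabacab", "aabac"))
```

Because each step ANDs with a mask of at most m bits, the state never grows beyond m bits, so for m up to the word size one step is a constant number of word operations.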

19 The Shift-And formula. Example: Text = aabaacaabacab, pattern = aabac, when i = 1, using Σ(a). The & operation checks, for each j, whether $t_i$ equals $p_j$.
[Figure: the $R^0$ table and the pattern prefixes for j = 1, …, 5.]

20 The Shift-And formula. Example: Text = aabaacaabacab, pattern = aabac, when i = 2, using Σ(a).
[Figure: the $R^0$ table and the pattern prefixes for j = 1, …, 5.]

21 The Shift-And formula. Example: Text = aabaacaabacab, pattern = aabac, when i = 3, using Σ(b).
[Figure: the $R^0$ table and the pattern prefixes for j = 1, …, 5.]

22 The Shift-And formula. Example: Text = aabaacaabacab, pattern = aabac, when i = 4, using Σ(a).
[Figure: the $R^0$ table and the pattern prefixes for j = 1, …, 5.]

23 The Shift-And formula. Example: Text = aabaacaabacab, pattern = aabac, when i = 5, using Σ(a).
[Figure: the $R^0$ table and the pattern prefixes for j = 1, …, 5.]

24 Example: Text = aabaacaabacab, pattern = aabac.
[Figure: the completed $R^0$ table, with the occurrence of aabac marked, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

25 Approximate String Matching. The approximate string matching problem is the exact string matching problem with errors allowed. We can compute the table $R^k$ in a way similar to Shift-And. Here, we introduce three operations in approximate string matching: insertion, deletion and substitution.
Let $R^k_I(i,j)$, $R^k_D(i,j)$ and $R^k_S(i,j)$ denote the $R^k(i,j)$ related to insertion, deletion and substitution respectively. $R^k_I(i,j) = 1$, $R^k_D(i,j) = 1$ and $R^k_S(i,j) = 1$ indicate that we can perform an insertion, deletion and substitution respectively without violating the error bound k.
Recall that, for $1 \le i \le n$ and $1 \le j \le m$:
$R^k(i,j) = 1$ if there exists a suffix A of $T_{1,i}$ such that $d(A, P_{1,j}) \le k$.
$R^k(i,j) = 0$ otherwise.

26 If $t_i = p_j$ and $R^k(i-1,j-1) = 1$, then $R^k(i,j) = 1$. Otherwise, we must consider insertion, deletion and substitution.
Example: T = aabaacaabacab, P = aabac and k = 1. Consider i = 6 and j = 5. S = $T_{1,6}$ = aabaac and $P_{1,5}$ = aabac. $t_6 = p_5$ = c and $R^1(5,4) = 1$. Hence there exists a suffix A of S such that $d(A,P) \le 1$.
[Figure: T = aabaa|c aligned with P = aaba|c at positions i-1, i and j-1, j.]

27 Insertion. $R^k_I(i,j) = 1$ if $t_i \ne p_j$ and $R^{k-1}(i-1,j) = 1$; $R^k_I(i,j) = 0$ otherwise.
Example: when k = 1.
[Figure: T and P aligned at positions i-1, i and j, with the inserted text character b marked.]

28 Example: Text = aabaacaabacab, Pattern = aabac, k = 1. Recall that $R^k_I(i,j) = 1$ if $t_i \ne p_j$ and $R^{k-1}(i-1,j) = 1$, and 0 otherwise.
(1) When i = 6 and j = 4: $t_6$ = c ≠ $p_4$ = a and $R^0(5,4) = 0$, so $R^1_I(6,4) = 0$.
(2) When i = 11 and j = 4: $t_{11}$ = c ≠ $p_4$ = a and $R^0(10,4) = 1$, so $R^1_I(11,4) = 1$.
[Figure: the tables for this example with the two cells marked.]

29 Example: Text = aabaacaabacab, Pattern = aabac. The insertion formula.
[Figure: two worked applications of the insertion formula, one using Σ(c) on the text prefix aabaac and one using Σ(a) on the pattern prefix aaba, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

30 Deletion. $R^k_D(i,j) = 1$ if $t_i \ne p_j$ and $R^{k-1}(i,j-1) = 1$; $R^k_D(i,j) = 0$ otherwise.
Example: when k = 1.
[Figure: T and P aligned at positions i and j-1, j, with the deleted pattern character marked.]

31 Example: Text = aabaacaabacab, Pattern = aabac, k = 1. Recall that $R^k_D(i,j) = 1$ if $t_i \ne p_j$ and $R^{k-1}(i,j-1) = 1$, and 0 otherwise.
(1) When i = 6 and j = 4: $t_6$ = c ≠ $p_4$ = a and $R^0(6,3) = 0$, so $R^1_D(6,4) = 0$.
(2) When i = 3 and j = 4: $t_3$ = b ≠ $p_4$ = a and $R^0(3,3) = 1$, so $R^1_D(3,4) = 1$.
[Figure: the tables for this example with the two cells marked.]

32 Example: Text = aabaacaabacab, Pattern = aabac. The deletion formula.
[Figure: two worked applications of the deletion formula using Σ(b), on the prefixes aab and aaba, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

33 Substitution. $R^k_S(i,j) = 1$ if $t_i \ne p_j$ and $R^{k-1}(i-1,j-1) = 1$; $R^k_S(i,j) = 0$ otherwise.
Example: when k = 1.
[Figure: T and P aligned at positions i-1, i and j-1, j, with the substituted character marked.]

34 Example: Text = aabaacaabacab, Pattern = aabac, k = 1. Recall that $R^k_S(i,j) = 1$ if $t_i \ne p_j$ and $R^{k-1}(i-1,j-1) = 1$, and 0 otherwise.
(1) When i = 6 and j = 4: $t_6$ = c ≠ $p_4$ = a and $R^0(5,3) = 0$, so $R^1_S(6,4) = 0$.
(2) When i = 5 and j = 5: $t_5$ = a ≠ $p_5$ = c and $R^0(4,4) = 1$, so $R^1_S(5,5) = 1$.
[Figure: the tables for this example with the two cells marked.]

35 Example: Text = aabaacaabacab, Pattern = aabac. The substitution formula.
[Figure: two worked applications of the substitution formula using Σ(b) and Σ(a), on the prefixes aab and aabaa, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

36 The algorithm that combines insertion, deletion and substitution allows k errors. Intuitively, we just combine the formulas of insertion, deletion and substitution using the OR operation.
Initially, $R^k(0,j) = 1$ if $j \le k$ and 0 otherwise, since the empty text is at distance j from $P_{1,j}$.
$R^k(i,j) = 1$ if at least one of the following holds, and 0 otherwise:
Match: $t_i = p_j$ and $R^k(i-1,j-1) = 1$.
The insertion: $R^k_I(i,j) = 1$.
The deletion: $R^k_D(i,j) = 1$.
The substitution: $R^k_S(i,j) = 1$.
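The combination of match, insertion, deletion and substitution can be sketched as a bit-parallel search in the style of [WM92]. Several points are assumptions of this sketch rather than statements of the slides: pattern position j is kept in bit j-1 of a Python integer (so the slides' right shift is written `<<`), the conditions $t_i \ne p_j$ are dropped from the three error terms, which yields the same table for this problem, and the name `wu_manber_search` is hypothetical.

```python
def wu_manber_search(text, pattern, k):
    """1-based end positions where pattern matches with at most k errors
    (insertion, deletion, substitution)."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    # Column for the empty text: a prefix of length j <= d costs j deletions.
    cur = [((1 << d) - 1) & full for d in range(k + 1)]
    hits = []
    for i, ch in enumerate(text, start=1):
        pm = masks.get(ch, 0)
        new = [((cur[0] << 1) | 1) & pm]          # exact matching (Shift-And)
        for d in range(1, k + 1):
            r = ((cur[d] << 1) | 1) & pm          # match
            r |= cur[d - 1]                       # insertion:    R^{d-1}(i-1, j)
            r |= (cur[d - 1] << 1) | 1            # substitution: R^{d-1}(i-1, j-1)
            r |= (new[d - 1] << 1) | 1            # deletion:     R^{d-1}(i, j-1)
            new.append(r & full)
        cur = new
        if cur[k] & accept:
            hits.append(i)
    return hits


print(wu_manber_search("aabaacaabacab", "aabac", 1))
```

With k = 0 the inner loop is empty and the function reduces to Shift-And. Each text character costs a constant number of word operations per error level when m fits in one machine word.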

37 Example: Text = aabaacaabacab, pattern = aabac. The combined formula, with the deletion term applied (using Σ(a), pattern prefix aabac).
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

38 Example: Text = aabaacaabacab, pattern = aabac. The combined formula, with the insertion term applied (using Σ(a), text prefix aabaa).
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

39 Example: Text = aabaacaabacab, pattern = aabac. The combined formula, with the substitution term applied (using Σ(c), aab against aac).
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

40 Example: Text = aabaacaabacab, pattern = aabac, k = 1.
[Figure: the completed table for k = 1, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

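41 Transposition. Example: A = badec and B = bdace. If transposition is allowed, we may transform B into A by transposing adjacent characters: B = bdace → A = badec (da → ad and ce → ec).
Define $R^k_T(i,j)$ for transposition:
$R^k_T(i,j) = 1$ if $R^{k-1}(i-2,j-2) = 1$ and $t_i = p_{j-1}$ and $t_{i-1} = p_j$.
$R^k_T(i,j) = 0$ otherwise.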
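The effect of allowing transposition on the distance itself can be checked with a small dynamic-programming routine. This is a sketch under an explicit assumption: it implements the restricted variant in which a transposed pair is not edited again, which is what the recurrence above (looking back to i-2, j-2) describes. The function name and the `allow_transposition` flag are hypothetical.

```python
def edit_distance(x, y, allow_transposition=False):
    """Edit distance with insertion, deletion, substitution and,
    optionally, transposition of two adjacent characters."""
    rows = [list(range(len(y) + 1))]
    for i in range(1, len(x) + 1):
        row = [i]
        for j in range(1, len(y) + 1):
            cost = 0 if x[i - 1] == y[j - 1] else 1
            best = min(rows[i - 1][j - 1] + cost,   # match or substitution
                       rows[i - 1][j] + 1,          # delete x[i-1]
                       row[j - 1] + 1)              # insert y[j-1]
            if (allow_transposition and i > 1 and j > 1
                    and x[i - 1] == y[j - 2] and x[i - 2] == y[j - 1]):
                best = min(best, rows[i - 2][j - 2] + 1)   # swap adjacent pair
            row.append(best)
        rows.append(row)
    return rows[-1][-1]


print(edit_distance("bdace", "badec"), edit_distance("bdace", "badec", True))
```

For the slide's example the two transpositions da → ad and ce → ec give distance 2, while without transposition the distance is 3.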

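42 We combine the operations of insertion, deletion, substitution and transposition for k errors, again using the OR operation (∨).
Initially, $R^k(0,j) = 1$ if $j \le k$ and 0 otherwise.
$R^k(i,j) = 1$ if at least one of the following holds, and 0 otherwise:
Match: $t_i = p_j$ and $R^k(i-1,j-1) = 1$.
Insertion: $R^k_I(i,j) = 1$.
Deletion: $R^k_D(i,j) = 1$.
Substitution: $R^k_S(i,j) = 1$.
Transposition: $R^k_T(i,j) = 1$.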
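Adding the transposition term to a bit-parallel search can be sketched as follows. This is an illustration under stated assumptions, not necessarily the exact formulation of the paper: it keeps the columns from two text positions back in order to read $R^{k-1}(i-2,j-2)$ directly, stores pattern position j in bit j-1 of a Python integer, and drops the $t_i \ne p_j$ conditions from the insertion, deletion and substitution terms. The name `search_with_transposition` is hypothetical.

```python
def search_with_transposition(text, pattern, k):
    """1-based end positions where pattern matches with at most k errors,
    counting insertion, deletion, substitution and adjacent transposition."""
    if not pattern:
        raise ValueError("pattern must be non-empty")
    m = len(pattern)
    full = (1 << m) - 1
    accept = 1 << (m - 1)
    masks = {}
    for j, ch in enumerate(pattern):
        masks[ch] = masks.get(ch, 0) | (1 << j)
    cur = [((1 << d) - 1) & full for d in range(k + 1)]   # columns at i-1
    prev = None                                           # columns at i-2
    prev_ch = None
    hits = []
    for i, ch in enumerate(text, start=1):
        pm = masks.get(ch, 0)
        new = [((cur[0] << 1) | 1) & pm]
        for d in range(1, k + 1):
            r = ((cur[d] << 1) | 1) & pm          # match
            r |= cur[d - 1]                       # insertion
            r |= (cur[d - 1] << 1) | 1            # substitution
            r |= (new[d - 1] << 1) | 1            # deletion
            if prev is not None:
                # transposition: R^{d-1}(i-2, j-2), t_i = p_{j-1}, t_{i-1} = p_j;
                # the "| 2" covers j = 2, where the empty prefix always matches
                r |= (((prev[d - 1] << 2) | 2)
                      & (pm << 1)
                      & masks.get(prev_ch, 0))
            new.append(r & full)
        prev, cur, prev_ch = cur, new, ch
        if cur[k] & accept:
            hits.append(i)
    return hits


print(search_with_transposition("abc", "acb", 1))
```

For text abc and pattern acb with k = 1, position 3 is reported only when the transposition term is present, since turning abc into acb otherwise needs two errors.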

43 Example: T = abcbcaccbb, P = acbcb. The original partial text is abc and the pattern prefix is acb; transposing bc into cb makes them equal. The step uses Σ(c), Σ(b) and Σ(c)>>1, and the case is handled with an error bound of 1.
[Figure: the tables for k = 1 and k = 0, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

44 Example: T = abcbcaccbb, P = acbcb. The original partial text is ccb, against cbc in the pattern; transposing cb into bc aligns abcbcac|cb|b with ac|bc. The step uses Σ(b), Σ(c) and Σ(b)>>1, and the case is handled with an error bound of 1.
[Figure: the tables for k = 1 and k = 0, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

45 Example: T = abcbcaccbb, P = acbcb. The original partial text is abc and the pattern prefix is acb; transposing bc into cb makes them equal. The step uses Σ(c), Σ(b) and Σ(c)>>1, and the case is handled with an error bound of 2.
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

46 Example: T = abcbcaccbb, P = acbcb. The original partial text is bcb, against bbc; transposing cb into bc aligns ab|cb|caccbb with ac|bc. The step uses Σ(b), Σ(c) and Σ(b)>>1, and the case is handled with an error bound of 2.
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

47 Example: T = abcbcaccbb, P = acbcb. The original partial text is cbc, against ccb; the transposition aligns abc|bc|accbb with the pattern. The step uses Σ(c), Σ(b) and Σ(c)>>1, and the case is handled with an error bound of 2.
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

48 Example: T = abcbcaccbb, P = acbcb. The original partial text is cbc, against ccb; the transposition aligns abc|bc|accbb with acb|cb. The step uses Σ(c), Σ(b) and Σ(c)>>1, and the case is handled with an error bound of 2.
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

49 Example: T = abcbcaccbb, P = acbcb. The original partial text is cbc, against ccb; the transposition aligns abcbcac|cb|b with ac|bc. The step uses Σ(b), Σ(c) and Σ(b)>>1, and the case is handled with an error bound of 2.
[Figure: the tables for this example, together with Σ(a), Σ(b), Σ(c) and Σ(*).]

50 Time complexity: $O(k \lceil m/w \rceil n)$, where
k: the error bound
m: the length of the pattern
n: the length of the text
w: the word size of the computer

51 Reference
[WM92] S. Wu and U. Manber, Fast text searching: allowing errors, Communications of the ACM, Vol. 35, 1992.

52 ~ Thanks for your attention ~

