Permuted Scaled Matching Ayelet Butman Noa Lewenstein Ian Munro.

1 Permuted Scaled Matching Ayelet Butman Noa Lewenstein Ian Munro

2 Scaled matching Input: Text T = t_1,…,t_n; Pattern P = p_1,…,p_m. Scaling: P[i] = p_1…p_1 p_2…p_2 … p_m…p_m, where each character is repeated i times. Output: All text locations j where ∃ i s.t. P[i] matches at j.
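A minimal sketch of the definition above, assuming a brute-force search is acceptable for illustration. The names `scale` and `scaled_matches` are hypothetical and do not come from the slides; locations are 0-based.

```python
def scale(pattern: str, k: int) -> str:
    """Return the k-scaling of pattern: every character repeated k times."""
    return "".join(ch * k for ch in pattern)


def scaled_matches(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return sorted (j, k) pairs such that the k-scaling of pattern starts at text[j].

    Brute force: try every scale k whose scaled pattern still fits in the text.
    Assumes a non-empty pattern.
    """
    hits = []
    for k in range(1, len(text) // len(pattern) + 1):
        scaled = scale(pattern, k)
        start = text.find(scaled)
        while start != -1:
            hits.append((start, k))
            start = text.find(scaled, start + 1)
    return sorted(hits)
```

For instance, in the text `xaabbccabc` the pattern `abc` occurs with scale 2 at location 1 and with scale 1 at location 7.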

3 Scaled matching [Figure: example of a scaled occurrence of a pattern in a text.]

4 Permutation matching Input: Text T = t_1,…,t_n; Pattern P = p_1,…,p_m. Permutation (of pattern): p_π(1) p_π(2) … p_π(m), where π is a permutation on [m]. Output: All text locations j where a pattern permutation occurs.

5-6 Permutation matching [Figure: example text with occurrences of permutations of the pattern highlighted.]

7 Easy to solve in O(n) time (linear size alphabets). The pattern matching version of Jumbled Indexing.
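One common way to reach linear time is a sliding window over character counts; the sketch below assumes that approach (the slides do not say which method they have in mind). `permutation_matches` is a hypothetical name and locations are 0-based.

```python
from collections import Counter


def permutation_matches(text: str, pattern: str) -> list[int]:
    """Return all j such that text[j : j + m] is a permutation of pattern."""
    m, n = len(pattern), len(text)
    if m == 0 or m > n:
        return []

    # diff[c] = (count of c in the current window) - (count of c in pattern)
    diff = Counter()
    for ch in pattern:
        diff[ch] -= 1
    for ch in text[:m]:
        diff[ch] += 1
    nonzero = sum(1 for v in diff.values() if v != 0)

    def bump(ch: str, delta: int) -> None:
        """Change diff[ch] by +1 or -1 and keep the nonzero counter in sync."""
        nonlocal nonzero
        before = diff[ch]
        diff[ch] = before + delta
        if before == 0:
            nonzero += 1
        elif diff[ch] == 0:
            nonzero -= 1

    hits = [0] if nonzero == 0 else []
    for j in range(1, n - m + 1):
        bump(text[j - 1], -1)      # character leaving the window
        bump(text[j + m - 1], +1)  # character entering the window
        if nonzero == 0:
            hits.append(j)
    return hits
```

Each step touches two counters, so the scan costs O(n) dictionary operations after the O(m) setup.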

8 Scaled permutation matching Match: First Permutation and then Scaling.

9 Scaled permutation matching [Figure: example of a match obtained by first permuting the pattern and then scaling it.]

10 Match: first permutation, then scaling. B-Eres-Landau [04]: scaled permutation matching in O(n) time. Open: can one do the reverse efficiently, i.e. scaling and then permutation? Hard? How can we solve it? First: a naïve algorithm.

11 Permuted scaled matching Input: Text T = t_1,…,t_n; Pattern P = p_1,…,p_m. Output: All text locations j where a permuted scaled match exists, i.e. where a permutation of some scaling of P occurs.

12 Permuted scaled matching [Figure: example of a match obtained by first scaling the pattern and then permuting it.]

13-15 Naïve algorithm [Animated example: T = aabcaaaccbacb, P = aacb; candidate windows are examined for k = 1 and then for k = 2.]

16 Naïve algorithm
1. Construct a table R of size (n+1)×|Σ| such that R(i, j) = #σ_j(T[0, i]) for i ≥ 0 and R(−1, j) = 0.
2. For every 0 ≤ i < j ≤ n−1 such that j − i + 1 = km for some natural number k ≥ 1 do:
   a. Let r(l) = (R(j, l) − R(i−1, l)) / #σ_l(P).
   b. If r(l) = k for each l, 0 ≤ l ≤ |Σ| − 1, then announce that i is a k-scaled appearance.
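The two steps of the naïve algorithm can be sketched as follows. This is an illustrative reading, not the authors' code: `naive_permuted_scaled` is a hypothetical name, the table is stored as a list of dictionaries, and row `R[i + 1]` holds the counts for T[0, i] so that `R[0]` plays the role of R(−1, ·) = 0. The division in step 2a is replaced by the equivalent integer test count = k · #σ(P).

```python
def naive_permuted_scaled(text: str, pattern: str) -> list[tuple[int, int, int]]:
    """Return (i, j, k) such that text[i..j] is a permutation of the k-scaling of pattern."""
    n, m = len(text), len(pattern)
    alphabet = sorted(set(pattern))
    freq = {c: pattern.count(c) for c in alphabet}

    # Step 1: prefix-count table. R[i + 1][c] = occurrences of c in text[0..i].
    R = [dict.fromkeys(alphabet, 0)]
    for ch in text:
        row = dict(R[-1])
        if ch in row:  # characters outside the pattern alphabet are not counted
            row[ch] += 1
        R.append(row)

    # Step 2: test every window whose length is a multiple km of the pattern length.
    hits = []
    for i in range(n):
        for k in range(1, (n - i) // m + 1):
            j = i + k * m - 1
            # If every pattern character occurs exactly k * freq times, the counts
            # sum to km, so the window cannot contain any foreign character.
            if all(R[j + 1][c] - R[i][c] == k * freq[c] for c in alphabet):
                hits.append((i, j, k))
    return hits
```

For example, with text `baabaa` and pattern `aab` the sketch reports four 1-scaled windows and the whole text as a 2-scaled window.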

17-37 Naïve algorithm [Animated example: the prefix-count table with rows a, b, c is filled in column by column for the text T; then, for P = aacb (#a = 2, #b = #c = 1), windows are checked for K = 1 and K = 2.]

38 Better? Properties

39 Mod-equivalent Mod-equivalency: locations i and j are mod-equivalent if for every character σ (with frequency c in P): (#σ in T[0, i]) mod c = (#σ in T[0, j]) mod c.
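A direct, unoptimized transcription of this definition, as a sketch: `mod_equivalent` is a hypothetical name, locations are 0-based, and the prefix counts are recomputed from scratch for clarity rather than read from a table.

```python
from collections import Counter


def mod_equivalent(text: str, pattern: str, i: int, j: int) -> bool:
    """True if locations i and j are mod-equivalent with respect to pattern.

    For each pattern character with frequency c, the number of its occurrences
    in text[0..i] and in text[0..j] must agree modulo c.
    """
    freq = Counter(pattern)
    return all(
        text[: i + 1].count(ch) % c == text[: j + 1].count(ch) % c
        for ch, c in freq.items()
    )
```

With text `ababa` and pattern `aab`, locations 0 and 4 are mod-equivalent, yet the substring between them (`baba`) is not a permuted scaling of `aab`. Mod-equivalence on its own is therefore not sufficient, which is why a second property is needed.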

40-51 Mod-equivalent [Animated examples: prefix-count tables with rows a, b, c for P = aacb (#a = 2, #b = #c = 1). Pairs of locations are compared character by character modulo the pattern frequencies, e.g. the count vectors (1, 2, 1) and (3, 6, 3) for T = cbbccaaccbacb, and (1, 2, 1) and (5, 4, 3) for T = cbbcaaaccbaab.]

52 Equal-quotients

53-62 Equal-quotients [Animated examples: for P = aacb, the count vectors (1, 2, 1) and (5, 4, 3) in T = cbbcaaaccbaab and (1, 2, 1) and (3, 6, 3) in T = cbbccaaccbacb are compared; a further example uses a text over {a, b} with P = aaabbbbbb.]

63 Theorem T[i+1, j] is a permuted k-scaling of P for some k iff 1. Locations i and j of T are mod-equivalent. 2. Locations i and j of T satisfy the equal-quotients property for each pair of characters.
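The theorem can be checked experimentally with a small sketch. Two assumptions are made here and are not taken from the slides. First, the equal-quotients property is read as: for adjacent characters a, b of the pattern alphabet, the difference ⌊#a/c_a⌋ − ⌊#b/c_b⌋ of prefix-count quotients is the same at both locations. Second, the text contains only pattern characters. With prefix counts over T[0, i] and T[0, j], the substring being tested is T[i+1, j]. The names `signatures` and `is_permuted_scaling` are hypothetical.

```python
from collections import Counter


def signatures(text: str, pattern: str) -> list[tuple[int, ...]]:
    """sigs[i + 1] is the vector of location i; sigs[0] belongs to the empty prefix.

    Each vector holds |alphabet| remainders followed by |alphabet| - 1
    differences of adjacent quotients. Assumes text uses only pattern characters.
    """
    alphabet = sorted(set(pattern))
    freq = Counter(pattern)
    counts = dict.fromkeys(alphabet, 0)

    def vector() -> tuple[int, ...]:
        mods = [counts[c] % freq[c] for c in alphabet]
        quots = [counts[c] // freq[c] for c in alphabet]
        diffs = [a - b for a, b in zip(quots, quots[1:])]
        return tuple(mods + diffs)

    sigs = [vector()]
    for ch in text:
        counts[ch] += 1
        sigs.append(vector())
    return sigs


def is_permuted_scaling(text: str, pattern: str, i: int, j: int) -> bool:
    """Direct check that text[i+1..j] is a permutation of some k-scaling of pattern."""
    length = j - i
    if length <= 0 or length % len(pattern):
        return False
    k = length // len(pattern)
    window = Counter(text[i + 1 : j + 1])
    return all(window[c] == k * f for c, f in Counter(pattern).items())
```

Under these assumptions, two locations have equal vectors exactly when the direct check succeeds, which is what the test loop for `abaabbaab` and `aab` confirms on every pair.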

64-65 [Diagram: locations i and j each carry a vector with a mod-equivalent part (one entry per character a, b, c, d, e, f) and an equal-quotients part (one entry per adjacent pair a-b, b-c, c-d, d-e, e-f).]

66 [Example: T = cbbccaaccbacb, P = bcaaaca; vector entries a, b, c and a-b, b-c.]

67 Putting it together

68-70 Build a table R of size n×2|Σ|+1. Each vector is associated with its location i. [Diagram: the vectors for locations 0, 1, 2, …, i, …, j.]

71 Sort the vectors using radix sort.

72 Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.

73 For each equivalence class containing locations i_1, i_2, ..., i_l announce appearances T[i + 1, j] for each i, j ∈ {i_1, i_2, ..., i_l} s.t. i < j.

74 Putting it all together

75 Putting it together
3. Each vector is associated with its location i.
4. Sort the vectors using radix sort.
5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.
6. For each equivalence class containing locations i_1, i_2, ..., i_l announce appearances T[i + 1, j] for each i, j ∈ {i_1, i_2, ..., i_l} s.t. i < j.
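The steps above can be sketched as follows. Several assumptions are mine rather than the slides': a dictionary keyed by the vector replaces the radix sort and explicit grouping, the equal-quotients part of the vector is read as differences of adjacent quotients ⌊#σ/c_σ⌋, the empty prefix is treated as location −1, and the text is assumed to contain only pattern characters. `all_appearances` is a hypothetical name.

```python
from collections import Counter, defaultdict


def all_appearances(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return all (start, end) with text[start..end] a permuted scaling of pattern."""
    alphabet = sorted(set(pattern))
    freq = Counter(pattern)
    counts = dict.fromkeys(alphabet, 0)
    classes = defaultdict(list)  # vector -> locations, in increasing order

    def vector() -> tuple[int, ...]:
        mods = tuple(counts[c] % freq[c] for c in alphabet)
        quots = [counts[c] // freq[c] for c in alphabet]
        return mods + tuple(a - b for a, b in zip(quots, quots[1:]))

    classes[vector()].append(-1)  # empty prefix
    for loc, ch in enumerate(text):
        counts[ch] += 1
        classes[vector()].append(loc)

    # Every pair i < j inside one class yields the appearance T[i + 1, j].
    hits = []
    for locs in classes.values():
        for a in range(len(locs)):
            for b in range(a + 1, len(locs)):
                hits.append((locs[a] + 1, locs[b]))
    return sorted(hits)
```

On text `baabaa` with pattern `aab`, the classes are {−1, 2, 5}, {0, 3} and {1, 4}, giving five appearances in total.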

76 Theorem The running time of the permuted scaled matching algorithm is: O(n|Σ|+occ).

77 Output representation The output of the algorithm, whose size we denote occ, may be as large as O(n^2/m). Example: o Text a^n. o Pattern a^m.
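To make the size of occ concrete for this example, the sketch below counts the appearances of a^m in a^n: for each scale k, every one of the n − km + 1 windows of length km is an appearance. The function name `count_unary_appearances` is hypothetical.

```python
def count_unary_appearances(n: int, m: int) -> int:
    """Number of (location, scale) appearances of the pattern a^m in the text a^n.

    For each scale k with k * m <= n there are n - k * m + 1 windows of length k * m,
    and every such window consists solely of a's.
    """
    return sum(n - k * m + 1 for k in range(1, n // m + 1))
```

For n = 100 and m = 10 this gives 460 appearances, roughly n^2/(2m), which is consistent with the O(n^2/m) bound.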

78-79 Output representation To reduce the large number of appearances, set the output to the shortest match at each text location i. [Example: T = abbcaaaaabaab, P = aba.]

80-82 Claim Let i < j < h be three text locations. Assume T[i, j] is a permuted scaled appearance of P. Then T[i, h] is a permuted scaled appearance of P iff T[j + 1, h] is a permuted scaled appearance of P. [Example: T = abbcaaaaabaab, P = aba.]

83 Putting it all together

84 Putting it together
3. Each vector is associated with its location i.
4. Sort the vectors using radix sort.
5. Group the vectors into equivalence classes according to their prefix of length 2|Σ|−1.
6. For each entry q' containing linked list i_1, i_2, ..., i_l announce appearances T[i_r + 1, i_{r+1}] for each i_r ∈ {i_1, i_2, ..., i_l}.
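Reporting only consecutive members of each class can be sketched in a single pass. This is an illustrative variant, not the slides' linked-list construction: a dictionary remembers the last location seen for each vector, the equal-quotients entries are assumed to be differences of adjacent quotients ⌊#σ/c_σ⌋, the empty prefix is location −1, and the text is assumed to contain only pattern characters. `shortest_appearances` is a hypothetical name.

```python
from collections import Counter


def shortest_appearances(text: str, pattern: str) -> list[tuple[int, int]]:
    """Return (start, end) of the shortest permuted scaled appearance at each start."""
    alphabet = sorted(set(pattern))
    freq = Counter(pattern)
    counts = dict.fromkeys(alphabet, 0)

    def vector() -> tuple[int, ...]:
        mods = tuple(counts[c] % freq[c] for c in alphabet)
        quots = [counts[c] // freq[c] for c in alphabet]
        return mods + tuple(a - b for a, b in zip(quots, quots[1:]))

    last_seen = {vector(): -1}  # vector -> most recent location carrying it
    hits = []
    for loc, ch in enumerate(text):
        counts[ch] += 1
        v = vector()
        if v in last_seen:
            # Consecutive members i_r, i_{r+1} of a class give T[i_r + 1, i_{r+1}].
            hits.append((last_seen[v] + 1, loc))
        last_seen[v] = loc
    return hits
```

Each location contributes at most one pair, so the output size is at most n. Longer appearances can be recovered by chaining consecutive pairs, as the claim on the preceding slides allows.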

85 Running Time Permuted Scaled Matching: The running time is: O(n|Σ|).

86 For efficiency Need to generate the vectors quickly. Need to compare vectors quickly. Idea: hash

87 Need hash on vectors that can be modified quickly if vector changes very little. Use: hash – similar to Karp-Rabin

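88 [Diagram: moving from location i to location i+1, at most 1 entry changes in the mod-equivalent part of the vector (a, b, c, d, e, f) and at most 2 entries change in the equal-quotients part (a-b, b-c, c-d, d-e, e-f).]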
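Because only a constant number of entries change per step, a hash of the whole vector can be maintained in constant time per change. The slides do not specify the hash function; the sketch below assumes a polynomial fingerprint modulo a large prime, in the spirit of Karp-Rabin. `VectorHash`, `BASE` and `MOD` are illustrative names and constants.

```python
MOD = (1 << 61) - 1  # a large prime modulus (assumed choice)
BASE = 1_000_003     # fingerprint base (assumed choice)


class VectorHash:
    """Fingerprint sum(values[p] * BASE**p) mod MOD, updatable one entry at a time."""

    def __init__(self, length: int) -> None:
        self.values = [0] * length
        self.powers = [pow(BASE, p, MOD) for p in range(length)]
        self.hash = 0  # fingerprint of the all-zero vector

    def set(self, pos: int, new: int) -> None:
        """Overwrite one entry and fix the fingerprint with O(1) arithmetic."""
        old = self.values[pos]
        self.hash = (self.hash + (new - old) * self.powers[pos]) % MOD
        self.values[pos] = new
```

Equal vectors always receive equal fingerprints, while unequal vectors may collide, so a fingerprint match is only probabilistic evidence of equality unless the vectors are also compared.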

89-90 [Example: T = cbbccaaccbacb, P = bcaaaca; the vectors at locations 8 and 9, with entries a, b, c and a-b, b-c.]

91 The running time can be improved to o Deterministic O(n log |Σ|) o Randomized O(n)
