
Function Matching. Amihood Amir, Yonatan Aumann, Moshe Lewenstein, Ely Porat. Bar Ilan University.


1 Function Matching. Amihood Amir, Yonatan Aumann, Moshe Lewenstein, Ely Porat. Bar Ilan University.

2 Baker's Parameterized Matching. Prog.c: int a,b; a=1; a = g(a)*5+f(a); b=2; a = func(a,b); a = a*g(b); b=1; b = g(b)*5+f(b); …

3 Baker's Parameterized Matching. Prog.c: int a,b; a=1; a = g(a)*5+f(a); b=2; a = func(a,b); a = a*g(b); b=1; b = g(b)*5+f(b); … Pattern: c=1; c = g(c)*5+f(c); Baker's work: pdup, dupstat, psearch (SICOMP 1997, JCSS 1996).

4 Two dimensional parameterized matching pattern. 'A horse is a horse, it ain't make a difference what color it is' (John Wayne)

5 Parameterized Matching. Input: P = p_1 … p_m over alphabet Σ_P; T = t_1 … t_n over alphabet Σ_T. Output: locations i of T for which a bijection π: Σ_P → Σ_T exists s.t. π(P) = π(p_1) π(p_2) … π(p_m) = t_i … t_{i+m-1}.

6 One dimensional: Baker 1996, JCSS - Suffix Trees; Baker 1997, SICOMP - Boyer Moore; Amir, Farach, Muthu 1995, IPL - Knuth-Morris-Pratt. Two dimensional: regular methods fail!!

7 Function Matching. Input: P = p_1 … p_m over alphabet Σ_P; T = t_1 … t_n over alphabet Σ_T. Output: locations i of T where a function f: Σ_P → Σ_T exists s.t. f(P) = f(p_1) f(p_2) … f(p_m) = t_i … t_{i+m-1}.

8 Function Matching, example (Input and Output as defined on slide 7). P = h e h a e h; T = a b c b a c b a d a b d a d d a d.

9 Function Matching, example: P f-matches at text location 2 (b c b a c b) with f(h) = b, f(e) = c, f(a) = a.

10 Function Matching, example: P f-matches at text location 8 (a d a b d a) with f(h) = a, f(e) = d, f(a) = b.

11 Function Matching, example: P f-matches at text location 12 (d a d d a d) with f(h) = d, f(e) = a, f(a) = d.

12 Function Matching, example: at a location where the text symbols aligned with the h's of P differ, f(h) = ?? and there is no match!

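13 Function Matching vs. Parameterized Matching. P p-matches t_i … t_{i+m-1} iff (1) P f-matches t_i … t_{i+m-1} and (2) the number of distinct symbols in t_i … t_{i+m-1} equals the number of distinct symbols in P. Example: P = h e h a e h; T = a b c b a c b a d a b d a d d a d. With f(h) = d, f(e) = a, f(a) = d, P f-matches but does not p-match (h and a share the image d); with f(h) = b, f(e) = c, f(a) = a, P both f-matches and p-matches.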
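The two conditions on this slide can be checked directly on a single text window. The sketch below is only an illustration: the helper names `is_f_match` and `is_p_match` are hypothetical, and locations are 0-based, unlike the 1-based t_1 … t_n of the slides. It uses the observation that P f-matches a window exactly when the number of distinct (pattern symbol, text symbol) pairs equals the number of distinct pattern symbols.

```python
def is_f_match(pattern, window):
    """P f-matches the window iff every pattern symbol is paired with one text symbol."""
    pairs = set(zip(pattern, window))
    return len(pairs) == len(set(pattern))


def is_p_match(pattern, window):
    """Slide 13: p-match = f-match plus equal numbers of distinct symbols."""
    return is_f_match(pattern, window) and len(set(window)) == len(set(pattern))


P = "hehaeh"
T = "abcbacbadabdaddad"
m = len(P)
# 0-based start positions of all windows that f-match / p-match.
f_locations = [i for i in range(len(T) - m + 1) if is_f_match(P, T[i:i + m])]
p_locations = [i for i in range(len(T) - m + 1) if is_p_match(P, T[i:i + m])]
print(f_locations, p_locations)
```

On the slide's example the last f-match (window d a d d a d) is dropped from the p-match list, because h and a are both mapped to d.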

14 Naïve Algorithm. At each location i of text T, check whether the pattern f-matches. Check: for each letter 'a' in the pattern, are the text elements aligned with the pattern's 'a's all the same? If not, declare 'no match'. If all letters are "OK", declare 'match'. Running time: O(nm), where m = |P| and n = |T|.
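A minimal sketch of the naïve check described above, assuming 0-based locations; the function name `function_match_naive` is made up for illustration. For every pattern letter it collects the text symbols aligned with that letter and accepts the location only if each collection holds a single symbol.

```python
from collections import defaultdict


def function_match_naive(pattern, text):
    """Return all 0-based locations of text where pattern f-matches, in O(nm) time."""
    # Group the pattern positions by letter once, up front.
    positions = defaultdict(list)
    for j, letter in enumerate(pattern):
        positions[letter].append(j)

    m, n = len(pattern), len(text)
    matches = []
    for i in range(n - m + 1):
        # A letter is "OK" if all text symbols aligned with it are the same.
        if all(len({text[i + j] for j in js}) == 1 for js in positions.values()):
            matches.append(i)
    return matches


print(function_match_naive("hehaeh", "abcbacbadabdaddad"))
```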

15 Function Matching with Don't Cares. Input: P = p_1 … p_m over alphabet Σ_P ∪ {?}; T = t_1 … t_n over alphabet Σ_T. Example: P = h e ? ? e h; T = a b c b a c b c d b c d a d d a d. Output: locations i of T where a function f: Σ_P → Σ_T exists s.t. f(P) = f(p_1) f(p_2) … f(p_m) = t_i … t_{i+m-1}, where f(?) is a wildcard.

16 Why do we need don't cares? [figure: Pattern and Text]

17 Linearize Text and Pattern. [figure: Text and Pattern; T = Line 1 Line 2 …]

18 Linearize Text and Pattern. [figure: T = Line 1 Line 2 … Line 5 Line 6 …; P = pattern lines with runs of don't cares (?); dimension labels n, n, m, m, n-m]

19 Polynomial Multiplication - Convolutions. Multiply t_1 t_2 t_3 t_4 … t_{n-2} t_{n-1} t_n by the reversed pattern p_m p_{m-1} … p_2 p_1, as in long multiplication. Row j (for j = 1, …, m) holds the partial products p_j t_1, p_j t_2, p_j t_3, …, p_j t_{n-1}, p_j t_n, and each row is shifted one position relative to the previous one; summing the columns gives the convolution. Running time: O(n log m).

20 Convolutions: Fischer-Paterson [1974]. The same table of partial products p_j t_1 … p_j t_n, with the pattern p_1 p_2 p_3 p_4 … p_m drawn against the text.

21 Convolutions: Fischer-Paterson [1974] (continued).

22 How does this help for Function Matching? The property that needs to be checked is: beneath each symbol from the pattern alphabet, all text characters must be the same.

23 Example - T = a b c b a c b a c a b d a d d a d e a; P = h e h a e h ? e; P^R = e ? h e a h e h.

24 Example, h in P vs. a in T. T_a = 1 0 0 0 1 0 0 1 0 1 0 0 1 0 0 1 0 0 1; P^R_h = 0 0 1 0 0 1 0 1.

25 Example, h - a: long multiplication of T_a by P^R_h. Each 1 in P^R_h contributes a shifted copy of T_a and each 0 contributes a row of zeros; the column sums are 0 0 1 0 0 1 1 1 0 2 0 2 1 0 3 0 1 2 0 1 2 0 1 1 0 1.

26 Example, h - a: the same product, with the pattern h e h a e h ? e aligned beneath it.

27 Example, h - a: the product 0 0 1 0 0 1 1 1 0 2 0 2 1 0 3 0 1 2 0 1 2 0 1 1 0 1 => in O(n log m) time!!
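The product on these slides can be reproduced with a small FFT-based polynomial multiplication. This is a sketch under stated assumptions, not the deck's implementation: `fft` and `multiply` are illustrative names, the transform is a plain recursive radix-2 FFT, and a single transform over the whole text is used for brevity, which costs O(n log n). The O(n log m) bound is usually obtained by cutting the text into overlapping blocks of length 2m and convolving each block separately.

```python
import cmath


def fft(values, invert=False):
    """Recursive radix-2 FFT; len(values) must be a power of two."""
    n = len(values)
    if n == 1:
        return values[:]
    even = fft(values[0::2], invert)
    odd = fft(values[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n)
        out[k] = even[k] + w * odd[k]
        out[k + n // 2] = even[k] - w * odd[k]
    return out


def multiply(a, b):
    """Coefficients of the product of two integer polynomials a and b."""
    result_len = len(a) + len(b) - 1
    size = 1
    while size < result_len:
        size *= 2
    fa = fft([complex(x) for x in a] + [0j] * (size - len(a)))
    fb = fft([complex(x) for x in b] + [0j] * (size - len(b)))
    product = fft([x * y for x, y in zip(fa, fb)], invert=True)
    # The inverse transform is unnormalised, so divide by size, then round to integers.
    return [round((v / size).real) for v in product[:result_len]]


T = "abcbacbacabdaddadea"
P = "hehaeh?e"
T_a = [1 if c == "a" else 0 for c in T]             # indicator of 'a' in the text
PR_h = [1 if c == "h" else 0 for c in reversed(P)]  # indicator of 'h' in P reversed
print(multiply(T_a, PR_h))
```

Entry i+m-1 of the product (0-based) counts how many h's of P sit above an 'a' when P is aligned at text location i.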

28 Example - T = a b c b a c b a c a b d a d d a d e a; P = h e h a e h ? e; P^R = e ? h e a h e h. Number of h's of P aligned with each text symbol, per location: h - a: 1 0 2 0 2 1 0 3 0 1 2 0; h - b: 0 3 0 1 1 1 1 0 1 0 1 0; h - c: 2 0 1 2 0 1 1 0 1 0 0 0; h - d: 0 0 0 0 0 0 1 0 1 2 0 3. Match(h) = 0 1 0 0 0 0 0 1 0 0 0 1 => in O(|Σ_T| n log m) time!!

29 In general - the Algorithm. For each character 'a' in Σ_P create P_a. For each character 'b' in Σ_T create T_b. For all P_a and T_b, multiply them and construct Match(a) for each 'a' in Σ_P. Announce each location i of T as a 'match' if Match(a)[i] = 1 for all a's in P. => in O(|Σ_P| |Σ_T| n log m) time.
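The steps above can be sketched as follows. The names `correlate`, `match_vector` and `function_match` are hypothetical, locations are 0-based, and `correlate` evaluates the window sums directly in quadratic time as a stand-in for the FFT-based multiplication, so the sketch shows the logic of the algorithm rather than its running time. Match(a)[i] is set to 1 when some text symbol lies beneath every occurrence of 'a'.

```python
def correlate(text_bits, pattern_bits):
    """out[i] = sum over j of pattern_bits[j] * text_bits[i + j] (direct evaluation)."""
    n, m = len(text_bits), len(pattern_bits)
    return [sum(pattern_bits[j] * text_bits[i + j] for j in range(m))
            for i in range(n - m + 1)]


def match_vector(pattern, text, a):
    """Match(a): 1 at location i iff all occurrences of a see the same text symbol."""
    p_a = [1 if c == a else 0 for c in pattern]
    occurrences = sum(p_a)
    result = [0] * max(len(text) - len(pattern) + 1, 0)
    for b in sorted(set(text)):
        t_b = [1 if c == b else 0 for c in text]
        for i, count in enumerate(correlate(t_b, p_a)):
            if count == occurrences:
                result[i] = 1
    return result


def function_match(pattern, text, wildcard="?"):
    """Locations where Match(a)[i] = 1 for every non-wildcard pattern symbol a."""
    vectors = [match_vector(pattern, text, a)
               for a in sorted(set(pattern) - {wildcard})]
    return [i for i in range(len(text) - len(pattern) + 1)
            if all(v[i] for v in vectors)]


T = "abcbacbacabdaddadea"
P = "hehaeh?e"
print(match_vector(P, T, "h"))
print(function_match(P, T))
```

The first printed vector is the Match(h) row of the running example.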

30 Improvement. Lemma: Let a_1, …, a_k ∈ ℕ; then k · Σ_{i=1..k} a_i² = (Σ_{i=1..k} a_i)² iff a_i = a_j for all i, j. Idea: encode the text with numbers for symbols, and encode the pattern so as to compute their sum and, separately, their sum of squares.

31 Improvement (lemma as on slide 30). Example: compute the sum of the text chars beneath "e". T = a b c b a c b a c a b d a d d a d e a; T# = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1; P = h e h a e h ? e; P_e = 0 1 0 0 1 0 0 1.

32 Improvement (lemma as on slide 30). Example: compute the sum of squares beneath "e". T = a b c b a c b a c a b d a d d a d e a; T# = 1 2 3 2 1 3 2 1 3 1 2 4 1 4 4 1 4 5 1; T#² = 1 4 9 4 1 9 4 1 9 1 4 16 1 16 16 1 16 25 1; P = h e h a e h ? e; P_e = 0 1 0 0 1 0 0 1.

33 Improvement (lemma as on slide 30). Running Time: two convolutions for each pattern character. O(|Σ_P| n log m).
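A small sketch of the sum and sum-of-squares test, assuming the text symbols are numbered 1, 2, 3, … in sorted order as in T#. The helper names are illustrative, and `window_sums` computes each sum directly where the slides would use one convolution. For a pattern symbol with k occurrences, a location passes when k times the sum of squares equals the square of the sum.

```python
def encode(text):
    """Number the distinct text symbols 1, 2, 3, ... in sorted order (a=1, b=2, ...)."""
    code = {c: k + 1 for k, c in enumerate(sorted(set(text)))}
    return [code[c] for c in text]


def window_sums(values, indicator):
    """For each location i, sum values[i + j] over the positions j marked in indicator."""
    n, m = len(values), len(indicator)
    return [sum(values[i + j] for j in range(m) if indicator[j])
            for i in range(n - m + 1)]


def function_match_sums(pattern, text, wildcard="?"):
    """0-based f-match locations via the test k * (sum of squares) == (sum) ** 2."""
    t_num = encode(text)
    t_sq = [v * v for v in t_num]
    locations = set(range(len(text) - len(pattern) + 1))
    for a in sorted(set(pattern) - {wildcard}):
        p_a = [1 if c == a else 0 for c in pattern]
        k = sum(p_a)
        s1 = window_sums(t_num, p_a)  # sum of text numbers beneath 'a'
        s2 = window_sums(t_sq, p_a)   # sum of their squares
        locations &= {i for i in range(len(s1)) if k * s2[i] == s1[i] ** 2}
    return sorted(locations)


T = "abcbacbacabdaddadea"
P = "hehaeh?e"
print(encode(T))
print(function_match_sums(P, T))
```

Only two sums are needed per pattern symbol, independent of the text alphabet size, which is where the improved bound comes from.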

34 Can we do better for big alphabets? We have seen 2 algorithms for Function Matching: 1. O(nm) - naïve algorithm; 2. O(|Σ_P| n log m) - convolution based. We will see: 1. O(n log² m) - randomized, convolution based; 2. a lower bound of Ω(nm) for deterministic convolution based methods.

35 Def: A pattern is 2-charactered if every character appears at most twice in the pattern. Example: P = a b c b c c b b; P_1 = a_1 b_1 c_1 b_1 c_1 c_2 b_2 b_2 (even pairs); P_2 = a_1 b_1 c_1 b_2 c_2 c_2 b_2 b_3 (odd pairs). Lemma: Let P be a pattern and T a text. There exist 2-charactered patterns P_1 and P_2 s.t. at loc. i of T, P f-matches iff P_1 and P_2 f-match.

36 Situation: an algorithm for Function Matching with 2-charactered patterns ⇒ a general algorithm for Function Matching. So, all that needs to be checked is that each pair in P has equal text symbols beneath it.
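One way to build the two 2-charactered patterns of the example is to number the occurrences of each symbol and pair them up in two different ways. The sketch below is an assumption about the construction, inferred from the example P_1 and P_2 on the slide; `split_two_charactered` and `show` are hypothetical names.

```python
from collections import Counter, defaultdict


def split_two_charactered(pattern):
    """Relabel pattern symbols so each label occurs at most twice in P1 and in P2."""
    seen = defaultdict(int)
    p1, p2 = [], []
    for symbol in pattern:
        seen[symbol] += 1
        k = seen[symbol]                      # k-th occurrence of this symbol
        p1.append((symbol, (k + 1) // 2))     # pairs occurrences (1,2), (3,4), ...
        p2.append((symbol, k // 2 + 1))       # pairs occurrences (2,3), (4,5), ...
    return p1, p2


def show(labelled):
    """Render [('a', 1), ('b', 1)] as 'a1 b1'."""
    return " ".join(f"{s}{k}" for s, k in labelled)


P1, P2 = split_two_charactered("abcbccbb")
print(show(P1))
print(show(P2))
```

Between them, P1 and P2 compare every pair of consecutive occurrences of a symbol, so equality of all occurrences follows by chaining the pairs.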

37 New Randomized Algorithm. 1. For each character: - a in T, randomly choose r_a in {0, 1}; - replace all a's in T with r_a; - get T'; - b in P, randomly choose s_b in {1,2}; - set the first b to be s_b and the second b to be -s_b; - get P'. 2. Convolve T' and P'^R. 3. For each location i for which (T' * P'^R)[i] equals 0, declare a 'match'.
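A hedged sketch of these steps for a 2-charactered pattern. All function names are illustrative, `window_sums` evaluates the convolution entries directly instead of using an FFT, and treating don't cares and symbols that occur only once as 0 follows the g(P) row of the deck's worked example. Repeating the round and keeping only locations that are 0 every time is the amplification mentioned in the running-time slide.

```python
import random


def signed_pattern(pattern, s, wildcard="?"):
    """First occurrence of b becomes +s[b], the second -s[b]; everything else is 0."""
    counts = {}
    for c in pattern:
        counts[c] = counts.get(c, 0) + 1
    seen, out = set(), []
    for c in pattern:
        if c == wildcard or counts[c] == 1:
            out.append(0)
        elif c in seen:
            out.append(-s[c])
        else:
            seen.add(c)
            out.append(s[c])
    return out


def window_sums(text_values, pattern_values):
    """Entry i is sum_j pattern_values[j] * text_values[i + j] (direct evaluation)."""
    n, m = len(text_values), len(pattern_values)
    return [sum(pattern_values[j] * text_values[i + j] for j in range(m))
            for i in range(n - m + 1)]


def randomized_round(pattern, text, rng):
    """One round: the set of 0-based locations whose convolution entry is 0."""
    r = {c: rng.randint(0, 1) for c in sorted(set(text))}
    s = {c: rng.randint(1, 2) for c in sorted(set(pattern))}
    sums = window_sums([r[c] for c in text], signed_pattern(pattern, s))
    return {i for i, value in enumerate(sums) if value == 0}


def randomized_function_match(pattern, text, rounds, seed=0):
    """Keep the locations that survive every round (may still contain false positives)."""
    rng = random.Random(seed)
    candidates = set(range(len(text) - len(pattern) + 1))
    for _ in range(rounds):
        candidates &= randomized_round(pattern, text, rng)
    return sorted(candidates)


P = "vqvuqu?s"
T = "abaabacabdabcbdba"
print(randomized_function_match(P, T, rounds=10))
```

A true f-match always yields 0, because the two members of every pair see the same text symbol and cancel; a location that does not f-match can still yield 0 by chance, so the reported set is a superset of the true matches.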

38 Example: P = v q v u q u ? s; T = a b a a b a b a c a b d a b c b d b a; f(a) = 1, f(b) = 0, f(c) = 0, f(d) = 1; g(v) = 2, g(q) = 6, g(u) = 8; f(T) = 1 0 1 1 0 1 0 1 0 1 0 1 1 0 0 0 1 0 1; g(P) = 2 6 -2 8 -6 -8 0 0. At the first location: 2+0-2+8+0-8+0+0 = 0, a match with h(v) = a, h(q) = b, h(u) = a, h(s) = a.

39 Example (same P, T, f and g as on slide 38): at the next location, 0+6-2+0-6+0+0+0 = -2, so there is no match.

40 Example (same P, T, f and g as on slide 38): at the window d a b c b d b a, 2+6+0+0+0-8+0+0 = 0, although P does not f-match there (v is aligned with both d and b).

41 Running Time: O(nk log m) with error probability 2^-k; O(n log² m) with error probability 1/m. Correctness: if P f-matches at location i of T, then (f(T) * g(P)^R)[i+m-1] is trivially always equal to 0. If P does not f-match at location i of T, then for each convolution, (f(T) * g(P)^R)[i+m-1] equals 0 with probability ½; with k rounds of amplification the probability is (½)^k.

42 Limitation of the Convolutions Model. Can we do the same deterministically? No! To show this we use the model of communication complexity. [figure: Alice holds x, Bob holds y; together they compute f(x,y)]

43 Limitation of the Convolutions Model. Known: for x, y in {0,1}^k the communication complexity of equals(x,y) is Ω(k). Take pattern P = a_1 a_2 a_3 … a_m a_1 a_2 a_3 … a_m, where a_i ≠ a_j for i ≠ j. Given a collection of convolutions, the convolution value at location i is (g(P) * f(t))[i+m-1] = Σ_j g(a_j)·f(t_{i+j-1}) + Σ_j g(a_j)·f(t_{i+j+m-1}). Since we are in essence comparing t_i … t_{i+m-1} to t_{i+m} … t_{i+2m-1}, we get the equality information from the convolutions. This is lower bounded by Ω(m) for each location; in general, Ω(nm).

44 Another Application for Function Matching Protein Folding detection: 1 2 34 5 6 7 8 9 10 7 8 9 1 2 3 P = 1 2 3 4 5 6 7 8 9 10 10 9 8 7 6 5 4 11 12 … 12 11 3 2 1

45 Questions. 1. Can Function Matching be solved deterministically in o(nm) time for big alphabets? 2. Are there special cases of Function Matching that are easier (other than Parameterized Matching and other trivial ones)? 3. Does 2-dimensional Parameterized Matching need to be solved with Function Matching?

