Real time pattern matching. Benny Porat, Ely Porat, Bar-Ilan University.

1 Real time pattern matching. Benny Porat, Ely Porat, Bar-Ilan University

2 Pattern Matching: Given a text T and a pattern P, the problem is to find all the substrings of T that are equal to P.
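For reference, a brute-force offline matcher makes the definition concrete. This is a minimal sketch (the function name `find_all` is hypothetical, not from the talk) using 0-based indices and O(nm) time; it assumes the whole text is available at once, which is exactly what the online models below take away.

```python
def find_all(text, pattern):
    """Return every 0-based start index i with text[i:i+len(pattern)] == pattern.

    Brute-force offline baseline: O(n*m) time, whole text in memory.
    """
    m = len(pattern)
    return [i for i in range(len(text) - m + 1) if text[i:i + m] == pattern]
```

For example, `find_all("abracadabra", "abra")` gives `[0, 7]`; overlapping occurrences are all reported.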

3 Online pattern matching: We receive the text character by character and must report each occurrence of P as soon as its last character arrives.
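As a baseline for the first model (pattern stored in O(m) working memory), a Knuth-Morris-Pratt automaton can be fed one character at a time. This is a standard technique, not the algorithm of this talk; the class name `OnlineKMP` is hypothetical, and its per-character time is only amortized O(1), since a single character can trigger several failure-link steps. Worst-case (real-time) bounds are what the later slides are after.

```python
class OnlineKMP:
    """Online exact matching with a KMP failure function (O(m) space)."""

    def __init__(self, pattern):
        self.p = pattern
        self.fail = self._failure(pattern)
        # Length of the longest prefix of p that is a suffix of the text read so far.
        self.state = 0

    @staticmethod
    def _failure(p):
        fail = [0] * len(p)
        k = 0
        for i in range(1, len(p)):
            while k > 0 and p[i] != p[k]:
                k = fail[k - 1]
            if p[i] == p[k]:
                k += 1
            fail[i] = k
        return fail

    def feed(self, c):
        """Consume one text character; return True iff an occurrence ends here."""
        # After a full match, or on a mismatch, fall back along failure links.
        while self.state > 0 and (self.state == len(self.p) or c != self.p[self.state]):
            self.state = self.fail[self.state - 1]
        if c == self.p[self.state]:
            self.state += 1
        return self.state == len(self.p)
```

Usage: `m = OnlineKMP("aba")` followed by `[i for i, c in enumerate("abababa") if m.feed(c)]` lists the text positions where an occurrence ends.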

4 Outline: motivation; presentation of 3 online models; a space lower bound; a black box algorithm; exact and approximate pattern matching in the streaming model.

5 Motivation: monitoring internet traffic

6 Motivation: the stock market

7 Motivation: espionage

8 Motivation: viruses and malware

9 Three online models (read-only memory / working memory):
First model: m, for storing the pattern / O(m)
Second model: m, for storing the pattern / O(poly(log m))
Third model: 0, we can't store the pattern / O(poly(log m))

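10 Space lower bound (deterministic): Assume that algorithm A uses o(m) space to solve the online pattern matching problem. Alice holds a string S = s_1, s_2, s_3, …, s_m, runs A with S as the pattern, and sends A's memory to Bob. Bob runs over all the strings Q = q_1, q_2, …, q_m and, each time starting from the memory he received, inserts Q as the text for A. A reports a match exactly when Q = S, so Bob recovers S from o(m) bits, which is impossible because o(m) bits cannot distinguish all strings of length m.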

11 A black box for online approximate pattern matching. Raphaël Clifford, Benny Porat, Ely Porat, CPM 2008

12 Black box for the first model: read-only memory m (for storing the pattern), working memory O(m).

13 Problem definition: There are many offline pattern matching algorithms. We want a black box algorithm that takes most offline pattern matching algorithms and converts them to pseudo real time. Pseudo real time: take the best time of the offline algorithm and divide it by n; this bounds the time per character. Not amortized!

14 Result: For example, we can apply our algorithm to the following problems: Hamming norm, k-mismatch, matching under L_2, matching under L_1, online convolution.

15 Exact and Approximate Pattern Matching in the Streaming Model. Benny Porat, Ely Porat, FOCS 2009

16 Solution for the third model: read-only memory 0 (we can't store the pattern), working memory O(poly(log m)). Pattern matching; pattern matching up to k mismatches.

17 It's not minor! Cache works much faster than RAM, and now it can fit! Antivirus on routers. Researchers thought that there was a lower bound and that it couldn't be done.

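18 Randomized algorithm (RK, Karp-Rabin fingerprints): the pattern is p_{m-1}, …, p_2, p_1, p_0 and the text is t_1, t_2, t_3, …, t_{i+1}, t_{i+2}, …, t_m, …, t_n. How can we compute the fingerprint of the next window t_{i+1}, …, t_{i+m} without remembering t_i, the character that leaves the window? All calculations are in F_q.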
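One way to answer the slide's question, sketched under our own assumptions: keep a running fingerprint F_k of the whole prefix t_1 … t_k instead of a sliding window. The fingerprint of any window t_{a+1} … t_b is then F_b - r^{b-a} F_a (mod q), so a window can be signed by storing only the prefix fingerprint at its start, never the characters themselves. The constants `Q` (the Mersenne prime 2^61 - 1) and `R` are illustrative choices; a real implementation draws R at random. Function names are hypothetical.

```python
Q = (1 << 61) - 1   # prime modulus (2^61 - 1); all arithmetic is in F_q
R = 1_000_003       # fixed base for reproducibility; normally chosen at random

def fingerprint(s):
    """Karp-Rabin fingerprint: sum of ord(s[j]) * R^(len(s)-1-j) mod Q (Horner)."""
    h = 0
    for c in s:
        h = (h * R + ord(c)) % Q
    return h

def prefix_fingerprints(text):
    """F[k] = fingerprint(text[:k]) for k = 0..len(text)."""
    F = [0]
    for c in text:
        F.append((F[-1] * R + ord(c)) % Q)
    return F

def window_fingerprint(F_a, F_b, length):
    """fingerprint(text[a:b]) from F[a], F[b] and length = b - a.

    F_b contains the window plus the prefix shifted left by `length` digits;
    subtracting R^length * F_a cancels the prefix, so no text character is needed.
    """
    return (F_b - pow(R, length, Q) * F_a) % Q
```

In a stream, only the current F and the F values saved at the start of active windows need to be kept.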

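19 Streaming pattern matching, first case: the pattern starts with z, and there are no more z's in the pattern. Whenever z appears in the text T, start signing (computing a fingerprint); after m characters, compare the signature with the signature of P. Since z can occur in an occurrence of P only at its start, at most one signature is active at a time.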
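The single-z case can be sketched with the same prefix-fingerprint trick. Because z occurs in P only at offset 0, a new z inside an active candidate window rules that candidate out, so one stored signature suffices. This is a sketch under stated assumptions (hypothetical class name, fixed base `R` instead of a random one, characters mapped with `ord`), not the paper's code; like any fingerprint method it can report a false match with small probability.

```python
Q = (1 << 61) - 1   # prime modulus
R = 1_000_003       # fixed base for reproducibility; normally random

def fingerprint(s):
    h = 0
    for c in s:
        h = (h * R + ord(c)) % Q
    return h

class UniqueFirstCharMatcher:
    """Streaming exact matching when pattern[0] occurs nowhere else in the pattern."""

    def __init__(self, pattern):
        if not pattern or pattern[0] in pattern[1:]:
            raise ValueError("pattern[0] must occur exactly once in the pattern")
        self.m = len(pattern)
        self.z = pattern[0]
        self.target = fingerprint(pattern)
        self.r_to_m = pow(R, self.m, Q)
        self.pos = 0            # number of text characters read so far
        self.prefix_fp = 0      # fingerprint of the whole text read so far
        self.candidate = None   # (start position, prefix fingerprint just before start)

    def feed(self, c):
        """Consume one character; return True iff an occurrence ends here."""
        if c == self.z:
            # "Start signing": a newer z invalidates any older active candidate.
            self.candidate = (self.pos, self.prefix_fp)
        self.prefix_fp = (self.prefix_fp * R + ord(c)) % Q
        self.pos += 1
        if self.candidate is not None and self.pos - self.candidate[0] == self.m:
            _, fp_before = self.candidate
            self.candidate = None
            window_fp = (self.prefix_fp - self.r_to_m * fp_before) % Q
            return window_fp == self.target
        return False
```

Only a constant number of field elements is stored, independent of m.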

20 No z: There is a prefix U such that U appears only once in the pattern, with |U| ≤ m/2. Search for U recursively; whenever U is found in T, start signing and compare the signature with the signature of P.
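Deciding which case applies is an offline preprocessing question. The helper below (a hypothetical name, brute force, roughly cubic time, fine only for illustration) returns the shortest prefix U of P that occurs exactly once in P, counting overlapping occurrences; the small-U case applies when |U| ≤ m/2.

```python
def shortest_unique_prefix(p):
    """Shortest prefix of p occurring exactly once in p (overlaps counted)."""
    for k in range(1, len(p) + 1):
        u = p[:k]
        count = sum(1 for i in range(len(p) - k + 1) if p[i:i + k] == u)
        if count == 1:
            return u
    return p  # only reached for the empty pattern
```

For `"aabab"` this returns `"aa"` (short enough for this slide), while for `"ababc"` it returns `"aba"`, longer than m/2, which falls to the next slide's case.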

21 No small U: Look at the first m/2 characters; they appear again somewhere in the pattern. Then P is built from repeats of a string v with |v| ≤ m/2. Option 1: P = v v v … v v', where v' is a prefix of v. Option 2: P = v v v v w, where w isn't a prefix of v and v isn't a prefix of w.
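Under one concrete reading of this slide, which is our assumption rather than the paper's definition, v can be taken as the smallest period of the first half of P. After stripping the maximal run of copies of v, the remainder is either a prefix of v (Option 1, P is periodic) or a string w that is not a prefix of v and does not start with v (Option 2). The sketch finds the period with the Knuth-Morris-Pratt failure function; `decompose` and `smallest_period` are hypothetical names.

```python
def smallest_period(s):
    """Smallest d such that s[i] == s[i + d] for all valid i (via KMP failure function)."""
    fail = [0] * len(s)
    k = 0
    for i in range(1, len(s)):
        while k > 0 and s[i] != s[k]:
            k = fail[k - 1]
        if s[i] == s[k]:
            k += 1
        fail[i] = k
    return len(s) - fail[-1]

def decompose(p):
    """Return (option, v, k, rest) with p == v * k + rest."""
    half = p[:max(1, len(p) // 2)]
    d = smallest_period(half)          # so |v| <= m/2
    v = p[:d]
    k = 0
    while p[k * d:(k + 1) * d] == v:   # maximal run of copies of v
        k += 1
    rest = p[k * d:]
    if v.startswith(rest):             # rest is a (possibly empty) prefix v' of v
        return ("option 1", v, k, rest)
    return ("option 2", v, k, rest)    # rest plays the role of w
```

For example, `"abababa"` decomposes as three copies of `"ab"` followed by the prefix `"a"` (Option 1), and `"ababababc"` as four copies of `"ab"` followed by w = `"c"` (Option 2).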

22 Solving this case (Option 2): P = v v v v w, |v| ≤ m/2. Search recursively for v and count how many times you found it; then sign on w.

23 Solving this case (continued): P = v v v v w, |v| ≤ m/2. Search recursively for v, count how many times you found it, and sign on w. This uses O(log m) signatures and counters in the worst case, and the time is O(log m) per character in the worst case.

24 Pattern matching up to k mismatches: 1-mismatch; pattern matching up to k mismatches.

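25 Chinese Remainder Theorem: Let n and m be two coprime integers. Then for every a and b there is exactly one x with 0 ≤ x < nm such that x ≡ a (mod n) and x ≡ b (mod m).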
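The theorem is constructive. The sketch below builds x as a + n·t, where t solves n·t ≡ b - a (mod m); it uses the Python 3.8+ form `pow(n, -1, m)` for the modular inverse. The function name `crt` is ours.

```python
from math import gcd

def crt(a, n, b, m):
    """Unique x in [0, n*m) with x % n == a % n and x % m == b % m (n, m coprime)."""
    if gcd(n, m) != 1:
        raise ValueError("moduli must be coprime")
    t = ((b - a) * pow(n, -1, m)) % m   # n*t == b - a (mod m)
    return (a + n * t) % (n * m)
```

For instance, `crt(1, 2, 2, 3)` is 5: the only number below 6 that is odd and leaves remainder 2 mod 3.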

26 1-mismatch: Split p_1, p_2, p_3, …, p_m by residue. mod 2: p_1, p_3, p_5, … and p_2, p_4, p_6, …; mod 3: p_1, p_4, p_7, …, p_2, p_5, p_8, …, and p_3, p_6, p_9, ….
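The grouping on this slide is plain strided slicing. With the 1-indexed pattern p_1 … p_m stored 0-based, residue class r mod q holds p_{r+1}, p_{r+1+q}, …; the name `split_by_residue` is hypothetical.

```python
def split_by_residue(seq, q):
    """Split seq into q subsequences; class r holds seq[r], seq[r + q], seq[r + 2q], ..."""
    return [seq[r::q] for r in range(q)]
```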

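27 1-mismatch: The text is split the same way (mod 2: t_1, t_3, t_5, … and t_2, t_4, t_6, …), and each text subsequence is compared with the corresponding pattern subsequence. mod 2: p_1, p_3, p_5, … and p_2, p_4, p_6, …; mod 3: p_1, p_4, p_7, …, p_2, p_5, p_8, …, and p_3, p_6, p_9, …. Overall, the number of subpatterns is the sum of all the primes.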

28 1-mismatch (mod 2): the text subsequences t_1, t_3, t_5, … and t_2, t_4, t_6, … are compared with the pattern subsequences p_1, p_3, p_5, … and p_2, p_4, p_6, ….

29 Problem: The parity of the alignment is not fixed (mod 2): for an alignment starting at t_2, p_1, p_3, p_5, … is compared with t_2, t_4, t_6, …. When do we compare? For each q_i, we start a comparison for each alignment.

30 Space complexity: For each q_i, we run q_i copies of our algorithm, one for each alignment. For each alignment, we again run q_i copies, one for each shift. Overall: $\sum_i q_i^2$ copies.

31 Time complexity: Each character goes to just one alignment for each shift. Overall: $O\left(\sum_i q_i\right)$ updates per character.

32 1-mismatch, Lemma 1: There is exactly one mismatch if and only if exactly one subpattern in each group does not match (by the C.R.T., assuming the product of the primes exceeds m).
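A direct, non-streaming illustration of how the lemma locates the mismatch: for each prime, find the single residue class whose subsequences differ, then combine those residues with the C.R.T. The sketch compares subsequences directly rather than by fingerprints as a streaming algorithm would, and assumes distinct primes whose product exceeds m; `locate_single_mismatch` is a hypothetical name.

```python
def locate_single_mismatch(pattern, window, primes):
    """0-based index of the unique mismatch, or None if there is not exactly one.

    Assumes len(pattern) == len(window), distinct primes, product(primes) > len(pattern).
    """
    x, modulus = 0, 1                     # invariant: x < modulus, x matches all residues so far
    for q in primes:
        bad = [r for r in range(q) if pattern[r::q] != window[r::q]]
        if len(bad) != 1:                 # zero or several failing classes
            return None
        r = bad[0]
        # C.R.T. step: find t with x + modulus * t == r (mod q).
        t = ((r - x) * pow(modulus, -1, q)) % q
        x += modulus * t
        modulus *= q
    return x
```

With primes 2, 3, 5 (product 30) and a 10-character pattern, a single differing character at index 4 fails class 0 mod 2, class 1 mod 3 and class 4 mod 5, and the C.R.T. combines these back to 4.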

33 Pattern matching up to k mismatches: group testing / random selectors …

34 A black box for online approximate pattern matching. Raphaël Clifford, Benny Porat, Ely Porat, CPM 2008

35 The idea: We split the pattern into log m consecutive subpatterns P_1, P_2, P_4, …, P_{m/2} of lengths 1, 2, 4, …, m/2, taken from the end of the pattern: P_1 = p_m; P_2 = p_{m-2}, p_{m-1}; P_4 = p_{m-6}, p_{m-5}, p_{m-4}, p_{m-3}; …; P_{m/2} = p_1, p_2, p_3, …, p_{m/2}.
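The split can be computed right to left, doubling the block length each time. How the slide's last block P_{m/2} meets the doubling blocks is not fully specified, so this sketch makes an assumption: whatever prefix is left over when the next doubled block no longer fits becomes one final (possibly short) block. Blocks are 0-based half-open index ranges; `split_pattern` is a hypothetical name.

```python
def split_pattern(m):
    """Split indices 0..m-1 into consecutive blocks of lengths 1, 2, 4, ... from the right.

    Returns (start, end) half-open ranges ordered left to right; the leftmost
    block is whatever prefix remains (an assumption, see lead-in).
    """
    blocks = []
    end, length = m, 1
    while end > 0:
        start = max(0, end - length)
        blocks.append((start, end))
        end, length = start, 2 * length
    return blocks[::-1]
```

For m = 7 this gives blocks of lengths 4, 2, 1; for m = 8 the leftover prefix is the single character p_1, giving lengths 1, 4, 2, 1. Either way there are O(log m) blocks.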

36 Bringing it online: Look at a subpattern P_{m'} of length m'. When we get to the i-th character of the text, where is P_{m'} aligned? Since P_{m'} ends about m' characters before the end of P, the full alignment finishes only m' characters later. Conclusion 1: We need to know DIFF(P_{m'}, T_{i-m',i}) only at position i+m' of the text.

37 The idea (continued): For each subpattern of length m', we partition the text into overlapping substrings of length 2m', one starting every m' characters.

38 The idea (continued): For each subpattern of length m', we run the offline algorithm on each partition of the text separately. This ensures that we get the difference in time: if i = 2lm' or i = 2lm' + m' for some l, run the offline algorithm on the last 2m' characters. This gives all the differences for this section.
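A small simulation of this schedule for one subpattern: a buffer keeps only the last 2m' characters, and whenever the number of characters read is a multiple of m' (that is, i = 2lm' or 2lm' + m'), the offline algorithm runs on the buffer. Naive Hamming distance stands in for the black-box offline algorithm here (an assumption for illustration); any offline algorithm with the same input and output would do. The function names are hypothetical.

```python
from collections import deque

def hamming_offline(text, pat):
    """Stand-in offline algorithm: Hamming distance at every alignment (naive)."""
    m = len(pat)
    return [sum(x != y for x, y in zip(text[a:a + m], pat))
            for a in range(len(text) - m + 1)]

def online_with_2m_blocks(stream, pat):
    """Feed characters one by one; return {alignment start: distance}."""
    mp = len(pat)
    window = deque(maxlen=2 * mp)            # only the last 2m' characters are kept
    diffs = {}
    for i, c in enumerate(stream, start=1):  # i = number of characters read so far
        window.append(c)
        if i % mp == 0 and i >= 2 * mp:      # i = 2lm' or 2lm' + m'
            start = i - 2 * mp
            for off, d in enumerate(hamming_offline(list(window), pat)):
                diffs[start + off] = d       # consecutive blocks overlap by one alignment
    return diffs
```

When the text length is a multiple of m', every alignment is covered by some block, so the collected distances equal a single offline run over the whole text.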

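39 Running time: Let T(n,m) = nT(m) be the running time of the offline algorithm. For each subpattern of length m', we get about n/m' overlapping partitions of length 2m', so the total time for each subpattern is $\frac{n}{m'} \cdot 2m' \cdot T(m') = 2n\,T(m')$. Total time: $\sum_{j=0}^{\log m - 1} 2n\,T(2^j)$.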

40 The problem: We saw that overall the time is good. But for m' = m/2, each partition has length 2m' = m. At positions such as 2tm' + m' or 2(t+1)m', we must wait until the run of the offline algorithm on P_{m/2} and the last m characters finishes before we can return the answer for the current character t_i. That is (m/2)T(m) time for a single character!

41 The solution: We split the text into overlapping partitions of length 1.5m', one starting every m'/2 characters.

42 The solution (continued): The latest we get DIFF(P_{m'}, T_{i-m',i}) is at index i + m'/2. By Conclusion 1 (we need DIFF(P_{m'}, T_{i-m',i}) only at position i + m' of the text), we can wait m'/2 characters before we need this difference.
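The latency claim can be checked numerically. For blocks of length `block_len` starting every `step` characters, the sketch computes, over all alignments, the largest gap between the moment an alignment of P_{m'} is fully read and the moment the first block containing it is complete. The function name is hypothetical, and it measures delivery time only, not the cost of running the offline algorithm.

```python
def max_latency(m_prime, block_len, step, n):
    """Worst delay (in characters) between an alignment ending and its block ending.

    Blocks are text[s : s + block_len] for s = 0, step, 2*step, ...
    Requires step <= block_len - m_prime so every alignment lies in some block.
    """
    if step > block_len - m_prime:
        raise ValueError("blocks would miss some alignments")
    worst = 0
    for a in range(n - block_len + 1):       # alignment covers text[a : a + m_prime]
        s = 0
        while not (s <= a and a + m_prime <= s + block_len):
            s += step                        # earliest-finishing block that contains it
        worst = max(worst, (s + block_len) - (a + m_prime))
    return worst
```

With m' = 8, blocks of length 2m' every m' can deliver a difference as late as m' characters after its alignment ends, exactly at the deadline of Conclusion 1, so there is no slack. Blocks of length 1.5m' every m'/2 deliver within m'/2 characters, leaving m'/2 characters of slack.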

43 Spreading the work: So we can spread the work over the next m'/2 characters: work on P_1, then on P_2, then on P_3, so that the difference for P_1 is ready by the time we need to know it.
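De-amortization of this kind can be sketched with a scheduler that splits each job into unit steps and runs ceil(steps / deadline) of them per text character, so every job finishes within its deadline and the work per character stays bounded. The class, the step granularity (one alignment's Hamming distance per step) and all names are assumptions for illustration, not the paper's implementation.

```python
import math
from functools import partial

class SpreadScheduler:
    """Run each job's steps in equal slices over the next `deadline` ticks."""

    def __init__(self):
        self.active = []   # entries: [remaining steps, steps per tick, ticks left, name]

    def submit(self, steps, deadline, name):
        quota = math.ceil(len(steps) / deadline) if steps else 0
        self.active.append([list(steps), quota, deadline, name])

    def tick(self):
        """Call once per text character; returns names of jobs finished in this tick."""
        finished, still_active = [], []
        for steps, quota, ticks_left, name in self.active:
            for step in steps[:quota]:
                step()
            del steps[:quota]
            if not steps:
                finished.append(name)
            else:
                # quota * deadline >= total steps, so this never fires.
                assert ticks_left > 1, f"{name} would miss its deadline"
                still_active.append([steps, quota, ticks_left - 1, name])
        self.active = still_active
        return finished

def hamming_at(block, pat, a, out, base=0):
    """One unit of work: Hamming distance of pat at offset a of block."""
    out[base + a] = sum(x != y for x, y in zip(block[a:a + len(pat)], pat))
```

Usage: build one `partial(hamming_at, block, pat, a, out)` per alignment of a block, `submit` them with `deadline` set to m'/2, and call `tick()` once per incoming character.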

44 Spreading the work (continued): Overall, we can spread the work for a specific subpattern evenly over all the characters of the text. All that is left to do is to check that the running time does not change.

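45 Running time: Let T(n,m) = nT(m) be the running time of the offline algorithm. For each subpattern of length m', we now get about 2n/m' overlapping partitions of length 1.5m', so the total time for each subpattern is $\frac{2n}{m'} \cdot 1.5m' \cdot T(m') = 3n\,T(m')$. Total time for the whole text: $\sum_{j=0}^{\log m - 1} 3n\,T(2^j)$. Unchanged, up to a constant factor!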

46 Running time (continued): By spreading the work, we get a total running time of $O\left(\sum_{j=0}^{\log m - 1} T(2^j)\right)$ for each character, in the worst case.

47 Conclusion: We give a space lower bound for deterministic online pattern matching. We give a black box algorithm that can adapt any offline algorithm to an online algorithm, using only O(m) space and taking $O\left(\sum_{j=0}^{\log m - 1} T(2^j)\right)$ time per character.

