Published by Alan Shropshire. Modified over 2 years ago.

1
Dictionary Matching with One Gap
Amihood Amir, Avivit Levy, Ely Porat and B. Riva Shalom
CPM 2014

2
CPM 2014 - Moscow

3
Mind the gap!

4
Outline
The DMG (Dictionary Matching with one Gap) Problem
Motivation
Previous Work
Bidirectional Suffix Trees Solution
Lookup Table addition
Open Problems

5
The DMG Problem
A gapped pattern is a pattern P of the form $P_1\{\alpha_1,\beta_1\}P_2\{\alpha_2,\beta_2\}\dots P_{k-1}\{\alpha_{k-1},\beta_{k-1}\}P_k$.
Each $P_j$ is over the alphabet $\Sigma$, and $\{\alpha_j,\beta_j\}$ is a sequence of at least $\alpha_j$ and at most $\beta_j$ don't cares (written @).
Example: aba{3,6}cbb stands for
aba@@@cbb
aba@@@@cbb
aba@@@@@cbb
aba@@@@@@cbb
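The expansion in the example can be reproduced with a few lines of Python. This is only an illustrative sketch: the helper names `expansions` and `gapped_to_regex` are made up here, and mapping the don't care symbol to the regex wildcard `.` is an assumption of the sketch, not something taken from the talk.

```python
import re

def expansions(first, alpha, beta, second, dont_care="@"):
    """List every concrete form of first{alpha,beta}second, one per gap length."""
    return [first + dont_care * gap + second for gap in range(alpha, beta + 1)]

def gapped_to_regex(first, alpha, beta, second):
    """Compile a single-gap pattern into a regex; '.' stands in for a don't care."""
    gap = ".{%d,%d}" % (alpha, beta)
    return re.compile(re.escape(first) + gap + re.escape(second), re.DOTALL)

variants = expansions("aba", 3, 6, "cbb")
pattern = gapped_to_regex("aba", 3, 6, "cbb")

for v in variants:
    # Each expanded form is matched in full by the compiled pattern.
    print(v, bool(pattern.fullmatch(v)))
```

A gap of 2 or 7 symbols falls outside {3,6}, so strings such as aba@@cbb are rejected by `pattern.fullmatch`.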

6
The DMG Problem
The DMG problem is:
Preprocess: a dictionary D of d gapped patterns $P_1, \dots, P_d$ over alphabet $\Sigma$.
Query: a text T of length n over alphabet $\Sigma$.
Output: all locations in T where a dictionary gapped pattern ends.
We focus on DMG with a single gap.

7
Example
Dictionary:
$P_1$ = aba{3,6}cbb ($P_{1,1}$ = aba, $P_{1,2}$ = cbb)
$P_2$ = ab{3,6}bbac ($P_{2,1}$ = ab, $P_{2,2}$ = bbac)
$P_3$ = aa{3,6}ac ($P_{3,1}$ = aa, $P_{3,2}$ = ac)
Query text:
position: 1 2 3 4 5 6 7 8 9 10 11
text:     a b a a b a c b b a c
First = $\bigcup_{1 \le i \le d} \{P_{i,1}\}$
Second = $\bigcup_{1 \le i \le d} \{P_{i,2}\}$
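To make the expected output of this example concrete, here is a brute-force baseline in Python. It is not the algorithm of the talk, just a direct reading of the problem statement; the function name `naive_dmg` and the tuple layout `(first, alpha, beta, second)` are assumptions of this sketch.

```python
def naive_dmg(dictionary, text):
    """Report (end_position, pattern_index) pairs, both 1-based.

    dictionary: list of (first, alpha, beta, second) single-gap patterns.
    Brute force: try every start of `first` and every allowed gap length.
    """
    hits = set()
    n = len(text)
    for idx, (first, alpha, beta, second) in enumerate(dictionary, start=1):
        for start in range(n - len(first) + 1):
            if not text.startswith(first, start):
                continue
            after_first = start + len(first)  # 0-based index right after `first`
            for gap in range(alpha, beta + 1):
                s = after_first + gap  # 0-based start of `second`
                if s + len(second) > n:
                    break  # longer gaps run past the end of the text too
                if text.startswith(second, s):
                    hits.add((s + len(second), idx))  # 1-based end position
    return sorted(hits)

dictionary = [("aba", 3, 6, "cbb"), ("ab", 3, 6, "bbac"), ("aa", 3, 6, "ac")]
result = naive_dmg(dictionary, "abaabacbbac")
print(result)
```

On the query text above this reports that the first pattern ends at position 9 and that the second and third patterns both end at position 11.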

8
Motivation
Computational biology.
Renewed interest due to cyber security.
Network intrusion detection systems perform protocol analysis, content searching and content matching to detect harmful software.
Malware may appear in several packets!

9
Previous Work
The gapped pattern matching problem has been studied for a few decades, e.g. [Myers, JACM 1992], [Navarro & Raffinot, Algorithmica 2004], [Bille & Thorup, ICALP 2009], [Bille & Thorup, SODA 2010], [Morgante et al., JCB 2005], [Rahman et al., COCOON 2006], [Bille et al., TCS 2012].
The DMG problem has not been studied enough! [Kucherov & Rusinowitch, TCS 1997], [Zhang et al., IPL 2010]: no bounds on the length of the gap.

10
Bi-directional Suffix Trees Algorithm
Gapped pattern: ab{3,6}bbac
Query: a b a a b a c b b a c

11
Bi-directional Suffix Trees Algorithm
Idea: view the problem as in [Amir et al., JAL 2000].
Gapped patterns:
$P_1$ = aba{3,6}abac
$P_2$ = aba{3,6}bba
$P_3$ = ab{3,6}baa
Query: a b a a b a c b b a c
Use the suffix tree $T_S$ of Second.
Use the suffix tree $T_{F^R}$ of First$^R$ (the first subpatterns, reversed).

12
Bi-directional Suffix Trees Algorithm
For each text location $l$:
Insert $t_l t_{l+1} \dots t_n$ into $T_S$ (reaching the node h) to find the labels on the path to h. This finds every $P_{i,2}$ starting at location $l$.
For $f = l-\beta-1$ to $l-\alpha-1$:
Insert $t_f t_{f-1} \dots t_1$ into $T_{F^R}$ (reaching the node g) to find the labels on the path to g. This finds every $P_{i,1}$ ending at location f.
Output the intersection (for end locations).
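The two-directional scan can be sketched in Python as follows. To keep it short, this sketch swaps the suffix trees for plain tries (one over the second subpatterns, one over the reversed first subpatterns) and intersects the two label sets directly, so it shows the control flow only, not the stated complexity. The names `build_trie`, `labels_on_path` and `bidirectional_dmg` are hypothetical, and a single gap bound {alpha, beta} shared by the whole dictionary is assumed.

```python
def build_trie(words):
    """Plain trie over (pattern_id, word) pairs; ids mark where a word ends."""
    root = {"next": {}, "ids": set()}
    for idx, word in words:
        node = root
        for ch in word:
            node = node["next"].setdefault(ch, {"next": {}, "ids": set()})
        node["ids"].add(idx)
    return root

def labels_on_path(root, text, start, step):
    """Walk the trie along text[start], text[start+step], ...

    Returns {pattern_id: matched_length} for every label met on the path.
    """
    found, node, i, depth = {}, root, start, 0
    while 0 <= i < len(text) and text[i] in node["next"]:
        node = node["next"][text[i]]
        i += step
        depth += 1
        for idx in node["ids"]:
            found[idx] = depth
    return found

def bidirectional_dmg(patterns, alpha, beta, text):
    """patterns: list of (first, second). Returns sorted (end, id), 1-based."""
    t_s = build_trie([(i, sec) for i, (_, sec) in enumerate(patterns, 1)])
    t_fr = build_trie([(i, fst[::-1]) for i, (fst, _) in enumerate(patterns, 1)])
    hits = set()
    for l in range(len(text)):  # 0-based start of a second subpattern
        second = labels_on_path(t_s, text, l, +1)
        if not second:
            continue
        for f in range(l - beta - 1, l - alpha):  # f = l-beta-1 .. l-alpha-1
            if f < 0:
                continue
            first = labels_on_path(t_fr, text, f, -1)  # read the text backwards
            for idx in first.keys() & second.keys():
                hits.add((l + second[idx], idx))  # 1-based end position
    return sorted(hits)

patterns = [("aba", "cbb"), ("ab", "bbac"), ("aa", "ac")]
result = bidirectional_dmg(patterns, 3, 6, "abaabacbbac")
print(result)
```

Run on the earlier three-pattern example dictionary, this agrees with a brute-force scan: pattern 1 ends at position 9, patterns 2 and 3 end at position 11.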

13
Bi-directional Suffix Trees Algorithm - Intersection
Patterns: {(1,4),(2,9),(3,7),…,(6,5),…}
[Figure: the path to node g in $T_{F^R}$ (nodes 1, 3, 6, 9; range [1,9]) and the path to node h in $T_S$ (nodes 2, 5, 7; range [2,7]).]

14
Bi-directional Suffix Trees Algorithm (continued)
Intersection via range queries:
[Figure: grid with the points (1,4), (3,7), (6,5), (8,8), (2,9), queried with the ranges [1,9] and [2,7].]
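As a small illustration of the intersection step, the query can be phrased as orthogonal range reporting over the pattern points. The Python sketch below is a naive stand-in (binary search on the first coordinate, linear filter on the second), not the structure of [Chan et al., SoCG 2011]; the function name `range_report` and the reading of [1,9] as the range on the first coordinate and [2,7] as the range on the second are assumptions.

```python
from bisect import bisect_left, bisect_right

def range_report(points, x_range, y_range):
    """Return the points (x, y) with x in x_range and y in y_range (inclusive)."""
    pts = sorted(points)
    xs = [x for x, _ in pts]
    lo = bisect_left(xs, x_range[0])   # first point with x >= lower bound
    hi = bisect_right(xs, x_range[1])  # one past the last point with x <= upper bound
    return [(x, y) for x, y in pts[lo:hi] if y_range[0] <= y <= y_range[1]]

points = [(1, 4), (3, 7), (6, 5), (8, 8), (2, 9)]
inside = range_report(points, (1, 9), (2, 7))
print(inside)
```

Under that reading, the points (1,4), (3,7) and (6,5) fall inside the query rectangle, while (8,8) and (2,9) are excluded by the second range.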

15
Time & Space
Preprocessing time:
Dictionary segments suffix tree and reverse suffix tree: $O(|D|)$.
Preprocessing the grid for range queries: $O(d \log d)$ [Chan et al., SoCG 2011].
Preprocessing space:
Dictionary segments suffix tree and reverse suffix tree: $O(|D|)$.
Space for the grid: $O(d \log d)$ [Chan et al., SoCG 2011].

16
Time & Space
Query time:
For each end text location, we try every gap size: a factor of $(\beta-\alpha)$.
The number of range queries is the number of vertical paths in a given path: $O(\log^2 \min\{d, \log|D|\})$.
A range query costs $O(\log\log d + occ)$ [Chan et al., SoCG 2011].
Total: $O(n(\beta-\alpha) \log\log d \, \log^2 \min\{d, \log|D|\} + occ)$.
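The total can be read as a product of the listed factors. The following is only a bookkeeping sketch, not a derivation from the talk; it takes the gap factor as $(\beta-\alpha)$ exactly as in the stated bound, and it assumes that the occ terms of the individual range queries add up to one overall occ term.

```latex
\underbrace{n}_{\text{text locations}}
\cdot \underbrace{(\beta-\alpha)}_{\text{gap sizes}}
\cdot \underbrace{O\big(\log^2 \min\{d, \log|D|\}\big)}_{\text{range queries}}
\cdot \underbrace{O(\log\log d)}_{\text{cost per query}}
+ occ
= O\big(n(\beta-\alpha)\,\log\log d\,\log^2 \min\{d, \log|D|\} + occ\big)
```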

17
Lookup Table Algorithm
Idea: instead of using range queries in a grid to compute the intersection, we use a pre-computed lookup table.
This enables intersection in $O(occ)$ time.
Total query time becomes $O(n(\beta-\alpha) + occ)$.

18
Lookup Table Algorithm
Inter[g,h] = all i such that $P_{i,1}^R$ appears on the path from the root of $T_{F^R}$ to node g and $P_{i,2}$ appears on the path from the root of $T_S$ to node h.
$P_1$=(1,4), $P_2$=(2,9), $P_3$=(3,7), $P_4$=(3,2), …, $P_6$=(6,5), $P_7$=(9,6)
Inter[3,5] = {4}
[Figure: nodes 1, 3, 6, 9 of $T_{F^R}$ (g) and nodes 2, 5, 7 of $T_S$ (h).]

19
Lookup Table Algorithm
Inter[g,h] = all i such that $P_{i,1}^R$ appears on the path from the root of $T_{F^R}$ to node g and $P_{i,2}$ appears on the path from the root of $T_S$ to node h.
$P_1$=(1,4), $P_2$=(2,9), $P_3$=(3,7), $P_4$=(3,2), …, $P_6$=(6,5), $P_7$=(9,6)
Inter[3,5] = {4}
Inter[3,7] = {3,4}
[Figure: nodes 1, 3, 6, 9 of $T_{F^R}$ (g) and nodes 2, 5, 7 of $T_S$ (h).]

20
Lookup Table Algorithm
Inter[g,h] = all i such that $P_{i,1}^R$ appears on the path from the root of $T_{F^R}$ to node g and $P_{i,2}$ appears on the path from the root of $T_S$ to node h.
$P_1$=(1,4), $P_2$=(2,9), $P_3$=(3,7), $P_4$=(3,2), …, $P_6$=(6,5), $P_7$=(9,6)
Inter[3,5] = {4}
Inter[3,7] = {3,4}
Inter[6,7] = {3,4,6}
[Figure: nodes 1, 3, 6, 9 of $T_{F^R}$ (g) and nodes 2, 5, 7 of $T_S$ (h).]

21
Lookup Table Algorithm
Inter[g,h] = all i such that $P_{i,1}^R$ appears on the path from the root of $T_{F^R}$ to node g and $P_{i,2}$ appears on the path from the root of $T_S$ to node h.
$P_1$=(1,4), $P_2$=(2,9), $P_3$=(3,7), $P_4$=(3,2), …, $P_6$=(6,5), $P_7$=(9,6)
Inter[3,5] = {4}
Inter[3,7] = {3,4}
Inter[6,7] = {3,4,6}
Inter[9,7] = {3,4,6}
[Figure: nodes 1, 3, 6, 9 of $T_{F^R}$ (g) and nodes 2, 5, 7 of $T_S$ (h).]
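The Inter entries listed on this slide can be checked with a short Python sketch that fills the table straight from the definition (it does not use the DP mentioned in the talk). The tree shapes are assumptions: the sketch takes 1, 3, 6, 9 as one root-to-g path in the tree of reversed first subpatterns and 2, 5, 7 as one root-to-h path in the tree of second subpatterns, hangs every other node directly under a hypothetical root 0, and leaves out the pattern elided by "…".

```python
def path_to_root(parent, node):
    """Set of nodes on the path from `node` up to (not including) the root."""
    path = set()
    while node in parent:
        path.add(node)
        node = parent[node]
    return path

def build_inter(patterns, parent_fr, parent_s):
    """Inter[g, h] = ids whose first node lies on the path to g and whose
    second node lies on the path to h. Computed by definition, not by DP."""
    inter = {}
    for g in parent_fr:
        on_g = path_to_root(parent_fr, g)
        for h in parent_s:
            on_h = path_to_root(parent_s, h)
            inter[g, h] = {i for i, (x, y) in patterns.items()
                           if x in on_g and y in on_h}
    return inter

# Assumed tree shapes (child -> parent); node 0 is a hypothetical root.
parent_fr = {1: 0, 3: 1, 6: 3, 9: 6, 2: 0}
parent_s = {2: 0, 5: 2, 7: 5, 4: 0, 6: 0, 9: 0}
# Pattern id -> (node of reversed first part, node of second part).
patterns = {1: (1, 4), 2: (2, 9), 3: (3, 7), 4: (3, 2), 6: (6, 5), 7: (9, 6)}

inter = build_inter(patterns, parent_fr, parent_s)
for key in [(3, 5), (3, 7), (6, 7), (9, 7)]:
    print(key, sorted(inter[key]))
```

Under these assumptions the four entries come out as {4}, {3,4}, {3,4,6} and {3,4,6}, matching the values listed above.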

22
Lookup Table Alg.
$P_1$=(1,4), $P_2$=(2,9), $P_3$=(3,7), $P_4$=(3,2), …, $P_6$=(6,5), $P_7$=(9,6)
Inter[3,5] = {4}
Inter[3,7] = {3,4}
Inter[6,7] = {3,4,6}
[Figure: layout of the lookup table for the nodes 1, 3, 6, 9 and 2, 5, 7.]

23
Lookup Table Algorithm
Preprocessing:
Time: the table can be computed using DP in time $O(d^2\,ovr + |D|)$, where ovr is the number of subpatterns that include another subpattern as a prefix or suffix.
Space: $O(d^2 + |D|)$.
Query time: $O(n(\beta-\alpha) + occ)$.

24
Our Results
Bi-directional suffix trees & range queries:
Preprocessing time: $O(d \log d + |D|)$.
Space: $O(d \log d + |D|)$.
Query time: $O(n(\beta-\alpha) \log\log d \, \log^2 \min\{d, \log|D|\} + occ)$.
Bi-directional suffix trees & lookup table:
Preprocessing time: $O(d^2\,ovr + |D|)$.
Space: $O(d^2 + |D|)$.
Query time: $O(n(\beta-\alpha) + occ)$.

25
Open Problems
Generalizing to k gaps
Reducing the dependency on the size
Scalability to different gap bounds in the dictionary
Online algorithm

26
Thank You!
