Fast Submatch Extraction using OBDDs
Liu Yang (1), Pratyusa Manadhata (2), William Horne (2), Prasad Rao (2), Vinod Ganapathy (1)
(1) Rutgers University, (2) HP Laboratories

1 Fast Submatch Extraction using OBDDs
Liu Yang (1), Pratyusa Manadhata (2), William Horne (2), Prasad Rao (2), Vinod Ganapathy (1)
(1) Rutgers University, (2) HP Laboratories

2 Applications of Regular Expressions
[Diagram labels: Signatures, Network traffic, NIDS, Alerts]
Network intrusion detection systems (NIDS) employ regular expressions to represent attack signatures.

3 Applications of Regular Expressions (cont.)
[Diagram labels: Connectors (rule set), SIEM, Web, security, compliance]
Security information and event management (SIEM) systems employ regular expressions to normalize event logs generated by hardware connectors and software systems.

4 Submatch Extraction
Rule set: … username=(.*), hostname=(.*) …
Input: username=Bob, hostname=Foo
Submatch extraction: $1 = Bob, $2 = Foo
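The example on this slide can be reproduced with any regex engine that supports capturing groups. The sketch below uses Python's standard `re` module purely as an illustration; it is a backtracking engine, not the OBDD-based technique presented in the talk, and the names `RULE` and `extract` are made up for this example.

```python
import re

# The rule from the slide: two capturing groups, $1 and $2.
RULE = re.compile(r"username=(.*), hostname=(.*)")

def extract(line):
    """Return the submatches ($1, $2) if the rule matches, else None."""
    m = RULE.search(line)
    return m.groups() if m else None

print(extract("username=Bob, hostname=Foo"))  # ('Bob', 'Foo')
```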

5 Signature Matching
Non-deterministic finite automata (NFAs)
–Space efficient, time inefficient
Deterministic finite automata (DFAs)
–Time efficient, but prone to state blow-up
Recursive backtracking
–Fast in general
–Vulnerable to algorithmic complexity attacks

6 Motivation: Time/Space Tradeoff
[Plot: space versus time, with points labeled Ideal, DFA (deterministic finite automaton), NFA (non-deterministic finite automaton), Backtracking, and Our approach]

7 Our Contributions
A novel way of annotating capturing groups: tagged NFAs
Design of a novel technique for submatch extraction (called Submatch-OBDD)
–Extending Thompson’s algorithm
–Using Boolean functions to represent tagged NFAs
–Using ordered binary decision diagrams (OBDDs) to improve time efficiency
Evaluation and comparison with RE2 and PCRE
Note: RE2 is a hybrid approach, using a mix of DFA/NFA, while PCRE uses recursive backtracking.

8 Solution Overview
RegExps with capturing groups → Tagged NFAs → Boolean representations → OBDD representations

9 NFA Representation of RegExps
E = a*aa
Transition table T(x,i,y) of the NFA of regexp “a*aa”:
Current state (x) | Input symbol (i) | Next state (y)
1 | a | 1
1 | a | 2
2 | a | 3

10 Submatch Tagging: tagged NFAs
E = (a*)aa
Tag(E) = (a*)_{t1}aa, where t1 is the submatch tag of the capturing group
Extended transition table T(x,i,y,t) of the tagged NFA of “(a*)aa”:
Current state (x) | Input symbol (i) | Next state (y) | Output tags (t)
1 | a | 1 | {t1}
1 | a | 2 | {}
2 | a | 3 | {}

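11 Match Test
RegExp = (a*)aa; Input: aaaa
[Figure: run of the tagged NFA on aaaa; the frontier grows from {1} to {1,2} to {1,2,3}; the loop on state 1 outputs {t1}; the input is accepted]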
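The frontier-based match test can be sketched in a few lines. This is a minimal illustration for the single regexp a*aa using plain Python sets, assuming state 1 is the start state and state 3 the accept state (as in the slides); `step` and `match` are illustrative names, not part of the authors' toolchain.

```python
# Transition table T(x, i, y) of the NFA for "a*aa" (from slide 9).
TRANSITIONS = {(1, "a", 1), (1, "a", 2), (2, "a", 3)}
START, ACCEPT = {1}, {3}

def step(frontier, symbol):
    """Compute the next frontier: all states reachable on `symbol`."""
    return {y for (x, i, y) in TRANSITIONS if x in frontier and i == symbol}

def match(text):
    """Return (accepted?, list of frontiers, one per input position)."""
    frontier = set(START)
    history = [frontier]
    for symbol in text:
        frontier = step(frontier, symbol)
        history.append(frontier)
    return bool(frontier & ACCEPT), history

accepted, history = match("aaaa")
print(accepted, history[:3])  # True [{1}, {1, 2}, {1, 2, 3}]
```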

12 Submatch Extraction
[Figure: the same run on aaaa with frontiers {1}, {1,2}, {1,2,3}, traversed backwards from the accept state; the loop on state 1 outputs {t1}]
Any path from an accept state to a start state generates a valid assignment of submatches.
$1 = aa
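The backward traversal can be sketched as follows: record the frontier at every input position, then walk from the accept state back to the start state, collecting the tags on the chosen path. This is a set-based illustration for the single regexp (a*)aa, not the OBDD implementation; the helper names are hypothetical, and when several predecessors are possible the sketch simply takes the first one found.

```python
# Extended transition table T(x, i, y, t) for "(a*)aa"; None means no tag.
T = [(1, "a", 1, "t1"), (1, "a", 2, None), (2, "a", 3, None)]
START, ACCEPT = {1}, {3}

def frontiers(text):
    """Frontier before each symbol, plus the final frontier."""
    fs = [set(START)]
    for ch in text:
        fs.append({y for (x, i, y, _) in T if x in fs[-1] and i == ch})
    return fs

def extract(text):
    """Return $1 for the tagged NFA, or None if the input is rejected."""
    fs = frontiers(text)
    final = fs[-1] & ACCEPT
    if not final:
        return None
    state = min(final)
    tags = [None] * len(text)
    # Walk backwards: pick any transition into `state` whose source was
    # in the frontier at the previous position.
    for pos in range(len(text) - 1, -1, -1):
        x, _, _, tag = next(
            tr for tr in T
            if tr[2] == state and tr[1] == text[pos] and tr[0] in fs[pos]
        )
        tags[pos] = tag
        state = x
    # $1 is the last consecutive run of characters tagged with t1.
    end = len(tags)
    while end > 0 and tags[end - 1] != "t1":
        end -= 1
    begin = end
    while begin > 0 and tags[begin - 1] == "t1":
        begin -= 1
    return text[begin:end]

print(extract("aaaa"))  # aa
```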

13 Complexity of Tagged NFAs
Match test: [formula not preserved in transcript]
Submatch extraction: [formula not preserved in transcript]
n – size of tagged NFA
l – length of input string
Can we make the operations faster?

14 Submatch-OBDD
Representing tagged NFAs using Boolean functions
–Updating frontiers in one step using a single Boolean formula
Using OBDDs to manipulate Boolean functions

15 Transitions as Boolean Functions
RegExp: (a*)aa
Current state (x) | Input symbol (i) | Next state (y) | Output tag (t)
1 | a | 1 | {t1}
1 | a | 2 | {}
2 | a | 3 | {}
T(x,i,y,t) = (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})

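16 Match Test using Boolean Functions
Input: aaaa. Each step conjoins the current states, the input symbol, and the transition table. The result is the set of intermediate transitions, and its next states become the current states of the following step.
Step 1 (start states {1}): {1} ∧ a ∧ T(x,i,y,t) = (1∧a∧1∧t1) ∨ (1∧a∧2∧{}); next states {1,2}
Step 2: {1,2} ∧ a ∧ T(x,i,y,t) = (1∧a∧1∧t1) ∨ (1∧a∧2∧{}) ∨ (2∧a∧3∧{}); next states {1,2,3}
Step 3: {1,2,3} ∧ a ∧ T(x,i,y,t) = (1∧a∧1∧t1) ∨ (1∧a∧2∧{}) ∨ (2∧a∧3∧{}); next states {1,2,3}
…
Accept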
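To make the Boolean view concrete, the sketch below encodes states, the input symbol, and the tag as bit vectors and treats T(x,i,y,t) as a characteristic function. The binary encodings are assumptions chosen for this illustration (the slides do not give them), and the existential quantification is done by brute-force enumeration rather than symbolically, which is what an OBDD package would do.

```python
from itertools import product

# Assumed encodings: two bits per state, one bit for the symbol, one for the tag.
STATE = {1: (0, 1), 2: (1, 0), 3: (1, 1)}
SYMBOL = {"a": (1,)}
TAG = {"t1": (1,), None: (0,)}

ROWS = [(1, "a", 1, "t1"), (1, "a", 2, None), (2, "a", 3, None)]

def T(x, i, y, t):
    """Characteristic function: a disjunction with one conjunction per table row."""
    return any(
        x == STATE[a] and i == SYMBOL[s] and y == STATE[b] and t == TAG[g]
        for a, s, b, g in ROWS
    )

def next_frontier(frontier, symbol):
    """States y such that, for some x, i, t: F(x) and I(i) and T(x,i,y,t)."""
    bits2 = list(product((0, 1), repeat=2))
    bits1 = list(product((0, 1), repeat=1))
    F = {STATE[s] for s in frontier}
    I = SYMBOL[symbol]
    next_bits = {
        y
        for x, i, y, t in product(bits2, bits1, bits2, bits1)
        if x in F and i == I and T(x, i, y, t)
    }
    decode = {bits: state for state, bits in STATE.items()}
    return {decode[y] for y in next_bits}

print(next_frontier({1}, "a"))     # {1, 2}
print(next_frontier({1, 2}, "a"))  # {1, 2, 3}
```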

17 Submatch Extraction using Boolean Functions
Input: aaaa. Start from the last symbol, going backwards.
Intermediate transitions [4]: (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})
Conjoin with the last input symbol and the accept state: a ∧ 3 ∧ [4] = 2 ∧ a ∧ 3 ∧ {}
–Previous state of 3: 2
–No output submatch tag
Rename previous state as current state and continue.
Intermediate transitions [3]: (1 ∧ a ∧ 1 ∧ t1) ∨ (1 ∧ a ∧ 2 ∧ {}) ∨ (2 ∧ a ∧ 3 ∧ {})
a ∧ 2 ∧ [3] = 1 ∧ a ∧ 2 ∧ {}
–Previous state of 2: 1
–No output submatch tag

18 Submatch Extraction using Boolean Functions
Intermediate transitions [2]: (1∧a∧1∧t1) ∨ (1∧a∧2∧{}) ∨ (2∧a∧3∧{})
a ∧ 1 ∧ [2] = 1∧a∧1∧t1
–Previous state of 1: 1
–Output submatch tag: t1
Intermediate transitions [1]: (1∧a∧1∧t1) ∨ (1∧a∧2∧{})
a ∧ 1 ∧ [1] = 1∧a∧1∧t1
–Output submatch tag: t1
Result for input aaaa: the first two symbols carry t1, so $1 = aa

19 More Formal: Match Test
Finding new frontiers after processing an input symbol:
Next frontiers = [formula not preserved in transcript]
Checking acceptance: [formula not preserved in transcript]
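The formulas on this slide did not survive in the transcript. One plausible formalization, consistent with the worked example on slide 16 and with the usual relational-product formulation of NFA simulation, is sketched below. The symbol names are assumptions: F is the current frontier, I_σ is the Boolean encoding of input symbol σ, T is the tagged transition relation, A is the set of accept states, and Map renames next-state variables y to current-state variables x.

```latex
% Intermediate transitions for the k-th input symbol \sigma_k
\mathcal{M}_k(x,i,y,t) = \mathcal{F}_k(x) \land \mathcal{I}_{\sigma_k}(i) \land \mathcal{T}(x,i,y,t)

% Next frontier: quantify out x, i, t, then rename y to x
\mathcal{F}_{k+1}(x) = \mathrm{Map}_{y \to x}\bigl(\exists x.\,\exists i.\,\exists t.\ \mathcal{M}_k(x,i,y,t)\bigr)

% Acceptance after the last symbol: the final frontier intersects the accept states
\mathcal{F}_{l}(x) \land \mathcal{A}(x) \not\equiv \mathit{false}
```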

20 More Formal: Submatch Extraction
Submatch extraction: the submatch for tag t_i is the last consecutive sequence of characters that are assigned t_i
A backward-traversal approach: start from the last input symbol.

21 Submatch-OBDD
Representation of tagged NFAs, match test, and submatch extraction using OBDDs
OBDD representations for
–Transitions with submatch tags
–Intermediate transitions
–Submatch tags
–Set of start states
–Set of accept states
–Set of frontiers
–Input symbols
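As a rough illustration of what an OBDD representation of the transition relation looks like, the sketch below builds a reduced OBDD for T(x,i,y,t) of (a*)aa by Shannon expansion with a unique table. The bit encodings (states 1, 2, 3 as 01, 10, 11; symbol a as 1; tag t1 as 1) and the variable order are assumptions made for this example. Building by enumerating all assignments is only feasible for tiny functions; a real package such as CUDD constructs and combines OBDDs symbolically.

```python
from itertools import product

class OBDD:
    """A tiny reduced OBDD over variables 0..num_vars-1 in that fixed order."""

    def __init__(self, num_vars):
        self.num_vars = num_vars
        self.nodes = {}   # node id -> (var, low child, high child)
        self.unique = {}  # (var, low, high) -> node id, so equal nodes are shared

    def make(self, var, low, high):
        if low == high:  # redundant test: both branches agree, skip the node
            return low
        key = (var, low, high)
        if key not in self.unique:
            node_id = len(self.nodes) + 2  # ids 0 and 1 are the terminals
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

    def build(self, func, prefix=()):
        """Shannon expansion of `func`, one variable at a time."""
        if len(prefix) == self.num_vars:
            return 1 if func(*prefix) else 0
        low = self.build(func, prefix + (0,))
        high = self.build(func, prefix + (1,))
        return self.make(len(prefix), low, high)

    def evaluate(self, node, bits):
        """Follow one path from `node` to a terminal."""
        while node > 1:
            var, low, high = self.nodes[node]
            node = high if bits[var] else low
        return node == 1

# Rows of T as bit vectors (x1, x0, i, y1, y0, t) under the assumed encoding.
ROWS = {(0, 1, 1, 0, 1, 1), (0, 1, 1, 1, 0, 0), (1, 0, 1, 1, 1, 0)}

def transition(x1, x0, i, y1, y0, t):
    return (x1, x0, i, y1, y0, t) in ROWS

def agrees_everywhere(bdd, root, func):
    """Check the OBDD against the function on every assignment."""
    return all(
        bdd.evaluate(root, bits) == bool(func(*bits))
        for bits in product((0, 1), repeat=bdd.num_vars)
    )

bdd = OBDD(6)
root = bdd.build(transition)
print(agrees_everywhere(bdd, root, transition), len(bdd.nodes))
```

The node count printed at the end is far below the 63 internal nodes of a full decision tree over six variables, which is the sharing that makes OBDDs compact.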

22 Implementation
[Toolchain diagram: RegExps → RE2TNFA → Tagged NFAs → TNFA2OBDD → OBDDs → PATTERNMATCH, which takes input strings / network traffic and reports either "Matched at reg#" with submatches ($1 = …, $2 = …) or "No match"]
Toolchain in C++, interfacing with CUDD*
*CUDD is a package for manipulation of Binary Decision Diagrams

23 Feasibility Study
Data sets
–Snort-2009
RegExps: 115 regexps with capturing groups from HTTP rules
Traces
–1.2GB department network traffic (average packet size 126 bytes)
–1.3GB Twitter traffic (average packet size 1202 bytes)
–1MB synthetic trace (average string length 311 bytes)
–Snort-2012
RegExps: 403 regexps with capturing groups from HTTP rules
Traces
–1.2GB department network traffic (average packet size 126 bytes)
–1.3GB Twitter traffic (average packet size 1202 bytes)
–1MB synthetic trace (average string length 689 bytes)
–Firewall-504
RegExps: 504 patterns from a commercial firewall F
Trace: 87MB of firewall logs (average line size 87 bytes)

24 Experimental Setup
Platform: Intel Core2 Duo E7500, Linux, 2GB RAM
Two configurations for pattern matching
–Conf.S: patterns compiled individually; each compiled pattern matched sequentially against the input traces
–Conf.C: patterns combined with UNION and compiled; the combined pattern matched against the input traces
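The difference between the two configurations can be illustrated with a small sketch. It uses Python's `re` module only to show the shape of Conf.S versus Conf.C; it says nothing about the relative performance reported in the talk, and the two patterns are hypothetical examples, not rules from the Snort or firewall data sets.

```python
import re

# Hypothetical rule set; each rule has capturing groups.
PATTERNS = [
    r"username=(\w+), hostname=(\w+)",
    r"GET (/\S*) HTTP/(\d\.\d)",
]

def match_sequential(patterns, line):
    """Conf.S: each pattern is handled on its own, one after another."""
    for index, pattern in enumerate(patterns):
        m = re.search(pattern, line)
        if m:
            return index, m.groups()
    return None

def match_combined(patterns, line):
    """Conf.C: join all patterns with UNION (|) and scan the input once."""
    combined = re.compile("|".join(f"(?:{p})" for p in patterns))
    m = combined.search(line)
    if not m:
        return None
    # Groups belonging to alternatives that did not match are None; drop them.
    return tuple(g for g in m.groups() if g is not None)

line = "GET /index.html HTTP/1.1"
print(match_sequential(PATTERNS, line))  # (1, ('/index.html', '1.1'))
print(match_combined(PATTERNS, line))    # ('/index.html', '1.1')
```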

25 Performance Execution time (cycles/byte) and memory consumption (MB) of RE2, PCRE, and Submatch-OBDD for the Snort-2009 data set

26 Performance Execution time (cycles/byte) and memory consumption (MB) of RE2, PCRE, and Submatch-OBDD for the Snort-2012 data set

27 Performance Execution time (cycles/byte) and memory consumption (MB) of RE2, PCRE, and Submatch-OBDD for the Firewall-504 data set

28 Related Work
–NFA-OBDD [Yang et al., RAID’10; Chasaki and Wolf, ANCS’10]
–RE2 [Cox, code.google.com/p/re2]
–PCRE [ ]
–TNFA [Laurikari et al., SPIRE’00]
–MDFA [Yu et al., ANCS’06]
–Hybrid FA [Becchi and Crowley, CoNEXT’07]
–XFA [Smith et al., Oakland’08]
More: see paper for details

29 Conclusion
A novel way of annotating capturing groups
Submatch-OBDD: a novel technique for submatch extraction using OBDDs
Feasibility study
–Submatch-OBDD achieves ideal performance when patterns are combined
–Faster than RE2 and PCRE when patterns are combined

