A Hybrid Finite Automaton for Practical Deep Packet Inspection


1 A Hybrid Finite Automaton for Practical Deep Packet Inspection
Michela Becchi and Patrick Crowley
CoNEXT 2007

2 Context
Deep packet inspection
Challenge: perform regular expression matching at line rate, given data-sets of hundreds (or thousands) of patterns
  Processing time
  Memory requirement
[Figure: a matching engine holding the RegEx set (FTP.OPEN.* Host= Server.*HTTP) separates incoming packets into safe packets (Hosxyz, blaBLAbla, Safe_payload) and malicious packets (xHost=, ServerxHTTP)]

3 Deterministic vs. Non-Deterministic FA
RegEx: (1) .*a+bc (2) .*bcd+ (3) .*cde
Text: d a b c d
[Figure: NFA (accepting states 3/1, 6/2, 9/3) and DFA (accepting states 3/1, 6/2, 7/2, 10/3) for the three expressions, traversed on the text above]
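
The NFA traversal on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the state numbering (0 as the shared `.*` state, accepting states 3, 6 and 9 for rules 1, 2 and 3) and the names `build_nfa` and `run_nfa` are assumptions made for the example.

```python
from collections import defaultdict


def build_nfa():
    """NFA for (1) .*a+bc  (2) .*bcd+  (3) .*cde, with an assumed state numbering."""
    trans = defaultdict(set)

    def add(src, ch, dst):
        trans[(src, ch)].add(dst)

    # Rule 1: .*a+bc
    add(0, "a", 1); add(1, "a", 1); add(1, "b", 2); add(2, "c", 3)
    # Rule 2: .*bcd+
    add(0, "b", 4); add(4, "c", 5); add(5, "d", 6); add(6, "d", 6)
    # Rule 3: .*cde
    add(0, "c", 7); add(7, "d", 8); add(8, "e", 9)
    accepting = {3: 1, 6: 2, 9: 3}  # accepting state -> rule number
    return trans, accepting


def run_nfa(text):
    """Return one (char, active states, matched rules) tuple per input character."""
    trans, accepting = build_nfa()
    active = {0}
    trace = []
    for ch in text:
        nxt = {0}  # the .* self-loop keeps state 0 active on every character
        for state in active:
            nxt |= trans.get((state, ch), set())
        active = nxt
        matched = sorted(accepting[s] for s in active if s in accepting)
        trace.append((ch, sorted(active), matched))
    return trace


if __name__ == "__main__":
    for step in run_nfa("dabcd"):
        print(step)
```

On the text d a b c d the active set grows to four states after the c, which is the per-character cost a DFA avoids by keeping exactly one state active.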

4 Memory-time tradeoff
NFA
  limited size
  potentially N_NFA states active in parallel
DFA
  one state traversal/char
  size: potentially 2^N states, where N = N_NFA
  In practical cases a single DFA is infeasible!
Idea
  Hybrid automaton
    Size comparable to NFA by preventing “state explosion”
    Predictable and small memory bandwidth/processing time
  Limit to classes of RegEx in Intrusion Detection Systems
  Analyze state explosion scenarios
[Figure: time vs. memory plot positioning NFA and DFA]

5 SNORT Regular expressions
Examples
  Server\s+Guptachar\s+\d+\x2E\d+
  User-Agent [^\r\n]*A-311\s+Server
  Host[^\r\n]*wwp\.mirabilis\.com.*from=[^\r\n]*from =[^\r\n]*subject=[^\r\n]*to=
  \sPARTIAL.*BODY\.PEEK[^\n]{1024}
SNORT RegExs DO consist of:
  Sequences of sub-patterns
  Possibly containing (repetitions of) character ranges
  Separated by dot-star terms and counting constraints
SNORT RegExs DON’T normally contain:
  Nested repetitions
  Disjunctions of complex sub-expressions
General form: pattern1.*pattern2.{n,m}[…]patternk[^cxcy]*[…]patternn

6 Dot-star terms
Definition
  Unconstrained repetitions of wildcards (.*) or large ranges [^c1c2..ck]*
Examples
  User-Agent[^\r\n]*ZC-Bridge
On single regular expressions (from practical data-sets): NO state blowup
[Figure: NFA and DFA for RegEx: ab.*cd]

7 Dot-star conditions (cont’d)
Compiling together several RegEx
  Duplication of “sub-DFAs” at “.*” states
  NO exponential blow-up
[Figure: DFA for ab.*cd and efgh compiled together; accepting states 8/1, 10/2, 12/2]

8 Counting constraints
Definition
  Constrained repetition of wildcard .{n,m} or large ranges [^c1c2..ck]{n,m}
Examples
  AUTH\s[^\n]{100} (buffer overflow)
Exponential state explosion:
  Single regular expressions: all possible occurrences of the prefix in the counting constraint
  Multiple regular expressions: additionally, all the possible occurrences of other RegEx in the counting constraint
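
The exponential growth can be observed by running subset construction on a small case. The sketch below is an illustration under stated assumptions, not taken from the slides: it uses the hypothetical pattern `.*a.{n}` over a two-symbol alphabet, and the function name `count_dfa_states` is made up for the example.

```python
def count_dfa_states(n, alphabet="ab"):
    """Count DFA states reachable by subset construction for the NFA of .*a.{n}.

    Assumed NFA layout: state 0 loops on every character, 0 -a-> 1,
    states 1..n advance on any character, and state n+1 is accepting.
    """

    def step(state_set, ch):
        nxt = {0}  # state 0 stays active because of the leading .*
        for s in state_set:
            if s == 0 and ch == "a":
                nxt.add(1)       # a new occurrence of the prefix starts counting
            elif 1 <= s <= n:
                nxt.add(s + 1)   # counting states advance on any character
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for ch in alphabet:
            nxt = step(current, ch)
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, count_dfa_states(n))
```

Each DFA state has to remember which of the last n+1 characters started the prefix, so the count doubles with every increment of n while the NFA grows by a single state.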

9 Counting constraints (cont’d)
[Figure: NFA and DFA for Ex: ab.{3}cd]

10 First step: hybrid-FA
Idea: Stop subset construction at the state where state blowup would occur
Implication: hybrid-FA with a head-DFA, one or more tail-NFAs and one or more border-states
[Figure: an NFA and the corresponding hybrid-FA; accepting states 4/1, 8/2, 10/3, 13/4]

11 Hybrid-FA traversal
[Figure: side-by-side traversal of the NFA and the hybrid-FA on the same input; accepting states 4/1, 8/2, 10/3, 13/4]
Functional equivalence (commonly reached accepting states)
Hybrid-FA:
  Limitation in size of active vector till border state is reached
  No back activation from tail-NFAs to head-DFA

12 Improving the worst case
Size: Hybrid-FA ≈ Size of NFA
Bandwidth:
  Average case improved (in DFA)
  Worst case dependent on tail-NFAs size
Can we do better?

13 Dot-star terms: Tail-DFAs
Idea: [Figure: a head-DFA with a tail-NFA, next to a head-DFA with a tail-DFA and a tail-NFA]
Problem: Multiple border state traversals => Multiple tail-DFA activations
Fact: In case of
  sub_pattern1.*sub_pattern2
  sub_pattern1[^c1…ck]*sub_pattern2 w/ c1,..,ck ∉ sub_pattern2
  subsequent activations of a tail-DFA can be safely ignored
Implication: Each tail-DFA adds only 1 to the worst case bound
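
A simplified sketch of the head-DFA/tail-DFA split for the sub_pattern1.*sub_pattern2 case follows. It assumes both sub-patterns are literal strings, and the helpers `literal_dfa` and `hybrid_match` are hypothetical names introduced for illustration; the construction described in the talk handles general regular expressions.

```python
def literal_dfa(pattern, alphabet):
    """DFA for .*pattern: the state is the length of the longest pattern prefix
    that is a suffix of the input read so far."""
    table = []
    for state in range(len(pattern) + 1):
        row = {}
        for ch in alphabet:
            seen = pattern[:state] + ch
            k = min(len(pattern), len(seen))
            while k > 0 and not seen.endswith(pattern[:k]):
                k -= 1
            row[ch] = k
        table.append(row)
    return table


def hybrid_match(text, sub1, sub2):
    """Match sub1.*sub2 with one head-DFA and one tail-DFA.

    Returns (matched, number of tail-DFA activations actually performed).
    """
    alphabet = set(text) | set(sub1) | set(sub2)
    head = literal_dfa(sub1, alphabet)
    tail = literal_dfa(sub2, alphabet)
    head_state = 0
    tail_state = None  # None means the tail-DFA has not been activated yet
    activations = 0
    for ch in text:
        if tail_state is not None:
            tail_state = tail[tail_state][ch]
            if tail_state == len(sub2):
                return True, activations
        head_state = head[head_state][ch]
        # Border state reached: activate the tail once, ignore later traversals.
        if head_state == len(sub1) and tail_state is None:
            tail_state = 0
            activations += 1
    return False, activations


if __name__ == "__main__":
    print(hybrid_match("ababxxcd", "ab", "cd"))
```

Because the tail-DFA for `.*cd` already tracks every possible start position, re-activating it on the second `ab` would add no information, so at most two states (one head, one tail) are ever active.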

14 Counting Constraints: counter trick
[Figure: NFA for .{n}suffix: border state b, followed by n counting states b+1 ... b+n, then the suffix]
Observation: n “counting states” do not carry real next state information
Idea: Replace n counting states w/ auto-decrementing counter
  At most 2 memory accesses per counter sufficient
Optimization: Counting constraint at the end of the regular expression (no suffix) => ONE counter is enough
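
The single-counter optimization can be sketched as follows for a pattern of the form prefix[^\n]{n} with no suffix. This is an illustrative reading of the slide, not the authors' implementation: the literal prefix check stands in for the head-DFA reaching its border state, and the name `first_match_end` and its default arguments are assumptions.

```python
def first_match_end(text, prefix="AUTH ", n=5, excluded="\n"):
    """Index of the last character of the first match of prefix[^excluded]{n}, or None.

    Assumes n >= 1. A single counter replaces the n counting states.
    """
    counter = None  # None means no counting constraint is in flight
    for j, ch in enumerate(text):
        if counter is not None:
            if ch in excluded:
                counter = None      # range violated: the pending match dies
            else:
                counter -= 1        # one more character of the {n} run consumed
                if counter == 0:
                    return j
        # Stand-in for the head-DFA reaching its border state at position j.
        prefix_ends_here = (
            j + 1 >= len(prefix) and text[j + 1 - len(prefix): j + 1] == prefix
        )
        if prefix_ends_here and counter is None:
            counter = n             # later activations are ignored while one is pending
    return None


if __name__ == "__main__":
    print(first_match_end("xxAUTH abc\nAUTH abcdefg"))
```

One counter suffices here because the oldest pending activation always reaches zero first, and any excluded character that kills it would also kill every younger activation.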

15 Rule-sets
Distinct PCREs: 982
  25% w/ long counting constraints (generally at the end of the RegEx, n= )
  11.4 % containing .* terms
  54.89% containing [^c1c2..ck]* terms
Header-based grouping
Rule-set | Number of rules | Header: Protocol, Source IP, Src Port, Destination IP, Destination Port | Characteristics: .* and [^x]*, .{n,m}
Group1 329 Tcp $HOME_NET any $EXTERNAL_NET $HTTP_PORTS/any 283 -
Group2 40 25/any 24
Group3 18 7777:7778/any 5 10
Group4 45 143/any 19
Group5 20 119/any 6 11
Group6 110/any 7 12

16 Memory storage requirements
Tail-DFAs and counter trick used (counters at end)
Rule-set | NFA: # states | DFA: # DFAs, Total states | Hybrid-FA: # tail-FA, head-DFA states, Total tail-states
Group1 15679 32 71234 31 40461 30321
Group2 1036 3 2 22651 31521 20724 1905
Group3 8871 N-A 10 514 -
Group4 3119 19 2560
Group5 5205 11 2485
Group6 1952 12 4878

17 Memory bandwidth requirements
Simulations on 12 packet traces
  From 17MB to 264 MB
  1-6 rules matched/traces
Observations: active set size: # of parallel active states
Rule-set | NFA: Avg, Max, Worst case | DFA: Avg= Max= | Hybrid-FA
Group1 1.15 34 15679 32 1.009 5
Group2 1.06 13 1036 2/3 1.001 2 3
Group3 1.04 4 8871 - 1.002 11
Group4 2.45 12 3119 20
Group5 5205
Group6 2.99 6 1952 1.088

18 Conclusion
Contributions:
  Analysis of practical rule-sets
  Proposal of hybrid-FA to
    reduce memory storage requirement
    limit average case memory bandwidth
  Refinements: tail-DFAs and counter tricks
    bound worst case memory bandwidth
Experimental results:
  Memory size: comparable to the corresponding NFA
  Memory bandwidth:
    Average case ≈ single (unfeasible) DFA
    Worst case dependent upon number of “problematic” RegEx
Deployment observation
  Head and tail-FAs independent
  Hybrid-FA suitable for deployment on parallel architectures and FPGAs

19 Thanks Questions?

20 A SNORT rule
HEADER MATCHING (protocol, source addr, source port, dest. addr, dest. port)
PAYLOAD INSPECTION
  Keywords (“content”)
  Regular expression (PCRE)
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"BACKDOOR a-311 death user-agent string detected"; flow:to_server,established; content:"User-Agent|3A|"; nocase; content:"A-311"; distance:0; nocase; content:"Server"; distance:0; nocase; pcre:"/^User-Agent\x3A[^\r\n]*A-311\s+Server/smi"; reference:url,www3.ca.com/securityadvisor/pest/pest.aspx?id= ; classtype:trojan-activity; sid:6396; rev:1;)
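
The pcre option of this rule can be tried out with Python's re module, as a rough stand-in for a PCRE engine. The mapping of the /smi modifiers to re.S, re.M and re.I is standard; the name `payload_matches` and the sample payloads are made up for the example.

```python
import re

# PCRE modifiers /smi correspond to DOTALL, MULTILINE and IGNORECASE in Python.
A311_RULE = re.compile(
    r"^User-Agent\x3A[^\r\n]*A-311\s+Server",
    re.S | re.M | re.I,
)


def payload_matches(payload: str) -> bool:
    """True if the payload contains a line matching the rule's regular expression."""
    return A311_RULE.search(payload) is not None


if __name__ == "__main__":
    print(payload_matches("GET / HTTP/1.1\r\nUser-Agent: foo A-311  Server\r\n"))
    print(payload_matches("User-Agent: Mozilla\r\nA-311 Server"))
```

The [^\r\n]* term is the kind of large-range repetition the talk groups with dot-star terms: it lets the match skip arbitrary header content but not cross a line break.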

21 Problem
Network Intrusion Detection Systems use Regular Expression Matching for Payload Inspection
Regular Expression Matching is performed in linear time through deterministic finite automata (DFAs)
Several compression techniques have been put in place to reduce the memory requirement of given DFAs
BUT the complexity of RegEx may make DFAs unfeasible because of “state explosion”.
How to prevent state explosion from happening while preserving a worst case bound on memory bandwidth?

22 Deterministic vs. Non-Deterministic FA
RegEx: (1) .*abc; (2) .*bcd; (3) .*cde
[Figure: NFA (accepting states 3/1, 6/2, 9/3) and part of the corresponding DFA, whose states are labeled with NFA state sets such as 0,1 and 0,4 and 0,7]

