A Hybrid Finite Automaton for Practical Deep Packet Inspection

Presentation transcript:

A Hybrid Finite Automaton for Practical Deep Packet Inspection Michela Becchi and Patrick Crowley CoNEXT 2007

Context
Deep packet inspection
Challenge: perform regular expression matching at line rate, given data sets of hundreds (or thousands) of patterns
  Processing time
  Memory requirement
[Figure: incoming packets enter a matching engine holding the RegEx set (FTP.OPEN.*, www.spyware, Host=, Server.*HTTP) and are split into safe packets and malicious packets.]

Deterministic vs. Non-Deterministic FA
RegEx: (1) .*a+bc (2) .*bcd+ (3) .*cde
Text: d a b c d
[Figure: the NFA for the three expressions, with accepting states 3/1, 6/2 and 9/3, and the equivalent DFA, with accepting states 3/1, 6/2, 7/2 and 10/3.]
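
As a rough illustration of the NFA side of this slide, the sketch below simulates an NFA for the three expressions on the text d a b c d and tracks how many states are active at once. The accepting states (3, 6 and 9) follow the slide; treating state 0 as a start state with a self-loop on every character, the function name `simulate_nfa` and the table layout are assumptions made for the example.

```python
def simulate_nfa(text):
    """Simulate an NFA for (1) .*a+bc, (2) .*bcd+, (3) .*cde on `text`."""
    # Transition table: (state, character) -> set of next states.
    # State 0 is assumed to be the start state; its self-loop models ".*".
    trans = {
        (0, "a"): {1}, (1, "a"): {1}, (1, "b"): {2}, (2, "c"): {3},  # .*a+bc
        (0, "b"): {4}, (4, "c"): {5}, (5, "d"): {6}, (6, "d"): {6},  # .*bcd+
        (0, "c"): {7}, (7, "d"): {8}, (8, "e"): {9},                 # .*cde
    }
    accepting = {3: 1, 6: 2, 9: 3}  # accepting state -> rule number

    active = {0}
    max_active = len(active)
    matches = []  # (position of last matched character, rule number)
    for pos, ch in enumerate(text):
        nxt = set()
        for state in active:
            if state == 0:
                nxt.add(0)  # the ".*" self-loop keeps state 0 alive
            nxt |= trans.get((state, ch), set())
        active = nxt
        max_active = max(max_active, len(active))
        for state in sorted(active):
            if state in accepting:
                matches.append((pos, accepting[state]))
    return matches, max_active


if __name__ == "__main__":
    print(simulate_nfa("dabcd"))
```

On the slide's text this reports rule 1 after the fourth character and rule 2 after the fifth, with up to four NFA states active in parallel; a DFA would instead follow a single state per character.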

Memory-time tradeoff
NFA: limited size; potentially N_NFA states active in parallel
DFA: one state traversal per character; size: potentially 2^N states, where N = N_NFA
In practical cases a single DFA is infeasible!
Idea: hybrid automaton
  Size comparable to NFA by preventing “state explosion”
  Predictable and small memory bandwidth/processing time
  Limit to classes of RegEx in Intrusion Detection Systems
  Analyze state explosion scenarios
[Figure: time versus memory plot locating the NFA and the DFA.]

SNORT Regular expressions
Examples:
  Server\s+Guptachar\s+\d+\x2E\d+
  User-Agent [^\r\n]*A-311\s+Server
  Host[^\r\n]*wwp\.mirabilis\.com.*from=[^\r\n]*fromemail=[^\r\n]*subject=[^\r\n]*to=24962844
  \sPARTIAL.*BODY\.PEEK[^\n]{1024}
SNORT RegExs DO consist of:
  Sequences of sub-patterns
  Possibly containing (repetitions of) character ranges
  Separated by dot-star terms and counting constraints
SNORT RegExs DON’T normally contain:
  Nested repetitions
  Disjunctions of complex sub-expressions
General form: pattern1.*pattern2.{n,m}[…]patternk[^cxcy]*[…]patternn

Dot-star terms
Definition: unconstrained repetitions of wildcards (.*) or large ranges [^c1c2..ck]*
Example: User-Agent[^\r\n]*ZC-Bridge
On single regular expressions (from practical data-sets): NO state blowup
[Figure: NFA and DFA for the RegEx ab.*cd.]

Dot-star conditions (cont’d)
Compiling together several RegEx:
  Duplication of “sub-DFAs” at “.*” states
  NO exponential blow-up
[Figure: DFA for ab.*cd and efgh compiled together, with accepting states 8/1, 10/2 and 12/2.]

Counting constraints
Definition: constrained repetition of wildcards .{n,m} or large ranges [^c1c2..ck]{n,m}
Example: AUTH\s[^\n]{100} (buffer overflow)
Exponential state explosion:
  Single regular expressions: all possible occurrences of the prefix in the counting constraint
  Multiple regular expressions: additionally, all the possible occurrences of other RegEx in the counting constraint

Counting constraints (cont’d)
[Figure: NFA and DFA for the example ab.{3}cd.]
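
To make the explosion concrete, the sketch below runs a plain subset construction on a deliberately simplified pattern, .*a.{n} over the alphabet {a, b}, rather than the slide's ab.{3}cd. The pattern, the alphabet and the function name `count_dfa_states` are assumptions chosen to keep the example small; the point is only how the DFA size grows with n.

```python
from collections import deque


def count_dfa_states(n, alphabet="ab"):
    """Count the DFA states produced by subset construction for .*a.{n}."""

    def step(state, ch):
        # NFA with states 0..n+1:
        #   state 0 loops on every symbol and also moves to 1 on 'a',
        #   states 1..n advance on any symbol (the counting states),
        #   state n+1 is accepting and has no outgoing transitions.
        targets = set()
        if state == 0:
            targets.add(0)
            if ch == "a":
                targets.add(1)
        elif state <= n:
            targets.add(state + 1)
        return targets

    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        subset = queue.popleft()
        for ch in alphabet:
            nxt = frozenset(t for s in subset for t in step(s, ch))
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)


if __name__ == "__main__":
    for n in range(1, 7):
        print(n, "NFA states:", n + 2, "DFA states:", count_dfa_states(n))
```

With n = 3 the NFA has 5 states while the DFA has 16, and each increment of n doubles the DFA, because the DFA has to remember which of the last n+1 characters were an 'a'.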

First step: hybrid-FA
Idea: stop subset construction at the state where state blowup would occur
Implication: hybrid-FA with a head-DFA, one or more tail-NFAs and one or more border-states
[Figure: an NFA with accepting states 4/1, 8/2, 10/3 and 13/4, and the corresponding hybrid-FA.]

Hybrid-FA traversal
[Figure: traversal of the NFA and of the hybrid-FA on the same input, showing the states active after each character.]
Functional equivalence (commonly reached accepting states)
Hybrid-FA:
  Limitation in size of active vector till border state is reached
  No back activation from tail-NFAs to head-DFA
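
A minimal sketch of such a traversal, assuming the single expression ab.*cd split by hand at the dot-star: a head-DFA recognises the prefix ab, its last state acts as the border state, and a tail-NFA handles .*cd. The state names, the tables and the function `hybrid_match` are all hypothetical; the slide's automaton is larger and is not reproduced here.

```python
# Head-DFA for the unanchored prefix "ab"; missing entries fall back to h0.
HEAD = {
    "h0": {"a": "h1"},
    "h1": {"a": "h1", "b": "h2"},
    "h2": {"a": "h1"},
}
BORDER = {"h2"}  # entering a border state activates the tail-NFA


def tail_step(state, ch):
    """Tail-NFA for .*cd: t0 loops on anything, t0 -c-> t1, t1 -d-> t2."""
    targets = set()
    if state == "t0":
        targets.add("t0")
        if ch == "c":
            targets.add("t1")
    elif state == "t1" and ch == "d":
        targets.add("t2")
    return targets


def hybrid_match(text):
    """Return the positions at which ab.*cd is matched in `text`."""
    head = "h0"      # exactly one head-DFA state is active at any time
    tail = set()     # active tail-NFA states, empty until the border is hit
    matches = []
    for pos, ch in enumerate(text):
        # Step the tail first so that a fresh activation only sees the
        # characters that follow the border state.
        tail = {t for s in tail for t in tail_step(s, ch)}
        head = HEAD[head].get(ch, "h0")
        if head in BORDER:
            tail.add("t0")
        if "t2" in tail:
            matches.append(pos)
    return matches


if __name__ == "__main__":
    print(hybrid_match("xxabyycdz"))
```

The head never receives anything back from the tail, which mirrors the "no back activation" point: before the border state is reached only the single head state is active.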

Improving the worst case
Size: hybrid-FA ≈ size of NFA
Bandwidth:
  Average case improved (in DFA)
  Worst case dependent on tail-NFAs size
Can we do better?

Dot-star terms: Tail-DFAs
Idea: [Figure: a head-DFA with a tail-NFA, compared with a head-DFA with a tail-DFA and a tail-NFA.]
Problem: multiple border state traversals => multiple tail-DFA activations
Fact: in case of
  sub_pattern1.*sub_pattern2
  sub_pattern1[^c1…ck]*sub_pattern2, where c1,..,ck do not occur in sub_pattern2
subsequent activations of a tail-DFA can be safely ignored
Implication: each tail-DFA adds only 1 to the worst case bound
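
The effect of ignoring repeated activations can be sketched as follows, again assuming the single expression ab.*cd split by hand at the dot-star. The tail is now a DFA for .*cd; state names, tables and the function `hybrid_match_tail_dfa` are hypothetical and only illustrate the idea.

```python
# Head-DFA for the unanchored prefix "ab"; missing entries fall back to h0.
HEAD = {
    "h0": {"a": "h1"},
    "h1": {"a": "h1", "b": "h2"},
    "h2": {"a": "h1"},
}
BORDER = {"h2"}

# Tail-DFA for .*cd; missing entries fall back to d0, d2 is accepting.
TAIL = {
    "d0": {"c": "d1"},
    "d1": {"c": "d1", "d": "d2"},
    "d2": {"c": "d1"},
}


def hybrid_match_tail_dfa(text):
    """Return (match positions, maximum number of simultaneously active states)."""
    head = "h0"
    tail = None          # None means the tail-DFA has not been activated yet
    matches = []
    max_active = 1
    for pos, ch in enumerate(text):
        if tail is not None:
            tail = TAIL[tail].get(ch, "d0")
        head = HEAD[head].get(ch, "h0")
        # Only the first border traversal activates the tail. Later ones are
        # ignored: because of the ".*" the active tail-DFA already covers them.
        if head in BORDER and tail is None:
            tail = "d0"
        if tail == "d2":
            matches.append(pos)
        max_active = max(max_active, 1 + (tail is not None))
    return matches, max_active


if __name__ == "__main__":
    print(hybrid_match_tail_dfa("xxabyycdabcd"))
```

In this sketch the second occurrence of ab does not start a second tail instance, so at most two states (one head, one tail) are followed per character.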

Counting Constraints: counter trick
[Figure: NFA for .{n}suffix, with n counting states b, b+1, ..., b+n-1, b+n followed by the suffix.]
Observation: n “counting states” do not carry real next state information
Idea: replace n counting states w/ auto-decrementing counter
  At most 2 memory accesses per counter sufficient
Optimization: counting constraint at the end of the regular expression (no suffix) => ONE counter is enough
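
For the case of a counting constraint at the end of the expression, the single-counter idea might look like the sketch below. It assumes a simplified form of the earlier example, a literal prefix "AUTH " followed by [^\n]{n}, detects the prefix with a plain string comparison instead of a head-DFA, and reports only the first match. The function name `match_with_counter` and its parameters are hypothetical.

```python
def match_with_counter(text, prefix="AUTH ", n=100, excluded="\n"):
    """Return the position where prefix[^excluded]{n} first matches, or None.

    Assumes n >= 1 and that the prefix contains no excluded character.
    """
    counter = None  # None: not counting; otherwise characters still needed
    for pos, ch in enumerate(text):
        if counter is not None:
            if ch in excluded:
                counter = None       # the range was violated, stop counting
            else:
                counter -= 1         # auto-decrement on every valid character
                if counter == 0:
                    return pos       # n valid characters seen after the prefix
        # Start counting when the prefix ends here. A running counter is left
        # untouched: the earliest activation always reaches zero first.
        if counter is None and text.endswith(prefix, 0, pos + 1):
            counter = n
    return None


if __name__ == "__main__":
    print(match_with_counter("AUTH " + "x" * 100))
```

One integer replaces the n counting states. Expressions with a suffix after the counting constraint need more care, since overlapping activations can then matter.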

Rule-sets
Distinct PCREs: 982
  25% w/ long counting constraints (generally at the end of the RegEx, n=100-1024)
  11.4% containing .* terms
  54.89% containing [^c1c2..ck]* terms
Header-based grouping
Columns: Rule-set; Number of rules; Header (Protocol, Source IP, Src Port, Destination IP, Destination Port); Characteristics (.* and [^x]*, .{n,m})
Group1 329 Tcp $HOME_NET any $EXTERNAL_NET $HTTP_PORTS/any 283 -
Group2 40 25/any 24
Group3 18 7777:7778/any 5 10
Group4 45 143/any 19
Group5 20 119/any 6 11
Group6 110/any 7 12

Memory storage requirements
Tail-DFAs and counter trick used (counters at end)
Columns: Rule-set; NFA (# states); DFA (# DFAs, Total states); Hybrid-FA (# tail-FA, head-DFA states, Total tail-states)
Group1 15679 32 71234 31 40461 30321
Group2 1036 3 2 22651 31521 20724 1905
Group3 8871 N-A 10 514 -
Group4 3119 19 2560
Group5 5205 11 2485
Group6 1952 12 4878

Memory bandwidth requirements
Simulations on 12 packet traces
  From 17 MB to 264 MB
  1-6 rules matched/traces
Observations: active set size: # of parallel active states
Rule-set NFA DFA Hybrid-FA Avg Max Worst case Avg= Max=
Group1 1.15 34 15679 32 1.009 5
Group2 1.06 13 1036 2/3 1.001 2 3
Group3 1.04 4 8871 - 1.002 11
Group4 2.45 12 3119 20
Group5 5205
Group6 2.99 6 1952 1.088

Conclusion
Contributions:
  Analysis of practical rule-sets
  Proposal of hybrid-FA to reduce memory storage requirement and limit average case memory bandwidth
  Refinements: tail-DFAs and counter tricks bound worst case memory bandwidth
Experimental results:
  Memory size: comparable to the corresponding NFA
  Memory bandwidth: average case ≈ single (unfeasible) DFA; worst case dependent upon number of “problematic” RegEx
Deployment observation:
  Head and tail-FAs independent
  Hybrid-FA suitable for deployment on parallel architectures and FPGAs

Thanks Questions?

A SNORT rule
alert tcp $HOME_NET any -> $EXTERNAL_NET $HTTP_PORTS (msg:"BACKDOOR a-311 death user-agent string detected"; flow:to_server,established; content:"User-Agent|3A|"; nocase; content:"A-311"; distance:0; nocase; content:"Server"; distance:0; nocase; pcre:"/^User-Agent\x3A[^\r\n]*A-311\s+Server/smi"; reference:url,www3.ca.com/securityadvisor/pest/pest.aspx?id=453076778; classtype:trojan-activity; sid:6396; rev:1;)
HEADER MATCHING: protocol, source addr, source port, dest. addr, dest. port
PAYLOAD INSPECTION: keywords (“content”), regular expression (PCRE)

Problem
Network Intrusion Detection Systems use regular expression matching for payload inspection.
Regular expression matching is performed in linear time through deterministic finite automata (DFAs).
Several compression techniques have been put in place to reduce the memory requirement of given DFAs.
BUT the complexity of RegEx may make DFAs unfeasible because of “state explosion”.
How can state explosion be prevented while preserving a worst-case bound on memory bandwidth?

Deterministic vs. Non-Deterministic FA
RegEx: (1) .*abc; (2) .*bcd; (3) .*cde
[Figure: the NFA for the three expressions, with accepting states 3/1, 6/2 and 9/3, and part of the corresponding DFA.]