Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Publisher : ANCS’ 06 Author : Fang Yu, Zhifeng Chen, Yanlei Diao, T.V.


Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection
Publisher: ANCS '06
Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T. V. Lakshman and Randy H. Katz
Presenter: Yu-Hsiang Wang
Date: 2010/11/17

Outline
- Introduction
- DFA Analysis for Individual Regular Expressions
- Regular Expression Rewrites
- Regular Expressions Grouping
- Evaluation Results

Introduction
A theoretical worst-case study [14] shows that a single regular expression of length n can be expressed as an NFA with O(n) states.
When the NFA is converted into a DFA, it may generate O(Σ^n) states. (Σ: a finite set of input symbols, 2^8 symbols from the extended ASCII code)
The processing complexity for each character in the input is O(1) in a DFA, but O(n^2) for an NFA when all n states are active at the same time.

Introduction
To handle m regular expressions, two choices are possible:
- processing them individually in m automata: O(m) processing per input character
- compiling the m regular expressions into a composite DFA: O(1) processing per input character

Design Consideration
Completeness of matching result (pattern: ab*, input: abbb):
- Exhaustive matching: a, ab, abb, abbb
- Non-overlapping matching: a (or abbb), i.e. the left-most shortest match or the left-most longest match
DFA execution model for substring matching (patterns without ^ attached at the beginning):
- Repeated search: start scanning from one position; if there is no match, start again at the next position
- One-pass search: .* is prepended to each pattern without ^
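The difference between the matching semantics on this slide can be reproduced with a small sketch. It uses Python's backtracking `re` module rather than a DFA, and the helper name `exhaustive_matches` is an assumption made for illustration; it is not taken from the paper.

```python
import re

def exhaustive_matches(pattern, text):
    """Return every substring of text that the pattern matches in full."""
    rx = re.compile(pattern)
    return [
        text[i:j]
        for i in range(len(text))
        for j in range(i + 1, len(text) + 1)
        if rx.fullmatch(text[i:j])
    ]

text = "abbb"

# Exhaustive matching reports every match of ab* in the input.
all_matches = exhaustive_matches(r"ab*", text)

# Non-overlapping matching reports a single match per position:
# greedy quantifier -> left-most longest, lazy quantifier -> left-most shortest.
longest = re.search(r"ab*", text).group()
shortest = re.search(r"ab*?", text).group()

print(all_matches, longest, shortest)
```

Exhaustive matching yields four results for this input, while either non-overlapping variant yields exactly one.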

DFA Analysis
We use exhaustive matching and one-pass search.
Typical patterns in network payload scanning applications

Case 4: DFA of Quadratic Size
If an input contains multiple Bs, the DFA needs to remember the number of Bs it has seen and their locations.

Case 4 Rewrites
Rewrite Rule (1)
- Rewriting is enabled by relaxing the requirement of exhaustive matching to that of non-overlapping matching.
- The new pattern essentially implements non-overlapping left-most shortest match.
Ex: ^SEARCH\s+[^\n]{1024} → ^SEARCH\s[^\n]{1024}
Input: SEARCH\s\s...\s aa...a
The rewritten pattern needs a number of states linear in j (the length restriction, here 1024) because it has removed the ambiguity in matching \s.
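A quick way to see what the rewrite keeps and what it gives up is to run both patterns on a few hand-picked inputs. The sketch below uses Python's `re` module (a backtracking engine, not a DFA), so it says nothing about state counts. The sample inputs and the names `original`, `rewritten` and `detects` are assumptions for illustration, and the check covers only these samples, not general equivalence.

```python
import re

# Patterns copied from the slide; the variable names are illustrative.
original = re.compile(r"^SEARCH\s+[^\n]{1024}")
rewritten = re.compile(r"^SEARCH\s[^\n]{1024}")

many_spaces = "SEARCH" + " " * 40 + "a" * 1024

samples = {
    "one space, long payload": "SEARCH " + "a" * 1024,
    "many spaces, long payload": many_spaces,
    "payload too short": "SEARCH " + "a" * 1000,
    "newline cuts payload": "SEARCH " + "a" * 500 + "\n" + "a" * 1024,
    "no whitespace": "SEARCH" + "a" * 2000,
}

def detects(rx, text):
    """True if the pattern reports at least one match in text."""
    return rx.search(text) is not None

results = {
    name: (detects(original, t), detects(rewritten, t))
    for name, t in samples.items()
}
for name, (o, r) in results.items():
    print(f"{name}: original={o} rewritten={r}")

# The rewritten pattern stops at the shortest match, the original at a longer one.
end_original = original.search(many_spaces).end()
end_rewritten = rewritten.search(many_spaces).end()
print(end_original, end_rewritten)
```

On these samples both patterns flag the same inputs; only the reported match end differs, which is the relaxation from exhaustive to non-overlapping shortest match.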

Case 5: DFA of Exponential Size
We need to remember all possible effects of the preceding As, as they may yield different results when combined with subsequent inputs.
Figure: input AAB followed by BCD matches (O); input ABA followed by BCD does not (X).
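The exponential growth can be observed with a minimal subset-construction sketch. It assumes a simplified pattern, `.*A.{j}` over the two-letter alphabet {A, B}, which is not the pattern from the slide but shows the same effect: the DFA must track which of the last j+1 characters were As. The function name `count_dfa_states` and the hand-built NFA are assumptions for illustration.

```python
def count_dfa_states(j, alphabet="AB"):
    """Count reachable DFA states for the simplified pattern .*A.{j}.

    NFA (hand-built assumption): state 0 loops on every symbol and moves
    to state 1 on 'A'; states 1..j advance on any symbol; state j+1 accepts.
    """
    accept = j + 1

    def step(states, ch):
        nxt = set()
        for s in states:
            if s == 0:
                nxt.add(0)          # .* keeps the start state active
                if ch == "A":
                    nxt.add(1)      # a new potential match begins
            elif s < accept:
                nxt.add(s + 1)      # consume one of the j wildcard characters
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    work = [start]
    while work:
        cur = work.pop()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                work.append(nxt)
    return len(seen)

sizes = [count_dfa_states(j) for j in range(6)]
print(sizes)
```

For this simplified pattern the number of DFA states doubles with every increment of j, while the NFA grows by a single state.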

Case 5: DFA of Exponential Size
Often used for detecting buffer overflow attempts: .*AUTH\s[^\n]{100}
- The DFA needs to remember all the possible AUTH\s occurrences: DFA > 10000 states
- A second AUTH\s can either match [^\n]{100} or be counted as a new match of the start of the pattern AUTH\s
- It cannot be efficiently processed by an NFA-based approach either
Figure: NFA for .*AUTH\s[^\n]{100}: AUTH\s, an ε transition, and 100 states for [^\n]
Input: AUTH\sAUTH\s AUTH\s\s AUTH\s\s\s …

Case 5 Rewrites
Only the first AUTH\s matters:
- If there is a '\n' within the next 100 bytes, none of the AUTH\s occurrences matches the pattern
- Otherwise, the first AUTH\s and the following characters have already matched the pattern
Rewrite the pattern to:
([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*AUTH\s[^\n]{100}
which generates a DFA of only 106 states.
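The two patterns can be compared on a handful of inputs with Python's `re` module. This is only a spot check under stated assumptions: `re` is a backtracking engine, so the sketch does not demonstrate the DFA size reduction, and the sample inputs and variable names are made up for illustration. The original pattern is run with `search()` (which plays the role of the leading `.*`), the rewritten one with `match()` (anchored at the start of the input).

```python
import re

# Original pattern without the leading .*; search() scans all start positions.
original = re.compile(r"AUTH\s[^\n]{100}")

# Rewritten pattern copied from the slide; match() anchors it at the input start.
rewritten = re.compile(
    r"([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s[^\n]{0,99}\n)*AUTH\s[^\n]{100}"
)

samples = {
    "clean hit": "AUTH " + "a" * 100,
    "newline too early": "AUTH " + "a" * 50 + "\n" + "a" * 100,
    "second AUTH inside window": "xxAUTH AUTH " + "b" * 100,
    "hit after failed attempt": "AUTH short\nAUTH " + "c" * 100,
}

results = {
    name: (original.search(t) is not None, rewritten.match(t) is not None)
    for name, t in samples.items()
}
for name, (o, r) in results.items():
    print(f"{name}: original={o} rewritten={r}")
```

Both patterns agree on these four inputs; a handful of samples does not establish that they agree on every input.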

Regular Expressions Grouping
Some composite patterns generate DFAs of exponential size.
Interaction: two patterns interact with each other if their composite DFA contains more states than the sum of the two individual ones.

Regular Expressions Grouping
Multi-core architectures (e.g. IXP 2800 NPU, 16 processing units)
Goal: design an algorithm that divides regular expressions into several groups, so that one processing unit can run one or several composite DFAs. The size of the local memory of each processing unit is quite limited.
- Compute pair-wise interaction results and form a graph
- Pick the pattern with the fewest interactions to start a new group
- Keep adding patterns to the group until the limit is reached
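The three steps above can be sketched as a greedy loop. This is a minimal illustration under assumptions, not the authors' implementation: the interaction graph is given as a set of pattern pairs, and `fits` is a hypothetical callback standing in for the check that a group's composite DFA still fits into local memory. Choosing the candidate with the fewest interactions with the current group is also an assumption about how "keep adding patterns" is carried out.

```python
def group_patterns(patterns, interactions, fits):
    """Greedily partition patterns into groups.

    interactions: set of frozenset({p, q}) pairs that interact.
    fits(group): hypothetical check that the group's composite DFA
                 stays within the memory limit.
    """
    def degree(p, pool):
        # Number of patterns in pool that interact with p.
        return sum(1 for q in pool if q != p and frozenset((p, q)) in interactions)

    remaining = list(patterns)
    groups = []
    while remaining:
        # Seed a new group with the pattern that has the fewest interactions.
        seed = min(remaining, key=lambda p: degree(p, remaining))
        group = [seed]
        remaining.remove(seed)
        while remaining:
            # Try the pattern that interacts least with the current group.
            cand = min(remaining, key=lambda p: degree(p, group))
            if not fits(group + [cand]):
                break  # limit reached, close this group
            group.append(cand)
            remaining.remove(cand)
        groups.append(group)
    return groups

# Toy data: p1 interacts with p2 and p3; p4 interacts with nothing.
interactions = {frozenset(("p1", "p2")), frozenset(("p1", "p3"))}

def no_interaction_inside(group):
    # Toy stand-in for the memory limit: allow no interacting pair in a group.
    return not any(
        frozenset((a, b)) in interactions
        for i, a in enumerate(group)
        for b in group[i + 1:]
    )

groups = group_patterns(["p1", "p2", "p3", "p4"], interactions, no_interaction_inside)
print(groups)
```

In a real setting `fits` would build the composite DFA for the candidate group and compare its size with the memory available on a processing unit.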

Regular Expressions Grouping

Evaluation results
Effect of Rule Rewriting
- L7-filter: protocol identifiers (70 regular expressions)
- Bro: intrusion patterns (2781 regular expressions)
- SNORT: No regular expression in April out of 4867 regular expressions as of Jan

Evaluation results
Effect of Grouping Multiple Patterns

Evaluation results