XFA: Faster Signature Matching with Extended Automata
Author: Randy Smith, Cristian Estan, Somesh Jha. Publisher: IEEE Symposium on Security and Privacy, 2008. Presenter: Wen-Tse Liang. Date: 2010/10/27.



- Introduction
- Background
- Technical Overview
- Reducing state space with XFAs
- Building XFAs from regular expressions
- Combining XFAs
- Performance

- In this paper, our primary goal is to improve the time and space efficiency of signature matching in network intrusion detection systems (NIDS).
- To achieve this goal, we introduce extended finite automata (XFAs), which augment traditional FSAs with a finite scratch memory used to remember various types of information relevant to the progress of signature matching.
- XFAs have time complexity similar to DFAs and space complexity similar to or better than NFAs.

- String matching was important for early network intrusion detection systems, as their signatures consisted of simple strings. The Aho-Corasick [1] algorithm builds a concise automaton (linear in the total size of the strings) that recognizes multiple such signatures in a single pass.
- NFAs can compactly represent multiple signatures but may require large amounts of matching time, since the matching operation needs to explore multiple paths in the automaton to determine whether the input matches any signature.
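To make the single-pass claim concrete, here is a minimal sketch of Aho-Corasick matching. It is an illustration only, not the code of any particular NIDS; the names `build_aho_corasick` and `find_all` are made up for this example, and patterns are assumed to be non-empty strings.

```python
from collections import deque

def build_aho_corasick(patterns):
    """Build goto/fail/output tables for a list of non-empty strings."""
    goto = [{}]     # goto[state][char] -> next state (a trie)
    fail = [0]      # failure link of each state
    out = [set()]   # patterns recognized on entering each state
    for pat in patterns:
        state = 0
        for ch in pat:
            if ch not in goto[state]:
                goto.append({})
                fail.append(0)
                out.append(set())
                goto[state][ch] = len(goto) - 1
            state = goto[state][ch]
        out[state].add(pat)
    # Breadth-first pass: depth-1 states keep fail = 0, deeper states
    # inherit from the failure link of their parent.
    queue = deque(goto[0].values())
    while queue:
        state = queue.popleft()
        for ch, nxt in goto[state].items():
            queue.append(nxt)
            f = fail[state]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[nxt] = goto[f].get(ch, 0)
            out[nxt] |= out[fail[nxt]]
    return goto, fail, out

def find_all(text, patterns):
    """Return sorted (start_index, pattern) pairs found in one pass over text."""
    goto, fail, out = build_aho_corasick(patterns)
    state, hits = 0, []
    for i, ch in enumerate(text):
        while state and ch not in goto[state]:
            state = fail[state]
        state = goto[state].get(ch, 0)
        for pat in out[state]:
            hits.append((i - len(pat) + 1, pat))
    return sorted(hits)
```

The number of trie states is at most the total length of the patterns plus one, which is the "linear in the total size of the strings" property mentioned above.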

- DFAs can be implemented efficiently in software, but the state-space explosion that results when signatures are combined often exceeds available memory.
- Yu et al. [33] propose combining signatures into multiple DFAs instead of one DFA, using simple heuristics to determine which signatures should be grouped together.
- The D²FA technique [14] performs edge compression to reduce the memory footprint of individual states. It stores only the difference between transitions in similar states and, in some sense, extends the string-based Aho-Corasick algorithm to DFAs.
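The idea of storing only the difference between similar states can be sketched with default links: a state keeps the transitions that differ from a reference state and defers to that state for everything else. This is a simplified illustration, not the D²FA construction itself (how default links are chosen is not covered); `CompressedState` and `next_state` are hypothetical names.

```python
class CompressedState:
    def __init__(self, labeled, default=None):
        self.labeled = labeled   # dict: char -> target state name (stored edges)
        self.default = default   # state whose transitions are inherited

def next_state(states, current, ch):
    """Follow default links until some state has an explicit edge on ch.

    Returns (target, hops), where hops counts the default links followed.
    Assumes every default chain ends in a state with a full transition table.
    """
    hops = 0
    while ch not in states[current].labeled:
        current = states[current].default
        hops += 1
    return states[current].labeled[ch], hops

# DFA for .*ab over the alphabet {a, b}; uncompressed it has 6 edges.
states = {
    "q0": CompressedState({"a": "q1", "b": "q0"}),
    "q1": CompressedState({"b": "q2"}, default="q0"),  # differs from q0 on 'b' only
    "q2": CompressedState({}, default="q0"),           # same transitions as q0
}

def run(text, start="q0"):
    state = start
    for ch in text:
        state, _ = next_state(states, state, ch)
    return state
```

Only 3 edges are stored instead of 6, at the cost of extra hops along default links during matching.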

[Figure: example signatures .*S1.*S2 and .{n}]
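For a signature of the form .*S1.*S2, the scratch-memory idea can be illustrated with a single bit that records whether S1 has already been seen. The sketch below is only an approximation of that idea in plain Python: the sliding `window` stands in for a string-matching automaton, the `since_bit` counter is an extra guard added here so that S2 must start after S1 ends, and S1 and S2 are assumed to be non-empty. The function name is hypothetical.

```python
def match_s1_then_s2(data: str, s1: str, s2: str) -> bool:
    """Streaming check for .*s1.*s2: s1 occurs, and s2 occurs entirely after it."""
    bit = False                     # scratch memory: "s1 has been seen"
    since_bit = 0                   # characters consumed since the bit was set
    window = ""                     # last few characters of the input
    keep = max(len(s1), len(s2))
    for ch in data:
        window = (window + ch)[-keep:]
        if bit:
            since_bit += 1
            if since_bit >= len(s2) and window.endswith(s2):
                return True
        elif window.endswith(s1):
            bit = True
    return False
```

The point of the sketch is that remembering "S1 was seen" costs one bit, rather than a second copy of the automaton states used to look for S2.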

Annotating regular expressions
- We use the parallel concatenation operator to "break up" a regular expression, or parts of one, into string-like subexpressions that are individually suitable for string matching.
- For example, we annotate .*S1.*S2, where S1 and S2 are strings, as .*S1#.*S2. Put another way, we add the '#' operator right before subexpressions such as '.*' and [^\n]{300} that repeat characters from either the whole input alphabet or a large subset thereof.
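The annotation rule can be approximated with a small text rewrite. This is a rough sketch, not the authors' annotation procedure: it recognizes only '.' or a negated character class followed by '*' or '{n}', it skips a match at the very start of the expression (an assumption read off the .*S1#.*S2 example), and it does not handle escaped dots such as `\.`. The name `annotate` is made up for this example.

```python
import re

# '.' or a negated class like [^\n], followed by '*' or a counted repetition '{n}'.
_REPEAT = re.compile(r"(?:\.|\[\^[^\]]*\])(?:\*|\{\d+\})")

def annotate(regex: str) -> str:
    """Insert '#' before repeating wildcard-like subexpressions, except at position 0."""
    def add_hash(match):
        text = match.group(0)
        return text if match.start() == 0 else "#" + text
    return _REPEAT.sub(add_hash, regex)

print(annotate(".*S1.*S2"))            # .*S1#.*S2
print(annotate(r"abc[^\n]{300}xyz"))   # abc#[^\n]{300}xyz
```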


- Parsing the regular expression
- Building a nondeterministic automaton through a bottom-up traversal of the parse tree
- ε-elimination
- Determinization
- Minimization
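For ordinary finite automata, the ε-elimination and determinization steps listed above are usually done together by the textbook subset construction. The sketch below shows that classical construction only (no data domain, no minimization); the dictionary-based NFA encoding and the function names are choices made for this example.

```python
def epsilon_closure(states, eps):
    """All states reachable from `states` through ε-transitions alone."""
    stack, closure = list(states), set(states)
    while stack:
        q = stack.pop()
        for r in eps.get(q, ()):
            if r not in closure:
                closure.add(r)
                stack.append(r)
    return frozenset(closure)

def determinize(start, accepting, delta, eps, alphabet):
    """Subset construction.

    delta: (state, char) -> set of states; eps: state -> set of states.
    Returns (dfa_start, dfa_accepting, dfa_delta) over frozensets of NFA states.
    """
    d_start = epsilon_closure({start}, eps)
    d_delta, todo, seen = {}, [d_start], {d_start}
    while todo:
        subset = todo.pop()
        for ch in alphabet:
            moved = set()
            for q in subset:
                moved |= delta.get((q, ch), set())
            target = epsilon_closure(moved, eps)
            d_delta[(subset, ch)] = target
            if target not in seen:
                seen.add(target)
                todo.append(target)
    d_accepting = {s for s in seen if s & accepting}
    return d_start, d_accepting, d_delta

def dfa_accepts(dfa, text):
    state, accepting, delta = dfa
    for ch in text:
        state = delta[(state, ch)]
    return state in accepting

# NFA for (a|b)*ab with an extra start state "s" linked to state 0 by ε.
nfa_delta = {(0, "a"): {0, 1}, (0, "b"): {0}, (1, "b"): {2}}
dfa = determinize("s", {2}, nfa_delta, {"s": {0}}, "ab")
```

The resulting DFA is not minimized: two of its states behave identically, which is what the final minimization step would merge.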

- An XFA is a 7-tuple (Q, D, Σ, δ, U_δ, (q_0, d_0), F).
- Q is the set of states, Σ is the set of inputs (input alphabet), and δ : Q×Σ → Q is the transition function.
- D is the finite set of values in the data domain.
- U_δ : Q×Σ×D → D is the per-transition update function, which defines how the data value is updated on every transition.
- (q_0, d_0) is the initial configuration, which consists of an initial state q_0 and an initial data value d_0.
- F ⊆ Q×D is the set of accepting configurations.
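The definition translates fairly directly into code. The following is a minimal sketch under stated assumptions: Q, D and Σ are left implicit in the dictionaries, the update function is a plain Python callable, and the example automaton (inputs of length exactly 3 over {a, b}, checked against the whole input) was chosen here for illustration and is not taken from the slides.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Hashable, Set, Tuple

State = Hashable
Data = Hashable

@dataclass
class XFA:
    delta: Dict[Tuple[State, str], State]        # δ : Q×Σ → Q
    update: Callable[[State, str, Data], Data]   # U_δ : Q×Σ×D → D
    start: Tuple[State, Data]                    # (q_0, d_0)
    accepting: Set[Tuple[State, Data]]           # F ⊆ Q×D

    def accepts(self, text: str) -> bool:
        q, d = self.start
        for ch in text:
            d = self.update(q, ch, d)   # update is keyed by the source state
            q = self.delta[(q, ch)]
        return (q, d) in self.accepting

# One state and a saturating counter: D = {0, 1, 2, 3, 4}, where 4 means "too long".
length_three = XFA(
    delta={("q", "a"): "q", ("q", "b"): "q"},
    update=lambda q, ch, d: min(d + 1, 4),
    start=("q", 0),
    accepting={("q", 3)},
)
```

A plain DFA for the same language needs one state per counter value; here the counting lives in the data domain and the automaton keeps a single state.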

- An NXFA is a 7-tuple (Q, D, Σ, δ, U_δ, QD_0, F).
- Q is the set of states, Σ is the set of inputs (input alphabet), and δ ⊆ Q×(Σ ∪ {ε})×Q is the nondeterministic relation describing the allowed transitions.
- D is the finite set of values in the data domain.
- U_δ : δ → 2^(D×D) is the nondeterministic update function (or update relation), which defines how the data value is updated on every transition.
- QD_0 ⊆ Q×D is the set of initial configurations of the NXFA.
- F ⊆ Q×D is the set of accepting configurations.

 From parse trees to NXFAs.

ε-elimination
- It extends standard ε-elimination by composing update relations along chains of "collapsed" ε-transitions from the original NXFA and placing the composed relations on the appropriate transitions of the ε-free NXFA.
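Composing update relations along a chain is ordinary relational composition on D×D. A small sketch, assuming each relation is represented as a Python set of (before, after) pairs; `compose` and `collapse_chain` are names invented for this example.

```python
from functools import reduce

def compose(first, second):
    """Pairs (a, c) such that a -first-> b and b -second-> c for some b."""
    return {(a, c) for (a, b) in first for (b2, c) in second if b == b2}

def collapse_chain(relations):
    """Compose a non-empty list of update relations in transition order."""
    return reduce(compose, relations)

# Example relations over the data domain {0, 1, 2}.
identity = {(0, 0), (1, 1), (2, 2)}
increment = {(0, 1), (1, 2)}   # undefined on 2: no pair starts there
set_one = {(0, 1), (1, 1), (2, 1)}
```

For instance, collapsing two increments in a row yields the single pair (0, 2), and composing with `identity` leaves a relation unchanged.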

Determinization (transitions)
- After consuming an input string, the matching algorithm will know the exact automaton state it is in, but multiple data domain values may still be possible.

Determinization (transitions), example: .*ab[^a]{1}

Determinization (update relations)
- Algorithm 3 determinizes the data domain of the NXFA produced by Algorithm 2, yielding a deterministic XFA.

Determinization (update relations)
- Minimization

Finding efficient implementations
- The last step in the compilation process is to map abstract data domain operations to efficient, concrete instructions for manipulating data values.
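One way to picture this mapping step is a classifier that inspects an abstract update function over a small integer domain and picks a cheap concrete operation for it. This is purely a hypothetical sketch: the instruction names ("nop", "set", "saturating_incr", "table_lookup") and the function `classify_update` are invented here and are not the instruction set or the procedure used in the paper.

```python
def classify_update(update, domain):
    """Pick a hypothetical instruction for a total update function.

    update: dict mapping each data value to its successor value.
    domain: set of integer data values.
    """
    values = sorted(domain)
    if all(update[d] == d for d in values):
        return ("nop",)                       # value never changes
    targets = {update[d] for d in values}
    if len(targets) == 1:
        return ("set", targets.pop())         # constant assignment, e.g. set a bit
    top = values[-1]
    if all(update[d] == min(d + 1, top) for d in values):
        return ("saturating_incr", top)       # counter that stops at `top`
    return ("table_lookup",)                  # fall back to an explicit table
```

The fallback case keeps the sketch total: any update function that does not match a recognized pattern can still be executed from its table, just less efficiently.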

Finding efficient implementations
- Combining XFAs

 yielding 1556 signatures in total. We eliminated 106 signatures for reasons discussed below, giving us a signature set size of 1450.