TFA : A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Tang Song and H. Jonathan Chao Publisher: Technical.

Slides:



Advertisements
Similar presentations
Lexical Analysis IV : NFA to DFA DFA Minimization
Advertisements

CSE 311 Foundations of Computing I
4b Lexical analysis Finite Automata
Finite Automata CPSC 388 Ellen Walker Hiram College.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Compiler Construction
Authors: Raphael Polig, Kubilay Atasu, and Christoph Hagleitner Publisher: FPL, 2013 Presenter: Chia-Yi, Chu Date: 2013/10/30 1.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation Author: C˘at˘alin Radu, C˘at˘alin Leordeanu, Valentin.
1 Introduction to Computability Theory Lecture4: Regular Expressions Prof. Amos Israeli.
1 Introduction to Computability Theory Lecture3: Regular Expressions Prof. Amos Israeli.
Introduction to Computability Theory
A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,
1 The scanning process Goal: automate the process Idea: –Start with an RE –Build a DFA How? –We can build a non-deterministic finite automaton (Thompson's.
1 Single Final State for NFAs and DFAs. 2 Observation Any Finite Automaton (NFA or DFA) can be converted to an equivalent NFA with a single final state.
CS5371 Theory of Computation Lecture 6: Automata Theory IV (Regular Expression = NFA = DFA)
1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National.
1 Performing packet content inspection by longest prefix matching technology Authors: Nen-Fu Huang, Yen-Ming Chu, Yen-Min Wu and Chia- Wen Ho Publisher:
CS Chapter 2. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
CSE 311: Foundations of Computing Fall 2014 Lecture 23: State Minimization, NFAs.
Regular Model Checking Ahmed Bouajjani,Benget Jonsson, Marcus Nillson and Tayssir Touili Moran Ben Tulila
An Improved Algorithm to Accelerate Regular Expression Evaluation Author : Michela Becchi 、 Patrick Crowley Publisher : ANCS’07 Presenter : Wen-Tse Liang.
REGULAR LANGUAGES.
An Improved Algorithm to Accelerate Regular Expression Evaluation Author: Michela Becchi, Patrick Crowley Publisher: 3rd ACM/IEEE Symposium on Architecture.
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
Fast Packet Classification Using Bloom filters Authors: Sarang Dharmapurikar, Haoyu Song, Jonathan Turner, and John Lockwood Publisher: ANCS 2006 Present:
4b 4b Lexical analysis Finite Automata. Finite Automata (FA) FA also called Finite State Machine (FSM) –Abstract model of a computing entity. –Decides.
An Efficient Regular Expressions Compression Algorithm From A New Perspective  Author: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo  Publisher:
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
Converting NFAs to DFAs How a Syntax Analyser is constructed.
Pattern-Based DFA for Memory- Efficient and Scalable Multiple Regular Expression Matching Author: Junchen Jiang, Yang Xu, Tian Pan, Yi Tang, Bin Liu Publisher:IEEE.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
INFAnt: NFA Pattern Matching on GPGPU Devices Author: Niccolo’ Cascarano, Pierluigi Rolando, Fulvio Risso, Riccardo Sisto Publisher: ACM SIGCOMM 2010 Presenter:
CMSC 330: Organization of Programming Languages Finite Automata NFAs  DFAs.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions Publisher : Conference on emerging Networking EXperiments and Technologies.
Formal Definition of Computation Let M = (Q, ∑, δ, q 0, F) be a finite automaton and let w = w 1 w w n be a string where each wi is a member of the.
Memory-Efficient Regular Expression Search Using State Merging Author: Michela Becchi, Srihari Cadambi Publisher: INFOCOM th IEEE International.
Lexical Analysis: DFA Minimization & Wrap Up. Automating Scanner Construction PREVIOUSLY RE  NFA ( Thompson’s construction ) Build an NFA for each term.
A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching Yao Song 11/05/2015.
Author : Yang Xu, Lei Ma, Zhaobo Liu, H. Jonathan Chao Publisher : ANCS 2011 Presenter : Jo-Ning Yu Date : 2011/12/28.
Author : Randy Smith & Cristian Estan & Somesh Jha Publisher : IEEE Symposium on Security & privacy,2008 Presenter : Wen-Tse Liang Date : 2010/10/27.
TFA: A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Yang Song and H. Jonathan Chao Publisher: ACM/IEEE.
A Fast Regular Expression Matching Engine for NIDS Applying Prediction Scheme Author: Lei Jiang, Qiong Dai, Qiu Tang, Jianlong Tan and Binxing Fang Publisher:
Lecture Notes 
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
An Improved DFA for Fast Regular Expression Matching Author : Domenico Ficara 、 Stefano Giordano 、 Gregorio Procissi Fabio Vitucci 、 Gianni Antichi 、 Andrea.
using Deterministic Finite Automata & Nondeterministic Finite Automata
Overview of Previous Lesson(s) Over View  A token is a pair consisting of a token name and an optional attribute value.  A pattern is a description.
Regular Languages Chapter 1 Giorgi Japaridze Theory of Computability.
What do we know? DFA = NFA =  -NFA We have seen algorithms to transform DFA to NFA (trival) NFA to  NFA (trivial) NFA to DFA (subset construction)
Theory of Computation Automata Theory Dr. Ayman Srour.
Packet Classification Using Multi- Iteration RFC Author: Chun-Hui Tsai, Hung-Mao Chu, Pi-Chung Wang Publisher: 2013 IEEE 37th Annual Computer Software.
Deterministic Finite Automata Nondeterministic Finite Automata.
CS412/413 Introduction to Compilers Radu Rugina Lecture 3: Finite Automata 25 Jan 02.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
Range Hash for Regular Expression Pre-Filtering Publisher : ANCS’ 10 Author : Masanori Bando, N. Sertac Artan, Rihua Wei, Xiangyi Guo and H. Jonathan Chao.
WELCOME TO A JOURNEY TO CS419 Dr. Hussien Sharaf Dr. Mohammad Nassef Department of Computer Science, Faculty of Computers and Information, Cairo University.
A DFA with Extended Character-Set for Fast Deep Packet Inspection
Lexical analysis Finite Automata
Non Deterministic Automata
Two issues in lexical analysis
Chapter 2 FINITE AUTOMATA.
Non-Deterministic Finite Automata
COSC 3340: Introduction to Theory of Computation
Non Deterministic Automata
Chapter Nine: Advanced Topics in Regular Languages
Compact DFA Structure for Multiple Regular Expressions Matching
Nondeterminism The Chinese University of Hong Kong Fall 2010
Presentation transcript:

TFA : A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Tang Song and H. Jonathan Chao Publisher: Technical Report Presenter: Ye-Zhi Chen Date: 2013/04/17

Introduction The main contributions in this paper : 1. We introduce TFA, the first finite automaton model with a clear and tunable bound on the number of concurrent active states (more than one) independent of the number and patterns of regular expressions. 2. A mathematical model is built to analyze the set split problem (SSP) 3. We develop a novel state encoding scheme to facilitate the implementation of a TFA. 4. The proposed TFA is evaluated using regular expression sets from Snort and Bro.

Introduction This paper proposed a new finite automaton TFA, the first finite automaton model with a clear and tunable bound on the number of concurrent active states (more than one) independent of the number and patterns of regular expressions.

Motivation Patterns : 1..*a.*b[ˆa]*c 2..*d.*e[ˆd]*f 3..*g.*h[ˆg]*i Alphaset Σ ={a, b,..., i} Number of state in DFA :54 Number of state in NFA :10 the NFA requires much less memory, its memory bandwidth requirement is four times that of the DFA

Motivation

We have seen the main reason for the DFA having far more states than the corresponding NFA is that the DFA needs one state for each NFA active state combination One possible solution is to allow multiple automaton states (bounded by a given bound factor b) to represent each combination of NFA active states. We name it Tunable Finite Automaton (TFA). N T : the number of TFA states P : the number of all possible statuses that can be represented by at most b active states A TFA with N T = O (log b (N D ) ) states can represent a DFA with N D states

TUNABLE FINITE AUTOMMATON(TFA) Definition : A b-TFA is a 6-tuple A finite set of TFA states Q T ; A finite set of input symbols Σ; A transition function δ T : Q T × Σ→ P(Q N ); A set of initial states I ⊆ Q T, |I| ≤ b; A set of accept states F T ⊆ Q T ; A set split function SS : Q D → (Q T ) b ∪... ∪ (Q T ) 1.

Constructing A TFA The implementation of a TFA logically consists of two components : 1. A TFA structure 2. Set Split Table (SST) : Each entry of the SST table corresponds to one combination of NFA active states (i.e., a DFA state) recording how to split the combination into multiple TFA states

TFA

Constructing A TFA

Generating a equivalent b-TFA from an NFA consists of four steps: 1. Generate the DFA states using the subset construction scheme 2. Split each NFA active state combination into up to b subsets. After this step, we obtain the TFA state set Q T and the set split function SS; 3. Decide the transition function δ T. Outgoing transitions of TFA states point to a data structure called state label, which contains a set of NFA state IDs. 4. Decide the set of initial states (I) and the set of accept states (F T )

Constructing A TFA

Operating A TFA assume the input string is “adegf ”. Initial active state : O 1. a : return label {A,O}, next active states : OA 2. d : return label {A,D,O}, next active states :O, AD 3. e : return label {A,E,O}, next active states :O, AE 4. g : return label {A,E,G,O}, next active states :OG, AE 5. f : return label {A,F,G,O}, next active states :OG, AF 6. AF is an accept state => match!

Operating A TFA

Splitting NFA Active State Combinations Set Split Problem (SSP) : Find a minimal number of subsets from the NFA state set, so that for any valid NFA active state combination, we can always find up to b subsets to exactly cover it b-SSP problem is an NP-hard problem for any b > 1

Splitting NFA Active State Combinations

A Heuristic Algorithm for SSP Problem To simplify the problem, we add another constraint on the model of the b-SSP problem :

State Encoding Original : using an array to store state labels which include different numbers of NFA state IDs Problem : 1.High storage cost 2.TDA operation overhead : a)Redundancy elimination b)Sorting Solution : bit vector

Performance Evaluation