Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions Publisher : Conference on emerging Networking EXperiments and Technologies.

Presentation transcript:

Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions
Publisher: Conference on emerging Networking EXperiments and Technologies (CoNEXT), 2008
Author: Michela Becchi and Patrick Crowley
Presenter: Yu-Hsiang Wang
Date: 2011/02/16

Outline
Introduction
Counting Constraints
Back-References
Combining Multiple Reg-Ex
Architecture
Experimental Evaluation

Introduction
As of November 2007, 5,549 of the 8,536 Snort rules contain at least one Perl-Compatible Regular Expression (PCRE). Among these, 905 (16.3%) and 2,445 (44%) contain unbounded and bounded repetitions of large character classes, respectively, and 338 (6%) include back-references.
This paper shows how the proposed extended automaton can be combined with the hybrid-FA proposed in [20].

Counting Constraints: NFA
When the counting constraint n is large, the number of NFA states is linear in n.
The basic problem with an NFA representation is that, during operation, many states can be active in parallel, leading to a high memory bandwidth requirement and/or processing time (e.g. on the input aaaaaaaaaaaa...aaaabc).

Counting Constraints: DFA
For large n, the number of DFA states is exponential in n, e.g. n=40 => 1000 billion states.
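The linear-versus-exponential contrast can be checked on a small case. The sketch below is an illustration only: it assumes a pattern of the form .*a.{n} over the two-letter alphabet "ab" (the slides do not state which pattern produces the quoted figure), builds its NFA with n+2 states, and counts the subsets reachable by subset construction. The function name count_dfa_states is hypothetical.

```python
from collections import deque

def count_dfa_states(n, alphabet="ab"):
    """Count DFA states reachable by subset construction for the NFA of .*a.{n}.

    NFA states (an assumption of this sketch): 0 loops on every character and
    moves to 1 on 'a'; states 1..n advance on any character; n+1 is accepting.
    """
    def step(states, ch):
        nxt = {0}                      # state 0 stays active on every character
        for s in states:
            if s == 0:
                if ch == "a":
                    nxt.add(1)         # start counting after an 'a'
            elif s <= n:
                nxt.add(s + 1)         # one more character of the .{n} run
        return frozenset(nxt)

    start = frozenset({0})
    seen = {start}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        for ch in alphabet:
            nxt = step(cur, ch)
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return len(seen)

for n in range(1, 9):
    print(n, "NFA states:", n + 2, "DFA states:", count_dfa_states(n))
```

For this particular pattern the DFA has to remember which of the last n+1 characters were 'a', so the count doubles with each increment of n, while the NFA grows by one state. For scale, 2**40 is 1,099,511,627,776, which is the same order of magnitude as the figure quoted on the slide.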

Counting NFA
This basic concept is complicated by the observation that, to preserve functional equivalence between the original NFA and the counting-NFA, one instance of the counter is not enough (e.g. on the input axaybzbc).
[Figure: counting-NFA with states 0 to 4, Σ self-loops, a cnt++ action, and transitions labelled "a, cnt", "b | cnt=n" and "c | cnt  n".]
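Why a single counter falls short can be seen with a small simulation. This is a sketch under stated assumptions: it takes the pattern to be of the form a.{n}bc searched anywhere in the input and picks n=3, neither of which is stated in the transcript. The function name and the max_instances parameter are hypothetical.

```python
def match_a_dot_n_bc(text, n, max_instances=None):
    """Search text for a.{n}bc, keeping one counter instance per pending 'a'.

    Each counter holds the number of characters consumed since its 'a'.
    max_instances=1 emulates an automaton restricted to a single counter.
    """
    counters = []   # oldest instance first
    armed = False   # True if the previous character was the 'b' following .{n}
    for ch in text:
        if armed and ch == "c":
            return True
        # A 'b' completes .{n}b only for an instance that has counted exactly n.
        armed = ch == "b" and n in counters
        # Increment every live instance; instances that already reached n expire.
        counters = [c + 1 for c in counters if c < n]
        if ch == "a" and (max_instances is None or len(counters) < max_instances):
            counters.append(0)
    return False

print(match_a_dot_n_bc("axaybzbc", 3))                   # several instances
print(match_a_dot_n_bc("axaybzbc", 3, max_instances=1))  # one instance only
```

On axaybzbc with n=3 the match starts at the second a, which arrives while the counter for the first a is still running. With several instances the match is found; with one instance the second a cannot be tracked and the match is missed.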

Counting NFA
Differential representation: since the increment operation acts in parallel on all the counter instances, the difference between them remains constant over execution. Store the oldest (and largest) instance $c'_i$ and, for $j>i$, $\Delta c_j = c_j - c_{j-1}$.
[Figure: stored values $c'$, $\Delta c_i$, $\Delta c_i$ for n=10.]
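A minimal sketch of the differential idea follows. The class and method names are hypothetical, and the gaps are stored as non-negative magnitudes (older value minus younger value), a convention chosen here for readability rather than taken from the slide. The point it illustrates is that incrementing all instances touches only the oldest value.

```python
from collections import deque

class DifferentialCounters:
    """Counter instances stored as the oldest value plus gaps to younger ones."""

    def __init__(self):
        self.oldest = None      # value of the oldest (largest) instance
        self.gaps = deque()     # gap between each instance and the next younger one
        self.gap_total = 0      # running sum of gaps, so the youngest value is O(1)

    def spawn(self):
        """Create a new instance with value 0."""
        if self.oldest is None:
            self.oldest = 0
        else:
            youngest = self.oldest - self.gap_total
            self.gaps.append(youngest)   # new instance is 0, so the gap is `youngest`
            self.gap_total += youngest

    def increment(self):
        """Increment every instance; gaps are unchanged, so only `oldest` moves."""
        if self.oldest is not None:
            self.oldest += 1

    def drop_oldest(self):
        """Retire the oldest instance; the next one becomes the stored value."""
        if self.gaps:
            gap = self.gaps.popleft()
            self.oldest -= gap
            self.gap_total -= gap
        else:
            self.oldest = None

    def values(self):
        """Reconstruct all instance values, oldest first."""
        out = []
        if self.oldest is not None:
            cur = self.oldest
            out.append(cur)
            for gap in self.gaps:
                cur -= gap
                out.append(cur)
        return out

d = DifferentialCounters()
d.spawn(); d.increment(); d.increment()
d.spawn(); d.increment()
d.spawn(); d.increment()
print(d.values())
```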

Back-References
A back-reference in a regular expression refers to a sub-expression enclosed within capturing parentheses, and indicates that the string matched by that sub-expression must be matched again later within the regular expression.
e.g. .*(abc|bcd).\1y matches abcdabcdy but does not match abcdaabcy
e.g. .*a([a-z]+)a\1y matches babacabacy
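These examples can be checked with any backtracking regex engine. The snippet below uses Python's standard re module, which is not the automaton discussed in the slides; it only confirms the expected match results. The helper name matches is hypothetical, and fullmatch is used so that each pattern must cover the whole input.

```python
import re

# Patterns copied from the slide.
FIRST = re.compile(r".*(abc|bcd).\1y")
SECOND = re.compile(r".*a([a-z]+)a\1y")

def matches(pattern, text):
    """Return True if the pattern matches the entire text."""
    return pattern.fullmatch(text) is not None

print(matches(FIRST, "abcdabcdy"))
print(matches(FIRST, "abcdaabcy"))
print(matches(SECOND, "babacabacy"))

# The captured group shows which alternative the back-reference repeated.
print(FIRST.fullmatch("abcdabcdy").group(1))
```

In the first input the group captures bcd (after a leading a), one arbitrary character follows, and \1 then repeats bcd before the final y.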

Back-References
( ) Defines a marked sub-expression. The string matched within the parentheses can be recalled later. A marked sub-expression is also called a block or capturing group.
\num Matches what the num-th marked sub-expression matched.
e.g. HTML ( abcde ) /^ (.*) |\s+\/>)$/

Back-References
Each active state can be associated with a set of matched substrings $MS_k$ for each back-reference \k. This is performed as follows:
(i) When a transition $S_x \to S_y$ is taken, the set $MS_k$ associated with $S_x$ is moved to $S_y$.
(ii) If the taken transition is tagged k, the current input character is appended to the strings in $MS_k$.
If a back-reference \k originates from state $S_j$, then $S_j$ is consuming: when it is active, all the strings in its $MS_k$ are processed and shortened (one character at a time).
Two special conditional transitions representing the back-reference are created. If the input character matches some string in $MS_k$, then:
(i) transition $S_j \to S_{j+1}$ is taken if the corresponding string is consumed completely;
(ii) transition $S_j \to S_j$ is taken otherwise.
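The two per-character operations on a matched-substring set can be sketched in isolation. This is not the full extended automaton: the functions below model only appending on a tagged transition and shortening in a consuming state, and their names and return shapes are hypothetical. Dropping strings that do not start with the input character is an assumption of this sketch.

```python
def capture_step(ms_k, ch):
    """Tagged transition: append the input character to every string in MS_k."""
    return {s + ch for s in ms_k}

def consume_step(ms_k, ch):
    """Consuming state: shorten every string in MS_k that starts with ch.

    Returns (remaining, completed). `completed` is True when some string was
    consumed entirely, which corresponds to taking the transition to the next
    state; a non-empty `remaining` corresponds to the self-loop.
    """
    remaining = set()
    completed = False
    for s in ms_k:
        if s and s[0] == ch:
            rest = s[1:]
            if rest:
                remaining.add(rest)
            else:
                completed = True
    return remaining, completed

# Capture "bcd" through three tagged transitions, then consume it again.
ms = {""}
for ch in "bcd":
    ms = capture_step(ms, ch)
print(ms)

for ch in "bcd":
    ms, done = consume_step(ms, ch)
    print(ch, ms, done)
```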

Back-References
RegEx: .*(abc|bcd).\1y
Input: abcdabcdy

Combining Multiple Reg-Ex
States representing dot-star conditions are called special states.
If a regular expression RE1 containing a dot-star is compiled together with a regular expression RE2, the sub-DFA representing the match of RE2 is duplicated: one instance starts at state 0 and one instance starts at the special state.

Combining Multiple Reg-Ex
When more regular expressions containing .* conditions are compiled together, the number of possible special-state combinations affects the complexity and size of the resulting DFA.
The same considerations hold for extended automata. In fact, counting/consuming states behave like special states (they have an auto-loop on a large character class).

Combining Multiple Reg-Ex
[Figure: head-DFA, tail-DFA 1, tail-DFA 2, ..., tail-DFA k.]
When processing the input text, the head-DFA is always active: each input character triggers a state transition on it.
The operation of distinct tail-DFA machines is, in principle, independent. Furthermore, once the head-DFA has activated a tail-DFA, the two can execute in sequence or in parallel threads.
[20] M. Becchi and P. Crowley, “A Hybrid Finite Automaton for Practical Deep Packet Inspection”, in CoNEXT

Combining Multiple Reg-Ex
The tail-DFA is activated every time head-state 0-3 is traversed.

Combining Multiple Reg-Ex
A distinct activation of the tail-DFA is required each time head-state 1-3 is reached. Therefore, the tail-DFA may be active in parallel on different states.
A new activation of the tail-DFA always begins from counting state 3; this ensures that, if the tail-DFA is already active, the new activation is covered by the currently active state.

Architecture
[Figure: architecture with Head-DFA and Tail-DFA blocks.]

Experimental Evaluation