Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T. V. Lakshman, Randy H.

Slides:



Advertisements
Similar presentations
Deep packet inspection – an algorithmic view Cristian Estan (U of Wisconsin-Madison) at IEEE CCW 2008.
Advertisements

Deterministic Finite Automata (DFA)
An Efficient Regular Expressions Compression Algorithm From A New Perspective Authors : Tingwen Liu,Yifu Yang,Yanbing Liu,Yong Sun,Li Guo Tingwen LiuYifu.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Compiler Construction
Reviewer: Jing Lu Gigabit Rate Packet Pattern- Matching Using TCAM Fang Yu, Randy H. Katz T. V. Lakshman UC Berkeley Bell Labs, Lucent ICNP’2004.
1 Languages. 2 A language is a set of strings String: A sequence of letters Examples: “cat”, “dog”, “house”, … Defined over an alphabet: Languages.
Finite Automata Great Theoretical Ideas In Computer Science Anupam Gupta Danny Sleator CS Fall 2010 Lecture 20Oct 28, 2010Carnegie Mellon University.
Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation Author: C˘at˘alin Radu, C˘at˘alin Leordeanu, Valentin.
Courtesy Costas Busch - RPI1 Non Deterministic Automata.
Theory of Computation What types of things are computable? How can we demonstrate what things are computable?
61 Nondeterminism and Nodeterministic Automata. 62 The computational machine models that we learned in the class are deterministic in the sense that the.
A hybrid finite automaton for practical deep packet inspection Department of Computer Science and Information Engineering National Cheng Kung University,
Homework #2 Solutions.
1 Languages and Finite Automata or how to talk to machines...
1 Regular expression matching with input compression : a hardware design for use within network intrusion detection systems Department of Computer Science.
1 Gigabit Rate Multiple- Pattern Matching with TCAM Fang Yu Randy H. Katz T. V. Lakshman
1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National.
1 Performing packet content inspection by longest prefix matching technology Authors: Nen-Fu Huang, Yen-Ming Chu, Yen-Min Wu and Chia- Wen Ho Publisher:
1 Non-Deterministic Automata Regular Expressions.
Great Theoretical Ideas in Computer Science.
1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Fang Yu Microsoft Research, Silicon Valley Work was done in UC Berkeley,
Overview of Previous Lesson(s) Over View  Strategies that have been used to implement and optimize pattern matchers constructed from regular expressions.
An Improved Algorithm to Accelerate Regular Expression Evaluation Author: Michela Becchi, Patrick Crowley Publisher: 3rd ACM/IEEE Symposium on Architecture.
Introduction to CS Theory Lecture 3 – Regular Languages Piotr Faliszewski
SI-DFA: Sub-expression Integrated Deterministic Finite Automata for Deep Packet Inspection Authors: Ayesha Khalid, Rajat Sen†, Anupam Chattopadhyay Publisher:
Overview of Previous Lesson(s) Over View  An NFA accepts a string if the symbols of the string specify a path from the start to an accepting state.
TFA : A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Tang Song and H. Jonathan Chao Publisher: Technical.
Great Theoretical Ideas in Computer Science.
An Efficient Regular Expressions Compression Algorithm From A New Perspective  Author: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo  Publisher:
CS412/413 Introduction to Compilers Radu Rugina Lecture 4: Lexical Analyzers 28 Jan 02.
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
TRANSITION DIAGRAM BASED LEXICAL ANALYZER and FINITE AUTOMATA Class date : 12 August, 2013 Prepared by : Karimgailiu R Panmei Roll no. : 11CS10020 GROUP.
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
2. Regular Expressions and Automata 2007 년 3 월 31 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.33 ~ 56.
StriD 2 FA: Scalable Regular Expression Matching for Deep Packet Inspection Author: Xiaofei Wang, Junchen Jiang, Yi Tang, Bin Liu, and Xiaojun Wang Publisher:
Sampling Techniques to Accelerate Pattern Matching in Network Intrusion Detection Systems Author : Domenico Ficara, Gianni Antichi, Andrea Di Pietro, Stefano.
INFAnt: NFA Pattern Matching on GPGPU Devices Author: Niccolo’ Cascarano, Pierluigi Rolando, Fulvio Risso, Riccardo Sisto Publisher: ACM SIGCOMM 2010 Presenter:
Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection Sailesh Kumar Sarang Dharmapurikar Fang Yu Patrick Crowley Jonathan.
CMSC 330: Organization of Programming Languages Finite Automata NFAs  DFAs.
Overview of Previous Lesson(s) Over View  Symbol tables are data structures that are used by compilers to hold information about source-program constructs.
CSE 311 Foundations of Computing I Lecture 27 FSM Limits, Pattern Matching Autumn 2012 CSE
Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions Publisher : Conference on emerging Networking EXperiments and Technologies.
TCAM –BASED REGULAR EXPRESSION MATCHING SOLUTION IN NETWORK Phase-I Review Supervised By, Presented By, MRS. SHARMILA,M.E., M.ARULMOZHI, AP/CSE.
Memory-Efficient Regular Expression Search Using State Merging Author: Michela Becchi, Srihari Cadambi Publisher: INFOCOM th IEEE International.
Cross-Product Packet Classification in GNIFS based on Non-overlapping Areas and Equivalence Class Author: Mohua Zhang, Ge Li Publisher: AISS 2012 Presenter:
CS 208: Computing Theory Assoc. Prof. Dr. Brahim Hnich Faculty of Computer Sciences Izmir University of Economics.
Author : Randy Smith & Cristian Estan & Somesh Jha Publisher : IEEE Symposium on Security & privacy,2008 Presenter : Wen-Tse Liang Date : 2010/10/27.
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
TFA: A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Yang Song and H. Jonathan Chao Publisher: ACM/IEEE.
A Fast Regular Expression Matching Engine for NIDS Applying Prediction Scheme Author: Lei Jiang, Qiong Dai, Qiu Tang, Jianlong Tan and Binxing Fang Publisher:
LaFA Lookahead Finite Automata Scalable Regular Expression Detection Authors : Masanori Bando, N. Sertac Artan, H. Jonathan Chao Masanori Bando N. Sertac.
Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Publisher : ANCS’ 06 Author : Fang Yu, Zhifeng Chen, Yanlei Diao, T.V.
Chapter 2 Scanning. Dr.Manal AbdulazizCS463 Ch22 The Scanning Process Lexical analysis or scanning has the task of reading the source program as a file.
An Improved DFA for Fast Regular Expression Matching Author : Domenico Ficara 、 Stefano Giordano 、 Gregorio Procissi Fabio Vitucci 、 Gianni Antichi 、 Andrea.
Author : S. Kumar, B. Chandrasekaran, J. Turner, and G. Varghese Publisher : ANCS ‘07 Presenter : Jo-Ning Yu Date : 2011/04/20.
CS 154 Formal Languages and Computability February 9 Class Meeting Department of Computer Science San Jose State University Spring 2016 Instructor: Ron.
CSE 311 Foundations of Computing I Lecture 24 FSM Limits, Pattern Matching Autumn 2011 CSE 3111.
Nondeterministic Finite Automata (NFAs). Reminder: Deterministic Finite Automata (DFA) q For every state q in Q and every character  in , one and only.
Finite Automata Great Theoretical Ideas In Computer Science Victor Adamchik Danny Sleator CS Spring 2010 Lecture 20Mar 30, 2010Carnegie Mellon.
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
CSC-305 Design and Analysis of AlgorithmsBS(CS) -6 Fall-2014CSC-305 Design and Analysis of AlgorithmsBS(CS) -6 Fall-2014 Design and Analysis of Algorithms.
Lecture 2 Compiler Design Lexical Analysis By lecturer Noor Dhia
A DFA with Extended Character-Set for Fast Deep Packet Inspection
Non Deterministic Automata
Regular Expression Matching in Reconfigurable Hardware
Non Deterministic Automata
2019/5/3 A De-compositional Approach to Regular Expression Matching for Network Security Applications Author: Eric Norige Alex Liu Presenter: Yi-Hsien.
A Hybrid Finite Automaton for Practical Deep Packet Inspection
Presentation transcript:

Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T. V. Lakshman, Randy H. Katz Publisher: ANCS(Architectures for Networking and Communications Systems ) Presenter: 楊皓中 Date: 2014/02/25

Introduction The inefficiency in regular expression matching is largely due to the fact that the current solutions are not optimized for the following three unique complex features of regular expressions used in network packet scanning applications. ◦ 1) Large numbers of wildcards can cause DFA to grow exponentially ◦ 2) Wildcard are used with length restriction(‘?’, ‘+’) will increase the resource ◦ 3) Groups of characters are also commonly used : The class of characters may intersect with other classes or wildcards. Such interaction can result in a highly complex state machine.

Introduction Make following contributions ◦ 1) Analyze the computational and storage cost of building individual DFAs ◦ Identify the structural characteristics of the regular expressions that lead to exponential growth of DFAs. ◦ 2) Two rewrite rules for specific regular expressions ◦ 3) Combine multiple DFAs into a small number of group to improve the matching speed

Regular Expression Patterns Compares the regular expressions used in two networking applications (Snort, Linux L-7 filter & XML filtering) ◦ 1)Both types of app. Use wildcards (‘.’,’?‘,’+’,’*’) contain larger numbers of them. ◦ 2) Classes of characters (“[ ]”) are used only in packet scanning applications. ◦ 3) High percentage of scanning app. Have length restrictions on some of the classes or wildcards.

Solution Space for Regular Expression Matching A single regular expression of length n can be expressed as an NFA with O(n) When the NFA is converted into a DFA, it may generate states The processing complexity for each character in the input is O(1) in DFA, but is O(n 2 ) for an NFA when all n states are active at the same time To handle m regular expressions, two choices are possible : ◦ Processing them individually in m automata ◦ Compiling them into a single automaton composite DFA reduces processing cost ◦ from O(m) (O(1) for each automaton) to O(1)

Problem statement Our goal is to achieve O(1) computation cost for each incoming character, which cannot be accomplished by any existing DFA-based solutions ◦ due to their excessive memory usage. Thus, the focus of the study is to reduce memory overhead of DFA while approaching the optimal processing speed of O(1) per character There are two sources of memory usage in DFAs : states and transitions ◦ We consider the number of states as the primary factor

Design Considerations Define Completeness of Matching Results : ◦ Exhaustive Matching : M(P,S)={substring S’ of S | S’ is accepted by the DFA of P}  It is expensive and often unnecessary to report all matching substrings  We propose a new concept, Non-overlapping Matching, that relaxes the requirements of exhaustive matching ◦ Non-overlapping Matching :  M(P, S) = {substring Si of S| ∀ Si, Sj accepted by the DFA of P, Si ∩ Sj = φ }.  those applications are mostly interested in knowing if certain attacks or application layer patterns appear in a packet ◦ Ex : ab* if input abbb non-overlapping matching will report one match instead of three ◦ Exhaustive Matching will report, ab, abb, abbb

Design Considerations Define DFA Execution Model for Substring Matching : ◦ We focus on patterns without ‘^’ attached at the beginning ◦ Repeater searches ◦ One-pass search – this approach can truly achieve O(1) computation cost per character

DFA Analysis for Individual Regular Expressions The study is based on the use of exhaustive matching and one-pass search

Case 4: DFA of Quadratic Size The quadratic complexity comes from the fact that ◦ the letter B overlaps with the class of character [^\n] hence, there is inherent ambiguity in the pattern: ◦ A second B letter can be matched either as part of B+, or as part of [^\n]{3} the DFA needs to remember the number of Bs it has seen and their locations

Case 5: DFA of Exponential Size AAB(AABBCD) is different from ABA(ABABCD) because a subsequence input BCD

Regular Expression Rewrites We have identified the typical patterns used in packet payload scanning that can cause the creation of large DFAs The commonality of the patterns amenable to rewrite is that their suffixes address length restricted occurrences of a class of characters that overlap with their prefixes propose two rewrite rules ◦ Rewrite Rule (1) ◦ Rewrite Rule (2) rewrites rules can convert these patterns into DFAs with their sizes successfully reduced from quadratic or exponential to only linear.

Rewrite Rule (1) General case where the suffix of a pattern contains a class of characters overlapping with its prefix and a length restriction “ ◦ ^SEARCH\s+[^\n]{1024} ” to “ ^SEARCH\s[^\n]{1024} ” ◦ “^A+[A-Z]{j}” to “^A [A-Z]{j}” We can prove match “^A+[A-Z]{j}” also match “^A [A-Z]{j}” ◦ EX : j = 3, input = AAAZZ, match ◦ input = AAA!Z, mismatch non-overlapping matches we successfully rewrote 17 similar patterns in the Snort rule set

Rewrite Rule (2) patterns like “.*AUTH\s[^\n]{100}” generate exponential numbers of states to keep track of all the AUTH\s subsequent to the fist AUTH\s. we do not need to keep track of the second AUTH\s. because ◦ (1) if there is a ‘\n’ character within the next 100 bytes, the return character must also be within 100 bytes to the second AUTH\s ◦ (2) if there is no ‘\n’ character within the next 100 bytes, the first AUTH\s and the following characters have already matched the pattern “([^A]|A[^U]|AU[^T]|AUT[^H]|AUTH[^\s]|AUTH\s [^\n]{0,99}\n)*AUTH\s[^\n]{100}”. After rewriting, the DFA only contains 106 states. applicable to 54 expressions in the Snort. 49 in the Bro rule set.

SELECTIVE GROUPING OF MULTIPLE PATTERNS it is well known that the computation complexity for processing m patterns reduces from O(m) to O(1) per character, when the m patterns are compiled into a single composite DFA. ◦ It is usually infeasible to compile a large set of patterns together due to the complicated interactions between patterns. Figure 6 shows a composite DFA for matching “.*AB.*CD” and “.*EF.*GH”. This DFA contains many states that did not exist in the individual DFAs. certain patterns interact with each other when compiled together, which can result in a large composite DFA

Regular Expressions Grouping Algorithms we propose algorithms to selectively partition m patterns to k groups such that patterns in each group do not adversely interact with each other. these algorithms reduce the computation complexity from O(m) to O(k) without causing extra memory usage. We first provide a formal definition of interaction: two patterns interact with each other if their composite DFA contains more states than the sum of two individual ones In multi-core architecture Our goal is to design an algorithm that divides regular expressions into several groups, so that one processing unit can run one or several composite DFAs. In addition, the size of local memory of each processing unit is quite limited.

Grouping Algorithm

EVALUTION RESULTS