An Efficient Regular Expressions Compression Algorithm From A New Perspective. Authors: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo.


Publisher: INFOCOM, 2011
Presenter: Yuen-Shuo Li
Date: 2013/02/27

Background
- Deep packet inspection (DPI) is widely recognized as a powerful and important technology for network security and application-specific services, e.g. firewalls, traffic monitoring and packet classifiers.
- Regular expressions are replacing exact strings as the way to describe patterns in most popular software tools, because of their expressive power, simplicity and flexibility in expressing signatures.

About DFA
Deterministic Finite Automata (DFA) and Nondeterministic Finite Automata (NFA) are the two classical, equivalent representations of regular expressions. The DFA is the preferred representation for deep packet inspection in high-speed network environments because:
- it triggers only one state transition (one memory access) for each input symbol processed;
- multiple regular expressions can be compiled into a single composite DFA, which inspects the input in a single pass.
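To make the "one memory access per input symbol" property concrete, here is a minimal Python sketch of table-driven DFA matching. The pattern (the substring "ab"), the state numbering and the function names are hypothetical illustrations, not taken from the paper; a composite DFA for many expressions would use the same loop over a larger table.

```python
NUM_SYMBOLS = 256  # one column per possible input byte


def build_substring_ab_dfa() -> list:
    """Hypothetical dense DFA that accepts any input containing "ab".

    State 0: no progress, state 1: just saw 'a', state 2: saw "ab" (accepting).
    """
    table = [[0] * NUM_SYMBOLS for _ in range(3)]
    a, b = ord("a"), ord("b")
    table[0][a] = 1               # a match may start here
    table[1][a] = 1               # "aa": the last 'a' may still start a match
    table[1][b] = 2               # "ab" found
    table[2] = [2] * NUM_SYMBOLS  # the accepting state is absorbing
    return table


def dfa_match(table: list, accepting: set, data: bytes) -> bool:
    state = 0
    for byte in data:
        # Exactly one transition-table lookup per input byte.
        state = table[state][byte]
    return state in accepting


print(dfa_match(build_substring_ab_dfa(), {2}, b"xxaby"))  # True
```

The price of this speed is the dense table itself: every state stores one entry per possible byte, whether or not the transition is interesting.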

About DFA (continued)
Unfortunately, DFAs demand a large amount of memory to store the state transition tables for current sets of regular expressions.
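A back-of-the-envelope sketch of why the tables get large: an uncompressed table needs one entry per (state, input symbol) pair. The 256-column alphabet, the 4-byte entry size and the 100,000-state figure are assumptions chosen for illustration, not measurements from the paper.

```python
def dense_dfa_bytes(num_states: int, alphabet_size: int = 256,
                    bytes_per_entry: int = 4) -> int:
    """Size of an uncompressed transition table: one entry per (state, symbol)."""
    return num_states * alphabet_size * bytes_per_entry


# Hypothetical composite DFA with 100,000 states and 32-bit next-state entries.
size = dense_dfa_bytes(100_000)
print(size, "bytes =", size / 2**20, "MiB")  # 102400000 bytes = 97.65625 MiB
```

Because the size is linear in the number of states, and composite DFAs for rule sets can have very many states, compressing the per-state rows is where the savings must come from.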

Our goal
In this paper, we focus on reducing the memory usage of composite DFAs by compressing their transitions.

CSCA
We introduce a new method, named the Cluster-based Splitting Compression Algorithm (CSCA).

If we stipulate that the son states of each state are traversed in ascending order of their label characters, traversing the DFA level by level (breadth-first) yields a unique, deterministic trie-tree.
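A minimal Python sketch of this level-by-level traversal, assuming the DFA is given as a dict mapping each state to a {character: next_state} dict and that the start state is 0 (both are assumptions, as is the function name `bfs_trie`). A state becomes the son of the first state from which it is discovered; visiting characters in ascending order makes that choice, and therefore the trie-tree, unique.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple


def bfs_trie(dfa: Dict[int, Dict[str, int]],
             start: int = 0) -> Tuple[List[int], Dict[int, Optional[int]]]:
    """Return (visit order, father of each state) for the BFS trie-tree."""
    parent: Dict[int, Optional[int]] = {start: None}
    order: List[int] = []
    queue = deque([start])
    while queue:
        state = queue.popleft()
        order.append(state)
        # Son states are explored by label character, from small to large.
        for ch in sorted(dfa.get(state, {})):
            nxt = dfa[state][ch]
            if nxt not in parent:      # first discovery => tree edge
                parent[nxt] = state
                queue.append(nxt)
    return order, parent


# Hypothetical 4-state DFA; state 3 is reachable from both 1 and 2.
dfa = {0: {"b": 2, "a": 1}, 1: {"a": 1, "b": 3}, 2: {"a": 3}, 3: {}}
print(bfs_trie(dfa))  # ([0, 1, 2, 3], {0: None, 1: 0, 2: 0, 3: 1})
```

Renumbering states in this visit order places the sons of each father next to each other, which is what makes per-cluster processing convenient.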

Cluster
In the trie-tree, if state r has a transition to state s, we call r the father state of s and, conversely, s a son state of r. A set of states is called a cluster if it consists of all the son states of a certain state.
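Given the father of each state in the trie-tree (for example, a parent map produced by a breadth-first traversal), clusters are simply the groups of sons that share a father. A short sketch; the input format and the name `find_clusters` are hypothetical.

```python
from typing import Dict, List, Optional


def find_clusters(parent: Dict[int, Optional[int]]) -> Dict[int, List[int]]:
    """Group son states by father state; each value is one cluster."""
    clusters: Dict[int, List[int]] = {}
    for son, father in parent.items():
        if father is not None:          # the root has no father
            clusters.setdefault(father, []).append(son)
    for sons in clusters.values():
        sons.sort()
    return clusters


# Hypothetical trie-tree: 0 is the root, 1 and 2 are sons of 0, 3 is a son of 1.
print(find_clusters({0: None, 1: 0, 2: 0, 3: 1}))  # {0: [1, 2], 1: [3]}
```

States without sons (leaves) do not define a cluster of their own.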

We divide all transitions into three different matrices, T1, T2 and T3, and store them separately; T3 is a sparse matrix.

Combinative Row
In matrix M (M is T1 or T2), for row s, if there is a row r such that for every character c either
- M[r, c] = X or M[s, c] = X, or
- M[r, c] = M[s, c],
we say that row r is a combinative row of row s. Example:

State | A | B | ^
0     | X | 3 | 4
1     | 2 | X | 4
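The definition translates directly into a column-by-column check. In this sketch Python's `None` stands in for the slides' X entries (an assumption about their meaning: an entry with no value stored in this matrix), and `is_combinative` is a hypothetical name.

```python
from typing import List, Optional

X = None  # stands in for the slides' "X" entries (assumed: no value stored here)
Row = List[Optional[int]]


def is_combinative(r: Row, s: Row) -> bool:
    """True if, for every column c, M[r, c] = X, M[s, c] = X, or M[r, c] = M[s, c]."""
    if len(r) != len(s):
        raise ValueError("rows must have the same number of columns")
    return all(a is X or b is X or a == b for a, b in zip(r, s))


# The two rows from the example (columns A, B, ^).
print(is_combinative([X, 3, 4], [2, X, 4]))  # True
print(is_combinative([X, 2, 3], [3, X, 4]))  # False: column ^ holds 3 vs 4
```

The relation is symmetric, so checking either order gives the same answer.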

If rows r and s are combinative, we merge them according to the following rules, for each character c:
- if M[s, c] = X, set M[s, c] = M[r, c];
- if M[s, c] = M[r, c] or M[r, c] = X, keep M[s, c] unchanged.

Before merging:
State | A | B | ^
0     | X | 3 | 4
1     | 2 | X | 4

After merging (row 0 absorbs row 1):
A | B | ^
2 | 3 | 4
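A minimal sketch of the two merge rules, again using `None` for X (an assumption) and a hypothetical function name. It raises an error instead of silently overwriting when the rows are not combinative.

```python
from typing import List, Optional

X = None  # placeholder for the slides' "X" entries (assumption)
Row = List[Optional[int]]


def merge_rows(s: Row, r: Row) -> Row:
    """Return row s after absorbing row r, following the two merge rules."""
    if len(s) != len(r):
        raise ValueError("rows must have the same number of columns")
    merged: Row = []
    for sc, rc in zip(s, r):
        if sc is X:
            merged.append(rc)        # M[s, c] = X: take M[r, c]
        elif rc is X or sc == rc:
            merged.append(sc)        # keep M[s, c] unchanged
        else:
            raise ValueError("rows are not combinative")
    return merged


print(merge_rows([X, 3, 4], [2, X, 4]))  # [2, 3, 4]
```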

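Find base
For each row we extract a base value and subtract it from the row's entries, producing an offset row. In the example below the two original rows are not combinative (column ^ holds 3 and 4), but their offset rows are:

State | A | B | ^
0     | X | 2 | 3
1     | 3 | X | 4

State | A | B | ^ | Base
0     | X | 0 | 1 | 2
1     | 0 | X | 1 | 3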
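A sketch of base extraction. Taking the smallest non-X entry as the base reproduces the example above, but it is an assumption: the transcript does not state the rule, and a later slide speaks of extracting a base value per cluster, so the authors' exact choice may differ. `None` stands in for X.

```python
from typing import List, Optional, Tuple

X = None  # placeholder for the slides' "X" entries (assumption)
Row = List[Optional[int]]


def to_offset_row(row: Row) -> Tuple[int, Row]:
    """Return (base, offset_row); assumes base = smallest non-X entry."""
    values = [v for v in row if v is not X]
    base = min(values) if values else 0   # an all-X row keeps base 0
    return base, [X if v is X else v - base for v in row]


print(to_offset_row([X, 2, 3]))  # (2, [None, 0, 1])
print(to_offset_row([3, X, 4]))  # (3, [0, None, 1])
```

Since sons in a cluster receive consecutive numbers in the breadth-first order, rows pointing into similar clusters tend to differ only by a constant, which is exactly what subtracting a base removes.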

- Add a new index array, equal, and set equal[r] = s, meaning that row r is now equal to row s, so the values of row r can be read from row s.
- Delete row r from matrix M.
(Table: the offset matrix from the previous example, with columns State, A, B, ^ and Base, and the matrix after merging, with an added equal column.)
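A minimal sketch of the merge-and-delete step and of how a lookup then recovers an original entry as "shared offset row + the state's own base". Rows are kept in a dict keyed by original row number so deleting a row does not renumber the others; that storage choice, `None` for X and both function names are assumptions for illustration.

```python
from typing import Dict, List, Optional

X = None  # placeholder for the slides' "X" entries (assumption)
Row = List[Optional[int]]


def merge_and_delete(rows: Dict[int, Row], equal: Dict[int, int], s: int, r: int) -> None:
    """Merge row r into row s, redirect equal[] entries from r to s, delete row r."""
    merged: Row = []
    for sc, rc in zip(rows[s], rows[r]):
        if sc is X:
            merged.append(rc)
        elif rc is X or sc == rc:
            merged.append(sc)
        else:
            raise ValueError("rows are not combinative")
    rows[s] = merged
    for state, target in equal.items():
        if target == r:
            equal[state] = s        # equal[r] = s (and anything that pointed at r)
    del rows[r]


def lookup(rows: Dict[int, Row], equal: Dict[int, int], base: Dict[int, int],
           state: int, col: int) -> Optional[int]:
    """Recover M[state, col] as offset (from the shared row) + per-state base."""
    offset = rows[equal[state]][col]
    return None if offset is X else offset + base[state]


# Offset rows and bases from the example (columns A, B, ^).
rows = {0: [X, 0, 1], 1: [0, X, 1]}
base = {0: 2, 1: 3}
equal = {0: 0, 1: 1}             # initially every row maps to itself
merge_and_delete(rows, equal, s=0, r=1)
print(rows, equal)               # {0: [0, 0, 1]} {0: 0, 1: 0}
print(lookup(rows, equal, base, state=1, col=0))  # 3, the original M[1, A]
```

Note that after merging, a position that was X in a row's original contents returns a value borrowed from another row, so the lookup is only meaningful for transitions that actually belong to this matrix.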

The main idea of compressing matrices T1 and T2 is to convert each matrix into an offset matrix, which produces many combinative rows, and then merge those rows to reduce memory usage. T3 is sparse; its compression is not discussed in this paper.
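Putting the steps together, a minimal sketch of compressing one matrix (T1 or T2): convert every row to an offset row, then merge it first-fit into an earlier kept row whenever the two are combinative. The first-fit merge order and the min-based base are assumptions; the transcript does not say in which order the authors merge rows, and `compress_matrix` is a hypothetical name.

```python
from typing import List, Optional, Tuple

X = None  # placeholder for the slides' "X" entries (assumption)
Row = List[Optional[int]]


def _combinable(a: Row, b: Row) -> bool:
    return all(x is X or y is X or x == y for x, y in zip(a, b))


def _merge(kept: Row, new: Row) -> Row:
    # Only X positions of the kept row are filled, so rows already mapped to it stay valid.
    return [y if x is X else x for x, y in zip(kept, new)]


def compress_matrix(matrix: List[Row]) -> Tuple[List[int], List[Row], List[int]]:
    """Return (base per original row, kept offset rows, equal index per original row)."""
    base: List[int] = []
    kept: List[Row] = []
    equal: List[int] = []
    for row in matrix:
        values = [v for v in row if v is not X]
        b = min(values) if values else 0
        offset = [X if v is X else v - b for v in row]
        base.append(b)
        for i, k in enumerate(kept):
            if _combinable(k, offset):
                kept[i] = _merge(k, offset)
                equal.append(i)
                break
        else:                      # no combinative row found: keep this one
            kept.append(offset)
            equal.append(len(kept) - 1)
    return base, kept, equal


def lookup(base: List[int], kept: List[Row], equal: List[int], state: int, col: int) -> Optional[int]:
    off = kept[equal[state]][col]
    return None if off is X else off + base[state]


M = [[X, 2, 3], [3, X, 4], [5, 6, X]]
base, kept, equal = compress_matrix(M)
print(base, kept, equal)  # [2, 3, 5] [[0, 0, 1], [0, 1, None]] [0, 0, 1]
```

Three rows shrink to two kept offset rows plus two small per-row arrays (base and equal); with 256 columns per row, each deleted row saves far more than the array entries cost.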

The lookup function must first decide which matrix holds the next state, so we add two bitmaps that distinguish the three parts T1, T2 and T3.
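The transcript does not specify how the two bitmaps are indexed. One plausible reading, sketched below purely as an assumption, keeps one bit per (state, character) in each bitmap: a set bit in the first selects T1, a set bit in the second selects T2, and neither selects T3. The class and method names are hypothetical.

```python
ALPHABET = 256  # assumed: one column per byte


class TransitionBitmaps:
    """Hypothetical two-bitmap index telling which part (T1, T2 or T3) holds a transition."""

    def __init__(self, num_states: int) -> None:
        nbytes = (num_states * ALPHABET + 7) // 8
        self.in_t1 = bytearray(nbytes)
        self.in_t2 = bytearray(nbytes)

    @staticmethod
    def _index(state: int, ch: int) -> int:
        return state * ALPHABET + ch

    def mark(self, state: int, ch: int, part: str) -> None:
        i = self._index(state, ch)
        if part == "T1":
            self.in_t1[i >> 3] |= 1 << (i & 7)
        elif part == "T2":
            self.in_t2[i >> 3] |= 1 << (i & 7)
        elif part != "T3":               # T3 is the default: no bit set
            raise ValueError(f"unknown part {part!r}")

    def part_of(self, state: int, ch: int) -> str:
        i = self._index(state, ch)
        if (self.in_t1[i >> 3] >> (i & 7)) & 1:
            return "T1"
        if (self.in_t2[i >> 3] >> (i & 7)) & 1:
            return "T2"
        return "T3"


bitmaps = TransitionBitmaps(num_states=2)
bitmaps.mark(0, ord("a"), "T1")
bitmaps.mark(1, ord("b"), "T2")
print(bitmaps.part_of(0, ord("a")), bitmaps.part_of(1, ord("b")), bitmaps.part_of(0, ord("b")))
# T1 T2 T3
```

Under this reading the bitmaps cost 2 bits per transition, compared with, for example, 32 bits for an uncompressed next-state entry; a coarser granularity would be cheaper still.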

One advantage of our work is that it is orthogonal to many previous compression schemes: our work exploits the transition characteristics inside states and reduces memory usage by extracting a base value for each cluster, whereas almost all previous schemes are based on the transition characteristics among states.

EXPERIMENTAL RESULTS
(Table: pattern sets used in the experiments.)

Notation used in the results table:
- n: the number of rows in matrix T
- n1 (n2): the number of rows in matrix R1 (R2)
- R1 (R2): offset matrices
- base1 (base2): an int-type array
- equal1 (equal2): an index array
- r: the ratio of effective elements in T3

We extract base values from the DFA matrices of the regular expression groups to obtain the corresponding offset matrices, and then compress both the DFA matrices and the offset matrices with previous compression schemes. The results are shown in Table V; each value is the ratio of effective transitions remaining after compressing the DFAs.
