INFAnt: NFA Pattern Matching on GPGPU Devices Author: Niccolo’ Cascarano, Pierluigi Rolando, Fulvio Risso, Riccardo Sisto Publisher: ACM SIGCOMM 2010 Presenter:

Slides:



Advertisements
Similar presentations
School of EECS, Peking University “Advanced Compiler Techniques” (Fall 2011) SSA Guo, Yao.
Advertisements

4b Lexical analysis Finite Automata
CSC 361NFA vs. DFA1. CSC 361NFA vs. DFA2 NFAs vs. DFAs NFAs can be constructed from DFAs using transitions: Called NFA- Suppose M 1 accepts L 1, M 2 accepts.
Instructor Notes We describe motivation for talking about underlying device architecture because device architecture is often avoided in conventional.
DFA Minimization Jeremy Mange CS 6800 Summer 2009.
1 1 CDT314 FABER Formal Languages, Automata and Models of Computation Lecture 3 School of Innovation, Design and Engineering Mälardalen University 2012.
Exploiting Graphics Processors for High- performance IP Lookup in Software Routers Author: Jin Zhao, Xinya Zhang, Xin Wang, Yangdong Deng, Xiaoming Fu.
Instructor Notes This lecture discusses three important optimizations The performance impact of mapping threads to data on the GPU is subtle but extremely.
Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation Author: C˘at˘alin Radu, C˘at˘alin Leordeanu, Valentin.
Courtesy Costas Busch - RPI1 Non Deterministic Automata.
Data Parallel Algorithms Presented By: M.Mohsin Butt
Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers Author: Jing Fu, Jennifer Rexford Publisher: ACM CoNEXT 2008 Presenter:
1 Regular expression matching with input compression : a hardware design for use within network intrusion detection systems Department of Computer Science.
Programming with CUDA WS 08/09 Lecture 9 Thu, 20 Nov, 2008.
1 Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Department of Computer Science and Information Engineering National.
1 Performing packet content inspection by longest prefix matching technology Authors: Nen-Fu Huang, Yen-Ming Chu, Yen-Min Wu and Chia- Wen Ho Publisher:
INFAnt: NFA Pattern Matching on GPGPU Devices Author: Niccolo’ Cascarano, Pierluigi Rolando, Fulvio Risso, and Riccardo Sisto Published in: ACM SIGCOMM.
1 Non-Deterministic Automata Regular Expressions.
CS Chapter 2. LanguageMachineGrammar RegularFinite AutomatonRegular Expression, Regular Grammar Context-FreePushdown AutomatonContext-Free Grammar.
Fall 2004COMP 3351 Another NFA Example. Fall 2004COMP 3352 Language accepted (redundant state)
Memory-Efficient Regular Expression Search Using State Merging Department of Computer Science and Information Engineering National Cheng Kung University,
Gregex: GPU based High Speed Regular Expression Matching Engine Date:101/1/11 Publisher:2011 Fifth International Conference on Innovative Mobile and Internet.
CuMAPz: A Tool to Analyze Memory Access Patterns in CUDA
Challenges Bit-vector approach Conclusion & Future Work A subsequence of a string of symbols is derived from the original string by deleting some elements.
An Improved Algorithm to Accelerate Regular Expression Evaluation Author : Michela Becchi 、 Patrick Crowley Publisher : ANCS’07 Presenter : Wen-Tse Liang.
MIDeA :A Multi-Parallel Instrusion Detection Architecture Author: Giorgos Vasiliadis, Michalis Polychronakis,Sotiris Ioannidis Publisher: CCS’11, October.
1Computer Sciences Department. Book: INTRODUCTION TO THE THEORY OF COMPUTATION, SECOND EDITION, by: MICHAEL SIPSER Reference 3Computer Sciences Department.
REGULAR LANGUAGES.
An Improved Algorithm to Accelerate Regular Expression Evaluation Author: Michela Becchi, Patrick Crowley Publisher: 3rd ACM/IEEE Symposium on Architecture.
Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Authors: Fang Yu, Zhifeng Chen, Yanlei Diao, T. V. Lakshman, Randy H.
TFA : A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Tang Song and H. Jonathan Chao Publisher: Technical.
An Efficient Regular Expressions Compression Algorithm From A New Perspective  Author: Tingwen Liu, Yifu Yang, Yanbing Liu, Yong Sun, Li Guo  Publisher:
GPU Architecture and Programming
GPEP : Graphics Processing Enhanced Pattern- Matching for High-Performance Deep Packet Inspection Author: Lucas John Vespa, Ning Weng Publisher: 2011 IEEE.
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
1 CD5560 FABER Formal Languages, Automata and Models of Computation Lecture 3 Mälardalen University 2010.
2. Regular Expressions and Automata 2007 년 3 월 31 일 인공지능 연구실 이경택 Text: Speech and Language Processing Page.33 ~ 56.
IP Routing Processing with Graphic Processors Author: Shuai Mu, Xinya Zhang, Nairen Zhang, Jiaxin Lu, Yangdong Steve Deng, Shu Zhang Publisher: IEEE Conference.
Limits of Instruction-Level Parallelism Presentation by: Robert Duckles CSE 520 Paper being presented: Limits of Instruction-Level Parallelism David W.
Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection Sailesh Kumar Sarang Dharmapurikar Fang Yu Patrick Crowley Jonathan.
CMSC 330: Organization of Programming Languages Finite Automata NFAs  DFAs.
Precomputation- based Prefetching By James Schatz and Bashar Gharaibeh.
Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions Publisher : Conference on emerging Networking EXperiments and Technologies.
A Scalable Architecture For High-Throughput Regular-Expression Pattern Matching Yao Song 11/05/2015.
Author : Randy Smith & Cristian Estan & Somesh Jha Publisher : IEEE Symposium on Security & privacy,2008 Presenter : Wen-Tse Liang Date : 2010/10/27.
Von Neumann Computers Article Authors: Rudolf Eigenman & David Lilja
Donghyun (David) Kim Department of Mathematics and Physics North Carolina Central University 1 Chapter 1 Regular Languages Some slides are in courtesy.
TFA: A Tunable Finite Automaton for Regular Expression Matching Author: Yang Xu, Junchen Jiang, Rihua Wei, Yang Song and H. Jonathan Chao Publisher: ACM/IEEE.
Lecture Notes 
Fast and Memory-Efficient Regular Expression Matching for Deep Packet Inspection Publisher : ANCS’ 06 Author : Fang Yu, Zhifeng Chen, Yanlei Diao, T.V.
An Improved DFA for Fast Regular Expression Matching Author : Domenico Ficara 、 Stefano Giordano 、 Gregorio Procissi Fabio Vitucci 、 Gianni Antichi 、 Andrea.
Author : S. Kumar, B. Chandrasekaran, J. Turner, and G. Varghese Publisher : ANCS ‘07 Presenter : Jo-Ning Yu Date : 2011/04/20.
Onlinedeeneislam.blogspot.com1 Design and Analysis of Algorithms Slide # 1 Download From
My Coordinates Office EM G.27 contact time:
CS 404Ahmed Ezzat 1 CS 404 Introduction to Compiler Design Lecture 1 Ahmed Ezzat.
Advanced Algorithms for Fast and Scalable Deep Packet Inspection Author : Sailesh Kumar 、 Jonathan Turner 、 John Williams Publisher : ANCS’06 Presenter.
LECTURE 5 Scanning. SYNTAX ANALYSIS We know from our previous lectures that the process of verifying the syntax of the program is performed in two stages:
Theory of Computation Automata Theory Dr. Ayman Srour.
Deflating the Big Bang: Fast and Scalable Deep Packet Inspection with Extended Finite Automata Date:101/3/21 Publisher:SIGCOMM 08 Author:Randy Smith Cristian.
1 Chapter 2 Finite Automata (part a) Hokkaido, Japan.
Efficient Signature Matching with Multiple Alphabet Compression Tables Publisher : SecureComm, 2008 Author : Shijin Kong,Randy Smith,and Cristian Estan.
Theory of Computation Automata Theory Dr. Ayman Srour.
EECE571R -- Harnessing Massively Parallel Processors ece
A DFA with Extended Character-Set for Fast Deep Packet Inspection
Non Deterministic Automata
Chapter 2 FINITE AUTOMATA.
Advanced Algorithms for Fast and Scalable Deep Packet Inspection
Non Deterministic Automata
Finite Automata.
Presentation transcript:

iNFAnt: NFA Pattern Matching on GPGPU Devices Author: Niccolo’ Cascarano, Pierluigi Rolando, Fulvio Risso, Riccardo Sisto Publisher: ACM SIGCOMM 2010 Presenter: Ye-Zhi Chen Date: 2012/03/21

Introduction Pattern matching is commonly performed by expressing patterns as sets of regular expressions and converting them into nite state automata (FSAs) Two kinds of FSAs are known from the literature, deterministic (DFA) and non-deterministic (NFA) NFA :NFA traversal requires, by definition, non-deterministic choices that are hard to emulate on actual DFA :DFAs, while fast to execute, can be less space-efficient

Introduction software-based NFA implementations suer from a higher per-byte traversal cost when compared with DFAs: intuitively, this is because multiple NFA states can be active at any given step, while only a single one must be considered when processing DFAs. This paper presents iNFAnt, a NFA-based regular expression engine running on graphical processing units (GPUs).

CUDA Architecture CUDA devices are logically composed of arrays of single instruction, multiple threads (SIMT) processors In the SIMT paradigm each multiprocessor executes the same instruction simultaneously for multiple threads by assigning a different execution context to each core threads are said to diverge and the execution of groups of threads that go along different code paths is sequentialized by the scheduler

CUDA Architecture In order to use efficiently all the available bandwidth it is necessary to perform as few accesses as possible, as wide as the memory bus allows Since each thread typically accesses small amounts of memory at a time, a hardware controller tries to automatically coalesce many smaller memory accesses in fewer, larger transactions at run-time. CUDA devices all the addresses must fall within the same naturally- aligned 256-byte memory area.

iNFAnt Design classic automata traversal algorithms: input symbols must be processed sequentially and their randomness can lead to unpredictable branching and irregular access patterns Design guidelines: 1. Memory bandwidth is abundant 2. Memory space is scarce. 3. Threads are cheap. 4. Thread divergence is expensive.

NFA representation iNFAnt adopts an internal format for the FSA transition graph that we dubbed symbol-first representation: the system keeps a list of (source; destination) tuples, representing transitions

NFA representation In order to reduce the number of transitions to be stored, and also to speed up execution, iNFAnt adopts a special representation for self- looping states, i.e. those with an outgoing transition to themselves for each symbol of the alphabet These states are marked as persistent in a dedicated bit-vector and, once reached during a traversal and marked as active, they will never be reset. Bit-vectors containing current and future active state sets are stored in shared-memory.

Traversal algorithm Many packets are batched together and mapped 1:1 to CUDA blocks to be processed in parallel for each packet each thread examines a different transition among those pending for the current symbol when running the inner while loop (lines 6-12).

Traversal algorithm

NFA multistriding Multistriding is a transformation that repeatedly squares the input alphabet of a state machine and adjusts its transition graph accordingly After multistriding it is possible for the traversal algorithm to consider pairs of symbols at once, thus reducing global execution time. Squaring has multiple effects, most prominently producing an increase in both transition count and alphabet size. iNFAnt performs an alphabet compression step that removes any symbol that does not appear on transition labels and renames all the remaining ones based on equivalence classes.

EXPERIMENTAL EVALUATION The 'NFA++' data series reports results obtained by enabling the self-looping states optimization

EXPERIMENTAL EVALUATION