Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation. Author: Cătălin Radu, Cătălin Leordeanu, Valentin Cristea.

Presentation transcript:

Using Cell Processors for Intrusion Detection through Regular Expression Matching with Speculation
Author: Cătălin Radu, Cătălin Leordeanu, Valentin Cristea
Publisher: 2011 International Conference on Complex, Intelligent, and Software Intensive Systems
Presenter: Ye-Zhi Chen
Date: 2011/9/7

Introduction
Main purpose: determine whether incoming network traffic matches known attack signatures.
Bottleneck: existing signature matching algorithms can scan only one byte at a time.
Intrusion Detection System (IDS): an effective way to provide a degree of security to networked computers, based on string matching. An Internet worm in an incoming network packet is usually identified by a string in the packet payload that represents the executable program's name.

Introduction
Hardware-based solutions: FPGAs implement specific string matching algorithms, making use of the high parallelism available. Examples: Bloom filters, DFAs.
This work: run an adapted Speculative Parallel Pattern Matching (SPPM) algorithm on the IBM Cell Broadband Engine (Cell BE).

Intrusion Detection System
Three methodologies: signature-based, anomaly-based, and stateful protocol analysis.
A. DFA matching:
1. Most signature databases contain several regular expressions, which can be combined into a single large DFA.
2. The DFAs for distinct signatures are combined into a single DFA that simultaneously represents all the signatures.
3. A DFA is a quintuple (Σ, S, s_0, δ, F): Σ is the input alphabet; S is a finite set of states; s_0 ∈ S is the initial state; δ: S × Σ → S is the transition function; F ⊆ S is the set of final (accepting) states. Reaching an accepting state means an attack signature has been found.
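The quintuple above maps directly onto a small data structure. The following is a minimal sketch, not the authors' implementation: the `DFA` class, the `first_match` helper, and the toy signature "ab" over the alphabet {a, b} are all assumptions made for illustration. The example DFA is unanchored (it accepts any input that contains "ab"), and its accepting state is absorbing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DFA:
    """A DFA as the quintuple (alphabet, states, start, delta, accepting)."""
    alphabet: frozenset
    states: frozenset
    start: int
    delta: dict          # delta[state][symbol] -> next state
    accepting: frozenset


def first_match(dfa, data):
    """Scan data one symbol at a time; return the index at which an
    accepting state is first reached, or -1 if none is reached."""
    state = dfa.start
    for pos, symbol in enumerate(data):
        state = dfa.delta[state][symbol]
        if state in dfa.accepting:
            return pos
    return -1


# Hypothetical signature: any input containing "ab".
# State 0: nothing useful seen, state 1: just saw 'a', state 2: saw "ab".
AB_DFA = DFA(
    alphabet=frozenset("ab"),
    states=frozenset({0, 1, 2}),
    start=0,
    delta={
        0: {"a": 1, "b": 0},
        1: {"a": 1, "b": 2},
        2: {"a": 2, "b": 2},
    },
    accepting=frozenset({2}),
)
```

Each loop iteration performs one transition-table lookup per input symbol, which is the one-byte-at-a-time bottleneck named in the introduction.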

Intrusion Detection System
[Figure: example DFA with states S1 and S2 over the input symbols 0 and 1, shown with its state transition table]

Intrusion Detection System
In the algorithm, the memory access that reads the transition-table entry for a given current state and input character takes several processor cycles.
In the worst case, when the entire input string is scanned, the serial algorithm needs at least M * |I| cycles, where |I| is the length of the input string and M is the number of processor cycles needed to process one input character.
Multi-byte matching methods: in the ideal case, consuming B bytes of the input string at a time reduces this to M * |I| / B cycles.
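The two cost estimates can be checked with a few lines of arithmetic. This is only a sketch of the cost model stated on the slide; the function names and the numbers used in the usage note (M = 10 cycles, a 1500-byte input, B = 4) are hypothetical and not taken from the paper.

```python
def serial_cycles(input_len, cycles_per_lookup):
    """Worst-case cost of the serial scan: M * |I|."""
    return cycles_per_lookup * input_len


def multibyte_cycles(input_len, cycles_per_lookup, bytes_per_step):
    """Ideal cost when B bytes are consumed per lookup: M * |I| / B.
    The step count is rounded up when |I| is not a multiple of B."""
    steps = -(-input_len // bytes_per_step)  # ceiling division
    return cycles_per_lookup * steps
```

With the hypothetical values above, `serial_cycles(1500, 10)` gives 15000 cycles and `multibyte_cycles(1500, 10, 4)` gives 3750 cycles, a factor of B fewer.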

Intrusion Detection System
B. Regular Expression Matching with Speculation
1. The main idea behind SPPM is to divide the input string into several chunks of the same size and process them in parallel.
2. Initialization stage: the input string is split into two chunks, and the state variables for the Primary and Secondary threads are initialized.
3. Parallel processing stage: both threads scan their private chunks in lockstep. If either of them finds a match, the algorithm terminates.
4. Validation stage: the Primary continues scanning into the Secondary's chunk.

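Intrusion Detection System
Three possible outcomes arise in the validation stage:
1. A match is found and the algorithm returns success.
2. Coupling occurs before the end of the second chunk: the Primary reaches the same state the Secondary recorded at the same input position, so the rest of the Secondary's scan is already valid.
3. The entire second chunk is traversed again and no match is found.
[Figure: three cases: match found at the parallel processing stage, match found at the validation stage, and no match found]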
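The stages and outcomes above can be sketched for two chunks as follows. This is an illustrative sketch under stated assumptions, not the authors' code: the two threads are simulated sequentially in one loop, the Secondary speculatively starts from the DFA's start state, and the DFA is assumed to be unanchored (it accepts any input containing the signature), so a match found by the Secondary is a real match. The names `sppm_two_chunks`, `serial_found`, and the toy "ab" DFA are hypothetical.

```python
# Toy unanchored DFA for the hypothetical signature "ab" over {a, b}.
DELTA = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 2, "b": 2}}
START = 0
ACCEPTING = {2}


def serial_found(data):
    """Reference result: does the plain serial scan reach an accepting state?"""
    state = START
    for symbol in data:
        state = DELTA[state][symbol]
        if state in ACCEPTING:
            return True
    return False


def sppm_two_chunks(data):
    """Return (outcome, position) for a two-chunk speculative scan."""
    mid = len(data) // 2
    primary = START      # true state, scanning chunk 1
    secondary = START    # speculative state, scanning chunk 2
    history = []         # Secondary's state after each symbol of chunk 2

    # Parallel processing stage (lockstep, simulated sequentially).
    for k in range(max(mid, len(data) - mid)):
        if k < mid:
            primary = DELTA[primary][data[k]]
            if primary in ACCEPTING:
                return ("parallel", k)
        if mid + k < len(data):
            secondary = DELTA[secondary][data[mid + k]]
            history.append(secondary)
            if secondary in ACCEPTING:
                return ("parallel", mid + k)

    # Validation stage: the Primary rescans chunk 2 from its true state.
    for k in range(len(data) - mid):
        primary = DELTA[primary][data[mid + k]]
        if primary in ACCEPTING:
            return ("validation", mid + k)
        if primary == history[k]:
            # Coupling: from here on both scans are identical, and the
            # Secondary already finished chunk 2 without a match.
            return ("coupled", mid + k)
    return ("not found", -1)
```

For example, `sppm_two_chunks("bbabbb")` finds the match that straddles the chunk boundary only in the validation stage, while `sppm_two_chunks("bbbbbb")` couples on the first symbol of the second chunk and stops early.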

Intrusion Detection System
This paper adapts the SPPM algorithm to make use of parallel hardware, using all the processing units available.
The most favorable case: a match is found in the parallel stage, and the speedup factor is K, where K is the total number of processing units.
If no match is found in the parallel processing stage, a speedup is still possible in the validation stage if coupling occurs between two neighboring units.
The least favorable case: no match is found and the entire input buffer is scanned, so the complexity of the SPPM algorithm is the same as that of the serial algorithm.
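One plausible way to generalize the two-chunk scheme to K units is sketched below. The slides do not spell out the exact adaptation, so the chaining of the validation stage across neighboring chunks is an assumption, as are the names `sppm_k_chunks` and `serial_found` and the toy unanchored "ab" DFA. The K units are simulated sequentially.

```python
# Toy unanchored DFA for the hypothetical signature "ab" over {a, b}.
DELTA = {0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 2, "b": 2}}
START = 0
ACCEPTING = {2}


def serial_found(data):
    """Reference result from a plain serial scan."""
    state = START
    for symbol in data:
        state = DELTA[state][symbol]
        if state in ACCEPTING:
            return True
    return False


def sppm_k_chunks(data, k):
    """Speculative scan with k chunks; returns True if a match is found."""
    n = len(data)
    bounds = [n * i // k for i in range(k + 1)]

    # Parallel stage: every unit scans its own chunk from the start state
    # and records the state reached after each symbol.
    histories = []
    for i in range(k):
        state, history = START, []
        for pos in range(bounds[i], bounds[i + 1]):
            state = DELTA[state][data[pos]]
            history.append(state)
            if state in ACCEPTING:
                return True
        histories.append(history)

    # Validation stage: carry the true state across chunk boundaries.
    state = histories[0][-1] if histories[0] else START
    for i in range(1, k):
        for offset, pos in enumerate(range(bounds[i], bounds[i + 1])):
            state = DELTA[state][data[pos]]
            if state in ACCEPTING:
                return True
            if state == histories[i][offset]:
                # Coupling: the speculative scan of this chunk was valid
                # from here on, so jump to its recorded final state.
                state = histories[i][-1]
                break
    return False
```

When every chunk couples early, the validation stage touches only a few symbols per chunk; when no chunk couples and no match exists, it rescans nearly the whole input, which corresponds to the least favorable case.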


Cell Intrusion Detection
The Cell processor can be split into four components:
1. External input and output structures
2. Power Processing Element (PPE): the main processor
3. Synergistic Processing Elements (SPEs): eight coprocessors
4. Element Interconnect Bus (EIB): a specialized high-bandwidth circular data bus connecting the PPE, the input/output elements, and the SPEs

Cell Intrusion Detection
PPE:
A 64-bit microprocessor based on the PowerPC architecture.
It runs at a clock speed of 3.2 GHz.
It runs the operating system and coordinates the SPEs.
It has a 32 KB L1 cache and a 512 KB L2 cache.

Cell Intrusion Detection
SPE:
Each SPE contains a Synergistic Processing Unit (SPU), a memory flow controller, a memory management unit, a bus interface, and an atomic unit.
The SPU is a RISC processor.
Each SPE has 128 registers of 128 bits each.
It supports Single Instruction Multiple Data (SIMD) instructions.
It is suitable for efficient loop unrolling and instruction scheduling.
Each SPE has 256 KB of local store (LS) memory, which the SPU can access directly.
SPEs cannot access main memory directly, so data moves between main memory and the local store through DMA transfers.

Cell Intrusion Detection
Three different programs perform DFA matching:
1. Single-threaded DFA matching
2. Speculative parallel pattern matching on 2 SPEs
3. Speculative parallel pattern matching on 8 SPEs


Cell Intrusion Detection
Implementation:
Step 1: Scan and parse the input file, then build the DFA.
Step 2: An input string divider splits the input string into several chunks of a specified length.
Step 3: These chunks are then matched against the DFA.

Cell Intrusion Detection
In the input file, an accepting state is marked by the string a() after the state number.

Cell Intrusion Detection
The parser uses three buffers to scan and parse the input file:
The first buffer stores an entire line read from the file.
The second buffer holds the state transition part of that line.
The third buffer holds each element of this state transition array in turn; its value is stored in the corresponding position of the DFA data structure.

Cell Intrusion Detection
DFA data structure, four main fields:
1. States
2. Final: an array of STATES_NO rows and SYMBOLS_NO_MIN columns
3. Start: the starting state of the DFA
4. STATES_NO: the total number of states
Additional field dummy: because the DFA is larger than the maximum size of a single DMA transfer (16 KB), this field holds the remaining bytes needed to make the total size of the structure a multiple of 16 KB.
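The size of the dummy field follows from simple modular arithmetic. The sketch below assumes only the 16 KB maximum DMA transfer size stated above; the helper names and the example structure size of 40000 bytes are hypothetical, since the slides do not give the real field sizes.

```python
DMA_MAX_BYTES = 16 * 1024  # maximum size of one DMA transfer (16 KB)


def dummy_bytes(struct_bytes):
    """Padding needed to round the structure up to a multiple of 16 KB."""
    return (-struct_bytes) % DMA_MAX_BYTES


def dma_transfers(struct_bytes):
    """Number of full-size DMA transfers needed for the padded structure."""
    return (struct_bytes + dummy_bytes(struct_bytes)) // DMA_MAX_BYTES
```

For a hypothetical 40000-byte structure, `dummy_bytes(40000)` is 9152, which pads the structure to 49152 bytes, exactly three 16 KB transfers. Padding this way lets every transfer use the same fixed size.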

Cell Intrusion Detection
DFA matching for 2 Cell SPUs:
1. The PPU waits for strings to process and divides each into two chunks.
2. The PPU passes the two chunks to the two SPUs (called Primary and Secondary).
3. The SPUs run the DFA matching algorithm and return the results to the PPU.
4. Based on the result, the PPU decides whether the Primary SPU should begin the validation stage.
Parallel approach for 8 processing units:
1. Divide the eight SPUs into four pairs, each of which runs the two-threaded speculative algorithm.
2. Proceed for each pair as described above.

Cell Intrusion Detection
A DFA with more than 1500 states does not fit into the local store of the SPUs.
Solution for large DFAs:
1. Create several input files containing smaller DFAs (550 states each is sufficient).
2. Combining these smaller DFAs yields the large DFA.
3. Use the double-buffering technique: issue the DMA transfer for the next block and keep processing the current one instead of waiting for the transfer to complete.
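The double-buffering idea can be sketched in ordinary Python, with a background thread standing in for an asynchronous DMA transfer. This is an illustration only, not Cell SPU code: `fetch_dfa` is a hypothetical stand-in for the transfer of one small DFA into the local store, and the sketch assumes that matching the input against each small DFA in turn is equivalent to matching against the combined large DFA.

```python
from concurrent.futures import ThreadPoolExecutor


def dfa_matches(dfa, data):
    """Serial scan of data against one small DFA (delta, start, accepting)."""
    delta, start, accepting = dfa
    state = start
    for symbol in data:
        state = delta[state][symbol]
        if state in accepting:
            return True
    return False


def match_streamed_dfas(fetch_dfa, dfa_count, data):
    """Match data against dfa_count small DFAs, prefetching the next DFA
    while the current one is being used."""
    if dfa_count == 0:
        return False
    with ThreadPoolExecutor(max_workers=1) as pool:
        pending = pool.submit(fetch_dfa, 0)           # first transfer
        for index in range(dfa_count):
            current = pending.result()                # wait for DFA index
            if index + 1 < dfa_count:
                # Start the next transfer before processing the current DFA.
                pending = pool.submit(fetch_dfa, index + 1)
            if dfa_matches(current, data):
                return True
    return False


# Two hypothetical small DFAs over {a, b}: inputs containing "ab" or "ba".
AB = ({0: {"a": 1, "b": 0}, 1: {"a": 1, "b": 2}, 2: {"a": 2, "b": 2}}, 0, {2})
BA = ({0: {"a": 0, "b": 1}, 1: {"a": 2, "b": 1}, 2: {"a": 2, "b": 2}}, 0, {2})
SMALL_DFAS = [AB, BA]
```

A call such as `match_streamed_dfas(SMALL_DFAS.__getitem__, 2, "bba")` scans with the first DFA while the second is being fetched, so transfer time overlaps with computation instead of adding to it.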

Result
[Figures: result slides; no text captured in the transcript]