A Real-Time Packet Scan Architecture. Tim Sherwood, UC Santa Barbara.

Presentation transcript:


Big Questions
- Can my system be optimized further? If so, then how and when? How much benefit can I expect? Have I seen this behavior before?
- Is my system working correctly? Soft errors, backdoors, hardware bugs
- Am I under attack? If so, then by whom? Am I witness to an attack?
Online Monitors

To Protect and Serve
- Our machines are constantly under attack.
- We cannot rely on end users; we need networks that actively defend themselves.
- This requires the protection system to operate at 10 to 40 Gb/s. (We aim at current and next generation networks.)
- IDS/IPS are promising ways of providing protection.
- Market for such systems: $918.9 million by the end of
- Snort: a widely accepted open source IDS

The Problem
Our computing infrastructure is fast
- Processors: ~10^9 instructions/second
- Network routers: ~10^9 bytes/second
Beyond our ability to monitor naively
- Full traces are near impossible to gather
- Sampling may miss important data
- Intrusive monitoring will change data
New Architectures are Required

Why a new Computer Architecture
- Latency
- Common Case
- Design for worst case stream
  - Network vendors chip by wire rate
  - Denial of service and reliability
  - Caches are no help
- Throughput is critical
  - 40 Gigabit link = packet out every 8 ns
  - Each packet needs multiple memory references
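The 8 ns figure can be reproduced with a short calculation. This is a sketch under one assumption that the slide does not state: the worst-case stream consists of back-to-back minimum-size packets of 40 bytes. The function name `packet_interval_ns` is made up for this illustration.

```python
# Time budget per packet on a saturated link.
# Assumption (not stated on the slide): the worst case is a stream of
# minimum-size 40-byte packets arriving back to back.

def packet_interval_ns(link_gbps: float, packet_bytes: int) -> float:
    """Nanoseconds between consecutive packets of the given size."""
    bits = packet_bytes * 8
    return bits / link_gbps  # 1 Gb/s is exactly 1 bit per nanosecond


if __name__ == "__main__":
    for gbps in (10, 40):
        interval = packet_interval_ns(gbps, 40)
        print(f"{gbps} Gb/s, 40-byte packets: {interval:.1f} ns per packet")
```

If every packet needs several memory references, each reference has to fit in a fraction of that interval, which is why throughput rather than average latency drives the design.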

Packet Scan Architecture
- High Performance Packet Scan Architecture
  - Underlying primitives to support high-throughput monitors
  - Algorithm-architecture co-design
- Example primitive: String Matching
  - 0.4MB and 10Gbps for Snort rule set (>10,000 characters)
- Bit-Split String Matching Algorithm
  - Reduces out edges from 256 to 2
  - Formal language: correctness and efficiency
- Memory Tile Based Design
  - Memory throughput is the key
  - Data is distributed over tiles with bounded contention
  - Performance/area beats the best techniques we examined by a factor of 10 or more

Packet Scan Architecture (outline)
- String Matching: examine packet content
- Bit-Split String Matching Algorithm
- A Memory Tile Based Architecture
- Building a Real System
- Is it really correct?
- Future Work

Scanning for Intrusions
- Most IDS define a set of rules. A string defines a suspicious transmission.
- We are not building a full IDS; rather, we are building the primitives from which full systems can be built.
- Example, CodeRed worm: web flow established, uricontent with /root.exe
[Figure: Traffic In, Scan, Traffic Out; Software IDS]

Multiple String Matching
The multiple string matching problem:
- Input: a set of strings/patterns S, and a buffer b
- Output: every occurrence of an element of S in b
- Extra constraint: b is really a stream
How to implement:
- Option 1) search for each string independently
- Option 2) combine strings together and search all at once
A string can be anywhere in the payload of a packet.
[Figure: example input and set of strings]
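Option 1 is easy to sketch, and doing so shows why it scales poorly: the work per chunk grows with the number of patterns. The following is an illustrative baseline only, not code from the presentation. The function name `scan_stream_naive` and the pattern set (borrowed from the later Aho-Corasick example) are assumptions. The stream constraint is handled by carrying the last few bytes across chunk boundaries.

```python
from typing import Iterable, Iterator, List, Tuple


def scan_stream_naive(patterns: List[bytes],
                      chunks: Iterable[bytes]) -> Iterator[Tuple[int, bytes]]:
    """Yield (end_offset, pattern) for every occurrence, searching one pattern at a time."""
    keep = max(len(p) for p in patterns) - 1  # bytes carried across chunk boundaries
    tail = b""
    consumed = 0                              # stream offset of the first byte of `tail`
    for chunk in chunks:
        window = tail + chunk
        for p in patterns:                    # one full pass over the window per pattern
            start = window.find(p)
            while start != -1:
                end = start + len(p)
                # Report a match only when its last byte is in the new chunk,
                # so matches inside the carried tail are not reported twice.
                if end > len(tail):
                    yield (consumed + end, p)
                start = window.find(p, start + 1)
        k = min(keep, len(window))
        consumed += len(window) - k
        tail = window[len(window) - k:]


if __name__ == "__main__":
    pats = [b"he", b"she", b"his", b"hers"]
    for end, pat in scan_stream_naive(pats, [b"us", b"he", b"rs"]):
        print(end, pat)
```

Option 2 replaces the inner loop over patterns with a single automaton step per input byte, which is what the following slides build up to.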

Why hardware
- Snort: >1,000 rules, growing at 1 rule/day or more
- Active research into automated rule building
- Strings are not limited to just [a-z]+
- We need a high speed string matching technique with stringent worst case performance. Many algorithms are targeted at average case performance.
- Aho-Corasick can scan once and output all matches, but it is too big to be on-chip.

The Aho-Corasick Algorithm
Given a finite set P of patterns, build a deterministic finite automaton G accepting the set of all patterns in P.

The Aho-Corasick Algorithm
An Aho-Corasick string matching automaton for a given finite set P of patterns is a (deterministic) finite automaton G accepting the set of all words containing a word of P as a suffix. G consists of the following components:
- a finite set Q of states
- a finite alphabet A
- a transition function g: Q × A → Q + {fail}
- a failure function h: Q → Q + {fail}
- an initial state q_0 in Q
- a set F of final states

On String Matching and Languages
This should not be any big surprise:
- P is a finite language (FL)
- Every FL is a regular language (RL)
- An RL can be recognized by a regular expression (RE)
- An RE can be simulated with an NFA
- An NFA can be simulated with a DFA
This last step is the problem. Aho and Corasick show that for an FL there is no exponential blow-up in state.

An AC Automaton Example
Example: P = {he, she, his, hers}
[Figure: Aho-Corasick automaton for P with states 0 to 9, showing the initial state, accepting states, and the state transition function. Edges pointing back to State 0 are not shown.]
- The construction: linear time.
- The search for all patterns in P: linear time.

Matching on the example
[Figure: the automaton for P = {he, she, his, hers} stepping through the input stream hxhers]
Only scan the input stream once.
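The definition and the example can be made concrete in a few lines. This is a minimal textbook-style Aho-Corasick with explicit goto and failure functions, not the presenters' code; the names `build_ac` and `search` are made up for this illustration.

```python
from collections import deque


def build_ac(patterns):
    """Build goto, failure, and output tables for a set of patterns."""
    goto = [{}]    # goto[state][char] -> next state
    out = [set()]  # out[state] -> patterns recognized on entering state
    for p in patterns:
        s = 0
        for ch in p:
            if ch not in goto[s]:
                goto.append({})
                out.append(set())
                goto[s][ch] = len(goto) - 1
            s = goto[s][ch]
        out[s].add(p)
    fail = [0] * len(goto)
    queue = deque(goto[0].values())  # depth-1 states fail to state 0
    while queue:                     # breadth-first over the trie
        s = queue.popleft()
        for ch, t in goto[s].items():
            queue.append(t)
            f = fail[s]
            while f and ch not in goto[f]:
                f = fail[f]
            fail[t] = goto[f].get(ch, 0)
            out[t] |= out[fail[t]]   # inherit matches that end at a proper suffix
    return goto, fail, out


def search(text, goto, fail, out):
    """Scan text once and return (end_offset, pattern) for every match."""
    s, hits = 0, []
    for i, ch in enumerate(text):
        while s and ch not in goto[s]:
            s = fail[s]
        s = goto[s].get(ch, 0)
        for p in out[s]:
            hits.append((i + 1, p))
    return hits


if __name__ == "__main__":
    tables = build_ac(["he", "she", "his", "hers"])
    print(len(tables[0]), "states")
    print(search("hxhers", *tables))
```

For the four example patterns this builds ten states, numbered 0 to 9 in insertion order, which agrees with the state numbering used in the slides.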

Linear Time: So what's the problem?
[Figure: table of next state pointers]
How to implement it on chip?
Problem: size too big to be on-chip
- ~10,000 nodes
- 256 out edges per node
- Requires 16,384 * 256 * 14 bits, on the order of 10MB
Solution: partition into small state machines
- Fewer strings per machine
- Fewer out edges per machine
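The size estimate can be checked directly. This sketch assumes a dense table with one 14-bit next-state pointer per (state, input byte) pair. The second figure additionally assumes eight binary machines that each have as many states as the original and ignores match vectors. Both assumptions, and the helper name `table_bytes`, are illustrative rather than taken from the slides.

```python
def table_bytes(states: int, out_edges: int, pointer_bits: int) -> float:
    """Size in bytes of a dense next-state table."""
    return states * out_edges * pointer_bits / 8


if __name__ == "__main__":
    # 16,384 states, 256 out edges each, 14-bit pointers (2**14 = 16,384).
    full = table_bytes(16_384, 256, 14)
    # Hypothetical comparison: eight machines with 2 out edges per state.
    split = 8 * table_bytes(16_384, 2, 14)
    print(f"256 out edges: {full / 1e6:.2f} MB")
    print(f"8 x 2 out edges: {split / 1e6:.2f} MB")
    print(f"ratio: {full / split:.0f}x")
```

The raw product comes to roughly 7.3 MB, the same order of magnitude as the figure quoted on the slide; either way it is far too large for on-chip memory, while cutting the out edges per state shrinks the pointer storage substantially.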

Packet Scan Architecture (outline)
- String Matching
- Bit-Split String Matching Algorithm: many tiny FSMs working together
- A Memory Tile Based Architecture
- Building a Real System
- Is it really correct?
- Future Work

An example
P_0 = { he, she, his, hers }

An example
P_0 = { he, she, his, hers }
Check for agreement

An example of Bit-Split
P_0 = { he, she, his, hers }
[Figure: the Aho-Corasick automaton for P_0 beside the binary machine B_03. Edges pointing back to State 0 are not shown.]
States of B_03 and the sets of P_0 states they stand for:
b0 {0}, b1 {0,1}, b2 {0,3}, b3 {0,1,2,6}, b4 {0,1,4}, b5 {0,3,7,8}, b6 {0,1,2,5,6}, b7 {0,3,9}

Compact State Set
P_0 = { he, she, his, hers }
[Figure: the Aho-Corasick automaton for P_0 beside B_03. Edges pointing back to State 0 are not shown.]
Compact state sets of B_03, keeping only the accepting states of P_0:
b0 {}, b1 {}, b2 {}, b3 {2}, b4 {}, b5 {7}, b6 {2,5}, b7 {9}

An example of Bit-Split
P_0 = { he, she, his, hers }
[Figure: the Aho-Corasick automaton for P_0 beside the binary machines B_03 and B_04. Edges pointing back to State 0 are not shown.]
B_03: b0 {}, b1 {}, b2 {}, b3 {2}, b4 {}, b5 {7}, b6 {2,5}, b7 {9}
B_04: b0 {}, b1 {}, b2 {}, b3 {}, b4 {2}, b5 {}, b6 {2,5}, b7 {}, b8 {2,7}, b9 {9}

Nice Properties
- The number of states in B_ij is rigorously bounded by the number of states in P_i
- No exponential blow-up in state
- Linear construction time
- Possible to traverse multiple edges at a time to multiply throughput

Matching on the example
[Figure: P_0, B_03, and B_04 stepping through the input hxhe, ending in accepting state 2]
How do you combine the results from the different state machines?
Only if all the state machines agree is there actually a match.

Packet Scan Architecture (outline)
- String Matching
- Bit-Split String Matching Algorithm
- A Memory Tile Based Architecture: SRAM tiles implement FSM
- Building a Real System
- Is it really correct?
- Future Work

Our Main Idea: Bit-Split
- Partition rules (P) into smaller sets (P_0 to P_n)
- Build an AC state machine for each subset
- For each DFA P_i, rip the state machine apart into 8 tiny state machines (B_i0 through B_i7)
- Each of them searches for 1 bit in the 8-bit encoding of an input character
- Only if all the different B machines agree can there actually be a match
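The steps listed here can be sketched for a single rule subset as follows. This is an illustrative reconstruction using a plain subset construction over the Aho-Corasick DFA, not the presenters' implementation; the function names `build_dfa`, `bit_split`, and `scan` are assumptions, and bits are numbered from the least significant end (bit 0 is the low bit of each byte).

```python
from collections import deque


def build_dfa(patterns):
    """Aho-Corasick as a full DFA: delta[state][byte] -> state, out[state] -> bitmask."""
    trie, out = [{}], [0]
    for j, p in enumerate(patterns):
        s = 0
        for c in p:
            if c not in trie[s]:
                trie.append({})
                out.append(0)
                trie[s][c] = len(trie) - 1
            s = trie[s][c]
        out[s] |= 1 << j                   # bit j set: pattern j ends in this state
    delta = [[0] * 256 for _ in trie]
    fail = [0] * len(trie)
    for c, t in trie[0].items():
        delta[0][c] = t
    queue = deque(trie[0].values())
    while queue:                           # breadth-first: fail[s] is finished before s
        s = queue.popleft()
        out[s] |= out[fail[s]]
        for c in range(256):
            if c in trie[s]:
                t = trie[s][c]
                fail[t] = delta[fail[s]][c]
                delta[s][c] = t
                queue.append(t)
            else:
                delta[s][c] = delta[fail[s]][c]
    return delta, out


def bit_split(delta, out, bit):
    """Binary machine that sees only `bit` of each byte (subset construction)."""
    chars = ([c for c in range(256) if not (c >> bit) & 1],
             [c for c in range(256) if (c >> bit) & 1])
    sets = [frozenset([0])]
    index = {sets[0]: 0}
    nxt, pmv = [], []                      # per state: [next on 0, next on 1], partial match vector
    i = 0
    while i < len(sets):
        current = sets[i]
        vec = 0
        for s in current:
            vec |= out[s]
        pmv.append(vec)
        row = []
        for v in (0, 1):
            target = frozenset(delta[s][c] for s in current for c in chars[v])
            if target not in index:
                index[target] = len(sets)
                sets.append(target)
            row.append(index[target])
        nxt.append(row)
        i += 1
    return nxt, pmv


def scan(machines, data):
    """Run the 8 binary machines side by side; AND their partial match vectors."""
    states = [0] * 8
    hits = []
    for pos, byte in enumerate(data):
        full = -1                          # all ones before the AND
        for b, (nxt, pmv) in enumerate(machines):
            states[b] = nxt[states[b]][(byte >> b) & 1]
            full &= pmv[states[b]]
        if full:
            hits.append((pos + 1, full))   # bit j of full: pattern j ends at pos + 1
    return hits


if __name__ == "__main__":
    patterns = [b"he", b"she", b"his", b"hers"]
    delta, out = build_dfa(patterns)
    machines = [bit_split(delta, out, b) for b in range(8)]
    for end, mask in scan(machines, b"hxhers"):
        print(end, [p for j, p in enumerate(patterns) if mask >> j & 1])
```

Each binary machine alone over-approximates: its partial match vector flags every pattern whose bits at that position are consistent with the input. The AND keeps only patterns that all eight positions agree on. With the numbering used here, bit 4 produces an eight-state machine and bit 3 a ten-state machine, the same state counts the slides list for B_03 and B_04, which suggests the slides count bits from the most significant end.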

How to Implement
- The AC state machine is equivalent to the 8 tiny state machines.
- The 8 tiny state machines can run independently, which means in parallel.
- Intersection is done with a bit-wise AND.
- 8 is intuitive but not optimal.
- How to build a system to implement this algorithm? Our algorithm makes it feasible to be on-chip.

A Hardware Implementation
- A rule module is equivalent to an AC state machine
- Rule modules and tiles are structurally equivalent
- All full match vectors are concatenated to indicate which strings are matched
- One tile stores one tiny bit-split state machine
[Figure: String Match Engine. A byte from the payload feeds Rule Module 0 through Rule Module N. Each rule module contains Tiles 0 to 3; each tile takes a 2-bit input from each byte ([0:1], [2:3], [4:5], [6:7]) and outputs a partial match vector, and these combine into the full match vector. The outputs of all modules form the complete set of matches for all rules. State machine tile: next state pointers, partial match vector, decoder, 4:1 mux, output latch, control block, config data.]

An efficient Implementation
[Figure: partial match vectors (PMV) from Tiles 0 to 3 over Cycles 0 to 3 for the input h, x, h, e, and the combined result per cycle]


Performance of Hardware

Key Metric: Throughput*Character/Area

Packet Scan Architecture (outline)
- String Matching
- Bit-Split String Matching Algorithm
- A Memory Tile Based Architecture
- Building a Real System: integration and interfaces (FPGA)
- Is it really correct?
- Future Work

Prototype Design
- Avalon Bus (50MHz, 12Gbps)
- Microprocessor (control/update)
- Ethernet Interface, 100Mbps (promiscuous)
- DMA
- String Match Engine (~1Gbps)
- Device Drivers / Application Layer
[Figure: blocks labeled Connect to bus, Reg Interface, and SM Core; signals clk, reset, cs, address, write, data, byte_in, data_enable, data_low, data_high, we, rst, vector out]

Interface With Avalon Bus
[Figure: Connect to bus block with signals clk, reset, cs, address, write, data, byte_in, data_enable, data_low, data_high, we, rst, vector out]
sme_write_tile(Base_add, 0, 1, 0, 0x0001, 0x )
- Arguments after Base_add: module number, tile number, address, upper data, lower data
- This function is for initializing the memory in the string match engines
sme_send_byte(Base_add, byte_from_packet)
- This function is for sending actual data to the string match engine

Packet Scan Architecture (outline)
- String Matching
- Bit-Split String Matching Algorithm
- A Memory Tile Based Architecture
- Building a Real System
- Is it really correct? Proofs (yes)
- Future Work

A Formalization

Splits DFA as an NFA

Correctness stems from RL subset
- The above property is sufficient; is it necessary?
- Exploiting fixed wildcards is possible; what about more general patterns?

Packet Scan Architecture (outline)
- String Matching
- Bit-Split String Matching Algorithm
- A Memory Tile Based Architecture
- Building a Real System
- Is it really correct?
- Future Work: extensions and applications

Primitives for Security
- Packet Address List Lookup
- Packet Address Range Query
- Packet Classification
- String Finding
- Regular Expression Finding
- Stateful Flow Monitors
- Packet Ordering

Related Work
- Software based
  - Good for ~100Mb/s, common case
- FPGA-based
  - Many schemes map rules down to a specialized circuit
  - Near optimal utilization of hardware resources
  - Implementing state machines on block-RAMs [Cho and Mangione-Smith]
  - Concurrent to our work: mapping state machines to on-chip SRAM [Aldwairi et al.]
- Bloom filters [Dharmapurikar et al.]
  - Excellent filter in the common case
- TCAM-based
  - Requires all patterns to be shorter than or equal to the TCAM width
  - Cutting long patterns: 2Gbps with 295KB TCAM [Yu et al.]

Conclusions
- New Tile-based Architecture
  - 0.4MB and 10Gbps for Snort rule set (>10,000 characters)
  - Can be used for other applications, e.g. IP lookups, packet classification
- New Bit-split Algorithm
  - General purpose enough for many other applications, e.g. spam detection, peephole optimization, IP lookups, packet classification
  - Feasible to implement on other tile-based architectures

Thanks
Lin Tan, Brett Brotherton, Prof. Ryan Kastner, Prof. Ömer Egecioglu, Shreyas Prasad, Shashi Mysore, Bita Mazloom, Ted Huffmire, Banit Argawal

All done.