StrideBV: Single chip 400G+ packet classification Author: Thilan Ganegedara, Viktor K. Prasanna Publisher: HPSR 2012 Presenter: Chun-Sheng Hsueh Date:

Presentation transcript:

StrideBV: Single chip 400G+ packet classification Author: Thilan Ganegedara, Viktor K. Prasanna Publisher: HPSR 2012 Presenter: Chun-Sheng Hsueh Date: 2012/12/12

Introduction From a hardware perspective, the main bottleneck in implementing packet classification engines has been the amount of memory required to store the rule set. Various solutions have been proposed to reduce the memory footprint of rule set storage. Most of these solutions exploit properties or features of the rule set to achieve memory efficiency.

Introduction This work treats throughput as the primary goal and memory efficiency as secondary. StrideBV is a high-throughput (400+ Gbps) packet classification scheme based on the field-split algorithm, and its performance is independent of rule set features.

BACKGROUND AND RELATED WORK FPGAs are widely used in various networking applications. Even though the operating frequency of an FPGA is low, fine-grained pipelining can be used to dramatically improve performance.

ALGORITHM AND CLASSIFICATION PROCESS Problem Definition; Field-split Algorithm; StrideBV: Algorithm; Multi-match to Single-match; StrideBV: Lookup Process and Hardware Architecture

Problem Definition The most widely used scheme is 5-field packet classification in which the following tuple of headers of each incoming packet is inspected: Source IP (SA), Destination IP (DA), Source Port (SP), Destination Port (DP), Protocol (PR)

Problem Definition Given a packet classification rule set with N rules defined over d packet header fields, f0, f1,..., fd−1, devise: 1) a lookup scheme whose performance is independent of the features or properties of the rule set, and 2) a hardware architecture that performs wire-speed packet classification at 400 Gbps and beyond.

Field-split Algorithm

StrideBV: Algorithm This paper applies the field-split algorithm to all 5 fields. The challenge in doing so is the pipeline length: with single-bit inspection, the pipeline needs one stage per header bit. In the case of 5-field packet classification, this amounts to 32 + 32 + 16 + 16 + 8 = 104 pipeline stages.

StrideBV: Algorithm The pipeline length can be reduced by inspecting multiple bits per stage (a k-bit stride) rather than a single bit. This can be done in two ways, by storing: 1) the 2^k bit vectors corresponding to the 2^k combinations of the k-bit stride, and loading a single bit vector per stage; or 2) the 2×k bit vectors corresponding to the individual bits of the k-bit stride, and loading multiple bit vectors per stage.

StrideBV: Algorithm The first method consumes more memory while reducing memory bandwidth; the second method saves memory at the cost of memory bandwidth.
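
The trade-off between the two methods can be put into numbers with a short calculation. The sketch below is illustrative only: it assumes strides are taken over the concatenated 104-bit header with the last stride padded, and the function name `stride_costs` and its dictionary keys are made up for this example rather than taken from the paper.

```python
import math

# Field widths of the 5-tuple: SA, DA, SP, DP, PR.
HEADER_BITS = 32 + 32 + 16 + 16 + 8  # 104 bits

def stride_costs(k, n_rules, header_bits=HEADER_BITS):
    """Per-stage storage and loads for a k-bit stride and N = n_rules.

    Assumption (not stated on the slides): strides run over the
    concatenated header, so the stage count is ceil(header_bits / k).
    """
    return {
        "stages": math.ceil(header_bits / k),
        # Method 1: one N-bit vector per stride value, one load per stage.
        "method1_bits_per_stage": (2 ** k) * n_rules,
        "method1_loads_per_stage": 1,
        # Method 2: two N-bit vectors per stride bit, k loads per stage.
        "method2_bits_per_stage": 2 * k * n_rules,
        "method2_loads_per_stage": k,
    }

if __name__ == "__main__":
    for k in (1, 2, 4, 8):
        print(k, stride_costs(k, n_rules=512))
```

For k = 1 the two methods coincide; as k grows, method 1 storage grows as 2^k while method 2 grows linearly in k but needs k loads and k AND operations per stage.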

StrideBV: Algorithm However, it should be noted that with the second method, k N-bit AND operations need to be performed in a single stage. This increases the amount of work done per stage, which causes the clock period to increase.

StrideBV: Algorithm Since the goal of this paper is to implement a high-throughput packet classification engine, we opt for the first method at the cost of increased memory consumption.
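
How the stage memories of the first method could be populated is sketched below. This is a software illustration under assumptions, not the authors' implementation: rules are assumed to be already expressed as ternary strings over `0`, `1` and `*` (highest priority first), bit i of each stored vector stands for rule i, and `build_stage_memories` is a hypothetical helper name.

```python
def build_stage_memories(rules, k):
    """Return one table per stage; table[v] is the N-bit vector (as an int)
    of rules whose k-bit chunk at that stage matches stride value v.

    Assumptions for this sketch: rules are equal-length ternary strings,
    and the last stride is padded with wildcards.
    """
    width = len(rules[0])
    assert all(len(r) == width for r in rules)
    pad = (-width) % k
    padded = [r + "*" * pad for r in rules]
    n_stages = (width + pad) // k

    memories = []
    for s in range(n_stages):
        table = []
        for value in range(2 ** k):
            bits = format(value, f"0{k}b")
            vec = 0
            for i, rule in enumerate(padded):
                chunk = rule[s * k:(s + 1) * k]
                # A chunk matches if every position is '*' or equals the bit.
                if all(c == "*" or c == b for c, b in zip(chunk, bits)):
                    vec |= 1 << i
            table.append(vec)
        memories.append(table)
    return memories

if __name__ == "__main__":
    toy_rules = ["10**", "1*0*", "****"]  # toy 4-bit rule set
    print(build_stage_memories(toy_rules, k=2))
```

Each stage table holds 2^k vectors, which is the extra memory the first method pays in exchange for a single load per stage.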

Multi-match to Single-match The output of the lookup engine is the bit vector that indicates the matching rules for the input packet header. However, in packet classification, only the highest-priority match is reported, since routing is the main concern. The rules of a classifier are sorted in order of decreasing priority, so extracting the highest-priority match can be easily realized using a priority encoder. However, as the length of the bit vector increases, the time required to report the highest-priority match increases significantly.

Multi-match to Single-match As a remedy, we introduce a Pipelined Priority Encoder (PPE). A PPE for an N-bit vector consists of log N stages, and since the work done per stage is trivial, the PPE is able to operate at very high frequencies.
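
The slides do not describe the internal structure of the PPE. One way to obtain a logarithmic number of trivial stages is a binary tournament, where each stage merges neighbouring candidates and keeps the higher-priority one; the sketch below models that idea in software. The tournament structure and the name `pipelined_priority_encode` are assumptions made for illustration.

```python
def pipelined_priority_encode(bits):
    """bits[i] is 1 when rule i matches; a lower index means higher priority.

    Returns (index_of_highest_priority_match_or_None, number_of_stages).
    Each loop iteration models one pipeline stage of a binary tournament
    (an assumed structure, not taken from the paper).
    """
    n = 1
    while n < len(bits):
        n *= 2  # pad the input width to a power of two
    level = [i if i < len(bits) and bits[i] else None for i in range(n)]

    stages = 0
    while len(level) > 1:
        # Keep the left (higher-priority) candidate when it is valid.
        level = [a if a is not None else b
                 for a, b in zip(level[0::2], level[1::2])]
        stages += 1
    return level[0], stages

if __name__ == "__main__":
    print(pipelined_priority_encode([0, 0, 1, 0, 1, 0, 0, 0]))
```

Under this assumption, an 8-bit input takes 3 stages and a 512-bit input takes 9, and each stage performs only a two-way selection.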

StrideBV: Lookup Process and Hardware Architecture The output of the stage memory is the N-bit vector corresponding to the input stride. This bit vector is ANDed with the bit vector from the preceding stage to produce the intermediate result.

StrideBV: Lookup Process and Hardware Architecture This process is implemented as a linear Static Random Access Memory (SRAM) based pipeline. The output of the initial pipeline is the multi-match result. In order to extract the highest-priority match, the StrideBV pipeline is followed by a PPE.
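
The lookup process described above can be modelled in a few lines of software. This is a sketch under assumptions: the stage tables are hard-coded for a toy rule set of three 4-bit ternary rules (`10**`, `1*0*`, `****`) with a 2-bit stride, bit i of each vector stands for rule i, and the final priority encoding is done with a lowest-set-bit trick instead of a pipelined encoder. The names `stridebv_lookup` and `highest_priority` are hypothetical.

```python
# Toy stage memories for rules ["10**", "1*0*", "****"], stride k = 2.
# memories[s][v] is the 3-bit match vector for stride value v at stage s.
TOY_MEMORIES = [[4, 4, 7, 6], [7, 7, 5, 5]]

def stridebv_lookup(header_bits, memories, k, n_rules):
    """AND together one bit vector per stage; returns the multi-match vector."""
    result = (1 << n_rules) - 1  # start with every rule still matching
    for s, table in enumerate(memories):
        stride = header_bits[s * k:(s + 1) * k].ljust(k, "0")
        result &= table[int(stride, 2)]
    return result

def highest_priority(bitvec):
    """Index of the lowest set bit (rule 0 has the highest priority)."""
    if bitvec == 0:
        return None
    return (bitvec & -bitvec).bit_length() - 1

if __name__ == "__main__":
    for header in ("1011", "1100", "0000"):
        match = stridebv_lookup(header, TOY_MEMORIES, k=2, n_rules=3)
        print(header, format(match, "03b"), highest_priority(match))
```

In hardware each loop iteration corresponds to one pipeline stage, so a new header can enter every clock cycle while earlier headers are still being processed downstream.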

PERFORMANCE ANALYSIS ON FPGA In this section, we provide a detailed analysis of the StrideBV architecture under different configurations. The performance of the proposed architecture is measured in throughput, memory efficiency, power and resource usage. A state-of-the-art Xilinx Virtex-6 device (XC6VLX760) was used for the experiments, and the results presented here are post place-and-route. Since StrideBV does not rely on rule set features, we evaluated performance using rule set sizes ranging from 32 to 512 rules, which reflect real-life firewall classifiers.

PERFORMANCE ANALYSIS - Throughput There were two options for the stage memory: 1) distributed RAM or 2) block RAM. The tradeoff here is memory size vs. clock period. This paper opted for distributed RAM, since the memory requirement of real-life classifiers in this application is less than the maximum distributed RAM available on the considered FPGA.

PERFORMANCE ANALYSIS - Throughput A single pipeline is not adequate to support 400 Gbps: it would require an operating frequency of 1.25 GHz, which current FPGA devices do not support. Therefore, this paper employs 4 pipelines for rule set sizes below 512 and 6 pipelines for a rule set size of 512.
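
The 1.25 GHz figure follows from the minimum packet size of 40 bytes (320 bits), assuming each pipeline accepts one packet per clock cycle. The short calculation below reproduces it and shows the per-pipeline clock needed with 4 and 6 pipelines; the function names are illustrative, and the clock values are derived requirements, not measured results.

```python
MIN_PACKET_BITS = 40 * 8  # minimum packet size of 40 bytes

def required_clock_hz(target_bps, n_pipelines=1, packet_bits=MIN_PACKET_BITS):
    """Clock needed per pipeline, assuming one packet per cycle per pipeline."""
    return target_bps / (packet_bits * n_pipelines)

def throughput_bps(clock_hz, n_pipelines, packet_bits=MIN_PACKET_BITS):
    """Aggregate throughput under the same one-packet-per-cycle assumption."""
    return clock_hz * n_pipelines * packet_bits

if __name__ == "__main__":
    for pipelines in (1, 4, 6):
        mhz = required_clock_hz(400e9, pipelines) / 1e6
        print(f"{pipelines} pipeline(s): {mhz:.1f} MHz for 400 Gbps")
```

With 4 pipelines the requirement drops to 312.5 MHz per pipeline, and with 6 pipelines to roughly 208.3 MHz.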

PERFORMANCE ANALYSIS - Throughput Figure 3 shows the throughput variation with the size of the classifier for various stride sizes for minimum packet size (40 Bytes).

PERFORMANCE ANALYSIS - Memory Efficiency The authors employed only distributed RAM (built using logic) in this work. Since the stage memory is dual-ported, each copy of the memory serves two pipelines. To implement 4 and 6 parallel pipelines, the total memory consumption must therefore be multiplied by factors of 2× and 3×, respectively.
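
A rough estimate of the total stage memory can be derived from the quantities above. The sketch below assumes that each stage stores 2^k vectors of N bits, that strides are taken over the concatenated 104-bit header, and that one dual-ported copy serves two pipelines; `total_memory_bits` is a hypothetical helper, and the numbers it prints are estimates under these assumptions, not figures from the paper.

```python
import math

HEADER_BITS = 104  # 5-tuple: 32 + 32 + 16 + 16 + 8

def total_memory_bits(n_rules, k, n_pipelines, header_bits=HEADER_BITS):
    """Estimated stage memory in bits for n_pipelines parallel pipelines."""
    stages = math.ceil(header_bits / k)
    bits_per_copy = stages * (2 ** k) * n_rules
    copies = math.ceil(n_pipelines / 2)  # dual port: two pipelines per copy
    return bits_per_copy * copies

if __name__ == "__main__":
    for n_rules, pipelines in ((256, 4), (512, 6)):
        bits = total_memory_bits(n_rules, k=4, n_pipelines=pipelines)
        print(n_rules, pipelines, bits, f"{bits / 8 / 1024:.1f} KiB")
```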

PERFORMANCE ANALYSIS - Power Per Unit Throughput To measure the power consumption of the design, we used the XPower Analyzer tool available in the Xilinx ISE 12.4 suite. Using a small stride size yields lower power efficiency, mainly because of the extensive resource usage.

PERFORMANCE ANALYSIS - Resource Consumption

Comparison with Existing Approaches We compare the worst-case performance of several existing solutions against StrideBV to illustrate its benefits. For this evaluation, we considered a 5-field classification rule set with 512 rules for all the schemes.