High-Performance Packet Classification on GPU Author: Shijie Zhou, Shreyas G. Singapura and Viktor K. Prasanna Publisher: HPEC 2014 Presenter: Gang Chi.

Slides:



Advertisements
Similar presentations
Deep Packet Inspection with DFA-trees and Parametrized Language Overapproximation Author: Daniel Luchaup, Lorenzo De Carli, Somesh Jha, Eric Bach Publisher:
Advertisements

Scalable Packet Classification Using Hybrid and Dynamic Cuttings Authors : Wenjun Li,Xianfeng Li Publisher : Engineering Lab on Intelligent Perception.
Optimizing Regular Expression Matching with SR-NFA on Multi-Core Systems Authors : Yang, Y.E., Prasanna, V.K. Yang, Y.E. Prasanna, V.K. Publisher : Parallel.
An Efficient and Scalable Pattern Matching Scheme for Network Security Applications Department of Computer Science and Information Engineering National.
Memory-Efficient Regular Expression Search Using State Merging Department of Computer Science and Information Engineering National Cheng Kung University,
Gregex: GPU based High Speed Regular Expression Matching Engine Date:101/1/11 Publisher:2011 Fifth International Conference on Innovative Mobile and Internet.
OpenFlow-Based Server Load Balancing GoneWild Author : Richard Wang, Dana Butnariu, Jennifer Rexford Publisher : Hot-ICE'11 Proceedings of the 11th USENIX.
HybridCuts: A Scheme Combining Decomposition and Cutting for Packet Classification Authors : Wenjun Li, Xianfeng Li Publisher : 2013 IEEE 21st Annual Symposium.
Packet Classification using Rule Caching Author: Nitesh B. Guinde, Roberto Rojas-Cessa, Sotirios G. Ziavras Publisher: IISA, 2013 Fourth International.
Fast forwarding table lookup exploiting GPU memory architecture Author : Youngjun Lee,Minseon Jeong,Sanghwan Lee,Eun-Jin Im Publisher : Information and.
Packet Classification Using Multi-Iteration RFC Author: Chun-Hui Tsai, Hung-Mao Chu, Pi-Chung Wang Publisher: COMPSACW, 2013 IEEE 37th Annual (Computer.
(TPDS) A Scalable and Modular Architecture for High-Performance Packet Classification Authors: Thilan Ganegedara, Weirong Jiang, and Viktor K. Prasanna.
Leveraging Traffic Repetitions for High- Speed Deep Packet Inspection Author: Anat Bremler-Barr, Shimrit Tzur David, Yotam Harchol, David Hay Publisher:
HPCLatAm 2013 HPCLatAm 2013 Permutation Index and GPU to Solve efficiently Many Queries AUTORES  Mariela Lopresti  Natalia Miranda  Fabiana Piccoli.
A Regular Expression Matching Algorithm Using Transition Merging Department of Computer Science and Information Engineering National Cheng Kung University,
High-Speed Packet Classification Using Binary Search on Length Authors: Hyesook Lim and Ju Hyoung Mun Presenter: Yi-Sheng, Lin ( 林意勝 ) Date: Jan. 14, 2008.
A Hybrid IP Lookup Architecture with Fast Updates Author : Layong Luo, Gaogang Xie, Yingke Xie, Laurent Mathy, Kavé Salamatian Conference: IEEE INFOCOM,
Parallelization and Characterization of Pattern Matching using GPUs Author: Giorgos Vasiliadis 、 Michalis Polychronakis 、 Sotiris Ioannidis Publisher:
EQC16: An Optimized Packet Classification Algorithm For Large Rule-Sets Author: Uday Trivedi, Mohan Lal Jangir Publisher: 2014 International Conference.
Scalable Many-field Packet Classification on Multi-core Processors Authors : Yun R. Qu, Shijie Zhou, Viktor K. Prasanna Publisher : International Symposium.
Deterministic Finite Automaton for Scalable Traffic Identification: the Power of Compressing by Range Authors: Rafael Antonello, Stenio Fernandes, Djamel.
DBS A Bit-level Heuristic Packet Classification Algorithm for High Speed Network Author : Baohua Yang, Xiang Wang, Yibo Xue, Jun Li Publisher : th.
Memory-Efficient Regular Expression Search Using State Merging Author: Michela Becchi, Srihari Cadambi Publisher: INFOCOM th IEEE International.
2017/4/26 Rethinking Packet Classification for Global Network View of Software-Defined Networking Author: Takeru Inoue, Toru Mano, Kimihiro Mizutani, Shin-ichi.
Memory-Efficient and Scalable Virtual Routers Using FPGA Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan,
Updating Designed for Fast IP Lookup Author : Natasa Maksic, Zoran Chicha and Aleksandra Smiljani´c Conference: IEEE High Performance Switching and Routing.
Binary-tree-based high speed packet classification system on FPGA Author: Jingjiao Li*, Yong Chen*, Cholman HO**, Zhenlin Lu* Publisher: 2013 ICOIN Presenter:
Boundary Cutting for Packet Classification Author: Hyesook Lim, Nara Lee, Geumdan Jin, Jungwon Lee, Youngju Choi, Changhoon Yim Publisher: Networking,
A Fast Regular Expression Matching Engine for NIDS Applying Prediction Scheme Author: Lei Jiang, Qiong Dai, Qiu Tang, Jianlong Tan and Binxing Fang Publisher:
Lightweight Traffic-Aware Packet Classification for Continuous Operation Author: Shariful Hasan Shaikot, Min Sik Kim Presenter: Yen-Chun Tseng Date: 2014/11/26.
Range Enhanced Packet Classification Design on FPGA Author: Yeim-Kuan Chang, Chun-sheng Hsueh Publisher: IEEE Transactions on Emerging Topics in Computing.
PC-TRIO: A Power Efficient TACM Architecture for Packet Classifiers Author: Tania Banerjee, Sartaj Sahni, Gunasekaran Seetharaman Publisher: IEEE Computer.
Lossy Compression of Packet Classifiers Author: Ori Rottenstreich, J’anos Tapolcai Publisher: 2015 IEEE International Conference on Communications Presenter:
LaFA Lookahead Finite Automata Scalable Regular Expression Detection Authors : Masanori Bando, N. Sertac Artan, H. Jonathan Chao Masanori Bando N. Sertac.
Packet Classification Using Dynamically Generated Decision Trees
GFlow: Towards GPU-based High- Performance Table Matching in OpenFlow Switches Author : Kun Qiu, Zhe Chen, Yang Chen, Jin Zhao, Xin Wang Publisher : Information.
Author: Weirong Jiang and Viktor K. Prasanna Publisher: The 18th International Conference on Computer Communications and Networks (ICCCN 2009) Presenter:
LOP_RE: Range Encoding for Low Power Packet Classification Author: Xin He, Jorgen Peddersen and Sri Parameswaran Conference : IEEE 34th Conference on Local.
SRD-DFA Achieving Sub-Rule Distinguishing with Extended DFA Structure Author: Gao Xia, Xiaofei Wang, Bin Liu Publisher: IEEE DASC (International Conference.
Practical Multituple Packet Classification Using Dynamic Discrete Bit Selection Author: Baohua Yang, Fong J., Weirong Jiang, Yibo Xue, Jun Li Publisher:
Hierarchical Hybrid Search Structure for High Performance Packet Classification Authors : O˜guzhan Erdem, Hoang Le, Viktor K. Prasanna Publisher : INFOCOM,
LightFlow : Speeding Up GPU-based Flow Switching and Facilitating Maintenance of Flow Table Author : Nobutaka Matsumoto and Michiaki Hayashi Conference:
Scalable Multi-match Packet Classification Using TCAM and SRAM Author: Yu-Chieh Cheng, Pi-Chung Wang Publisher: IEEE Transactions on Computers (2015) Presenter:
JA-trie: Entropy-Based Packet Classification Author: Gianni Antichi, Christian Callegari, Andrew W. Moore, Stefano Giordano, Enrico Anastasi Conference.
A Multi-dimensional Packet Classification Algorithm Based on Hierarchical All-match B+ Tree Author: Gang Wang, Yaping Lin*, Jinguo Li, Xin Yao Publisher:
Author: Yun R. Qu, Shijie Zhou, and Viktor K. Prasanna Publisher:
Reorganized and Compact DFA for Efficient Regular Expression Matching
A DFA with Extended Character-Set for Fast Deep Packet Inspection
2018/6/26 An Energy-efficient TCAM-based Packet Classification with Decision-tree Mapping Author: Zhao Ruan, Xianfeng Li , Wenjun Li Publisher: 2013.
2018/11/19 Source Routing with Protocol-oblivious Forwarding to Enable Efficient e-Health Data Transfer Author: Shengru Li, Daoyun Hu, Wenjian Fang and.
SigMatch Fast and Scalable Multi-Pattern Matching
Parallel Processing Priority Trie-based IP Lookup Approach
2018/12/29 A Novel Approach for Prefix Minimization using Ternary trie (PMTT) for Packet Classification Author: Sanchita Saha Ray, Abhishek Chatterjee,
Binary Prefix Search Author: Yeim-Kuan Chang
2019/1/3 Exscind: Fast Pattern Matching for Intrusion Detection Using Exclusion and Inclusion Filters Next Generation Web Services Practices (NWeSP) 2011.
Memory-Efficient Regular Expression Search Using State Merging
Virtual TCAM for Data Center Switches
A Small and Fast IP Forwarding Table Using Hashing
Scalable Multi-Match Packet Classification Using TCAM and SRAM
A New String Matching Algorithm Based on Logical Indexing
Compact DFA Structure for Multiple Regular Expressions Matching
Power-efficient range-match-based packet classification on FPGA
Large-scale Packet Classification on FPGA
A Hybrid IP Lookup Architecture with Fast Updates
A SRAM-based Architecture for Trie-based IP Lookup Using FPGA
Authors: Ding-Yuan Lee, Ching-Che Wang, An-Yeu Wu Publisher: 2019 VLSI
2019/10/19 Efficient Software Packet Processing on Heterogeneous and Asymmetric Hardware Architectures Author: Eva Papadogiannaki, Lazaros Koromilas, Giorgos.
MEET-IP Memory and Energy Efficient TCAM-based IP Lookup
Towards TCAM-based Scalable Virtual Routers
Packet Classification Using Binary Content Addressable Memory
Presentation transcript:

High-Performance Packet Classification on GPU Author: Shijie Zhou, Shreyas G. Singapura and Viktor K. Prasanna Publisher: HPEC 2014 Presenter: Gang Chi Date: 2014/12/2 Department of Computer Science and Information Engineering National Cheng Kung University, Taiwan R.O.C.

Introduction (1/2) This paper investigate GPU’s characteristics in parallelism and memory accessing, and implement our packet classifier using CUDA. The basic operations of this design are binary range-tree search and bitwise AND operation. Optimize the design by storing the range-trees using compact arrays without explicit pointers in shared memory. National Cheng Kung University CSIE Computer & Internet Architecture Lab 2

Introduction (2/2) When the size of rule set is 512, this design can achieve the throughput of 85 MPPS and the average processing latency of 4.9 us per packet. Compared with the implementation on the state-of-the-art multi-core platform, this design demonstrates 1.9x improvement with respect to throughput. National Cheng Kung University CSIE Computer & Internet Architecture Lab 3

CUDA Memory Model National Cheng Kung University CSIE Computer & Internet Architecture Lab 4 TypeLocationAccess cycleSize Global memoryOff-chip>1001~32GB per GPU L1 cacheOn-chip1~3216 or 48KB per SMX L2 cacheOn-chip1~3264KB per SMX RegistersOn-chipn/a32-bit x per SMX

Algorithm Phase 1: each thread examines N/K rules and produces a local classification result. Phase 2: the rule with the highest priority among the K local results is identified in logK steps. National Cheng Kung University CSIE Computer & Internet Architecture Lab 5

Pre-process Pre-process rules to construct a binary range-tree for each individual field. Every leaf node is assigned with BVs, which can infer which rules are matched when reaching the leaf node. National Cheng Kung University CSIE Computer & Internet Architecture Lab 6

Search Each thread performs binary range-tree search sequentially field by field. After 5 tree searches, 5 BVs are produced. Merge the 5 BVs by bitwise AND operation to obtain a final BV. The result is the index of first non-zero bit. Ex: BV=00100, Result=2 Ex: BV=00000, Result=65536 National Cheng Kung University CSIE Computer & Internet Architecture Lab 7

Search in Binary Range Tree National Cheng Kung University CSIE Computer & Internet Architecture Lab 8 Ex: Search 4

Identify Global Result National Cheng Kung University CSIE Computer & Internet Architecture Lab 9

Experimental Platform CUDA 5.0 Intel E x2 2.4 GHz 8-core NVIDIA K20 Kepler GPU MHz 13 SMX with total 2496 CUDA cores 5GB GDDR5 National Cheng Kung University CSIE Computer & Internet Architecture Lab 10

Latency and Throughput National Cheng Kung University CSIE Computer & Internet Architecture Lab 11

Comparison with implementation on Multi-core [8] S. Zhou, Y. Qu and V. K. Prasanna, “Multi-core implementation of decomposition-based packet classification algorithms,” in Parallel Computing Techniques (PaCT), pp , National Cheng Kung University CSIE Computer & Internet Architecture Lab 12