2019/10/19 Efficient Software Packet Processing on Heterogeneous and Asymmetric Hardware Architectures Author: Eva Papadogiannaki, Lazaros Koromilas, Giorgos.

Presentation transcript:

Efficient Software Packet Processing on Heterogeneous and Asymmetric Hardware Architectures
Author: Eva Papadogiannaki, Lazaros Koromilas, Giorgos Vasiliadis, and Sotiris Ioannidis
Publisher/Conference: IEEE/ACM Transactions on Networking (Volume: 25, Issue: 3, June 2017)
Referenced: 4
Presenter: 林宇翔
Date: 2019/05/22
Department of Computer Science and Information Engineering, National Cheng Kung University, Taiwan R.O.C.

Introduction We propose an adaptive scheduling approach that supports heterogeneous and asymmetric hardware and is tailored for network packet processing applications. Our scheduler responds quickly to dynamic performance fluctuations that occur in real time, such as traffic bursts and application overloads, and provides consistently good performance. The experimental results show that our system matches the peak throughput (that is, it meets the input traffic rate of the packets) of a diverse set of packet processing applications, while consuming up to 3.5× less energy. National Cheng Kung University CSIE Computer & Internet Architecture Lab

Introduction Overall, the CPU cores are good at handling branch-intensive packet processing workloads, while discrete GPUs tend to operate efficiently in data-parallel workloads. Between those two, the integrated GPU features high energy efficiency without significantly compromising the processing rate or latency.

Power Instrumentation We use four high-precision Hall effect current sensors to continuously monitor the four ATX power-supply lines (+12.0a, +12.0b, +5.0, and +3.3 V). To calculate their power consumption, we use a utilization-based model.
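
As a rough illustration of how per-rail current readings turn into a power figure, the sketch below multiplies each rail's nominal voltage by a sampled current and sums the results. The current values are made up for the example, and this is plain P = V × I per rail; it is not the utilization-based model mentioned on the slide, whose details are not given here.

```python
# Nominal voltages of the four monitored ATX rails, as listed on the slide.
RAIL_VOLTS = {"+12.0a": 12.0, "+12.0b": 12.0, "+5.0": 5.0, "+3.3": 3.3}


def rail_power(currents_amps):
    """Return per-rail power in watts (P = V * I) for one sensor sample."""
    return {rail: RAIL_VOLTS[rail] * amps for rail, amps in currents_amps.items()}


def total_power(currents_amps):
    """Sum the power drawn over all sampled rails."""
    return sum(rail_power(currents_amps).values())


# Hypothetical sensor sample in amperes (not measured data).
sample = {"+12.0a": 2.0, "+12.0b": 1.5, "+5.0": 1.0, "+3.3": 2.0}
print(f"total: {total_power(sample):.1f} W")
```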

Applications
IPv4 Packet Forwarding: We use a RadixTrie lookup algorithm and a routing table of 17,000 entries.
Deep Packet Inspection: We port a DFA implementation of the Aho-Corasick algorithm for string searching and use the content patterns (about 10,000 fixed strings) of the latest Snort distribution, which we compile into a single state machine.
Packet Hashing: When a new packet is received, the "packet store" is updated and the "fingerprint table" is checked to determine whether the packet includes a significant fraction of content cached in the packet store; if so, an encoded version that eliminates this recently observed content is transmitted. We have implemented the MD5 algorithm.
Encryption: We implement AES-CBC encryption using a different 128-bit key for each communication session.
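
To make the IPv4 forwarding workload concrete, here is a minimal longest-prefix-match sketch. It uses an uncompressed binary trie rather than a path-compressed radix trie, so it illustrates the lookup idea only and is not the implementation evaluated in the paper. The class name `PrefixTrie`, the routes, and the next-hop labels are all hypothetical.

```python
import ipaddress


class PrefixTrie:
    """Uncompressed binary trie keyed on IPv4 prefix bits (illustrative only)."""

    def __init__(self):
        # Each node is a dict with optional children "0"/"1" and an optional "hop".
        self.root = {}

    def insert(self, cidr, next_hop):
        net = ipaddress.ip_network(cidr)
        bits = format(int(net.network_address), "032b")[: net.prefixlen]
        node = self.root
        for b in bits:
            node = node.setdefault(b, {})
        node["hop"] = next_hop

    def lookup(self, addr):
        """Return the next hop of the longest matching prefix, or None."""
        bits = format(int(ipaddress.ip_address(addr)), "032b")
        node = self.root
        best = node.get("hop")
        for b in bits:
            node = node.get(b)
            if node is None:
                break
            best = node.get("hop", best)  # remember the deepest match so far
        return best


# Hypothetical three-entry routing table.
table = PrefixTrie()
table.insert("0.0.0.0/0", "default")
table.insert("10.0.0.0/8", "A")
table.insert("10.1.0.0/16", "B")
print(table.lookup("10.1.2.3"))
```

Each lookup walks at most 32 nodes, one per address bit, which suggests why this workload is dominated by memory accesses rather than computation.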

Architecture We consider two models for capturing the network traffic and distributing it to the different computational devices for processing: master-worker and shared-nothing.

Architecture The advantage of the shared-nothing architecture over the master-worker architecture is that it avoids the synchronization overhead required to ensure the proper execution of the worker threads. We use only the shared-nothing architecture for packet capturing for the remainder of the paper.

Performance Characterization - solo

Performance Characterization - combo

Performance Characterization Different applications (or the same application on different devices) require a different batch size to reach maximum throughput. Computationally intensive applications (e.g. AES) benefit more from large batch sizes, while memory-intensive applications (e.g. IPv4 forwarding) require smaller batch sizes to reach peak throughput. This is mainly the effect of cache sizes in the memory hierarchy of the specific device. Increasing the batch size after the maximum throughput has been reached results in a linear increase in latency. The performance of DPI fluctuates widely; when there is no match in the input traffic, the throughput achieved by all devices is much higher. As the number of pattern matches decreases, the DFA algorithm needs to access only a few different states, and these states are stored in cache memory.

Energy Efficiency

Adaptation Algorithm For each combination in our parameter space, we measure the sustained throughput, latency, and power, and store them in a dictionary; the dictionary is used at runtime to find the most suitable configuration. We use a separate red-black tree to store each achieved outcome (i.e. throughput, latency, and power) for each configuration. Each node in the tree holds all the configurations that correspond to the requested result. To keep the tree from growing too large, before inserting a new node we check whether its performance differs from that of its parent by more than a threshold δ. If not, we merge the two nodes to save space.
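
The δ-merge idea can be sketched as follows. Python's standard library has no red-black tree, so this example swaps in a sorted list with `bisect`, and it merges with the nearest key in sorted order instead of with a tree parent. The class name `OutcomeIndex`, the `at_least` query, and the sample configurations are assumptions made for illustration.

```python
import bisect


class OutcomeIndex:
    """Sorted index from one measured outcome (e.g. throughput) to configurations.

    A sorted list stands in for the red-black tree described on the slide.
    """

    def __init__(self, delta):
        self.delta = delta
        self.keys = []     # sorted outcome values
        self.configs = []  # configs[i] holds the configurations stored under keys[i]

    def insert(self, value, config):
        i = bisect.bisect_left(self.keys, value)
        # Merge into a neighbouring key when it lies within delta of the new value.
        for j in (i - 1, i):
            if 0 <= j < len(self.keys) and abs(self.keys[j] - value) <= self.delta:
                self.configs[j].append(config)
                return
        self.keys.insert(i, value)
        self.configs.insert(i, [config])

    def at_least(self, value):
        """Return the configurations under the smallest key >= value, or []."""
        i = bisect.bisect_left(self.keys, value)
        return self.configs[i] if i < len(self.keys) else []


# Hypothetical (device, batch size) configurations keyed by throughput.
idx = OutcomeIndex(delta=0.5)
idx.insert(10.0, ("CPU", 64))
idx.insert(10.3, ("iGPU", 512))    # within delta of 10.0, so it is merged
idx.insert(20.0, ("dGPU", 4096))
print(idx.keys)
```

One such index per outcome (throughput, latency, power) mirrors the separate trees described above.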

Adaptation Algorithm Our scheduling algorithm is laid out as follows.
1. Measure the current traffic rate.
2. Get the best configuration from the red-black tree using the desired requirement (i.e. latency-, throughput-, or energy-aware).
3. Change to this configuration only if it was measured to be better than the current one by a factor of λ.
4. Start creating batches of the specified size. If more than one device is required, create batches for each device accordingly. The batches are inserted into the queue of the corresponding device(s).
5. Measure the performance achieved by each of the devices for the submitted batch(es).
6. If the sustained performance is similar to the one requested from the red-black tree (up to a threshold δ), return to Step 1; otherwise, update the tree accordingly, and:
   - If the performance achieved by each device is worse, increase the batch size by a factor of 2.
   - If the performance achieved by each device is better, decrease the batch size by a factor of 2.
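
The two decision rules in the algorithm above, switching configuration only on a λ-factor improvement and doubling or halving the batch size when measured performance drifts, can be sketched as below. This is a simplified reading under stated assumptions: performance is a single number where higher is better, "by a factor of λ" is taken to mean candidate > λ × current, and δ is treated as an absolute difference. The function names and the `score` field are hypothetical.

```python
def choose_config(current, candidate, lam):
    """Switch only if the candidate's recorded score beats the current one by a factor lam."""
    return candidate if candidate["score"] > lam * current["score"] else current


def adjust_batch(batch_size, achieved, expected, delta):
    """Keep the batch size when achieved performance is within delta of expected.

    Otherwise double it when performance is worse than expected and halve it
    when performance is better, as in the last step of the algorithm.
    """
    if abs(achieved - expected) <= delta:
        return batch_size
    if achieved < expected:
        return batch_size * 2
    return max(1, batch_size // 2)  # never drop below one packet per batch


# Hypothetical values: scores in Gbit/s, lam = 1.1, delta = 0.5.
current = {"device": "CPU", "score": 10.0}
candidate = {"device": "iGPU", "score": 12.0}
print(choose_config(current, candidate, lam=1.1)["device"])
print(adjust_batch(1024, achieved=9.0, expected=10.0, delta=0.5))
```

The λ factor acts as hysteresis: small measured differences do not trigger a reconfiguration.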

Evaluation We use an energy-critical policy, i.e. we handle all input traffic at the maximum energy efficiency.