A Practical Packet Reordering Mechanism with Flow Granularity for Parallel Exploiting in Network Processors
13th WPDRTS, April 4, 2005
Beibei Wu, Yang Xu, Bin Liu, Hongbin Lu
Department of Computer Science, Tsinghua University, Beijing, P.R. China

2 Background & Problem
Network Processor (NP)
- A special-purpose, programmable hardware device that combines the flexibility of a RISC processor with the speed of an ASIC. NPs are building blocks used to construct network systems.
- Data path: Processing Engines (PEs)
Two design goals
- High speed: multiple PEs exploit packet-level parallelism
- High flexibility: versatile processing requirements make the processing time of each packet unpredictable
The Packet Disordering (PD) problem
- Packets depart in a different order from the one in which they arrived
- Network performance may deteriorate greatly
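
To make the packet disordering problem concrete, the following Python sketch models packets dispatched round-robin to several PEs with unequal processing times and lets each packet depart as soon as its PE finishes, with no reordering stage. The round-robin dispatch, the assumption that all packets arrive at time 0, and the function name `departure_order` are illustrative choices, not part of the original design.

```python
def departure_order(proc_times, num_pes):
    """Return packet indices in departure order (arrival order is 0, 1, 2, ...).

    Hypothetical model: packet i goes to PE i % num_pes (round-robin),
    all packets arrive at time 0, and a packet departs the moment its
    PE finishes processing it.
    """
    pe_free_at = [0.0] * num_pes      # time at which each PE becomes idle
    finish = []
    for i, t in enumerate(proc_times):
        pe = i % num_pes              # round-robin dispatch (assumption)
        pe_free_at[pe] += t           # the PE processes its packets back to back
        finish.append((pe_free_at[pe], i))
    # Sort by finish time; ties are broken by arrival index.
    return [i for _, i in sorted(finish)]


# Packet 0 is slow, so packets 1 and 3 overtake it.
print(departure_order([5.0, 1.0, 2.0, 1.0], num_pes=2))  # [1, 3, 0, 2]
```

With a single PE (or equal processing times) the order is preserved; disordering appears as soon as parallel PEs see different per-packet processing times.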

3 Objective
To design a practical mechanism that preserves packet order in the NP while maintaining high PE utilization.
[Figure: NP model, with a Dispatcher feeding PE0 … PEn and an Aggregator collecting their output]

4 Contents
Design Issues
- Global-scope vs. within-flow-scope order preservation
- Pre-processing vs. post-processing order scheduling
The Proposed Solution
- Packet chains for recording the sequence information of all flows
- The working process
Simulation

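5 Design Issues (1): The Scope of Order Preservation
Global scope
- All packets leave strictly in arrival order
Within-flow scope
- Only packets of the same flow must leave in order
Different flows experience different processing delays in an NP, so within-flow order preservation is necessary: under global-scope ordering, a packet that finishes early would have to wait behind a slower packet of an unrelated flow.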
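
A small sketch of the two scopes, assuming each packet's processing finish time is already known and that departure is otherwise instantaneous (no output-link serialization): under global scope a packet may leave only after every earlier packet has left, while under within-flow scope it waits only for earlier packets of its own flow. The function `release_times` and the example values are hypothetical.

```python
from collections import defaultdict


def release_times(packets, per_flow):
    """packets: list of (flow_id, done_time) in arrival order.

    Returns the earliest time each packet may leave while preserving
    global order (per_flow=False) or within-flow order (per_flow=True).
    """
    latest_global = 0.0
    latest_flow = defaultdict(float)  # latest release time seen per flow
    out = []
    for flow, done in packets:
        if per_flow:
            t = max(done, latest_flow[flow])   # wait only for this flow
            latest_flow[flow] = t
        else:
            t = max(done, latest_global)       # wait for every earlier packet
            latest_global = t
        out.append(t)
    return out


# Flow "A" has a slow packet; flow "B" packets finish quickly.
pkts = [("A", 10.0), ("B", 1.0), ("B", 2.0), ("A", 11.0)]
print(release_times(pkts, per_flow=False))  # [10.0, 10.0, 10.0, 11.0]
print(release_times(pkts, per_flow=True))   # [10.0, 1.0, 2.0, 11.0]
```

In the example, the two flow-B packets finish at times 1 and 2 but are held until time 10 under global ordering, purely because of an unrelated flow-A packet.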

6 Design Issues (2): Where Does Order Scheduling Take Place?
Order scheduling location
- Pre-processing scheduling: SPSL (Sequential Processing, Sequential Leaving)
- Post-processing scheduling: UPSL (Un-sequential Processing, Sequential Leaving)
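
One simplified way to contrast the two scheduling locations, under modeling assumptions of our own rather than the authors': in SPSL, a packet may not start processing until the previous packet of its flow has finished; in UPSL, any idle PE takes the next packet immediately, and the packet is held afterwards until its predecessors in the same flow have left. All packets are assumed to arrive at time 0 and are dispatched in arrival order; the function `simulate` is hypothetical.

```python
import heapq


def simulate(packets, num_pes, sequential_per_flow):
    """packets: list of (flow_id, proc_time) in arrival order, all arriving at t=0.

    Returns each packet's departure time with within-flow order preserved.
    sequential_per_flow=True models SPSL; False models UPSL (both as assumed above).
    """
    pes = [0.0] * num_pes      # min-heap of times at which PEs become idle
    heapq.heapify(pes)
    flow_done = {}             # finish time of the previous packet of each flow
    flow_release = {}          # release time of the previous packet of each flow
    departures = []
    for flow, t in packets:
        start = heapq.heappop(pes)            # earliest idle PE
        if sequential_per_flow:
            # SPSL: the PE sits idle until the flow's previous packet is done.
            start = max(start, flow_done.get(flow, 0.0))
        finish = start + t
        heapq.heappush(pes, finish)
        flow_done[flow] = finish
        # Sequential leaving: never depart before the flow's previous packet.
        release = max(finish, flow_release.get(flow, 0.0))
        flow_release[flow] = release
        departures.append(release)
    return departures


one_flow = [("A", 4.0)] * 4
print(simulate(one_flow, 2, sequential_per_flow=True))   # [4.0, 8.0, 12.0, 16.0]
print(simulate(one_flow, 2, sequential_per_flow=False))  # [4.0, 4.0, 8.0, 8.0]
```

In this model, a single flow leaves one of the two PEs idle under SPSL, so the difference is largest when there are few flows; with two alternating flows both variants produce the same departure times.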

7 Design Issues (2): What Is the Shortcoming of Pre-processing Scheduling?
[Figure: NP model with a Dispatcher, PE0 … PEn, an Aggregator, and a Packet Buffer]

9 The Proposed Solution (1): Methods of Reordering
In traditional network devices
- Sequence numbers or timestamps
- Global order preservation
In our NP
- Packet chains
- Ability to discriminate among flows
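
For comparison with packet chains, here is a minimal sketch of the traditional approach listed above: stamp packets with sequence numbers on arrival and release them strictly in sequence order from a reorder buffer. The class `SequenceReorderBuffer` is hypothetical. Because it preserves global order, one slow packet holds back every later packet regardless of flow.

```python
import heapq


class SequenceReorderBuffer:
    """Global order preservation with sequence numbers.

    Packets are stamped 0, 1, 2, ... on arrival and released strictly
    in that order, whatever order their processing completes in.
    """

    def __init__(self):
        self.next_seq = 0   # next sequence number allowed to leave
        self.pending = []   # min-heap of (seq, packet) that finished early

    def complete(self, seq, packet):
        """Record that packet `seq` finished processing; return packets released now."""
        heapq.heappush(self.pending, (seq, packet))
        released = []
        while self.pending and self.pending[0][0] == self.next_seq:
            released.append(heapq.heappop(self.pending)[1])
            self.next_seq += 1
        return released


rob = SequenceReorderBuffer()
print(rob.complete(1, "p1"))  # []
print(rob.complete(0, "p0"))  # ['p0', 'p1']
print(rob.complete(2, "p2"))  # ['p2']
```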

10 The Proposed Solution (2): NP System with Packet Data Buffer
[Figure: NP system with a Dispatcher, a PE Complex (threads t1 … tm), an Aggregator, and a Packet Data Buffer of blocks b1 … bk holding packets p1 … pn; legend: packet, thread, block; the blocks buffer disordered packets]

11 The Proposed Solution (3)
Each packet in the NP occupies a block in the Packet Data Memory.
- When should the packet in a given block be transmitted?
- How can the NP discriminate among flows?

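12 The Proposed Solution (4): Using Packet Chains to Record Sequence Information
[Figure: the Dispatcher looks up each packet's FlowID in a Flow Table with entries f(1) … f(j), each holding a Head Packet and an End Packet field; a Block Table holds packets p(a), p(b), p(d), p(e), p(x), p(y), p(z) linked into per-flow packet chains]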
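
The Flow Table and Block Table can be read as a set of singly linked lists threaded through the buffer blocks. The sketch below is one possible rendering under that reading, assuming each Block Table entry stores a packet plus the index of the next block of the same flow, and each Flow Table entry stores the head and end block indices. The class `PacketChains` and its methods are hypothetical names.

```python
from collections import deque


class PacketChains:
    """Per-flow packet chains threaded through a block table (illustrative sketch)."""

    def __init__(self, num_blocks):
        self.free = deque(range(num_blocks))  # indices of free blocks
        self.next = [None] * num_blocks       # block table: next block of the same flow
        self.packet = [None] * num_blocks     # block table: packet stored in each block
        self.flow_table = {}                  # flow_id -> [head_block, end_block]

    def dispatch(self, flow_id, packet):
        """Store an arriving packet in a free block and append it to its flow's chain."""
        if not self.free:
            raise RuntimeError("no free block")
        b = self.free.popleft()
        self.packet[b], self.next[b] = packet, None
        entry = self.flow_table.get(flow_id)
        if entry is None:
            self.flow_table[flow_id] = [b, b]  # new chain: head == end
        else:
            self.next[entry[1]] = b            # link after the current end packet
            entry[1] = b                       # the new packet becomes the end
        return b

    def chain(self, flow_id):
        """Walk a flow's chain from head to end, returning packets in arrival order."""
        out, b = [], self.flow_table.get(flow_id, [None])[0]
        while b is not None:
            out.append(self.packet[b])
            b = self.next[b]
        return out


pc = PacketChains(num_blocks=8)
for flow, pkt in [(1, "a"), (2, "x"), (1, "b"), (2, "y"), (1, "e")]:
    pc.dispatch(flow, pkt)
print(pc.chain(1))  # ['a', 'b', 'e']
print(pc.chain(2))  # ['x', 'y']
```

Because packets of different flows interleave in the buffer, the chains (rather than block positions) are what carry each flow's sequence information.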

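13 The Proposed Solution (5): The Working Process
[Figure: example of the working process, showing blocks b1 … b8 of the Packet Data Buffer marked with status flags (f, r), a flow table with flowID, head, and end columns, and two processing engines, PE 1 and PE 2]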
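
The working-process figure cannot be fully recovered, so the following is only one plausible reading of how packet chains could drive transmission, assuming: a packet is appended to its flow's chain when dispatched; a PE marks the packet's block ready when processing finishes; the aggregator then transmits ready packets from the head of that chain until it reaches a packet still being processed, returning each transmitted block to the free pool. The class `ChainReorderer` and its method names are hypothetical.

```python
from collections import deque


class ChainReorderer:
    """Within-flow reordering with packet chains (illustrative sketch)."""

    def __init__(self, num_blocks):
        self.free = deque(range(num_blocks))  # free block pool
        self.next = [None] * num_blocks       # next block of the same flow
        self.ready = [False] * num_blocks     # True once a PE has finished the packet
        self.flow_of = [None] * num_blocks    # flow owning each block
        self.packet = [None] * num_blocks     # packet stored in each block
        self.flows = {}                       # flow table: flow_id -> [head, end]

    def dispatch(self, flow_id, packet):
        """Put a packet in a free block and append it to its flow's chain."""
        b = self.free.popleft()               # raises IndexError if the buffer is full
        self.next[b], self.ready[b] = None, False
        self.flow_of[b], self.packet[b] = flow_id, packet
        if flow_id in self.flows:
            self.next[self.flows[flow_id][1]] = b
            self.flows[flow_id][1] = b
        else:
            self.flows[flow_id] = [b, b]
        return b

    def pe_done(self, b):
        """A PE finished block b; transmit every ready packet at the head of its chain."""
        self.ready[b] = True
        flow_id = self.flow_of[b]
        sent = []
        head = self.flows[flow_id][0]
        while head is not None and self.ready[head]:
            sent.append(self.packet[head])
            nxt = self.next[head]
            self.free.append(head)            # transmitted block returns to the pool
            head = nxt
        if head is None:
            del self.flows[flow_id]           # chain empty: drop the flow table entry
        else:
            self.flows[flow_id][0] = head     # new head is the first unfinished packet
        return sent


r = ChainReorderer(num_blocks=8)
a0 = r.dispatch("A", "a0")
b0 = r.dispatch("B", "b0")
a1 = r.dispatch("A", "a1")
print(r.pe_done(a1))  # []  (a1 waits for a0)
print(r.pe_done(b0))  # ['b0']  (flow B is not blocked by flow A)
print(r.pe_done(a0))  # ['a0', 'a1']
```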

15 Simulation: Throughput vs. Number of Flows, for Three Traces
Setup: an NP system with 4 PEs, each with 4 threads, and 8 × 4 = 32 memory blocks in total.
f1: packets of length-unrelated applications; f2: packets of length-related applications.
- Trace1: f1 100% (constant 40 bytes); f2 0%
- Trace2: f1 95% (constant 40 bytes); f2 5% (random, 40 to 60 bytes)
- Trace3: f1 90% (constant 40 bytes); f2 10% (random, 80 to 120 bytes)
[Figure: throughput vs. number of flows for Trace1, Trace2, and Trace3]
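
A sketch of a generator for the three trace mixes, which may help when reproducing the setup. It assumes that "random" means uniformly distributed integer lengths and that f1 and f2 packets are interleaved independently; the names `TRACES`, `expected_length`, and `generate` are hypothetical. Under the uniform assumption, the mean packet length is 40 bytes for Trace1, 0.95 × 40 + 0.05 × 50 = 40.5 bytes for Trace2, and 0.90 × 40 + 0.10 × 100 = 46 bytes for Trace3.

```python
import random

# Trace mixes: (share of f1 packets, f2 length range in bytes or None).
# f1 packets are a constant 40 bytes; uniform f2 lengths are an assumption.
TRACES = {
    "Trace1": (1.00, None),
    "Trace2": (0.95, (40, 60)),
    "Trace3": (0.90, (80, 120)),
}


def expected_length(trace):
    """Mean packet length in bytes, assuming uniform f2 lengths."""
    f1_share, f2_range = TRACES[trace]
    f2_mean = 0.0 if f2_range is None else sum(f2_range) / 2
    return f1_share * 40 + (1 - f1_share) * f2_mean


def generate(trace, n, seed=0):
    """Return n (kind, length_bytes) tuples drawn from the given trace mix."""
    rng = random.Random(seed)          # seeded for reproducibility
    f1_share, f2_range = TRACES[trace]
    pkts = []
    for _ in range(n):
        if f2_range is None or rng.random() < f1_share:
            pkts.append(("f1", 40))
        else:
            pkts.append(("f2", rng.randint(*f2_range)))  # inclusive bounds
    return pkts


for name in TRACES:
    print(name, round(expected_length(name), 2))
```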

16 Simulation: Utilization of 4 PEs under Traces with the Fewest Flows
[Figure: time fraction of the number of active threads for each PE, under trace1, trace2, and trace3; all traces have the fewest flows]
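
The "time fraction" metric plotted here can be computed from a step-function timeline. This sketch assumes a PE's active-thread count is logged as (time, count) change events starting at time 0; the function name `time_fractions` is hypothetical, and the same computation applies to a log of free-block counts.

```python
from collections import defaultdict


def time_fractions(events, end_time):
    """events: sorted list of (time, count) changes, the first at t=0.

    Returns {count: fraction of [0, end_time) spent at that count}.
    """
    totals = defaultdict(float)
    # Pair each event with the next one (or with end_time) to get its duration.
    for (t, n), nxt in zip(events, events[1:] + [(end_time, None)]):
        totals[n] += nxt[0] - t
    return {n: d / end_time for n, d in sorted(totals.items())}


# A 4-thread PE: idle for 2 time units, 4 threads busy for 6, then 1 busy for 2.
print(time_fractions([(0, 0), (2, 4), (8, 1)], end_time=10))
# {0: 0.2, 1: 0.2, 4: 0.6}
```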

17 Simulation: Buffer Occupation with the Fewest Flows
[Figure: time fraction of the number of free blocks for each PE, under trace1, trace2, and trace3; all traces have the fewest flows]

18 Summary
- A solution that preserves packet order in an NP with multiple PEs for data-plane processing
- Packet chains record the sequence information of different flows to preserve packet order
- Within-flow-scope reordering is necessary in an NP
- Future work: optimize the use of memory blocks and PE resources

19 Thank you for your attention!