Tracking Millions of Flows In High Speed Networks for Application Identification Tian Pan, Xiaoyu Guo, Chenhui Zhang, Junchen Jiang, Hao Wu and Bin Liut.

Slides:



Advertisements
Similar presentations
IP Router Architectures. Outline Basic IP Router Functionalities IP Router Architectures.
Advertisements

A Search Memory Substrate for High Throughput and Low Power Packet Processing Sangyeun Cho, Michel Hanna and Rami Melhem Dept. of Computer Science University.
NetFPGA Project: 4-Port Layer 2/3 Switch Ankur Singla Gene Juknevicius
Practical Caches COMP25212 cache 3. Learning Objectives To understand: –Additional Control Bits in Cache Lines –Cache Line Size Tradeoffs –Separate I&D.
A Scalable and Reconfigurable Search Memory Substrate for High Throughput Packet Processing Sangyeun Cho and Rami Melhem Dept. of Computer Science University.
1 An Efficient, Hardware-based Multi-Hash Scheme for High Speed IP Lookup Hot Interconnects 2008 Socrates Demetriades, Michel Hanna, Sangyeun Cho and Rami.
Bio Michel Hanna M.S. in E.E., Cairo University, Egypt B.S. in E.E., Cairo University at Fayoum, Egypt Currently is a Ph.D. Student in Computer Engineering.
1 Fast Routing Table Lookup Based on Deterministic Multi- hashing Zhuo Huang, David Lin, Jih-Kwon Peir, Shigang Chen, S. M. Iftekharul Alam Department.
CSC457 Seminar YongKang Zhu December 6 th, 2001 About Network Processor.
Multithreaded FPGA Acceleration of DNA Sequence Mapping Edward Fernandez, Walid Najjar, Stefano Lonardi, Jason Villarreal UC Riverside, Department of Computer.
Pipelined Parallel AC-based Approach for Multi-String Matching Department of Computer Science and Information Engineering National Cheng Kung University,
Power Efficient IP Lookup with Supernode Caching Lu Peng, Wencheng Lu*, and Lide Duan Dept. of Electrical & Computer Engineering Louisiana State University.
Efficient IP-Address Lookup with a Shared Forwarding Table for Multiple Virtual Routers Author: Jing Fu, Jennifer Rexford Publisher: ACM CoNEXT 2008 Presenter:
1 Architectural Results in the Optical Router Project Da Chuang, Isaac Keslassy, Nick McKeown High Performance Networking Group
Parallel IP Lookup using Multiple SRAM-based Pipelines Authors: Weirong Jiang and Viktor K. Prasanna Presenter: Yi-Sheng, Lin ( 林意勝 ) Date:
1 Multi-Core Architecture on FPGA for Large Dictionary String Matching Department of Computer Science and Information Engineering National Cheng Kung University,
Exploiting Content Localities for Efficient Search in P2P Systems Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang 1 1 College of William and Mary,
Performance Evaluation of IPv6 Packet Classification with Caching Author: Kai-Yuan Ho, Yaw-Chung Chen Publisher: ChinaCom 2008 Presenter: Chen-Yu Chaug.
A Practical Packet Reordering Mechanism with Flow Granularity for Parallel Exploiting in Network Processors 13 th WPDRTS April 4, 2005 Beibei Wu, Yang.
Min-Sheng Lee Efficient use of memory bandwidth to improve network processor throughput Jahangir Hasan 、 Satish ChandraPurdue University T. N. VijaykumarIBM.
Chapter 8: Main Memory.
A Flexible Architecture for Simulation and Testing (FAST) Multiprocessor Systems John D. Davis, Lance Hammond, Kunle Olukotun Computer Systems Lab Stanford.
NET-REPLAY: A NEW NETWORK PRIMITIVE Ashok Anand Aditya Akella University of Wisconsin, Madison.
CSIT 301 (Blum)1 Memory. CSIT 301 (Blum)2 Types of DRAM Asynchronous –The processor timing and the memory timing (refreshing schedule) were independent.
A Scalable, Cache-Based Queue Management Subsystem for Network Processors Sailesh Kumar, Patrick Crowley Dept. of Computer Science and Engineering.
Gigabit Routing on a Software-exposed Tiled-Microprocessor
1 Route Table Partitioning and Load Balancing for Parallel Searching with TCAMs Department of Computer Science and Information Engineering National Cheng.
Department of Computer Science and Engineering Applied Research Laboratory 1 A Hardware Based TCP/IP Processing Engine David V. Schuehler
Paper Review Building a Robust Software-based Router Using Network Processors.
ECE 526 – Network Processing Systems Design Network Processor Architecture and Scalability Chapter 13,14: D. E. Comer.
PARALLEL TABLE LOOKUP FOR NEXT GENERATION INTERNET
Authors: Yi Wang, Tian Pan, Zhian Mi, Huichen Dai, Xiaoyu Guo, Ting Zhang, Bin Liu, and Qunfeng Dong Publisher: INFOCOM 2013 mini Presenter: Chai-Yi Chu.
Lecture 19: Virtual Memory
Sujayyendhiren RS, Kaiqi Xiong and Minseok Kwon Rochester Institute of Technology Motivation Experimental Setup in ProtoGENI Conclusions and Future Work.
Author: Haoyu Song, Fang Hao, Murali Kodialam, T.V. Lakshman Publisher: IEEE INFOCOM 2009 Presenter: Chin-Chung Pan Date: 2009/12/09.
Hardware Implementation of Fast Forwarding Engine using Standard Memory and Dedicated Circuit Kazuya ZAITSU, Shingo ATA, Ikuo OKA (Osaka City University,
A Measurement Based Memory Performance Evaluation of High Throughput Servers Garba Isa Yau Department of Computer Engineering King Fahd University of Petroleum.
Authors: Haiquan (Chuck) Zhao, Hao Wang, Bill Lin, Jun (Jim) Xu Conf. : The 5th ACM/IEEE Symposium on Architectures for Networking and Communications Systems.
A Hybrid IP Lookup Architecture with Fast Updates Author : Layong Luo, Gaogang Xie, Yingke Xie, Laurent Mathy, Kavé Salamatian Conference: IEEE INFOCOM,
How to Build a CPU Cache COMP25212 – Lecture 2. Learning Objectives To understand: –how cache is logically structured –how cache operates CPU reads CPU.
Authors: Matteo Varvello, Diego Perino, and Leonardo Linguaglossa Publisher: NOMEN 2013 (The 2nd IEEE International Workshop on Emerging Design Choices.
Efficient P2P Search by Exploiting Localities in Peer Community and Individual Peers A DISC’04 paper Lei Guo 1 Song Jiang 2 Li Xiao 3 and Xiaodong Zhang.
02/21/2003 CART 1 On-chip MRAM as a High-Bandwidth, Low-Latency Replacement for DRAM Physical Memories Rajagopalan Desikan, Charles R. Lefurgy, Stephen.
File Structures. 2 Chapter - Objectives Disk Storage Devices Files of Records Operations on Files Unordered Files Ordered Files Hashed Files Dynamic and.
Department of Computer Science and Engineering Applied Research Laboratory Architecture for a Hardware Based, TCP/IP Content Scanning System David V. Schuehler.
Efficient Cache Structures of IP Routers to Provide Policy-Based Services Graduate School of Engineering Osaka City University
IPv6-Oriented 4 OC768 Packet Classification with Deriving-Merging Partition and Field- Variable Encoding Scheme Mr. Xin Zhang Undergrad. in Tsinghua University,
Improving Disk Throughput in Data-Intensive Servers Enrique V. Carrera and Ricardo Bianchini Department of Computer Science Rutgers University.
Algorithms to Accelerate Multiple Regular Expressions Matching for Deep Packet Inspection Sailesh Kumar Sarang Dharmapurikar Fang Yu Patrick Crowley Jonathan.
Introduction: Memory Management 2 Ideally programmers want memory that is large fast non volatile Memory hierarchy small amount of fast, expensive memory.
Connect. Communicate. Collaborate Using Temporal Locality for a Better Design of Flow-oriented Applications Martin Žádník, CESNET TNC 2007, Lyngby.
Research on TCAM-based OpenFlow Switch Author: Fei Long, Zhigang Sun, Ziwen Zhang, Hui Chen, Longgen Liao Conference: 2012 International Conference on.
Memory-Efficient and Scalable Virtual Routers Using FPGA Department of Computer Science and Information Engineering, National Cheng Kung University, Tainan,
Accurate And Flexible Flow-Based Monitoring For High-Speed Networks REPORTER: HSUAN-JU LI 2014/12/25 Field Programmable Logic and Applications (FPL), 2013.
Jeffrey Ellak CS 147. Topics What is memory hierarchy? What are the different types of memory? What is in charge of accessing memory?
Contemporary DRAM memories and optimization of their usage Nebojša Milenković and Vladimir Stanković, Faculty of Electronic Engineering, Niš.
ECE 526 – Network Processing Systems Design Network Address Translator II.
Optimizing Packet Lookup in Time and Space on FPGA Author: Thilan Ganegedara, Viktor Prasanna Publisher: FPL 2012 Presenter: Chun-Sheng Hsueh Date: 2012/11/28.
Taeho Kgil, Trevor Mudge Advanced Computer Architecture Laboratory The University of Michigan Ann Arbor, USA CASES’06.
A Study of Data Partitioning on OpenCL-based FPGAs Zeke Wang (NTU Singapore), Bingsheng He (NTU Singapore), Wei Zhang (HKUST) 1.
CMSC 611: Advanced Computer Architecture
CS522 Advanced database Systems
Lecture 16: Data Storage Wednesday, November 6, 2006.
High-throughput Online Hash Table on FPGA
Cache Memory Presentation I
Chapter 8: Main Memory.
CPSC 457 Operating Systems
Author: Yi Lu, Balaji Prabhakar Publisher: INFOCOM’09
Lu Tang , Qun Huang, Patrick P. C. Lee
A SRAM-based Architecture for Trie-based IP Lookup Using FPGA
Presentation transcript:

Tracking Millions of Flows In High Speed Networks for Application Identification Tian Pan, Xiaoyu Guo, Chenhui Zhang, Junchen Jiang, Hao Wu and Bin Liut Tsinghua National Laboratory for Information Science and Technology Department of Computer Science and Technology, Tsinghua University, Beijing 10084, China 2012 Proceedings IEEE INFOCOM 1 通訊所 周啓松

Outline Framework of system External flow table management ALFE replacement policy Theoretical analysis Evaluation Conclusion 2

FRAMEWORK OF SYSTEM 3

Emerge issues On-chip SRAM is fast but insufficient to accommodate millions of concurrent flows which has to be stored in DDR or RLDRAM. High hash collision rate and high cache miss rate. It is hard to attach parallel off-chip DRAMs to gain performance improvement. 4

Purpose An on-chip/off-chip hierarchical flow table to achieve high speed packet lookup and maintain tens of millions of flows. Adaptive Least Frequently Evicted(ALFE), which keeps the elephant flows longer in the cache thus increasing the cache hit rate. An efficient management scheme is proposed to exploit DRAM's burst feature. Real trace evaluation on Altera FPGA platform indicates, with 200MHz internal clock, small sized cache with 16K entries can achieve up to 80% hit rate, enabling more than 70Mpps line rate flow tracking even at the 40-byte packets. 5

Framework of the Proposed Application Identification 6

Design Trade-offs and System-level Optimization Stateless TCP Handling without Flow Reconstruction ◦ 3-Way shaking ◦ FIN/RST Fixed-allocated Hash Buckets to Exploit DRAM Bursts ◦ Lower utilization of memory space Cache Replacement Policy to Track Elephant Flows ◦ Heavy-tailed distribution lmpact of Misidentification 7

EXTERNAL FLOW TABLE MANAGEMENT 8

Flow Record Data Structure 9

Fixed-allocated Hash Bucket How to store the flow records How to read/write the memory efficiently 10

Device Independent Layer 11

Memory Swapping Logic 12

ALFE REPLACEMENT POLICY 13

Heavy-tailedness 14

Replacement Policy 15

THEORETICAL ANALYSIS 16

Poisson Queuing Network System 17

Throughput and Queue Length 18

EVALUATION 19

Experiment Setup 20

On-chip Flow Cache 21

Off-chip Flow Table 22

Bucket Overflow 23 NE BAW

Memory Utilization 24

Delay and Throughput The 400MHz DRAM I/O interface transfers two data words per DRAM clock cycle. The read latency is 6 DCs. The write latency is 7 DCs. The maximum throughput of DRAM is 44.44Mpps. 25

Trade-off between Utilization and Speed 26

Power 27

System Overall Performance System Throughput ◦ Overall throughput under different cache entry number 28

29 avalanche effect

Conclusion Detail of Matching Engine should be listed. Setup time of flow table should be taken into account System Overall Performance. 30