Slide 1 (March 2010)
A Study of Hardware Assisted IP over InfiniBand and its Impact on Enterprise Data Center Performance
Ryan E. Grant 1, Pavan Balaji 2, Ahmad Afsahi 1
1 Department of Electrical and Computer Engineering, Queen’s University, Kingston, ON, Canada K7L 3N6
2 Mathematics and Computer Science, Argonne National Laboratory, Argonne, IL, USA

Slide 2 (March 2010)
Introduction
Motivation
Background Information
Experimental Framework
Experimental Results
–Baseline Performance
–Offloading Performance
–Data Center Performance Results
–Performance Bottleneck Investigation
–Validation
Conclusions
–Future Work
Questions

Slide 3 (March 2010)
Motivation
Sockets-based protocols are used extensively in enterprise data centers
IPoIB provides a high-performance sockets interface that does not rely on upper-layer protocol support (such as TCP)
Future converged networking fabrics will use elements of both InfiniBand and Ethernet
The behaviour and performance of such systems is important to study in order to guide the development and deployment of advanced networking technologies

Slide 4 (March 2010)
Motivation
Why is UD/RC offloading important?
–It is new to IPoIB (UD offloading)
–It narrows the performance gap between IPoIB-UD and the Sockets Direct Protocol (SDP)
–UD offloading allows us to effectively use software that does not utilize TCP (TCP is required for SDP)

Slide 5 (March 2010)
Outline
Motivation
Background Information
Experimental Framework
Experimental Results
–Baseline Performance
–Offloading Performance
–Data Center Performance Results
–Performance Bottleneck Investigation
–Validation
Conclusions
–Future Work
Questions

Slide 6 (March 2010)
Background Information
InfiniBand
–Queue pair operation, supporting RDMA and send/receive modes (see the sketch below)
–Mellanox ConnectX host channel adapters support 4X InfiniBand operation with a bandwidth of 40 Gigabit/s (32 Gigabit/s data)
–The OpenFabrics Enterprise Distribution (OFED-1.4) InfiniBand drivers support both the SDP and IPoIB protocols
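Queue pair operation is what the verbs layer exposes beneath IPoIB and SDP. As a minimal sketch of that API (not part of the presentation), the C fragment below opens the first HCA found and creates a reliable-connection queue pair with libibverbs; the completion-queue depth and work-request limits are arbitrary assumptions, and the QP state transitions and address exchange needed for real traffic are omitted.

/* Minimal libibverbs sketch: open an HCA and create an RC queue pair.
 * Illustrative only; real code must also transition the QP through
 * INIT/RTR/RTS and exchange addressing info with the remote side. */
#include <stdio.h>
#include <infiniband/verbs.h>

int main(void)
{
    int num;
    struct ibv_device **devs = ibv_get_device_list(&num);
    if (!devs || num == 0) { fprintf(stderr, "no IB devices\n"); return 1; }

    struct ibv_context *ctx = ibv_open_device(devs[0]);   /* e.g. a ConnectX HCA */
    if (!ctx) { fprintf(stderr, "ibv_open_device failed\n"); return 1; }
    struct ibv_pd *pd = ibv_alloc_pd(ctx);                /* protection domain   */
    struct ibv_cq *cq = ibv_create_cq(ctx, 64, NULL, NULL, 0);
    if (!pd || !cq) { fprintf(stderr, "resource allocation failed\n"); return 1; }

    struct ibv_qp_init_attr attr = {
        .send_cq = cq,
        .recv_cq = cq,
        .qp_type = IBV_QPT_RC,          /* IBV_QPT_UD would give a datagram QP */
        .cap     = { .max_send_wr = 32, .max_recv_wr = 32,
                     .max_send_sge = 1, .max_recv_sge = 1 },
    };
    struct ibv_qp *qp = ibv_create_qp(pd, &attr);
    printf("created QP 0x%x\n", qp ? (unsigned)qp->qp_num : 0u);

    if (qp) ibv_destroy_qp(qp);
    ibv_destroy_cq(cq);
    ibv_dealloc_pd(pd);
    ibv_close_device(ctx);
    ibv_free_device_list(devs);
    return 0;
}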

Slide 7 (March 2010)
Background Information
Sockets-based protocols gain IP functionality over InfiniBand through IP over IB (IPoIB); see the sketch below
–IPoIB provides large receive offload (LRO) and large send offload (LSO)
–Large receive offload aggregates incoming packets
–Large send offload segments large messages into appropriately sized packets in hardware
The Sockets Direct Protocol (SDP) provides RDMA capabilities
–Bypasses the operating system’s TCP/IP stack
–Utilizes hardware flow control and an offloaded network and transport stack in addition to RDMA
–Operates in buffered-copy and zero-copy modes
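Both IPoIB and SDP keep the standard sockets interface, so an unmodified TCP application can use either. The sketch below, which assumes a hypothetical server at 10.0.0.1:5001 reachable over the IPoIB interface, is ordinary sockets code; under OFED the same binary can typically be redirected over SDP with the libsdp.so preload library rather than by changing the code (an assumption about the usual OFED setup, not something stated on the slide).

/* Plain TCP client over the sockets API.  Pointed at an address on the
 * IPoIB interface (e.g. ib0) it runs over IPoIB; run unchanged under a
 * sockets preload such as LD_PRELOAD=libsdp.so it can instead be carried
 * over SDP.  Address and port are placeholders. */
#include <stdio.h>
#include <unistd.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>

int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in srv = { .sin_family = AF_INET, .sin_port = htons(5001) };
    inet_pton(AF_INET, "10.0.0.1", &srv.sin_addr);   /* hypothetical IPoIB address */

    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) != 0) {
        perror("connect");
        return 1;
    }
    const char msg[] = "hello over IPoIB or SDP";
    send(fd, msg, sizeof(msg), 0);

    char buf[128];
    ssize_t n = recv(fd, buf, sizeof(buf), 0);
    printf("received %zd bytes\n", n);
    close(fd);
    return 0;
}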

Slide 8 (March 2010)
Background Information

Slide 9 (March 2010)
Background Information
InfiniBand operates in several transport modes:
–Reliable Connection (RC): connection-oriented, offering low-level reliability and in-order delivery
–Unreliable Datagram (UD): connectionless datagram transmission with no reliability guarantees
IB is capable of RDMA, which is made available to sockets applications through the SDP API, replacing TCP programming semantics (see the sketch below)
The additional IB modes, Unreliable Connection and Reliable Datagram, do not currently have hardware offloading implementations
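One practical consequence of the RC/UD split is visible when posting a send work request: an RC queue pair is bound to a single remote peer at connection time, while every UD work request must carry its own destination (address handle, remote QP number, and Q_Key). The fragment below is an illustrative libibverbs sketch, not code from the study; the caller is assumed to have already registered the buffer and, for UD, created the address handle.

/* Posting a send on an RC QP versus a UD QP (libibverbs sketch).
 * 'qp', 'mr', 'buf', and the UD addressing arguments are assumed to have
 * been set up earlier (QP transitions, memory registration, AH creation). */
#include <stddef.h>
#include <stdint.h>
#include <infiniband/verbs.h>

int post_send(struct ibv_qp *qp, struct ibv_mr *mr, void *buf, size_t len,
              struct ibv_ah *ah, uint32_t remote_qpn, uint32_t remote_qkey)
{
    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };
    struct ibv_send_wr wr = {
        .sg_list    = &sge,
        .num_sge    = 1,
        .opcode     = IBV_WR_SEND,
        .send_flags = IBV_SEND_SIGNALED,
    };

    if (qp->qp_type == IBV_QPT_UD) {
        /* UD: every work request carries its own destination. */
        wr.wr.ud.ah          = ah;
        wr.wr.ud.remote_qpn  = remote_qpn;
        wr.wr.ud.remote_qkey = remote_qkey;
    }
    /* RC: the destination was fixed when the QP was connected. */

    struct ibv_send_wr *bad;
    return ibv_post_send(qp, &wr, &bad);
}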

Slide 10 (March 2010)
Outline
Motivation
Background Information
Experimental Framework
Experimental Results
–Baseline Performance
–Offloading Performance
–Data Center Performance Results
–Performance Bottleneck Investigation
–Validation
Conclusions
–Future Work
Questions

Slide 11 (March 2010)
Experimental Framework
OS: Fedora (Kernel –)
Processors: 2.0 GHz Quad-Core AMD Opteron
InfiniBand HCA: ConnectX 4X DDR HCA (Firmware:)
Switch: Mellanox 24-port MT47396 InfiniScale-III
OFED version: 1.4
Network performance data was collected using Netperf
Performance was validated using iperf-2.0.4

Slide 12 (March 2010)
Baseline Performance Results
As expected, Verbs RDMA and SDP (RDMA) show major latency advantages over IPoIB for small messages

Slide 13 (March 2010)
Baseline Multi-Stream Performance
The use of multiple streams shows good performance for IPoIB-RC/UD; multi-stream IPoIB-RC/UD and SDP buffered-copy are comparable

Slide 14 (March 2010)
Offloading Performance
IPoIB-UD with offloading provides latency similar to that of IPoIB-RC

Slide 15 (March 2010)
Offloading Performance
Although offloaded IPoIB-UD outperforms non-offloaded IPoIB-UD, in the single-stream case it is still outperformed by IPoIB-RC

Slide 16 (March 2010)
Offloading Performance
With multiple streams, IPoIB-UD-LRO-LSO outperforms IPoIB-RC and greatly outperforms non-offloaded IPoIB-UD

Slide 17 (March 2010)
Offloading Performance
Offloaded IPoIB-UD provides an 85.1% improvement in bandwidth over non-offloaded IPoIB-UD
Offloaded IPoIB-UD outperforms multi-stream IPoIB-RC by 7.1% while providing similar latency
Offloaded IPoIB-UD provides bandwidth only 6.5% lower than that of SDP

Slide 18 (March 2010)
Data Center Performance

Slide 19 (March 2010)
Data Center Performance
The data center throughput results show that IPoIB-UD-LRO-LSO maintains the highest throughput, while SDP is unexpectedly the worst performer of the group

Slide 20 (March 2010)
Data Center Performance
Panels: IPoIB-UD-LRO-LSO, IPoIB-RC, IPoIB-UD-noLRO-noLSO, SDP

Slide 21 (March 2010)
Data Center Performance
IPoIB-UD-LRO-LSO provides the highest sustained bandwidth of all the protocols, beating non-offloaded IPoIB-UD by 15.4%, IPoIB-RC by 5.8%, and SDP by 29.1%
IPoIB-UD-LRO-LSO provides response times similar to its nearest competitor, IPoIB-RC
All of the IPoIB configurations provide higher bandwidth than SDP

Slide 22 (March 2010)
Performance Bottleneck Investigation
SDP shows poor throughput and latency, much worse than would initially be expected
Given the excellent performance of SDP at the micro-benchmark level, several tests were conducted to determine the cause of SDP’s poor performance in the data center test
It was determined that the large number of simultaneous connections was causing the poor SDP performance
To confirm this analysis, the number of connections used by the SDP data center was reduced while the activity level of each connection was increased (see the sketch below)
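As a rough illustration of that confirmation step (not the authors' actual test harness), the sketch below replays a fixed number of requests over a configurable pool of persistent connections, so the same offered load can be driven through many lightly used connections or, as in the follow-up experiment, roughly 50 heavily used ones; the server address, port, and request format are hypothetical.

/* Replay a fixed workload over a configurable number of persistent
 * connections.  With num_conns large, each connection is lightly used
 * (as in the original data center runs); with num_conns small (e.g. 50),
 * each connection carries far more traffic.  Illustrative sketch only. */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>

static int open_conn(const char *ip, int port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in srv = { .sin_family = AF_INET, .sin_port = htons(port) };
    inet_pton(AF_INET, ip, &srv.sin_addr);
    if (connect(fd, (struct sockaddr *)&srv, sizeof(srv)) != 0) {
        perror("connect");
        exit(1);
    }
    return fd;
}

int main(int argc, char **argv)
{
    int num_conns  = (argc > 1) ? atoi(argv[1]) : 50;  /* e.g. 50 vs. hundreds */
    int total_reqs = 100000;                           /* fixed offered load   */
    int *fds = malloc(sizeof(int) * num_conns);

    for (int i = 0; i < num_conns; i++)
        fds[i] = open_conn("10.0.0.1", 8080);          /* hypothetical server  */

    char reply[4096];
    const char req[] = "REQ\n";                        /* hypothetical request */
    for (int r = 0; r < total_reqs; r++) {
        int fd = fds[r % num_conns];                   /* round-robin reuse    */
        send(fd, req, sizeof(req) - 1, 0);
        recv(fd, reply, sizeof(reply), 0);
    }

    for (int i = 0; i < num_conns; i++)
        close(fds[i]);
    free(fds);
    return 0;
}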

Slide 23 (March 2010)
Performance Investigation
With the number of connections reduced to 50, SDP performance increases greatly, to a level more in line with expectations

Slide 24 (March 2010)
Performance Investigation
Panels: SDP delay with many connections; SDP delay with fewer connections

Slide 25 (March 2010)
Performance Validation
IPoIB and SDP show performance results on SPECWeb similar to those seen with the TPC-W benchmarks

Slide 26 (March 2010)
Performance Validation
The SPECWeb response time results show that IPoIB-UD-LRO-LSO has the lowest overall response times

Slide 27 (March 2010)
Outline
Motivation
Background Information
Experimental Framework
Experimental Results
–Baseline Performance
–Offloading Performance
–Data Center Performance Results
–Performance Bottleneck Investigation
–Validation
Conclusions
–Future Work
Questions

Slide 28 (March 2010)
Conclusions
Micro-benchmarks have shown an 85.1% improvement in bandwidth for offloaded IPoIB-UD over the non-offloaded case, and a maximum latency reduction of 26.2%
Offloaded IPoIB-UD shows a 15.4% improvement in throughput over non-offloaded IPoIB-UD
IPoIB-UD-LRO-LSO has 29.1% higher throughput than SDP in our data center testing
The benefits of IPoIB-RC over IPoIB-UD are minimal when offloading capabilities are utilized
–Therefore, for future networks such as CEE, the inclusion of a reliable connection mode is most likely unnecessary

Slide 29 (March 2010)
Future Work
Resolving the issues holding back SDP performance when using large numbers of connections
Utilizing Quality of Service to further enhance enterprise data center performance
Combining IPoIB-UD, QoS and Virtual Protocol Interconnect to improve overall data center performance

Slide 30 (March 2010)
Thank You
Questions?