iWARP Ethernet: Key to Driving Ethernet into the Future
Brian Hausauer, Chief Architect, NetEffect, Inc.


Agenda
- Situation overview
- Performance considerations: networking, applications
- New generation of adapters
- Performance discussion and demo
- Wrap up

Data Center Evolution: Separate Fabrics for Networking, Storage, and Clustering
Today's data center attaches each server to three separate fabrics, each with its own adapters and switches:
- Networking (LAN): Ethernet, connecting applications to users and NAS
- Block storage (SAN): Fibre Channel
- Clustering: Myrinet, Quadrics, InfiniBand, etc.
iWARP Ethernet can carry each of these traffic classes.
[Diagram: applications on each server connect through per-fabric adapters to separate networking, storage, and clustering switches.]

Converged Fabric for Networking, Storage, and Clustering: Single Adapter for All Traffic
With converged iWARP Ethernet, one adapter per server blade and one switch fabric carry networking, storage, and clustering traffic, connecting users, NAS, and the iWARP Ethernet SAN. Benefits:
- Smaller footprint
- Lower complexity
- Higher bandwidth
- Lower power
- Lower heat dissipation
[Diagram: server blades, each with an iWARP adapter, connected through an iWARP switch to users, NAS, and the SAN.]

Networking Performance Barriers
With a standard Ethernet NIC, every I/O command passes from the application through the I/O library, the OS, the kernel TCP/IP stack, and the device driver, and the data is copied through app, OS, driver, and adapter buffers along the way. The resulting CPU overhead (100%) breaks down roughly as:
- Application-to-OS context switches: 40%
- Transport (TCP/IP packet) processing: 40%
- Intermediate buffer copies: 20%
[Diagram: user, kernel, and hardware layers of the server software stack, with the three barriers highlighted: packet processing, intermediate buffer copies, and command context switches.]

Eliminate Networking Performance Barriers with iWARP
The NetEffect NE010 iWARP Ethernet adapter removes each source of overhead in turn:
- Transport (TCP) offload eliminates the 40% spent on TCP/IP packet processing
- RDMA / DDP eliminates the 20% spent on intermediate buffer copies by placing data directly in application buffers
- User-level direct access / OS bypass eliminates the 40% spent on application-to-OS context switches
[Diagram: the same software stack as the previous slide, with CPU overhead shrinking from 100% to 60% to 40% and finally toward zero as each offload is applied, while the wire still carries standard Ethernet TCP/IP packets.]

Application Performance Barriers in Today's Data Center
- A non-overlapped socket send() usually means data is copied before it is transmitted on the wire
- On receive, transaction control info and data payloads are usually multiplexed on a single byte stream
- To avoid an additional buffer copy on receive, the application often does not pre-post receive buffers

Application Performance Solutions in Tomorrow's Data Center
- Windows already provides overlapped I/O to solve the copy-on-transmit problem
- Eliminating the copy on receive requires the application to be RDMA-aware for typical transaction protocols

Legacy Sockets App Performance Barrier: Non-Overlapped Socket send()
With a non-overlapped send, the application issues socket send #1 and blocks; the local OS builds TCP/IP data packets and the NIC transmits them; the remote server's OS receives the data packets and returns ACK packets; only when the local OS receives the ACKs does it unblock the application, which can then issue socket send #2 and block again. The application is blocked for the entire exchange.
OSes typically eliminate this application blocking by copying the application's socket send data into kernel buffers.
[Timeline diagram: local server, network, and remote server lanes showing the sends, NIC Tx/Rx, ACK traffic, and the period during which the application is blocked.]
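To make the blocking behavior concrete, here is a minimal sketch of the legacy pattern using the standard Winsock send() call; the socket is assumed to be already connected, and the names are illustrative:

    #include <winsock2.h>

    /* Legacy, non-overlapped send: each send() call blocks the calling
     * thread until the kernel has accepted the data, which normally means
     * copying it into kernel socket buffers (the copy-on-transmit cost
     * discussed above). */
    int send_blocking(SOCKET s, const char *buf, int len)
    {
        int sent = 0;
        while (sent < len) {
            int n = send(s, buf + sent, len - sent, 0);
            if (n == SOCKET_ERROR)
                return SOCKET_ERROR;   /* caller inspects WSAGetLastError() */
            sent += n;
        }
        return sent;
    }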

Enhanced Sockets App Performance Fix: Winsock2 Overlapped Socket send()
With Winsock2 overlapped I/O, the application issues overlapped socket send #1 and can immediately issue overlapped socket send #2 without blocking. For each send, the OS builds TCP/IP data packets, the NIC transmits them, the remote server's OS receives the data and returns ACK packets, and when the ACKs arrive the local OS notifies the application that the send has completed. The application is no longer blocked while the sends are in flight.
[Timeline diagram: the same local server / network / remote server lanes as the previous slide, now with two overlapped sends in flight and completion notifications replacing the blocked period.]
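The overlapped pattern looks roughly like the sketch below, using the documented Winsock2 calls WSASend and WSAGetOverlappedResult; socket setup and full error handling are omitted, and the variable names are illustrative:

    #include <winsock2.h>

    /* Post an overlapped send: the call returns immediately (or with
     * WSA_IO_PENDING) and the kernel transmits the data asynchronously,
     * so the application thread is free to keep working. */
    int post_overlapped_send(SOCKET s, char *buf, DWORD len, WSAOVERLAPPED *ov)
    {
        WSABUF wbuf;
        DWORD sent = 0;

        wbuf.buf = buf;
        wbuf.len = len;
        ZeroMemory(ov, sizeof(*ov));
        ov->hEvent = WSACreateEvent();          /* signaled on completion */

        if (WSASend(s, &wbuf, 1, &sent, 0, ov, NULL) == SOCKET_ERROR &&
            WSAGetLastError() != WSA_IO_PENDING)
            return SOCKET_ERROR;                /* real failure */
        return 0;                               /* send is in flight */
    }

    /* Later, after doing other useful work, harvest the completion. */
    int wait_for_send(SOCKET s, WSAOVERLAPPED *ov, DWORD *bytes_sent)
    {
        DWORD flags = 0;
        WSAWaitForMultipleEvents(1, &ov->hEvent, TRUE, WSA_INFINITE, FALSE);
        if (!WSAGetOverlappedResult(s, ov, bytes_sent, FALSE, &flags))
            return SOCKET_ERROR;
        WSACloseEvent(ov->hEvent);
        return 0;
    }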

Legacy Sockets App Performance Barrier: No Pre-Posted Socket recv()
Transaction protocols such as iSCSI multiplex control information and data payloads on a single byte stream, so a legacy sockets application must receive each transaction in two software steps:

    // Pseudocode showing the legacy sockets app receive algorithm
    while (1) {
        post socket recv() to obtain the transaction control message;
        identify the pre-allocated app buffer pertaining to the received control message;
        post socket recv() to move the transaction data payload into the identified buffer;
    }

[Diagram: a byte stream of interleaved control messages and data payloads being demultiplexed in software into application buffers in host memory.]
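A minimal C sketch of that two-step receive loop appears below, using blocking Winsock recv() calls; the ctrl_msg layout (buffer_id, payload_len) is purely illustrative and not a real protocol header:

    #include <winsock2.h>

    /* Illustrative control message: says which pre-allocated application
     * buffer the following payload belongs to and how long it is. */
    struct ctrl_msg {
        unsigned int buffer_id;     /* index into app_buffers[] */
        unsigned int payload_len;   /* bytes of payload that follow */
    };

    /* Read exactly len bytes from the byte stream (blocking). */
    static int recv_all(SOCKET s, char *dst, int len)
    {
        int got = 0;
        while (got < len) {
            int n = recv(s, dst + got, len - got, 0);
            if (n <= 0)
                return -1;
            got += n;
        }
        return 0;
    }

    /* Legacy receive loop: software must parse each control message before
     * it knows where the payload goes, so payload buffers cannot be
     * pre-posted and an extra software pass sits on the latency path of
     * every transaction. */
    void legacy_receive_loop(SOCKET s, char *app_buffers[], unsigned int nbufs)
    {
        struct ctrl_msg msg;

        for (;;) {
            if (recv_all(s, (char *)&msg, (int)sizeof(msg)) < 0)   /* step 1 */
                break;
            if (msg.buffer_id >= nbufs)
                break;                                             /* bad id */
            if (recv_all(s, app_buffers[msg.buffer_id],            /* step 2 */
                         (int)msg.payload_len) < 0)
                break;
        }
    }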

RDMA-Aware Sockets App Performance Fix: Use Direct Data Placement (DDP)
- The intelligent NIC uses the iWARP headers embedded in the packets to place data payloads directly into pre-allocated application buffers in host memory
- Control messages land in buffers pre-posted on the iWARP receive queue
- This eliminates the software latency loop of the legacy sockets receive algorithm
[Diagram: the same interleaved stream of control messages and data payloads, now demultiplexed by the NIC itself into application buffers and pre-posted receive-queue buffers.]
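As an illustration of the RDMA-aware pattern, the sketch below registers a control-message buffer and pre-posts it on the receive queue using the OpenFabrics verbs API (libibverbs), which iWARP RNICs also expose; the protection domain and queue pair are assumed to have been created already, and CTRL_MSG_SIZE is illustrative:

    #include <infiniband/verbs.h>
    #include <stdint.h>
    #include <stdlib.h>

    #define CTRL_MSG_SIZE 128   /* illustrative control-message size */

    /* Register a buffer and pre-post it on the receive queue. Payload
     * buffers are registered the same way; the peer then writes payloads
     * straight into them with RDMA Write / DDP, so the NIC, not software,
     * decides where each payload lands. */
    int prepost_ctrl_buffer(struct ibv_pd *pd, struct ibv_qp *qp, uint64_t id)
    {
        void *buf = malloc(CTRL_MSG_SIZE);
        struct ibv_mr *mr;
        struct ibv_sge sge;
        struct ibv_recv_wr wr, *bad_wr = NULL;

        if (!buf)
            return -1;

        /* Pin and register the memory so the RNIC can DMA into it. */
        mr = ibv_reg_mr(pd, buf, CTRL_MSG_SIZE, IBV_ACCESS_LOCAL_WRITE);
        if (!mr) {
            free(buf);
            return -1;
        }

        sge.addr   = (uintptr_t)buf;
        sge.length = CTRL_MSG_SIZE;
        sge.lkey   = mr->lkey;

        wr.wr_id   = id;        /* returned in the completion for this buffer */
        wr.next    = NULL;
        wr.sg_list = &sge;
        wr.num_sge = 1;

        /* Pre-post the receive: the NIC now owns the buffer and will fill
         * it with the next incoming control message, with no intermediate
         * software copy. */
        return ibv_post_recv(qp, &wr, &bad_wr);
    }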

Networking Performance Continuum
- Layer 2 traditional NIC only: legacy sockets apps and enhanced sockets apps (available now)
- RDMA-enabled NIC supporting Winsock Direct (WSD): legacy sockets apps and enhanced sockets apps (available now)
- RDMA-enabled NIC supporting RDMA Chimney: RDMA-aware sockets apps (future Windows Server release)

Ethernet Adapters Are Evolving to Require...
- Networking offloads defined by the RDMAC and IETF iWARP extensions to TCP/IP: transport (TCP) offload, RDMA / DDP, and user-level direct access / OS bypass
- The ability to eliminate both networking and application performance barriers
- Simultaneous support for traditional sockets and RDMA-aware applications
- Industry-standard hardware and software interfaces
- Performance: more than 1 million messages per second, less than 10% CPU utilization, less than 10 us end-to-end application latency
- Scalability: hundreds of thousands of simultaneous connections, an architecture that scales to multiple 10 Gb ports
[Photo: NetEffect NE010 10 Gb iWARP Ethernet Channel Adapter]

iWARP Demonstration
- An enhanced sockets application running on iWARP hardware through Winsock Direct
- An RDMA-enabled application running on iWARP hardware through iWARP Verbs, emulating an RDMA-aware sockets application
[Diagram: two servers, each with an NE010 iWARP Ethernet adapter, connected back to back.]

Network Application Performance: Unidirectional Bandwidth vs. Message Size
[Chart: throughput in Gb/s versus message size in KB for five configurations: NetEffect WSD overlapped I/O, NetEffect WSD non-overlapped I/O, NetEffect RDMA-aware app, host stack overlapped I/O, and host stack non-overlapped I/O; the PCI-X bus bandwidth limit is marked.]

Network Application CPU Utilization: Gbits per CPU GHz vs. Message Size
[Chart: Gbits per CPU GHz versus message size in KB for host stack overlapped I/O, NetEffect WSD overlapped I/O, and the NetEffect RDMA-aware app.]
Conventional wisdom: a traditional NIC with the host stack is capable of roughly 1 Gb per x86 CPU GHz.

Takeaways
iWARP Ethernet Channel Adapters:
- Eliminate networking barriers
- Support Microsoft's advanced APIs, enabling application evolution for performance
NetEffect iWARP Ethernet Channel Adapters:
- Industry-leading 10 Gb Ethernet throughput, CPU utilization, and latency
- Available now

Call to Action
- Deploy Winsock Direct with iWARP RDMA to boost the performance of existing applications
- Plan for the convergence of networking, storage, and clustering enabled by 10 Gb iWARP Ethernet Channel Adapters
- Develop RDMA-aware applications for optimal performance

Additional Resources
Web Resources
- NetEffect:
- iWARP Consortium:
Specs
- RDMA Consortium:
- IETF RDDP WG: www.ietf.org/html.charters/rddp-charter.html
White Papers
- Asynchronous Zero-copy Communication for Synchronous Sockets: nowlab.cse.ohio-state.edu/publications/conf-papers/2006/balaji-cac06.pdf
Contact info: neteffect.com