Slide 1: Infiniband in the Data Center
Steven Carter, Cisco Systems
Makia Minich, Nageswara Rao, Oak Ridge National Laboratory

Slide 2: Agenda
- Overview
- The Good, The Bad, and The Ugly
- IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences
- IB WAN Case Study: Department of Energy's UltraScience Network

Slide 3: Overview
- Data movement requirements are, once again, exploding in the HPC community (sensors produce more data, larger computers compute with higher accuracy, disk subsystems are bigger/faster, etc.)
- The requirement to move hundreds of GB/s within the data center (the rates currently proposed for many of the new petascale systems) necessitates something more than the Ethernet community currently provides
- There is also a requirement to move large amounts of data between data centers; TCP/IP does not adequately meet this need because of its poor wide-area characteristics
- This talk is a high-level overview of the pros and cons of using Infiniband to meet these needs, with two case studies to reinforce them

Slide 4: Agenda
- Overview
- Infiniband: The Good, The Bad, and The Ugly
- IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences
- IB WAN Case Study: Department of Energy's UltraScience Network

Slide 5: The Good
- Cool name (marketing gets an A+ -- who doesn't want infinite bandwidth?)
- Unified fabric / IO virtualization:
  - Low-latency interconnect – nanoseconds, not low microseconds – not necessarily important in a data center
  - Storage – using SRP (SCSI RDMA Protocol) or iSER (iSCSI Extensions for RDMA)
  - IP – using IPoIB; newer versions run over Connected Mode, giving better throughput
  - Gateways – gateways give access to legacy Ethernet (careful) and Fibre Channel networks

Slide 6: The Good (cont.)
- Faster link speeds (a short sketch of the rate arithmetic follows below):
  - 1x Single Data Rate (SDR) = 2.5 Gb/s signaling (2 Gb/s of data after 8b/10b encoding)
  - Four 1x links can be aggregated into a single 4x link
  - Three 4x links can be aggregated into a single 12x link (single 12x links are also available)
  - Double Data Rate (DDR) is currently available; Quad Data Rate (QDR) is on the horizon
  - Many link speeds available: 8 Gb/s, 16 Gb/s, 24 Gb/s, 32 Gb/s, 48 Gb/s, etc.
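To make the rate arithmetic on this slide concrete, here is a minimal sketch (written for this transcript, not taken from the talk) that derives the signaling and data rates from the per-lane rate, the link width, and the 8b/10b encoding overhead used by the SDR/DDR/QDR generations:

```c
/* Minimal sketch: derive InfiniBand link rates from lane count and
 * per-lane signaling rate, assuming 8b/10b encoding (SDR/DDR/QDR era). */
#include <stdio.h>

int main(void)
{
    const char  *gen[]       = {"SDR", "DDR", "QDR"};
    const double lane_gbps[] = {2.5, 5.0, 10.0};   /* per-lane signaling rate */
    const int    widths[]    = {1, 4, 12};         /* link widths             */

    for (int g = 0; g < 3; g++)
        for (int w = 0; w < 3; w++)
            printf("%2dx %s: %5.1f Gb/s signaling, %5.1f Gb/s data (8b/10b)\n",
                   widths[w], gen[g],
                   widths[w] * lane_gbps[g],               /* raw line rate    */
                   widths[w] * lane_gbps[g] * 8.0 / 10.0); /* usable data rate */
    return 0;
}
```

With these inputs, 4x SDR works out to 10 Gb/s signaling and 8 Gb/s of data, 4x DDR to 16 Gb/s of data, and 12x QDR to 96 Gb/s of data, which is where the figures on the slide come from.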

Slide 7: The Good (cont.)
- The HCA does much of the heavy lifting (a minimal code sketch follows below):
  - Much of the protocol is done on the Host Channel Adapter (HCA), heavily leveraging DMA
  - Remote Direct Memory Access (RDMA) gives the ability to transfer data between hosts with very little CPU overhead
  - RDMA capability is EXTREMELY important because it provides significantly greater capability from the same hardware
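As an illustration of how little the host has to do, the following is a minimal libibverbs sketch that posts a one-sided RDMA WRITE. It assumes an already-connected reliable-connection (RC) queue pair and a peer that has shared its buffer address and rkey out of band; the function name and structure are illustrative, not code from the ORNL deployment.

```c
/* Hedged sketch: post a one-sided RDMA WRITE with libibverbs.
 * Assumes `qp` is an already-connected RC queue pair and the peer's
 * remote_addr/rkey were exchanged out of band. */
#include <stdint.h>
#include <string.h>
#include <infiniband/verbs.h>

int rdma_write_buffer(struct ibv_pd *pd, struct ibv_qp *qp,
                      void *buf, size_t len,
                      uint64_t remote_addr, uint32_t rkey)
{
    /* Register the local buffer so the HCA can DMA directly from it. */
    struct ibv_mr *mr = ibv_reg_mr(pd, buf, len, IBV_ACCESS_LOCAL_WRITE);
    if (!mr)
        return -1;

    struct ibv_sge sge = {
        .addr   = (uintptr_t)buf,
        .length = (uint32_t)len,
        .lkey   = mr->lkey,
    };

    struct ibv_send_wr wr, *bad_wr = NULL;
    memset(&wr, 0, sizeof(wr));
    wr.opcode              = IBV_WR_RDMA_WRITE;  /* one-sided: no CPU on the peer */
    wr.sg_list             = &sge;
    wr.num_sge             = 1;
    wr.send_flags          = IBV_SEND_SIGNALED;
    wr.wr.rdma.remote_addr = remote_addr;
    wr.wr.rdma.rkey        = rkey;

    /* The HCA performs the transfer; the host only posts the descriptor.
     * A real program would poll the completion queue and then
     * ibv_dereg_mr(mr); both are omitted here for brevity. */
    return ibv_post_send(qp, &wr, &bad_wr);
}
```

The CPU's involvement essentially ends once the work request is posted; the HCA moves the data and reports completion through a completion queue.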

Slide 8: The Good (cont.)
- Nearly 10x lower cost for similar bandwidth:
  - Because of its simplicity, IB switches cost less; oddly enough, IB HCAs are more complex than 10G NICs but are also less expensive
  - Roughly $500 per switch port and $500 for a dual-port DDR HCA
  - Because of RDMA, there is a cost savings in infrastructure as well (i.e., you can do more with fewer hosts)
- Higher port density switches:
  - Switches available with 288 (or more) full-rate ports in a single chassis

Slide 9: The Bad
- IB sounds too much like IP (can quickly degrade into a "Who's on first?" routine)
- IB is not well understood by networking folks
- IB lacks some of the Ethernet features that are important in the data center:
  - Router – no way to natively connect two separate fabrics. The IB Subnet Manager (SM) is integral to the operation of the network (it detects hosts, programs routes into the switches, etc.). Without a router, you cannot have two different SMs for different operational or administrative domains (this can be worked around at the application layer)
  - Firewall – no way to dictate who talks to whom by protocol (partitions exist, but are too coarse-grained)
  - Protocol analyzers – they exist but are hard to come by, and it is difficult to "roll your own" because the protocol is embedded in the HCA

Slide 10: The Ugly
- Cabling options:
  - Heavy-gauge copper cables with clunky CX4 connectors:
    - Short distance (< 20 meters)
    - If mishandled, they have a propensity to fail
    - Heavy connectors can become disengaged
  - Electrical-to-optical converters:
    - Long distance (up to 150 meters)
    - Use multi-core ribbon fiber (hard to debug)
    - Expensive
    - Heavy connectors can become disengaged

Slide 11: The Ugly (cont.)
- Cabling options (cont.):
  - Electrical-to-optical converter built onto the cable:
    - Long distance (up to 100 meters)
    - Uses multi-core ribbon fiber (hard to debug)
    - More cost effective than other solutions
    - Heavy connectors can become disengaged

Slide 12: Agenda
- Overview
- Infiniband: The Good, The Bad, and The Ugly
- IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences
- IB WAN Case Study: Department of Energy's UltraScience Network

Slide 13: Case Study: ORNL Center for Computational Sciences (CCS)
- The Department of Energy established the Leadership Computing Facility at ORNL's Center for Computational Sciences to field a 1 PF supercomputer
- The design chosen, the Cray XT series, includes an internal Lustre filesystem capable of sustaining reads and writes of 240 GB/s
- The problem with making the filesystem part of the machine is that it limits the flexibility of the Lustre filesystem and increases the complexity of the Cray
- The problem with decoupling the filesystem from the machine is the high cost of connecting it via 10GE at the required speeds

© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public Infiniband in the Data Center 14 Ethernet [O(10GB/s)] Ethernet core scaled to match wide-area connectivity and archive Infiniband core scaled to match central file system and data transfer CCS IB Network Roadmap Summary Viz High-Performance Storage System (HPSS) Jaguar Lustre Baker Infiniband [O(100GB/s)] Gateway

© 2007 Cisco Systems, Inc. All rights reserved.Cisco Public Infiniband in the Data Center 15 Spider (Linux Cluster) Rizzo (XT3) IB Switch ORNL showed the first successful infiniband implementation on the XT3 Using Infiniband in the XT3’s I/O nodes running a Lustre Router resulted in a > 50% improvement in performance and a significant decrease in CPU utilization XT3 LAN Testing

Slide 16: Observations
- The XT3's RDMA performance is good (better than 10GE)
- The XT3's poor performance compared to the generic x86_64 host is likely a result of its PCI-X HCA (known to be sub-optimal)
- In its role as a Lustre router, IB allows significantly better performance per I/O node, allowing CCS to achieve the required throughput with fewer nodes than would be needed with 10GE

Slide 17: Agenda
- Overview
- Infiniband: The Good, The Bad, and The Ugly
- IB LAN Case Study: Oak Ridge National Laboratory Center for Computational Sciences
- IB WAN Case Study: Department of Energy's UltraScience Network

Slide 18: IB over WAN Testing
- Placed two Obsidian Longbow devices between two test hosts
- Provisioned loopback circuits of various lengths on the DOE UltraScience Network and ran tests
- RDMA test results:
  - Local (Longbow to Longbow): 7.5 Gb/s
  - ORNL to ORNL (0.2 miles): 7.5 Gb/s
  - ORNL to Chicago (1,400 miles): 7.46 Gb/s
  - ORNL to Seattle (6,600 miles): 7.23 Gb/s
  - ORNL to Sunnyvale (8,600 miles): 7.2 Gb/s
[Diagram: Host – 4x Infiniband SDR – Obsidian Longbow – Ciena CD-CI (ORNL) – DOE UltraScience Network (OC-192 SONET) – Ciena CD-CI (SNV) – Obsidian Longbow – Host.]

Slide 19: Sunnyvale Loopback (8,600 miles) – RC Problem

Slide 20: Observations
- The Obsidian Longbows appear to be extending sufficient link-level credits
- The native IB transport does not appear to suffer from the same wide-area shortcomings as TCP (i.e., full rate with no tuning)
- With the Arbel-based HCAs, we saw problems (a back-of-the-envelope sketch follows below):
  - RC only performs well at large message sizes
  - There seems to be a maximum number of messages allowed in flight (~250)
  - RC performance does not increase rapidly enough even when the message cap is not an issue
- The problems seem to be fixed with the new Hermon-based HCAs
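A back-of-the-envelope calculation illustrates why a cap of roughly 250 outstanding messages forces large message sizes at these distances. The sketch below uses assumed numbers (about 7.5 Gb/s of goodput, roughly 5 µs of fiber latency per km, and the quoted mileage treated as the round-trip path); it is illustrative only, not a measurement from the talk.

```c
/* Illustrative: bandwidth-delay product and minimum message size needed to
 * keep an RDMA RC stream full when only ~250 messages may be outstanding.
 * All inputs are rough assumptions, not measured values. */
#include <stdio.h>

int main(void)
{
    const double rate_bps      = 7.5e9;  /* approximate RDMA goodput on 4x SDR     */
    const double us_per_km     = 5.0;    /* rough fiber propagation delay per km   */
    const double max_in_flight = 250.0;  /* observed cap on outstanding messages   */
    /* Loop lengths roughly matching the tests: 0.2, 1400, 6600, 8600 miles. */
    const double loop_km[] = {0.3, 2250.0, 10600.0, 13800.0};

    for (int i = 0; i < 4; i++) {
        double rtt_s = loop_km[i] * us_per_km * 1e-6;  /* mileage taken as round trip */
        double bdp_b = rate_bps * rtt_s / 8.0;         /* bytes that must be in flight */
        printf("%8.0f km loop: RTT %6.2f ms, BDP %7.2f MB, "
               "min message %7.0f KB for %.0f in flight\n",
               loop_km[i], rtt_s * 1e3, bdp_b / 1e6,
               bdp_b / max_in_flight / 1e3, max_in_flight);
    }
    return 0;
}
```

With these assumptions, the Sunnyvale loop needs on the order of 65 MB in flight, so each of 250 outstanding messages must be a few hundred kilobytes or more, which is consistent with RC only performing well at large message sizes.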

Slide 21: Obsidian's Results – Arbel vs. Hermon
[Plots: Arbel to Hermon; Hermon to Arbel.]

Slide 22: Summary
- Infiniband has the potential to make a great data center interconnect because it provides a unified fabric, faster link speeds, a mature RDMA implementation, and lower cost
- There does not appear to be the same intrinsic problem with IB in the wide area as there is with IP/Ethernet, making IB a good candidate for transferring data between data centers

Slide 23: The End
Questions? Comments? Criticisms?
For more information:
Steven Carter, Cisco Systems
Makia Minich, Nageswara Rao, Oak Ridge National Laboratory