Commodity Ethernet Clusters - Maximizing Your Performance

Presentation transcript:

Commodity Ethernet Clusters - Maximizing Your Performance
Bob Felderman, CTO, Precision I/O (Performance Networking™)

Ethernet Clusters: What do you think of?

Ethernet Clusters: What you'd like to see
High-performance clusters using commodity Ethernet technology to accomplish HPTC (high-performance technical computing) tasks.

Ethernet Clusters: The Reality
- Rack-'n'-stack systems with Gigabit Ethernet switches, generally 1U or 2U.
- Often home-grown, especially at universities, but certainly available pre-configured from many vendors (dual GigE is standard on many motherboards).
- Most clusters installed today are GigE.
- Blade-based system backplanes are now GigE and InfiniBand too, with some PCI Express coming soon.

Why so few “big iron” Ethernet clusters?
- Performance, performance, performance:
  - Bandwidth: is 1 GigE sufficient?
  - Latency: the time to get a message from here to there.
  - Overhead: the work the CPU has to do per message.
- Switch issues: latency, size (number of ports), and flow control (XON/XOFF messages versus hardware flow control).
- What about price/performance? Does it matter? Consider both acquisition cost and the ongoing cost of operation.
- Ethernet clusters are employed successfully today for embarrassingly parallel problems or for small-scale parallel tasks.

Bandwidth: Is 1GigE enough?
- The dominant cluster interconnect today is Myrinet at 2 Gig.
- 10-Gigabit Ethernet is available today from Intel, Neterion, Chelsio, and others.
- Various physical media:
  - Fiber is common (and expensive).
  - Copper will happen in some form: CX-4 is used by IB and is appearing in 10GigE.
  - Twisted pair, a must for economical deployment? Fat copper versus thin fiber.
- There are some standardization efforts at 2 Gig and 5 Gig, but they are not likely to happen unless the PHY is the big issue.

Latency/Overhead (this is the problem)
- Ethernet latency is measured via the standard sockets kernel path, which has traditionally been expensive: system calls, interrupts, and kernel locks (see the sketch below).
- It may no longer be so bad: a 3 GHz Xeon running linux-2.4.20 achieves about 20 usec one-way latency via the kernel path.
- But that 20 usec is mostly time the CPU is busy sending or receiving (overhead!).
- Quoted Ethernet latency is highly CPU-speed dependent; IB, Myrinet, and Quadrics latencies are mostly independent of the host CPU.
(Diagram: a user process crossing the boundary into the OS and protocol stack, with boundary crossings, locking, and interrupts on the way down to the NIC.)
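To make the kernel-path cost concrete, here is a minimal TCP ping-pong latency sketch over ordinary sockets. It is illustrative and not part of the original talk; the port number, iteration count, and omission of error handling are arbitrary assumptions.

    /* Minimal TCP ping-pong latency sketch over the standard sockets kernel
       path (illustrative, not from the talk; no error handling).
       Build:  cc -O2 -o pingpong pingpong.c
       Run:    ./pingpong server         (on one node)
               ./pingpong <server-ip>    (on the other node)                    */
    #include <arpa/inet.h>
    #include <netinet/in.h>
    #include <netinet/tcp.h>
    #include <stdio.h>
    #include <string.h>
    #include <sys/socket.h>
    #include <sys/time.h>
    #include <unistd.h>

    #define PORT  9000      /* arbitrary choice */
    #define ITERS 10000     /* arbitrary choice */

    static double now_usec(void)
    {
        struct timeval tv;
        gettimeofday(&tv, NULL);
        return tv.tv_sec * 1e6 + tv.tv_usec;
    }

    int main(int argc, char **argv)
    {
        int fd, i, one = 1, is_server;
        char byte = 0;
        struct sockaddr_in addr;

        if (argc < 2) {
            fprintf(stderr, "usage: %s server | %s <server-ip>\n", argv[0], argv[0]);
            return 1;
        }
        is_server = (strcmp(argv[1], "server") == 0);

        memset(&addr, 0, sizeof(addr));
        addr.sin_family = AF_INET;
        addr.sin_port   = htons(PORT);

        if (is_server) {
            int lfd = socket(AF_INET, SOCK_STREAM, 0);
            addr.sin_addr.s_addr = htonl(INADDR_ANY);
            setsockopt(lfd, SOL_SOCKET, SO_REUSEADDR, &one, sizeof(one));
            bind(lfd, (struct sockaddr *)&addr, sizeof(addr));
            listen(lfd, 1);
            fd = accept(lfd, NULL, NULL);          /* wait for the client */
        } else {
            fd = socket(AF_INET, SOCK_STREAM, 0);
            inet_pton(AF_INET, argv[1], &addr.sin_addr);
            connect(fd, (struct sockaddr *)&addr, sizeof(addr));
        }
        /* Disable Nagle so each 1-byte message is sent immediately. */
        setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));

        double t0 = now_usec();
        for (i = 0; i < ITERS; i++) {
            if (is_server) {                       /* echo side */
                read(fd, &byte, 1);
                write(fd, &byte, 1);
            } else {                               /* initiating side */
                write(fd, &byte, 1);
                read(fd, &byte, 1);
            }
        }
        double t1 = now_usec();

        /* Each iteration is one round trip; report half as the one-way latency. */
        printf("one-way latency: %.1f usec\n", (t1 - t0) / (2.0 * ITERS));
        close(fd);
        return 0;
    }

Every read() and write() here is a system call into the kernel protocol stack, so the time reported is dominated by exactly the per-message CPU overhead this slide is concerned with.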

Latency/Overhead Solutions
- “TCP is heavyweight”: TCP Offload Engines (TOEs) move protocol processing to the NIC. But is protocol processing really the performance problem?
- Data movement is expensive: TOE plus RDMA NICs (RNICs) allow direct data placement, like VIA/InfiniBand, and also open up OS-bypass opportunities.
- The cache load is the expensive operation, so the RDMA benefit may be less than expected for HPTC applications; data wants to meet processing cycles.
- InfiniBand, Myrinet, Quadrics, etc. have direct user-level access to the network; it is much easier to get low latency that way.

What is the “Best” Host Interface/API?
- Traditionally, HPC programmers have been willing to try anything to get performance, but MPI is really the application programming paradigm of choice: MPI over GM, over IB, over VIA, etc.
- MPI over sockets (e.g., MPICH ch_p4) is the common API for Ethernet solutions (see the sketch below).
- If you want to leverage the mass market and commodity parts, then sockets/IP/Ethernet is the way to go, assuming you can get the performance you want/need; you improve non-HPC apps at the same time.
- CPU overhead is the important metric (UC Berkeley): R. Martin, A. Vahdat, D. Culler, T. Anderson, “Effects of Communication Latency, Overhead, and Bandwidth in a Cluster Architecture,” International Symposium on Computer Architecture (ISCA), Denver, CO, June 1997.
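For reference, the MPI-level measurement implied here can be sketched as a simple ping-pong between two ranks; with an MPI-over-sockets implementation such as MPICH ch_p4, everything under MPI_Send/MPI_Recv takes the kernel path discussed earlier. This is a hedged sketch rather than code from the talk; the message size and iteration count are arbitrary.

    /* Minimal MPI ping-pong latency sketch (illustrative, not from the talk).
       Build:  mpicc -O2 -o mpi_pingpong mpi_pingpong.c
       Run:    mpirun -np 2 ./mpi_pingpong                                      */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, i, iters = 10000;      /* iteration count is an arbitrary choice */
        char buf = 0;                    /* 1-byte message, also arbitrary */
        double t0, t1;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);     /* start both ranks together */
        t0 = MPI_Wtime();
        for (i = 0; i < iters; i++) {
            if (rank == 0) {             /* rank 0 initiates each round trip */
                MPI_Send(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {      /* rank 1 echoes */
                MPI_Recv(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(&buf, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)                   /* half the round trip = one-way latency */
            printf("one-way latency: %.2f usec\n",
                   (t1 - t0) * 1e6 / (2.0 * iters));

        MPI_Finalize();
        return 0;
    }

The same source runs unchanged over GM, IB, or sockets transports; only the MPI library underneath changes, which is exactly why MPI over sockets/IP/Ethernet is attractive if the performance is there.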

Ethernet: The Precision I/O Contribution
- 100% IP/Ethernet host-side solution: software-only at 1 Gig (in beta test today); hardware and software for 10GigE.
- Dramatically increases server application capacity: higher I/O throughput, improved CPU utilization and higher transaction capacity, lower latency.
- General-purpose: supports a broad range of enterprise-class applications and benefits both short transactions and streaming workloads.
- Non-disruptive: supports incremental deployment (one server at a time); only required on one end of the connection, so no client changes; no new fabric, protocols, packet formats, or infrastructure required; no application or operating system changes required.

Precision I/O: Solution Overview
(Diagram: a user application process and the Precision PacketFlow library in user space; the OS protocol stack, standard driver, and Precision PacketFlow driver in kernel space; a 1-Gigabit Ethernet NIC below.)
- Precision PacketFlow bypasses the OS, significantly speeding network I/O performance.
- Q3 2005, PacketFlow MPI: a software product that extends the life of, and investment in, existing GigE networks; increases MPI performance by 2x and improves associated application performance by 20-50%. Available for beta test today!
- Speeds performance for today's 1-Gigabit Ethernet network.

Precision I/O: Solution Overview (continued)
- Lower latency by reducing time spent on system calls.
(Chart: CPU cycles, on a scale from 200 to 1,400, spent on TCP/IP processing versus the system call itself.)
- What do customers really want? What if they don't have the luxury to change?

Precision I/O: Solution Overview (continued)
- H1 2006, PacketDirect 10-Gigabit Ethernet: adds hardware, in combination with Precision PacketFlow™ software, to deliver 10-gigabit, wire-rate performance.
- Incremental migration from today's GigE deployments; standards based, easy to support, affordable.
- Delivers higher bandwidth: standard Ethernet (10 Gb) for future networks.

Ecosystem Issues: What are the ecosystem concerns for Ethernet?
- Fortunately they are quite limited: Ethernet/IP is ubiquitous, and customers in other domains (and HPTC too) want Ethernet and IP everywhere.
- Key issue: switch performance and scaling, i.e., the number of ports and the latency through the switch.

Current Ethernet Switch Landscape
- Most current 1GigE switches have too few ports and too much latency to be interesting: typically 5 to 50 usec. (When you have 100 usec of latency on the host, who cares about switch latency?)
- Latency is often directly proportional to price, NOT inversely.
- A small number of ports means a deep tree of switches and more latency end to end (a rough estimate is sketched below).
- 10GigE switches don't look much better.
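As a back-of-the-envelope sketch of how port count and per-hop latency compound, the node count, port count, and per-hop figure below are assumptions for illustration, not numbers from the talk; the host figure is the 100 usec quoted on this slide.

    /* Rough estimate of end-to-end latency through a tree of Ethernet switches
       (illustrative sketch; node count, port count, and per-hop latency are
       assumptions, not figures from the talk).
       Build:  cc -O2 -o switch_tree switch_tree.c -lm                          */
    #include <math.h>
    #include <stdio.h>

    int main(void)
    {
        int    nodes     = 1024;   /* assumed cluster size */
        int    ports     = 24;     /* assumed ports per 1GigE switch */
        double hop_usec  = 10.0;   /* assumed per-switch latency (mid-range of 5-50) */
        double host_usec = 100.0;  /* host latency figure quoted on this slide */

        /* Rough tree model: each tier multiplies the reachable node count by
           the switch port count, so we need ceil(log_ports(nodes)) tiers. */
        int levels     = (int)ceil(log((double)nodes) / log((double)ports));
        int worst_hops = 2 * levels - 1;   /* up to the top tier and back down */

        printf("%d nodes with %d-port switches -> %d tiers, worst case %d switch hops\n",
               nodes, ports, levels, worst_hops);
        printf("end-to-end estimate: %.0f usec host + %.0f usec switching = %.0f usec\n",
               host_usec, worst_hops * hop_usec, host_usec + worst_hops * hop_usec);
        return 0;
    }

Under these assumptions a thousand-node cluster built from 24-port switches needs three tiers and roughly 50 usec of switching on top of the host-side cost, which is why high port counts and low per-hop latency matter.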

If you want it, you've got to ask
- Switch vendors are not yet focused on HPC. These features don't help HPC clustering: layer 4-7 “awareness,” firewalling, load balancing. Adding 10GigE as an uplink isn't sufficient.
- Go out and ask for a layer-2 switch with a high port count and low latency (1 usec or less).

But at least one vendor is on board!
- Fujitsu Labs of America, “A Single Chip Shared Memory Switch with Twelve 10Gb Ethernet Ports,” Hot Chips 15, August 2003.
- Focus on layer-2 switching; a high-throughput, low-latency core with SerDes integration.
- Latency: 700 ns over copper, 1,200 ns over optical.
- Shipping today in switches (450 ns port-to-port latency).

Summary: Fast, Transparent Performance
- Extends the life of, and investment in, existing Gigabit Ethernet networks: Precision PacketFlow™ MPI significantly improves the performance of your existing MPI applications and is available for beta testing today.
- Transparently increases application performance: Precision I/O solutions deliver dramatic performance improvements yet require no changes to your application, OS, or network.
- Provides a gradual migration path to future 10GigE networks: start with Precision PacketFlow today to increase 1GigE performance, then migrate to the Precision PacketDirect 10GigE NIC over time.
- Supports incremental deployment (one server at a time); only required on one end of the connection, so no client changes; no new fabric, protocols, packet formats, or infrastructure required; no application or operating system changes required.

Performance Networking™
Precision I/O
4005 Miranda Ave., Suite 210
Palo Alto, CA 94304-1232
650-331-8000
http://www.precisionio.com
Bob.Felderman@precisionio.com

Company
- Founded: February 2003 (spun out of Packet Design LLC)
- Location: Palo Alto, CA
- Employees: 45
- Funded: Q1 2004, Q3 2005: Foundation Capital, ATV, 3i
- Market status: 1-Gigabit Ethernet performance-boosting software in beta today
Team:
- Derek Proudian, CEO (HP, Mohr Davidow, Zip2, Panasas)
- Bob Felderman, CTO (USC/ISI, Myricom)
- Ed Roseberry, VP OEM & Biz Dev (HP, 3Com, Alteon, Nortel, S2io/Neterion)
- Dan O'Farrell, VP Marketing (HP, N.E.T., interWAVE, Sun, Peribit)
- Judy Estrin, Chairman (Cisco, Precept, NCD, 3Com, Bridge)
- Van Jacobson, Chief Scientist (Cisco, Lawrence Berkeley Labs)
- Narendra Dhara, VP Engineering (LSI, NEC, Corona Networks, IntruGuard)