Dolphin Wulfkit and Scali software The Supercomputer interconnect Summation Enterprises Pvt. Ltd. Preferred System Integrators since 1991. Amal D’Silva.

1 Dolphin Wulfkit and Scali software The Supercomputer interconnect Summation Enterprises Pvt. Ltd. Preferred System Integrators since 1991. Amal D’Silva

2 Agenda
– Dolphin Wulfkit hardware
– Scali Software / some commercial benchmarks
– Summation profile

3 Interconnect Technologies
[Figure: design space for different technologies, plotted by distance, bandwidth and latency. Application areas: WAN, LAN, I/O, memory, processor. Technologies shown: ATM, Ethernet, Myrinet, cLan, FibreChannel, SCSI, network, bus, proprietary busses, cache and Dolphin SCI technology, with the region of cluster interconnect requirements marked.]

4 Interconnect impact on cluster performance
Some real-world examples from the Top500 May 2004 list:
Intel, Bangalore cluster:
– 574 Xeon 2.4 GHz CPUs / GigE interconnect
– Rpeak: 2755 GFLOPs, Rmax: 1196 GFLOPs
– Efficiency: 43%
Kabru, IMSc, Chennai:
– 288 Xeon 2.4 GHz CPUs / Wulfkit 3D interconnect
– Rpeak: 1382 GFLOPs, Rmax: 1002 GFLOPs
– Efficiency: 72%
Simply put, Kabru gives 84% of the performance with HALF the number of CPUs!
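
The efficiency figures on this slide are Rmax divided by Rpeak. As a quick check of the arithmetic, here is a small Python sketch using only the numbers quoted on the slide; the dictionary layout and function name are illustrative, not from the presentation.

```python
# Figures quoted on the slide (GFLOPs and CPU counts).
clusters = {
    "Intel, Bangalore (GigE)": {"cpus": 574, "rpeak": 2755, "rmax": 1196},
    "Kabru, IMSc (Wulfkit 3D)": {"cpus": 288, "rpeak": 1382, "rmax": 1002},
}


def efficiency(rmax, rpeak):
    """Fraction of theoretical peak actually achieved on the benchmark."""
    return rmax / rpeak


for name, c in clusters.items():
    print(f"{name}: {efficiency(c['rmax'], c['rpeak']):.1%} of peak")

intel = clusters["Intel, Bangalore (GigE)"]
kabru = clusters["Kabru, IMSc (Wulfkit 3D)"]
print(f"Kabru Rmax relative to Intel: {kabru['rmax'] / intel['rmax']:.1%}")
print(f"Kabru CPU count relative to Intel: {kabru['cpus'] / intel['cpus']:.1%}")
```

The ratios come out at roughly 43% and 72% of peak, with Kabru reaching about 84% of the Intel cluster's Rmax on about half the CPUs, which matches the slide.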

5 Commodity interconnect limitations
– Cluster performance depends primarily on two factors: bandwidth and latency.
– Gigabit Ethernet: speed is limited to 1000 Mbps (approx. 80 Megabytes/s in the real world). This is fixed irrespective of processor power.
– With increasing processor speeds, latency (the time taken to move data from one node to another) plays an increasing role in cluster performance.
– Gigabit typically gives an internode latency of 120 ~ 150 microseconds. As a result, CPUs in a node often sit idle, waiting for data from another node.
– In any switch-based architecture, the switch becomes a single point of failure. If the switch goes down, so does the cluster.

6 Dolphin Wulfkit advantages
– Internode bandwidth: 260 Megabytes/s on Xeon (over three times faster than Gigabit).
– Latency: under 5 microseconds (over TWENTY FIVE times quicker than Gigabit).
– Matrix-type internode connections: no switch, hence no single point of failure.
– Cards can be moved across processor generations, which protects the investment.
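
To see how latency and bandwidth combine, a common first-order model puts the time to send one message at latency plus message size divided by bandwidth. The sketch below applies that model to the figures quoted on slides 5 and 6. The model is a simplification, the 120 microsecond and 5 microsecond values are taken from the ends of the quoted ranges, and a Megabyte is assumed to mean 10^6 bytes.

```python
# Figures from the slides; the dictionary layout is illustrative.
GIGE = {"latency_s": 120e-6, "bandwidth_bytes_per_s": 80e6}
SCI = {"latency_s": 5e-6, "bandwidth_bytes_per_s": 260e6}


def transfer_time(message_bytes, link):
    """First-order estimate: fixed latency plus size / bandwidth."""
    return link["latency_s"] + message_bytes / link["bandwidth_bytes_per_s"]


for size in (0, 1024, 64 * 1024, 1024 * 1024):
    gige = transfer_time(size, GIGE)
    sci = transfer_time(size, SCI)
    print(f"{size:>8} bytes: GigE {gige * 1e6:10.1f} us, "
          f"SCI {sci * 1e6:10.1f} us, ratio {gige / sci:5.1f}x")
```

Under this model the advantage is largest for small messages, where latency dominates, and settles towards the bandwidth ratio (260 / 80) for large ones.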

7 Dolphin Wulfkit advantages (contd.)
– Linear scalability: e.g. adding 8 nodes to a 16-node cluster involves known, fixed costs: eight nodes and eight Dolphin SCI cards. With any switch-based architecture there are additional issues to consider, such as “unused ports” on the switch. E.g. for Gigabit, one has to “throw away” the 16-port switch and buy a 32-port switch.
– Real-world performance on par with or better than proprietary interconnects like Memory Channel (HP) and NUMAlink (SGI), at cost-effective price points.
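
The expansion example on this slide can be sketched as a small calculation. The list of available switch sizes below is an assumption made for illustration (the slide only mentions 16-port and 32-port switches), and the function names are hypothetical.

```python
# Hypothetical catalogue of switch sizes; only 16 and 32 appear on the slide.
SWITCH_SIZES = (8, 16, 32, 64, 128)


def smallest_switch(nodes):
    """Smallest switch in the catalogue with at least `nodes` ports."""
    for ports in SWITCH_SIZES:
        if ports >= nodes:
            return ports
    raise ValueError("no single switch in the catalogue is large enough")


def expansion_plan(current_nodes, added_nodes):
    """Compare what must be bought when growing a cluster."""
    total = current_nodes + added_nodes
    old_switch = smallest_switch(current_nodes)
    new_switch = smallest_switch(total)
    return {
        "sci_cards_to_buy": added_nodes,          # one card per new node
        "switch_replaced": new_switch != old_switch,
        "new_switch_ports": new_switch,
        "unused_ports": new_switch - total,
    }


print(expansion_plan(16, 8))
```

For the slide's case of growing from 16 to 24 nodes, the switch-based cluster needs a 32-port switch with 8 ports left unused, while the SCI cluster only needs 8 more cards.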

8 Wulfkit: The Supercomputer Interconnect
– Wulfkit is based on the Scalable Coherent Interface (SCI). The ANSI/IEEE 1596-1992 standard defines a point-to-point interface and a set of packet protocols.
– Wulfkit is not a networking technology, but a purpose-designed cluster interconnect.
– The SCI interface has two unidirectional links that operate concurrently.
– Bus-imitating protocol with packet-based handshake protocols and guaranteed data delivery.
– Up to 667 Megabytes/s internode bandwidth.

9 PCI-SCI Adapter Card: 1 slot, 2 dimensions
SCI ADAPTERS (64 bit - 66 MHz)
– PCI / SCI ADAPTER (D335)
– D330 card with LC3 daughter card
– Supports 2 SCI ring connections
– Switching over B-Link
– Used for WulfKit 2D clusters
– PCI 64/66
– D339 2-slot version
[Figure: block diagram of the 2D adapter card, with blocks labelled SCI, PSB, PCI and LC.]
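
Reading "PCI 64/66" as a 64-bit, 66 MHz PCI bus, the theoretical peak of the host bus is easy to work out and gives some context for the bandwidth figures quoted elsewhere in the deck. This is a back-of-the-envelope sketch; the function name is illustrative, and relating the result to the measured 260 Megabytes/s is an interpretation, not a claim made in the presentation.

```python
def pci_peak_mbytes_per_s(width_bits, clock_mhz):
    """Theoretical peak, assuming one transfer of width_bits per clock cycle."""
    return width_bits / 8 * clock_mhz


pci_64_66 = pci_peak_mbytes_per_s(64, 66)
sci_link_rate = 667        # Megabytes/s, SCI link figure quoted in the slides
measured_internode = 260   # Megabytes/s on Xeon, quoted in the slides

print(f"PCI 64/66 theoretical peak: {pci_64_66:.0f} MB/s")
print(f"SCI link rate:              {sci_link_rate} MB/s")
print(f"Measured internode:         {measured_internode} MB/s")
```

The PCI peak (about 528 MB/s) is below the SCI link rate, so one plausible reading is that the host bus, rather than the SCI link, bounds what a node can actually deliver.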

10 System Interconnect
High Performance Interconnect:
– Torus topology
– IEEE/ANSI std. 1596 SCI
– 667 MBytes/s/segment/ring
– Shared address space
Maintenance and LAN Interconnect:
– 100 Mbit/s Ethernet (out-of-band monitoring)

11 System Architecture
[Figure: 4x4 2D torus SCI cluster. Labels: control node (frontend) with GUI; remote workstation with GUI; TCP/IP socket; server daemon; node daemon.]

12 3D Torus topology (for greater than 64 ~ 72 nodes)
[Figure: adapter blocks labelled PCI, PSB66 and LC-3.]
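
In a torus, every node sits on one ring per dimension, and each ring wraps around at the edges. A minimal Python sketch of that wrap-around neighbour relation follows, for both the 2D and 3D cases. The 0-based coordinates and the function name are illustrative choices, and the sketch ignores the direction of the SCI links.

```python
def torus_neighbors(coord, dims):
    """Nodes one hop away from `coord` in a torus with the given dimensions.

    `coord` is a tuple of 0-based positions, `dims` the ring length per axis.
    """
    neighbors = set()
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            other = list(coord)
            other[axis] = (coord[axis] + step) % size  # wrap around the ring
            neighbors.add(tuple(other))
    neighbors.discard(coord)  # only matters for rings of length 1
    return sorted(neighbors)


print(torus_neighbors((0, 0), (4, 4)))        # corner of a 4x4 2D torus
print(torus_neighbors((0, 0, 0), (4, 4, 4)))  # corner of a 4x4x4 3D torus
```

Even a "corner" node has a full set of neighbours (four in 2D, six in 3D), because the rings close on themselves.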

13 Linköping University - NSC - SCI Clusters
– Monolith: 200 nodes, 2x Xeon 2.2 GHz, 3D SCI
– INGVAR: 32 nodes, 2x AMD 900 MHz, 2D SCI
– Otto: 48 nodes, 2x P4 2.26 GHz, 2D SCI
– Commercial, under installation: 40 nodes, 2x Xeon, 2D SCI
– Total: 320 SCI nodes
Also in Sweden: Umeå University, 120 Athlon nodes

14 Slide 14 - 03.05.2015 The difference is in the software... MPI connect middleware and MPIManage Cluster setup/ mgmt tools

15 Scali Software Platform
– Scali MPI Manage: cluster installation / management
– Scali MPI Connect: high performance MPI libraries

16 Scali MPI Connect
– Fault tolerant
– High bandwidth
– Low latency
– Multi-thread safe
– Simultaneous inter-/intra-node operation
– UNIX command line replicated
– Exact message size option
– Manual/debugger mode for selected processes
– Explicit host specification
– Job queuing: PBS, DQS, LSF, CCS, NQS, Maui
– Conformance to MPI-1.2 verified through 1665 MPI tests

17 Scali MPI Manage features
– System Installation and Configuration
– System Administration
– System Monitoring, Alarms and Event Automation
– Work Load Management
– Hardware Management
– Heterogeneous Cluster Support

18 Fault Tolerance
– 2D torus topology: more routing options
– XY routing algorithm
– Node 33 fails (3)
– Nodes on 33’s ringlets become unavailable
– Cluster fractured with current routing setting
[Figure: 4x4 torus with nodes numbered 11 to 44; node 33 has failed.]

19 Fault Tolerance
– Scali advanced routing algorithm: from the Turn Model family of routing algorithms
– All nodes but the failed one can be utilised as one big partition
[Figure: the same 4x4 torus, nodes numbered 11 to 44, redrawn around failed node 33.]
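
The effect described on slide 18 can be illustrated with a simplified model. The sketch below assumes a 4x4 torus whose rings are traversed in one direction only, and plain XY (dimension-order) routing: first along the source's X ring, then along the destination's Y ring. Node labels follow the figure, with the first digit taken as x and the second as y. This is not Scali's routing code, and the Turn Model algorithm from slide 19 is not implemented here.

```python
from itertools import product

SIZE = 4         # 4x4 torus, nodes (1,1) .. (4,4)
FAILED = (3, 3)  # "node 33" on the slide


def ring_path(start, end, size=SIZE):
    """Positions visited going one way round a ring from start to end."""
    path = [start]
    while path[-1] != end:
        path.append(path[-1] % size + 1)  # 1-based wrap-around
    return path


def xy_route(src, dst):
    """Dimension-order route: travel along X first, then along Y."""
    xs = ring_path(src[0], dst[0])
    ys = ring_path(src[1], dst[1])
    route = [(x, src[1]) for x in xs]
    route += [(dst[0], y) for y in ys[1:]]
    return route


def broken_pairs(failed=FAILED):
    """Ordered (src, dst) pairs of healthy nodes whose route hits `failed`."""
    nodes = [n for n in product(range(1, SIZE + 1), repeat=2) if n != failed]
    return [(s, d) for s in nodes for d in nodes
            if s != d and failed in xy_route(s, d)]


pairs = broken_pairs()
print(len(pairs), "of", 15 * 14, "ordered pairs are routed through node 33")
print("example:", xy_route((2, 3), (4, 3)))
```

In this model every broken pair has its source on node 33's X ring or its destination on node 33's Y ring, which is consistent with the slide's statement that the nodes on 33's ringlets are the ones affected under fixed XY routing.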

20 Scali MPI Manage GUI

21 Monitoring ctd.

22 System Monitoring
Resource Monitoring
– CPU
– Memory
– Disk
Hardware Monitoring
– Temperature
– Fan Speed
Operator Alarms on selected Parameters at Specified Thresholds
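
Threshold-based alarms of the kind listed on this slide can be sketched in a few lines. The parameter names, limits and function below are hypothetical and only illustrate the idea; they are not taken from the Scali tools.

```python
# Hypothetical thresholds: ("max", limit) alarms when the reading exceeds
# the limit, ("min", limit) alarms when the reading drops below it.
THRESHOLDS = {
    "cpu_percent": ("max", 95.0),
    "memory_percent": ("max", 90.0),
    "disk_percent": ("max", 85.0),
    "temperature_c": ("max", 70.0),
    "fan_rpm": ("min", 2000.0),
}


def check_alarms(readings, thresholds=THRESHOLDS):
    """Return one message per reading that violates its threshold."""
    alarms = []
    for name, (kind, limit) in thresholds.items():
        value = readings.get(name)
        if value is None:
            continue  # parameter not reported by this node
        if (kind == "max" and value > limit) or (kind == "min" and value < limit):
            alarms.append(f"{name}={value} violates {kind} threshold {limit}")
    return alarms


sample = {"cpu_percent": 42.0, "temperature_c": 78.5, "fan_rpm": 1200.0}
for message in check_alarms(sample):
    print("ALARM:", message)
```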

23 Events/Alarms

24 SCI vs. Myrinet 2000: Ping-Pong comparison
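
A ping-pong test bounces a message back and forth between two endpoints and reports half the average round-trip time as the one-way latency. The sketch below shows that measurement pattern with Python's standard socket module over a local socket pair. It is not Scali MPI or SCI code, it measures only the local machine, and the function names are illustrative.

```python
import socket
import threading
import time


def recv_exact(sock, size):
    """Read exactly `size` bytes from the socket."""
    chunks = []
    remaining = size
    while remaining:
        chunk = sock.recv(remaining)
        if not chunk:
            raise ConnectionError("peer closed the connection")
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def echo(sock, size, rounds):
    """The 'pong' side: send every received message straight back."""
    for _ in range(rounds):
        sock.sendall(recv_exact(sock, size))


def ping_pong(size, rounds=200):
    """Average one-way time in seconds for messages of `size` bytes."""
    ping, pong = socket.socketpair()
    worker = threading.Thread(target=echo, args=(pong, size, rounds))
    worker.start()
    payload = b"x" * size
    start = time.perf_counter()
    for _ in range(rounds):
        ping.sendall(payload)
        recv_exact(ping, size)
    elapsed = time.perf_counter() - start
    worker.join()
    ping.close()
    pong.close()
    return elapsed / (2 * rounds)  # half of the mean round trip


for size in (1, 1024, 64 * 1024):
    print(f"{size:>6} bytes: {ping_pong(size) * 1e6:8.1f} us one-way")
```

Small messages expose latency and large messages expose bandwidth, which is why ping-pong results are usually plotted against message size.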

25 Itanium vs Cray T3E Bandwidth

26 Itanium vs T3E Latency

27 Some Reference Customers
Max Planck Institute für Plasmaphysik, Germany; University of Alberta, Canada; University of Manitoba, Canada; Etnus Software, USA; Oracle Inc., USA; University of Florida, USA; deCODE Genetics, Iceland; Uni-Heidelberg, Germany; GMD, Germany; Uni-Giessen, Germany; Uni-Hannover, Germany; Uni-Düsseldorf, Germany; Linux NetworX, USA; Magmasoft AG, Germany; University of Umeå, Sweden; University of Linköping, Sweden; PGS Inc., USA; US Naval Air, USA; Spacetec/Tromsø Satellite Station, Norway; Norwegian Defense Research Establishment; Parallab, Norway; Paderborn Parallel Computing Center, Germany; Fujitsu Siemens computers, Germany; Spacebel, Belgium; Aerospatiale, France; Fraunhofer Gesellschaft, Germany; Lockheed Martin TDS, USA; University of Geneva, Switzerland; University of Oslo, Norway; Uni-C, Denmark; University of Lund, Sweden; University of Aachen, Germany; DNV, Norway; DaimlerChrysler, Germany; AEA Technology, Germany; BMW AG, Germany; Audi AG, Germany; University of New Mexico, USA

28 Some more Reference Customers
Rolls Royce Ltd., UK; Norsk Hydro, Norway; NGU, Norway; University of Santa Cruz, USA; Jodrell Bank Observatory, UK; NTT, Japan; CEA, France; Ford/Visteon, Germany; ABB AG, Germany; National Technical University of Athens, Greece; Medasys Digital Systems, France; PDG Linagora S.A., France; Workstations UK, Ltd., England; Bull S.A., France; The Norwegian Meteorological Institute, Norway; Nanco Data AB, Sweden; Aspen Systems Inc., USA; Atipa Linux Solution Inc., USA; California Institute of Technology, USA; Compaq Computer Corporation Inc., USA; Fermilab, USA; Ford Motor Company Inc., USA; General Dynamics Inc., USA; Intel Corporation Inc., USA; Iowa State University, USA; Los Alamos National Laboratory, USA; Penguin Computing Inc., USA; Times N Systems Inc., USA; University of Alberta, Canada; Monash University, Australia; University of Southern Mississippi, USA; Jacusiel Acuna Ltda., Chile; University of Copenhagen, Denmark; Caton Sistemas Alternativos, Spain; Mapcon Geografical Inform, Sweden; Fujitsu Software Corporation, USA; City Team OY, Finland; Falcon Computers, Finland; Link Masters Ltd., Holland; MIT, USA; Paralogic Inc., USA; Sandia National Laboratory, USA; Sicorp Inc., USA; University of Delaware, USA; Western Scientific Inc., USA; Group of Parallel and Distr. Processing, Brazil

29 Application Benchmarks With Dolphin SCI and Scali MPI

30 NAS parallel benchmarks (16 CPUs / 8 nodes)

31 Magma (16 CPUs / 8 nodes)

32 Eclipse (16 CPUs / 8 nodes)

33 FEKO: Parallel Speedup

34 Acusolve (16 CPUs / 8 nodes)

35 Visage (16 CPUs / 8 nodes)

36 CFD scaling mm5: linear to 400 CPUs

37 Scaling - Fluent – Linköping cluster

38 Dolphin Software
All Dolphin SW is free open source (GPL or LGPL)
SISCI
SCI-SOCKET
– Low latency socket library
– TCP and UDP replacement
– User and kernel level support
– Release 2.0 available
SCI-MPICH (RWTH Aachen)
– MPICH 1.2 and some MPICH 2 features
– New release is being prepared, beta available
SCI Interconnect Manager
– Automatic failover recovery
– No single point of failure in 2D and 3D networks
Other
– SCI Reflective Memory, Scali MPI, Linux Labs SCI Cluster Cray-compatible shmem and Clugres PostgreSQL, MandrakeSoft Clustering HPC solution, Xprime X1 Database Performance Cluster for Microsoft SQL Servers, ClusterFrame from Qlusters and SunCluster 3.1 (Oracle 9i), MySQL Cluster

39 Summation Enterprises Pvt. Ltd. Brief Company Profile

40 Our expertise
– Clustering for High Performance Technical Computing, Clustering for High Availability, Terabyte Storage solutions, SANs
– O.S. skills: Linux (Alpha 64-bit, x86 32- and 64-bit), Solaris (SPARC and x86), Tru64 UNIX, Windows NT/2K/2003 and the QNX Realtime O.S.

41 Summation milestones
– Working with Linux since 1996
– First in India to deploy/support 64-bit Alpha Linux workstations (1999)
– First in India to spec, deploy and support a 26-processor Alpha Linux cluster (2001)
– Only company in India to have worked with Gigabit, SCI and Myrinet interconnects
– Involved with the design, setup and support of many of the largest HPTC clusters in India

42 Exclusive Distributors / System Integrators in India
Dolphin Interconnect AS, Norway
– SCI interconnect for Supercomputer performance
Scali AS, Norway
– Cluster management tools
Absoft, Inc., USA
– FORTRAN development tools
Steeleye Inc., USA
– High Availability Clustering and Disaster Recovery Solutions for Windows & Linux
– Summation is the sole Distributor, Consulting services & Technical support partner for Steeleye in India

43 Partnering with Industry leaders
Sun Microsystems, Inc.
– Focus on Education & Research segments
– High Performance Technical Computing, Grid Computing Initiative with Sun Grid Engine (SGE/SGEE)
– HPTC Competency Centre

44 Wulfkit / HPTC users
Institute of Mathematical Sciences, Chennai
– 144-node Dual Xeon Wulfkit 3D cluster
– 9-node Dual Xeon Wulfkit 2D cluster
– 9-node Dual Xeon Ethernet cluster
– 1.4 TB RAID storage
Bhabha Atomic Research Centre, Mumbai
– 64-node Dual Xeon Wulfkit 2D cluster
– 40-node P4 Wulfkit 3D cluster
– Alpha servers / Linux OpenGL workstations / Rackmount servers
Harish Chandra Research Institute, Allahabad
– 42-node Dual Xeon Wulfkit cluster
– 1.1 TB RAID storage

45 Wulfkit / HPTC users (contd.)
Intel Technology India Pvt. Ltd., Bangalore
– Eight-node Dual Xeon Wulfkit clusters (ten nos.)
NCRA (TIFR), Pune
– 4-node Wulfkit 2D cluster
Bharat Forge Ltd., Pune
– Nine-node Dual Xeon Wulfkit 2D cluster
Indian Rare Earths Ltd., Mumbai
– 26-processor Alpha Linux cluster with RAID storage
Tata Institute of Fundamental Research, Mumbai
– RISC/Unix servers, four-node Xeon cluster
Centre for Advanced Technology, Indore
– Alpha / Sun workstations

46 Questions?
Amal D’Silva, GSM: 98202 83309
