Swiss-T1: A Commodity MPI computing solution, March 1999. Ralf Gruber, EPFL-SIC/CAPA/Swiss-Tx, Lausanne.

Swiss-T1: A Commodity MPI computing solution, March 2000

Content:
1. Distributed Commodity HPC
2. Characterisation of machines and applications
3. Swiss-Tx project

July 1998
Past: SUPERCOMPUTERS

Manufacturers and what happened:
- Cray Research: taken over by SGI
- Convex: taken over by HP
- Connection Machines, KSR: disappeared
- Intel Paragon: stopped supercomputing
- Japanese companies: still existing (not main)
- Teracomputers: developing for 6 years

Why it happened:
- Produced own processors
- Developed own memory switches
- Needed special memories
- Developed own operating system
- Developed own compiler
- Special I/O: HW and SW
- Own communication system

Processor performance evolution July 1998

SMP/NUMA

Manufacturer and parallel server:
- DIGITAL: Wildfire
- SUN: Starfire
- IBM: SP-2
- HP: Exemplar
- SGI: Origin 2000
- ...

Present situation:
- Off-the-shelf processors
- Off-the-shelf memory switches
- Off-the-shelf memories
- Special parts of operating system
- Special compiler extensions
- Special I/O and SW
- Own communication system

What is the trend?

March 2000
Commodity Computing (MPI/PCI)

PC clusters/Linux:
- Fast Ethernet: Beowulf

SOS cooperation (Alpha):
- Myrinet/DS10: C-Plant (SNL)
- T-Net/DS20: Swiss-T1 (EPFL)

Customised commodity:
- Quadrics/ES40: Compaq/Sierra

- Off-the-shelf processors
- Off-the-shelf memory switches
- Off-the-shelf memories
- Off-the-shelf local I/O HW and SW
- Off-the-shelf operating systems
- Off-the-shelf compilers
- New communication system
- New distributed file/IO system

March 2000
SOS workshop on Distributed Commodity HPC

Participants: SNL, ORNL, Swiss-Tx, LLNL, LANL, ANL, NASA, LBL, PSC, DOE, UNM, Syracuse, Compaq, IBM, Cray, Sun, SMEs

Content: Vision, Clusters, Interconnects, Integration, OS, I/O, Applications, Usability, Crystal ball

March 2000
Distributed Commodity HPC User's Group

Goals:
- Characterise the machines
- Characterise the applications
- Match machines to applications

Characterise processors, machines, and applications

Performance parameters:
- Processors: V_mac = peak processor performance / peak memory bandwidth
- Parallel machines: γ_mac = effective processor performance / effective network performance
- Applications: γ_app = operation count / words to be sent

15 June 1998
In a box: V_mac values

V_mac = R∞ [Mflop/s] / M∞ [Mword/s]

Table: V_mac values for the Alpha boxes and the NEC SX-4, with columns Machine, N, R∞, M∞ and V_mac, covering the Alpha server, the DS boxes and the NEC SX-4.

Between boxes: the γ_mac value

γ_mac = N · R [Mflop/s] / C [Mword/s], with R the effective per-processor performance and C the effective inter-box bandwidth

Table: γ_mac of different machines, with columns Machine, Type, Nproc, Peak, Eff. perf., Eff. bw and γ_mac, for Gravitor (Beowulf*), Swiss-T1 (T-Net), Swiss-T1 (FE), Baby T1 (C+PCI), Origin2K (NUMA/MPI) and NEC SX4 (vector).
Effective performance measured with MATMULT (* estimated); effective bandwidth measured with point-to-point transfers.

The γ_app value

γ_app = operations / communicated words

- Material sciences (3D Fourier analysis): γ_app ≈ 50; Beowulf insufficient, Swiss-T1 just about right
- Crash analysis (3D non-linear FE): γ_app > 1000; Beowulf sufficient, latency?
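To make the matching concrete, here is a minimal C sketch that evaluates the three parameters defined on the preceding slides and checks whether the network would be the bottleneck. Reading the comparison as "the network is not limiting when γ_app ≥ γ_mac" is the usual interpretation of the slide above, and all numbers in the sketch are hypothetical placeholders, not measured Swiss-T1 figures.

    /* Sketch of the characterisation parameters V_mac, gamma_mac, gamma_app.
     * All numeric inputs are illustrative placeholders.                      */
    #include <stdio.h>

    int main(void)
    {
        /* In a box: V_mac = peak processor performance / peak memory bandwidth */
        double R_peak = 1000.0;  /* Mflop/s, hypothetical peak per processor    */
        double M_peak = 250.0;   /* Mword/s, hypothetical peak memory bandwidth */
        double V_mac  = R_peak / M_peak;

        /* Between boxes: gamma_mac = N * R_eff / C_eff                         */
        int    N     = 32;       /* number of processors                        */
        double R_eff = 400.0;    /* Mflop/s, effective per-processor perf.      */
        double C_eff = 80.0;     /* Mword/s, effective network bandwidth        */
        double g_mac = N * R_eff / C_eff;

        /* Application: gamma_app = operation count / words to be sent          */
        double ops   = 5.0e9;    /* operations per step, hypothetical           */
        double words = 2.0e7;    /* words communicated per step, hypothetical   */
        double g_app = ops / words;

        printf("V_mac = %.1f   gamma_mac = %.1f   gamma_app = %.1f\n",
               V_mac, g_mac, g_app);
        printf("network %s the bottleneck\n", g_app >= g_mac ? "is not" : "is");
        return 0;
    }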

The γ_app value for finite elements

γ_app = operations / communicated words

FE operation count (Ops):
- ∝ number of volume nodes
- ∝ square of the number of variables per node
- ∝ number of non-zero matrix elements
- ∝ number of operations per matrix element

FE communication (Comm):
- ∝ number of surface nodes
- ∝ number of variables per node

Hence γ_app:
- ∝ number of nodes in one direction
- ∝ number of variables per node
- ∝ number of non-zero matrix elements
- ∝ number of operations per matrix element
- ∝ number of surfaces

The γ_app value

Statistics for a 3D brick problem (finite elements); table columns: number of subdomains, nodes per processor, interface nodes, Mflop/cycle, Mflop per data transfer, kB/cycle, kB/cycle per processor, γ_app.

Table: current-day case, 4096 elements.
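The proportionalities of the previous slide can be turned into a back-of-the-envelope γ_app for a cubic finite-element subdomain. The constants below (3 variables per node, a 27-node stencil, 2 flops per matrix entry) are assumed for illustration only and are not the numbers of the 4096-element case; the point is that operations scale with the subdomain volume and communication with its surface, so γ_app grows with the number of nodes per direction.

    /* Rough gamma_app for a cubic FE subdomain.  All constants are assumed. */
    #include <stdio.h>

    int main(void)
    {
        const int    vars_per_node = 3;    /* unknowns per node (assumed)       */
        const int    stencil       = 27;   /* coupled neighbour nodes (assumed) */
        const double flops_per_nz  = 2.0;  /* multiply-add per matrix entry     */

        for (int n = 8; n <= 64; n *= 2) {            /* nodes per direction    */
            double volume_nodes  = (double)n * n * n;
            double surface_nodes = 6.0 * (double)n * n;   /* 6 faces of a cube  */

            /* operations: one matrix-vector product over the subdomain matrix */
            double nonzeros = volume_nodes * vars_per_node * vars_per_node * stencil;
            double ops      = nonzeros * flops_per_nz;

            /* communication: interface values exchanged with the neighbours   */
            double words = surface_nodes * vars_per_node;

            printf("n = %2d   gamma_app ~ %.0f\n", n, ops / words);
        }
        return 0;
    }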

March 2000 Fat-tree/Crossbars 16x16 N=8, P=8, N*P=64 PUs, X=12, BiW=32, L=64

March 2000
Circulant graphs / Crossbars 12x12

- K=2 (1/3): N=8, P=8, X=8, BiW=8, L=16
- K=3 (1/3/5): N=11, P=6, X=11, BiW=18, L=33
- K=4 (1/3/5/7): N=16, P=4, X=16, BiW=32, L=64
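Reading the parenthesised values (1/3), (1/3/5), (1/3/5/7) as the chord distances of a circulant graph on N crossbars is my interpretation of the slide, but it reproduces the quoted figures: each crossbar links to the crossbars at those ring distances, giving L = N*K links, and a simple half/half cut yields the quoted bisection widths. A small C sketch:

    /* Circulant-graph interconnect sketch: N crossbars on a ring, each also
     * linked to the crossbars at odd chord distances 1, 3, 5, ...           */
    #include <stdio.h>

    static void circulant(int n, int k)
    {
        int links = 0, crossing = 0;

        for (int i = 0; i < n; i++) {
            for (int d = 0; d < k; d++) {
                int dist = 2 * d + 1;           /* chord distances 1, 3, 5, ... */
                int j = (i + dist) % n;         /* count each link once         */
                links++;
                if ((i < n / 2) != (j < n / 2)) /* link crosses the cut?        */
                    crossing++;
            }
        }
        printf("N=%2d  K=%d  links L=%3d  bisection (half/half cut) = %d\n",
               n, k, links, crossing);
    }

    int main(void)
    {
        circulant(8,  2);   /* slide: BiW=8,  L=16 */
        circulant(11, 3);   /* slide: BiW=18, L=33 */
        circulant(16, 4);   /* slide: BiW=32, L=64 */
        return 0;
    }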

March 2000 Fat-tree/Circulant graphs

The Swiss-Tx machines (September 1998)

Table: Swiss-T0, Swiss-T0 (Dual), Baby T1*, Swiss-T1 and Swiss-T2, compared by installation date and place (EPFL and DGM), number of processors, peak Gflop/s, memory (GBytes), disk (GBytes), archive (TBytes)**, operating system (Digital Unix, Windows NT, Tru64 Unix; not decided for Swiss-T2), and interconnect (EasyNet bus plus Fast Ethernet for the T0 prototypes, 12x12 crossbar plus Fast Ethernet switch for the T1 machines and Swiss-T2).

* Baby T1 is an upgrade of T0 (Dual).
** Archive ported from T0 to T1.

March 2000 Swiss-T1

Components

- 32 computational DS20E
- 2 frontend DS20E
- 1 development DS20E
- 300 GB RAID disks
- 600 GB distributed disks
- 1 TB DLT archive
- Fast/Gigabit Ethernet
- Tru64/TruCluster Unix
- LSF, GRD/Codine
- Totalview, Paradyn
- MPICH/PVM

T-Net network technology:
- (8+1) 12x12 crossbars, 100 MB/s
- 32-bit PCI adapter, 75 MB/s (64-bit PCI adapter, 180 MB/s)
- Flexible, non-blocking
- Reliable
- Optimal routing
- FCI 5 µs, MPI 18 µs
- Monitoring system
- Remote control
- Up to 3 Tflop/s (γ < 100)
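The FCI and MPI latencies and the PCI adapter bandwidth quoted above are the kind of figures a standard MPI ping-pong between two boxes produces. The sketch below is a generic MPI-1 ping-pong, not Swiss-Tx specific code; it uses only MPI-1 calls, so it should build against MPICH over Fast Ethernet or an MPI/FCI library alike. Message size and repetition count are arbitrary choices (small messages expose the latency, large ones the bandwidth).

    /* Generic MPI-1 ping-pong between ranks 0 and 1.                        */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        const int reps  = 1000;
        const int bytes = 1 << 20;                 /* 1 MB messages           */
        char *buf = calloc(bytes, 1);
        int rank;
        MPI_Status status;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < reps; i++) {
            if (rank == 0) {
                MPI_Send(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, bytes, MPI_CHAR, 1, 0, MPI_COMM_WORLD, &status);
            } else if (rank == 1) {
                MPI_Recv(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD, &status);
                MPI_Send(buf, bytes, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        double t = (MPI_Wtime() - t0) / (2.0 * reps);   /* one-way time       */

        if (rank == 0)
            printf("one-way time %.1f us, bandwidth %.1f MB/s\n",
                   t * 1e6, bytes / t / 1e6);

        MPI_Finalize();
        free(buf);
        return 0;
    }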

March 2000 Swiss-T1 Architecture

March 2000 Swiss-T1 Routing table

Swiss-T1: Software in a Box (March 2000)

*Digital Unix (Compaq): operating system in each box
*F77/F90 (Compaq): Fortran compilers
*HPF (Compaq): High Performance Fortran
*C/C++ (Compaq): C and C++ compilers
*DXML (Compaq): Digital math library in each box
*MPI (Compaq): SMP message passing interface
*Posix threads (Compaq): threading in a box
*OpenMP (Compaq): multiprocessor usage in a box through directives
*KAP-F (KAI): to parallelise a Fortran code in a multiprocessor box
*KAP-C (KAI): to parallelise a C program in a multiprocessor box

Swiss-T1: Software between Boxes (March 2000)

*LSF (Platform Inc.): Load Sharing Facility for resource management
*Totalview (Dolphin): parallel debugger
*Paradyn (Madison/CSCS): profiler to help parallelise programs
*MPI-1/FCI (SCS AG): message passing interface between boxes, running over T-Net
*MPICH (Argonne): message passing interface running over Fast Ethernet
**PVM (UTK): Parallel Virtual Machine running over Fast Ethernet
*BLACS (UTK): basic linear algebra communication subprograms
*ScaLAPACK (UTK): linear algebra matrix solvers
MPI I/O (SCS/LSP): message passing interface for I/O
MONITOR (EPFL): monitoring of system parameters
NAG (NAG): math library package
Ensight (Ensight): 4D visualisation
MEMCOM (SMR SA): data management system for distributed architectures
Shmem (EPFL): Cray shmem interface for Swiss-Tx

March 2000 Baby T1 Architecture

Swiss-T1 : Alternative network March 2000

Swiss-T2 : K-Ring architecture

Create the SwissTx Company

- Commercialise T-Net
- Commercialise dedicated machines
- Transfer know-how in parallel application technology

Between boxes: the γ_mac value

γ_mac = N · R [Mflop/s] / C [Mword/s]

Table: γ_mac values for Swiss-T0, Swiss-T0 (Dual) and Swiss-T1 with MATMUL; columns Machine, N, R∞, %, N·R, C, γ_mac; rows T0 (bus), T0 (Dual, bus), Baby T1 (switch), T1 local (switch), T1 global (switch) and T1 (Fast Ethernet).
* measured (SAXPY and Parkbench), ** expected.

Time Schedule (March 2000)

1st phase (EasyNet bus based prototypes):
- Swiss-T0 (Dual): 16 processors, Windows NT
- Swiss-T0 (Dual): 16 processors, Digital Unix

2nd phase (T-Net switch based prototype/production machines):
- Baby T1: 12 processors, Digital Unix
- Swiss-T1: 68 processors, Digital Unix
- Swiss-T2: 504 processors, OS not defined

March 2000
Phase I: Machines installed

- Swiss-T0: 23 December 1997 (accepted 25 May 1998)
- Swiss-T0 (Dual): 29 September 1998 (accepted 11 December 1998 / NT)
- Swiss-T0 (Dual): 29 September 1998 (accepted 22 January 1999 / Unix)
- Swiss-T1 Baby: 19 August 1999 (accepted 18 October 1999 / Unix)
- Swiss-T1: 21 January 2000

Swiss-T1 Node Architecture, March 1999

March 2000
2nd Phase Swiss-Tx: The 8 WPs

Managing Board: Michel Deville
Technical Team: Ralf Gruber
Management: Jean-Michel Lafourcade

- WP1: Hardware development (Roland Paul, SCS)
- WP2: Communication software development (Martin Frey, SCS)
- WP3: System and user environment (Michel Jaunin, SIC-EPFL)
- WP4: Data management issues (Roger Hersch, DI-EPFL)
- WP5: Applications (Ralf Gruber, CAPA/SIC-EPFL)
- WP6: Swiss-Tx concept (Pierre Kuonen, DI-EPFL)
- WP7: Management (Jean-Michel Lafourcade, CAPA/DGM-EPFL)
- WP8: SwissTx Spin-off Company (Jean-Michel Lafourcade, CAPA/DGM-EPFL)

March 2000
2nd Phase Swiss-Tx: The MUSTs

- WP1: PCI adapter page table / 64-bit PCI adapter
- WP2: Dual-processor FCI / network monitoring / Shmem
- WP3: Management / automatic SI / monitoring / PE / libraries
- WP4: MPI-I/O / distributed file management
- WP5: Applications
- WP6: Swiss-Tx architecture / autoparallelisation
- WP7: Management
- WP8: SwissTx Spin-off Company