Planned Machines: ASCI Purple, ALC and M&IC MCR
Presented to SOS7
Mark Seager, ICCD ADH for Advanced Technology
Lawrence Livermore National Laboratory
This work was performed under the auspices of the U.S. Department of Energy by the University of California, Lawrence Livermore National Laboratory under Contract No. W-7405-Eng-48.

Q1: What is unique in structure and function of your machine?
 Purple’s unique structure is fat SMPs with 16 rails of Federation interconnect
 MCR+ALC’s unique structure is the shared global file system
 However, the most important point is that applications are highly mobile between Purple, MCR+ALC, White, Q, and other clusters of SMP systems…

Purple’s unique structure is fat SMPs with 16 rails of interconnect

Purple System
 100 TF/s peak; … TF/s delivered on sPPM+UMT
 … TB memory; 2.0 PB of disk at 108 GB/s delivered
 197 x 64-way Armada SMPs with 16 Federation links each
 4 Login/network nodes for login/NFS
 8x10 Gb/s on each Login node for parallel FTP
 All external networking is 1-10 Gb/s Ethernet
 Clustered I/O services for the cluster-wide file system; Fibre Channel 2 I/O attach does not extend …

Programming/Usage Model
 Application launch over all compute nodes, up to 8,192 tasks
 1 MPI task/CPU and shared memory, full 64b support
 Scalable MPI (MPI_Allreduce, buffer space)
 Likely usage: multiple MPI tasks/node with 4-16 OpenMP threads per MPI task
 Single STDIO interface
 Parallel I/O to a single file, or multiple serial I/O (1 file per MPI task)

[System diagram: 191 parallel batch/interactive/visualization nodes and I/O nodes on the system data and control networks; 16 Federation links per SMP in four switch planes; NFS/Login/Net nodes; I/O attached via a Fibre Channel 2 I/O network.]
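The usage model above calls for multiple MPI tasks per node with 4-16 OpenMP threads per task and scalable collectives such as MPI_Allreduce. The following minimal hybrid MPI+OpenMP sketch illustrates that pattern; it is our illustration, not code from the presentation, and the problem size and reduction are arbitrary.

/* Minimal hybrid MPI+OpenMP sketch (illustrative only): one MPI task per
 * group of CPUs, 4-16 OpenMP threads per task (e.g. OMP_NUM_THREADS=8),
 * and a single scalable MPI_Allreduce for the global result. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, ntasks;
    /* FUNNELED: only the master thread of each task makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &ntasks);

    const long n = 1000000;              /* local work per MPI task (arbitrary) */
    double local_sum = 0.0, global_sum = 0.0;

    /* OpenMP threads share the node's CPUs within each MPI task. */
    #pragma omp parallel for reduction(+:local_sum)
    for (long i = 0; i < n; i++)
        local_sum += 1.0 / (double)(rank * n + i + 1);

    /* One collective across all (up to 8,192) MPI tasks. */
    MPI_Allreduce(&local_sum, &global_sum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    if (rank == 0)
        printf("tasks=%d threads/task=%d sum=%f\n",
               ntasks, omp_get_max_threads(), global_sum);

    MPI_Finalize();
    return 0;
}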

Unique feature of ALC+MCR is the Lustre Lite shared file system†

† The cluster-wide file system leverages the DOE/NNSA ASCI PathForward open-source Lustre development.

[System diagram: MCR's 1,116 P4 compute nodes on a 1,152-port (10x96D32U + 4x96D32U) QsNet Elan3 switch and ALC's 924 P4 compute nodes on a 960-port (10x96D32U + 4x80D48U) QsNet Elan3 switch each connect through 32 gateway nodes, at 140 MB/s of delivered Lustre I/O per gateway over 2x1GbE, to GbEnet federated switches and aggregated OSTs that form a single Lustre file system; each cluster also has 2 MetaData (fail-over) servers, 2 login nodes with 4 Gb-Enet, 2 service nodes, and a 100BaseT control network alongside the QsNet Elan3 data network.]
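As a rough illustration of the two I/O patterns listed on the Purple usage-model slide, applied here to a shared Lustre mount, the C/MPI-IO sketch below writes one file per task and then a single shared file. The mount point /p/lustre and the 1 MiB chunk size are hypothetical, not taken from the slides.

/* Sketch of "multiple serial I/O (1 file per task)" vs. "parallel I/O to a
 * single file" against a shared Lustre file system (paths are hypothetical). */
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define CHUNK (1 << 20)              /* 1 MiB per task, for illustration */
static char buf[CHUNK];

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    memset(buf, rank & 0xff, sizeof(buf));

    /* (a) Multiple serial I/O: one file per MPI task. */
    char name[256];
    snprintf(name, sizeof(name), "/p/lustre/out.%06d", rank);  /* hypothetical mount */
    FILE *fp = fopen(name, "wb");
    if (fp) { fwrite(buf, 1, sizeof(buf), fp); fclose(fp); }

    /* (b) Parallel I/O to a single shared file via MPI-IO collectives. */
    MPI_File fh;
    MPI_File_open(MPI_COMM_WORLD, "/p/lustre/shared.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    MPI_Offset off = (MPI_Offset)rank * CHUNK;   /* disjoint offset per task */
    MPI_File_write_at_all(fh, off, buf, CHUNK, MPI_BYTE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}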

Q2: What characterizes your applications? Examples are: intensities of message passing, memory utilization, computing, I/O, and data.
 Applications are characterized as multi-physics package simulations
 All applications are compute- and communications-intensive
 Each package pushes the performance envelope along a different dimension
 –Some packages are MPI latency dominated
 –Some packages are MPI bandwidth dominated
 –Memory bandwidth is a critical factor, but expensive memory subsystems don’t perform much better than commodity ones…
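To make the latency-dominated versus bandwidth-dominated distinction concrete, here is a rough two-task ping-pong sketch (our illustration, not a benchmark from the talk): small messages are dominated by per-message MPI latency, while large messages are dominated by link bandwidth.

/* Two-task MPI ping-pong: reports round-trip time and bandwidth for a
 * range of message sizes. Run with exactly 2 tasks; extra ranks idle. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int iters = 1000;
    for (long bytes = 8; bytes <= (1 << 22); bytes *= 8) {
        char *buf = malloc(bytes);
        MPI_Barrier(MPI_COMM_WORLD);
        double t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(buf, (int)bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(buf, (int)bytes, MPI_BYTE, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(buf, (int)bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                MPI_Send(buf, (int)bytes, MPI_BYTE, 0, 0, MPI_COMM_WORLD);
            }
        }
        double dt = MPI_Wtime() - t0;
        if (rank == 0)
            printf("%8ld bytes: %8.2f us/round-trip, %8.2f MB/s\n",
                   bytes, 1e6 * dt / iters, 2.0 * bytes * iters / dt / 1e6);
        free(buf);
    }
    MPI_Finalize();
    return 0;
}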

Q3: What prior experience guided you to this choice?
 Mission and applications
 Budgets
 Politics
 Delivered performance
 Balanced risk and cost/performance

Strategic Approach: straddle multiple curves to balance the risk and opportunity of new disruptive technologies
 Three complementary curves…
1. Delivers to today’s stockpile’s demanding needs
  Production environment
  For “must have” deliverables now
2. Delivers the transition to the next generation
  “Near production” environment
  Provides cycles for science
  Provides cycles for the stockpile
  Leading to next-generation production systems
  These are the capacity systems in a strategic capacity/capability mix
3. Delivers an affordable path to petaFLOP/s
  Research environment, leading the transition to petaflop systems?
  Are there other paths to a breakthrough regime by …?
 Any given technology curve is ultimately limited by Moore’s Law

[Chart: performance vs. time for successive technology curves (mainframes (RIP); vendor-integrated SMP clusters such as the IBM SP and HP SC; IA32/IA64/AMD + Linux; cell-based systems such as IBM BG/L), annotated “Straddle strategy for stability and preeminence” with cost points of $10M/TF (White), $7M/TF (Q), $2M/TF (Purple C), $1.2M/TF (MCR), $500K/TF, and $170K/TF between today and FY05.]
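As a rough illustration using only the price points quoted on the chart (our arithmetic, not a figure from the talk): 10 TF/s would cost about $100M at $10M/TF (White), $70M at $7M/TF (Q), $20M at $2M/TF (Purple C), $12M at $1.2M/TF (MCR), $5M at $500K/TF, and only about $1.7M at $170K/TF, which is what makes straddling the newer curves attractive despite their risk.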

Q4. Other than your own machine, for your needs what are the best and worst machines? And why?
 Clusters of SMPs with a full node OS make system administration and programming much easier, but scalability is an issue
 Vectors suck
 –The 10x potential speed-up from vectorization on Cray YMP-class machines yielded only a 1.5-2x delivered performance boost to the stockpile codes
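A back-of-the-envelope Amdahl's-law reading (ours, not from the slide) is consistent with that experience: if a fraction f of the runtime vectorizes with a 10x speed-up, the overall speed-up is 1/((1-f) + f/10), so a delivered 1.5-2x boost corresponds to only about 35-55% of the runtime being vectorizable in the stockpile codes.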