1 SOS7: “Machines Already Operational” NSF’s Terascale Computing System SOS-7 March 4-6, 2003 Mike Levine, PSC

2 Outline
- Overview of TCS, the US-NSF's Terascale Computing System.
- Answering 3 questions:
  - Is your machine living up to performance expectations? …
  - What is the MTBI? …
  - What is the primary complaint, if any, from users?
- [See also PSC web pages & Rolf's info.]

3 Q1: Performance
- Computational and communications performance is very good!
  - Alpha processors & ES45 servers: very good
  - Quadrics bandwidth & latency: very good
  - ~74% of peak on Linpack; >76% on LSMS
- More work needed on disk IO.
- This has been a very easy "port" for most users.
  - Easier than some Cray-to-Cray upgrades.
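As a rough cross-check of the Linpack figure, the sketch below multiplies the hardware numbers quoted later in this deck (750 ES45 nodes, 4 EV68 CPUs per node, 2 Gf per CPU at 1 GHz) to get the 6 Tf peak and the implied ~4.4 Tf Linpack rate. This is an illustrative back-of-the-envelope calculation, not taken from the talk.

    # Rough cross-check of "~74% of peak on Linpack" using the hardware
    # figures quoted on the compute-node slide (illustrative only).
    nodes = 750              # ES45 compute nodes
    cpus_per_node = 4        # EV68 Alphas per node
    gf_per_cpu = 2.0         # "1 GHz = 2 Gf" per the compute-node slide

    peak_tf = nodes * cpus_per_node * gf_per_cpu / 1000   # 6.0 Tf
    linpack_tf = 0.74 * peak_tf                           # ~4.4 Tf
    print(f"Peak: {peak_tf:.1f} Tf, Linpack at ~74%: {linpack_tf:.2f} Tf")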

4 Q2: MTBI (Monthly Average)
- Compare with the theoretical prediction of 12 hrs.
- Expect further improvement (fixing systematic problems).
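The 12-hour prediction presumably combines per-node reliability figures; a common first-order approach treats the cluster as a series system, MTBI_system ≈ MTBF_node / N. The sketch below assumes that model (an assumption on my part, not stated in the talk) and uses the node count from the compute-node slide to show what per-node MTBF such a prediction would imply.

    # Assumed series-reliability model: MTBI_system ~= MTBF_node / N_nodes.
    # The model and node count are assumptions; only the 12 h figure is from the slide.
    n_nodes = 750 + 13                 # compute nodes + inline spares
    predicted_mtbi_h = 12.0            # theoretical system MTBI from the slide

    implied_node_mtbf_h = predicted_mtbi_h * n_nodes
    print(f"Implied per-node MTBF: {implied_node_mtbf_h:.0f} h "
          f"(~{implied_node_mtbf_h / (24 * 365):.1f} years)")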

5 Time Lost to Unscheduled Events
- [Chart; purple series = nodes requiring cleanup.]
- Worst case is ~3%.

6 Q3: Complaints
- #1: "I need more time" (not a complaint about performance)
  - Actual usage >80% of wall clock
  - Some structural improvements still in progress.
  - Not a whole lot more is possible!
- Work needed on:
  - Rogue OS activity. [recall Prof. Kale's comment]
  - MPI & global reduction libraries. [ditto]
  - System debugging and fragility.
  - IO performance: we have delayed full disk deployment to avoid data corruption & instabilities.
  - Node cleanup: we detect & hold out problem nodes until staff clean them.
- All in all, the users have been VERY pleased. [ditto]

7 Full Machine Job
- This system is capable of doing big science.

8 TCS (Terascale Computing System) & ETF
- Sponsored by the U.S. National Science Foundation
- Serving the "very high end" for US academic computational science and engineering
  - Designed to be used, as a whole, on single problems (recall the full-machine job).
  - Full range of scientific and engineering applications.
  - Compaq AlphaServer SC hardware and software technology
- #6 in the Top 500 (largest open facility in the world: Nov 2001)
- TCS-1: in general production since April 2002
- Integrated into the PACI program (Partnerships for Advanced Computational Infrastructure)
  - DTF project to build and integrate multiple systems
    - NCSA, SDSC, Caltech, Argonne; multi-lambda, transcontinental interconnect
  - ETF, aka TeraGrid (Extensible Terascale Facility), integrates TCS with DTF, forming
    - a heterogeneous, extensible scientific/engineering cyberinfrastructure Grid

9 Infrastructure: PSC TCS machine room (Westinghouse)
- Did not require a new building; just a pipe & wire upgrade; not maxed out.
- ~8k ft² existing room (16 yrs old); TCS uses ~2.5k ft².

10 Floor Layout (Full System: Physical Structure)
- Geometrical constraints: invariant twixt US & Japan.

11 Compute Nodes (Terascale Computing System)
- 750 ES45 4-CPU servers, +13 inline spares (+2 login nodes)
- 4 EV68s per node; 1 GHz = 2 Gf/CPU [6 Tf]
- 4 GB memory per node [3.0 TB]
- 3 × 18.2 GB disk per node [41 TB]: system, user temporary, fast snapshots [~90 GB/s]
- Tru64 Unix
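For readers checking the bracketed system totals, the arithmetic below reproduces them from the per-node figures (decimal units assumed; whether the 13 spares and 2 login nodes were included in the slide's totals is not stated, so only the 750 production nodes are counted here).

    # Reproduce the bracketed system totals from the per-node figures above.
    # Decimal units (1 TB = 1000 GB) and a 750-node count are assumed.
    nodes = 750
    mem_tb = nodes * 4 / 1000          # 4 GB/node      -> 3.0 TB
    disk_tb = nodes * 3 * 18.2 / 1000  # 3 x 18.2 GB    -> ~41 TB
    peak_tf = nodes * 4 * 2 / 1000     # 4 CPUs x 2 Gf  -> 6 Tf
    print(f"Memory {mem_tb:.1f} TB, node disk {disk_tb:.0f} TB, peak {peak_tf:.0f} Tf")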

12 ES45 nodes
- 5 nodes per cabinet
- 3 local disks per node

13 Quadrics Network (Terascale Computing System)
- 2 "rails"
  - Higher bandwidth (~250 MB/s/rail)
  - Lower latency: 2.5 µs put latency
  - 1 NIC/node/rail
- Federated switch (per rail): "fat tree" (bisection bandwidth ~0.2 TB/s)
- User virtual-memory mapped
- Hardware retry
- Heterogeneous (Alpha Tru64 & Linux, Intel Linux)
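To read the latency and bandwidth numbers together, the sketch below applies the standard linear cost model T(m) ≈ latency + m/bandwidth for a single rail. The model, and the resulting half-bandwidth message size n_1/2 = latency × bandwidth, is a textbook approximation assumed here, not a measured Quadrics curve.

    # Assumed first-order cost model for one rail: T(m) ~= latency + m/bandwidth.
    # Illustrative only; not a measured or vendor-published performance curve.
    latency_s = 2.5e-6        # 2.5 us put latency (from the slide)
    bw_bytes_s = 250e6        # ~250 MB/s per rail (from the slide)

    for m in (100, 1_000, 10_000, 100_000):             # message size in bytes
        t = latency_s + m / bw_bytes_s
        print(f"{m:>7} B: {t * 1e6:6.1f} us, {m / t / 1e6:6.1f} MB/s effective")
    print(f"Half-bandwidth message size: {latency_s * bw_bytes_s:.0f} bytes")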

14 Central Switch Assembly
- 20 cabinets in the center
- Minimizes maximum internode distance
- 3 out of 4 rows shown
- 21st LL switch, outside (not shown)

15 Quadrics wiring overhead (view towards ceiling)

16 Management & Control (Terascale Computing System)
- [Diagram: Quadrics, control LAN, compute nodes.]
- Quadrics switch control: internal SBC & Ethernet
- "Insight Manager" on PCs
- Dedicated systems
- Cluster/node monitoring & control
- RMS database
- Ethernet & serial link

17 Interactive Nodes (Terascale Computing System)
- [Diagram: Quadrics, control LAN, compute nodes, WAN/LAN, interactive /usr.]
- Dedicated: 2 × ES45, +8 on compute nodes (shared-function nodes)
- User access
- Gigabit Ethernet to WAN
- Quadrics connected
- /usr & indexed store (ISMS)

18 File Servers (Terascale Computing System)
- [Diagram: Quadrics, control LAN, compute nodes, file servers (/tmp), WAN/LAN, interactive /usr.]
- 64 servers, on compute nodes
- 0.47 TB/server [30 TB]
- ~500 MB/s per server [~32 GB/s]
- Temporary user storage (/tmp), direct IO
- [Each server has 24 disks on 8 SCSI chains on 4 controllers to sustain full drive bandwidth.]
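The bracketed aggregates follow directly from the per-server figures; the per-disk rate in the sketch below is an inference from the 24-disk layout, not a number on the slide.

    # Check the aggregates and infer the per-disk rate (inference, not stated).
    servers, tb_per_server, mb_s_per_server, disks = 64, 0.47, 500, 24
    print(f"Capacity: {servers * tb_per_server:.0f} TB")                        # ~30 TB
    print(f"Aggregate bandwidth: {servers * mb_s_per_server / 1000:.0f} GB/s")  # ~32 GB/s
    print(f"Implied per-disk rate: {mb_s_per_server / disks:.0f} MB/s")         # ~21 MB/s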

19 Terascale Computing System Summary
- ES45 compute nodes: 3000 EV68 @ 1 GHz, 6 Tf
- 3 TB memory
- 41 TB node disk, ~90 GB/s
- Multi-rail fat-tree network
- Redundant monitor/control
- WAN/LAN accessible
- File servers: 30 TB, ~32 GB/s
- Buffer disk store, ~150 TB
- Parallel visualization
- Mass store, ~1 TB/hr, >1 PB
- ETF coupled (heterogeneous)
- [Diagram: Quadrics, control LAN, compute nodes, file servers (/tmp), WAN/LAN, interactive /usr.]

20 Visualization
- [Diagram: TCS, application gateways, viz, buffer disk on Quadrics; 340 GB/s (1520Q), 4.5 GB/s (20Q), 3.6 GB/s (16Q).]
- Intel/Linux, newest software
- ~16 nodes
- Parallel rendering, HW/SW compositing
- Quadrics connected
- Image output → web pages + WAN coupled

21 Buffer Disk & HSM
- Quadrics coupled (~225 MB/s/link)
- Intermediate between TCS & HSM
- Independently managed.
- Private transport from TCS.
- [Diagram: TCS, application gateways, viz, buffer disk on Quadrics; 340 GB/s (1520Q), 4.5 GB/s (20Q), 3.6 GB/s (16Q); HSM (LSCi), >360 MB/s to tape, archive disk, WAN/LAN & SDSC.]
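The bandwidth labels in the diagram look like per-link Quadrics bandwidth times link count; reading "(NQ)" as "N Quadrics links" at the ~225 MB/s/link figure above (an interpretation, not stated on the slides) reproduces all three numbers:

    # Interpretation check: "(NQ)" read as "N Quadrics links" at ~225 MB/s each.
    mb_s_per_link = 225
    for links in (1520, 20, 16):
        print(f"{links:>5} links x {mb_s_per_link} MB/s ~= "
              f"{links * mb_s_per_link / 1000:.1f} GB/s")   # ~342, 4.5, 3.6 GB/s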

22 Application Gateways
- Quadrics coupled (~225 MB/s/link)
- Coupled to the ETF backbone by multiple GigE links, 30 Gb/s
- [Diagram: TCS, application gateways, viz, buffer disk on Quadrics; 340 GB/s (1520Q), 4.5 GB/s (20Q), 3.6 GB/s (16Q); multi-GigE to ETF, 30 Gb/s.]

23 The Front Row
- Yes, those are Pittsburgh sports' colors.