NCSA Terascale Clusters


1 NCSA Terascale Clusters
Dan Reed
Director, NCSA and the Alliance
Chief Architect, NSF ETF TeraGrid
Principal Investigator, NSF NEESgrid
William and Jane Marr Gutgsell Professor, University of Illinois
reed@ncsa.uiuc.edu

2 A Blast From the Past …
"Everybody who has analyzed the logical theory of computers has come to the conclusion that the possibilities of computers are very interesting – if they could be made to be more complicated by several orders of magnitude."
Richard Feynman, December 29, 1959
Feynman would be proud!

3 NCSA Terascale Linux Clusters
1 TF IA-32 Pentium III cluster (Platinum)
– 512 1 GHz dual-processor nodes
– Myrinet 2000 interconnect
– 5 TB of RAID storage
– 594 GF (Linpack), production July 2001 (see the efficiency sketch after this slide)
1 TF IA-64 Itanium cluster (Titan)
– 164 800 MHz dual-processor nodes
– Myrinet 2000 interconnect
– 678 GF (Linpack), production March 2002
Large-scale calculations on both
– molecular dynamics (Schulten): first nanosecond/day calculations
– gas dynamics (Woodward)
– others underway via NRAC allocations
Software packaging for communities
– NMI GRIDS Center, Alliance “In a Box” …
Lessons for TeraGrid
[Photo: NCSA machine room]
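As a quick back-of-envelope check of what those Linpack figures imply, the C sketch below computes the fraction of peak achieved. It is an illustration added here, not part of the presentation, and it assumes the quoted 1 TF (roughly 1000 GF) peak for each cluster.

#include <stdio.h>

/* Back-of-envelope Linpack efficiency from the figures quoted on the
 * slide, assuming a ~1000 GF (1 TF) peak for both clusters. */
static double efficiency(double linpack_gf, double peak_gf) {
    return 100.0 * linpack_gf / peak_gf;
}

int main(void) {
    /* Platinum: 594 GF Linpack */
    printf("Platinum: %.0f%% of peak\n", efficiency(594.0, 1000.0));
    /* Titan: 678 GF Linpack */
    printf("Titan:    %.0f%% of peak\n", efficiency(678.0, 1000.0));
    return 0;
}

Roughly 60 percent of peak on Platinum and a somewhat higher fraction on Titan, which is in the usual range for Myrinet-connected commodity clusters of that era.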

4 Platinum Software Configuration
Linux
– RedHat 6.2 and Linux 2.2.19 SMP kernel
OpenPBS
– resource management and job control
Maui Scheduler
– advanced scheduling
Argonne MPICH
– parallel programming API (see the sketch after this slide)
NCSA VMI
– communication middleware
– MPICH and Myrinet
Myricom GM
– Myrinet communication layer
NCSA cluster monitor
IBM GPFS
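To make the MPICH layer in this stack concrete, here is a minimal MPI program of the kind a Platinum user would compile with mpicc and submit through PBS. It is a generic sketch added for illustration, not code from the presentation.

#include <mpi.h>
#include <stdio.h>

/* Minimal MPI program: each process reports its rank. On a cluster
 * like Platinum this runs across nodes over the Myrinet/VMI stack. */
int main(int argc, char **argv) {
    int rank, size;

    MPI_Init(&argc, &argv);                 /* start the MPI runtime  */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* this process's rank    */
    MPI_Comm_size(MPI_COMM_WORLD, &size);   /* total process count    */

    printf("Hello from rank %d of %d\n", rank, size);

    MPI_Finalize();                         /* shut down cleanly      */
    return 0;
}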

5 Session Questions
Cluster performance and expectations
– generally met, though with the usual hiccups
MTBI and failure modes
– node and disk loss (stay tuned for my next talk …)
– copper Myrinet (fiber much more reliable)
– avoid open house demonstrations
System utilization
– heavily oversubscribed (see queue delays below)
Primary complaints
– long batch queue delays
– capacity vs. capability balance
– ISV code availability
– software tools: debuggers and performance tools
– I/O and parallel file system performance (see the I/O sketch after this slide)
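The I/O complaint concerns concurrent parallel file access of the kind sketched below. This is a generic MPI-IO example (the interface MPICH provides through its ROMIO layer), added for illustration only; it is not code from the presentation, and the file path is hypothetical.

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

/* Each rank writes a disjoint 8 MB block of one shared file, the
 * concurrent access pattern that exercises a parallel file system
 * such as GPFS. The path below is hypothetical. */
int main(int argc, char **argv) {
    int rank, i;
    const int count = 1 << 20;      /* one million doubles per rank */
    double *buf;
    MPI_File fh;
    MPI_Offset offset;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(count * sizeof(double));
    for (i = 0; i < count; i++)
        buf[i] = rank;

    MPI_File_open(MPI_COMM_WORLD, "/scratch/demo.dat",
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
    offset = (MPI_Offset)rank * count * sizeof(double);
    MPI_File_write_at(fh, offset, buf, count, MPI_DOUBLE, MPI_STATUS_IGNORE);
    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}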

6 NCSA IA-32 Cluster Timeline
Jan 2001: Order placed with IBM for 512 compute node cluster
2/23: First four racks of IBM hardware arrive
3/1: Head nodes operational
3/10: First 126 processor Myrinet test jobs
3/13: Final IBM hardware shipment
3/22: First application for compute nodes (CMS/Koranda/Litvin)
3/26: Initial Globus installation
3/26: Final Myrinet hardware arrives
3/26: First 512 processor MILC and NAMD runs
4/5: Myrinet static mapping in place
4/7: CMS runs successfully
4/11: 400 processor HPL runs completing
4/12: Myricom engineering assistance
5/8: 1000p MP Linpack runs
5/11: 1008 processor Top500 run @ 594 GF
5/14: 2.4 kernel testing
5/28: RedHat 7.1 testing
6/1: Friendly user period begins
July 2001: Production service

7 NCSA Resource Usage

8 Alliance HPC Usage
[Chart: normalized CPU hours (NU), 0 to 35,000,000, by fiscal year FY98 through FY02, comparing NCSA Total with the Alliance Partner Total, with the point where clusters entered production marked. Source: PACI Usage Database]

9 Hero Cluster Jobs
[Chart: CPU hours of hero jobs on Platinum and Titan]

10 Storm Scale Prediction
Sample four hour forecast
– Center for Analysis and Prediction of Storms
– Advanced Regional Prediction System, a full-physics mesoscale prediction system
Execution environment
– NCSA Itanium Linux cluster
– 240 processors, 4 hours per night for 46 days (see the sketch after this slide)
Fort Worth forecast
– four hour prediction, 3 km grid
– initial state includes assimilation of WSR-88D reflectivity and radial velocity data; surface and upper air data, satellite, and wind
On-demand computing required
[Images: observed radar vs. 2 hr forecast with radar assimilation]
Source: Kelvin Droegemeier
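To put the execution environment in perspective, the short C sketch below totals the CPU-hours implied by the figures on this slide. It is a back-of-envelope illustration added here, not part of the presentation.

#include <stdio.h>

/* CPU-hour cost of the nightly forecast campaign quoted on the slide:
 * 240 processors for 4 wall-clock hours over 46 nights. */
int main(void) {
    const int processors = 240;
    const double hours_per_night = 4.0;
    const int nights = 46;

    double cpu_hours = processors * hours_per_night * nights;
    printf("Total: %.0f CPU-hours for the campaign\n", cpu_hours);
    return 0;
}

That is on the order of 44,000 CPU-hours, delivered in fixed nightly windows, which is why the slide stresses on-demand scheduling rather than ordinary batch queuing.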

11 NCSA Multiphase Strategy
Multiple user classes
– ISV software, hero calculations
– distributed resource sharing, parameter studies
Four hardware approaches
– shared memory multiprocessors
  – 12 32-way IBM p690 systems (2 TF peak)
  – large memory and ISV support
– TeraGrid IPF clusters
  – 64-bit Itanium2/Madison (10 TF peak)
  – coupling with SDSC, ANL, Caltech, and PSC
– Xeon clusters
  – 32-bit systems for hero calculations
  – dedicated sub-clusters (2-3 TF each), allocated for weeks
– Condor resource pools
  – parameter studies and load sharing

12 Extensible TeraGrid Facility (ETF)
NCSA (compute intensive): 10 TF IA-64, 128 large memory nodes, 230 TB disk storage, 3 PB tape storage, GPFS and data mining
SDSC (data intensive): 4 TF IA-64, DB2 and Oracle servers, 500 TB disk storage, 6 PB tape storage, 1.1 TF Power4
PSC (compute intensive): 6 TF EV68, 71 TB storage, 0.3 TF EV7 shared memory, 150 TB storage server
ANL (visualization): 1.25 TF IA-64, 96 visualization nodes, 20 TB storage
Caltech (data collection analysis): 0.4 TF IA-64, IA-32 Datawulf, 80 TB storage
Sites connect at 30 Gb/s to a 40 Gb/s extensible backplane network routed through Los Angeles and Chicago hubs.

13 NCSA TeraGrid: 10 TF IPF and 230 TB
2 TF Itanium2 cluster: 256 nodes, each 2p 1 GHz with 4 or 12 GB memory and 73 GB scratch
Madison expansion (being installed now): ~700 nodes, each 2p Madison with 4 GB memory and 73 GB scratch
Interactive and spare nodes for login and FTP: 10 2p Itanium2 nodes and 10 2p Madison nodes
Fabrics: GbE and Myrinet; storage I/O over Myrinet and/or GbE
Storage: 230 TB behind Brocade 12000 switches with 256 2x FC links
Connected to the TeraGrid network
(A rough per-processor peak estimate implied by these totals follows below.)
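As a sanity check on how the 10 TF total splits across the two node types, the C sketch below derives the implied per-processor peak. It is a back-of-envelope illustration added here, not from the slides, and it assumes the Madison expansion supplies essentially all of the remaining peak beyond the 2 TF Itanium2 portion.

#include <stdio.h>

/* Implied per-processor peak from the slide's aggregate figures:
 * 2 TF across 256 dual-processor Itanium2 nodes, and (assumed)
 * the remaining ~8 TF across ~700 dual-processor Madison nodes. */
int main(void) {
    double itanium2_gf = 2000.0 / (256 * 2);              /* GF per processor */
    double madison_gf  = (10000.0 - 2000.0) / (700 * 2);  /* GF per processor */

    printf("Itanium2: %.1f GF per processor\n", itanium2_gf);
    printf("Madison:  %.1f GF per processor\n", madison_gf);
    return 0;
}

Roughly 4 GF and 6 GF per processor, consistent with the Itanium family's four floating-point operations per cycle at the 1 GHz clock quoted here and the higher clocks of the Madison parts.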

