Oak Ridge Leadership Computing Facility Don Maxwell HPC Technical Coordinator October 8, 2010 Presented To: HPC User Forum, Stuttgart www.olcf.ornl.gov.

1 Oak Ridge Leadership Computing Facility Don Maxwell HPC Technical Coordinator October 8, 2010 Presented To: HPC User Forum, Stuttgart

2 Oak Ridge Leadership Computing Facility
Mission: Deploy and operate the computational resources required to tackle global challenges
– Provide world-class computational resources and specialized services for the most computationally intensive problems
– Provide a stable hardware/software path of increasing scale to maximize productive application development
– Deliver transforming discoveries in materials, biology, climate, energy technologies, etc.
– Provide the ability to investigate otherwise inaccessible systems, from supernovae to nuclear reactors to energy grid dynamics
Managed by UT-Battelle for the Department of Energy

3 Our vision for sustained leadership and scientific impact
– Provide the world's most powerful open resource for capability computing
– Follow a well-defined path for maintaining world leadership in this critical area
– Attract the brightest talent and partnerships from all over the world
– Deliver cutting-edge science relevant to the missions of DOE and key federal and state agencies
– Unique opportunity for multi-agency collaboration for science based on synergy of requirements and technology

4 With UT, we are NSF's National Institute for Computational Sciences for academia
1 PF system to the UT-ORNL Joint Institute for Computational Sciences
– Largest grant in UT history
– Other partners: Texas Advanced Computing Center, National Center for Atmospheric Research, ORAU, and core universities
– 1 of up to 4 leading-edge computing systems planned to increase the availability of computing resources to U.S. researchers
A new phase in our relationship with UT
– Computational Science Initiative
– Governor's Chair and joint faculty
– Engagement with the scientific community
– Research, education, and training mission

5 Oak Ridge National Laboratory Leadership Computing Systems
– Jaguar: World's most powerful computer
– Kraken: NSF's most powerful computer
– NOAA CMRS: NOAA's most powerful computer

6 Jaguar History
– Jan 2005: XT3 dev cabinet
– Mar: cabinet, single core
– April: XT3 cabinets
– Jun: cabinets for total of 56, XT3, 25 TF
– July 2006: XT3, dual core 2.6 GHz, 50 TF
– Nov 2006: XT4, dual core 2.6 GHz, 32 then 36 cabinets
– March 2007: XT3 and XT4 combined for total of 124 cabinets, 100 TF
– May 2008: XT4, 68 cabinets, quad core, 250 TF
– Dec: cabinet, quad core XT5, 1 PF
– Nov: cabinet, six core XT5, 2 PF

7 What is Jaguar Today?
Jaguar combines a 263 TF Cray XT4 system at ORNL's OLCF with a 2,332 TF Cray XT5 to create a 2.5 PF system

System attribute               XT5                 XT4
AMD Opteron processors         37,376 Hex-core     7,832 Quad-core
Memory DIMMs                   75,772              31,776
Node architecture              Dual socket SMP     Single socket
Memory per core/node (GB)      1.3/16              2/8
Total system memory (TB)       300                 62
Disk capacity (TB)             10,
Disk bandwidth (GB/s)          240                 44
Interconnect                   SeaStar2+ 3D torus
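The figures on this slide can be cross-checked against each other. The sketch below is only a consistency check, not anything shown in the presentation: it derives node counts, core counts, and total memory from the processor counts and the per-node memory quoted on the slide. It assumes decimal units (1 TB = 1000 GB) and that every socket is populated as described; the dictionary keys and the `derive` helper are names made up for this illustration.

```python
# Figures copied from the slide's XT5 / XT4 comparison.
SYSTEMS = {
    "XT5": {"sockets": 37_376, "cores_per_socket": 6,
            "sockets_per_node": 2, "gb_per_node": 16, "peak_tf": 2_332},
    "XT4": {"sockets": 7_832, "cores_per_socket": 4,
            "sockets_per_node": 1, "gb_per_node": 8, "peak_tf": 263},
}


def derive(spec):
    """Derive node/core counts and memory totals (assumes 1 TB = 1000 GB)."""
    nodes = spec["sockets"] // spec["sockets_per_node"]
    cores = spec["sockets"] * spec["cores_per_socket"]
    total_gb = nodes * spec["gb_per_node"]
    return {
        "nodes": nodes,
        "cores": cores,
        "total_memory_tb": total_gb / 1000,
        "gb_per_core": total_gb / cores,
    }


derived = {name: derive(spec) for name, spec in SYSTEMS.items()}
combined_peak_pf = sum(spec["peak_tf"] for spec in SYSTEMS.values()) / 1000

for name, values in derived.items():
    print(name, values)
print("combined peak (PF):", combined_peak_pf)
```

Under these assumptions the derived totals come out at roughly 299 TB and 63 TB of memory, close to the 300 TB and 62 TB on the slide, and the two peak figures sum to about 2.6 PF, which the slide states as 2.5 PF.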

8 Spider: Center-wide High Speed Parallel File System
Spider provides a shared, parallel file system for all systems
– Based on Lustre file system
Demonstrated bandwidth of over 240 GB/s
Over 10 PB of RAID-6 capacity
– DDN 9900 storage controllers with 8+2 disks per RAID group
– 13,440 1-TB SATA drives
192 Dell PowerEdge storage servers
– 3 TB of memory
Available from all systems via our high-performance scalable I/O network
– Over 3,000 InfiniBand ports
– Over 3 miles of cables
– Scales as storage grows
Spider is the parallel file system for Jaguar
Spider uses approximately 400 kW of power
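The "over 10 PB" capacity follows from the drive count and the 8+2 RAID-6 layout. A minimal sketch of that arithmetic, assuming every drive belongs to a full 8+2 group and ignoring hot spares and file system overhead (neither is mentioned on the slide); the variable names are illustrative only:

```python
# Figures taken from the Spider slide.
DRIVES = 13_440          # 1-TB SATA drives
DRIVE_TB = 1
DATA_DISKS = 8           # data disks per RAID-6 group
PARITY_DISKS = 2         # parity disks per RAID-6 group
SERVERS = 192            # storage servers
BANDWIDTH_GBS = 240      # demonstrated aggregate bandwidth

# Assumption: all drives sit in complete 8+2 groups, no spares.
raid_groups = DRIVES // (DATA_DISKS + PARITY_DISKS)
raw_tb = DRIVES * DRIVE_TB
usable_tb = raid_groups * DATA_DISKS * DRIVE_TB
gbs_per_server = BANDWIDTH_GBS / SERVERS

print("RAID groups:", raid_groups)
print("raw capacity (TB):", raw_tb)
print("usable capacity (TB):", usable_tb)
print("bandwidth per server (GB/s):", gbs_per_server)
```

With these inputs the usable capacity is about 10.75 PB, consistent with the slide, and the demonstrated bandwidth averages 1.25 GB/s per storage server.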

9 Jaguar combines a 2.33 PF Cray XT5 with a 263 TF Cray XT4
System components are linked by 4×-DDR InfiniBand (IB) using three Cisco 7024D switches
– XT5 has 192 IB links
– XT4 has 48 IB links
– Spider has 192 IB links
Diagram labels: Spider, Cray XT4, Cray XT5, External Logins

10 Building an Exabyte Archive
High-Performance Storage System adds capacity and speed
– Supercomputers addressing Grand Challenges need to quickly store massive amounts of data
– The High-Performance Storage System meets the big-storage demands of big science
– 25 PB of tape storage
– Planning for 750 PB by 2012
"Fifteen years ago, [national] labs realized they needed something of this size. They recognized Grand Challenge problems were coming up that would require petaflops of computing power. And they realized those jobs had to have a place to put the data."
Stanley White, National Center for Computational Sciences

11 Scheduling to Maximize Capability Computing
– Capability jobs get maximum priority and walltime
– Jobs are prioritized using several factors to meet DOE goals and to provide flexibility
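The slide does not give the actual prioritization factors, so the following is only a hypothetical sketch of the general idea: jobs that use a larger fraction of the machine get a higher priority class and a longer walltime limit, and time spent in the queue breaks ties within a class. The size thresholds, class numbers, and walltime limits in `BINS` are invented for illustration and are not OLCF policy; `TOTAL_CORES` is the XT5 core count implied by 37,376 six-core processors.

```python
from dataclasses import dataclass

TOTAL_CORES = 37_376 * 6  # XT5 cores implied by the processor count

# HYPOTHETICAL bins, not OLCF policy:
# (minimum fraction of the machine, priority class, max walltime in hours)
BINS = [(0.50, 3, 24), (0.20, 2, 12), (0.0, 1, 6)]


@dataclass
class Job:
    name: str
    cores: int
    hours_queued: float


def classify(job):
    """Return (priority class, max walltime hours) for a job's size."""
    fraction = job.cores / TOTAL_CORES
    for min_fraction, priority_class, max_walltime in BINS:
        if fraction >= min_fraction:
            return priority_class, max_walltime
    raise ValueError("job size must be non-negative")


def schedule_order(jobs):
    """Largest size class first; longer queue wait breaks ties."""
    return sorted(jobs,
                  key=lambda j: (classify(j)[0], j.hours_queued),
                  reverse=True)


queue = [
    Job("small", 1_000, 50.0),
    Job("big", 150_000, 1.0),
    Job("mid", 60_000, 10.0),
]
for job in schedule_order(queue):
    print(job.name, classify(job))
```

In this toy queue the capability-sized job runs first even though it has waited the least, which is the behavior the slide describes; a production scheduler would combine more factors than the two used here.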

12 Job Failure Trends
MPI Forum, OpenMPI, HWPOISON

13 ORNL's Current and Planned Data Centers
Computational Sciences Building (40,000 ft²)
– Maximum building power to 25 MW
– 6,600 ton chiller plant
– 1.5 MW UPS and 2.25 MW generator
– LEED Certified
Multiprogram Research Facility (30,000 ft²)
– Capability computing for national defense
– 25 MW of power and 8,000 ton chillers
– LEED Gold Certification
Multiprogram Computing & Data Center (140,000 ft²)
– Up to 100 MW of power
– Lights out facility
– Planned for LEED Gold certification

14 Organization chart
National Center for Computational Sciences: J. Hack, Director; A. Bland, OLCF Project Director; L. Gregg, Division Secretary
Operations Council: W. McCrosky, Finance Officer; H. George, HR Rep.; K. Carter, Recruiting; M. Richardson*, Facility Mgmt.; M. Disney, ES&H Officer; R. Adamson, M. Disney, Cyber Security
Advisory Committee: J. Dongarra; T. Dunning; K. Droegemeier; S. Karin; D. Reed; J. Tomkins
Deputy Project Director: K. Boudwin
– B. Hammontree, Site Preparation; J. Rogers, Hardware Acquisition; R. Kendall, Test & Acceptance Development; A. Baker, Commissioning; D. Hudson, Project Management; K. Stelljes, Cray Project Director
Chief Technology Officer: A. Geist
Director of Operations: J. Rogers
OLCF System Architect: S. Poole
Director of Science: B. Messer, Acting
INCITE Program: J. White
Industrial Partnerships: S. Tichenor
Cray Supercomputing Center of Excellence: J. Levesque; N. Wichmann; J. Larkin; D. Kiefer; L. DeRose
Scientific Computing: R. Kendall; A. Fields
– S. Ahern #; E. Apra [5]; R. H. Baker; D. Banks [3]; M. Brown; J. Daniel; M. Eisenbach; M. Fahey; J. Gergel [5]; S. Hampton [7]; W. Joubert #; S. Klasky #; A. Lopez-Bezanilla [7]; Q. Liu [7]; M. Matheson; R. Mills [5]; B. Mintz [7]; H. Nam; G. Ostrouchov [5]; N. Podhorszki; D. Pugmire; R. Sisneros [7]; R. Sankaran; R. Tchoua; A. Tharrington #; R. Toedte
High-Performance Computing Operations: A. Baker; S. Allen
– R. Adamson; M. Bast; J. Becklehimer [4]; J. Breazeale [6]; J. Brown [6]; M. Disney; A. Enger [4]; C. England; J. Evanko [4]; A. Funk [4]; D. Garman [4]; D. Giles; M. Hermanson [2]; J. Hill; S. Koch; H. Kuehn; C. Leach [6]; D. Leverman; D. Londo [4]; J. Lothian; D. M. McNamara [4]; J. Miller [6]; D. Pelfrey; G. Phipps, Jr. [6]; R. Ray; S. Shpanskiy; C. St. Pierre; B. Tennessen [4]; K. Thach; T. Watts [4]; S. White; C. Willis [4]; T. Wilson [6]
Technology Integration: G. Shipman; S. Mowery
– T. Barron; D. Dillow; D. Fuller; R. Gunasekaran; S. Hicks [5]; Y. Kim; K. Matney; R. Miller; S. Oral; B. Settlemyer [5]; D. Steinert; J. Simmons; V. Tipparaju [5]; S. Vazhkudai [5]; F. Wang; V. White; Z. Zhang
User Assistance and Outreach: A. Barker; A. Fields
– J. Buchanan; J. Eady [5]; D. Frederick; C. Fuson; E. Gedenk [1]; B. Gajus [5]; M. Griffith; S. Hempfling; J. Hines #; S. Jones; C. Kerns [1]; D. Levy [5]; M. Miller; L. Rael; B. Renaud; C. Rockett [1]; D. Rose [5]; J. Smith; W. Wade [1]; B. Whitten; L. Williams [5]
Application Performance Tools [5]: R. Graham; T. Darland
– R. Barrett; W. Bland; L. Broto [7]; O. Hernandez; S. Hodson; T. Jones; R. Keller; G. Koenig; J. Kuehn
Legend: [1] Student; [2] Post Graduate; [3] JICS; [4] Cray, Inc.; [5] Matrixed; [6] Subcontract; [7] Post Doc; * Acting; # Task Technical Coordinator
ORNL is managed and operated by UT-Battelle, LLC under contract with the DOE.
78 FTEs

15 Scientific Computing
Scientific Computing facilitates the delivery of leadership science by partnering with users to effectively utilize computational science, visualization, and workflow technologies on OLCF resources through:
– Science team liaisons
– Developing, tuning, and scaling current and future applications
– Providing visualizations to present scientific results and augment discovery processes

16 We allocate time on the DOE systems through the Innovative and Novel Computational Impact on Theory and Experiment (INCITE) Program
INCITE provides awards to academic, government, and industry organizations worldwide that need large allocations of computer time, supporting resources, and data storage to pursue transformational advances in science and industrial competitiveness.

17 User Demographics
Active Users by Sponsor
System time is allocated to each project. We do not charge for time except for proprietary work by commercial companies.

18 Some INCITE research topics
Glimpse into dark matter, Supernovae ignition, Protein structure, Creation of biofuels, Replicating enzyme functions, Protein folding, Chemical catalyst design, Efficient coal gasifiers, Combustion, Algorithm development, Global cloudiness, Regional earthquakes, Carbon sequestration, Airfoil optimization, Turbulent flow, Propulsor systems, Nano-devices, Batteries, Solar cells, Reactor design
Next INCITE Call for Proposals: April 2011
– Awards for 1, 2, or 3 years
– Average award > 20 million processor hours per year
Contact us about discretionary time for INCITE preparation
Contact information: Julia C. White, INCITE Manager

19 Gordon Bell Prize Awarded to ORNL Team
Three of six Gordon Bell (GB) finalists ran on Jaguar
– A team led by ORNL's Thomas Schulthess received the prestigious 2008 Association for Computing Machinery (ACM) Gordon Bell Prize at SC08
– Awarded for attaining the fastest performance ever in a scientific supercomputing application
– Simulation of superconductors achieved petaflops performance on ORNL's Cray XT Jaguar supercomputer
– By modifying the algorithms and software design of the DCA++ code, the team was able to boost its performance tenfold
Gordon Bell Finalists: DCA++ (ORNL), LS3DF (LBNL), SPECFEM3D (SDSC), RHEA (TACC), SPaSM (LANL), VPIC (LANL)
UPDATE: With the upgraded Jaguar, DCA++ has exceeded 1.9 PF

20 OLCF is working with users to produce scalable, high-performance apps for the petascale

21 Scientific Progress at the Petascale
– Nuclear Energy: High-fidelity predictive simulation tools for the design of next-generation nuclear reactors to safely increase operating margins.
– Fusion Energy: Substantial progress in the understanding of anomalous electron energy loss in the National Spherical Torus Experiment (NSTX).
– Nano Science: Understanding the atomic and electronic properties of nanostructures in next-generation photovoltaic solar cell materials.
– Turbulence: Understanding the statistical geometry of turbulent dispersion of pollutants in the environment.
– Energy Storage: Understanding the storage and flow of energy in next-generation nanostructured carbon tube supercapacitors.
– Biofuels: A comprehensive simulation model of lignocellulosic biomass to understand the bottleneck to sustainable and economical ethanol production.

22 Nanoscience / nanotechnology: Petascale simulations of nano-electronic devices
Science Objectives and Impact
– Identify next-generation nano-transistor architectures, reduce power consumption, and increase manufacturability
– Model, understand, and design carrier flow in nano-scale semiconductor transistors
Science Results
– Coherent transport simulations in band-to-band tunneling devices with simulation times of less than an hour => rapidly explore design space
– Incoherent transport simulations coupling all energies through phonon interactions. Production runs on 70,000 cores in 12 hours => first atomistic incoherent transport simulations
Research Team
– M. Luisier and G. Klimeck, Purdue University
– 3-year INCITE award, with 20 million hours in 2010
– OMEN: 3D, 2D, and 1D atomistic devices

23 Computational Fluid Dynamics: Smart-Truck Optimization
Science Objectives and Impact
– Apply advanced computational techniques from the aerospace industry to substantially improve fuel efficiency and reduce emissions of trucks by reducing drag / increasing aerodynamic efficiency
– If all 1.3 million long haul trucks operated with the drag of a passenger car, the US would annually:
– Save 6.8 billion gallons of diesel
– Reduce CO2 emissions by 75 million tons
– Save $19 billion in fuel costs
Science Results
– Unprecedented detail and accuracy of a Class 8 tractor-trailer aerodynamic simulation
– Minimizes drag associated with the trailer underside
– Compresses and accelerates incoming air flow and injects high-energy air into the trailer wake => UT-6 Trailer Under Tray System reduces tractor/trailer drag by 12%
Research Team
– Mike Henderson, BMI Corp.
– Participant in the Industrial Partnerships Program
Figure: Aerodynamic Performance Testing Methods - Jaguar CFD analysis of truck and mirrors
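The three fleet-wide figures on this slide imply per-truck and per-gallon values that can be sanity-checked. The sketch below is a back-of-the-envelope calculation, not part of the presentation; it assumes the CO2 figure is in US short tons (2,000 lb), which the slide does not state, and the variable names are illustrative.

```python
# Fleet-wide annual figures quoted on the slide.
TRUCKS = 1.3e6            # long haul trucks
GALLONS_SAVED = 6.8e9     # gallons of diesel per year
CO2_TONS_AVOIDED = 75e6   # tons of CO2 per year
DOLLARS_SAVED = 19e9      # fuel cost savings per year

LB_PER_TON = 2000         # assumption: US short tons

gallons_per_truck = GALLONS_SAVED / TRUCKS
implied_price_per_gallon = DOLLARS_SAVED / GALLONS_SAVED
implied_lb_co2_per_gallon = CO2_TONS_AVOIDED * LB_PER_TON / GALLONS_SAVED

print(f"gallons saved per truck per year: {gallons_per_truck:,.0f}")
print(f"implied diesel price ($/gal): {implied_price_per_gallon:.2f}")
print(f"implied CO2 per gallon (lb): {implied_lb_co2_per_gallon:.1f}")
```

Under the short-ton assumption the figures are mutually consistent: roughly 5,200 gallons per truck per year, a diesel price near $2.80 per gallon, and about 22 lb of CO2 per gallon burned.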

24 Examples of OLCF Industrial Projects
– Developing new add-on parts to reduce drag and increase fuel efficiency of Class 8 (18-wheeler) long haul trucks. This will reduce fuel consumption by up to 3,700 gallons per truck per year, and reduce CO2 by up to 41 tons (82,000 lb) per truck per year. BMI is using NASA FUN3D, and a NASA team is assisting BMI with code refinement. (OLCF Director's Discretionary Award)
– Analyzing unsteady versus steady flows in low pressure turbomachinery and their potential effects on more energy efficient designs. (OLCF Director's Discretionary Award)
– Studying, at the nano scale, catalysts that can selectively produce hydrogen from biomass (hydrogen to be used as energy for fuel cells). (OLCF Director's Discretionary Award)
– Developing a unique CO2 compression technology for significantly lower cost carbon sequestration. (ALCC award)
INCITE awards

25 10 Year Strategy: Moving to the Exascale
The U.S. Department of Energy requires exaflops computing by 2018 to meet the needs of the science communities that depend on leadership computing
Our vision: Provide a series of increasingly powerful computer systems and work with the user community to scale applications to each of the new computer systems
– OLCF-3 Project: New petaflops computer based on early hybrid multi-core technology
Figure: OLCF Roadmap from 10-year plan
– System labels: Today; 1 PF; 2 PF, 6-core; OLCF-3; Future systems; 100 PF; 300 PF; 1 EF; 2017
– Facility labels: ORNL Computational Sciences Building; ORNL Multipurpose Research Facility; ORNL Extreme Scale Computing Facility (140,000 ft²)

26 OLCF-3 Titan System Description
– Similar number of cabinets, cabinet design, and cooling as Jaguar
– Operating system upgrade of today's Cray Linux Environment
– New Gemini interconnect: 3-D torus, globally addressable memory, advanced synchronization features
– New accelerated node design using GPUs
– 20 PF peak performance
– 9x performance of today's XT5
– 3x larger memory
– 3x larger and 4x faster file system
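The multipliers on this slide can be turned into rough absolute numbers using the Jaguar and Spider figures given earlier in the talk. This is only an illustrative calculation: it assumes the "3x" and "4x" factors are relative to today's XT5 memory (300 TB) and to Spider's capacity ("over 10 PB", taken here as 10) and bandwidth (240 GB/s), which the slide does not spell out.

```python
# Baseline figures from the Jaguar and Spider slides.
XT5_PEAK_PF = 2.332
XT5_MEMORY_TB = 300
SPIDER_CAPACITY_PB = 10       # slide says "over 10 PB"
SPIDER_BANDWIDTH_GBS = 240

# Figures from the Titan slide.
TITAN_PEAK_PF = 20
MEMORY_FACTOR = 3
FS_CAPACITY_FACTOR = 3
FS_BANDWIDTH_FACTOR = 4

# Assumption: each factor applies to the baseline listed above.
peak_ratio = TITAN_PEAK_PF / XT5_PEAK_PF
implied_memory_tb = MEMORY_FACTOR * XT5_MEMORY_TB
implied_fs_capacity_pb = FS_CAPACITY_FACTOR * SPIDER_CAPACITY_PB
implied_fs_bandwidth_gbs = FS_BANDWIDTH_FACTOR * SPIDER_BANDWIDTH_GBS

print(f"peak ratio vs XT5: {peak_ratio:.1f}x")
print("implied memory (TB):", implied_memory_tb)
print("implied file system capacity (PB):", implied_fs_capacity_pb)
print("implied file system bandwidth (GB/s):", implied_fs_bandwidth_gbs)
```

The peak ratio works out to about 8.6, which the slide rounds to 9x; the other values are projections under the stated baseline assumption, not published specifications.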

