Application Scalability and High Productivity Computing
Nicholas J Wright, John Shalf, Harvey Wasserman
Advanced Technologies Group, NERSC/LBNL

Presentation transcript:

1 Application Scalability and High Productivity Computing
Nicholas J Wright, John Shalf, Harvey Wasserman
Advanced Technologies Group, NERSC/LBNL

2 NERSC: National Energy Research Scientific Computing Center
Mission: Accelerate the pace of scientific discovery by providing high performance computing, information, data, and communications services for all DOE Office of Science (SC) research.
The production computing facility for DOE SC.
Berkeley Lab Computing Sciences Directorate:
–Computational Research Division (CRD), ESnet
–NERSC

3 NERSC is the Primary Computing Center for DOE Office of Science
NERSC serves a large population: over 3,000 users, 400 projects, 500 codes.
NERSC serves the DOE SC mission:
–Allocated by DOE program managers
–Not limited to largest-scale jobs
–Not open to non-DOE applications
Strategy: Science First
–Requirements workshops by office
–Procurements based on science codes
–Partnerships with vendors to meet science requirements

4 NERSC Systems for Science
Large-Scale Computing Systems
–Franklin (NERSC-5): Cray XT4; 9,532 compute nodes; 38,128 cores; ~25 Tflop/s on applications; 356 Tflop/s peak
–Hopper (NERSC-6): Cray XE6; Phase 1: Cray XT5, 668 nodes, 5,344 cores; Phase 2: 1.25 Pflop/s peak (late-2010 delivery)
HPSS Archival Storage
–40 PB capacity; 4 tape libraries; 150 TB disk cache
NERSC Global Filesystem (NGF)
–Uses IBM’s GPFS; 1.5 PB capacity; 5.5 GB/s of bandwidth
Clusters (140 Tflop/s total)
–Carver: IBM iDataPlex cluster
–PDSF (HEP/NP): ~1K-core throughput cluster
–Magellan: cloud testbed, IBM iDataPlex cluster
–GenePool (JGI): ~5K-core throughput cluster
Analytics
–Euclid (512 GB shared memory)
–Dirac: GPU testbed (48 nodes)

5 NERSC Roadmap
[Roadmap figure: peak Teraflop/s vs. the Top500 trend, showing Franklin (N5) 19 TF sustained / 101 TF peak; Franklin (N5) + QC 36 TF sustained / 352 TF peak; Hopper (N6) >1 PF peak; NERSC-7 10 PF peak; NERSC-8; NERSC-9 1 EF peak]
Users expect a 10x improvement in capability every 3-4 years.
How do we ensure that users’ performance follows this trend and their productivity is unaffected?

6 A Plan of Attack
1. Understand the technology trends
2. Understand the science needs
3. Influence the technology & the applications simultaneously?
–Co-Design!

7 Hardware Trends: The Multicore Era
Moore’s Law continues unabated.
Power constraints mean core counts will double every 18 months, not clock speeds.
Memory capacity is not doubling at the same rate, so GB/core will decrease.
Power is the leading design constraint.
Figure courtesy of Kunle Olukotun, Lance Hammond, Herb Sutter, and Burton Smith

8 Current Technology Roadmaps will Depart from Historical Gains
Power is the leading design constraint.
From Peter Kogge, DARPA Exascale Study

9 … and the power costs will still be staggering
$1M per megawatt per year! (with CHEAP power)
From Peter Kogge, DARPA Exascale Study

10 Changing Notion of “System Balance”
If you pay 5% more to double the FPUs and get a 10% improvement, it’s a win (despite lowering your % of peak performance); see the worked numbers below.
If you pay 2x more on memory bandwidth (power or cost) and get 35% more performance, it can be a net loss (even though % of peak looks better).
Real example: we could give up ALL of the flops to improve memory bandwidth by 20% on the 2018 system.
We have a fixed budget:
–Sustained-to-peak FLOP rate is the wrong metric if FLOPs are cheap
–Balance involves balancing your checkbook & balancing your power budget
–Requires application co-design to make the right trade-offs
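A back-of-the-envelope check of the first trade-off, using only the numbers quoted above (the breakdown is ours, not from the slide). Take sustained performance P, system cost C, and peak performance P_peak before the change:

    performance per dollar, before:  P / C
    performance per dollar, after:   1.10 P / 1.05 C, roughly 1.05 P / C   (about a 5% win)
    fraction of peak, after:         1.10 P / 2 P_peak, roughly 0.55 times the old fraction

The percent-of-peak metric says the machine got worse while cost-effectiveness says it got better; under a fixed budget the latter is the quantity to maximize.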

11 Summary: Technology Trends
Number of cores: increasing
–Flops will be “free”
Memory capacity per core: decreasing
Memory bandwidth per core: decreasing
Network bandwidth per core: decreasing
I/O bandwidth: decreasing

12 Navigating Technology Phase Transitions
[Roadmap figure: peak Teraflop/s vs. the Top500 trend, annotated with programming-model eras: COTS/MPP + MPI; COTS/MPP + MPI (+ OpenMP); GPU CUDA/OpenCL or manycore BG/Q, R; Exascale + ???]
Systems shown: Franklin (N5) 19 TF sustained / 101 TF peak; Franklin (N5) + QC 36 TF sustained / 352 TF peak; Hopper (N6) >1 PF peak; NERSC-7 10 PF peak; NERSC-8; NERSC-9 1 EF peak

13 Application Scalability
How can a user continue to be productive in the face of these disruptive technology trends?

14 Source of Workload Information
Documents
–2005 DOE Greenbook
–NERSC Plan
–LCF Studies and Reports
–Workshop Reports
–2008 NERSC assessment
Allocations analysis
User discussion

15 New Model for Collecting Requirements
Joint DOE Program Office / NERSC workshops, modeled after the ESnet method:
–Two workshops per year
–Describe science-based needs over 3-5 years
–Case study narratives
–First workshop is BER, May 7-8

16 Numerical Methods at NERSC (Caveat: survey data from ERCAP requests)

17 Application Trends
Weak scaling
–Time to solution is often a non-linear function of problem size
Strong scaling
–Latency or serial fraction will get you in the end (see the Amdahl’s-law sketch below)
Add features to models: “new” weak scaling
[Two schematic plots of performance vs. processors]
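The strong-scaling remark is the standard Amdahl’s-law limit: with serial fraction s and P processors,

    speedup(P) = 1 / (s + (1 - s) / P),  which approaches  1 / s  as P grows,

so even a 1% serial fraction caps speedup at 100x no matter how many cores are added; communication latency plays the same role for message-bound codes.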

18 Develop Best Practices in Multicore Programming
NERSC/Cray Programming Models “Center of Excellence” combines:
–LBNL strength in languages, tuning, performance analysis
–Cray strength in languages, compilers, benchmarking
Goals:
–Immediate: training material for Hopper users on hybrid OpenMP/MPI
–Long term: input into the exascale programming model
[Figure legend: shading marks OpenMP thread parallelism]

19 Develop Best Practices in Multicore Programming
[Figure: results chart; legend marks OpenMP thread parallelism]
Conclusions so far:
–Mixed OpenMP/MPI saves significant memory
–Running-time impact varies with the application
–1 MPI process per socket is often good (a minimal hybrid sketch follows below)
Run on Hopper next:
–12 vs. 6 cores per socket
–Gemini vs. SeaStar
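To make the hybrid recipe concrete, here is a minimal MPI + OpenMP sketch in C (our illustration, not the Center of Excellence training material): one MPI rank per socket, OpenMP threads inside the rank, and only the main thread making MPI calls. The per-thread work and the reduction are placeholders.

/* Minimal hybrid MPI + OpenMP sketch (illustrative only). */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nranks;

    /* Request FUNNELED: only the main thread makes MPI calls. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    double local = 0.0;
    /* Threads share the rank's memory, avoiding per-core MPI buffers and halos. */
    #pragma omp parallel reduction(+:local)
    {
        int tid = omp_get_thread_num();
        local += tid + 1;            /* stand-in for per-thread work */
    }

    double global = 0.0;
    MPI_Reduce(&local, &global, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("ranks=%d threads/rank=%d sum=%g\n",
               nranks, omp_get_max_threads(), global);

    MPI_Finalize();
    return 0;
}

On a Cray XT/XE system this would typically be built with the compiler wrappers with OpenMP enabled and launched with one rank per socket and OMP_NUM_THREADS set to the cores per socket; the exact launch flags vary by machine, so treat any particular aprun invocation as an assumption.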

20 Co-Design: Eating our own dogfood

21 Inserting Scientific Apps into the Hardware Development Process
Research Accelerator for Multi-Processors (RAMP)
–Simulate hardware before it is built!
–Break the slow feedback loop for system designs
–Enables tightly coupled hardware/software/science co-design (not possible using the conventional approach)

22 Summary
Disruptive technology changes are coming.
By exploring
–new programming models (and revisiting old ones)
–hardware-software co-design
we hope to ensure that scientists’ productivity remains high!

23

24 Exascale Machine Wish List: Performance
Lightweight communication
–Single-sided messaging (a minimal MPI one-sided sketch follows below)
Performance feedback
–Why is my code now slower than the last run?
–Autotuning
Fine-grained control of data movement
–Cache bypass
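Single-sided messaging already has a portable expression in MPI's one-sided API; the sketch below is ours (MPI-2 era, not a statement about what an exascale interface will look like) and shows rank 0 depositing a value into rank 1's memory with MPI_Put, no matching receive required. Run with at least two ranks.

/* One-sided MPI sketch: rank 0 puts a value directly into rank 1's window. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double buf = 0.0;               /* memory exposed by every rank */
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    MPI_Win_create(&buf, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    MPI_Win_fence(0, win);          /* open an access epoch */
    if (rank == 0) {
        double val = 42.0;
        /* Write val into element 0 of rank 1's window; rank 1 posts no receive. */
        MPI_Put(&val, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
    }
    MPI_Win_fence(0, win);          /* close the epoch; data is now visible */

    if (rank == 1)
        printf("rank 1 received %g via MPI_Put\n", buf);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}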

25 Exascale Machine Wish List: Productivity
Simplest possible execution model
–Portable programming model
–Hide inhomogeneity
Debugging support
–Race conditions + deadlocks (an example of the kind of bug meant is sketched below)
Reliability
–No desire to add error detection to the application
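As a concrete instance of the debugging burden (our example, not from the slides), here is the classic OpenMP data race that such tools are asked to catch, next to the one-line fix:

/* Classic OpenMP data race (illustrative): every thread updates `sum`
 * without synchronization, so the result varies from run to run. */
#include <omp.h>
#include <stdio.h>

int main(void)
{
    long sum = 0;
    #pragma omp parallel for        /* BUG: unsynchronized update of shared sum */
    for (long i = 0; i < 1000000; i++)
        sum += i;

    /* Fix: make the update a reduction (or protect it with an atomic). */
    long correct = 0;
    #pragma omp parallel for reduction(+:correct)
    for (long i = 0; i < 1000000; i++)
        correct += i;

    printf("racy sum = %ld, correct sum = %ld\n", sum, correct);
    return 0;
}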