NOAA Operational Forecasting and the HPC Imperative
John Michalakes, NOAA/NCEP/Environmental Modeling Center (IM Systems Group)
ESPC Air-Ocean-Land-Ice Global Coupled Prediction Project Meeting, Nov 2014
University of Colorado UMC, Boulder

2 Outline
The HPC Imperative
Next-Generation Global Prediction System
Accelerators and NWP

3 HPC Imperative
NWP is one of the first HPC applications and, with climate, has been one of its key drivers
– Exponential growth in HPC capability has translated directly to better forecasts and steadily improving value to the public
HPC growth continues toward Peta-/Exaflop, but that growth now comes only from increased parallelism
– More floating point capability
– Proportionately less cache, memory, and I/O bandwidth
– Parallelism scales in one fewer dimension than complexity (a rough illustration follows this slide)
– Ensembles scale computationally but move the problem to I/O
Can operational NWP stay on the bus?
– More scalable formulations
– Expose and exploit all available parallelism, especially fine-grain
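One way to make the "one fewer dimension" bullet concrete is the rough scaling argument below (an illustrative sketch, not from the slides), assuming horizontal refinement by a factor r, a CFL-limited time step, and distributed-memory parallelism over grid columns:

    \[
      \text{columns (parallel work units)} \propto r^{2}, \qquad
      \text{time steps per forecast} \propto r \ (\Delta t \propto 1/r), \qquad
      \text{total work} \propto r^{3}
      \;\Rightarrow\;
      \frac{\text{work}}{\text{parallelism}} \propto r .
    \]

So refining the grid increases the work per parallel work unit, and adding cores in proportion to the number of columns alone cannot hold wall-clock time fixed.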

4-13 HPC Imperative (build slides stepping through the bullets above). Annotations added during the builds:
– Fred Toepfer and Ed Mifflin (HFIP)
– Deterministic versus Ensemble Output Volumes
– ~3 MW
– ** T1534 GFS will be implemented operationally in Dec to use 1856 WCOSS cores, 64 vertical levels, and DT = 450 s. The plotted point is adjusted to 2855 cores, 128 levels, and a 600 s time step to conform to planned higher-resolution configurations.

14 Next-Generation Global Prediction System
Response to Hurricane Sandy (2012):
– $14.8M / 5-year Research to Operations (R2O) Initiative
– Produce a state-of-the-art prediction system, NGGPS, by 2018
Goals: meet evolving national requirements over the coming years
– Global high-resolution weather prediction (3-10 km)
– High-impact weather: hurricanes, severe storms (0.5-2 km nesting)
– Extend skill to 30 days, seasonal climate
– Coupled ocean-atmosphere-ice-wave modeling system
– Ensemble forecasting and data assimilation
– Aerosol forecasting, others
Needed: new non-hydrostatic dynamics scalable to O(100K) cores

15 Next-Generation Global Prediction System
Task: New Scalable Non-hydrostatic Dynamical Core (in 5 years??)
Use a model already under development
– Coordinate with HIWPP program (Tim Schneider's talk)
– Select from 5 candidate models + current system:

Model       Organization  Numeric Method                   Grid
NIM         NOAA/ESRL     Finite Volume                    Icosahedral
MPAS        NCAR/LANL     Finite Volume                    Icosahedral/Unstructured
NEPTUNE     Navy/NRL      Spectral Element                 Cubed-Sphere with AMR
HIRAM/FV-3  NOAA/GFDL     Finite Volume                    Cubed-Sphere, nested
NMMB        NOAA/EMC      Finite difference/Polar Filters  Cartesian, Lat-Lon
GFS-NH **   NOAA/EMC      Semi-Lagrangian/Spectral         Reduced Cartesian

** current operational baseline, non-hydrostatic option under development

16-18 Next-Generation Global Prediction System (build slides repeating the candidate-model table above). Annotations added during the builds:
– Initial down-selection from 5 to 2 cores will be based on performance and scaling benchmarks out to ~100K cores
– Final down-select

19 NGGPS Level-1 Benchmarking Plan
Advanced Computing Evaluation Committee:
– Co-chairing with Mark Govett (NOAA/ESRL)
Investigate and report near-term and lifetime prospects for performance and scaling of NGGPS candidate models
– Test case: idealized global baroclinic wave
  Monotonically constrained advection of ten tracer fields
  Artificially generated initial "checkerboard" patterns
– Two workloads:
  13 km "performance" workload: resources needed to meet operational requirements (8.5 minutes/day; see the rough figure below)
  3 km workload to measure scalability out to ~150K cores
– All conventional multi-core (no accelerators)
Verification
– Reproduce baseline solution statistics for each model
– Return output from runs for additional verification and validation
Additional evaluation of software design and readiness for HPC
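As a rough sense of scale for the 13 km "performance" workload (an illustrative calculation, not from the plan), assuming the 450 s time step quoted for the T1534 GFS earlier; the candidate models' actual steps will differ:

    \[
      \frac{86400\ \text{s}}{450\ \text{s}} = 192\ \text{time steps per forecast day},
      \qquad
      \frac{8.5 \times 60\ \text{s}}{192} \approx 2.7\ \text{s of wall clock per step}.
    \]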

20 NGGPS Level-1 Benchmarking Plan
Benchmark systems
– Edison: National Energy Research Scientific Computing Center (DOE/LBNL)
  133,824 cores, Xeon Ivy Bridge, 24 cores per node
  Four million discretionary hours awarded
– Stampede: Texas Advanced Computing Center (NSF)
  102,400 cores, Xeon Sandy Bridge, 16 cores per node
– Pleiades: NASA/Ames Research Center
  108,000 cores, Xeon Ivy Bridge, 20 cores per node
  Possibility of ~100,000 cores of Xeon Haswell by benchmarking time
– Yellowstone: National Center for Atmospheric Research (NSF)
  72,000 cores, Xeon Sandy Bridge, 16 cores per node

21 Scaling Operational NWP (Summary)
Will have NGGPS Level-1 Benchmark results Spring '15
But we know the long-term future for deterministic global forecast models:
– Scaling will eventually run out
– We aren't there yet
Which models make the most of the headroom there is:
– Choose models with the best weak-scaling characteristics
  Don't give up anything on the number of time steps per second
  Nearby communication patterns instead of non-local
– Take the longest time step possible
  Semi-Lagrangian gives ~5x DT, but trades accuracy for stability (a rough CFL illustration follows this slide)
  Do skill improvements translate to convective scales?
  Need for embedded high-res. models: nesting or Panta Rhei approach
– Make effective core speeds faster
  Exploit more parallelism: vertical, tracers, and especially fine-grained
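The "5x DT" point can be put in rough perspective with the advective CFL limit that binds an explicit Eulerian scheme but not a semi-Lagrangian one. The numbers below are an illustrative sketch: the 13 km spacing matches the performance workload, but the 120 m/s jet-level wind speed is an assumed round value, not taken from the presentation.

    \[
      \Delta t_{\text{Euler}} \lesssim \frac{\Delta x}{|u|_{\max}}
        = \frac{13000\ \text{m}}{120\ \text{m s}^{-1}} \approx 108\ \text{s},
      \qquad
      \Delta t_{\text{SL}} \approx 450\text{-}600\ \text{s},
    \]

roughly the factor of five cited, at the cost of accuracy in strongly advective flow.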

22 Effect of fine-grained optimization on RRTMG* radiative transfer physics
Accurate calculation of fluxes and cooling rates from incoming (shortwave) and outgoing (longwave) radiation
Used in many weather and climate models
– NCAR WRF and MPAS
– NCAR CAM5 and CESM1
– NASA GEOS-5
– NOAA NCEP GFS, CFS, RUC
– ECMWF
Significant computational cost
– Coded as 1-D vertical columns, but vectorization is poor in this dimension (see the sketch below)
[Figure: one column of a weather or climate model domain]
(* Iacono et al., JGR, 2008; Mlawer et al., JGR, 1997)
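A minimal sketch of the structure described above (illustrative C, not actual RRTMG code; sweep_columns, tau, and flux are hypothetical names): within a column the sweep carries a level-to-level dependence, so the inner k loop cannot vectorize, and the only independent work is across columns.

    #include <stddef.h>

    /* Toy column-wise radiative sweep: flux at level k depends on flux at
     * level k-1, a loop-carried recurrence, so the k loop will not vectorize.
     * Independent work exists only across columns (the outer i loop).        */
    void sweep_columns(size_t ncol, size_t nlev,
                       const double *tau,   /* [ncol][nlev] layer optical depth */
                       double *flux)        /* [ncol][nlev] downward flux       */
    {
        for (size_t i = 0; i < ncol; i++) {          /* independent columns    */
            flux[i*nlev] = 1.0;                      /* top-of-column value    */
            for (size_t k = 1; k < nlev; k++) {      /* recurrence in k        */
                double trans = 1.0 - tau[i*nlev + k];  /* toy transmittance    */
                flux[i*nlev + k] = flux[i*nlev + k - 1] * trans;
            }
        }
    }

Processing several columns side by side, with the column index innermost, leaves the same recurrence in k but gives the compiler a dependence-free inner loop to vectorize; that is the restructuring shown on the "Restructuring RRTMG in NMM-B" slides below.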


24 Performance results: RRTMG kernel on Xeon Phi and host Xeon (SNB)


26 Restructuring RRTMG in NMM-B
Concurrency and locality
– Original RRTMG called in an OpenMP-threaded loop over the south-north dimension
– Rewrite the loop to iterate over tiles in two dimensions
– Dynamic thread scheduling
Vectorization
– Originally vertical pencils
– Extend the inner dimension of the lowest-level tiles to the width of the SIMD unit on KNC
– Static definition of VECLEN
(A sketch of this restructuring follows below.)
[Figure: west-east / north-south (OpenMP) domain tiling and call tree]
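A minimal sketch of that restructuring (illustrative C/OpenMP, not the actual NMM-B or RRTMG source; radiate_chunk, radiate_domain, VECLEN, and the flattened array layout are assumptions): tiles in both horizontal dimensions are handed to threads with dynamic scheduling, and each tile presents the kernel with a VECLEN-wide bundle of columns so the innermost loop vectorizes.

    #include <stddef.h>

    #define VECLEN 16          /* assumed chunk width = SIMD lanes on KNC (512-bit / 32-bit) */

    /* Illustrative kernel for one chunk of VECLEN columns stored side by side:
     * each column still carries a level-to-level recurrence, but the innermost
     * loop runs across the VECLEN columns, is dependence-free, and vectorizes. */
    static void radiate_chunk(size_t nlev, float *chunk /* [nlev][VECLEN] */)
    {
        for (size_t k = 1; k < nlev; k++) {
            #pragma omp simd
            for (size_t v = 0; v < VECLEN; v++)
                chunk[k*VECLEN + v] = 0.5f * (chunk[k*VECLEN + v]
                                              + chunk[(k-1)*VECLEN + v]);
        }
    }

    /* Driver: tile the domain in both horizontal dimensions and hand tiles to
     * OpenMP threads with dynamic scheduling, rather than threading only over
     * the south-north dimension.                                              */
    void radiate_domain(size_t nchunks_x, size_t nchunks_y, size_t nlev,
                        float *field /* [nchunks_y][nchunks_x][nlev][VECLEN] */)
    {
        #pragma omp parallel for collapse(2) schedule(dynamic)
        for (size_t j = 0; j < nchunks_y; j++) {
            for (size_t i = 0; i < nchunks_x; i++) {
                radiate_chunk(nlev, field + (j*nchunks_x + i) * nlev * VECLEN);
            }
        }
    }

The arithmetic is a stand-in; the point is the loop structure: two collapsed tile loops scheduled dynamically for concurrency and load balance, and a SIMD-width bundle of columns as the innermost dimension.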


28 Effect of fine-grained optimization on RRTMG radiative transfer physics
Improvement
– 2.8x overall
  5.3x in SWRAD
  0.75x in LWRAD (degraded)
Increasing chunk size results in
– 2.5x increase in working set size, from 407 KB to 1034 KB per thread (now larger than the 512 KB per-core L2 on Knights Corner)
– 4x increase in L2 misses
Memory traffic
– Increased from 59 to 124 GB/s, still short of saturation
– Key bottlenecks
  Memory latency
  Instruction pipeline stalls around the hardware division instruction
Michalakes, Iacono, and Jessup. Optimizing Weather Model Radiative Transfer Physics for Intel's Many Integrated Core (MIC) Architecture. Preprint.

29 Comparison to GPU Performance (32-bit; shortwave)
[Figure: performance chart comparing Haswell, Sandy Bridge, Ivy Bridge, GPU K20M, GPU M2070Q, MIC Knight's Corner, and GPU K40]

30 Outlook for accelerators
For now, neither GPU nor the current MIC generation is compelling compared with conventional multi-core Xeon
– Improving performance on MIC also leads to faster Xeon performance
Next release of Xeon Phi: Knights Landing
– Hostless, no PCI gulf
  NERSC's "Cori" system (mid 2016): 9,300 single-socket KNL nodes
– On-package memory
  5x Stream Triad bandwidth over DDR4
– More powerful cores
  Out-of-order, advanced branch prediction, AVX-512 ISA
  Overall 3x faster single-thread performance (workload dependent)
– Other improvements (NDA)

31 Outlook for NWP on HPC
Deterministic forecasting will stall eventually but still has headroom
Recast or develop new modeling systems that emphasize parallelism and locality
Continue to investigate hardware and programming models that deliver the most floating-point operations per second per dollar per watt
Increased computing power will continue to add value through other approaches (ensembles, data assimilation, coupled systems)
