WRF Software Development and Performance John Michalakes, NCAR NCAR: W. Skamarock, J. Dudhia, D. Gill, A. Bourgeois, W. Wang, C. Deluca, R. Loft NOAA/NCEP:


1 WRF Software Development and Performance
John Michalakes, NCAR
NCAR: W. Skamarock, J. Dudhia, D. Gill, A. Bourgeois, W. Wang, C. Deluca, R. Loft
NOAA/NCEP: Tom Black, Jim Purser, S. Gopal
NOAA/FSL: T. Henderson, J. Middlecoff, L. Hart
U. Oklahoma: M. Xue
AFWA: J. Wegiel, D. McCormick
C. Coats (MCNC), J. Schmidt (NRL), V. Balaji (GFDL), S. Chen (UC Davis), J. Edwards (IBM)
Acknowledgement: Significant funding for WRF software development from DoD HPCMO CHSSI program (CWO6)

2 WRF Software Goals
June 26, 2002 WRF Users Workshop
  Community Model
  Good performance
  Portable across a range of architectures
  Flexible, maintainable, understandable
  Facilitate code reuse
  Multiple dynamics/physics options
  Run-time configurable
  Nested
Aspects of Design
  Single-source code
  Fortran90 modules, dynamic memory, structures, recursion
  Hierarchical software architecture
  Multi-level parallelism
  CASE: Registry
  Package-neutral APIs
    – I/O, data formats
    – Communication
  IKJ Storage Order
Vector’s not dead yet!

3 Software Architecture
  Driver: I/O, communication, multi-nests, state data
  Model routines: computational, tile-callable, thread-safe
  Mediation layer: interface between model and driver
  Interfaces to external packages
[Figure: layered architecture diagram. Labels: Driver Layer, Mediation Layer, Model Layer; Driver, Config Inquiry, I/O API, Config Module, Solve, WRF Tile-callable Subroutines, DM comm, OMP, Threads; External Packages: Data formats, Parallel I/O, Message Passing; Package Independent, Package Dependent]
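
The three layers named on this slide can be illustrated with a minimal Python sketch. It is not WRF code (WRF is written in Fortran90); the function names `driver`, `solve`, and `advance_tile`, the two-tile split, and the "+1" update are all hypothetical stand-ins chosen only to show which layer owns which responsibility.

```python
from concurrent.futures import ThreadPoolExecutor


def advance_tile(field, tile):
    """Model layer (hypothetical): tile-callable and thread-safe,
    because it only writes cells inside its own tile."""
    (i0, i1), (j0, j1) = tile
    for j in range(j0, j1):
        for i in range(i0, i1):
            field[j][i] += 1.0  # stand-in for real computation


def solve(field, tiles, num_threads=2):
    """Mediation layer (hypothetical): hands each tile to a thread."""
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # list() forces completion and re-raises any worker exception
        list(pool.map(lambda t: advance_tile(field, t), tiles))


def driver(nx, ny, steps):
    """Driver layer (hypothetical): owns state data and the time loop."""
    field = [[0.0] * nx for _ in range(ny)]
    # Two non-overlapping tiles, split along j
    tiles = [((0, nx), (0, ny // 2)), ((0, nx), (ny // 2, ny))]
    for _ in range(steps):
        solve(field, tiles)
    return field
```

The model-layer routine never sees threads or the time loop, which is the separation the slide attributes to the mediation layer.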

4 WRF Multi-Layer Domain Decomposition
Model domains are decomposed for parallelism on two levels:
  Patch: section of model domain allocated to a distributed-memory node
  Tile: section of a patch allocated to a shared-memory processor within a node; this is also the scope of a model layer subroutine
Distributed-memory parallelism is over patches; shared-memory parallelism is over tiles within patches.
Single version of code for efficient execution on:
  – Distributed-memory
  – Shared-memory
  – Clusters of SMPs
  – Vector and microprocessors
[Figure: logical domain; 1 patch, divided into multiple tiles; inter-processor communication]
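
The patch/tile split on this slide can be sketched as two applications of the same rectangle-splitting routine. This is an illustrative Python sketch, not the WRF decomposition algorithm: the helper names `split` and `decompose`, the 4x2 patch layout, and the 1x4 tile layout are assumptions; only the 425x300 horizontal grid size is taken from the performance slide.

```python
def split(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, near-equal half-open ranges."""
    base, extra = divmod(stop - start, parts)
    ranges = []
    lo = start
    for p in range(parts):
        hi = lo + base + (1 if p < extra else 0)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def decompose(rect, parts_i, parts_j):
    """Cut a rectangle ((i0, i1), (j0, j1)) into parts_i x parts_j sub-rectangles."""
    (i0, i1), (j0, j1) = rect
    return [(ir, jr)
            for jr in split(j0, j1, parts_j)
            for ir in split(i0, i1, parts_i)]


domain = ((0, 425), (0, 300))
patches = decompose(domain, 4, 2)    # level 1: one patch per distributed-memory node
tiles = decompose(patches[0], 1, 4)  # level 2: tiles for threads within that node
```

Because both levels use the same routine, a single code path covers pure distributed-memory runs (one tile per patch), pure shared-memory runs (one patch), and clusters of SMPs.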

5 I/O Architecture
Requirements of I/O Infrastructure
  – Efficiency: key concern for operations
  – Flexibility: key concern in research
  – Both types of user institution already heavily invested in I/O infrastructure
      Operations: GRIB, BUFR
      Research: NetCDF, HDF
  – “Portable I/O”: adaptable to a range of uses and installations without affecting WRF and other programs that use the I/O infrastructure

6 I/O Architecture
WRF I/O API
  – Package-independent interface to NetCDF, Fast-binary, HDF (planned)
  – Random access of fields by timestamp/name
  – Full transposition to arbitrary memory order
  – Built-in support for read/write of parallel file systems (planned)
  – Data-set centric, not file-centric (planned); Grid Computing
Additional WRF model functionality
  – Collection/distribution of decomposed data to serial datasets
  – Fast, asynchronous, “quilt-server” I/O from NCEP Eta model
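
A package-independent interface with random access by timestamp and name might look roughly like the following Python sketch. None of these names come from the WRF I/O API: `FieldStore`, `InMemoryStore`, `save_history`, and the timestamp string format are hypothetical, and the in-memory backend only stands in for a real NetCDF or binary implementation.

```python
from abc import ABC, abstractmethod


class FieldStore(ABC):
    """Hypothetical package-independent interface (not the WRF I/O API)."""

    @abstractmethod
    def write_field(self, timestamp, name, data):
        ...

    @abstractmethod
    def read_field(self, timestamp, name):
        ...


class InMemoryStore(FieldStore):
    """Stand-in backend; a real one would wrap NetCDF or a binary format."""

    def __init__(self):
        self._fields = {}

    def write_field(self, timestamp, name, data):
        # Keyed by (timestamp, name), so any field can be fetched directly
        self._fields[(timestamp, name)] = list(data)

    def read_field(self, timestamp, name):
        return list(self._fields[(timestamp, name)])


def save_history(store, timestamp, state):
    """Model-side code depends only on the interface, never on the backend."""
    for name, data in state.items():
        store.write_field(timestamp, name, data)
```

Swapping the data format then means supplying another `FieldStore` subclass, with no change to `save_history` or its callers.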

7 I/O Performance
[Figure: I/O throughput chart. x-axis: i/o servers; y-axis: bytes/second (0 to 120,000,000); series: netcdf, bin; annotations: 5 MB/s, 16 MB/s]

8 WRF Performance
12 km CONUS
  – 425x300x35
  – 4.5 million cells
  – 22 Gflop/time step
48 hour forecast
  – 21 minutes on 128p
  – 8 minutes on 512p
  I/O time not included
Platforms
  – IBM SP (blackforest.ucar.edu)
      293 4x375 MHz Power3 nodes
      Peak 1500 Mflop/s/cpu
  – Compaq TCS (lemieux.psc.edu)
      750 4x1 GHz EV68 nodes
      Peak 2000 Mflop/s/cpu
Scaling efficiency (32 to 512pe)
  – IBM: 69%
  – Compaq: 57%
Efficiency relative to peak
  – 32pes: IBM (7%), Compaq (20%)
  – 512pes: IBM (5%), Compaq (11%)
Sustained Performance:
  – IBM: 39 Gflop/second
  – Compaq: 110 Gflop/second
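
The 512-processor "efficiency relative to peak" figures can be cross-checked against the sustained rates and per-CPU peaks on this slide. The Python sketch below assumes the sustained rates were measured on 512 processors, which the slide does not state explicitly. It also computes a 128-to-512 scaling efficiency from the two forecast timings; the slide does not say which platform those timings come from, so that number is not directly comparable with the quoted 32-to-512 figures.

```python
def fraction_of_peak(sustained_gflops, processors, peak_mflops_per_cpu):
    """Sustained rate divided by aggregate peak rate."""
    peak_gflops = processors * peak_mflops_per_cpu / 1000.0
    return sustained_gflops / peak_gflops


def scaling_efficiency(time_small, procs_small, time_large, procs_large):
    """Observed speedup divided by the increase in processor count."""
    speedup = time_small / time_large
    return speedup / (procs_large / procs_small)


# Assumption: sustained rates refer to the 512-processor runs
ibm = fraction_of_peak(39, 512, 1500)      # 39 / 768 Gflop/s
compaq = fraction_of_peak(110, 512, 2000)  # 110 / 1024 Gflop/s

# 21 minutes on 128p versus 8 minutes on 512p (platform not stated on the slide)
eff_128_to_512 = scaling_efficiency(21, 128, 8, 512)

print(f"IBM: {ibm:.1%}, Compaq: {compaq:.1%}, 128->512: {eff_128_to_512:.1%}")
```

Under that assumption the two ratios round to the 5% and 11% quoted for 512 processors.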

9 Model Performance
Efficiency with respect to other models
  – WRF about 2x cost of NCEP Eta (mid 2001)
      Complexity: WRF 1.6 times more operations for a given period of integration
      Code efficiency: WRF 0.78 of Eta
  – Scientific or forecast efficiency…?
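
The "about 2x" figure appears to follow from the two factors listed under it. The Python sketch below assumes "0.78 of Eta" means WRF sustains 0.78 times Eta's operation rate and that "1.6 times more operations" means 1.6 times as many; the variable names are illustrative.

```python
operations_ratio = 1.6   # WRF operations / Eta operations, per period of integration
code_efficiency = 0.78   # assumed: WRF operation rate / Eta operation rate

# Run time scales with operations and inversely with operation rate
cost_ratio = operations_ratio / code_efficiency

print(f"WRF cost relative to Eta: {cost_ratio:.2f}x")
```

Under these assumptions the ratio comes out slightly above 2, consistent with the slide's rounded figure.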

10 Summary
Status
  – Third release: WRFV1.2, April 2002
  – Systems: IBM, Compaq, SGI, PC/Alpha Linux
  – Nesting, 3DVAR: first implementations this summer
WRF software architecture is designed to support development and maintenance as a community model serving operational and research users over a range of applications and on a variety of computing architectures.
Additional information: http://www.wrf-model.org

