Recent Developments for Parallel CMAQ
Jeff Young, AMDB/ASMD/ARL/NOAA
David Wong, SAIC – NESCC/EPA

AQF-CMAQ
Running in “quasi-operational” mode at NCEP
26+ minutes for a 48 hour forecast (on 33 processors)

Some background:
MPI data communication – ghost (halo) regions
Code modifications to improve data locality
Some vdiff optimization
Jerry Gipson’s MEBI/EBI chem solver
Tried CGRID( SPC, LAYER, COLUMN, ROW ) – tests indicated not good for MPI communication of data

Ghost (Halo) Regions

      DO J = 1, N
         DO I = 1, M
            DATA( I,J ) = A( I+2,J ) * A( I,J-1 )
         END DO
      END DO

[Figure: a processor subdomain with its exterior boundary and ghost region; a stencil-exchange data communication function fills the ghost cells to meet the horizontal advection and diffusion data requirements.]
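
To make the stencil exchange concrete, here is a minimal halo-exchange sketch in Fortran with MPI. It is not the CMAQ stencil-exchange routine: the one-dimensional east/west decomposition, the array shape, and all names are illustrative assumptions; it only shows how edge columns are shipped to neighbors to fill ghost columns before a loop like the one above runs.

      ! Illustrative halo exchange (assumed names, not CMAQ source):
      ! each rank owns ncols interior columns plus one ghost column on
      ! each side and swaps edge columns with its east/west neighbors.
      program halo_sketch
         use mpi
         implicit none
         integer, parameter :: ncols = 8, nrows = 6
         real    :: patch( nrows, 0:ncols+1 )   ! columns 0 and ncols+1 are ghosts
         integer :: ierr, rank, nprocs, west, east, stat( MPI_STATUS_SIZE )

         call mpi_init( ierr )
         call mpi_comm_rank( MPI_COMM_WORLD, rank, ierr )
         call mpi_comm_size( MPI_COMM_WORLD, nprocs, ierr )

         west = rank - 1
         if ( west .lt. 0 ) west = MPI_PROC_NULL
         east = rank + 1
         if ( east .ge. nprocs ) east = MPI_PROC_NULL

         patch = real( rank )

         ! send my east edge column, receive into my west ghost column
         call mpi_sendrecv( patch( :,ncols ), nrows, MPI_REAL, east, 1,   &
                            patch( :,0 ),    nrows, MPI_REAL, west, 1,    &
                            MPI_COMM_WORLD, stat, ierr )
         ! send my west edge column, receive into my east ghost column
         call mpi_sendrecv( patch( :,1 ),       nrows, MPI_REAL, west, 2, &
                            patch( :,ncols+1 ), nrows, MPI_REAL, east, 2, &
                            MPI_COMM_WORLD, stat, ierr )

         call mpi_finalize( ierr )
      end program halo_sketch

A two-dimensional decomposition would also exchange north/south rows, and the ghost width has to match the stencil (the A( I+2,J ) reference above would need a width of two).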

Architectural Changes for Parallel I/O
Tests confirmed that the latest releases (May, Sep 2003) are not very scalable
Background: Original Version, Latest Release, AQF Version (compared on the next slide)

Parallel I/O
[Diagram comparing I/O structure:]
2002 Release / 2003 Release: read and computation, write and computation
AQF: computation only (workers) and write only (writer); read, write and computation asynchronous; data transfer by message passing

Modifications to the I/O API for Parallel I/O
For 3D data, e.g. …

time interpolation:
   INTERP3 ( FileName, VarName, ProgName, Date, Time, Ncols*Nrows*Nlays, Data_Buffer )

spatial subset:
   XTRACT3 ( FileName, VarName, StartLay, EndLay, StartRow, EndRow, StartCol, EndCol, Date, Time, Data_Buffer )

time interpolation of a spatial subset:
   INTERPX ( FileName, VarName, ProgName, StartCol, EndCol, StartRow, EndRow, StartLay, EndLay, Date, Time, Data_Buffer )

   WRPATCH ( FileID, VarID, TimeStamp, Record_No ), called from PWRITE3 (pario)
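
As an illustration of how the subsetting call might be used from a worker process, here is a hedged sketch. The argument order follows the slide, but the surrounding routine and variable names (read_patch_sketch, the DENS variable, the MET_CRO_3D file) are placeholders, and it assumes the Models-3 I/O API is linked and the file already opened.

      ! Sketch only: placeholders around a per-processor INTERPX read,
      ! not CMAQ source code.
      subroutine read_patch_sketch( startcol, endcol, startrow, endrow,   &
                                    startlay, endlay, jdate, jtime, dens )
         implicit none
         integer, intent( in )  :: startcol, endcol, startrow, endrow
         integer, intent( in )  :: startlay, endlay, jdate, jtime
         real,    intent( out ) :: dens( startcol:endcol,                 &
                                         startrow:endrow,                 &
                                         startlay:endlay )
         logical, external      :: interpx

         ! INTERPX time-interpolates only this worker's patch of the file,
         ! rather than the full Ncols*Nrows*Nlays field INTERP3 would return
         if ( .not. interpx( 'MET_CRO_3D', 'DENS', 'read_patch_sketch',   &
                             startcol, endcol, startrow, endrow,          &
                             startlay, endlay, jdate, jtime, dens ) ) then
            call m3exit( 'read_patch_sketch', jdate, jtime,               &
                         'could not read DENS patch', 1 )
         end if
      end subroutine read_patch_sketch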

Standard (May 2003 Release)

DRIVER
   read IC’s into CGRID
   begin output timestep loop
      advstep (determine sync timestep)
      couple
      begin sync timestep loop
         SCIPROC
            X-Y-Z advect
            adjadv
            hdiff
            decouple
            vdiff (DRYDEP)
            cloud (WETDEP)
            gas chem (aero, VIS)
            couple
      end sync timestep loop
      decouple
      write conc and avg conc (CONC, ACONC)
   end output timestep loop

AQF CMAQ

DRIVER
   set WORKERs, WRITER
   if WORKER, read IC’s into CGRID
   begin output timestep loop
      if WORKER
         advstep (determine sync timestep)
         couple
         begin sync timestep loop
            SCIPROC
               X-Y-Z advect
               hdiff
               decouple
               vdiff
               cloud
               gas chem (aero)
               couple
         end sync timestep loop
         decouple
         MPI send conc, aconc, drydep, wetdep, (vis)
      if WRITER
         completion-wait: for conc, write conc (CONC); for aconc, write aconc, etc.
   end output timestep loop
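
A minimal sketch of the worker/writer split outlined above, assuming MPI: the last rank is taken as the dedicated writer and the remaining ranks form a worker communicator for the science time stepping. The rank assignment and all names are illustrative assumptions, not the AQF source.

      ! Illustrative worker/writer setup (assumed layout, not the AQF code)
      program worker_writer_sketch
         use mpi
         implicit none
         integer :: ierr, world_rank, world_size, color, work_comm
         logical :: i_am_writer

         call mpi_init( ierr )
         call mpi_comm_rank( MPI_COMM_WORLD, world_rank, ierr )
         call mpi_comm_size( MPI_COMM_WORLD, world_size, ierr )

         ! assume the last rank is the dedicated writer
         i_am_writer = ( world_rank .eq. world_size - 1 )
         color = 0
         if ( i_am_writer ) color = 1

         ! workers share work_comm for stencil exchange and science stepping
         call mpi_comm_split( MPI_COMM_WORLD, color, world_rank,          &
                              work_comm, ierr )

         if ( i_am_writer ) then
            ! completion-wait loop: receive conc, aconc, drydep, wetdep
            ! patches from the workers and write each output file
         else
            ! output timestep loop: advect, hdiff, vdiff, cloud, chem,
            ! then mpi_send the output arrays to the writer rank
         end if

         call mpi_finalize( ierr )
      end program worker_writer_sketch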

Power3 Cluster (NESCC’s LPAR)
The platform (cypress00, cypress01, cypress02) consists of 3 SP-Nighthawk nodes connected by a switch, with shared memory on each node
All cpu’s share user applications with file servers, interactive use, etc.
2X slower than NCEP’s

Power4 p690 Servers (NCEP’s LPAR)
Each platform (snow, frost) is composed of 22 p690 (Regatta) servers connected by a switch, with shared memory on each node
Each server has 32 cpu’s, LPAR-ed into 8 nodes per server (4 cpu’s per node)
Some nodes are dedicated to file servers, interactive use, etc.
There are effectively 20 servers for general use (160 nodes, 640 cpu’s)
2 “colony” switch connectors per node
~2X performance of cypress nodes

“ice” Beowulf Cluster
Pentium 3 – 1.4 GHz nodes, each with its own memory, on an internal network
Isolated from outside network traffic

“global” MPICH Cluster
Pentium 4 XEON – 2.4 GHz nodes (global, global1, global2, global3, global4, global5), each with its own memory, connected by a network

RESULTS
5 hour (12Z – 17Z) and 24 hour (12Z – 12Z) runs
20 Sept 2002 test data set used for developing the AQF-CMAQ
Input met from ETA, processed thru PRDGEN and PREMAQ
Domain seen on following slides: 166 columns X 142 rows X 22 layers at 12 km resolution
CB4 mechanism, no aerosols
Pleim’s Yamartino advection for AQF-CMAQ; PPM advection for May 2003 Release

The Matrix
run = comparison of run times
wall = comparison of relative wall times for main science processes
Processor configurations: 4 = 2 X 2, 8 = 4 X 2, 16 = 4 X 4, 32 = 8 X 4, 64 = 8 X 8, 64 = 16 X 4
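
For reference, a small sketch of how one configuration from the matrix might partition the 166 X 142 forecast grid. It assumes the first factor counts processors across columns and that leftover columns/rows go to the first processors in each direction; the actual CMAQ decomposition routine may differ.

      ! Illustrative domain decomposition (assumed rules, not CMAQ source)
      program decomp_sketch
         implicit none
         integer, parameter :: gl_ncols = 166, gl_nrows = 142  ! 12 km AQF grid
         integer, parameter :: npcol = 4, nprow = 2            ! the "8 = 4 X 2" case
         integer :: pc, pr, my_ncols, my_nrows

         do pr = 0, nprow - 1
            do pc = 0, npcol - 1
               my_ncols = gl_ncols / npcol
               if ( pc .lt. mod( gl_ncols, npcol ) ) my_ncols = my_ncols + 1
               my_nrows = gl_nrows / nprow
               if ( pr .lt. mod( gl_nrows, nprow ) ) my_nrows = my_nrows + 1
               write( *, '( a, i2, a, i2, a, i4, a, i4, a )' )            &
                  'processor (', pc, ',', pr, ') owns ',                  &
                  my_ncols, ' cols x', my_nrows, ' rows'
            end do
         end do
      end program decomp_sketch

Here 166 does not divide evenly by 4, so under this rule two subdomains get 42 columns and the other two get 41, each with 71 rows.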

24–Hour AQF vs Release
[Figure: concentration fields from AQF-CMAQ and the 2003 Release; data shown at peak hour.]

24–Hour AQF vs Release
Less than 0.2 ppb diff between AQF on cypress and snow
Almost 9 ppb max diff between Yamartino and PPM

Absolute Run Times: 24 Hours, 8 Worker Processors
[Chart: run time in seconds, AQF vs Release.]

Absolute Run Times: 24 Hours, 32 Worker Processors
[Chart: run time in seconds, AQF vs Release on cypress and AQF on snow.]

Relative Run Times: 5 Hours, 8 Worker Processors
[Chart: percent of slowest, AQF-CMAQ on various platforms.]

Relative Run Times: 5 Hours
[Chart: percent of slowest, AQF-CMAQ cypress vs. snow.]

Relative Run Times: 5 Hours
[Chart: percent of slowest vs. number of worker processors, AQF-CMAQ on various platforms.]

Relative Wall Times: 5 Hours
[Chart: percent wall time by science process, AQF-CMAQ on cypress.]

Relative Wall Times: 5 Hours
[Chart: percent wall time by science process, AQF-CMAQ on snow.]

Relative Wall Times: 24 hr, 8 Worker Processors
[Chart: percent wall time, AQF (Yamo) vs Release (PPM) on cypress.]

Relative Wall Times: 24 hr, 8 Worker Processors
[Chart: percent wall time, AQF vs Release on cypress, adding snow for 8 and 32 processors.]

AQF Horizontal Advection Relative Wall Times: 24 hr, 8 Worker Processors
[Chart: percent wall time; x-r-l = x-row-loop, y-c-l = y-column-loop, hppm = kernel solver in each loop.]
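
The x-row-loop / y-column-loop breakdown in this chart corresponds to a loop structure like the following sketch, where ppm_1d is a hypothetical stand-in for the hppm kernel solver; the interface and names are assumptions, not the CMAQ routine.

      ! Loop structure sketch; ppm_1d is a hypothetical stand-in for the
      ! hppm kernel solver, not the actual CMAQ interface
      subroutine xy_advect_sketch( conc, ncols, nrows, dt, dx, dy )
         implicit none
         integer, intent( in )    :: ncols, nrows
         real,    intent( in )    :: dt, dx, dy
         real,    intent( inout ) :: conc( ncols, nrows )
         real    :: line( max( ncols, nrows ) )
         integer :: r, c

         do r = 1, nrows                           ! x-row-loop (x-r-l)
            line( 1:ncols ) = conc( :, r )
            call ppm_1d( ncols, line, dt, dx )     ! kernel solver in each loop
            conc( :, r ) = line( 1:ncols )
         end do

         do c = 1, ncols                           ! y-column-loop (y-c-l)
            line( 1:nrows ) = conc( c, : )
            call ppm_1d( nrows, line, dt, dy )
            conc( c, : ) = line( 1:nrows )
         end do

      contains

         subroutine ppm_1d( n, q, dt, dx )
            ! empty stand-in: a 1-D piecewise-parabolic update would go here
            integer, intent( in )    :: n
            real,    intent( inout ) :: q( n )
            real,    intent( in )    :: dt, dx
         end subroutine ppm_1d

      end subroutine xy_advect_sketch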

AQF Horizontal Advection Relative Wall Times: 24 hr, 32 Worker Processors
[Chart: percent wall time.]

AQF & Release Horizontal Advection Relative Wall Times: 24 hr, 32 Worker Processors
[Chart: percent wall time; the "r-" prefix denotes the Release.]

Future Work
Add aerosols back in
TKE vdiff
Improve horizontal advection/diffusion scalability
Some I/O improvements
Layer-variable horizontal advection time steps

Aspect Ratios
[Two figure-only slides.]