
1 Recent Developments for Parallel CMAQ. Jeff Young, AMDB/ASMD/ARL/NOAA; David Wong, SAIC – NESCC/EPA

2 AQF-CMAQ: running in "quasi-operational" mode at NCEP; 26+ minutes for a 48-hour forecast (on 33 processors)

3 Some background:
- MPI data communication – ghost (halo) regions
- Code modifications to improve data locality
- Some vdiff optimization
- Jerry Gipson's MEBI/EBI chem solver
- Tried CGRID( SPC, LAYER, COLUMN, ROW ) – tests indicated it was not good for MPI communication of data

4 Ghost (Halo) Regions

   DO J = 1, N
      DO I = 1, M
         DATA( I,J ) = A( I+2,J ) * A( I,J-1 )
      END DO
   END DO

The references A( I+2,J ) and A( I,J-1 ) reach outside the local subdomain, so each processor needs a 2-cell-deep ghost region in the I (column) direction and a 1-cell-deep ghost region in the J (row) direction filled from its neighbors.

5 Horizontal Advection and Diffusion Data Requirements [diagram: exterior boundary, ghost regions, and the stencil-exchange data communication function between neighboring subdomains]
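
A minimal sketch of the stencil-exchange idea behind slides 4 and 5, assuming a 1-cell-deep ghost row on the north and south sides only; the subroutine and argument names are illustrative, not the actual CMAQ stencil-exchange routine (which also handles east/west exchanges and variable halo depths such as the 2-cell depth needed by the loop above).

   ! Exchange 1-deep ghost rows with the north and south neighbor
   ! subdomains.  Neighbor ranks are MPI_PROC_NULL on the exterior
   ! boundary, which turns the corresponding transfer into a no-op.
   SUBROUTINE GHOST_ROW_EXCHANGE( A, NCOLS, NROWS, SOUTH, NORTH, COMM )
      USE MPI
      IMPLICIT NONE
      INTEGER, INTENT( IN )    :: NCOLS, NROWS           ! local patch size
      REAL,    INTENT( INOUT ) :: A( NCOLS, 0:NROWS+1 )  ! rows 0 and NROWS+1 are ghosts
      INTEGER, INTENT( IN )    :: SOUTH, NORTH           ! neighbor ranks
      INTEGER, INTENT( IN )    :: COMM
      INTEGER :: STAT( MPI_STATUS_SIZE ), IERR

      ! Send my first interior row south; receive the north neighbor's
      ! first interior row into my north ghost row.
      CALL MPI_SENDRECV( A( 1, 1 ),       NCOLS, MPI_REAL, SOUTH, 101,  &
                         A( 1, NROWS+1 ), NCOLS, MPI_REAL, NORTH, 101,  &
                         COMM, STAT, IERR )

      ! Send my last interior row north; receive the south neighbor's
      ! last interior row into my south ghost row.
      CALL MPI_SENDRECV( A( 1, NROWS ), NCOLS, MPI_REAL, NORTH, 102,  &
                         A( 1, 0 ),     NCOLS, MPI_REAL, SOUTH, 102,  &
                         COMM, STAT, IERR )
   END SUBROUTINE GHOST_ROW_EXCHANGE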

6 Architectural Changes for Parallel I/O
Tests confirmed that the latest releases (May, Sep 2003) are not very scalable.
Background: Original Version / Latest Release / AQF Version.

7 Parallel I/O
[timeline diagram] 2002 Release and 2003 Release: read and computation, write and computation on the same processors. AQF: computation only on the workers and write only on a dedicated writer; read, write and computation are asynchronous, with data transfer by message passing.

8 Modifications to the I/O API for Parallel I/O
For 3D data, e.g. …
INTERP3( FileName, VarName, ProgName, Date, Time, Ncols*Nrows*Nlays, Data_Buffer )  - time interpolation
XTRACT3( FileName, VarName, StartLay, EndLay, StartRow, EndRow, StartCol, EndCol, Date, Time, Data_Buffer )  - spatial subset
INTERPX( FileName, VarName, ProgName, StartCol, EndCol, StartRow, EndRow, StartLay, EndLay, Date, Time, Data_Buffer )  - time interpolation of a spatial subset
WRPATCH( FileID, VarID, TimeStamp, Record_No )  - called from PWRITE3 (pario)
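
Slide 8's INTERPX reads a time-interpolated spatial subset, so each worker can pull in just its own patch of a gridded file. The sketch below assumes the I/O API's M3UTILIO module supplies the INTERPX and M3EXIT interfaces; the file name 'MET_CRO_3D', the variable 'TA', and the READ_PATCH wrapper are illustrative, not taken from the AQF code.

   SUBROUTINE READ_PATCH( STARTCOL, ENDCOL, STARTROW, ENDROW,      &
                          STARTLAY, ENDLAY, JDATE, JTIME, TA )
      USE M3UTILIO   ! I/O API interfaces (assumed to provide INTERPX, M3EXIT)
      IMPLICIT NONE
      INTEGER, INTENT( IN )  :: STARTCOL, ENDCOL, STARTROW, ENDROW
      INTEGER, INTENT( IN )  :: STARTLAY, ENDLAY, JDATE, JTIME
      REAL,    INTENT( OUT ) :: TA( STARTCOL:ENDCOL, STARTROW:ENDROW, STARTLAY:ENDLAY )

      ! Read only this worker's column/row/layer window, time-interpolated
      ! to JDATE:JTIME, instead of the full Ncols*Nrows*Nlays domain.
      IF ( .NOT. INTERPX( 'MET_CRO_3D', 'TA', 'READ_PATCH',        &
                          STARTCOL, ENDCOL, STARTROW, ENDROW,      &
                          STARTLAY, ENDLAY, JDATE, JTIME, TA ) ) THEN
         CALL M3EXIT( 'READ_PATCH', JDATE, JTIME,                  &
                      'Could not interpolate TA from MET_CRO_3D', 2 )
      END IF
   END SUBROUTINE READ_PATCH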

9 Standard (May 2003 Release)
DRIVER
   read IC's into CGRID
   begin output timestep loop
      advstep (determine sync timestep)
      couple
      begin sync timestep loop
         SCIPROC
            X-Y-Z advect
            adjadv
            hdiff
            decouple
            vdiff (DRYDEP)
            cloud (WETDEP)
            gas chem (aero, VIS)
            couple
      end sync timestep loop
      decouple
      write conc and avg conc (CONC, ACONC)
   end output timestep loop

10 AQF CMAQ
DRIVER
   set WORKERs, WRITER
   if WORKER, read IC's into CGRID
   begin output timestep loop
      if WORKER
         advstep (determine sync timestep)
         couple
         begin sync timestep loop
            SCIPROC
               X-Y-Z advect
               hdiff
               decouple
               vdiff
               cloud
               gas chem (aero)
               couple
         end sync timestep loop
         decouple
         MPI send conc, aconc, drydep, wetdep, (vis)
      if WRITER
         completion-wait: for conc, write CONC; for aconc, write ACONC; etc.
   end output timestep loop
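
A minimal sketch of the WORKER/WRITER split in slide 10, with hypothetical names: the last MPI rank is dedicated to output while the remaining ranks compute, and concentration patches move to the writer by message passing so writing overlaps the next timestep's computation (the real AQF driver manages this with completion-waits per output file).

   PROGRAM ROLE_SPLIT
      USE MPI
      IMPLICIT NONE
      INTEGER :: MYPE, NPROCS, IERR
      LOGICAL :: WORKER, WRITER

      CALL MPI_INIT( IERR )
      CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYPE, IERR )
      CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NPROCS, IERR )

      ! Dedicate the last rank to I/O; all other ranks are workers.
      WRITER = ( MYPE .EQ. NPROCS - 1 )
      WORKER = .NOT. WRITER

      IF ( WORKER ) THEN
         ! ... advect / diffuse / chem for the output timestep, then ship
         ! the local conc patch to the writer with a nonblocking send so
         ! the next timestep can start immediately, e.g.
         ! CALL MPI_ISEND( CONC_PATCH, N, MPI_REAL, NPROCS-1, TAG,  &
         !                 MPI_COMM_WORLD, REQ, IERR )
      ELSE
         ! ... receive each worker's patch, assemble the full grid, and
         ! write the CONC (ACONC, DRYDEP, WETDEP) record while the
         ! workers keep computing.
      END IF

      CALL MPI_FINALIZE( IERR )
   END PROGRAM ROLE_SPLIT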

11 Power3 Cluster (NESCC's LPAR)
[diagram: switch, shared memory, and the cpu's of each node]
The platform (cypress00, cypress01, cypress02) consists of 3 SP-Nighthawk nodes. All cpu's share user applications with file servers, interactive use, etc. About 2X slower than NCEP's nodes.

12 Power4 p690 Servers (NCEP's LPAR)
[diagram: switch, memory, and 4 cpu's per node]
Each platform (snow, frost) is composed of 22 p690 (Regatta) servers. Each server has 32 cpu's, LPAR-ed into 8 nodes per server (4 cpu's per node). Some nodes are dedicated to file servers, interactive use, etc.; there are effectively 20 servers for general use (160 nodes, 640 cpu's). 2 "colony" switch connectors per node. ~2X the performance of the cypress nodes.

13 "ice" Beowulf Cluster
[diagram: internal network, memory, and the cpu's of each node]
Pentium 3 – 1.4 GHz. Isolated from outside network traffic.

14 "global" MPICH Cluster
[diagram: network, memory, and the cpu's of nodes global, global1, global2, global3, global4, global5]
Pentium 4 XEON – 2.4 GHz.

15 RESULTS
- 5 hour (12Z – 17Z) and 24 hour (12Z – 12Z) runs
- 20 Sept 2002 test data set used for developing the AQF-CMAQ
- Input met from ETA, processed through PRDGEN and PREMAQ
- Domain seen on following slides: 166 columns X 142 rows X 22 layers at 12 km resolution
- CB4 mechanism, no aerosols
- Pleim's Yamartino advection for AQF-CMAQ; PPM advection for the May 2003 Release

16 The Matrix
run = comparison of run times; wall = comparison of relative wall times for the main science processes.
Worker processor configurations: 4 = 2 X 2, 8 = 4 X 2, 16 = 4 X 4, 32 = 8 X 4, 64 = 8 X 8, 64 = 16 X 4.
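
For reference, the worker counts in the matrix are two-dimensional layouts (e.g. 32 = 8 X 4) of the 166-column X 142-row domain from slide 15, and the "aspect ratio" slides at the end show the resulting subdomain sizes. The short sketch below is not the CMAQ decomposition code; it is a common convention, consistent with the numbers on those slides, that spreads the remainder one extra column or row at a time, so an 8 X 4 layout gives column strips of 21 or 20 and row strips of 36 or 35.

   PROGRAM DECOMP_SIZES
      IMPLICIT NONE
      INTEGER, PARAMETER :: GL_NCOLS = 166, GL_NROWS = 142   ! full domain
      INTEGER, PARAMETER :: NPCOL = 8, NPROW = 4             ! 32 = 8 X 4 layout
      INTEGER :: I, NCOLS, NROWS

      ! Each of the NPCOL column strips gets GL_NCOLS/NPCOL columns,
      ! plus one extra for the first MOD( GL_NCOLS, NPCOL ) strips.
      DO I = 0, NPCOL - 1
         NCOLS = GL_NCOLS / NPCOL
         IF ( I .LT. MOD( GL_NCOLS, NPCOL ) ) NCOLS = NCOLS + 1
         WRITE( *, '( A, I2, A, I4 )' ) 'column strip ', I, ': ', NCOLS
      END DO

      ! Same rule for the NPROW row strips.
      DO I = 0, NPROW - 1
         NROWS = GL_NROWS / NPROW
         IF ( I .LT. MOD( GL_NROWS, NPROW ) ) NROWS = NROWS + 1
         WRITE( *, '( A, I2, A, I4 )' ) 'row strip    ', I, ': ', NROWS
      END DO
   END PROGRAM DECOMP_SIZES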

17 24-Hour AQF vs. 2003 Release [side-by-side concentration plots: AQF-CMAQ and 2003 Release]. Data shown at peak hour.

18 24-Hour AQF vs. 2003 Release: less than 0.2 ppb difference between AQF on cypress and snow; almost 9 ppb maximum difference between Yamartino and PPM.

19 Absolute Run Times (sec), 24 Hours, 8 Worker Processors: AQF vs. 2003 Release [chart]

20 Absolute Run Times (sec), 24 Hours, 32 Worker Processors: AQF vs. 2003 Release on cypress, and AQF on snow [chart]

21 Relative Run Times (% of slowest), 5 Hours, 8 Worker Processors: AQF-CMAQ on various platforms [chart]

22 Relative Run Times (% of slowest), 5 Hours: AQF-CMAQ on cypress vs. snow [chart]

23 Relative Run Times (% of slowest) vs. number of worker processors, 5 Hours: AQF-CMAQ on various platforms [chart]

24 Relative Wall Times (%), 5 Hours: AQF-CMAQ on cypress [chart]

25 Relative Wall Times (%), 5 Hours: AQF-CMAQ on snow [chart]

26 Relative Wall Times (%), 24 hr, 8 Worker Processors: AQF vs. 2003 Release on cypress [chart; PPM (Release) vs. Yamo (AQF) advection]

27 Relative Wall Times (%), 24 hr, 8 Worker Processors: AQF vs. 2003 Release on cypress, adding snow for 8 and 32 processors [chart]

28 AQF Horizontal Advection, Relative Wall Times (%), 24 hr, 8 Worker Processors [chart; x-r-l = x-row-loop, y-c-l = y-column-loop, hppm = kernel solver called in each loop]

29 AQF Horizontal Advection, Relative Wall Times (%), 24 hr, 32 Worker Processors [chart]

30 AQF & Release Horizontal Advection, Relative Wall Times (%), 24 hr, 32 Worker Processors [chart; r- prefix = release]

31 Future Work
- Add aerosols back in
- TKE vdiff
- Improve horizontal advection/diffusion scalability
- Some I/O improvements
- Layer-variable horizontal advection time steps

32 Aspect Ratios [diagram: subdomain dimensions (e.g. 83, 71, 42, 41, 36, 35, 21, 20) for decompositions of the 166 X 142 domain]

33 Aspect Ratios (continued) [diagram: subdomain dimensions (e.g. 21, 20, 18, 17) for further decompositions of the domain]

